Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff.

Souvik Bhattacherjee¹, Amit Chavan¹, Silu Huang²

¹University of Maryland, College Park.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases

July 29, 2017

Summary

This summary is machine-generated.

Managing dataset versions is challenging due to the storage-recreation trade-off. This study proposes efficient heuristics for dataset version management, balancing storage use and retrieval speed.

Area of Science:

Data Science
Computer Science
Information Management

Background:

Collaborative data science generates numerous dataset versions, posing management and storage challenges.
The core issue is the storage-recreation trade-off: storing more versions in full makes each one faster to recreate but consumes more space, while storing fewer makes recreation slower.
Existing research on this fundamental problem is limited.
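
The trade-off described above can be made concrete with a toy calculation. The sketch below is illustrative only: the version sizes, the delta sizes, the function names, and the cost model (recreation cost equals the number of units read) are all assumptions and do not come from the study.

```python
# Toy illustration of the storage-recreation trade-off.
# All numbers are made up; "cost" is simply the amount of data read.

FULL_SIZE = [100, 102, 105, 103]   # size of each version when stored in full
DELTA_SIZE = [None, 30, 40, 35]    # size of the delta from version i-1 to i


def store_everything(full_size):
    """Materialize every version: most storage, cheapest recreation."""
    storage = sum(full_size)
    recreation = list(full_size)   # each version is read directly
    return storage, recreation


def store_chain(full_size, delta_size):
    """Materialize only version 0 and keep a chain of deltas for the rest."""
    storage = full_size[0] + sum(delta_size[1:])
    recreation = [full_size[0]]
    for d in delta_size[1:]:
        # Recreating version i means replaying every delta up to i.
        recreation.append(recreation[-1] + d)
    return storage, recreation


if __name__ == "__main__":
    print(store_everything(FULL_SIZE))        # (410, [100, 102, 105, 103])
    print(store_chain(FULL_SIZE, DELTA_SIZE)) # (205, [100, 130, 170, 205])
```

With these numbers, the delta chain halves total storage (205 versus 410) but roughly doubles the cost of recreating the latest version (205 versus 103).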

Purpose of the Study:

To systematically study the storage-recreation trade-off in dataset versioning.
To formulate and analyze tractable and intractable problems related to this trade-off.
To develop efficient heuristics for practical dataset version management.

Main Methods:

Formulation of six distinct problems addressing the storage-recreation trade-off under various constraints.
Demonstration of the intractability of most formulated problems.
Development of heuristics inspired by delay-constrained scheduling and spanning tree algorithms.
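
The summary does not spell out the heuristics themselves, so the following is only a sketch of why spanning trees are a natural fit, not the authors' algorithm. It assumes a hypothetical version graph in which node 0 is a dummy root, an edge from the root means "store this version in full", and any other edge means "store one version as a delta from the other". It further assumes symmetric deltas and a single weight per edge serving as both storage and recreation cost. Under those assumptions, a minimum spanning tree gives the least total storage, and a shortest-path tree gives the least recreation cost for every version.

```python
import heapq

# Hypothetical version graph (all weights are made up).
# (0, v): store version v in full; (u, v): store a delta between u and v.
EDGES = {
    (0, 1): 100, (0, 2): 102, (0, 3): 105, (0, 4): 103,
    (1, 2): 30, (2, 3): 40, (3, 4): 35, (1, 4): 60,
}


def adjacency(edges):
    """Build an undirected adjacency list from the edge dictionary."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def grow_tree(edges, root=0, shortest_paths=False):
    """Grow a spanning tree from the root.

    shortest_paths=False: Prim's algorithm (minimum spanning tree).
    shortest_paths=True:  Dijkstra's algorithm (shortest-path tree).
    Returns (parent, recreation), where recreation[v] is the cost of the
    root-to-v path inside the chosen tree.
    """
    adj = adjacency(edges)
    parent, recreation, done = {}, {}, set()
    heap = [(0, root, None, 0)]  # (priority, node, parent, path cost)
    while heap:
        _, node, par, cost = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        recreation[node] = cost
        if par is not None:
            parent[node] = par
        for nxt, w in adj[node]:
            if nxt not in done:
                priority = cost + w if shortest_paths else w
                heapq.heappush(heap, (priority, nxt, node, cost + w))
    return parent, recreation


def storage_cost(edges, parent):
    """Total storage is the sum of the weights of the tree's edges."""
    return sum(edges[tuple(sorted((p, v)))] for v, p in parent.items())


if __name__ == "__main__":
    for flag in (False, True):
        parent, recreation = grow_tree(EDGES, shortest_paths=flag)
        print(storage_cost(EDGES, parent), max(recreation.values()))
```

On this graph the minimum spanning tree stores 205 units with a worst-case recreation cost of 205, while the shortest-path tree stores 410 units with a worst-case recreation cost of 105. A heuristic balancing the two objectives would presumably search among the trees between these extremes.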

Main Results:

Proposed heuristics offer efficient solutions for dataset versioning scenarios.
Experimental validation confirms the practical effectiveness of the developed heuristics.
A prototype version management system was built as a foundation for DataHub.

Conclusions:

The proposed heuristics effectively address the storage-recreation trade-off in dataset versioning.
The developed system provides a practical foundation for collaborative data science environments.
Further research can build upon these heuristics for enhanced data management systems.