

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Ternary Strategy Enables 18.16% Efficiency in All-Small-Molecule Organic Solar Cells with Improved Fill Factor and Reduced Voltage Loss.

ACS applied materials & interfaces·2026
Same author

Thermodynamics and Phase Stability of SmF₃ with LiF, NaF, and KF for Molten Salt Reactor Applications.

ACS omega·2026
Same author

Extensive pityriasis versicolor presenting with truncal hypopigmentation: a rare clinical image.

The Pan African medical journal·2026
Same author

Cosolvent-Modulated Donor Preaggregation Enhances Molecular Order in 20% Efficient Bilayer Organic Solar Cells.

ACS applied materials & interfaces·2026
Same author

Rare case of congenital pterygium involving the right eye: clinical image.

The Pan African medical journal·2026
Same author

Correction: Comprehensive analysis of gut microbiota and fecal metabolites in patients with autism spectrum disorder.

Frontiers in microbiology·2026
Same journal

Uldp-FL: Federated Learning with Across-Silo User-Level Differential Privacy.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases·2025
Same journal

Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases·2024
Same journal

Models and Mechanisms for Spatial Data Fairness.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases·2023
Same journal

Beyond Equi-joins: Ranking, Enumeration and Factorization.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases·2022
Same journal

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases·2021
Same journal

Snuba: Automating Weak Supervision to Label Training Data.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases·2019

Related Experiment Video

Updated: Feb 25, 2026

Databases to Efficiently Manage Medium Sized, Low Velocity, Multidimensional Data in Tissue Engineering
09:43

Published on: November 22, 2019

Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff.

Souvik Bhattacherjee¹, Amit Chavan¹, Silu Huang²

  • ¹University of Maryland, College Park.

Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases | July 29, 2017 | PubMed
Summary
This summary is machine-generated.

Managing dataset versions is challenging due to the storage-recreation trade-off. This study proposes efficient heuristics for dataset version management, balancing storage use and retrieval speed.

More Related Videos

A User-friendly and Powerful R Analysis of Large-scale Datasets
10:56

Published on: November 4, 2025

Author Spotlight: Automated Deep Brain Stimulation for Parkinson's Disease - Exploring the Possibilities and Challenges of Home Monitoring
06:32

Published on: July 14, 2023


Area of Science:

  • Data Science
  • Computer Science
  • Information Management

Background:

  • Collaborative data science generates numerous dataset versions, posing management and storage challenges.
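  • The core issue is the storage-recreation trade-off: storing versions in full makes them fast to retrieve but consumes more space, while storing versions as deltas from other versions saves space but makes recreating a version more expensive.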
  • Existing research on this fundamental problem is limited.
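The trade-off can be illustrated with a minimal sketch under an assumed cost model. Each version is either stored in full or as a delta from another version. Recreating a delta-stored version means reading its nearest fully stored ancestor and applying every delta on the path. The version names and sizes below are hypothetical. For simplicity, the sketch also assumes that applying a delta costs as much as storing it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical full sizes of each version and delta sizes between consecutive
# versions. Assumption: applying a delta costs the same as its stored size.
FULL: Dict[str, int] = {"v1": 100, "v2": 105, "v3": 110, "v4": 112}
DELTA: Dict[Tuple[str, str], int] = {("v1", "v2"): 10, ("v2", "v3"): 8, ("v3", "v4"): 5}

# A storage plan maps each version to its delta parent, or None if stored in full.
Plan = Dict[str, Optional[str]]


def storage_cost(plan: Plan) -> int:
    """Total stored size: full copies plus deltas."""
    return sum(FULL[v] if p is None else DELTA[(p, v)] for v, p in plan.items())


def recreation_cost(plan: Plan, version: str) -> int:
    """Read the nearest full copy, then apply each delta back down to `version`."""
    cost = 0
    parent = plan[version]
    while parent is not None:
        cost += DELTA[(parent, version)]
        version, parent = parent, plan[parent]
    return cost + FULL[version]


all_full: Plan = {v: None for v in FULL}
delta_chain: Plan = {"v1": None, "v2": "v1", "v3": "v2", "v4": "v3"}

for name, plan in [("all full", all_full), ("delta chain", delta_chain)]:
    worst = max(recreation_cost(plan, v) for v in plan)
    print(f"{name}: storage={storage_cost(plan)}, worst recreation={worst}")
```

With these made-up numbers, the delta chain stores 123 units instead of 427. However, its worst-case version (v4) costs 123 to recreate instead of 112. Longer chains or larger deltas widen that gap.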

Purpose of the Study:

  • To systematically study the storage-recreation trade-off in dataset versioning.
  • To formulate optimization problems that capture this trade-off and determine which of them are tractable.
  • To develop efficient heuristics for practical dataset version management.

Main Methods:

  • Formulation of six distinct problems addressing the storage-recreation trade-off under various constraints.
  • Demonstration that most of the formulated problems are intractable.
  • Development of heuristics inspired by delay-constrained scheduling and spanning tree algorithms.
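The spanning-tree idea can be sketched under the same kind of hypothetical cost model. A dummy root node is added, and an edge from the root to a version stands for storing that version in full. With symmetric deltas, a minimum spanning tree over this augmented graph (here computed with Prim's algorithm) gives a minimum-storage plan. Directed deltas would call for a minimum-cost arborescence instead. The `cap_recreation` step is a simple greedy fix-up invented for illustration and is not one of the paper's heuristics. It materializes any version whose recreation cost would exceed a bound `theta`, and it assumes every full copy fits within `theta`. All names and sizes are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

ROOT = "v0"  # hypothetical dummy node: edge (ROOT, v) means "store v in full"

# Hypothetical full sizes and symmetric delta sizes between pairs of versions.
FULL: Dict[str, int] = {"v1": 100, "v2": 105, "v3": 110, "v4": 112}
DELTAS: Dict[Tuple[str, str], int] = {
    ("v1", "v2"): 10, ("v2", "v3"): 8, ("v3", "v4"): 5, ("v1", "v3"): 20,
}


def delta(a: str, b: str) -> int:
    """Symmetric delta lookup (assumes the pair exists in DELTAS)."""
    return DELTAS[(a, b)] if (a, b) in DELTAS else DELTAS[(b, a)]


def build_graph() -> Dict[str, List[Tuple[int, str]]]:
    """Adjacency list of the augmented graph: versions plus ROOT."""
    graph: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for v, size in FULL.items():
        graph[ROOT].append((size, v))
        graph[v].append((size, ROOT))
    for (a, b), size in DELTAS.items():
        graph[a].append((size, b))
        graph[b].append((size, a))
    return graph


def min_storage_plan() -> Dict[str, str]:
    """Prim's algorithm from ROOT; returns each version's parent in the MST."""
    graph = build_graph()
    parent: Dict[str, str] = {}
    visited = {ROOT}
    heap = [(w, ROOT, v) for w, v in graph[ROOT]]
    heapq.heapify(heap)
    while heap:
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        parent[v] = u
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(heap, (w2, v, x))
    return parent


def storage(plan: Dict[str, str]) -> int:
    """Total stored size: full copies for ROOT children, deltas otherwise."""
    return sum(FULL[v] if p == ROOT else delta(p, v) for v, p in plan.items())


def recreation(plan: Dict[str, str], v: str) -> int:
    """Read the nearest full copy, then apply deltas back down to v."""
    cost = 0
    while plan[v] != ROOT:
        cost += delta(plan[v], v)
        v = plan[v]
    return cost + FULL[v]


def cap_recreation(parent: Dict[str, str], theta: int) -> Dict[str, str]:
    """Hypothetical greedy fix-up: walk the tree top-down and store in full any
    version whose recreation cost would exceed theta (assumes FULL[v] <= theta)."""
    children: Dict[str, List[str]] = defaultdict(list)
    for v, p in parent.items():
        children[p].append(v)
    plan = dict(parent)
    cost = {ROOT: 0}
    stack = [ROOT]
    while stack:
        u = stack.pop()
        for v in children[u]:
            c = FULL[v] if u == ROOT else cost[u] + delta(u, v)
            if c > theta:
                plan[v] = ROOT  # materialize v instead of storing a delta
                c = FULL[v]
            cost[v] = c
            stack.append(v)
    return plan


mst = min_storage_plan()
capped = cap_recreation(mst, theta=115)
for name, plan in [("MST", mst), ("capped", capped)]:
    worst = max(recreation(plan, v) for v in FULL)
    print(f"{name}: storage={storage(plan)}, worst recreation={worst}")
```

On these hypothetical numbers, the MST stores 123 units but needs 123 to recreate v4. Capping recreation at 115 materializes v3, which raises storage to 225 and brings the worst case down to 115.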

Main Results:

  • Proposed heuristics offer efficient solutions for dataset versioning scenarios.
  • Experimental validation confirms the practical effectiveness of the developed heuristics.
  • A prototype version management system was built as a foundation for DataHub.

Conclusions:

  • The proposed heuristics effectively address the storage-recreation trade-off in dataset versioning.
  • The developed system provides a practical foundation for collaborative data science environments.
  • Further research can build upon these heuristics for enhanced data management systems.