Fine-Grained Provenance for Matching & ETL.

Nan Zheng¹, Abdussalam Alawini², Zachary G Ives¹

  • ¹ University of Pennsylvania.

Proceedings. International Conference on Data Engineering | October 10, 2019
Summary
This summary is machine-generated.

Scientists need better data provenance tools to trace errors in complex tasks such as extract-transform-load (ETL) and record matching. PROVision addresses this need by tracking provenance within data objects, so that users can pinpoint the source of an error and understand why results vary.

Area of Science:

  • Computer Science
  • Data Management
  • Scientific Computing

Background:

  • Existing data provenance tools have limitations in granularity (file-level vs. tuple-level) and scope, failing to adequately support common data science tasks like ETL, record alignment, and matching.
  • Current solutions like workflow systems, provenance APIs, and database provenance tools do not effectively trace errors within complex data types (e.g., strings, images) or identify the root causes of discrepancies between code versions or parameter values.
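
The granularity gap described above can be made concrete with a small sketch. It is not taken from PROVision; the function `match_by_name`, the `Match` class, and the file names are hypothetical and chosen only for illustration. The sketch runs a simple record-matching step and records provenance at two levels: file-level (which inputs were read) and tuple-level (which rows produced each match).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    left_id: str
    right_id: str
    sources: frozenset  # tuple-level provenance: (file, row id) pairs


def match_by_name(left, right, left_file="left.csv", right_file="right.csv"):
    """Exact match on a normalized name, with provenance at two granularities.

    The file names are hypothetical labels, not real files that are opened.
    """
    # File-level provenance: only says which inputs were involved.
    file_level = {left_file, right_file}

    # Index the right-hand rows by normalized name.
    index = {}
    for rid, name in right.items():
        index.setdefault(name.strip().lower(), []).append(rid)

    matches = []
    for lid, name in left.items():
        for rid in index.get(name.strip().lower(), []):
            # Tuple-level provenance: the exact rows behind this match.
            sources = frozenset({(left_file, lid), (right_file, rid)})
            matches.append(Match(lid, rid, sources))
    return matches, file_level


left = {"L1": "Ada Lovelace ", "L2": "Alan Turing"}
right = {"R1": "ada lovelace", "R2": "Grace Hopper"}
matches, file_level = match_by_name(left, right)
```

File-level provenance here cannot say which row caused a bad match, while tuple-level provenance can. Neither says which part of the name string was responsible, which is the finer granularity the summary attributes to PROVision.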

Purpose of the Study:

  • To introduce PROVision, a novel provenance-driven troubleshooting tool designed to address the limitations of existing systems.
  • To enable scientists to trace errors within data objects during ETL and matching computations, identify sources of errors, and understand the impact of code versions and parameters on analysis results.

Main Methods:

  • PROVision extends database-style provenance techniques to capture data equivalences and support optimizations.
  • The system enables selective evaluation and traces the extraction of content within data objects, going beyond traditional file- or tuple-level tracking.
  • The authors formalize these extensions, implement them in the PROVision system, and validate them on common ETL and matching tasks.
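
As a rough illustration of what tracing extraction within a data object could look like, the sketch below attaches a character span to every value pulled out of a raw string. This is an assumption-laden toy and not PROVision's implementation or API: the `Traced` class, `extract_fields`, the regular expression, and the sample record are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Traced:
    value: str      # extracted content
    record_id: str  # which input record it came from (tuple-level)
    start: int      # character offsets inside that record's text
    end: int        # (within-object provenance)


def extract_fields(record_id, text, pattern):
    """Extract named groups from text, keeping the span each value came from."""
    m = re.search(pattern, text)
    if m is None:
        return {}
    return {
        name: Traced(m.group(name), record_id, *m.span(name))
        for name in m.groupdict()
        if m.group(name) is not None
    }


# Hypothetical extraction rule: a name followed by a four-digit year.
PATTERN = r"(?P<name>[A-Za-z][A-Za-z ]*),\s*(?P<year>\d{4})"
raw = "id=7; Ada Lovelace, 1815; London"
fields = extract_fields("row-7", raw, PATTERN)

# Each extracted value can be traced back to the exact substring it came from.
name = fields["name"]
assert raw[name.start:name.end] == name.value
```

Keeping offsets alongside values means that a wrong output field can be traced to the specific characters that produced it, rather than only to the record or file.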

Main Results:

  • PROVision effectively supports ETL and matching computations by tracing provenance within data objects.
  • The tool demonstrates capabilities in identifying error sources and understanding variations in results due to different code versions or parameter settings.
  • Validation confirms the effectiveness and scalability of PROVision for common data science workflows.
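
One way to picture "understanding variations in results due to parameter settings" is to run the same matching step under two parameter values and keep, for each result that changed, the evidence behind it. The sketch below does this with a token-based Jaccard similarity and a threshold. It is illustrative only: the similarity measure, the function names, and the sample data are assumptions, not the method evaluated in the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match(left, right, threshold):
    """Return {(left_id, right_id): score} for pairs at or above the threshold."""
    out = {}
    for lid, ltext in left.items():
        for rid, rtext in right.items():
            score = jaccard(ltext, rtext)
            if score >= threshold:
                out[(lid, rid)] = score
    return out


def explain_difference(left, right, t_old, t_new):
    """Pairs gained or lost when the threshold changes, with their scores."""
    old, new = match(left, right, t_old), match(left, right, t_new)
    gained = {k: new[k] for k in new.keys() - old.keys()}
    lost = {k: old[k] for k in old.keys() - new.keys()}
    return gained, lost


left = {"L1": "Acme Corp Ltd", "L2": "Globex Inc"}
right = {"R1": "Acme Corp", "R2": "Globex Incorporated"}
gained, lost = explain_difference(left, right, t_old=0.6, t_new=0.3)
```

Here lowering the threshold adds the pair (L2, R2), and the recorded score shows that it sits between the two thresholds, which is the kind of explanation a provenance-driven tool aims to surface.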

Conclusions:

  • PROVision advances data provenance by tracking it at a fine-grained, within-object level for troubleshooting.
  • The developed system addresses critical needs for scientists in identifying and resolving errors in complex data analysis tasks.
  • PROVision's approach enhances the reliability and reproducibility of scientific data analysis by providing deeper insights into the data processing pipeline.