Finding Related Tables in Data Lakes for Interactive Data Science.

Yi Zhang¹, Zachary G Ives¹

  • ¹University of Pennsylvania, Philadelphia, PA.

Proceedings. ACM-SIGMOD International Conference on Management of Data | November 2, 2020
Summary
This summary is machine-generated.

Data scientists need better tools to find and manage data in data lakes. This study introduces search and management solutions for Jupyter Notebooks to improve data discovery and usability.

Keywords:
data lakes, interactive data science, notebooks, table search

Area of Science:

  • Data Science
  • Computer Science

Background:

  • Data lakes offer vast data storage but lack organization, hindering data discovery for scientific applications.
  • Modern data science workflows require efficient access to tables, schemas, and datasets within these repositories.

Purpose of the Study:

  • To develop integrated search and management solutions for data lakes within the Jupyter Notebook environment.
  • To enhance data scientists' ability to find and utilize relevant data for tasks like feature extraction and data augmentation.

Main Methods:

  • Development of search functionalities tailored for schema-agnostic data lake repositories.
  • Integration of data management tools within the Jupyter Notebook platform.
  • Generalization of core methods for broader application in script-based computational tasks.
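
One way to picture schema-agnostic table search is to ignore column names and compare cell values instead. The sketch below ranks tables in a toy "lake" by how many of a query column's distinct values appear in any of their columns (a containment score). It is a minimal illustration and not the method described in the study; the helpers `containment` and `rank_joinable` and the sample tables are hypothetical.

```python
from typing import Dict, List, Tuple

Table = Dict[str, List[object]]  # column name -> list of cell values


def containment(query_values: List[object], candidate_values: List[object]) -> float:
    """Fraction of distinct query values that also occur in the candidate column."""
    query_set = set(query_values)
    if not query_set:
        return 0.0
    return len(query_set & set(candidate_values)) / len(query_set)


def rank_joinable(query_column: List[object], lake: Dict[str, Table]) -> List[Tuple[str, str, float]]:
    """Return (table, column, score) for each table's best-matching column, best first."""
    results = []
    for table_name, table in lake.items():
        if not table:
            continue  # skip tables without columns
        best_col, best_score = max(
            ((col, containment(query_column, values)) for col, values in table.items()),
            key=lambda pair: pair[1],
        )
        results.append((table_name, best_col, best_score))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical data lake: three small tables with unrelated column names.
lake = {
    "patients": {"pid": [1, 2, 3, 4], "age": [34, 51, 29, 62]},
    "visits": {"visit_id": [10, 11, 12], "patient": [2, 3, 5]},
    "weather": {"city": ["Philadelphia", "Boston"], "temp_c": [12, 9]},
}
ranking = rank_joinable([1, 2, 3], lake)
```

Because the score depends only on values, `visits.patient` is surfaced as a join candidate even though its name differs from the query column. A production system would likely need sketches or indexes rather than exact set intersection to scale to large repositories.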

Main Results:

  • Scientists can now more easily locate and leverage data lake assets within their existing workflows.
  • Improved efficiency in augmenting training data, identifying features, and finding joinable tables.
  • Demonstrated applicability of the developed methods beyond the initial Jupyter Notebook environment.
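
To make the "augmenting training data" use case concrete, the following sketch appends rows from a candidate table to a training table when their column names overlap enough. This is only an assumed, simplified stand-in for the study's techniques: the Jaccard measure on column names, the `augment` helper, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def schema_jaccard(cols_a: List[str], cols_b: List[str]) -> float:
    """Jaccard similarity between two sets of column names."""
    a, b = set(cols_a), set(cols_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def augment(training: List[Row], candidate: List[Row], threshold: float = 0.5) -> List[Row]:
    """Append candidate rows, projected onto the training columns, if schemas overlap enough."""
    if not training or not candidate:
        return list(training)
    train_cols = list(training[0].keys())
    if schema_jaccard(train_cols, list(candidate[0].keys())) < threshold:
        return list(training)  # schemas too different: leave training data unchanged
    # Keep only the training columns; columns missing from a candidate row become None.
    extra = [{col: row.get(col) for col in train_cols} for row in candidate]
    return list(training) + extra


# Hypothetical tables: one with a compatible schema, one unrelated.
training = [{"age": 34, "bmi": 22.1, "label": 0}]
similar = [{"age": 51, "bmi": 27.4, "label": 1, "site": "A"}]
unrelated = [{"city": "Boston", "temp_c": 9}]

augmented = augment(training, similar)    # schema similarity 3/4, rows are appended
unchanged = augment(training, unrelated)  # schema similarity 0, nothing is added
```

Matching on column names alone is brittle in a schema-agnostic lake, so a realistic pipeline would probably combine it with value-based evidence before merging rows.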

Conclusions:

  • The developed search and management solutions significantly improve data discoverability and usability in data lake environments.
  • These tools empower data scientists to more effectively utilize data lake resources for various analytical tasks.
  • The generalized methods offer a scalable approach to data management in diverse computational settings.