Workflow analysis of data science code in public GitHub repositories.

Dhivyabharathi Ramasamy¹, Cristina Sarasua¹, Alberto Bacchelli¹

  • ¹ Department of Informatics, University of Zurich, Zurich, Switzerland.

Empirical Software Engineering | November 24, 2022
Summary
This summary is machine-generated.

This study shows that data science coding is iterative, with data scientists frequently switching between tasks such as data preprocessing and modeling. An analysis of Jupyter notebooks reveals specific workflow patterns that can inform the development of tools for data scientists.
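The notebooks analyzed in the study are Jupyter .ipynb files, which are JSON documents. As a minimal sketch, and not the authors' tooling, the code cells that would later be labeled with workflow steps can be extracted using only Python's standard json module. The sketch assumes the nbformat 4 layout, in which a top-level "cells" list holds objects with "cell_type" and "source" keys. The function name extract_code_cells and the example notebook are hypothetical.

```python
import json
from typing import List


def extract_code_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell in an nbformat-4 notebook.

    Markdown and raw cells are skipped, because only code cells are
    labeled with workflow steps in this sketch.
    """
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append(source)
    return cells


# A tiny hand-written notebook in the nbformat-4 layout (hypothetical content).
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [],
         "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 2, "outputs": [],
         "source": "df = df.dropna()"},
    ],
})

for i, src in enumerate(extract_code_cells(example)):
    print(f"--- code cell {i} ---")
    print(src)
```

Each extracted string corresponds to one unit that could then be assigned a data science step, either by a human annotator or by a classifier.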

Keywords:
Data science; Data science life cycle; Data science workflow; Jupyter notebooks; Notebooks; Source code classification; Workflow analysis


Area of Science:

  • Computer Science
  • Software Engineering
  • Data Science

Background:

  • Understanding data science coding practices is crucial for improving developer tools and workflows.
  • Existing literature suggests that data science coding is iterative, but detailed empirical evidence for this claim is lacking.

Purpose of the Study:

  • To empirically investigate the iterative and explorative nature of data science coding.
  • To identify common transitions and patterns in data science workflows.
  • To inform the development of better tooling for data scientists.

Main Methods:

  • Analysis of 470 publicly available Jupyter notebooks from GitHub repositories.
  • Manual annotation of code cells into distinct data science steps by five domain experts.
  • Application of first-order Markov chain models to analyze transitions between steps.
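A first-order Markov chain assumes that the step of the next code cell depends only on the step of the current cell. Suppose each notebook has been reduced to an ordered list of step labels. The transition probabilities can then be estimated by counting consecutive label pairs and normalizing each row, so that P(next = b | current = a) = count(a → b) / count(a → any step). The sketch below shows this estimate. The step names and the helper transition_probabilities are illustrative placeholders, not the study's label set or code.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def transition_probabilities(
    notebooks: Sequence[Sequence[str]],
) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of a first-order Markov chain.

    Each notebook is the ordered list of step labels of its code cells.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for steps in notebooks:
        # Pair each cell's label with the label of the cell that follows it.
        for current, nxt in zip(steps, steps[1:]):
            counts[current][nxt] += 1

    probs: Dict[str, Dict[str, float]] = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Normalize each row so the outgoing probabilities sum to 1.
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Hypothetical step sequences for two notebooks (labels are illustrative).
sequences = [
    ["load", "preprocess", "explore", "preprocess", "model", "evaluate"],
    ["load", "explore", "preprocess", "model", "evaluate", "model", "evaluate"],
]

for step, row in transition_probabilities(sequences).items():
    print(step, row)
```

In this toy input, preprocess is followed by model in two of its three outgoing transitions, which gives an estimated probability of 2/3. A real analysis would pool the sequences of all annotated notebooks before normalizing.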

Main Results:

  • Empirical evidence confirms the iterative nature of data science workflows.
  • Identified specific transition patterns and probabilities between different data science activities.
  • Machine learning models achieved an F1-score of approximately 71% in predicting data science steps.
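The F1-score is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The summary does not say how scores were averaged across the different steps. The sketch below therefore assumes macro averaging, which takes the unweighted mean of the per-step F1 scores. The labels and predictions are made up for illustration.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 = 2PR / (P + R) for each label, treating that label as the positive class."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # A label that is never predicted correctly gets F1 = 0.
        scores[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical gold labels vs. classifier output for six code cells.
gold = ["load", "preprocess", "preprocess", "model", "evaluate", "explore"]
pred = ["load", "preprocess", "explore", "model", "model", "explore"]
print(round(macro_f1(gold, pred), 3))  # prints 0.6
```

Here the per-step scores are 1 for load, 2/3 for explore, model, and preprocess, and 0 for evaluate, so the macro average is 3/5. The function sklearn.metrics.f1_score with average="macro" in scikit-learn computes the same quantity. The hand-written version simply makes the arithmetic explicit.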

Conclusions:

  • Data science coding workflows are characterized by frequent, iterative transitions between diverse tasks.
  • The findings shed light on how data science is carried out in practice.
  • The annotated dataset and developed models show promise for automating workflow analysis and supporting data scientists.
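One simple way to make "iterative" concrete is to measure the share of step changes that return to a step already seen earlier in the same notebook. This metric is an illustrative assumption, not one defined in the study. A strictly linear pipeline scores 0. Back-and-forth work between, say, preprocessing and modeling raises the score. The function name revisit_rate is hypothetical.

```python
from typing import Sequence


def revisit_rate(steps: Sequence[str]) -> float:
    """Share of step changes that return to a step seen earlier in the notebook.

    Consecutive cells with the same label do not count as a step change.
    Illustrative metric only; it is not taken from the study.
    """
    seen = set(steps[:1])
    changes = 0
    revisits = 0
    for prev, cur in zip(steps, steps[1:]):
        if cur != prev:
            changes += 1
            if cur in seen:
                revisits += 1
        seen.add(cur)
    return revisits / changes if changes else 0.0


# Hypothetical label sequences for two notebooks.
linear = ["load", "preprocess", "model", "evaluate"]
iterative = ["load", "preprocess", "explore", "preprocess", "model", "evaluate", "model"]
print(revisit_rate(linear), revisit_rate(iterative))  # prints 0.0 0.3333333333333333
```

In the second sequence, two of the six step changes (explore → preprocess and evaluate → model) return to earlier steps. This is the kind of pattern that the Markov chain transition probabilities summarize across many notebooks.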