NApy: efficient statistics in Python for large-scale heterogeneous data with enhanced support for missing data.

Fabian Woller1,2, Lis Arend2,3, Christian Fuchsberger2

  • 1Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nürnberger Straße 74, 91052 Erlangen, Germany.

GigaScience | November 9, 2025
Summary
This summary is machine-generated.

NApy is a new Python package for efficient statistical testing on large datasets with missing values. It significantly improves runtime and memory usage compared to existing tools, enabling real-time data analysis.

Keywords:

  • Python
  • efficient computing and parallelization
  • large-scale datasets
  • missing data
  • statistical software

Area of Science:

  • Computational Biology
  • Data Science
  • Statistical Computing

Background:

  • Existing Python libraries struggle with efficient statistical testing on large datasets containing missing values.
  • Runtime and memory constraints are critical for applications like interactive biomedical data analysis.
  • This limitation hinders exploratory data analysis in resource-intensive fields.

Purpose of the Study:

  • To introduce NApy, a Python package designed for scalable statistical testing.
  • To address the challenge of handling missing values in large, mixed-type datasets.
  • To provide an efficient solution for computational tasks in data science and bioinformatics.
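
To make the mixed-type, missing-value setting concrete, the following is a minimal pure-Python sketch of a test between a categorical and a continuous variable in which rows with a missing label or a missing measurement are skipped. It is not NApy's API. The function name `welch_t`, the use of `None` and NaN as missing markers, and the choice of Welch's t statistic are all assumptions made for illustration; this summary does not say which tests the package implements.

```python
import math

def welch_t(groups, values):
    """Welch t statistic for `values` split by a two-level label.

    Hypothetical helper for illustration only. A row is skipped when its
    label is None or its value is NaN, so each variable keeps all of its
    usable observations.
    """
    samples = {}
    for label, value in zip(groups, values):
        if label is None or math.isnan(value):
            continue  # drop only the rows that are unusable for this test
        samples.setdefault(label, []).append(value)

    if len(samples) != 2:
        raise ValueError("expected exactly two observed group labels")
    (_, a), (_, b) = sorted(samples.items())
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observed values")

    def mean_and_variance(sample):
        mean = sum(sample) / len(sample)
        var = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
        return mean, var

    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    squared_se = var_a / len(a) + var_b / len(b)
    if squared_se == 0:
        return math.nan  # no variation in either group
    return (mean_a - mean_b) / math.sqrt(squared_se)

# Toy data: one missing label (None) and one missing measurement (NaN).
groups = ["ctrl", "ctrl", "ctrl", None, "case", "case", "case", "case"]
values = [1.0, 2.0, 3.0, 9.0, 4.0, math.nan, 6.0, 8.0]
t_stat = welch_t(groups, values)  # compares "case" against "ctrl"
```

Repeating this for every categorical and continuous column pair in a large table is the kind of workload where runtime and memory become the bottleneck described in the Background.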

Main Methods:

  • Developed NApy with a Numba and C++ backend.
  • Implemented OpenMP for parallelization to enhance performance.
  • Focused on optimizing statistical test computations for datasets with missing entries.
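
As a rough illustration of a statistical computation over data with missing entries, here is a small pure-Python sketch of pairwise deletion for Pearson correlation. It is not taken from NApy and uses neither Numba, C++, nor OpenMP. The names `pearson_pairwise` and `all_pairs` and the NaN encoding of missing values are assumptions for this example.

```python
import math
from itertools import combinations

def pearson_pairwise(x, y):
    """Pearson r over the positions where both x and y are present.

    Hypothetical helper: missing entries are encoded as NaN. Returns
    (r, n), where n is the number of complete pairs actually used.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan, n  # correlation is undefined for a constant column
    return sxy / math.sqrt(sxx * syy), n

def all_pairs(columns):
    """Run the test for every unordered pair of named columns."""
    return {(a, b): pearson_pairwise(columns[a], columns[b])
            for a, b in combinations(sorted(columns), 2)}

NA = math.nan
data = {
    "x": [1.0, 2.0, NA, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, NA, 10.0],
    "z": [5.0, NA, 3.0, 2.0, 1.0],
}
results = all_pairs(data)  # maps (column, column) to (r, n)
```

Each pair uses a different subset of rows, so the missing-value mask has to be rebuilt for every pair. With thousands of columns the number of pairs grows quadratically, which is why a compiled, parallel backend matters for this kind of computation.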

Main Results:

  • NApy demonstrates significant improvements in runtime and memory consumption.
  • Outperforms existing tools and naive parallelization methods by orders of magnitude.
  • Enables efficient on-the-fly statistical analyses for interactive applications.

Conclusions:

  • NApy offers a scalable and efficient solution for statistical testing with missing data.
  • The package facilitates real-time data analysis in interactive environments.
  • NApy is publicly available, promoting its adoption in research and industry.