Related Concept Videos

Statistical Software for Data Analysis and Clinical Trials

Statistical software plays a pivotal role in data analysis and clinical trials, providing tools to analyze data, draw conclusions, and make predictions. These packages range from simple data management applications to complex analytical platforms that support a variety of statistical tests, models, and simulation techniques. Their value lies in their ability to handle large volumes of data precisely and efficiently, enabling researchers to validate hypotheses, identify trends, and make...

Statistical Methods to Analyze Parametric Data: Student t-Test and Goodness-of-Fit Test

In parametric statistics, two fundamental tests stand out for their utility and wide application: the Student's t-test and goodness-of-fit tests. These tests provide researchers with robust methods for drawing insights from data, testing hypotheses, and making informed decisions based on their findings.
The Student's t-test is a statistical test that examines whether there is a statistically significant difference between the means of two groups. This test is instrumental when dealing with...
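Here is a minimal sketch of the computation behind the Student's t-test. The function below computes the pooled-variance two-sample t statistic and its degrees of freedom using Python's standard `statistics` module. The function name and the two samples are hypothetical. The sketch assumes normally distributed groups with equal variances, and it stops at the statistic rather than computing a p-value.

```python
import math
from statistics import mean, variance


def student_t_statistic(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom.

    Assumes both samples come from normal populations with equal variances.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df


# Hypothetical measurements for two groups
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.5, 4.8, 4.1, 4.6]
t, df = student_t_statistic(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")
```

For these illustrative numbers, t ≈ 4.25 with 8 degrees of freedom. That exceeds 2.306, the two-sided 5% critical value of the t distribution for 8 degrees of freedom, so the difference in means would be called significant at that level.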

Statistical Analysis: Overview

When we take repeated measurements on the same or replicate samples, we observe variations in their magnitude. These variations are called errors. To categorize and characterize the results and their errors, researchers can use statistical analysis to assess the quality of the measurements and/or the suitability of the methods.
One of the most commonly used statistical quantifiers is the mean, which is the ratio between the sum of the numerical values of all results and the...
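To make the definition of the mean concrete, the snippet below computes it for a set of hypothetical replicate measurements. It also lists each result's deviation from the mean, which is one simple way to see the scatter described above. The values are invented for illustration.

```python
# Five hypothetical replicate measurements of the same sample
replicates = [10.2, 10.4, 9.9, 10.1, 10.4]

# Mean: the sum of all results divided by the number of results
x_bar = sum(replicates) / len(replicates)

# Deviation of each result from the mean (the scatter around the central value)
deviations = [x - x_bar for x in replicates]

print(f"mean = {x_bar:.2f}")
print("deviations:", [round(d, 2) for d in deviations])
```

By construction, the deviations from the mean sum to zero (up to floating-point rounding). This is why measures of spread, such as the standard deviation, square them first.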

Statistical Inference Techniques in Hypothesis Testing: Parametric Versus Nonparametric Data

Statistical inference techniques used in hypothesis testing fall into two broad categories: parametric and nonparametric statistics.
Parametric statistics, as the name suggests, assume that the data follow a specific distribution, often the normal distribution. This assumption enables robust hypothesis testing and estimation. Parametric methods, such as the Student's t-test or the goodness-of-fit test, are frequently employed in biostatistics because of their robustness. For instance,...

Estimating Population Mean with Unknown Standard Deviation

In practice, we rarely know the population standard deviation. In the past, when the sample size was large, this did not present a problem to statisticians. They used the sample standard deviation s as an estimate for σ and proceeded as before to calculate a confidence interval, with results that were close enough. However, statisticians ran into problems when the sample size was small, because a small sample size caused inaccuracies in the confidence interval.
William S. Gosset (1876–1937) of the...
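Here is a minimal sketch of the approach that resolved this problem. When σ is unknown, the sample standard deviation s replaces it, and the critical value comes from the t distribution with n - 1 degrees of freedom instead of the normal distribution. The function name and the sample are hypothetical. Python's standard library has no t quantile function, so the critical value is passed in from a table (about 2.262 for 95% confidence with 9 degrees of freedom).

```python
import math
from statistics import mean, stdev


def t_confidence_interval(sample, t_crit):
    """Confidence interval for the population mean when sigma is unknown.

    t_crit is the two-sided critical value of the t distribution for
    df = n - 1 at the chosen confidence level, looked up in a table.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator) stands in for sigma
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of n = 10 measurements; 95% confidence, df = 9
measurements = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.6]
low, high = t_confidence_interval(measurements, t_crit=2.262)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Because the t critical value is larger than the corresponding normal value (1.96) for small samples, the resulting interval is wider. This widening compensates for the extra uncertainty from estimating σ with s.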

Estimating Population Standard Deviation

When the population standard deviation is unknown and the sample size is large, the sample standard deviation s is commonly used as a point estimate of σ. However, s can under- or overestimate the population standard deviation. To account for this uncertainty, a confidence interval is constructed that gives a range of plausible values for the population parameter. This approach, however, applies only to random samples from normally distributed populations. Knowing the sample mean and...
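Under the normality assumption just stated, the interval for σ comes from the chi-square distribution with n - 1 degrees of freedom:

sqrt((n - 1)s² / χ²_upper) ≤ σ ≤ sqrt((n - 1)s² / χ²_lower)

The sketch below applies that formula. The function name and inputs are hypothetical. The two chi-square critical values (2.700 and 19.023 for a 95% interval with 9 degrees of freedom) are supplied from a table, because the standard library cannot compute them.

```python
import math


def sd_confidence_interval(s, n, chi2_lower, chi2_upper):
    """Confidence interval for sigma from the sample standard deviation s.

    chi2_lower and chi2_upper are the lower- and upper-tail chi-square critical
    values for df = n - 1, taken from a table. Valid only for a random sample
    from a normally distributed population.
    """
    df = n - 1
    # Dividing by the larger critical value gives the lower bound, and vice versa
    lower = math.sqrt(df * s**2 / chi2_upper)
    upper = math.sqrt(df * s**2 / chi2_lower)
    return lower, upper


# Hypothetical sample: s = 2.0 from n = 10 observations, 95% confidence (df = 9)
low, high = sd_confidence_interval(s=2.0, n=10, chi2_lower=2.700, chi2_upper=19.023)
print(f"95% CI for sigma: ({low:.3f}, {high:.3f})")
```

The interval is not symmetric about s, because the chi-square distribution is skewed.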

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

SATB1 is a targetable modulator of JAK-STAT signaling and cytokines in human Treg and Tconv cells.

EMBO reports·2026

Same author

Critical evaluation of drug response prediction models with DrEval.

Nature communications·2026

Same author

Drugst.One DREAM-Drug repurposing through expert annotation and modification.

British journal of pharmacology·2026

Same author

SATB1 is a targetable modulator of JAK-STAT signaling and cytokines in human Treg and Tconv cells.

bioRxiv : the preprint server for biology·2026

Same author

Correction: Maurer et al. Gut Microbial Disruption in Critically Ill Patients with COVID-19-Associated Pulmonary Aspergillosis. J. Fungi 2022, 8, 1265.

Journal of fungi (Basel, Switzerland)·2026

Same author

Detection of Candidate Circular RNAs to Monitor Anti-Hormonal Response in the Mammary Gland.

bioRxiv : the preprint server for biology·2026

Same journal

NanoporeDB: A Structural Resource Of Multimeric Protein Nanopores For Single-Molecule Sensing.

GigaScience·2026

Same journal

From the Brain Cell Atlas to Precision Neurology: A review of the application of AI-driven multi-omics in brain science.

GigaScience·2026

Same journal

Comparison of Deep Learning Approaches for Extreme Low-SNR Image Restoration.

GigaScience·2026

Same journal

ScopeViewer: A Browser-Based Solution for Visualizing Large Biological Images.

GigaScience·2026

Same journal

ChatMDV: Reducing Technical Barriers in Bioinformatics Analysis using Large Language Models.

GigaScience·2026

Same journal

ClusterGraph: a new tool for visualisation and compression of multidimensional data.

GigaScience·2026

Related Experiment Video

Updated: Jan 11, 2026

A User-friendly and Powerful R Analysis of Large-scale Datasets

Published on: November 4, 2025

NApy: efficient statistics in Python for large-scale heterogeneous data with enhanced support for missing data.

Fabian Woller¹,², Lis Arend²,³, Christian Fuchsberger²

¹Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nürnberger Straße 74, 91052 Erlangen, Germany.

November 9, 2025

Summary

This summary is machine-generated.

NApy is a new Python package for efficient statistical testing on large datasets with missing values. It significantly improves runtime and memory usage compared to existing tools, enabling real-time data analysis.
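The summary does not describe NApy's interface, so what follows is only a generic sketch of pairwise deletion, one common way of handling missing values in a statistical test. It computes a Pearson correlation from the rows where both variables are observed. It is plain Python, not NApy's API or implementation. The function name, and the use of None or NaN as missing-value markers, are assumptions.

```python
import math


def _is_missing(v):
    # Assumption for this sketch: None or float NaN marks a missing value
    return v is None or (isinstance(v, float) and math.isnan(v))


def pearson_pairwise_complete(x, y):
    """Pearson correlation over the rows where both x and y are observed.

    Returns (r, n_used); r is NaN if fewer than two complete rows remain
    or if either variable is constant on those rows.
    """
    pairs = [(a, b) for a, b in zip(x, y) if not _is_missing(a) and not _is_missing(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n


# Rows 3 and 4 each have one missing entry, so only rows 1, 2, and 5 are used
x = [1.0, 2.0, float("nan"), 4.0, 5.0]
y = [2.0, 4.0, 6.0, None, 10.0]
r, n_used = pearson_pairwise_complete(x, y)
print(f"r = {r:.3f} from {n_used} complete rows")
```

Pairwise deletion keeps more data than dropping every row with any missing value. The cost is that each test can rest on a different number of observations, so the count should be reported alongside the statistic.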

Keywords:

Python; efficient computing and parallelization; large-scale datasets; missing data; statistical software

More Related Videos

Databases to Efficiently Manage Medium Sized, Low Velocity, Multidimensional Data in Tissue Engineering

Published on: November 22, 2019

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Area of Science:

Computational Biology
Data Science
Statistical Computing

Background:

Existing Python libraries struggle with efficient statistical testing on large datasets containing missing values.
Runtime and memory constraints are critical for applications like interactive biomedical data analysis.
This limitation hinders exploratory data analysis in resource-intensive fields.

Purpose of the Study:

To introduce NApy, a Python package designed for scalable statistical testing.
To address the challenge of handling missing values in large, mixed-type datasets.
To provide an efficient solution for computational tasks in data science and bioinformatics.

Main Methods:

Developed NApy with a Numba and C++ backend.
Implemented OpenMP for parallelization to enhance performance.
Focused on optimizing statistical test computations for datasets with missing entries.
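The summary does not say how NApy divides its work across threads. As an assumption for illustration, the plain-Python sketch below shows one natural decomposition for all-pairs testing with missing data. Every column pair, including each column with itself, is an independent task: it computes a pairwise-complete Pearson correlation and records how many rows it used. Compiled backends such as Numba or C++ with OpenMP can distribute independent loop iterations like these across cores. Here, the hypothetical `map_fn` parameter defaults to the built-in `map`, and a worker pool's `map` method could be substituted. None of these names are NApy's.

```python
import math
from itertools import combinations_with_replacement


def _is_missing(v):
    # Assumption for this sketch: None or float NaN marks a missing value
    return v is None or (isinstance(v, float) and math.isnan(v))


def _pair_task(task):
    """Pearson r and sample size for one column pair, skipping incomplete rows."""
    i, j, col_i, col_j = task
    pairs = [(a, b) for a, b in zip(col_i, col_j)
             if not _is_missing(a) and not _is_missing(b)]
    n = len(pairs)
    if n < 2:
        return i, j, float("nan"), n
    mi = sum(a for a, _ in pairs) / n
    mj = sum(b for _, b in pairs) / n
    sij = sum((a - mi) * (b - mj) for a, b in pairs)
    sii = sum((a - mi) ** 2 for a, _ in pairs)
    sjj = sum((b - mj) ** 2 for _, b in pairs)
    r = sij / math.sqrt(sii * sjj) if sii > 0 and sjj > 0 else float("nan")
    return i, j, r, n


def all_pairs_correlation(columns, map_fn=map):
    """Correlation matrix plus per-pair sample-size matrix for a list of columns.

    Each (i, j) task is independent of the others, which is what makes the
    all-pairs computation straightforward to parallelize.
    """
    k = len(columns)
    r_mat = [[float("nan")] * k for _ in range(k)]
    n_mat = [[0] * k for _ in range(k)]
    tasks = ((i, j, columns[i], columns[j])
             for i, j in combinations_with_replacement(range(k), 2))
    for i, j, r, n in map_fn(_pair_task, tasks):
        # The matrices are symmetric, so each task fills two cells
        r_mat[i][j] = r_mat[j][i] = r
        n_mat[i][j] = n_mat[j][i] = n
    return r_mat, n_mat


# Three hypothetical variables with scattered missing entries
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, float("nan"), 6.0, 8.0],
    [4.0, 3.0, None, 1.0],
]
r_mat, n_mat = all_pairs_correlation(columns)
```

The separate sample-size matrix matters because, with pairwise deletion, each cell of the correlation matrix can rest on a different number of observations.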

Main Results:

NApy demonstrates significant improvements in runtime and memory consumption.
Outperforms existing tools and naive parallelization methods by orders of magnitude.
Enables efficient on-the-fly statistical analyses for interactive applications.

Conclusions:

NApy offers a scalable and efficient solution for statistical testing with missing data.
The package facilitates real-time data analysis in interactive environments.
NApy is publicly available, promoting its adoption in research and industry.