Predicting and Comparing the Performance of Array Management Libraries.

Donghe Kang¹, Oliver Rübel², Suren Byna²

¹The Ohio State University.

Proceedings. IPDPS (Conference)

October 11, 2021

Summary

This summary is machine-generated.

The paper presents an analytical model that predicts the end-to-end performance of I/O-bound scientific applications built on array libraries such as HDF5 and Zarr. The model accounts for costs beyond raw I/O, including data transformation and caching, which makes it better suited to guiding storage choices for scalable applications.

Area of Science:

High-Performance Computing (HPC)
Data Storage and Management
Scientific Data Analysis

Background:

Many scientific applications are I/O-bound, necessitating performance optimization for scalability.
Existing I/O performance models are insufficient for applications using array libraries (e.g., HDF5, Zarr) due to complex data access patterns and storage models.
I/O optimization is often ad hoc, performed by domain scientists who lack deep expertise in the storage hierarchy.

Purpose of the Study:

To present an analytical cost model for predicting end-to-end execution time of applications using array management libraries.
To evaluate the model's accuracy in capturing performance beyond raw I/O, including data transformation and caching.
To compare the performance of different storage libraries, specifically HDF5 and Zarr, using the developed model.

Main Methods:

Developed an analytical cost model incorporating I/O time, memory copy costs, and software cache benefits.
Focused on HDF5 (single-file storage) and Zarr (multi-file storage) as representative array libraries.
Evaluated the model on real-world applications in neuroscience and plasma physics across three HPC clusters.
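
The three cost components named above (I/O time, memory copy cost, and software cache benefit) can be sketched as a simple additive model. This is only an illustration under stated assumptions, not the formulation used in the paper: the class `AccessCost`, its fields, the latency-plus-bandwidth I/O term, and the treatment of the cache as a plain hit ratio are all hypothetical, and the numbers in the usage section are made up.

```python
from dataclasses import dataclass


@dataclass
class AccessCost:
    """Hypothetical inputs for one library/workload pair (not from the paper)."""
    bytes_read: float        # bytes fetched from storage
    io_bandwidth: float      # storage bandwidth, bytes per second
    n_requests: int          # number of I/O requests issued
    request_latency: float   # fixed cost per request, seconds
    bytes_copied: float      # bytes moved by in-memory layout transformations
    mem_bandwidth: float     # memory copy bandwidth, bytes per second
    cache_hit_ratio: float   # fraction of I/O served by the software cache, 0..1


def predicted_time(c: AccessCost) -> float:
    """Estimate end-to-end time as uncached I/O time plus memory copy time."""
    if not 0.0 <= c.cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - c.cache_hit_ratio
    # Assumption: cache hits avoid both the request latency and the transfer.
    io_time = miss_ratio * (
        c.n_requests * c.request_latency + c.bytes_read / c.io_bandwidth
    )
    # Assumption: copies happen regardless of whether the data was cached.
    copy_time = c.bytes_copied / c.mem_bandwidth
    return io_time + copy_time


def faster_library(costs: dict[str, AccessCost]) -> str:
    """Return the name of the library with the lowest predicted time."""
    return min(costs, key=lambda name: predicted_time(costs[name]))


if __name__ == "__main__":
    # Made-up numbers: a single-file layout with few large requests versus
    # a multi-file layout with many small requests and less copying.
    candidates = {
        "HDF5": AccessCost(4e9, 2e9, 64, 0.002, 4e9, 8e9, 0.25),
        "Zarr": AccessCost(4e9, 2e9, 4096, 0.002, 1e9, 8e9, 0.25),
    }
    for name, cost in candidates.items():
        print(f"{name}: {predicted_time(cost):.3f} s")
    print("predicted faster:", faster_library(candidates))
```

Even in this toy form, the copy term shows why an I/O-only model can pick the wrong library: two layouts with identical bytes read can differ in total time once layout transformations are counted.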

Main Results:

I/O can account for as little as 10% of total execution time, highlighting the inadequacy of I/O-only models.
The new model correctly predicts the faster storage library (HDF5 vs. Zarr) 94% of the time.
By comparison, a state-of-the-art I/O model achieves 70% accuracy.
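
The accuracy figures above describe how often a model picks the library that is actually faster. A minimal sketch of that metric follows, assuming each test case supplies predicted and measured times per library; the function name `selection_accuracy` and the dictionary layout are hypothetical, and the source does not specify how the authors computed their percentages.

```python
def selection_accuracy(
    predicted: list[dict[str, float]],
    measured: list[dict[str, float]],
) -> float:
    """Fraction of cases where the predicted-fastest library is also the
    measured-fastest one. Each dict maps a library name to seconds."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        # min(..., key=d.get) returns the library with the smallest time.
        min(p, key=p.get) == min(m, key=m.get)
        for p, m in zip(predicted, measured)
    )
    return hits / len(predicted)
```

Called with one dictionary per workload configuration, for example `{"HDF5": 1.2, "Zarr": 0.9}` for both predictions and measurements, the function returns a value between 0 and 1 that can be compared across models.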

Conclusions:

End-to-end performance modeling, including data layout transformations and caching, is crucial for applications using array libraries.
The analytical model predicts application performance more accurately than traditional I/O models.
This work offers a valuable tool for optimizing data storage and access in scientific computing, improving application scalability and developer efficiency.