A Randomized Parallel Algorithm for Efficiently Finding Near-Optimal Universal Hitting Sets.

Barış Ekim¹,², Bonnie Berger¹,², Yaron Orenstein³

¹Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Research in Computational Molecular Biology : ... Annual International Conference, RECOMB ... : Proceedings. RECOMB (Conference : 2005- )

June 5, 2024

Summary

This summary is machine-generated.

We developed PASHA, a new parallel algorithm for generating universal hitting sets (UHS). PASHA significantly speeds up processing of large sequencing datasets while maintaining high accuracy and low memory usage.

Keywords:

Parallelization, Randomization, Universal hitting sets

Area of Science:

Bioinformatics
Computational Biology
Algorithm Design

Background:

Next-generation sequencing generates massive amounts of data, necessitating efficient processing algorithms.
Universal hitting sets (UHS) offer a promising alternative to minimizers for sequence analysis tasks.
Current UHS computation methods are too slow and memory-intensive for practical, large-scale sequencing applications.
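
To make the term concrete: a universal hitting set is commonly defined as a set of k-mers such that every sequence of some length L contains at least one k-mer from the set. The sketch below checks that property by brute force over the DNA alphabet. The function name and the parameters `k` and `L` are illustrative assumptions, not taken from the paper, and the exhaustive enumeration is only feasible for very small values.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet


def is_universal_hitting_set(kmers, k, L):
    """Return True if every length-L string over ALPHABET
    contains at least one k-mer from `kmers`.

    Brute force: enumerates all 4**L strings, so use tiny L only.
    """
    kmers = set(kmers)
    for chars in product(ALPHABET, repeat=L):
        s = "".join(chars)
        # Slide a window of width k across s and look for a hit.
        if not any(s[i:i + k] in kmers for i in range(L - k + 1)):
            return False
    return True


all_2mers = {a + b for a in ALPHABET for b in ALPHABET}
print(is_universal_hitting_set(all_2mers, 2, 3))           # full set hits everything
print(is_universal_hitting_set(all_2mers - {"AA"}, 2, 3))  # "AAA" is no longer hit
```

The second call fails because the string AAA contains only the 2-mer AA, which shows why some k-mers can never be left out of a universal hitting set.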

Purpose of the Study:

To develop a practical, efficient algorithm for computing near-optimal universal hitting sets.
To address the computational bottlenecks of existing UHS construction methods.
To enable the application of UHS in high-throughput sequence analysis.

Main Methods:

Developed a randomized parallel algorithm (PASHA) for UHS generation.
Leveraged theoretical and architectural techniques to parallelize k-mer hitting number calculation and reduce memory usage.
Applied randomized Set Cover techniques for faster universal k-mer selection.
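
The Set Cover view of the problem can be illustrated with a toy greedy loop: treat each length-L sequence as an element to cover, count how many still-uncovered sequences each k-mer hits, and pick a k-mer with the largest count. This is a hedged sketch of generic greedy Set Cover with random tie-breaking, not PASHA itself; the function name, the `seed` parameter, and the brute-force enumeration are assumptions made for illustration, and the page does not describe PASHA's actual selection rule or parallelization.

```python
import random
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet


def greedy_hitting_set(k, L, seed=0):
    """Toy greedy Set Cover: choose k-mers until every length-L
    string over ALPHABET contains a chosen k-mer.

    Illustrative only: enumerates all 4**L strings.
    """
    rng = random.Random(seed)
    unhit = {"".join(p) for p in product(ALPHABET, repeat=L)}
    chosen = set()
    while unhit:
        # Count, for each k-mer, how many unhit strings it occurs in.
        counts = {}
        for s in unhit:
            for kmer in {s[i:i + k] for i in range(L - k + 1)}:
                counts[kmer] = counts.get(kmer, 0) + 1
        best = max(counts.values())
        # Break ties at random among the top-scoring k-mers.
        candidates = sorted(km for km, c in counts.items() if c == best)
        pick = rng.choice(candidates)
        chosen.add(pick)
        # Drop every string that the new k-mer hits.
        unhit = {s for s in unhit if pick not in s}
    return chosen


hitting_set = greedy_hitting_set(k=2, L=4)
print(len(hitting_set), sorted(hitting_set))
```

Recomputing every count in each round is the expensive step in this toy version, which gives a rough sense of why the hitting number calculation is the part worth parallelizing.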

Main Results:

PASHA achieves orders of magnitude improvement in runtime and memory usage compared to existing algorithms.
The algorithm efficiently handles large values of k.
PASHA generates near-optimal UHSs with set sizes provably close to the optimal, only slightly larger than serial deterministic methods.

Conclusions:

PASHA is the first practical, randomized parallel algorithm for generating near-optimal universal hitting sets.
The developed methods significantly reduce runtime and memory demands for UHS construction.
PASHA is expected to be widely adopted in high-throughput sequence analysis pipelines.