A Randomized Parallel Algorithm for Efficiently Finding Near-Optimal Universal Hitting Sets.

Barış Ekim (1,2), Bonnie Berger (1,2), Yaron Orenstein (3)

  • (1) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Research in Computational Molecular Biology : ... Annual International Conference, RECOMB ... : Proceedings. RECOMB (Conference : 2005- )
June 5, 2024
Summary
This summary is machine-generated.

We developed PASHA, a new parallel algorithm for generating universal hitting sets (UHS). PASHA significantly speeds up processing of large sequencing datasets while maintaining high accuracy and low memory usage.

Keywords:
Parallelization, Randomization, Universal hitting sets

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Algorithm Design

Background:

  • Next-generation sequencing generates massive data, necessitating efficient processing algorithms.
  • Universal hitting sets (UHS) offer a promising alternative to minimizers for sequence analysis tasks.
  • Current UHS computation methods are too slow and memory-intensive for practical, large-scale sequencing applications.
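
To make the term concrete: a universal hitting set for parameters k and L is a set of k-mers such that every sequence of length L contains at least one of them. The sketch below checks that property by brute force for very small parameters. The function names and the default DNA alphabet are illustrative assumptions and are not taken from the paper; the exhaustive enumeration is only feasible for toy values of L.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet, for illustration only


def kmers_of(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_universal_hitting_set(candidate, k, L, alphabet=ALPHABET):
    """Check by exhaustive enumeration whether every length-L sequence
    over the alphabet contains at least one k-mer from candidate."""
    candidate = set(candidate)
    for chars in product(alphabet, repeat=L):
        seq = "".join(chars)
        if kmers_of(seq, k).isdisjoint(candidate):
            return False  # found a length-L sequence that is not hit
    return True
```

For example, over a two-letter alphabet "AB" with k = 2 and L = 3, the set {"AA", "BB"} is not universal because the sequence "ABA" contains neither k-mer, while adding "AB" makes it universal.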

Purpose of the Study:

  • To develop a practical, efficient algorithm for computing near-optimal universal hitting sets.
  • To address the computational bottlenecks of existing UHS construction methods.
  • To enable the application of UHS in high-throughput sequence analysis.

Main Methods:

  • Developed a randomized parallel algorithm (PASHA) for UHS generation.
  • Leveraged theoretical and architectural techniques to parallelize k-mer hitting number calculation and reduce memory usage.
  • Applied randomized Set Cover techniques for faster universal k-mer selection.
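
The Set Cover view of the problem can be illustrated with a small serial, deterministic greedy sketch. This is not PASHA itself, which is randomized and parallel; it is a brute-force toy under stated assumptions. Here the "hitting number" of a k-mer is taken to be the number of not-yet-hit length-L sequences that contain it, and the function name `greedy_hitting_set` is hypothetical.

```python
from itertools import product


def greedy_hitting_set(k, L, alphabet="ACGT"):
    """Greedy Set Cover over all length-L sequences (toy sizes only).

    Repeatedly selects the k-mer contained in the most unhit sequences
    until every length-L sequence contains a selected k-mer.
    """
    # Map each length-L sequence to the set of k-mers it contains.
    unhit = {}
    for chars in product(alphabet, repeat=L):
        seq = "".join(chars)
        unhit[seq] = {seq[i:i + k] for i in range(L - k + 1)}

    chosen = []
    while unhit:
        # Count, for each k-mer, how many unhit sequences contain it.
        counts = {}
        for kmer_set in unhit.values():
            for kmer in kmer_set:
                counts[kmer] = counts.get(kmer, 0) + 1
        # Highest count wins; ties are broken lexicographically.
        best = min(counts, key=lambda km: (-counts[km], km))
        chosen.append(best)
        # Drop every sequence that the selected k-mer now hits.
        unhit = {s: ks for s, ks in unhit.items() if best not in ks}
    return chosen
```

Recomputing every count in each round is the expensive step in this sketch; per the methods above, hitting number calculation is the part PASHA parallelizes.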

Main Results:

  • PASHA achieves orders of magnitude improvement in runtime and memory usage compared to existing algorithms.
  • The algorithm efficiently handles large values of k.
  • PASHA generates near-optimal UHSs whose sizes are provably close to optimal and only slightly larger than those produced by serial deterministic methods.

Conclusions:

  • PASHA is the first practical, randomized parallel algorithm for generating near-optimal universal hitting sets.
  • The developed methods significantly reduce runtime and memory demands for UHS construction.
  • PASHA is expected to be widely adopted in high-throughput sequence analysis pipelines.