A pairwise strategy for imputing predictive features when combining multiple datasets.

Yujie Wu¹, Boyu Ren²,³, Prasad Patil⁴

¹Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Bioinformatics (Oxford, England)

December 28, 2022

Summary

This summary is machine-generated.

Combining genomic datasets improves predictive models, but differences in measured features across studies cause data loss when only the shared features are kept. A pairwise imputation strategy makes effective use of study-specific features and yields better predictive model performance than merging all datasets before imputing.

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Combining multiple genomic studies enhances predictive model generalizability.
Variations in measurement platforms lead to different feature sets across studies.
Using only common features discards potentially valuable data.
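
The data loss described above can be made concrete with a small sketch. The study names and gene labels below are hypothetical placeholders, not data from the paper; the function simply reports which features survive an intersection across studies and which study-specific features each study would discard.

```python
# Hypothetical feature sets measured by three studies (illustrative only).
study_features = {
    "study_A": {"g1", "g2", "g3", "g4"},
    "study_B": {"g1", "g2", "g5"},
    "study_C": {"g1", "g2", "g3", "g6"},
}


def feature_overlap(features_by_study):
    """Return the common features, the union of all features, and the
    features each study loses when only the intersection is kept."""
    sets = list(features_by_study.values())
    common = set.intersection(*sets)
    union = set.union(*sets)
    discarded = {
        name: sorted(feats - common)
        for name, feats in features_by_study.items()
    }
    return common, union, discarded


common, union, discarded = feature_overlap(study_features)
print("kept:", sorted(common))
print("discarded per study:", discarded)
```

In this toy setting only two of six distinct features are shared by all studies, so an intersection-only analysis would ignore most of what was measured.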

Purpose of the Study:

Quantify performance loss from using only intersected features.
Evaluate imputation methods for missing genomic data.
Propose and validate a pairwise imputation strategy for cross-study analysis.

Main Methods:

Characterized performance loss using linear and polynomial regression for imputation.
Used simulated data and breast cancer gene expression datasets.
Developed and tested a pairwise imputation strategy, averaging imputed features across pairs.
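
The pairwise idea summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses ordinary least-squares linear regression only, the function names (`fit_linear`, `predict_linear`, `pairwise_impute`) and the dictionary-of-arrays data layout are hypothetical, and the toy data are synthetic. For each donor study that measured the missing feature, a model is fit on the features that donor shares with the target study, the feature is imputed in the target, and the imputed values are averaged across pairs.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept term."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict from coefficients returned by fit_linear."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


def pairwise_impute(target, donors, feature):
    """Impute `feature` in `target` once per (target, donor) pair and
    average the results. Each study is a dict: feature name -> 1D array."""
    estimates = []
    for donor in donors:
        if feature not in donor:
            continue  # this donor cannot inform the missing feature
        shared = sorted((set(target) & set(donor)) - {feature})
        if not shared:
            continue  # no common predictors for this pair
        X_donor = np.column_stack([donor[f] for f in shared])
        coef = fit_linear(X_donor, donor[feature])
        X_target = np.column_stack([target[f] for f in shared])
        estimates.append(predict_linear(coef, X_target))
    if not estimates:
        raise ValueError("no donor study can impute this feature")
    return np.mean(estimates, axis=0)


# Synthetic toy data: g3 is an exact linear function of g1 in both donors.
rng = np.random.default_rng(0)
g1_a, g2_a = rng.normal(size=20), rng.normal(size=20)
donor_a = {"g1": g1_a, "g2": g2_a, "g3": 2 * g1_a + 1}
g1_b = rng.normal(size=15)
donor_b = {"g1": g1_b, "g3": 2 * g1_b + 1}
target = {"g1": rng.normal(size=5), "g2": rng.normal(size=5)}

imputed_g3 = pairwise_impute(target, [donor_a, donor_b], "g3")
```

Note that each pair uses its own set of shared predictors (here `g1` and `g2` for the first donor, `g1` alone for the second), which is what lets the approach draw on features that an all-study intersection would drop.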

Main Results:

Pairwise imputation significantly improves external predictive model performance compared to using only intersected features.
The pairwise strategy outperforms merging all datasets before imputation.
Identified optimal feature subsets for imputation to enhance cross-study replicability.

Conclusions:

Discarding study-specific genomic features leads to substantial predictive performance loss.
Pairwise imputation is a superior strategy for integrating heterogeneous genomic datasets.
This approach maximizes information utilization for robust cross-study genomic prediction.