



A pairwise strategy for imputing predictive features when combining multiple datasets.

Yujie Wu1, Boyu Ren2,3, Prasad Patil4

  • 1Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Bioinformatics (Oxford, England), December 28, 2022
Summary
This summary is machine-generated.

Combining genomic datasets improves predictive models, but studies often measure different features, so keeping only the shared features discards data. A pairwise imputation strategy makes use of study-specific features and gives better predictive performance than merging all datasets before imputing.


Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Combining multiple genomic studies enhances predictive model generalizability.
  • Variations in measurement platforms lead to different feature sets across studies.
  • Using only common features discards potentially valuable data.
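The cost of keeping only common features can be illustrated with a small sketch. The study names and gene labels below are hypothetical placeholders, not data from the paper; the function simply reports which features survive the intersection and which are dropped from each study.

```python
def split_features(studies):
    """Return (shared, dropped) for a dict mapping study name -> feature names.

    shared  : sorted list of features measured in every study
    dropped : dict mapping study name -> sorted list of features lost
              when only the shared features are kept
    """
    feature_sets = {name: set(features) for name, features in studies.items()}
    shared = set.intersection(*feature_sets.values())
    dropped = {name: sorted(fs - shared) for name, fs in feature_sets.items()}
    return sorted(shared), dropped


# Hypothetical feature sets for three studies (illustrative only).
studies = {
    "study_A": ["g1", "g2", "g3"],
    "study_B": ["g1", "g3", "g4"],
    "study_C": ["g1", "g2", "g4"],
}
shared, dropped = split_features(studies)
print("shared:", shared)
print("dropped:", dropped)
```

In this toy case each study measures three features but only one is common to all three, so two thirds of every study's features would be discarded.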

Purpose of the Study:

  • Quantify performance loss from using only intersected features.
  • Evaluate imputation methods for missing genomic data.
  • Propose and validate a pairwise imputation strategy for cross-study analysis.

Main Methods:

  • Characterized the performance loss from using only intersected features, with linear and polynomial regression as imputation models.
  • Evaluated the approach on simulated data and on breast cancer gene expression datasets.
  • Developed and tested a pairwise imputation strategy, averaging imputed features across pairs.
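The pairwise idea in the last bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each study is assumed to be a dict mapping feature name to a NumPy array of per-sample values, the imputation model is plain least-squares linear regression, and the names `pairwise_impute`, `target`, and `donors` are hypothetical. For each donor study that measures the missing feature, a model is fit on the features that donor shares with the target study, the feature is predicted in the target, and the predictions are averaged across pairs.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new rows X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def pairwise_impute(target, donors, feature):
    """Impute `feature` in `target` from each donor separately, then average.

    target : dict of feature name -> 1-D array (does not contain `feature`)
    donors : list of dicts with the same layout; donors lacking `feature`
             or sharing no predictors with the target are skipped
    """
    estimates = []
    for donor in donors:
        if feature not in donor:
            continue
        # Predictors are the features this particular pair has in common.
        shared = sorted((set(target) & set(donor)) - {feature})
        if not shared:
            continue
        X_donor = np.column_stack([donor[f] for f in shared])
        X_target = np.column_stack([target[f] for f in shared])
        coef = fit_linear(X_donor, donor[feature])
        estimates.append(predict_linear(coef, X_target))
    if not estimates:
        raise ValueError(f"no donor study can be used to impute {feature!r}")
    # Average the per-pair imputations sample by sample.
    return np.mean(estimates, axis=0)
```

Because each pair uses its own set of shared features, a donor can contribute predictors that would be lost if all studies were first reduced to one common feature set. A polynomial imputation model could be swapped in by expanding the predictor columns before fitting.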

Main Results:

  • Pairwise imputation significantly improves external predictive model performance compared to using only intersected features.
  • The pairwise strategy outperforms merging all datasets before imputation.
  • Identified optimal feature subsets for imputation to enhance cross-study replicability.

Conclusions:

  • Discarding study-specific genomic features leads to substantial predictive performance loss.
  • Pairwise imputation is a superior strategy for integrating heterogeneous genomic datasets.
  • This approach maximizes information utilization for robust cross-study genomic prediction.