
Related Concept Videos

Randomized Experiments · 01:13 · 9.1K

The randomization process assigns study participants to experimental or control groups at random, so that each participant has an equal probability of being assigned to either group. Randomization is meant to eliminate selection bias and to balance known and unknown confounding factors, making the control group as similar as possible to the treatment group. A computer program with a random number generator can be used to assign participants to groups in a way that minimizes bias.
Simple randomization
Simple...
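Simple randomization can be sketched in a few lines of Python. This is a minimal illustration, not a trial-grade allocation system: the function name `simple_randomization`, the group labels, the participant IDs, and the seed are hypothetical. Each participant is assigned independently with probability 1/2, so group sizes can differ by chance (schemes such as block randomization exist to control that).

```python
import random


def simple_randomization(participant_ids, seed=None):
    """Assign each participant independently to 'treatment' or 'control'
    with equal probability (simple randomization)."""
    # A dedicated, seeded generator makes the allocation reproducible and auditable.
    rng = random.Random(seed)
    return {pid: rng.choice(["treatment", "control"]) for pid in participant_ids}


if __name__ == "__main__":
    # Hypothetical participant IDs P001..P010.
    ids = [f"P{i:03d}" for i in range(1, 11)]
    allocation = simple_randomization(ids, seed=2024)
    for pid, group in allocation.items():
        print(pid, group)
    # Group sizes are not forced to be equal under simple randomization.
    n_treat = sum(g == "treatment" for g in allocation.values())
    print("treatment:", n_treat, "control:", len(ids) - n_treat)
```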
Regression Toward the Mean · 01:52 · 7.2K

Regression toward the mean (“RTM”) is a phenomenon in which extremely high or low values (for example, an individual’s blood pressure at a particular moment) appear closer to a group’s average upon remeasurement. Although this statistical peculiarity is the result of random error and chance, it has been problematic across various medical, scientific, financial, and psychological applications. In particular, RTM, if not taken into account, can interfere when...
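The simulation below illustrates regression toward the mean under stated assumptions: each hypothetical person has a stable "true" systolic blood pressure drawn from a normal distribution, and every measurement adds independent random error. All parameter values (mean 120 mmHg, SDs of 10 and 8, cutoff 145) and the function name `simulate_rtm` are invented for illustration. People selected for an extreme first reading show, on average, a second reading closer to the group mean, even though nothing about them has changed.

```python
import random
import statistics


def simulate_rtm(n=10_000, true_mean=120.0, true_sd=10.0, noise_sd=8.0,
                 cutoff=145.0, seed=0):
    """Simulate two noisy measurements per person and compare the group
    selected for an extreme first measurement across both occasions."""
    rng = random.Random(seed)
    # Stable underlying value for each (hypothetical) person.
    true_values = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    # Each measurement = true value + independent measurement error.
    first = [t + rng.gauss(0.0, noise_sd) for t in true_values]
    second = [t + rng.gauss(0.0, noise_sd) for t in true_values]
    # Select only people whose FIRST reading is extreme.
    selected = [i for i, x in enumerate(first) if x > cutoff]
    mean_first = statistics.fmean([first[i] for i in selected])
    mean_second = statistics.fmean([second[i] for i in selected])
    return len(selected), mean_first, mean_second


if __name__ == "__main__":
    n_sel, m1, m2 = simulate_rtm()
    print(f"selected: {n_sel}")
    print(f"mean of first reading:  {m1:.1f}")
    print(f"mean of second reading: {m2:.1f}  (closer to 120, with no intervention)")
```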
Multiple Regression · 01:25 · 4.0K

Multiple regression assesses the linear relationship between one response (dependent) variable and two or more independent variables. It has many practical applications.
Farmers can use multiple regression to predict crop yield from more than one factor, such as water availability, fertilizer, and soil properties. Here, crop yield is the response or dependent variable because it depends on the independent variables. The analysis requires the construction of a scatter plot...
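A multiple regression with two predictors can be fitted by ordinary least squares, solving the normal equations (XᵀX)b = Xᵀy. The pure-Python sketch below is only illustrative: the function names, the crop-yield numbers, and the choice of predictors (water and fertilizer) are hypothetical, and real analyses would normally use a numerical library and check model assumptions. Because the toy data are generated exactly from yield = 2 + 0.5·water + 1.5·fertilizer, the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a·x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(row) for row in rows]  # prepend an intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical plots: (water, fertilizer) -> yield.
    predictors = [(10, 1), (20, 3), (30, 2), (40, 5), (50, 4), (15, 6)]
    crop_yield = [2 + 0.5 * w + 1.5 * f for w, f in predictors]
    b0, b_water, b_fert = fit_multiple_regression(predictors, crop_yield)
    print(f"yield ≈ {b0:.2f} + {b_water:.2f}·water + {b_fert:.2f}·fertilizer")
```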
Correlation and Regression · 00:53 · 3.5K

In statistics, correlation describes the degree of association between two variables. In linear regression, correlation is expressed mathematically by the correlation coefficient, which describes the strength and direction of the relationship between two variables. The coefficient is represented by the symbol r and ranges from -1 to +1. A positive value indicates a positive correlation, in which the two variables move in the same direction. A negative value suggests a...
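The correlation coefficient r is most commonly Pearson's product-moment coefficient: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. A minimal sketch follows; the function name `pearson_r` and the toy data are hypothetical. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from each mean.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    print(round(pearson_r(x, y), 3))  # 0.775: a moderately strong positive correlation
    print(round(pearson_r(x, [10, 8, 6, 4, 2]), 3))  # -1.0: perfect negative correlation
```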
Regression Analysis · 01:11 · 8.4K

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, a regression equation is determined from the line of best fit, the line that best fits the data points plotted on a graph. This line is also called the regression line, and its algebraic equation is called the regression equation. It is represented as:
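For a single independent variable, the least-squares regression line is commonly written ŷ = b₀ + b₁x, with slope b₁ = Sxy / Sxx and intercept b₀ = ȳ - b₁x̄. This notation is an assumption and may differ from the equation used in the video. The sketch below computes both coefficients; the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y_hat = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx                  # b1 = Sxy / Sxx
    intercept = mean_y - slope * mean_x  # b0 = y_bar - b1 * x_bar (line passes through the means)
    return intercept, slope


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    b0, b1 = least_squares_line(x, y)
    print(f"y_hat = {b0:.2f} + {b1:.2f}x")  # y_hat = 2.20 + 0.60x
    print("prediction at x = 6:", round(b0 + b1 * 6, 2))  # 5.8
```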
pH Scale · 02:41 · 80.0K

Hydronium and hydroxide ions are present both in pure water and in all aqueous solutions, and their concentrations are inversely proportional as determined by the ion product of water (Kw). The concentrations of these ions in a solution are often critical determinants of the solution’s properties and the chemical behaviors of its other solutes. Two different solutions can differ in their hydronium or hydroxide ion concentrations by a million, billion, or even trillion times. A common means of...
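The inverse relationship between hydronium and hydroxide concentrations, and the logarithmic compression of very large concentration ratios, can be illustrated numerically. The sketch assumes the standard definitions pH = -log10[H3O+] and pOH = -log10[OH-] and uses Kw = 1.0 × 10^-14 at 25 °C (Kw differs at other temperatures). The function names are hypothetical. It also shows that a millionfold difference in hydronium concentration corresponds to six pH units.

```python
import math

KW_25C = 1.0e-14  # ion product of water, [H3O+][OH-], at 25 °C


def hydroxide_from_hydronium(h3o, kw=KW_25C):
    """[OH-] in mol/L from [H3O+] in mol/L, using [H3O+][OH-] = Kw."""
    return kw / h3o


def neg_log10(concentration):
    """Negative base-10 logarithm, as used for pH and pOH."""
    return -math.log10(concentration)


if __name__ == "__main__":
    h3o = 1.0e-3  # mol/L, a hypothetical acidic solution
    oh = hydroxide_from_hydronium(h3o)
    ph, poh = neg_log10(h3o), neg_log10(oh)
    print(f"[OH-] = {oh:.1e} M, pH = {ph:.2f}, pOH = {poh:.2f}, sum = {ph + poh:.2f}")
    # A millionfold (10**6) ratio in [H3O+] is a difference of 6 pH units.
    print(round(neg_log10(1.0e-9) - neg_log10(1.0e-3), 2))
```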

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

STrategies for developing REseArch Methods guidance (STREAM): Protocol.

Journal of clinical epidemiology·2026
Same author

The statistical software revolution in pharmaceutical development: challenges and opportunities in open source.

Drug discovery today·2026
Same author

On "Confirmatory" Methodological Research in Statistics and Related Fields.

Statistics in medicine·2025
Same author

ChatGPT as a Tool for Biostatisticians: A Tutorial on Applications, Opportunities, and Limitations.

Statistics in medicine·2025
Same author

Rethinking the Handling of Method Failure in Comparison Studies.

Statistics in medicine·2025
Same author

Comparing supervised machine learning algorithms for the prediction of partial arterial pressure of oxygen during craniotomy.

BMC medical informatics and decision making·2025
Same journal

OpenIMC: an open-source platform for analyzing single-cell and spatial proteomics by imaging mass cytometry.

BMC bioinformatics·2026
Same journal

NAP: an open source pipeline for cross-domain microbiome profiling using Nanopore sequencing-derived amplicon data.

BMC bioinformatics·2026
Same journal

SurvGME: an R package for survival analysis with graphical and measurement error models.

BMC bioinformatics·2026
Same journal

SimMapNet: a Bayesian framework for gene regulatory network inference using gene ontology similarities as external hint.

BMC bioinformatics·2026
Same journal

Dual channel drug-drug interactions extraction based on cross attention.

BMC bioinformatics·2026
Same journal

FeSseqdb: a curated sequence-level database and interpretable machine learning framework for identifying iron-sulfur proteins.

BMC bioinformatics·2026

Related Experiment Video

Updated: Feb 7, 2026

Methods of Soil Resampling to Monitor Changes in the Chemical Concentrations of Forest Soils · 09:16 · 17.4K
Published on: November 25, 2016

Random forest versus logistic regression: a large-scale benchmark experiment.

Raphael Couronné¹, Philipp Probst², Anne-Laure Boulesteix²

  • ¹ Department of Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany. raphael.couronne@gmail.com.

BMC Bioinformatics | July 19, 2018
Summary
This summary is machine-generated.

Random Forest (RF) achieved higher accuracy than logistic regression (LR) on about 69% of the datasets in this benchmark of binary classification tasks. This large-scale benchmarking study highlights RF's superior predictive performance and emphasizes the need for rigorous methodology in algorithm evaluation.

Keywords: Classification, Comparison study, Logistic regression, Prediction

More Related Videos

Collecting and Processing Drone-based Remotely Sensed Data for Use in Forest Recovery Monitoring · 08:16 · 607
Published on: October 24, 2025

Simulating Impacts of Ice Storms on Forest Ecosystems · 06:27 · 7.5K
Published on: June 30, 2020


Area of Science:

  • Machine Learning
  • Computational Biology
  • Data Science

Background:

  • Since its introduction in 2001, the Random Forest (RF) algorithm has become a popular classification tool.
  • It is increasingly used in many scientific fields, where it competes with logistic regression (LR).

Purpose of the Study:

  • To benchmark the prediction performance of the original RF algorithm against LR.
  • To compare these algorithms using a large-scale experiment with 243 real datasets for binary classification.

Main Methods:

  • A large-scale benchmarking experiment was conducted using 243 real-world datasets.
  • The study compared the original Random Forest (RF) algorithm with default parameters against logistic regression (LR).
  • The experimental design was inspired by clinical trial methodology to minimize bias.
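A paired benchmark of this kind, in which every dataset is evaluated with both methods, can be summarized as sketched below. This is an illustration under assumptions, not the study's analysis: the function names are hypothetical, the per-dataset accuracies are invented toy numbers (not results from the paper), and the exact sign test is a swapped-in choice, since this summary does not state which test the authors used. Treating each dataset as a paired "subject" mirrors the clinical-trial analogy in the design.

```python
import math


def sign_test_two_sided(wins, losses):
    """Exact two-sided sign test (Binomial(n, 0.5)); ties must be excluded first."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def paired_benchmark_summary(scores_a, scores_b):
    """Summarize paired per-dataset scores (higher = better) for methods A and B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need the same, non-zero number of datasets for both methods")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    wins = sum(d > 0 for d in diffs)
    losses = sum(d < 0 for d in diffs)
    return {
        "n_datasets": len(diffs),
        "win_rate_a": wins / len(diffs),  # ties count as non-wins
        "mean_diff": sum(diffs) / len(diffs),
        "sign_test_p": sign_test_two_sided(wins, losses),
    }


if __name__ == "__main__":
    # Invented accuracies on six hypothetical datasets (NOT data from the study).
    rf_acc = [0.91, 0.85, 0.78, 0.88, 0.70, 0.95]
    lr_acc = [0.89, 0.86, 0.74, 0.88, 0.69, 0.93]
    for key, value in paired_benchmark_summary(rf_acc, lr_acc).items():
        print(key, round(value, 4))
```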

Main Results:

  • Random Forest achieved higher accuracy than logistic regression on approximately 69% of the datasets.
  • Statistical analysis showed significantly better performance for RF in terms of accuracy, Area Under the Curve (AUC), and Brier score.
  • Dataset selection criteria notably influenced the observed results, underscoring the importance of transparent methodology.
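The three performance measures named in the results can be computed from true 0/1 labels and predicted probabilities as follows. This is a minimal sketch: the function names, the 0.5 classification threshold for accuracy, and the toy predictions are assumptions for illustration. AUC is computed here as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, with ties counting one half. For accuracy and AUC higher is better; for the Brier score lower is better.

```python
def accuracy(y_true, p_pred, threshold=0.5):
    """Share of cases whose thresholded prediction matches the 0/1 label."""
    correct = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, p_pred))
    return correct / len(y_true)


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """AUC via pairwise comparison of positive vs. negative cases (O(n_pos * n_neg))."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0
            elif pp == pn:
                score += 0.5  # ties count half
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80]  # hypothetical predicted probabilities of class 1
    print(accuracy(y, p))               # 0.75
    print(auc(y, p))                    # 0.75
    print(round(brier_score(y, p), 4))  # 0.1581
```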

Conclusions:

  • The original Random Forest algorithm with default parameters generally outperforms logistic regression for binary classification.
  • Future research should involve neutral, large-scale studies to evaluate various RF implementations and parameters.
  • Clear reporting of dataset selection criteria is crucial for reproducible and reliable algorithm benchmarking.