Random forest versus logistic regression: a large-scale benchmark experiment.

Raphael Couronné¹, Philipp Probst², Anne-Laure Boulesteix²

¹Department of Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany. raphael.couronne@gmail.com.

BMC Bioinformatics

July 19, 2018

Summary

This summary is machine-generated.

In a large-scale benchmark of binary classification tasks, Random Forest (RF) outperformed logistic regression (LR) on approximately 69% of the 243 datasets. The study also shows that the observed results depend on how datasets are selected, which underscores the need for rigorous, transparent methodology in algorithm evaluation.

Keywords:

Classification, Comparison study, Logistic regression, Prediction

Area of Science:

Machine Learning
Computational Biology
Data Science

Background:

Since its introduction in 2001, the Random Forest (RF) algorithm has become a popular classification tool.
It is increasingly used across scientific fields, where it competes with logistic regression (LR).

Purpose of the Study:

To benchmark the prediction performance of the original RF algorithm against LR.
To compare these algorithms using a large-scale experiment with 243 real datasets for binary classification.

Main Methods:

A large-scale benchmarking experiment was conducted using 243 real-world datasets.
The study compared the original Random Forest (RF) algorithm with default parameters against logistic regression (LR).
The experimental design was inspired by clinical trial methodology to minimize bias.
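One way to read the design above is that each dataset contributes one paired observation: the score of RF minus the score of LR on that dataset. The sketch below summarizes such paired differences with a win fraction, a mean difference, and a percentile bootstrap interval. It is an illustration only, not the authors' code: the function name `paired_summary`, the bootstrap settings, and the five accuracy values are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def paired_summary(scores_a, scores_b, n_boot=2000, seed=0):
    """Summarize per-dataset paired differences (method A minus method B).

    Assumes higher scores are better and that position i in both lists
    refers to the same dataset.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equally long, non-empty score lists")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]

    # Fraction of datasets on which A is strictly better (ties are not wins).
    win_fraction = sum(d > 0 for d in diffs) / len(diffs)

    # Percentile bootstrap over datasets for the mean difference.
    rng = random.Random(seed)
    boot_means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = boot_means[n_boot * 25 // 1000]
    upper = boot_means[n_boot * 975 // 1000 - 1]

    return {
        "win_fraction": win_fraction,
        "mean_diff": mean(diffs),
        "ci95": (lower, upper),
    }


# Hypothetical accuracies on five datasets (illustrative values only).
rf_acc = [0.91, 0.84, 0.78, 0.88, 0.95]
lr_acc = [0.89, 0.86, 0.75, 0.85, 0.95]

summary = paired_summary(rf_acc, lr_acc)
print(summary)
```

Treating the dataset as the unit of analysis keeps the comparison paired, so differences between datasets do not mask differences between methods.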

Main Results:

Random Forest achieved higher accuracy than logistic regression on approximately 69% of the datasets.
Statistical analysis showed significantly better performance for RF on all three measures considered: accuracy, Area Under the Curve (AUC), and Brier score (for which lower values are better).
Dataset selection criteria notably influenced the observed results, underscoring the importance of transparent methodology.
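The three performance measures named above have standard definitions, sketched here in plain Python for a single set of binary labels and predicted probabilities. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the 0.5 classification threshold, and the four example predictions are hypothetical.

```python
def accuracy(y_true, p_hat, threshold=0.5):
    """Share of cases where the thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, p_hat))
    return hits / len(y_true)


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and label.

    Lower is better; 0 means perfectly confident and correct predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, p_hat):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher predicted probability than a randomly chosen negative case,
    counting ties as one half.
    """
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities (illustrative values only).
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]

acc_value = accuracy(y, p)
brier_value = brier_score(y, p)
auc_value = auc(y, p)
print(acc_value, brier_value, auc_value)
```

Accuracy depends on the chosen threshold, while AUC and the Brier score use the predicted probabilities directly, which is one reason a benchmark may report all three.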

Conclusions:

The original Random Forest algorithm with default parameters generally outperforms logistic regression for binary classification.
Future research should involve neutral, large-scale studies to evaluate various RF implementations and parameters.
Clear reporting of dataset selection criteria is crucial for reproducible and reliable algorithm benchmarking.