



PUlasso: High-Dimensional Variable Selection With Presence-Only Data.

Hyebin Song¹, Garvesh Raskutti¹

  • ¹Department of Statistics, University of Wisconsin-Madison, Madison, WI.

Journal of the American Statistical Association | April 8, 2020
Summary
This summary is machine-generated.

We introduce PUlasso, a new algorithm for variable selection and classification using positive and unlabeled data, especially effective in high-dimensional settings. PUlasso offers improved performance and theoretical guarantees for presence-only response problems.

Keywords:
Majorization-minimization, Nonconvexity, PU-learning, Regularization


Area of Science:

  • Machine Learning
  • Statistical Learning
  • Bioinformatics

Background:

  • Real-world classification often involves positive and unlabeled data (presence-only responses).
  • High dimensionality (large number of features, p) combined with presence-only data poses significant statistical and computational challenges.
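To make the presence-only setting concrete, the following is a minimal sketch of what positive and unlabeled data can look like. The function name `make_pu_data`, the sparse coefficient vector, and the labeling mechanism (each true positive is labeled independently with probability `label_frac`) are all assumptions chosen for illustration; the sampling scheme used in the paper may differ.

```python
import numpy as np

def make_pu_data(n=1000, p=50, n_active=5, label_frac=0.3, seed=0):
    """Simulate positive-unlabeled data (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))

    # Sparse true coefficients: only the first n_active features matter.
    beta = np.zeros(p)
    beta[:n_active] = 1.5

    # True binary response from a logistic model; hidden from the analyst.
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, prob)

    # Observed indicator: z = 1 means "labeled positive".
    # Only true positives can be labeled; z = 0 mixes positives and negatives.
    z = y * rng.binomial(1, label_frac, size=n)
    return X, y, z, beta
```

In this sketch only `X` and `z` would be available for fitting. Every sample with `z == 1` is a true positive, while the `z == 0` group contains both positives and negatives, which is what makes the problem harder than ordinary classification.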

Purpose of the Study:

  • To develop a scalable algorithm, PUlasso, for variable selection and classification with positive and unlabeled data.
  • To address the challenges posed by high dimensionality in presence-only response scenarios.

Main Methods:

  • The PUlasso algorithm utilizes the majorization-minimization framework, a generalization of the expectation-maximization (EM) algorithm.
  • Two computational speed-ups are incorporated to enhance the scalability of the standard EM algorithm.
  • Convergence to a stationary point is established theoretically, and the estimator is shown to attain the minimax optimal mean-squared error under sparsity assumptions.
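The majorization-minimization idea can be illustrated with a small sketch. This is not the PUlasso algorithm itself: it is a generic majorization-minimization loop for ordinary, fully labeled lasso-penalized logistic regression, chosen as an assumption because it is simple to verify. Each iteration replaces the logistic loss with a quadratic upper bound (valid because the curvature of the loss is at most `L`) and minimizes that bound plus the lasso penalty in closed form by soft-thresholding. The names `mm_logistic_lasso`, `soft_threshold`, and `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form minimizer of 0.5 * (b - v)**2 + t * |b|, elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, beta, lam):
    """Mean logistic negative log-likelihood plus lasso penalty, y in {0, 1}."""
    eta = X @ beta
    return np.mean(np.logaddexp(0.0, eta) - y * eta) + lam * np.sum(np.abs(beta))

def mm_logistic_lasso(X, y, lam=0.1, n_iter=200):
    """Generic MM sketch for lasso logistic regression (not PUlasso)."""
    n, p = X.shape
    # The Hessian of the mean logistic loss is X^T W X / n with W <= 1/4,
    # so L bounds its largest eigenvalue and gives a valid quadratic majorizer.
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    beta = np.zeros(p)
    history = [objective(X, y, beta, lam)]
    for _ in range(n_iter):
        eta = X @ beta
        sigma = np.exp(-np.logaddexp(0.0, -eta))  # numerically stable sigmoid
        grad = X.T @ (sigma - y) / n
        # Minimize the majorizer: a gradient step followed by soft-thresholding.
        beta = soft_threshold(beta - grad / L, lam / L)
        history.append(objective(X, y, beta, lam))
    return beta, history
```

Because each surrogate lies above the objective and touches it at the current iterate, the recorded objective values in `history` should never increase, which is the descent property that majorization-minimization schemes rely on. The soft-thresholding step sets small coefficients exactly to zero, which is how the lasso penalty performs variable selection.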

Main Results:

  • The PUlasso algorithm demonstrates convergence to a stationary point.
  • Theoretical analysis shows that the minimax optimal mean-squared error is achieved under both strict sparsity and group sparsity assumptions.
  • Simulations indicate better classification performance than state-of-the-art methods in moderate-p settings.
  • PUlasso is applied successfully to a biochemistry example.

Conclusions:

  • PUlasso provides an effective and scalable solution for variable selection and classification with positive and unlabeled data.
  • The algorithm offers strong theoretical guarantees and practical performance benefits, particularly in high-dimensional contexts.
  • The study highlights the utility of PUlasso in real-world applications, including biochemistry.