Unsupervised document classification integrating web scraping, one-class SVM and LDA topic modelling.

Anton Thielmann (1), Christoph Weisser (1, 2), Astrid Krenz (1, 3)

  • (1) Center for Statistics, Georg-August-Universität Göttingen, Göttingen, Germany.

Journal of Applied Statistics | February 23, 2023
Summary
This summary is machine-generated.

This study introduces a novel unsupervised document classification method for imbalanced datasets, combining web scraping, one-class Support Vector Machines (SVM), and Latent Dirichlet Allocation (LDA) topic modeling to bypass manual labeling and improve accuracy.

Keywords:
LDA topic model, Unsupervised document classification, machine learning, one-class SVM, out-of-domain training data, web scraping
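The summary names web scraping as the data-acquisition step but does not say how pages were fetched or cleaned. As a minimal sketch, assuming the HTML has already been downloaded (for example with Python's `urllib.request`), the standard-library `html.parser` module can reduce a page to the plain text that later feeds topic modeling. The class `TextExtractor` and the function `html_to_text` are hypothetical names for illustration, not the authors' code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes from an HTML page, ignoring <script>, <style> and <noscript> contents."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose text should be dropped
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only when we are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the page's text as a single whitespace-normalized string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    page = """<html><head><title>Job posting</title><style>p {color: red}</style></head>
    <body><h1>Data Scientist</h1><script>track();</script>
    <p>Experience with   topic models required.</p></body></html>"""
    print(html_to_text(page))
    # -> Job posting Data Scientist Experience with topic models required.
```

Skipping script and style blocks matters here because their contents are code rather than prose and would otherwise leak spurious tokens into the topic model's vocabulary.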

Area of Science:

  • Computer Science
  • Machine Learning
  • Data Science

Background:

  • Unsupervised document classification is challenging for imbalanced datasets.
  • Manual data labeling is time-consuming, costly, and may miss underrepresented categories.

Purpose of the Study:

  • To develop an automated method for document classification that overcomes manual labeling limitations.
  • To improve the accuracy of classifying imbalanced datasets.

Main Methods:

  • Integration of web scraping for data acquisition.
  • Application of one-class Support Vector Machines (SVM) for classification.
  • Use of Latent Dirichlet Allocation (LDA) topic modeling for feature extraction.
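The summary does not specify which LDA implementation or inference algorithm the authors used. The sketch below shows one standard way to fit LDA, collapsed Gibbs sampling, in plain Python, and returns each document's estimated topic proportions, one common form of LDA-derived features. The function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, and the iteration count are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of tokens.
    Returns (theta, vocab): theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... resample its topic from p(z = t | all other assignments),
                # proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta) ...
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # ... and put it back under the sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimate of per-document topic proportions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    return theta, vocab


if __name__ == "__main__":
    docs = [
        "svm kernel margin svm kernel".split(),
        "kernel margin vector svm".split(),
        "topic dirichlet word topic".split(),
        "word topic prior dirichlet".split(),
    ]
    theta, vocab = lda_gibbs(docs, n_topics=2)
    for row in theta:
        print([round(p, 2) for p in row])
```

Each row of `theta` is a fixed-length vector regardless of document length, which is what makes it usable as input to a downstream classifier. For real corpora a library implementation, such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`, would normally be preferred over this didactic sampler.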

Main Results:

  • Achieved unsupervised one-class document classification using out-of-domain training data.
  • Successfully classified over 80% of the target data.
  • Outperformed common machine learning classifiers on multiple datasets.
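One-class classification with out-of-domain training data means a model is fit only on examples of the class of interest and then judges whether each target document resembles them. As a hedged sketch of the one-class SVM component (the ν-formulation of Schölkopf et al. with an RBF kernel, not necessarily the variant or settings used in the paper), the code below solves the dual problem with simple SMO-style pairwise updates in plain Python. The names `fit_one_class_svm` and `decision`, the toy 2-D data, and the values `nu=0.1` and `gamma=0.5` are assumptions for illustration; in the paper's setting the inputs would presumably be document features such as LDA topic proportions.

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_one_class_svm(X, nu=0.1, gamma=0.5, sweeps=100):
    """Solve the nu-one-class SVM dual
        minimize 0.5 * a^T K a   s.t.  0 <= a_i <= 1/(nu*n),  sum(a) = 1
    by cycling over pairs (a_i, a_j) and optimizing each pair exactly while
    keeping a_i + a_j fixed. Assumes 0 < nu <= 1. Returns (alpha, rho)."""
    n = len(X)
    C = 1.0 / (nu * n)
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    alpha = [1.0 / n] * n  # feasible start: sums to 1 and 1/n <= C when nu <= 1
    g = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # g = K @ alpha

    for _ in range(sweeps):  # fixed sweep count instead of a KKT stopping test
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue
                # Best step t for a_i += t, a_j -= t, clipped to keep both in [0, C].
                t = (g[j] - g[i]) / eta
                lo = max(-alpha[i], alpha[j] - C)
                hi = min(C - alpha[i], alpha[j])
                t = min(max(t, lo), hi)
                if t == 0.0:
                    continue
                alpha[i] += t
                alpha[j] -= t
                for k in range(n):
                    g[k] += t * (K[k][i] - K[k][j])

    # KKT: g_i == rho for free multipliers (0 < a_i < C).
    tol = 1e-8
    free = [g[i] for i in range(n) if tol < alpha[i] < C - tol]
    if free:
        rho = sum(free) / len(free)
    else:
        # No free multipliers: rho lies between max g over a_i = C and min g over a_i = 0.
        at_c = [g[i] for i in range(n) if alpha[i] >= C - tol]
        at_zero = [g[i] for i in range(n) if alpha[i] <= tol]
        lower = max(at_c)
        upper = min(at_zero) if at_zero else lower
        rho = 0.5 * (lower + upper)
    return alpha, rho


def decision(x, X, alpha, rho, gamma=0.5):
    """Positive: x resembles the training class (inlier). Negative: outlier."""
    return sum(a * rbf(xj, x, gamma) for a, xj in zip(alpha, X)) - rho


if __name__ == "__main__":
    # Toy in-domain training set: a 5 x 5 grid of 2-D points around the origin.
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    X = [(a, b) for a in grid for b in grid]
    alpha, rho = fit_one_class_svm(X, nu=0.1, gamma=0.5)
    print("center inlier:", decision((0.0, 0.0), X, alpha, rho) > 0)
    print("far point inlier:", decision((5.0, 5.0), X, alpha, rho) > 0)
```

The parameter ν is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors, which makes it the main setting to adjust when the out-of-domain training data are expected to be noisy. For real workloads, scikit-learn's `sklearn.svm.OneClassSVM` exposes the same `nu` and `gamma` parameters.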

Conclusions:

  • The proposed multi-step classification rule effectively circumvents manual labeling.
  • This method offers a robust solution for unsupervised document classification in imbalanced scenarios.