Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation.

Jiawen Deng¹, Zhuang Chen¹, Hao Sun¹

¹The CoAI group, DCST; Institute for Artificial Intelligence; State Key Lab of Intelligent Technology and Systems; Beijing National Research Center for Information Science and Technology; Tsinghua University, Beijing 100084, China.

Research (Washington, D.C.)

September 20, 2023

Summary

This summary is machine-generated.

This study introduces AugCOLD, a 1-million-sample dataset built to improve Chinese offensive language detection. A novel multiteacher distillation framework uses it to improve detector performance and robustness, supporting safer online communication.

Area of Science:

Natural Language Processing
Computational Linguistics
Artificial Intelligence

Background:

Offensive language detection is vital for social media and safe AI deployment.
Existing Chinese datasets for offensive language are limited in scale and scope compared to English resources.
This data scarcity hinders the accuracy of Chinese offensive language detectors, particularly for complex or novel cases.

Purpose of the Study:

To address the limitations of existing Chinese offensive language datasets.
To develop a large-scale, unsupervised dataset for training more robust detectors.
To enhance the performance and generalization capabilities of Chinese offensive language detection models.

Main Methods:

Introduced AugCOLD (Augmented Chinese Offensive Language Dataset), a 1-million-sample unsupervised dataset built from crawled data and model-generated text.
Employed a multiteacher knowledge distillation framework to leverage unsupervised data.
Utilized publicly available datasets to train multiple teacher models, which then assigned soft labels to AugCOLD for knowledge transfer to a student network (the final detector).
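
The soft-labeling step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the summary does not say how teacher outputs are combined or which loss is used, so the simple averaging of teacher distributions, the temperature value, the cross-entropy loss, and all function names here (`softmax`, `ensemble_soft_label`, `distillation_loss`) are assumptions chosen for clarity.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_label(teacher_logits, temperature=2.0):
    """Average the softened distributions of several teachers (assumed combination rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def distillation_loss(student_logits, soft_label, temperature=2.0):
    """Cross-entropy between the teachers' soft label and the student's distribution."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_label, student_probs))


# Hypothetical logits from three teachers for one unlabeled sample.
# Class order (assumed): [non-offensive, offensive].
teacher_logits = [[0.2, 1.8], [-0.5, 2.1], [1.0, 0.9]]
soft = ensemble_soft_label(teacher_logits)

# A student that agrees with the teachers incurs a lower loss than one that disagrees.
loss_agree = distillation_loss([0.0, 2.0], soft)
loss_disagree = distillation_loss([2.0, 0.0], soft)
print(soft, loss_agree, loss_disagree)
```

In practice the student would be a neural network trained by gradient descent on this loss over all AugCOLD samples; the sketch only shows how a soft label is formed and scored for a single sample.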

Main Results:

Demonstrated significant improvements in offensive language detection performance.
Showcased enhanced generalization and robustness of the offensive language detector on various test sets, including challenging hard cases.
Validated the effectiveness of the proposed multiteacher distillation approach with the AugCOLD dataset.

Conclusions:

The AugCOLD dataset and the multiteacher distillation framework effectively address the scarcity of Chinese offensive language data.
The proposed method significantly improves the accuracy, generalization, and robustness of Chinese offensive language detectors.
This work contributes to safer online communication and the responsible deployment of large language models in Chinese contexts.