Code4ML: a large-scale dataset of annotated Machine Learning code.

Anastasia Drozdova¹, Ekaterina Trofimova¹, Polina Guseva¹

  • ¹Department of Computer Science, NRU Higher School of Economics, Moscow, Russia.

PeerJ Computer Science | June 22, 2023
Summary
This summary is machine-generated.

Researchers developed the Code4ML corpus, a large dataset of annotated machine learning (ML) code snippets from Kaggle. This resource aids ML development by providing labeled code for tasks like classification and generation.

Keywords:
Jupyter code snippets, ML code dataset

Area of Science:

  • Computer Science
  • Machine Learning
  • Software Engineering
  • Data Science

Background:

  • Program code is increasingly used as a data source in data science for tasks like semantic classification and program generation.
  • Applying machine learning models to code is hindered by the scarcity of annotated code snippet datasets.

Purpose of the Study:

  • To address the scarcity of annotated code datasets for machine learning.
  • To introduce the Code4ML corpus, a comprehensive collection of annotated ML code snippets.

Main Methods:

  • Collected approximately 2.5 million machine learning code snippets from 100,000 Jupyter notebooks hosted on Kaggle.
  • Annotated a representative fraction of these code snippets using a custom-designed, user-friendly interface.
  • Included associated metadata such as task summaries, competition details, and dataset descriptions.
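The collection, sampling, and metadata steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: each Jupyter code cell is treated as one snippet, the annotated subset is drawn as a uniform random sample, and metadata is joined through a notebook-to-competition mapping. The paper's actual selection procedure and schema are not given here, so `Snippet`, `extract_snippets`, `sample_for_annotation`, `attach_metadata`, and all field names are hypothetical.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class Snippet:
    """One code cell from a notebook (assumption: one snippet per code cell)."""
    notebook_id: str
    cell_index: int
    code: str


def extract_snippets(notebook_id: str, ipynb_text: str) -> list[Snippet]:
    """Return the non-empty code cells of an nbformat v4 notebook as snippets."""
    nb = json.loads(ipynb_text)
    snippets = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores "source" either as one string or as a list of lines
        code = source if isinstance(source, str) else "".join(source)
        if code.strip():
            snippets.append(Snippet(notebook_id, index, code))
    return snippets


def sample_for_annotation(snippets: list[Snippet], fraction: float, seed: int = 0) -> list[Snippet]:
    """Draw a reproducible uniform random sample of snippets for manual labeling."""
    if not snippets:
        return []
    # At least one snippet, never more than are available
    k = min(len(snippets), max(1, round(len(snippets) * fraction)))
    return random.Random(seed).sample(snippets, k)


def attach_metadata(snippets, notebook_to_competition, competitions):
    """Pair each snippet with its competition metadata (hypothetical schema).

    Snippets whose notebook has no known competition are paired with None.
    """
    return [
        (s, competitions.get(notebook_to_competition.get(s.notebook_id)))
        for s in snippets
    ]


# Usage with a tiny in-memory notebook (hypothetical content)
nb_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "source": ["# Titanic baseline"]},
        {"cell_type": "code", "source": ["import pandas as pd\n", "df = pd.read_csv('train.csv')"]},
        {"cell_type": "code", "source": "df.head()"},
        {"cell_type": "code", "source": []},
    ],
})
snips = extract_snippets("nb-001", nb_json)
to_label = sample_for_annotation(snips, fraction=0.5)
paired = attach_metadata(snips, {"nb-001": "titanic"}, {"titanic": {"summary": "Predict passenger survival"}})
```

A uniform sample is the simplest choice. Stratifying by competition would keep small competitions represented, but whether Code4ML did so is not stated in this summary.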

Main Results:

  • The Code4ML corpus provides a large-scale, annotated dataset of ML code snippets.
  • The dataset is derived from publicly available data from Kaggle, a leading data science competition platform.
  • Human annotation was performed on a representative fraction of the collected code snippets.

Conclusions:

  • The Code4ML dataset offers a valuable resource for data science and software engineering research.
  • It can facilitate data-driven approaches to challenges such as semantic code classification, code auto-completion, and natural language-based code generation for ML tasks.
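To make the semantic code classification task concrete, the sketch below labels an unseen snippet with the label of its most similar hand-labeled snippet. This is a 1-nearest-neighbour baseline over identifier-token sets using Jaccard similarity. It is a swapped-in toy baseline, not a method from the paper, and the label names are hypothetical rather than Code4ML's annotation taxonomy.

```python
import re


def tokens(code: str) -> set[str]:
    """Split code into a set of identifier-like tokens (crude, language-agnostic)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(snippet: str, labeled: list[tuple[str, str]]) -> str:
    """Return the label of the most similar labeled snippet (1-nearest neighbour).

    Ties go to the earliest labeled example; `labeled` must be non-empty.
    """
    query = tokens(snippet)
    _code, best_label = max(labeled, key=lambda pair: jaccard(query, tokens(pair[0])))
    return best_label


# Hypothetical labeled examples; Code4ML defines its own annotation taxonomy.
labeled = [
    ("df = pd.read_csv('train.csv')", "data_loading"),
    ("model = RandomForestClassifier()\nmodel.fit(X, y)", "model_training"),
    ("plt.hist(df['age'])\nplt.show()", "visualization"),
]
```

For example, `classify("test = pd.read_csv('test.csv')", labeled)` shares `pd`, `read_csv`, and `csv` with the first example and returns `"data_loading"`. A model trained on a large annotated corpus would replace this token-overlap heuristic in practice.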