Reaping the Fruits of LLM Pruning: Towards Small Language Models for Efficient Non-Coding Variant Effect Prediction
Summary
This summary is machine-generated. Layer pruning makes large genomic language models more efficient for variant effect prediction. Removing redundant Transformer layers reduces computational demands with little or no loss of accuracy, which makes non-coding variant interpretation more practical.
Area Of Science
- Genomics
- Computational Biology
- Artificial Intelligence
Background
- Interpreting genetic variants is crucial for precision medicine.
- The computational cost of scaling large genomic language models (LLMs) limits their practical use for non-coding variant effect prediction.
- Layer pruning, which has proven effective for LLMs in natural language processing, may offer a way to make genomic LLMs more efficient.
Purpose Of The Study
- To systematically evaluate how much each Transformer layer in two genomic LLMs (DNABERT-2 and Nucleotide Transformer) contributes to variant effect prediction.
- To develop pruned, more computationally efficient LLMs by removing non-critical layers.
- To assess the performance of pruned LLMs on a non-coding variant effect prediction benchmark.
Main Methods
- Systematic layer-by-layer ablation of the DNABERT-2 and Nucleotide Transformer models.
- Construction of layer importance profiles from the change in performance when each layer is removed.
- Fine-tuning pruned and full models on the Enformer eQTL causal variant dataset.
- Comparing performance metrics (accuracy, AUC) and resource usage (training time, memory).
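The ablation and pruning steps above can be illustrated with a minimal sketch. It assumes a toy model in which each "layer" is a residual scalar update and the score is classification accuracy on a small labeled set. The names `make_residual_layer`, `layer_importance_profile`, `prune`, and the `tolerance` parameter are hypothetical, and the greedy joint re-check in `prune` is one plausible way to turn an importance profile into a pruned stack, not necessarily the authors' procedure. In the actual study the layers are Transformer blocks and the models are fine-tuned on the eQTL data, which this sketch omits.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "layer": maps a scalar activation to a new scalar.
Layer = Callable[[float], float]
Example = Tuple[float, int]  # (input value, binary label)


def make_residual_layer(shift: float) -> Layer:
    """Toy stand-in for a Transformer block: a residual update x -> x + shift."""
    return lambda x: x + shift


def forward(layers: Sequence[Layer], x: float) -> float:
    """Apply the layers in order; an ablated layer is simply absent from the stack."""
    for layer in layers:
        x = layer(x)
    return x


def accuracy(layers: Sequence[Layer], data: Sequence[Example]) -> float:
    """Fraction of examples where (final activation > 0) matches the label."""
    correct = sum(int(forward(layers, x) > 0) == y for x, y in data)
    return correct / len(data)


def layer_importance_profile(layers: Sequence[Layer], data: Sequence[Example]) -> List[float]:
    """Remove each layer in turn; importance = baseline score minus ablated score."""
    baseline = accuracy(layers, data)
    profile = []
    for i in range(len(layers)):
        ablated = list(layers[:i]) + list(layers[i + 1:])
        profile.append(baseline - accuracy(ablated, data))
    return profile


def prune(layers: Sequence[Layer], data: Sequence[Example], tolerance: float) -> List[int]:
    """Greedily drop the least important layers while the joint score loss stays within tolerance.

    Returns the indices (into the original stack) of the layers that are kept.
    """
    baseline = accuracy(layers, data)
    profile = layer_importance_profile(layers, data)
    kept = list(range(len(layers)))
    # Try removing candidates from least to most important, re-checking the pruned stack each time.
    for i in sorted(kept, key=lambda j: profile[j]):
        trial = [j for j in kept if j != i]
        if baseline - accuracy([layers[j] for j in trial], data) <= tolerance:
            kept = trial
    return kept


if __name__ == "__main__":
    # Toy data: the first layer shifts inputs so the decision boundary (3.0) lands at 0.
    layers = [make_residual_layer(-3.0), make_residual_layer(0.1), make_residual_layer(0.0)]
    data = [(1.5, 0), (2.5, 0), (3.5, 1), (4.5, 1)]
    print(layer_importance_profile(layers, data))  # [0.5, 0.0, 0.0]
    print(prune(layers, data, tolerance=0.0))      # [0]
```

Re-evaluating the jointly pruned stack, rather than dropping every layer with low individual importance at once, guards against the case where two layers are each redundant on their own but not together.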
Main Results
- Layer importance varied substantially across models, and some layers could be removed with minimal performance loss.
- Pruned models achieved accuracy and AUC comparable to full models after fine-tuning.
- Pruned models demonstrated substantial reductions in training time and memory requirements.
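Comparing full and pruned models on accuracy and AUC requires scoring both on the same labeled variants. The sketch below computes the two metrics without external dependencies. It assumes binary labels (1 = causal variant, 0 = non-causal) and model scores in [0, 1]; the function names and the 0.5 decision threshold are assumptions, and the example scores are made up for illustration, not results from the study. AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as half, which equals the area under the ROC curve.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score (score >= threshold -> 1) matches the label."""
    if not labels:
        raise ValueError("accuracy is undefined for an empty set")
    return sum(int(s >= threshold) == y for s, y in zip(scores, labels)) / len(labels)


if __name__ == "__main__":
    # Made-up scores for illustration only; score a full and a pruned model this way and compare.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores), binary_accuracy(labels, scores))  # 0.75 0.75
```

The pairwise AUC is quadratic in the number of examples, which keeps the definition transparent; for large benchmarks a sort-based rank computation gives the same value faster.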
Conclusions
- Layer-wise pruning is an effective strategy for creating compact and efficient genomic LLMs.
- Pruned LLMs maintain predictive power while significantly lowering computational demands.
- This approach enhances the accessibility of large-scale non-coding variant analysis for research and clinical applications.
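The link between removing layers and lowering computational demands can be made concrete with a simple scaling estimate. It assumes a standard Transformer encoder layer with hidden size d and a feed-forward width of 4d, and ignores biases, LayerNorm, and embeddings. Under those assumptions each layer holds about 12d² weights: 4d² for the query, key, value, and output projections and 8d² for the feed-forward network. Gated feed-forward variants and other architectural changes alter the constant 12, so this is an estimate rather than the exact cost of either model.

$$
P_{\text{layers}}(L) \approx 12\,L\,d^{2}, \qquad \frac{P_{\text{layers}}(L) - P_{\text{layers}}(L-k)}{P_{\text{layers}}(L)} = \frac{k}{L}
$$

For a hypothetical BERT-base-sized encoder with L = 12 and d = 768, each layer holds about 12 × 768² = 7,077,888 weights, so pruning k = 4 layers removes about 28,311,552 weights, one third of the encoder-layer weights. Because per-token layer compute and activation memory also grow roughly linearly in L, they shrink by roughly the same fraction.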

