DEAST: A dataset for English-Arabic scientific translation and vice versa

  • Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha 13518, Egypt.

Summary

This summary is machine-generated.

This study introduces the first Arabic-English dataset for scientific text, addressing a critical resource gap for machine translation in specialized domains.

Area Of Science

  • Natural Language Processing
  • Computational Linguistics
  • Scientific Translation

Background

  • Scientific text volume is rapidly increasing.
  • Domain-specific translation requires specialized expertise.
  • A significant lack of high-quality Arabic-English resources hinders scientific machine translation.

Purpose Of The Study

  • To introduce a novel Arabic-English dataset for scientific text.
  • To address the resource scarcity in Arabic scientific machine translation.
  • To facilitate advancements in natural language processing for specialized domains.

Main Methods

  • Collected Arabic-English scientific text data from thesis titles across various sources.
  • Curated a dataset specifically for scientific text translation.
  • Focused on titles to create a foundational resource.
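
The collection and curation steps above could be sketched roughly as follows. This is a minimal illustration only: the summary does not describe the dataset's file format or the authors' cleaning rules, so the `TitlePair` type, the `curate` function, and the specific filters (whitespace normalization, dropping empty sides, requiring Arabic script on the Arabic side, removing exact duplicates) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Basic Arabic Unicode block; used as a rough check that the Arabic side
# actually contains Arabic script (an assumed filter, not from the paper).
ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")


@dataclass(frozen=True)
class TitlePair:
    """Hypothetical record for one English-Arabic thesis title pair."""
    english: str
    arabic: str


def normalize(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def curate(raw_pairs):
    """Return cleaned, de-duplicated title pairs in their original order.

    raw_pairs is an iterable of (english, arabic) string tuples.
    """
    seen = set()
    curated = []
    for english, arabic in raw_pairs:
        pair = TitlePair(normalize(english), normalize(arabic))
        # Skip pairs where either side is empty after normalization.
        if not pair.english or not pair.arabic:
            continue
        # Skip pairs whose Arabic side contains no Arabic characters.
        if not ARABIC_CHARS.search(pair.arabic):
            continue
        # Skip exact duplicates (frozen dataclasses are hashable).
        if pair in seen:
            continue
        seen.add(pair)
        curated.append(pair)
    return curated
```

Calling `curate` on title pairs gathered from several sources would yield a single list of unique pairs, which could then be split into training and evaluation sets for a translation model.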

Main Results

  • Developed the first available Arabic-English dataset for scientific text.
  • The dataset comprises titles from academic theses.
  • This resource aims to support research in machine translation.

Conclusions

  • The introduced dataset is a crucial step towards improving Arabic scientific text translation.
  • It provides a much-needed resource for natural language processing research.
  • Further development of specialized datasets is essential for domain-specific translation.
