
Language trees and zipping.

Dario Benedetto¹, Emanuele Caglioti, Vittorio Loreto

  • ¹ La Sapienza University, Mathematics Department, Piazzale Aldo Moro 5, 00185 Rome, Italy. benedetto@mat.uniroma1.it

Physical Review Letters | January 22, 2002
Summary
This summary is machine-generated.


This study introduces a general method, based on data-compression techniques, for extracting information from generic character strings such as text and DNA sequences. Applied to linguistic problems, the approach gives accurate results in language recognition, authorship attribution, and language classification, which suggests broad applicability.

Area of Science:

  • Computational Linguistics
  • Bioinformatics
  • Data Science

Background:

  • Extracting meaningful information from diverse data types (text, DNA, time series) is challenging.
  • Existing methods may lack generality or accuracy across different domains.
  • Developing a unified approach for information extraction is a significant research goal.

Purpose of the Study:

  • To present a highly generalizable method for information extraction from generic character strings.
  • To demonstrate the method's effectiveness in linguistic applications.
  • To establish a novel approach for analyzing sequences and identifying patterns.

Main Methods:

  • The core method is based on data-compression techniques.
  • Its key step is computing a measure of 'remoteness' between two bodies of knowledge, such as two texts.
  • The method is implemented and evaluated on linguistic problems.

Main Results:

  • Achieved highly accurate results in language recognition.
  • Demonstrated strong performance in authorship attribution.
  • Showcased effectiveness in language classification tasks.

Conclusions:

  • The presented data-compression-based method offers a versatile tool for information extraction.
  • The approach is particularly effective for linguistic analysis and classification.
  • This technique provides a robust framework for analyzing generic character strings across various scientific fields.
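
The compression-based 'remoteness' measure listed under Main Methods can be illustrated with a minimal Python sketch. The summary does not name a compressor or give a formula, so everything here is an assumption: the sketch uses the standard-library zlib compressor and scores a sample by how many extra compressed bytes it costs when appended to a reference text. The names `compressed_size`, `remoteness`, `recognize`, and `TOY_CORPORA` are hypothetical, and the toy corpora are far smaller than anything a real evaluation would use.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def remoteness(reference: str, sample: str) -> float:
    """Extra compressed bytes per sample byte when `sample` follows `reference`.

    Assumed formula (not taken from the summary): a sample that shares many
    substrings with the reference is cheap to append, so the score is low.
    """
    ref = reference.encode("utf-8")
    smp = sample.encode("utf-8")
    if not smp:
        raise ValueError("sample must not be empty")
    extra = compressed_size(ref + smp) - compressed_size(ref)
    return extra / len(smp)


def recognize(sample: str, corpora: Mapping[str, str]) -> str:
    """Return the label of the reference text least remote from `sample`."""
    return min(corpora, key=lambda label: remoteness(corpora[label], sample))


# Hypothetical toy reference texts, for illustration only.
TOY_CORPORA = {
    "english": (
        "the quick brown fox jumps over the lazy dog and the cat sat on "
        "the mat while the rain fell on the old house by the river "
    ),
    "italian": (
        "la volpe veloce salta sopra il cane pigro e il gatto dorme sul "
        "tappeto mentre la pioggia cade sulla vecchia casa vicino al fiume "
    ),
}

if __name__ == "__main__":
    for text in ("the cat sat on the mat by the river",
                 "il gatto dorme sul tappeto vicino al fiume"):
        print(text, "->", recognize(text, TOY_CORPORA))
```

The same distance could in principle feed authorship attribution (references grouped by author) or a distance matrix for language classification, but the summary gives no detail on how those tasks were set up, so the sketch stops at nearest-reference recognition.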