



Machine learning based framework for fine-grained word segmentation and enhanced text normalization for low resourced

Shahzad Nazir¹, Muhammad Asif¹, Mariam Rehman²

  • ¹Department of Computer Science, National Textile University, Faisalabad, Pakistan.

PeerJ. Computer Science | December 13, 2024
Summary
This summary is machine-generated.

This study introduces advanced text normalization and tokenization methods for Urdu, enhancing natural language processing (NLP) outcomes. These techniques significantly improve Urdu text pre-processing, addressing a gap in research for this widely spoken language.

Keywords:
Low resourced languages, Machine learning, Text normalization, Word segmentation


Area of Science:

  • Computational Linguistics
  • Natural Language Processing
  • Urdu Language Technology

Background:

  • Text pre-processing, including normalization and tokenization, is crucial for effective natural language processing (NLP).
  • Existing NLP tools often overlook Urdu, the 10th most spoken language, despite its global significance.
  • There is a need for specialized and improved pre-processing techniques for the Urdu language.

Purpose of the Study:

  • To develop and present enhanced text normalization techniques for Urdu.
  • To introduce improved word tokenization methods specifically designed for Urdu text.
  • To address the research gap in Urdu language pre-processing within the NLP community.

Main Methods:

  • Urdu text normalization using a combination of regular expressions and rule-based systems, including character normalization and digit separation.
  • Urdu word tokenization employing a machine learning model with handcrafted features to predict word boundaries.
  • Creation of the largest human-annotated Urdu dataset across five distinct domains for model training and evaluation.
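
The first two methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the character mapping, the digit-separation pattern, the set of non-joining letters, and the names `normalize` and `gap_features` are all hypothetical choices made for the example. The summary does not list the actual rules or features used in the study.

```python
import re
import unicodedata

# Illustrative (assumed) mapping of Arabic code points to their usual Urdu forms.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC YEH -> FARSI YEH
    "\u0643": "\u06A9",  # ARABIC KAF -> KEHEH
    "\u0647": "\u06C1",  # ARABIC HEH -> HEH GOAL
})

# Zero-width positions where a digit run meets a letter run (either order).
DIGIT_LETTER_EDGE = re.compile(r"(?<=\d)(?=[^\W\d_])|(?<=[^\W\d_])(?=\d)")

# Letters that do not join to the following letter (assumed feature source):
# alef, dal, ddal, thal, reh, rreh, zain, jeh, waw, yeh barree.
NON_JOINERS = set("\u0627\u062F\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06D2")


def normalize(text: str) -> str:
    """Rule-based normalization: canonical form, character mapping,
    digit separation, and whitespace cleanup."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(CHAR_MAP)
    text = DIGIT_LETTER_EDGE.sub(" ", text)   # e.g. "abc123" -> "abc 123"
    return re.sub(r"\s+", " ", text).strip()


def gap_features(text: str, i: int) -> dict:
    """Handcrafted features for the gap between text[i] and text[i + 1].
    A classifier would use these to predict whether a word boundary
    falls in that gap."""
    left, right = text[i], text[i + 1]
    return {
        "left": left,
        "right": right,
        "bigram": left + right,
        "left_non_joiner": left in NON_JOINERS,
        "left_is_digit": left.isdigit(),
        "right_is_digit": right.isdigit(),
    }


if __name__ == "__main__":
    print(normalize("abc123def"))  # abc 123 def
    print(gap_features("\u0627\u0628", 0)["left_non_joiner"])  # True
```

Each feature dictionary describes one candidate boundary, so any standard classifier that accepts categorical and boolean features could be trained on them. Which model and which features the study actually used is not stated in this summary.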

Main Results:

  • The proposed normalization approach achieved a 20% improvement in Urdu text pre-processing.
  • The developed tokenization method yielded a 6% improvement in Urdu word segmentation.
  • Evaluation metrics including precision, recall, F-measure, and accuracy demonstrate the effectiveness of the proposed techniques compared to state-of-the-art methods.
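
To make the four metrics concrete, the sketch below scores a predicted segmentation against a gold one at the level of word boundaries. It assumes that both strings contain the same characters and differ only in where the spaces fall, and that every gap between adjacent characters counts as one decision. The summary does not describe the study's exact evaluation protocol, so the function names and this boundary-level setup are assumptions made for illustration.

```python
def boundary_labels(segmented: str) -> list:
    """One label per gap between adjacent non-space characters:
    1 if a word boundary falls in the gap, else 0."""
    labels = []
    for word in segmented.split():
        labels.extend([0] * (len(word) - 1) + [1])
    return labels[:-1]  # the end of the text is not a decision point


def boundary_scores(gold: str, pred: str) -> dict:
    """Precision, recall, F-measure, and accuracy over boundary decisions."""
    g, p = boundary_labels(gold), boundary_labels(pred)
    if len(g) != len(p):
        raise ValueError("gold and pred must contain the same characters")
    tp = sum(1 for a, b in zip(g, p) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(g, p) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(g, p) if a == 1 and b == 0)
    tn = sum(1 for a, b in zip(g, p) if a == 0 and b == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(g) if g else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Gold has boundaries after "ab" and "cd"; the prediction misses the second.
    print(boundary_scores("ab cd ef", "ab cdef"))
```

In the example call, the one predicted boundary is correct (precision 1.0) but one of the two gold boundaries is missed (recall 0.5), which shows why precision and recall are reported alongside accuracy for segmentation tasks.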

Conclusions:

  • The implemented text normalization and tokenization techniques offer significant advancements for Urdu language processing.
  • These methods enhance the accuracy and efficiency of natural language processing tasks involving Urdu text.
  • The study contributes valuable resources and methodologies for Urdu NLP, paving the way for future research and applications.