Machine learning based framework for fine-grained word segmentation and enhanced text normalization for low resourced

Shahzad Nazir¹, Muhammad Asif¹, Mariam Rehman²

¹Department of Computer Science, National Textile University, Faisalabad, Pakistan.

PeerJ. Computer Science

December 13, 2024

Summary

This summary is machine-generated.

This study introduces advanced text normalization and tokenization methods for Urdu, enhancing natural language processing (NLP) outcomes. These techniques significantly improve Urdu text pre-processing, addressing a gap in research for this widely spoken language.

Keywords:

Low resourced languages
Machine learning
Text normalization
Word segmentation

Area of Science:

Computational Linguistics
Natural Language Processing
Urdu Language Technology

Background:

Text pre-processing, including normalization and tokenization, is crucial for effective natural language processing (NLP).
Existing NLP tools often overlook Urdu, the 10th most spoken language, despite its global significance.
There is a need for specialized and improved pre-processing techniques for the Urdu language.

Purpose of the Study:

To develop and present enhanced text normalization techniques for Urdu.
To introduce improved word tokenization methods specifically designed for Urdu text.
To address the research gap in Urdu language pre-processing within the NLP community.

Main Methods:

Urdu text normalization using a combination of regular expressions and rule-based systems, including character normalization and digit separation.
Urdu word tokenization employing a machine learning model with handcrafted features to predict word boundaries.
Creation of the largest human-annotated Urdu dataset across five distinct domains for model training and evaluation.
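
The first two methods above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the character map, the digit ranges, the feature names, and the functions `normalize_urdu` and `boundary_features` are all assumptions chosen for illustration. The sketch maps a few Arabic code points to their Urdu counterparts, separates digits from adjacent letters with regular expressions, and then extracts simple handcrafted features that a classifier could use to decide whether a word boundary follows a character.

```python
import re
import unicodedata

# Hypothetical character map: Arabic code points that are often typed
# in place of the visually similar Urdu letters.
CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC YEH -> FARSI YEH
    "\u0643": "\u06A9",  # ARABIC KAF -> KEHEH
    "\u0647": "\u06C1",  # ARABIC HEH -> HEH GOAL
}
_TRANSLATION = str.maketrans(CHAR_MAP)

# ASCII, Arabic-Indic, and Extended Arabic-Indic digits (assumed coverage).
DIGITS = "0-9\u0660-\u0669\u06F0-\u06F9"
# "." and "," are excluded so that numbers such as 3.5 stay intact.
_DIGIT_THEN_OTHER = re.compile(rf"([{DIGITS}])([^\s.,{DIGITS}])")
_OTHER_THEN_DIGIT = re.compile(rf"([^\s.,{DIGITS}])([{DIGITS}])")

# Letters that never join to the following letter; a common cue for
# word boundaries in Urdu script (assumed feature, not from the paper).
NON_JOINERS = set(
    "\u0627\u0622\u062F\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06D2"
)


def normalize_urdu(text: str) -> str:
    """Apply character normalization, then digit separation."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_TRANSLATION)
    text = _DIGIT_THEN_OTHER.sub(r"\1 \2", text)
    text = _OTHER_THEN_DIGIT.sub(r"\1 \2", text)
    return re.sub(r"\s+", " ", text).strip()


def boundary_features(text: str, i: int) -> dict:
    """Handcrafted features for the gap between text[i] and text[i + 1]."""
    cur, nxt = text[i], text[i + 1]
    return {
        "cur": cur,
        "next": nxt,
        "cur_is_non_joiner": cur in NON_JOINERS,
        "space_follows": nxt == " ",
    }


if __name__ == "__main__":
    sample = "\u0643\u062A\u0627\u06282024"  # a word typed with ARABIC KAF, then digits
    print(normalize_urdu(sample))
    print(boundary_features("\u0627\u0628", 0))
```

In a full pipeline, the feature dictionaries would be vectorized and passed to a trained classifier that labels each position as boundary or non-boundary; the choice of classifier is left open here because the summary does not name one.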

Main Results:

The proposed normalization approach achieved a 20% improvement in Urdu text pre-processing.
The developed tokenization method resulted in a 6% improvement for Urdu word segmentation.
Evaluation metrics including precision, recall, F-measure, and accuracy demonstrate the effectiveness of the proposed techniques compared to state-of-the-art methods.
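
As a rough illustration of how these four metrics are computed for word segmentation, the sketch below treats each character position as a binary decision (boundary or no boundary) and compares predicted labels against gold labels. The function name `boundary_metrics` and the per-position framing are assumptions; the summary does not specify how the authors scored their system.

```python
def boundary_metrics(gold, pred):
    """Precision, recall, F-measure, and accuracy for binary boundary labels.

    gold and pred are equal-length sequences of 0/1 values, where 1 means
    "a word boundary follows this position".
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(gold) if gold else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    gold = [1, 0, 1, 1, 0]
    pred = [1, 1, 0, 1, 0]
    print(boundary_metrics(gold, pred))
```

Accuracy alone can look high when boundaries are rare, which is one reason precision, recall, and F-measure are usually reported alongside it.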

Conclusions:

The implemented text normalization and tokenization techniques offer significant advancements for Urdu language processing.
These methods enhance the accuracy and efficiency of natural language processing tasks involving Urdu text.
The study contributes valuable resources and methodologies for Urdu NLP, paving the way for future research and applications.