Lexicon dataset for the Hausa language

  • Computer Science Department, African University of Science and Technology, Abuja, Nigeria.


Summary

This summary is machine-generated.

Researchers created a new sentiment analysis dataset for the Hausa language. This augmented lexicon resource aids natural language processing for low-resource languages like Hausa.

Area Of Science

  • Natural Language Processing
  • Computational Linguistics
  • African Languages

Background

  • Labeled resources for sentiment analysis are scarce for low-resource languages.
  • The Hausa language presents unique challenges due to limited digital data.

Purpose Of The Study

  • To develop and present a comprehensive sentiment analysis dataset for the Hausa language.
  • To address the scarcity of labeled data for Hausa sentiment analysis.

Main Methods

  • Constructed an augmented lexicon using a Hausa dictionary.
  • Applied data augmentation techniques to expand the dataset size.
  • Manually annotated data for sentiment polarity (positive, negative, neutral).
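As a rough illustration of how a polarity lexicon with these three labels might be used, the sketch below looks up tokens in a dictionary and returns the more frequent of the positive and negative labels. The summary does not describe the dataset's file format or any scoring rule, so the dictionary layout, the placeholder entries (`word_a`, `word_b`, `word_c`), and the function names are assumptions made for this example, not the authors' method.

```python
from collections import Counter

# Hypothetical placeholder entries (assumption): the real lexicon maps
# Hausa words to one of the three polarity labels.
lexicon = {
    "word_a": "positive",
    "word_b": "negative",
    "word_c": "neutral",
}


def label_counts(tokens, lexicon):
    """Count polarity labels for tokens found in the lexicon; skip unknown tokens."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def majority_polarity(tokens, lexicon):
    """Return "positive" or "negative" if one outnumbers the other, else "neutral"."""
    counts = label_counts(tokens, lexicon)
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


print(majority_polarity(["word_a", "word_a", "word_b", "unknown"], lexicon))
```

A simple count like this ignores negation and word order; it is only meant to show how labeled entries could feed a baseline classifier.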

Main Results

  • Created a dataset with 14,663 entries: 4,154 positive, 4,310 negative, and 6,199 neutral.
  • Positive and negative entries are nearly balanced (about 28% and 29% of the dataset), while neutral entries form the largest class (about 42%).
  • Successfully augmented a lexicon-based dataset for sentiment analysis.
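The reported counts can be checked with a few lines of Python. The sketch below uses only the figures stated in the results; the variable names are illustrative.

```python
# Entry counts as reported in the results above.
counts = {"positive": 4154, "negative": 4310, "neutral": 6199}

# Total number of entries and each label's share in percent.
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
for label, share in shares.items():
    print(f"{label}: {share}%")
```

The three counts sum to the stated total of 14,663, and the shares show how the classes are distributed.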

Conclusions

  • The developed Hausa sentiment analysis dataset is a valuable resource for NLP research.
  • This dataset will facilitate the development of sentiment analysis models for Hausa social media and product reviews.
  • The work contributes to sentiment analysis research for low-resource languages.
