MOIRE: a software package for the estimation of allele frequencies and effective multiplicity of infection from polyallelic data

Department of Biostatistics, School of Public Health, University of California, Berkeley, CA 94704, United States.
Bioinformatics (Oxford, England)

Abstract

MOTIVATION

Malaria parasite genetic data can provide insight into parasite phenotypes, evolution, and transmission. However, estimating key parameters such as allele frequencies, multiplicity of infection (MOI), and within-host relatedness from genetic data is challenging, particularly in the presence of multiple related coinfecting strains. Existing methods often rely on single nucleotide polymorphism (SNP) data and do not account for within-host relatedness.

RESULTS

We present Multiplicity Of Infection and allele frequency REcovery (MOIRE), a Bayesian approach to estimate allele frequencies, MOI, and within-host relatedness from genetic data subject to experimental error. MOIRE accommodates both polyallelic and SNP data, making it applicable to diverse genotyping panels. We also introduce a novel metric, the effective MOI (eMOI), which integrates MOI and within-host relatedness, providing a robust and interpretable measure of genetic diversity. Extensive simulations and real-world data from a malaria study in Namibia demonstrate the superior performance of MOIRE over naive estimation methods, accurately estimating MOI up to seven with moderate-sized panels of diverse loci (e.g. microhaplotypes). MOIRE also revealed substantial heterogeneity in population mean MOI and mean relatedness across health districts in Namibia, suggesting detectable differences in transmission dynamics. Notably, eMOI emerges as a portable metric of within-host diversity, facilitating meaningful comparisons across settings when allele frequencies or genotyping panels differ. Compared to existing software, MOIRE enables more comprehensive insights into within-host diversity and population structure.
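The abstract does not give a formula for eMOI. To show how a single number can combine MOI and within-host relatedness, here is a minimal sketch. It assumes eMOI = 1 + (k − 1)(1 − r), where k is the MOI and r is within-host relatedness in [0, 1]. Under that assumption, unrelated strains (r = 0) give eMOI = k, and fully related strains (r = 1) collapse to eMOI = 1. The function names `effective_moi` and `posterior_mean_emoi` are hypothetical and are not part of the MOIRE R package API. Because MOIRE is Bayesian, the second helper shows one way to summarize eMOI over posterior draws of (k, r) rather than plugging in point estimates.

```python
from typing import Iterable, Tuple


def effective_moi(moi: int, relatedness: float) -> float:
    """Hypothetical helper: effective MOI under the ASSUMED definition
    eMOI = 1 + (moi - 1) * (1 - relatedness)."""
    if moi < 1:
        raise ValueError("moi must be at least 1")
    if not 0.0 <= relatedness <= 1.0:
        raise ValueError("relatedness must lie in [0, 1]")
    # Each strain beyond the first contributes (1 - r) of a "distinct" strain.
    return 1.0 + (moi - 1) * (1.0 - relatedness)


def posterior_mean_emoi(draws: Iterable[Tuple[int, float]]) -> float:
    """Hypothetical helper: average eMOI over posterior draws of (moi, relatedness)."""
    values = [effective_moi(k, r) for k, r in draws]
    if not values:
        raise ValueError("need at least one posterior draw")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(effective_moi(3, 0.0))  # unrelated strains: 3.0
    print(effective_moi(3, 1.0))  # fully related strains: 1.0
    print(effective_moi(3, 0.5))  # partially related: 2.0
    print(posterior_mean_emoi([(2, 0.0), (4, 0.5)]))  # (2.0 + 2.5) / 2 = 2.25
```

Averaging eMOI over each draw, instead of computing it once from the posterior means of k and r, keeps the correlation between MOI and relatedness in the posterior. Under this sketch's assumption, that correlation is what makes eMOI comparable across panels with different resolving power.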

AVAILABILITY AND IMPLEMENTATION

MOIRE is available as an R package at https://eppicenter.github.io/moire/.