Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B

Affiliations
  • 1 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA.
  • 2 Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany.
  • 3 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Genetics, Cell Biology, and Development, University of Minnesota Medical School, Minneapolis, MN 55455, USA.
  • 4 Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA.
  • 5 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA.
  • 6 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, USA.
  • 7 Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
  • 8 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA.
  • 9 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA. Electronic address: ee3@uw.edu.

Abstract

The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima’s D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
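
The abstract reports a deviation from Hardy-Weinberg equilibrium for H3 alleles in Europeans and reads it as consistent with heterozygote advantage. As an illustration only, the sketch below shows one common way such a deviation can be checked: a one-degree-of-freedom chi-square goodness-of-fit test on biallelic genotype counts (H3 versus non-H3). The function name `hwe_chi_square` and the genotype counts in the usage lines are hypothetical and are not taken from the study; the abstract does not state which test the authors used.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic locus.

    n_aa, n_ab, n_bb are observed genotype counts (for example H3/H3,
    H3/other, other/other). Returns (chi2, p_value, het_excess), where
    het_excess is observed minus expected heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequency of 'a' estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)

    # Pearson chi-square statistic; skip classes with zero expectation.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper-tail probability of a chi-square variable with 1 degree of
    # freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    het_excess = n_ab - expected[1]
    return chi2, p_value, het_excess


# Hypothetical counts, chosen only to illustrate a heterozygote excess.
chi2, p_value, het_excess = hwe_chi_square(10, 80, 10)
print(f"chi2={chi2:.2f}, p={p_value:.3g}, heterozygote excess={het_excess:.1f}")
```

A positive `het_excess` together with a small p-value is the pattern that would be compatible with heterozygote advantage, although a chi-square approximation is unreliable when a genotype class is rare, and an exact test is usually preferred for low-frequency haplogroups.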
