An index structure for pattern similarity searching in DNA microarray data.

Haixun Wang (1), Chang-Shing Perng, Wei Fan

  • (1) T. J. Watson Research Center, IBM, Hawthorne, NY, 10532, USA. haixun@us.ibm.com

Proceedings. IEEE Computer Society Bioinformatics Conference | April 20, 2005
Summary
This summary is machine-generated.

This article describes a new computational method to quickly find genes that show similar patterns of activity across different experimental conditions. By converting complex genetic data into a structured format called a weighted-sequence, researchers can efficiently search large databases for genes that behave in a coordinated way.

Keywords:
gene expression, subsequence matching, genomic databases, pattern correlation

Area of Science:

  • Bioinformatics and computational biology research within DNA microarray analysis
  • Data structures and indexing algorithms for weighted-sequence pattern matching

Background:

The rapid expansion of gene expression datasets poses a significant challenge for existing computational retrieval systems. Identifying genes with coherent expression profiles across many experimental conditions requires comparing the shapes of gene activity fluctuations within massive, high-dimensional repositories, and efficient support for such queries has been lacking. Traditional sequence matching tools often fail to capture the correlations that genomic analysis requires, and existing methods frequently lack the speed needed to process the volume of data generated by high-throughput technologies. This gap motivated specialized indexing structures that can accommodate the variability inherent in biological measurements. Without them, the field remains limited by the computational overhead of exhaustively searching large-scale gene expression databases.

Purpose Of The Study:

The aim of this study is to introduce an index structure for pattern similarity searching within DNA microarray data. Researchers seek to address the challenge of identifying genes that exhibit coherent expression fluctuations across experimental conditions. This problem arises because the volume of gene expression data is rapidly increasing, potentially exceeding the scale of human sequencing projects. The authors propose that queries based on pattern correlations can be supported by a weighted-sequence model. This model was originally designed for sequence matching but is adapted here for biological data. The study motivates the need for efficient retrieval methods to handle the complex shapes of gene activity. By transforming microarray data into weighted-sequences, the authors intend to facilitate the identification of co-regulated genes. This work addresses the computational limitations inherent in searching large-scale genomic databases for similar expression patterns.

Main Methods:

The approach transforms gene expression data into weighted-sequences, two-dimensional structures in which each element has an associated weight. Both the database entries and the user-defined query patterns are converted into this same weighted format, so that subsequence matching algorithms can run queries against the indexed data. The design adapts existing sequence matching tools to the requirements of pattern correlation analysis. The system retrieves all genes that exhibit a similar shape of fluctuation across experimental conditions. The authors evaluate the index structure on both synthetic and real-world datasets, with an emphasis on efficiency and scalability for large-scale genomic repositories.
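To make the notion of a "similar shape of fluctuation" concrete, the following is a minimal brute-force sketch that scores each gene against a query pattern with Pearson correlation. It is not the paper's index structure: the function names (`pearson`, `similar_genes`), the threshold of 0.95, and the toy expression values are all assumptions made for illustration, and an exhaustive scan like this is exactly the cost an index is meant to avoid.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no shape to correlate
    return cov / sqrt(var_x * var_y)

def similar_genes(expression, query, threshold=0.95):
    """Return names of genes whose profile correlates with `query`."""
    return [name for name, profile in expression.items()
            if pearson(profile, query) >= threshold]

# Toy data (hypothetical): each list holds one gene's expression level
# under four experimental conditions.
expression = {
    "geneA": [1.0, 3.0, 2.0, 5.0],
    "geneB": [11.0, 13.0, 12.0, 15.0],  # geneA shifted up by 10
    "geneC": [5.0, 2.0, 3.0, 1.0],      # roughly the opposite shape
}
hits = similar_genes(expression, query=[0.0, 2.0, 1.0, 4.0])
print(hits)
```

In this toy example geneA and geneB are retrieved even though their absolute levels differ by a constant, which is the sense in which the query matches shape and not magnitude.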

Main Results:

The key findings indicate that the weighted-sequence model supports pattern correlation queries against large-scale databases. The authors report that their method retrieves genes exhibiting coherent expression fluctuations across various conditions, that is, genes whose expression levels rise and fall in a similar shape. Converting microarray data into two-dimensional structures allows the system to use efficient subsequence matching. The results indicate that the proposed indexing strategy is effective on both synthetic and real-world datasets. According to the study, the method handles the complexity of gene expression data better than traditional sequence matching tools. The researchers also highlight the efficiency of their retrieval process compared with standard methods for pattern similarity searching.

Conclusions:

The authors propose that the weighted-sequence model effectively supports pattern correlation queries within large-scale genomic repositories. This approach enables the retrieval of genes exhibiting coherent expression fluctuations across diverse experimental conditions. The results suggest that transforming microarray data into two-dimensional structures facilitates rapid subsequence matching. The researchers report that their indexing strategy remains efficient when applied to both synthetic and real-world datasets. This work provides a scalable solution for identifying genes with similar activity shapes in high-dimensional data. The findings indicate that the proposed method outperforms traditional search techniques by leveraging specialized sequence matching algorithms. Future applications could use this indexing structure to accelerate the discovery of co-regulated gene networks. The study supports structured data representation as a viable strategy for managing the growing complexity of biological information.

The researchers propose using a weighted-sequence model to identify genes with similar expression shapes. This mechanism converts microarray data into two-dimensional structures, allowing subsequence matching algorithms to retrieve genes that exhibit coherent fluctuations across various experimental conditions.

The authors utilize a weighted-sequence, which is a two-dimensional structure where every element in the sequence is associated with a specific weight. This tool enables the indexing of microarray data to support pattern-based queries against large databases.
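As a rough illustration of a sequence whose elements each carry a weight, the sketch below pairs every condition index with the expression offset from the first condition. This particular encoding is an assumption chosen for illustration; the summary does not specify how the paper derives its weights, and `to_weighted_sequence` is a hypothetical name.

```python
from typing import List, Tuple

# One element per experimental condition: (condition index, weight).
WeightedSequence = List[Tuple[int, float]]

def to_weighted_sequence(profile: List[float]) -> WeightedSequence:
    """Pair each condition index with its offset from the first condition.

    Assumed encoding for illustration only: using offsets makes the
    result independent of a constant shift in expression level.
    """
    base = profile[0]
    return [(i, value - base) for i, value in enumerate(profile)]

# Toy profiles (hypothetical values): gene_b is gene_a shifted up by 10.
gene_a = [1.0, 3.0, 2.0, 5.0]
gene_b = [11.0, 13.0, 12.0, 15.0]
seq_a = to_weighted_sequence(gene_a)
seq_b = to_weighted_sequence(gene_b)
print(seq_a)
print(seq_a == seq_b)
```

Under this assumed encoding, two genes with the same shape but different baseline levels map to the same weighted-sequence, so a sequence comparison can stand in for a shape comparison.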

A weighted-sequence is necessary because it allows for the representation of two-dimensional data, where each element is paired with a weight. This structure is required to support subsequence matching algorithms that identify genes with similar expression shapes, unlike standard one-dimensional sequence models.

The researchers transform both the raw DNA microarray data and the user-defined pattern queries into weighted-sequences. This transformation allows the system to apply subsequence matching algorithms to retrieve genes that match the query pattern from the database.
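The idea of converting both sides to a common form before matching can be sketched as follows. This is a linear scan over every window of consecutive conditions, not the paper's index, and the offset encoding, the `tolerance` parameter, and the names `offsets` and `find_matches` are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

def offsets(values: List[float]) -> List[float]:
    """Re-express values as offsets from the first one (assumed encoding)."""
    base = values[0]
    return [v - base for v in values]

def find_matches(database: Dict[str, List[float]],
                 query: List[float],
                 tolerance: float = 0.5) -> List[Tuple[str, int]]:
    """Return (gene, start) for each window whose shape matches the query."""
    q = offsets(query)
    m = len(q)
    matches = []
    for gene, profile in database.items():
        for start in range(len(profile) - m + 1):
            # Re-base each window so only its shape is compared.
            window = offsets(profile[start:start + m])
            if all(abs(w - x) <= tolerance for w, x in zip(window, q)):
                matches.append((gene, start))
    return matches

# Toy database (hypothetical values): five conditions per gene.
database = {
    "geneA": [4.0, 1.0, 3.0, 2.0, 6.0],
    "geneB": [9.0, 11.0, 10.0, 10.0, 7.0],
    "geneC": [5.0, 5.0, 5.0, 5.0, 5.0],
}
matches = find_matches(database, query=[0.0, 2.0, 1.0])
print(matches)
```

Here the query shape (up by 2, then down by 1) is found in geneA starting at condition 1 and in geneB starting at condition 0. A real index would avoid visiting every window of every gene.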

The authors measure the effectiveness and efficiency of their method by testing it against both synthetic and real-world datasets. This measurement confirms that the proposed indexing structure can handle the scale and complexity of actual gene expression data.

The researchers propose that this indexing structure provides a scalable approach for managing the explosion of gene expression data. They claim this method supports efficient pattern correlation queries, which are essential for identifying co-regulated genes in large-scale databases.