Updated: Jun 25, 2026

Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
Published on: May 9, 2011
Haixun Wang¹, Chang-Shing Perng, Wei Fan
¹T. J. Watson Research Center, IBM, Hawthorne, NY, 10532, USA. haixun@us.ibm.com
This article describes a new computational method to quickly find genes that show similar patterns of activity across different experimental conditions. By converting complex genetic data into a structured format called a weighted-sequence, researchers can efficiently search large databases for genes that behave in a coordinated way.
Background:
The rapid expansion of gene expression datasets has created a significant challenge for existing computational retrieval systems. No prior work had resolved how to efficiently identify genes with coherent expression profiles across numerous experimental conditions, and researchers often struggle to compare the shapes of gene activity fluctuations within massive, high-dimensional biological repositories. This gap drove the need for specialized indexing techniques capable of handling complex, multi-dimensional data patterns. Prior research has shown that traditional sequence matching tools often fail to capture the correlations required for genomic analysis, which motivated the development of structures that can accommodate the variability inherent in biological measurements. Existing methods also frequently lack the speed needed to process the volume of information generated by modern high-throughput technologies. Consequently, the field remains limited by the computational overhead of exhaustively searching large-scale gene expression databases.
Purpose Of The Study:
The aim of this study is to introduce an index structure for pattern similarity searching within DNA microarray data. The authors seek to identify genes that exhibit coherent expression fluctuations across experimental conditions. The problem arises because the volume of gene expression data is increasing rapidly, potentially exceeding the scale of human sequencing projects. The authors propose that queries based on pattern correlations can be supported by a weighted-sequence model, which was originally designed for sequence matching and is adapted here for biological data. By transforming microarray data into weighted-sequences, the authors intend to support efficient retrieval over the complex shapes of gene activity and to facilitate the identification of co-regulated genes. This work addresses the computational limitations inherent in searching large-scale genomic databases for similar expression patterns.
Main Methods:
The approach transforms gene expression data into a two-dimensional structure in which each element has an associated weight. The authors use subsequence matching algorithms to run queries against these indexed datasets. The design adapts existing sequence matching tools to the specific requirements of pattern correlation analysis. This computational strategy converts both the database entries and the user-defined patterns into the same weighted format. The authors evaluate the performance of the indexing structure on both synthetic and real-world datasets, which tests whether the method remains robust across diverse types of biological information. Using these algorithms, the system retrieves all genes that exhibit a similar shape of fluctuation across experimental conditions. The methodology emphasizes efficiency and scalability to address the challenges posed by large-scale genomic repositories.
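The transformation into a weighted format can be illustrated with a minimal sketch. The summary does not specify the encoding, so this example assumes that each element's symbol is the index of an experimental condition and its weight is the expression value relative to the first selected condition. The names `WeightedElement` and `to_weighted_sequence` are hypothetical and are not taken from the article.

```python
from typing import List, NamedTuple, Sequence


class WeightedElement(NamedTuple):
    symbol: int    # assumed: index of the experimental condition
    weight: float  # assumed: expression value relative to the base condition


def to_weighted_sequence(profile: Sequence[float],
                         conditions: Sequence[int]) -> List[WeightedElement]:
    """Turn one gene's expression profile into (symbol, weight) pairs.

    The first listed condition acts as the base. Every weight is the
    difference from the base value, so a constant shift of the whole
    profile cancels out and only the shape remains.
    """
    if not conditions:
        return []
    base = profile[conditions[0]]
    return [WeightedElement(c, profile[c] - base) for c in conditions]
```

Under this assumed encoding, the profiles `[2.0, 5.0, 3.0]` and `[12.0, 15.0, 13.0]` map to the same weighted-sequence over conditions `[0, 1, 2]`, because they differ only by a constant offset.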
Main Results:
The key findings demonstrate that the weighted-sequence model effectively supports pattern correlation queries against large-scale databases. The authors report that their method successfully retrieves genes exhibiting coherent expression fluctuations across various conditions. By converting microarray data into two-dimensional structures, the system achieves efficient subsequence matching. The results indicate that the proposed indexing strategy is effective for both synthetic and real-world datasets. This approach allows for the identification of genes whose expression levels rise and fall in a similar shape. The study shows that the method handles the complexity of gene expression data better than traditional sequence matching tools. These findings confirm that the weighted-sequence model provides a viable solution for searching massive genomic repositories. The researchers highlight the efficiency of their retrieval process when compared to standard methods for pattern similarity searching.
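One way to make "rise and fall in a similar shape" concrete is sketched below. It assumes that two profiles share a shape when they differ by a nearly constant offset across all conditions. Both this definition and the tolerance parameter `tol` are illustrative assumptions, not the criterion stated in the article.

```python
from typing import Sequence


def similar_shape(a: Sequence[float], b: Sequence[float],
                  tol: float = 0.5) -> bool:
    """Return True if profiles a and b differ by a near-constant offset.

    tol is a hypothetical tolerance: the largest allowed spread between
    the per-condition offsets of the two profiles.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same conditions")
    offsets = [x - y for x, y in zip(a, b)]
    # Empty profiles are treated as trivially coherent.
    return not offsets or max(offsets) - min(offsets) <= tol
```

For example, `[1.0, 4.0, 2.0]` and `[11.0, 14.0, 12.0]` pass the check because every offset is -10.0, while `[1.0, 4.0, 2.0]` and `[2.0, 1.0, 4.0]` fail because their offsets spread from -2.0 to 3.0.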
Conclusions:
The authors propose that the weighted-sequence model effectively supports pattern correlation queries within large-scale genomic repositories. This approach enables the retrieval of genes exhibiting coherent expression fluctuations across diverse experimental conditions. The results suggest that transforming microarray data into two-dimensional structures facilitates rapid subsequence matching. The researchers demonstrate that their indexing strategy maintains high efficiency when applied to both synthetic and real-world datasets. This work provides a scalable solution for identifying genes with similar activity shapes in high-dimensional environments. The findings indicate that the proposed method outperforms traditional search techniques by leveraging specialized sequence matching algorithms. Future applications could use this indexing structure to accelerate the discovery of co-regulated gene networks. The study confirms that structured data representation is a viable strategy for managing the growing complexity of biological information.
The researchers propose using a weighted-sequence model to identify genes with similar expression shapes. This mechanism converts microarray data into two-dimensional structures, allowing subsequence matching algorithms to retrieve genes that exhibit coherent fluctuations across various experimental conditions.
The authors utilize a weighted-sequence, which is a two-dimensional structure where every element in the sequence is associated with a specific weight. This tool enables the indexing of microarray data to support pattern-based queries against large databases.
A weighted-sequence is necessary because it allows for the representation of two-dimensional data, where each element is paired with a weight. This structure is required to support subsequence matching algorithms that identify genes with similar expression shapes, unlike standard one-dimensional sequence models.
The researchers transform both the raw DNA microarray data and the user-defined pattern queries into weighted-sequences. This transformation allows the system to apply subsequence matching algorithms to retrieve genes that match the query pattern from the database.
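The query flow described above, in which both the database and the pattern are converted to the same format and then matched, can be sketched as a brute-force linear scan. This stands in for the authors' index, whose internals the summary does not describe. The encoding (condition index paired with a value relative to the first condition), the tolerance `tol`, and the names `encode` and `query_genes` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed encoding: (condition index, value relative to the first condition).
WeightedSeq = List[Tuple[int, float]]


def encode(values: Sequence[float], conditions: Sequence[int]) -> WeightedSeq:
    """Pair each condition with its value relative to the first value."""
    if not conditions or len(values) != len(conditions):
        raise ValueError("need one value per condition, and at least one")
    base = values[0]
    return [(c, v - base) for c, v in zip(conditions, values)]


def query_genes(database: Dict[str, Sequence[float]],
                conditions: Sequence[int],
                pattern: Sequence[float],
                tol: float = 0.25) -> List[str]:
    """Linear scan: return genes whose encoded values match the pattern.

    pattern[i] is the queried expression level at conditions[i].
    """
    target = encode(pattern, conditions)
    hits = []
    for gene, profile in database.items():
        # Encode the gene over the same conditions as the query.
        candidate = encode([profile[c] for c in conditions], conditions)
        if all(abs(w - t) <= tol
               for (_, w), (_, t) in zip(candidate, target)):
            hits.append(gene)
    return hits


# Hypothetical toy database: gene name -> expression level per condition.
toy_db = {
    "g1": [1.0, 4.0, 2.0, 9.0],
    "g2": [6.0, 9.0, 7.0, 0.0],
    "g3": [3.0, 1.0, 5.0, 2.0],
}
matches = query_genes(toy_db, conditions=[0, 1, 2], pattern=[0.0, 3.0, 1.0])
```

In this toy data, `g1` and `g2` rise by 3 and then fall by 2 over conditions 0 to 2, so both match the pattern, while `g3` does not. A scan like this touches every gene on every query, which is the cost that an index over weighted-sequences is meant to avoid.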
The authors measure the effectiveness and efficiency of their method by testing it against both synthetic and real-world datasets. This measurement confirms that the proposed indexing structure can handle the scale and complexity of actual gene expression data.
The researchers propose that this indexing structure provides a scalable approach for managing the explosion of gene expression data. They claim this method supports efficient pattern correlation queries, which are essential for identifying co-regulated genes in large-scale databases.