The FASTA program efficiently identifies related protein sequences using amino acid identities and sequence alignments. Although the program is powerful, its similarity scores for distantly related proteins must be analyzed carefully to avoid misinterpretation.
Area of Science:
Bioinformatics
Computational Biology
Genomics
Background:
Protein sequence comparison is crucial for understanding evolutionary relationships and identifying novel protein families.
Existing methods for sequence searching can be computationally intensive and may lack selectivity.
The FASTA algorithm was developed to address the need for faster and more accurate protein sequence similarity searches.
Purpose of the Study:
To evaluate the performance of the FASTA program in identifying homologous protein sequences.
To compare FASTA's speed and selectivity against other sequence search programs.
To investigate the utility of FASTA in discovering distantly related proteins, such as within the G-protein-coupled receptor family.
Main Methods:
The FASTA program was used to search the NBRF protein sequence library.
Amino acid identities were used for the initial screening, and a PAM250 matrix was used for scoring and rescoring.
A joining step combined separate regions of similarity to calculate the initn score, which matters most for sequences whose similarities are interrupted by gaps.
Results were analyzed by examining similarity scores, statistical significance, sequence alignments, and biological context.
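The identity-based initial screening described above can be illustrated with a short sketch. It is a simplified illustration, not the FASTA source code: it assumes the screening works by looking up short identical words (k-tuples) and counting how many fall on each diagonal of the comparison. The function names and the `ktup` and `top` parameters are hypothetical, and the later PAM250 rescoring is not shown.

```python
from collections import defaultdict


def ktup_diagonal_hits(query, subject, ktup=2):
    """Count identical k-tuples on each diagonal.

    A diagonal is identified by (subject offset - query offset), so all
    matches that line up without gaps share the same diagonal number.
    """
    # Lookup table: each k-letter word in the query -> its start positions.
    word_positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        word_positions[query[i:i + ktup]].append(i)

    # Scan the subject once; every shared word adds a hit to its diagonal.
    diagonal_counts = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in word_positions.get(subject[j:j + ktup], ()):
            diagonal_counts[j - i] += 1
    return dict(diagonal_counts)


def best_diagonals(query, subject, ktup=2, top=10):
    """Return the `top` diagonals with the most identical k-tuples."""
    counts = ktup_diagonal_hits(query, subject, ktup)
    # Most hits first; ties broken by diagonal number for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # The subject contains the query shifted right by two residues.
    print(best_diagonals("ACDEFGHIK", "WWACDEFGHIK"))
```

In this sketch, diagonals with many hits are the candidates that a substitution matrix such as PAM250 would then rescore. A larger `ktup` makes the scan faster but less sensitive to short or weak similarities.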
Main Results:
FASTA identified homologous proteins quickly (in under 20 minutes on an IBM-PC) and with high selectivity.
FASTA compared favorably with slower programs based on the Needleman-Wunsch-Sellers (NWS) algorithm, offering comparable or superior selectivity.
The joining step proved effective for identifying sequences with similarity regions separated by variable loops.
In specific cases, FASTA highlighted potential new members of the G-protein-coupled receptor family; these candidates require careful validation of their sequence alignments.
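The joining step mentioned above can be sketched as a small chaining problem. This is a minimal illustration under stated assumptions, not the program's actual procedure: it assumes each initial region has already been scored, that regions may be joined only when they appear in the same order in both sequences without overlapping, and that every join costs a fixed penalty. The `Region` type, the `joined_score` function, and the `join_penalty` value of 20 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An ungapped region of similarity (end coordinates are exclusive)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: int


def joined_score(regions, join_penalty=20):
    """Best total score from chaining compatible regions.

    Two regions are compatible when the earlier one ends before the later
    one starts in both the query and the subject. Each join subtracts
    `join_penalty` (a hypothetical value standing in for the gap cost).
    """
    ordered = sorted(regions, key=lambda r: (r.q_start, r.s_start))
    best = []  # best[i]: best chain score that ends with ordered[i]
    for i, cur in enumerate(ordered):
        chain = cur.score  # the region on its own, with no joins
        for j in range(i):
            prev = ordered[j]
            if prev.q_end <= cur.q_start and prev.s_end <= cur.s_start:
                chain = max(chain, best[j] + cur.score - join_penalty)
        best.append(chain)
    return max(best, default=0)


if __name__ == "__main__":
    a = Region(0, 10, 0, 10, 50)
    b = Region(15, 30, 20, 35, 40)   # follows `a` after a variable loop
    c = Region(5, 12, 40, 50, 30)    # overlaps `a` in the query, so no join
    print(joined_score([a, b, c]))   # a joined with b: 50 + 40 - 20
```

With a fixed penalty, two conserved regions separated by a variable loop can together outscore any single region, which is the situation the joining step is meant to capture.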
Conclusions:
FASTA is a fast and selective tool for identifying protein sequences with a common evolutionary origin.
While FASTA provides unambiguous results in many cases, careful interpretation of scores and alignments is essential, especially for distantly related sequences.
Increasing sensitivity in sequence comparison methods requires greater analytical rigor to prevent misleading conclusions.