Material Measurement Laboratory / Biomolecular Measurement Division

Applied Genetics Group

How to Use Sequence Alignments

General Information

Three file configurations are available:

1) SeqMan files – these are the original files used to generate sequence alignments in the areas of interest

2) FASTA formatted text files – these files are in FASTA format and have been verified to work with ClustalX 2.0.12 downloaded from Clustal.org on 11/23/2009.

3) Searchable text files – these files are searchable without additional programs, although there are some limitations

Decoding the FASTA name:

Example:
>gi|124303521|gb|AY315197.2|Towne|UL34for|nt116771-114648
(sequence)

The "GenInfo Identifier" (gi) is an early numbering system used by GenBank; each gi number is unique to a specific sequence. The "gb" indicates that the sequence comes from GenBank. The accession.version field combines the accession number and the version number. Either the gi number or the accession.version can be used to locate the exact sequence. The strain is simply the common name given. For location-specific information, see "Structure of CMV Genome" and/or "Primers with Citations". The nucleotide positions are given as additional information.
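As an illustration, the name line can be split into its parts programmatically. The following is a minimal Python sketch that assumes every name line has the same seven pipe-separated fields as the example above. The names `AlignmentHeader`, `parse_header`, and `label` are hypothetical and do not come from the alignment files; `label` simply holds the sixth field ("UL34for" in the example) as written.

```python
from typing import NamedTuple


class AlignmentHeader(NamedTuple):
    """Fields of one FASTA name line (hypothetical container, for illustration)."""
    gi: str          # GenInfo Identifier
    database: str    # "gb" for GenBank
    accession: str   # accession number without the version
    version: int     # version number after the dot
    strain: str      # common strain name
    label: str       # sixth field, kept exactly as written
    first_nt: int    # first nucleotide position listed
    last_nt: int     # second nucleotide position listed


def parse_header(line: str) -> AlignmentHeader:
    """Split a name line laid out like the example into its fields."""
    fields = line.strip().lstrip(">").split("|")
    # Assumption: exactly seven fields, starting with the literal "gi".
    if len(fields) != 7 or fields[0] != "gi":
        raise ValueError(f"unexpected header layout: {line!r}")
    _, gi, database, acc_ver, strain, label, positions = fields
    accession, _, version = acc_ver.partition(".")
    if positions.startswith("nt"):
        positions = positions[2:]
    first_nt, _, last_nt = positions.partition("-")
    return AlignmentHeader(gi, database, accession, int(version),
                           strain, label, int(first_nt), int(last_nt))


header = parse_header(">gi|124303521|gb|AY315197.2|Towne|UL34for|nt116771-114648")
print(header.accession, header.version, header.strain)
```

In the example the first position listed is larger than the second; the sketch keeps the two numbers in the order they appear.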

Searchable text file

.txt stands for text file

Text files were made so that sequences could be searched without proprietary software, but some artifacts of the text format make certain searches difficult:
- There is a ~50-base overlap between the end of one line and the beginning of the next.
- This was done so that primers hanging on the edge of one line will still be found.
- But it also means that if a primer falls in the first or last 50 bases, it will be found twice.
- Notepad only finds exact matches; therefore, if there is an addition in one sequence, all the others will have a dash (-) instead of a nucleotide at that position. The dash must be included in the search text for the find tool to match.
- To overcome this, you can put a dash between two nucleotides, trying each position one by one, or you can search using a known nucleotide position.
- Both Forward and Reverse sequence alignments have been included so that transcribing the reverse complement is unnecessary.
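For readers who can run a script, the dash problem described in the list above can be sidestepped with a pattern that allows any number of dashes between adjacent nucleotides. This is a minimal Python sketch, not part of the provided files; `find_primer` is a hypothetical helper, and it assumes one aligned sequence has already been read into a string.

```python
import re
from typing import List


def find_primer(aligned_sequence: str, primer: str) -> List[int]:
    """Return 0-based start columns where the primer matches, ignoring gap dashes."""
    # Allow zero or more dashes between each pair of adjacent primer bases.
    pattern = re.compile("-*".join(re.escape(base) for base in primer.upper()))
    return [m.start() for m in pattern.finditer(aligned_sequence.upper())]


# The primer GTTG is found even though two dashes interrupt it in the alignment.
print(find_primer("ACG--TTGCA", "GTTG"))
```

Because of the ~50-base overlap between lines, a primer near a line edge may still be reported once for each of the two lines it appears on.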

SeqMan file (.sqd Lasergene software)

SeqMan will not search the reverse complement. If you cannot find a sequence, you have two options: convert it to its reverse complement and search again, or use both the forward and reverse SeqMan files.
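If you prefer to generate the reverse complement rather than write it out by hand, a few lines of Python are enough. This is a minimal sketch covering only the four standard bases; `reverse_complement` is a hypothetical helper name, and any other character (such as a dash or N) is passed through unchanged.

```python
# Translation table for the four standard bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the order."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```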

Searching .txt files using Notepad

Format => uncheck "word wrap"

Searching with "Find" tool:
- Edit => Find (Ctrl+F)
- Enter primer or probe sequence
- Click "Find Next"
Notepad only searches in one direction, "up" or "down". If the cursor is downstream of the sequence you are searching for, you must change the radio button to "up" or Notepad will not find your sequence.
Notepad only finds exact matches:
- If there is an addition in one sequence, all the others will have a dash (-) instead of a nucleotide. The dash must be included to use the find tool, or you may search using a known nucleotide position.
- There are ~50 bases of overlap between the end of one line and the beginning of the next. This was done so that primers hanging on the edge of one line will still be found. But it also means that if a primer falls in the first or last 50 bases, it will be found twice.

Searching by nucleotide location:
- Nucleotide position numbers are given. If you know the location based on any of the strains provided, you can find your primer sequence this way, although it may be tedious.
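
The counting involved in a position search can also be scripted. The sketch below is a hedged illustration only: it assumes an aligned sequence starts at the first nucleotide position listed in its name line and ends at the second, and that dashes do not consume a position number. The function `column_of_position` is hypothetical, and the actual layout of position numbers in the text files may differ.

```python
def column_of_position(aligned_sequence: str, first_nt: int,
                       last_nt: int, target_nt: int) -> int:
    """Return the 0-based column holding nucleotide number target_nt."""
    # Positions count down when the range is listed high-to-low.
    step = 1 if last_nt >= first_nt else -1
    current = first_nt
    for column, char in enumerate(aligned_sequence):
        if char == "-":
            continue  # gap characters do not use up a position number
        if current == target_nt:
            return column
        current += step
    raise ValueError(f"position {target_nt} is not in this sequence")


# Toy example: positions 10 to 13 with a two-dash gap after the second base.
print(column_of_position("AC--GT", 10, 13, 12))
```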

Created November 5, 2009, Updated October 1, 2012