Three file configurations are available:
1) SeqMan files – these are the original files used to generate sequence alignments in the areas of interest
2) FASTA formatted text files – these files are in FASTA format and have been verified to work with ClustalX 2.0.12 downloaded from Clustal.org on 11/23/2009.
3) Searchable text files – these files can be searched without additional programs, although there are some limitations
>gi|gi-number|gb|accession.version|strain|locus|nucleotide position
(sequence)
Example:
>gi|124303521|gb|AY315197.2|Towne|UL34for|nt116771-114648
(sequence)
The "GenInfo Identifier" (gi) is an early numbering system used by GenBank; each gi number is unique to a specific sequence. The "gb" indicates that the sequence comes from GenBank. The accession.version field includes both the accession number and the version number. Either the gi-number or the accession.version can be used to locate the exact sequence used. The strain is simply the common name given. For location-specific information, see "Structure of CMV Genome" and/or "Primers with Citations". The nucleotide positions are given as additional information.
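As an illustration of the header layout described above, the following Python sketch splits a header line into its fields. The names parse_header and AlignmentHeader are hypothetical and are not part of this resource; the sketch assumes every header follows the seven-field pattern shown in the example.

```python
from typing import NamedTuple


class AlignmentHeader(NamedTuple):
    """Fields of one header line (hypothetical container, for illustration)."""
    gi: str
    accession: str
    version: int
    strain: str
    locus: str
    start: int
    end: int


def parse_header(line: str) -> AlignmentHeader:
    """Split '>gi|gi-number|gb|accession.version|strain|locus|ntSTART-END'."""
    fields = line.strip().lstrip(">").split("|")
    # Assumption: exactly seven pipe-separated fields with literal 'gi' and 'gb' tags.
    if len(fields) != 7 or fields[0] != "gi" or fields[2] != "gb":
        raise ValueError("unexpected header layout: " + line)
    _, gi, _, acc_ver, strain, locus, position = fields
    accession, _, version = acc_ver.partition(".")
    # Drop the leading 'nt' and split the two coordinates on the dash.
    if position.startswith("nt"):
        position = position[2:]
    start, _, end = position.partition("-")
    return AlignmentHeader(gi, accession, int(version), strain, locus,
                           int(start), int(end))


header = parse_header(">gi|124303521|gb|AY315197.2|Towne|UL34for|nt116771-114648")
print(header.accession, header.version, header.strain)
```

The coordinates are kept in the order given in the header, so the start may be larger than the end, as in the Towne example.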
.txt stands for text file
Text files were made so that sequences could be searched without proprietary software, but some artifacts of the plain-text format make certain searches difficult.
There is a ~50-base overlap between the end of one line and the beginning of the next.
This was done so that primers spanning the edge of a line will still be found.
It also means that a primer falling within the first or last 50 bases of a line will be found twice.
Notepad only finds exact matches. Therefore, if one sequence contains an added nucleotide, all the other sequences will have a dash (-) at that position instead of a nucleotide, and the dash must be included in the search string for the find tool to work.
To work around this, you can insert a dash between two nucleotides of the search string, trying each position one at a time, or you can search using a known nucleotide position.
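The trial-and-error dash placement described above can be avoided outside of Notepad. The following Python sketch builds a regular expression that allows any number of dashes between consecutive nucleotides, so a primer is found even where the alignment contains gaps. The function names are hypothetical, and the sketch assumes an aligned sequence has already been copied into a Python string.

```python
import re


def gap_tolerant_pattern(primer):
    """Compile a pattern that matches the primer with optional '-' between bases."""
    bases = [re.escape(base) for base in primer if base != "-"]
    if not bases:
        raise ValueError("primer must contain at least one nucleotide")
    # '-*' between bases permits zero or more alignment gaps at each junction.
    return re.compile("-*".join(bases), re.IGNORECASE)


def find_primer(aligned_sequence, primer):
    """Return (start, end) string offsets of every gap-tolerant match."""
    pattern = gap_tolerant_pattern(primer)
    return [match.span() for match in pattern.finditer(aligned_sequence)]


# The primer GTAC is found even though the aligned row reads 'GT-AC'.
print(find_primer("ACGT-ACGGTTA", "GTAC"))
```

The offsets returned are positions within the string, including dashes; they are not genome nucleotide positions.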
Both Forward and Reverse sequence alignments have been included so that transcribing the reverse complement is unnecessary.
The SeqMan files will not search the reverse complement. If you cannot find a sequence, you have two options: transcribe the reverse complement and search again, or use both the forward and reverse SeqMan files.
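If you do need the reverse complement of a primer, it can be generated rather than written out by hand. This is a minimal Python sketch; the function name is hypothetical, and it assumes the primer contains only A, C, G and T (any other character, such as a dash or N, is left unchanged).

```python
# Translation table mapping each base to its complement, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```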
Format => uncheck "word wrap"
Searching with "Find" tool:
Edit => Find (Ctrl+F)
Enter primer or probe sequence
Click "Find Next"
Notepad searches in only one direction, "up" or "down". If the cursor is downstream of the sequence you are searching for, you must change the radio button to "up" or Notepad will not find your sequence.
Notepad only finds exact matches
If one sequence contains an added nucleotide, all the other sequences will have a dash (-) at that position instead of a nucleotide. The dash must be included in the search string to use the find tool, or you may search using a known nucleotide position.
There are ~50 bases of overlap between the end of one line and the beginning of the next. This was done so that primers spanning the edge of a line will still be found. It also means that a primer falling within the first or last 50 bases of a line will be found twice.
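The double hits caused by the overlap can be illustrated with a short Python sketch that joins consecutive lines by dropping the repeated bases before counting matches. The function names are hypothetical. Because the overlap is only approximately 50 bases, the sketch detects it as the longest suffix of the text so far that equals a prefix of the next line; this is an assumption about the file layout and may misjudge the overlap in repetitive regions.

```python
def overlap_length(previous, following, max_overlap=100):
    """Length of the longest suffix of 'previous' that is a prefix of 'following'."""
    limit = min(len(previous), len(following), max_overlap)
    for size in range(limit, 0, -1):
        if previous.endswith(following[:size]):
            return size
    return 0


def merge_lines(lines, max_overlap=100):
    """Join lines into one sequence, keeping each overlapped stretch only once."""
    merged = ""
    for line in lines:
        merged += line[overlap_length(merged, line, max_overlap):]
    return merged


def count_hits(sequence, primer):
    """Count occurrences of the primer, including overlapping ones."""
    count = 0
    start = sequence.find(primer)
    while start != -1:
        count += 1
        start = sequence.find(primer, start + 1)
    return count


# Toy example with a 4-base overlap ('GGGT') instead of ~50 bases.
lines = ["AAACCCGGGT", "GGGTTTTACG"]
print(sum(count_hits(line, "GGGT") for line in lines))  # counted once per line
print(count_hits(merge_lines(lines), "GGGT"))           # counted once overall
```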
Searching by nucleotide location:
Nucleotide position numbers are given. If you know the location of your primer in any of the strains provided, you can use these numbers to find its sequence, although this may be tedious.