Abstract
The National Institute of Standards and Technology (NIST) U.S. population sample set of unrelated individuals was used to determine allele and haplotype frequencies for seven X-chromosome short tandem repeat (STR) loci in four linkage groups. DXS7132, DXS7423, DXS8378, DXS10074, DXS10103, DXS10135, and HPRTB were sequenced using the ForenSeq DNA Signature Prep Kit on a MiSeq FGx instrument from Verogen. Capillary electrophoresis data produced using the Qiagen Investigator Argus X-12 kit were compared to ForenSeq length-based alleles and found to be 99% concordant. For three loci (DXS10103, DXS10074, and HPRTB), the length-based allele call is affected by the extent of flanking region included in the reported sequence. Six of the seven loci gained alleles by sequencing compared to length-based determinations. The additional alleles arise from variation in both the repeat and flanking region sequences. All sequences for which frequencies are reported in this dataset were cataloged as GenBank records in the STRSeq NCBI BioProject (https://www.ncbi.nlm.nih.gov/bioproject/380127). Frequency information for both the loci and linkage groups is reported, along with results of statistical tests including gene diversity, polymorphism information content, power of discrimination, and linkage disequilibrium. All supplemental files are available at the NIST Public Data Repository – Sequence-based U.S. population data (https://doi.org/10.18434/t4/1500024).