Twenty years ago, scientists mapped more than 90% of the human genome. That huge accomplishment, which involved many scientists, has helped researchers better understand inherited diseases and has led to many other scientific advances.
If a researcher wanted to analyze your DNA, they would compare it to the single reference genome that’s been in use for years and try to figure out how your DNA differs from that reference.
Last year, we were part of a large international team that finished the last 8% and produced the first complete human genome sequence. This has enabled insights into previously unexplored parts of the genome, including a map of millions of genetic variations, or stretches of DNA that vary from person to person.
But here’s the problem: the composite human genome, while taken from a diverse group of people from all over the world, does not represent the full diversity of human DNA. Even the first “complete” human genome is missing sequences that exist only in some individuals. That gap can make medical research, or testing someone’s genes for disease, challenging for scientists. This is especially true for people with ancestry from regions of the world that have high genetic diversity, but it affects everyone.
Say you have an extra copy of a gene that’s not in the reference. That could be missed when comparing your DNA to the standard reference.
My colleagues and I are helping scientists build a new kind of reference genome: the pangenome, or “all genome.” A draft version containing DNA from about 50 people has been released, and the long-term goal is to include DNA from about 350 people in the completed pangenome within the next two or three years. It will help us understand the full diversity of our genes and advance medical research.
So instead of comparing one person’s DNA to a single standard reference, researchers will be able to compare it to a reference library containing DNA from potentially hundreds of diverse people.
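To make the contrast concrete, here is a toy sketch in Python. It rests on loud simplifying assumptions: real analyses align millions of short sequencing reads with specialized software, whereas here exact substring matching stands in for alignment, and the sequences, reads, and the `unexplained_reads` helper are all made up for illustration. It shows how an extra stretch of DNA, like the extra gene copy described above, looks unexplained against a single reference but is accounted for by a library that includes someone who carries it.

```python
from __future__ import annotations

# Toy illustration only: exact substring matching stands in for real read alignment.


def unexplained_reads(reads: list[str], references: list[str]) -> list[str]:
    """Return the reads that are not found in any reference sequence."""
    return [read for read in reads if not any(read in ref for ref in references)]


# Hypothetical sequences. The second library genome carries an extra
# stretch ("GTTACG") that the single reference lacks.
single_reference = "ACGTACGTTTGACCA"
library = [
    single_reference,
    "ACGTACGGTTACGTTTGACCA",
]
# The middle read spans the extra stretch, so only the library can explain it.
person_reads = ["ACGTACG", "CGGTTACG", "TTGACCA"]

print(unexplained_reads(person_reads, [single_reference]))  # ['CGGTTACG']
print(unexplained_reads(person_reads, library))  # []
```

Actual pangenomes are usually stored more compactly than a list of separate genomes, for example as a graph of shared and alternative sequences, but the basic idea of checking a person’s DNA against many genomes at once is the same.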
You may have taken one of those DNA tests to learn about your genetic heritage. Or maybe you’ve been tested for a certain gene for a medical concern.
But how does DNA really work? Here’s a quick primer.
DNA is the hereditary material in humans and most other organisms. It helps determine traits such as your eye color or your susceptibility to a certain medical condition. DNA has two strands that wind around each other in a spiral shape called a double helix. More than 99% of our DNA is identical from person to person, but the remaining fraction of a percent carries a lot of valuable information.
It’s important for researchers to understand DNA because many diseases are inherited. Genetic testing is widely used to diagnose conditions and to understand a person’s (or their children’s) risk of developing a disease.
The Human Pangenome Reference Consortium (HPRC) is a group of researchers working on the new pangenome.
The consortium carefully selects participants to capture as much genetic diversity as possible. For example, Africa has the most genetic diversity, so selecting many people with African ancestry helped add more sequences to the pangenome.
Here at NIST, we have something called the Genome in a Bottle consortium. With collaborators, we are essentially making a measuring stick for the human genome. The resulting reference material allows labs to evaluate the performance of their DNA analysis equipment.
As researchers develop the pangenome, we’re using that reference material to help evaluate it. For example, is a DNA sequence analyzed more accurately when it is compared with the pangenome than when it is compared with the current reference genome? So far, our research has shown that the new pangenome is more accurate in several regions of our DNA, but additional studies will be needed to help us learn more.
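One common way to use benchmark data like this is to compare a lab’s list of detected variants with the benchmark’s list and summarize the agreement as precision (the share of detected variants that are correct) and recall (the share of benchmark variants that were found). The Python sketch below assumes a simplified, hypothetical representation in which variants are exact (chromosome, position, reference, alternate) records; real benchmarking tools also reconcile different ways of writing the same variant and restrict comparisons to well-characterized regions. The `precision_recall` helper and both call sets are invented toy data, not results from any study.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """A simplified, hypothetical variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def precision_recall(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Score a call set against benchmark (truth) variants by exact matching."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented toy data for illustration only.
truth = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr1", 400, "G", "GA"),
    Variant("chr1", 520, "T", "C"),
}
calls_a = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr1", 300, "A", "C"),  # not in the benchmark: a false positive
}
calls_b = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr1", 400, "G", "GA"),
}

for name, calls in [("A", calls_a), ("B", calls_b)]:
    p, r = precision_recall(calls, truth)
    print(f"pipeline {name}: precision={p:.2f} recall={r:.2f}")
# pipeline A: precision=0.67 recall=0.50
# pipeline B: precision=1.00 recall=0.75
```

Scoring the same sample’s results from two analysis approaches, such as one using a single reference and one using a pangenome, against the same benchmark is one way to frame the kind of comparison described above.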
What does this look like in practice?
Consider one region of our genome, known as the MHC (major histocompatibility complex), which contains many genes involved in how our immune systems function. These genes are highly variable among people. Everyone has different versions of these genes, and some people have extra copies. Using the pangenome to understand our genetic diversity in the MHC can help scientists study autoimmune and infectious diseases and better understand who is more susceptible to them.
Some parts of the human genome are still missing from NIST’s reference material, and many are also missing from the current version of the pangenome. These are the most scientifically challenging parts of the genome, and scientists are still figuring out how to map them in many individuals.
DNA is incredibly complex. Mapping it is like trying to put together a 100,000-piece puzzle in which the most difficult pieces look very similar. But with so few areas of our DNA left to map, scientists are hard at work trying to figure out those last few puzzle pieces.
At NIST, our role is to help researchers evaluate how current and future versions of the pangenome (including the planned 350-person version) can improve the analysis of any individual’s genome from around the world. The goal is to include those tricky, puzzle-like parts of the DNA that scientists are still trying to map.
Mapping the human genome wasn’t what I planned to do when I arrived at NIST in 2009 after completing my Ph.D., which was mostly focused on chemistry experiments. But as I learned more about mapping the human genome, I enjoyed comparing and piecing together large DNA sequence datasets, just like I enjoy complicated puzzles. Analyzing data and understanding concepts, such as what causes errors, fascinate me.
The pangenome is a very new area of research. Although the first version from HPRC is available for use, not many laboratories are using it yet. But as we develop tools to understand how it improves accuracy, I think its use will become more common, and even more so once the 350-person version is released in the coming years. That will make DNA testing more effective for everyone.
Our goal is a simple one — to make sure that no matter where someone’s genes come from, genetic testing and information can work accurately for them and empower them with the medical information they need to manage their health.
There’s certainly more to do. But this is an exciting development and one that I hope will lead to more equity in DNA research and health outcomes.