A Big Science Framework for Big Genomes

An illustration of 6 scientists working on DNA

As researchers engineer ever larger genomes, collaborations necessarily grow larger and more complex. "Organizing genome engineering for the gigabase scale" provides a framework covering workflows, data management, and ethical, legal, and contractual matters to help scientific collaborations work efficiently on large-scale projects.

In a recent Perspectives article in Nature Communications, NIST’s Elizabeth Strychalski and co-authors from industry and academia offer a framework for engineering whole genomes of organisms.

Since the turn of this century, as researchers have progressed from engineering the genomes of tiny viruses to those of more complex organisms, the necessary collaborations have grown correspondingly larger and more complex. The synthetic polio virus, 7,500 base pairs long, required three researchers to complete back in 2002. The soon-to-be-completed synthetic yeast genome, at more than 12 million base pairs, has required nearly 175 collaborators. Strychalski and her co-authors estimate that the billions of base pairs in mammals’ DNA will require about 500 researchers to engineer, unless research teams adopt more efficient practices.

The challenges of large-scale scientific endeavors extend beyond the lab work of designing and assembling a genome to managing workflows and data and to handling legal and contractual matters. The fields of physics and astronomy have resolved similar issues to build their supercolliders and space telescopes; in the field of biology, only the Human Genome Project, which decoded our DNA, has been a similarly massive effort. Nearly 3,000 researchers shared authorship on the article in Nature revealing the reading of the human genome’s “first draft.”

The intent of the Nature Communications article, says Strychalski, is “to break down the ambitious goal of engineering a large genome into tractable pieces that take place in individual labs.” The article gives guidance for using existing or developing new “technologies, repositories, standards, and frameworks,” according to the authors.

The authors’ recommendations span the entire “design, build, test, learn” cycle of genome engineering, including:

  • Automate more steps
  • Consistently annotate reference genomes, so that engineers understand what functions are likely to be affected when genes are edited
  • Adopt, extend, and develop new formats for designing and encoding genetic sequences
  • Adopt successful practices from biomanufacturing for data sharing and management
  • Develop new and formalize current metrics for the fitness of engineered organisms
  • Extend biology’s computational modelling tools to help practitioners learn from others' engineering efforts
  • Adopt and develop tools for managing workflows and protocols
  • Develop frameworks for managing intellectual property and material transfers
  • Address the ethical, legal, and societal implications of genome engineering projects as early as possible

The authors point out that other scientific communities have already navigated similar challenges, so genome engineers should seek to adopt solutions from fields such as the aerospace and semiconductor industries.

“What kinds of projects are possible if you can organize at scale?” asks Strychalski.

Paper: Bartley, B.A., Beal, J., Karr, J.R. et al. Organizing genome engineering for the gigabase scale. Nat Commun 11, 689 (2020). DOI:

Released February 4, 2020, Updated February 4, 2021