Although it has been nearly complete for decades, scientists say they have finally finished decoding the human genome, the set of instructions for building and maintaining a human being.
According to the authors of a study published this Thursday in the journal Science, this genetic blueprint has now been fully assembled. An international team described the first sequencing of a complete human genome. The previous effort, celebrated around the world, was incomplete because the DNA sequencing technologies of the time could not read certain parts of it. Even after later updates, about 8% of the genome was missing.
“Some of the genes that make us uniquely human were actually in this 'dark matter of the genome' and were completely overlooked,” said Evan Eichler, a researcher at the University of Washington who participated in both the current effort and the original Human Genome Project. “It took more than 20 years, but we finally succeeded,” he said.
Many, including Eichler's own students, thought the job had already been finished. “I was teaching them and they said, 'Wait a minute. Isn't this like the sixth time they've declared victory?' I said, 'No, this time we really, really did it!'”
Scientists said that this comprehensive picture of the genome will give humanity a greater understanding of our evolution and biology, while also opening the door to medical discoveries in areas such as aging, neurodegenerative diseases, cancer and heart disease.
“We are simply expanding our opportunities to understand human disease,” said Karen Miga, author of one of six studies released on Thursday.
The research caps decades of work. The first draft of the human genome was announced at a White House ceremony in 2000 by the leaders of two competing entities: an international publicly funded project led by an agency of the U.S. National Institutes of Health, and a private company, Celera Genomics, based in Maryland.
“The telomere-to-telomere (T2T) consortium has completed the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the greatest improvement of the human reference genome since its initial release,” the scientists wrote last June in an article posted on the preprint server bioRxiv. At that point the work had not been peer-reviewed; that review has now been completed.
The new genome is a leap forward, the researchers said at the time, made possible by new DNA sequencing technologies developed by two private-sector companies: California's Pacific Biosciences, also known as PacBio, and Britain's Oxford Nanopore. Their technologies for reading DNA have very specific advantages over tools that have long been considered fundamental for researchers.
“This 8% of the genome has not been overlooked because of its lack of importance, but because of technological limitations,” the researchers wrote. “High-precision long-read sequencing has finally eliminated this technological barrier, allowing comprehensive studies of genomic variation across the human genome. Such studies will necessarily require a complete and accurate human reference genome, which will ultimately drive the adoption of the T2T-CHM13 assembly presented here.”
The human genome consists of about 3.1 billion subunits of DNA, chemical base pairs known by the letters A, C, G and T. Genes are strings of these letter pairs that contain instructions for making proteins, the building blocks of life. Humans have about 30,000 genes, organized into 23 groups called chromosomes that are found in the nucleus of each cell.
Until now, there were “large and persistent gaps that have been on our map, and these gaps fall in quite important regions,” Miga said.
Miga, a genomics researcher at the University of California, Santa Cruz, worked with Adam Phillippy of the National Human Genome Research Institute to organize a team of scientists that would start from scratch on a new genome, with the goal of sequencing everything, including the missing pieces. The group is known as the Telomere to Telomere consortium, or T2T, after the telomeres, the sections at the ends of chromosomes.
Their work adds new genetic information to the human genome, corrects previous errors, and reveals long stretches of DNA that are known to play an important role in both evolution and disease. A version of the research was published last year before being reviewed by fellow scientists.
“I would say this is a huge improvement of the Human Genome Project,” one that doubles its impact, said geneticist Ting Wang of the Washington University School of Medicine in St. Louis, who was not involved in the research.
Eichler said that some scientists used to think the unknown areas contained “junk.” But “some of us always thought there was gold in those hills,” he said.
It turns out that gold includes many important genes, he said, such as those essential to make a person's brain bigger than that of a chimpanzee, with more neurons and connections.
To find such genes, scientists needed new ways of reading the cryptic genetic language of life. Reading genes requires cutting strands of DNA into pieces of hundreds to thousands of letters. Sequencing machines read the letters on each piece and scientists try to put the pieces in the right order. This is especially difficult in areas where letters are repeated.
Scientists said some areas were illegible before improvements in gene sequencing machines that now allow them, for example, to accurately read a million letters of DNA at a time. That allows scientists to see genes with repeated areas as longer chains rather than fragments that they later had to put together.
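The problem with repeats can be illustrated with a toy sketch. It is not the T2T consortium's method; the two made-up sequences and the helper name `reads` are assumptions chosen only for illustration. Two short "genomes" differ only in how many times a three-letter unit is repeated. Cut into five-letter pieces, they yield exactly the same set of fragments, so the number of repeats cannot be recovered from the pieces; pieces long enough to span the whole repeat tell the two apart.

```python
def reads(genome: str, length: int) -> set[str]:
    """Return every distinct fragment of the given length found in the genome."""
    return {genome[i:i + length] for i in range(len(genome) - length + 1)}


# Two hypothetical sequences: same flanks, different number of "CAG" repeats.
genome_a = "ACGT" + "CAG" * 3 + "TTGA"
genome_b = "ACGT" + "CAG" * 5 + "TTGA"

# Short fragments: both genomes produce the same set of pieces.
short_reads_match = reads(genome_a, 5) == reads(genome_b, 5)

# Long fragments: some pieces span the entire repeat plus its flanks,
# so the two genomes no longer look alike.
long_reads_match = reads(genome_a, 14) == reads(genome_b, 14)

print("short fragments identical:", short_reads_match)
print("long fragments identical:", long_reads_match)
```

Real assemblers also use how often each fragment is seen and must cope with reading errors, which this sketch leaves out.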
The researchers also had to overcome another challenge: most cells contain genomes from both mother and father, which confuses attempts to assemble the parts correctly. The T2T researchers solved this by using a cell line from a “complete hydatidiform mole,” an abnormal fertilized egg that does not contain fetal tissue and that has two copies of the father's DNA and none of the mother's.
The next step will be to map more genomes, including those with collections of genes from both parents. This effort did not map one of the 23 chromosomes found in men, the Y chromosome, because the mole contained only an X.
Wang said he is working with the T2T group in the Human Pangenome Reference Consortium, which is trying to generate “reference” genomes, or templates, for 350 people representing the breadth of human diversity.
“We now have a correct genome and we need to do many, many more,” Eichler said. “This is the start of something really fantastic for the field of human genetics.”
In another notable advance in genetics, last February scientists from the University of Oxford created the first family tree of humanity. “The theoretical background of genome-wide genealogies, which describe how we have inherited genes from our ancestors, has developed over the past three decades,” explained Anthony Wilder Wohns, a postdoctoral researcher at the Broad Institute of MIT and Harvard, a former doctoral student at the Big Data Institute (BDI) of the University of Oxford and lead author of the research. “However, actually estimating this structure is a tremendously difficult statistical problem.”
Until now, the difficulty of combining huge data sets from a large number of different databases has been a major barrier to this research. Wohns, who carried out the study during his time at the BDI, reported on “a novel method for easily combining millions of genomic sequences from ancient and modern populations.”
Together with his colleagues, Wohns used this method to create a “first draft” of the family tree of humanity. “We devised a novel algorithm that deduces genetic relationships without the need to compare each DNA sequence against every other, and combined it with another algorithm that places dates on common ancestors by treating all ancestry as a single network,” he explained. “In addition, by estimating the entire genealogy of humanity, we were able to create algorithms that allowed us to use the entire genome to estimate when and, for the first time, where our ancestors lived.”