Imagine the body's instruction manual, the genome: here words are genes, letters are DNA, and the equivalent of typos can have disastrous consequences.
In recent years, scientists have grown increasingly fluent in the language of the genome, but much remains mysterious, including the function of many of our genes.
Discovering what these genes are for, and how they work, is key to understanding what happens when they malfunction, causing disease and sometimes death.
Now a group of scientists is harnessing a massive database of genetic information from over 140,000 people to better understand which of our genes are important, and how we might better target medicines to treat genetic disease.
The database itself is something of a landmark. Known as the Genome Aggregation Database or gnomAD (pronounced nomad), it contains over 15,000 whole genome sequences - the equivalent of a full-length instruction manual - and over 125,000 whole exome sequences, akin to key points in the set of guidelines for the human body.
In seven papers published Wednesday in the journals Nature, Nature Communications and Nature Medicine, scientists combed through gnomAD data, focusing on a type of spelling error that effectively breaks the gene.
We all have some of these errors, known as loss-of-function variants, in our genome. But in most cases, they switch off or break a gene without ill effect. We might end up with a diminished sense of smell, for example, but otherwise be healthy.
But when these mistakes occur in more important genes, they can result in serious illness.
Finding drug targets
People with these variants in important genes often don't pass them on because they die young or can't have children.
That means scientists can search a giant dataset like gnomAD, looking for genes that have fewer variants than expected.
They can infer that these genes must be important to our health, because natural selection has prevented variants in them from being passed down.
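The idea of looking for genes with fewer variants than expected can be illustrated with a toy calculation. The sketch below is not the method used in the gnomAD papers: the gene names, the counts, the "expected" values and the 0.35 cutoff are all hypothetical, chosen only to show how an observed-to-expected ratio could flag genes that appear depleted of loss-of-function variants.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    name: str            # hypothetical gene label
    expected_lof: float  # loss-of-function variants expected if the gene were not under selection (made-up value)
    observed_lof: int    # loss-of-function variants actually seen in the dataset (made-up value)


def oe_ratio(gene: GeneCounts) -> float:
    """Return observed / expected loss-of-function variants for one gene."""
    if gene.expected_lof <= 0:
        raise ValueError("expected_lof must be positive")
    return gene.observed_lof / gene.expected_lof


def flag_depleted(genes: list[GeneCounts], threshold: float = 0.35) -> list[str]:
    """List genes whose ratio falls below an arbitrary, illustrative threshold."""
    return [g.name for g in genes if oe_ratio(g) < threshold]


# Entirely invented example data.
genes = [
    GeneCounts("GENE_A", expected_lof=40.0, observed_lof=2),   # far fewer than expected
    GeneCounts("GENE_B", expected_lof=40.0, observed_lof=38),  # roughly as expected
    GeneCounts("GENE_C", expected_lof=25.0, observed_lof=5),   # fewer than expected
]

depleted = flag_depleted(genes)
print(depleted)
```

In this invented example, GENE_A and GENE_C fall below the cutoff and would be treated as candidates for genes that matter to health, while GENE_B would not. A real analysis would also need a careful model of how many variants to expect and a way to account for statistical uncertainty, which is one reason large sample sizes matter.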
"With 144,000 people we start to have big enough numbers that if we don't see loss-of-function variants in a particular gene that tells us that people carrying broken copies of this gene are being lost from the population, probably as a result of severe genetic disease," said Daniel MacArthur, who led the gnomAD project.
"We can't tell exactly what that disease is, but this tells us that this particular gene is likely to be important in some way," added MacArthur, senior author on six of the seven papers, who carried out the research at Harvard and MIT's Broad Institute.
Knowing which genes are important in disease not only offers targets for new drugs, but can also suggest whether a new treatment will be safe.
That was the focus of research by Eric Vallabh Minikel, who is studying a rare illness called prion disease at the Broad Institute.
The research is personal for Minikel. His mother-in-law died from the disease and his wife, a fellow scientist, carries a genetic mutation that means she is likely to develop it too.
He and his wife want to find a drug that prevents the disease, and examining naturally occurring gene inactivation offers insights into what side effects such a new treatment might have.
"The effects of DNA changes that inactivate a gene can help to predict what might happen if we treated people with a drug to target that gene," he told AFP.
Need for bigger datasets
In a similar vein, another team used the data to predict whether switching off a gene associated with Parkinson's might cause side effects.
They found genomes where that gene was naturally switched off by loss-of-function variants, and in the majority of cases found this caused no harmful health effects.
That could suggest a drug targeting the gene might protect against Parkinson's, though the researchers cautioned that significant further work is needed.
MacArthur, who is now director of the Garvan Institute's Centre for Population Genomics, said much bigger datasets than gnomAD are needed for further breakthroughs, potentially involving hundreds of millions of people and information on their health.
"We need that type of information... to get to the point where we can really start to fully understand the impact of variation in all genes in the genome on human disease risk."