By combining a resource called 1000 Genomes and the largest study of Hispanic health in the country, biostatisticians at UW hope to better understand risk factors for diseases like diabetes and asthma.
Genetically speaking, humans are 99 percent identical. The differences we can see — how tall we are, the color of our eyes, and our predisposition to certain diseases — lie in the remaining one percent of our genes.
By studying the genetic differences, or variants, in the remaining one percent of the human genome, scientists at the University of Washington and around the world hope to shine a light on the genetic factors that play a role in the development of cancer, heart disease and other maladies.
Because genetic variants occur in approximately one out of every 100 people, finding them requires data from many individuals. Early genome sequencing projects like the Human Genome Project and the HapMap Project therefore built huge databases of the genetic information of many different individuals.
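To see why large databases matter, consider a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each sampled person independently carries a given variant with probability 1 in 100 (the figure quoted above); the function name and the sample sizes are hypothetical and are not taken from any of the projects mentioned.

```python
def prob_at_least_one_carrier(carrier_rate: float, sample_size: int) -> float:
    """Chance that a random sample contains at least one carrier.

    Assumes each person carries the variant independently with the
    same probability (a simplification made for this illustration).
    """
    # Probability that nobody in the sample carries the variant,
    # subtracted from 1 to get "at least one carrier".
    return 1.0 - (1.0 - carrier_rate) ** sample_size


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only to show the trend.
    for n in (10, 100, 1000):
        p = prob_at_least_one_carrier(0.01, n)
        print(f"{n} people sampled: {p:.3f} chance of seeing the variant")
```

Under these assumptions, a sample of a few dozen people would often miss the variant entirely, while a sample in the thousands would almost certainly include it.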
But there was a problem — there wasn’t a lot of diversity in the individuals who participated in those early projects.
So, for example, if scientists looked only at genetic variants that relate to breast cancer in people of Western European ancestry, they could be missing different variants that relate to breast cancer in people of African, Asian or Native American ancestry.
Some of the most rare genetic variants tend to be found within populations whose individuals share a common ancestry. Thus, the greater the diversity of individuals who are sequenced, the more likely scientists are to find rare genetic variants.
To address the lack of diversity, an international consortium spearheaded by the National Institutes of Health (NIH) launched the 1000 Genomes Project. The goal of this project is to sequence the genomes of over 2,000 individuals from 26 populations around the world, from Punjabi to Puerto Rican.
This collection of sequences — which had its latest phase completed in late 2012 — represents the most comprehensive catalog of human genetic variations to date. The information is freely available to scientists everywhere.
Separating the signal from the noise
Sequencing the genomes of thousands of individuals gives rise to large-scale data sets containing information on hundreds of thousands of genetic variants. Some of those genetic variants are associated with disease, and others are not.
So how do scientists distinguish which variants are relevant and which are not? How do they separate the signal from the noise? They use the power of biostatistics. As a science, biostatistics uses mathematical and computational methods to discover meaningful patterns in a sea of biological or health data.
“For most of the common diseases, the genetic component is probabilistic, not deterministic. That means that having a certain genetic variant may increase the probability of getting the disease, but doesn’t mean you will definitely get it. That’s where biostatistics comes in,” said Dr. Cathy Laurie, senior principal research scientist at the University of Washington Department of Biostatistics.
Using biostatistical methods, scientists can determine which genetic variants are associated with a disease and how much each variant changes the probability that someone will develop it.
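One simple way to express how a variant changes the probability of disease is the relative risk: the disease rate among carriers divided by the rate among non-carriers. The sketch below is only an illustration of that idea, not the method used by the researchers quoted here; the function name and all counts are hypothetical.

```python
def relative_risk(cases_carriers: int, total_carriers: int,
                  cases_noncarriers: int, total_noncarriers: int) -> float:
    """Ratio of disease rates in carriers versus non-carriers of a variant."""
    risk_carriers = cases_carriers / total_carriers
    risk_noncarriers = cases_noncarriers / total_noncarriers
    return risk_carriers / risk_noncarriers


if __name__ == "__main__":
    # Hypothetical counts: 30 of 1,000 carriers develop the disease,
    # compared with 20 of 1,000 non-carriers.
    rr = relative_risk(30, 1000, 20, 1000)
    print(f"Relative risk: {rr:.2f}")
```

A relative risk above 1 means carriers develop the disease more often than non-carriers in the sample, but, as the quote above notes, most carriers in this made-up example still never get it. Real studies also have to test whether such a difference could have arisen by chance.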
Biostatisticians not only apply methods for obtaining valuable information from large-scale data sets, they also develop them. For example, Brian Browning, Associate Professor at UW Biostatistics, has developed a computer program called BEAGLE, one of several pieces of software used to analyze data from 1000 Genomes. Using BEAGLE, scientists can use sequences from the 1000 Genomes Project to complement the information obtained from their own studies to better identify genes related to disease.
“The 1000 Genomes Project is a very important tool for modern genetic studies,” said Dr. Bruce Weir, professor and chair at UW Biostatistics. “Although DNA sequencing technology has become significantly cheaper since the human genome was first sequenced, it would still be very expensive for a single researcher to sequence the genomes of thousands of people for a particular disease study.”
Genetic analysis of Latinos
Weir and his colleagues at UW Biostatistics will use data from 1000 Genomes to complement their analyses of the genomes of approximately 16,000 Hispanic individuals to study the genetic risk factors for diseases prevalent in Hispanic/Latino populations in the U.S. The team of researchers was recently awarded a multimillion-dollar National Heart, Lung and Blood Institute contract to establish the Omics in Latinos Genetic Analysis Center. (Omics refers to the study of a set of biological molecules; genomics, for example, is the study of the genome, the collection of genes or genetic material of an organism.)
The Omics in Latinos Genetic Analysis Center is part of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), a multi-center study that aims to identify the patterns, causes and risk factors for disease in Hispanic/Latino populations in the U.S.
“HCHS/SOL is aimed at improving the health of Hispanics and Latinos in the U.S., the fastest growing segment of the population in the country,” said Dr. Cathy Laurie, who is also part of the Omics in Latinos Genetic Analysis Center team.
Hispanics/Latinos have a rich cultural and ancestral diversity. According to the 2010 Census, Hispanics in the U.S. originate from over 20 different countries. To take this diversity into account, HCHS/SOL has recruited individuals who live in the U.S. but whose ancestors are mainly from Cuba, the Dominican Republic, Mexico, Puerto Rico or Central and South America.
“We want to understand the differences between individuals — their genetic variations — and how they are relevant to health outcomes,” said Dr. Tim Thornton, assistant professor at UW Biostatistics and co-investigator for the Omics in Latinos Genetic Analysis Center.
In their genetic analysis of HCHS/SOL participants, Weir, Thornton, Laurie and colleagues are hoping to find risk factors associated with diseases like diabetes and asthma, which have a high prevalence amongst Latino populations. They will also investigate the genetic variants associated with conditions such as sleep apnea and adult hearing loss. Very little is known about the genetic basis of these conditions.
By analyzing the genetic risk factors for several diseases that affect Latino populations, researchers at UW Biostatistics will contribute to the understanding of their causes. Findings from the Omics in Latinos Genetic Analysis Center could also help pinpoint the causes of health disparities within Latino populations in the U.S.
Recognizing that individuals differ genetically has led to a better understanding of the mechanisms of disease. If every individual were the same, scientists would not be able to figure out which variants are relevant to disease and how disease arises.
Projects like 1000 Genomes and the Omics in Latinos Genetic Analysis Center will help us gain a richer understanding of human genetic diversity and the genetic risk factors of disease. Armed with this knowledge, scientists can move towards designing better diagnosis, treatment and prevention methods for everyone.