Hg19 vs. Hg38

What's the Difference?

Hg19 and Hg38 are two versions of the human reference genome assembly. Hg19 is the UCSC Genome Browser name for GRCh37, which the Genome Reference Consortium released in February 2009. It was not the first reference assembly, but it became the most widely used one for many years. It spans roughly 3.1 billion base pairs and has been extensively annotated with known genes and genetic variants. Hg38, the UCSC name for GRCh38, was released in December 2013. It closes many gaps, corrects sequence errors and misassembled regions found in Hg19, and adds modeled centromere sequences. Hg19 included only a handful of alternative haplotypes, whereas Hg38 greatly expands the set of alternate-locus scaffolds, which represent regions where a single sequence cannot capture common human variation. Overall, Hg38 is a more accurate and complete reference for researchers and clinicians.
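Alternative haplotypes show up in practice as extra sequences alongside the primary chromosomes. The sketch below sorts sequence names into broad classes, assuming the UCSC-style suffix conventions used for Hg38 (`_alt`, `_fix`, `_random`, and the `chrUn_` prefix). The function name and category labels are illustrative, not part of any tool.

```python
import re

# Primary chromosomes in UCSC naming: chr1..chr22, chrX, chrY.
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y)")


def classify_contig(name: str) -> str:
    """Return a broad category for a UCSC-style Hg38 sequence name.

    Assumes UCSC naming conventions; names that match no rule are "other".
    """
    if name.endswith("_alt"):
        return "alternate locus"   # alternative haplotype scaffold
    if name.endswith("_fix"):
        return "fix patch"         # patch that corrects the primary assembly
    if name.endswith("_random"):
        return "unlocalized"       # chromosome known, position unknown
    if name.startswith("chrUn_"):
        return "unplaced"          # chromosome unknown
    if name == "chrM":
        return "mitochondrial"
    if PRIMARY.fullmatch(name):
        return "primary"
    return "other"


if __name__ == "__main__":
    for contig in ["chr1", "chrX", "chr6_GL000250v2_alt",
                   "chr1_KI270706v1_random", "chrUn_KI270302v1"]:
        print(contig, "->", classify_contig(contig))
```

Many pipelines restrict analysis to the "primary" class unless the aligner in use is explicitly alt-aware, so a filter like this is a common first step.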

Comparison

| Attribute | Hg19 | Hg38 |
| Chromosome numbering | 1-based | 1-based |
| Assembly version | GRCh37 | GRCh38 |
| Number of chromosomes | 24 | 24 |
| Reference sequence length | 3.1 billion base pairs | 3.1 billion base pairs |
| Gene annotation | Ensembl, RefSeq | Ensembl, RefSeq |
| Known variants | dbSNP, COSMIC | dbSNP, COSMIC |
| Genome coverage | ~90% | ~90% |
| Sequencing technology | Sanger, Illumina | Illumina |
| Release date | February 2009 | December 2013 |

Further Detail

Introduction

Genome assemblies play a crucial role in understanding the structure and function of the human genome. Over the years, multiple versions of the human reference genome have been released, each with its own set of improvements and updates. Two widely used versions are Hg19 (GRCh37) and Hg38 (GRCh38). In this article, we will compare the attributes of Hg19 and Hg38, highlighting their differences and advancements.

Assembly Quality

One of the primary considerations when comparing genome assemblies is their quality. Hg38 is generally considered more accurate and complete than Hg19. It closes or shrinks many of the gaps that remained in Hg19, corrects thousands of single-base errors along with a number of misassembled regions, and improves contiguity. These fixes drew on sequence data and assemblies generated after Hg19 was released, which helped resolve complex genomic regions that had been difficult to assemble.
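Gaps in a reference FASTA file are written as runs of `N`, so "fewer gaps" can be checked directly. Here is a minimal sketch of such a check on a sequence string. The helper name and the toy sequence are made up for illustration, and a real comparison would stream each chromosome from the two assemblies' FASTA files.

```python
import re


def gap_summary(sequence: str) -> dict:
    """Count runs of N (assembly gaps) and the bases they cover."""
    # Each match is one contiguous gap; its length is the number of N bases.
    run_lengths = [m.end() - m.start() for m in re.finditer(r"[Nn]+", sequence)]
    gap_bases = sum(run_lengths)
    return {
        "gap_runs": len(run_lengths),
        "gap_bases": gap_bases,
        "gap_fraction": gap_bases / len(sequence) if sequence else 0.0,
    }


if __name__ == "__main__":
    toy = "ACGTNNNNNACGTACGTNNACG"  # toy data, not real genomic sequence
    print(gap_summary(toy))
```

Running the same summary per chromosome on both assemblies gives a simple, reproducible way to see where gaps were closed.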

Furthermore, Hg38 has undergone extensive manual curation, with a focus on resolving problematic regions and correcting errors present in Hg19. This curation process involved the integration of data from various projects and initiatives, resulting in a more reliable and comprehensive reference genome.

Gene Annotation

Accurate gene annotation is crucial for understanding the functional elements within the genome. Annotation is produced separately from the assembly itself, so both Hg19 and Hg38 have Ensembl and RefSeq gene sets, but current annotation work targets Hg38. Gene sets on Hg38 offer a more comprehensive catalog of protein-coding genes, non-coding RNA genes, and regulatory elements, drawing on large-scale projects such as ENCODE and GENCODE.

Alternative splicing isoforms are annotated on both assemblies, since they are a property of the annotation and not of the reference sequence. The practical difference is currency: GENCODE release 19 was the last one built natively on GRCh37, and later releases are built on GRCh38, with versions mapped back to GRCh37 for compatibility. Newly described genes and isoforms therefore appear first, and most completely, on Hg38.

Repetitive Elements

Repetitive elements constitute a significant portion of the human genome and pose challenges during assembly. Hg38 represents several repetitive regions more accurately than Hg19. Most visibly, the centromeres, which appear as large gaps in Hg19, are filled in Hg38 with modeled sequences, and a number of duplicated regions were reassembled.

Because more of the sequence is present, repeat annotation on Hg38, covering transposable elements such as retrotransposons, extends into regions that are missing from Hg19. This gives researchers a fuller view of the repetitive landscape of the human genome and its potential implications for genome evolution and disease.
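In the soft-masked FASTA files distributed by UCSC, bases inside annotated repeats are written in lowercase, which gives a quick way to estimate how much of a sequence is repetitive. The sketch below assumes that convention. The function name and toy sequence are illustrative only.

```python
def softmasked_fraction(sequence: str) -> float:
    """Fraction of non-gap bases that are lowercase (soft-masked repeats).

    Assumes a soft-masked FASTA convention: lowercase means repeat,
    uppercase means unique sequence, and N/n marks a gap.
    """
    called = [base for base in sequence if base not in "Nn"]
    if not called:
        return 0.0
    masked = sum(1 for base in called if base.islower())
    return masked / len(called)


if __name__ == "__main__":
    toy = "ACGTacgtacgtNNNNACGT"  # toy data, not real genomic sequence
    print(softmasked_fraction(toy))
```

Gap bases are excluded from the denominator so that closing a gap in a newer assembly does not by itself change the repeat fraction.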

Structural Variants

Structural variants (SVs) are large-scale genomic alterations that can have significant implications for human health and disease. The reference sequence does not itself catalog SVs, but it determines how reliably they can be called. By correcting misassembled regions and adding alternate loci, Hg38 removes some apparent variants that were really reference errors. Large-scale resources such as the 1000 Genomes Project and the Genome in a Bottle Consortium have released data aligned to Hg38, which makes population-scale and benchmark call sets available on the newer assembly.

Long-read sequencing, which enables detection of complex SVs that short reads often miss, is independent of the reference, and long reads can be aligned to either assembly. The results are easier to interpret against Hg38, though, because fewer gaps and assembly errors remain to be mistaken for true structural variation. This contributes to a more comprehensive understanding of structural variation and its impact on human biology and disease susceptibility.

Compatibility and Adoption

While Hg38 offers numerous improvements over Hg19, it is important to consider compatibility and adoption within the scientific community. Hg19 has been widely used for many years, and numerous tools, databases, and analyses have been developed specifically for this assembly. Transitioning to Hg38 may require updating these resources and workflows, which can be time-consuming and challenging.
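A common first step when working with inherited files is checking which assembly they were built against, since coordinates from the two are not interchangeable. Chromosome lengths differ between the assemblies, so a minimal sketch can infer the build from contig lengths that have already been read from a file header or index. The function name and lookup table are illustrative. The lengths are the published chr1 and chr2 sizes for each assembly.

```python
from typing import Dict, Optional

# Published lengths of chromosomes 1 and 2 in each assembly.
KNOWN_LENGTHS = {
    "hg19/GRCh37": {"1": 249_250_621, "2": 243_199_373},
    "hg38/GRCh38": {"1": 248_956_422, "2": 242_193_529},
}


def infer_build(contig_lengths: Dict[str, int]) -> Optional[str]:
    """Guess the assembly from a mapping of contig name to length.

    Accepts both "chr1" (UCSC) and "1" (GRC/Ensembl) naming styles.
    Returns None when the lengths match neither assembly.
    """
    normalized = {
        (name[3:] if name.startswith("chr") else name): length
        for name, length in contig_lengths.items()
    }
    for build, expected in KNOWN_LENGTHS.items():
        if all(normalized.get(c) == n for c, n in expected.items()):
            return build
    return None


if __name__ == "__main__":
    print(infer_build({"chr1": 248_956_422, "chr2": 242_193_529}))
    print(infer_build({"1": 249_250_621, "2": 243_199_373}))
```

Returning `None` for unrecognized lengths, instead of guessing, keeps the check safe for files built on other assemblies or other species.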

However, the scientific community has recognized the benefits of Hg38, and efforts are underway to facilitate its adoption. Many widely used databases and tools have already transitioned to Hg38, and coordinates can be converted between the two assemblies with liftover tools that use chain files describing how the sequences align. As more researchers adopt Hg38, the availability of resources and support for this assembly will continue to grow.
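Moving existing results between assemblies usually means converting coordinates through a set of aligned blocks. The sketch below shows only the core idea on invented data. The block list is a toy, not taken from a real chain file, and the function is hypothetical. Real conversions should use an established tool such as UCSC liftOver together with the official chain files, which also handle strand changes and split intervals.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (source_start, source_end, target_start), 0-based half-open, same strand.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a source position to the target assembly.

    `blocks` must be sorted by source_start and must not overlap.
    Returns None when the position falls outside every aligned block,
    meaning it has no equivalent in the target assembly.
    """
    starts = [block[0] for block in blocks]
    index = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if index < 0:
        return None
    source_start, source_end, target_start = blocks[index]
    if pos >= source_end:
        return None  # position lies in an unaligned stretch between blocks
    return target_start + (pos - source_start)


if __name__ == "__main__":
    toy_blocks = [(100, 200, 150), (250, 400, 260)]  # invented example data
    for position in (120, 220, 300):
        print(position, "->", lift_position(toy_blocks, position))
```

The `None` case matters in practice: some Hg19 positions have no counterpart in Hg38, so a conversion step should report unmapped records and never drop them silently.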

Conclusion

In conclusion, Hg38 represents a significant improvement over Hg19 in terms of assembly quality, gene annotation, representation of repetitive elements, and detection of structural variants. The updated assembly incorporates advancements in sequencing technologies, additional data sources, and extensive manual curation, resulting in a more accurate and comprehensive reference genome. While transitioning to Hg38 may require some adjustments within the scientific community, the benefits it offers in terms of accuracy and completeness make it a valuable resource for studying the human genome.
