98% of the DNA in our body is non-coding, i.e. does not carry the information needed to build proteins. Non-coding has sometimes been equated with ‘non-functional’, or called ‘junk’ in the past; today we know that this is far from the truth. Scattered throughout non-coding DNA is a plethora of so-called regulatory elements, including enhancers, silencers and insulators. These regulatory elements function like molecular switches to control which genes are active (and thus produce proteins) in which cells. This process of gene expression control is vital to allow cells – which all contain the same genes – to specialise to carry out different tasks, and to help them respond to changes.
Enhancers are a type of regulatory element that controls gene expression over long distances. They contact their target genes via chromosomal interactions, often bridging large stretches of the genome, with the intervening DNA ‘looping out’. To understand how enhancers work, we study them in the context of the three-dimensional organisation of the genome.
Our aim is to find regulatory elements and to understand which genes they control. We also aim to uncover the molecular mechanisms by which regulatory elements find their target genes in the three-dimensional space of the cell nucleus, and to understand how altering the function of regulatory elements can lead to developmental malformations and disease.
We study these questions in pluripotent stem cells – cells that have the potential to create all cell types in the adult body. We use a combination of molecular, genetic, biochemical and imaging approaches to study pluripotent stem cells in their ‘ground state’, and when they start to form new cell types – a process called cell lineage specification.
Through high-resolution mapping and experimental perturbation of the spatial genome architecture, we aim to reveal gene regulatory principles that underpin cell states and cell fate transitions. This may ultimately pave the way for us to experimentally engineer 3D genome folding to achieve predictable outcomes on gene expression and cell fate choice, with potential implications for gene therapy and regenerative medicine.
Genome sequencing has revealed over 300 million genetic variations in human populations. Over 90% of variants are single nucleotide polymorphisms (SNPs); the remainder include short deletions or insertions, and small numbers of structural variants. Hundreds of thousands of these variants have been associated with specific phenotypic traits and diseases through genome-wide association studies, which link significant differences in variant frequencies with specific phenotypes among large groups of individuals. Only 5% of disease-associated SNPs are located in gene coding sequences, with the potential to disrupt gene expression or alter the function of encoded proteins. The remaining 95% of disease-associated SNPs are located in non-coding DNA sequences, which make up 98% of the genome. The role of non-coding, disease-associated SNPs, many of which are located at considerable distances from any gene, remained a mystery until the discovery that gene promoters regularly interact with distal regulatory elements to control gene expression. Disease-associated SNPs are enriched at the millions of gene regulatory elements that are dispersed throughout the non-coding sequences of the genome, suggesting they function as gene regulation variants. Assigning specific regulatory elements to the genes they control is not straightforward since they can be millions of base pairs apart. In this review we describe how understanding 3D genome organization can identify specific interactions between gene promoters and distal regulatory elements and how 3D genomics can link disease-associated SNPs to their target genes. Understanding which gene or genes contribute to a specific disease is the first step in designing rational therapeutic interventions.
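As a rough illustration of how promoter interaction data can link a non-coding SNP to candidate target genes, the sketch below checks whether a SNP falls inside a distal fragment that contacts a gene promoter. It is a minimal sketch under stated assumptions, not the method used in the review: the `Interaction` record, the `genes_for_snp` helper, the gene names and all coordinates are hypothetical and invented for illustration.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """One promoter contact (hypothetical record layout)."""
    gene: str   # gene whose promoter forms one end of the contact
    chrom: str  # chromosome of the distal (non-promoter) fragment
    start: int  # distal fragment start, 0-based inclusive
    end: int    # distal fragment end, exclusive


def genes_for_snp(chrom, pos, interactions):
    """Return the sorted genes whose promoters contact a fragment containing the SNP."""
    return sorted({
        i.gene
        for i in interactions
        if i.chrom == chrom and i.start <= pos < i.end
    })


# Invented example data: three promoter contacts and two SNPs.
interactions = [
    Interaction("GENE_A", "chr1", 1_200_000, 1_205_000),
    Interaction("GENE_B", "chr1", 1_203_000, 1_210_000),
    Interaction("GENE_C", "chr2", 500_000, 505_000),
]
snps = {
    "snp1": ("chr1", 1_204_000),  # lies in fragments contacting GENE_A and GENE_B
    "snp2": ("chr2", 900_000),    # lies in no contacted fragment
}

assignments = {
    name: genes_for_snp(chrom, pos, interactions)
    for name, (chrom, pos) in snps.items()
}
print(assignments)
```

A SNP can map to several genes, or to none, which is why the helper returns a list rather than a single gene. Real analyses would also weigh the statistical significance of each contact and the cell type in which it was measured.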
Sex determination is the process by which an initial bipotential gonad adopts either a testicular or ovarian cell fate. The inability to properly complete this process leads to a group of developmental disorders classified as disorders of sex development (DSD). To date, dozens of genes have been shown to play roles in mammalian sex determination, and mutations in these genes can cause DSD in humans or gonadal sex reversal/dysfunction in mice. However, exome sequencing currently provides a genetic diagnosis for fewer than half of DSD patients. This points towards a major role for the non-coding genome during sex determination. In this review, we highlight recent advances in our understanding of non-coding, cis-acting gene regulatory elements and discuss how they may control transcriptional programmes that underpin sex determination in the context of the 3-dimensional folding of chromatin. As a paradigm, we focus on the Sox9 gene, a prominent pro-male factor and one of the most extensively studied genes in gonadal cell fate determination.
Development of cervical cancer is directly associated with integration of human papillomavirus (HPV) genomes into host chromosomes and subsequent modulation of HPV oncogene expression, which correlates with multi-layered epigenetic changes at the integrated HPV genomes. However, the process of integration itself and the dysregulation of host gene expression at sites of integration in our model of HPV16 integrant clone natural selection have remained enigmatic. We now show, using a state-of-the-art 'HPV integrated site capture' (HISC) technique, that integration likely occurs through microhomology-mediated repair (MHMR) mechanisms via either a direct process, resulting in host sequence deletion (in our case, partially homozygously), or via a 'looping' mechanism by which flanking host regions become amplified. Furthermore, using our 'HPV16-specific Region Capture Hi-C' technique, we have determined that chromatin interactions between the integrated virus genome and host chromosomes, both at short- (<500 kbp) and long-range (>500 kbp), appear to drive local host gene dysregulation through the disruption of host:host interactions within (but not exceeding) host structures known as topologically associating domains (TADs). This mechanism of HPV-induced host gene expression modulation indicates that integration of virus genomes near to or within a 'cancer-causing gene' is not essential to influence their expression and that these modifications to genome interactions could have a major role in selection of HPV integrants at the early stage of cervical neoplastic progression.