	<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
	<channel rdf:about="https://biorxiv.org">
	<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
	<title>bioRxiv Channel: International Human Epigenome Consortium (IHEC)</title>
	<link>https://biorxiv.org</link>
	<description>
	This feed contains articles for bioRxiv Channel "International Human Epigenome Consortium (IHEC)"
	</description>

		<items>
	<rdf:Seq>
		</rdf:Seq>
	</items>
	<prism:eIssn/>
	<prism:publicationName>bioRxiv</prism:publicationName>
	<prism:issn/>

	<image rdf:resource=""/>
	</channel>
	<image rdf:about="">
	<title>bioRxiv</title>
	<url/>
	<link>https://biorxiv.org</link>
	</image>
	<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.24.493345v1?rss=1">
<title>
<![CDATA[
ChromGene: Gene-Based Modeling of Epigenomic Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.24.493345v1?rss=1
</link>
<description><![CDATA[
Background: Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses, in which a single annotation for each gene is desired.

Results: To address this, we developed ChromGene, which annotates genes based on the combinatorial and spatial patterns of multiple epigenomic marks across the gene body and flanking regions. Specifically, ChromGene models the epigenomic maps using a mixture of hidden Markov models learned de novo. Using ChromGene, we generated annotations for the human protein-coding genes for over 100 cell and tissue types. We characterize the different mixture components and their associated gene sets in terms of gene expression, constraint, and other gene annotations. We also characterize variation in ChromGene gene annotations across cell and tissue types.

Conclusions: We expect that the ChromGene method and provided annotations will be a useful resource for gene-based epigenomic analyses.
]]></description>
<dc:creator>Jaroszewicz, A.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-05-25</dc:date>
<dc:identifier>doi:10.1101/2022.05.24.493345</dc:identifier>
<dc:title><![CDATA[ChromGene: Gene-Based Modeling of Epigenomic Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.08.491094v1?rss=1">
<title>
<![CDATA[
A framework for summarizing chromatin state annotations within and identifying differential annotations across groups of samples 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.08.491094v1?rss=1
</link>
<description><![CDATA[
Motivation: Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution.

Results: We developed CSREP, which takes as input chromatin state annotations for a group of samples and then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers to predict the chromatin state assignment of each sample given the state maps from all other samples. The difference of CSREP's probability assignments for two groups can be used to identify genomic locations with differential chromatin state patterns.

Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution.

Availability and implementation: The CSREP source code is openly available at http://github.com/ernstlab/csrep.

Contact: jason.ernst@ucla.edu
]]></description>
<dc:creator>Vu, H. T.</dc:creator>
<dc:creator>Koch, Z.</dc:creator>
<dc:creator>Fiziev, P.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-05-08</dc:date>
<dc:identifier>doi:10.1101/2022.05.08.491094</dc:identifier>
<dc:title><![CDATA[A framework for summarizing chromatin state annotations within and identifying differential annotations across groups of samples]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.19.521116v1?rss=1">
<title>
<![CDATA[
Universal chromatin state annotation of the mouse genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.19.521116v1?rss=1
</link>
<description><![CDATA[
Genome-wide chromatin states learned from integrating genome-wide maps of multiple epigenetic marks within the same cell type have been widely used to generate genome annotations of individual cell types. An alternative strategy based on stacked modeling can provide a single universal chromatin state annotation based jointly on data from many cell types. In human, such an approach was recently demonstrated and the resulting chromatin state annotation, denoted full-stack, was shown to have complementary advantages to per-cell-type annotations. However, an analogous annotation has not been previously available in mouse. Here, we produce a chromatin state annotation for mouse based on 901 datasets assaying 14 chromatin marks in 26 different cell or tissue types. To characterize each chromatin state, we relate the states to other external annotations and compare them to analogously defined states in human. We expect the full-stack chromatin state annotation for mouse will be a useful resource for studying the genome of this key mammalian model organism.
]]></description>
<dc:creator>Vu, H. T.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-12-20</dc:date>
<dc:identifier>doi:10.1101/2022.12.19.521116</dc:identifier>
<dc:title><![CDATA[Universal chromatin state annotation of the mouse genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.17.387134v1?rss=1">
<title>
<![CDATA[
Universal annotation of the human genome through integration of over a thousand epigenomic datasets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.17.387134v1?rss=1
</link>
<description><![CDATA[
Background: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative stacked modeling approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges.

Results: Using a version of ChromHMM enhanced for large-scale applications, we applied the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we used in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell type-specific activity and are more predictive of locations of external genomic annotations.

Conclusions: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.
]]></description>
<dc:creator>Ernst, J.</dc:creator>
<dc:creator>Vu, H. T.</dc:creator>
<dc:date>2020-11-19</dc:date>
<dc:identifier>doi:10.1101/2020.11.17.387134</dc:identifier>
<dc:title><![CDATA[Universal annotation of the human genome through integration of over a thousand epigenomic datasets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.02.502571v1?rss=1">
<title>
<![CDATA[
Chromatin state modeling across individuals reveals global patterns of histone modifications 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.02.502571v1?rss=1
</link>
<description><![CDATA[
Epigenetic mapping studies across individuals have identified many positions of epigenetic variation in various human tissues and conditions. However, the relationships between these positions, and in particular global patterns that recur in many regions of the genome, remain understudied. In this study, we use a stacked chromatin state model to systematically learn global patterns of epigenetic variation across individuals and annotate the human genome based on them. We applied this framework to histone modification data across individuals in lymphoblastoid cell lines and across autism spectrum disorder cases and controls in prefrontal cortex tissue. We find that global patterns are correlated across multiple histone modifications and with gene expression. We used the global patterns as a framework to predict transregulators, identify trans-QTL, and study complex disease. The frameworks for identifying and analyzing global patterns of epigenetic variation are general, and we expect they will be useful in other systems.
]]></description>
<dc:creator>Zou, J.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-08-03</dc:date>
<dc:identifier>doi:10.1101/2022.08.02.502571</dc:identifier>
<dc:title><![CDATA[Chromatin state modeling across individuals reveals global patterns of histone modifications]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.14.549056v1?rss=1">
<title>
<![CDATA[
Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.14.549056v1?rss=1
</link>
<description><![CDATA[
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
]]></description>
<dc:creator>Dincer, T. U.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2023-07-15</dc:date>
<dc:identifier>doi:10.1101/2023.07.14.549056</dc:identifier>
<dc:title><![CDATA[Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.05.14.594262v1?rss=1">
<title>
<![CDATA[
Synovial Sarcoma Chromatin Dynamics Reveal a Continuum in SS18:SSX Reprograming 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.05.14.594262v1?rss=1
</link>
<description><![CDATA[
Synovial sarcoma (SyS) is an aggressive soft-tissue malignancy characterized by a pathognomonic chromosomal translocation leading to the formation of the SS18::SSX fusion oncoprotein. SS18::SSX associates with mammalian BAF complexes, suggesting deregulation of chromatin architecture as the oncogenic driver in this tumour type. To examine the epigenomic state of SyS, we performed comprehensive multi-omics analysis on 52 primary pre-treatment human SyS tumours. Our analysis revealed a continuum of epigenomic states across the cohort at fusion target genes independent of rare somatic genetic lesions. We identify cell-of-origin signatures defined by enhancer states and reveal unexpected relationships between H2AK119Ub1 and active marks. The number of bivalent promoters, dually marked by the repressive H3K27me3 and activating H3K4me3 marks, has strong prognostic value and outperforms tumour grade in predicting patient outcome. Finally, we identify SyS-defining epigenomic features including H3K4me3 expansion associated with striking promoter DNA hypomethylation in which SyS displays the lowest mean methylation level of any sarcoma subtype. We explore these distinctive features as potential vulnerabilities in SyS and identify H3K4me3 inhibition as a promising therapeutic strategy.
]]></description>
<dc:creator>Hofvander, J.</dc:creator>
<dc:creator>Qiu, A.</dc:creator>
<dc:creator>Lee, K.</dc:creator>
<dc:creator>Bilenky, M.</dc:creator>
<dc:creator>Carles, A.</dc:creator>
<dc:creator>Cao, Q.</dc:creator>
<dc:creator>Moksa, M.</dc:creator>
<dc:creator>Steif, J.</dc:creator>
<dc:creator>Su, E.</dc:creator>
<dc:creator>Sotiriou, A.</dc:creator>
<dc:creator>Goytain, A.</dc:creator>
<dc:creator>Hill, L.</dc:creator>
<dc:creator>Singer, S.</dc:creator>
<dc:creator>Andulis, I.</dc:creator>
<dc:creator>Wunder, J.</dc:creator>
<dc:creator>Mertens, F.</dc:creator>
<dc:creator>Banito, A.</dc:creator>
<dc:creator>Jones, K. B.</dc:creator>
<dc:creator>Underhill, T. M.</dc:creator>
<dc:creator>Nielsen, T.</dc:creator>
<dc:creator>Hirst, M.</dc:creator>
<dc:date>2024-05-17</dc:date>
<dc:identifier>doi:10.1101/2024.05.14.594262</dc:identifier>
<dc:title><![CDATA[Synovial Sarcoma Chromatin Dynamics Reveal a Continuum in SS18:SSX Reprograming]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-05-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.07.22.501196v1?rss=1">
<title>
<![CDATA[
Regulatory roles of three-dimensional structures of topologically associating domains 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.07.22.501196v1?rss=1
</link>
<description><![CDATA[
Transcriptional enhancers usually, but not always, regulate genes within the same topologically associating domain (TAD). We hypothesize that this incomplete insulation is due to three-dimensional structures of corresponding chromatin domains in individual cells: whereas enhancers and genes buried inside the "core" of a domain interact mostly with other regions in the same domain, those on the "surface" can more easily interact with the outside. Here we show that a simple measure, the intra-TAD ratio, can quantify the "coreness" of a region with respect to the single-cell domains it belongs to. We show that domain surfaces are permissive for high gene expression, and cell type-specific active cis-regulatory elements (CREs), active histone marks, and transcription factor binding sites are enriched on domain surfaces, most strongly in chromatin subcompartments typically considered inactive. These findings suggest a "domain surface CRE" model of gene regulation. We also find that disease-associated non-coding variants are enriched on domain surfaces.
]]></description>
<dc:creator>Li, K. Y.</dc:creator>
<dc:creator>Cao, Q.</dc:creator>
<dc:creator>Wang, H.</dc:creator>
<dc:creator>Leung, D. C. Y.</dc:creator>
<dc:creator>Yip, K. Y.</dc:creator>
<dc:date>2022-07-23</dc:date>
<dc:identifier>doi:10.1101/2022.07.22.501196</dc:identifier>
<dc:title><![CDATA[Regulatory roles of three-dimensional structures of topologically associating domains]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-07-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/547596v1?rss=1">
<title>
<![CDATA[
Unique and assay specific features of NOMe-, ATAC- and DNase I-seq data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/547596v1?rss=1
</link>
<description><![CDATA[
Chromatin accessibility maps are important for the functional interpretation of the genome. Here, we systematically analysed assay-specific differences between DNase I-Seq, ATAC-Seq and NOMe-Seq in a side-by-side experimental and bioinformatic setup. We observe that the most prominent nucleosome-depleted regions (NDRs, e.g. in promoters) are robustly called by all three or at least two assays. However, we also find a high proportion of assay-specific NDRs that are often "called" by only one of the assays. We show evidence that these assay-specific NDRs are indeed genuine open chromatin sites and contribute important information for accurate gene expression prediction. While technically ATAC-Seq and DNase I-Seq provide a high NDR calling rate for relatively low sequencing costs in comparison to NOMe-Seq, NOMe-Seq stands out because it provides a multitude of information: it allows detection not only of NDRs but also of endogenous DNA methylation, genome-wide segmentation into heterochromatic A/B domains and local phasing of nucleosomes outside of NDRs. In summary, our comparison strongly suggests that assay-specific differences should be considered in experimental design and in generalized and comparative functional interpretations.
]]></description>
<dc:creator>Nordström, K.</dc:creator>
<dc:creator>Schmidt, F.</dc:creator>
<dc:creator>Gasparoni, N.</dc:creator>
<dc:creator>Salhab, A.</dc:creator>
<dc:creator>Gasparoni, G.</dc:creator>
<dc:creator>Kattler, K.</dc:creator>
<dc:creator>Müller, F.</dc:creator>
<dc:creator>Ebert, P.</dc:creator>
<dc:creator>Costa, I. G.</dc:creator>
<dc:creator>DEEP consortium</dc:creator>
<dc:creator>Pfeifer, N.</dc:creator>
<dc:creator>Lengauer, T.</dc:creator>
<dc:creator>Schulz, M. H.</dc:creator>
<dc:creator>Walter, J.</dc:creator>
<dc:date>2019-02-13</dc:date>
<dc:identifier>doi:10.1101/547596</dc:identifier>
<dc:title><![CDATA[Unique and assay specific features of NOMe-, ATAC- and DNase I-seq data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-02-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.31.494153v1?rss=1">
<title>
<![CDATA[
SARS-CoV-2 impacts the transcriptome and epigenome at the maternal-fetal interface in pregnancy 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.31.494153v1?rss=1
</link>
<description><![CDATA[
During pregnancy, the maternal-fetal interface plays vital roles in fetal development. Its disruption is frequently found in pregnancy complications. Recent studies show an increased incidence of adverse pregnancy outcomes in COVID-19 patients; however, the mechanism remains unclear. Here, we analyzed the molecular impacts of SARS-CoV-2 infection on the maternal-fetal interface. Generating bulk and single-nucleus transcriptomic and epigenomic profiles from COVID-19 patients and control samples, we discovered aberrant immune activation and angiogenesis patterns in patients. Surprisingly, retrotransposons were dysregulated in specific cell types. Notably, reduced enhancer activities of LTR8B elements were functionally linked to the downregulation of Pregnancy-Specific Glycoprotein genes in syncytiotrophoblasts. Our findings revealed that SARS-CoV-2 infection induced significant changes to the epigenome and transcriptome at the maternal-fetal interface, which may be associated with pregnancy complications.

One-Sentence Summary: Pregnant COVID-19 patients show placental epigenetic and transcriptional changes, associated with adverse pregnancy outcomes.
]]></description>
<dc:creator>Gao, L.</dc:creator>
<dc:creator>Mathur, V.</dc:creator>
<dc:creator>Tam, S. K. M.</dc:creator>
<dc:creator>Zhou, X.</dc:creator>
<dc:creator>Cheung, M. F.</dc:creator>
<dc:creator>Chan, L. Y.</dc:creator>
<dc:creator>Estrada-Gutierrez, G.</dc:creator>
<dc:creator>Leung, B. W.</dc:creator>
<dc:creator>Moungmaithong, S.</dc:creator>
<dc:creator>Wang, C. C.</dc:creator>
<dc:creator>Poon, L.</dc:creator>
<dc:creator>Leung, D. C. Y.</dc:creator>
<dc:date>2022-05-31</dc:date>
<dc:identifier>doi:10.1101/2022.05.31.494153</dc:identifier>
<dc:title><![CDATA[SARS-CoV-2 impacts the transcriptome and epigenome at the maternal-fetal interface in pregnancy]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-31</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.10.491413v1?rss=1">
<title>
<![CDATA[
Epigenetic variation impacts ancestry-associated differences in the transcriptional response to influenza infection 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.10.491413v1?rss=1
</link>
<description><![CDATA[
Humans display remarkable inter-individual variation in immune response when exposed to identical immune challenges. Yet, our understanding of the genetic and epigenetic factors contributing to such variation remains limited. Here we carried out in-depth genetic, epigenetic, and transcriptional profiling on primary macrophages derived from a panel of European and African-ancestry individuals before and after infection with influenza A virus (IAV). We show that baseline epigenetic profiles are strongly predictive of the transcriptional response to IAV across individuals, and that ancestry-associated differences in gene expression are tightly coupled with variation in enhancer activity. Quantitative trait locus (QTL) mapping revealed highly coordinated genetic effects on gene regulation, with many cis-acting genetic variants concomitantly impacting gene expression and multiple epigenetic marks. These data reveal that ancestry-associated differences in the epigenetic landscape are genetically controlled, even more so than variation in gene expression. Lastly, we show that among QTL variants that colocalized with immune-disease loci, only 7% were gene expression QTL, the remainder corresponding to genetic variants that impact one or more epigenetic marks, which stresses the importance of considering molecular phenotypes beyond gene expression in disease-focused studies.
]]></description>
<dc:creator>Aracena, K. A.</dc:creator>
<dc:creator>Lin, Y.-L.</dc:creator>
<dc:creator>Luo, K.</dc:creator>
<dc:creator>Pacis, A.</dc:creator>
<dc:creator>Gona, S.</dc:creator>
<dc:creator>Mu, Z.</dc:creator>
<dc:creator>Yotova, V.</dc:creator>
<dc:creator>Sindeaux, R.</dc:creator>
<dc:creator>Pramatarova, A.</dc:creator>
<dc:creator>Simon, M.-M.</dc:creator>
<dc:creator>Chen, X.</dc:creator>
<dc:creator>Groza, C.</dc:creator>
<dc:creator>Lougheed, D.</dc:creator>
<dc:creator>Gregoire, R.</dc:creator>
<dc:creator>Brownlee, D.</dc:creator>
<dc:creator>Li, Y.</dc:creator>
<dc:creator>He, X.</dc:creator>
<dc:creator>Bujold, D.</dc:creator>
<dc:creator>Pastinen, T.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:creator>Barreiro, L. B.</dc:creator>
<dc:date>2022-05-11</dc:date>
<dc:identifier>doi:10.1101/2022.05.10.491413</dc:identifier>
<dc:title><![CDATA[Epigenetic variation impacts ancestry-associated differences in the transcriptional response to influenza infection]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.10.491101v1?rss=1">
<title>
<![CDATA[
Transposable elements are associated with the variable response to influenza infection 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.10.491101v1?rss=1
</link>
<description><![CDATA[
Influenza A virus (IAV) infections occur frequently every year and result in a range of disease severity. Given that transposable elements (TEs) contribute to the activation of innate immunity, we wanted to explore their potential role in this variability. Transcriptome profiling in monocyte-derived macrophages from 39 individuals following IAV infection revealed significant inter-individual variation in viral load post-infection. Using ATAC-seq, we identified a set of TE families with either enhanced or reduced accessibility upon infection. Of the enhanced families, 15 showed high variability between individuals and had distinct epigenetic profiles. Motif analysis showed an association with known immune regulators in stably enriched TE families and with other factors in variable families, including KRAB-ZNFs. We also observed a strong association between basal TE transcripts and viral load post-infection. Finally, we built a predictive model suggesting that TEs, and host factors regulating TEs, contribute to the variable response to infection.
]]></description>
<dc:creator>Chen, X.</dc:creator>
<dc:creator>Pacis, A. S.</dc:creator>
<dc:creator>Aracena, K. A.</dc:creator>
<dc:creator>Gona, S.</dc:creator>
<dc:creator>Kwan, T.</dc:creator>
<dc:creator>Groza, C.</dc:creator>
<dc:creator>Lin, Y. L.</dc:creator>
<dc:creator>Sindeaux, R. H. M.</dc:creator>
<dc:creator>Yotova, V.</dc:creator>
<dc:creator>Pramatarova, A.</dc:creator>
<dc:creator>Simon, M.-M.</dc:creator>
<dc:creator>Pastinen, T. M.</dc:creator>
<dc:creator>Barreiro, L. B.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:date>2022-05-10</dc:date>
<dc:identifier>doi:10.1101/2022.05.10.491101</dc:identifier>
<dc:title><![CDATA[Transposable elements are associated with the variable response to influenza infection]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.09.29.462206v1?rss=1">
<title>
<![CDATA[
Genome graphs detect human polymorphisms in active epigenomic states during influenza infection 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.09.29.462206v1?rss=1
</link>
<description><![CDATA[
Genetic variants, including mobile element insertions (MEIs), are known to impact the epigenome. We hypothesized that the use of a genome graph, which encapsulates genetic diversity, could reveal missing epigenomic signal. Given the contributions of mobile elements to the evolution of primate innate immunity, we tested this in monocyte-derived macrophages obtained from 35 individuals before and after influenza virus infection. After characterizing genetic variants in this cohort using linked-reads, including 5140 Alu, 316 L1, 94 SVAs and 48 ERVs, we incorporated them into a genome graph. Mapping epigenetic data to this graph revealed 2.5%, 3.0% and 2.3% novel peaks for H3K4me1 and H3K27ac ChIP-seq and ATAC-seq, respectively. Notably, using a genome graph also modified quantitative trait loci estimates, and we observed 375 polymorphic MEIs in an active epigenomic state. For example, we found an AluYh3 polymorphism whose chromatin state changed after infection and that was associated with the expression of TRIM25, a gene that restricts influenza RNA synthesis. Our results demonstrate that graph genomes can reveal regulatory regions that would have been overlooked by other approaches.
]]></description>
<dc:creator>Groza, C.</dc:creator>
<dc:creator>Chen, X.</dc:creator>
<dc:creator>Pacis, A.</dc:creator>
<dc:creator>Simon, M.-M.</dc:creator>
<dc:creator>Pramatarova, A.</dc:creator>
<dc:creator>Aracena, K. A.</dc:creator>
<dc:creator>Pastinen, T.</dc:creator>
<dc:creator>Barreiro, L. B.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:date>2021-10-01</dc:date>
<dc:identifier>doi:10.1101/2021.09.29.462206</dc:identifier>
<dc:title><![CDATA[Genome graphs detect human polymorphisms in active epigenomic states during influenza infection]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-10-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.01.28.478202v1?rss=1">
<title>
<![CDATA[
The adapted Activity-By-Contact-model for enhancer-gene assignment and its combination with transcription factor affinities in single cell data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.01.28.478202v1?rss=1
</link>
<description><![CDATA[
Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model (Fulco et al., 2019), which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective, and does not account for all the candidate target genes of an enhancer. In addition, the ABC-model requires two types of assays to measure enhancer activity, which limits its applicability. Moreover, there is no implementation available that could allow for an integration with transcription factor (TF) binding information or an efficient analysis of single-cell data. We demonstrate that the ABC-score can yield a higher accuracy by adapting the enhancer activity according to the number of contacts the enhancer has to its candidate target genes and also by considering all annotated transcription start sites of a gene. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalised ABC-model (gABC) with TF binding information and illustrate an analysis of a single-cell ATAC-seq data set of the human heart, where we were able to characterise cell type-specific regulatory interactions and predict gene expression based on transcription factor affinities. All executed processing steps are incorporated into our new computational pipeline STARE. The software is available at https://github.com/schulzlab/STARE.
]]></description>
<dc:creator>Hecker, D.</dc:creator>
<dc:creator>Behjati Ardakani, F.</dc:creator>
<dc:creator>Schulz, M. H.</dc:creator>
<dc:date>2022-01-28</dc:date>
<dc:identifier>doi:10.1101/2022.01.28.478202</dc:identifier>
<dc:title><![CDATA[The adapted Activity-By-Contact-model for enhancer-gene assignment and its combination with transcription factor affinities in single cell data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-01-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.21.496953v1?rss=1">
<title>
<![CDATA[
TF-COMB - discovering grammar of transcription factor binding sites 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.21.496953v1?rss=1
</link>
<description><![CDATA[
Cooperativity between transcription factors is important to regulate target gene expression. In particular, the binding grammar of TFs in relation to each other, as well as in the context of other genomic elements, is crucial for TF functionality. However, tools to easily uncover co-occurrence between DNA-binding proteins, and investigate the regulatory modules of TFs, are limited. Here we present TF-COMB (Transcription Factor Co-Occurrence using Market Basket analysis) - a tool to investigate co-occurring TFs and binding grammar within regulatory regions. We found that TF-COMB can accurately identify known co-occurring TFs from ChIP-seq data, as well as uncover preferential localization to other genomic elements. With the use of ATAC-seq footprinting and TF motif locations, we found that TFs exhibit both preferred orientation and distance in relation to each other, and that these are biologically significant. Finally, we extended the analysis to not only investigate individual TF pairs, but also TF pairs in the context of networks, which enabled the investigation of TF complexes and TF hubs. In conclusion, TF-COMB is a flexible tool to investigate various aspects of TF binding grammar.

]]></description>
<dc:creator>Bentsen, M.</dc:creator>
<dc:creator>Heger, V.</dc:creator>
<dc:creator>Schultheis, H.</dc:creator>
<dc:creator>Kuenne, C.</dc:creator>
<dc:creator>Looso, M.</dc:creator>
<dc:date>2022-06-22</dc:date>
<dc:identifier>doi:10.1101/2022.06.21.496953</dc:identifier>
<dc:title><![CDATA[TF-COMB - discovering grammar of transcription factor binding sites]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.14.516365v1?rss=1">
<title>
<![CDATA[
FORGEdb: systematic analysis of candidate causal variants to uncover target genes and mechanisms in complex traits. 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.14.516365v1?rss=1
</link>
<description><![CDATA[
The majority of disease-associated variants identified through genome-wide association studies (GWAS) are located outside of protein-coding regions and are overrepresented in sequences that regulate gene expression. Prioritizing candidate regulatory variants and potential biological mechanisms for further functional experiments, such as genome editing, can be challenging, especially in regions with a high number of variants in strong linkage disequilibrium or multiple proximal gene targets. Improved annotation of the regulatory genome can help identify promising variants and target genes for functional genomics experiments. To advance this area, we developed FORGEdb (https://forge2.altiusinstitute.org/files/forgedb.html), a web-based tool that can rapidly integrate data for individual genetic variants, providing information on associated regulatory elements, transcription factor (TF) binding sites and target genes for over 37 million variants. FORGEdb uses annotations derived from data across a wide range of biological samples to delineate the regulatory context for each variant at the cell type level. Multiple data types, such as Combined Annotation Dependent Depletion (CADD) scores, expression quantitative trait loci (eQTLs), activity-by-contact (ABC) interactions, Contextual Analysis of TF Occupancy (CATO) scores, transcription factor (TF) motifs, DNase I hotspots, histone mark ChIP-seq peaks and chromatin states, are included in FORGEdb and these annotations are integrated into a FORGEdb score to guide assessment of functional importance. In summary, FORGEdb provides an expansive and unique resource of genomic annotations and an integrated score that can be used to accelerate the translation of identified genetic loci into biological insight.
]]></description>
<dc:creator>Breeze, C. E.</dc:creator>
<dc:creator>Haugen, E.</dc:creator>
<dc:creator>Gutierrez-Arcelus, M.</dc:creator>
<dc:creator>Yao, X.</dc:creator>
<dc:creator>Teschendorff, A.</dc:creator>
<dc:creator>Beck, S.</dc:creator>
<dc:creator>Dunham, I.</dc:creator>
<dc:creator>Stamatoyannopoulos, J.</dc:creator>
<dc:creator>Franceschini, N.</dc:creator>
<dc:creator>Machiela, M.</dc:creator>
<dc:creator>Berndt, S.</dc:creator>
<dc:date>2022-11-16</dc:date>
<dc:identifier>doi:10.1101/2022.11.14.516365</dc:identifier>
<dc:title><![CDATA[FORGEdb: systematic analysis of candidate causal variants to uncover target genes and mechanisms in complex traits.]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.08.03.551309v1?rss=1">
<title>
<![CDATA[
EpiVar Browser: advanced exploration of epigenomics data under controlled access 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.08.03.551309v1?rss=1
</link>
<description><![CDATA[
Motivation: Human epigenomic data has been generated by large consortia for thousands of cell types to be used as a reference map of normal and disease chromatin states. Since epigenetic data contains potentially identifiable information, similarly to genetic data, most raw files generated by these consortia are stored in controlled-access databases. It is important to protect identifiable information, but this should not hinder secure sharing of these valuable datasets.

Results: Guided by the Framework for responsible sharing of genomic and health-related data from the Global Alliance for Genomics and Health (GA4GH), we have developed a tool to facilitate the exploration of aggregate results from epigenomics datasets while filtering out identifiable information. Specifically, the EpiVar Browser allows a user to navigate an epigenetic dataset from a cohort of individuals and enables direct exploration of genotype-chromatin phenotype relationships. Because information about individual genotypes is not accessible and is aggregated in the output that is made available, no identifiable data is released, yet the interface allows for dynamic genotype-epigenome interrogation. This approach has the potential to accelerate analyses that would otherwise require a lengthy multi-step approval process and provides a generalisable strategy to facilitate responsible access to sensitive epigenomics data.

Availability and implementation: Online portal instance: https://computationalgenomics.ca/tools/epivar

Source code: https://github.com/c3g/epivar-browser
]]></description>
<dc:creator>Lougheed, D. R.</dc:creator>
<dc:creator>Liu, H.</dc:creator>
<dc:creator>Aracena, K. A.</dc:creator>
<dc:creator>Gregoire, R.</dc:creator>
<dc:creator>Pacis, A.</dc:creator>
<dc:creator>Pastinen, T.</dc:creator>
<dc:creator>Barreiro, L. B.</dc:creator>
<dc:creator>Joly, Y.</dc:creator>
<dc:creator>Bujold, D.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:date>2023-08-05</dc:date>
<dc:identifier>doi:10.1101/2023.08.03.551309</dc:identifier>
<dc:title><![CDATA[EpiVar Browser: advanced exploration of epigenomics data under controlled access]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-08-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.15.549175v1?rss=1">
<title>
<![CDATA[
Robust chromatin state annotation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.15.549175v1?rss=1
</link>
<description><![CDATA[
Background: Segmentation and genome annotation (SAGA) methods such as ChromHMM and Segway are widely used to annotate chromatin states in the genome. These algorithms take as input a collection of genomics datasets, partition the genome, and assign a label to each segment such that positions with the same label have similar patterns in the input data. SAGA methods output a human-interpretable summary of the genome by labeling every genomic position with its annotated activity such as Enhancer, Transcribed, etc. Chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of SAGA state assignments.

Results: Towards the goal of producing robust chromatin state annotations, we performed a comprehensive evaluation of the reproducibility of SAGA methods. We show that SAGA annotations exhibit a large degree of disagreement, even when run with the same method on replicated data sets. This finding suggests that there is significant risk to using SAGA chromatin state annotations.

To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to SAGA annotations. This r-value is assigned to each genomic bin of a SAGA annotation and represents the probability that the label of this bin will be reproduced in a replicated experiment. This process is analogous to irreproducible discovery rate (IDR) analysis that is commonly used for ChIP-seq peak calling and related tasks. Thus SAGAconf allows a researcher to select only the reliable parts of a SAGA annotation for use in downstream analyses.

SAGAconf r-values provide accurate confidence estimates of SAGA annotations, allowing researchers to filter out unreliable elements and remove doubt in those that stand up to this scrutiny.
]]></description>
<dc:creator>Foroozandeh Shahraki, M.</dc:creator>
<dc:creator>Farahbod, M.</dc:creator>
<dc:creator>Libbrecht, M. W.</dc:creator>
<dc:date>2023-07-17</dc:date>
<dc:identifier>doi:10.1101/2023.07.15.549175</dc:identifier>
<dc:title><![CDATA[Robust chromatin state annotation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.31.526404v1?rss=1">
<title>
<![CDATA[
A statistical approach to identify regulatory DNA variations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.31.526404v1?rss=1
</link>
<description><![CDATA[
Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences like various traits or diseases. To understand these molecular mechanisms, different TF models are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs). However, only a few statistical approaches exist to compute the statistical significance of results, and they are often slow for large sets of SNPs, such as data obtained from a genome-wide association study (GWAS) or allele-specific analysis of chromatin data.

Results: We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed. In applications on large sets of eQTL and GWAS SNPs we could illustrate the usefulness of the novel statistic to highlight cell type specific regulators and TF target genes.

Conclusions: Our approach allows the evaluation of DNA changes that induce differential TF binding in a fast and accurate manner, permitting computations on large mutation data sets. An implementation of the novel approach is freely available at https://github.com/SchulzLab/SNEEP.

Contact: marcel.schulz@em.uni-frankfurt.de
]]></description>
<dc:creator>Baumgarten, N.</dc:creator>
<dc:creator>Rumpf, L.</dc:creator>
<dc:creator>Kessler, T.</dc:creator>
<dc:creator>Schulz, M. H.</dc:creator>
<dc:date>2023-02-03</dc:date>
<dc:identifier>doi:10.1101/2023.01.31.526404</dc:identifier>
<dc:title><![CDATA[A statistical approach to identify regulatory DNA variations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.25.605219v1?rss=1">
<title>
<![CDATA[
ChromBERT: Uncovering Chromatin State Motifs in the Human Genome Using a BERT-based Approach 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.25.605219v1?rss=1
</link>
<description><![CDATA[
Chromatin states, which are defined by specific combinations of histone post-translational modifications, are fundamental to gene regulation and cellular identity. Despite their importance, comprehensive patterns within chromatin state sequences, which could provide insights into key biological functions, remain largely unexplored. In this study, we introduce ChromBERT, a BERT-based model specifically designed to detect distinct chromatin state patterns as "motifs." We pre-trained ChromBERT on 15-state chromatin annotations from 127 human cell and tissue types from the ROADMAP consortium. This pre-trained model can be fine-tuned for various downstream tasks, and obtained high-attention chromatin state patterns are extracted as motifs. To account for the variable-length nature of chromatin state motifs, ChromBERT uses Dynamic Time Warping to cluster similar motifs and identify meaningful representative patterns. In this study, we evaluated the performance of the model on several tasks, including binary and quantitative gene expression prediction, cell type classification, and three-dimensional genome feature classification. Our analyses yielded biologically grounded results and revealed the associated chromatin state motifs. This workflow facilitates the discovery of specific chromatin state patterns across different biological contexts and offers a new framework for exploring the dynamics of epigenomic states.
]]></description>
<dc:creator>Lee, S.</dc:creator>
<dc:creator>Lin, C.</dc:creator>
<dc:creator>Chen, C.-Y.</dc:creator>
<dc:creator>Nakato, R.</dc:creator>
<dc:date>2024-07-26</dc:date>
<dc:identifier>doi:10.1101/2024.07.25.605219</dc:identifier>
<dc:title><![CDATA[ChromBERT: Uncovering Chromatin State Motifs in the Human Genome Using a BERT-based Approach]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.24.604914v1?rss=1">
<title>
<![CDATA[
Epigenetic control of metabolic identity across cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.24.604914v1?rss=1
</link>
<description><![CDATA[
Background: Constraint-based network modeling is a powerful genomic-scale approach for analyzing cellular metabolism, capturing metabolic variations across tissues and cell types, and defining the metabolic identity essential for identifying disease-associated transcriptional states.

Results: Using RNA-seq and epigenomic data from the EpiATLAS resource of the International Human Epigenome Consortium (IHEC), we reconstructed metabolic networks for 1,555 samples spanning 58 tissues and cell types. Analysis of these networks revealed the distribution of metabolic functionalities across human cell types and provides a compendium of human metabolic activity. This integrative approach allowed us to define, across tissues and cell types, i) reactions that fulfil the basic metabolic processes (core metabolism), and ii) cell type-specific functions (unique metabolism), that shape the metabolic identity of a cell or a tissue. Integration with EpiATLAS-derived cell-type-specific gene-level chromatin states and enhancer-gene interactions identified enhancers, transcription factors, and key nodes controlling core and unique metabolism. Transport and first reactions of pathways were enriched for high expression, active chromatin state, and Polycomb-mediated repression in cell types where pathways are inactive, suggesting that key nodes are targets of repression.

Discussion: This integrative analysis forms the basis for identifying regulation points that control metabolic identity in human cells.
]]></description>
<dc:creator>Pacheco, M. P.</dc:creator>
<dc:creator>Gerard, D.</dc:creator>
<dc:creator>Mangan, R. J.</dc:creator>
<dc:creator>Chapman, A. R.</dc:creator>
<dc:creator>Hecker, D.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Schulz, M. H.</dc:creator>
<dc:creator>Sinkkonen, L.</dc:creator>
<dc:creator>Sauter, T.</dc:creator>
<dc:date>2024-07-24</dc:date>
<dc:identifier>doi:10.1101/2024.07.24.604914</dc:identifier>
<dc:title><![CDATA[Epigenetic control of metabolic identity across cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.08.07.606967v1?rss=1">
<title>
<![CDATA[
Transposable elements impact the regulatory landscape through cell type specific epigenomic associations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.08.07.606967v1?rss=1
</link>
<description><![CDATA[
Transposable elements (TEs) are DNA sequences able to create copies of themselves within the genome. Despite their limited expression due to silencing, TEs still manage to impact the host genome. For instance, some TEs have been shown to act as cis-regulatory elements and be co-opted in the human genome. This highlights that the contributions of TEs to the host might come from their relationship with the epigenome rather than their expression. However, a systematic analysis that relates TEs in the human genome directly with chromatin histone marks across distinct cell types remains lacking. Here we leverage a new dataset from the International Human Epigenome Consortium with 4867 uniformly processed ChIP-seq experiments for 6 histone marks across 175 annotated cell labels and show that TEs have drastically different enrichment levels across marks. Overall, we find that TEs are generally depleted in H3K9me3 histone modification, except for L1s, while MIRs were highly enriched in H3K4me1, H3K27ac and H3K27me3 and Alus were enriched in H3K36me3. Furthermore, we present a generalised profile of the relationship between TE enrichment and TE age which reveals a few TE families (Alu, MIR, L2) as diverging from expected dynamics. We also find significant differences in TE enrichment between cell types and that in 20% of the cases, these enrichments were cell-type specific. Moreover, we report that at least 4% of cell type-histone-TE combinations featured significant differences in enrichment between healthy and cancer samples. Notably, we identify 456 cell type-histone-TE triplets with strong cell-type specific enrichments. We show that many of these triplets are associated with relevant biological processes and genes expressed in the relevant cell type. These results further support a role for TEs in genome regulation and highlight novel associations between TEs and histone marks across cell types.
]]></description>
<dc:creator>Hyacinthe, J.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:date>2024-08-07</dc:date>
<dc:identifier>doi:10.1101/2024.08.07.606967</dc:identifier>
<dc:title><![CDATA[Transposable elements impact the regulatory landscape through cell type specific epigenomic associations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-08-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.08.30.610315v1?rss=1">
<title>
<![CDATA[
Revisiting Evidence for Epigenetic Control of Alternative Splicing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.08.30.610315v1?rss=1
</link>
<description><![CDATA[
Alternative splicing is crucial for increasing eukaryotic cell transcriptome and proteome diversity. Changes in alternative splicing play a key role in cell differentiation and tissue development, and aberrations in this process have been associated with diseases. Despite its importance, the exact mechanisms for regulating alternative splicing are poorly understood. Several epigenetic marks, such as histone modification H3K36me3, have previously been associated with changes in alternative splicing. Here, we leverage the EpiATLAS data set to systematically re-evaluate evidence for epigenetic control of alternative splicing.

We used SUPPA2 to calculate percentage spliced-in (PSI) values for skipped exons and retained introns and integrated this information with histone ChIP-seq and DNA methylation data. In addition to genome-wide association analysis with partial correlation and machine learning, we perform locus-specific PSI modeling on individual alternative splicing events. The latter represents a new contribution enabled by the unprecedented number of uniformly processed datasets in EpiATLAS.

Our results confirm previously reported global associations of DNAm and H3K36me3 for exon inclusion and emphasize the importance of intrinsic features for genome-wide associations.

On an event-specific level, trying to identify co-transcriptionally spliced events with epigenetic influence, we show that overall gene expression biases locus-specific analyses. Further, we show that epigenetic signal can predict PSI value in cis and trans, indicating sample-specific epigenetic fingerprints that distort across-sample analyses. Specifically, generalized linear models select epigenetic features predictive for PSI values not only in their genomic vicinity but also on different chromosomes.

Our study demonstrates that epigenetic marks are associated with the alternative inclusion of exons and introns. Without a mechanistic explanation for these associations, our work emphasizes the need for more detailed research into the relationship between epigenetic changes and transcriptome diversity.
]]></description>
<dc:creator>Manz, Q.</dc:creator>
<dc:creator>List, M.</dc:creator>
<dc:date>2024-09-01</dc:date>
<dc:identifier>doi:10.1101/2024.08.30.610315</dc:identifier>
<dc:title><![CDATA[Revisiting Evidence for Epigenetic Control of Alternative Splicing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.16.613254v1?rss=1">
<title>
<![CDATA[
Epigenomic analysis of hepatocellular carcinoma reveals aberrant cis-regulatory changes and dysregulated retrotransposons with prognostic potentials 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.16.613254v1?rss=1
</link>
<description><![CDATA[
Hepatocellular carcinoma (HCC) exhibits widespread epigenetic alterations, yet their impact on cis-regulatory elements (CREs) and retrotransposons remains poorly understood. Here, we present an integrated epigenomic and transcriptomic analysis of HCC tumors and matched tumor-adjacent normal tissues. We identified extensive DNA hypomethylation coupled with changes in histone modifications at partially methylated domains, CREs, and retrotransposons. These epigenetic aberrations were associated with dysregulated expression of genes involved in cell cycle regulation, immune response, and extracellular matrix organization. Notably, our findings revealed a novel mechanism for the transcriptional dysregulation of GPC3, a key HCC biomarker and immunotherapeutic target. We observed that GPC3 upregulation is driven by both the reactivation of a fetal liver super enhancer and hypomethylation of GPC3-associated CpG islands. Moreover, we found that DNA hypomethylation-driven aberrant expression of retrotransposons carries prognostic significance in HCC. Patients with high expression of a long non-coding RNA driven by a HERVE-int element exhibited more aggressive tumors, poorer clinical outcomes, and molecular features associated with favorable immunotherapy response. Together, our study provides a comprehensive resource for understanding the role of epigenetic dysregulation in HCC and identifies retrotransposon-associated transcripts as potential biomarkers.
]]></description>
<dc:creator>Cheng, C. C. Y.</dc:creator>
<dc:creator>Cheung, M. F.</dc:creator>
<dc:creator>Lee, A. Y.</dc:creator>
<dc:creator>Wu, Q.</dc:creator>
<dc:creator>Chow, S. H.-C.</dc:creator>
<dc:creator>Ang, J. Y. J.</dc:creator>
<dc:creator>Riquelme Medina, I.</dc:creator>
<dc:creator>Lo, G.</dc:creator>
<dc:creator>Wu, H.</dc:creator>
<dc:creator>Yang, W.</dc:creator>
<dc:creator>Lai, P. B. S.</dc:creator>
<dc:creator>Yip, K.</dc:creator>
<dc:creator>Cheng, A.</dc:creator>
<dc:creator>Leung, D. C. Y.</dc:creator>
<dc:date>2024-09-20</dc:date>
<dc:identifier>doi:10.1101/2024.09.16.613254</dc:identifier>
<dc:title><![CDATA[Epigenomic analysis of hepatocellular carcinoma reveals aberrant cis-regulatory changes and dysregulated retrotransposons with prognostic potentials]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.20.639228v1?rss=1">
<title>
<![CDATA[
Cell type-specific epigenetic regulatory circuitry of coronary artery disease loci 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.20.639228v1?rss=1
</link>
<description><![CDATA[
Coronary artery disease (CAD) is the leading cause of death worldwide. Recently, hundreds of genomic loci have been shown to increase CAD risk; however, the molecular mechanisms underlying signals from CAD risk loci remain largely unclear. We sought to pinpoint the candidate causal coding and non-coding genes of CAD risk loci in a cell type-specific fashion. We integrated the latest statistics of CAD genetics from over one million individuals with epigenetic data from 45 relevant cell types to identify genes whose regulation is affected by CAD-associated single nucleotide variants (SNVs) via epigenetic mechanisms. Applying two statistical approaches, we identified 1,580 genes likely involved in CAD, about half of which have not been associated with the disease so far. Enrichment analysis and phenome-wide association studies linked the novel candidate genes to disease-specific pathways and CAD risk factors, corroborating their disease relevance. We showed that CAD-SNVs are enriched to regulate gene expression by affecting the binding of transcription factors (TFs) with cellular specificity. Of all the candidate genes, 23.5% represented non-coding RNAs (ncRNA), which likewise showed strong cell type specificity. We conducted a proof-of-concept biological validation for the novel CAD ncRNA gene IQCH-AS1. CRISPR/Cas9-based gene knockout of IQCH-AS1 in a human preadipocyte strain resulted in reduced preadipocyte proliferation, less adipocyte lipid accumulation, and an atherogenic cytokine profile. The cellular data are in line with the reduction of IQCH-AS1 in adipose tissues of CAD patients and the negative impact of risk alleles on its expression, suggesting IQCH-AS1 to be protective for CAD. Our study not only pinpoints CAD candidate genes in a cell type-specific fashion but also spotlights the roles of the understudied ncRNA genes in CAD genetics.
]]></description>
<dc:creator>Hecker, D.</dc:creator>
<dc:creator>Song, X.</dc:creator>
<dc:creator>Baumgarten, N.</dc:creator>
<dc:creator>Diagel, A.</dc:creator>
<dc:creator>Katsaouni, N.</dc:creator>
<dc:creator>Li, L.</dc:creator>
<dc:creator>Li, S.</dc:creator>
<dc:creator>Kumar Maji, R.</dc:creator>
<dc:creator>Behjati Ardakani, F.</dc:creator>
<dc:creator>Ma, L.</dc:creator>
<dc:creator>Tews, D.</dc:creator>
<dc:creator>Wabitsch, M.</dc:creator>
<dc:creator>Björkegren, J. L. M.</dc:creator>
<dc:creator>Schunkert, H.</dc:creator>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Schulz, M. H.</dc:creator>
<dc:date>2025-02-21</dc:date>
<dc:identifier>doi:10.1101/2025.02.20.639228</dc:identifier>
<dc:title><![CDATA[Cell type-specific epigenetic regulatory circuitry of coronary artery disease loci]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.06.636950v1?rss=1">
<title>
<![CDATA[
Pan-cell type continuous chromatin state annotation of all IHEC epigenomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.06.636950v1?rss=1
</link>
<description><![CDATA[
The International Human Epigenome Consortium has generated thousands of epigenomic datasets that measure various biochemical activities in the genome, including transcription factor binding, histone modification, and DNA accessibility. Currently, the predominant methods for integrating these datasets to annotate regulatory elements are segmentation and genome annotation (SAGA) algorithms. The majority of annotations by these methods are cell type-specific. However, as the number of profiled cell types has grown into the thousands, using thousands of cell type-specific chromatin state annotations proves undesirable for many applications. Here, we present a pan-cell type annotation that summarizes all IHEC epigenomes using the recently-developed method, epigenome-ssm.
]]></description>
<dc:creator>Daneshpajouh, H.</dc:creator>
<dc:creator>Moghul, I.</dc:creator>
<dc:creator>Wiese, K. C.</dc:creator>
<dc:creator>Libbrecht, M. W.</dc:creator>
<dc:date>2025-02-08</dc:date>
<dc:identifier>doi:10.1101/2025.02.06.636950</dc:identifier>
<dc:title><![CDATA[Pan-cell type continuous chromatin state annotation of all IHEC epigenomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.03.17.643746v1?rss=1">
<title>
<![CDATA[
GIMMEcpg: Global Imputation of Mean CpG MEthylation in Real-time 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.03.17.643746v1?rss=1
</link>
<description><![CDATA[
Whole-genome DNA methylation (methylome) analysis is of broad interest to biomedical research due to its central role in human development and disease. However, generating high-quality methylomes at scale remains challenging due to inherent technical limitations. While imputation has the potential to help overcome this problem, no existing approach adequately addresses the scaling issue.

Here, we present GIMMEcpg (Global Imputation of Mean CpG MEthylation), a novel imputation tool that scales efficiently from single samples to large cohort studies. GIMMEcpg uses a custom feature dataset built from known CpG sites within the same dataset to impute missing values by calculating the distance-weighted mean of the methylation value of the two immediately neighbouring CpG sites.

We benchmarked GIMMEcpg for speed and accuracy against multiple imputation methods using downsampled datasets produced from high-quality (~100x) Whole Genome Bisulfite Sequencing (WGBS) data. With a 10x downsampled dataset, GIMMEcpg was able to process the dataset and impute 9.14 million CpG sites within 7 seconds (R: 0.78, MAE: +5.6%, RMSE: +10.9%). Our results demonstrate that GIMMEcpg is 39-2,562 times faster than three existing methylation imputation tools (BoostMe, DeepCpG, and MethImpute) while maintaining comparable accuracy.

To quantify GIMMEcpg's scalability, we applied it to the most extensive single collection of WGBS data (N=645 at variable coverage) from the EpiATLAS generated by the International Human Epigenome Consortium (IHEC). Using a single, standard CPU server, GIMMEcpg processed and imputed an additional 2.4 billion CpG methylation values across the 645 datasets in less than a day, enriching the EpiATLAS methylome resource by 20%. This demonstrates that GIMMEcpg scales to large cohort studies with only a subtle impact on accuracy, as illustrated by our benchmark.

We also developed a machine learning variant, GIMMEcpg.ml, which delivers higher accuracy than existing methodologies. Using the same 10x downsampled benchmarking dataset, GIMMEcpg.ml achieved a Pearson correlation of 0.87 compared to the ground truth, representing an improvement of 0.11 over the best performing alternative method. Additionally, GIMMEcpg.ml has a Mean Absolute Error (MAE) of 8.67%, which is 2.63% lower than the most accurate alternative. While this enhanced accuracy comes at the cost of increased computation requirements, GIMMEcpg.ml is a useful tool where higher accuracy is preferred over scalability.

For increased accessibility, GIMMEcpg is freely available under an MIT license as R and Python packages at https://github.com/ucl-medical-genomics/gimmecpg-r and https://github.com/ucl-medical-genomics/gimmecpg-python, respectively.
]]></description>
<dc:creator>Moghul, I.</dc:creator>
<dc:creator>Chai, N.</dc:creator>
<dc:creator>Pontikos, N.</dc:creator>
<dc:creator>Hardcastle, A.</dc:creator>
<dc:creator>Herrero, J.</dc:creator>
<dc:creator>Beck, S.</dc:creator>
<dc:date>2025-03-18</dc:date>
<dc:identifier>doi:10.1101/2025.03.17.643746</dc:identifier>
<dc:title><![CDATA[GIMMEcpg: Global Imputation of Mean CpG MEthylation in Real-time]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-03-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.03.15.641804v1?rss=1">
<title>
<![CDATA[
Systematic comparison of dCas9-based DNA methylation epimodifiers over time indicates efficient on-target and widespread off-target effects 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.03.15.641804v1?rss=1
</link>
<description><![CDATA[
CRISPR/dCas9-based epigenome editing systems, including DNA methylation epimodifiers, have greatly advanced molecular functional studies, revolutionizing their precision and applicability. Despite their promise, challenges such as the magnitude and stability of the on-target editing and unwanted off-target effects underscore the need for improved tool characterization and design. We systematically compared specific targeting of the BACH2 gene promoter and genome-wide off-target effects of available and novel dCas9-based DNA methylation editing tools over time. We demonstrate that multimerization of the catalytic domain of DNA methyltransferase 3A enhances editing potency but also induces widespread, early methylation deposition at low-to-medium methylated promoter-related regions with specific gRNAs and, interestingly, also with non-targeting gRNAs. A small fraction of the methylation changes associated with transcriptional dysregulation and mapped predominantly to bivalent chromatin, associating both with transcriptional repression and activation. Additionally, a specific non-targeting control gRNA caused pervasive and long-lasting methylation-independent transcriptional alterations, particularly in genes linked to RNA and energy metabolism. CRISPRoff emerged as the most efficient tool for stable targeting of the BACH2 promoter, with fewer and less stable off-target effects compared to other epimodifiers but with persistent transcriptome alterations. Our findings highlight the delicate balance between potency and specificity of epigenome editing and provide critical insights into the design and application of future tools to improve their precision and minimize unintended consequences.
]]></description>
<dc:creator>Pahlevan Kakhki, M.</dc:creator>
<dc:creator>Rangani, F.</dc:creator>
<dc:creator>Ewing, E.</dc:creator>
<dc:creator>Starvaggi Cucuzza, C.</dc:creator>
<dc:creator>Zheleznyakova, G.</dc:creator>
<dc:creator>Kalomoiri, M.</dc:creator>
<dc:creator>v, T. V. S.</dc:creator>
<dc:creator>Covacu, R.</dc:creator>
<dc:creator>Andreou, I.</dc:creator>
<dc:creator>Needhamsen, M.</dc:creator>
<dc:creator>Kular, L.</dc:creator>
<dc:creator>Jagodic, M.</dc:creator>
<dc:date>2025-03-16</dc:date>
<dc:identifier>doi:10.1101/2025.03.15.641804</dc:identifier>
<dc:title><![CDATA[Systematic comparison of dCas9-based DNA methylation epimodifiers over time indicates efficient on-target and widespread off-target effects]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-03-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.16.648998v1?rss=1">
<title>
<![CDATA[
Cell type-specific epigenomic variation and its association with genotype in the human breast 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.16.648998v1?rss=1
</link>
<description><![CDATA[
Background: Understanding the interplay between genome variation and epigenomic structure is fundamental to the study of the development and mechanisms of disease. Previous studies have leveraged population-scale genotype surveys to associate alleles with epigenomic states in heterogeneous tissue types. However, epigenomes are inherently cell type-specific, giving rise to unique genome-epigenome interactions that can influence distinct functional states and susceptibility to disease. Moreover, the extent of individual variation in cell type-specific epigenotypes remains poorly understood, posing additional challenges to accurately link genotypes with epigenomic features.

Results: We generated comprehensive genomic and epigenomic measurements in four functionally defined human breast cell types across eight individuals. We developed a method to measure histone modification variance, discovering significantly higher variation in repressive chromatin states marked by H3K27me3 compared to the active states marked by H3K27ac and H3K4me3. Genetic variation linked to variation in chromatin state was highly cell type-specific, with nearly 90% occurring uniquely in a single cell type, and active histone modifications were enriched in these variants relative to repressive modifications. Association with gene transcription allowed for the prioritization of functional candidates, and the regulatory impact of an ANXA1-linked variant, rs75071948, was validated in vitro with CRISPR/Cas9-mediated HDR.

Conclusions: We define structures of epigenomic variability among breast cell types and present evidence of extensive cell type-specific genome-epigenome interactions, highlighting the critical role of cell type in mediating these associations in the breast.
]]></description>
<dc:creator>Hauduc, A.</dc:creator>
<dc:creator>Steif, J.</dc:creator>
<dc:creator>Bilenky, M.</dc:creator>
<dc:creator>Moksa, M. M.</dc:creator>
<dc:creator>Cao, Q.</dc:creator>
<dc:creator>Ding, S.</dc:creator>
<dc:creator>Eaves, C. J.</dc:creator>
<dc:creator>Hirst, M.</dc:creator>
<dc:date>2025-04-20</dc:date>
<dc:identifier>doi:10.1101/2025.04.16.648998</dc:identifier>
<dc:title><![CDATA[Cell type-specific epigenomic variation and its association with genotype in the human breast]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-04-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.07.556549v1?rss=1">
<title>
<![CDATA[
EpiSegMix: A Flexible Distribution Hidden Markov Model with Duration Modeling for Chromatin State Discovery 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.07.556549v1?rss=1
</link>
<description><![CDATA[
Motivation: Automated chromatin segmentation based on ChIP-seq data reveals insights into the epigenetic regulation of chromatin accessibility. Existing segmentation methods are constrained by simplifying modeling assumptions, which may have a negative impact on the segmentation quality.

Results: We introduce EpiSegMix, a novel segmentation method based on a hidden Markov model with flexible read count distribution types and state duration modeling, allowing for a more flexible modeling of both histone signals and segment lengths. In a comparison with three existing tools, ChromHMM, Segway and EpiCSeg, we show that EpiSegMix is more predictive of cell biology, such as gene expression. Its flexible framework enables it to fit an accurate probabilistic model, which has the potential to increase the biological interpretability of chromatin states.

Availability and implementation: Source code: https://gitlab.com/rahmannlab/episegmix.
]]></description>
<dc:creator>Schmitz, J. E.</dc:creator>
<dc:creator>Aggarwal, N.</dc:creator>
<dc:creator>Laufer, L.</dc:creator>
<dc:creator>Walter, J.</dc:creator>
<dc:creator>Salhab, A.</dc:creator>
<dc:creator>Rahmann, S.</dc:creator>
<dc:date>2023-09-07</dc:date>
<dc:identifier>doi:10.1101/2023.09.07.556549</dc:identifier>
<dc:title><![CDATA[EpiSegMix: A Flexible Distribution Hidden Markov Model with Duration Modeling for Chromatin State Discovery]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.05.09.653095v1?rss=1">
<title>
<![CDATA[
Harnessing machine learning models for epigenome to transcriptome association studies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.05.09.653095v1?rss=1
</link>
<description><![CDATA[
Understanding how epigenome variation contributes to gene expression in disease and development is a fundamental challenge. Regulatory regions show cell type-specific epigenome activity and differ in their location, size, and distance to their target genes, complicating discovery and analysis. Recent machine learning models have been proposed to address these problems by learning functions for the prediction of gene expression from epigenomic data. Here, we use the large IHEC EpiATLAS dataset to benchmark state-of-the-art linear and non-linear approaches. Each approach is optimized for over 28,000 human genes, providing a comprehensive regulatory catalog of gene models. In-depth comparison reveals that gene characteristics and the epigenomic complexity of the locus influence the difficulty of predicting the epigenome-to-transcriptome association. The model performance is further evaluated using CRISPRi and eQTL validation data. Based on these models, we conduct histone-acetylation association studies in a systematic way to investigate how epigenomic variation impacts gene expression. The model-based analysis revealed genes and regulatory regions linked to B-cell leukemia in patient data with known disease-related functions. Our work provides a foundation for applications that link epigenome variation to gene expression in human cells, by benchmarking methods on a per-gene basis, illustrating their use in a disease context and making trained models available to the community.
]]></description>
<dc:creator>Behjati ardakani, F.</dc:creator>
<dc:creator>Ashrafiyan, S.</dc:creator>
<dc:creator>Rumpf, L.</dc:creator>
<dc:creator>Hecker, D.</dc:creator>
<dc:creator>Schulz, M. H.</dc:creator>
<dc:date>2025-05-15</dc:date>
<dc:identifier>doi:10.1101/2025.05.09.653095</dc:identifier>
<dc:title><![CDATA[Harnessing machine learning models for epigenome to transcriptome association studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.18.660301v1?rss=1">
<title>
<![CDATA[
Epilogos: information-theoretic navigation of multi-tissue functional genomic annotations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.18.660301v1?rss=1
</link>
<description><![CDATA[
Functional genomics data, such as chromatin state maps, provide critical insights into biological processes, but are hard to navigate and interpret. We present Epilogos to address this challenge by offering a simple information-theoretic framework for large-scale visualization, navigation and interpretation of functional genomics annotations, and apply it to over 2,000 genome-wide chromatin state maps in human and mouse. We construct intuitive visualizations of multi-tissue chromatin state maps, prioritize salient genomic regions, identify group-wise differential regions, and enable rapid similarity search given a region of interest. To facilitate usability, we provide a purpose-built web-based browser interface (http://epilogos.net) alongside open-source software for community access and adoption.
]]></description>
<dc:creator>Quon, J.</dc:creator>
<dc:creator>Reynolds, A. P.</dc:creator>
<dc:creator>Tripician, N.</dc:creator>
<dc:creator>Rynes, E. T.</dc:creator>
<dc:creator>Teodosiadis, A.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Meuleman, W.</dc:creator>
<dc:date>2025-06-23</dc:date>
<dc:identifier>doi:10.1101/2025.06.18.660301</dc:identifier>
<dc:title><![CDATA[Epilogos: information-theoretic navigation of multi-tissue functional genomic annotations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.07.25.666820v1?rss=1">
<title>
<![CDATA[
Integrated flexible DNA methylation-chromatin segmentation modeling enhances epigenomic state annotation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.07.25.666820v1?rss=1
</link>
<description><![CDATA[
DNA methylation and histone modifications together shape the cell-type-specific epigenomic landscape. To enable a more comprehensive genome-wide annotation, we developed EpiSegMixMeth (ESMM), the first truly integrative segmentation model combining chromatin marks and DNA methylation. ESMM extends hidden Markov models with flexible read count distributions and state duration modeling. Applied to 154 high-quality human epigenomes from the IHEC EpiAtlas, ESMM substantially improves the annotation of broad heterochromatic regions, covering over 60% of the genome, that are frequently missed by chromatin-only models. Additionally, it precisely defines the boundaries of narrow regulatory elements and resolves local chromatin state transitions during cell differentiation. Notably, we demonstrate that DNA methylation can substitute for missing repressive histone marks in segmentation, ensuring robust annotation across diverse cell types. In memory B-cell development, ESMM reveals fine-scale chromatin state shifts that align with 3D genome architecture changes. Our results highlight the power of integrating DNA methylation into genome segmentation and provide a valuable resource for dissecting cell-type-specific epigenomic regulation.
]]></description>
<dc:creator>Aggarwal, N.</dc:creator>
<dc:creator>Schmitz, J. E.</dc:creator>
<dc:creator>Laufer, L.</dc:creator>
<dc:creator>Rahmann, S.</dc:creator>
<dc:creator>Walter, J.</dc:creator>
<dc:creator>Salhab, A.</dc:creator>
<dc:date>2025-07-27</dc:date>
<dc:identifier>doi:10.1101/2025.07.25.666820</dc:identifier>
<dc:title><![CDATA[Integrated flexible DNA methylation-chromatin segmentation modeling enhances epigenomic state annotation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-07-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.04.670545v1?rss=1">
<title>
<![CDATA[
Leveraging the largest harmonized epigenomic data collection for metadata prediction validated and augmented over 350,000 public epigenomic datasets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.04.670545v1?rss=1
</link>
<description><![CDATA[
Epigenomic data found in public databases often suffer from issues of non-standardization and incompleteness in their associated metadata. There are currently no automated approaches to validate or correct missing or inaccurate information listed in databases. To tackle this challenge, we harnessed the extensive harmonized data and metadata provided by the EpiATLAS project of the International Human Epigenome Consortium (IHEC) to train EpiClass, a suite of machine learning classifiers that can predict key metadata (~98% accuracy), including experimental assay, donor sex, biospecimen and sample cancer status. The development of these classifiers enabled the identification of a few mislabeled and low-quality datasets in the EpiATLAS project, while also completing most of the missing metadata with high confidence. These classifiers were also validated on ENCODE datasets absent from the initial training, then applied to assess more than 350,000 human ChIP-Seq and RNA-Seq datasets from public repositories. Overall, this effort not only validated the accuracy of the vast majority of assays reported by the original authors, but also unveiled ~500 datasets with discrepancies, in particular through data swap within series of experiments. More importantly, EpiClass also supplied high-confidence predictions for over 320,000 metadata attributes of the biological sample such as the sex, cancer status and biomaterial type, which had been originally omitted in the majority of cases. Our work introduces the first systematic approach for metadata correction and augmentation, enhancing the quality and reliability of publicly available epigenomic data.
]]></description>
<dc:creator>Raby, J.</dc:creator>
<dc:creator>Frosi, G.</dc:creator>
<dc:creator>White, F.</dc:creator>
<dc:creator>Laperle, J.</dc:creator>
<dc:creator>Jacques, P.-E.</dc:creator>
<dc:date>2025-09-04</dc:date>
<dc:identifier>doi:10.1101/2025.09.04.670545</dc:identifier>
<dc:title><![CDATA[Leveraging the largest harmonized epigenomic data collection for metadata prediction validated and augmented over 350,000 public epigenomic datasets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.12.19.629547v1?rss=1">
<title>
<![CDATA[
Learning a Pairwise Epigenomic and Transcription Factor Binding Association Score Across the Human Genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.12.19.629547v1?rss=1
</link>
<description><![CDATA[
Identifying pairwise associations between genomic loci is an important challenge for which large and diverse collections of epigenomic and transcription factor (TF) binding data can potentially be informative. We therefore developed Learning Evidence of Pairwise Association from Epigenomic and TF binding data (LEPAE). LEPAE uses neural networks to quantify evidence of association for pairs of genomic windows from large-scale epigenomic and TF binding data along with distance information. We applied LEPAE using thousands of human datasets. We present evidence using additional data that LEPAE captures biologically meaningful pairwise relationships between genomic loci and expect LEPAE scores to be a resource.
]]></description>
<dc:creator>Kwon, S. B.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2024-12-22</dc:date>
<dc:identifier>doi:10.1101/2024.12.19.629547</dc:identifier>
<dc:title><![CDATA[Learning a Pairwise Epigenomic and Transcription Factor Binding Association Score Across the Human Genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-12-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.02.679828v1?rss=1">
<title>
<![CDATA[
GECSI: Large-scale chromatin state imputation from gene expression 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.02.679828v1?rss=1
</link>
<description><![CDATA[
Compendiums of chromatin state annotations based on integrating maps of multiple epigenetic marks such as from ChromHMM have become a powerful resource. While these compendiums have coverage of many biological samples, there are many additional biological samples that have gene expression data but lack epigenetic mark data and chromatin state annotations. The EpiAtlas resource of the International Human Epigenome Consortium (IHEC) contains a large compendium of chromatin state annotations for which many samples have matched gene expression data, which provides the opportunity to use it to train models to predict chromatin state annotations in additional biological samples with only gene expression data available. To address this, we develop Gene Expression-based Chromatin State Imputation (GECSI), which uses a multi-class logistic regression model trained using a large compendium of gene expression and chromatin state annotations, and apply it to IHEC data. Using cross-validation, we find that GECSI accurately predicts chromatin state assignments and generates probability estimates that are predictive of observed chromatin states, overall outperforming multiple other alternative and baseline methods. GECSI-predicted chromatin states reflect relationships among biological samples and show similar transcription factor and gene annotation enrichments as observed chromatin states. Using available IHEC gene expression data, we apply GECSI to predict chromatin state annotations for 449 additional epigenomes. We expect these predicted annotations and the GECSI software will be a useful resource for chromatin state analyses in many additional biological samples.
]]></description>
<dc:creator>Fu, J.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2025-10-04</dc:date>
<dc:identifier>doi:10.1101/2025.10.02.679828</dc:identifier>
<dc:title><![CDATA[GECSI: Large-scale chromatin state imputation from gene expression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.01.11.695537v1?rss=1">
<title>
<![CDATA[
Whole-genome profiling of native 5-hydroxymethylation in human neurons with long-read sequencing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.01.11.695537v1?rss=1
</link>
<description><![CDATA[
The 5-hydroxymethylcytosine (5hmC) modification of DNA is particularly prevalent in neurons and thereby a hallmark of the brain's epigenetic landscape. While 5mC DNA methylation is a well-known player in genome stability and transcriptional regulation, the role of 5hmC remains largely unknown. Here, we used long-read Oxford Nanopore Technology (ONT) to profile whole-genome, native 5mC and 5hmC levels in sorted neuronal nuclei samples from human post-mortem brain tissue. We applied different models for DNA modification calling and compared with array-based 5mC and 5hmC levels derived from the same samples, demonstrating high sample-wise correlations. Annotation across genomic and regulatory features, as well as chromatin states, generated by the International Human Epigenome Consortium, revealed high levels of 5hmC in introns, actively transcribed genes and (distal) enhancers. Pathway analysis showed that genes with high levels of 5hmC (> 60%) were enriched in neuron-related terms, with functional variety when stratifying across chromatin states. Analysis of transcription factor motifs in highly methylated regions demonstrated 5hmC- and 5mC-specific enrichment affecting downstream regulatory networks.

Altogether, our study demonstrates the potential of ONT to characterize whole-genome, native 5hmC and 5mC DNA modifications in human neurons, specifically highlighting the enrichment of 5hmC in actively transcribed regions and enhancers in the human brain.
]]></description>
<dc:creator>Klose, D.</dc:creator>
<dc:creator>Sepehri, M. H.</dc:creator>
<dc:creator>Olsen, R.-A.</dc:creator>
<dc:creator>Vu, H.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:creator>Kular, L.</dc:creator>
<dc:creator>Needhamsen, M.</dc:creator>
<dc:creator>Jagodic, M.</dc:creator>
<dc:date>2026-01-12</dc:date>
<dc:identifier>doi:10.64898/2026.01.11.695537</dc:identifier>
<dc:title><![CDATA[Whole-genome profiling of native 5-hydroxymethylation in human neurons with long-read sequencing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-01-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.18.706470v1?rss=1">
<title>
<![CDATA[
TEExplorer: A Web Portal to Investigate TE-Epigenome Associations Across Human Cell Types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.18.706470v1?rss=1
</link>
<description><![CDATA[
Approximately half of the human genome is derived from transposable elements (TEs) and several studies support the involvement of TEs in genome regulation in development, immunity and disease. We previously leveraged 4614 ChIP-seq samples from the International Human Epigenome Consortium (IHEC) EpiATLAS dataset and did a comprehensive analysis of the relationship between TEs and 6 histone marks across 57 human cell types. However, with over 6 million measurements of TE / histone mark / cell type enrichment, it was challenging to navigate the results and it was not possible to integrate them with user data. To address this, we developed a web tool, TEExplorer, which makes available TE overlaps and enrichments in an accessible and intuitive manner. The tool presents an interactive view of TE families and subfamilies, with their overlap and enrichments across histone marks and cell types. Finally, the tool allows users to upload their own ChIP-seq BED file to obtain the TE overlap and enrichment relative to random controls and compare their data with the EpiATLAS dataset. With TEExplorer, researchers with an interest in a particular TE family or subfamily, histone mark, or cell type, or those bringing their own ChIP-seq dataset, can dynamically explore and contrast hundreds of associations found within the large EpiATLAS dataset.

Availability: Online portal at https://teexplorer.c3g.sd4h.ca
]]></description>
<dc:creator>Hyacinthe, J.</dc:creator>
<dc:creator>Lougheed, D. R.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:date>2026-02-19</dc:date>
<dc:identifier>doi:10.64898/2026.02.18.706470</dc:identifier>
<dc:title><![CDATA[TEExplorer: A Web Portal to Investigate TE-Epigenome Associations Across Human Cell Types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.31.715657v1?rss=1">
<title>
<![CDATA[
Histone Modification Metapeaks are Epigenetic Landmarks Predictive of Cell State 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.31.715657v1?rss=1
</link>
<description><![CDATA[
Histone modifications are a key component of the epigenetic state of a cell, and they vary widely across different cell and tissue types, conditions, and disease states. Indeed, the majority of the genome is enriched with one histone mark or another across the thousands of cellular conditions that have been studied to date. Here, we use the largest-to-date collection of histone modification ChIP-seq datasets to identify the most important sites of histone modifications genome-wide. Collected and uniformly reprocessed by the International Human Epigenome Consortium, this data includes 5339 datasets enriched at nearly one billion total peaks across 59 different major cell or tissue types and in healthy and disease conditions, for six different histone marks. We propose FindMetapeaks, a new approach to identifying histone mark metapeaks, which are genomic regions with enrichment of a mark across many samples. We show that many of these epigenetic metapeaks are strongly indicative of cell and tissue type, or are associated with other sample characteristics, and highlight key regulatory regions of the genome. However, we also show that many metapeaks contain redundant information, and that parsimonious subsets of metapeaks can be selected by machine learning to predict cell state. Our histone mark metapeak atlas provides a concise set of regions for interpreting the epigenome.

Availability: https://github.com/rmbioinfo83/FindMetapeaks/
]]></description>
<dc:creator>Tanner, R. M.</dc:creator>
<dc:creator>Perkins, T. J.</dc:creator>
<dc:date>2026-04-02</dc:date>
<dc:identifier>doi:10.64898/2026.03.31.715657</dc:identifier>
<dc:title><![CDATA[Histone Modification Metapeaks are Epigenetic Landmarks Predictive of Cell State]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-04-02</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
