	<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
	<channel rdf:about="https://biorxiv.org">
	<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
	<title>bioRxiv Channel: ENCODE</title>
	<link>https://biorxiv.org</link>
	<description>
	This feed contains articles for bioRxiv Channel "ENCODE"
	</description>

		<items>
	<rdf:Seq>
		</rdf:Seq>
	</items>
	<prism:eIssn/>
	<prism:publicationName>bioRxiv</prism:publicationName>
	<prism:issn/>

	<image rdf:resource=""/>
	</channel>
	<image rdf:about="">
	<title>bioRxiv</title>
	<url/>
	<link>https://biorxiv.org</link>
	</image>
	<item rdf:about="https://biorxiv.org/cgi/content/short/708107v1?rss=1">
<title>
<![CDATA[
Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/708107v1?rss=1
</link>
<description><![CDATA[
Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types ("biosamples") and a list of possible high throughput sequencing assays, we ask "Which experiments should ENCODE perform next?" We demonstrate how to represent this task as an optimization problem, where the goal is to maximize the information gained in each successive experiment. Compared with previous work that has addressed a similar problem, our approach has the advantage that it can use imputed data to tailor the selected list of experiments based on data collected previously by the consortium. We demonstrate the utility of our proposed method in simulations, and we provide a general software framework, named Kiwano, for selecting genomic and epigenomic experiments.
]]></description>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Bilmes, J.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2019-07-19</dc:date>
<dc:identifier>doi:10.1101/708107</dc:identifier>
<dc:title><![CDATA[Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-07-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/818849v1?rss=1">
<title>
<![CDATA[
Transcription imparts architecture, function, and logic to enhancer units 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/818849v1?rss=1
</link>
<description><![CDATA[
Distal enhancers remain one of the least understood regulatory elements with pivotal roles in development and disease. We used massively parallel reporter assays to perform functional comparisons of two leading enhancer models and find that gene-distal transcription start sites (TSSs) are robust predictors of enhancer activity with higher resolution and specificity than histone modifications. We show that active enhancer units are precisely delineated by active TSSs, validate that these boundaries are sufficient to capture enhancer function, and confirm that core promoter sequences are required for this activity. Finally, we assay pairs of adjacent units and find that their cumulative activity is best predicted by the strongest unit within the pair. Synthetic fusions of enhancer units demonstrate that adjacency imposes winner-takes-all logic, revealing a simple design for a maximum-activity filter of enhancer unit outputs. Together, our results define fundamental enhancer units and a principle of non-cooperativity between adjacent units.
]]></description>
<dc:creator>Tippens, N. D.</dc:creator>
<dc:creator>Liang, J.</dc:creator>
<dc:creator>Leung, K. Y.</dc:creator>
<dc:creator>Ozer, A.</dc:creator>
<dc:creator>Booth, J. G.</dc:creator>
<dc:creator>Lis, J.</dc:creator>
<dc:creator>Yu, H.</dc:creator>
<dc:date>2019-11-07</dc:date>
<dc:identifier>doi:10.1101/818849</dc:identifier>
<dc:title><![CDATA[Transcription imparts architecture, function, and logic to enhancer units]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-11-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/166744v1?rss=1">
<title>
<![CDATA[
Spatiotemporal DNA Methylome Dynamics of the Developing Mammalian Fetus 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/166744v1?rss=1
</link>
<description><![CDATA[
Genetic studies have revealed an essential role for cytosine DNA methylation in mammalian development. However, its spatiotemporal distribution in the developing embryo remains obscure. Here, we profiled the methylome landscapes of 12 mouse tissues/organs at 8 developmental stages spanning from early embryogenesis to birth. In-depth analysis of these spatiotemporal epigenome maps systematically delineated ~2 million methylation variant regions and uncovered widespread methylation dynamics at nearly one-half million tissue-specific enhancers, whose human counterparts were enriched for variants involved in genetic diseases. Strikingly, these predicted regulatory elements predominantly lose CG methylation during fetal development, whereas the trend is reversed after birth. Accumulation of non-CG methylation within gene bodies of key developmental transcription factors coincided with their transcriptional repression during later stages of fetal development. These spatiotemporal epigenomic maps provide a valuable resource for studying gene regulation during mammalian tissue/organ progression and for pinpointing regulatory elements involved in human developmental diseases.
]]></description>
<dc:creator>He, Y.</dc:creator>
<dc:creator>Hariharan, M.</dc:creator>
<dc:creator>Gorkin, D. U.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Luo, C.</dc:creator>
<dc:creator>Castanon, R. G.</dc:creator>
<dc:creator>Nery, J. R.</dc:creator>
<dc:creator>Lee, A. Y.</dc:creator>
<dc:creator>Williams, B. A.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Amrhein, H.</dc:creator>
<dc:creator>Fang, R.</dc:creator>
<dc:creator>Chen, H.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Pennacchio, L.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:creator>Ecker, J.</dc:creator>
<dc:date>2017-07-21</dc:date>
<dc:identifier>doi:10.1101/166744</dc:identifier>
<dc:title><![CDATA[Spatiotemporal DNA Methylome Dynamics of the Developing Mammalian Fetus]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-07-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/166652v1?rss=1">
<title>
<![CDATA[
Systematic mapping of chromatin state landscapes during mouse development 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/166652v1?rss=1
</link>
<description><![CDATA[
Embryogenesis requires epigenetic information that allows each cell to respond appropriately to developmental cues. Histone modifications are core components of a cell's epigenome, giving rise to chromatin states that modulate genome function. Here, we systematically profile histone modifications in a diverse panel of mouse tissues at 8 developmental stages from 10.5 days post conception until birth, performing a total of 1,128 ChIP-seq assays across 72 distinct tissue-stages. We combine these histone modification profiles into a unified set of chromatin state annotations, and track their activity across developmental time and space. Through integrative analysis we identify dynamic enhancers, reveal key transcriptional regulators, and characterize the role of chromatin-based repression in developmental gene regulation. We also leverage these data to link enhancers to putative target genes, revealing connections between coding and non-coding sequence variation in disease etiology. Our study provides a compendium of resources for biomedical researchers, and achieves the most comprehensive view of embryonic chromatin states to date.
]]></description>
<dc:creator>Gorkin, D.</dc:creator>
<dc:creator>Barozzi, I.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Lee, A. Y.</dc:creator>
<dc:creator>Lee, B.</dc:creator>
<dc:creator>Zhao, Y.</dc:creator>
<dc:creator>Wildberg, A.</dc:creator>
<dc:creator>Ding, B.</dc:creator>
<dc:creator>Zhang, B.</dc:creator>
<dc:creator>Wang, M.</dc:creator>
<dc:creator>Strattan, J. S.</dc:creator>
<dc:creator>Davidson, J. M.</dc:creator>
<dc:creator>Qiu, Y.</dc:creator>
<dc:creator>Afzal, V.</dc:creator>
<dc:creator>Akiyama, J. A.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Pickle, C. S.</dc:creator>
<dc:creator>Kato, M.</dc:creator>
<dc:creator>Garvin, T. H.</dc:creator>
<dc:creator>Pham, Q. T.</dc:creator>
<dc:creator>Harrington, A. N.</dc:creator>
<dc:creator>Mannion, B. J.</dc:creator>
<dc:creator>Lee, E. A.</dc:creator>
<dc:creator>Fukuda-Yuzawa, Y.</dc:creator>
<dc:creator>He, Y.</dc:creator>
<dc:creator>Preissl, S.</dc:creator>
<dc:creator>Chee, S.</dc:creator>
<dc:creator>Williams, B. A.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Amrhein, H.</dc:creator>
<dc:creator>Yang, H.</dc:creator>
<dc:creator>Cherry, J. M.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:creator>Ecker, J. R.</dc:creator>
<dc:creator>Wang, W.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:date>2017-07-21</dc:date>
<dc:identifier>doi:10.1101/166652</dc:identifier>
<dc:title><![CDATA[Systematic mapping of chromatin state landscapes during mouse development]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-07-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/731729v1?rss=1">
<title>
<![CDATA[
An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/731729v1?rss=1
</link>
<description><![CDATA[
Thousands of epigenomic datasets have been generated in the past decade, but it is difficult for researchers to effectively utilize all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By employing IDEAS as our Integrative and Discriminative Epigenome Annotation System, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of over 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website (usevision.org) to aid research in genomics and hematopoiesis.
]]></description>
<dc:creator>Xiang, G.</dc:creator>
<dc:creator>Keller, C. A.</dc:creator>
<dc:creator>Heuston, E. F.</dc:creator>
<dc:creator>Giardine, B. M.</dc:creator>
<dc:creator>An, L.</dc:creator>
<dc:creator>Wixom, A. Q.</dc:creator>
<dc:creator>Miller, A.</dc:creator>
<dc:creator>Cockburn, A.</dc:creator>
<dc:creator>Lichtenberg, J.</dc:creator>
<dc:creator>Gottgens, B.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Bodine, D.</dc:creator>
<dc:creator>Mahony, S.</dc:creator>
<dc:creator>Taylor, J.</dc:creator>
<dc:creator>Blobel, G. A.</dc:creator>
<dc:creator>Weiss, M. J.</dc:creator>
<dc:creator>Cheng, Y.</dc:creator>
<dc:creator>Yue, F.</dc:creator>
<dc:creator>Hughes, J.</dc:creator>
<dc:creator>Higgs, D. R.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Hardison, R. C.</dc:creator>
<dc:date>2019-08-10</dc:date>
<dc:identifier>doi:10.1101/731729</dc:identifier>
<dc:title><![CDATA[An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-08-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/179648v1?rss=1">
<title>
<![CDATA[
A Large-Scale Binding and Functional Map of Human RNA Binding Proteins 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/179648v1?rss=1
</link>
<description><![CDATA[
Genomes encompass all the information necessary to specify the development and function of an organism. In addition to genes, genomes also contain a myriad of functional elements that control various steps in gene expression. A major class of these elements function only when transcribed into RNA as they serve as the binding sites for RNA binding proteins (RBPs), which act to control post-transcriptional processes including splicing, cleavage and polyadenylation, RNA editing, RNA localization, stability, and translation. Despite the importance of these functional RNA elements encoded in the genome, they have been much less studied than genes and DNA elements. Here, we describe the mapping and characterization of RNA elements recognized by a large collection of human RBPs in K562 and HepG2 cells. These data expand the catalog of functional elements encoded in the human genome by addition of a large set of elements that function at the RNA level through interaction with RBPs.

Highlights:
223 eCLIP datasets for 150 RBPs reveal a wide variety of in vivo RNA target classes.
472 knockdown/RNA-seq profiles of 263 RBPs reveal factor-responsive targets and integration with eCLIP indicates RNA expression and splicing regulatory patterns.
78 RNA Bind-N-Seq profiles of in vitro binding motifs reveal links between in vitro and in vivo binding and indicate that eCLIP peaks that contain in vitro motifs are more strongly associated with regulation.
274 maps of RBP subcellular localization by immunofluorescence indicate widespread organelle-specific RNA processing regulation.
63 ChIP-seq profiles of DNA association suggest broad interconnectivity between chromatin association and RNA processing.
]]></description>
<dc:creator>Van Nostrand, E. L.</dc:creator>
<dc:creator>Freese, P.</dc:creator>
<dc:creator>Pratt, G. A.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Wei, X.</dc:creator>
<dc:creator>Blue, S. M.</dc:creator>
<dc:creator>Dominguez, D.</dc:creator>
<dc:creator>Cody, N. A. L.</dc:creator>
<dc:creator>Olson, S.</dc:creator>
<dc:creator>Sundararaman, B.</dc:creator>
<dc:creator>Xiao, R.</dc:creator>
<dc:creator>Zhan, L.</dc:creator>
<dc:creator>Bazile, C.</dc:creator>
<dc:creator>Benoit Bouvrette, L. P.</dc:creator>
<dc:creator>Chen, J.</dc:creator>
<dc:creator>Duff, M. O.</dc:creator>
<dc:creator>Garcia, K.</dc:creator>
<dc:creator>Gelboin-Burkhart, C.</dc:creator>
<dc:creator>Hochman, A.</dc:creator>
<dc:creator>Lambert, N. J.</dc:creator>
<dc:creator>Li, H.</dc:creator>
<dc:creator>Nguyen, T. B.</dc:creator>
<dc:creator>Palden, T.</dc:creator>
<dc:creator>Rabano, I.</dc:creator>
<dc:creator>Sathe, S.</dc:creator>
<dc:creator>Stanton, R.</dc:creator>
<dc:creator>Louie, A. L.</dc:creator>
<dc:creator>Aigner, S.</dc:creator>
<dc:creator>Bergalet, J.</dc:creator>
<dc:creator>Zhou, B.</dc:creator>
<dc:creator>Su, A.</dc:creator>
<dc:creator>Wang, R.</dc:creator>
<dc:creator>Yee, B. A.</dc:creator>
<dc:creator>Fu, X.-D.</dc:creator>
<dc:creator>Lecuyer, E.</dc:creator>
<dc:creator>Burge, C. B.</dc:creator>
<dc:creator>Graveley, B.</dc:creator>
<dc:creator>Yeo, G. W.</dc:creator>
<dc:date>2017-08-23</dc:date>
<dc:identifier>doi:10.1101/179648</dc:identifier>
<dc:title><![CDATA[A Large-Scale Binding and Functional Map of Human RNA Binding Proteins]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-08-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/857169v1?rss=1">
<title>
<![CDATA[
A limited set of transcriptional programs define major histological types and provide the molecular basis for a cellular taxonomy of the human body 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/857169v1?rss=1
</link>
<description><![CDATA[
We have produced RNA sequencing data for a number of primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues, and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.

One Sentence Summary: A few broad transcriptional programs define the major cell types underlying the histology of human tissues and organs.
]]></description>
<dc:creator>Breschi, A.</dc:creator>
<dc:creator>Munoz-Aguirre, M.</dc:creator>
<dc:creator>Wucher, V.</dc:creator>
<dc:creator>Davis, C. A.</dc:creator>
<dc:creator>Garrido-Martin, D.</dc:creator>
<dc:creator>Djebali, S.</dc:creator>
<dc:creator>Gillis, J.</dc:creator>
<dc:creator>Pervouchine, D. D.</dc:creator>
<dc:creator>Vlasova, A.</dc:creator>
<dc:creator>Dobin, A.</dc:creator>
<dc:creator>Zaleski, C.</dc:creator>
<dc:creator>Drenkow, J.</dc:creator>
<dc:creator>Danyko, C.</dc:creator>
<dc:creator>Scavelli, A.</dc:creator>
<dc:creator>Reverter, F.</dc:creator>
<dc:creator>Snyder, M. P.</dc:creator>
<dc:creator>Gingeras, T. R.</dc:creator>
<dc:creator>Guigo, R.</dc:creator>
<dc:date>2019-11-27</dc:date>
<dc:identifier>doi:10.1101/857169</dc:identifier>
<dc:title><![CDATA[A limited set of transcriptional programs define major histological types and provide the molecular basis for a cellular taxonomy of the human body]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-11-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/464800v1?rss=1">
<title>
<![CDATA[
Occupancy patterns of 208 DNA-associated proteins in a single human cell type 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/464800v1?rss=1
</link>
<description><![CDATA[
Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes, but only a small fraction of the >1,600 transcription factors (TFs) encoded in the human genome has been assayed. Here we present data and analyses of ChIP-seq experiments for 208 DNA-associated proteins (DAPs) in the HepG2 hepatocellular carcinoma line, spanning nearly a quarter of its expressed TFs, transcriptional co-factors, and chromatin regulator proteins. The DAP binding profiles classify into major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalog of DNA sequence motifs; 77 factors showed similar motifs to those previously described using in vivo and/or in vitro methods, and 17 yielded novel motifs. We also describe motifs corresponding to other TFs that co-enrich with the primary ChIP target. FOX family motifs are, for example, significantly enriched in ChIP-seq peaks of 37 other DAPs. We show that promoters and enhancers can be discriminated based on motif content and occupancy patterns. This large catalog reveals High Occupancy Target (HOT) regions at which many DAPs associate, although each contains motifs for only a minority of the numerous associated DAPs. These analyses provide a deeper and more complete overview of the gene regulatory networks that define this cell type.
]]></description>
<dc:creator>Partridge, E. C.</dc:creator>
<dc:creator>Chhetri, S. B.</dc:creator>
<dc:creator>Prokop, J. W.</dc:creator>
<dc:creator>Ramaker, R. C.</dc:creator>
<dc:creator>Jansen, C. S.</dc:creator>
<dc:creator>Goh, S.-T.</dc:creator>
<dc:creator>Mackiewicz, M.</dc:creator>
<dc:creator>Newberry, K. M.</dc:creator>
<dc:creator>Brandsmeier, L. A.</dc:creator>
<dc:creator>Meadows, S. K.</dc:creator>
<dc:creator>Messer, C. L.</dc:creator>
<dc:creator>Hardigan, A. A.</dc:creator>
<dc:creator>Dean, E. C.</dc:creator>
<dc:creator>Jiang, S.</dc:creator>
<dc:creator>Savic, D.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:creator>Wold, B. J.</dc:creator>
<dc:creator>Myers, R. M.</dc:creator>
<dc:creator>Mendenhall, E. M.</dc:creator>
<dc:date>2018-11-07</dc:date>
<dc:identifier>doi:10.1101/464800</dc:identifier>
<dc:title><![CDATA[Occupancy patterns of 208 DNA-associated proteins in a single human cell type]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-11-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/385237v1?rss=1">
<title>
<![CDATA[
A cross-organism framework for supervised enhancer prediction with epigenetic pattern recognition and targeted validation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/385237v1?rss=1
</link>
<description><![CDATA[
Enhancers are important noncoding elements, but they have been traditionally hard to characterize experimentally. Only a few mammalian enhancers have been validated, making it difficult to train statistical models for their identification properly. Instead, postulated patterns of genomic features have been used heuristically for identification. The development of massively parallel assays allows for the characterization of large numbers of enhancers for the first time. Here, we developed a framework that uses Drosophila STARR-seq data to create shape-matching filters based on enhancer-associated meta-profiles of epigenetic features. We combined these features with supervised machine learning algorithms (e.g., support vector machines) to predict enhancers. We demonstrated that our model could be applied to predict enhancers in mammalian species (i.e., mouse and human). We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mouse and transduction-based reporter assays in human cell lines. Overall, the validations involved 153 enhancers in 6 mouse tissues and 4 human cell lines. The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription-factor binding patterns at predicted enhancers and promoters in human cell lines. We demonstrated that these patterns enable the construction of a secondary model effectively discriminating between enhancers and promoters.
]]></description>
<dc:creator>Sethi, A.</dc:creator>
<dc:creator>Gu, M.</dc:creator>
<dc:creator>Gumusgoz, E.</dc:creator>
<dc:creator>Chan, L.</dc:creator>
<dc:creator>Yan, K.-K.</dc:creator>
<dc:creator>Rozowsky, J. S.</dc:creator>
<dc:creator>Barozzi, I.</dc:creator>
<dc:creator>Afzal, V.</dc:creator>
<dc:creator>Akiyama, J.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Yan, C.</dc:creator>
<dc:creator>Pickle, C.</dc:creator>
<dc:creator>Kato, M.</dc:creator>
<dc:creator>Garvin, T.</dc:creator>
<dc:creator>Pham, Q.</dc:creator>
<dc:creator>Harrington, A.</dc:creator>
<dc:creator>Mannion, B.</dc:creator>
<dc:creator>Lee, E.</dc:creator>
<dc:creator>Fukuda-Yuzawa, Y.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Yip, K.</dc:creator>
<dc:creator>Sutton, R.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2018-08-05</dc:date>
<dc:identifier>doi:10.1101/385237</dc:identifier>
<dc:title><![CDATA[A cross-organism framework for supervised enhancer prediction with epigenetic pattern recognition and targeted validation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-08-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/745844v1?rss=1">
<title>
<![CDATA[
A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/745844v1?rss=1
</link>
<description><![CDATA[
Many genome-wide collections of candidate cis-regulatory elements (cCREs) have been defined using genomic and epigenomic data, but it remains a major challenge to connect these elements to their target genes. To facilitate the development of computational methods for predicting target genes, we developed a Benchmark of candidate Enhancer-Gene Interactions (BENGI) by integrating the Registry of cCREs we developed recently with experimentally-derived genomic interactions. We used BENGI to test several published computational methods for linking enhancers with genes, including signal correlation and the supervised learning methods TargetFinder and PEP. We found that while TargetFinder was the best performing method, it was modestly better than a baseline distance method for most benchmark datasets when trained and tested within the same cell type and that TargetFinder often did not outperform the distance method when applied across cell types. Our results suggest that current computational methods need to be improved and that BENGI presents a useful framework for method development and testing.
]]></description>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Pratt, H.</dc:creator>
<dc:creator>Purcaro, M.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:date>2019-08-24</dc:date>
<dc:identifier>doi:10.1101/745844</dc:identifier>
<dc:title><![CDATA[A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-08-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/810291v1?rss=1">
<title>
<![CDATA[
Integrative analysis of 10,000 epigenomic maps across 800 samples for regulatory genomics and disease dissection 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/810291v1?rss=1
</link>
<description><![CDATA[
To help elucidate genetic variants underlying complex traits, we develop EpiMap, a compendium of 833 reference epigenomes across 18 uniformly-processed and computationally-completed assays. We define chromatin states, high-resolution enhancers, activity patterns, enhancer modules, upstream regulators, and downstream target gene functions. We annotate 30,247 genetic variants associated with 534 traits, recognize principal and partner tissues underlying each trait, infer trait-tissue, tissue-tissue and trait-trait relationships, and partition multifactorial traits into their tissue-specific contributing factors. Our results demonstrate the importance of dense, rich, and high-resolution epigenomic annotations for complex trait dissection, and yield numerous new insights for understanding the molecular basis of human disease.
]]></description>
<dc:creator>Adsera, C. B.</dc:creator>
<dc:creator>Park, Y.</dc:creator>
<dc:creator>Meuleman, W.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:date>2019-10-18</dc:date>
<dc:identifier>doi:10.1101/810291</dc:identifier>
<dc:title><![CDATA[Integrative analysis of 10,000 epigenomic maps across 800 samples for regulatory genomics and disease dissection]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/730549v1?rss=1">
<title>
<![CDATA[
Quantifying genetic effects on disease mediated by assayed gene expression levels 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/730549v1?rss=1
</link>
<description><![CDATA[
Disease variants identified by genome-wide association studies (GWAS) tend to overlap with expression quantitative trait loci (eQTLs). However, it remains unclear whether this overlap is driven by mediation of genetic effects on disease by expression levels, or whether it primarily reflects pleiotropic relationships instead. Here we introduce a new method, mediated expression score regression (MESC), to estimate disease heritability mediated by the cis-genetic component of assayed steady-state gene expression levels, using summary association statistics from GWAS and eQTL studies. We show that MESC produces robust estimates of expression-mediated heritability across a wide range of simulations. We applied MESC to GWAS summary statistics for 42 diseases and complex traits (average N = 323K) and cis-eQTL data across 48 tissues from the GTEx consortium. We determined that a statistically significant but low proportion of disease heritability (mean estimate 11% with S.E. 2%) is mediated by the cis-genetic component of assayed gene expression levels, with substantial variation across diseases (point estimates from 0% to 38%). We further partitioned expression-mediated heritability across various gene sets. We observed an inverse relationship between cis-heritability of expression and disease heritability mediated by expression, suggesting that genes with weaker eQTLs have larger causal effects on disease. Moreover, we observed broad patterns of expression-mediated heritability enrichment across functional gene sets that implicate specific gene sets in disease, including loss-of-function intolerant genes and FDA-approved drug targets. Our results demonstrate that eQTLs estimated from steady-state expression levels in bulk tissues are informative of regulatory disease mechanisms, but that such eQTLs are insufficient to explain the majority of disease heritability. Instead, additional assays are necessary to more fully capture the regulatory effects of GWAS variants.
]]></description>
<dc:creator>Yao, D. W.</dc:creator>
<dc:creator>O'Connor, L. J.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:creator>Gusev, A.</dc:creator>
<dc:date>2019-08-09</dc:date>
<dc:identifier>doi:10.1101/730549</dc:identifier>
<dc:title><![CDATA[Quantifying genetic effects on disease mediated by assayed gene expression levels]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-08-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/803452v1?rss=1">
<title>
<![CDATA[
Population-specific causal disease effect sizes in functionally important regions impacted by selection 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/803452v1?rss=1
</link>
<description><![CDATA[
Many diseases and complex traits exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We developed a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and applied S-LDXR to genome-wide association summary statistics for 31 diseases and complex traits in East Asians (EAS) and Europeans (EUR) (average N_EAS = 90K, N_EUR = 267K) with an average trans-ethnic genetic correlation of 0.85 (s.e. 0.01). We determined that squared trans-ethnic genetic correlation was 0.82x (s.e. 0.01) smaller than the genome-wide average at SNPs in the top quintile of the background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes were more population-specific in functionally important regions, including conserved and regulatory regions. In analyses of regions surrounding specifically expressed genes, causal effect sizes were most population-specific for skin and immune genes and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
]]></description>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Gazal, S.</dc:creator>
<dc:creator>Kanai, M.</dc:creator>
<dc:creator>Koch, E. M.</dc:creator>
<dc:creator>Schoech, A. P.</dc:creator>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Luo, Y.</dc:creator>
<dc:creator>Amariuta, T.</dc:creator>
<dc:creator>Okada, Y.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2019-10-15</dc:date>
<dc:identifier>doi:10.1101/803452</dc:identifier>
<dc:title><![CDATA[Population-specific causal disease effect sizes in functionally important regions impacted by selection]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.01.02.890657v1?rss=1">
<title>
<![CDATA[
Improving the informativeness of Mendelian disease pathogenicity scores for common disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.01.02.890657v1?rss=1
</link>
<description><![CDATA[
Despite considerable progress on pathogenicity scores prioritizing both coding and noncoding variants for Mendelian disease, little is known about the utility of these pathogenicity scores for common disease. Here, we sought to assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease, and to improve upon existing scores. We first applied stratified LD score regression to assess the informativeness of annotations defined by top variants from published Mendelian disease-derived pathogenicity scores across 41 independent common diseases and complex traits (average N = 320K). Several of the resulting annotations were informative for common disease, even after conditioning on a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LD model. We then improved upon the published pathogenicity scores by developing AnnotBoost, a gradient boosting-based framework to impute and denoise pathogenicity scores using functional annotations from the baseline-LD model. AnnotBoost substantially increased the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores, implying pervasive variant-level overlap between Mendelian disease and common disease. The boosted scores also produced significant improvements in heritability model fit and in classifying disease-associated, fine-mapped SNPs. Our boosted scores have high potential to improve candidate gene discovery and fine-mapping for common disease.
]]></description>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Weissbrod, O.</dc:creator>
<dc:creator>Marquez-Luna, C.</dc:creator>
<dc:creator>Gazal, S.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2020-01-03</dc:date>
<dc:identifier>doi:10.1101/2020.01.02.890657</dc:identifier>
<dc:title><![CDATA[Improving the informativeness of Mendelian disease pathogenicity scores for common disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-01-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/807792v1?rss=1">
<title>
<![CDATA[
Functionally-informed fine-mapping and polygenic localization of complex trait heritability 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/807792v1?rss=1
</link>
<description><![CDATA[
Fine-mapping aims to identify causal variants impacting complex traits. Several recent methods improve fine-mapping accuracy by prioritizing variants in enriched functional annotations. However, these methods can only use information at genome-wide significant loci (or a small number of functional annotations), severely limiting the benefit of functional data. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy using genome-wide functional data for a broad set of coding, conserved, regulatory and LD-related annotations. PolyFun prioritizes variants in enriched functional annotations by specifying prior causal probabilities for fine-mapping methods such as SuSiE or FINEMAP, employing special procedures to ensure robustness to model misspecification and winner's curse. In simulations with in-sample LD, PolyFun + SuSiE and PolyFun + FINEMAP were well-calibrated and identified >20% more variants with posterior causal probability >0.95 than their non-functionally informed counterparts (and >33% more fine-mapped variants than previous functionally-informed fine-mapping methods). In simulations with mismatched reference LD, PolyFun + SuSiE remained well-calibrated when reducing the maximum number of assumed causal SNPs per locus, which reduces absolute power but still produces large relative improvements. In analyses of 49 UK Biobank traits (average N=318K) with in-sample LD, PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement vs. SuSiE; 223 variants were fine-mapped for multiple genetically uncorrelated traits, indicating pervasive pleiotropy. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.
]]></description>
<dc:creator>Weissbrod, O.</dc:creator>
<dc:creator>Hormozdiari, F.</dc:creator>
<dc:creator>Benner, C.</dc:creator>
<dc:creator>Cui, R.</dc:creator>
<dc:creator>Ulirsch, J.</dc:creator>
<dc:creator>Gazal, S.</dc:creator>
<dc:creator>Schoech, A. P.</dc:creator>
<dc:creator>van de Geijn, B.</dc:creator>
<dc:creator>Reshef, Y.</dc:creator>
<dc:creator>Marquez-Luna, C.</dc:creator>
<dc:creator>O'Connor, L. J.</dc:creator>
<dc:creator>Pirinen, M.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2019-10-17</dc:date>
<dc:identifier>doi:10.1101/807792</dc:identifier>
<dc:title><![CDATA[Functionally-informed fine-mapping and polygenic localization of complex trait heritability]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/784439v1?rss=1">
<title>
<![CDATA[
Evaluating the informativeness of deep learning annotations for human complex diseases 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/784439v1?rss=1
</link>
<description><![CDATA[
Deep learning models have shown great promise in predicting genome-wide regulatory effects from DNA sequence, but their informativeness for human complex diseases and traits is not fully understood. Here, we evaluate the disease informativeness of allelic-effect annotations (absolute value of the predicted difference between reference and variant alleles) constructed using two previously trained deep learning models, DeepSEA and Basenji. We apply stratified LD score regression (S-LDSC) to 41 independent diseases and complex traits (average N=320K) to evaluate each annotation's informativeness for disease heritability conditional on a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LD model and other sources; as a secondary metric, we also evaluate the accuracy of models that incorporate deep learning annotations in predicting disease-associated or fine-mapped SNPs. We aggregated annotations across all tissues (resp. blood cell types or brain tissues) in meta-analyses across all 41 traits (resp. 11 blood-related traits or 8 brain-related traits). These allelic-effect annotations were highly enriched for disease heritability, but produced only limited conditionally significant results - only Basenji-H3K4me3 in meta-analyses across all 41 traits and brain-specific Basenji-H3K4me3 in meta-analyses across 8 brain-related traits. We conclude that deep learning models are yet to achieve their full potential to provide a considerable amount of unique information for complex disease, and that the informativeness of deep learning models for disease beyond established functional annotations cannot be inferred from metrics based on their accuracy in predicting regulatory annotations.
]]></description>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>van de Geijn, B. K.</dc:creator>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Hormozdiari, F.</dc:creator>
<dc:creator>Kelley, D. R.</dc:creator>
<dc:creator>Price, A.</dc:creator>
<dc:date>2019-09-26</dc:date>
<dc:identifier>doi:10.1101/784439</dc:identifier>
<dc:title><![CDATA[Evaluating the informativeness of deep learning annotations for human complex diseases]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-09-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/375337v1?rss=1">
<title>
<![CDATA[
Modeling functional enrichment improves polygenic prediction accuracy in UK Biobank and 23andMe data sets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/375337v1?rss=1
</link>
<description><![CDATA[
Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank. We used association statistics from British-ancestry samples as training data (avg N=373K) and samples of other European ancestries as validation data (avg N=22K), to minimize confounding. LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R2=0.144; highest R2=0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N=1107K; higher heritability in UK Biobank cohort) increased prediction R2 to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
]]></description>
<dc:creator>Marquez-Luna, C.</dc:creator>
<dc:creator>Gazal, S.</dc:creator>
<dc:creator>Loh, P.-R.</dc:creator>
<dc:creator>Furlotte, N.</dc:creator>
<dc:creator>Auton, A.</dc:creator>
<dc:creator>23andMe Research Team</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2018-07-24</dc:date>
<dc:identifier>doi:10.1101/375337</dc:identifier>
<dc:title><![CDATA[Modeling functional enrichment improves polygenic prediction accuracy in UK Biobank and 23andMe data sets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-07-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/512434v1?rss=1">
<title>
<![CDATA[
A pitfall for machine learning methods aiming to predict across cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/512434v1?rss=1
</link>
<description><![CDATA[
Machine learning models to predict phenomena such as gene expression, enhancer activity, transcription factor binding, or chromatin conformation are most useful when they can generalize to make accurate predictions across cell types. In this situation, a natural strategy is to train the model on experimental data from some cell types and evaluate performance on one or more held-out cell types. In this work, we show that, when the training set contains examples derived from the same genomic loci across multiple cell types, then the resulting model can be susceptible to a particular form of bias related to memorizing the average activity associated with each genomic locus. Consequently, the trained model may appear to perform well when evaluated on the genomic loci that it was trained on but tends to perform poorly on loci that it was not trained on. We demonstrate this phenomenon by using epigenomic measurements and nucleotide sequence to predict gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data and computing resources become available, future projects will increasingly risk suffering from this issue.
]]></description>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Singh, R.</dc:creator>
<dc:creator>Bilmes, J.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2019-01-04</dc:date>
<dc:identifier>doi:10.1101/512434</dc:identifier>
<dc:title><![CDATA[A pitfall for machine learning methods aiming to predict across cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-01-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.01.31.927798v1?rss=1">
<title>
<![CDATA[
Global reference mapping and dynamics of human transcription factor footprints 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.01.31.927798v1?rss=1
</link>
<description><![CDATA[
Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse phenotypic traits [1], yet it remains challenging to distinguish variants that impact regulatory function [2]. Genomic DNase I footprinting enables quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin [3-5]. However, to date only a small fraction of such sites have been precisely resolved on the human genome sequence [5]. To enable comprehensive mapping of transcription factor footprints, we produced high-density DNase I cleavage maps from 243 human cell and tissue types and states and integrated these data to delineate at nucleotide resolution ~4.5 million compact genomic elements encoding transcription factor occupancy. We map the fine-scale structure of ~1.6 million DHS and show that the overwhelming majority is populated by well-spaced sites of single transcription factor:DNA interaction. Cell context-dependent cis-regulation is chiefly executed by wholesale actuation of accessibility at regulatory DNA versus by differential transcription factor occupancy within accessible elements. We show further that the well-described enrichment of disease- and phenotypic trait-associated genetic variants in regulatory regions [1,6] is almost entirely attributable to variants localizing within footprints, and that functional variants impacting transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles. Unexpectedly, we find that the global density of human genetic variation is markedly increased within transcription factor footprints, revealing an unappreciated driver of cis-regulatory evolution. Our results provide a new framework for both global and nucleotide-precision analyses of gene regulatory mechanisms and functional genetic variation.
]]></description>
<dc:creator>Vierstra, J.</dc:creator>
<dc:creator>Lazar, J.</dc:creator>
<dc:creator>Sandstrom, R.</dc:creator>
<dc:creator>Halow, J.</dc:creator>
<dc:creator>Lee, K.</dc:creator>
<dc:creator>Bates, D.</dc:creator>
<dc:creator>Diegel, M.</dc:creator>
<dc:creator>Dunn, D.</dc:creator>
<dc:creator>Neri, F.</dc:creator>
<dc:creator>Haugen, E.</dc:creator>
<dc:creator>Rynes, E.</dc:creator>
<dc:creator>Reynolds, A.</dc:creator>
<dc:creator>Nelson, J.</dc:creator>
<dc:creator>Johnson, A.</dc:creator>
<dc:creator>Frerker, M.</dc:creator>
<dc:creator>Buckley, M.</dc:creator>
<dc:creator>Kaul, R.</dc:creator>
<dc:creator>Meuleman, W.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:date>2020-02-01</dc:date>
<dc:identifier>doi:10.1101/2020.01.31.927798</dc:identifier>
<dc:title><![CDATA[Global reference mapping and dynamics of human transcription factor footprints]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-02-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/396275v1?rss=1">
<title>
<![CDATA[
Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/396275v1?rss=1
</link>
<description><![CDATA[
Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants influencing RNA processing and gene expression phenotypes. Recently, genome-wide detection of in vivo binding sites of RNA binding proteins (RBPs) has been greatly facilitated by the enhanced UV crosslinking and immunoprecipitation (eCLIP) protocol. Hundreds of eCLIP-Seq data sets were generated from HepG2 and K562 cells during the ENCODE3 phase. These data afford a valuable opportunity to examine allele-specific binding (ASB) of RBPs. To this end, we developed a new computational algorithm, called BEAPR (Binding Estimation of Allele-specific Protein-RNA interaction). In identifying statistically significant ASB sites, BEAPR takes into account UV cross-linking induced sequence propensity and technical variations between replicated experiments. Using simulated data and actual eCLIP-Seq data, we show that BEAPR largely outperforms two often-used methods, the chi-squared test and Fisher's exact test. Importantly, BEAPR overcomes the inherent over-dispersion problem of the other methods. Complemented by experimental validations, we demonstrate that ASB events are significantly associated with genetic regulation of splicing and mRNA abundance, supporting the usage of this method to pinpoint functional genetic variants in post-transcriptional gene regulation. Many variants with ASB patterns of RBPs were found as genetic variants with cancer or other disease relevance. About 38% of ASB variants were in linkage disequilibrium with single nucleotide polymorphisms from genome-wide association studies. Overall, our results suggest that BEAPR is an effective method to reveal ASB patterns in eCLIP and can inform functional interpretation of disease-related genetic variants.
]]></description>
<dc:creator>Yang, E.-W.</dc:creator>
<dc:creator>Bahn, J. H.</dc:creator>
<dc:creator>Hsiao, E. Y.-H.</dc:creator>
<dc:creator>Tan, B. X.</dc:creator>
<dc:creator>Sun, Y.</dc:creator>
<dc:creator>Fu, T.</dc:creator>
<dc:creator>Zhou, B.</dc:creator>
<dc:creator>Van Nostrand, E. L.</dc:creator>
<dc:creator>Pratt, G. A.</dc:creator>
<dc:creator>Freese, P.</dc:creator>
<dc:creator>Wei, X.</dc:creator>
<dc:creator>Quinones-Valdez, G.</dc:creator>
<dc:creator>Urban, A. E.</dc:creator>
<dc:creator>Graveley, B. R.</dc:creator>
<dc:creator>Burge, C. B.</dc:creator>
<dc:creator>Yeo, G. W.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:date>2018-08-20</dc:date>
<dc:identifier>doi:10.1101/396275</dc:identifier>
<dc:title><![CDATA[Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-08-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/446625v1?rss=1">
<title>
<![CDATA[
Widespread RNA editing dysregulation in Autism Spectrum Disorders 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/446625v1?rss=1
</link>
<description><![CDATA[
Autism spectrum disorder (ASD) is a genetically complex, clinically heterogeneous neurodevelopmental disease. Recently, our understanding of the molecular abnormalities in ASD has been expanded through transcriptomic analyses of postmortem brains. However, a crucial molecular pathway involved in synaptic development, RNA editing, has not yet been studied on a genome-wide scale. Here, we profiled the global patterns of adenosine-to-inosine (A-to-I) editing in a large cohort of post-mortem ASD brains. Strikingly, we observed a global bias of hypo-editing in ASD brains, common to different brain regions and involving many genes with known neurobiological functions. Through genome-wide protein-RNA binding analyses and detailed molecular assays, we show that the Fragile X proteins, FMRP and FXR1P, interact with ADAR proteins and modulate A-to-I editing. Furthermore, we observed convergent patterns of RNA editing alterations in ASD and Fragile X syndrome, thus establishing RNA editing as a molecular link underlying these two highly related diseases. Our findings support a role for RNA editing dysregulation in ASD and highlight novel mechanisms for RNA editing regulation.
]]></description>
<dc:creator>Tran, S.</dc:creator>
<dc:creator>Jun, H.-I.</dc:creator>
<dc:creator>Bahn, J. H.</dc:creator>
<dc:creator>Azghadi, A.</dc:creator>
<dc:creator>Ramaswami, G.</dc:creator>
<dc:creator>Van Nostrand, E. L.</dc:creator>
<dc:creator>Nguyen, T. B.</dc:creator>
<dc:creator>Hsiao, Y.-H. E.</dc:creator>
<dc:creator>Lee, C.</dc:creator>
<dc:creator>Pratt, G. A.</dc:creator>
<dc:creator>Yeo, G. W.</dc:creator>
<dc:creator>Geschwind, D. H.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:date>2018-10-17</dc:date>
<dc:identifier>doi:10.1101/446625</dc:identifier>
<dc:title><![CDATA[Widespread RNA editing dysregulation in Autism Spectrum Disorders]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-10-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/301267v1?rss=1">
<title>
<![CDATA[
Co-regulation of alternative splicing by hnRNPM and ESRP1 during EMT 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/301267v1?rss=1
</link>
<description><![CDATA[
The epithelial-mesenchymal transition (EMT) is a fundamental developmental process that is abnormally activated in cancer metastasis. Dynamic changes in alternative splicing occur during EMT. ESRP1 and hnRNPM are splicing regulators that promote an epithelial splicing program and a mesenchymal splicing program, respectively. The functional relationships between these splicing factors at the genome scale remain elusive. Comparing alternative splicing targets of hnRNPM and ESRP1 revealed that they co-regulate a set of cassette exon events, with the majority showing discordant splicing regulation. hnRNPM discordantly regulated splicing events show a positive correlation with splicing during EMT while concordant splicing events do not, highlighting the antagonistic role of hnRNPM and ESRP1 during EMT. Motif enrichment analysis near co-regulated exons identifies guanine-uridine rich motifs downstream of hnRNPM-repressed and ESRP1-enhanced exons, supporting a model of competitive binding to these cis-elements to antagonize alternative splicing. The set of co-regulated exons is enriched in genes associated with cell-migration and cytoskeletal reorganization, which are pathways associated with EMT. Splicing levels of co-regulated exons are associated with breast cancer patient survival and correlate with gene sets involved in EMT and breast cancer subtypes. These data identify complex modes of interaction between hnRNPM and ESRP1 in regulation of splicing in disease-relevant contexts.
]]></description>
<dc:creator>Harvey, S.</dc:creator>
<dc:creator>Xu, Y.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Gao, X. D.</dc:creator>
<dc:creator>Qiu, Y.</dc:creator>
<dc:creator>Ahn, J.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:creator>Cheng, C.</dc:creator>
<dc:date>2018-04-13</dc:date>
<dc:identifier>doi:10.1101/301267</dc:identifier>
<dc:title><![CDATA[Co-regulation of alternative splicing by hnRNPM and ESRP1 during EMT]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-04-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/807008v1?rss=1">
<title>
<![CDATA[
Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins. 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/807008v1?rss=1
</link>
<description><![CDATA[
A critical step in uncovering rules of RNA processing is to study the in vivo regulatory networks of RNA binding proteins (RBPs). Crosslinking and immunoprecipitation (CLIP) methods enabled mapping RBP targets transcriptome-wide, but methodological differences present challenges to large-scale integrated analysis across datasets. The development of enhanced CLIP (eCLIP) enabled the large-scale mapping of targets for 150 RBPs in K562 and HepG2, creating a unique resource of RBP interactomes profiled with a standardized methodology in the same cell types. Here we describe our analysis of 223 enhanced CLIP (eCLIP) datasets characterizing 150 RBPs in K562 and HepG2 cell lines, revealing a range of binding modalities, including highly resolved positioning around splicing signals and mRNA untranslated regions that associate with distinct RBP functions. Quantification of enrichment for repetitive and abundant multi-copy elements reveals 70% of RBPs have enrichment for non-mRNA element classes, enables identification of novel ribosomal RNA processing factors and sites and suggests that association with retrotransposable elements reflects multiple RBP mechanisms of action. Analysis of spliceosomal RBPs indicates that eCLIP resolves AQR association after intronic lariat formation (enabling identification of branch points with single-nucleotide resolution) and provides genome-wide validation for a branch point-based scanning model for 3' splice site recognition. Further, we show that eCLIP peak co-occurrences across RBPs enable the discovery of novel co-interacting RBPs. Finally, we present a protocol for visualization of RBP:RNA complexes in the eCLIP workflow using biotin and standard chemiluminescent visualization reagents, enabling simplified confirmation of ribonucleoprotein enrichment without radioactivity. This work illustrates the value of integrated analysis across eCLIP profiling of RBPs with widely distinct functions to reveal novel RNA biology. Further, our quantification of both mRNA and other element association will enable further research to identify novel roles of RBPs in regulating RNA processing.
]]></description>
<dc:creator>Van Nostrand, E. L.</dc:creator>
<dc:creator>Pratt, G. A.</dc:creator>
<dc:creator>Yee, B. A.</dc:creator>
<dc:creator>Wheeler, E. C.</dc:creator>
<dc:creator>Blue, S. M.</dc:creator>
<dc:creator>Mueller, J.</dc:creator>
<dc:creator>Park, S. S.</dc:creator>
<dc:creator>Garcia, K. E.</dc:creator>
<dc:creator>Gelboin-Burkhart, C.</dc:creator>
<dc:creator>Nguyen, T. B.</dc:creator>
<dc:creator>Rabano, I.</dc:creator>
<dc:creator>Stanton, R.</dc:creator>
<dc:creator>Sundararaman, B.</dc:creator>
<dc:creator>Wang, R.</dc:creator>
<dc:creator>Fu, X.-D.</dc:creator>
<dc:creator>Graveley, B. R.</dc:creator>
<dc:creator>Yeo, G. W.</dc:creator>
<dc:date>2019-10-16</dc:date>
<dc:identifier>doi:10.1101/807008</dc:identifier>
<dc:title><![CDATA[Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins.]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/086025v1?rss=1">
<title>
<![CDATA[
A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/086025v1?rss=1
</link>
<description><![CDATA[
Semi-automated genome annotation methods such as Segway enable understanding of chromatin activity. Here we present chromatin state annotations of 164 human cell types using 1,615 genomics data sets. To produce these annotations, we developed a fully-automated annotation strategy in which we train separate unsupervised annotation models on each cell type and use a machine learning classifier to automate the state interpretation step. Using these annotations, we developed a measure of the importance of each genomic position called the "conservation-associated activity score," which we use to aggregate information across cell types into a multi-cell type view. The aggregated conservation-associated activity score provides a measure of importance directly attributable to a specific activity in a specific set of cell types. In contrast to evolutionary conservation, this measure is not biased to detect only elements shared with related species. Using the conservation-associated activity score, we combined all our annotations into a single, cell type-agnostic encyclopedia that catalogs all human transcriptional and regulatory elements, enabling easy and intuitive interpretation of the effect of genome variants on phenotype, such as in disease-associated, evolutionarily conserved or positively selected loci. These resources, including cell type-specific annotations, encyclopedia, and a visualization server, are available at http://noble.gs.washington.edu/proj/encyclopedia.
Author Summary: Genome annotation algorithms are an effective class of tools for understanding the function of the genome. These algorithms take as input a set of genome-wide measurements about the activity at each base pair in a given tissue, such as where a given protein is binding or how accessible the DNA is to being read by a protein. The genome is then partitioned and each segment is assigned a label such that positions with the same label exhibit similar patterns in the input data. Such annotations are widely used for many applications, such as to understand the mechanism of impact of a given genetic variant. Here we present, to our knowledge, the most comprehensive set of genome annotations created so far, encompassing 164 human cell types and including 1,615 genomics data sets. These comprehensive annotations are made possible by a strategy that automates the previous interpretation step. Furthermore, we present several methodological innovations that make these genome annotations more useful.
]]></description>
<dc:creator>Libbrecht, M. W.</dc:creator>
<dc:creator>Rodriguez, O.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:creator>Hoffman, M.</dc:creator>
<dc:creator>Bilmes, J. A.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2016-11-07</dc:date>
<dc:identifier>doi:10.1101/086025</dc:identifier>
<dc:title><![CDATA[A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-11-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/009209v1?rss=1">
<title>
<![CDATA[
Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell type-specific expression 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/009209v1?rss=1
</link>
<description><![CDATA[
The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation, in which regions of hundreds or thousands of kilobases known as domains are regulated as a unit. Previous studies using genomics assays such as chromatin immunoprecipitation (ChIP)-seq and chromatin conformation capture (3C)-based assays have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods can incorporate only data sets that can be expressed as a one-dimensional vector over the genome and therefore cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a comprehensive model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly-regulated genes expressed in only a small number of cell types, which we term "specific expression domains." We additionally found that a subset of domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. 
Finally, we showed that GBR can be used for the seemingly unrelated task of transferring information from well-studied cell types to less well characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data.
]]></description>
<dc:creator>Maxwell W Libbrecht</dc:creator>
<dc:creator>Ferhat Ay</dc:creator>
<dc:creator>Michael M Hoffman</dc:creator>
<dc:creator>David M Gilbert</dc:creator>
<dc:creator>Jeffrey A Bilmes</dc:creator>
<dc:creator>William Stafford Noble</dc:creator>
<dc:date>2014-09-16</dc:date>
<dc:identifier>doi:10.1101/009209</dc:identifier>
<dc:title><![CDATA[Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell type-specific expression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2014-09-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/036137v1?rss=1">
<title>
<![CDATA[
Choosing panels of genomics assays using submodular optimization 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/036137v1?rss=1
</link>
<description><![CDATA[
Although the cost of high-throughput DNA sequencing continues to drop, extensively characterizing a given cell type using assays such as ChIP-seq and DNase-seq is still expensive. As a result, epigenomic characterization of a cell type is typically carried out using a small panel of assay types. Deciding a priori which assays to perform--e.g., a few complementary histone modification ChIP-seq experiments, perhaps an open chromatin assay, plus a few diverse transcription factor assays--is thus a critical step in many studies. Unfortunately, the field currently lacks a principled method for making these choices. We present submodular selection of assays (SSA), a method for choosing a diverse panel of genomic assays that leverages methods from the field of submodular optimization. We also describe a series of evaluation methods that allow us to measure the quality of a selected assay panel in the context of inference tasks such as data imputation, functional element prediction, and semi-automated genome annotation. Applying this evaluation framework to data from the ENCODE and Roadmap Epigenomics Consortia, we provide empirical evidence that SSA provides high quality panels of assays. The method is computationally efficient and is theoretically optimal under certain assumptions. SSA is extremely flexible, and can be employed to select assays for a new cell type or to select additional assays to be performed in a partially characterized cell type. More generally, this application serves as a model for how submodular optimization can be applied to other discrete problems in biology. SSA is available at http://melodi.ee.washington.edu/assay_panel_selection.html.
]]></description>
<dc:creator>Kai Wei</dc:creator>
<dc:creator>Maxwell W Libbrecht</dc:creator>
<dc:creator>Jeffrey A Bilmes</dc:creator>
<dc:creator>William Noble</dc:creator>
<dc:date>2016-01-07</dc:date>
<dc:identifier>doi:10.1101/036137</dc:identifier>
<dc:title><![CDATA[Choosing panels of genomics assays using submodular optimization]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-01-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/051201v1?rss=1">
<title>
<![CDATA[
Eliminating redundancy among protein sequences using submodular optimization 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/051201v1?rss=1
</link>
<description><![CDATA[
Motivation: Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success in many fields but is not yet widely used in biology. We apply submodular optimization to the problem of removing redundancy in protein sequence data sets. This is a common step in many bioinformatics and structural biology workflows, including creation of non-redundant training sets for sequence and structural models as well as selection of "operational taxonomic units" from metagenomics data.

Results: We demonstrate that the submodular optimization approach results in representative protein sequence subsets with greater structural diversity than sets chosen by existing methods. In particular, we compare to a widely used, heuristic algorithm implemented in software tools such as CD-HIT, as well as to a variety of standard clustering methods, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by competing approaches. We also show how the optimization framework allows us to design a mixture objective function that performs well for both large and small representative sets. The framework we describe is theoretically optimal under some assumptions, and it is flexible and intuitive because it applies generic methods to optimize one of a variety of objective functions. This application serves as a model for how submodular optimization can be applied to other discrete problems in biology.

Availability: Source code is available at https://github.com/mlibbrecht/submodular_sequence_repset.

Contact: william-noble@uw.edu
]]></description>
<dc:creator>Maxwell W Libbrecht</dc:creator>
<dc:creator>Jeffrey A Bilmes</dc:creator>
<dc:creator>William Stafford Noble</dc:creator>
<dc:date>2016-05-02</dc:date>
<dc:identifier>doi:10.1101/051201</dc:identifier>
<dc:title><![CDATA[Eliminating redundancy among protein sequences using submodular optimization]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-05-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/123927v1?rss=1">
<title>
<![CDATA[
PREDICTD: PaRallel Epigenomics Data Imputation With Cloud-based Tensor Decomposition 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/123927v1?rss=1
</link>
<description><![CDATA[
The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project have produced thousands of data sets mapping the epigenome in hundreds of cell types. However, the number of cell types remains too great to comprehensively map given current time and financial constraints. We present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to address this issue by computationally imputing missing experiments in collections of epigenomics experiments. PREDICTD leverages an intuitive and natural model called "tensor decomposition" to impute many experiments simultaneously. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining methods yields further improvement. We show that PREDICTD data can be used to investigate enhancer biology at non-coding human accelerated regions. PREDICTD provides reference imputed data sets and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, two technologies increasingly applicable in bioinformatics.
]]></description>
<dc:creator>Durham, T. J.</dc:creator>
<dc:creator>Libbrecht, M. W.</dc:creator>
<dc:creator>Howbert, J. J.</dc:creator>
<dc:creator>Bilmes, J.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2017-04-04</dc:date>
<dc:identifier>doi:10.1101/123927</dc:identifier>
<dc:title><![CDATA[PREDICTD: PaRallel Epigenomics Data Imputation With Cloud-based Tensor Decomposition]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-04-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/147470v1?rss=1">
<title>
<![CDATA[
Segway 2.0: Gaussian mixture models and minibatch training 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/147470v1?rss=1
</link>
<description><![CDATA[
Summary: Segway performs semi-automated genome annotation, discovering joint patterns across multiple genomic signal datasets. We discuss a major new version of Segway and highlight its ability to model data with substantially greater accuracy. Major enhancements in Segway 2.0 include the ability to model data with a mixture of Gaussians, enabling capture of arbitrarily complex signal distributions, and minibatch training, leading to better learned parameters.

Availability and Implementation: Segway and its source code are freely available for download at https://segway.hoffmanlab.org. We have made available scripts (https://doi.org/10.5281/zenodo.802940) and datasets (https://doi.org/10.5281/zenodo.802907) for this paper's analysis.

Contact: michael.hoffman@utoronto.ca
]]></description>
<dc:creator>Chan, R. C. W.</dc:creator>
<dc:creator>Libbrecht, M. W.</dc:creator>
<dc:creator>Roberts, E. G.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Hoffman, M. M.</dc:creator>
<dc:date>2017-06-08</dc:date>
<dc:identifier>doi:10.1101/147470</dc:identifier>
<dc:title><![CDATA[Segway 2.0: Gaussian mixture models and minibatch training]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-06-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/082594v1?rss=1">
<title>
<![CDATA[
LR-DNase: Predicting TF binding from DNase-seq data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/082594v1?rss=1
</link>
<description><![CDATA[
Transcription factors play a key role in the regulation of gene expression. Hypersensitivity to DNase I cleavage has long been used to gauge the accessibility of genomic DNA for transcription factor binding and as an indicator of regulatory genomic locations. An increasing amount of ChIP-seq data on a large number of TFs is being generated, mostly in a small number of cell types. DNase-seq data are being produced for hundreds of cell types. We aimed to develop a computational method that could combine ChIP-seq and DNase-seq data to predict TF binding sites in a wide variety of cell types. We trained and tested a logistic regression model, called LR-DNase, to predict binding sites for a specific TF using seven features derived from DNase-seq and genomic sequence. We calculated the area under the precision-recall curve at a false discovery rate cutoff of 0.5 for the LR-DNase model, a number of logistic regression models with fewer features, and several existing state-of-the-art TF binding prediction methods. The LR-DNase model outperformed existing unsupervised and supervised methods. Additionally, for many TFs, a model that uses only two features, DNase-seq reads and motif score, was sufficient to match the performance of the best existing methods.
]]></description>
<dc:creator>van der Velde, A. G.</dc:creator>
<dc:creator>Purcaro, M.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:date>2016-10-24</dc:date>
<dc:identifier>doi:10.1101/082594</dc:identifier>
<dc:title><![CDATA[LR-DNase: Predicting TF binding from DNase-seq data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-10-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/181842v1?rss=1">
<title>
<![CDATA[
GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/181842v1?rss=1
</link>
<description><![CDATA[
Motivation: The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of chromatin contacts and the resulting properties of experimental noise in the data. Measuring concordance of contact maps is important for assessing reproducibility of replicate experiments and for modeling variation between different cellular contexts.

Results: We introduce a concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. The key idea is to smooth contact maps using random walks on the contact map graph, before estimating concordance. We use simulated datasets to benchmark GenomeDISCO's sensitivity to different types of noise that affect chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. GenomeDISCO also generalizes to other chromosome conformation capture assays, such as HiChIP.

Availability: Software implementing GenomeDISCO is available at https://github.com/kundajelab/genomedisco.

Contact: akundaje@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
]]></description>
<dc:creator>Ursu, O.</dc:creator>
<dc:creator>Boley, N.</dc:creator>
<dc:creator>Taranova, M.</dc:creator>
<dc:creator>Wang, Y. X. R.</dc:creator>
<dc:creator>Yardimci, G. G.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2017-08-29</dc:date>
<dc:identifier>doi:10.1101/181842</dc:identifier>
<dc:title><![CDATA[GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-08-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/481069v1?rss=1">
<title>
<![CDATA[
Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/481069v1?rss=1
</link>
<description><![CDATA[
Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Recent studies have shown that the error patterns of NHEJ are strongly biased by sequence context, but these studies were based on relatively few templates. To investigate this more thoroughly, we systematically profiled ~1.16 million independent mutational events resulting from CRISPR/Cas9-mediated cleavage and NHEJ-mediated DSB repair of 6,872 synthetic target sequences, introduced into a human cell line via lentiviral infection. We find that: 1) insertions are dominated by 1 bp events templated by sequence immediately upstream of the cleavage site, 2) deletions are predominantly associated with microhomology, and 3) targets exhibit variable but reproducible diversity with respect to the number and relative frequency of the mutational outcomes to which they give rise. From these data, we trained a model that uses local sequence context to predict the distribution of mutational outcomes. Exploiting the bias of NHEJ outcomes towards microhomology mediated events, we demonstrate the programming of deletion patterns by introducing microhomology to specific locations in the vicinity of the DSB site. We anticipate that our results will inform investigations of DSB repair mechanisms as well as the design of CRISPR/Cas9 experiments for diverse applications including genome-wide screens, gene therapy, lineage tracing and molecular recording.
]]></description>
<dc:creator>Chen, W.</dc:creator>
<dc:creator>McKenna, A.</dc:creator>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Yin, Y.</dc:creator>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2018-11-28</dc:date>
<dc:identifier>doi:10.1101/481069</dc:identifier>
<dc:title><![CDATA[Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-11-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/822510v1?rss=1">
<title>
<![CDATA[
Index and biological spectrum of accessible DNA elements in the human genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/822510v1?rss=1
</link>
<description><![CDATA[
DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA and harbor disease- and phenotypic trait-associated genetic variation. We established high-precision maps of DNase I hypersensitive sites from 733 human biosamples encompassing 439 cell and tissue types and states, and integrated these to precisely delineate and numerically index ~3.6 million DHSs encoded within the human genome, providing a common coordinate system for regulatory DNA. Here we show that the expansive scale of cell and tissue states sampled exposes an unprecedented degree of stereotyped actuation of large sets of elements, signaling the operation of distinct genome-scale regulatory programs. We show further that the complex actuation patterns of individual elements can be captured comprehensively by a simple regulatory vocabulary reflecting their dominant cellular manifestation. This vocabulary, in turn, enables comprehensive and quantitative regulatory annotation of both protein-coding genes and the vast array of well-defined but poorly-characterized non-coding RNA genes. Finally, we show that the combination of high-precision DHSs and regulatory vocabularies markedly concentrates disease- and trait-associated non-coding genetic signals both along the genome and across cellular compartments. Taken together, our results provide a common and extensible coordinate system and vocabulary for human regulatory DNA, and a new global perspective on the architecture of human gene regulation.
]]></description>
<dc:creator>Meuleman, W.</dc:creator>
<dc:creator>Muratov, A.</dc:creator>
<dc:creator>Rynes, E.</dc:creator>
<dc:creator>Vierstra, J.</dc:creator>
<dc:creator>Teodosiadis, A.</dc:creator>
<dc:creator>Reynolds, A.</dc:creator>
<dc:creator>Haugen, E.</dc:creator>
<dc:creator>Sandstrom, R.</dc:creator>
<dc:creator>Kaul, R.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:date>2019-10-29</dc:date>
<dc:identifier>doi:10.1101/822510</dc:identifier>
<dc:title><![CDATA[Index and biological spectrum of accessible DNA elements in the human genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/801183v1?rss=1">
<title>
<![CDATA[
Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/801183v1?rss=1
</link>
<description><![CDATA[
Recent large-scale efforts to characterize functional activity in human have produced thousands of genome-wide experiments that quantify various forms of biochemistry, such as histone modifications, protein binding, transcription, and chromatin accessibility. Although these experiments represent a small fraction of the possible experiments that could be performed, they also make human more comprehensively characterized than any other species. We propose an extension to the imputation approach Avocado that enables the model to leverage genome alignments and the large number of human genomics data sets when making imputations in other species. We found that not only does this extension result in improved imputation of mouse functional experiments, but that the extended model is able to make accurate imputations for protein binding assays that have been performed in human but not in mouse. This ability to make "zero-shot" imputations greatly increases the utility of such imputation approaches and enables comprehensive imputations to be made for species even when experimental data are sparse.

CCS Concepts: Computing methodologies -> Neural networks; Factorization methods. Applied computing -> Bioinformatics; Genomics.

ACM Reference Format: Jacob Schreiber, Deepthi Hegde, and William Noble. 2020. Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics. In ACM-BCB 2020: 11th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, Sept 21-24, 2020, Virtual. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/1122445.1122456
]]></description>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Hegde, D.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2019-10-11</dc:date>
<dc:identifier>doi:10.1101/801183</dc:identifier>
<dc:title><![CDATA[Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/576405v1?rss=1">
<title>
<![CDATA[
A systematic evaluation of the design, orientation, and sequence context dependencies of massively parallel reporter assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/576405v1?rss=1
</link>
<description><![CDATA[
Massively parallel reporter assays (MPRAs) functionally screen thousands of sequences for regulatory activity in parallel. Although MPRAs have been applied to address diverse questions in gene regulation, there has been no systematic comparison of how differences in experimental design influence findings. Here, we screen a library of 2,440 sequences, representing candidate liver enhancers and controls, in HepG2 cells for regulatory activity using nine different approaches (including conventional episomal, STARR-seq, and lentiviral MPRA designs). We identify subtle but significant differences in the resulting measurements that correlate with epigenetic and sequence-level features. We also test this library in both orientations with respect to the promoter, validating en masse that enhancer activity is robustly independent of orientation. Finally, we develop and apply a novel method to assemble and functionally test libraries of the same putative enhancers as 192-mers, 354-mers, and 678-mers, and observe surprisingly large differences in functional activity. This work provides a framework for the experimental design of high-throughput reporter assays, suggesting that the extended sequence context of tested elements, and to a lesser degree the precise assay, influence MPRA results.
]]></description>
<dc:creator>Klein, J. C.</dc:creator>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Keith, A.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2019-03-13</dc:date>
<dc:identifier>doi:10.1101/576405</dc:identifier>
<dc:title><![CDATA[A systematic evaluation of the design, orientation, and sequence context dependencies of massively parallel reporter assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-03-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/827071v1?rss=1">
<title>
<![CDATA[
A genome-wide almanac of co-essential modules assigns function to uncharacterized genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/827071v1?rss=1
</link>
<description><![CDATA[
A central remaining question in the post-genomic era is how genes interact to form biological pathways. Measurements of gene dependency across hundreds of cell lines have been used to cluster genes into co-essential pathways, but this approach has been limited by ubiquitous false positives. Here, we develop a statistical method that enables robust identification of gene co-essentiality and yields a genome-wide set of functional modules. This almanac recapitulates diverse pathways and protein complexes and predicts the functions of 102 uncharacterized genes. Validating top predictions, we show that TMEM189 encodes plasmanylethanolamine desaturase, the long-sought key enzyme for plasmalogen synthesis. We also show that C15orf57 binds the AP2 complex, localizes to clathrin-coated pits, and enables efficient transferrin uptake. Finally, we provide an interactive web tool for the community to explore the results (coessentiality.net). Our results establish co-essentiality profiling as a powerful resource for biological pathway identification and discovery of novel gene functions.
]]></description>
<dc:creator>Wainberg, M.</dc:creator>
<dc:creator>Kamber, R. A.</dc:creator>
<dc:creator>Balsubramani, A.</dc:creator>
<dc:creator>Meyers, R. M.</dc:creator>
<dc:creator>Sinnott-Armstrong, N.</dc:creator>
<dc:creator>Hornburg, D.</dc:creator>
<dc:creator>Jiang, L.</dc:creator>
<dc:creator>Chan, J.</dc:creator>
<dc:creator>Jian, R.</dc:creator>
<dc:creator>Gu, M.</dc:creator>
<dc:creator>Shcherbina, A.</dc:creator>
<dc:creator>Dubreuil, M. M.</dc:creator>
<dc:creator>Spees, K.</dc:creator>
<dc:creator>Snyder, M. P.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Bassik, M. C.</dc:creator>
<dc:date>2019-11-01</dc:date>
<dc:identifier>doi:10.1101/827071</dc:identifier>
<dc:title><![CDATA[A genome-wide almanac of co-essential modules assigns function to uncharacterized genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-11-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/474072v1?rss=1">
<title>
<![CDATA[
RADAR: annotation and prioritization of variants in the post-transcriptional regulome of RNA-binding proteins 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/474072v1?rss=1
</link>
<description><![CDATA[
RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant-prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, that cannot be found by most current prioritization methods, for example variants affecting splicing.
]]></description>
<dc:creator>Zhang, J.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Lee, D.</dc:creator>
<dc:creator>Feng, J.-J.</dc:creator>
<dc:creator>Lochovsky, L.</dc:creator>
<dc:creator>Lou, S.</dc:creator>
<dc:creator>Rutenberg-Schoenberg, M.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2018-11-19</dc:date>
<dc:identifier>doi:10.1101/474072</dc:identifier>
<dc:title><![CDATA[RADAR: annotation and prioritization of variants in the post-transcriptional regulome of RNA-binding proteins]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-11-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.24.006866v1?rss=1">
<title>
<![CDATA[
Transcriptome-Wide Combinatorial RNA Structure Probing in Living Cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.24.006866v1?rss=1
</link>
<description><![CDATA[
RNA molecules can fold into complex structures and interact with trans-acting factors to control their biology. Recent methods have been focused on developing novel tools to measure RNA structure transcriptome-wide, but their utility to study and predict RNA-protein interactions or RNA processing has been limited thus far. Here, we extend these studies with the first transcriptome-wide mapping method for cataloging RNA solvent accessibility, icLASER. By combining solvent accessibility (icLASER) with RNA flexibility (icSHAPE) data, we efficiently predict RNA-protein interactions transcriptome-wide and catalog RNA polyadenylation sites by RNA structure alone. These studies showcase the power of designing novel chemical approaches to studying RNA biology. Further, our study exemplifies merging complementary methods to measure RNA structure inside cells and its utility for predicting transcriptome-wide interactions that are critical for control of and regulation by RNA structure. We envision such approaches can be applied to studying different cell types or cells under varying conditions, using RNA structure and footprinting to characterize cellular interactions and processing involving RNA.
]]></description>
<dc:creator>Spitale, R.</dc:creator>
<dc:creator>Chan, D.</dc:creator>
<dc:creator>Feng, C.</dc:creator>
<dc:creator>England, W.</dc:creator>
<dc:creator>Wyman, D.</dc:creator>
<dc:creator>Flynn, R.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Shi, Y.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2020-03-25</dc:date>
<dc:identifier>doi:10.1101/2020.03.24.006866</dc:identifier>
<dc:title><![CDATA[Transcriptome-Wide Combinatorial RNA Structure Probing in Living Cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.02.24.963652v1?rss=1">
<title>
<![CDATA[
3D Epigenomic Characterization Reveals Insights Into Gene Regulation and Lineage Specification During Corticogenesis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.02.24.963652v1?rss=1
</link>
<description><![CDATA[
Lineage-specific epigenomic changes during human corticogenesis have previously remained elusive due to challenges with tissue heterogeneity and sample availability. Here, we analyze cis-regulatory chromatin interactions, open chromatin regions, and transcriptomes for radial glia, intermediate progenitor cells, excitatory neurons, and interneurons isolated from mid-gestational human brain samples. We show that chromatin looping underlies transcriptional regulation for lineage-specific genes, with transcription factor motifs, families of transposable elements, and disease-associated variants enriched at distal interacting regions in a cell type-specific manner. A subset of promoters exhibit unusually high degrees of chromatin interactivity, which we term super interactive promoters. Super interactive promoters are enriched for critical lineage-specific genes, suggesting that interactions at these loci contribute to the fine-tuning of cell type-specific transcription. Finally, we present CRISPRview, a novel approach for validating distal interacting regions in primary cells. Our study presents the first characterization of cell type-specific 3D epigenomic landscapes during human corticogenesis, advancing our understanding of gene regulation and lineage specification during human brain development.
]]></description>
<dc:creator>Song, M.</dc:creator>
<dc:creator>Pebworth, M.-P.</dc:creator>
<dc:creator>Yang, X.</dc:creator>
<dc:creator>Abnousi, A.</dc:creator>
<dc:creator>Fan, C.</dc:creator>
<dc:creator>Wen, J.</dc:creator>
<dc:creator>Rosen, J.</dc:creator>
<dc:creator>Choudhary, M.</dc:creator>
<dc:creator>Cui, X.</dc:creator>
<dc:creator>Jones, I.</dc:creator>
<dc:creator>Bergenholtz, S.</dc:creator>
<dc:creator>Eze, U.</dc:creator>
<dc:creator>Juric, I.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Maliskova, L.</dc:creator>
<dc:creator>Liu, W.</dc:creator>
<dc:creator>Pollen, A.</dc:creator>
<dc:creator>Li, Y.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:creator>Hu, M.</dc:creator>
<dc:creator>Kriegstein, A.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:date>2020-02-25</dc:date>
<dc:identifier>doi:10.1101/2020.02.24.963652</dc:identifier>
<dc:title><![CDATA[3D Epigenomic Characterization Reveals Insights Into Gene Regulation and Lineage Specification During Corticogenesis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-02-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.04.28.066498v1?rss=1">
<title>
<![CDATA[
ArchR: An integrative and scalable software package for single-cell chromatin accessibility analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.04.28.066498v1?rss=1
</link>
<description><![CDATA[
The advent of large-scale single-cell chromatin accessibility profiling has accelerated our ability to map gene regulatory landscapes, but has outpaced the development of robust, scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; www.ArchRProject.com) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses including doublet removal, single-cell clustering and cell type identification, robust peak set generation, cellular trajectory identification, DNA element to gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility, and multi-omic integration with scRNA-seq. Enabling the analysis of over 1.2 million single cells within 8 hours on a standard Unix laptop, ArchR is a comprehensive analytical suite for end-to-end analysis of single-cell chromatin accessibility data that will accelerate the understanding of gene regulation at the resolution of individual cells.
]]></description>
<dc:creator>Granja, J. M.</dc:creator>
<dc:creator>Corces, M. R.</dc:creator>
<dc:creator>Pierce, S. E.</dc:creator>
<dc:creator>Bagdatli, S. T.</dc:creator>
<dc:creator>Choudhry, H.</dc:creator>
<dc:creator>Chang, H.</dc:creator>
<dc:creator>Greenleaf, W.</dc:creator>
<dc:date>2020-04-29</dc:date>
<dc:identifier>doi:10.1101/2020.04.28.066498</dc:identifier>
<dc:title><![CDATA[ArchR: An integrative and scalable software package for single-cell chromatin accessibility analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-04-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.05.04.077255v1?rss=1">
<title>
<![CDATA[
Allele-specific alternative splicing in human tissues 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.05.04.077255v1?rss=1
</link>
<description><![CDATA[
Alternative splicing is an RNA processing mechanism that affects most genes in human, contributing to disease mechanisms and phenotypic diversity. The regulation of splicing involves an intricate network of cis-regulatory elements and trans-acting factors. Due to their high sequence specificity, cis-regulation of splicing can be altered by genetic variants, significantly affecting splicing outcomes. Recently, multiple methods have been applied to understanding the regulatory effects of genetic variants on splicing. However, it is still challenging to go beyond apparent association to pinpoint functional variants. To fill in this gap, we utilized large-scale datasets of the Genotype-Tissue Expression (GTEx) project to study genetically-modulated alternative splicing (GMAS) via identification of allele-specific splicing events. We demonstrate that GMAS events are shared across tissues and individuals more often than expected by chance, consistent with their genetically driven nature. Moreover, although the allelic bias of GMAS exons varies across samples, the degree of variation is similar across tissues vs. individuals. Thus, genetic background drives the GMAS pattern to a similar degree as tissue-specific splicing mechanisms. Leveraging the genetically driven nature of GMAS, we developed a new method to predict functional splicing-altering variants, built upon a genotype-phenotype concordance model across samples. Complemented by experimental validations, this method predicted >1000 functional variants, many of which may alter RNA-protein interactions. Lastly, 72% of GMAS-associated SNPs were in linkage disequilibrium with GWAS-reported SNPs, and such association was enriched in tissues of relevance for specific traits/diseases. Our study enables a comprehensive view of genetically driven splicing variations in human tissues.
]]></description>
<dc:creator>Amoah, K.</dc:creator>
<dc:creator>Hsiao, Y.-H. E.</dc:creator>
<dc:creator>Bahn, J. H.</dc:creator>
<dc:creator>Sun, Y.</dc:creator>
<dc:creator>Burghard, C.</dc:creator>
<dc:creator>Tan, B. X.</dc:creator>
<dc:creator>Yang, E.-W.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:date>2020-05-05</dc:date>
<dc:identifier>doi:10.1101/2020.05.04.077255</dc:identifier>
<dc:title><![CDATA[Allele-specific alternative splicing in human tissues]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-05-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.06.981191v1?rss=1">
<title>
<![CDATA[
Differential RNA editing between epithelial and mesenchymal tumors impacts mRNA abundance in immune response pathways 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.06.981191v1?rss=1
</link>
<description><![CDATA[
Recent studies revealed global shifts in RNA editing, the modification of RNA sequences, across many cancers. Besides a few sites implicated in tumorigenesis or metastasis, most tumor-associated sites, predominantly in noncoding regions, have unknown function. Here, we characterize editing profiles between epithelial (E) and mesenchymal (M) phenotypes in seven cancer types, as epithelial-mesenchymal transition (EMT) is a key paradigm for metastasis. We observe distinct editing patterns between E and M tumors and EMT induction upon loss of ADAR enzymes in cultured cells. E-M differential sites are highly enriched in genes involved in immune and viral processes, some of which regulate mRNA abundance of their respective genes. We identify a novel mechanism in which ILF3 preferentially stabilizes edited transcripts. Among editing-dependent ILF3 targets is the transcript encoding PKR, a crucial player in immune response. Our study demonstrates the broad impact of RNA editing in cancer and relevance of editing to cancer-related immune pathways.
]]></description>
<dc:creator>Chan, T.</dc:creator>
<dc:creator>Fu, T.</dc:creator>
<dc:creator>Bahn, J. H.</dc:creator>
<dc:creator>Jun, H.-I.</dc:creator>
<dc:creator>Lee, J.-H.</dc:creator>
<dc:creator>Quinones-Valdez, G.</dc:creator>
<dc:creator>Cheng, C.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:date>2020-03-08</dc:date>
<dc:identifier>doi:10.1101/2020.03.06.981191</dc:identifier>
<dc:title><![CDATA[Differential RNA editing between epithelial and mesenchymal tumors impacts mRNA abundance in immune response pathways]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.24.006551v1?rss=1">
<title>
<![CDATA[
Extracellular microRNA 3' end modification across diverse body fluids 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.24.006551v1?rss=1
</link>
<description><![CDATA[
microRNAs (miRNAs) are small non-coding RNAs that play critical roles in gene regulation. The presence of miRNAs in extracellular biofluids is increasingly recognized. However, most previous characterization of extracellular miRNAs focused on their overall expression levels. Alternative sequence isoforms and modifications of miRNAs were rarely considered in the extracellular space. Here, we developed a highly accurate bioinformatic method, called miNTA, to identify 3' non-templated additions (NTAs) of miRNAs using small RNA-sequencing data. Using miNTA, we conducted an in-depth analysis of miRNA 3' NTA profiles in 1047 extracellular RNA-sequencing data sets of 4 types of biofluids. This analysis identified abundant 3' uridylation and adenylation of miRNAs, with an estimated false discovery rate of <5%. Strikingly, we found that 3' uridylation levels enabled segregation of different types of biofluids, more effectively than overall miRNA expression levels. This observation suggests that 3' NTA levels possess fluid-specific information insensitive to batch effects. In addition, we observed that extracellular miRNAs with 3' uridylations are enriched in processes related to angiogenesis, apoptosis and inflammatory response, and this type of modification may stabilize base-pairing between miRNAs and their target genes. Together, our study provides a comprehensive landscape of miRNA NTAs in human biofluids, which paves the way for further biomarker discoveries. The insights generated in our work build a foundation for future functional, mechanistic and translational discoveries.
]]></description>
<dc:creator>Koyano, K.</dc:creator>
<dc:creator>Bahn, J.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:date>2020-03-25</dc:date>
<dc:identifier>doi:10.1101/2020.03.24.006551</dc:identifier>
<dc:title><![CDATA[Extracellular microRNA 3' end modification across diverse body fluids]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.05.11.078675v1?rss=1">
<title>
<![CDATA[
HCR-FlowFISH: A flexible CRISPR screening method to identify cis-regulatory elements and their target genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.05.11.078675v1?rss=1
</link>
<description><![CDATA[
CRISPR screens for cis-regulatory elements (CREs) have shown unprecedented power to endogenously characterize the non-coding genome. To characterize CREs we developed HCR-FlowFISH (Hybridization Chain Reaction Fluorescent In-Situ Hybridization coupled with Flow Cytometry), which directly quantifies native transcripts within their endogenous loci following CRISPR perturbations of regulatory elements, eliminating the need for restrictive phenotypic assays such as growth or transcript-tagging. HCR-FlowFISH accurately quantifies gene expression across a wide range of transcript levels and cell types. We also developed CASA (CRISPR Activity Screen Analysis), a hierarchical Bayesian model to identify and quantify CRE activity. Using >270,000 perturbations, we identified CREs for GATA1, HDAC6, ERP29, LMO2, MEF2C, CD164, NMU, FEN1 and the FADS gene cluster. Our methods detect subtle gene expression changes and identify CREs regulating multiple genes, sometimes at different magnitudes and directions. We demonstrate the power of HCR-FlowFISH to parse genome-wide association signals by nominating causal variants and target genes.
]]></description>
<dc:creator>Reilly, S. K.</dc:creator>
<dc:creator>Gosai, S. J.</dc:creator>
<dc:creator>Guiterrez, A.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Kanai, M.</dc:creator>
<dc:creator>Berenzy, D.</dc:creator>
<dc:creator>Kales, S.</dc:creator>
<dc:creator>Butler, G. B.</dc:creator>
<dc:creator>Gladden-Young, A.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Sabeti, P. C.</dc:creator>
<dc:creator>Tewhey, R.</dc:creator>
<dc:date>2020-05-12</dc:date>
<dc:identifier>doi:10.1101/2020.05.11.078675</dc:identifier>
<dc:title><![CDATA[HCR-FlowFISH: A flexible CRISPR screening method to identify cis-regulatory elements and their target genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-05-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/694869v1?rss=1">
<title>
<![CDATA[
STARRPeaker: Uniform processing and accurate identification of whole human STARR-seq active regions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/694869v1?rss=1
</link>
<description><![CDATA[
Background: High-throughput reporter assays, such as self-transcribing active regulatory region sequencing (STARR-seq), allow for unbiased and quantitative assessment of enhancers at a genome-wide scale. Recent advances in STARR-seq technology have employed progressively more complex genomic libraries and increased sequencing depths, to assay larger sized regions, up to the entire human genome. These advances necessitate a reliable processing pipeline and peak-calling algorithm.

Results: Most STARR-seq studies have relied on chromatin immunoprecipitation sequencing (ChIP-seq) processing pipelines. However, there are key differences in STARR-seq versus ChIP-seq. First, STARR-seq uses transcribed RNA to measure the activity of an enhancer, making an accurate determination of the basal transcription rate important. Second, STARR-seq coverage is highly non-uniform, overdispersed, and often confounded by sequencing biases, such as GC content and mappability. Lastly, here, we observed a clear correlation between RNA thermodynamic stability and STARR-seq readout, suggesting that STARR-seq may be sensitive to RNA secondary structure and stability. Considering these findings, we developed a negative-binomial regression framework for uniformly processing STARR-seq data, called STARRPeaker. In support of this, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to call enhancers.

Conclusions: We show STARRPeaker can unbiasedly detect active enhancers from both captured and whole-genome STARR-seq data. Specifically, we report ~33,000 and ~20,000 candidate enhancers from HepG2 and K562, respectively. Moreover, we show that STARRPeaker outperforms other peak callers in terms of identifying known enhancers with fewer false positives. Overall, we demonstrate an optimized processing framework for STARR-seq experiments can identify putative enhancers while addressing potential confounders.
]]></description>
<dc:creator>Lee, D.</dc:creator>
<dc:creator>Shi, M.</dc:creator>
<dc:creator>Moran, J.</dc:creator>
<dc:creator>Wall, M.</dc:creator>
<dc:creator>Zhang, J.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Fitzgerald, D.</dc:creator>
<dc:creator>Kyono, Y.</dc:creator>
<dc:creator>Ma, L.</dc:creator>
<dc:creator>White, K. P.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2019-07-08</dc:date>
<dc:identifier>doi:10.1101/694869</dc:identifier>
<dc:title><![CDATA[STARRPeaker: Uniform processing and accurate identification of whole human STARR-seq active regions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-07-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/706424v1?rss=1">
<title>
<![CDATA[
An integrative ENCODE resource for cancer genomics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/706424v1?rss=1
</link>
<description><![CDATA[
ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.
]]></description>
<dc:creator>Zhang, J.</dc:creator>
<dc:creator>Lee, D.</dc:creator>
<dc:creator>Dhiman, V.</dc:creator>
<dc:creator>Jiang, P.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>McGillivray, P.</dc:creator>
<dc:creator>Yang, H.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Meyerson, W.</dc:creator>
<dc:creator>Clarke, D.</dc:creator>
<dc:creator>Gu, M.</dc:creator>
<dc:creator>Li, S.</dc:creator>
<dc:creator>Lou, S.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Lochovsky, L.</dc:creator>
<dc:creator>Ung, M.</dc:creator>
<dc:creator>Ma, L.</dc:creator>
<dc:creator>Yu, S.</dc:creator>
<dc:creator>Cao, Q.</dc:creator>
<dc:creator>Harmanci, A.</dc:creator>
<dc:creator>Yan, K.-K.</dc:creator>
<dc:creator>Sethi, A.</dc:creator>
<dc:creator>Gursoy, G.</dc:creator>
<dc:creator>Schoenberg, M. R.</dc:creator>
<dc:creator>Rozowsky, J.</dc:creator>
<dc:creator>Warrell, J.</dc:creator>
<dc:creator>Emani, P.</dc:creator>
<dc:creator>Yang, Y. T.</dc:creator>
<dc:creator>Galeev, T.</dc:creator>
<dc:creator>Kong, X.</dc:creator>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Krishnan, J.</dc:creator>
<dc:creator>Feng, Y.</dc:creator>
<dc:creator>Rivera-Mulia, J. C.</dc:creator>
<dc:creator>Adrian, J.</dc:creator>
<dc:creator>Broach, J. R.</dc:creator>
<dc:creator>Bolt, M.</dc:creator>
<dc:creator>Moran, J.</dc:creator>
<dc:creator>Fitzgerald, D.</dc:creator>
<dc:creator>Dileep, V.</dc:creator>
<dc:creator>Liu, T.</dc:creator>
<dc:creator>Mei, S.</dc:creator>
<dc:creator>Sasaki, T.</dc:creator>
<dc:creator>Trevilla-Garcia, C.</dc:creator>
<dc:creator>Wang, S.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Zang, C.</dc:creator>
<dc:creator>Wang, D.</dc:creator>
<dc:creator>Klein, R.</dc:creator>
<dc:creator>Snyder, M.</dc:creator>
<dc:creator>Gilbert, D.</dc:creator>
<dc:date>2019-07-18</dc:date>
<dc:identifier>doi:10.1101/706424</dc:identifier>
<dc:title><![CDATA[An integrative ENCODE resource for cancer genomics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-07-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/533273v1?rss=1">
<title>
<![CDATA[
Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/533273v1?rss=1
</link>
<description><![CDATA[
Motivation: Recent efforts to describe the human epigenome have yielded thousands of uniformly processed epigenomic and transcriptomic data sets. These data sets characterize a rich variety of biological activity in hundreds of human cell lines and tissues ("biosamples"). Understanding these data sets, and specifically how they differ across biosamples, can help explain many cellular mechanisms, particularly those driving development and disease. However, due primarily to cost, the total number of assays that can be performed is limited. Previously described imputation approaches, such as Avocado, have sought to overcome this limitation by predicting genome-wide epigenomics experiments using learned associations among available epigenomic data sets. However, these previous imputations have focused primarily on measurements of histone modification and chromatin accessibility, despite other biological activity being crucially important.

Results: We applied Avocado to a data set of 3,814 tracks of data derived from the ENCODE compendium, spanning 400 human biosamples and 84 assays. The resulting imputations cover measurements of chromatin accessibility, histone modification, transcription, and protein binding. We demonstrate the quality of these imputations by comprehensively evaluating the model's predictions and by showing significant improvements in protein binding performance compared to the top models in an ENCODE-DREAM challenge. Additionally, we show that the Avocado model allows for efficient addition of new assays and biosamples to a pre-trained model, achieving high accuracy at predicting protein binding, even with only a single track of training data.

Availability: Tutorials and source code are available under an Apache 2.0 license at https://github.com/jmschrei/avocado.

Contact: william-noble@uw.edu or jmschr@cs.washington.edu
]]></description>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Bilmes, J.</dc:creator>
<dc:creator>Noble, W.</dc:creator>
<dc:date>2019-01-29</dc:date>
<dc:identifier>doi:10.1101/533273</dc:identifier>
<dc:title><![CDATA[Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-01-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/364976v1?rss=1">
<title>
<![CDATA[
Multi-scale deep tensor factorization learns a latent representation of the human epigenome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/364976v1?rss=1
</link>
<description><![CDATA[
The human epigenome has been experimentally characterized by measurements of protein binding, chromatin accessibility, methylation, and histone modification in hundreds of cell types. The result is a huge compendium of data, consisting of thousands of measurements for every basepair in the human genome. These data are difficult to make sense of, not only for humans, but also for computational methods that aim to detect genes and other functional elements, predict gene expression, characterize polymorphisms, etc. To address this challenge, we propose a deep neural network tensor factorization method, Avocado, that compresses epigenomic data into a dense, information-rich representation of the human genome. We use data from the Roadmap Epigenomics Consortium to demonstrate that this learned representation of the genome is broadly useful: first, by imputing epigenomic data more accurately than previous methods, and second, by showing that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture. Our findings suggest the broad utility of Avocado's learned latent representation for computational genomics and epigenomics.
]]></description>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Durham, T. J.</dc:creator>
<dc:creator>Bilmes, J.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2018-07-08</dc:date>
<dc:identifier>doi:10.1101/364976</dc:identifier>
<dc:title><![CDATA[Multi-scale deep tensor factorization learns a latent representation of the human epigenome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-07-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/386656v1?rss=1">
<title>
<![CDATA[
Pseudogenes in the mouse lineage: transcriptional activity and strain-specific history 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/386656v1?rss=1
</link>
<description><![CDATA[
Pseudogenes are ideal markers of genome remodeling. In turn, the mouse is an ideal platform for studying them, particularly with the availability of developmental transcriptional data and the sequencing of 18 strains. Here, we present a comprehensive genome-wide annotation of the pseudogenes in the mouse reference genome and associated strains. We compiled this by combining manual curation of over 10,000 pseudogenes with results from automatic annotation pipelines. Also, by comparing the human and mouse, we annotated 165 unitary pseudogenes in mouse, and 303 unitaries in human. We make all our annotation available through mouse.pseudogene.org. The overall mouse pseudogene repertoire (in the reference and strains) is similar to human in terms of overall size, biotype distribution (~80% processed/~20% duplicated) and top family composition (with many GAPDH and ribosomal pseudogenes). However, notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of the pseudogenes are unique, reflecting strain-specific functions and evolution. Additionally, we find that ~15% of the pseudogenes are transcribed, a fraction similar to that for human, and that pseudogene transcription exhibits greater tissue and strain specificity compared to protein-coding genes. Finally, we show that highly transcribed parent genes tend to give rise to processed pseudogenes.
]]></description>
<dc:creator>Sisu, C.</dc:creator>
<dc:creator>Muir, P.</dc:creator>
<dc:creator>Frankish, A.</dc:creator>
<dc:creator>Fiddes, I.</dc:creator>
<dc:creator>Diekhans, M.</dc:creator>
<dc:creator>Thybert, D.</dc:creator>
<dc:creator>Odom, D.</dc:creator>
<dc:creator>Flicek, P.</dc:creator>
<dc:creator>Keane, T.</dc:creator>
<dc:creator>Hubbard, T.</dc:creator>
<dc:creator>Harrow, J.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2018-08-07</dc:date>
<dc:identifier>doi:10.1101/386656</dc:identifier>
<dc:title><![CDATA[Pseudogenes in the mouse lineage: transcriptional activity and strain-specific history]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-08-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.15.992750v1?rss=1">
<title>
<![CDATA[
Detecting sample swaps in diverse NGS data types using linkage disequilibrium 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.15.992750v1?rss=1
</link>
<description><![CDATA[
As the number of genomics datasets grows rapidly, sample mislabeling has become a high-stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample-relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.
]]></description>
<dc:creator>Javed, N. M.</dc:creator>
<dc:creator>Farjoun, Y.</dc:creator>
<dc:creator>Fennell, T.</dc:creator>
<dc:creator>Epstein, C. B.</dc:creator>
<dc:creator>Bernstein, B. E.</dc:creator>
<dc:creator>Shoresh, N.</dc:creator>
<dc:date>2020-03-17</dc:date>
<dc:identifier>doi:10.1101/2020.03.15.992750</dc:identifier>
<dc:title><![CDATA[Detecting sample swaps in diverse NGS data types using linkage disequilibrium]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/188755v1?rss=1">
<title>
<![CDATA[
Measuring the reproducibility and quality of Hi-C data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/188755v1?rss=1
</link>
<description><![CDATA[
Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established (e.g., ratio of intra to interchromosomal interactions) and novel (e.g., QuASAR-QC) measures to identify low quality experiments. In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
]]></description>
<dc:creator>Yardimci, G.</dc:creator>
<dc:creator>Ozadam, H.</dc:creator>
<dc:creator>Sauria, M. E. G.</dc:creator>
<dc:creator>Ursu, O.</dc:creator>
<dc:creator>Yan, K.-K.</dc:creator>
<dc:creator>Yang, T.</dc:creator>
<dc:creator>Chakraborty, A.</dc:creator>
<dc:creator>Kaul, A.</dc:creator>
<dc:creator>Lajoie, B. R.</dc:creator>
<dc:creator>Song, F.</dc:creator>
<dc:creator>Zhan, Y.</dc:creator>
<dc:creator>Ay, F.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Taylor, J.</dc:creator>
<dc:creator>Yue, F.</dc:creator>
<dc:creator>Dekker, J.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:date>2017-09-14</dc:date>
<dc:identifier>doi:10.1101/188755</dc:identifier>
<dc:title><![CDATA[Measuring the reproducibility and quality of Hi-C data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-09-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/103614v1?rss=1">
<title>
<![CDATA[
Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/103614v1?rss=1
</link>
<description><![CDATA[
Recently, Hi-C has been used to probe the 3D chromatin architecture of multiple organisms and cell types. The resulting collections of pairwise contacts across the genome have connected chromatin architecture to many cellular phenomena, including replication timing and gene regulation. However, high resolution (10 kb or finer) contact maps remain scarce due to the expense and time required for collection. A computational method for predicting pairwise contacts without the need to run a Hi-C experiment would be invaluable in understanding the role that 3D chromatin architecture plays in genome biology. We describe Rambutan, a deep convolutional neural network that predicts Hi-C contacts at 1 kb resolution using nucleotide sequence and DNaseI assay signal as inputs. Specifically, Rambutan identifies locus pairs that engage in high confidence contacts according to Fit-Hi-C, a previously described method for assigning statistical confidence estimates to Hi-C contacts. We first demonstrate Rambutan's performance across chromosomes at 1 kb resolution in the GM12878 cell line. Subsequently, we measure Rambutan's performance across six cell types. In this setting, the model achieves an area under the receiver operating characteristic curve between 0.7662 and 0.8246 and an area under the precision-recall curve between 0.3737 and 0.9008. We further demonstrate that the predicted contacts exhibit expected trends relative to histone modification ChIP-seq data, replication timing measurements, and annotations of functional elements such as promoters and enhancers. Finally, we predict Hi-C contacts for 53 human cell types and show that the predictions cluster by cellular function. [NOTE: After our original submission we discovered an error in our calling of statistically significant contacts.
Briefly, when calculating the prior probability of a contact, we used the number of contacts at a certain genomic distance in a chromosome but divided by the total number of bins in the full genome. While we investigate what impact this had on our results, we ask that readers treat this manuscript skeptically.]
]]></description>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Libbrecht, M.</dc:creator>
<dc:creator>Bilmes, J.</dc:creator>
<dc:creator>Noble, W.</dc:creator>
<dc:date>2017-01-27</dc:date>
<dc:identifier>doi:10.1101/103614</dc:identifier>
<dc:title><![CDATA[Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-01-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/101386v1?rss=1">
<title>
<![CDATA[
HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/101386v1?rss=1
</link>
<description><![CDATA[
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.
]]></description>
<dc:creator>Yang, T.</dc:creator>
<dc:creator>Zhang, F.</dc:creator>
<dc:creator>Yardimci, G. G.</dc:creator>
<dc:creator>Hardison, R. C.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Yue, F.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2017-01-18</dc:date>
<dc:identifier>doi:10.1101/101386</dc:identifier>
<dc:title><![CDATA[HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-01-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/119651v1?rss=1">
<title>
<![CDATA[
An Integrative Framework For Detecting Structural Variations In Cancer Genomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/119651v1?rss=1
</link>
<description><![CDATA[
Structural variants can contribute to oncogenesis through a variety of mechanisms, yet, despite their importance, the identification of structural variants in cancer genomes remains challenging. Here, we present an integrative framework for comprehensively identifying structural variation in cancer genomes. For the first time, we apply next-generation optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole genome sequencing to systematically detect SVs in a variety of cancer cells.

Using this approach, we identify and characterize structural variants in up to 29 commonly used normal and cancer cell lines. We find that each method has unique strengths in identifying different classes of structural variants and at different scales, suggesting that integrative approaches are likely the only way to comprehensively identify structural variants in the genome. Studying the impact of the structural variants in cancer cell lines, we identify widespread structural variation events affecting the functions of non-coding sequences in the genome, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel 3D chromatin structural domains.

These results underscore the importance of comprehensive structural variant identification and indicate that non-coding structural variation may be an underappreciated mutational process in cancer genomes.
]]></description>
<dc:creator>Dixon, J.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Dileep, V.</dc:creator>
<dc:creator>Zhan, Y.</dc:creator>
<dc:creator>Song, F.</dc:creator>
<dc:creator>Le, V. T.</dc:creator>
<dc:creator>Yardimci, G. G.</dc:creator>
<dc:creator>Chakraborty, A.</dc:creator>
<dc:creator>Bann, D. V.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Clark, R.</dc:creator>
<dc:creator>Zhang, L.</dc:creator>
<dc:creator>Yang, H.</dc:creator>
<dc:creator>Liu, T.</dc:creator>
<dc:creator>Iyyanki, S.</dc:creator>
<dc:creator>An, L.</dc:creator>
<dc:creator>Pool, C.</dc:creator>
<dc:creator>Sasaki, T.</dc:creator>
<dc:creator>Mulia, J. C. R.</dc:creator>
<dc:creator>Ozadam, H.</dc:creator>
<dc:creator>Lajoie, B. R.</dc:creator>
<dc:creator>Kaul, R.</dc:creator>
<dc:creator>Buckley, M.</dc:creator>
<dc:creator>Lee, K.</dc:creator>
<dc:creator>Diegel, M.</dc:creator>
<dc:creator>Pezic, D.</dc:creator>
<dc:creator>Ernst, C.</dc:creator>
<dc:creator>Hadjur, S.</dc:creator>
<dc:creator>Odom, D. T.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:creator>Broach, J. R.</dc:creator>
<dc:creator>Hardison, R.</dc:creator>
<dc:creator>Ay, F.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Dekker, J.</dc:creator>
<dc:creator>Gilbert, D. M.</dc:creator>
<dc:creator>Yue, F.</dc:creator>
<dc:date>2017-03-28</dc:date>
<dc:identifier>doi:10.1101/119651</dc:identifier>
<dc:title><![CDATA[An Integrative Framework For Detecting Structural Variations In Cancer Genomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-03-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/862177v1?rss=1">
<title>
<![CDATA[
TopicNet: a framework for measuring transcriptional regulatory network change 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/862177v1?rss=1
</link>
<description><![CDATA[
Next generation sequencing data highlights comprehensive and dynamic changes in the human gene regulatory network. Moreover, changes in regulatory network connectivity (network "rewiring") manifest different regulatory programs in multiple cellular states. However, due to the dense and noisy nature of the connectivity in regulatory networks, directly comparing the gains and losses of targets of key TFs is not that informative. Thus, here, we seek an abstracted lower-dimensional representation to understand the main features of network change. In particular, we propose a method called TopicNet that applies latent Dirichlet allocation (LDA) to extract meaningful functional topics for a collection of genes regulated by a TF. We then define a rewiring score to quantify the large-scale changes in the regulatory network in terms of topic change for a TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states. This is particularly relevant in oncogenesis. Also, incorporating gene-expression data, we define a topic activity score that gives the degree that a topic is active in a particular cellular state. Furthermore, we show how activity differences can highlight differential survival in certain cancers.
]]></description>
<dc:creator>Lou, S.</dc:creator>
<dc:creator>Li, T.</dc:creator>
<dc:creator>Kong, X.</dc:creator>
<dc:creator>Zhang, J.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Lee, D.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2019-12-02</dc:date>
<dc:identifier>doi:10.1101/862177</dc:identifier>
<dc:title><![CDATA[TopicNet: a framework for measuring transcriptional regulatory network change]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-12-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.05.29.124164v1?rss=1">
<title>
<![CDATA[
DiNeR: a Differential Graphical Model for analysis of co-regulation Network Rewiring 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.05.29.124164v1?rss=1
</link>
<description><![CDATA[
Background: During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control the gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied.

Methods: To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion -- by sub-sampling the complete binding profiles to remove spurious edges -- to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease.

Results: We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes.

Conclusions: Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.
]]></description>
<dc:creator>Zhang, J.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Lee, D.</dc:creator>
<dc:creator>Lou, S.</dc:creator>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Gursoy, G.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2020-05-30</dc:date>
<dc:identifier>doi:10.1101/2020.05.29.124164</dc:identifier>
<dc:title><![CDATA[DiNeR: a Differential Graphical Model for analysis of co-regulation Network Rewiring]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-05-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.02.03.932251v1?rss=1">
<title>
<![CDATA[
Epigenome-based Splicing Prediction using a Recurrent Neural Network 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.02.03.932251v1?rss=1
</link>
<description><![CDATA[
Alternative RNA splicing provides an important means to expand metazoan transcriptome diversity. Contrary to what was accepted previously, splicing is now thought to predominantly take place during transcription. Motivated by emerging data showing the physical proximity of the spliceosome to Pol II, we surveyed the effect of epigenetic context on co-transcriptional splicing. In particular, we observed that splicing factors were not necessarily enriched at exon junctions and that most epigenetic signatures had a distinctly asymmetric profile around known splice sites. Given this, we tried to build an interpretable model that mimics the physical layout of splicing regulation where the chromatin context progressively changes as the Pol II moves along the guide DNA. We used a recurrent-neural-network architecture to predict the inclusion of a spliced exon based on adjacent epigenetic signals, and we showed that distinct spatio-temporal features of these signals were key determinants of model outcome, in addition to the actual nucleotide sequence of the guide DNA strand. After the model had been trained and tested (with >80% precision-recall curve metric), we explored the derived weights of the latent factors, finding they highlight the importance of the asymmetric time-direction of chromatin context during transcription.

Author Summary: In humans, only about 2% of the genome is comprised of so-called coding regions and can give rise to protein products. However, the human transcriptome is much more diverse than the number of genes found in these coding regions. Each gene can give rise to multiple transcripts through a process during transcription called alternative splicing. There is a limited understanding of the regulation of splicing and the underlying splicing code that determines cell-type-specific splicing. Here, we studied epigenetic features that characterize splicing regulation in humans using a recurrent neural network model. Unlike feedforward neural networks, this method contains an internal memory state that learns from spatiotemporal patterns - like the context in language - from a sequence of genomic and epigenetic information, making it better suited for characterizing splicing. We demonstrated that our method improves the prediction of splicing outcomes compared to previous methods. Furthermore, we applied our method to 49 cell types in ENCODE to investigate splicing regulation and found that not only spatial but also temporal epigenomic context can influence splicing regulation during transcription.
]]></description>
<dc:creator>Lee, D.</dc:creator>
<dc:creator>Zhang, J.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:date>2020-02-03</dc:date>
<dc:identifier>doi:10.1101/2020.02.03.932251</dc:identifier>
<dc:title><![CDATA[Epigenome-based Splicing Prediction using a Recurrent Neural Network]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-02-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.06.14.150599v1?rss=1">
<title>
<![CDATA[
The changing mouse embryo transcriptome at whole tissue and single-cell resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.06.14.150599v1?rss=1
</link>
<description><![CDATA[
In mammalian embryogenesis, differential gene expression gradually builds the identity and complexity of each tissue and organ system. We systematically quantified mouse polyA-RNA from embryo day E10.5 to birth, sampling 17 whole tissues, enhanced with single-cell measurements for the developing limb. The resulting developmental transcriptome is globally structured by dynamic cytodifferentiation, body-axis and cell-proliferation gene sets, characterized by their promoters' transcription factor (TF) motif codes. We decomposed the tissue-level transcriptome using scRNA-seq and found that neurogenesis and haematopoiesis dominate at both the gene and cellular levels, jointly accounting for 1/3 of differential gene expression and over 40% of identified cell types. Integrating promoter sequence motifs with companion ENCODE epigenomic profiles identified a promoter de-repression mechanism unique to neuronal expression clusters and attributable to known and novel repressors. Focusing on the developing limb, scRNA-seq identified 25 known and candidate novel cell types, including progenitor and differentiating states with computationally inferred lineage relationships. We extracted cell type TF networks and complementary sets of candidate enhancer elements by de-convolving whole-tissue IDEAS epigenome chromatin state models. These ENCODE reference data, computed network components and IDEAS chromatin segmentations, are companion resources to the matching epigenomic developmental matrix, available for researchers to further mine and integrate.
]]></description>
<dc:creator>He, P.</dc:creator>
<dc:creator>Williams, B. A.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Marinov, G. K.</dc:creator>
<dc:creator>Amrhein, H.</dc:creator>
<dc:creator>Berghella, L.</dc:creator>
<dc:creator>Goh, S.-T.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Afzal, V.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:creator>Hardison, R. C.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Wold, B. J.</dc:creator>
<dc:date>2020-06-14</dc:date>
<dc:identifier>doi:10.1101/2020.06.14.150599</dc:identifier>
<dc:title><![CDATA[The changing mouse embryo transcriptome at whole tissue and single-cell resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-06-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.06.26.172718v1?rss=1">
<title>
<![CDATA[
Atlas and developmental dynamics of mouse DNase I hypersensitive sites 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.06.26.172718v1?rss=1
</link>
<description><![CDATA[
Early mammalian development is orchestrated by genome-encoded regulatory elements populated by a changing complement of regulatory factors, creating a dynamic chromatin landscape. To define the spatiotemporal organization of regulatory DNA landscapes during mouse development and maturation, we generated nucleotide-resolution DNA accessibility maps from 15 tissues sampled at 9 intervals spanning post-conception day 9.5 through early adult, and integrated these with 41 adult-stage DNase-seq profiles to create a global atlas of mouse regulatory DNA. Collectively, we delineated >1.8 million DNase I hypersensitive sites (DHSs), with the vast majority displaying temporal and tissue-selective patterning. Here we show that tissue regulatory DNA compartments show sharp embryonic-to-fetal transitions characterized by wholesale turnover of DHSs and progressive domination by a diminishing number of transcription factors. We show further that aligning mouse and human fetal development on a regulatory axis exposes disease-associated variation enriched in early intervals lacking human samples. Our results provide an expansive new resource for decoding mammalian developmental regulatory programs.
]]></description>
<dc:creator>Breeze, C. E.</dc:creator>
<dc:creator>Lazar, J.</dc:creator>
<dc:creator>Mercer, T.</dc:creator>
<dc:creator>Halow, J.</dc:creator>
<dc:creator>Washington, I.</dc:creator>
<dc:creator>Lee, K.</dc:creator>
<dc:creator>Ibarrientos, S.</dc:creator>
<dc:creator>Castillo, A.</dc:creator>
<dc:creator>Neri, F.</dc:creator>
<dc:creator>Haugen, E.</dc:creator>
<dc:creator>Rynes, E.</dc:creator>
<dc:creator>Reynolds, A.</dc:creator>
<dc:creator>Bates, D.</dc:creator>
<dc:creator>Diegel, M.</dc:creator>
<dc:creator>Dunn, D.</dc:creator>
<dc:creator>Kaul, R.</dc:creator>
<dc:creator>Sandstrom, R.</dc:creator>
<dc:creator>Meuleman, W.</dc:creator>
<dc:creator>Bender, M. A.</dc:creator>
<dc:creator>Groudine, M.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:date>2020-06-27</dc:date>
<dc:identifier>doi:10.1101/2020.06.26.172718</dc:identifier>
<dc:title><![CDATA[Atlas and developmental dynamics of mouse DNase I hypersensitive sites]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-06-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.07.02.185389v1?rss=1">
<title>
<![CDATA[
Loop extrusion model predicts CTCF interaction specificity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.07.02.185389v1?rss=1
</link>
<description><![CDATA[
Three-dimensional chromatin looping interactions play an important role in constraining enhancer-promoter interactions and mediating transcriptional gene regulation. CTCF is thought to play a critical role in the formation of these loops, but the specificity of which CTCF binding events form loops and which do not is difficult to predict. Loops often have convergent CTCF binding site motif orientation, but this constraint alone is only weakly predictive of genome-wide interaction data. Here we present an easily interpretable and simple mathematical model of CTCF mediated loop formation which is consistent with Cohesin extrusion and can predict ChIA-PET CTCF looping interaction measurements with high accuracy. Competition between overlapping loops is a critical determinant of loop specificity. We show that this model is consistent with observed chromatin interaction frequency changes induced by CTCF binding site deletion, inversion, and mutation, and is also consistent with observed constraints on validated enhancer-promoter interactions.
]]></description>
<dc:creator>Xi, W.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:date>2020-07-03</dc:date>
<dc:identifier>doi:10.1101/2020.07.02.185389</dc:identifier>
<dc:title><![CDATA[Loop extrusion model predicts CTCF interaction specificity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2019.12.21.885830v1?rss=1">
<title>
<![CDATA[
Dissecting the regulatory activity and key sequence elements of loci with exceptional numbers of transcription factor associations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2019.12.21.885830v1?rss=1
</link>
<description><![CDATA[
DNA associated proteins (DAPs) classically regulate gene expression by binding to regulatory loci such as enhancers or promoters. As expanding catalogs of genome-wide DAP binding maps reveal thousands of loci that, unlike the majority of conventional enhancers and promoters, associate with dozens of different DAPs with apparently little regard for motif preference, an understanding of DAP association and coordination at such regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DAPs assayed in three cell lines and integrated these data with an orthogonal dataset of 352 non-redundant, in vitro-derived motifs mapped to the genome within DNase hypersensitivity footprints in an effort to characterize regions of the genome that have exceptionally high numbers of DAP associations. We subsequently performed a massively parallel mutagenesis assay to search for sequence elements driving transcriptional activity at such loci and explored plausible biological mechanisms underlying their formation. We establish a generalizable definition for High Occupancy Target (HOT) loci and identify putative driver DAP motifs in HEPG2 cells, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and exhibit sequence conservation at HOT loci. The number of different DAPs associated with an element is positively associated with evidence of regulatory activity and, by systematically mutating 245 HOT loci, we localized regulatory activity to a central core region that depends on the motif sequences of our previously nominated driver DAPs. In sum, this work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
]]></description>
<dc:creator>Ramaker, R. C.</dc:creator>
<dc:creator>Hardigan, A. A.</dc:creator>
<dc:creator>Goh, S.-T.</dc:creator>
<dc:creator>Partridge, E. C.</dc:creator>
<dc:creator>Wold, B.</dc:creator>
<dc:creator>Cooper, S. J.</dc:creator>
<dc:creator>Myers, R. M.</dc:creator>
<dc:date>2019-12-23</dc:date>
<dc:identifier>doi:10.1101/2019.12.21.885830</dc:identifier>
<dc:title><![CDATA[Dissecting the regulatory activity and key sequence elements of loci with exceptional numbers of transcription factor associations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-12-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.05.01.073296v1?rss=1">
<title>
<![CDATA[
Long-TUC-seq is a robust method for quantification of metabolically labeled full-length isoforms 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.05.01.073296v1?rss=1
</link>
<description><![CDATA[
The steady state expression of each gene is the result of a dynamic transcription and degradation of that gene. While regular RNA-seq methods only measure steady state expression levels, RNA-seq of metabolically labeled RNA identifies transcripts that were transcribed during the window of metabolic labeling. Whereas short-read RNA sequencing can identify metabolically labeled RNA at the gene level, long-read sequencing provides much better resolution of isoform-level transcription. Here we combine thiouridine-to-cytosine conversion (TUC) with PacBio long-read sequencing to study the dynamics of mRNA transcription in the GM12878 cell line. We show that using long-TUC-seq, we can detect metabolically labeled mRNA of distinct isoforms more reliably than using short reads. Long-TUC-seq holds the promise of capturing isoform dynamics robustly and without the need for enrichment.
]]></description>
<dc:creator>Rahmanian, S.</dc:creator>
<dc:creator>Balderrama-Gutierrez, G.</dc:creator>
<dc:creator>Wyman, D.</dc:creator>
<dc:creator>McGill, C. J.</dc:creator>
<dc:creator>Nguyen, K.</dc:creator>
<dc:creator>Spitale, R.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2020-05-02</dc:date>
<dc:identifier>doi:10.1101/2020.05.01.073296</dc:identifier>
<dc:title><![CDATA[Long-TUC-seq is a robust method for quantification of metabolically labeled full-length isoforms]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-05-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.06.09.143024v1?rss=1">
<title>
<![CDATA[
Swan: a library for the analysis and visualization of long-read transcriptomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.06.09.143024v1?rss=1
</link>
<description><![CDATA[
Motivation: Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models.

Results: Swan finds 4,909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 1,021 reproducible exon skipping and 73 intron retention events not recorded in the GENCODE v29 annotation.

Availability: The Swan library for Python 3 is available on PyPI and on GitHub at https://pypi.org/project/swan-vis/1.0/ and https://github.com/mortazavilab/swan_paper.
]]></description>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2020-06-10</dc:date>
<dc:identifier>doi:10.1101/2020.06.09.143024</dc:identifier>
<dc:title><![CDATA[Swan: a library for the analysis and visualization of long-read transcriptomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-06-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.08.07.241901v1?rss=1">
<title>
<![CDATA[
Detecting regulatory elements in high-throughput reporter assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.08.07.241901v1?rss=1
</link>
<description><![CDATA[
High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure genome-wide regulatory element activity across the human genome. The assays, however, also present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq signals. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local signal correlations.
]]></description>
<dc:creator>Kim, Y.-S.</dc:creator>
<dc:creator>Johnson, G. D.</dc:creator>
<dc:creator>Seo, J.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Majoros, W. H.</dc:creator>
<dc:creator>Ochoa, A.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:date>2020-08-07</dc:date>
<dc:identifier>doi:10.1101/2020.08.07.241901</dc:identifier>
<dc:title><![CDATA[Detecting regulatory elements in high-throughput reporter assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-08-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.02.21.959510v1?rss=1">
<title>
<![CDATA[
In silico integration of thousands of epigenetic datasets into 707 cell type regulatory annotations improves the trans-ethnic portability of polygenic risk scores 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.02.21.959510v1?rss=1
</link>
<description><![CDATA[
Poor trans-ethnic portability of polygenic risk score (PRS) models is a critical issue that may be partially due to limited knowledge of causal variants shared among populations. Hence, leveraging noncoding regulatory annotations that capture genetic variation across populations has the potential to enhance the trans-ethnic portability of PRS. To this end, we constructed a unique resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 public epigenetic datasets to predict binding patterns of 142 cell-type-regulating transcription factors across 245 cell types. With this resource, we partitioned the common SNP heritability of diverse polygenic traits and diseases from 111 GWAS summary statistics of European (EUR, average N = 180K) and East Asian (EAS, average N = 157K) origin. For 95 traits, we were able to identify a single IMPACT annotation most strongly enriched for trait heritability. Across traits, these annotations captured an average of 43.3% of heritability (se = 13.8%) with the top 5% of SNPs. Strikingly, we observed highly concordant polygenic trait regulation between populations: the same regulatory annotations captured statistically indistinguishable SNP heritability (fitted slope = 0.98, se = 0.04). Since IMPACT annotations capture both large and consistent proportions of heritability across populations, prioritizing variants in IMPACT regulatory elements may improve the trans-ethnic portability of PRS. Indeed, we observed that EUR PRS models more accurately predicted 21 tested phenotypes of EAS individuals when variants were prioritized by key IMPACT tracks (49.9% mean relative increase in R²). Notably, the improvement afforded by IMPACT was greater in the trans-ethnic EUR-to-EAS PRS application than in the EAS-to-EAS application (47.3% vs 20.9%, P < 1.7e-4).
Overall, our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ethnic portability of genetic data, and this has important implications for future risk prediction models that work across populations.
]]></description>
<dc:creator>Amariuta, T.</dc:creator>
<dc:creator>Ishigaki, K.</dc:creator>
<dc:creator>Sugishita, H.</dc:creator>
<dc:creator>Ohta, T.</dc:creator>
<dc:creator>Matsuda, K.</dc:creator>
<dc:creator>Murakami, Y.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:creator>Kawakami, E.</dc:creator>
<dc:creator>Terao, C.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2020-02-25</dc:date>
<dc:identifier>doi:10.1101/2020.02.21.959510</dc:identifier>
<dc:title><![CDATA[In silico integration of thousands of epigenetic datasets into 707 cell type regulatory annotations improves the trans-ethnic portability of polygenic risk scores]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-02-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.09.02.279059v1?rss=1">
<title>
<![CDATA[
Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.09.02.279059v1?rss=1
</link>
<description><![CDATA[
Gene regulation is known to play a fundamental role in human disease, but mechanisms of regulation vary greatly across genes. Here, we explore the contributions to disease of two types of genes: genes whose regulation is driven by enhancer regions as opposed to promoter regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using a comprehensive set of SNP-to-gene (S2G) strategies and apply stratified LD score regression to the resulting SNP annotations to draw three main conclusions about 11 autoimmune diseases and blood cell traits (average Ncase = 13K across 6 autoimmune diseases, average N = 443K across 5 blood cell traits). First, several characterizations of enhancer-related genes defined in blood using functional genomics data (e.g. ATAC-seq, RNA-seq, PC-HiC) are conditionally informative for autoimmune disease heritability, after conditioning on a broad set of regulatory annotations from the baseline-LD model. Second, candidate master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating enhancer-related and candidate master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2x stronger conditional signal (maximum standardized SNP annotation effect size (τ*) = 2.0 (s.e. 0.3) vs. 0.91 (s.e. 0.21)), and >2x stronger gene-level enrichment for approved autoimmune disease drug targets (5.3x vs. 2.1x), as compared to the recently proposed Enhancer Domain Score (EDS). In each case, using functionally informed S2G strategies to link genes to SNPs that may regulate them produced much stronger disease signals (4.1x-13x larger τ* values) than conventional window-based S2G strategies.
We conclude that our characterizations of enhancer-related and candidate master-regulator genes identify gene sets that are important for autoimmune disease, and that combining those gene sets with functionally informed S2G strategies enables us to identify SNP annotations in which disease heritability is concentrated.
]]></description>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Gazal, S. K.</dc:creator>
<dc:creator>van de Geijn, B.</dc:creator>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Price, A.</dc:creator>
<dc:date>2020-09-03</dc:date>
<dc:identifier>doi:10.1101/2020.09.02.279059</dc:identifier>
<dc:title><![CDATA[Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-09-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.09.08.288563v1?rss=1">
<title>
<![CDATA[
Integrative approaches to improve the informativeness of deep learning models for human complex diseases 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.09.08.288563v1?rss=1
</link>
<description><![CDATA[
Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using several previously trained deep learning models: DeepSEA, Basenji and DeepBind (and a related machine learning model, deltaSVM). First, we employ gradient boosting to learn optimal combinations of deep learning annotations, using fine-mapped SNPs and matched control SNPs (on held-out chromosomes) for training. Second, we improve the specificity of these annotations by restricting them to SNPs implicated by (proximal and distal) SNP-to-gene (S2G) linking strategies, e.g. prioritizing SNPs involved in gene regulation. Third, we predict gene expression (and derive allelic-effect annotations) from deep learning annotations at SNPs implicated by S2G linking strategies -- generalizing the previously proposed ExPecto approach, which incorporates deep learning annotations based on distance to TSS. We evaluated these approaches using stratified LD score regression, using functional data in blood and focusing on 11 autoimmune diseases and blood-related traits (average N =306K). We determined that the three approaches produced SNP annotations that were uniquely informative for these diseases/traits, despite the fact that linear combinations of the underlying DeepSEA, Basenji, DeepBind and deltaSVM blood annotations were not uniquely informative for these diseases/traits. Our results highlight the benefits of integrating SNP annotations produced by deep learning models with other types of data, including data linking SNPs to genes.
]]></description>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Gazal, S.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Price, A.</dc:creator>
<dc:date>2020-09-09</dc:date>
<dc:identifier>doi:10.1101/2020.09.08.288563</dc:identifier>
<dc:title><![CDATA[Integrative approaches to improve the informativeness of deep learning models for human complex diseases]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-09-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.02.364869v1?rss=1">
<title>
<![CDATA[
Genetic and Epigenetic Features of Promoters with Ubiquitous Chromatin Accessibility Support Ubiquitous Transcription of Cell-essential Genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.02.364869v1?rss=1
</link>
<description><![CDATA[
Gene expression is controlled by regulatory elements with accessible chromatin. Although the majority of regulatory elements are cell type-specific, being in the open chromatin state in only one or a few cell types, approximately 16,000 regions in the human genome and 13,000 regions in the mouse genome are in the open chromatin state in nearly all of the 517 human and 94 mouse cell and tissue types assayed by the ENCODE consortium, respectively. We performed a systematic analysis on the subset of 9,000 human and 8,000 mouse ubiquitously (ubi) open chromatin regions that were also classified as candidate cis-regulatory elements (cCREs) with promoter-like signatures (PLSs) by the ENCODE consortium, which we refer to as ubi-PLSs. We found that these ubi-PLSs had higher levels of CG dinucleotides and corresponded to the genes with ubiquitously high levels of transcriptional activities. Furthermore, the transcription start sites of a vast majority of cell-essential genes are located in ubi-PLSs. ubi-PLSs are enriched in the motifs of ubiquitously expressed transcription factors and preferentially bound by transcriptional cofactors that regulate ubiquitously expressed genes. Finally, ubi-PLSs are highly conserved between human and mouse at the synteny level, but not as conserved at the sequence level, with a high turnover of transcription factor motif sites. Thus, there is a distinct set of roughly 9,000 promoters in the mammalian genome that are actively maintained in the open chromatin state in nearly all cell types to ensure the transcriptional program of cell-essential genes.
]]></description>
<dc:creator>Fan, K.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Zhang, X.-o.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:date>2020-11-02</dc:date>
<dc:identifier>doi:10.1101/2020.11.02.364869</dc:identifier>
<dc:title><![CDATA[Genetic and Epigenetic Features of Promoters with Ubiquitous Chromatin Accessibility Support Ubiquitous Transcription of Cell-essential Genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.10.11.335273v1?rss=1">
<title>
<![CDATA[
HiC-DC+: systematic 3D interaction calls and differential analysis for Hi-C and HiChIP 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.10.11.335273v1?rss=1
</link>
<description><![CDATA[
We present HiC-DC+, a software tool for Hi-C/HiChIP interaction calling and differential analysis using an efficient implementation of the HiC-DC statistical framework. HiC-DC+ integrates with popular preprocessing and visualization tools, includes TAD and A/B compartment callers, and outperformed existing tools in H3K27ac HiChIP benchmarking as validated by CRISPRi-FlowFISH. Differential HiC-DC+ analysis recovered global principles of 3D organization during cohesin perturbation and differentiation, including TAD aggregation, enhancer hubs, and promoter-enhancer loop dynamics.
]]></description>
<dc:creator>Sahin, M.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Zhan, Y.</dc:creator>
<dc:creator>Van Deynze, K.</dc:creator>
<dc:creator>Koche, R.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:date>2020-10-11</dc:date>
<dc:identifier>doi:10.1101/2020.10.11.335273</dc:identifier>
<dc:title><![CDATA[HiC-DC+: systematic 3D interaction calls and differential analysis for Hi-C and HiChIP]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-10-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.14.382606v1?rss=1">
<title>
<![CDATA[
Genome-wide Identification of the Genetic Basis of Amyotrophic Lateral Sclerosis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.14.382606v1?rss=1
</link>
<description><![CDATA[
Amyotrophic lateral sclerosis (ALS) is an archetypal complex disease centered on progressive death of motor neurons. Despite heritability estimates of 52%, GWAS studies have discovered only seven genome-wide significant hits, which are relevant to <10% of ALS patients. To increase the power of gene discovery, we integrated motor neuron functional genomics with ALS genetics in a hierarchical Bayesian model called RefMap. Comprehensive transcriptomic and epigenetic profiling of iPSC-derived motor neurons enabled RefMap to systematically fine-map genes and pathways associated with ALS. As a significant extension of the known genetic architecture of ALS, we identified a group of 690 candidate ALS genes, which is enriched with previously discovered risk genes. Extensive conservation, transcriptome and network analyses demonstrated the functional significance of these candidate genes in motor neurons and disease progression. In particular, we observed a genetic convergence on the distal axon, which supports the prevailing view of ALS as a distal axonopathy. Of the new ALS genes we discovered, we further characterized KANK1 that is enriched with coding and noncoding, common and rare ALS-associated genetic variation. Modelling patient mutations in human neurons reduced KANK1 expression and produced neurotoxicity with disruption of the distal axon. RefMap can be applied broadly to increase the discovery power in genetic association studies of human complex traits and diseases.
]]></description>
<dc:creator>Zhang, S.</dc:creator>
<dc:creator>Cooper-Knock, J.</dc:creator>
<dc:creator>Weimer, A. K.</dc:creator>
<dc:creator>Shi, M.</dc:creator>
<dc:creator>Moll, T.</dc:creator>
<dc:creator>Harvey, C.</dc:creator>
<dc:creator>Nezhad, H. G.</dc:creator>
<dc:creator>Franklin, J.</dc:creator>
<dc:creator>Souza, C. d. S.</dc:creator>
<dc:creator>Wang, C.</dc:creator>
<dc:creator>Li, J.</dc:creator>
<dc:creator>Eitan, C.</dc:creator>
<dc:creator>Hornstein, E.</dc:creator>
<dc:creator>Kenna, K. P.</dc:creator>
<dc:creator>Project MinE Sequencing Consortium</dc:creator>
<dc:creator>Veldink, J.</dc:creator>
<dc:creator>Ferraiuolo, L.</dc:creator>
<dc:creator>Shaw, P. J.</dc:creator>
<dc:creator>Snyder, M. P.</dc:creator>
<dc:date>2020-11-15</dc:date>
<dc:identifier>doi:10.1101/2020.11.14.382606</dc:identifier>
<dc:title><![CDATA[Genome-wide Identification of the Genetic Basis of Amyotrophic Lateral Sclerosis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/684712v1?rss=1">
<title>
<![CDATA[
H3K27me3-rich genomic regions can function as silencers to repress gene expression via chromatin interactions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/684712v1?rss=1
</link>
<description><![CDATA[
Gene repression and silencers are poorly understood. We reasoned that H3K27me3-rich regions (MRRs) of the genome defined from clusters of H3K27me3 peaks may be used to identify silencers that can regulate gene expression via proximity or looping. MRRs were associated with chromatin interactions and interact preferentially with each other. MRR component removal at interaction anchors by CRISPR led to upregulation of interacting target genes, altered H3K27me3 and H3K27ac levels at interacting regions, and altered chromatin interactions. Chromatin interactions did not change at regions with high H3K27me3, but regions with low H3K27me3 and high H3K27ac levels showed changes in chromatin interactions. The MRR knockout cells also showed changes in phenotype associated with cell identity, and altered xenograft tumor growth. MRR-associated genes and long-range chromatin interactions were susceptible to H3K27me3 depletion. Our results characterized H3K27me3-rich regions and their mechanisms of functioning via looping.
]]></description>
<dc:creator>Cai, Y.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Loh, Y. P.</dc:creator>
<dc:creator>Tng, J. Q.</dc:creator>
<dc:creator>Lim, M. C.</dc:creator>
<dc:creator>Cao, Z.</dc:creator>
<dc:creator>Raju, A.</dc:creator>
<dc:creator>Li, S.</dc:creator>
<dc:creator>Manikandan, L.</dc:creator>
<dc:creator>Tergaonkar, V.</dc:creator>
<dc:creator>Tucker-Kellogg, G.</dc:creator>
<dc:creator>Fullwood, M. J.</dc:creator>
<dc:date>2019-06-28</dc:date>
<dc:identifier>doi:10.1101/684712</dc:identifier>
<dc:title><![CDATA[H3K27me3-rich genomic regions can function as silencers to repress gene expression via chromatin interactions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-06-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.08.05.238360v1?rss=1">
<title>
<![CDATA[
IFN-γ and TNF-α drive a CXCL10+ CCL2+ macrophage phenotype expanded in severe COVID-19 and other diseases with tissue inflammation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.08.05.238360v1?rss=1
</link>
<description><![CDATA[
Immunosuppressive and anti-cytokine treatment may have a protective effect for patients with COVID-19. Understanding the immune cell states shared between COVID-19 and other inflammatory diseases with established therapies may help nominate immunomodulatory therapies. Using an integrative strategy, we built a reference by meta-analyzing > 300,000 immune cells from COVID-19 and 5 inflammatory diseases including rheumatoid arthritis (RA), Crohn's disease (CD), ulcerative colitis (UC), lupus, and interstitial lung disease. Our cross-disease analysis revealed that an FCN1+ inflammatory macrophage state is common to COVID-19 bronchoalveolar lavage samples, RA synovium, CD ileum, and UC colon. We also observed that a CXCL10+ CCL2+ inflammatory macrophage state is abundant in severe COVID-19, inflamed CD and RA, and expresses inflammatory genes such as GBP1, STAT1, and IL1B. We found that the CXCL10+ CCL2+ macrophages are transcriptionally similar to blood-derived macrophages stimulated with TNF-α and IFN-γ ex vivo. Our findings suggest that IFN-γ, alongside TNF-α, might be a key driver of this abundant inflammatory macrophage phenotype in severe COVID-19 and other inflammatory diseases, which may be targeted by existing immunomodulatory therapies.
]]></description>
<dc:creator>Zhang, F.</dc:creator>
<dc:creator>Mears, J. R.</dc:creator>
<dc:creator>Shakib, L.</dc:creator>
<dc:creator>Beynor, J. I.</dc:creator>
<dc:creator>Shanaj, S.</dc:creator>
<dc:creator>Korsunsky, I.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Accelerating Medicines Partnership Rheumatoid Arthritis and Systemic Lupus Erythematosus</dc:creator>
<dc:creator>Donlin, L. T.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2020-08-05</dc:date>
<dc:identifier>doi:10.1101/2020.08.05.238360</dc:identifier>
<dc:title><![CDATA[IFN-γ and TNF-α drive a CXCL10+ CCL2+ macrophage phenotype expanded in severe COVID-19 and other diseases with tissue inflammation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-08-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.18.389189v1?rss=1">
<title>
<![CDATA[
Efficient and precise single-cell reference atlas mapping with Symphony 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.18.389189v1?rss=1
</link>
<description><![CDATA[
Recent advances in single-cell technologies and integration algorithms make it possible to construct comprehensive reference atlases encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony (https://github.com/immunogenomics/symphony), an algorithm for building integrated reference atlases of millions of cells in a convenient, portable format that enables efficient query mapping within seconds. Symphony localizes query cells within a stable low-dimensional reference embedding, facilitating reproducible downstream transfer of reference-defined annotations to the query. We demonstrate the power of Symphony by (1) mapping a multi-donor, multi-species query to predict pancreatic cell types, (2) localizing query cells along a developmental trajectory of human fetal liver hematopoiesis, and (3) inferring surface protein expression with a multimodal CITE-seq atlas of memory T cells.
]]></description>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Millard, N.</dc:creator>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Moody, D. B.</dc:creator>
<dc:creator>Korsunsky, I.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2020-11-20</dc:date>
<dc:identifier>doi:10.1101/2020.11.18.389189</dc:identifier>
<dc:title><![CDATA[Efficient and precise single-cell reference atlas mapping with Symphony]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.23.390682v1?rss=1">
<title>
<![CDATA[
Maximizing statistical power to detect clinically associated cell states with scPOST 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.23.390682v1?rss=1
</link>
<description><![CDATA[
As advances in single-cell technologies enable the unbiased assay of thousands of cells simultaneously, human disease studies are able to identify clinically associated cell states using case-control study designs. These studies require precious clinical samples and costly technologies; therefore, it is critical to employ study design principles that maximize power to detect cell state frequency shifts between conditions, such as disease versus healthy. Here, we present single-cell Power Simulation Tool (scPOST), a method that enables users to estimate power under different study designs. To approximate the specific experimental and clinical scenarios being investigated, scPOST takes prototype (public or pilot) single-cell data as input and generates large numbers of single-cell datasets in silico. We use scPOST to perform power analyses on three independent single-cell datasets that span diverse experimental conditions: a batch-corrected 21-sample rheumatoid arthritis dataset (5,265 cells) from synovial tissue, a 259-sample tuberculosis progression dataset (496,517 memory T cells) from peripheral blood mononuclear cells (PBMCs), and a 30-sample ulcerative colitis dataset (235,229 cells) from intestinal biopsies. Over thousands of simulations, we consistently observe that power to detect frequency shifts in cell states is maximized by larger numbers of independent clinical samples, reduced batch effects, and smaller variation in a cell state's frequency across samples.
]]></description>
<dc:creator>Millard, N.</dc:creator>
<dc:creator>Korsunsky, I.</dc:creator>
<dc:creator>Weinand, K.</dc:creator>
<dc:creator>Fonseka, C. Y.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2020-11-23</dc:date>
<dc:identifier>doi:10.1101/2020.11.23.390682</dc:identifier>
<dc:title><![CDATA[Maximizing statistical power to detect clinically associated cell states with scPOST]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.04.23.057828v1?rss=1">
<title>
<![CDATA[
Multimodal memory T cell profiling identifies a reduction in a polyfunctional Th17 state associated with tuberculosis progression 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.04.23.057828v1?rss=1
</link>
<description><![CDATA[
Mycobacterium tuberculosis (M.tb) results in 10 million active tuberculosis (TB) cases and 1.5 million deaths each year¹, making it the world's leading infectious cause of death². Infection leads to either an asymptomatic latent state or TB disease. Memory T cells have been implicated in TB disease progression, but the specific cell states involved have not yet been delineated because of the limited scope of traditional profiling strategies. Furthermore, immune activation during infection confounds underlying differences in T cell state distributions that influence risk of progression. Here, we used a multimodal single-cell approach to integrate measurements of transcripts and 30 functionally relevant surface proteins to comprehensively define the memory T cell landscape at steady state (i.e., outside of active infection). We profiled 500,000 memory T cells from 259 Peruvians > 4.7 years after they had either latent M.tb infection or active disease and defined 31 distinct memory T cell states, including a CD4+CD26+CD161+CCR6+ effector memory state that was significantly reduced in patients who had developed active TB (OR = 0.80, 95% CI: 0.73-0.87, p = 1.21 × 10⁻⁶). This state was also polyfunctional; in ex vivo stimulation, it was enriched for IL-17 and IL-22 production, consistent with a Th17-skewed phenotype, but also had more capacity to produce IFNγ than other CD161+CCR6+ Th17 cells. Additionally, in progressors, IL-17 and IL-22 production in this cell state was significantly lower than in non-progressors. Reduced abundance and function of this state may be an important factor in failure to control M.tb infection.
]]></description>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Beynor, J. I.</dc:creator>
<dc:creator>Baglaenko, Y.</dc:creator>
<dc:creator>Suliman, S.</dc:creator>
<dc:creator>Ishigaki, K.</dc:creator>
<dc:creator>Asgari, S.</dc:creator>
<dc:creator>Huang, C.-C.</dc:creator>
<dc:creator>Luo, Y.</dc:creator>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Lopez Tamara, K.</dc:creator>
<dc:creator>Jimenez, J.</dc:creator>
<dc:creator>Calderon, R. I.</dc:creator>
<dc:creator>Lecca, L.</dc:creator>
<dc:creator>van Rhijn, I.</dc:creator>
<dc:creator>Moody, B.</dc:creator>
<dc:creator>Murray, M. B.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2020-04-25</dc:date>
<dc:identifier>doi:10.1101/2020.04.23.057828</dc:identifier>
<dc:title><![CDATA[Multimodal memory T cell profiling identifies a reduction in a polyfunctional Th17 state associated with tuberculosis progression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-04-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/173088v1?rss=1">
<title>
<![CDATA[
De Novo Prediction of Human Chromosome Structures: Epigenetic Marking Patterns Encode Genome Architecture 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/173088v1?rss=1
</link>
<description><![CDATA[
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from ChIP-Seq. We exploit the idea that chromosomes encode a one-dimensional sequence of chromatin structural types. Interactions between these chromatin types determine the three-dimensional (3D) structural ensemble of chromosomes through a process similar to phase separation. First, a recurrent neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization (MiChroM) in order to generate an ensemble of 3D chromosome conformations. After training the model, dubbed MEGABASE (Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles), on odd numbered chromosomes, we predict the chromatin type sequences and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps as well as distances measured using 3D FISH experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.
]]></description>
<dc:creator>Di Pierro, M.</dc:creator>
<dc:creator>Cheng, R. R.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Wolynes, P. G.</dc:creator>
<dc:creator>Onuchic, J. N.</dc:creator>
<dc:date>2017-08-07</dc:date>
<dc:identifier>doi:10.1101/173088</dc:identifier>
<dc:title><![CDATA[De Novo Prediction of Human Chromosome Structures: Epigenetic Marking Patterns Encode Genome Architecture]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-08-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/374058v1?rss=1">
<title>
<![CDATA[
Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/374058v1?rss=1
</link>
<description><![CDATA[
Chromosome structure is thought to be crucial for proper functioning of the nucleus. Here, we present a method for visualizing chromosomal DNA at super-resolution and then integrating Hi-C data to produce three-dimensional models of chromosome organization. We begin by applying Oligopaint probes and the single-molecule localization microscopy methods of OligoSTORM and OligoDNA-PAINT to image 8 megabases of human chromosome 19, discovering that chromosomal regions contributing to compartments can form distinct structures. Intriguingly, our data also suggest that homologous maternal and paternal regions may be differentially organized. Finally, we integrate imaging data with Hi-C and restraint-based modeling using a method called integrative modeling of genomic regions (IMGR) to increase the genomic resolution of our traces to 10 kb.

One Sentence Summary: Super-resolution genome tracing, contact maps, and integrative modeling enable 10 kb resolution glimpses of chromosome folding.
]]></description>
<dc:creator>Nir, G.</dc:creator>
<dc:creator>Farabella, I.</dc:creator>
<dc:creator>Perez Estrada, C.</dc:creator>
<dc:creator>Ebeling, C. G.</dc:creator>
<dc:creator>Beliveau, B. J.</dc:creator>
<dc:creator>Sasaki, H. M.</dc:creator>
<dc:creator>Lee, S. H.</dc:creator>
<dc:creator>Nguyen, S. C.</dc:creator>
<dc:creator>McCole, R. B.</dc:creator>
<dc:creator>Chattoraj, S.</dc:creator>
<dc:creator>Erceg, J.</dc:creator>
<dc:creator>AlHaj Abed, J.</dc:creator>
<dc:creator>Martins, N. M. C.</dc:creator>
<dc:creator>Nguyen, H. Q.</dc:creator>
<dc:creator>Hannan, M. A.</dc:creator>
<dc:creator>Russell, S.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Kishi, J. Y.</dc:creator>
<dc:creator>Soler-Vila, P.</dc:creator>
<dc:creator>Di Pierro, M.</dc:creator>
<dc:creator>Onuchic, J. N.</dc:creator>
<dc:creator>Callahan, S.</dc:creator>
<dc:creator>Schreiner, J.</dc:creator>
<dc:creator>Stuckey, J.</dc:creator>
<dc:creator>Yin, P.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Marti-Renom, M. A.</dc:creator>
<dc:creator>Wu, C.- t.</dc:creator>
<dc:date>2018-07-28</dc:date>
<dc:identifier>doi:10.1101/374058</dc:identifier>
<dc:title><![CDATA[Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-07-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/529990v1?rss=1">
<title>
<![CDATA[
Activity-by-Contact model of enhancer specificity from thousands of CRISPR perturbations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/529990v1?rss=1
</link>
<description><![CDATA[
Mammalian genomes harbor millions of noncoding elements called enhancers that quantitatively regulate gene expression, but it remains unclear which enhancers regulate which genes. Here we describe an experimental approach, based on CRISPR interference, RNA FISH, and flow cytometry (CRISPRi-FlowFISH), to perturb enhancers in the genome, and apply it to test >3,000 potential regulatory enhancer-gene connections across multiple genomic loci. A simple equation based on a mechanistic model for enhancer function performed remarkably well at predicting the complex patterns of regulatory connections we observe in our CRISPR dataset. This Activity-by-Contact (ABC) model involves multiplying measures of enhancer activity and enhancer-promoter 3D contacts, and can predict enhancer-gene connections in a given cell type based on chromatin state maps. Together, CRISPRi-FlowFISH and the ABC model provide a systematic approach to map and predict which enhancers regulate which genes, and will help to interpret the functions of the thousands of disease risk variants in the noncoding genome.
]]></description>
<dc:creator>Fulco, C. P.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Jones, T. R.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Bergman, D. T.</dc:creator>
<dc:creator>Subramanian, V.</dc:creator>
<dc:creator>Grossman, S. R.</dc:creator>
<dc:creator>Anyoha, R.</dc:creator>
<dc:creator>Patwardhan, T. A.</dc:creator>
<dc:creator>Nguyen, T. H.</dc:creator>
<dc:creator>Kane, M.</dc:creator>
<dc:creator>Doughty, B.</dc:creator>
<dc:creator>Perez, E. M.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Stamenova, E. K.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2019-01-26</dc:date>
<dc:identifier>doi:10.1101/529990</dc:identifier>
<dc:title><![CDATA[Activity-by-Contact model of enhancer specificity from thousands of CRISPR perturbations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-01-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.21.001917v1?rss=1">
<title>
<![CDATA[
Exploring Chromosomal Structural Heterogeneity Across Multiple Cell Lines 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.21.001917v1?rss=1
</link>
<description><![CDATA[
We study the structural ensembles of human chromosomes across different cell types. Using computer simulations, we generate cell-specific 3D chromosomal structures and compare them to recently published chromatin structures obtained through microscopy. We demonstrate using a combination of machine learning and polymer physics simulations that epigenetic information can be used to predict the structural ensembles of multiple human cell lines. The chromosomal structures obtained in silico are quantitatively consistent with those obtained through microscopy as well as DNA-DNA proximity ligation assays. Theory predicts that chromosome structures are fluid and can only be described by an ensemble, which is consistent with the observation that chromosomes exhibit no unique fold. Nevertheless, our analysis of both structures from simulation and microscopy reveals that short segments of chromatin make transitions between a closed conformation and an open dumbbell conformation. This conformational transition appears to be consistent with a two-state process with an effective free energy cost of about four times the effective information theoretic temperature. Finally, we study the conformational changes associated with the switching of genomic compartments observed in human cell lines. Genetically identical but epigenetically distinct cell types appear to rearrange their respective structural ensembles to expose segments of transcriptionally active chromatin, belonging to the A genomic compartment, towards the surface of the chromosome, while inactive segments, belonging to the B compartment, move to the interior. The formation of genomic compartments resembles hydrophobic collapse in protein folding, with the aggregation of denser and predominantly inactive chromatin driving the positioning of active chromatin toward the surface of individual chromosomal territories.
]]></description>
<dc:creator>Cheng, R. R.</dc:creator>
<dc:creator>Contessoto, V.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Wolynes, P. G.</dc:creator>
<dc:creator>Di Pierro, M.</dc:creator>
<dc:creator>Onuchic, J. N.</dc:creator>
<dc:date>2020-03-22</dc:date>
<dc:identifier>doi:10.1101/2020.03.21.001917</dc:identifier>
<dc:title><![CDATA[Exploring Chromosomal Structural Heterogeneity Across Multiple Cell Lines]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/779058v1?rss=1">
<title>
<![CDATA[
ESCO1 and CTCF enable formation of long chromatin loops by protecting cohesinSTAG1 from WAPL 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/779058v1?rss=1
</link>
<description><![CDATA[
Eukaryotic genomes are folded into loops. It is thought that these are formed by cohesin complexes via extrusion, either until loop expansion is arrested by CTCF or until cohesin is removed from DNA by WAPL. Although WAPL limits cohesin's chromatin residence time to minutes, it has been reported that some loops exist for hours. How these loops can persist is unknown. We show that during G1-phase, mammalian cells contain acetylated cohesinSTAG1 which binds chromatin for hours, whereas cohesinSTAG2 binds chromatin for minutes. Our results indicate that CTCF and the acetyltransferase ESCO1 protect a subset of cohesinSTAG1 complexes from WAPL, thereby enabling formation of long and presumably long-lived loops, and that ESCO1, like CTCF, contributes to boundary formation in chromatin looping. Our data are consistent with a model of nested loop extrusion, in which acetylated cohesinSTAG1 forms stable loops between CTCF sites, demarcating the boundaries of more transient cohesinSTAG2 extrusion activity.
]]></description>
<dc:creator>Wutz, G.</dc:creator>
<dc:creator>St. Hilaire, B. T. G.</dc:creator>
<dc:creator>Ladurner, R.</dc:creator>
<dc:creator>Stocsits, R.</dc:creator>
<dc:creator>Nagasaka, K.</dc:creator>
<dc:creator>Pignard, B.</dc:creator>
<dc:creator>Sanborn, A.</dc:creator>
<dc:creator>Tang, W.</dc:creator>
<dc:creator>Varnai, C.</dc:creator>
<dc:creator>Ivanov, M.</dc:creator>
<dc:creator>Schoenfelder, S.</dc:creator>
<dc:creator>van der Lelij, P.</dc:creator>
<dc:creator>Huang, X.</dc:creator>
<dc:creator>Duernberger, G.</dc:creator>
<dc:creator>Roitinger, E.</dc:creator>
<dc:creator>Mechtler, K.</dc:creator>
<dc:creator>Davidson, I. F.</dc:creator>
<dc:creator>Fraser, P.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Peters, J. M.</dc:creator>
<dc:date>2019-09-23</dc:date>
<dc:identifier>doi:10.1101/779058</dc:identifier>
<dc:title><![CDATA[ESCO1 and CTCF enable formation of long chromatin loops by protecting cohesinSTAG1 from WAPL]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-09-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/867341v1?rss=1">
<title>
<![CDATA[
Chromosomal-level genome assembly of the scimitar-horned oryx: insights into diversity and demography of a species extinct in the wild 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/867341v1?rss=1
</link>
<description><![CDATA[
Captive populations provide a valuable insurance against extinctions in the wild. However, they are also vulnerable to the negative impacts of inbreeding, selection and drift. Genetic information is therefore considered a critical aspect of conservation management planning. Recent developments in sequencing technologies have the potential to improve the outcomes of management programmes; however, the transfer of these approaches to applied conservation has been slow. The scimitar-horned oryx (Oryx dammah) is a North African antelope that has been extinct in the wild since the early 1980s and is the focus of a long-term reintroduction project. To enable the selection of suitable founder individuals, facilitate post-release monitoring and improve captive breeding management, comprehensive genomic resources are required. Here, we used 10X Chromium sequencing together with Hi-C contact mapping to develop a chromosomal-level genome assembly for the species. The resulting assembly contained 29 chromosomes with a scaffold N50 of 100.4 Mb, and displayed strong chromosomal synteny with the cattle genome. Using resequencing data from six additional individuals, we demonstrated relatively high genetic diversity in the scimitar-horned oryx compared to other mammals, despite it having experienced a strong founding event in captivity. Additionally, the level of diversity across populations varied according to management strategy. Finally, we uncovered a dynamic demographic history that coincided with periods of climate variation during the Pleistocene. Overall, our study provides a clear example of how genomic data can uncover valuable insights into captive populations and contributes important resources to guide future management decisions of an endangered species.
]]></description>
<dc:creator>Humble, E.</dc:creator>
<dc:creator>Dobrynin, P.</dc:creator>
<dc:creator>Senn, H.</dc:creator>
<dc:creator>Chuven, J.</dc:creator>
<dc:creator>Scott, A. F.</dc:creator>
<dc:creator>Mohr, D. W.</dc:creator>
<dc:creator>Dudchenko, O.</dc:creator>
<dc:creator>Omer, A. D.</dc:creator>
<dc:creator>Colaric, Z.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Wildt, D.</dc:creator>
<dc:creator>Oliagi, S.</dc:creator>
<dc:creator>Tamazian, G.</dc:creator>
<dc:creator>Pukazhenthi, B.</dc:creator>
<dc:creator>Ogden, R.</dc:creator>
<dc:creator>Koepfli, K.-P.</dc:creator>
<dc:date>2019-12-08</dc:date>
<dc:identifier>doi:10.1101/867341</dc:identifier>
<dc:title><![CDATA[Chromosomal-level genome assembly of the scimitar-horned oryx: insights into diversity and demography of a species extinct in the wild]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-12-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.01.08.887828v1?rss=1">
<title>
<![CDATA[
The Gene-Rich Genome of the Scallop Pecten maximus 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.01.08.887828v1?rss=1
</link>
<description><![CDATA[
Background: The King Scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and its ubiquity means that it plays a key role in coastal ecosystems and food webs. Like other filter feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies and fisheries management.

Findings: Here we report the genome sequencing of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10x Chromium and Hi-C data, and its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb, and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species.

Conclusions: The genome assembly of Pecten maximus and its annotated gene set provide a high-quality platform for a wide range of investigations, including studies on such disparate topics as shell biomineralization, pigmentation, vision and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known as a gene conferring resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.
]]></description>
<dc:creator>Kenny, N. J.</dc:creator>
<dc:creator>McCarthy, S. A.</dc:creator>
<dc:creator>Dudchenko, O.</dc:creator>
<dc:creator>James, K.</dc:creator>
<dc:creator>Betteridge, E.</dc:creator>
<dc:creator>Corton, C.</dc:creator>
<dc:creator>Dolucan, J.</dc:creator>
<dc:creator>Mead, D.</dc:creator>
<dc:creator>Oliver, K.</dc:creator>
<dc:creator>Omer, A. D.</dc:creator>
<dc:creator>Pelan, S.</dc:creator>
<dc:creator>Ryan, Y.</dc:creator>
<dc:creator>Sims, Y.</dc:creator>
<dc:creator>Skelton, J.</dc:creator>
<dc:creator>Smith, M.</dc:creator>
<dc:creator>Torrance, J.</dc:creator>
<dc:creator>Weisz, D.</dc:creator>
<dc:creator>Wipat, A.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Howe, K.</dc:creator>
<dc:creator>Williams, S. T.</dc:creator>
<dc:date>2020-01-09</dc:date>
<dc:identifier>doi:10.1101/2020.01.08.887828</dc:identifier>
<dc:title><![CDATA[The Gene-Rich Genome of the Scallop Pecten maximus]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-01-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/762872v1?rss=1">
<title>
<![CDATA[
Chromatin is frequently unknotted at the megabase scale 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/762872v1?rss=1
</link>
<description><![CDATA[
Knots in the human genome would greatly impact diverse cellular processes ranging from transcription to gene regulation. To date, it has not been possible to directly examine the genome in vivo for the presence of knots. Recently, methods for serial fluorescent in situ hybridization have made it possible to measure the 3D position of dozens of consecutive genomic loci in vivo. However, the determination of whether genomic trajectories are knotted remains challenging, because small errors in the localization of a single locus can transform an unknotted trajectory into a highly knotted trajectory, and vice versa. Here, we use stochastic closure analysis to determine whether a genomic trajectory is knotted in the setting of experimental noise. We analyse 4727 deposited genomic trajectories of a 2 Mb long chromatin interval from chromosome 21. For 243 of these trajectories, knottedness could be reliably determined despite the possibility of localization errors. Strikingly, in each of these 243 cases, the trajectory was unknotted. We note a potential source of bias, insofar as knotted contours may be more difficult to reliably resolve. Nevertheless, our data are consistent with a model where, at the scales probed, the human genome is often free of knots.
]]></description>
<dc:creator>Goundaroulis, D.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Stasiak, A.</dc:creator>
<dc:date>2019-09-09</dc:date>
<dc:identifier>doi:10.1101/762872</dc:identifier>
<dc:title><![CDATA[Chromatin is frequently unknotted at the megabase scale]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-09-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.12.18.423551v1?rss=1">
<title>
<![CDATA[
Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.12.18.423551v1?rss=1
</link>
<description><![CDATA[
Gene activator proteins comprise distinct DNA-binding and transcriptional activation domains (ADs). Because few ADs have been described, we tested domains tiling all yeast transcription factors for activation in vivo and identified 150 ADs. By mRNA display, we showed that 73% of ADs bound the Med15 subunit of Mediator, and that binding strength was correlated with activation. AD-Mediator interaction in vitro was unaffected by a large excess of free activator protein, pointing to a dynamic mechanism of interaction. Structural modeling showed that ADs interact with Med15 without shape complementarity ("fuzzy" binding). ADs shared no sequence motifs, but mutagenesis revealed biochemical and structural constraints. Finally, a neural network trained on AD sequences accurately predicted ADs in human proteins and in other yeast proteins, including chromosomal proteins and chromatin remodeling complexes. These findings solve the longstanding enigma of AD structure and function and provide a rationale for their role in biology.
]]></description>
<dc:creator>Sanborn, A. L.</dc:creator>
<dc:creator>Yeh, B. T.</dc:creator>
<dc:creator>Feigerle, J. T.</dc:creator>
<dc:creator>Hao, C. V.</dc:creator>
<dc:creator>Townshend, R. J. L.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Dror, R. O.</dc:creator>
<dc:creator>Kornberg, R. D.</dc:creator>
<dc:date>2020-12-18</dc:date>
<dc:identifier>doi:10.1101/2020.12.18.423551</dc:identifier>
<dc:title><![CDATA[Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-12-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/816611v1?rss=1">
<title>
<![CDATA[
Cohesin depleted cells pass through mitosis and reconstitute a functional nuclear architecture 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/816611v1?rss=1
</link>
<description><![CDATA[
Cohesin plays an essential role in chromatin loop extrusion, but its impact on a compartmentalized nuclear architecture, linked to nuclear functions, is debatable. Using live-cell and super-resolved 3D microscopy, we demonstrate that cohesin depleted cells pass through an endomitosis and rebuild a single multilobulated nucleus (MLN) with chromosome territories (CTs) pervaded by interchromatin channels. CTs contain chromatin domain clusters with a zonal organization of repressed chromatin domains in the interior and transcriptionally competent domains located at the periphery. Splicing speckles are located nearby within the lining channel system. These clusters form microscopically defined, active and inactive compartments, which correspond to A/B compartments, detected with ensemble Hi-C. Functionality of MLN despite continuous absence of cohesin was demonstrated by their ability to pass through S-phase with typical spatio-temporal patterns of replication domains. Evidence for structural changes of these domains compared to controls suggests that cohesin is required for their full integrity.
]]></description>
<dc:creator>Cremer, M.</dc:creator>
<dc:creator>Brandstetter, K.</dc:creator>
<dc:creator>Maiser, A.</dc:creator>
<dc:creator>Rao, S. S.</dc:creator>
<dc:creator>Schmid, V.</dc:creator>
<dc:creator>Mitra, N.</dc:creator>
<dc:creator>Mamberti, S.</dc:creator>
<dc:creator>Klein, K. N.</dc:creator>
<dc:creator>Gilbert, D. M.</dc:creator>
<dc:creator>Leonhardt, H.</dc:creator>
<dc:creator>Cardoso, M. C.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Harz, H.</dc:creator>
<dc:creator>Cremer, T.</dc:creator>
<dc:date>2019-10-24</dc:date>
<dc:identifier>doi:10.1101/816611</dc:identifier>
<dc:title><![CDATA[Cohesin depleted cells pass through mitosis and reconstitute a functional nuclear architecture]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-10-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/205740v1?rss=1">
<title>
<![CDATA[
Juicebox.js provides a cloud-based visualization system for Hi-C data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/205740v1?rss=1
</link>
<description><![CDATA[
Contact mapping experiments such as Hi-C explore how genomes fold in 3D. Here, we introduce Juicebox.js, a cloud-based web application for exploring the resulting datasets. Like the original Juicebox application, Juicebox.js allows users to zoom in and out of such datasets using an interface similar to Google Earth. Furthermore, Juicebox.js encodes the exact state of the browser in a shareable URL. Creating a public browser for a new Hi-C dataset does not require coding and can be accomplished in under a minute.
]]></description>
<dc:creator>Robinson, J.</dc:creator>
<dc:creator>Turner, D.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Thorvaldsdottir, H.</dc:creator>
<dc:creator>Mesirov, J. P.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:date>2017-10-19</dc:date>
<dc:identifier>doi:10.1101/205740</dc:identifier>
<dc:title><![CDATA[Juicebox.js provides a cloud-based visualization system for Hi-C data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-10-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/139782v1?rss=1">
<title>
<![CDATA[
Cohesin Loss Eliminates All Loop Domains, Leading To Links Among Superenhancers And Downregulation Of Nearby Genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/139782v1?rss=1
</link>
<description><![CDATA[
The human genome folds to create thousands of intervals, called "contact domains," that exhibit enhanced contact frequency within themselves. "Loop domains" form because of tethering between two loci - almost always bound by CTCF and cohesin - lying on the same chromosome. "Compartment domains" form when genomic intervals with similar histone marks co-segregate. Here, we explore the effects of degrading cohesin. All loop domains are eliminated, but neither compartment domains nor histone marks are affected. Loci in different compartments that had been in the same loop domain become more segregated. Loss of loop domains does not lead to widespread ectopic gene activation, but does affect a significant minority of active genes. In particular, cohesin loss causes superenhancers to co-localize, forming hundreds of links within and across chromosomes, and affecting the regulation of nearby genes. Cohesin restoration quickly reverses these effects, consistent with a model where loop extrusion is rapid.
]]></description>
<dc:creator>Rao, S.</dc:creator>
<dc:creator>Huang, S.-C.</dc:creator>
<dc:creator>Glenn St. Hilaire, B.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Perez, E. M.</dc:creator>
<dc:creator>Kieffer-Kwon, K.-R.</dc:creator>
<dc:creator>Sanborn, A. L.</dc:creator>
<dc:creator>Johnstone, S. E.</dc:creator>
<dc:creator>Bochkov, I. D.</dc:creator>
<dc:creator>Huang, X.</dc:creator>
<dc:creator>Shamim, M. S.</dc:creator>
<dc:creator>Omer, A. D.</dc:creator>
<dc:creator>Bernstein, B. E.</dc:creator>
<dc:creator>Casellas, R.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:date>2017-05-18</dc:date>
<dc:identifier>doi:10.1101/139782</dc:identifier>
<dc:title><![CDATA[Cohesin Loss Eliminates All Loop Domains, Leading To Links Among Superenhancers And Downregulation Of Nearby Genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-05-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/142026v1?rss=1">
<title>
<![CDATA[
Static And Dynamic DNA Loops Form AP-1 Bound Activation Hubs During Macrophage Development 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/142026v1?rss=1
</link>
<description><![CDATA[
The three-dimensional arrangement of the human genome comprises a complex network of structural and regulatory chromatin loops important for coordinating changes in transcription during human development. To better understand the mechanisms underlying context-specific 3D chromatin structure and transcription during cellular differentiation, we generated comprehensive in situ Hi-C maps of DNA loops during human monocyte-to-macrophage differentiation. We demonstrate that dynamic looping events are regulatory rather than structural in nature and uncover widespread coordination of dynamic enhancer activity at preformed and acquired DNA loops. Enhancer-bound loop formation and enhancer-activation of preformed loops represent two distinct modes of regulation that together form multi-loop activation hubs at key macrophage genes. Activation hubs connect 3.4 enhancers per promoter and exhibit a strong enrichment for Activator Protein 1 (AP-1) binding events, suggesting multi-loop activation hubs driven by cell-type specific transcription factors may represent an important class of regulatory chromatin structures for the spatiotemporal control of transcription.

HIGHLIGHTS
- High resolution and high sensitivity of loop detection via deeply sequenced in situ Hi-C experiments during monocyte to macrophage differentiation (> 10 billion total reads).
- Multi-loop interaction communities identified surrounding key macrophage genes.
- Multi-loop communities connect dynamic enhancers through both static and newly acquired DNA loops, forming hubs of activation.
- Macrophage activation hubs are enriched for AP-1 bound long-range enhancer interactions, suggesting cell-type specific TFs drive changes in 3D structure and transcription through regulatory DNA loops.
]]></description>
<dc:creator>Phanstiel, D. H.</dc:creator>
<dc:creator>Van Bortle, K.</dc:creator>
<dc:creator>Spacek, D. V.</dc:creator>
<dc:creator>Hess, G. T.</dc:creator>
<dc:creator>Saad Shamim, M.</dc:creator>
<dc:creator>Machol, I.</dc:creator>
<dc:creator>Love, M. I.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Bassik, M. C.</dc:creator>
<dc:creator>Snyder, M. P.</dc:creator>
<dc:date>2017-05-25</dc:date>
<dc:identifier>doi:10.1101/142026</dc:identifier>
<dc:title><![CDATA[Static And Dynamic DNA Loops Form AP-1 Bound Activation Hubs During Macrophage Development]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-05-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/481283v1?rss=1">
<title>
<![CDATA[
The Hi-Culfite assay reveals relationships between chromatin contacts and DNA methylation state 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/481283v1?rss=1
</link>
<description><![CDATA[
Hi-Culfite, a protocol combining Hi-C and whole-genome bisulfite sequencing (WGBS), determines chromatin contacts and DNA methylation simultaneously. Hi-Culfite also reveals relationships that cannot be seen when the two assays are performed separately. For instance, we show that loci associated with open chromatin exhibit context-sensitive methylation: when their spatial neighbors lie in closed chromatin, they are much more likely to be methylated.
]]></description>
<dc:creator>Stamenova, E. K.</dc:creator>
<dc:creator>Durand, N.</dc:creator>
<dc:creator>Dudchenko, O.</dc:creator>
<dc:creator>Shamim, M. S.</dc:creator>
<dc:creator>Huang, S.-C.</dc:creator>
<dc:creator>Jiang, Y.</dc:creator>
<dc:creator>Bochkov, I. D.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Gnirke, A.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:date>2018-11-29</dc:date>
<dc:identifier>doi:10.1101/481283</dc:identifier>
<dc:title><![CDATA[The Hi-Culfite assay reveals relationships between chromatin contacts and DNA methylation state]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-11-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.02.19.431931v1?rss=1">
<title>
<![CDATA[
Parallel Characterization of cis-Regulatory Elements for Multiple Genes Using CRISPRpath 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.02.19.431931v1?rss=1
</link>
<description><![CDATA[
Current pooled CRISPR screens for cis-regulatory elements (CREs) can only accommodate one gene based on its expression level. Here, we describe CRISPRpath, a scalable screening strategy for characterizing, in parallel, the CREs of genes linked to the same biological pathway and converging phenotypes. We demonstrate the ability of CRISPRpath to simultaneously identify functional enhancers of six genes in the 6-thioguanine-induced DNA mismatch repair pathway using both CRISPR interference (CRISPRi) and CRISPR nuclease (CRISPRn) approaches. 60% of the identified enhancers are known promoters with distinct epigenomic features compared to other active promoters, including increased chromatin accessibility and interactivity. Furthermore, by imposing different levels of selection pressure, CRISPRpath can distinguish enhancers exerting strong impact on gene expression from those exerting weak impact. Our results offer a nuanced view of cis-regulation and demonstrate that CRISPRpath can be leveraged for understanding the complex gene regulatory program beyond transcriptional output at scale.
]]></description>
<dc:creator>Ren, X.</dc:creator>
<dc:creator>Wang, M.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Jamieson, K.</dc:creator>
<dc:creator>Zheng, L.</dc:creator>
<dc:creator>Jones, I. R.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Takagi, M. A.</dc:creator>
<dc:creator>Lee, J.</dc:creator>
<dc:creator>Maliskova, L.</dc:creator>
<dc:creator>Tam, T. W.</dc:creator>
<dc:creator>Yu, M.</dc:creator>
<dc:creator>Hu, R.</dc:creator>
<dc:creator>Lee, L.</dc:creator>
<dc:creator>Abnousi, A.</dc:creator>
<dc:creator>Li, G.</dc:creator>
<dc:creator>Li, Y.</dc:creator>
<dc:creator>Hu, M.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:creator>Wang, W.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:date>2021-02-19</dc:date>
<dc:identifier>doi:10.1101/2021.02.19.431931</dc:identifier>
<dc:title><![CDATA[Parallel Characterization of cis-Regulatory Elements for Multiple Genes Using CRISPRpath]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-02-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.12.30.424817v1?rss=1">
<title>
<![CDATA[
Chromatin Interaction Neural Network (ChINN): A machine learning-based method for predicting chromatin interactions from DNA sequences 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.12.30.424817v1?rss=1
</link>
<description><![CDATA[
Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is limited. Various computational methods have been developed to predict chromatin interactions. Most of these methods rely on large collections of ChIP-Seq/RNA-Seq/DNase-Seq datasets and predict only enhancer-promoter interactions. Some of the state-of-the-art methods have poor experimental designs, leading to exaggerated performances and misleading conclusions. Here we developed a computational method, Chromatin Interaction Neural Network (ChINN), to predict chromatin interactions between open chromatin regions by using only DNA sequences of the interacting open chromatin regions. ChINN is able to predict CTCF-, RNA polymerase II- and HiC-associated chromatin interactions between open chromatin regions. ChINN also shows good across-sample performances and captures various sequence features that are predictive of chromatin interactions. To apply our results to clinical patient data, we applied ChINN to predict chromatin interactions in 6 chronic lymphocytic leukemia (CLL) patient samples and a cohort of open chromatin data from 84 CLL samples that was previously published. Our results demonstrated extensive heterogeneity in chromatin interactions in patient samples, and one of the sources of this heterogeneity was the different subtypes of CLL.
]]></description>
<dc:creator>Cao, F.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Cai, Y.</dc:creator>
<dc:creator>Animesh, S.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Akincilar, S.</dc:creator>
<dc:creator>Loh, Y. P.</dc:creator>
<dc:creator>Chng, W. J.</dc:creator>
<dc:creator>Tergaonkar, V.</dc:creator>
<dc:creator>Kwoh, C. K.</dc:creator>
<dc:creator>Fullwood, M.</dc:creator>
<dc:date>2020-12-31</dc:date>
<dc:identifier>doi:10.1101/2020.12.30.424817</dc:identifier>
<dc:title><![CDATA[Chromatin Interaction Neural Network (ChINN): A machine learning-based method for predicting chromatin interactions from DNA sequences]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-12-31</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.04.18.047738v1?rss=1">
<title>
<![CDATA[
Three-dimensional Genome Organization Maps in Normal Haematopoietic Stem Cells and Acute Myeloid Leukemia 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.04.18.047738v1?rss=1
</link>
<description><![CDATA[
Acute Myeloid Leukemia (AML) is a highly lethal blood cancer arising due to aberrant differentiation of haematopoietic stem cells. Here we obtained 3D genome organization maps by Hi-C in the CD34+ haematopoietic stem cells from three healthy individuals and eight individuals with AML, and found that AML samples have increased loops to oncogenes compared with normal CD34+ cells. The MEIS1 oncogenic transcription factor is regulated by a Frequently Interacting Region (FIRE). This FIRE is only present in normal bone marrow samples, and four of eight AML samples. FIRE presence is associated with MEIS1 expression. CRISPR excision of a FIRE boundary led to loss of MEIS1 and reduced cell growth. Moreover, MEIS1 can bind to the promoter of HOXA9, and HOXA9 shows gain of Acute Myeloid Leukemia-specific super-enhancers that loop to the HOXA9 promoter.

Significance: We found that Acute Myeloid Leukemias have more chromatin loops to oncogenes compared with normal blood stem cells. We identified heterogeneity in chromatin interactions at oncogenes, and heterogeneity in super-enhancers that loop to oncogenes, as two key epigenetic mechanisms that underlie MEIS1 and HOXA9 oncogene expression respectively.
]]></description>
<dc:creator>Wang, B.</dc:creator>
<dc:creator>Kong, L.</dc:creator>
<dc:creator>Babu, D.</dc:creator>
<dc:creator>Choudhary, R.</dc:creator>
<dc:creator>Fam, W.</dc:creator>
<dc:creator>Tng, J. Q.</dc:creator>
<dc:creator>Goh, Y.</dc:creator>
<dc:creator>Liu, X.</dc:creator>
<dc:creator>Song, F. F.</dc:creator>
<dc:creator>Chia, P.</dc:creator>
<dc:creator>Chan, M. C.</dc:creator>
<dc:creator>An, O.</dc:creator>
<dc:creator>Tham, C. Y.</dc:creator>
<dc:creator>Benoukraf, T.</dc:creator>
<dc:creator>Yang, H.</dc:creator>
<dc:creator>Wang, W.</dc:creator>
<dc:creator>Chng, W. J.</dc:creator>
<dc:creator>Tenen, D.</dc:creator>
<dc:creator>Fullwood, M. J.</dc:creator>
<dc:date>2020-04-18</dc:date>
<dc:identifier>doi:10.1101/2020.04.18.047738</dc:identifier>
<dc:title><![CDATA[Three-dimensional Genome Organization Maps in Normal Haematopoietic Stem Cells and Acute Myeloid Leukemia]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-04-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.01.04.425344v1?rss=1">
<title>
<![CDATA[
MYC overexpression leads to increased chromatin interactions at superenhancers and c-Myc binding sites 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.01.04.425344v1?rss=1
</link>
<description><![CDATA[
The MYC oncogene encodes the c-Myc protein and is frequently dysregulated across multiple cancer cell types, making it an attractive target for cancer therapy. There have been many difficulties in targeting c-Myc, due to its complex network of regulators and the unstructured nature of its protein. Thus, we are interested in looking at the downstream cancer-specific functions of c-Myc. Overexpression of MYC leads to c-Myc binding at active enhancers, resulting in a global transcriptional amplification of active genes. However, the mechanism underlying this c-Myc enhancer invasion has not been well studied. To that end, we performed ChIP-seq, RNA-seq, 4C-seq and SIQHiC (Spike-in Quantitative Hi-C) on the U2OS osteosarcoma cell line with tetracycline-inducible MYC. MYC overexpression in U2OS cells modulated histone acetylation and increased c-Myc binding at superenhancers. SIQHiC analysis revealed increased global chromatin contact frequency, particularly at chromatin interactions connecting c-Myc binding sites. Our results suggest that c-Myc molecules are recruited to and accumulate within zones of high transcription activity, binding first at stable promoter binding sites at low expression levels, then at superenhancer binding sites when overexpressed. At the same time, the recruitment of c-Myc and other transcription factors may stabilize chromatin interactions to increase chromatin contact frequency. The accumulation of c-Myc at cancer-type specific superenhancers may then drive the expression of interacting oncogenes that each cancer is highly reliant on. By elucidating the chromatin landscape of c-Myc driven cancers, we can potentially target these chromatin interactions for cancer therapy, without affecting physiological c-Myc signaling.
]]></description>
<dc:creator>See, Y. X.</dc:creator>
<dc:creator>Chen, K.</dc:creator>
<dc:creator>Fullwood, M. J.</dc:creator>
<dc:date>2021-01-05</dc:date>
<dc:identifier>doi:10.1101/2021.01.04.425344</dc:identifier>
<dc:title><![CDATA[MYC overexpression leads to increased chromatin interactions at superenhancers and c-Myc binding sites]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-01-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.01.11.426176v1?rss=1">
<title>
<![CDATA[
Biop-C: A Method for Chromatin Interactome Analysis of Solid Cancer Needle Biopsy Samples 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.01.11.426176v1?rss=1
</link>
<description><![CDATA[
A major challenge in understanding the 3D genome organization of cancer samples is the lack of a method adapted to solid cancer needle biopsy samples. Here we developed Biop-C, a modified in situ Hi-C method, and applied it to characterize three nasopharyngeal cancer patient samples. We identified Topologically-Associated Domains (TADs), chromatin interaction loops, and Frequently Interacting Regions (FIREs) at key oncogenes in nasopharyngeal cancer from Biop-C heat maps. Our results demonstrate the utility of our Biop-C method in investigating the 3D genome organization in solid cancers, and the importance of 3D genome organization in regulating oncogenes in nasopharyngeal cancer.
]]></description>
<dc:creator>Fullwood, M.</dc:creator>
<dc:creator>Animesh, S.</dc:creator>
<dc:creator>Choudhary, R.</dc:creator>
<dc:creator>Goh, B. C.</dc:creator>
<dc:creator>Tay, J.</dc:creator>
<dc:creator>Chong, W.-Q.</dc:creator>
<dc:creator>Ng, X. Y.</dc:creator>
<dc:date>2021-01-11</dc:date>
<dc:identifier>doi:10.1101/2021.01.11.426176</dc:identifier>
<dc:title><![CDATA[Biop-C: A Method for Chromatin Interactome Analysis of Solid Cancer Needle Biopsy Samples]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-01-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.02.17.431699v1?rss=1">
<title>
<![CDATA[
A cell atlas of chromatin accessibility across 25 adult human tissues 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.02.17.431699v1?rss=1
</link>
<description><![CDATA[
Current catalogs of regulatory sequences in the human genome are still incomplete and lack cell type resolution. To profile the activity of human gene regulatory elements in diverse cell types and tissues in the human body, we applied single cell chromatin accessibility assays to 25 distinct human tissue types from multiple donors. The resulting chromatin maps comprising ~500,000 nuclei revealed the status of open chromatin for over 750,000 candidate cis-regulatory elements (cCREs) in 54 distinct cell types. We further delineated cell type-specific and tissue-context dependent gene regulatory programs, and developmental stage specificity by comparing with a recent human fetal chromatin accessibility atlas. We finally used these chromatin maps to interpret the noncoding variants associated with complex human traits and diseases. This rich resource provides a foundation for the analysis of gene regulatory programs in human cell types across tissues and organ systems.
]]></description>
<dc:creator>Zhang, K.</dc:creator>
<dc:creator>Hocker, J. D.</dc:creator>
<dc:creator>Miller, M.</dc:creator>
<dc:creator>Hou, X.</dc:creator>
<dc:creator>Chiou, J.</dc:creator>
<dc:creator>Poirion, O. B.</dc:creator>
<dc:creator>Qiu, Y.</dc:creator>
<dc:creator>Li, Y. E.</dc:creator>
<dc:creator>Gaulton, K. J.</dc:creator>
<dc:creator>Wang, A.</dc:creator>
<dc:creator>Preissl, S.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:date>2021-02-17</dc:date>
<dc:identifier>doi:10.1101/2021.02.17.431699</dc:identifier>
<dc:title><![CDATA[A cell atlas of chromatin accessibility across 25 adult human tissues]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-02-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.02.04.429675v1?rss=1">
<title>
<![CDATA[
Discovery and Functional Characterization of Pro-growth Enhancers in Human Cancer Cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.02.04.429675v1?rss=1
</link>
<description><![CDATA[
Precision medicine depends critically on developing treatment strategies that can selectively target cancer cells with minimal adverse effects. Identifying unique transcriptional regulators of oncogenic signaling, and targeting cancer-cell-specific enhancers that may be active only in specific tumor cell lineages, could provide the necessary high specificity, but a scarcity of functionally validated enhancers in cancer cells presents a significant hurdle to this strategy. We address this limitation by carrying out large-scale functional screens for pro-growth enhancers using highly multiplexed CRISPR-based perturbation and sequencing in multiple cancer cell lines. We used this strategy to identify 488 pro-growth enhancers in a colorectal cancer cell line and 22 functional enhancers for the MYC and MYB key oncogenes in an additional nine cancer cell lines. The majority of pro-growth enhancers are accessible and presumably active only in cancer cells but not in normal tissues, and are enriched for elements associated with poor prognosis in colorectal cancer. We further identify master transcriptional regulators and demonstrate that the cancer pro-growth enhancers are modulated by lineage-specific transcription factors acting downstream of growth signaling pathways. Our results uncover context-specific, potentially actionable pro-growth enhancers from cancer cells, yielding insight into altered oncogenic transcription and revealing potential therapeutic targets for cancer treatment.
]]></description>
<dc:creator>Chen, P.</dc:creator>
<dc:creator>Fiaux, P.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Zhang, K.</dc:creator>
<dc:creator>Kubo, N.</dc:creator>
<dc:creator>Jiang, S.</dc:creator>
<dc:creator>Hu, R.</dc:creator>
<dc:creator>Wu, S.</dc:creator>
<dc:creator>Wang, M.</dc:creator>
<dc:creator>Wang, W.</dc:creator>
<dc:creator>McVicker, G. P.</dc:creator>
<dc:creator>Mischel, P.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:date>2021-02-05</dc:date>
<dc:identifier>doi:10.1101/2021.02.04.429675</dc:identifier>
<dc:title><![CDATA[Discovery and Functional Characterization of Pro-growth Enhancers in Human Cancer Cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-02-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.08.434430v1?rss=1">
<title>
<![CDATA[
Transgenic mice for in vivo epigenome editing with CRISPR-based systems 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.08.434430v1?rss=1
</link>
<description><![CDATA[
The discovery, characterization, and adaptation of the RNA-guided clustered regularly interspaced short palindromic repeat (CRISPR)-Cas9 system have greatly increased the ease with which genome and epigenome editing can be performed. Fusion of chromatin-modifying domains to the nuclease-deactivated form of Cas9 (dCas9) has enabled targeted gene activation or repression in both cultured cells and in vivo in animal models. However, delivery of the large dCas9 fusion proteins to target cell types and tissues is an obstacle to widespread adoption of these tools for in vivo studies. Here we describe the generation and validation of two conditional transgenic mouse lines for targeted gene regulation, Rosa26:LSL-dCas9-p300 for gene activation and Rosa26:LSL-dCas9-KRAB for gene repression. Using the dCas9p300 and dCas9KRAB transgenic mice we demonstrate activation or repression of genes in both the brain and liver in vivo, and T cells and fibroblasts ex vivo. We show gene regulation and targeted epigenetic modification with gRNAs targeting either transcriptional start sites (TSS) or distal enhancer elements, as well as corresponding changes to downstream phenotypes. These mouse lines are convenient and valuable tools for facile, temporally controlled, and tissue-restricted epigenome editing and manipulation of gene expression in vivo.
]]></description>
<dc:creator>Gemberling, M.</dc:creator>
<dc:creator>Siklenka, K.</dc:creator>
<dc:creator>Rodriguez, E.</dc:creator>
<dc:creator>Eisinger, K.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Liu, F.</dc:creator>
<dc:creator>Kantor, A.</dc:creator>
<dc:creator>Li, L.</dc:creator>
<dc:creator>Cigliola, V.</dc:creator>
<dc:creator>Hazlett, M.</dc:creator>
<dc:creator>Williams, C.</dc:creator>
<dc:creator>Bartelt, L.</dc:creator>
<dc:creator>Bodle, J.</dc:creator>
<dc:creator>Daniels, H.</dc:creator>
<dc:creator>Rouse, C.</dc:creator>
<dc:creator>Hilton, I.</dc:creator>
<dc:creator>Madigan, V.</dc:creator>
<dc:creator>Asokan, A.</dc:creator>
<dc:creator>Ciofani, M.</dc:creator>
<dc:creator>Poss, K.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>West, A.</dc:creator>
<dc:creator>Gersbach, C.</dc:creator>
<dc:date>2021-03-08</dc:date>
<dc:identifier>doi:10.1101/2021.03.08.434430</dc:identifier>
<dc:title><![CDATA[Transgenic mice for in vivo epigenome editing with CRISPR-based systems]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-03-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.08.434470v1?rss=1">
<title>
<![CDATA[
Genome-wide annotation of gene regulatory elements linked to cell fitness 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.08.434470v1?rss=1
</link>
<description><![CDATA[
Noncoding regulatory elements control gene expression and thus govern nearly all biological processes. Epigenomic profiling assays have identified millions of putative regulatory elements, but systematically determining the function of those regulatory elements remains a substantial challenge. Here we adapt CRISPR screening by epigenetic repression to screen all 111,619 putative non-coding regulatory elements defined by open chromatin sites in human K562 leukemia cells for their role in regulating essential cellular processes and proliferation. In an initial screen containing 1,084,704 gRNAs, we implemented an analysis framework to quantify perturbation effects, and nominate 1,108 regulatory elements that strongly impact cell fitness. We tested 8,845 of the primary screen elements in a secondary screen, evaluated their cell-type specificity in a second cancer cell line, and then used a single-cell RNA-seq CRISPR screen to discover 63 connections between distal regulatory elements and target genes. This comprehensive and quantitative genome-wide map of essential gene regulatory elements presents a framework for extensive characterization of noncoding regulatory elements that drive complex cell phenotypes and for prioritizing non-coding genetic variants that may contribute to common traits and disease risk.
]]></description>
<dc:creator>Klann, T.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Ettyreddy, A.</dc:creator>
<dc:creator>Rickels, R.</dc:creator>
<dc:creator>Bryois, J.</dc:creator>
<dc:creator>Jiang, S.</dc:creator>
<dc:creator>Adkar, S.</dc:creator>
<dc:creator>Iglesias, N.</dc:creator>
<dc:creator>Sullivan, P.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Gersbach, C.</dc:creator>
<dc:date>2021-03-09</dc:date>
<dc:identifier>doi:10.1101/2021.03.08.434470</dc:identifier>
<dc:title><![CDATA[Genome-wide annotation of gene regulatory elements linked to cell fitness]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-03-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.19.436212v1?rss=1">
<title>
<![CDATA[
Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.19.436212v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) provide a powerful means to identify loci and genes contributing to disease, but in many cases the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. Here, we introduce sc-linker, a framework for integrating single-cell RNA-seq (scRNA-seq), epigenomic maps and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. We analyzed 1.6 million scRNA-seq profiles from 209 individuals spanning 11 tissue types and 6 disease conditions, and constructed gene programs capturing cell types, disease progression, and cellular processes both within and across cell types. We evaluated these gene programs for disease enrichment by transforming them to SNP annotations with tissue-specific epigenomic maps and computing enrichment scores across 60 diseases and complex traits (average N=297K). Cell type, disease progression, and cellular process programs captured distinct heritability signals even within the same cell type, as we show in multiple complex diseases that affect the brain (Alzheimer's disease, multiple sclerosis), colon (ulcerative colitis) and lung (asthma, idiopathic pulmonary fibrosis, severe COVID-19). The inferred disease enrichments recapitulated known biology and highlighted novel cell-disease relationships, including GABAergic neurons in major depressive disorder (MDD), a disease progression M cell program in ulcerative colitis, and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease progression immune cell type programs were associated, whereas for epithelial cells, disease progression programs were most prominent, perhaps suggesting a role in disease progression over initiation.
Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
]]></description>
<dc:creator>Jagadeesh, K. A.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Montoro, D. T.</dc:creator>
<dc:creator>Gazal, S.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Xavier, R. J.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:creator>Regev, A.</dc:creator>
<dc:date>2021-03-19</dc:date>
<dc:identifier>doi:10.1101/2021.03.19.436212</dc:identifier>
<dc:title><![CDATA[Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-03-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.01.13.424697v1?rss=1">
<title>
<![CDATA[
Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.01.13.424697v1?rss=1
</link>
<description><![CDATA[
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the Massively Parallel Reporter Assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that low-complexity sequences predominantly explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base-pair resolution, including an AU-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14, and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
]]></description>
<dc:creator>Griesemer, D.</dc:creator>
<dc:creator>Xue, J. R.</dc:creator>
<dc:creator>Reilly, S. K.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Kukreja, K.</dc:creator>
<dc:creator>Davis, J.</dc:creator>
<dc:creator>Kanai, M.</dc:creator>
<dc:creator>Yang, D. K.</dc:creator>
<dc:creator>Montgomery, S. B.</dc:creator>
<dc:creator>Novina, C. D.</dc:creator>
<dc:creator>Tewhey, R.</dc:creator>
<dc:creator>Sabeti, P. C.</dc:creator>
<dc:date>2021-01-13</dc:date>
<dc:identifier>doi:10.1101/2021.01.13.424697</dc:identifier>
<dc:title><![CDATA[Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-01-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.04.26.441442v1?rss=1">
<title>
<![CDATA[
Multi-tissue integrative analysis of personal epigenomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.04.26.441442v1?rss=1
</link>
<description><![CDATA[
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of personal epigenomes, for ~25 tissues and >10 assays in four donors (>1500 open-access functional genomic and proteomic datasets, in total). Each dataset is mapped to a matched, diploid personal genome, which has long-read phasing and structural variants. The mappings enable us to identify >1 million loci with allele-specific behavior. These loci exhibit coordinated epigenetic activity along haplotypes and less conservation than matched, non-allele-specific loci, in a fashion broadly paralleling tissue-specificity. Surprisingly, they can be accurately modelled just based on local nucleotide-sequence context. Combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci and enables models for transferring known eQTLs to difficult-to-profile tissues. Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
]]></description>
<dc:creator>Rozowsky, J.</dc:creator>
<dc:creator>Drenkow, J.</dc:creator>
<dc:creator>Yang, Y.</dc:creator>
<dc:creator>Gursoy, G.</dc:creator>
<dc:creator>Galeev, T.</dc:creator>
<dc:creator>Borsari, B.</dc:creator>
<dc:creator>Epstein, C.</dc:creator>
<dc:creator>Xiong, K.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Gao, J.</dc:creator>
<dc:creator>Yu, K.</dc:creator>
<dc:creator>Berthel, A.</dc:creator>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Navarro, F.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Sun, M.</dc:creator>
<dc:creator>Wright, J.</dc:creator>
<dc:creator>Chang, J.</dc:creator>
<dc:creator>Cameron, C.</dc:creator>
<dc:creator>Shoresh, N.</dc:creator>
<dc:creator>Gaskell, E.</dc:creator>
<dc:creator>Adrian, J.</dc:creator>
<dc:creator>Aganezov, S.</dc:creator>
<dc:creator>Balderrama-Gutierrez, G.</dc:creator>
<dc:creator>Banskota, S.</dc:creator>
<dc:creator>Corona, G.</dc:creator>
<dc:creator>Chee, S.</dc:creator>
<dc:creator>Chhetri, S.</dc:creator>
<dc:creator>Martins, G.</dc:creator>
<dc:creator>Danyko, C.</dc:creator>
<dc:creator>Davis, C.</dc:creator>
<dc:creator>Farid, D.</dc:creator>
<dc:creator>Farrell, N.</dc:creator>
<dc:creator>Gabdank, I.</dc:creator>
<dc:creator>Gofin, Y.</dc:creator>
<dc:creator>Gorkin, D.</dc:creator>
<dc:creator>Gu, M.</dc:creator>
<dc:creator>Hecht, V.</dc:creator>
<dc:creator>Hitz, B.</dc:creator>
<dc:creator>Issner, R.</dc:creator>
<dc:creator>Kirsche, M.</dc:creator>
<dc:creator>Kong, X.</dc:creator>
<dc:creator>Lam, B.</dc:creator>
<dc:creator>Li, S.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Li, T.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Lin, K.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Mackiewicz, M.</dc:creator>
<dc:creator>Moore, J.</dc:creator>
<dc:creator>Mudge, J.</dc:creator>
<dc:creator>Nel</dc:creator>
<dc:date>2021-04-26</dc:date>
<dc:identifier>doi:10.1101/2021.04.26.441442</dc:identifier>
<dc:title><![CDATA[Multi-tissue integrative analysis of personal epigenomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-04-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.04.26.441522v1?rss=1">
<title>
<![CDATA[
Mapping and modeling the genomic basis of differential RNA isoform expression at single-cell resolution with LR-Split-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.04.26.441522v1?rss=1
</link>
<description><![CDATA[
Alternative RNA isoforms are defined by promoter choice, alternative splicing, and polyA site selection. Although differential isoform expression is known to play a large regulatory role in eukaryotes, it has proved challenging to study with standard short-read RNA-seq because of the uncertainties it leaves about the full-length structure and precise termini of transcripts. The rise in throughput and quality of long-read sequencing now makes it possible, in principle, to unambiguously identify most transcript isoforms from beginning to end. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here, we develop and characterize long-read Split-seq (LR-Split-seq), which uses a combinatorial barcoding-based method for sequencing single cells and nuclei with long reads. We show that LR-Split-seq can associate isoforms with cell types with relative economy and design flexibility. We characterize LR-Split-seq for whole cells and nuclei by using the well-studied mouse C2C12 system in which mononucleated myoblast cells differentiate and fuse into multinucleated myotubes. We show that the overall results are reproducible when comparing long- and short-read data from the same cell or nucleus. We find substantial evidence of differential isoform expression during differentiation including alternative transcription start site (TSS) usage. We integrate the resulting isoform expression dynamics with snATAC-seq chromatin accessibility to validate TSS-driven isoform choices. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells that can be further quantified with companion deep short-read scRNA-seq from the same cell populations.
]]></description>
<dc:creator>Rebboah, E.</dc:creator>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Williams, K.</dc:creator>
<dc:creator>Balderrama-Gutierrez, G.</dc:creator>
<dc:creator>McGill, C.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Rodriguez, I. M.</dc:creator>
<dc:creator>Liang, H.</dc:creator>
<dc:creator>Wold, B. J.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2021-04-27</dc:date>
<dc:identifier>doi:10.1101/2021.04.26.441522</dc:identifier>
<dc:title><![CDATA[Mapping and modeling the genomic basis of differential RNA isoform expression at single-cell resolution with LR-Split-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-04-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.04.19.440534v1?rss=1">
<title>
<![CDATA[
Axes of inter-sample variability among transcriptional neighborhoods reveal disease associated cell states in single-cell data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.04.19.440534v1?rss=1
</link>
<description><![CDATA[
As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes like clinical phenotypes. Current statistical approaches typically map cells to cell-type clusters and examine sample differences through that lens alone. Here we present covarying neighborhood analysis (CNA), an unbiased method to identify cell populations of interest with greater flexibility and granularity. CNA characterizes dominant axes of variation across samples by identifying groups of very small regions in transcriptional space--termed neighborhoods--that covary in abundance across samples, suggesting shared function or regulation. CNA can then rigorously test for associations between any sample-level attribute and the abundances of these covarying neighborhood groups. We show in simulation that CNA enables more powerful and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, redefines monocyte populations expanded in sepsis, and identifies a previously undiscovered T-cell population associated with progression to active tuberculosis.
]]></description>
<dc:creator>Reshef, Y. A.</dc:creator>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Murray, M. B.</dc:creator>
<dc:creator>Moody, D. B.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2021-04-20</dc:date>
<dc:identifier>doi:10.1101/2021.04.19.440534</dc:identifier>
<dc:title><![CDATA[Axes of inter-sample variability among transcriptional neighborhoods reveal disease associated cell states in single-cell data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-04-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.05.06.443037v1?rss=1">
<title>
<![CDATA[
Topologically Associating Domain Boundaries are Commonly Required for Normal Genome Function 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.05.06.443037v1?rss=1
</link>
<description><![CDATA[
Topologically associating domain (TAD) boundaries are thought to partition the genome into distinct regulatory territories. Anecdotal evidence suggests that their disruption may interfere with normal gene expression and cause disease phenotypes [1-3], but the overall extent to which this occurs remains unknown. Here we show that TAD boundary deletions commonly disrupt normal genome function in vivo. We used CRISPR genome editing in mice to individually delete eight TAD boundaries (11-80 kb in size) from the genome. All deletions examined resulted in at least one detectable molecular or organismal phenotype, which included altered chromatin interactions or gene expression, reduced viability, and anatomical phenotypes. For 5 of 8 (62%) loci examined, boundary deletions were associated with increased embryonic lethality or other developmental phenotypes. For example, a TAD boundary deletion near Smad3/Smad6 caused complete embryonic lethality, while a deletion near Tbx5/Lhx5 resulted in a severe lung malformation. Our findings demonstrate the importance of TAD boundary sequences for in vivo genome function and suggest that noncoding deletions affecting TAD boundaries should be carefully considered for potential pathogenicity in clinical genetics screening.
]]></description>
<dc:creator>Rajderkar, S.</dc:creator>
<dc:creator>Barozzi, I.</dc:creator>
<dc:creator>Zhu, Y.</dc:creator>
<dc:creator>Hu, R.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Fukuda-Yuzawa, Y.</dc:creator>
<dc:creator>Kelman, G.</dc:creator>
<dc:creator>Akeza, A.</dc:creator>
<dc:creator>Blow, M. J.</dc:creator>
<dc:creator>Pham, Q.</dc:creator>
<dc:creator>Harrington, A. N.</dc:creator>
<dc:creator>Godoy, J.</dc:creator>
<dc:creator>Meky, E. M.</dc:creator>
<dc:creator>von Maydell, K.</dc:creator>
<dc:creator>Novak, C. S.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Afzal, V.</dc:creator>
<dc:creator>Tran, S.</dc:creator>
<dc:creator>Talkowski, M. E.</dc:creator>
<dc:creator>Lloyd, K. C. K.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:date>2021-05-07</dc:date>
<dc:identifier>doi:10.1101/2021.05.06.443037</dc:identifier>
<dc:title><![CDATA[Topologically Associating Domain Boundaries are Commonly Required for Normal Genome Function]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-05-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.02.364265v1?rss=1">
<title>
<![CDATA[
High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.02.364265v1?rss=1
</link>
<description><![CDATA[
Spear-ATAC is a modified droplet-based single-cell ATAC-seq (scATAC-seq) protocol that enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at a time. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations revealed the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles, demonstrating a high-throughput method for perturbing and evaluating dynamic single-cell epigenetic states.
]]></description>
<dc:creator>Pierce, S. E.</dc:creator>
<dc:creator>Granja, J. M.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:date>2020-11-02</dc:date>
<dc:identifier>doi:10.1101/2020.11.02.364265</dc:identifier>
<dc:title><![CDATA[High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.02.25.430130v1?rss=1">
<title>
<![CDATA[
A single-cell and spatial atlas of autopsy tissues reveals pathology and cellular targets of SARS-CoV-2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.02.25.430130v1?rss=1
</link>
<description><![CDATA[
The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients' demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of: alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and putative TP63+ intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs.
We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies.
]]></description>
<dc:creator>Delorey, T. M.</dc:creator>
<dc:creator>Ziegler, C. G. K.</dc:creator>
<dc:creator>Heimberg, G.</dc:creator>
<dc:creator>Normand, R.</dc:creator>
<dc:creator>Yang, Y.</dc:creator>
<dc:creator>Segerstolpe, A.</dc:creator>
<dc:creator>Abbondanza, D.</dc:creator>
<dc:creator>Fleming, S. J.</dc:creator>
<dc:creator>Subramanian, A.</dc:creator>
<dc:creator>Montoro, D. T.</dc:creator>
<dc:creator>Jagadeesh, K. A.</dc:creator>
<dc:creator>Dey, K.</dc:creator>
<dc:creator>Sen, P.</dc:creator>
<dc:creator>Slyper, M.</dc:creator>
<dc:creator>Pita-Juarez, Y.</dc:creator>
<dc:creator>Phillips, D.</dc:creator>
<dc:creator>Bloom-Ackermann, Z.</dc:creator>
<dc:creator>Barkas, N.</dc:creator>
<dc:creator>Ganna, A.</dc:creator>
<dc:creator>Gomez, J.</dc:creator>
<dc:creator>Normandin, E.</dc:creator>
<dc:creator>Naderi, P.</dc:creator>
<dc:creator>Popov, Y. V.</dc:creator>
<dc:creator>Raju, S. S.</dc:creator>
<dc:creator>Niezen, S.</dc:creator>
<dc:creator>Tsai, L. T.-Y.</dc:creator>
<dc:creator>Siddle, K. J.</dc:creator>
<dc:creator>Sud, M.</dc:creator>
<dc:creator>Tran, V. M.</dc:creator>
<dc:creator>Karuthedath Vellarikkal, S.</dc:creator>
<dc:creator>Amir-Zilberstein, L.</dc:creator>
<dc:creator>Atri, D. S.</dc:creator>
<dc:creator>Beechem, J. M.</dc:creator>
<dc:creator>Brook, O. R.</dc:creator>
<dc:creator>Chen, J.</dc:creator>
<dc:creator>Divakar, P.</dc:creator>
<dc:creator>Dorceus, P.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Essene, A.</dc:creator>
<dc:creator>Fitzgerald, D. M.</dc:creator>
<dc:creator>Fropf, R.</dc:creator>
<dc:creator>Gaz</dc:creator>
<dc:date>2021-02-25</dc:date>
<dc:identifier>doi:10.1101/2021.02.25.430130</dc:identifier>
<dc:title><![CDATA[A single-cell and spatial atlas of autopsy tissues reveals pathology and cellular targets of SARS-CoV-2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-02-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.05.12.443890v1?rss=1">
<title>
<![CDATA[
A catalog of transcription start sites across 115 human tissue and cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.05.12.443890v1?rss=1
</link>
<description><![CDATA[
Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated, nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks were primarily proximal to GENCODE-annotated TSSs and were concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations were supported by epigenomic and other transcriptomic datasets. To demonstrate the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI GWAS catalog and identified new candidate GWAS genes. Overall, our work demonstrates the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.
]]></description>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Zhang, X.-O.</dc:creator>
<dc:creator>Elhajjajy, S. I.</dc:creator>
<dc:creator>Fan, K.</dc:creator>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:date>2021-05-13</dc:date>
<dc:identifier>doi:10.1101/2021.05.12.443890</dc:identifier>
<dc:title><![CDATA[A catalog of transcription start sites across 115 human tissue and cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-05-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.06.02.446833v1?rss=1">
<title>
<![CDATA[
Systematic comparison of experimental assays and analytical pipelines for identification of active enhancers genome-wide 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.06.02.446833v1?rss=1
</link>
<description><![CDATA[
Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks [1,2]; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. In this study, we compared 13 genome-wide RNA sequencing assays in K562 cells and showed that the nuclear run-on followed by cap-selection assay (namely, GRO/PRO-cap) has significant advantages in eRNA detection and active enhancer identification. We also introduced a new analytical tool, Peak Identifier for Nascent-Transcript Sequencing (PINTS), to identify active promoters and enhancers genome-wide and pinpoint the precise location of the 5' transcription start sites (TSSs) within these regulatory elements. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected eRNA TSSs available in 120 cell and tissue types. To facilitate the exploration and prioritization of these enhancer candidates, we also built a user-friendly web server (https://pints.yulab.org) for the compendium with various additional genomic and epigenomic annotations. With the knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the road for selection and characterization of their functions in a time-, labor-, and cost-effective manner in the future.
]]></description>
<dc:creator>Yao, L.</dc:creator>
<dc:creator>Liang, J.</dc:creator>
<dc:creator>Ozer, A.</dc:creator>
<dc:creator>Leung, A. K.-Y.</dc:creator>
<dc:creator>ENCODE Consortium</dc:creator>
<dc:creator>Lis, J. T.</dc:creator>
<dc:creator>Yu, H.</dc:creator>
<dc:date>2021-06-03</dc:date>
<dc:identifier>doi:10.1101/2021.06.02.446833</dc:identifier>
<dc:title><![CDATA[Systematic comparison of experimental assays and analytical pipelines for identification of active enhancers genome-wide]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-06-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.06.01.444518v1?rss=1">
<title>
<![CDATA[
Glucocorticoid receptor collaborates with pioneer factors and AP-1 to execute genome-wide regulation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.06.01.444518v1?rss=1
</link>
<description><![CDATA[
The glucocorticoid receptor (GR) regulates transcription through binding to specific DNA motifs, particularly at enhancers. While the motif to which it binds is constant across cell types, GR has cell type-specific binding at genomic loci, resulting in regulation of different genes. The presence of other bound transcription factors (TFs) is hypothesized to strongly influence where GR binds. Here, we addressed the roles of other TFs in the glucocorticoid response by comparing changes in GR binding and nascent transcription at promoters and distal candidate cis-regulatory elements (CCREs) in two distinct human cancer cell types. We found that after glucocorticoid treatment, GR binds to thousands of genomic loci that are primarily outside of promoter regions and are potentially enhancers. The majority of these GR binding sites are cell-type specific, and they are associated with pioneer factor binding. A small fraction of GR occupied regions (GORs) displayed increased bidirectional nascent transcription, which is a characteristic of many active enhancers, after glucocorticoid treatment. Non-promoter GORs with increased transcription were specifically enriched for AP-1 binding prior to glucocorticoid treatment. These results support a model of transcriptional regulation in which multiple classes of TFs are required. The pioneer factors increase chromatin accessibility, facilitating the binding of GR and additional factors. AP-1 binding poises a fraction of accessible sites to be rapidly transcribed upon glucocorticoid-induced GR binding. The coordinated activity of multiple TFs then results in cell type-specific changes in gene expression. We anticipate that many models of inducible gene expression also require multiple distinct TFs that act at multiple steps of transcriptional regulation.
]]></description>
<dc:creator>Wissink, E. M.</dc:creator>
<dc:creator>Martinez, D. M.</dc:creator>
<dc:creator>Ehmsen, K. T.</dc:creator>
<dc:creator>Yamamoto, K. R.</dc:creator>
<dc:creator>Lis, J. T.</dc:creator>
<dc:date>2021-06-01</dc:date>
<dc:identifier>doi:10.1101/2021.06.01.444518</dc:identifier>
<dc:title><![CDATA[Glucocorticoid receptor collaborates with pioneer factors and AP-1 to execute genome-wide regulation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-06-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.10.11.463518v1?rss=1">
<title>
<![CDATA[
Factorbook: an Updated Catalog of Transcription Factor Motifs and Candidate Regulatory Motif Sites 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.10.11.463518v1?rss=1
</link>
<description><![CDATA[
The human genome contains roughly 1,600 transcription factors (TFs) (1), DNA-binding proteins recognizing characteristic sequence motifs to exert regulatory effects on gene expression. The binding specificities of these factors have been profiled both in vitro, using techniques such as HT-SELEX (2), and in vivo, using techniques including ChIP-seq (3, 4). We previously developed Factorbook, a TF-centric database of annotations, motifs, and integrative analyses based on ChIP-seq data from Phase II of the ENCODE Project. Here we present an update to Factorbook which significantly expands the breadth of cell type and TF coverage. The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and HT-SELEX experiments; this motif catalog is integrated with the ENCODE registry of candidate cis-regulatory elements to annotate a comprehensive collection of genome-wide candidate TF binding sites. The database also offers novel tools for applying the motif models within machine learning frameworks and using these models for integrative analysis, including annotation of variants and disease and trait heritability. We will continue to expand the resource as ENCODE Phase IV data are released.
]]></description>
<dc:creator>Pratt, H. E.</dc:creator>
<dc:creator>Andrews, G. R.</dc:creator>
<dc:creator>Phalke, N.</dc:creator>
<dc:creator>Purcaro, M. J.</dc:creator>
<dc:creator>van der Velde, A. G.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:date>2021-10-12</dc:date>
<dc:identifier>doi:10.1101/2021.10.11.463518</dc:identifier>
<dc:title><![CDATA[Factorbook: an Updated Catalog of Transcription Factor Motifs and Candidate Regulatory Motif Sites]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-10-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.01.475239v1?rss=1">
<title>
<![CDATA[
Widespread contribution of transposable elements to the rewiring of mammalian 3D genomes and gene regulation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.01.475239v1?rss=1
</link>
<description><![CDATA[
Transposable elements (TEs) are major contributors of genetic material in mammalian genomes. These often include binding sites for architectural proteins, including the multifarious master protein, CTCF. These TE-derived architectural protein binding sites shape the 3D genome by creating loops, domains, and compartment borders as well as RNA-DNA chromatin interactions, all of which play a role in the compact packaging of DNA in the nucleus and have the potential to facilitate regulatory function.

In this study, we explore the widespread contribution of TEs to mammalian 3D genomes by quantifying the extent to which they give rise to loops and domain border differences across various cell types and species using a variety of 3D genome mapping technologies. We show that specific (sub-)families of TEs have significantly contributed to lineage-specific 3D chromatin structures in specific mammals. In many cases, these loops have the potential to facilitate interaction between distant cis-regulatory elements and target genes, and domains have the potential to segregate chromatin state to impact gene expression in a lineage-specific and cell-type-specific manner. Backing our extensive conformation study cataloguing and computational analyses, we perform experimental validation using CRISPR-Cas9 to delete one such candidate TE and show disruption of species-specific 3D chromatin structure.

Taken together, we comprehensively quantify and selectively validate our finding that TEs contribute significantly to 3D genome organization and continuously shape it to affect gene regulation during the course of mammalian evolution over deep time.
]]></description>
<dc:creator>Choudhary, M. N.</dc:creator>
<dc:creator>Quaid, K.</dc:creator>
<dc:creator>Xing, X.</dc:creator>
<dc:creator>Schmidt, H.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-02-03</dc:date>
<dc:identifier>doi:10.1101/2022.02.01.475239</dc:identifier>
<dc:title><![CDATA[Widespread contribution of transposable elements to the rewiring of mammalian 3D genomes and gene regulation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-02-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.05.20.445067v1?rss=1">
<title>
<![CDATA[
Leveraging single-cell ATAC-seq to identify disease-critical fetal and adult brain cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.05.20.445067v1?rss=1
</link>
<description><![CDATA[
Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and early work on integrating GWAS with scRNA-seq has shown promise, but work on integrating GWAS with scATAC-seq has been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases and traits (average N = 298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (resp. adult) brain cell types for 22 (resp. 23) of 28 traits using scATAC-seq data, and for 8 (resp. 17) of 28 traits using scRNA-seq data. Notable findings using scATAC-seq data included highly significant enrichments of fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases and traits, and inform future analyses of other diseases/traits.
]]></description>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Jagadeesh, K.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Shen, A. Z.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2021-05-21</dc:date>
<dc:identifier>doi:10.1101/2021.05.20.445067</dc:identifier>
<dc:title><![CDATA[Leveraging single-cell ATAC-seq to identify disease-critical fetal and adult brain cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-05-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.10.23.462170v1?rss=1">
<title>
<![CDATA[
Compatibility logic of human enhancer and promoter sequences 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.10.23.462170v1?rss=1
</link>
<description><![CDATA[
Gene regulation in the human genome is controlled by distal enhancers that activate specific nearby promoters. One model for the specificity of enhancer-promoter regulation is that different promoters might have sequence-encoded preferences for distinct classes of enhancers, for example mediated by interacting sets of transcription factors or cofactors. This "biochemical compatibility" model has been supported by observations at individual human promoters and by genome-wide measurements in Drosophila. However, the degree to which human enhancers and promoters are intrinsically compatible or specific has not been systematically measured, and how their activities combine to control RNA expression remains unclear. To address these questions, we designed a high-throughput reporter assay called enhancer x promoter (ExP) STARR-seq and applied it to examine the combinatorial compatibilities of 1,000 enhancer and 1,000 promoter sequences in human K562 cells. We identify a simple logic for enhancer-promoter compatibility: virtually all enhancers activated all promoters by similar amounts, and intrinsic enhancer and promoter activities combine multiplicatively to determine RNA output (R² = 0.82). In addition, two classes of enhancers and promoters showed subtle preferential effects. Promoters of housekeeping genes contained built-in activating sequences, corresponding to motifs for factors such as GABPA and YY1, that correlated with both stronger autonomous promoter activity and enhancer activity, and weaker responsiveness to distal enhancers. Promoters of context-specific genes lacked these motifs and showed stronger responsiveness to enhancers. Together, this systematic assessment of enhancer-promoter compatibility suggests a multiplicative model tuned by enhancer and promoter class to control gene transcription in the human genome.
]]></description>
<dc:creator>Bergman, D. T.</dc:creator>
<dc:creator>Jones, T. R.</dc:creator>
<dc:creator>Liu, V.</dc:creator>
<dc:creator>Siraj, L.</dc:creator>
<dc:creator>Kang, H. Y.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Nguyen, T. H.</dc:creator>
<dc:creator>Grossman, S. R.</dc:creator>
<dc:creator>Fulco, C. P.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2021-10-24</dc:date>
<dc:identifier>doi:10.1101/2021.10.23.462170</dc:identifier>
<dc:title><![CDATA[Compatibility logic of human enhancer and promoter sequences]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-10-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.31.437978v1?rss=1">
<title>
<![CDATA[
Chromatin interaction aware gene regulatory modeling with graph attention networks 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.31.437978v1?rss=1
</link>
<description><![CDATA[
Linking distal enhancers to genes and modeling their impact on target gene expression are longstanding unresolved problems in regulatory genomics and critical for interpreting non-coding genetic variation. Here we present a new deep learning approach called GraphReg that exploits 3D interactions from chromosome conformation capture assays in order to predict gene expression from 1D epigenomic data or genomic DNA sequence. By using graph attention networks to exploit the connectivity of distal elements up to 2Mb away in the genome, GraphReg more faithfully models gene regulation and more accurately predicts gene expression levels than state-of-the-art deep learning methods for this task. Feature attribution used with GraphReg accurately identifies functional enhancers of genes, as validated by CRISPRi-FlowFISH and TAP-seq assays, outperforming both CNNs and the recently proposed Activity-by-Contact model. Sequence-based GraphReg also accurately predicts direct transcription factor (TF) targets as validated by CRISPRi TF knockout experiments via in silico ablation of TF binding motifs. GraphReg therefore represents an important advance in modeling the regulatory impact of epigenomic and sequence elements.
]]></description>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Sahin, M.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:date>2021-04-02</dc:date>
<dc:identifier>doi:10.1101/2021.03.31.437978</dc:identifier>
<dc:title><![CDATA[Chromatin interaction aware gene regulatory modeling with graph attention networks]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-04-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.16.483999v1?rss=1">
<title>
<![CDATA[
Evolution of transposable element-derived enhancer activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.16.483999v1?rss=1
</link>
<description><![CDATA[
Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1 and C/EBP-related binding motifs as drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects occurring due to mutations in the AP-1 and C/EBP motifs. We observed that the two motifs are conserved at higher rates than expected based on neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
]]></description>
<dc:creator>Du, A. Y.</dc:creator>
<dc:creator>Zhuo, X.</dc:creator>
<dc:creator>Sundaram, V.</dc:creator>
<dc:creator>Jensen, N. O.</dc:creator>
<dc:creator>Chaudhari, H. G.</dc:creator>
<dc:creator>Saccone, N. L.</dc:creator>
<dc:creator>Cohen, B. A.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-03-17</dc:date>
<dc:identifier>doi:10.1101/2022.03.16.483999</dc:identifier>
<dc:title><![CDATA[Evolution of transposable element-derived enhancer activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.19.488754v1?rss=1">
<title>
<![CDATA[
Predicting A/B compartments from histone modifications using deep learning 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.19.488754v1?rss=1
</link>
<description><![CDATA[
Genomes fold into organizational units in the 3D space that can influence critical biological functions. In particular, the organization of chromatin into A and B compartments segregates its active regions from inactive regions. Compartments, evident in Hi-C contact matrices, have been used to describe cell-type specific changes in the A/B organization. However, obtaining Hi-C data for all cell and tissue types of interest is prohibitively expensive, which has limited the widespread consideration of compartment status. We present a prediction tool called Compartment prediction using Recurrent Neural Network (CoRNN) that models the relationship between the compartmental organization of the genome and histone modification enrichment. Our model predicts A/B compartments, in a cross-cell type setting, with an average area under the ROC curve of 90.9%. Our cell type-specific compartment predictions show high overlap with known functional elements. We investigate our predictions by systematically removing combinations of histone marks and find that H3K27ac and H3K36me3 are the most predictive marks. We then perform a detailed analysis of loci where compartment status cannot be accurately predicted from these marks. These regions represent chromatin with ambiguous compartmental status, likely due to variations in status within the population of cells. These ambiguous loci also show highly variable compartmental status between biological replicates in the same GM12878 cell type. Finally, we demonstrate the generalizability of our model by predicting compartments in independent tissue samples. Our software and trained model are publicly available at https://github.com/rsinghlab/CoRNN.
]]></description>
<dc:creator>Zheng, S.</dc:creator>
<dc:creator>Thakkar, N.</dc:creator>
<dc:creator>Harris, H. L.</dc:creator>
<dc:creator>Zhang, M.</dc:creator>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Rowley, J.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Gursoy, G.</dc:creator>
<dc:creator>Singh, R.</dc:creator>
<dc:date>2022-04-19</dc:date>
<dc:identifier>doi:10.1101/2022.04.19.488754</dc:identifier>
<dc:title><![CDATA[Predicting A/B compartments from histone modifications using deep learning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.29.493901v1?rss=1">
<title>
<![CDATA[
Uncovering Hidden Enhancers Through Unbiased In Vivo Testing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.29.493901v1?rss=1
</link>
<description><![CDATA[
Transcriptional enhancers are a predominant class of noncoding regulatory elements that activate cell type-specific gene expression. Tissue-specific enhancer-associated chromatin signatures have proven useful to identify candidate enhancer elements at a genome-wide scale, but their sensitivity for the comprehensive detection of all enhancers active in a given tissue in vivo remains unclear. Here we show that a substantial proportion of in vivo enhancers are hidden from discovery by conventional chromatin profiling methods. In an initial comparison of over 1,200 in vivo validated tissue-specific enhancers with tissue-matched mouse developmental epigenome data, 14% (n=286) of active enhancers did not show canonical enhancer-associated chromatin signatures in the tissue in which they are active. To assess the prevalence of enhancers not detectable by conventional chromatin profiling approaches in more detail, we used a high throughput transgenic enhancer reporter assay to systematically screen over 1.3 Mb of mouse genomic sequence at two critical developmental loci, assessing a total of 281 consecutive 5kb regions for in vivo enhancer activity in mouse embryos. We observed reproducible enhancer-reporter activity in 88 tissue-specific elements, 26% of which did not show canonical enhancer-associated chromatin signatures in the corresponding tissues. Overall, we find these hidden enhancers are indistinguishable from marked enhancers based on levels of evolutionary conservation, enrichment of transcription factor families, and genomic positioning relative to putative target genes. In combination, our retrospective and prospective studies assessed only 0.1% of the mouse genome and identified 309 tissue-specific enhancers that are hidden from current chromatin-based enhancer identification approaches. 
Our findings suggest the existence of tens of thousands of active enhancers throughout the genome that remain undetected by current chromatin profiling approaches and are an unappreciated source of additional genome function of import in interpreting growing whole human genome sequencing data.
]]></description>
<dc:creator>Mannion, B. J.</dc:creator>
<dc:creator>Osterwalder, M.</dc:creator>
<dc:creator>Tran, S.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Novak, C. S.</dc:creator>
<dc:creator>Afzal, V.</dc:creator>
<dc:creator>Akiyama, J. A.</dc:creator>
<dc:creator>Barton, S.</dc:creator>
<dc:creator>Beckman, E.</dc:creator>
<dc:creator>Garvin, T. H.</dc:creator>
<dc:creator>Godfrey, P.</dc:creator>
<dc:creator>Godoy, J.</dc:creator>
<dc:creator>Hunter, R. D.</dc:creator>
<dc:creator>Kato, M.</dc:creator>
<dc:creator>Kosicki, M.</dc:creator>
<dc:creator>Kronshage, A. H.</dc:creator>
<dc:creator>Lee, E. A.</dc:creator>
<dc:creator>Meky, E. M.</dc:creator>
<dc:creator>Pham, Q. T.</dc:creator>
<dc:creator>von Maydell, K.</dc:creator>
<dc:creator>Zhu, Y.</dc:creator>
<dc:creator>Lopez-Rios, J.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:date>2022-05-30</dc:date>
<dc:identifier>doi:10.1101/2022.05.29.493901</dc:identifier>
<dc:title><![CDATA[Uncovering Hidden Enhancers Through Unbiased In Vivo Testing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.07.30.502157v1?rss=1">
<title>
<![CDATA[
The ENCODE Imputation Challenge: A critical assessment of methods for cross-cell type imputation of epigenomic profiles 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.07.30.502157v1?rss=1
</link>
<description><![CDATA[
Functional genomics experiments are invaluable for understanding mechanisms of gene regulation. However, comprehensively performing all such experiments, even across a fixed set of sample and assay types, is often infeasible in practice. A promising alternative to performing experiments exhaustively is to, instead, perform a core set of experiments and subsequently use machine learning methods to impute the remaining experiments. However, questions remain as to the quality of the imputations, the best approaches for performing imputations, and even what performance measures meaningfully evaluate performance of such models. In this work, we address these questions by comprehensively analyzing imputations from 23 imputation models submitted to the ENCODE Imputation Challenge. We find that measuring the quality of imputations is significantly more challenging than reported in the literature, and is confounded by three factors: major distributional shifts that arise because of differences in data collection and processing over time, the amount of available data per cell type, and redundancy among performance measures. Our systematic analyses suggest several steps that are necessary, but also simple, for fairly evaluating the performance of such models, as well as promising directions for more robust research in this area.
]]></description>
<dc:creator>Schreiber, J. M.</dc:creator>
<dc:creator>Boix, C. A.</dc:creator>
<dc:creator>Lee, J. W.</dc:creator>
<dc:creator>Li, H.</dc:creator>
<dc:creator>Guan, Y.</dc:creator>
<dc:creator>Chang, C.-C.</dc:creator>
<dc:creator>Chang, J.-C.</dc:creator>
<dc:creator>Hawkins-Hooker, A.</dc:creator>
<dc:creator>Schoelkopf, B.</dc:creator>
<dc:creator>Schweikert, G.</dc:creator>
<dc:creator>Rojas Carulla, M.</dc:creator>
<dc:creator>Canakoglu, A.</dc:creator>
<dc:creator>Guzzo, F.</dc:creator>
<dc:creator>Nanni, L.</dc:creator>
<dc:creator>Masseroli, M.</dc:creator>
<dc:creator>Carman, M. J.</dc:creator>
<dc:creator>Pinoli, P.</dc:creator>
<dc:creator>Hong, C.</dc:creator>
<dc:creator>Yip, K. Y.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Batra, S. S.</dc:creator>
<dc:creator>Song, Y. S.</dc:creator>
<dc:creator>Mahony, S.</dc:creator>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Tan, W.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:creator>Sun, Y.</dc:creator>
<dc:creator>Shi, M.</dc:creator>
<dc:creator>Adrian, J.</dc:creator>
<dc:creator>Sandstrom, R. S.</dc:creator>
<dc:creator>Farrell, N.</dc:creator>
<dc:creator>Halow, J. M.</dc:creator>
<dc:creator>Lee, K.</dc:creator>
<dc:creator>Jiang, L.</dc:creator>
<dc:creator>Yang, X.</dc:creator>
<dc:creator>Epstein, C. B.</dc:creator>
<dc:creator>Strattan, J. S.</dc:creator>
<dc:creator>Snyder, M. P.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Kundaje, A. B.</dc:creator>
<dc:date>2022-08-02</dc:date>
<dc:identifier>doi:10.1101/2022.07.30.502157</dc:identifier>
<dc:title><![CDATA[The ENCODE Imputation Challenge: A critical assessment of methods for cross-cell type imputation of epigenomic profiles]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.02.490368v1?rss=1">
<title>
<![CDATA[
Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.02.490368v1?rss=1
</link>
<description><![CDATA[
We present GuideScan2 for memory-efficient, parallelizable construction of high-specificity CRISPR guide RNA (gRNA) databases and user-friendly gRNA/library design in custom genomes. GuideScan2 analysis identified widespread confounding effects of low-specificity gRNAs in published CRISPR knockout, interference and activation screens and enabled construction of a ready-to-use gRNA library that reduced off-target effects in a novel gene essentiality screen. GuideScan2 also enabled the design and experimental validation of allele-specific gRNAs in a hybrid mouse genome.
]]></description>
<dc:creator>Schmidt, H.</dc:creator>
<dc:creator>Zhang, M.</dc:creator>
<dc:creator>Mourelatos, H.</dc:creator>
<dc:creator>Sanchez-Rivera, F. J.</dc:creator>
<dc:creator>Lowe, S. W.</dc:creator>
<dc:creator>Ventura, A.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Pritykin, Y.</dc:creator>
<dc:date>2022-05-03</dc:date>
<dc:identifier>doi:10.1101/2022.05.02.490368</dc:identifier>
<dc:title><![CDATA[Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.02.490310v1?rss=1">
<title>
<![CDATA[
Scalable sequence-informed embedding of single-cell ATAC-seq data with CellSpace 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.02.490310v1?rss=1
</link>
<description><![CDATA[
Standard scATAC-seq analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. We present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space. CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and scores the activity of transcription factors in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors, or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.
]]></description>
<dc:creator>Tayyebi, Z.</dc:creator>
<dc:creator>Pine, A. R.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:date>2022-05-02</dc:date>
<dc:identifier>doi:10.1101/2022.05.02.490310</dc:identifier>
<dc:title><![CDATA[Scalable sequence-informed embedding of single-cell ATAC-seq data with CellSpace]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.12.02.470663v1?rss=1">
<title>
<![CDATA[
Epiphany: predicting Hi-C contact maps from 1D epigenomic signals 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.12.02.470663v1?rss=1
</link>
<description><![CDATA[
Recent deep learning models that predict the Hi-C contact map from DNA sequence achieve promising accuracy but cannot generalize to new cell types and indeed do not capture cell-type-specific differences among training cell types. We propose Epiphany, a neural network to predict cell-type-specific Hi-C contact maps from five epigenomic tracks that are already available in hundreds of cell types and tissues: DNase I hypersensitive sites and ChIP-seq for CTCF, H3K27ac, H3K27me3, and H3K4me3. Epiphany uses 1D convolutional layers to learn local representations from the input tracks, bidirectional long short-term memory (Bi-LSTM) layers to capture long-term dependencies along the epigenome, as well as a generative adversarial network (GAN) architecture to encourage contact map realism. To improve the usability of predicted contact matrices, we trained and evaluated models using multiple normalization and matrix balancing techniques including KR, ICE, and HiC-DC+ Z-score and observed-over-expected count ratio. Epiphany is trained with a combination of MSE and adversarial (i.e., GAN) loss to enhance its ability to produce realistic Hi-C contact maps for downstream analysis. Epiphany shows robust performance and generalization to held-out chromosomes within and across cell types and species, and its predicted contact matrices yield accurate TAD and significant interaction calls. At inference time, Epiphany can be used to study the contribution of specific epigenomic peaks to 3D architecture and to predict the structural changes caused by perturbations of epigenomic signals.
]]></description>
<dc:creator>Yang, R.</dc:creator>
<dc:creator>Das, A.</dc:creator>
<dc:creator>Gao, V. R.</dc:creator>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Bilmes, J. A.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:date>2021-12-03</dc:date>
<dc:identifier>doi:10.1101/2021.12.02.470663</dc:identifier>
<dc:title><![CDATA[Epiphany: predicting Hi-C contact maps from 1D epigenomic signals]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-12-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.22.521605v1?rss=1">
<title>
<![CDATA[
Three linked opposing regulatory variants under selection associate with IVD 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.22.521605v1?rss=1
</link>
<description><![CDATA[
While genome-wide association studies (GWAS) and selection scans identify genomic loci driving human phenotypic diversity, functional validation is required to discover the variant(s) responsible. We dissected the IVD locus, implicated by selection statistics, multiple GWAS, and clinical genetics as important to function and fitness. We combined luciferase assays, CRISPR/Cas9 genome-editing, massively parallel reporter assays (MPRA), and bashing of regulatory loci. We identified three regulatory variants, including an indel, that may underpin GWAS signals for pulmonary fibrosis and testosterone, and that are linked on a positively selected haplotype in the Japanese population. These regulatory variants exhibit synergistic and opposing effects on IVD expression experimentally. Alleles at these variants lie on a haplotype tagged by the variant most strongly associated with IVD expression and metabolites, but with no functional evidence itself. This work demonstrates how comprehensive functional investigation and multiple technologies are needed to discover the true genetic drivers of phenotypic diversity.
]]></description>
<dc:creator>Brown, E. A.</dc:creator>
<dc:creator>Kales, S.</dc:creator>
<dc:creator>Boyle, M. J.</dc:creator>
<dc:creator>Vitti, J.</dc:creator>
<dc:creator>Kotliar, D.</dc:creator>
<dc:creator>Schaffner, S. F.</dc:creator>
<dc:creator>Tewhey, R. S.</dc:creator>
<dc:creator>Sabeti, P. C.</dc:creator>
<dc:date>2022-12-22</dc:date>
<dc:identifier>doi:10.1101/2022.12.22.521605</dc:identifier>
<dc:title><![CDATA[Three linked opposing regulatory variants under selection associate with IVD]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.05.467434v1?rss=1">
<title>
<![CDATA[
Multiplex genomic recording of enhancer and signal transduction activity in mammalian cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.05.467434v1?rss=1
</link>
<description><![CDATA[
Measurements of gene expression and signal transduction activity are conventionally performed with methods that require either the destruction or live imaging of a biological sample within the timeframe of interest. Here we demonstrate an alternative paradigm, termed ENGRAM (ENhancer-driven Genomic Recording of transcriptional Activity in Multiplex), in which the activity and dynamics of multiple transcriptional reporters are stably recorded to DNA. ENGRAM is based on the prime editing-mediated insertion of signal- or enhancer-specific barcodes to a genomically encoded recording unit. We show how this strategy can be used to concurrently genomically record the relative activity of at least hundreds of enhancers with high fidelity, sensitivity and reproducibility. Leveraging synthetic enhancers that are responsive to specific signal transduction pathways, we further demonstrate time- and concentration-dependent genomic recording of Wnt, NF-κB, and Tet-On activity. Finally, by coupling ENGRAM to sequential genome editing, we show how serially occurring molecular events can potentially be ordered. Looking forward, we envision that multiplex, ENGRAM-based recording of the strength, duration and order of enhancer and signal transduction activities has broad potential for application in functional genomics, developmental biology and neuroscience.
]]></description>
<dc:creator>Chen, W.</dc:creator>
<dc:creator>Choi, J.</dc:creator>
<dc:creator>Nathans, J. F.</dc:creator>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Nichols, E.</dc:creator>
<dc:creator>Leith, A.</dc:creator>
<dc:creator>Lee, C.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2021-11-05</dc:date>
<dc:identifier>doi:10.1101/2021.11.05.467434</dc:identifier>
<dc:title><![CDATA[Multiplex genomic recording of enhancer and signal transduction activity in mammalian cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.05.531189v1?rss=1">
<title>
<![CDATA[
Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.05.531189v1?rss=1
</link>
<description><![CDATA[
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific "on switches" providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
]]></description>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Schubach, M.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Dash, P.</dc:creator>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Sohota, A.</dc:creator>
<dc:creator>Noble, W.</dc:creator>
<dc:creator>Yardimci, G.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2023-03-06</dc:date>
<dc:identifier>doi:10.1101/2023.03.05.531189</dc:identifier>
<dc:title><![CDATA[Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.01.538906v1?rss=1">
<title>
<![CDATA[
Orthogonal CRISPR screens to identify transcriptional and epigenetic regulators of human CD8 T cell function 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.01.538906v1?rss=1
</link>
<description><![CDATA[
The clinical response to adoptive T cell therapies is strongly associated with transcriptional and epigenetic state. Thus, technologies to discover regulators of T cell gene networks and their corresponding phenotypes have great potential to improve the efficacy of T cell therapies. We developed pooled CRISPR screening approaches with compact epigenome editors to systematically profile the effects of activation and repression of 120 transcription factors and epigenetic modifiers on human CD8+ T cell state. These screens nominated known and novel regulators of T cell phenotypes with BATF3 emerging as a high confidence gene in both screens. We found that BATF3 overexpression promoted specific features of memory T cells such as increased IL7R expression and glycolytic capacity, while attenuating gene programs associated with cytotoxicity, regulatory T cell function, and T cell exhaustion. In the context of chronic antigen stimulation, BATF3 overexpression countered phenotypic and epigenetic signatures of T cell exhaustion. CAR T cells overexpressing BATF3 significantly outperformed control CAR T cells in both in vitro and in vivo tumor models. Moreover, we found that BATF3 programmed a transcriptional profile that correlated with positive clinical response to adoptive T cell therapy. Finally, we performed CRISPR knockout screens with and without BATF3 overexpression to define co-factors and downstream factors of BATF3, as well as other therapeutic targets. These screens pointed to a model where BATF3 interacts with JUNB and IRF4 to regulate gene expression and illuminated several other novel targets for further investigation.
]]></description>
<dc:creator>McCutcheon, S.</dc:creator>
<dc:creator>Swartz, A.</dc:creator>
<dc:creator>Brown, M.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>McRoberts Amador, C.</dc:creator>
<dc:creator>Siklenka, K.</dc:creator>
<dc:creator>Humayun, L.</dc:creator>
<dc:creator>Isaacs, J.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Antonia, S.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2023-05-01</dc:date>
<dc:identifier>doi:10.1101/2023.05.01.538906</dc:identifier>
<dc:title><![CDATA[Orthogonal CRISPR screens to identify transcriptional and epigenetic regulators of human CD8 T cell function]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.21.520137v1?rss=1">
<title>
<![CDATA[
Multi-center integrated analysis of non-coding CRISPR screens 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.21.520137v1?rss=1
</link>
<description><![CDATA[
The ENCODE Consortium's efforts to annotate non-coding, cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes which play a major role in health and disease. Pooled, non-coding CRISPR screens are a promising approach for systematically investigating gene regulatory mechanisms. Here, the ENCODE Functional Characterization Centers report 109 screens comprising 346,970 individual perturbations across 13.3Mb of the genome, using a variety of methods, readouts, and statistical analyses. Across 332 functionally confirmed CRE-gene links, we identify principles for screening endogenous, non-coding elements for causal regulatory mechanisms. Nearly all CREs show strong evidence of open chromatin, and targeting accessibility peak summits is a critical component of our proposed sgRNA design rules. We provide experimental guidelines to accurately detect CREs with variable, often low, transcriptional effects. We discover a previously undescribed DNA strand-bias for CRISPRi in transcribed regions with implications for screen design and analysis. Benchmarking five screen analysis tools, we find CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity sgRNAs. Together, we provide an accessible data resource, predesigned sgRNAs targeting 3,275,697 ENCODE SCREEN candidate CREs, and screening guidelines to accelerate functional characterization of the non-coding genome.
]]></description>
<dc:creator>Yao, D.</dc:creator>
<dc:creator>Tycko, J.</dc:creator>
<dc:creator>Oh, W.</dc:creator>
<dc:creator>Bounds, L. R.</dc:creator>
<dc:creator>Gosai, S. J.</dc:creator>
<dc:creator>Lataniotis, L.</dc:creator>
<dc:creator>Mackay-Smith, A.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Gabdank, I.</dc:creator>
<dc:creator>Schmidt, H.</dc:creator>
<dc:creator>Youngworth, I.</dc:creator>
<dc:creator>Andreeva, K.</dc:creator>
<dc:creator>Ren, X.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Luo, Y.</dc:creator>
<dc:creator>Siklenka, K.</dc:creator>
<dc:creator>Yardimci, G. G.</dc:creator>
<dc:creator>The ENCODE4 Consortium</dc:creator>
<dc:creator>Tewhey, R.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Sabeti, P. C.</dc:creator>
<dc:creator>Leslie, C.</dc:creator>
<dc:creator>Pritykin, Y.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Gersbach, C.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Bassik, M. C.</dc:creator>
<dc:creator>Reilly, S. K.</dc:creator>
<dc:date>2022-12-22</dc:date>
<dc:identifier>doi:10.1101/2022.12.21.520137</dc:identifier>
<dc:title><![CDATA[Multi-center integrated analysis of non-coding CRISPR screens]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.08.30.555359v1?rss=1">
<title>
<![CDATA[
Functional characterization of gene regulatory elements and neuropsychiatric disease-associated risk loci in iPSCs and iPSC-derived neurons 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.08.30.555359v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) have identified thousands of non-coding variants that contribute to psychiatric disease risks, likely by perturbing cis-regulatory elements (CREs). However, our ability to interpret and explore their mechanisms of action is hampered by a lack of annotation of functional CREs (fCREs) in neural cell types. Here, through genome-scale CRISPR screens of 22,000 candidate CREs (cCREs) in human induced pluripotent stem cells (iPSCs) undergoing differentiation to excitatory neurons, we identify 2,847 and 5,540 fCREs essential for iPSC fitness and neuronal differentiation, respectively. These fCREs display dynamic epigenomic features and exhibit increased numbers and genomic spans of chromatin interactions following terminal neuronal differentiation. Furthermore, fCREs essential for neuronal differentiation show significantly greater enrichment of genetic heritability for neurodevelopmental diseases including schizophrenia (SCZ), attention deficit hyperactivity disorder (ADHD), and autism spectrum disorders (ASD) than cCREs. Using high-throughput prime editing screens we experimentally confirm 45 SCZ risk variants that act by affecting the function of fCREs. The extensive and in-depth functional annotation of cCREs in neuronal types therefore provides a crucial resource for interpreting non-coding risk variants of neuropsychiatric disorders.
]]></description>
<dc:creator>Yang, X.</dc:creator>
<dc:creator>Jones, I. R.</dc:creator>
<dc:creator>Chen, P. B.</dc:creator>
<dc:creator>Yang, H.</dc:creator>
<dc:creator>Ren, X.</dc:creator>
<dc:creator>Zheng, L.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Li, Y. E.</dc:creator>
<dc:creator>Sun, Q.</dc:creator>
<dc:creator>Wen, J.</dc:creator>
<dc:creator>Beaman, C.</dc:creator>
<dc:creator>Cui, X.</dc:creator>
<dc:creator>Li, Y.</dc:creator>
<dc:creator>Wang, W.</dc:creator>
<dc:creator>Hu, M.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:date>2023-08-30</dc:date>
<dc:identifier>doi:10.1101/2023.08.30.555359</dc:identifier>
<dc:title><![CDATA[Functional characterization of gene regulatory elements and neuropsychiatric disease-associated risk loci in iPSCs and iPSC-derived neurons]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-08-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.04.535623v1?rss=1">
<title>
<![CDATA[
The ENCODE Uniform Analysis Pipelines 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.04.535623v1?rss=1
</link>
<description><![CDATA[
The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud gives small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

Database URL: https://www.encodeproject.org/
]]></description>
<dc:creator>Hitz, B. C.</dc:creator>
<dc:creator>Lee, J.-W.</dc:creator>
<dc:creator>Jolanki, O.</dc:creator>
<dc:creator>Kagda, M. S.</dc:creator>
<dc:creator>Graham, K.</dc:creator>
<dc:creator>Sud, P.</dc:creator>
<dc:creator>Gabdank, I.</dc:creator>
<dc:creator>Strattan, J. S.</dc:creator>
<dc:creator>Sloan, C. A.</dc:creator>
<dc:creator>Dreszer, T.</dc:creator>
<dc:creator>Rowe, L. D.</dc:creator>
<dc:creator>Podduturi, N. R.</dc:creator>
<dc:creator>Malladi, V. S.</dc:creator>
<dc:creator>Chan, E. T.</dc:creator>
<dc:creator>Davidson, J. M.</dc:creator>
<dc:creator>Ho, M.</dc:creator>
<dc:creator>Miyasato, S.</dc:creator>
<dc:creator>Simison, M.</dc:creator>
<dc:creator>Tanaka, F.</dc:creator>
<dc:creator>Luo, Y.</dc:creator>
<dc:creator>Whaling, I.</dc:creator>
<dc:creator>Lin, K.</dc:creator>
<dc:creator>Jou, J.</dc:creator>
<dc:creator>Hong, E. L.</dc:creator>
<dc:creator>Lee, B. T.</dc:creator>
<dc:creator>Sandstrom, R.</dc:creator>
<dc:creator>Rynes, E.</dc:creator>
<dc:creator>Nelson, J.</dc:creator>
<dc:creator>Nishida, A.</dc:creator>
<dc:creator>Ingersoll, A.</dc:creator>
<dc:creator>Buckley, M.</dc:creator>
<dc:creator>Frerker, M.</dc:creator>
<dc:creator>Kim, D. S.</dc:creator>
<dc:creator>Boley, N.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Dobin, A.</dc:creator>
<dc:creator>Rahmanian, S.</dc:creator>
<dc:creator>Wyman, D.</dc:creator>
<dc:creator>Balderrama-Gutierrez, G.</dc:creator>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Dudchenko, O.</dc:creator>
<dc:creator>Weisz, D.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Blackburn, A.</dc:creator>
<dc:creator>Gkountarou</dc:creator>
<dc:date>2023-04-06</dc:date>
<dc:identifier>doi:10.1101/2023.04.04.535623</dc:identifier>
<dc:title><![CDATA[The ENCODE Uniform Analysis Pipelines]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.08.29.555291v1?rss=1">
<title>
<![CDATA[
Super-silencer perturbation by EZH2 and REST inhibition leads to large loss of chromatin interactions and reduction in cancer growth 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.08.29.555291v1?rss=1
</link>
<description><![CDATA[
Human silencers have been shown to exist and regulate developmental gene expression. However, the functional importance of human silencers needs to be elucidated, such as whether they can form "super-silencers" and whether they are linked to cancer progression. Here, through interrogating two putative silencer components of the FGF18 gene, we found that two nearby silencers can cooperate via compensatory chromatin interactions to form a "super-silencer". Furthermore, double knockout of the two silencers exhibited synergistic upregulation of FGF18 expression and changes of cell identity. To perturb the "super-silencers", we applied a combinational treatment of an EZH2 inhibitor, GSK343, and a REST inhibitor, X5050 ("GR"). We found that GR led to severe loss of TADs and loops, while the use of one inhibitor by itself only showed mild changes. Such changes in TADs and loops were associated with reduced CTCF and TOP2A mRNA levels. Moreover, GSK343 and X5050 synergistically upregulated super-silencer-controlled genes related to cell cycle, apoptosis and DNA damage, leading to anticancer effects both in vitro and in vivo. Overall, our data demonstrated the first example of a "super-silencer" and showed that combinational usage of GSK343 and X5050 to disrupt "super-silencers" could potentially lead to cancer ablation.
]]></description>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Chen, K.</dc:creator>
<dc:creator>Tang, S. C.</dc:creator>
<dc:creator>Cai, Y.</dc:creator>
<dc:creator>Nambu, A.</dc:creator>
<dc:creator>See, Y. X.</dc:creator>
<dc:creator>Fu, C.</dc:creator>
<dc:creator>Raju, A.</dc:creator>
<dc:creator>Lebeau, B.</dc:creator>
<dc:creator>Ling, Z.</dc:creator>
<dc:creator>Mutwil, M.</dc:creator>
<dc:creator>Lakshmanan, M.</dc:creator>
<dc:creator>Osato, M.</dc:creator>
<dc:creator>Tergaonkar, V.</dc:creator>
<dc:creator>Fullwood, M. J.</dc:creator>
<dc:date>2023-08-30</dc:date>
<dc:identifier>doi:10.1101/2023.08.29.555291</dc:identifier>
<dc:title><![CDATA[Super-silencer perturbation by EZH2 and REST inhibition leads to large loss of chromatin interactions and reduction in cancer growth]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-08-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.05.556380v1?rss=1">
<title>
<![CDATA[
Regulatory Transposable Elements in the Encyclopedia of DNA Elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.05.556380v1?rss=1
</link>
<description><![CDATA[
Transposable elements (TEs) make up about half of the human genome and many have the biochemical hallmarks of tissue- or cell type-specific cis-regulatory elements. While some TEs have been rigorously documented to contribute directly to host gene regulation, we still have a very partial view of their regulatory landscape. Leveraging Phase 4 ENCODE data, we carried out the most comprehensive study to date of TE contributions to the regulatory genome. Here we investigated the sequence origins of candidate cis-regulatory elements (cCREs), showing that ~25% of human cCREs comprising 236,181 elements are derived from TEs. Human-mouse comparisons indicate that over 90% of TE-derived cCREs are lineage-specific, accounting for 8-36% of lineage-specific cCREs across cCRE types. Next, we found that cCRE-associated transcription factor (TF) binding motifs in TEs originated from TE ancestral sequences significantly more than expected in all TE classes except for SINEs. Using both cCRE and TF binding data, we discovered that TEs providing cCREs and TF binding sites are closer in genomic distance to non-TE sites compared to other TEs, suggesting that TE integration site influences their later co-option as regulatory elements. We show that TEs have promoted TF binding site turnover events since human-mouse divergence, accounting for 3-56% of turnover events across 30 TFs examined. Finally, we demonstrate that TE-derived cCREs share similar features with non-TE cCREs, including massively parallel reporter assay activity and GWAS variant enrichment. Overall, our results substantiate the notion that TEs have played an important role in shaping the human regulatory genome.
]]></description>
<dc:creator>Du, A. Y.</dc:creator>
<dc:creator>Chobirko, J. D.</dc:creator>
<dc:creator>Zhuo, X.</dc:creator>
<dc:creator>Feschotte, C.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2023-09-06</dc:date>
<dc:identifier>doi:10.1101/2023.09.05.556380</dc:identifier>
<dc:title><![CDATA[Regulatory Transposable Elements in the Encyclopedia of DNA Elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.06.561128v1?rss=1">
<title>
<![CDATA[
Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.06.561128v1?rss=1
</link>
<description><![CDATA[
Gene regulatory elements drive many complex biological phenomena such as fetal development, and their mutations are linked to a multitude of common human diseases. The phenotypic impacts of regulatory variants are often tested using their conserved orthologous counterparts in model organisms such as mice. However, mapping human enhancers to conserved elements in mice remains a challenge, due to both rapid evolution of enhancers and limitations of current computational methods to detect conserved regulatory sequences. To improve upon existing computational methods and to better understand the sources of this apparent regulatory divergence, we comprehensively measured the evolutionary dynamics of distal enhancers across 45 matched human/mouse cell/tissue pairs from more than 1,000 DNase-seq experiments. Using this expansive dataset, we show that while cell-specific regulatory vocabulary is conserved, enhancers evolve more rapidly than other genomic elements such as promoters and CTCF binding sites. We observed surprisingly high levels of cell-specific variability in enhancer conservation rates, in part explainable by tissue-specific transposable element activity. To improve orthologous enhancer mapping, we developed an improved genome alignment algorithm using gapped-kmer sequence features, and using the matched cell/tissue pairs, we show that this novel computational method, gkm-align, discovers 23,660 novel human/mouse conserved enhancers missed by standard alignment algorithms.
]]></description>
<dc:creator>Oh, J. W.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:date>2023-10-06</dc:date>
<dc:identifier>doi:10.1101/2023.10.06.561128</dc:identifier>
<dc:title><![CDATA[Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.09.563812v1?rss=1">
<title>
<![CDATA[
An encyclopedia of enhancer-gene regulatory interactions in the human genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.09.563812v1?rss=1
</link>
<description><![CDATA[
Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.
]]></description>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Mualim, K. S.</dc:creator>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Nurtdinov, R. N.</dc:creator>
<dc:creator>Xi, W.</dc:creator>
<dc:creator>Tan, A. S.</dc:creator>
<dc:creator>Jones, H.</dc:creator>
<dc:creator>Ma, X. R.</dc:creator>
<dc:creator>Yao, D.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Avsec, Z.</dc:creator>
<dc:creator>James, B. T.</dc:creator>
<dc:creator>Shamim, M. S.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Mahajan, R.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Andreeva, K.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Fan, K.</dc:creator>
<dc:creator>Perez, E. M.</dc:creator>
<dc:creator>Nguyen, T. C.</dc:creator>
<dc:creator>Kelley, D. R.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Bassik, M. C.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Guigo, R.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2023-11-13</dc:date>
<dc:identifier>doi:10.1101/2023.11.09.563812</dc:identifier>
<dc:title><![CDATA[An encyclopedia of enhancer-gene regulatory interactions in the human genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.12.598705v1?rss=1">
<title>
<![CDATA[
Isoform and pathway-specific post-transcriptional RNA processing in human cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.12.598705v1?rss=1
</link>
<description><![CDATA[
Steady-state levels of RNA transcripts are controlled by their rates of synthesis and degradation. Here we used nascent RNA Bru-seq and BruChase-seq to profile RNA dynamics across 16 human cell lines as part of the ENCODE4 Deeply Profiled Cell Lines collection. We show that RNA turnover dynamics differ widely between transcripts of different genes and between different classes of RNA. Gene set enrichment analysis (GSEA) revealed that transcripts encoding proteins belonging to the same pathway often show similar turnover dynamics. Furthermore, transcript isoforms show distinct dynamics, suggesting that RNA turnover is important in regulating mRNA isoform choice. Finally, splicing across newly made transcripts appears to be cooperative, with all-or-none splicing. These data sets generated as part of ENCODE4 illustrate the intricate and coordinated regulation of RNA dynamics in controlling gene expression to allow for the precise coordination of cellular functions.
]]></description>
<dc:creator>Bedi, K.</dc:creator>
<dc:creator>Magnuson, B.</dc:creator>
<dc:creator>Narayanan, I. V.</dc:creator>
<dc:creator>McShane, A.</dc:creator>
<dc:creator>Ashaka, M.</dc:creator>
<dc:creator>Paulsen, M. T.</dc:creator>
<dc:creator>Wilson, T. E.</dc:creator>
<dc:creator>Ljungman, M.</dc:creator>
<dc:date>2024-06-12</dc:date>
<dc:identifier>doi:10.1101/2024.06.12.598705</dc:identifier>
<dc:title><![CDATA[Isoform and pathway-specific post-transcriptional RNA processing in human cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.09.588612v1?rss=1">
<title>
<![CDATA[
Characterizing nascent transcription patterns of PROMPTs, eRNAs, and readthrough transcripts in the ENCODE4 deeply profiled cell lines 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.09.588612v1?rss=1
</link>
<description><![CDATA[
Arising as co-products of canonical gene expression, transcription-associated lincRNAs, such as promoter upstream transcripts (PROMPTs), enhancer RNAs (eRNAs), and readthrough (RT) transcripts, are often regarded as byproducts of transcription, although they may be important for the expression of nearby genes. We identified regions of nascent expression of these lincRNAs in 16 human cell lines using Bru-seq techniques, and found distinctly regulated patterns of PROMPT, eRNA, and RT transcription using the diverse biochemical approaches in the ENCODE4 deeply profiled cell lines collection. Transcription of these lincRNAs was influenced by sequence-specific features and the local or 3D chromatin landscape. However, these sequence and chromatin features do not describe the full spectrum of lincRNA expression variability we identify, highlighting the complexity of their regulation. This may suggest that transcription-associated lincRNAs are not merely byproducts, but rather that the transcript itself, or the act of its transcription, is important for genomic function.
]]></description>
<dc:creator>McShane, A.</dc:creator>
<dc:creator>Venkata Narayanan, I.</dc:creator>
<dc:creator>Paulsen, M. T.</dc:creator>
<dc:creator>Ashaka, M.</dc:creator>
<dc:creator>Blinkiewicz, H.</dc:creator>
<dc:creator>Yang, N. T.</dc:creator>
<dc:creator>Magnuson, B.</dc:creator>
<dc:creator>Bedi, K.</dc:creator>
<dc:creator>Wilson, T. E.</dc:creator>
<dc:creator>Ljungman, M.</dc:creator>
<dc:date>2024-04-09</dc:date>
<dc:identifier>doi:10.1101/2024.04.09.588612</dc:identifier>
<dc:title><![CDATA[Characterizing nascent transcription patterns of PROMPTs, eRNAs, and readthrough transcripts in the ENCODE4 deeply profiled cell lines]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.10.08.616922v1?rss=1">
<title>
<![CDATA[
CRISPR tiling deletion screens reveal functional enhancers of neuropsychiatric risk genes and allelic compensation effects (ACE) on transcription 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.10.08.616922v1?rss=1
</link>
<description><![CDATA[
Precise transcriptional regulation is critical for cellular function and development, yet the mechanism of this process remains poorly understood for many genes. To gain a deeper understanding of the regulation of neuropsychiatric disease risk genes, we identified a total of 39 functional enhancers for four dosage-sensitive genes, APP, FMR1, MECP2, and SIN3A, using CRISPR tiling deletion screening in human induced pluripotent stem cell (iPSC)-induced excitatory neurons. We found that enhancer annotation provides potential pathological insights into disease-associated copy number variants. More importantly, we discovered that allelic enhancer deletions at SIN3A could be compensated by increased transcriptional activities from the other intact allele. Such allelic compensation effects (ACE) on transcription are stably maintained during differentiation and, once established, cannot be reversed by ectopic SIN3A expression. Further, ACE at SIN3A occurs through dosage sensing by the promoter. Together, our findings unravel a regulatory compensation mechanism that ensures stable and precise transcriptional output for SIN3A, and potentially other dosage-sensitive genes.
]]></description>
<dc:creator>Ren, X.</dc:creator>
<dc:creator>Zheng, L.</dc:creator>
<dc:creator>Maliskova, L.</dc:creator>
<dc:creator>Tam, T. W.</dc:creator>
<dc:creator>Sun, Y.</dc:creator>
<dc:creator>Liu, H.</dc:creator>
<dc:creator>Lee, J.</dc:creator>
<dc:creator>Takagi, M. A.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Ren, B.</dc:creator>
<dc:creator>Wang, W.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:date>2024-10-10</dc:date>
<dc:identifier>doi:10.1101/2024.10.08.616922</dc:identifier>
<dc:title><![CDATA[CRISPR tiling deletion screens reveal functional enhancers of neuropsychiatric risk genes and allelic compensation effects (ACE) on transcription]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-10-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.10.31.620766v1?rss=1">
<title>
<![CDATA[
Quantifying Functional Conservation of Human and Mouse Regulatory Elements via FUNCODE 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.10.31.620766v1?rss=1
</link>
<description><![CDATA[
Evolutionary conservation is crucial for understanding genome functions and lays the foundation for using animal models in studying human diseases. However, conventional conservation scores based on DNA sequence evolution do not capture the dynamic biochemical activities of DNA elements, termed functional conservation. Quantifying functional conservation has been limited by the availability of functional genomic data matched across species. To address this, we developed FUNCODE, a framework for characterizing functional conservation through in silico sample matching. Applying FUNCODE to 2,595 uniformly processed datasets from the Encyclopedia of DNA Elements (ENCODE), we generated genome-wide FUNCODE scores for human and mouse regulatory elements, identifying 3.3 million functionally conserved human-mouse element pairs. We demonstrate FUNCODE's diverse applications, including annotating 78,501 novel regulatory elements, transferring 37,968 high-resolution human ENCODE Hi-C loops in immune lineages to mice, identifying conserved functional signals for disease modeling, and enhancing cross-species integration of single-cell omics data.
]]></description>
<dc:creator>Fang, W.</dc:creator>
<dc:creator>Chen, C.</dc:creator>
<dc:creator>Zhang, B.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Zhao, R.</dc:creator>
<dc:creator>Zhou, W.</dc:creator>
<dc:creator>Ji, H.</dc:creator>
<dc:date>2024-11-03</dc:date>
<dc:identifier>doi:10.1101/2024.10.31.620766</dc:identifier>
<dc:title><![CDATA[Quantifying Functional Conservation of Human and Mouse Regulatory Elements via FUNCODE]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-11-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.12.01.621925v1?rss=1">
<title>
<![CDATA[
Directionality of Transcriptional Regulatory Elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.12.01.621925v1?rss=1
</link>
<description><![CDATA[
Divergent transcription is a critical marker of active transcriptional regulatory elements (TREs), including enhancers and promoters, in mammals. However, distal elements with unidirectional transcriptional patterns are often overlooked, leaving their identity and function poorly understood. Here, we performed a systematic comparison between divergent and unidirectional elements, revealing their distinct architectural and functional features. Our analysis also shows that unidirectional elements have younger sequence ages and are under weaker evolutionary constraints than divergent elements, indicating that they may represent a unique category of genomic regulatory function with more recent origins. Notably, we observed that some transcription factors, including CTCF, AP1, SP, and NFY, exhibit dual roles in modulating the directionality of TREs, either activating or repressing nascent transcription in a position-dependent manner. Overall, the elucidation of directionality enhances our understanding of the diverse architectural models, functional features, evolutionary dynamics, and regulatory logic of TREs.
]]></description>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>Shah, S. R.</dc:creator>
<dc:creator>Leung, A.</dc:creator>
<dc:creator>Paramo, M. I.</dc:creator>
<dc:creator>Cochran, K.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Clark, A. G.</dc:creator>
<dc:creator>Lis, J. T.</dc:creator>
<dc:creator>Yu, H.</dc:creator>
<dc:date>2024-12-02</dc:date>
<dc:identifier>doi:10.1101/2024.12.01.621925</dc:identifier>
<dc:title><![CDATA[Directionality of Transcriptional Regulatory Elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-12-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.12.26.629296v1?rss=1">
<title>
<![CDATA[
An Expanded Registry of Candidate cis-Regulatory Elements for Studying Transcriptional Regulation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.12.26.629296v1?rss=1
</link>
<description><![CDATA[
Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression. Previously, the ENCODE Consortium mapped biochemical signals across many cell types and tissues and integrated these data to develop a Registry of 0.9 million human and 300 thousand mouse candidate cis-Regulatory Elements (cCREs) annotated with potential functions [1]. We have expanded the Registry to include 2.35 million human and 927 thousand mouse cCREs, leveraging new ENCODE datasets and enhanced computational methods. This expanded Registry covers hundreds of unique cell and tissue types, providing a comprehensive understanding of gene regulation. Functional characterization data from assays like STARR-seq, MPRA, CRISPR perturbation, and transgenic mouse assays now cover over 90% of human cCREs, revealing complex regulatory functions. We identified thousands of novel silencer cCREs and demonstrated their dual enhancer/silencer roles in different cellular contexts. Integrating the Registry with other ENCODE annotations facilitates genetic variation interpretation and trait-associated gene identification, exemplified by discovering KLF1 as a novel causal gene for red blood cell traits. This expanded Registry is a valuable resource for studying the regulatory genome and its impact on health and disease.
]]></description>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Pratt, H. E.</dc:creator>
<dc:creator>Fan, K.</dc:creator>
<dc:creator>Phalke, N.</dc:creator>
<dc:creator>Fisher, J.</dc:creator>
<dc:creator>Elhajjajy, S. I.</dc:creator>
<dc:creator>Andrews, G.</dc:creator>
<dc:creator>Gao, M.</dc:creator>
<dc:creator>Shedd, N.</dc:creator>
<dc:creator>Fu, Y.</dc:creator>
<dc:creator>Lacadie, M. C.</dc:creator>
<dc:creator>Meza, J.</dc:creator>
<dc:creator>Ganna, M.</dc:creator>
<dc:creator>Choudhury, E.</dc:creator>
<dc:creator>Swofford, R.</dc:creator>
<dc:creator>Farrell, N. P.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Ramalingam, V.</dc:creator>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Borsari, B.</dc:creator>
<dc:creator>Yu, X.</dc:creator>
<dc:creator>Wattenberg, E. S.</dc:creator>
<dc:creator>Ruiz-Romero, M.</dc:creator>
<dc:creator>Razavi-Mohseni, M.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Galeev, T.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Guigo, R.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Ljungman, M.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Snyder, M.</dc:creator>
<dc:creator>Epstein, C. B.</dc:creator>
<dc:creator>Gaskell, E.</dc:creator>
<dc:creator>Bernstein, B. E.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:date>2024-12-26</dc:date>
<dc:identifier>doi:10.1101/2024.12.26.629296</dc:identifier>
<dc:title><![CDATA[An Expanded Registry of Candidate cis-Regulatory Elements for Studying Transcriptional Regulation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-12-26</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
