	<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
	<channel rdf:about="https://biorxiv.org">
	<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
	<title>bioRxiv Channel: Impact of Genomic Variation on Function (IGVF)</title>
	<link>https://biorxiv.org</link>
	<description>
	This feed contains articles for bioRxiv Channel "Impact of Genomic Variation on Function (IGVF)"
	</description>

		<items>
	<rdf:Seq>
		</rdf:Seq>
	</items>
	<prism:eIssn/>
	<prism:publicationName>bioRxiv</prism:publicationName>
	<prism:issn/>

	<image rdf:resource=""/>
	</channel>
	<image rdf:about="">
	<title>bioRxiv</title>
	<url/>
	<link>https://biorxiv.org</link>
	</image>
	<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.24.513593v1?rss=1">
<title>
<![CDATA[
EUGENe: A Python toolkit for predictive analyses of regulatory sequences 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.24.513593v1?rss=1
</link>
<description><![CDATA[
Deep learning (DL) has become a popular tool to study cis-regulatory element function. Yet efforts to design software for DL analyses in genomics that are Findable, Accessible, Interoperable and Reusable (FAIR) have fallen short of fully meeting these criteria. Here we present EUGENe (Elucidating the Utility of Genomic Elements with Neural Nets), a FAIR toolkit for the analysis of labeled sets of nucleotide sequences with DL. EUGENe consists of a set of modules that empower users to execute the key functionality of a DL workflow: 1) extracting, transforming and loading sequence data from many common file formats, 2) instantiating, initializing and training diverse model architectures, and 3) evaluating and interpreting model behavior. We designed EUGENe to be simple; users can develop workflows on new or existing datasets with two customizable Python objects, annotated sequence data (SeqData) and PyTorch models (BaseModel). The modularity and simplicity of EUGENe also make it highly extensible and we illustrate these principles through application of the toolkit to three predictive modeling tasks. First, we train and compare a set of built-in models along with a custom architecture for the accurate prediction of activities of plant promoters from STARR-seq data. Next, we apply EUGENe to an RNA binding prediction task and showcase how seminal model architectures can be retrained in EUGENe or imported from Kipoi. Finally, we train models to classify transcription factor binding by wrapping functionality from Janggu, which can efficiently extract sequences in BED file format from the human genome. We emphasize that the code used in each use case is simple, readable, and well documented (https://eugene-tools.readthedocs.io/en/latest/index.html). We believe that EUGENe represents a springboard toward a collaborative ecosystem for DL applications in genomics research. 
EUGENe is available for download on GitHub (https://github.com/cartercompbio/EUGENe) along with several introductory tutorials and for installation on PyPI (https://pypi.org/project/eugene-tools/).
]]></description>
<dc:creator>Klie, A.</dc:creator>
<dc:creator>Stites, H.</dc:creator>
<dc:creator>Jores, T.</dc:creator>
<dc:creator>Carter, H.</dc:creator>
<dc:date>2022-10-26</dc:date>
<dc:identifier>doi:10.1101/2022.10.24.513593</dc:identifier>
<dc:title><![CDATA[EUGENe: A Python toolkit for predictive analyses of regulatory sequences]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.08.491094v1?rss=1">
<title>
<![CDATA[
A framework for summarizing chromatin state annotations within and identifying differential annotations across groups of samples 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.08.491094v1?rss=1
</link>
<description><![CDATA[
Motivation: Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution.

Results: We developed CSREP, which takes as input chromatin state annotations for a group of samples and then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers to predict the chromatin state assignment of each sample given the state maps from all other samples. The difference of CSREP's probability assignments for two groups can be used to identify genomic locations with differential chromatin state patterns.

Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution.

Availability and implementation: The CSREP source code is openly available at http://github.com/ernstlab/csrep.

Contact: jason.ernst@ucla.edu
]]></description>
<dc:creator>Vu, H. T.</dc:creator>
<dc:creator>Koch, Z.</dc:creator>
<dc:creator>Fiziev, P.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-05-08</dc:date>
<dc:identifier>doi:10.1101/2022.05.08.491094</dc:identifier>
<dc:title><![CDATA[A framework for summarizing chromatin state annotations within and identifying differential annotations across groups of samples]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.02.502571v1?rss=1">
<title>
<![CDATA[
Chromatin state modeling across individuals reveals global patterns of histone modifications 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.02.502571v1?rss=1
</link>
<description><![CDATA[
Epigenetic mapping studies across individuals have identified many positions of epigenetic variation in various human tissues and conditions. However, the relationships between these positions, and in particular global patterns that recur in many regions of the genome, remain understudied. In this study, we use a stacked chromatin state model to systematically learn global patterns of epigenetic variation across individuals and annotate the human genome based on them. We applied this framework to histone modification data across individuals in lymphoblastoid cell lines and across autism spectrum disorder cases and controls in prefrontal cortex tissue. We find that global patterns are correlated across multiple histone modifications and with gene expression. We used the global patterns as a framework to predict trans-regulators, identify trans-QTL, and study complex disease. The frameworks for identifying and analyzing global patterns of epigenetic variation are general, and we expect they will be useful in other systems.
]]></description>
<dc:creator>Zou, J.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-08-03</dc:date>
<dc:identifier>doi:10.1101/2022.08.02.502571</dc:identifier>
<dc:title><![CDATA[Chromatin state modeling across individuals reveals global patterns of histone modifications]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.01.18.476849v1?rss=1">
<title>
<![CDATA[
Exploring genomic data coupled with 3D chromatin structures using the WashU Epigenome Browser 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.01.18.476849v1?rss=1
</link>
<description><![CDATA[
Biological functions are not only encoded by the genome's sequence but also regulated by its three-dimensional (3D) structure. More and more studies have revealed the importance of 3D chromatin structures in development and diseases; therefore, visualizing the connections between genome sequence, epigenomic dynamics (1D) and the 3D genome becomes a pressing need. The WashU Epigenome Browser introduces a new 3D visualization module to integrate visualization of 1D (such as sequence features, epigenomic data) and 2D data (such as chromosome conformation capture data) with 3D genome structure. Genomic coordinates are encoded in 3D models of the chromosomes; thus, all genomic information displayed on a 1D genome browser can be visualized on a 3D model, supported by genome browser utilities and facilitating interpretation of genomic data. Biological information that is difficult to illustrate in 1D becomes more intuitive when displayed in 3D, providing novel and powerful tools for investigators to hypothesize and understand the connections between biological functions and 3D genome structures.
]]></description>
<dc:creator>Li, D.</dc:creator>
<dc:creator>Purushotham, D.</dc:creator>
<dc:creator>Harrison, J. K.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-01-21</dc:date>
<dc:identifier>doi:10.1101/2022.01.18.476849</dc:identifier>
<dc:title><![CDATA[Exploring genomic data coupled with 3D chromatin structures using the WashU Epigenome Browser]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-01-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.24.493345v1?rss=1">
<title>
<![CDATA[
ChromGene: Gene-Based Modeling of Epigenomic Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.24.493345v1?rss=1
</link>
<description><![CDATA[
Background: Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses, in which a single annotation for each gene is desired.

Results: To address this, we developed ChromGene, which annotates genes based on the combinatorial and spatial patterns of multiple epigenomic marks across the gene body and flanking regions. Specifically, ChromGene models the epigenomic maps using a mixture of hidden Markov models learned de novo. Using ChromGene, we generated annotations for the human protein-coding genes for over 100 cell and tissue types. We characterize the different mixture components and their associated gene sets in terms of gene expression, constraint, and other gene annotations. We also characterize variation in ChromGene gene annotations across cell and tissue types.

Conclusions: We expect that the ChromGene method and provided annotations will be a useful resource for gene-based epigenomic analyses.
]]></description>
<dc:creator>Jaroszewicz, A.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-05-25</dc:date>
<dc:identifier>doi:10.1101/2022.05.24.493345</dc:identifier>
<dc:title><![CDATA[ChromGene: Gene-Based Modeling of Epigenomic Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.02.490368v1?rss=1">
<title>
<![CDATA[
Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.02.490368v1?rss=1
</link>
<description><![CDATA[
We present GuideScan2 for memory-efficient, parallelizable construction of high-specificity CRISPR guide RNA (gRNA) databases and user-friendly gRNA/library design in custom genomes. GuideScan2 analysis identified widespread confounding effects of low-specificity gRNAs in published CRISPR knockout, interference and activation screens and enabled construction of a ready-to-use gRNA library that reduced off-target effects in a novel gene essentiality screen. GuideScan2 also enabled the design and experimental validation of allele-specific gRNAs in a hybrid mouse genome.
]]></description>
<dc:creator>Schmidt, H.</dc:creator>
<dc:creator>Zhang, M.</dc:creator>
<dc:creator>Mourelatos, H.</dc:creator>
<dc:creator>Sanchez-Rivera, F. J.</dc:creator>
<dc:creator>Lowe, S. W.</dc:creator>
<dc:creator>Ventura, A.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Pritykin, Y.</dc:creator>
<dc:date>2022-05-03</dc:date>
<dc:identifier>doi:10.1101/2022.05.02.490368</dc:identifier>
<dc:title><![CDATA[Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.28.482131v1?rss=1">
<title>
<![CDATA[
Heterogeneity of Inflammation-associated Synovial Fibroblasts in Rheumatoid Arthritis and Its Drivers 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.28.482131v1?rss=1
</link>
<description><![CDATA[
Inflammation of non-barrier immunologically quiescent tissues is associated with a massive influx of blood-borne innate and adaptive immune cells. Cues from the latter are likely to alter and expand the spectrum of states observed in cells that are constitutively resident. However, local communications between immigrant and resident cell types in human inflammatory disease remain poorly understood. Here, we explored heterogeneity of synovial fibroblasts (FLS) in inflamed joints of rheumatoid arthritis (RA) patients using paired single cell RNA and ATAC sequencing (scRNA/ATAC-seq), multiplexed imaging, and spatial transcriptomics along with in vitro modeling of cell extrinsic factor signaling. These analyses suggest that local exposures to myeloid and T cell derived cytokines, TNF, IFNγ, IL-1β, or lack thereof, drive six distinct FLS states, some of which closely resemble fibroblast states in other disease-affected tissues including skin and colon. Our results highlight a role for concurrent, spatially distributed cytokine signaling within the inflamed synovium.
]]></description>
<dc:creator>Smith, M. H.</dc:creator>
<dc:creator>Gao, V. R.</dc:creator>
<dc:creator>Schizas, M.</dc:creator>
<dc:creator>Kochen, A.</dc:creator>
<dc:creator>DiCarlo, E.</dc:creator>
<dc:creator>Goodman, S.</dc:creator>
<dc:creator>Norman, T. M.</dc:creator>
<dc:creator>Donlin, L.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Rudensky, A. Y.</dc:creator>
<dc:date>2022-03-02</dc:date>
<dc:identifier>doi:10.1101/2022.02.28.482131</dc:identifier>
<dc:title><![CDATA[Heterogeneity of Inflammation-associated Synovial Fibroblasts in Rheumatoid Arthritis and Its Drivers]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.26.470154v1?rss=1">
<title>
<![CDATA[
Diverse digital and fuzzy composite transcriptional elements are prevalent features of mammalian cis-regulomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.26.470154v1?rss=1
</link>
<description><![CDATA[
Mammalian transcriptional regulatory sequences are composed of complex combinations of simple transcription factor (TF) motifs. Stereospecific juxta-positioning of simple TF motifs generates composite elements (CEs) that increase combinatorial and regulatory specificity of TF-DNA interactions. Although a small number of CEs and their cooperative or anti-cooperative modes of TF binding have been thoroughly characterized, a systematic analysis of CE diversity, prevalence and properties in cis-regulomes has not been undertaken. We developed a computational pipeline termed CEseek to discover >20,000 CEs in open chromatin regions of diverse immune cells and validated many using CAP-SELEX, ChIP-Seq and STARR-seq datasets. Strikingly, the CEs manifested a bimodal distribution of configurations, termed digital and fuzzy, based on their stringent or relaxed stereospecific constraints, respectively. Digital CEs mediate cooperative as well as anti-cooperative binding of structurally diverse TFs that likely reflect AND/OR genomic logic gates. In contrast, fuzzy CEs encompass a less diverse set of TF motif pairs that are selectively enriched in p300 associated, multi-genic enhancers. The annotated CEs greatly expand the regulatory DNA motif lexicon and the universe of TF-TF interactions that underlie combinatorial logic of gene regulation.
]]></description>
<dc:creator>Chaudhri, V. K.</dc:creator>
<dc:creator>Singh, H.</dc:creator>
<dc:date>2021-11-27</dc:date>
<dc:identifier>doi:10.1101/2021.11.26.470154</dc:identifier>
<dc:title><![CDATA[Diverse digital and fuzzy composite transcriptional elements are prevalent features of mammalian cis-regulomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.16.483999v1?rss=1">
<title>
<![CDATA[
Evolution of transposable element-derived enhancer activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.16.483999v1?rss=1
</link>
<description><![CDATA[
Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1 and C/EBP-related binding motifs as drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects occurring due to mutations in the AP-1 and C/EBP motifs. We observed that the two motifs are conserved at higher rates than expected based on neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
]]></description>
<dc:creator>Du, A. Y.</dc:creator>
<dc:creator>Zhuo, X.</dc:creator>
<dc:creator>Sundaram, V.</dc:creator>
<dc:creator>Jensen, N. O.</dc:creator>
<dc:creator>Chaudhari, H. G.</dc:creator>
<dc:creator>Saccone, N. L.</dc:creator>
<dc:creator>Cohen, B. A.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-03-17</dc:date>
<dc:identifier>doi:10.1101/2022.03.16.483999</dc:identifier>
<dc:title><![CDATA[Evolution of transposable element-derived enhancer activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.26.493621v1?rss=1">
<title>
<![CDATA[
The dynseq genome browser track enables visualization of context-specific, dynamic DNA sequence features at single nucleotide resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.26.493621v1?rss=1
</link>
<description><![CDATA[
We introduce the dynseq genome browser track, which displays DNA nucleotide characters scaled by user-specified, base-resolution scores provided in the BigWig file format. The dynseq track enables visualization of context-specific, informative genomic sequence features. We demonstrate its utility in three popular genome browsers for interpreting cis-regulatory sequence syntax and regulatory variant interpretation by visualizing nucleotide importance scores derived from machine learning models of regulatory DNA trained on protein-DNA binding and chromatin accessibility experiments.
]]></description>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Barrett, A.</dc:creator>
<dc:creator>Li, D.</dc:creator>
<dc:creator>Raney, B. J.</dc:creator>
<dc:creator>Lee, B. T.</dc:creator>
<dc:creator>Kerpedjiev, P.</dc:creator>
<dc:creator>Ramalingam, V.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Lekschas, F.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:creator>Haeussler, M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2022-05-28</dc:date>
<dc:identifier>doi:10.1101/2022.05.26.493621</dc:identifier>
<dc:title><![CDATA[The dynseq genome browser track enables visualization of context-specific, dynamic DNA sequence features at single nucleotide resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.20.504670v1?rss=1">
<title>
<![CDATA[
A mutation rate model at the basepair resolution identifies the mutagenic effect of Polymerase III transcription 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.20.504670v1?rss=1
</link>
<description><![CDATA[
De novo mutations occur with substantially different rates depending on genomic location, sequence context and DNA strand [1-4]. The success of many human genetics techniques, especially when applied to large population sequencing datasets with numerous recurrent mutations [5-7], depends strongly on assumptions about the local mutation rate. Such techniques include estimation of selection intensity [8], inference of demographic history [9], and mapping of rare disease genes [10]. Here, we present Roulette, a genome-wide mutation rate model at the basepair resolution that incorporates known determinants of local mutation rate (http://genetics.bwh.harvard.edu/downloads/Vova/Roulette/). Roulette is shown to be more accurate than existing models [1,6]. Roulette has sufficient resolution at high mutation rate sites to model allele frequencies under recurrent mutation. We use Roulette to refine estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a 10-fold increase in mutation rate in nearly all genes transcribed by Polymerase III, suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively utilized in testis and residing in promoters.
]]></description>
<dc:creator>Seplyarskiy, V.</dc:creator>
<dc:creator>Lee, D. J.</dc:creator>
<dc:creator>Koch, E. M.</dc:creator>
<dc:creator>Lichtman, J. S.</dc:creator>
<dc:creator>Luan, H. H.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2022-08-21</dc:date>
<dc:identifier>doi:10.1101/2022.08.20.504670</dc:identifier>
<dc:title><![CDATA[A mutation rate model at the basepair resolution identifies the mutagenic effect of Polymerase III transcription]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.28.505582v1?rss=1">
<title>
<![CDATA[
FAVOR: Functional Annotation of Variants Online Resource and Annotator for Variation across the Human Genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.28.505582v1?rss=1
</link>
<description><![CDATA[
Large-scale whole genome sequencing (WGS) studies and biobanks are rapidly generating a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries or are unable to functionally annotate the genotype data of large WGS studies and biobanks for downstream analysis. We develop the Functional Annotation of Variants Online Resource (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive online multi-faceted portal with summarization and visualization of all possible 9 billion single nucleotide variants (SNVs) across the genome, and allows for rapid variant-, gene-, and region-level online queries. It integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, a scalable annotation tool, FAVORannotator, is provided for functionally annotating and efficiently storing the genotype and variant functional annotation data of a large-scale sequencing study in an annotated GDS file format to facilitate downstream analysis. FAVOR and FAVORannotator are available at https://favor.genohub.org.
]]></description>
<dc:creator>Zhou, H.</dc:creator>
<dc:creator>Arapoglou, T.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Zheng, X.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Asok, A.</dc:creator>
<dc:creator>Kumar, S.</dc:creator>
<dc:creator>Blue, E. E.</dc:creator>
<dc:creator>Buyske, S.</dc:creator>
<dc:creator>Cox, N.</dc:creator>
<dc:creator>Felsenfeld, A.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:creator>Kenny, E.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Matise, T.</dc:creator>
<dc:creator>Philippakis, A.</dc:creator>
<dc:creator>Rehm, H.</dc:creator>
<dc:creator>Sofia, H. J.</dc:creator>
<dc:creator>Neale, B.</dc:creator>
<dc:creator>Snyder, G.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:creator>Sunyaev, S.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2022-08-29</dc:date>
<dc:identifier>doi:10.1101/2022.08.28.505582</dc:identifier>
<dc:title><![CDATA[FAVOR: Functional Annotation of Variants Online Resource and Annotator for Variation across the Human Genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.26.513833v1?rss=1">
<title>
<![CDATA[
Optimizing and benchmarking polygenic risk scores with GWAS summary statistics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.26.513833v1?rss=1
</link>
<description><![CDATA[
Background: Polygenic risk score (PRS) is a major research topic in human genetics. However, a significant gap exists between PRS methodology and applications in practice because individual-level data are often unavailable for various PRS tasks, including model fine-tuning, benchmarking, and ensemble learning.

Results: We introduce an innovative statistical framework to optimize and benchmark PRS models using summary statistics of genome-wide association studies. This framework builds upon our previous work and can fine-tune virtually all existing PRS models while accounting for linkage disequilibrium. In addition, we provide an ensemble learning strategy named PUMAS-ensemble to combine multiple PRS models into an ensemble score without requiring external data for model fitting. Through extensive simulations and analysis of many complex traits in the UK Biobank, we demonstrate that this approach closely approximates gold-standard analytical strategies based on external validation, and substantially outperforms state-of-the-art PRS methods.

Conclusions: Our method is a powerful and general modeling technique that can continue to combine the best-performing PRS methods through ensemble learning and could become an integral component for all future PRS applications.
]]></description>
<dc:creator>Zhao, Z.</dc:creator>
<dc:creator>Gruenloh, T.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Sun, Z.</dc:creator>
<dc:creator>Miao, J.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Song, J.</dc:creator>
<dc:creator>Lu, Q.</dc:creator>
<dc:date>2022-10-27</dc:date>
<dc:identifier>doi:10.1101/2022.10.26.513833</dc:identifier>
<dc:title><![CDATA[Optimizing and benchmarking polygenic risk scores with GWAS summary statistics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.08.479604v1?rss=1">
<title>
<![CDATA[
Mutagenesis at non-B DNA motifs in the human genome: a course correction 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.08.479604v1?rss=1
</link>
<description><![CDATA[
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs, and recurrent sequencing errors. Accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting SNVs within STRs distinctly implicate error-prone Polκ. Secondary-structure formation promotes SNVs within palindromic repeats, as well as duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, while mutagenesis at Z-DNAs is conspicuously absent.
]]></description>
<dc:creator>McGinty, R. J.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2022-02-09</dc:date>
<dc:identifier>doi:10.1101/2022.02.08.479604</dc:identifier>
<dc:title><![CDATA[Mutagenesis at non-B DNA motifs in the human genome: a course correction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-02-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.07.20.500854v1?rss=1">
<title>
<![CDATA[
PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.07.20.500854v1?rss=1
</link>
<description><![CDATA[
Small molecule treatment and gene knockout or overexpression induce complex changes in the molecular states of cells, and the space of possible perturbations is too large to measure exhaustively. We present PerturbNet, a deep generative model for predicting the distribution of cell states induced by unseen chemical or genetic perturbations. Our key innovation is to use high-throughput perturbation response data such as Perturb-Seq to learn a continuous mapping between the space of possible perturbations and the space of possible cell states.

Using Sci-Plex and LINCS datasets, PerturbNet can accurately predict the distribution of gene expression changes induced by unseen small molecules given only their chemical structures. PerturbNet also accurately predicts gene expression changes induced by shRNA, CRISPRi, or CRISPRa perturbations using a perturbation network trained on gene functional annotations. Furthermore, self-supervised sequence embeddings allow PerturbNet to predict gene expression changes induced by missense mutations. We also use PerturbNet to attribute cell state shifts to specific perturbation features, including atoms and functional gene annotations. Finally, we leverage PerturbNet to design perturbations that achieve a desired cell state distribution. PerturbNet holds great promise for understanding perturbation responses and ultimately designing novel chemical and genetic interventions.
]]></description>
<dc:creator>Yu, H.</dc:creator>
<dc:creator>Welch, J. D.</dc:creator>
<dc:date>2022-07-21</dc:date>
<dc:identifier>doi:10.1101/2022.07.20.500854</dc:identifier>
<dc:title><![CDATA[PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-07-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.29.403063v1?rss=1">
<title>
<![CDATA[
Cross-tissue eQTL mapping in the presence of missing data via surrogate outcome analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.29.403063v1?rss=1
</link>
<description><![CDATA[
Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (SPRAY) for leveraging information from a correlated surrogate outcome (e.g. expression in blood) to improve inference on a partially missing target outcome (e.g. expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, SPRAY jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. SPRAY estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.
]]></description>
<dc:creator>McCaw, Z. R.</dc:creator>
<dc:creator>Gaynor, S. M.</dc:creator>
<dc:creator>Sun, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2020-11-30</dc:date>
<dc:identifier>doi:10.1101/2020.11.29.403063</dc:identifier>
<dc:title><![CDATA[Cross-tissue eQTL mapping in the presence of missing data via surrogate outcome analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.25.505354v1?rss=1">
<title>
<![CDATA[
Modeling tissue co-regulation to estimate tissue-specific contributions to disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.25.505354v1?rss=1
</link>
<description><![CDATA[
Integrative analyses of genome-wide association studies (GWAS) and gene expression data across diverse tissues and cell types have enabled the identification of putative disease-critical tissues. However, co-regulation of genetic effects on gene expression across tissues makes it difficult to distinguish biologically causal tissues from tagging tissues. While previous work emphasized the potential of accounting for tissue co-regulation, tissue-specific disease effects have not previously been formally modeled. Here, we introduce a new method, tissue co-regulation score regression (TCSC), that disentangles causal tissues from tagging tissues and partitions disease heritability (or covariance) into tissue-specific components. TCSC leverages gene-disease association statistics across tissues from transcriptome-wide association studies (TWAS), which implicate both causal and tagging genes and tissues. TCSC regresses TWAS chi-square statistics (or products of z-scores) on tissue co-regulation scores reflecting correlations of predicted gene expression across genes and tissues. In simulations, TCSC distinguishes causal tissues from tagging tissues while controlling type I error. We applied TCSC to GWAS summary statistics for 78 diseases and complex traits (average N = 302K) and gene expression prediction models for 48 GTEx tissues. TCSC identified 21 causal tissue-trait pairs at 5% FDR, including well-established findings, biologically plausible novel findings (e.g. aorta artery and glaucoma), and increased specificity of known tissue-trait associations (e.g. subcutaneous adipose, but not visceral adipose, and HDL). TCSC also identified 17 causal tissue-trait covariance pairs at 5% FDR. For the positive genetic covariance between BMI and red blood cell count, brain substantia nigra contributed positive covariance while pancreas contributed negative covariance; this suggests that genetic covariance may reflect distinct tissue-specific contributions. 
Overall, TCSC is a precise method for distinguishing causal tissues from tagging tissues, improving our understanding of disease and complex trait biology.
]]></description>
<dc:creator>Amariuta, T.</dc:creator>
<dc:creator>Siewert-Rocks, K.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2022-08-26</dc:date>
<dc:identifier>doi:10.1101/2022.08.25.505354</dc:identifier>
<dc:title><![CDATA[Modeling tissue co-regulation to estimate tissue-specific contributions to disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.24.504550v1?rss=1">
<title>
<![CDATA[
A statistical genetics guide to identifying HLA alleles driving complex disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.24.504550v1?rss=1
</link>
<description><![CDATA[
The human leukocyte antigen (HLA) locus is associated with more human complex diseases than any other locus. In many diseases it explains more heritability than all other known loci combined. Investigators have now demonstrated the accuracy of in silico HLA imputation methods. These approaches enable rapid and accurate estimation of HLA alleles in the millions of individuals that are already genotyped on microarrays. HLA imputation has been used to define causal variation in autoimmune diseases, such as type I diabetes, and infectious diseases, such as HIV infection control. However, there are few guidelines on performing HLA imputation, association testing, and fine-mapping. Here, we present a comprehensive statistical genetics guide to imputing HLA alleles from genotype data. We provide detailed protocols, including standard quality control measures for input genotyping data, and describe options to impute HLA alleles and amino acids, including a web-based Michigan Imputation Server. We updated the HLA imputation reference panel representing global populations (African, East Asian, European and Latino) available at the Michigan Imputation Server (n = 20,349) and achieved high imputation accuracy (mean dosage correlation r = 0.981). Finally, we offer best practice recommendations for conducting association tests in order to define the alleles, amino acids, and haplotypes affecting human traits. This protocol will be broadly applicable to large-scale genotyping data and contribute to defining the role of HLA in human diseases across global populations.
]]></description>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Gurajala, S.</dc:creator>
<dc:creator>Curtis, M.</dc:creator>
<dc:creator>Luo, Y.</dc:creator>
<dc:creator>Choi, W.</dc:creator>
<dc:creator>Ishigaki, K.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Deutsch, A. J.</dc:creator>
<dc:creator>Schonherr, S.</dc:creator>
<dc:creator>Forer, L.</dc:creator>
<dc:creator>LeFaive, J.</dc:creator>
<dc:creator>Fuchsberger, C.</dc:creator>
<dc:creator>Han, B.</dc:creator>
<dc:creator>Lenz, T. L.</dc:creator>
<dc:creator>de Bakker, P. I. W.</dc:creator>
<dc:creator>Smith, A. V.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2022-08-26</dc:date>
<dc:identifier>doi:10.1101/2022.08.24.504550</dc:identifier>
<dc:title><![CDATA[A statistical genetics guide to identifying HLA alleles driving complex disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.17.484479v1?rss=1">
<title>
<![CDATA[
Evidence-based calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for clinical use of PP3/BP4 criteria 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.17.484479v1?rss=1
</link>
<description><![CDATA[
Recommendations from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) for interpreting sequence variants specify the use of computational predictors as Supporting level of evidence for pathogenicity or benignity using criteria PP3 and BP4, respectively. However, score intervals defined by tool developers, and ACMG/AMP recommendations that require the consensus of multiple predictors, lack quantitative support. Previously, we described a probabilistic framework that quantified the strengths of evidence (Supporting, Moderate, Strong, Very Strong) within ACMG/AMP recommendations. We have extended this framework to computational predictors and introduce a new standard that converts a tool's scores to PP3 and BP4 evidence strengths. Our approach is based on estimating the local positive predictive value and can calibrate any computational tool or other continuous-scale evidence on any variant type. We estimate thresholds (score intervals) corresponding to each strength of evidence for pathogenicity and benignity for thirteen missense variant interpretation tools, using carefully assembled independent data sets. Most tools achieved Supporting evidence level for both pathogenic and benign classification using newly established thresholds. Multiple tools reached score thresholds justifying Moderate and several reached Strong evidence levels. One tool reached Very Strong evidence level for benign classification on some variants. Based on these findings, we provide recommendations for evidence-based revisions of the PP3 and BP4 ACMG/AMP criteria using individual tools and future assessment of computational methods for clinical interpretation.
]]></description>
<dc:creator>Pejaver, V.</dc:creator>
<dc:creator>Byrne, A. B.</dc:creator>
<dc:creator>Feng, B.-J.</dc:creator>
<dc:creator>Pagel, K. A.</dc:creator>
<dc:creator>Mooney, S. D.</dc:creator>
<dc:creator>Karchin, R.</dc:creator>
<dc:creator>O'Donnell-Luria, A.</dc:creator>
<dc:creator>Harrison, S. M.</dc:creator>
<dc:creator>Tavtigian, S. V.</dc:creator>
<dc:creator>Greenblatt, M. S.</dc:creator>
<dc:creator>Biesecker, L. G.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Brenner, S. E.</dc:creator>
<dc:creator>ClinGen Sequence Variant Interpretation Working Group</dc:creator>
<dc:date>2022-03-19</dc:date>
<dc:identifier>doi:10.1101/2022.03.17.484479</dc:identifier>
<dc:title><![CDATA[Evidence-based calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for clinical use of PP3/BP4 criteria]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.23.481681v1?rss=1">
<title>
<![CDATA[
Enrichment of somatic mutations in schizophrenia brain targets prenatally active transcription factor bindings sites 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.23.481681v1?rss=1
</link>
<description><![CDATA[
Schizophrenia (SCZ) is a complex neuropsychiatric disorder in which both germline genetic mutations and maternal factors, such as infection and immune activation, have been implicated, but how these two strikingly different mechanisms might converge on the same phenotype is unknown. During development, cells accumulate somatic, mosaic mutations in ways that can be shaped by the cellular environment or endogenous processes, but these early developmental mutational patterns have not been studied in SCZ. Here we analyzed deep (267x) whole-genome sequencing (WGS) of DNA from cerebral cortical neurons isolated from 61 SCZ and 25 control postmortem brains to capture mutations occurring before or during fetal neurogenesis. SCZ cases showed a >15% increase in genome-wide sSNV compared to controls (p < 2e-10). Remarkably, mosaic T>G mutations and CpG transversions (CpG>GpG or CpG>ApG) were 79- and 62-fold enriched, respectively, at transcription factor binding sites (TFBS) in SCZ, but not in controls. The pattern of T>G mutations resembles mutational processes in cancer attributed to oxidative damage that is sterically blocked from DNA repair by transcription factors (TFs) bound to damaged DNA. The CpG transversions similarly suggest unfinished DNA demethylation resulting in abasic sites that can also be blocked from repair by bound TFs. Allele frequency analysis suggests that both localized mutational spikes occur in the first trimester. We call this prenatal mutational process "skiagenesis" (from the Greek skia, meaning shadow), as these mutations occur in the shadow of bound TFs. Skiagenesis reflects as-yet unidentified prenatal factors and is associated with SCZ risk in a subset (~13%) of cases. In turn, mutational disruption of key TFBS active in fetal brain is well positioned to create SCZ-specific gene dysregulation in concert with germline risk genes.
Skiagenesis provides a fingerprint for exploring how epigenomic regulation and prenatal factors such as maternal infection or immune activation may shape the developmental mutational landscape of human brain.
]]></description>
<dc:creator>Maury, E. A.</dc:creator>
<dc:creator>Jones, A.</dc:creator>
<dc:creator>Seplyarskiy, V.</dc:creator>
<dc:creator>Rosenbluh, C.</dc:creator>
<dc:creator>Bae, T.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Abyzov, A.</dc:creator>
<dc:creator>Khoshkoo, S.</dc:creator>
<dc:creator>Chahine, Y.</dc:creator>
<dc:creator>Brain Somatic Mosaicism Network</dc:creator>
<dc:creator>Park, P. J.</dc:creator>
<dc:creator>Akbarian, S.</dc:creator>
<dc:creator>Lee, E. A.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:creator>Walsh, C. A.</dc:creator>
<dc:creator>Chess, A.</dc:creator>
<dc:date>2022-02-24</dc:date>
<dc:identifier>doi:10.1101/2022.02.23.481681</dc:identifier>
<dc:title><![CDATA[Enrichment of somatic mutations in schizophrenia brain targets prenatally active transcription factor bindings sites]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-02-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.09.491198v1?rss=1">
<title>
<![CDATA[
Deciphering the Impact of Genetic Variation on Human Polyadenylation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.09.491198v1?rss=1
</link>
<description><![CDATA[
Genetic variants that disrupt polyadenylation can cause or contribute to genetic disorders. Yet, due to the complex cis-regulation of polyadenylation, variant interpretation remains challenging. Here, we introduce a residual neural network model, APARENT2, that can infer 3′-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3′ aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. Finally, we perform in-silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of >44 million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, using APARENT2's predictions we detect an association between gain-of-function mutations in the 3′-end and Autism Spectrum Disorder.
]]></description>
<dc:creator>Linder, J.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Seelig, G.</dc:creator>
<dc:date>2022-05-10</dc:date>
<dc:identifier>doi:10.1101/2022.05.09.491198</dc:identifier>
<dc:title><![CDATA[Deciphering the Impact of Genetic Variation on Human Polyadenylation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.07.491045v1?rss=1">
<title>
<![CDATA[
Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.07.491045v1?rss=1
</link>
<description><![CDATA[
Most signals in genome-wide association studies (GWAS) of complex traits point to noncoding genetic variants with putative gene regulatory effects. However, currently identified expression quantitative trait loci (eQTLs) explain only a small fraction of GWAS signals. By analyzing GWAS hits for complex traits in the UK Biobank, and cis-eQTLs from the GTEx consortium, we show that these assays systematically discover different types of genes and variants: eQTLs cluster strongly near transcription start sites, while GWAS hits do not. Genes near GWAS hits are enriched in numerous functional annotations, are under strong selective constraint and have a complex regulatory landscape across different tissue/cell types, while genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally-relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variants, and support the use of complementary functional approaches alongside the next generation of eQTL studies.
]]></description>
<dc:creator>Mostafavi, H.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Naqvi, S.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2022-05-08</dc:date>
<dc:identifier>doi:10.1101/2022.05.07.491045</dc:identifier>
<dc:title><![CDATA[Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.18.504427v1?rss=1">
<title>
<![CDATA[
Recurrent mutation in the ancestry of a rare variant 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.18.504427v1?rss=1
</link>
<description><![CDATA[
Recurrent mutation produces multiple copies of the same allele which may be co-segregating in a population. Yet most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in relatively small count in a large sample. Our results follow from the statistical independence of low-count mutations, which we show to hold for the standard neutral coalescent or diffusion model of population genetics as well as for more general coalescent trees. For populations of constant size, these counts are given by the Ewens sampling formula. We develop a Poisson sampling model for populations of varying size, and illustrate it using new results for site-frequency spectra in an exponentially growing population. We apply our model to a large data set of human SNPs and use it to explain dramatic differences in site-frequency spectra across the range of mutation rates in the human genome.
]]></description>
<dc:creator>Wakeley, J.</dc:creator>
<dc:creator>Fan, W. T.</dc:creator>
<dc:creator>Koch, E.</dc:creator>
<dc:creator>Sunyaev, S.</dc:creator>
<dc:date>2022-08-18</dc:date>
<dc:identifier>doi:10.1101/2022.08.18.504427</dc:identifier>
<dc:title><![CDATA[Recurrent mutation in the ancestry of a rare variant]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.04.490680v1?rss=1">
<title>
<![CDATA[
A unique epigenomic landscape defines CD8+ tissue-resident memory T cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.04.490680v1?rss=1
</link>
<description><![CDATA[
Memory T cells provide rapid and long-term protection against infection and tumors. The memory CD8+ T cell repertoire contains phenotypically and transcriptionally heterogeneous subsets with specialized functions and recirculation patterns. While these T cell populations have been well characterized in terms of differentiation potential and function, the epigenetic changes underlying memory T cell fate determination and tissue-residency remain largely unexplored. Here, we examined the single-cell chromatin landscape of CD8+ T cells over the course of acute viral infection. We reveal an early bifurcation of memory precursors displaying distinct chromatin accessibility and define epigenetic trajectories that lead to a circulating (TCIRC) or tissue-resident memory T (TRM) cell fate. While TRM cells displayed a conserved epigenetic signature across organs, we demonstrate that these cells exhibit tissue-specific signatures and identify transcription factors that regulate TRM cell populations in a site-specific manner. Moreover, we demonstrate that TRM cells and exhausted T (TEX) cells are distinct epigenetic lineages that are distinguishable early in their differentiation. Together, these findings show that TRM cell development is accompanied by dynamic alterations in chromatin accessibility that direct a unique transcriptional program resulting in a tissue-adapted and functionally distinct T cell state.

Highlights
- scATAC atlas reveals the epigenetic variance of memory CD8+ T cell subsets over the course of acute infection
- Early bifurcation of memory precursors leads to circulating versus tissue-resident cell fates
- Integrating transcriptional and epigenetic analyses identified organ-specific TRM cell regulators including HIC1 and BACH2
- Epigenetic distinction of TRM cells and TEX cell subsets
]]></description>
<dc:creator>Buquicchio, F. A.</dc:creator>
<dc:creator>Fonseca, R.</dc:creator>
<dc:creator>Belk, J. A.</dc:creator>
<dc:creator>Evrard, M.</dc:creator>
<dc:creator>Obers, A.</dc:creator>
<dc:creator>Qi, Y.</dc:creator>
<dc:creator>Daniel, B.</dc:creator>
<dc:creator>Yost, K. E.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Mackay, L. K.</dc:creator>
<dc:date>2022-05-06</dc:date>
<dc:identifier>doi:10.1101/2022.05.04.490680</dc:identifier>
<dc:title><![CDATA[A unique epigenomic landscape defines CD8+ tissue-resident memory T cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.19.469318v1?rss=1">
<title>
<![CDATA[
Antigen presentation by type 3 innate lymphoid cells instructs the differentiation of gut microbiota-specific regulatory T cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.19.469318v1?rss=1
</link>
<description><![CDATA[
The mutualistic relationship of gut-resident microbiota and cells of the host immune system promotes homeostasis that ensures maintenance of the microbial community and of a poised, but largely non-aggressive, immune cell compartment [1, 2]. Consequences of disturbing this balance, by environmental or genetic factors, include proximal inflammatory conditions, like Crohn's disease, and systemic illnesses, both metabolic and autoimmune. One of the means by which this equilibrium is achieved is through induction of both effector and suppressor or regulatory arms of the adaptive immune system. In mice, Helicobacter species induce regulatory (iTreg) and follicular helper (Tfh) T cells in the colon-draining mesenteric lymph nodes under homeostatic conditions, but can instead induce inflammatory Th17 cells when iTreg cells are compromised [3, 4]. How Helicobacter hepaticus and other gut bacteria direct T cells to adopt distinct functions remains poorly understood. Here, we investigated which cells and molecular components are required to convey the microbial instruction for the iTreg differentiation program. We found that antigen presentation by cells expressing RORγt, rather than by classical dendritic cells, was both required and sufficient for iTreg induction. These RORγt+ cells, likely to be type 3 innate lymphoid cells (ILC3) and/or a recently-described population of Aire+ cells termed Janus cells [5], require the MHC class II antigen presentation machinery, the chemokine receptor CCR7, and αv integrin, which activates TGF-β, for iTreg cell differentiation. In the absence of any of these, instead of iTreg cells there was expansion of microbiota-specific pathogenic Th17 cells, which were induced by other antigen presenting cells (APCs) that did not require CCR7.
Thus, intestinal commensal microbes and their products target multiple APCs with pre-determined features suited to directing appropriate T cell differentiation programs, rather than a common APC that they endow with appropriate functions. Our results illustrate the ability of microbiota to exploit specialized functions of distinct innate immune system cells, targeting them to achieve the desired composition of equipoised T cells, thus maintaining tolerance.
]]></description>
<dc:creator>Kedmi, R.</dc:creator>
<dc:creator>Najar, T.</dc:creator>
<dc:creator>Mesa, K. R.</dc:creator>
<dc:creator>Grayson, A.</dc:creator>
<dc:creator>Kroehling, L.</dc:creator>
<dc:creator>Hao, Y.</dc:creator>
<dc:creator>Hao, S.</dc:creator>
<dc:creator>Pokrovskii, M.</dc:creator>
<dc:creator>Xu, M.</dc:creator>
<dc:creator>Talbot, J.</dc:creator>
<dc:creator>Wang, J.</dc:creator>
<dc:creator>Anderson, M. S.</dc:creator>
<dc:creator>Gardner, J. M.</dc:creator>
<dc:creator>Laufer, T. M.</dc:creator>
<dc:creator>Aifantis, I.</dc:creator>
<dc:creator>Bartleson, J. M.</dc:creator>
<dc:creator>Allen, P. M.</dc:creator>
<dc:creator>Stoeckius, M.</dc:creator>
<dc:creator>Littman, D. R.</dc:creator>
<dc:date>2021-11-20</dc:date>
<dc:identifier>doi:10.1101/2021.11.19.469318</dc:identifier>
<dc:title><![CDATA[Antigen presentation by type 3 innate lymphoid cells instructs the differentiation of gut microbiota-specific regulatory T cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.20.488974v1?rss=1">
<title>
<![CDATA[
Genome-wide CRISPR screens of T cell exhaustion identify chromatin remodeling factors that limit T cell persistence 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.20.488974v1?rss=1
</link>
<description><![CDATA[
T cell exhaustion limits anti-tumor immunity, but the molecular determinants of this process remain poorly understood. Using a chronic antigen stimulation assay, we performed genome-wide CRISPR/Cas9 screens to systematically discover genetic regulators of T cell exhaustion, which identified an enrichment of epigenetic factors. In vivo CRISPR screens in murine and human tumor models demonstrated that perturbation of several epigenetic regulators, including members of the INO80 and BAF chromatin remodeling complexes, improved T cell persistence in tumors. In vivo paired CRISPR perturbation and single-cell RNA sequencing revealed distinct transcriptional roles of each complex and that depletion of canonical BAF complex members, including Arid1a, resulted in the maintenance of an effector program and downregulation of terminal exhaustion-related genes in tumor-infiltrating T cells. Finally, Arid1a-depletion limited the global acquisition of chromatin accessibility associated with T cell exhaustion and led to improved anti-tumor immunity after adoptive cell therapy. In summary, we provide a comprehensive atlas of the genetic regulators of T cell exhaustion and demonstrate that modulation of the epigenetic state of T cell exhaustion can improve T cell responses in cancer immunotherapy.
]]></description>
<dc:creator>Belk, J.</dc:creator>
<dc:creator>Yao, W.</dc:creator>
<dc:creator>Ly, N.</dc:creator>
<dc:creator>Freitas, K.</dc:creator>
<dc:creator>Chen, Y.-T.</dc:creator>
<dc:creator>Shi, Q.</dc:creator>
<dc:creator>Valencia, A.</dc:creator>
<dc:creator>Shifrut, E.</dc:creator>
<dc:creator>Kale, N.</dc:creator>
<dc:creator>Yost, K.</dc:creator>
<dc:creator>Duffy, C.</dc:creator>
<dc:creator>Hwee, M.</dc:creator>
<dc:creator>Miao, Z.</dc:creator>
<dc:creator>Ashworth, A.</dc:creator>
<dc:creator>Mackall, C.</dc:creator>
<dc:creator>Marson, A.</dc:creator>
<dc:creator>Carnevale, J.</dc:creator>
<dc:creator>Vardhana, S.</dc:creator>
<dc:creator>Satpathy, A.</dc:creator>
<dc:date>2022-04-21</dc:date>
<dc:identifier>doi:10.1101/2022.04.20.488974</dc:identifier>
<dc:title><![CDATA[Genome-wide CRISPR screens of T cell exhaustion identify chromatin remodeling factors that limit T cell persistence]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.18.488696v1?rss=1">
<title>
<![CDATA[
A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.18.488696v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) have highlighted that almost any trait is affected by many variants of relatively small effect. On one hand this presents a challenge for inferring the effect of any single variant as the signal-to-noise ratio is low for variants of small effect. This challenge is compounded when combining information across many variants in polygenic scores for predicting trait values. On the other hand, the large number of contributing variants provides an opportunity to learn about the average behavior of variants encoded in the distribution of variant effect sizes. Many approaches have looked at aspects of this problem, but no method has unified the inference of the effects of individual variants with the inference of the distribution of effect sizes while requiring only GWAS summary statistics and properly accounting for linkage disequilibrium between variants. Here we present a flexible, unifying framework that combines information across variants to infer a distribution of effect sizes and uses this distribution to improve the estimation of the effects of individual variants. We also develop a variational inference (VI) scheme to perform efficient inference under this framework. We show this framework is useful by constructing polygenic scores (PGSs) that outperform the state-of-the-art. Our modeling framework easily extends to jointly inferring effect sizes across multiple cohorts, where we show that building PGSs using additional cohorts of differing ancestries improves predictive accuracy and portability. We also investigate the inferred distributions of effect sizes across many traits and find that these distributions have effect sizes ranging over multiple orders of magnitude, in contrast to the assumptions implicit in many commonly used statistical genetics methods.
]]></description>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Sinnott-Armstrong, N.</dc:creator>
<dc:creator>Assimes, T.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2022-04-19</dc:date>
<dc:identifier>doi:10.1101/2022.04.18.488696</dc:identifier>
<dc:title><![CDATA[A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.29.498132v1?rss=1">
<title>
<![CDATA[
Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.29.498132v1?rss=1
</link>
<description><![CDATA[
Congenital heart defects, the most common birth disorders, are the clinical manifestation of anomalies in fetal heart development - a complex process involving dynamic spatiotemporal coordination among various precursor cell lineages. This complexity underlies the incomplete understanding of the genetic architecture of congenital heart diseases (CHDs). To define the multi-cellular epigenomic and transcriptional landscape of cardiac cellular development, we generated single-cell chromatin accessibility maps of human fetal heart tissues. We identified eight major differentiation trajectories involving primary cardiac cell types, each associated with dynamic transcription factor (TF) activity signatures. We identified similarities and differences of regulatory landscapes of iPSC-derived cardiac cell types and their in vivo counterparts. We interpreted deep learning models that predict cell-type resolved, base-resolution chromatin accessibility profiles from DNA sequence to decipher underlying TF motif lexicons and infer the regulatory impact of non-coding variants. De novo mutations predicted to affect chromatin accessibility in arterial endothelium were enriched in CHD cases versus controls. We used CRISPR-based perturbations to validate an enhancer harboring a nominated regulatory CHD mutation, linking it to effects on the expression of a known CHD gene JARID2. Together, this work defines the cell-type resolved cis-regulatory sequence determinants of heart development and identifies disruption of cell type-specific regulatory elements as a component of the genetic etiology of CHD.
]]></description>
<dc:creator>Ameen, M.</dc:creator>
<dc:creator>Sundaram, L.</dc:creator>
<dc:creator>Banerjee, A.</dc:creator>
<dc:creator>Shen, M.</dc:creator>
<dc:creator>Kundu, S.</dc:creator>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Shcherbina, A.</dc:creator>
<dc:creator>Gu, M.</dc:creator>
<dc:creator>Wilson, K. D.</dc:creator>
<dc:creator>Varadarajan, A.</dc:creator>
<dc:creator>Vadgama, N.</dc:creator>
<dc:creator>Balsubramani, A.</dc:creator>
<dc:creator>Wu, J. C.</dc:creator>
<dc:creator>Engreitz, J.</dc:creator>
<dc:creator>Farh, K.</dc:creator>
<dc:creator>Karakikes, I.</dc:creator>
<dc:creator>Wang, K. C.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Greenleaf, W.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2022-06-29</dc:date>
<dc:identifier>doi:10.1101/2022.06.29.498132</dc:identifier>
<dc:title><![CDATA[Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.09.24.461597v1?rss=1">
<title>
<![CDATA[
Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.09.24.461597v1?rss=1
</link>
<description><![CDATA[
Gene expression at the individual cell-level resolution, as quantified by single-cell RNA-sequencing (scRNA-seq), can provide unique insights into the pathology and cellular origin of diseases and complex traits. Here, we introduce single-cell Disease Relevance Score (scDRS), an approach that links scRNA-seq with polygenic risk of disease at individual cell resolution without the need for annotation of individual cells to cell types; scDRS identifies individual cells that show excess expression levels for genes in a disease-specific gene set constructed from GWAS data. We determined via simulations that scDRS is well-calibrated and powerful in identifying individual cells associated with disease. We applied scDRS to GWAS data from 74 diseases and complex traits (average N = 346K) in conjunction with 16 scRNA-seq data sets spanning 1.3 million cells from 31 tissues and organs. At the cell type level, scDRS broadly recapitulated known links between classical cell types and disease, and also produced novel biologically plausible findings. At the individual cell level, scDRS identified subpopulations of disease-associated cells that are not captured by existing cell type labels, including subpopulations of CD4+ T cells associated with inflammatory bowel disease, partially characterized by their effector-like states; subpopulations of hippocampal CA1 pyramidal neurons associated with schizophrenia, partially characterized by their spatial location at the proximal part of the hippocampal CA1 region; and subpopulations of hepatocytes associated with triglyceride levels, partially characterized by their higher ploidy levels. At the gene level, we determined that genes whose expression across individual cells was correlated with the scDRS score (thus reflecting co-expression with GWAS disease genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.
]]></description>
<dc:creator>Zhang, M. J.</dc:creator>
<dc:creator>Hou, K.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Jagadeesh, K. A.</dc:creator>
<dc:creator>Weinand, K.</dc:creator>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Taychameekiatchai, A.</dc:creator>
<dc:creator>Rao, P.</dc:creator>
<dc:creator>Pisco, A. O.</dc:creator>
<dc:creator>Zou, J.</dc:creator>
<dc:creator>Wang, B.</dc:creator>
<dc:creator>Gandal, M.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Pasaniuc, B.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2021-09-28</dc:date>
<dc:identifier>doi:10.1101/2021.09.24.461597</dc:identifier>
<dc:title><![CDATA[Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-09-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.09.02.279059v1?rss=1">
<title>
<![CDATA[
Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.09.02.279059v1?rss=1
</link>
<description><![CDATA[
Gene regulation is known to play a fundamental role in human disease, but mechanisms of regulation vary greatly across genes. Here, we explore the contributions to disease of two types of genes: genes whose regulation is driven by enhancer regions as opposed to promoter regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using a comprehensive set of SNP-to-gene (S2G) strategies and apply stratified LD score regression to the resulting SNP annotations to draw three main conclusions about 11 autoimmune diseases and blood cell traits (average Ncase = 13K across 6 autoimmune diseases, average N = 443K across 5 blood cell traits). First, several characterizations of enhancer-related genes defined in blood using functional genomics data (e.g. ATAC-seq, RNA-seq, PC-HiC) are conditionally informative for autoimmune disease heritability, after conditioning on a broad set of regulatory annotations from the baseline-LD model. Second, candidate master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating enhancer-related and candidate master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2x stronger conditional signal (maximum standardized SNP annotation effect size (τ*) = 2.0 (s.e. 0.3) vs. 0.91 (s.e. 0.21)), and >2x stronger gene-level enrichment for approved autoimmune disease drug targets (5.3x vs. 2.1x), as compared to the recently proposed Enhancer Domain Score (EDS). In each case, using functionally informed S2G strategies to link genes to SNPs that may regulate them produced much stronger disease signals (4.1x-13x larger τ* values) than conventional window-based S2G strategies.
We conclude that our characterizations of enhancer-related and candidate master-regulator genes identify gene sets that are important for autoimmune disease, and that combining those gene sets with functionally informed S2G strategies enables us to identify SNP annotations in which disease heritability is concentrated.
]]></description>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Gazal, S. K.</dc:creator>
<dc:creator>van de Geijn, B.</dc:creator>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Price, A.</dc:creator>
<dc:date>2020-09-03</dc:date>
<dc:identifier>doi:10.1101/2020.09.02.279059</dc:identifier>
<dc:title><![CDATA[Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-09-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.08.515683v1?rss=1">
<title>
<![CDATA[
Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.08.515683v1?rss=1
</link>
<description><![CDATA[
Motivation: Alternative polyadenylation (APA) is a major mechanism that increases transcriptional diversity and regulates mRNA abundance. Existing computational tools to analyze APA have low precision because these tools are designed for short-read RNA-seq, which is a suboptimal data source to study APA. Long-read RNA-seq (LR-RNA-seq) accurately detects complete transcript isoforms with poly(A)-tails, providing an ideal data source to study APA. However, current computational tools are incompatible with LR-RNA-seq.

Results: Here, we introduce LAPA, a computational toolkit to study alternative polyadenylation (APA) from diverse data sources such as LR-RNA-seq and short-read 3' sequencing (3'-seq). LAPA counts and clusters reads with poly(A)-tails, then performs peak calling to detect poly(A)-sites in a data-source-agnostic manner. The resulting peaks are annotated based on genomic features and regulatory sequence elements such as the presence of a poly(A)-signal. Finally, LAPA can perform robust statistical testing and multiple testing correction to detect differential APA.

We analyzed ENCODE LR-RNA-seq data from human WTC11, mouse C2C12 myoblast, and C2C12-derived differentiated myotube cells using LAPA. Comparing LR-RNA-seq from different platforms and library preparation methods against 3'-seq shows that LR-RNA-seq detects poly(A)-sites with a performance of 75% precision at 57% recall. Moreover, LAPA consistently improved TES validation by at least 25% over the baseline transcriptome annotation generated by TALON, independent of protocol or platform. Differential APA analysis detected 788 statistically significant genes with unique polyadenylation signatures between undifferentiated myoblast and differentiated myotube cells. Among these genes, 3' UTR elongation is significantly associated with higher expression, while shortening is linked with lower expression. This analysis reveals a link between cell state/identity and APA. Overall, our results show that LR-RNA-seq is a reliable data source for the study of post-transcriptional regulation by providing precise information about alternative polyadenylation.

Availability: LAPA is publicly available at https://github.com/mortazavilab/lapa and PyPI.

Contact: ali.mortazavi@uci.edu
]]></description>
<dc:creator>Celik, M. H.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2022-11-08</dc:date>
<dc:identifier>doi:10.1101/2022.11.08.515683</dc:identifier>
<dc:title><![CDATA[Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.10.06.463360v1?rss=1">
<title>
<![CDATA[
A systematic genotype-phenotype map for missense variants in the human intellectual disability-associated gene GDI1 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.10.06.463360v1?rss=1
</link>
<description><![CDATA[
Next generation sequencing has become a common tool in the diagnosis of genetic diseases. However, for the vast majority of genetic variants that are discovered, a clinical interpretation is not available. Variant effect mapping allows the functional effects of many single amino acid variants to be characterized in parallel. Here, we combine multiplexed functional assays with machine learning to assess the effects of amino acid substitutions in the human intellectual disability-associated gene, GDI1. We show that the resulting variant effect map can be used to discriminate pathogenic from benign variants. Our variant effect map recovers known biochemical and structural features of GDI1 and reveals additional aspects of GDI1 function. We explore how our functional assays can aid in the interpretation of novel GDI1 variants as they are discovered, and to re-classify previously observed variants of unknown significance.
]]></description>
<dc:creator>Silverstein, R. A.</dc:creator>
<dc:creator>Sun, S. A.</dc:creator>
<dc:creator>Verby, M.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2021-10-06</dc:date>
<dc:identifier>doi:10.1101/2021.10.06.463360</dc:identifier>
<dc:title><![CDATA[A systematic genotype-phenotype map for missense variants in the human intellectual disability-associated gene GDI1]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-10-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.29.470445v1?rss=1">
<title>
<![CDATA[
MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.29.470445v1?rss=1
</link>
<description><![CDATA[
A central problem in genomics is understanding the effect of individual DNA variants. Multiplexed Assays of Variant Effect (MAVEs) can help address this challenge by measuring all possible single nucleotide variant effects in a gene or regulatory sequence simultaneously. Here we describe MaveDB v2, which has become the database of record for MAVEs. MaveDB now contains a large fraction of published studies, comprising over two hundred datasets and three million variant effect measurements. We created tools and APIs to streamline data submission and access, transforming MaveDB into a hub for the analysis and dissemination of these impactful datasets.
]]></description>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Min, J. K.</dc:creator>
<dc:creator>Rollins, N. J.</dc:creator>
<dc:creator>Da, E. Y.</dc:creator>
<dc:creator>Esposito, D.</dc:creator>
<dc:creator>Harrington, M.</dc:creator>
<dc:creator>Stone, J.</dc:creator>
<dc:creator>Bianchi, A. H.</dc:creator>
<dc:creator>Fu, Y.</dc:creator>
<dc:creator>Gallaher, M.</dc:creator>
<dc:creator>Li, I.</dc:creator>
<dc:creator>Moscatelli, O.</dc:creator>
<dc:creator>Ong, J. Y.</dc:creator>
<dc:creator>Rollins, J. E.</dc:creator>
<dc:creator>Wakefield, M. J.</dc:creator>
<dc:creator>Ye, S.</dc:creator>
<dc:creator>Tam, A.</dc:creator>
<dc:creator>McEwen, A. E.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Bryant, V. L.</dc:creator>
<dc:creator>Marks, D. S.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:date>2021-11-30</dc:date>
<dc:identifier>doi:10.1101/2021.11.29.470445</dc:identifier>
<dc:title><![CDATA[MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.06.511211v1?rss=1">
<title>
<![CDATA[
Chemico-genetic Analysis of Native Autism Proteomes Reveals Shared Biology Predictive of Functional Modifiers 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.06.511211v1?rss=1
</link>
<description><![CDATA[
One of the main drivers of autism spectrum disorder is risk alleles within hundreds of genes, which may interact within shared but unknown protein complexes. Here we develop a scalable genome-editing-mediated approach to target 14 high-confidence autism risk genes within the mouse brain for proximity-based endogenous proteomics, achieving high specificity spatial interactomes compared to prior methods. The resulting native proximity interactomes are enriched for human genes dysregulated in the brain of autistic individuals and reveal unexpected and highly significant interactions with other lower-confidence autism risk gene products, positing new avenues to prioritize genetic risk. Importantly, the datasets are enriched for shared cellular functions and genetic interactions that may underlie the condition. We test this notion by spatial proteomics and CRISPR-based regulation of expression in two autism models, demonstrating functional interactions that modulate mechanisms of their dysregulation. Together, these results reveal native proteome networks in vivo relevant to autism, providing new inroads for understanding and manipulating the cellular drivers underpinning its etiology.
]]></description>
<dc:creator>Gao, Y.</dc:creator>
<dc:creator>Trn, M.</dc:creator>
<dc:creator>Shonai, D.</dc:creator>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Soderblom, E. J.</dc:creator>
<dc:creator>Garcia-moreno, S. A.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:creator>Wetsel, W. C.</dc:creator>
<dc:creator>Dawson, G.</dc:creator>
<dc:creator>Velmeshev, D.</dc:creator>
<dc:creator>Jiang, Y.-h.</dc:creator>
<dc:creator>Sloofman, L.</dc:creator>
<dc:creator>Buxbaum, J.</dc:creator>
<dc:creator>Soderling, S. H.</dc:creator>
<dc:date>2022-10-07</dc:date>
<dc:identifier>doi:10.1101/2022.10.06.511211</dc:identifier>
<dc:title><![CDATA[Chemico-genetic Analysis of Native Autism Proteomes Reveals Shared Biology Predictive of Functional Modifiers]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.09.20.459182v1?rss=1">
<title>
<![CDATA[
Assessing computational variant effect predictors with a large prospective cohort 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.09.20.459182v1?rss=1
</link>
<description><![CDATA[
Background: Causal gene/trait relationships can be identified via observation of an excess (or reduced) burden of rare variation in a given gene within humans who have that trait. Although computational predictors can improve the power of such burden tests, it is unclear which are optimal for this task.

Method: Using 140 gene-trait combinations with a reported rare-variant burden association, we evaluated the ability of 20 computational predictors to predict human traits. We used the best-performing predictors to increase the power of genome-wide rare variant burden scans based on ~450K UK Biobank participants.

Results: Two predictors, VARITY and REVEL, outperformed all others in predicting human traits in the UK Biobank from missense variation. Genome-scale burden scans using the two best-performing predictors identified 1,038 gene-trait associations (FDR < 5%), including 567 (55%) that had not been previously reported. We explore 54 cardiovascular gene-trait associations (including 15 not reported in other burden scans) in greater depth.

Conclusions: Rigorous selection of computational missense variant effect predictors can improve the power of rare-variant burden scans for human gene-trait associations, yielding many new associations with potential value in informing mechanistic understanding and therapeutic development. The strategy we describe here is generalizable to future computational variant effect predictors, traits and organisms.
]]></description>
<dc:creator>Kuang, D.</dc:creator>
<dc:creator>Li, R.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Hegele, R. A.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2021-09-20</dc:date>
<dc:identifier>doi:10.1101/2021.09.20.459182</dc:identifier>
<dc:title><![CDATA[Assessing computational variant effect predictors with a large prospective cohort]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-09-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.27.497649v1?rss=1">
<title>
<![CDATA[
Epo-IGF1R crosstalk expands stress-specific progenitors in regenerative erythropoiesis and myeloproliferative neoplasm 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.27.497649v1?rss=1
</link>
<description><![CDATA[
We find that in regenerative erythropoiesis, the erythroid progenitor landscape is reshaped, and a previously undescribed progenitor population with CFU-E activity (stress CFU-E/sCFU-E) is markedly expanded to restore the erythron. sCFU-E are targets of erythropoietin (Epo) and sCFU-E expansion requires signaling from the Epo receptor (EpoR) cytoplasmic tyrosines. Molecularly, Epo promotes sCFU-E expansion via JAK2/STAT5-dependent expression of IRS2, thus engaging the pro-growth signaling from the IGF1 receptor (IGF1R). Inhibition of IGF1R/IRS2 signaling impairs sCFU-E cell growth, whereas exogenous IRS2 expression rescues cell growth in sCFU-E expressing truncated EpoR lacking cytoplasmic tyrosines. This sCFU-E pathway is the major pathway involved in erythrocytosis driven by the oncogenic JAK2 mutant, JAK2(V617F), in myeloproliferative neoplasm. Inability to expand sCFU-E cells by truncated EpoR protects against JAK2(V617F)-driven erythrocytosis. In myeloproliferative neoplasm patient samples, the number of sCFU-E-like cells increases, and inhibition of IGF1R/IRS2 signaling blocks Epo-hypersensitive erythroid cell colony formation. In summary, we identify a new stress-specific erythroid progenitor cell population that links regenerative erythropoiesis to pathogenic erythrocytosis.

Key Points:
- Epo-induced IRS2 allows engagement of IGF1R signaling to expand a previously unrecognized progenitor population in erythropoietic stress.
- Truncated EpoR does not support stress CFU-E expansion and protects against JAK2(V617F)-driven erythrocytosis in MPN.
]]></description>
<dc:creator>Huang, L.</dc:creator>
<dc:creator>Hsieh, H.-h.</dc:creator>
<dc:creator>Yao, H.</dc:creator>
<dc:creator>Ma, Y.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:creator>Stephens, H.</dc:creator>
<dc:creator>Chung, S. S.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Rampal, R. K.</dc:creator>
<dc:date>2022-06-29</dc:date>
<dc:identifier>doi:10.1101/2022.06.27.497649</dc:identifier>
<dc:title><![CDATA[Epo-IGF1R crosstalk expands stress-specific progenitors in regenerative erythropoiesis and myeloproliferative neoplasm]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/256313v1?rss=1">
<title>
<![CDATA[
Massively parallel dissection of human accelerated regions in human and chimpanzee neural progenitors 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/256313v1?rss=1
</link>
<description><![CDATA[
Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 Human Accelerated Regions (HARs), some of the fastest evolving regions of the human genome. We predicted that 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in human and chimpanzee neural progenitor cells. The species-specific enhancer activity of assayed HARs was accurately predicted from the presence and absence of transcription factor footprints in each species. Despite these striking cis effects, activity of a given HAR sequence was nearly identical in human and chimpanzee cells. These findings suggest that HARs did not evolve to compensate for changes in the trans environment but instead altered their ability to bind factors present in both species. Thus, ML prioritized variants with functional effects on human neurodevelopment and revealed an unexpected reason why HARs may have evolved so rapidly.
]]></description>
<dc:creator>Ryu, H.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Whalen, S.</dc:creator>
<dc:creator>Williams, A.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Alvarado, B.</dc:creator>
<dc:creator>Samee, M. A. H.</dc:creator>
<dc:creator>Keough, K.</dc:creator>
<dc:creator>Thomas, S.</dc:creator>
<dc:creator>Kriegstein, A.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:creator>Pollen, A.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Pollard, K.</dc:creator>
<dc:date>2018-01-29</dc:date>
<dc:identifier>doi:10.1101/256313</dc:identifier>
<dc:title><![CDATA[Massively parallel dissection of human accelerated regions in human and chimpanzee neural progenitors]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-01-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.05.511030v1?rss=1">
<title>
<![CDATA[
Integrative dissection of gene regulatory elements at base resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.05.511030v1?rss=1
</link>
<description><![CDATA[
Although vast numbers of putative gene regulatory elements have been cataloged, the sequence motifs and individual bases that underlie their functions remain largely unknown. Here we combine epigenetic perturbations, base editing, and deep learning models to dissect regulatory sequences within the exemplar immune locus encoding CD69. Focusing on a differentially accessible and acetylated upstream enhancer, we find that the complementary strategies converge on a ~170 base interval as critical for CD69 induction in stimulated Jurkat T cells. We pinpoint individual cytosine to thymine base edits that markedly reduce element accessibility and acetylation, with corresponding reduction of CD69 expression. The most potent base edits may be explained by their effect on binding competition between the transcriptional activator GATA3 and the repressor BHLHE40. Systematic analysis of GATA and bHLH/Ebox motifs suggests that interplay between these factors plays a general role in rapid T cell transcriptional responses. Our study provides a framework for parsing gene regulatory elements in their endogenous chromatin contexts and identifying operative artificial variants.

Highlights:
- Base editing screens and deep learning pinpoint sequences and single bases affecting immune gene expression
- An artificial C-to-T variant in a regulatory element suppresses CD69 expression by altering the balance of transcription factor binding
- Competition between GATA3 and BHLHE40 regulates inducible immune genes and T cell states
]]></description>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Javed, N. M.</dc:creator>
<dc:creator>Moore, M.</dc:creator>
<dc:creator>Wu, J.</dc:creator>
<dc:creator>Vinyard, M. E.</dc:creator>
<dc:creator>Pinello, L.</dc:creator>
<dc:creator>Najm, F.</dc:creator>
<dc:creator>Bernstein, B. E.</dc:creator>
<dc:date>2022-10-06</dc:date>
<dc:identifier>doi:10.1101/2022.10.05.511030</dc:identifier>
<dc:title><![CDATA[Integrative dissection of gene regulatory elements at base resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.07.487515v1?rss=1">
<title>
<![CDATA[
CHD-associated enhancers shape human cardiomyocyte lineage commitment 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.07.487515v1?rss=1
</link>
<description><![CDATA[
Enhancers orchestrate gene expression programs that drive multicellular development and lineage commitment. Thus, genetic variants at enhancers are thought to contribute to developmental diseases by altering cell fate commitment. However, while many variant-containing enhancers have been identified, studies to endogenously test the impact of these enhancers on lineage commitment have been lacking. We perform a single-cell CRISPRi screen to assess the endogenous roles of 25 enhancers and putative cardiac target genes implicated in genetic studies of congenital heart defects (CHD). We identify 16 enhancers whose repression leads to deficient differentiation of human cardiomyocytes (CMs). A focused CRISPRi validation screen shows that repression of TBX5 enhancers delays the transcriptional switch from mid- to late-stage CM states. Endogenous genetic deletions of two TBX5 enhancers phenocopy epigenetic perturbations. Together, these results identify critical enhancers of cardiac development and suggest that misregulation of these enhancers could contribute to cardiac defects in human patients.

Highlights
- Single-cell enhancer perturbation screens during human cardiomyocyte differentiation.
- Perturbation of CHD-linked enhancers/genes causes deficient CM differentiation.
- Repression or knockout of TBX5 enhancers delays transition from mid to late CM states.
- Deficient differentiation coincides with reduced expression of known cardiac genes.
]]></description>
<dc:creator>Armendariz, D. A.</dc:creator>
<dc:creator>Goetsch, S. C.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Xie, S.</dc:creator>
<dc:creator>Munshi, N. V.</dc:creator>
<dc:creator>Hon, G. C.</dc:creator>
<dc:date>2022-04-10</dc:date>
<dc:identifier>doi:10.1101/2022.04.07.487515</dc:identifier>
<dc:title><![CDATA[CHD-associated enhancers shape human cardiomyocyte lineage commitment]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.07.26.501609v1?rss=1">
<title>
<![CDATA[
Dynamic states of cervical epithelia during pregnancy and epithelial barrier disruption 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.07.26.501609v1?rss=1
</link>
<description><![CDATA[
The cervical epithelium undergoes continuous changes in proliferation, differentiation, and function that are critical before pregnancy to ensure fertility and during pregnancy to provide a physical and immunoprotective barrier for pregnancy maintenance. Barrier disruption can lead to the ascension of pathogens that elicit inflammatory responses and preterm birth. Here, we identify cervical epithelial subtypes in nonpregnant, pregnant, and in-labor mice using single-cell transcriptome and spatial analysis. We identify heterogeneous subpopulations of epithelia displaying spatial and temporal specificity. Notably, two goblet cell subtypes with distinct transcriptional programs and mucosal networks were dominant in pregnancy. Untimely basal cell proliferation and goblet cells with diminished mucosal integrity characterize barrier dysfunction in mice lacking hyaluronan. These data demonstrate how the cervical epithelium undergoes continuous remodeling to maintain dynamic states of homeostasis in pregnancy and labor, and provide a framework to understand perturbations in epithelial health and host-microbe interactions that increase the risk of premature birth.
]]></description>
<dc:creator>Cooley, A.</dc:creator>
<dc:creator>Madhukaran, S.</dc:creator>
<dc:creator>Stroebele, E.</dc:creator>
<dc:creator>Caraballo, M. C.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Hon, G.</dc:creator>
<dc:creator>Mahendroo, M.</dc:creator>
<dc:date>2022-07-28</dc:date>
<dc:identifier>doi:10.1101/2022.07.26.501609</dc:identifier>
<dc:title><![CDATA[Dynamic states of cervical epithelia during pregnancy and epithelial barrier disruption]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-07-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.27.497796v1?rss=1">
<title>
<![CDATA[
CROP-Seq: a single-cell CRISPRi platform for characterizing candidate genes relevant to metabolic disorders in human adipocytes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.27.497796v1?rss=1
</link>
<description><![CDATA[
Objective: CROP-Seq combines gene silencing using CRISPR interference (CRISPRi) with single-cell RNA sequencing (scRNA-Seq) to conduct a functional reverse genetic screen of novel gene targets associated with adipocyte differentiation or function, with single-cell transcriptomes as the readout.

Methods: We created a human preadipocyte SGBS cell line with stable expression of KRAB-dCas9 for CRISPRi-mediated gene knock-down. This line was transduced with a lentiviral library of sgRNAs targeting 6 genes of interest (3 sgRNAs / gene, 18 sgRNAs), 6 positive control genes (3 sgRNAs / gene, 18 sgRNAs), and non-targeting control sgRNAs (4 sgRNAs). Transduced cells were selected and differentiated, and individual cells were captured using microfluidics at days 0, 4 and 8 of adipogenic differentiation. Next, expression and sgRNA libraries were created and sequenced. Bioinformatic analysis of the resulting scRNA-Seq expression data was used to determine the effects of gene knock-down and the dysregulated pathways, and to predict cellular phenotypes.

Results: Single-cell transcriptomes obtained from SGBS cells following CRISPRi recapitulate different states of differentiation from preadipocytes to adipocytes. We confirmed successful knock-down of targeted genes. Transcriptome-wide changes were observed for all targeted genes, with over 400 differentially expressed genes identified per gene at one or more timepoints. Knock-down of known adipogenesis regulators PPARG and CEBPB inhibited adipogenesis. Gene set enrichment analyses revealed molecular processes for adipose tissue differentiation and function for novel genes. MAFF knock-down led to a downregulation of the transcriptional response to the proinflammatory cytokine TNF-α in preadipocytes. TIPARP knock-down resulted in an increase in the expression of the beiging marker UCP1 at D8 of adipogenesis.

Conclusions: The CROP-Seq system in SGBS cells can determine the consequences of target gene knock-down at the transcriptome level. This powerful, hypothesis-free tool can identify novel regulators of adipogenesis, preadipocyte and adipocyte function associated with metabolic disease.

Highlights
- CRISPR interference screen coupled with single-cell RNA sequencing (CROP-Seq)
- Parallel screening of 12 genes in human SGBS adipocytes and preadipocytes
- Uncovered novel regulators of adipogenesis and adipocyte function

Graphical abstract: [Figure 1]
]]></description>
<dc:creator>Bielczyk-Maczynska, E.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Blencowe, M.</dc:creator>
<dc:creator>Saliba-Gustafsson, P.</dc:creator>
<dc:creator>Gloudemans, M. J.</dc:creator>
<dc:creator>Yang, X.</dc:creator>
<dc:creator>Carcamo-Orive, I.</dc:creator>
<dc:creator>Wabitsch, M.</dc:creator>
<dc:creator>Svensson, K. J.</dc:creator>
<dc:creator>Park, C. Y.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Knowles, J. W.</dc:creator>
<dc:creator>Li, J.</dc:creator>
<dc:date>2022-06-27</dc:date>
<dc:identifier>doi:10.1101/2022.06.27.497796</dc:identifier>
<dc:title><![CDATA[CROP-Seq: a single-cell CRISPRi platform for characterizing candidate genes relevant to metabolic disorders in human adipocytes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.01.514606v1?rss=1">
<title>
<![CDATA[
Mapping the convergence of genes for coronary artery disease onto endothelial cell programs 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.01.514606v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) have discovered thousands of risk loci for common, complex diseases, each of which could point to genes and gene programs that influence disease. For some diseases, it has been observed that GWAS signals converge on a smaller number of biological programs, and that this convergence can help to identify causal genes [1-6]. However, identifying such convergence remains challenging: each GWAS locus can have many candidate genes, each gene might act in one or more possible programs, and it remains unclear which programs might influence disease risk. Here, we developed a new approach to address this challenge, by creating unbiased maps to link disease variants to genes to programs (V2G2P) in a given cell type. We applied this approach to study the role of endothelial cells in the genetics of coronary artery disease (CAD). To link variants to genes, we constructed enhancer-gene maps using the Activity-by-Contact model [7,8]. To link genes to programs, we applied CRISPRi-Perturb-seq [9-12] to knock down all expressed genes within ±500 Kb of 306 CAD GWAS signals [13,14] and identify their effects on gene expression programs using single-cell RNA-sequencing. By combining these variant-to-gene and gene-to-program maps, we find that 43 of 306 CAD GWAS signals converge onto 5 gene programs linked to the cerebral cavernous malformations (CCM) pathway, which is known to coordinate transcriptional responses in endothelial cells [15] but has not been previously linked to CAD risk. The strongest regulator of these programs is TLNRD1, which we show is a new CAD gene and novel regulator of the CCM pathway. TLNRD1 loss-of-function alters actin organization and barrier function in endothelial cells in vitro, and heart development in zebrafish in vivo.
Together, our study identifies convergence of CAD risk loci into prioritized gene programs in endothelial cells, nominates new genes of potential therapeutic relevance for CAD, and demonstrates a generalizable strategy to connect disease variants to functions.
]]></description>
<dc:creator>Schnitzler, G. R.</dc:creator>
<dc:creator>Kang, H.</dc:creator>
<dc:creator>Lee-Kim, V. S.</dc:creator>
<dc:creator>Ma, R. X.</dc:creator>
<dc:creator>Zeng, T.</dc:creator>
<dc:creator>Angom, R. S.</dc:creator>
<dc:creator>Fang, S.</dc:creator>
<dc:creator>Vellarikkal, S. K.</dc:creator>
<dc:creator>Zhou, R.</dc:creator>
<dc:creator>Guo, K.</dc:creator>
<dc:creator>Sias-Garcia, O.</dc:creator>
<dc:creator>Bloemendal, A.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Guckelberger, P.</dc:creator>
<dc:creator>Nguyen, T. H.</dc:creator>
<dc:creator>Bergman, D. T.</dc:creator>
<dc:creator>Cheng, N.</dc:creator>
<dc:creator>Cleary, B.</dc:creator>
<dc:creator>Aragam, K.</dc:creator>
<dc:creator>Mukhopadhyay, D.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Gupta, R. M.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2022-11-04</dc:date>
<dc:identifier>doi:10.1101/2022.11.01.514606</dc:identifier>
<dc:title><![CDATA[Mapping the convergence of genes for coronary artery disease onto endothelial cell programs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.17.524475v1?rss=1">
<title>
<![CDATA[
HiCLift: A fast and efficient tool for converting chromatin interaction data between genome assemblies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.17.524475v1?rss=1
</link>
<description><![CDATA[
Motivation: With the continuous effort to improve the quality of the human reference genome and the generation of more and more personal genomes, the conversion of genomic coordinates between genome assemblies is critical in many integrative and comparative studies. While tools have been developed for such tasks for linear genome signals such as ChIP-Seq, no tool exists to convert genome assemblies for chromatin interaction data, despite the importance of three-dimensional (3D) genome organization in gene regulation and disease.

Results: Here, we present HiCLift, a fast and efficient tool that can convert the genomic coordinates of chromatin contacts such as Hi-C and Micro-C from one assembly to another, including the latest T2T genome. Compared with the strategy of directly re-mapping raw reads to a different genome, HiCLift runs on average 42 times faster (hours vs. days), while producing nearly identical contact matrices. More importantly, as HiCLift does not need to re-map the raw reads, it can directly convert human patient sample data, where the raw sequencing reads are sometimes hard to acquire or not available.

Availability: HiCLift is publicly available at https://github.com/XiaoTaoWang/HiCLift.
]]></description>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Yue, F.</dc:creator>
<dc:date>2023-01-20</dc:date>
<dc:identifier>doi:10.1101/2023.01.17.524475</dc:identifier>
<dc:title><![CDATA[HiCLift: A fast and efficient tool for converting chromatin interaction data between genome assemblies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-01-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.29.518374v1?rss=1">
<title>
<![CDATA[
Comparing Genomic and Epigenomic Features across Species Using the WashU Comparative Epigenome Browser 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.29.518374v1?rss=1
</link>
<description><![CDATA[
Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data/annotations on a single reference genome/assembly; there are also genomic alignment viewer/browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome browser that can display genomic and epigenomic datasets across different species and enable users to compare them between syntenic regions. Here, we present the WashU Comparative Epigenome Browser (http://comparativegateway.wustl.edu). It allows users to load functional genomic datasets/annotations mapped to different genomes and display them over syntenic regions simultaneously. The browser also displays genetic differences between the genomes from single nucleotide variants (SNVs) to structural variants (SVs) to visualize the association between epigenomic differences and genetic differences. Instead of anchoring all datasets to the reference genome coordinates, it creates independent coordinates of different genome assemblies to faithfully present features and data mapped to different genomes. It uses a simple, intuitive genome-align track to illustrate the syntenic relationship between different species. It extends the widely used WashU Epigenome Browser infrastructure and can be expanded to support multiple species. This new browser function will greatly facilitate comparative genomic/epigenomic research, as well as support the recent growing needs to directly compare and benchmark the T2T CHM13 assembly and other human genome assemblies.
]]></description>
<dc:creator>Zhuo, X.</dc:creator>
<dc:creator>Hsu, S.</dc:creator>
<dc:creator>Purushotham, D.</dc:creator>
<dc:creator>Chen, S.</dc:creator>
<dc:creator>Li, D.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-12-02</dc:date>
<dc:identifier>doi:10.1101/2022.11.29.518374</dc:identifier>
<dc:title><![CDATA[Comparing Genomic and Epigenomic Features across Species Using the WashU Comparative Epigenome Browser]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.17.533215v1?rss=1">
<title>
<![CDATA[
A machine-readable specification for genomics assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.17.533215v1?rss=1
</link>
<description><![CDATA[
Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. The specification and associated seqspec command line tool are available at https://github.com/IGVF/seqspec.
]]></description>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Chen, X.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-03-21</dc:date>
<dc:identifier>doi:10.1101/2023.03.17.533215</dc:identifier>
<dc:title><![CDATA[A machine-readable specification for genomics assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.07.531569v1?rss=1">
<title>
<![CDATA[
Dynamic network-guided CRISPRi screen reveals CTCF loop-constrained nonlinear enhancer-gene regulatory activity in cell state transitions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.07.531569v1?rss=1
</link>
<description><![CDATA[
Comprehensive enhancer discovery is challenging because most enhancers, especially those affected in complex diseases, have weak effects on gene expression. Our network modeling revealed that nonlinear enhancer-gene regulation during cell state transitions can be leveraged to improve the sensitivity of enhancer discovery. Utilizing hESC definitive endoderm differentiation as a dynamic transition system, we conducted a mid-transition CRISPRi-based enhancer screen. The screen discovered a comprehensive set of enhancers (4 to 9 per locus) for each of the core endoderm lineage-specifying transcription factors, and many enhancers had strong effects mid-transition but weak effects post-transition. Through integrating enhancer activity measurements and three-dimensional enhancer-promoter interaction information, we were able to develop a CTCF loop-constrained Interaction Activity (CIA) model that can better predict functional enhancers compared to models that rely on Hi-C-based enhancer-promoter contact frequency. Our study provides generalizable strategies for sensitive and more comprehensive enhancer discovery in both normal and pathological cell state transitions.
]]></description>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Oh, J. W.</dc:creator>
<dc:creator>Xi, W.</dc:creator>
<dc:creator>Shigaki, D.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Murphy, D.</dc:creator>
<dc:creator>Cutler, R.</dc:creator>
<dc:creator>Rosen, B. P.</dc:creator>
<dc:creator>Pulecio, J.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Glenn, R.</dc:creator>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>Li, Q. V.</dc:creator>
<dc:creator>Vierbuchen, T.</dc:creator>
<dc:creator>Sidoli, S.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:date>2023-03-09</dc:date>
<dc:identifier>doi:10.1101/2023.03.07.531569</dc:identifier>
<dc:title><![CDATA[Dynamic network-guided CRISPRi screen reveals CTCF loop-constrained nonlinear enhancer-gene regulatory activity in cell state transitions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.01.538906v1?rss=1">
<title>
<![CDATA[
Orthogonal CRISPR screens to identify transcriptional and epigenetic regulators of human CD8 T cell function 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.01.538906v1?rss=1
</link>
<description><![CDATA[
The clinical response to adoptive T cell therapies is strongly associated with transcriptional and epigenetic state. Thus, technologies to discover regulators of T cell gene networks and their corresponding phenotypes have great potential to improve the efficacy of T cell therapies. We developed pooled CRISPR screening approaches with compact epigenome editors to systematically profile the effects of activation and repression of 120 transcription factors and epigenetic modifiers on human CD8+ T cell state. These screens nominated known and novel regulators of T cell phenotypes with BATF3 emerging as a high confidence gene in both screens. We found that BATF3 overexpression promoted specific features of memory T cells such as increased IL7R expression and glycolytic capacity, while attenuating gene programs associated with cytotoxicity, regulatory T cell function, and T cell exhaustion. In the context of chronic antigen stimulation, BATF3 overexpression countered phenotypic and epigenetic signatures of T cell exhaustion. CAR T cells overexpressing BATF3 significantly outperformed control CAR T cells in both in vitro and in vivo tumor models. Moreover, we found that BATF3 programmed a transcriptional profile that correlated with positive clinical response to adoptive T cell therapy. Finally, we performed CRISPR knockout screens with and without BATF3 overexpression to define co-factors and downstream factors of BATF3, as well as other therapeutic targets. These screens pointed to a model where BATF3 interacts with JUNB and IRF4 to regulate gene expression and illuminated several other novel targets for further investigation.
]]></description>
<dc:creator>McCutcheon, S.</dc:creator>
<dc:creator>Swartz, A.</dc:creator>
<dc:creator>Brown, M.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>McRoberts Amador, C.</dc:creator>
<dc:creator>Siklenka, K.</dc:creator>
<dc:creator>Humayun, L.</dc:creator>
<dc:creator>Isaacs, J.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Antonia, S.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2023-05-01</dc:date>
<dc:identifier>doi:10.1101/2023.05.01.538906</dc:identifier>
<dc:title><![CDATA[Orthogonal CRISPR screens to identify transcriptional and epigenetic regulators of human CD8 T cell function]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.04.560808v1?rss=1">
<title>
<![CDATA[
Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.04.560808v1?rss=1
</link>
<description><![CDATA[
The concentration and stoichiometry of transcription factors (TFs) determine cellular identity and can be manipulated to drive cell state transitions. Understanding how changes in TF concentration regulate chromatin state and expression across cell state transitions remains a challenge. We investigated this relationship by profiling chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of reprogramming human fibroblasts to induced pluripotent stem cells via ectopic expression of OCT4, SOX2, KLF4, and MYC (OSKM). Using deep learning sequence models of base-resolution chromatin accessibility profiles across cell states, we deciphered predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of TF footprints, linked peaks to putative target genes, and elucidated rewiring of cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei. Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution. We establish a quantitative, predictive framework that links TF stoichiometry, motif syntax, and somatic silencing to provide new perspectives on the control of cell identity by TFs during fate transitions.
]]></description>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Ameen, M.</dc:creator>
<dc:creator>Sundaram, L.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Balsubramani, A.</dc:creator>
<dc:creator>Wang, Y. X.</dc:creator>
<dc:creator>Burns, D.</dc:creator>
<dc:creator>Blau, H. M.</dc:creator>
<dc:creator>Karakikes, I.</dc:creator>
<dc:creator>Wang, K. C.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2023-10-04</dc:date>
<dc:identifier>doi:10.1101/2023.10.04.560808</dc:identifier>
<dc:title><![CDATA[Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.12.07.570715v1?rss=1">
<title>
<![CDATA[
Reconstructing Spatial Transcriptomics at the Single-cell Resolution with BayesDeep 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.12.07.570715v1?rss=1
</link>
<description><![CDATA[
Spatially resolved transcriptomics (SRT) techniques have revolutionized the characterization of molecular profiles while preserving spatial and morphological context. However, most next-generation sequencing-based SRT techniques are limited to measuring gene expression in a confined array of spots, capturing only a fraction of the spatial domain. Typically, these spots encompass gene expression from a few to hundreds of cells, underscoring a critical need for more detailed, single-cell resolution SRT data to enhance our understanding of biological functions within the tissue context. Addressing this challenge, we introduce BayesDeep, a novel Bayesian hierarchical model that leverages cellular morphological data from histology images, commonly paired with SRT data, to reconstruct SRT data at the single-cell resolution. BayesDeep models count data from SRT studies via a negative binomial regression model. This model incorporates explanatory variables such as cell types and nuclei-shape information for each cell extracted from the paired histology image. A feature selection scheme is integrated to examine the association between the morphological and molecular profiles, thereby improving the model's robustness. We applied BayesDeep to two real SRT datasets, successfully demonstrating its capability to reconstruct SRT data at the single-cell resolution. This advancement not only yields new biological insights but also significantly enhances various downstream analyses, such as pseudotime and cell-cell communication.
]]></description>
<dc:creator>Jiang, X.</dc:creator>
<dc:creator>Dong, L.</dc:creator>
<dc:creator>Wang, S.</dc:creator>
<dc:creator>Wen, Z.</dc:creator>
<dc:creator>Chen, M.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2023-12-08</dc:date>
<dc:identifier>doi:10.1101/2023.12.07.570715</dc:identifier>
<dc:title><![CDATA[Reconstructing Spatial Transcriptomics at the Single-cell Resolution with BayesDeep]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-12-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.19.567742v1?rss=1">
<title>
<![CDATA[
MPRAbase: A Massively Parallel Reporter Assay Database 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.19.567742v1?rss=1
</link>
<description><![CDATA[
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
]]></description>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Baltoumas, F. A.</dc:creator>
<dc:creator>Konnaris, M. A.</dc:creator>
<dc:creator>Mouratidis, I.</dc:creator>
<dc:creator>Liu, Z.</dc:creator>
<dc:creator>Sims, J.</dc:creator>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Pavlopoulos, G. A.</dc:creator>
<dc:creator>Georgakopoulos-Soares, I.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2023-11-22</dc:date>
<dc:identifier>doi:10.1101/2023.11.19.567742</dc:identifier>
<dc:title><![CDATA[MPRAbase: A Massively Parallel Reporter Assay Database]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.12.536587v1?rss=1">
<title>
<![CDATA[
Chromatin context-dependent regulation and epigenetic manipulation of prime editing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.12.536587v1?rss=1
</link>
<description><![CDATA[
Prime editing is a powerful means of introducing precise changes to specific locations in mammalian genomes. However, the widely varying efficiency of prime editing across target sites of interest has limited its adoption in the context of both basic research and clinical settings. Here, we set out to exhaustively characterize the impact of the cis-chromatin environment on prime editing efficiency. Using a newly developed and highly sensitive method for mapping the genomic locations of a randomly integrated "sensor", we identify specific epigenetic features that strongly correlate with the highly variable efficiency of prime editing across different genomic locations. Next, to assess the interaction of trans-acting factors with the cis-chromatin environment, we develop and apply a pooled genetic screening approach with which the impact of knocking down various DNA repair factors on prime editing efficiency can be stratified by cis-chromatin context. Finally, we demonstrate that we can dramatically modulate the efficiency of prime editing through epigenome editing, i.e. altering chromatin state in a locus-specific manner in order to increase or decrease the efficiency of prime editing at a target site. Looking forward, we envision that the insights and tools described here will broaden the range of both basic research and therapeutic contexts in which prime editing is useful.
]]></description>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Chen, W.</dc:creator>
<dc:creator>Martin, B. K.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Lee, C.</dc:creator>
<dc:creator>Choi, J.</dc:creator>
<dc:creator>Chardon, F. M.</dc:creator>
<dc:creator>McDiarmid, T.</dc:creator>
<dc:creator>Kim, H.</dc:creator>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Nathans, J. F.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2023-04-12</dc:date>
<dc:identifier>doi:10.1101/2023.04.12.536587</dc:identifier>
<dc:title><![CDATA[Chromatin context-dependent regulation and epigenetic manipulation of prime editing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.28.534017v1?rss=1">
<title>
<![CDATA[
Multiplex, single-cell CRISPRa screening for cell type specific regulatory elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.28.534017v1?rss=1
</link>
<description><![CDATA[
CRISPR-based gene activation (CRISPRa) is a promising therapeutic approach for gene therapy, upregulating gene expression by targeting promoters or enhancers in a tissue/cell-type specific manner. Here, we describe an experimental framework that combines highly multiplexed perturbations with single-cell RNA sequencing (sc-RNA-seq) to identify cell-type-specific, CRISPRa-responsive cis-regulatory elements and the gene(s) they regulate. Random combinations of many gRNAs are introduced to each of many cells, which are then profiled and partitioned into test and control groups to test for effect(s) of CRISPRa perturbations of both enhancers and promoters on the expression of neighboring genes. Applying this method to a library of 493 gRNAs targeting candidate cis-regulatory elements in both K562 cells and iPSC-derived excitatory neurons, we identify gRNAs capable of specifically upregulating intended target genes and no other neighboring genes within 1 Mb, including gRNAs yielding upregulation of six autism spectrum disorder (ASD) and neurodevelopmental disorder (NDD) risk genes in neurons. A consistent pattern is that the responsiveness of individual enhancers to CRISPRa is restricted by cell type, implying a dependency on chromatin landscape and/or additional trans-acting factors for successful gene activation. The approach outlined here may facilitate large-scale screens for gRNAs that activate therapeutically relevant genes in a cell type-specific manner.
]]></description>
<dc:creator>Chardon, F. M.</dc:creator>
<dc:creator>McDiarmid, T. A.</dc:creator>
<dc:creator>Page, N. F.</dc:creator>
<dc:creator>Martin, B. K.</dc:creator>
<dc:creator>Domcke, S.</dc:creator>
<dc:creator>Regalado, S. G.</dc:creator>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Sanders, S. J.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2023-03-28</dc:date>
<dc:identifier>doi:10.1101/2023.03.28.534017</dc:identifier>
<dc:title><![CDATA[Multiplex, single-cell CRISPRa screening for cell type specific regulatory elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.05.531189v1?rss=1">
<title>
<![CDATA[
Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.05.531189v1?rss=1
</link>
<description><![CDATA[
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200-nucleotide cores function as non-cell-type-specific "on switches" providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
]]></description>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Schubach, M.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Dash, P.</dc:creator>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Sohota, A.</dc:creator>
<dc:creator>Noble, W.</dc:creator>
<dc:creator>Yardimci, G.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2023-03-06</dc:date>
<dc:identifier>doi:10.1101/2023.03.05.531189</dc:identifier>
<dc:title><![CDATA[Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.15.528663v1?rss=1">
<title>
<![CDATA[
Massively parallel characterization of psychiatric disorder-associated and cell-type-specific regulatory elements in the developing human cortex 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.15.528663v1?rss=1
</link>
<description><![CDATA[
Nucleotide changes in gene regulatory elements are important determinants of neuronal development and disease. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 sequences, including differentially accessible cell-type specific regions in the developing cortex and single-nucleotide variants associated with psychiatric disorders. In primary cells, we identified 46,802 active enhancer sequences and 164 disorder-associated variants that significantly alter enhancer activity. Activity was comparable in organoids and primary cells, suggesting that organoids provide an adequate model for the developing cortex. Using deep learning, we decoded the sequence basis and upstream regulators of enhancer activity. This work establishes a comprehensive catalog of functional gene regulatory elements and variants in human neuronal development.

One Sentence Summary: We identify 46,802 enhancers and 164 psychiatric disorder variants with regulatory effects in the developing cortex and organoids.
]]></description>
<dc:creator>Deng, C.</dc:creator>
<dc:creator>Whalen, S.</dc:creator>
<dc:creator>Steyert, M.</dc:creator>
<dc:creator>Ziffra, R.</dc:creator>
<dc:creator>Przytycki, P. F.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Pereira, D. A.</dc:creator>
<dc:creator>Capauto, D.</dc:creator>
<dc:creator>Norton, S.</dc:creator>
<dc:creator>Vaccarino, F. M.</dc:creator>
<dc:creator>Pollen, A. A.</dc:creator>
<dc:creator>Nowakowski, T. J.</dc:creator>
<dc:creator>Ahituv, N. A.</dc:creator>
<dc:creator>Pollard, K. S.</dc:creator>
<dc:date>2023-02-15</dc:date>
<dc:identifier>doi:10.1101/2023.02.15.528663</dc:identifier>
<dc:title><![CDATA[Massively parallel characterization of psychiatric disorder-associated and cell-type-specific regulatory elements in the developing human cortex]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.10.519236v1?rss=1">
<title>
<![CDATA[
Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.10.519236v1?rss=1
</link>
<description><![CDATA[
The inability to scalably and precisely measure the activity of developmental enhancers in multicellular systems is a bottleneck in genomics. Here, we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays, resulting in accurate measurement of reporter expression over a >10,000-fold range of activity with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode circularization, these single-cell quantitative expression reporters (scQers) provide high-contrast readouts analogous to classic in situ assays, but entirely from sequencing. Screening >200 enhancers in a multicellular in vitro model of early mammalian development, we identified numerous autonomous and cell-type-specific elements, including constituents of the Sox2 control region exclusively active in pluripotent cells, endoderm-specific enhancers, including near Foxa2 and Gata4, and a compact pleiotropic enhancer at the Lamc1 locus. scQers can be mobilized in developmental systems to quantitatively characterize native, perturbed, and synthetic enhancers at scale, with high sensitivity and at single-cell resolution.
]]></description>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Regalado, S. G.</dc:creator>
<dc:creator>Domcke, S.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Li, T.</dc:creator>
<dc:creator>Suiter, C. C.</dc:creator>
<dc:creator>Lee, C.</dc:creator>
<dc:creator>Trapnell, C.</dc:creator>
<dc:creator>Shendure, J. A.</dc:creator>
<dc:date>2022-12-10</dc:date>
<dc:identifier>doi:10.1101/2022.12.10.519236</dc:identifier>
<dc:title><![CDATA[Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.22.529427v1?rss=1">
<title>
<![CDATA[
Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.22.529427v1?rss=1
</link>
<description><![CDATA[
Summary: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or a few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library.

Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.

Availability and Implementation: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python and bash for Linux. It has both a single-threaded implementation and, for GNU/Linux clusters that use Slurm, PBS, or GridEngine schedulers, a multi-node version.

Supplementary Material: Supplementary materials are available at Bioinformatics online.
]]></description>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Cote, A. G.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Tabet, D.</dc:creator>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Rayhan, A.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2023-02-23</dc:date>
<dc:identifier>doi:10.1101/2023.02.22.529427</dc:identifier>
<dc:title><![CDATA[Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.20.562794v1?rss=1">
<title>
<![CDATA[
Assigning credit where it's due: An information content score to capture the clinical value of Multiplexed Assays of Variant Effect 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.20.562794v1?rss=1
</link>
<description><![CDATA[
Background: A variant can be pathogenic or benign with relation to a human disease. Current classification categories from benign to pathogenic reflect a probabilistic summary of current understanding. A primary metric of clinical utility for multiplexed assays of variant effect (MAVE) is the number of variants that can be reclassified from uncertain significance (VUS). However, we hypothesized that this measure of utility underrepresents the information gained from MAVEs and that an information theory approach which includes data that does not reclassify variants will better reflect true information gain. We used this information theory approach to evaluate the information gain, in bits, for MAVEs of BRCA1, PTEN, and TP53. Here, one bit represents the amount of information required to completely classify a single variant starting from no information.

Results: BRCA1 MAVEs produced a total of 831.2 bits of information, 6.58% of the total missense information in BRCA1 and a 22-fold increase over the information that only contributed to VUS reclassification. PTEN MAVEs produced 2059.6 bits of information which represents 32.8% of the total missense information in PTEN and an 85-fold increase over the information that contributed to VUS reclassification. TP53 MAVEs produced 277.8 bits of information which represents 6.22% of the total missense information in TP53 and a 3.5-fold increase over the information that contributed to VUS reclassification.

Conclusions: An information content approach will more accurately portray information gained through MAVE mapping efforts than counting the number of variants reclassified. This information content approach may also help define the impact of modifying information definitions used to classify many variants, such as guideline rule changes.
]]></description>
<dc:creator>Ranola, J. M.</dc:creator>
<dc:creator>Horton, C.</dc:creator>
<dc:creator>Pesaran, T.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Shirts, B. H.</dc:creator>
<dc:date>2023-10-20</dc:date>
<dc:identifier>doi:10.1101/2023.10.20.562794</dc:identifier>
<dc:title><![CDATA[Assigning credit where it's due: An information content score to capture the clinical value of Multiplexed Assays of Variant Effect]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.20.545702v1?rss=1">
<title>
<![CDATA[
Mapping MAVE data for use in human genomics applications 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.20.545702v1?rss=1
</link>
<description><![CDATA[
The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.
]]></description>
<dc:creator>Arbesfeld, J. A.</dc:creator>
<dc:creator>Da, E. Y.</dc:creator>
<dc:creator>Kuzma, K.</dc:creator>
<dc:creator>Paul, A.</dc:creator>
<dc:creator>Farris, T.</dc:creator>
<dc:creator>Riehle, K.</dc:creator>
<dc:creator>Agostinho, N. D. S.</dc:creator>
<dc:creator>Safer, J. F.</dc:creator>
<dc:creator>Milosavljevic, A.</dc:creator>
<dc:creator>Foreman, J.</dc:creator>
<dc:creator>Firth, H. V.</dc:creator>
<dc:creator>Hunt, S. E.</dc:creator>
<dc:creator>Iqbal, S.</dc:creator>
<dc:creator>Cline, M.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Wagner, A. H.</dc:creator>
<dc:date>2023-06-23</dc:date>
<dc:identifier>doi:10.1101/2023.06.20.545702</dc:identifier>
<dc:title><![CDATA[Mapping MAVE data for use in human genomics applications]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.01.02.573913v1?rss=1">
<title>
<![CDATA[
Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.01.02.573913v1?rss=1
</link>
<description><![CDATA[
Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics technologies have enabled the detection and generation of variants at an unprecedented scale. However, efficient tools and resources are needed to link these two disparate data types - to "map" variants onto protein structures, to better understand how the variation causes disease and thereby design therapeutics. Here we present the Genomics 2 Proteins Portal (G2P; g2p.broadinstitute.org/): a human proteome-wide resource that maps 19,996,443 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the G2P portal generalizes the capability of linking genomics to proteins beyond databases by allowing users to interactively upload protein residue-wise annotations (variants, scores, etc.) as well as the protein structure to establish the connection. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotype.
]]></description>
<dc:creator>Kwon, S.</dc:creator>
<dc:creator>Safer, J.</dc:creator>
<dc:creator>Nguyen, D. T.</dc:creator>
<dc:creator>Hoksza, D.</dc:creator>
<dc:creator>May, P.</dc:creator>
<dc:creator>Arbesfeld, J.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Campbell, A. J.</dc:creator>
<dc:creator>Burgin, A.</dc:creator>
<dc:creator>Iqbal, S.</dc:creator>
<dc:date>2024-01-02</dc:date>
<dc:identifier>doi:10.1101/2024.01.02.573913</dc:identifier>
<dc:title><![CDATA[Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-01-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.14.520494v1?rss=1">
<title>
<![CDATA[
Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.14.520494v1?rss=1
</link>
<description><![CDATA[
Evaluating the impact of amino acid variants has been a critical challenge for studying protein function and interpreting genomic data. High-throughput experimental methods like deep mutational scanning (DMS) can measure the effect of large numbers of variants in a target protein, but because DMS studies have not been performed on all proteins, researchers also model DMS data computationally, building predictors that estimate variant impacts. In this study, we extended a linear regression-based predictor to explore whether incorporating data from alanine scanning (AS), a widely-used low-throughput mutagenesis method, would improve prediction results. To evaluate our model, we collected 146 AS datasets, mapping to 54 DMS datasets across 22 distinct proteins. We show that improved model performance depends on the compatibility of the DMS and AS assays, and the scale of improvement is closely related to the correlation between DMS and AS results.
]]></description>
<dc:creator>Fu, Y.</dc:creator>
<dc:creator>Bedo, J.</dc:creator>
<dc:creator>Papenfuss, A. T.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:date>2022-12-16</dc:date>
<dc:identifier>doi:10.1101/2022.12.14.520494</dc:identifier>
<dc:title><![CDATA[Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.09.563812v1?rss=1">
<title>
<![CDATA[
An encyclopedia of enhancer-gene regulatory interactions in the human genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.09.563812v1?rss=1
</link>
<description><![CDATA[
Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease (refs. 1-6). Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium (ref. 7). We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.
]]></description>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Mualim, K. S.</dc:creator>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Nurtdinov, R. N.</dc:creator>
<dc:creator>Xi, W.</dc:creator>
<dc:creator>Tan, A. S.</dc:creator>
<dc:creator>Jones, H.</dc:creator>
<dc:creator>Ma, X. R.</dc:creator>
<dc:creator>Yao, D.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Avsec, Z.</dc:creator>
<dc:creator>James, B. T.</dc:creator>
<dc:creator>Shamim, M. S.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Mahajan, R.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Andreeva, K.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Fan, K.</dc:creator>
<dc:creator>Perez, E. M.</dc:creator>
<dc:creator>Nguyen, T. C.</dc:creator>
<dc:creator>Kelley, D. R.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Bassik, M. C.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Guigo, R.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2023-11-13</dc:date>
<dc:identifier>doi:10.1101/2023.11.09.563812</dc:identifier>
<dc:title><![CDATA[An encyclopedia of enhancer-gene regulatory interactions in the human genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.27.534456v1?rss=1">
<title>
<![CDATA[
Single-cell transcriptome dataset of human and mouse in vitro adipogenesis models 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.27.534456v1?rss=1
</link>
<description><![CDATA[
Adipogenesis is a process in which fat-specific progenitor cells (preadipocytes) differentiate into adipocytes that carry out the key metabolic functions of the adipose tissue, including glucose uptake, energy storage, and adipokine secretion. Several cell lines are routinely used to study the molecular regulation of adipogenesis, in particular the immortalized mouse 3T3-L1 line and the primary human Simpson-Golabi-Behmel syndrome (SGBS) line. However, the cell-to-cell variability of transcriptional changes prior to and during adipogenesis in these models is not well understood. Here, we present a single-cell RNA-Sequencing (scRNA-Seq) dataset collected before and during adipogenic differentiation of 3T3-L1 and SGBS cells. To minimize the effects of experimental variation, we mixed 3T3-L1 and SGBS cells and used computational analysis to demultiplex transcriptomes of mouse and human cells. In both models, adipogenesis results in the appearance of three cell clusters, corresponding to preadipocytes, early and mature adipocytes. These data provide a groundwork for comparative studies on human and mouse adipogenesis, as well as on cell-to-cell variability in gene expression during this process.
]]></description>
<dc:creator>Li, J.</dc:creator>
<dc:creator>Jin, C.</dc:creator>
<dc:creator>Gustafsson, S.</dc:creator>
<dc:creator>Rao, A.</dc:creator>
<dc:creator>Wabitsch, M.</dc:creator>
<dc:creator>Park, C. Y.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Bielczyk-Maczynska, E.</dc:creator>
<dc:creator>Knowles, J. W.</dc:creator>
<dc:date>2023-03-29</dc:date>
<dc:identifier>doi:10.1101/2023.03.27.534456</dc:identifier>
<dc:title><![CDATA[Single-cell transcriptome dataset of human and mouse in vitro adipogenesis models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.12.20.572268v1?rss=1">
<title>
<![CDATA[
Rewriting regulatory DNA to dissect and reprogram gene expression 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.12.20.572268v1?rss=1
</link>
<description><![CDATA[
Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). 
Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.
]]></description>
<dc:creator>Martyn, G. E.</dc:creator>
<dc:creator>Montgomery, M. T.</dc:creator>
<dc:creator>Jones, H.</dc:creator>
<dc:creator>Guo, K.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Linder, J.</dc:creator>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Cochran, K.</dc:creator>
<dc:creator>Lawrence, K. A.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Fulco, C. P.</dc:creator>
<dc:creator>Kelley, D. R.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2023-12-21</dc:date>
<dc:identifier>doi:10.1101/2023.12.20.572268</dc:identifier>
<dc:title><![CDATA[Rewriting regulatory DNA to dissect and reprogram gene expression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-12-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.22.541801v1?rss=1">
<title>
<![CDATA[
SLC12A9 is a lysosome-detoxifying ammonium - chloride co-transporter 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.22.541801v1?rss=1
</link>
<description><![CDATA[
Ammonia is a ubiquitous, toxic by-product of cell metabolism. Its high membrane permeability and proton affinity cause ammonia to accumulate inside acidic lysosomes in its poorly membrane-permeant form: ammonium (NH4+). Ammonium buildup compromises lysosomal function, suggesting the existence of mechanisms that protect cells from ammonium toxicity. Here, we identified SLC12A9 as a lysosomal ammonium exporter that preserves lysosomal homeostasis. SLC12A9 knockout cells showed grossly enlarged lysosomes and elevated ammonium content. These phenotypes were reversed upon removal of the metabolic source of ammonium or dissipation of the lysosomal pH gradient. Lysosomal chloride increased in SLC12A9 knockout cells and chloride binding by SLC12A9 was required for ammonium transport. Our data indicate that SLC12A9 is a chloride-driven ammonium co-transporter that is central to an unappreciated, fundamental mechanism of lysosomal physiology that may have special relevance in tissues with elevated ammonia, such as tumors.
]]></description>
<dc:creator>Levin-Konigsberg, R.</dc:creator>
<dc:creator>Mitra, K.</dc:creator>
<dc:creator>Nigam, A.</dc:creator>
<dc:creator>Spees, K.</dc:creator>
<dc:creator>Hivare, P.</dc:creator>
<dc:creator>Liu, K.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Krishnan, Y.</dc:creator>
<dc:creator>Bassik, M.</dc:creator>
<dc:date>2023-05-22</dc:date>
<dc:identifier>doi:10.1101/2023.05.22.541801</dc:identifier>
<dc:title><![CDATA[SLC12A9 is a lysosome-detoxifying ammonium - chloride co-transporter]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.26.525789v1?rss=1">
<title>
<![CDATA[
Molecular mechanisms of coronary artery disease risk at the PDGFD locus 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.26.525789v1?rss=1
</link>
<description><![CDATA[
Platelet-derived growth factor (PDGF) signaling has been extensively studied in the context of vascular disease, but the genetics of this pathway remain to be established. Genome-wide association studies (GWAS) for coronary artery disease (CAD) have identified a risk locus at 11q22.3, and we have verified with fine-mapping approaches that the regulatory variant rs2019090 and PDGFD represent the functional variant and putative functional gene. Further, FOXC1/C2 transcription factor (TF) binding at rs2019090 was found to promote PDGFD transcription through the CAD-promoting allele. Employing a constitutive Pdgfd knockout allele along with SMC lineage tracing in a male atherosclerosis mouse model, we mapped single-cell transcriptomic, cell state, and lesion anatomical changes associated with gene loss. These studies revealed that Pdgfd promotes expansion, migration, and transition of SMC lineage cells to the chondromyocyte phenotype and vascular calcification. This is in contrast to protective CAD genes TCF21, ZEB2, and SMAD3, which we have shown to promote the fibroblast-like cell transition or perturb the pattern or extent of transition to the chondromyocyte phenotype. Further, Pdgfd-expressing fibroblasts and pericytes exhibited greater expression of chemokines and leukocyte adhesion molecules, consistent with observed increased macrophage recruitment to the plaque. Despite these changes, there was no effect of Pdgfd deletion on SMC contribution to the fibrous cap or overall lesion burden. These findings suggest that PDGFD mediates CAD risk through promoting SMC expansion and migration, in conjunction with deleterious phenotypic changes, and through promoting an inflammatory response that is primarily focused in the adventitia where it contributes to leukocyte trafficking to the diseased vessel wall.
]]></description>
<dc:creator>Kim, H.-J.</dc:creator>
<dc:creator>Cheng, P.</dc:creator>
<dc:creator>Travisano, S.</dc:creator>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Monteiro, J.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Liu, B.</dc:creator>
<dc:creator>Lin, Y.</dc:creator>
<dc:creator>Haldar, S.</dc:creator>
<dc:creator>Jackson, S.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2023-01-27</dc:date>
<dc:identifier>doi:10.1101/2023.01.26.525789</dc:identifier>
<dc:title><![CDATA[Molecular mechanisms of coronary artery disease risk at the PDGFD locus]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-01-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.18.492517v1?rss=1">
<title>
<![CDATA[
The epigenomic landscape of single vascular cells reflects developmental origin and identifies disease risk loci 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.18.492517v1?rss=1
</link>
<description><![CDATA[
Vascular sites have distinct susceptibility to atherosclerosis and aneurysm, yet the biological underpinning of vascular site-specific disease risk is largely unknown.

Vascular tissues have different developmental origins that may influence global chromatin accessibility, and understanding differential chromatin accessibility, gene expression profiles, and gene regulatory networks (GRN) at single-cell resolution may give key insight into vascular site-specific disease risk. Here, we performed single-cell chromatin accessibility (scATACseq) and gene expression profiling (scRNAseq) of healthy adult mouse vascular tissue from three vascular sites: 1) aortic root and ascending aorta, 2) brachiocephalic and carotid artery, and 3) descending thoracic aorta. Through a comprehensive analysis at single-cell resolution, we discovered key regulatory enhancers to be not only cell type specific but also vascular site specific in vascular smooth muscle cells (SMC), fibroblasts, and endothelial cells. We identified epigenetic markers of embryonic origin with differential chromatin accessibility of key developmental transcription factors such as Tbx20, Hand2, Gata4, and Hoxb family members and discovered transcription factor motif accessibility to be cell type and vascular site specific. Notably, we found ascending fibroblasts to have distinct epigenomic patterns, highlighting SMAD2/3 function to suggest a differential susceptibility to TGFβ, a finding we confirmed through in vitro culture of primary adventitial fibroblasts. Finally, to understand how vascular site-specific enhancers may regulate human genetic risk for disease, we integrated genome-wide association study (GWAS) data for ascending and descending aortic dimension and used a base-resolution deep learning model of chromatin accessibility, ChromBPNet, to predict variant effects in SMC, fibroblasts, and endothelial cells from the ascending aorta, carotid, and descending aorta sites of origin.
We reveal that although cell type remains a primary influence on variant effects, vascular site modifies cell type transcription and highlights genomic regions that are enriched for specific TF motif footprints, including MEF2A, SMAD3, and HAND2. This work supports a paradigm that the epigenomic and transcriptomic landscapes of vascular cells are cell type and vascular site specific and that site-specific enhancers govern complex genetic drivers of disease risk.
]]></description>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Cheng, P. P.</dc:creator>
<dc:creator>Pedroza, A. J.</dc:creator>
<dc:creator>Dalal, A. R.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Kim, H.-J.</dc:creator>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Kundu, R. K.</dc:creator>
<dc:creator>Fischbein, M. P.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2022-05-18</dc:date>
<dc:identifier>doi:10.1101/2022.05.18.492517</dc:identifier>
<dc:title><![CDATA[The epigenomic landscape of single vascular cells reflects developmental origin and identifies disease risk loci]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.05.556368v1?rss=1">
<title>
<![CDATA[
Pervasive mislocalization of pathogenic coding variants underlying human disorders 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.05.556368v1?rss=1
</link>
<description><![CDATA[
Widespread sequencing has yielded thousands of missense variants predicted or confirmed as disease-causing. This creates a new bottleneck: determining the functional impact of each variant - largely a painstaking, customized process undertaken one or a few genes or variants at a time. Here, we established a high-throughput imaging platform to assay the impact of coding variation on protein localization, evaluating 3,547 missense variants of over 1,000 genes and phenotypes. We discovered that mislocalization is a common consequence of coding variation, affecting about one-sixth of all pathogenic missense variants, all cellular compartments, and recessive and dominant disorders alike. Mislocalization is primarily driven by effects on protein stability and membrane insertion rather than disruptions of trafficking signals or specific interactions. Furthermore, mislocalization patterns help explain pleiotropy and disease severity and provide insights on variants of unknown significance. Our publicly available resource will likely accelerate the understanding of coding variation in human diseases.
]]></description>
<dc:creator>Lacoste, J.</dc:creator>
<dc:creator>Haghighi, M.</dc:creator>
<dc:creator>Haider, S.</dc:creator>
<dc:creator>Lin, Z.-Y.</dc:creator>
<dc:creator>Segal, D.</dc:creator>
<dc:creator>Reno, C.</dc:creator>
<dc:creator>Qian, W. W.</dc:creator>
<dc:creator>Xiong, X.</dc:creator>
<dc:creator>Shafqat-Abbasi, H.</dc:creator>
<dc:creator>Ryder, P.</dc:creator>
<dc:creator>Senft, R.</dc:creator>
<dc:creator>Cimini, B.</dc:creator>
<dc:creator>Roth, F.</dc:creator>
<dc:creator>Calderwood, M.</dc:creator>
<dc:creator>Hill, D.</dc:creator>
<dc:creator>Vidal, M.</dc:creator>
<dc:creator>Yi, S.</dc:creator>
<dc:creator>Sahni, N.</dc:creator>
<dc:creator>Peng, J.</dc:creator>
<dc:creator>Gingras, A.-C.</dc:creator>
<dc:creator>Singh, S.</dc:creator>
<dc:creator>Carpenter, A.</dc:creator>
<dc:creator>Taipale, M.</dc:creator>
<dc:date>2023-09-05</dc:date>
<dc:identifier>doi:10.1101/2023.09.05.556368</dc:identifier>
<dc:title><![CDATA[Pervasive mislocalization of pathogenic coding variants underlying human disorders]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.24.542036v1?rss=1">
<title>
<![CDATA[
Characterizing glucokinase variant mechanisms using a multiplexed abundance assay 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.24.542036v1?rss=1
</link>
<description><![CDATA[
Amino acid substitutions can perturb protein activity in multiple ways. Understanding their mechanistic basis may pinpoint how residues contribute to protein function. Here, we characterize the mechanisms of human glucokinase (GCK) variants, building on our previous comprehensive study on GCK variant activity. We assayed the abundance of 95% of GCK missense and nonsense variants, and found that 43% of hypoactive variants have a decreased cellular abundance. By combining our abundance scores with predictions of protein thermodynamic stability, we identify residues important for GCK metabolic stability and conformational dynamics. These residues could be targeted to modulate GCK activity, and thereby affect glucose homeostasis.
]]></description>
<dc:creator>Gersing, S.</dc:creator>
<dc:creator>Schulze, T. K.</dc:creator>
<dc:creator>Cagiada, M.</dc:creator>
<dc:creator>Stein, A.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:creator>Lindorff-Larsen, K.</dc:creator>
<dc:creator>Hartmann-Petersen, R.</dc:creator>
<dc:date>2023-05-24</dc:date>
<dc:identifier>doi:10.1101/2023.05.24.542036</dc:identifier>
<dc:title><![CDATA[Characterizing glucokinase variant mechanisms using a multiplexed abundance assay]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.06.527353v1?rss=1">
<title>
<![CDATA[
Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.06.527353v1?rss=1
</link>
<description><![CDATA[
Defects in hydroxymethylbilane synthase (HMBS) can cause Acute Intermittent Porphyria (AIP), an acute neurological disease. Although sequencing-based diagnosis can be definitive, ~1/3 of clinical HMBS variants are missense variants, and most clinically reported HMBS missense variants are designated as "variants of uncertain significance" (VUS). Using saturation mutagenesis, en masse selection, and sequencing, we applied a multiplexed validated assay to both the erythroid-specific and ubiquitous isoforms of HMBS, obtaining confident functional impact scores for >84% of all possible amino-acid substitutions. The resulting variant effect maps generally agreed with biochemical expectation. However, the maps showed variants at the dimerization interface to be unexpectedly well tolerated, and suggested residue roles in active site dynamics that were supported by molecular dynamics simulations. Most importantly, these HMBS variant effect maps can help discriminate pathogenic from benign variants, proactively providing evidence even for yet-to-be-observed clinical missense variants.
]]></description>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Sowlati-Hashjin, S.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Hamilton, R.</dc:creator>
<dc:creator>Chawla, A.</dc:creator>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Fresard, L.</dc:creator>
<dc:creator>Mustajoki, S.</dc:creator>
<dc:creator>Pischik, E.</dc:creator>
<dc:creator>Di Pierro, E.</dc:creator>
<dc:creator>Barbaro, M.</dc:creator>
<dc:creator>Floderus, Y.</dc:creator>
<dc:creator>Schmitt, C.</dc:creator>
<dc:creator>Gouya, L.</dc:creator>
<dc:creator>Colavin, A.</dc:creator>
<dc:creator>Nussbaum, R.</dc:creator>
<dc:creator>Friesema, E. C. H.</dc:creator>
<dc:creator>Kauppinen, R.</dc:creator>
<dc:creator>To-Figueras, J.</dc:creator>
<dc:creator>Aarsand, A. K.</dc:creator>
<dc:creator>Desnick, R. J.</dc:creator>
<dc:creator>Garton, M.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2023-02-06</dc:date>
<dc:identifier>doi:10.1101/2023.02.06.527353</dc:identifier>
<dc:title><![CDATA[Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.18.545488v1?rss=1">
<title>
<![CDATA[
Integrating Image and Molecular Profiles for Spatial Transcriptomics Analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.18.545488v1?rss=1
</link>
<description><![CDATA[
The spatially resolved transcriptomics (SRT) field has revolutionized our ability to comprehensively leverage image and molecular profiles to elucidate spatial organization of cellular microenvironments. Current clustering analysis of SRT data primarily relies on molecular information and fails to fully exploit the morphological features present in histology images, leading to compromised accuracy and interpretability. To overcome these limitations, we have developed a multi-stage statistical method called iIMPACT. It includes a finite mixture model to identify and define histology-based spatial domains based on AI-reconstructed histology images and spatial context of gene expression measurements, and a negative binomial regression model to detect domain-specific spatially variable genes. Through multiple case studies, we demonstrate iIMPACT outperformed existing methods, confirmed by ground truth biological knowledge. These findings underscore the accuracy and interpretability of iIMPACT as a new clustering approach, providing valuable insights into the cellular spatial organization and landscape of functional genes within spatial transcriptomics data.
]]></description>
<dc:creator>Jiang, X.</dc:creator>
<dc:creator>Wang, S.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Wen, Z.</dc:creator>
<dc:creator>Jia, L.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2023-06-20</dc:date>
<dc:identifier>doi:10.1101/2023.06.18.545488</dc:identifier>
<dc:title><![CDATA[Integrating Image and Molecular Profiles for Spatial Transcriptomics Analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.20.567880v1?rss=1">
<title>
<![CDATA[
Enhancer regulatory networks globally connect non-coding breast cancer loci to cancer genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.20.567880v1?rss=1
</link>
<description><![CDATA[
Genetic studies have associated thousands of enhancers with breast cancer. However, the vast majority have not been functionally characterized. Thus, it remains unclear how variant-associated enhancers contribute to cancer. Here, we perform single-cell CRISPRi screens of 3,512 regulatory elements associated with breast cancer to measure the impact of these regions on transcriptional phenotypes. Analysis of >500,000 single-cell transcriptomes in two breast cancer cell lines shows that perturbation of variant-associated enhancers disrupts breast cancer gene programs. We observe variant-associated enhancers that directly or indirectly regulate the expression of cancer genes. We also find one-to-multiple and multiple-to-one network motifs where enhancers indirectly regulate cancer genes. Notably, multiple variant-associated enhancers indirectly regulate TP53. Comparative studies illustrate sub-type specific functions between enhancers in ER+ and ER- cells. Finally, we developed the pySpade package to facilitate analysis of single-cell enhancer screens. Overall, we demonstrate that enhancers form regulatory networks that link cancer genes in the genome, providing a more comprehensive understanding of the contribution of enhancers to breast cancer development.
]]></description>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Armendariz, D. A.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Zhao, H.</dc:creator>
<dc:creator>Xie, S.</dc:creator>
<dc:creator>Hon, G. C.</dc:creator>
<dc:date>2023-11-20</dc:date>
<dc:identifier>doi:10.1101/2023.11.20.567880</dc:identifier>
<dc:title><![CDATA[Enhancer regulatory networks globally connect non-coding breast cancer loci to cancer genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.07.531525v1?rss=1">
<title>
<![CDATA[
Dissecting embryonic and extra-embryonic lineage crosstalk with stem cell co-culture 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.07.531525v1?rss=1
</link>
<description><![CDATA[
Faithful embryogenesis requires precise coordination between embryonic and extraembryonic tissues. Although stem cells from embryonic and extraembryonic origins have been generated for several mammalian species (Bogliotti et al., 2018; Choi et al., 2019; Cui et al., 2019; Evans and Kaufman, 1981; Kunath et al., 2005; Li et al., 2008; Martin, 1981; Okae et al., 2018; Tanaka et al., 1998; Thomson et al., 1998; Vandevoort et al., 2007; Vilarino et al., 2020; Yu et al., 2021b; Zhong et al., 2018), they are grown in different culture conditions with diverse media composition, which makes it difficult to study cross-lineage communication. Here, by using the same culture condition that activates FGF, TGF-β and WNT signaling pathways, we derived stable embryonic stem cells (ESCs), extraembryonic endoderm stem cells (XENs) and trophoblast stem cells (TSCs) from all three founding tissues of mouse and cynomolgus monkey blastocysts. This allowed us to establish embryonic and extraembryonic stem cell co-cultures to dissect lineage crosstalk during early mammalian development. Co-cultures of ESCs and XENs uncovered a conserved and previously unrecognized growth inhibition of pluripotent cells by extraembryonic endoderm cells, which is in part mediated through extracellular matrix signaling. Our study unveils a more universal state of stem cell self-renewal stabilized by activation, as opposed to inhibition, of developmental signaling pathways. The embryonic and extraembryonic stem cell co-culture strategy developed here will open new avenues for creating more faithful embryo models and developing more developmentally relevant differentiation protocols.
]]></description>
<dc:creator>Wu, J.</dc:creator>
<dc:creator>Wei, Y.</dc:creator>
<dc:creator>Zhang, E.</dc:creator>
<dc:creator>Yu, L.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Sakurai, M.</dc:creator>
<dc:creator>Takii, S.</dc:creator>
<dc:creator>Schmitz, D.</dc:creator>
<dc:creator>Ding, Y.</dc:creator>
<dc:creator>Zheng, C.</dc:creator>
<dc:creator>Sun, H.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Okamura, D.</dc:creator>
<dc:creator>Ji, W.</dc:creator>
<dc:creator>Tan, T.</dc:creator>
<dc:creator>Zhan, L.</dc:creator>
<dc:creator>Ci, B.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:date>2023-03-07</dc:date>
<dc:identifier>doi:10.1101/2023.03.07.531525</dc:identifier>
<dc:title><![CDATA[Dissecting embryonic and extra-embryonic lineage crosstalk with stem cell co-culture]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.01.10.574997v1?rss=1">
<title>
<![CDATA[
Mechanosensitive genomic enhancers potentiate the cellular response to matrix stiffness 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.01.10.574997v1?rss=1
</link>
<description><![CDATA[
Epigenetic control of cellular transcription and phenotype is influenced by changes in the cellular microenvironment, yet how mechanical cues from these microenvironments precisely influence epigenetic state to regulate transcription remains largely unmapped. Here, we combine genome-wide epigenome profiling, epigenome editing, and phenotypic and single-cell RNA-seq CRISPR screening to identify a new class of genomic enhancers that responds to the mechanical microenvironment. These mechanoenhancers could be active in either soft or stiff extracellular matrix contexts, and regulated transcription to influence critical cell functions including apoptosis, mechanotransduction, proliferation, and migration. Epigenetic editing of mechanoenhancers on rigid materials tuned gene expression to levels observed on softer materials, thereby reprogramming the cellular response to the mechanical microenvironment. These editing approaches may enable the precise alteration of mechanically driven disease states.
]]></description>
<dc:creator>Cosgrove, B. D.</dc:creator>
<dc:creator>Bounds, L. R.</dc:creator>
<dc:creator>Taylor, C. K.</dc:creator>
<dc:creator>Su, A. L.</dc:creator>
<dc:creator>Rizzo, A. J.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Hoffman, B. D.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2024-01-10</dc:date>
<dc:identifier>doi:10.1101/2024.01.10.574997</dc:identifier>
<dc:title><![CDATA[Mechanosensitive genomic enhancers potentiate the cellular response to matrix stiffness]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-01-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.28.533945v1?rss=1">
<title>
<![CDATA[
Single-cell multi-scale footprinting reveals the modular organization of DNA regulatory elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.28.533945v1?rss=1
</link>
<description><![CDATA[
Cis-regulatory elements control gene expression and are dynamic in their structure, reflecting changes to the composition of diverse effector proteins over time [1-3]. Here we sought to connect the structural changes at cis-regulatory elements to alterations in cellular fate and function. To do this we developed PRINT, a computational method that uses deep learning to correct sequence bias in chromatin accessibility data and identifies multi-scale footprints of DNA-protein interactions. We find that multi-scale footprints enable more accurate inference of TF and nucleosome binding. Using PRINT with single-cell multi-omics, we discover widespread changes to the structure and function of candidate cis-regulatory elements (cCREs) across hematopoiesis, wherein nucleosomes slide, expose DNA for TF binding, and promote gene expression. Activity segmentation using the co-variance across cell states identifies "sub-cCREs" as modular cCRE subunits of regulatory DNA. We apply this single-cell and PRINT approach to characterize the age-associated alterations to cCREs within hematopoietic stem cells (HSCs). Remarkably, we find a spectrum of aging alterations among HSCs corresponding to a global gain of sub-cCRE activity while preserving cCRE accessibility. Collectively, we reveal the functional importance of cCRE structure across cell states, highlighting changes to gene regulation at single-cell and single-base-pair resolution.
]]></description>
<dc:creator>Hu, Y.</dc:creator>
<dc:creator>Ma, S.</dc:creator>
<dc:creator>Kartha, V. K.</dc:creator>
<dc:creator>Duarte, F. M.</dc:creator>
<dc:creator>Horlbeck, M.</dc:creator>
<dc:creator>Zhang, R.</dc:creator>
<dc:creator>Shrestha, R.</dc:creator>
<dc:creator>Labade, A.</dc:creator>
<dc:creator>Kletzien, H.</dc:creator>
<dc:creator>Meliki, A.</dc:creator>
<dc:creator>Castillo, A.</dc:creator>
<dc:creator>Durand, N.</dc:creator>
<dc:creator>Mattei, E.</dc:creator>
<dc:creator>Anderson, L. J.</dc:creator>
<dc:creator>Tay, T.</dc:creator>
<dc:creator>Earl, A. S.</dc:creator>
<dc:creator>Shoresh, N.</dc:creator>
<dc:creator>Epstein, C. B.</dc:creator>
<dc:creator>Wagers, A.</dc:creator>
<dc:creator>Buenrostro, J. D.</dc:creator>
<dc:date>2023-03-29</dc:date>
<dc:identifier>doi:10.1101/2023.03.28.533945</dc:identifier>
<dc:title><![CDATA[Single-cell multi-scale footprinting reveals the modular organization of DNA regulatory elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.10.561642v1?rss=1">
<title>
<![CDATA[
Convergent Epigenetic Evolution Drives Relapse in Acute Myeloid Leukemia 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.10.561642v1?rss=1
</link>
<description><![CDATA[
Relapse of acute myeloid leukemia (AML) is highly aggressive and often treatment refractory. We analyzed previously published AML relapse cohorts and found that 40% of relapses occur without changes in driver mutations, suggesting that non-genetic mechanisms drive relapse in a large proportion of cases. We therefore characterized epigenetic patterns of AML relapse using 26 matched diagnosis-relapse samples with ATAC-seq. This analysis identified a relapse-specific chromatin accessibility signature for mutationally stable AML, suggesting that AML undergoes epigenetic evolution at relapse independent of mutational changes. Analysis of leukemia stem cell (LSC) chromatin changes at relapse indicated that this leukemic compartment underwent significantly less epigenetic evolution than non-LSCs, while epigenetic changes in non-LSCs reflected overall evolution of the bulk leukemia. Finally, we used single-cell ATAC-seq paired with mitochondrial sequencing (mtscATAC) to map clones from diagnosis into relapse along with their epigenetic features. We found that distinct mitochondrially-defined clones exhibit more similar chromatin accessibility at relapse relative to diagnosis, demonstrating convergent epigenetic evolution in relapsed AML. These results demonstrate that epigenetic evolution is a feature of relapsed AML and that convergent epigenetic evolution can occur following treatment with induction chemotherapy.
]]></description>
<dc:creator>Nuno, K.</dc:creator>
<dc:creator>Azizi, A.</dc:creator>
<dc:creator>Koehnke, T.</dc:creator>
<dc:creator>Lareau, C.</dc:creator>
<dc:creator>Ediriwickrema, A.</dc:creator>
<dc:creator>Corces, M. R.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Majeti, R.</dc:creator>
<dc:date>2023-10-10</dc:date>
<dc:identifier>doi:10.1101/2023.10.10.561642</dc:identifier>
<dc:title><![CDATA[Convergent Epigenetic Evolution Drives Relapse in Acute Myeloid Leukemia]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.20.517242v1?rss=1">
<title>
<![CDATA[
Single-cell multi-omics reveals dynamics of purifying selection of pathogenic mitochondrial DNA across human immune cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.20.517242v1?rss=1
</link>
<description><![CDATA[
Cells experience intrinsic and extrinsic pressures that affect their proclivity to expand and persist in vivo. In congenital disorders caused by loss-of-function mutations in mitochondrial DNA (mtDNA), metabolic vulnerabilities may result in cell-type specific phenotypes and depletion of pathogenic alleles, contributing to purifying selection. However, the impact of pathogenic mtDNA mutations on the cellular hematopoietic landscape is not well understood. Here, we establish a multi-omics approach to quantify deletions in mtDNA alongside cell state features in single cells derived from Pearson syndrome patients. We resolve the interdependence between pathogenic mtDNA and lineage, including purifying selection against deletions in effector/memory CD8 T-cell populations and recent thymic emigrants and dynamics in other hematopoietic populations. Our mapping of lineage-specific purifying selection dynamics in primary cells from patients carrying pathogenic heteroplasmy provides a new perspective on recurrent clinical phenotypes in mitochondrial disorders, including cancer and infection, with potential broader relevance to age-related immune dysfunction.
]]></description>
<dc:creator>Lareau, C. A.</dc:creator>
<dc:creator>Dubois, S. M.</dc:creator>
<dc:creator>Buquicchio, F. A.</dc:creator>
<dc:creator>Hsieh, Y.-H.</dc:creator>
<dc:creator>Garg, K.</dc:creator>
<dc:creator>Kautz, P.</dc:creator>
<dc:creator>Nitsch, L.</dc:creator>
<dc:creator>Praktiknjo, S. D.</dc:creator>
<dc:creator>Maschmeyer, P.</dc:creator>
<dc:creator>Verboon, J. M.</dc:creator>
<dc:creator>Gutierrez, J. C.</dc:creator>
<dc:creator>Yin, Y.</dc:creator>
<dc:creator>Fiskin, E.</dc:creator>
<dc:creator>Luo, W.</dc:creator>
<dc:creator>Mimitou, E.</dc:creator>
<dc:creator>Muus, C.</dc:creator>
<dc:creator>Malhotra, R.</dc:creator>
<dc:creator>Parikh, S.</dc:creator>
<dc:creator>Fleming, M. D.</dc:creator>
<dc:creator>Oevermann, L.</dc:creator>
<dc:creator>Schulte, J.</dc:creator>
<dc:creator>Eckert, C.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Smibert, P.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Regev, A.</dc:creator>
<dc:creator>Sankaran, V. G.</dc:creator>
<dc:creator>Agarwal, S.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:date>2022-11-20</dc:date>
<dc:identifier>doi:10.1101/2022.11.20.517242</dc:identifier>
<dc:title><![CDATA[Single-cell multi-omics reveals dynamics of purifying selection of pathogenic mitochondrial DNA across human immune cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.23.537997v1?rss=1">
<title>
<![CDATA[
Codon affinity in mitochondrial DNA shapes evolutionary and somatic fitness 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.23.537997v1?rss=1
</link>
<description><![CDATA[
Somatic variation contributes to biological heterogeneity by modulating cellular proclivity to differentiate, expand, adapt, or die. While large-scale sequencing efforts have revealed the foundational role of somatic variants in driving human tumor evolution, the contribution of mutations to cellular fitness in non-malignant contexts remains understudied. Here, we identify a mosaic synonymous variant (m.7076A>G) in the mitochondrial DNA (mtDNA) encoded cytochrome c-oxidase subunit 1 gene (MT-CO1, p.Gly391=), which was present at homoplasmy in 47% of immune cells from a healthy donor. Using single-cell multi-omics, we discover highly specific selection against the m.7076G mutant allele in the CD8+ effector memory T cell compartment in vivo, reminiscent of selection observed for pathogenic mtDNA alleles (refs. 1, 2) and indicative of lineage-specific metabolic requirements. While the wildtype m.7076A allele is translated via Watson-Crick-Franklin base-pairing, the anticodon diversity of the mitochondrial transfer RNA pool is limited, requiring wobble-dependent translation of the m.7076G mutant allele. Notably, mitochondrial ribosome profiling revealed altered codon-anticodon affinity at the wobble position as evidenced by stalled translation of the synonymous m.7076G mutant allele encoding for glycine. Generalizing this observation, we provide a new ontogeny of the 8,482 synonymous variants in the human mitochondrial genome that enables interpretation of functional mtDNA variation. Specifically, via inter- and intra-species evolutionary analyses, population-level complex trait associations, and the occurrence of germline and somatic mtDNA mutations from large-scale sequencing studies, we demonstrate that synonymous variation impacting codon:anticodon affinity is actively evolving across the entire mitochondrial genome and has broad functional and phenotypic effects.
In summary, our results introduce a new ontogeny for mitochondrial genetic variation and support a model where organismal principles can be discerned from somatic evolution via single-cell genomics.
]]></description>
<dc:creator>Lareau, C. A.</dc:creator>
<dc:creator>Yin, Y.</dc:creator>
<dc:creator>Gutierrez, J. C.</dc:creator>
<dc:creator>Dhindsa, R. S.</dc:creator>
<dc:creator>Gribling-Burrer, A.-S.</dc:creator>
<dc:creator>Hsieh, Y.-H.</dc:creator>
<dc:creator>Nitsch, L.</dc:creator>
<dc:creator>Buquicchio, F. A.</dc:creator>
<dc:creator>Abay, T.</dc:creator>
<dc:creator>Zielinski, S.</dc:creator>
<dc:creator>Stickels, R. R.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Yan, P.</dc:creator>
<dc:creator>Wang, F.</dc:creator>
<dc:creator>Miao, Z.</dc:creator>
<dc:creator>Sandor, K.</dc:creator>
<dc:creator>Daniel, B.</dc:creator>
<dc:creator>Liu, V.</dc:creator>
<dc:creator>Wang, Q.</dc:creator>
<dc:creator>Hu, F.</dc:creator>
<dc:creator>Smith, K. R.</dc:creator>
<dc:creator>Deevi, S. V. V.</dc:creator>
<dc:creator>Maschmeyer, P.</dc:creator>
<dc:creator>Petrovski, S.</dc:creator>
<dc:creator>Smyth, R. P.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Munschauer, M.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:date>2023-04-23</dc:date>
<dc:identifier>doi:10.1101/2023.04.23.537997</dc:identifier>
<dc:title><![CDATA[Codon affinity in mitochondrial DNA shapes evolutionary and somatic fitness]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.13.566919v1?rss=1">
<title>
<![CDATA[
Identifying genetic variants that influence the abundance of cell states in single-cell data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.13.566919v1?rss=1
</link>
<description><![CDATA[
To understand genetic mechanisms driving disease, it is essential but difficult to map how risk alleles affect the composition of cells present in the body. Single-cell profiling quantifies granular information about tissues, but variant-associated cell states may reflect diverse combinations of the profiled cell features that are challenging to predefine. We introduce GeNA (Genotype-Neighborhood Associations), a statistical tool to identify cell state abundance quantitative trait loci (csaQTLs) in high-dimensional single-cell datasets. Instead of testing associations to predefined cell states, GeNA flexibly identifies the cell states whose abundance is most associated with genetic variants. In a genome-wide survey of scRNA-seq peripheral blood profiling from 969 individuals (ref. 1), GeNA identifies five independent loci associated with shifts in the relative abundance of immune cell states. For example, rs3003-T (p=1.96x10^-11) associates with increased abundance of NK cells expressing TNF-α response programs. This csaQTL colocalizes with increased risk for psoriasis, an autoimmune disease that responds to anti-TNF treatments. Flexibly characterizing csaQTLs for granular cell states may help illuminate how genetic background alters cellular composition to confer disease risk.
]]></description>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Reshef, Y.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Yazar, S.</dc:creator>
<dc:creator>Alquicira-Hernandez, J.</dc:creator>
<dc:creator>Valencia, C.</dc:creator>
<dc:creator>Lagattuta, K. A.</dc:creator>
<dc:creator>Mah-Som, A.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Powell, J. E.</dc:creator>
<dc:creator>Loh, P.-R.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2023-11-15</dc:date>
<dc:identifier>doi:10.1101/2023.11.13.566919</dc:identifier>
<dc:title><![CDATA[Identifying genetic variants that influence the abundance of cell states in single-cell data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.22.521678v1?rss=1">
<title>
<![CDATA[
Uncovering context-specific genetic-regulation of gene expression from single-cell RNA-sequencing using latent-factor models 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.22.521678v1?rss=1
</link>
<description><![CDATA[
Genetic regulation of gene expression is a complex process, with genetic effects known to vary across cellular contexts such as cell types and environmental conditions. We developed SURGE, a method for unsupervised discovery of context-specific expression quantitative trait loci (eQTLs) from single-cell transcriptomic data. This allows discovery of the contexts or cell types modulating genetic regulation without prior knowledge. Applied to peripheral blood single-cell eQTL data, SURGE contexts capture continuous representations of distinct cell types and groupings of biologically related cell types. We demonstrate the disease-relevance of SURGE context-specific eQTLs using colocalization analysis and stratified LD-score regression.
]]></description>
<dc:creator>Strober, B. J.</dc:creator>
<dc:creator>Tayeb, K.</dc:creator>
<dc:creator>Popp, J.</dc:creator>
<dc:creator>Qi, G.</dc:creator>
<dc:creator>Gordon, M. G.</dc:creator>
<dc:creator>Perez, R.</dc:creator>
<dc:creator>Ye, C. J.</dc:creator>
<dc:creator>Battle, A.</dc:creator>
<dc:date>2022-12-23</dc:date>
<dc:identifier>doi:10.1101/2022.12.22.521678</dc:identifier>
<dc:title><![CDATA[Uncovering context-specific genetic-regulation of gene expression from single-cell RNA-sequencing using latent-factor models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.11.519973v1?rss=1">
<title>
<![CDATA[
Reimagining Gene-Environment Interaction Analysis for Human Complex Traits 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.11.519973v1?rss=1
</link>
<description><![CDATA[
In this study, we introduce PIGEON, a novel statistical framework for quantifying and estimating polygenic gene-environment interaction (GxE) using a variance component analytical approach. Based on PIGEON, we outline the main objectives in GxE studies, demonstrate the flaws in existing GxE approaches, and introduce an innovative estimation procedure which only requires summary statistics as input. We demonstrate the statistical superiority of PIGEON through extensive theoretical and empirical analyses and showcase its performance in multiple analytic settings, including a quasi-experimental GxE study of health outcomes, gene-by-sex interaction for 530 traits, and gene-by-treatment interaction in a randomized clinical trial. Our results show that PIGEON provides an innovative solution to many long-standing challenges in GxE inference and may fundamentally reshape analytical strategies in future GxE studies.
]]></description>
<dc:creator>Miao, J.</dc:creator>
<dc:creator>Song, G.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Hu, J.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Basu, S.</dc:creator>
<dc:creator>Andrews, J. S.</dc:creator>
<dc:creator>Schaumberg, K.</dc:creator>
<dc:creator>Fletcher, J. M.</dc:creator>
<dc:creator>Schmitz, L. L.</dc:creator>
<dc:creator>Lu, Q.</dc:creator>
<dc:date>2022-12-14</dc:date>
<dc:identifier>doi:10.1101/2022.12.11.519973</dc:identifier>
<dc:title><![CDATA[Reimagining Gene-Environment Interaction Analysis for Human Complex Traits]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.12.520180v1?rss=1">
<title>
<![CDATA[
Leveraging a machine learning derived surrogate phenotype to improve power for genome-wide association studies of partially missing phenotypes in population biobanks 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.12.520180v1?rss=1
</link>
<description><![CDATA[
Within population biobanks, genetic discovery for specialized phenotypes is often limited by incomplete ascertainment. Machine learning (ML) is increasingly used to impute missing phenotypes from surrogate information. However, imputing missing phenotypes can invalidate statistical inference when the imputation model is misspecified, and proxy analysis of the ML-phenotype can introduce spurious associations. To overcome these limitations, we introduce SynSurr, an approach that jointly analyzes a partially missing target phenotype with a "synthetic surrogate", its predicted value from an ML-model. SynSurr estimates the same genetic effect as standard genome-wide association studies (GWAS) of the target phenotype, but improves power provided the synthetic surrogate is correlated with the target. Unlike imputation or proxy analysis, SynSurr does not require that the synthetic surrogate is obtained from a correctly specified generative model. We perform extensive simulations and an ablation analysis to compare SynSurr with existing methods. We also apply SynSurr to empower GWAS of dual-energy x-ray absorptiometry traits within the UK Biobank, leveraging a synthetic surrogate composed of bioelectrical impedance and anthropometric traits.
]]></description>
<dc:creator>McCaw, Z. R.</dc:creator>
<dc:creator>Gao, J. R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Gronsbell, J.</dc:creator>
<dc:date>2022-12-14</dc:date>
<dc:identifier>doi:10.1101/2022.12.12.520180</dc:identifier>
<dc:title><![CDATA[Leveraging a machine learning derived surrogate phenotype to improve power for genome-wide association studies of partially missing phenotypes in population biobanks]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.25.559307v1?rss=1">
<title>
<![CDATA[
Ensembled best subset selection using summary statistics for polygenic risk prediction 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.25.559307v1?rss=1
</link>
<description><![CDATA[
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2 penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
]]></description>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Mazumder, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2023-09-26</dc:date>
<dc:identifier>doi:10.1101/2023.09.25.559307</dc:identifier>
<dc:title><![CDATA[Ensembled best subset selection using summary statistics for polygenic risk prediction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.30.564764v1?rss=1">
<title>
<![CDATA[
A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.30.564764v1?rss=1
</link>
<description><![CDATA[
Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.
]]></description>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Chen, H.</dc:creator>
<dc:creator>Selvaraj, M. S.</dc:creator>
<dc:creator>Van Buren, E.</dc:creator>
<dc:creator>Zhou, H.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Sun, R.</dc:creator>
<dc:creator>McCaw, Z. R.</dc:creator>
<dc:creator>Yu, Z.</dc:creator>
<dc:creator>Arnett, D. K.</dc:creator>
<dc:creator>Bis, J. C.</dc:creator>
<dc:creator>Blangero, J.</dc:creator>
<dc:creator>Boerwinkle, E.</dc:creator>
<dc:creator>Bowden, D. W.</dc:creator>
<dc:creator>Brody, J. A.</dc:creator>
<dc:creator>Cade, B. E.</dc:creator>
<dc:creator>Carson, A. P.</dc:creator>
<dc:creator>Carlson, J. C.</dc:creator>
<dc:creator>Chami, N.</dc:creator>
<dc:creator>Chen, Y.-D. I.</dc:creator>
<dc:creator>Curran, J. E.</dc:creator>
<dc:creator>de Vries, P. S.</dc:creator>
<dc:creator>Fornage, M.</dc:creator>
<dc:creator>Franceschini, N.</dc:creator>
<dc:creator>Freedman, B. I.</dc:creator>
<dc:creator>Gu, C.</dc:creator>
<dc:creator>Heard-Costa, N. L.</dc:creator>
<dc:creator>He, J.</dc:creator>
<dc:creator>Hou, L.</dc:creator>
<dc:creator>Hung, Y.-J.</dc:creator>
<dc:creator>Irvin, M. R.</dc:creator>
<dc:creator>Kaplan, R. C.</dc:creator>
<dc:creator>Kardia, S. L. R.</dc:creator>
<dc:creator>Kelly, T.</dc:creator>
<dc:creator>Konigsberg, I.</dc:creator>
<dc:creator>Kooperberg, C.</dc:creator>
<dc:creator>Kral, B. G.</dc:creator>
<dc:creator>Li, C.</dc:creator>
<dc:creator>Loos, R. J. F.</dc:creator>
<dc:creator>Mahaney, M. C.</dc:creator>
<dc:creator>Martin, L. W.</dc:creator>
<dc:creator>Mathias, R. A.</dc:creator>
<dc:creator>Minster, R. L.</dc:creator>
<dc:creator>Mitchell, B. D.</dc:creator>
<dc:date>2023-11-02</dc:date>
<dc:identifier>doi:10.1101/2023.10.30.564764</dc:identifier>
<dc:title><![CDATA[A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.10.555215v1?rss=1">
<title>
<![CDATA[
Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.10.555215v1?rss=1
</link>
<description><![CDATA[
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single-variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
]]></description>
<dc:creator>Jiang, M.-Z.</dc:creator>
<dc:creator>Gaynor, S. M.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Van Buren, E.</dc:creator>
<dc:creator>Stilp, A.</dc:creator>
<dc:creator>Buth, E.</dc:creator>
<dc:creator>Wang, F. F.</dc:creator>
<dc:creator>Manansala, R.</dc:creator>
<dc:creator>Gogarten, S. M.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Polfus, L. M.</dc:creator>
<dc:creator>Salimi, S.</dc:creator>
<dc:creator>Bis, J. C.</dc:creator>
<dc:creator>Pankratz, N.</dc:creator>
<dc:creator>Yanek, L. R.</dc:creator>
<dc:creator>Durda, P.</dc:creator>
<dc:creator>Tracy, R. P.</dc:creator>
<dc:creator>Rich, S. S.</dc:creator>
<dc:creator>Rotter, J. I.</dc:creator>
<dc:creator>Mitchell, B. D.</dc:creator>
<dc:creator>Lewis, J. P.</dc:creator>
<dc:creator>Psaty, B. M.</dc:creator>
<dc:creator>Pratte, K. A.</dc:creator>
<dc:creator>Silverman, E. K.</dc:creator>
<dc:creator>Kaplan, R. C.</dc:creator>
<dc:creator>Avery, C.</dc:creator>
<dc:creator>North, K.</dc:creator>
<dc:creator>Mathias, R. A.</dc:creator>
<dc:creator>Faraday, N.</dc:creator>
<dc:creator>Lin, H.</dc:creator>
<dc:creator>Wang, B.</dc:creator>
<dc:creator>Carson, A. P.</dc:creator>
<dc:creator>Norwood, A. F.</dc:creator>
<dc:creator>Gibbs, R. A.</dc:creator>
<dc:creator>Kooperberg, C.</dc:creator>
<dc:creator>Lundin, J.</dc:creator>
<dc:creator>Peters, U.</dc:creator>
<dc:creator>Dupuis, J.</dc:creator>
<dc:creator>Hou, L.</dc:creator>
<dc:creator>Fornage, M.</dc:creator>
<dc:creator>Benjamin, E. J.</dc:creator>
<dc:creator>Reiner, A. P.</dc:creator>
<dc:creator>Bowler, R. P.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Auer, P. L.</dc:creator>
<dc:creator>Raf</dc:creator>
<dc:date>2023-09-12</dc:date>
<dc:identifier>doi:10.1101/2023.09.10.555215</dc:identifier>
<dc:title><![CDATA[Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.24.485519v1?rss=1">
<title>
<![CDATA[
Novel Methods for Multi-ancestry Polygenic Prediction and their Evaluations in 3.7 Million Individuals of Diverse Ancestry 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.24.485519v1?rss=1
</link>
<description><![CDATA[
Polygenic risk scores (PRS) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRS using ancestry-specific GWAS summary statistics from multi-ancestry training samples, integrating clumping and thresholding, empirical Bayes and super learning. We evaluate CT-SLEB and nine alternative methods with large-scale simulated GWAS (~19 million common variants) and datasets from 23andMe Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across thirteen complex traits. Results demonstrate that CT-SLEB significantly improves PRS performance in non-European populations compared to simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offer insights into sample size requirements and SNP density effects on multi-ancestry risk prediction.
]]></description>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Zhan, J.</dc:creator>
<dc:creator>Jin, J.</dc:creator>
<dc:creator>Ahearn, T. U.</dc:creator>
<dc:creator>Yu, Z.</dc:creator>
<dc:creator>O'Connell, J.</dc:creator>
<dc:creator>Jiang, Y.</dc:creator>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>23andMe Research Team</dc:creator>
<dc:creator>Garcia-Closas, M.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Koelsch, B. L.</dc:creator>
<dc:creator>Chatterjee, N.</dc:creator>
<dc:date>2022-03-27</dc:date>
<dc:identifier>doi:10.1101/2022.03.24.485519</dc:identifier>
<dc:title><![CDATA[Novel Methods for Multi-ancestry Polygenic Prediction and their Evaluations in 3.7 Million Individuals of Diverse Ancestry]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.08.527759v1?rss=1">
<title>
<![CDATA[
Accurate and Efficient Estimation of Local Heritability using Summary Statistics and LD Matrix 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.08.527759v1?rss=1
</link>
<description><![CDATA[
Existing SNP-heritability estimation methods that leverage GWAS summary statistics produce estimators that are less efficient than the restricted maximum likelihood (REML) estimator using individual-level data under linear mixed models (LMMs). Increasing the precision of a heritability estimator is particularly important for regional analyses, as local genetic variances tend to be small. We introduce a new estimator for local heritability, "HEELS", which attains statistical efficiency comparable to REML (i.e. relative efficiency greater than 92%) but only requires summary-level statistics: Z-scores from the marginal association tests plus the empirical LD matrix. HEELS significantly improves the statistical efficiency of the existing summary-statistics-based heritability estimators; for instance, HEELS produces heritability estimates that are more than 3-fold and 7-fold less variable than GRE and LDSC, respectively. Moreover, we introduce a unified framework to evaluate and compare the performance of different LD approximation strategies. We propose representing the empirical LD as the sum of a low-rank matrix and a banded matrix. This approximation not only reduces the storage and memory cost of using the LD matrix, but also improves the computational efficiency of the HEELS estimation. We demonstrate the statistical efficiency of HEELS and the advantages of our proposed LD approximation strategies both in simulations and through empirical analyses of the UK Biobank data.
]]></description>
<dc:creator>Li, H.</dc:creator>
<dc:creator>Mazumder, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2023-02-09</dc:date>
<dc:identifier>doi:10.1101/2023.02.08.527759</dc:identifier>
<dc:title><![CDATA[Accurate and Efficient Estimation of Local Heritability using Summary Statistics and LD Matrix]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.11.540401v1?rss=1">
<title>
<![CDATA[
De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.11.540401v1?rss=1
</link>
<description><![CDATA[
Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD.
This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.
]]></description>
<dc:creator>Alexandari, A. M.</dc:creator>
<dc:creator>Horton, C. A.</dc:creator>
<dc:creator>Shrikumar, A.</dc:creator>
<dc:creator>Shah, N.</dc:creator>
<dc:creator>Li, E.</dc:creator>
<dc:creator>Weilert, M.</dc:creator>
<dc:creator>Pufall, M. A.</dc:creator>
<dc:creator>Zeitlinger, J.</dc:creator>
<dc:creator>Fordyce, P. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2023-05-11</dc:date>
<dc:identifier>doi:10.1101/2023.05.11.540401</dc:identifier>
<dc:title><![CDATA[De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.26.559662v1?rss=1">
<title>
<![CDATA[
5-hydroxymethylcytosines regulate gene expression as a passive DNA demethylation resisting epigenetic mark in proliferative somatic cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.26.559662v1?rss=1
</link>
<description><![CDATA[
Enzymatic erasure of DNA methylation in mammals involves iterative 5-methylcytosine (5mC) oxidation by the ten-eleven translocation (TET) family of DNA dioxygenase proteins. As the most abundant form of oxidized 5mC, the prevailing model considers 5-hydroxymethylcytosine (5hmC) as a key nexus in active DNA demethylation that can either indirectly facilitate replication-dependent depletion of 5mC by inhibiting maintenance DNA methylation machinery (UHRF1/DNMT1), or directly be iteratively oxidized to 5-formylcytosine (5fC) and 5-carboxycytosine (5caC) and restored to cytosine (C) through thymine DNA glycosylase (TDG)-mediated 5fC/5caC excision repair. In proliferative somatic cells, to what extent TET-dependent removal of 5mC entails indirect DNA demethylation via 5hmC-induced replication-dependent dilution or direct iterative conversion of 5hmC to 5fC/5caC is unclear. Here we leverage a catalytic processivity stalling variant of human TET1 (TET1.var: T1662E) to decouple the stepwise generation of 5hmC from subsequent 5fC/5caC generation, excision and repair. By using a CRISPR/dCas9-based epigenome-editing platform, we demonstrate that 5fC/5caC excision repair (by wild-type TET1, TET1.wt), but not 5hmC generation alone (by TET1.var), is requisite for robust restoration of unmodified cytosines and reversal of somatic silencing of the methylation-sensitive, germline-specific RHOXF2B gene promoter. Furthermore, integrated whole-genome multi-modal epigenetic sequencing reveals that hemi-hydroxymethylated CpG dyads predominantly resist replication-dependent depletion of 5mC on the opposing strand in TET1.var-expressing cells. Notably, TET1.var-mediated 5hmC generation is sufficient to induce similar levels of differential gene expression (compared to TET1.wt) without inducing major changes in unmodified cytosine profiles across the genome. 
Our study suggests 5hmC alone plays a limited role in driving replication-dependent DNA demethylation in the presence of functional DNMT1/UHRF1 mechanisms, but can regulate gene expression as a bona fide epigenetic mark in proliferative somatic cells.
]]></description>
<dc:creator>Wei, A.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Qiu, Q.</dc:creator>
<dc:creator>Fabyanic, E. B.</dc:creator>
<dc:creator>Hu, P.</dc:creator>
<dc:creator>Wu, H.</dc:creator>
<dc:date>2023-09-27</dc:date>
<dc:identifier>doi:10.1101/2023.09.26.559662</dc:identifier>
<dc:title><![CDATA[5-hydroxymethylcytosines regulate gene expression as a passive DNA demethylation resisting epigenetic mark in proliferative somatic cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.15.537037v1?rss=1">
<title>
<![CDATA[
A transient dermal niche and dual epidermal programs underlie sweat gland development 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.15.537037v1?rss=1
</link>
<description><![CDATA[
Eccrine glands are mammalian skin appendages indispensable for human thermoregulation. Like all skin-derived appendages, eccrine glands form from multipotent progenitors in the basal skin epidermis. It remains unclear how epidermal progenitors progressively specialize to specifically form eccrine glands, precluding efforts to regenerate these vital organs. Herein, we applied single nucleus transcriptomics to compare the expression content of wildtype, eccrine-forming mouse skin to that of mice harboring a skin-specific disruption of Engrailed 1 (En1), a transcription factor that promotes the formation of eccrine glands in both humans and mice. We identify two concurrent epidermal transcriptomes in the earliest eccrine anlagen: a predominant transcriptome that is shared with hair follicles, and a vastly underrepresented transcriptome that is En1-dependent and eccrine-specific. We demonstrate that differentiation of the eccrine anlage requires the induction of a transient and transcriptionally unique dermal niche that forms around each developing gland in humans and mice. Our study defines the transcriptional determinants underlying eccrine identity in the epidermis and uncovers the dermal niche required for eccrine developmental progression. By identifying these defining components of the eccrine developmental program, our findings set the stage for directed efforts to regenerate eccrine glands for comprehensive skin repair.
]]></description>
<dc:creator>Dingwall, H. L.</dc:creator>
<dc:creator>Tomizawa, R. R.</dc:creator>
<dc:creator>Aharoni, A.</dc:creator>
<dc:creator>Hu, P.</dc:creator>
<dc:creator>Qiu, Q.</dc:creator>
<dc:creator>Kokalari, B.</dc:creator>
<dc:creator>Martinez, S. M.</dc:creator>
<dc:creator>Donahue, J. C.</dc:creator>
<dc:creator>Aldea, D.</dc:creator>
<dc:creator>Mendoza, M.</dc:creator>
<dc:creator>Glass, I. A.</dc:creator>
<dc:creator>Birth Defects Research Laboratory</dc:creator>
<dc:creator>Wu, H.</dc:creator>
<dc:creator>Kamberov, Y. G.</dc:creator>
<dc:date>2023-04-17</dc:date>
<dc:identifier>doi:10.1101/2023.04.15.537037</dc:identifier>
<dc:title><![CDATA[A transient dermal niche and dual epidermal programs underlie sweat gland development]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.30.564796v1?rss=1">
<title>
<![CDATA[
Decoding Heterogenous Single-cell Perturbation Responses 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.30.564796v1?rss=1
</link>
<description><![CDATA[
Understanding diverse responses of individual cells to the same perturbation is central to many biological and biomedical problems. Current methods, however, do not precisely quantify the strength of perturbation responses and, more importantly, reveal new biological insights from heterogeneity in responses. Here we introduce the perturbation-response score (PS), based on constrained quadratic optimization, to quantify diverse perturbation responses at a single-cell level. Applied to single-cell transcriptomes of large-scale genetic perturbation datasets (e.g., Perturb-seq), PS outperforms existing methods for quantifying partial gene perturbation responses. In addition, PS presents two major advances. First, PS enables large-scale, single-cell-resolution dosage analysis of perturbation, without the need to titrate perturbation strength. By analyzing the dose-response patterns of over 2,000 essential genes in Perturb-seq, we identify two distinct patterns, depending on whether a moderate reduction in their expression induces strong downstream expression alterations. Second, PS identifies intrinsic and extrinsic biological determinants of perturbation responses. We demonstrate the application of PS in contexts such as T cell stimulation, latent HIV-1 expression, and pancreatic cell differentiation. Notably, PS unveiled a previously unrecognized, cell-type-specific role of coiled-coil domain containing 6 (CCDC6) in guiding liver and pancreatic lineage decisions, where CCDC6 knockouts drive the endoderm cell differentiation towards liver lineage, rather than pancreatic lineage. The PS approach provides an innovative method for dose-to-function analysis and will enable new biological discoveries from single-cell perturbation datasets.

One sentence summary: We present a method to quantify diverse perturbation responses and discover novel biological insights in single-cell perturbation datasets.
]]></description>
<dc:creator>Song, B.</dc:creator>
<dc:creator>Liu, D.</dc:creator>
<dc:creator>Dai, W.</dc:creator>
<dc:creator>McMyn, N.</dc:creator>
<dc:creator>Wang, Q.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Krejci, A.</dc:creator>
<dc:creator>Vasilyev, A.</dc:creator>
<dc:creator>Untermoser, N.</dc:creator>
<dc:creator>Loregger, A.</dc:creator>
<dc:creator>Song, D.</dc:creator>
<dc:creator>Williams, B.</dc:creator>
<dc:creator>Rosen, B.</dc:creator>
<dc:creator>Cheng, X.</dc:creator>
<dc:creator>Chao, L.</dc:creator>
<dc:creator>Kale, H.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Diao, Y.</dc:creator>
<dc:creator>Bürckstümmer, T.</dc:creator>
<dc:creator>Siliciano, J. M.</dc:creator>
<dc:creator>Li, J. J.</dc:creator>
<dc:creator>Siliciano, R.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:creator>Li, W.</dc:creator>
<dc:date>2023-11-02</dc:date>
<dc:identifier>doi:10.1101/2023.10.30.564796</dc:identifier>
<dc:title><![CDATA[Decoding Heterogenous Single-cell Perturbation Responses]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.14.544990v1?rss=1">
<title>
<![CDATA[
Discovery of Competent Chromatin Regions in Human Embryonic Stem Cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.14.544990v1?rss=1
</link>
<description><![CDATA[
The mechanisms underlying the ability of embryonic stem cells (ESCs) to rapidly activate lineage-specific genes during differentiation remain largely unknown. Through multiple CRISPR-activation screens, we discovered human ESCs have pre-established transcriptionally competent chromatin regions (CCRs) that support lineage-specific gene expression at levels comparable to differentiated cells. CCRs reside in the same topological domains as their target genes. They lack typical enhancer-associated histone modifications but show enriched occupancy of pluripotent transcription factors, DNA demethylation factors, and histone deacetylases. TET1 and QSER1 protect CCRs from excessive DNA methylation, while HDAC1 family members prevent premature activation. This "push and pull" feature resembles bivalent domains at developmental gene promoters but involves distinct molecular mechanisms. Our study provides new insights into pluripotency regulation and cellular plasticity in development and disease.

One sentence summary: We report a class of distal regulatory regions distinct from enhancers that confer human embryonic stem cells with the competence to rapidly activate the expression of lineage-specific genes.
]]></description>
<dc:creator>Pulecio, J.</dc:creator>
<dc:creator>Tayyebi, Z.</dc:creator>
<dc:creator>Liu, D.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Damodaran, J. R.</dc:creator>
<dc:creator>Kaplan, S.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Murphy, D. J.</dc:creator>
<dc:creator>Rickert, R.</dc:creator>
<dc:creator>Shukla, A.</dc:creator>
<dc:creator>Zhong, A.</dc:creator>
<dc:creator>Gonzalez, F.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Li, W.</dc:creator>
<dc:creator>Zhou, T.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Leslie, C.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:date>2023-06-14</dc:date>
<dc:identifier>doi:10.1101/2023.06.14.544990</dc:identifier>
<dc:title><![CDATA[Discovery of Competent Chromatin Regions in Human Embryonic Stem Cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.03.539283v1?rss=1">
<title>
<![CDATA[
Parallel genome-scale CRISPR screens distinguish pluripotency and self-renewal 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.03.539283v1?rss=1
</link>
<description><![CDATA[
Pluripotent stem cells are defined by their self-renewal capacity, which is the ability of the stem cells to proliferate indefinitely while maintaining the pluripotent identity essential for their ability to differentiate into any somatic cell lineage. However, understanding the mechanisms that control stem cell fitness versus the pluripotent cell identity is challenging. To investigate the interplay between these two aspects of pluripotency, we performed four parallel genome-scale CRISPR-Cas9 loss-of-function screens interrogating stem cell fitness in hPSC self-renewal conditions, and the dissolution of the primed pluripotency identity during early differentiation. Comparative analyses led to the discovery of genes with distinct roles in pluripotency regulation, including mitochondrial and metabolism regulators crucial for stem cell fitness, and chromatin regulators that control pluripotent identity during early differentiation. We further discovered a core set of factors that control both stem cell fitness and pluripotent identity, including a network of chromatin factors that safeguard pluripotency. Our unbiased and systematic screening and comparative analyses disentangle two interconnected aspects of pluripotency, provide rich datasets for exploring pluripotent cell identity versus cell fitness, and offer a valuable model for categorizing gene function in broad biological contexts.
]]></description>
<dc:creator>Rosen, B. P.</dc:creator>
<dc:creator>Li, Q. V.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Liu, D.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Graff, S.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Verma, N.</dc:creator>
<dc:creator>Damodaran, J. R.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Sidoli, S.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:date>2023-05-03</dc:date>
<dc:identifier>doi:10.1101/2023.05.03.539283</dc:identifier>
<dc:title><![CDATA[Parallel genome-scale CRISPR screens distinguish pluripotency and self-renewal]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.08.03.551876v1?rss=1">
<title>
<![CDATA[
Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.08.03.551876v1?rss=1
</link>
<description><![CDATA[
Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional mutations (non-WT-like) clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss of function variants but had gene expression profiles indicative of response to neural growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
]]></description>
<dc:creator>Ozturk, K.</dc:creator>
<dc:creator>Panwala, R.</dc:creator>
<dc:creator>Sheen, J.</dc:creator>
<dc:creator>Ford, K.</dc:creator>
<dc:creator>Payne, N.</dc:creator>
<dc:creator>Zhang, D.-E.</dc:creator>
<dc:creator>Hutter, S.</dc:creator>
<dc:creator>Haferlach, T.</dc:creator>
<dc:creator>Ideker, T.</dc:creator>
<dc:creator>Mali, P.</dc:creator>
<dc:creator>Carter, H.</dc:creator>
<dc:date>2023-08-04</dc:date>
<dc:identifier>doi:10.1101/2023.08.03.551876</dc:identifier>
<dc:title><![CDATA[Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-08-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.19.521116v1?rss=1">
<title>
<![CDATA[
Universal chromatin state annotation of the mouse genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.19.521116v1?rss=1
</link>
<description><![CDATA[
Genome-wide chromatin states learned from integrating genome-wide maps of multiple epigenetic marks within the same cell type have been widely used to generate genome annotations of individual cell types. An alternative strategy based on stacked modeling can provide a single universal chromatin state annotation based jointly on data from many cell types. In human, such an approach was recently demonstrated and the resulting chromatin state annotation, denoted full-stack, was shown to have complementary advantages to per-cell-type annotations. However, an analogous annotation has not been previously available in mouse. Here, we produce a chromatin state annotation for mouse based on 901 datasets assaying 14 chromatin marks in 26 different cell or tissue types. To characterize each chromatin state, we relate the states to other external annotations and compare them to analogously defined states in human. We expect the full-stack chromatin state annotation for mouse will be a useful resource for studying the genome of this key mammalian model organism.
]]></description>
<dc:creator>Vu, H. T.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-12-20</dc:date>
<dc:identifier>doi:10.1101/2022.12.19.521116</dc:identifier>
<dc:title><![CDATA[Universal chromatin state annotation of the mouse genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.14.549056v1?rss=1">
<title>
<![CDATA[
Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.14.549056v1?rss=1
</link>
<description><![CDATA[
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genomewide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. It then for each cell type produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
]]></description>
<dc:creator>Dincer, T. U.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2023-07-15</dc:date>
<dc:identifier>doi:10.1101/2023.07.14.549056</dc:identifier>
<dc:title><![CDATA[Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.27.550836v1?rss=1">
<title>
<![CDATA[
ChromaFold predicts the 3D contact map from single-cell chromatin accessibility 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.27.550836v1?rss=1
</link>
<description><![CDATA[
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
]]></description>
<dc:creator>Gao, V. R.</dc:creator>
<dc:creator>Yang, R.</dc:creator>
<dc:creator>Das, A.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Luo, H.</dc:creator>
<dc:creator>McNally, D. R.</dc:creator>
<dc:creator>Karagiannidis, I.</dc:creator>
<dc:creator>Rivas, M. A.</dc:creator>
<dc:creator>Wang, Z.-m.</dc:creator>
<dc:creator>Barisic, D.</dc:creator>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Zhan, Y.</dc:creator>
<dc:creator>Chin, C. R.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Bilmes, J. A.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Kharas, M.</dc:creator>
<dc:creator>Beguelin, W.</dc:creator>
<dc:creator>Viny, A. D.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:creator>Rudensky, A.</dc:creator>
<dc:creator>Melnick, A.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:date>2023-07-28</dc:date>
<dc:identifier>doi:10.1101/2023.07.27.550836</dc:identifier>
<dc:title><![CDATA[ChromaFold predicts the 3D contact map from single-cell chromatin accessibility]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.20.533521v1?rss=1">
<title>
<![CDATA[
Flexible parsing and preprocessing of technical sequences with splitcode 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.20.533521v1?rss=1
</link>
<description><![CDATA[
Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. We present a tool called splitcode, that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays.

Availability and Implementation: The splitcode program is free, open source, and available for download at http://github.com/pachterlab/splitcode.
]]></description>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-03-23</dc:date>
<dc:identifier>doi:10.1101/2023.03.20.533521</dc:identifier>
<dc:title><![CDATA[Flexible parsing and preprocessing of technical sequences with splitcode]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.14.543267v1?rss=1">
<title>
<![CDATA[
Universal preprocessing of single-cell genomics data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.14.543267v1?rss=1
</link>
<description><![CDATA[
We describe a workflow for preprocessing a wide variety of single-cell genomics data types. The approach is based on parsing of machine-readable seqspec assay specifications to customize inputs for kb-python, which uses kallisto and bustools to catalog reads, error correct barcodes, and count reads. The universal preprocessing method is implemented in the Python package cellatlas that is available for download at: https://github.com/cellatlas/cellatlas/.
]]></description>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-09-15</dc:date>
<dc:identifier>doi:10.1101/2023.09.14.543267</dc:identifier>
<dc:title><![CDATA[Universal preprocessing of single-cell genomics data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.21.568164v1?rss=1">
<title>
<![CDATA[
kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.21.568164v1?rss=1
</link>
<description><![CDATA[
The term "RNA-seq" refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, from single cells, or from single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.
]]></description>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Min, K. H.</dc:creator>
<dc:creator>Hjörleifsson, K. E.</dc:creator>
<dc:creator>Luebbert, L.</dc:creator>
<dc:creator>Holley, G.</dc:creator>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Gustafsson, J.</dc:creator>
<dc:creator>Bray, N. L.</dc:creator>
<dc:creator>Pimentel, H.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Melsted, P.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-11-22</dc:date>
<dc:identifier>doi:10.1101/2023.11.21.568164</dc:identifier>
<dc:title><![CDATA[kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.12.08.471788v1?rss=1">
<title>
<![CDATA[
Efficient pre-processing of Single-cell ATAC-seq data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.12.08.471788v1?rss=1
</link>
<description><![CDATA[
Single-cell and single-nucleus genomics assays are becoming increasingly complex, with multiple measurements of distinct modalities performed concurrently, resulting in "multimodal" readouts. While multimodal single-cell and single-nucleus genomics offers the potential to better understand how distinct cellular processes are coordinated, there can be technical and cost tradeoffs associated with increasing the number of measurement modes. To assess some of the tradeoffs inherent in multimodal assays, we have developed snATAK for preprocessing sequencing-based high-throughput assays that measure single-nucleus chromatin accessibility. Coupled with kallisto bustools for single-nucleus RNA-seq preprocessing, the snATAK workflow can be used for uniform preprocessing of 10x Genomics Multiome and single-nucleus ATAC-seq, SHARE-seq, ISSAAC-seq, spatial ATAC-seq and other chromatin-related assays. Using snATAK, we are able to perform cross-platform comparisons and quantify some of the tradeoffs between Multiome and unregistered single-nucleus RNA-seq/ATAC-seq experiments. We also show that snATAK can be used to assess allele concordance between paired RNA-seq and ATAC-seq. snATAK is available at https://github.com/pachterlab/snATAK/.
]]></description>
<dc:creator>Gao, F.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2021-12-10</dc:date>
<dc:identifier>doi:10.1101/2021.12.08.471788</dc:identifier>
<dc:title><![CDATA[Efficient pre-processing of Single-cell ATAC-seq data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-12-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.17.558131v1?rss=1">
<title>
<![CDATA[
Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.17.558131v1?rss=1
</link>
<description><![CDATA[
Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or clusters present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for clusters through the governing parameters of cellular processes.
]]></description>
<dc:creator>Chari, T.</dc:creator>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-09-17</dc:date>
<dc:identifier>doi:10.1101/2023.09.17.558131</dc:identifier>
<dc:title><![CDATA[Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.20.549945v1?rss=1">
<title>
<![CDATA[
Voyager: exploratory single-cell genomics data analysis with geospatial statistics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.20.549945v1?rss=1
</link>
<description><![CDATA[
Exploratory spatial data analysis (ESDA) can be a powerful approach to understanding single-cell genomics datasets, but it is not yet part of standard data analysis workflows. In particular, geospatial analyses, which have been developed and refined for decades, have yet to be fully adapted and applied to spatial single-cell analysis. We introduce the Voyager platform, which systematically brings the geospatial ESDA tradition to (spatial) -omics, with local, bivariate, and multivariate spatial methods not yet commonly applied to spatial -omics, united by a uniform user interface. Using Voyager, we showcase biological insights that can be derived with its methods, such as biologically relevant negative spatial autocorrelation. Underlying Voyager is the SpatialFeatureExperiment data structure, which combines Simple Feature with SingleCellExperiment and AnnData to represent and operate on geometries bundled with gene expression data. Voyager has comprehensive tutorials demonstrating ESDA built on GitHub Actions to ensure reproducibility and scalability, using data from popular commercial technologies. Voyager is implemented in both R/Bioconductor and Python/PyPI, and features compatibility tests to ensure that both implementations return consistent results.
]]></description>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Einarsson, P. H.</dc:creator>
<dc:creator>Jackson, K. C.</dc:creator>
<dc:creator>Luebbert, L.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Antonsson, S. E.</dc:creator>
<dc:creator>Melsted, P.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-07-22</dc:date>
<dc:identifier>doi:10.1101/2023.07.20.549945</dc:identifier>
<dc:title><![CDATA[Voyager: exploratory single-cell genomics data analysis with geospatial statistics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.13.523995v1?rss=1">
<title>
<![CDATA[
Mechanistic modeling with a variational autoencoder for multimodal single-cell RNA sequencing data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.13.523995v1?rss=1
</link>
<description><![CDATA[
We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrate bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.
]]></description>
<dc:creator>Carilli, M. T.</dc:creator>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Choi, Y.</dc:creator>
<dc:creator>Chari, T.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-01-14</dc:date>
<dc:identifier>doi:10.1101/2023.01.13.523995</dc:identifier>
<dc:title><![CDATA[Mechanistic modeling with a variational autoencoder for multimodal single-cell RNA sequencing data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-01-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.11.495771v1?rss=1">
<title>
<![CDATA[
Monod: mechanistic analysis of single-cell RNA sequencing count data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.11.495771v1?rss=1
</link>
<description><![CDATA[
Single-cell RNA sequencing analysis centers on illuminating cell diversity and understanding the transcriptional mechanisms underlying cellular function. These datasets are large, noisy, and complex. Current analyses prioritize noise removal and dimensionality reduction to tackle these challenges and extract biological insight. We propose an alternative, physical approach to leverage the stochasticity, size, and multimodal nature of these data to explicitly distinguish their biological and technical facets while revealing the underlying regulatory processes. With the Python package Monod, we demonstrate how nascent and mature RNA counts, present in most published datasets, can be meaningfully "integrated" under biophysical models of transcription. By utilizing variation in these modalities, we can identify transcriptional modulation not discernible through changes in average gene expression, quantitatively compare mechanistic hypotheses of gene regulation, analyze transcriptional data from different technologies within a common framework, and minimize the use of opaque or distortive normalization and transformation techniques.
]]></description>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2022-06-12</dc:date>
<dc:identifier>doi:10.1101/2022.06.11.495771</dc:identifier>
<dc:title><![CDATA[Monod: mechanistic analysis of single-cell RNA sequencing count data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.17.541250v1?rss=1">
<title>
<![CDATA[
Studying stochastic systems biology of the cell with single-cell genomics data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.17.541250v1?rss=1
</link>
<description><![CDATA[
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
]]></description>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Vastola, J. J.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-05-18</dc:date>
<dc:identifier>doi:10.1101/2023.05.17.541250</dc:identifier>
<dc:title><![CDATA[Studying stochastic systems biology of the cell with single-cell genomics data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.26.591412v1?rss=1">
<title>
<![CDATA[
CRISPR Screening Uncovers a Long-Range Enhancer for ONECUT1 in Pancreatic Differentiation and Links a Diabetes Risk Variant 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.26.591412v1?rss=1
</link>
<description><![CDATA[
Functional enhancer annotation is a valuable first step for understanding tissue-specific transcriptional regulation and prioritizing disease-associated non-coding variants for investigation. However, unbiased enhancer discovery in physiologically relevant contexts remains a major challenge. To discover regulatory elements pertinent to diabetes, we conducted a CRISPR interference screen in the human pluripotent stem cell (hPSC) pancreatic differentiation system. Among the enhancers uncovered, we focused on a long-range enhancer ~664 kb from the ONECUT1 promoter, since coding mutations in ONECUT1 cause pancreatic hypoplasia and neonatal diabetes. Homozygous enhancer deletion in hPSCs was associated with a near-complete loss of ONECUT1 gene expression and compromised pancreatic differentiation. This enhancer contains a confidently fine-mapped type 2 diabetes associated variant (rs528350911) which disrupts a GATA motif. Introduction of the risk variant into hPSCs revealed substantially reduced binding of key pancreatic transcription factors (GATA4, GATA6 and FOXA2) on the edited allele, accompanied by a slight reduction of ONECUT1 transcription, supporting a causal role for this risk variant in metabolic disease. This work expands our knowledge about transcriptional regulation in pancreatic development through the characterization of a long-range enhancer and highlights the utility of enhancer discovery in disease-relevant settings for understanding monogenic and complex disease.
]]></description>
<dc:creator>Kaplan, S. J.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Pulecio, J.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Leslie-Iyer, J.</dc:creator>
<dc:creator>Kazakov, J.</dc:creator>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Murphy, D.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:date>2024-04-29</dc:date>
<dc:identifier>doi:10.1101/2024.04.26.591412</dc:identifier>
<dc:title><![CDATA[CRISPR Screening Uncovers a Long-Range Enhancer for ONECUT1 in Pancreatic Differentiation and Links a Diabetes Risk Variant]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.22.590634v1?rss=1">
<title>
<![CDATA[
Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.22.590634v1?rss=1
</link>
<description><![CDATA[
Genetic studies find hundreds of thousands of noncoding variants associated with psychiatric disorders. Massively parallel reporter assays (MPRAs) and in vivo transgenic mouse assays can be used to assay the impact of these variants. However, the relevance of MPRAs to in vivo function is unknown and transgenic assays suffer from low throughput. Here, we studied the utility of combining the two assays to study the impact of non-coding variants. We carried out an MPRA on over 50,000 sequences derived from enhancers validated in transgenic mouse assays and from multiple fetal neuronal ATAC-seq datasets. We also tested over 20,000 variants, including synthetic mutations in highly active neuronal enhancers and 177 common variants associated with psychiatric disorders. Variants with a high impact on MPRA activity were further tested in mice. We found a strong and specific correlation between MPRA and mouse neuronal enhancer activity including changes in neuronal enhancer activity in mouse embryos for variants with strong MPRA effects. Mouse assays also revealed pleiotropic variant effects that could not be observed in MPRA. Our work provides a large catalog of functional neuronal enhancers and variant effects and highlights the effectiveness of combining MPRAs and mouse transgenic assays.
]]></description>
<dc:creator>Kosicki, M.</dc:creator>
<dc:creator>Cintron, D. L.</dc:creator>
<dc:creator>Page, N. F.</dc:creator>
<dc:creator>Georgakopoulos-Soares, I.</dc:creator>
<dc:creator>Akiyama, J. A.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Novak, C. S.</dc:creator>
<dc:creator>Kato, M.</dc:creator>
<dc:creator>Hunter, R. D.</dc:creator>
<dc:creator>von Maydell, K.</dc:creator>
<dc:creator>Barton, S.</dc:creator>
<dc:creator>Godfrey, P.</dc:creator>
<dc:creator>Beckman, E.</dc:creator>
<dc:creator>Sanders, S. J.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2024-04-23</dc:date>
<dc:identifier>doi:10.1101/2024.04.22.590634</dc:identifier>
<dc:title><![CDATA[Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.16.589814v1?rss=1">
<title>
<![CDATA[
Massively parallel jumping assay decodes Alu retrotransposition activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.16.589814v1?rss=1
</link>
<description><![CDATA[
The human genome contains millions of retrotransposons, several of which could become active due to somatic mutations having phenotypic consequences, including disease. However, it is not thoroughly understood how nucleotide changes in retrotransposons affect their jumping activity. Here, we developed a novel massively parallel jumping assay (MPJA) that can test the jumping potential of thousands of transposons en masse. We generated a nucleotide variant library of four selected Alu retrotransposons containing 165,087 different haplotypes and tested them for their jumping ability using MPJA. We found 66,821 unique jumping haplotypes, allowing us to pinpoint domains and variants vital for transposition. Mapping these variants to the Alu-RNA secondary structure revealed stem-loop features that contribute to jumping potential. Combined, our work provides a novel high-throughput assay that assesses the ability of retrotransposons to jump and identifies nucleotide changes that have the potential to reactivate them in the human genome.
]]></description>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Matharu, N.</dc:creator>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Sohota, A.</dc:creator>
<dc:creator>Deng, L.</dc:creator>
<dc:creator>Hung, Y.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Sims, J.</dc:creator>
<dc:creator>Rattanasopha, S.</dc:creator>
<dc:creator>Meyer, J.</dc:creator>
<dc:creator>Carbone, L.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:date>2024-04-19</dc:date>
<dc:identifier>doi:10.1101/2024.04.16.589814</dc:identifier>
<dc:title><![CDATA[Massively parallel jumping assay decodes Alu retrotransposition activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.08.602569v1?rss=1">
<title>
<![CDATA[
Smooth muscle expression of RNA editing enzyme ADAR1 controls vascular integrity and progression of atherosclerosis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.08.602569v1?rss=1
</link>
<description><![CDATA[
Mapping the genomic architecture of complex disease has been predicated on the understanding that genetic variants influence disease risk through modifying gene expression. However, recent discoveries have revealed that a significant burden of disease heritability in common autoinflammatory disorders and coronary artery disease (CAD) is mediated through genetic variation modifying post-transcriptional modification of RNA through adenosine-to-inosine (A-to-I) RNA editing. This common RNA modification is catalyzed by ADAR enzymes, where ADAR1 edits specific immunogenic double stranded RNA (dsRNA) to prevent activation of the double strand RNA (dsRNA) sensor MDA5 (IFIH1) and stimulation of an interferon stimulated gene (ISG) response. Multiple lines of human genetic data indicate impaired RNA editing and increased dsRNA sensing by MDA5 to be an important mechanism of CAD risk. Here, we provide a crucial link between observations in human genetics and mechanistic cell biology leading to progression of CAD. Through analysis of human atherosclerotic plaque and culture of human coronary artery vascular smooth muscle cells (SMCs) we implicate the SMC to have a distinct requirement for RNA editing, and that MDA5 activation regulates SMC phenotypic modulation. Through generation of a conditional SMC specific Adar1 deletion mouse model on a pro-atherosclerosis background with additional constitutive deletion of MDA5 (Ifih1), and with incorporation of single cell RNA sequencing cellular profiling, we further show that Adar1 controls SMC phenotypic state by regulating Mda5 activation, is required to maintain vascular integrity, and controls progression of atherosclerosis and vascular calcification. Finally, we further corroborate our findings in a large human carotid endarterectomy dataset (Athero-Express) where we show that ISG activation is strongly associated with decreased plaque stability, increased SMC phenotypic modulation, and increased plaque calcification. 
Through this work, we describe a fundamental mechanism of CAD, where cell type and context specific RNA editing and sensing of dsRNA mediates disease progression, bridging our understanding of human genetics and disease causality.

One Sentence Summary: Smooth muscle expression of RNA editing enzyme ADAR1 regulates activation of double strand RNA sensor MDA5 in a novel mechanism of atherosclerosis.
]]></description>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Monteiro, J. P.</dc:creator>
<dc:creator>Guo, H.</dc:creator>
<dc:creator>Galls, D.</dc:creator>
<dc:creator>Gu, W.</dc:creator>
<dc:creator>Cheng, P. P.</dc:creator>
<dc:creator>Ramste, M.</dc:creator>
<dc:creator>Li, D. Y.</dc:creator>
<dc:creator>Palmisano, B. T.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Worssam, M. D.</dc:creator>
<dc:creator>Zhao, Q.</dc:creator>
<dc:creator>Bhate, A.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Li, J. B.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2024-07-11</dc:date>
<dc:identifier>doi:10.1101/2024.07.08.602569</dc:identifier>
<dc:title><![CDATA[Smooth muscle expression of RNA editing enzyme ADAR1 controls vascular integrity and progression of atherosclerosis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.12.603288v1?rss=1">
<title>
<![CDATA[
Cohesin-mediated 3D contacts tune enhancer-promoter regulation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.12.603288v1?rss=1
</link>
<description><![CDATA[
Enhancers are key drivers of gene regulation thought to act via 3D physical interactions with the promoters of their target genes. However, genome-wide depletions of architectural proteins such as cohesin result in only limited changes in gene expression, despite a loss of contact domains and loops. Consequently, the role of cohesin and 3D contacts in enhancer function remains debated. Here, we developed CRISPRi of regulatory elements upon degron operation (CRUDO), a novel approach to measure how changes in contact frequency impact enhancer effects on target genes by perturbing enhancers with CRISPRi and measuring gene expression in the presence or absence of cohesin. We systematically perturbed all 1,039 candidate enhancers near five cohesin-dependent genes and identified 34 enhancer-gene regulatory interactions. Of 26 regulatory interactions with sufficient statistical power to evaluate cohesin dependence, 18 show cohesin-dependent effects. A decrease in enhancer-promoter contact frequency upon removal of cohesin is frequently accompanied by a decrease in the regulatory effect of the enhancer on gene expression, consistent with a contact-based model for enhancer function. However, changes in contact frequency and regulatory effects on gene expression vary as a function of distance, with distal enhancers (e.g., >50Kb) experiencing much larger changes than proximal ones (e.g., <50Kb). Because most enhancers are located close to their target genes, these observations can explain how only a small subset of genes -- those with strong distal enhancers -- are sensitive to cohesin. Together, our results illuminate how 3D contacts, influenced by both cohesin and genomic distance, tune enhancer effects on gene expression.
]]></description>
<dc:creator>Guckelberger, P.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Tan, Y.</dc:creator>
<dc:creator>Cai, X. S.</dc:creator>
<dc:creator>Fulco, C. P.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Mualim, K. S.</dc:creator>
<dc:creator>Bergman, D. T.</dc:creator>
<dc:creator>Ray, J.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Munger, C. J.</dc:creator>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Tan, A. S.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Meissner, A.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2024-07-12</dc:date>
<dc:identifier>doi:10.1101/2024.07.12.603288</dc:identifier>
<dc:title><![CDATA[Cohesin-mediated 3D contacts tune enhancer-promoter regulation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.10.612293v1?rss=1">
<title>
<![CDATA[
A cell and transcriptome atlas of the human arterial vasculature 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.10.612293v1?rss=1
</link>
<description><![CDATA[
Contiguous arterial segments show different propensities for different vascular pathologies, yet mechanisms explaining these fundamental differences remain unknown. We sought to build a transcriptomic, cellular, and spatial atlas of human arterial cells across multiple different arterial segments to understand these underlying differences.

Analysis of multiple isogenic arterial segments from healthy donors reveals a significant stereotyped pattern of cell type-specific segmental heterogeneity in healthy arteries. Combining single cell analysis with spatial transcriptomic data reveals cellular heterogeneity not captured by commonly used cell-type marker genes. Determinants of arterial transcriptomic identities are predominantly encoded in fibroblasts and smooth muscle cells (SMCs), and their differentially expressed genes are particularly enriched for different vascular disease-associated genetic risk loci and risk genes. Adventitial fibroblast-specific heterogeneity in gene expression coincides with a disproportionately large number of vascular disease genetic signals, suggesting a previously unrecognized role for this cell type in disease risk. Adult arterial cells from different segments cluster not by anatomical proximity, but by embryonic origin. Global regulon analysis of disease-related, segment-specific gene expression programs in fibroblasts and SMCs enriches for binding sites of transcription factors that are developmental master regulators whose expression persists into adulthood, suggesting an important functional role of the same developmental master regulators in adult gene expression and disease. Lastly, non-coding transcriptomes across arterial cells contain extensive variation in lncRNAs expressed in cell type- and segment-specific patterns, rivaling heterogeneity in protein coding transcriptomes. Differentially expressed lncRNAs demonstrate enrichment for non-coding genetic signals for vascular diseases, suggesting a potential global role of segment-specific lncRNAs in regulating inherited human vascular disease risk.
]]></description>
<dc:creator>Zhao, Q.</dc:creator>
<dc:creator>Pedroza, A.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Gu, W.</dc:creator>
<dc:creator>Dalal, A.</dc:creator>
<dc:creator>Weldy, C.</dc:creator>
<dc:creator>Jackson, W.</dc:creator>
<dc:creator>Li, D. Y.</dc:creator>
<dc:creator>Ryan, Y.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Shad, R.</dc:creator>
<dc:creator>Palmisano, B. T.</dc:creator>
<dc:creator>Monteiro, J. P.</dc:creator>
<dc:creator>Worssam, M.</dc:creator>
<dc:creator>Berezwitz, A.</dc:creator>
<dc:creator>Iyer, M.</dc:creator>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Limbu, L.</dc:creator>
<dc:creator>Kim, J. B.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Fischbein, M.</dc:creator>
<dc:creator>Wirka, R.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Cheng, P.</dc:creator>
<dc:date>2024-09-10</dc:date>
<dc:identifier>doi:10.1101/2024.09.10.612293</dc:identifier>
<dc:title><![CDATA[A cell and transcriptome atlas of the human arterial vasculature]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.13.579700v1?rss=1">
<title>
<![CDATA[
A missense variant effect map for the human tumour suppressor protein CHK2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.13.579700v1?rss=1
</link>
<description><![CDATA[
The tumour suppressor CHEK2 encodes the serine/threonine protein kinase CHK2 which, upon DNA damage, is important for pausing the cell cycle, initiating DNA repair and inducing apoptosis. CHK2 phosphorylation of the tumour suppressor BRCA1 is also important for mitotic spindle assembly and chromosomal stability. Consistent with its cell cycle checkpoint role, both germline and somatic variants in CHEK2 have been linked to breast and multiple other cancer types. Over 90% of clinical germline CHEK2 missense variants are classified as variants of uncertain significance, complicating diagnosis of CHK2-dependent cancer. We therefore sought to test the functional impact of all possible missense variants in CHK2. Using a scalable multiplexed assay based on the ability of human CHK2 to complement DNA sensitivity of a S. cerevisiae lacking its ortholog RAD53, we generated a systematic missense variant effect map for CHEK2 missense variation. Map scores reflect known biochemical features of CHK2 and exhibit good performance in separating pathogenic from benign clinical missense variants. Thus, the missense variant effect map for CHK2 offers value in understanding both known and yet-to-be-observed CHK2 variants.
]]></description>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Zimmerman, D. I.</dc:creator>
<dc:creator>Jiang, R.</dc:creator>
<dc:creator>Nguyen, M.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Li, R.</dc:creator>
<dc:creator>Gavac, M.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Sun, S.</dc:creator>
<dc:creator>Boonen, R. A.</dc:creator>
<dc:creator>Dines, J. N.</dc:creator>
<dc:creator>Wahl, A.</dc:creator>
<dc:creator>Reuter, J.</dc:creator>
<dc:creator>Johnson, B.</dc:creator>
<dc:creator>Fowler, D.</dc:creator>
<dc:creator>van Attikum, H.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2024-02-15</dc:date>
<dc:identifier>doi:10.1101/2024.02.13.579700</dc:identifier>
<dc:title><![CDATA[A missense variant effect map for the human tumour suppressor protein CHK2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.02.597061v1?rss=1">
<title>
<![CDATA[
BIT: Bayesian Identification of Transcriptional Regulators 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.02.597061v1?rss=1
</link>
<description><![CDATA[
Transcriptional regulators (TRs) are master controllers of gene expression and play a critical role in both normal tissue development and disease progression. However, existing computational methods for identification of TRs regulating specific biological processes have significant limitations, such as relying on distance on a linear chromosome or binding motifs that have low specificity. Many also use statistical tests in ways that lack interpretability and rigorous confidence measures. We introduce BIT, a novel Bayesian hierarchical model for in-silico TR identification. Leveraging a comprehensive library of TR ChIP-seq data, BIT offers a fully integrated Bayesian approach to assess genome-wide consistency between user-provided epigenomic profiling data and the TR binding library, enabling the identification of critical TRs while quantifying uncertainty. It avoids estimation and inference in a sequential manner or numerous isolated statistical tests, thereby enhancing accuracy and interpretability. BIT successfully identified critical TRs in perturbation experiments, functionally essential TRs in various cancer types, and cell-type-specific TRs within heterogeneous cell populations, offering deeper biological insights into transcriptional regulation.
]]></description>
<dc:creator>Lu, Z.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:date>2024-06-03</dc:date>
<dc:identifier>doi:10.1101/2024.06.02.597061</dc:identifier>
<dc:title><![CDATA[BIT: Bayesian Identification of Transcriptional Regulators]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.08.30.610571v1?rss=1">
<title>
<![CDATA[
BayeSMART: Bayesian Clustering of Multi-sample Spatially Resolved Transcriptomics Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.08.30.610571v1?rss=1
</link>
<description><![CDATA[
The field of spatially resolved transcriptomics (SRT) has greatly advanced our understanding of cellular microenvironments by integrating spatial information with molecular data collected from multiple tissue sections or individuals. However, methods for multi-sample spatial clustering are lacking, and existing methods primarily rely on molecular information alone. This paper introduces BayeSMART, a Bayesian statistical method designed to identify spatial domains across multiple samples. BayeSMART leverages artificial intelligence (AI)-reconstructed single-cell level information from the paired histology images of multi-sample SRT datasets while simultaneously considering the spatial context of gene expression. The AI integration enables BayeSMART to effectively interpret the spatial domains. We conducted case studies using four datasets from various tissue types and SRT platforms and compared BayeSMART with alternative multi-sample spatial clustering approaches and a number of state-of-the-art methods for single-sample SRT analysis, demonstrating that it surpasses existing methods in terms of clustering accuracy, interpretability, and computational efficiency. BayeSMART offers new insights into the spatial organization of cells in multi-sample SRT data.
]]></description>
<dc:creator>Guo, Y.</dc:creator>
<dc:creator>Zhu, B.</dc:creator>
<dc:creator>Tang, C.</dc:creator>
<dc:creator>Rong, R.</dc:creator>
<dc:creator>Ma, Y.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2024-09-01</dc:date>
<dc:identifier>doi:10.1101/2024.08.30.610571</dc:identifier>
<dc:title><![CDATA[BayeSMART: Bayesian Clustering of Multi-sample Spatially Resolved Transcriptomics Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.04.597391v1?rss=1">
<title>
<![CDATA[
A Regularized Bayesian Dirichlet-multinomial Regression Model for Integrating Single-cell-level Omics and Patient-level Clinical Study Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.04.597391v1?rss=1
</link>
<description><![CDATA[
Summary: The abundance of various cell types can vary significantly among patients with varying phenotypes and even those with the same phenotype. Recent scientific advancements provide mounting evidence that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
]]></description>
<dc:creator>Guo, Y.</dc:creator>
<dc:creator>Yu, L.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2024-06-06</dc:date>
<dc:identifier>doi:10.1101/2024.06.04.597391</dc:identifier>
<dc:title><![CDATA[A Regularized Bayesian Dirichlet-multinomial Regression Model for Integrating Single-cell-level Omics and Patient-level Clinical Study Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.01.578316v1?rss=1">
<title>
<![CDATA[
Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.01.578316v1?rss=1
</link>
<description><![CDATA[
This article provides an in-depth review of computational methods for predicting transcriptional regulators with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. We also point out the limitations of NGS-based methods and explore potential directions for further improvement.

Key points
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance. BART, ChIP-Atlas, and Lisa are recommended as these methods have overall better performance in evaluated scenarios.
]]></description>
<dc:creator>Lu, Z.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:creator>Zheng, Q.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:date>2024-02-06</dc:date>
<dc:identifier>doi:10.1101/2024.02.01.578316</dc:identifier>
<dc:title><![CDATA[Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.09.612085v1?rss=1">
<title>
<![CDATA[
CRISPR-CLEAR: Nucleotide-Resolution Mapping of Regulatory Elements via Allelic Readout of Tiled Base Editing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.09.612085v1?rss=1
</link>
<description><![CDATA[
CRISPR tiling screens have advanced the identification and characterization of regulatory sequences but are limited by low resolution arising from the indirect readout of editing via guide RNA sequencing. This study introduces CRISPR-CLEAR, an end-to-end experimental assay and computational pipeline, which leverages targeted sequencing of CRISPR-introduced alleles at the endogenous target locus following dense base-editing mutagenesis. This approach enables the dissection of regulatory elements at nucleotide resolution, facilitating a direct assessment of genotype-phenotype effects.
]]></description>
<dc:creator>Becerra, B.</dc:creator>
<dc:creator>Wittibschlager, S.</dc:creator>
<dc:creator>Patel, Z. M.</dc:creator>
<dc:creator>Kutschat, A.</dc:creator>
<dc:creator>Delano, J.</dc:creator>
<dc:creator>Karjalainen, A.</dc:creator>
<dc:creator>Wu, T.</dc:creator>
<dc:creator>Starrs, M.</dc:creator>
<dc:creator>Jankowiak, M.</dc:creator>
<dc:creator>Bauer, D.</dc:creator>
<dc:creator>Seruggia, D.</dc:creator>
<dc:creator>Pinello, L.</dc:creator>
<dc:date>2024-09-09</dc:date>
<dc:identifier>doi:10.1101/2024.09.09.612085</dc:identifier>
<dc:title><![CDATA[CRISPR-CLEAR: Nucleotide-Resolution Mapping of Regulatory Elements via Allelic Readout of Tiled Base Editing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.04.611293v1?rss=1">
<title>
<![CDATA[
Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.04.611293v1?rss=1
</link>
<description><![CDATA[
Recent technological developments in single-cell RNA-seq CRISPR screens enable high-throughput investigation of the genome. Through transduction of a gRNA library to a cell population followed by transcriptomic profiling by scRNA-seq, it is possible to characterize the effects of thousands of genomic perturbations on global gene expression. A major source of noise in scRNA-seq CRISPR screens is ambient gRNAs, which are contaminating gRNAs that likely originate from other cells. If not properly filtered, ambient gRNAs can result in an excess of false positive gRNA assignments. Here, we utilize CRISPR barnyard assays to characterize ambient gRNA noise in single-cell CRISPR screens. We use these datasets to develop and train CLEANSER, a mixture model that identifies and filters ambient gRNA noise. This model takes advantage of the bimodal distribution between native and ambient gRNAs and includes both gRNA and cell-specific normalization parameters, correcting for confounding technical factors that affect individual gRNAs and cells. The output of CLEANSER is the probability that a gRNA-cell assignment is in the native distribution over the ambient distribution. We find that ambient gRNA filtering methods impact differential gene expression analysis outcomes and that CLEANSER outperforms alternate approaches by increasing gRNA-cell assignment accuracy.

]]></description>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Hamilton, M. C.</dc:creator>
<dc:creator>Cowart, T. N.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Bounds, L. R.</dc:creator>
<dc:creator>Nelson, A. C.</dc:creator>
<dc:creator>Doty, R. W.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Majoros, W. H.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2024-09-04</dc:date>
<dc:identifier>doi:10.1101/2024.09.04.611293</dc:identifier>
<dc:title><![CDATA[Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.19.613754v1?rss=1">
<title>
<![CDATA[
scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.19.613754v1?rss=1
</link>
<description><![CDATA[
Understanding how regulatory DNA elements shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build unifying models of gene regulation capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, the first framework to model scRNA-seq coverage and scATAC-seq insertion profiles along the genome from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi as a foundation model, equip it with a cell-specific decoder, and fine-tune its sequence embeddings. Specifically, we condition the decoder on the cell position in a precomputed single-cell embedding, resulting in strong generalization capability. Applied to a hematopoiesis dataset, scooby recapitulates cell-specific expression levels of held-out genes and identifies regulators and their putative target genes through in silico motif deletion. Moreover, accurate variant effect prediction with scooby allows for breaking down bulk eQTL effects into single-cell effects and delineating their impact on chromatin accessibility and gene expression. We anticipate that scooby will aid in unraveling the complexities of gene regulation at the resolution of individual cells.
]]></description>
<dc:creator>Gagneur, J.</dc:creator>
<dc:creator>Hingerl, J. C.</dc:creator>
<dc:creator>Martens, L. D.</dc:creator>
<dc:creator>Manz, T.</dc:creator>
<dc:creator>Theis, F. J.</dc:creator>
<dc:creator>Buenrostro, J. D.</dc:creator>
<dc:creator>Karollus, A.</dc:creator>
<dc:date>2024-09-22</dc:date>
<dc:identifier>doi:10.1101/2024.09.19.613754</dc:identifier>
<dc:title><![CDATA[scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-22</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
