	<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
	<channel rdf:about="https://biorxiv.org">
	<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
	<title>bioRxiv Channel: Impact of Genomic Variation on Function (IGVF)</title>
	<link>https://biorxiv.org</link>
	<description>
	This feed contains articles for bioRxiv Channel "Impact of Genomic Variation on Function (IGVF)"
	</description>

		<items>
	<rdf:Seq>
		</rdf:Seq>
	</items>
	<prism:eIssn/>
	<prism:publicationName>bioRxiv</prism:publicationName>
	<prism:issn/>

	<image rdf:resource=""/>
	</channel>
	<image rdf:about="">
	<title>bioRxiv</title>
	<url/>
	<link>https://biorxiv.org</link>
	</image>
	<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.24.513593v1?rss=1">
<title>
<![CDATA[
EUGENe: A Python toolkit for predictive analyses of regulatory sequences 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.24.513593v1?rss=1
</link>
<description><![CDATA[
Deep learning (DL) has become a popular tool to study cis-regulatory element function. Yet efforts to design software for DL analyses in genomics that is Findable, Accessible, Interoperable and Reusable (FAIR) have fallen short of fully meeting these criteria. Here we present EUGENe (Elucidating the Utility of Genomic Elements with Neural Nets), a FAIR toolkit for the analysis of labeled sets of nucleotide sequences with DL. EUGENe consists of a set of modules that empower users to execute the key functionality of a DL workflow: 1) extracting, transforming and loading sequence data from many common file formats, 2) instantiating, initializing and training diverse model architectures, and 3) evaluating and interpreting model behavior. We designed EUGENe to be simple; users can develop workflows on new or existing datasets with two customizable Python objects, annotated sequence data (SeqData) and PyTorch models (BaseModel). The modularity and simplicity of EUGENe also make it highly extensible and we illustrate these principles through application of the toolkit to three predictive modeling tasks. First, we train and compare a set of built-in models along with a custom architecture for the accurate prediction of activities of plant promoters from STARR-seq data. Next, we apply EUGENe to an RNA binding prediction task and showcase how seminal model architectures can be retrained in EUGENe or imported from Kipoi. Finally, we train models to classify transcription factor binding by wrapping functionality from Janggu, which can efficiently extract sequences in BED file format from the human genome. We emphasize that the code used in each use case is simple, readable, and well documented (https://eugene-tools.readthedocs.io/en/latest/index.html). We believe that EUGENe represents a springboard toward a collaborative ecosystem for DL applications in genomics research.
EUGENe is available for download on GitHub (https://github.com/cartercompbio/EUGENe) along with several introductory tutorials and for installation on PyPI (https://pypi.org/project/eugene-tools/).
]]></description>
<dc:creator>Klie, A.</dc:creator>
<dc:creator>Stites, H.</dc:creator>
<dc:creator>Jores, T.</dc:creator>
<dc:creator>Carter, H.</dc:creator>
<dc:date>2022-10-26</dc:date>
<dc:identifier>doi:10.1101/2022.10.24.513593</dc:identifier>
<dc:title><![CDATA[EUGENe: A Python toolkit for predictive analyses of regulatory sequences]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.08.491094v1?rss=1">
<title>
<![CDATA[
A framework for summarizing chromatin state annotations within and identifying differential annotations across groups of samples 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.08.491094v1?rss=1
</link>
<description><![CDATA[
Motivation: Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution.

Results: We developed CSREP, which takes as input chromatin state annotations for a group of samples and then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers to predict the chromatin state assignment of each sample given the state maps from all other samples. The difference of CSREP's probability assignments for two groups can be used to identify genomic locations with differential chromatin state patterns.

Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution.

Availability and implementation: The CSREP source code is openly available under http://github.com/ernstlab/csrep.

Contact: jason.ernst@ucla.edu
]]></description>
<dc:creator>Vu, H. T.</dc:creator>
<dc:creator>Koch, Z.</dc:creator>
<dc:creator>Fiziev, P.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-05-08</dc:date>
<dc:identifier>doi:10.1101/2022.05.08.491094</dc:identifier>
<dc:title><![CDATA[A framework for summarizing chromatin state annotations within and identifying differential annotations across groups of samples]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.02.502571v1?rss=1">
<title>
<![CDATA[
Chromatin state modeling across individuals reveals global patterns of histone modifications 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.02.502571v1?rss=1
</link>
<description><![CDATA[
Epigenetic mapping studies across individuals have identified many positions of epigenetic variation in various human tissues and conditions. However, the relationships between these positions, and in particular global patterns that recur in many regions of the genome, remain understudied. In this study, we use a stacked chromatin state model to systematically learn global patterns of epigenetic variation across individuals and annotate the human genome based on them. We applied this framework to histone modification data across individuals in lymphoblastoid cell lines and across autism spectrum disorder cases and controls in prefrontal cortex tissue. We find that global patterns are correlated across multiple histone modifications and with gene expression. We used the global patterns as a framework to predict trans-regulators, identify trans-QTL, and study complex disease. The frameworks for identifying and analyzing global patterns of epigenetic variation are general, and we expect they will be useful in other systems.
]]></description>
<dc:creator>Zou, J.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-08-03</dc:date>
<dc:identifier>doi:10.1101/2022.08.02.502571</dc:identifier>
<dc:title><![CDATA[Chromatin state modeling across individuals reveals global patterns of histone modifications]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.01.18.476849v1?rss=1">
<title>
<![CDATA[
Exploring genomic data coupled with 3D chromatin structures using the WashU Epigenome Browser 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.01.18.476849v1?rss=1
</link>
<description><![CDATA[
Biological functions are not only encoded by the genome's sequence but also regulated by its three-dimensional (3D) structure. More and more studies have revealed the importance of 3D chromatin structures in development and diseases; therefore, visualizing the connections between genome sequence, epigenomic dynamics (1D) and the 3D genome becomes a pressing need. The WashU Epigenome Browser introduces a new 3D visualization module to integrate visualization of 1D (such as sequence features, epigenomic data) and 2D data (such as chromosome conformation capture data) with 3D genome structure. Genomic coordinates are encoded in 3D models of the chromosomes; thus, all genomic information displayed on a 1D genome browser can be visualized on a 3D model, supported by genome browser utilities and facilitating interpretation of genomic data. Biological information that is difficult to illustrate in 1D becomes more intuitive when displayed in 3D, providing novel and powerful tools for investigators to hypothesize and understand the connections between biological functions and 3D genome structures.
]]></description>
<dc:creator>Li, D.</dc:creator>
<dc:creator>Purushotham, D.</dc:creator>
<dc:creator>Harrison, J. K.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-01-21</dc:date>
<dc:identifier>doi:10.1101/2022.01.18.476849</dc:identifier>
<dc:title><![CDATA[Exploring genomic data coupled with 3D chromatin structures using the WashU Epigenome Browser]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-01-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.24.493345v1?rss=1">
<title>
<![CDATA[
ChromGene: Gene-Based Modeling of Epigenomic Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.24.493345v1?rss=1
</link>
<description><![CDATA[
Background: Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses, in which a single annotation for each gene is desired.

Results: To address this, we developed ChromGene, which annotates genes based on the combinatorial and spatial patterns of multiple epigenomic marks across the gene body and flanking regions. Specifically, ChromGene models the epigenomic maps using a mixture of hidden Markov models learned de novo. Using ChromGene, we generated annotations for the human protein-coding genes for over 100 cell and tissue types. We characterize the different mixture components and their associated gene sets in terms of gene expression, constraint, and other gene annotations. We also characterize variation in ChromGene gene annotations across cell and tissue types.

Conclusions: We expect that the ChromGene method and provided annotations will be a useful resource for gene-based epigenomic analyses.
]]></description>
<dc:creator>Jaroszewicz, A.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-05-25</dc:date>
<dc:identifier>doi:10.1101/2022.05.24.493345</dc:identifier>
<dc:title><![CDATA[ChromGene: Gene-Based Modeling of Epigenomic Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.02.490368v1?rss=1">
<title>
<![CDATA[
Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.02.490368v1?rss=1
</link>
<description><![CDATA[
We present GuideScan2 for memory-efficient, parallelizable construction of high-specificity CRISPR guide RNA (gRNA) databases and user-friendly gRNA/library design in custom genomes. GuideScan2 analysis identified widespread confounding effects of low-specificity gRNAs in published CRISPR knockout, interference and activation screens and enabled construction of a ready-to-use gRNA library that reduced off-target effects in a novel gene essentiality screen. GuideScan2 also enabled the design and experimental validation of allele-specific gRNAs in a hybrid mouse genome.
]]></description>
<dc:creator>Schmidt, H.</dc:creator>
<dc:creator>Zhang, M.</dc:creator>
<dc:creator>Mourelatos, H.</dc:creator>
<dc:creator>Sanchez-Rivera, F. J.</dc:creator>
<dc:creator>Lowe, S. W.</dc:creator>
<dc:creator>Ventura, A.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Pritykin, Y.</dc:creator>
<dc:date>2022-05-03</dc:date>
<dc:identifier>doi:10.1101/2022.05.02.490368</dc:identifier>
<dc:title><![CDATA[Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.28.482131v1?rss=1">
<title>
<![CDATA[
Heterogeneity of Inflammation-associated Synovial Fibroblasts in Rheumatoid Arthritis and Its Drivers 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.28.482131v1?rss=1
</link>
<description><![CDATA[
Inflammation of non-barrier immunologically quiescent tissues is associated with a massive influx of blood-borne innate and adaptive immune cells. Cues from the latter are likely to alter and expand the spectrum of states observed in cells that are constitutively resident. However, local communications between immigrant and resident cell types in human inflammatory disease remain poorly understood. Here, we explored heterogeneity of synovial fibroblasts (FLS) in inflamed joints of rheumatoid arthritis (RA) patients using paired single cell RNA and ATAC sequencing (scRNA/ATAC-seq), multiplexed imaging, and spatial transcriptomics along with in vitro modeling of cell extrinsic factor signaling. These analyses suggest that local exposures to myeloid and T cell derived cytokines, TNF, IFNγ, IL-1β, or lack thereof, drive six distinct FLS states, some of which closely resemble fibroblast states in other disease-affected tissues including skin and colon. Our results highlight a role for concurrent, spatially distributed cytokine signaling within the inflamed synovium.
]]></description>
<dc:creator>Smith, M. H.</dc:creator>
<dc:creator>Gao, V. R.</dc:creator>
<dc:creator>Schizas, M.</dc:creator>
<dc:creator>Kochen, A.</dc:creator>
<dc:creator>DiCarlo, E.</dc:creator>
<dc:creator>Goodman, S.</dc:creator>
<dc:creator>Norman, T. M.</dc:creator>
<dc:creator>Donlin, L.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Rudensky, A. Y.</dc:creator>
<dc:date>2022-03-02</dc:date>
<dc:identifier>doi:10.1101/2022.02.28.482131</dc:identifier>
<dc:title><![CDATA[Heterogeneity of Inflammation-associated Synovial Fibroblasts in Rheumatoid Arthritis and Its Drivers]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.26.470154v1?rss=1">
<title>
<![CDATA[
Diverse digital and fuzzy composite transcriptional elements are prevalent features of mammalian cis-regulomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.26.470154v1?rss=1
</link>
<description><![CDATA[
Mammalian transcriptional regulatory sequences are composed of complex combinations of simple transcription factor (TF) motifs. Stereospecific juxta-positioning of simple TF motifs generates composite elements (CEs) that increase combinatorial and regulatory specificity of TF-DNA interactions. Although a small number of CEs and their cooperative or anti-cooperative modes of TF binding have been thoroughly characterized, a systematic analysis of CE diversity, prevalence and properties in cis-regulomes has not been undertaken. We developed a computational pipeline termed CEseek to discover >20,000 CEs in open chromatin regions of diverse immune cells and validated many using CAP-SELEX, ChIP-Seq and STARR-seq datasets. Strikingly, the CEs manifested a bimodal distribution of configurations, termed digital and fuzzy, based on their stringent or relaxed stereospecific constraints, respectively. Digital CEs mediate cooperative as well as anti-cooperative binding of structurally diverse TFs that likely reflect AND/OR genomic logic gates. In contrast, fuzzy CEs encompass a less diverse set of TF motif pairs that are selectively enriched in p300 associated, multi-genic enhancers. The annotated CEs greatly expand the regulatory DNA motif lexicon and the universe of TF-TF interactions that underlie combinatorial logic of gene regulation.
]]></description>
<dc:creator>Chaudhri, V. K.</dc:creator>
<dc:creator>Singh, H.</dc:creator>
<dc:date>2021-11-27</dc:date>
<dc:identifier>doi:10.1101/2021.11.26.470154</dc:identifier>
<dc:title><![CDATA[Diverse digital and fuzzy composite transcriptional elements are prevalent features of mammalian cis-regulomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.16.483999v1?rss=1">
<title>
<![CDATA[
Evolution of transposable element-derived enhancer activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.16.483999v1?rss=1
</link>
<description><![CDATA[
Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1 and C/EBP-related binding motifs as drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects occurring due to mutations in the AP-1 and C/EBP motifs. We observed that the two motifs are conserved at higher rates than expected based on neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
]]></description>
<dc:creator>Du, A. Y.</dc:creator>
<dc:creator>Zhuo, X.</dc:creator>
<dc:creator>Sundaram, V.</dc:creator>
<dc:creator>Jensen, N. O.</dc:creator>
<dc:creator>Chaudhari, H. G.</dc:creator>
<dc:creator>Saccone, N. L.</dc:creator>
<dc:creator>Cohen, B. A.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-03-17</dc:date>
<dc:identifier>doi:10.1101/2022.03.16.483999</dc:identifier>
<dc:title><![CDATA[Evolution of transposable element-derived enhancer activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.26.493621v1?rss=1">
<title>
<![CDATA[
The dynseq genome browser track enables visualization of context-specific, dynamic DNA sequence features at single nucleotide resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.26.493621v1?rss=1
</link>
<description><![CDATA[
We introduce the dynseq genome browser track, which displays DNA nucleotide characters scaled by user-specified, base-resolution scores provided in the BigWig file format. The dynseq track enables visualization of context-specific, informative genomic sequence features. We demonstrate its utility in three popular genome browsers for interpreting cis-regulatory sequence syntax and regulatory variants by visualizing nucleotide importance scores derived from machine learning models of regulatory DNA trained on protein-DNA binding and chromatin accessibility experiments.
]]></description>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Barrett, A.</dc:creator>
<dc:creator>Li, D.</dc:creator>
<dc:creator>Raney, B. J.</dc:creator>
<dc:creator>Lee, B. T.</dc:creator>
<dc:creator>Kerpedjiev, P.</dc:creator>
<dc:creator>Ramalingam, V.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Lekschas, F.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:creator>Haeussler, M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2022-05-28</dc:date>
<dc:identifier>doi:10.1101/2022.05.26.493621</dc:identifier>
<dc:title><![CDATA[The dynseq genome browser track enables visualization of context-specific, dynamic DNA sequence features at single nucleotide resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.20.504670v1?rss=1">
<title>
<![CDATA[
A mutation rate model at the basepair resolution identifies the mutagenic effect of Polymerase III transcription 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.20.504670v1?rss=1
</link>
<description><![CDATA[
De novo mutations occur with substantially different rates depending on genomic location, sequence context and DNA strand [1-4]. The success of many human genetics techniques, especially when applied to large population sequencing datasets with numerous recurrent mutations [5-7], depends strongly on assumptions about the local mutation rate. Such techniques include estimation of selection intensity [8], inference of demographic history [9], and mapping of rare disease genes [10]. Here, we present Roulette, a genome-wide mutation rate model at the basepair resolution that incorporates known determinants of local mutation rate (http://genetics.bwh.harvard.edu/downloads/Vova/Roulette/). Roulette is shown to be more accurate than existing models [1,6]. Roulette has sufficient resolution at high mutation rate sites to model allele frequencies under recurrent mutation. We use Roulette to refine estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a 10-fold increase in mutation rate in nearly all genes transcribed by Polymerase III, suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively utilized in testis and residing in promoters.
]]></description>
<dc:creator>Seplyarskiy, V.</dc:creator>
<dc:creator>Lee, D. J.</dc:creator>
<dc:creator>Koch, E. M.</dc:creator>
<dc:creator>Lichtman, J. S.</dc:creator>
<dc:creator>Luan, H. H.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2022-08-21</dc:date>
<dc:identifier>doi:10.1101/2022.08.20.504670</dc:identifier>
<dc:title><![CDATA[A mutation rate model at the basepair resolution identifies the mutagenic effect of Polymerase III transcription]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.28.505582v1?rss=1">
<title>
<![CDATA[
FAVOR: Functional Annotation of Variants Online Resource and Annotator for Variation across the Human Genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.28.505582v1?rss=1
</link>
<description><![CDATA[
Large-scale whole genome sequencing (WGS) studies and biobanks are rapidly generating a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries or are unable to functionally annotate the genotype data of large WGS studies and biobanks for downstream analysis. We develop the Functional Annotation of Variants Online Resource (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive online multi-faceted portal with summarization and visualization of all possible 9 billion single nucleotide variants (SNVs) across the genome, and allows for rapid variant-, gene-, and region-level online queries. It integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, a scalable annotation tool, FAVORannotator, is provided for functionally annotating and efficiently storing the genotype and variant functional annotation data of a large-scale sequencing study in an annotated GDS file format to facilitate downstream analysis. FAVOR and FAVORannotator are available at https://favor.genohub.org.
]]></description>
<dc:creator>Zhou, H.</dc:creator>
<dc:creator>Arapoglou, T.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Zheng, X.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Asok, A.</dc:creator>
<dc:creator>Kumar, S.</dc:creator>
<dc:creator>Blue, E. E.</dc:creator>
<dc:creator>Buyske, S.</dc:creator>
<dc:creator>Cox, N.</dc:creator>
<dc:creator>Felsenfeld, A.</dc:creator>
<dc:creator>Gerstein, M.</dc:creator>
<dc:creator>Kenny, E.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Matise, T.</dc:creator>
<dc:creator>Philippakis, A.</dc:creator>
<dc:creator>Rehm, H.</dc:creator>
<dc:creator>Sofia, H. J.</dc:creator>
<dc:creator>Neale, B.</dc:creator>
<dc:creator>Snyder, G.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:creator>Sunyaev, S.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2022-08-29</dc:date>
<dc:identifier>doi:10.1101/2022.08.28.505582</dc:identifier>
<dc:title><![CDATA[FAVOR: Functional Annotation of Variants Online Resource and Annotator for Variation across the Human Genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.26.513833v1?rss=1">
<title>
<![CDATA[
Optimizing and benchmarking polygenic risk scores with GWAS summary statistics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.26.513833v1?rss=1
</link>
<description><![CDATA[
Background: Polygenic risk score (PRS) is a major research topic in human genetics. However, a significant gap exists between PRS methodology and applications in practice due to often unavailable individual-level data for various PRS tasks including model fine-tuning, benchmarking, and ensemble learning.

Results: We introduce an innovative statistical framework to optimize and benchmark PRS models using summary statistics of genome-wide association studies. This framework builds upon our previous work and can fine-tune virtually all existing PRS models while accounting for linkage disequilibrium. In addition, we provide an ensemble learning strategy named PUMAS-ensemble to combine multiple PRS models into an ensemble score without requiring external data for model fitting. Through extensive simulations and analysis of many complex traits in the UK Biobank, we demonstrate that this approach closely approximates gold-standard analytical strategies based on external validation, and substantially outperforms state-of-the-art PRS methods.

Conclusions: Our method is a powerful and general modeling technique that can continue to combine the best-performing PRS methods through ensemble learning and could become an integral component for all future PRS applications.
]]></description>
<dc:creator>Zhao, Z.</dc:creator>
<dc:creator>Gruenloh, T.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Sun, Z.</dc:creator>
<dc:creator>Miao, J.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Song, J.</dc:creator>
<dc:creator>Lu, Q.</dc:creator>
<dc:date>2022-10-27</dc:date>
<dc:identifier>doi:10.1101/2022.10.26.513833</dc:identifier>
<dc:title><![CDATA[Optimizing and benchmarking polygenic risk scores with GWAS summary statistics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.08.479604v1?rss=1">
<title>
<![CDATA[
Mutagenesis at non-B DNA motifs in the human genome: a course correction 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.08.479604v1?rss=1
</link>
<description><![CDATA[
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs, and recurrent sequencing errors. Accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting SNVs within STRs distinctly implicate error-prone Polκ. Secondary-structure formation promotes SNVs within palindromic repeats, as well as duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, while mutagenesis at Z-DNAs is conspicuously absent.
]]></description>
<dc:creator>McGinty, R. J.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2022-02-09</dc:date>
<dc:identifier>doi:10.1101/2022.02.08.479604</dc:identifier>
<dc:title><![CDATA[Mutagenesis at non-B DNA motifs in the human genome: a course correction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-02-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.07.20.500854v1?rss=1">
<title>
<![CDATA[
PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.07.20.500854v1?rss=1
</link>
<description><![CDATA[
Small molecule treatment and gene knockout or overexpression induce complex changes in the molecular states of cells, and the space of possible perturbations is too large to measure exhaustively. We present PerturbNet, a deep generative model for predicting the distribution of cell states induced by unseen chemical or genetic perturbations. Our key innovation is to use high-throughput perturbation response data such as Perturb-Seq to learn a continuous mapping between the space of possible perturbations and the space of possible cell states.

Using Sci-Plex and LINCS datasets, PerturbNet can accurately predict the distribution of gene expression changes induced by unseen small molecules given only their chemical structures. PerturbNet also accurately predicts gene expression changes induced by shRNA, CRISPRi, or CRISPRa perturbations using a perturbation network trained on gene functional annotations. Furthermore, self-supervised sequence embeddings allow PerturbNet to predict gene expression changes induced by missense mutations. We also use PerturbNet to attribute cell state shifts to specific perturbation features, including atoms and functional gene annotations. Finally, we leverage PerturbNet to design perturbations that achieve a desired cell state distribution. PerturbNet holds great promise for understanding perturbation responses and ultimately designing novel chemical and genetic interventions.
]]></description>
<dc:creator>Yu, H.</dc:creator>
<dc:creator>Welch, J. D.</dc:creator>
<dc:date>2022-07-21</dc:date>
<dc:identifier>doi:10.1101/2022.07.20.500854</dc:identifier>
<dc:title><![CDATA[PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-07-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.29.403063v1?rss=1">
<title>
<![CDATA[
Cross-tissue eQTL mapping in the presence of missing data via surrogate outcome analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.29.403063v1?rss=1
</link>
<description><![CDATA[
Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (SPRAY) for leveraging information from a correlated surrogate outcome (e.g. expression in blood) to improve inference on a partially missing target outcome (e.g. expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, SPRAY jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. SPRAY estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.
]]></description>
<dc:creator>McCaw, Z. R.</dc:creator>
<dc:creator>Gaynor, S. M.</dc:creator>
<dc:creator>Sun, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2020-11-30</dc:date>
<dc:identifier>doi:10.1101/2020.11.29.403063</dc:identifier>
<dc:title><![CDATA[Cross-tissue eQTL mapping in the presence of missing data via surrogate outcome analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.25.505354v1?rss=1">
<title>
<![CDATA[
Modeling tissue co-regulation to estimate tissue-specific contributions to disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.25.505354v1?rss=1
</link>
<description><![CDATA[
Integrative analyses of genome-wide association studies (GWAS) and gene expression data across diverse tissues and cell types have enabled the identification of putative disease-critical tissues. However, co-regulation of genetic effects on gene expression across tissues makes it difficult to distinguish biologically causal tissues from tagging tissues. While previous work emphasized the potential of accounting for tissue co-regulation, tissue-specific disease effects have not previously been formally modeled. Here, we introduce a new method, tissue co-regulation score regression (TCSC), that disentangles causal tissues from tagging tissues and partitions disease heritability (or covariance) into tissue-specific components. TCSC leverages gene-disease association statistics across tissues from transcriptome-wide association studies (TWAS), which implicate both causal and tagging genes and tissues. TCSC regresses TWAS chi-square statistics (or products of z-scores) on tissue co-regulation scores reflecting correlations of predicted gene expression across genes and tissues. In simulations, TCSC distinguishes causal tissues from tagging tissues while controlling type I error. We applied TCSC to GWAS summary statistics for 78 diseases and complex traits (average N = 302K) and gene expression prediction models for 48 GTEx tissues. TCSC identified 21 causal tissue-trait pairs at 5% FDR, including well-established findings, biologically plausible novel findings (e.g. aorta artery and glaucoma), and increased specificity of known tissue-trait associations (e.g. subcutaneous adipose, but not visceral adipose, and HDL). TCSC also identified 17 causal tissue-trait covariance pairs at 5% FDR. For the positive genetic covariance between BMI and red blood cell count, brain substantia nigra contributed positive covariance while pancreas contributed negative covariance; this suggests that genetic covariance may reflect distinct tissue-specific contributions. 
Overall, TCSC is a precise method for distinguishing causal tissues from tagging tissues, improving our understanding of disease and complex trait biology.
]]></description>
<dc:creator>Amariuta, T.</dc:creator>
<dc:creator>Siewert-Rocks, K.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2022-08-26</dc:date>
<dc:identifier>doi:10.1101/2022.08.25.505354</dc:identifier>
<dc:title><![CDATA[Modeling tissue co-regulation to estimate tissue-specific contributions to disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.24.504550v1?rss=1">
<title>
<![CDATA[
A statistical genetics guide to identifying HLA alleles driving complex disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.24.504550v1?rss=1
</link>
<description><![CDATA[
The human leukocyte antigen (HLA) locus is associated with more human complex diseases than any other locus. In many diseases it explains more heritability than all other known loci combined. Investigators have now demonstrated the accuracy of in silico HLA imputation methods. These approaches enable rapid and accurate estimation of HLA alleles in the millions of individuals that are already genotyped on microarrays. HLA imputation has been used to define causal variation in autoimmune diseases, such as type I diabetes, and infectious diseases, such as HIV infection control. However, there are few guidelines on performing HLA imputation, association testing, and fine-mapping. Here, we present a comprehensive statistical genetics guide to imputing HLA alleles from genotype data. We provide detailed protocols, including standard quality control measures for input genotyping data, and describe options to impute HLA alleles and amino acids, including a web-based Michigan Imputation Server. We updated the HLA imputation reference panel representing global populations (African, East Asian, European and Latino) available at the Michigan Imputation Server (n = 20,349) and achieved high imputation accuracy (mean dosage correlation r = 0.981). Finally, we offer best practice recommendations for conducting association tests in order to define the alleles, amino acids, and haplotypes affecting human traits. This protocol will be broadly applicable to large-scale genotyping data and will contribute to defining the role of HLA in human diseases across global populations.
]]></description>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Gurajala, S.</dc:creator>
<dc:creator>Curtis, M.</dc:creator>
<dc:creator>Luo, Y.</dc:creator>
<dc:creator>Choi, W.</dc:creator>
<dc:creator>Ishigaki, K.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Deutsch, A. J.</dc:creator>
<dc:creator>Schonherr, S.</dc:creator>
<dc:creator>Forer, L.</dc:creator>
<dc:creator>LeFaive, J.</dc:creator>
<dc:creator>Fuchsberger, C.</dc:creator>
<dc:creator>Han, B.</dc:creator>
<dc:creator>Lenz, T. L.</dc:creator>
<dc:creator>de Bakker, P. I. W.</dc:creator>
<dc:creator>Smith, A. V.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2022-08-26</dc:date>
<dc:identifier>doi:10.1101/2022.08.24.504550</dc:identifier>
<dc:title><![CDATA[A statistical genetics guide to identifying HLA alleles driving complex disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.17.484479v1?rss=1">
<title>
<![CDATA[
Evidence-based calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for clinical use of PP3/BP4 criteria 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.17.484479v1?rss=1
</link>
<description><![CDATA[
Recommendations from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) for interpreting sequence variants specify the use of computational predictors as Supporting level of evidence for pathogenicity or benignity using criteria PP3 and BP4, respectively. However, score intervals defined by tool developers, and ACMG/AMP recommendations that require the consensus of multiple predictors, lack quantitative support. Previously, we described a probabilistic framework that quantified the strengths of evidence (Supporting, Moderate, Strong, Very Strong) within ACMG/AMP recommendations. We have extended this framework to computational predictors and introduce a new standard that converts a tool's scores to PP3 and BP4 evidence strengths. Our approach is based on estimating the local positive predictive value and can calibrate any computational tool or other continuous-scale evidence on any variant type. We estimate thresholds (score intervals) corresponding to each strength of evidence for pathogenicity and benignity for thirteen missense variant interpretation tools, using carefully assembled independent data sets. Most tools achieved Supporting evidence level for both pathogenic and benign classification using newly established thresholds. Multiple tools reached score thresholds justifying Moderate and several reached Strong evidence levels. One tool reached Very Strong evidence level for benign classification on some variants. Based on these findings, we provide recommendations for evidence-based revisions of the PP3 and BP4 ACMG/AMP criteria using individual tools and future assessment of computational methods for clinical interpretation.
]]></description>
<dc:creator>Pejaver, V.</dc:creator>
<dc:creator>Byrne, A. B.</dc:creator>
<dc:creator>Feng, B.-J.</dc:creator>
<dc:creator>Pagel, K. A.</dc:creator>
<dc:creator>Mooney, S. D.</dc:creator>
<dc:creator>Karchin, R.</dc:creator>
<dc:creator>O'Donnell-Luria, A.</dc:creator>
<dc:creator>Harrison, S. M.</dc:creator>
<dc:creator>Tavtigian, S. V.</dc:creator>
<dc:creator>Greenblatt, M. S.</dc:creator>
<dc:creator>Biesecker, L. G.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Brenner, S. E.</dc:creator>
<dc:creator>ClinGen Sequence Variant Interpretation Working Group</dc:creator>
<dc:date>2022-03-19</dc:date>
<dc:identifier>doi:10.1101/2022.03.17.484479</dc:identifier>
<dc:title><![CDATA[Evidence-based calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for clinical use of PP3/BP4 criteria]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.02.23.481681v1?rss=1">
<title>
<![CDATA[
Enrichment of somatic mutations in schizophrenia brain targets prenatally active transcription factor bindings sites 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.02.23.481681v1?rss=1
</link>
<description><![CDATA[
Schizophrenia (SCZ) is a complex neuropsychiatric disorder in which both germline genetic mutations and maternal factors, such as infection and immune activation, have been implicated, but how these two strikingly different mechanisms might converge on the same phenotype is unknown. During development, cells accumulate somatic, mosaic mutations in ways that can be shaped by the cellular environment or endogenous processes, but these early developmental mutational patterns have not been studied in SCZ. Here we analyzed deep (267x) whole-genome sequencing (WGS) of DNA from cerebral cortical neurons isolated from 61 SCZ and 25 control postmortem brains to capture mutations occurring before or during fetal neurogenesis. SCZ cases showed a >15% increase in genome-wide sSNV compared to controls (p < 2e-10). Remarkably, mosaic T>G mutations and CpG transversions (CpG>GpG or CpG>ApG) were 79- and 62-fold enriched, respectively, at transcription factor binding sites (TFBS) in SCZ, but not in controls. The pattern of T>G mutations resembles mutational processes in cancer attributed to oxidative damage that is sterically blocked from DNA repair by transcription factors (TFs) bound to damaged DNA. The CpG transversions similarly suggest unfinished DNA demethylation resulting in abasic sites that can also be blocked from repair by bound TFs. Allele frequency analysis suggests that both localized mutational spikes occur in the first trimester. We call this prenatal mutational process "skiagenesis" (from the Greek skia, meaning shadow), as these mutations occur in the shadow of bound TFs. Skiagenesis reflects as-yet unidentified prenatal factors and is associated with SCZ risk in a subset (~13%) of cases. In turn, mutational disruption of key TFBS active in fetal brain is well positioned to create SCZ-specific gene dysregulation in concert with germline risk genes. 
Skiagenesis provides a fingerprint for exploring how epigenomic regulation and prenatal factors such as maternal infection or immune activation may shape the developmental mutational landscape of human brain.
]]></description>
<dc:creator>Maury, E. A.</dc:creator>
<dc:creator>Jones, A.</dc:creator>
<dc:creator>Seplyarskiy, V.</dc:creator>
<dc:creator>Rosenbluh, C.</dc:creator>
<dc:creator>Bae, T.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Abyzov, A.</dc:creator>
<dc:creator>Khoshkoo, S.</dc:creator>
<dc:creator>Chahine, Y.</dc:creator>
<dc:creator>Brain Somatic Mosaicism Network</dc:creator>
<dc:creator>Park, P. J.</dc:creator>
<dc:creator>Akbarian, S.</dc:creator>
<dc:creator>Lee, E. A.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:creator>Walsh, C. A.</dc:creator>
<dc:creator>Chess, A.</dc:creator>
<dc:date>2022-02-24</dc:date>
<dc:identifier>doi:10.1101/2022.02.23.481681</dc:identifier>
<dc:title><![CDATA[Enrichment of somatic mutations in schizophrenia brain targets prenatally active transcription factor bindings sites]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-02-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.09.491198v1?rss=1">
<title>
<![CDATA[
Deciphering the Impact of Genetic Variation on Human Polyadenylation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.09.491198v1?rss=1
</link>
<description><![CDATA[
Genetic variants that disrupt polyadenylation can cause or contribute to genetic disorders. Yet, due to the complex cis-regulation of polyadenylation, variant interpretation remains challenging. Here, we introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. Finally, we perform in-silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of >44 million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, using APARENT2's predictions we detect an association between gain-of-function mutations in the 3'-end and Autism Spectrum Disorder.
]]></description>
<dc:creator>Linder, J.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Seelig, G.</dc:creator>
<dc:date>2022-05-10</dc:date>
<dc:identifier>doi:10.1101/2022.05.09.491198</dc:identifier>
<dc:title><![CDATA[Deciphering the Impact of Genetic Variation on Human Polyadenylation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.07.491045v1?rss=1">
<title>
<![CDATA[
Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.07.491045v1?rss=1
</link>
<description><![CDATA[
Most signals in genome-wide association studies (GWAS) of complex traits point to noncoding genetic variants with putative gene regulatory effects. However, currently identified expression quantitative trait loci (eQTLs) explain only a small fraction of GWAS signals. By analyzing GWAS hits for complex traits in the UK Biobank, and cis-eQTLs from the GTEx consortium, we show that these assays systematically discover different types of genes and variants: eQTLs cluster strongly near transcription start sites, while GWAS hits do not. Genes near GWAS hits are enriched in numerous functional annotations, are under strong selective constraint and have a complex regulatory landscape across different tissue/cell types, while genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally-relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variants, and support the use of complementary functional approaches alongside the next generation of eQTL studies.
]]></description>
<dc:creator>Mostafavi, H.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Naqvi, S.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2022-05-08</dc:date>
<dc:identifier>doi:10.1101/2022.05.07.491045</dc:identifier>
<dc:title><![CDATA[Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.18.504427v1?rss=1">
<title>
<![CDATA[
Recurrent mutation in the ancestry of a rare variant 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.18.504427v1?rss=1
</link>
<description><![CDATA[
Recurrent mutation produces multiple copies of the same allele which may be co-segregating in a population. Yet most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in relatively small count in a large sample. Our results follow from the statistical independence of low-count mutations, which we show to hold for the standard neutral coalescent or diffusion model of population genetics as well as for more general coalescent trees. For populations of constant size, these counts are given by the Ewens sampling formula. We develop a Poisson sampling model for populations of varying size, and illustrate it using new results for site-frequency spectra in an exponentially growing population. We apply our model to a large data set of human SNPs and use it to explain dramatic differences in site-frequency spectra across the range of mutation rates in the human genome.
]]></description>
<dc:creator>Wakeley, J.</dc:creator>
<dc:creator>Fan, W. T.</dc:creator>
<dc:creator>Koch, E.</dc:creator>
<dc:creator>Sunyaev, S.</dc:creator>
<dc:date>2022-08-18</dc:date>
<dc:identifier>doi:10.1101/2022.08.18.504427</dc:identifier>
<dc:title><![CDATA[Recurrent mutation in the ancestry of a rare variant]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.04.490680v1?rss=1">
<title>
<![CDATA[
A unique epigenomic landscape defines CD8+ tissue-resident memory T cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.04.490680v1?rss=1
</link>
<description><![CDATA[
Memory T cells provide rapid and long-term protection against infection and tumors. The memory CD8+ T cell repertoire contains phenotypically and transcriptionally heterogeneous subsets with specialized functions and recirculation patterns. While these T cell populations have been well characterized in terms of differentiation potential and function, the epigenetic changes underlying memory T cell fate determination and tissue-residency remain largely unexplored. Here, we examined the single-cell chromatin landscape of CD8+ T cells over the course of acute viral infection. We reveal an early bifurcation of memory precursors displaying distinct chromatin accessibility and define epigenetic trajectories that lead to a circulating (TCIRC) or tissue-resident memory T (TRM) cell fate. While TRM cells displayed a conserved epigenetic signature across organs, we demonstrate that these cells exhibit tissue-specific signatures and identify transcription factors that regulate TRM cell populations in a site-specific manner. Moreover, we demonstrate that TRM cells and exhausted T (TEX) cells are distinct epigenetic lineages that are distinguishable early in their differentiation. Together, these findings show that TRM cell development is accompanied by dynamic alterations in chromatin accessibility that direct a unique transcriptional program resulting in a tissue-adapted and functionally distinct T cell state.

[Graphical abstract: Figure 1]

Highlights

scATAC atlas reveals the epigenetic variance of memory CD8+ T cell subsets over the course of acute infection
Early bifurcation of memory precursors leads to circulating versus tissue-resident cell fates
Integrating transcriptional and epigenetic analyses identified organ-specific TRM cell regulators including HIC1 and BACH2
Epigenetic distinction of TRM cells and TEX cell subsets
]]></description>
<dc:creator>Buquicchio, F. A.</dc:creator>
<dc:creator>Fonseca, R.</dc:creator>
<dc:creator>Belk, J. A.</dc:creator>
<dc:creator>Evrard, M.</dc:creator>
<dc:creator>Obers, A.</dc:creator>
<dc:creator>Qi, Y.</dc:creator>
<dc:creator>Daniel, B.</dc:creator>
<dc:creator>Yost, K. E.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Mackay, L. K.</dc:creator>
<dc:date>2022-05-06</dc:date>
<dc:identifier>doi:10.1101/2022.05.04.490680</dc:identifier>
<dc:title><![CDATA[A unique epigenomic landscape defines CD8+ tissue-resident memory T cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.19.469318v1?rss=1">
<title>
<![CDATA[
Antigen presentation by type 3 innate lymphoid cells instructs the differentiation of gut microbiota-specific regulatory T cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.19.469318v1?rss=1
</link>
<description><![CDATA[
The mutualistic relationship of gut-resident microbiota and cells of the host immune system promotes homeostasis that ensures maintenance of the microbial community and of a poised, but largely non-aggressive, immune cell compartment1, 2. Consequences of disturbing this balance, by environmental or genetic factors, include proximal inflammatory conditions, like Crohn's disease, and systemic illnesses, both metabolic and autoimmune. One of the means by which this equilibrium is achieved is through induction of both effector and suppressor or regulatory arms of the adaptive immune system. In mice, Helicobacter species induce regulatory (iTreg) and follicular helper (Tfh) T cells in the colon-draining mesenteric lymph nodes under homeostatic conditions, but can instead induce inflammatory Th17 cells when iTreg cells are compromised3, 4. How Helicobacter hepaticus and other gut bacteria direct T cells to adopt distinct functions remains poorly understood. Here, we investigated which cells and molecular components are required to convey the microbial instruction for the iTreg differentiation program. We found that antigen presentation by cells expressing RORγt, rather than by classical dendritic cells, was both required and sufficient for iTreg induction. These RORγt+ cells, likely to be type 3 innate lymphoid cells (ILC3) and/or a recently-described population of Aire+ cells termed Janus cells5, require the MHC class II antigen presentation machinery, the chemokine receptor CCR7, and αv integrin, which activates TGF-β, for iTreg cell differentiation. In the absence of any of these, instead of iTreg cells there was expansion of microbiota-specific pathogenic Th17 cells, which were induced by other antigen presenting cells (APCs) that did not require CCR7. 
Thus, intestinal commensal microbes and their products target multiple APCs with pre-determined features suited to directing appropriate T cell differentiation programs, rather than a common APC that they endow with appropriate functions. Our results illustrate the ability of microbiota to exploit specialized functions of distinct innate immune system cells, targeting them to achieve the desired composition of equipoised T cells, thus maintaining tolerance.
]]></description>
<dc:creator>Kedmi, R.</dc:creator>
<dc:creator>Najar, T.</dc:creator>
<dc:creator>Mesa, K. R.</dc:creator>
<dc:creator>Grayson, A.</dc:creator>
<dc:creator>Kroehling, L.</dc:creator>
<dc:creator>Hao, Y.</dc:creator>
<dc:creator>Hao, S.</dc:creator>
<dc:creator>Pokrovskii, M.</dc:creator>
<dc:creator>Xu, M.</dc:creator>
<dc:creator>Talbot, J.</dc:creator>
<dc:creator>Wang, J.</dc:creator>
<dc:creator>Anderson, M. S.</dc:creator>
<dc:creator>Gardner, J. M.</dc:creator>
<dc:creator>Laufer, T. M.</dc:creator>
<dc:creator>Aifantis, I.</dc:creator>
<dc:creator>Bartleson, J. M.</dc:creator>
<dc:creator>Allen, P. M.</dc:creator>
<dc:creator>Stoeckius, M.</dc:creator>
<dc:creator>Littman, D. R.</dc:creator>
<dc:date>2021-11-20</dc:date>
<dc:identifier>doi:10.1101/2021.11.19.469318</dc:identifier>
<dc:title><![CDATA[Antigen presentation by type 3 innate lymphoid cells instructs the differentiation of gut microbiota-specific regulatory T cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.20.488974v1?rss=1">
<title>
<![CDATA[
Genome-wide CRISPR screens of T cell exhaustion identify chromatin remodeling factors that limit T cell persistence 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.20.488974v1?rss=1
</link>
<description><![CDATA[
T cell exhaustion limits anti-tumor immunity, but the molecular determinants of this process remain poorly understood. Using a chronic antigen stimulation assay, we performed genome-wide CRISPR/Cas9 screens to systematically discover genetic regulators of T cell exhaustion, which identified an enrichment of epigenetic factors. In vivo CRISPR screens in murine and human tumor models demonstrated that perturbation of several epigenetic regulators, including members of the INO80 and BAF chromatin remodeling complexes, improved T cell persistence in tumors. In vivo paired CRISPR perturbation and single-cell RNA sequencing revealed distinct transcriptional roles of each complex and that depletion of canonical BAF complex members, including Arid1a, resulted in the maintenance of an effector program and downregulation of terminal exhaustion-related genes in tumor-infiltrating T cells. Finally, Arid1a-depletion limited the global acquisition of chromatin accessibility associated with T cell exhaustion and led to improved anti-tumor immunity after adoptive cell therapy. In summary, we provide a comprehensive atlas of the genetic regulators of T cell exhaustion and demonstrate that modulation of the epigenetic state of T cell exhaustion can improve T cell responses in cancer immunotherapy.
]]></description>
<dc:creator>Belk, J.</dc:creator>
<dc:creator>Yao, W.</dc:creator>
<dc:creator>Ly, N.</dc:creator>
<dc:creator>Freitas, K.</dc:creator>
<dc:creator>Chen, Y.-T.</dc:creator>
<dc:creator>Shi, Q.</dc:creator>
<dc:creator>Valencia, A.</dc:creator>
<dc:creator>Shifrut, E.</dc:creator>
<dc:creator>Kale, N.</dc:creator>
<dc:creator>Yost, K.</dc:creator>
<dc:creator>Duffy, C.</dc:creator>
<dc:creator>Hwee, M.</dc:creator>
<dc:creator>Miao, Z.</dc:creator>
<dc:creator>Ashworth, A.</dc:creator>
<dc:creator>Mackall, C.</dc:creator>
<dc:creator>Marson, A.</dc:creator>
<dc:creator>Carnevale, J.</dc:creator>
<dc:creator>Vardhana, S.</dc:creator>
<dc:creator>Satpathy, A.</dc:creator>
<dc:date>2022-04-21</dc:date>
<dc:identifier>doi:10.1101/2022.04.20.488974</dc:identifier>
<dc:title><![CDATA[Genome-wide CRISPR screens of T cell exhaustion identify chromatin remodeling factors that limit T cell persistence]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.18.488696v1?rss=1">
<title>
<![CDATA[
A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.18.488696v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) have highlighted that almost any trait is affected by many variants of relatively small effect. On one hand this presents a challenge for inferring the effect of any single variant as the signal-to-noise ratio is low for variants of small effect. This challenge is compounded when combining information across many variants in polygenic scores for predicting trait values. On the other hand, the large number of contributing variants provides an opportunity to learn about the average behavior of variants encoded in the distribution of variant effect sizes. Many approaches have looked at aspects of this problem, but no method has unified the inference of the effects of individual variants with the inference of the distribution of effect sizes while requiring only GWAS summary statistics and properly accounting for linkage disequilibrium between variants. Here we present a flexible, unifying framework that combines information across variants to infer a distribution of effect sizes and uses this distribution to improve the estimation of the effects of individual variants. We also develop a variational inference (VI) scheme to perform efficient inference under this framework. We show this framework is useful by constructing polygenic scores (PGSs) that outperform the state-of-the-art. Our modeling framework easily extends to jointly inferring effect sizes across multiple cohorts, where we show that building PGSs using additional cohorts of differing ancestries improves predictive accuracy and portability. We also investigate the inferred distributions of effect sizes across many traits and find that these distributions have effect sizes ranging over multiple orders of magnitude, in contrast to the assumptions implicit in many commonly used statistical genetics methods.
]]></description>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Sinnott-Armstrong, N.</dc:creator>
<dc:creator>Assimes, T.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2022-04-19</dc:date>
<dc:identifier>doi:10.1101/2022.04.18.488696</dc:identifier>
<dc:title><![CDATA[A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.29.498132v1?rss=1">
<title>
<![CDATA[
Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.29.498132v1?rss=1
</link>
<description><![CDATA[
Congenital heart defects, the most common birth disorders, are the clinical manifestation of anomalies in fetal heart development - a complex process involving dynamic spatiotemporal coordination among various precursor cell lineages. This complexity underlies the incomplete understanding of the genetic architecture of congenital heart diseases (CHDs). To define the multi-cellular epigenomic and transcriptional landscape of cardiac cellular development, we generated single-cell chromatin accessibility maps of human fetal heart tissues. We identified eight major differentiation trajectories involving primary cardiac cell types, each associated with dynamic transcription factor (TF) activity signatures. We identified similarities and differences of regulatory landscapes of iPSC-derived cardiac cell types and their in vivo counterparts. We interpreted deep learning models that predict cell-type resolved, base-resolution chromatin accessibility profiles from DNA sequence to decipher underlying TF motif lexicons and infer the regulatory impact of non-coding variants. De novo mutations predicted to affect chromatin accessibility in arterial endothelium were enriched in CHD cases versus controls. We used CRISPR-based perturbations to validate an enhancer harboring a nominated regulatory CHD mutation, linking it to effects on the expression of a known CHD gene JARID2. Together, this work defines the cell-type resolved cis-regulatory sequence determinants of heart development and identifies disruption of cell type-specific regulatory elements as a component of the genetic etiology of CHD.
]]></description>
<dc:creator>Ameen, M.</dc:creator>
<dc:creator>Sundaram, L.</dc:creator>
<dc:creator>Banerjee, A.</dc:creator>
<dc:creator>Shen, M.</dc:creator>
<dc:creator>Kundu, S.</dc:creator>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Shcherbina, A.</dc:creator>
<dc:creator>Gu, M.</dc:creator>
<dc:creator>Wilson, K. D.</dc:creator>
<dc:creator>Varadarajan, A.</dc:creator>
<dc:creator>Vadgama, N.</dc:creator>
<dc:creator>Balsubramani, A.</dc:creator>
<dc:creator>Wu, J. C.</dc:creator>
<dc:creator>Engreitz, J.</dc:creator>
<dc:creator>Farh, K.</dc:creator>
<dc:creator>Karakikes, I.</dc:creator>
<dc:creator>Wang, K. C.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Greenleaf, W.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2022-06-29</dc:date>
<dc:identifier>doi:10.1101/2022.06.29.498132</dc:identifier>
<dc:title><![CDATA[Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.09.24.461597v1?rss=1">
<title>
<![CDATA[
Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.09.24.461597v1?rss=1
</link>
<description><![CDATA[
Gene expression at the individual cell-level resolution, as quantified by single-cell RNA-sequencing (scRNA-seq), can provide unique insights into the pathology and cellular origin of diseases and complex traits. Here, we introduce single-cell Disease Relevance Score (scDRS), an approach that links scRNA-seq with polygenic risk of disease at individual cell resolution without the need for annotation of individual cells to cell types; scDRS identifies individual cells that show excess expression levels for genes in a disease-specific gene set constructed from GWAS data. We determined via simulations that scDRS is well-calibrated and powerful in identifying individual cells associated with disease. We applied scDRS to GWAS data from 74 diseases and complex traits (average N = 346K) in conjunction with 16 scRNA-seq data sets spanning 1.3 million cells from 31 tissues and organs. At the cell type level, scDRS broadly recapitulated known links between classical cell types and disease, and also produced novel biologically plausible findings. At the individual cell level, scDRS identified subpopulations of disease-associated cells that are not captured by existing cell type labels, including subpopulations of CD4+ T cells associated with inflammatory bowel disease, partially characterized by their effector-like states; subpopulations of hippocampal CA1 pyramidal neurons associated with schizophrenia, partially characterized by their spatial location at the proximal part of the hippocampal CA1 region; and subpopulations of hepatocytes associated with triglyceride levels, partially characterized by their higher ploidy levels. At the gene level, we determined that genes whose expression across individual cells was correlated with the scDRS score (thus reflecting co-expression with GWAS disease genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.
]]></description>
<dc:creator>Zhang, M. J.</dc:creator>
<dc:creator>Hou, K.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Jagadeesh, K. A.</dc:creator>
<dc:creator>Weinand, K.</dc:creator>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Taychameekiatchai, A.</dc:creator>
<dc:creator>Rao, P.</dc:creator>
<dc:creator>Pisco, A. O.</dc:creator>
<dc:creator>Zou, J.</dc:creator>
<dc:creator>Wang, B.</dc:creator>
<dc:creator>Gandal, M.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Pasaniuc, B.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:date>2021-09-28</dc:date>
<dc:identifier>doi:10.1101/2021.09.24.461597</dc:identifier>
<dc:title><![CDATA[Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-09-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.09.02.279059v1?rss=1">
<title>
<![CDATA[
Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.09.02.279059v1?rss=1
</link>
<description><![CDATA[
Gene regulation is known to play a fundamental role in human disease, but mechanisms of regulation vary greatly across genes. Here, we explore the contributions to disease of two types of genes: genes whose regulation is driven by enhancer regions as opposed to promoter regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using a comprehensive set of SNP-to-gene (S2G) strategies and apply stratified LD score regression to the resulting SNP annotations to draw three main conclusions about 11 autoimmune diseases and blood cell traits (average Ncase = 13K across 6 autoimmune diseases, average N = 443K across 5 blood cell traits). First, several characterizations of enhancer-related genes defined in blood using functional genomics data (e.g. ATAC-seq, RNA-seq, PC-HiC) are conditionally informative for autoimmune disease heritability, after conditioning on a broad set of regulatory annotations from the baseline-LD model. Second, candidate master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating enhancer-related and candidate master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2x stronger conditional signal (maximum standardized SNP annotation effect size (τ*) = 2.0 (s.e. 0.3) vs. 0.91 (s.e. 0.21)), and >2x stronger gene-level enrichment for approved autoimmune disease drug targets (5.3x vs. 2.1x), as compared to the recently proposed Enhancer Domain Score (EDS). In each case, using functionally informed S2G strategies to link genes to SNPs that may regulate them produced much stronger disease signals (4.1x-13x larger τ* values) than conventional window-based S2G strategies.
We conclude that our characterizations of enhancer-related and candidate master-regulator genes identify gene sets that are important for autoimmune disease, and that combining those gene sets with functionally informed S2G strategies enables us to identify SNP annotations in which disease heritability is concentrated.
]]></description>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Gazal, S. K.</dc:creator>
<dc:creator>van de Geijn, B.</dc:creator>
<dc:creator>Kim, S. S.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Price, A.</dc:creator>
<dc:date>2020-09-03</dc:date>
<dc:identifier>doi:10.1101/2020.09.02.279059</dc:identifier>
<dc:title><![CDATA[Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-09-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.08.515683v1?rss=1">
<title>
<![CDATA[
Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.08.515683v1?rss=1
</link>
<description><![CDATA[
Motivation: Alternative polyadenylation (APA) is a major mechanism that increases transcriptional diversity and regulates mRNA abundance. Existing computational tools to analyze APA have low precision because these tools are designed for short-read RNA-seq, which is a suboptimal data source to study APA. Long-read RNA-seq (LR-RNA-seq) accurately detects complete transcript isoforms with poly(A)-tails, providing an ideal data source to study APA. However, current computational tools are incompatible with LR-RNA-seq.

Results: Here, we introduce LAPA, a computational toolkit to study alternative polyadenylation (APA) from diverse data sources such as LR-RNA-seq and short-read 3' sequencing (3'-seq). LAPA counts and clusters reads with poly(A)-tails, then performs peak calling to detect poly(A)-sites in a data-source-agnostic manner. The resulting peaks are annotated based on genomic features and regulatory sequence elements such as the presence of a poly(A)-signal. Finally, LAPA can perform robust statistical testing and multiple testing correction to detect differential APA.

We analyzed ENCODE LR-RNA-seq data from human WTC11, mouse C2C12 myoblast, and C2C12-derived differentiated myotube cells using LAPA. Comparing LR-RNA-seq from different platforms and library preparation methods against 3'-seq shows that LR-RNA-seq detects poly(A)-sites with a performance of 75% precision at 57% recall. Moreover, LAPA consistently improved TES validation by at least 25% over the baseline transcriptome annotation generated by TALON, independent of protocol or platform. Differential APA analysis detected 788 statistically significant genes with unique polyadenylation signatures between undifferentiated myoblast and differentiated myotube cells. Among these genes, 3' UTR elongation is significantly associated with higher expression, while shortening is linked with lower expression. This analysis reveals a link between cell state/identity and APA. Overall, our results show that LR-RNA-seq is a reliable data source for the study of post-transcriptional regulation by providing precise information about alternative polyadenylation.

Availability: LAPA is publicly available at https://github.com/mortazavilab/lapa and PyPI.

Contact: ali.mortazavi@uci.edu
]]></description>
<dc:creator>Celik, M. H.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2022-11-08</dc:date>
<dc:identifier>doi:10.1101/2022.11.08.515683</dc:identifier>
<dc:title><![CDATA[Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.10.06.463360v1?rss=1">
<title>
<![CDATA[
A systematic genotype-phenotype map for missense variants in the human intellectual disability-associated gene GDI1 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.10.06.463360v1?rss=1
</link>
<description><![CDATA[
Next generation sequencing has become a common tool in the diagnosis of genetic diseases. However, for the vast majority of genetic variants that are discovered, a clinical interpretation is not available. Variant effect mapping allows the functional effects of many single amino acid variants to be characterized in parallel. Here, we combine multiplexed functional assays with machine learning to assess the effects of amino acid substitutions in the human intellectual disability-associated gene, GDI1. We show that the resulting variant effect map can be used to discriminate pathogenic from benign variants. Our variant effect map recovers known biochemical and structural features of GDI1 and reveals additional aspects of GDI1 function. We explore how our functional assays can aid in the interpretation of novel GDI1 variants as they are discovered, and in the reclassification of previously observed variants of unknown significance.
]]></description>
<dc:creator>Silverstein, R. A.</dc:creator>
<dc:creator>Sun, S. A.</dc:creator>
<dc:creator>Verby, M.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2021-10-06</dc:date>
<dc:identifier>doi:10.1101/2021.10.06.463360</dc:identifier>
<dc:title><![CDATA[A systematic genotype-phenotype map for missense variants in the human intellectual disability-associated gene GDI1]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-10-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.11.29.470445v1?rss=1">
<title>
<![CDATA[
MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.11.29.470445v1?rss=1
</link>
<description><![CDATA[
A central problem in genomics is understanding the effect of individual DNA variants. Multiplexed Assays of Variant Effect (MAVEs) can help address this challenge by measuring all possible single nucleotide variant effects in a gene or regulatory sequence simultaneously. Here we describe MaveDB v2, which has become the database of record for MAVEs. MaveDB now contains a large fraction of published studies, comprising over two hundred datasets and three million variant effect measurements. We created tools and APIs to streamline data submission and access, transforming MaveDB into a hub for the analysis and dissemination of these impactful datasets.
]]></description>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Min, J. K.</dc:creator>
<dc:creator>Rollins, N. J.</dc:creator>
<dc:creator>Da, E. Y.</dc:creator>
<dc:creator>Esposito, D.</dc:creator>
<dc:creator>Harrington, M.</dc:creator>
<dc:creator>Stone, J.</dc:creator>
<dc:creator>Bianchi, A. H.</dc:creator>
<dc:creator>Fu, Y.</dc:creator>
<dc:creator>Gallaher, M.</dc:creator>
<dc:creator>Li, I.</dc:creator>
<dc:creator>Moscatelli, O.</dc:creator>
<dc:creator>Ong, J. Y.</dc:creator>
<dc:creator>Rollins, J. E.</dc:creator>
<dc:creator>Wakefield, M. J.</dc:creator>
<dc:creator>Ye, S.</dc:creator>
<dc:creator>Tam, A.</dc:creator>
<dc:creator>McEwen, A. E.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Bryant, V. L.</dc:creator>
<dc:creator>Marks, D. S.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:date>2021-11-30</dc:date>
<dc:identifier>doi:10.1101/2021.11.29.470445</dc:identifier>
<dc:title><![CDATA[MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-11-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.06.511211v1?rss=1">
<title>
<![CDATA[
Chemico-genetic Analysis of Native Autism Proteomes Reveals Shared Biology Predictive of Functional Modifiers 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.06.511211v1?rss=1
</link>
<description><![CDATA[
One of the main drivers of autism spectrum disorder is risk alleles within hundreds of genes, which may interact within shared but unknown protein complexes. Here we develop a scalable genome-editing-mediated approach to target 14 high-confidence autism risk genes within the mouse brain for proximity-based endogenous proteomics, achieving high specificity spatial interactomes compared to prior methods. The resulting native proximity interactomes are enriched for human genes dysregulated in the brain of autistic individuals and reveal unexpected and highly significant interactions with other lower-confidence autism risk gene products, positing new avenues to prioritize genetic risk. Importantly, the datasets are enriched for shared cellular functions and genetic interactions that may underlie the condition. We test this notion by spatial proteomics and CRISPR-based regulation of expression in two autism models, demonstrating functional interactions that modulate mechanisms of their dysregulation. Together, these results reveal native proteome networks in vivo relevant to autism, providing new inroads for understanding and manipulating the cellular drivers underpinning its etiology.
]]></description>
<dc:creator>Gao, Y.</dc:creator>
<dc:creator>Trn, M.</dc:creator>
<dc:creator>Shonai, D.</dc:creator>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Soderblom, E. J.</dc:creator>
<dc:creator>Garcia-moreno, S. A.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:creator>Wetsel, W. C.</dc:creator>
<dc:creator>Dawson, G.</dc:creator>
<dc:creator>Velmeshev, D.</dc:creator>
<dc:creator>Jiang, Y.-h.</dc:creator>
<dc:creator>Sloofman, L.</dc:creator>
<dc:creator>Buxbaum, J.</dc:creator>
<dc:creator>Soderling, S. H.</dc:creator>
<dc:date>2022-10-07</dc:date>
<dc:identifier>doi:10.1101/2022.10.06.511211</dc:identifier>
<dc:title><![CDATA[Chemico-genetic Analysis of Native Autism Proteomes Reveals Shared Biology Predictive of Functional Modifiers]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.09.20.459182v1?rss=1">
<title>
<![CDATA[
Assessing computational variant effect predictors with a large prospective cohort 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.09.20.459182v1?rss=1
</link>
<description><![CDATA[
Background: Causal gene/trait relationships can be identified via observation of an excess (or reduced) burden of rare variation in a given gene within humans who have that trait. Although computational predictors can improve the power of such burden tests, it is unclear which are optimal for this task.

Method: Using 140 gene-trait combinations with a reported rare-variant burden association, we evaluated the ability of 20 computational predictors to predict human traits. We used the best-performing predictors to increase the power of genome-wide rare variant burden scans based on ~450K UK Biobank participants.

Results: Two predictors, VARITY and REVEL, outperformed all others in predicting human traits in the UK Biobank from missense variation. Genome-scale burden scans using the two best-performing predictors identified 1,038 gene-trait associations (FDR < 5%), including 567 (55%) that had not been previously reported. We explore 54 cardiovascular gene-trait associations (including 15 not reported in other burden scans) in greater depth.

Conclusions: Rigorous selection of computational missense variant effect predictors can improve the power of rare-variant burden scans for human gene-trait associations, yielding many new associations with potential value in informing mechanistic understanding and therapeutic development. The strategy we describe here is generalizable to future computational variant effect predictors, traits and organisms.
]]></description>
<dc:creator>Kuang, D.</dc:creator>
<dc:creator>Li, R.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Hegele, R. A.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2021-09-20</dc:date>
<dc:identifier>doi:10.1101/2021.09.20.459182</dc:identifier>
<dc:title><![CDATA[Assessing computational variant effect predictors with a large prospective cohort]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-09-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.27.497649v1?rss=1">
<title>
<![CDATA[
Epo-IGF1R crosstalk expands stress-specific progenitors in regenerative erythropoiesis and myeloproliferative neoplasm 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.27.497649v1?rss=1
</link>
<description><![CDATA[
We find that in regenerative erythropoiesis, the erythroid progenitor landscape is reshaped, and a previously undescribed progenitor population with CFU-E activity (stress CFU-E/sCFU-E) is markedly expanded to restore the erythron. sCFU-E are targets of erythropoietin (Epo) and sCFU-E expansion requires signaling from the Epo receptor (EpoR) cytoplasmic tyrosines. Molecularly, Epo promotes sCFU-E expansion via JAK2/STAT5-dependent expression of IRS2, thus engaging the pro-growth signaling from the IGF1 receptor (IGF1R). Inhibition of IGF1R/IRS2 signaling impairs sCFU-E cell growth, whereas exogenous IRS2 expression rescues cell growth in sCFU-E expressing truncated EpoR lacking cytoplasmic tyrosines. This sCFU-E pathway is the major pathway involved in erythrocytosis driven by the oncogenic JAK2 mutant, JAK2(V617F), in myeloproliferative neoplasm. Inability to expand sCFU-E cells by truncated EpoR protects against JAK2(V617F)-driven erythrocytosis. In myeloproliferative neoplasm patient samples, the number of sCFU-E-like cells increases, and inhibition of IGF1R/IRS2 signaling blocks Epo-hypersensitive erythroid cell colony formation. In summary, we identify a new stress-specific erythroid progenitor cell population that links regenerative erythropoiesis to pathogenic erythrocytosis.

Key Points:
- Epo-induced IRS2 allows engagement of IGF1R signaling to expand a previously unrecognized progenitor population in erythropoietic stress.
- Truncated EpoR does not support stress CFU-E expansion and protects against JAK2(V617F)-driven erythrocytosis in MPN.
]]></description>
<dc:creator>Huang, L.</dc:creator>
<dc:creator>Hsieh, H.-h.</dc:creator>
<dc:creator>Yao, H.</dc:creator>
<dc:creator>Ma, Y.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:creator>Stephens, H.</dc:creator>
<dc:creator>Chung, S. S.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Rampal, R. K.</dc:creator>
<dc:date>2022-06-29</dc:date>
<dc:identifier>doi:10.1101/2022.06.27.497649</dc:identifier>
<dc:title><![CDATA[Epo-IGF1R crosstalk expands stress-specific progenitors in regenerative erythropoiesis and myeloproliferative neoplasm]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/256313v1?rss=1">
<title>
<![CDATA[
Massively parallel dissection of human accelerated regions in human and chimpanzee neural progenitors 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/256313v1?rss=1
</link>
<description><![CDATA[
Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 Human Accelerated Regions (HARs), some of the fastest evolving regions of the human genome. We predicted that 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in human and chimpanzee neural progenitor cells. The species-specific enhancer activity of assayed HARs was accurately predicted from the presence and absence of transcription factor footprints in each species. Despite these striking cis effects, activity of a given HAR sequence was nearly identical in human and chimpanzee cells. These findings suggest that HARs did not evolve to compensate for changes in the trans environment but instead altered their ability to bind factors present in both species. Thus, ML prioritized variants with functional effects on human neurodevelopment and revealed an unexpected reason why HARs may have evolved so rapidly.
]]></description>
<dc:creator>Ryu, H.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Whalen, S.</dc:creator>
<dc:creator>Williams, A.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Alvarado, B.</dc:creator>
<dc:creator>Samee, M. A. H.</dc:creator>
<dc:creator>Keough, K.</dc:creator>
<dc:creator>Thomas, S.</dc:creator>
<dc:creator>Kriegstein, A.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:creator>Pollen, A.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Pollard, K.</dc:creator>
<dc:date>2018-01-29</dc:date>
<dc:identifier>doi:10.1101/256313</dc:identifier>
<dc:title><![CDATA[Massively parallel dissection of human accelerated regions in human and chimpanzee neural progenitors]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-01-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.10.05.511030v1?rss=1">
<title>
<![CDATA[
Integrative dissection of gene regulatory elements at base resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.10.05.511030v1?rss=1
</link>
<description><![CDATA[
Although vast numbers of putative gene regulatory elements have been cataloged, the sequence motifs and individual bases that underlie their functions remain largely unknown. Here we combine epigenetic perturbations, base editing, and deep learning models to dissect regulatory sequences within the exemplar immune locus encoding CD69. Focusing on a differentially accessible and acetylated upstream enhancer, we find that the complementary strategies converge on a ~170 base interval as critical for CD69 induction in stimulated Jurkat T cells. We pinpoint individual cytosine to thymine base edits that markedly reduce element accessibility and acetylation, with corresponding reduction of CD69 expression. The most potent base edits may be explained by their effect on binding competition between the transcriptional activator GATA3 and the repressor BHLHE40. Systematic analysis of GATA and bHLH/Ebox motifs suggests that interplay between these factors plays a general role in rapid T cell transcriptional responses. Our study provides a framework for parsing gene regulatory elements in their endogenous chromatin contexts and identifying operative artificial variants.

Highlights:
- Base editing screens and deep learning pinpoint sequences and single bases affecting immune gene expression
- An artificial C-to-T variant in a regulatory element suppresses CD69 expression by altering the balance of transcription factor binding
- Competition between GATA3 and BHLHE40 regulates inducible immune genes and T cell states
]]></description>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Javed, N. M.</dc:creator>
<dc:creator>Moore, M.</dc:creator>
<dc:creator>Wu, J.</dc:creator>
<dc:creator>Vinyard, M. E.</dc:creator>
<dc:creator>Pinello, L.</dc:creator>
<dc:creator>Najm, F.</dc:creator>
<dc:creator>Bernstein, B. E.</dc:creator>
<dc:date>2022-10-06</dc:date>
<dc:identifier>doi:10.1101/2022.10.05.511030</dc:identifier>
<dc:title><![CDATA[Integrative dissection of gene regulatory elements at base resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-10-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.04.07.487515v1?rss=1">
<title>
<![CDATA[
CHD-associated enhancers shape human cardiomyocyte lineage commitment 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.04.07.487515v1?rss=1
</link>
<description><![CDATA[
Enhancers orchestrate gene expression programs that drive multicellular development and lineage commitment. Thus, genetic variants at enhancers are thought to contribute to developmental diseases by altering cell fate commitment. However, while many variant-containing enhancers have been identified, studies to endogenously test the impact of these enhancers on lineage commitment have been lacking. We perform a single-cell CRISPRi screen to assess the endogenous roles of 25 enhancers and putative cardiac target genes implicated in genetic studies of congenital heart defects (CHD). We identify 16 enhancers whose repression leads to deficient differentiation of human cardiomyocytes (CMs). A focused CRISPRi validation screen shows that repression of TBX5 enhancers delays the transcriptional switch from mid- to late-stage CM states. Endogenous genetic deletions of two TBX5 enhancers phenocopy epigenetic perturbations. Together, these results identify critical enhancers of cardiac development and suggest that misregulation of these enhancers could contribute to cardiac defects in human patients.

Highlights
- Single-cell enhancer perturbation screens during human cardiomyocyte differentiation.
- Perturbation of CHD-linked enhancers/genes causes deficient CM differentiation.
- Repression or knockout of TBX5 enhancers delays transition from mid to late CM states.
- Deficient differentiation coincides with reduced expression of known cardiac genes.
]]></description>
<dc:creator>Armendariz, D. A.</dc:creator>
<dc:creator>Goetsch, S. C.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Xie, S.</dc:creator>
<dc:creator>Munshi, N. V.</dc:creator>
<dc:creator>Hon, G. C.</dc:creator>
<dc:date>2022-04-10</dc:date>
<dc:identifier>doi:10.1101/2022.04.07.487515</dc:identifier>
<dc:title><![CDATA[CHD-associated enhancers shape human cardiomyocyte lineage commitment]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.07.26.501609v1?rss=1">
<title>
<![CDATA[
Dynamic states of cervical epithelia during pregnancy and epithelial barrier disruption 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.07.26.501609v1?rss=1
</link>
<description><![CDATA[
The cervical epithelium undergoes continuous changes in proliferation, differentiation, and function that are critical before pregnancy to ensure fertility and during pregnancy to provide a physical and immunoprotective barrier for pregnancy maintenance. Barrier disruption can lead to the ascension of pathogens that elicit inflammatory responses and preterm birth. Here, we identify cervical epithelial subtypes in nonpregnant, pregnant, and in-labor mice using single-cell transcriptome and spatial analysis. We identify heterogeneous subpopulations of epithelia displaying spatial and temporal specificity. Notably, two goblet cell subtypes with distinct transcriptional programs and mucosal networks were dominant in pregnancy. Untimely basal cell proliferation and goblet cells with diminished mucosal integrity characterize barrier dysfunction in mice lacking hyaluronan. These data demonstrate how the cervical epithelium undergoes continuous remodeling to maintain dynamic states of homeostasis in pregnancy and labor, and provide a framework to understand perturbations in epithelial health and host-microbe interactions that increase the risk of premature birth.
]]></description>
<dc:creator>Cooley, A.</dc:creator>
<dc:creator>Madhukaran, S.</dc:creator>
<dc:creator>Stroebele, E.</dc:creator>
<dc:creator>Caraballo, M. C.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Hon, G.</dc:creator>
<dc:creator>Mahendroo, M.</dc:creator>
<dc:date>2022-07-28</dc:date>
<dc:identifier>doi:10.1101/2022.07.26.501609</dc:identifier>
<dc:title><![CDATA[Dynamic states of cervical epithelia during pregnancy and epithelial barrier disruption]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-07-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.27.497796v1?rss=1">
<title>
<![CDATA[
CROP-Seq: a single-cell CRISPRi platform for characterizing candidate genes relevant to metabolic disorders in human adipocytes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.27.497796v1?rss=1
</link>
<description><![CDATA[
Objective: CROP-Seq combines gene silencing using CRISPR interference (CRISPRi) with single-cell RNA sequencing (scRNA-Seq) to conduct a functional reverse genetic screen of novel gene targets associated with adipocyte differentiation or function, with single-cell transcriptomes as the readout.

Methods: We created a human preadipocyte SGBS cell line with stable expression of KRAB-dCas9 for CRISPRi-mediated gene knock-down. This line was transduced with a lentiviral library of sgRNAs targeting 6 genes of interest (3 sgRNAs / gene, 18 sgRNAs), 6 positive control genes (3 sgRNAs / gene, 18 sgRNAs), and non-targeting control sgRNAs (4 sgRNAs). Transduced cells were selected and differentiated, and individual cells were captured using microfluidics at days 0, 4 and 8 of adipogenic differentiation. Next, expression and sgRNA libraries were created and sequenced. Bioinformatic analysis of the resulting scRNA-Seq expression data was used to determine the effects of gene knock-down and the dysregulated pathways, and to predict cellular phenotypes.

Results: Single-cell transcriptomes obtained from SGBS cells following CRISPRi recapitulate different states of differentiation from preadipocytes to adipocytes. We confirmed successful knock-down of targeted genes. Transcriptome-wide changes were observed for all targeted genes, with over 400 differentially expressed genes identified per gene at one or more timepoints. Knock-down of known adipogenesis regulators PPARG and CEBPB inhibited adipogenesis. Gene set enrichment analyses revealed molecular processes for adipose tissue differentiation and function for novel genes. MAFF knock-down led to a downregulation of the transcriptional response to the proinflammatory cytokine TNF-α in preadipocytes. TIPARP knock-down resulted in an increase in the expression of the beiging marker UCP1 at D8 of adipogenesis.

Conclusions: The CROP-Seq system in SGBS cells can determine the consequences of target gene knock-down at the transcriptome level. This powerful, hypothesis-free tool can identify novel regulators of adipogenesis, preadipocyte and adipocyte function associated with metabolic disease.

Highlights
- CRISPR interference screen coupled with single-cell RNA sequencing (CROP-Seq)
- Parallel screening of 12 genes in human SGBS adipocytes and preadipocytes
- Uncovered novel regulators of adipogenesis and adipocyte function

]]></description>
<dc:creator>Bielczyk-Maczynska, E.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Blencowe, M.</dc:creator>
<dc:creator>Saliba-Gustafsson, P.</dc:creator>
<dc:creator>Gloudemans, M. J.</dc:creator>
<dc:creator>Yang, X.</dc:creator>
<dc:creator>Carcamo-Orive, I.</dc:creator>
<dc:creator>Wabitsch, M.</dc:creator>
<dc:creator>Svensson, K. J.</dc:creator>
<dc:creator>Park, C. Y.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Knowles, J. W.</dc:creator>
<dc:creator>Li, J.</dc:creator>
<dc:date>2022-06-27</dc:date>
<dc:identifier>doi:10.1101/2022.06.27.497796</dc:identifier>
<dc:title><![CDATA[CROP-Seq: a single-cell CRISPRi platform for characterizing candidate genes relevant to metabolic disorders in human adipocytes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.01.514606v1?rss=1">
<title>
<![CDATA[
Mapping the convergence of genes for coronary artery disease onto endothelial cell programs 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.01.514606v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) have discovered thousands of risk loci for common, complex diseases, each of which could point to genes and gene programs that influence disease. For some diseases, it has been observed that GWAS signals converge on a smaller number of biological programs, and that this convergence can help to identify causal genes [1-6]. However, identifying such convergence remains challenging: each GWAS locus can have many candidate genes, each gene might act in one or more possible programs, and it remains unclear which programs might influence disease risk. Here, we developed a new approach to address this challenge, by creating unbiased maps to link disease variants to genes to programs (V2G2P) in a given cell type. We applied this approach to study the role of endothelial cells in the genetics of coronary artery disease (CAD). To link variants to genes, we constructed enhancer-gene maps using the Activity-by-Contact model [7,8]. To link genes to programs, we applied CRISPRi-Perturb-seq [9-12] to knock down all expressed genes within ±500 Kb of 306 CAD GWAS signals [13,14] and identify their effects on gene expression programs using single-cell RNA-sequencing. By combining these variant-to-gene and gene-to-program maps, we find that 43 of 306 CAD GWAS signals converge onto 5 gene programs linked to the cerebral cavernous malformations (CCM) pathway, which is known to coordinate transcriptional responses in endothelial cells [15] but has not been previously linked to CAD risk. The strongest regulator of these programs is TLNRD1, which we show is a new CAD gene and novel regulator of the CCM pathway. TLNRD1 loss-of-function alters actin organization and barrier function in endothelial cells in vitro, and heart development in zebrafish in vivo.
Together, our study identifies convergence of CAD risk loci into prioritized gene programs in endothelial cells, nominates new genes of potential therapeutic relevance for CAD, and demonstrates a generalizable strategy to connect disease variants to functions.
]]></description>
<dc:creator>Schnitzler, G. R.</dc:creator>
<dc:creator>Kang, H.</dc:creator>
<dc:creator>Lee-Kim, V. S.</dc:creator>
<dc:creator>Ma, R. X.</dc:creator>
<dc:creator>Zeng, T.</dc:creator>
<dc:creator>Angom, R. S.</dc:creator>
<dc:creator>Fang, S.</dc:creator>
<dc:creator>Vellarikkal, S. K.</dc:creator>
<dc:creator>Zhou, R.</dc:creator>
<dc:creator>Guo, K.</dc:creator>
<dc:creator>Sias-Garcia, O.</dc:creator>
<dc:creator>Bloemendal, A.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Guckelberger, P.</dc:creator>
<dc:creator>Nguyen, T. H.</dc:creator>
<dc:creator>Bergman, D. T.</dc:creator>
<dc:creator>Cheng, N.</dc:creator>
<dc:creator>Cleary, B.</dc:creator>
<dc:creator>Aragam, K.</dc:creator>
<dc:creator>Mukhopadhyay, D.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Gupta, R. M.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2022-11-04</dc:date>
<dc:identifier>doi:10.1101/2022.11.01.514606</dc:identifier>
<dc:title><![CDATA[Mapping the convergence of genes for coronary artery disease onto endothelial cell programs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.17.524475v1?rss=1">
<title>
<![CDATA[
HiCLift: A fast and efficient tool for converting chromatin interaction data between genome assemblies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.17.524475v1?rss=1
</link>
<description><![CDATA[
Motivation: With the continuous effort to improve the quality of the human reference genome and the generation of more and more personal genomes, the conversion of genomic coordinates between genome assemblies is critical in many integrative and comparative studies. While tools have been developed for this task for linear genome signals such as ChIP-Seq, no tool exists to convert chromatin interaction data between genome assemblies, despite the importance of three-dimensional (3D) genome organization in gene regulation and disease.

Results: Here, we present HiCLift, a fast and efficient tool that can convert the genomic coordinates of chromatin contacts such as Hi-C and Micro-C from one assembly to another, including the latest T2T genome. Compared with the strategy of directly re-mapping raw reads to a different genome, HiCLift runs on average 42 times faster (hours vs. days), while producing nearly identical contact matrices. More importantly, as HiCLift does not need to re-map the raw reads, it can directly convert human patient sample data, where the raw sequencing reads are sometimes hard to acquire or not available.

Availability: HiCLift is publicly available at https://github.com/XiaoTaoWang/HiCLift.
]]></description>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Yue, F.</dc:creator>
<dc:date>2023-01-20</dc:date>
<dc:identifier>doi:10.1101/2023.01.17.524475</dc:identifier>
<dc:title><![CDATA[HiCLift: A fast and efficient tool for converting chromatin interaction data between genome assemblies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-01-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.29.518374v1?rss=1">
<title>
<![CDATA[
Comparing Genomic and Epigenomic Features across Species Using the WashU Comparative Epigenome Browser 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.29.518374v1?rss=1
</link>
<description><![CDATA[
Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data/annotations on a single reference genome/assembly; there are also genomic alignment viewer/browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome browser that can display genomic and epigenomic datasets across different species and enable users to compare them between syntenic regions. Here, we present the WashU Comparative Epigenome Browser (http://comparativegateway.wustl.edu). It allows users to load functional genomic datasets/annotations mapped to different genomes and display them over syntenic regions simultaneously. The browser also displays genetic differences between the genomes from single nucleotide variants (SNVs) to structural variants (SVs) to visualize the association between epigenomic differences and genetic differences. Instead of anchoring all datasets to the reference genome coordinates, it creates independent coordinates of different genome assemblies to faithfully present features and data mapped to different genomes. It uses a simple, intuitive genome-align track to illustrate the syntenic relationship between different species. It extends the widely used WashU Epigenome Browser infrastructure and can be expanded to support multiple species. This new browser function will greatly facilitate comparative genomic/epigenomic research, as well as support the recent growing needs to directly compare and benchmark the T2T CHM13 assembly and other human genome assemblies.
]]></description>
<dc:creator>Zhuo, X.</dc:creator>
<dc:creator>Hsu, S.</dc:creator>
<dc:creator>Purushotham, D.</dc:creator>
<dc:creator>Chen, S.</dc:creator>
<dc:creator>Li, D.</dc:creator>
<dc:creator>Wang, T.</dc:creator>
<dc:date>2022-12-02</dc:date>
<dc:identifier>doi:10.1101/2022.11.29.518374</dc:identifier>
<dc:title><![CDATA[Comparing Genomic and Epigenomic Features across Species Using the WashU Comparative Epigenome Browser]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.17.533215v1?rss=1">
<title>
<![CDATA[
A machine-readable specification for genomics assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.17.533215v1?rss=1
</link>
<description><![CDATA[
Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. The specification and the associated seqspec command line tool are available at https://github.com/IGVF/seqspec.
]]></description>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Chen, X.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-03-21</dc:date>
<dc:identifier>doi:10.1101/2023.03.17.533215</dc:identifier>
<dc:title><![CDATA[A machine-readable specification for genomics assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.07.531569v1?rss=1">
<title>
<![CDATA[
Dynamic network-guided CRISPRi screen reveals CTCF loop-constrained nonlinear enhancer-gene regulatory activity in cell state transitions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.07.531569v1?rss=1
</link>
<description><![CDATA[
Comprehensive enhancer discovery is challenging because most enhancers, especially those affected in complex diseases, have weak effects on gene expression. Our network modeling revealed that nonlinear enhancer-gene regulation during cell state transitions can be leveraged to improve the sensitivity of enhancer discovery. Utilizing hESC definitive endoderm differentiation as a dynamic transition system, we conducted a mid-transition CRISPRi-based enhancer screen. The screen discovered a comprehensive set of enhancers (4 to 9 per locus) for each of the core endoderm lineage-specifying transcription factors, and many enhancers had strong effects mid-transition but weak effects post-transition. Through integrating enhancer activity measurements and three-dimensional enhancer-promoter interaction information, we were able to develop a CTCF loop-constrained Interaction Activity (CIA) model that can better predict functional enhancers compared to models that rely on Hi-C-based enhancer-promoter contact frequency. Our study provides generalizable strategies for sensitive and more comprehensive enhancer discovery in both normal and pathological cell state transitions.
]]></description>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Oh, J. W.</dc:creator>
<dc:creator>Xi, W.</dc:creator>
<dc:creator>Shigaki, D.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Murphy, D.</dc:creator>
<dc:creator>Cutler, R.</dc:creator>
<dc:creator>Rosen, B. P.</dc:creator>
<dc:creator>Pulecio, J.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Glenn, R.</dc:creator>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>Li, Q. V.</dc:creator>
<dc:creator>Vierbuchen, T.</dc:creator>
<dc:creator>Sidoli, S.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:date>2023-03-09</dc:date>
<dc:identifier>doi:10.1101/2023.03.07.531569</dc:identifier>
<dc:title><![CDATA[Dynamic network-guided CRISPRi screen reveals CTCF loop-constrained nonlinear enhancer-gene regulatory activity in cell state transitions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.01.538906v1?rss=1">
<title>
<![CDATA[
Orthogonal CRISPR screens to identify transcriptional and epigenetic regulators of human CD8 T cell function 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.01.538906v1?rss=1
</link>
<description><![CDATA[
The clinical response to adoptive T cell therapies is strongly associated with transcriptional and epigenetic state. Thus, technologies to discover regulators of T cell gene networks and their corresponding phenotypes have great potential to improve the efficacy of T cell therapies. We developed pooled CRISPR screening approaches with compact epigenome editors to systematically profile the effects of activation and repression of 120 transcription factors and epigenetic modifiers on human CD8+ T cell state. These screens nominated known and novel regulators of T cell phenotypes with BATF3 emerging as a high confidence gene in both screens. We found that BATF3 overexpression promoted specific features of memory T cells such as increased IL7R expression and glycolytic capacity, while attenuating gene programs associated with cytotoxicity, regulatory T cell function, and T cell exhaustion. In the context of chronic antigen stimulation, BATF3 overexpression countered phenotypic and epigenetic signatures of T cell exhaustion. CAR T cells overexpressing BATF3 significantly outperformed control CAR T cells in both in vitro and in vivo tumor models. Moreover, we found that BATF3 programmed a transcriptional profile that correlated with positive clinical response to adoptive T cell therapy. Finally, we performed CRISPR knockout screens with and without BATF3 overexpression to define co-factors and downstream factors of BATF3, as well as other therapeutic targets. These screens pointed to a model where BATF3 interacts with JUNB and IRF4 to regulate gene expression and illuminated several other novel targets for further investigation.
]]></description>
<dc:creator>McCutcheon, S.</dc:creator>
<dc:creator>Swartz, A.</dc:creator>
<dc:creator>Brown, M.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>McRoberts Amador, C.</dc:creator>
<dc:creator>Siklenka, K.</dc:creator>
<dc:creator>Humayun, L.</dc:creator>
<dc:creator>Isaacs, J.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Antonia, S.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2023-05-01</dc:date>
<dc:identifier>doi:10.1101/2023.05.01.538906</dc:identifier>
<dc:title><![CDATA[Orthogonal CRISPR screens to identify transcriptional and epigenetic regulators of human CD8 T cell function]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.04.560808v1?rss=1">
<title>
<![CDATA[
Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.04.560808v1?rss=1
</link>
<description><![CDATA[
The concentration and stoichiometry of transcription factors (TFs) determine cellular identity and can be manipulated to drive cell state transitions. Understanding how changes in TF concentration regulate chromatin state and expression across cell state transitions remains a challenge. We investigated this relationship by profiling chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of reprogramming human fibroblasts to induced pluripotent stem cells via ectopic expression of OCT4, SOX2, KLF4, and MYC (OSKM). Using deep learning sequence models of base-resolution chromatin accessibility profiles across cell states, we deciphered predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of TF footprints, linked peaks to putative target genes, and elucidated rewiring of cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei. Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution. We establish a quantitative, predictive framework that links TF stoichiometry, motif syntax, and somatic silencing to provide new perspectives on the control of cell identity by TFs during fate transitions.
]]></description>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Ameen, M.</dc:creator>
<dc:creator>Sundaram, L.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Balsubramani, A.</dc:creator>
<dc:creator>Wang, Y. X.</dc:creator>
<dc:creator>Burns, D.</dc:creator>
<dc:creator>Blau, H. M.</dc:creator>
<dc:creator>Karakikes, I.</dc:creator>
<dc:creator>Wang, K. C.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2023-10-04</dc:date>
<dc:identifier>doi:10.1101/2023.10.04.560808</dc:identifier>
<dc:title><![CDATA[Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.12.07.570715v1?rss=1">
<title>
<![CDATA[
Reconstructing Spatial Transcriptomics at the Single-cell Resolution with BayesDeep 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.12.07.570715v1?rss=1
</link>
<description><![CDATA[
Spatially resolved transcriptomics (SRT) techniques have revolutionized the characterization of molecular profiles while preserving spatial and morphological context. However, most next-generation sequencing-based SRT techniques are limited to measuring gene expression in a confined array of spots, capturing only a fraction of the spatial domain. Typically, these spots encompass gene expression from a few to hundreds of cells, underscoring a critical need for more detailed, single-cell resolution SRT data to enhance our understanding of biological functions within the tissue context. Addressing this challenge, we introduce BayesDeep, a novel Bayesian hierarchical model that leverages cellular morphological data from histology images, commonly paired with SRT data, to reconstruct SRT data at single-cell resolution. BayesDeep models count data from SRT studies via a negative binomial regression model. This model incorporates explanatory variables such as cell types and nuclei-shape information for each cell extracted from the paired histology image. A feature selection scheme is integrated to examine the association between the morphological and molecular profiles, thereby improving model robustness. We applied BayesDeep to two real SRT datasets, successfully demonstrating its capability to reconstruct SRT data at single-cell resolution. This advancement not only yields new biological insights but also significantly enhances various downstream analyses, such as pseudotime and cell-cell communication.
]]></description>
<dc:creator>Jiang, X.</dc:creator>
<dc:creator>Dong, L.</dc:creator>
<dc:creator>Wang, S.</dc:creator>
<dc:creator>Wen, Z.</dc:creator>
<dc:creator>Chen, M.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2023-12-08</dc:date>
<dc:identifier>doi:10.1101/2023.12.07.570715</dc:identifier>
<dc:title><![CDATA[Reconstructing Spatial Transcriptomics at the Single-cell Resolution with BayesDeep]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-12-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.19.567742v1?rss=1">
<title>
<![CDATA[
MPRAbase: A Massively Parallel Reporter Assay Database 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.19.567742v1?rss=1
</link>
<description><![CDATA[
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
]]></description>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Baltoumas, F. A.</dc:creator>
<dc:creator>Konnaris, M. A.</dc:creator>
<dc:creator>Mouratidis, I.</dc:creator>
<dc:creator>Liu, Z.</dc:creator>
<dc:creator>Sims, J.</dc:creator>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Pavlopoulos, G. A.</dc:creator>
<dc:creator>Georgakopoulos-Soares, I.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2023-11-22</dc:date>
<dc:identifier>doi:10.1101/2023.11.19.567742</dc:identifier>
<dc:title><![CDATA[MPRAbase: A Massively Parallel Reporter Assay Database]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.12.536587v1?rss=1">
<title>
<![CDATA[
Chromatin context-dependent regulation and epigenetic manipulation of prime editing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.12.536587v1?rss=1
</link>
<description><![CDATA[
Prime editing is a powerful means of introducing precise changes to specific locations in mammalian genomes. However, the widely varying efficiency of prime editing across target sites of interest has limited its adoption in the context of both basic research and clinical settings. Here, we set out to exhaustively characterize the impact of the cis-chromatin environment on prime editing efficiency. Using a newly developed and highly sensitive method for mapping the genomic locations of a randomly integrated "sensor", we identify specific epigenetic features that strongly correlate with the highly variable efficiency of prime editing across different genomic locations. Next, to assess the interaction of trans-acting factors with the cis-chromatin environment, we develop and apply a pooled genetic screening approach with which the impact of knocking down various DNA repair factors on prime editing efficiency can be stratified by cis-chromatin context. Finally, we demonstrate that we can dramatically modulate the efficiency of prime editing through epigenome editing, i.e. altering chromatin state in a locus-specific manner in order to increase or decrease the efficiency of prime editing at a target site. Looking forward, we envision that the insights and tools described here will broaden the range of both basic research and therapeutic contexts in which prime editing is useful.
]]></description>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Chen, W.</dc:creator>
<dc:creator>Martin, B. K.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Lee, C.</dc:creator>
<dc:creator>Choi, J.</dc:creator>
<dc:creator>Chardon, F. M.</dc:creator>
<dc:creator>McDiarmid, T.</dc:creator>
<dc:creator>Kim, H.</dc:creator>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Nathans, J. F.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2023-04-12</dc:date>
<dc:identifier>doi:10.1101/2023.04.12.536587</dc:identifier>
<dc:title><![CDATA[Chromatin context-dependent regulation and epigenetic manipulation of prime editing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.28.534017v1?rss=1">
<title>
<![CDATA[
Multiplex, single-cell CRISPRa screening for cell type specific regulatory elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.28.534017v1?rss=1
</link>
<description><![CDATA[
CRISPR-based gene activation (CRISPRa) is a promising therapeutic approach for gene therapy, upregulating gene expression by targeting promoters or enhancers in a tissue/cell-type specific manner. Here, we describe an experimental framework that combines highly multiplexed perturbations with single-cell RNA sequencing (sc-RNA-seq) to identify cell-type-specific, CRISPRa-responsive cis-regulatory elements and the gene(s) they regulate. Random combinations of many gRNAs are introduced to each of many cells, which are then profiled and partitioned into test and control groups to test for effect(s) of CRISPRa perturbations of both enhancers and promoters on the expression of neighboring genes. Applying this method to a library of 493 gRNAs targeting candidate cis-regulatory elements in both K562 cells and iPSC-derived excitatory neurons, we identify gRNAs capable of specifically upregulating intended target genes and no other neighboring genes within 1 Mb, including gRNAs yielding upregulation of six autism spectrum disorder (ASD) and neurodevelopmental disorder (NDD) risk genes in neurons. A consistent pattern is that the responsiveness of individual enhancers to CRISPRa is restricted by cell type, implying a dependency on either chromatin landscape and/or additional trans-acting factors for successful gene activation. The approach outlined here may facilitate large-scale screens for gRNAs that activate therapeutically relevant genes in a cell type-specific manner.
]]></description>
<dc:creator>Chardon, F. M.</dc:creator>
<dc:creator>McDiarmid, T. A.</dc:creator>
<dc:creator>Page, N. F.</dc:creator>
<dc:creator>Martin, B. K.</dc:creator>
<dc:creator>Domcke, S.</dc:creator>
<dc:creator>Regalado, S. G.</dc:creator>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Sanders, S. J.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2023-03-28</dc:date>
<dc:identifier>doi:10.1101/2023.03.28.534017</dc:identifier>
<dc:title><![CDATA[Multiplex, single-cell CRISPRa screening for cell type specific regulatory elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.05.531189v1?rss=1">
<title>
<![CDATA[
Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.05.531189v1?rss=1
</link>
<description><![CDATA[
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific "on switches" providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
]]></description>
<dc:creator>Agarwal, V.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Schubach, M.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Dash, P.</dc:creator>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Sohota, A.</dc:creator>
<dc:creator>Noble, W.</dc:creator>
<dc:creator>Yardimci, G.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2023-03-06</dc:date>
<dc:identifier>doi:10.1101/2023.03.05.531189</dc:identifier>
<dc:title><![CDATA[Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.15.528663v1?rss=1">
<title>
<![CDATA[
Massively parallel characterization of psychiatric disorder-associated and cell-type-specific regulatory elements in the developing human cortex 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.15.528663v1?rss=1
</link>
<description><![CDATA[
Nucleotide changes in gene regulatory elements are important determinants of neuronal development and disease. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 sequences, including differentially accessible cell-type specific regions in the developing cortex and single-nucleotide variants associated with psychiatric disorders. In primary cells, we identified 46,802 active enhancer sequences and 164 disorder-associated variants that significantly alter enhancer activity. Activity was comparable in organoids and primary cells, suggesting that organoids provide an adequate model for the developing cortex. Using deep learning, we decoded the sequence basis and upstream regulators of enhancer activity. This work establishes a comprehensive catalog of functional gene regulatory elements and variants in human neuronal development.

One Sentence Summary: We identify 46,802 enhancers and 164 psychiatric disorder variants with regulatory effects in the developing cortex and organoids.
]]></description>
<dc:creator>Deng, C.</dc:creator>
<dc:creator>Whalen, S.</dc:creator>
<dc:creator>Steyert, M.</dc:creator>
<dc:creator>Ziffra, R.</dc:creator>
<dc:creator>Przytycki, P. F.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Pereira, D. A.</dc:creator>
<dc:creator>Capauto, D.</dc:creator>
<dc:creator>Norton, S.</dc:creator>
<dc:creator>Vaccarino, F. M.</dc:creator>
<dc:creator>Pollen, A. A.</dc:creator>
<dc:creator>Nowakowski, T. J.</dc:creator>
<dc:creator>Ahituv, N. A.</dc:creator>
<dc:creator>Pollard, K. S.</dc:creator>
<dc:date>2023-02-15</dc:date>
<dc:identifier>doi:10.1101/2023.02.15.528663</dc:identifier>
<dc:title><![CDATA[Massively parallel characterization of psychiatric disorder-associated and cell-type-specific regulatory elements in the developing human cortex]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.10.519236v1?rss=1">
<title>
<![CDATA[
Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.10.519236v1?rss=1
</link>
<description><![CDATA[
The inability to scalably and precisely measure the activity of developmental enhancers in multicellular systems is a bottleneck in genomics. Here, we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays, resulting in accurate measurement of reporter expression over a >10,000-fold range of activity with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode circularization, these single-cell quantitative expression reporters (scQers) provide high-contrast readouts analogous to classic in situ assays, but entirely from sequencing. Screening >200 enhancers in a multicellular in vitro model of early mammalian development, we identified numerous autonomous and cell-type-specific elements, including constituents of the Sox2 control region exclusively active in pluripotent cells, endoderm-specific enhancers, including near Foxa2 and Gata4, and a compact pleiotropic enhancer at the Lamc1 locus. scQers can be mobilized in developmental systems to quantitatively characterize native, perturbed, and synthetic enhancers at scale, with high sensitivity and at single-cell resolution.
]]></description>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Regalado, S. G.</dc:creator>
<dc:creator>Domcke, S.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Martin, B.</dc:creator>
<dc:creator>Li, T.</dc:creator>
<dc:creator>Suiter, C. C.</dc:creator>
<dc:creator>Lee, C.</dc:creator>
<dc:creator>Trapnell, C.</dc:creator>
<dc:creator>Shendure, J. A.</dc:creator>
<dc:date>2022-12-10</dc:date>
<dc:identifier>doi:10.1101/2022.12.10.519236</dc:identifier>
<dc:title><![CDATA[Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.22.529427v1?rss=1">
<title>
<![CDATA[
Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.22.529427v1?rss=1
</link>
<description><![CDATA[
Summary: Long read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library.

Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.

Availability and Implementation: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python and bash for Linux. It has both a single-threaded implementation and, for GNU/Linux clusters that use Slurm, PBS, or GridEngine schedulers, a multi-node version.

Supplementary Material: Supplementary materials are available at Bioinformatics online.
]]></description>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Cote, A. G.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Tabet, D.</dc:creator>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Rayhan, A.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2023-02-23</dc:date>
<dc:identifier>doi:10.1101/2023.02.22.529427</dc:identifier>
<dc:title><![CDATA[Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.20.562794v1?rss=1">
<title>
<![CDATA[
Assigning credit where it's due: An information content score to capture the clinical value of Multiplexed Assays of Variant Effect 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.20.562794v1?rss=1
</link>
<description><![CDATA[
Background: A variant can be pathogenic or benign with relation to a human disease. Current classification categories from benign to pathogenic reflect a probabilistic summary of current understanding. A primary metric of clinical utility for multiplexed assays of variant effect (MAVE) is the number of variants that can be reclassified from uncertain significance (VUS). However, we hypothesized that this measure of utility underrepresents the information gained from MAVEs and that an information theory approach which includes data that does not reclassify variants will better reflect true information gain. We used this information theory approach to evaluate the information gain, in bits, for MAVEs of BRCA1, PTEN, and TP53. Here, one bit represents the amount of information required to completely classify a single variant starting from no information.

Results: BRCA1 MAVEs produced a total of 831.2 bits of information, 6.58% of the total missense information in BRCA1 and a 22-fold increase over the information that only contributed to VUS reclassification. PTEN MAVEs produced 2059.6 bits of information which represents 32.8% of the total missense information in PTEN and an 85-fold increase over the information that contributed to VUS reclassification. TP53 MAVEs produced 277.8 bits of information which represents 6.22% of the total missense information in TP53 and a 3.5-fold increase over the information that contributed to VUS reclassification.

Conclusions: An information content approach will more accurately portray information gained through MAVE mapping efforts than counting the number of variants reclassified. This information content approach may also help define the impact of modifying information definitions used to classify many variants, such as guideline rule changes.
]]></description>
<dc:creator>Ranola, J. M.</dc:creator>
<dc:creator>Horton, C.</dc:creator>
<dc:creator>Pesaran, T.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Shirts, B. H.</dc:creator>
<dc:date>2023-10-20</dc:date>
<dc:identifier>doi:10.1101/2023.10.20.562794</dc:identifier>
<dc:title><![CDATA[Assigning credit where it's due: An information content score to capture the clinical value of Multiplexed Assays of Variant Effect]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.20.545702v1?rss=1">
<title>
<![CDATA[
Mapping MAVE data for use in human genomics applications 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.20.545702v1?rss=1
</link>
<description><![CDATA[
The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.
]]></description>
<dc:creator>Arbesfeld, J. A.</dc:creator>
<dc:creator>Da, E. Y.</dc:creator>
<dc:creator>Kuzma, K.</dc:creator>
<dc:creator>Paul, A.</dc:creator>
<dc:creator>Farris, T.</dc:creator>
<dc:creator>Riehle, K.</dc:creator>
<dc:creator>Agostinho, N. D. S.</dc:creator>
<dc:creator>Safer, J. F.</dc:creator>
<dc:creator>Milosavljevic, A.</dc:creator>
<dc:creator>Foreman, J.</dc:creator>
<dc:creator>Firth, H. V.</dc:creator>
<dc:creator>Hunt, S. E.</dc:creator>
<dc:creator>Iqbal, S.</dc:creator>
<dc:creator>Cline, M.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Wagner, A. H.</dc:creator>
<dc:date>2023-06-23</dc:date>
<dc:identifier>doi:10.1101/2023.06.20.545702</dc:identifier>
<dc:title><![CDATA[Mapping MAVE data for use in human genomics applications]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.01.02.573913v1?rss=1">
<title>
<![CDATA[
Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.01.02.573913v1?rss=1
</link>
<description><![CDATA[
Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics technologies have enabled the detection and generation of variants at an unprecedented scale. However, efficient tools and resources are needed to link these two disparate data types - to "map" variants onto protein structures, to better understand how the variation causes disease and thereby design therapeutics. Here we present the Genomics 2 Proteins Portal (G2P; g2p.broadinstitute.org/): a human proteome-wide resource that maps 19,996,443 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the G2P portal generalizes the capability of linking genomics to proteins beyond databases by allowing users to interactively upload protein residue-wise annotations (variants, scores, etc.) as well as the protein structure to establish the connection. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotype.
]]></description>
<dc:creator>Kwon, S.</dc:creator>
<dc:creator>Safer, J.</dc:creator>
<dc:creator>Nguyen, D. T.</dc:creator>
<dc:creator>Hoksza, D.</dc:creator>
<dc:creator>May, P.</dc:creator>
<dc:creator>Arbesfeld, J.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Campbell, A. J.</dc:creator>
<dc:creator>Burgin, A.</dc:creator>
<dc:creator>Iqbal, S.</dc:creator>
<dc:date>2024-01-02</dc:date>
<dc:identifier>doi:10.1101/2024.01.02.573913</dc:identifier>
<dc:title><![CDATA[Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-01-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.14.520494v1?rss=1">
<title>
<![CDATA[
Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.14.520494v1?rss=1
</link>
<description><![CDATA[
Evaluating the impact of amino acid variants has been a critical challenge for studying protein function and interpreting genomic data. High-throughput experimental methods like deep mutational scanning (DMS) can measure the effect of large numbers of variants in a target protein, but because DMS studies have not been performed on all proteins, researchers also model DMS data computationally to estimate variant impacts by predictors. In this study, we extended a linear regression-based predictor to explore whether incorporating data from alanine scanning (AS), a widely-used low-throughput mutagenesis method, would improve prediction results. To evaluate our model, we collected 146 AS datasets, mapping to 54 DMS datasets across 22 distinct proteins. We show that improved model performance depends on the compatibility of the DMS and AS assays, and the scale of improvement is closely related to the correlation between DMS and AS results.
]]></description>
<dc:creator>Fu, Y.</dc:creator>
<dc:creator>Bedo, J.</dc:creator>
<dc:creator>Papenfuss, A. T.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:date>2022-12-16</dc:date>
<dc:identifier>doi:10.1101/2022.12.14.520494</dc:identifier>
<dc:title><![CDATA[Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.09.563812v1?rss=1">
<title>
<![CDATA[
An encyclopedia of enhancer-gene regulatory interactions in the human genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.09.563812v1?rss=1
</link>
<description><![CDATA[
Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.
]]></description>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Mualim, K. S.</dc:creator>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Nurtdinov, R. N.</dc:creator>
<dc:creator>Xi, W.</dc:creator>
<dc:creator>Tan, A. S.</dc:creator>
<dc:creator>Jones, H.</dc:creator>
<dc:creator>Ma, X. R.</dc:creator>
<dc:creator>Yao, D.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Avsec, Z.</dc:creator>
<dc:creator>James, B. T.</dc:creator>
<dc:creator>Shamim, M. S.</dc:creator>
<dc:creator>Durand, N. C.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Mahajan, R.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Andreeva, K.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Fan, K.</dc:creator>
<dc:creator>Perez, E. M.</dc:creator>
<dc:creator>Nguyen, T. C.</dc:creator>
<dc:creator>Kelley, D. R.</dc:creator>
<dc:creator>Finucane, H. K.</dc:creator>
<dc:creator>Moore, J. E.</dc:creator>
<dc:creator>Weng, Z.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:creator>Bassik, M. C.</dc:creator>
<dc:creator>Price, A. L.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Guigo, R.</dc:creator>
<dc:creator>Stamatoyannopoulos, J. A.</dc:creator>
<dc:creator>Aiden, E. L.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2023-11-13</dc:date>
<dc:identifier>doi:10.1101/2023.11.09.563812</dc:identifier>
<dc:title><![CDATA[An encyclopedia of enhancer-gene regulatory interactions in the human genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.27.534456v1?rss=1">
<title>
<![CDATA[
Single-cell transcriptome dataset of human and mouse in vitro adipogenesis models 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.27.534456v1?rss=1
</link>
<description><![CDATA[
Adipogenesis is a process in which fat-specific progenitor cells (preadipocytes) differentiate into adipocytes that carry out the key metabolic functions of the adipose tissue, including glucose uptake, energy storage, and adipokine secretion. Several cell lines are routinely used to study the molecular regulation of adipogenesis, in particular the immortalized mouse 3T3-L1 line and the primary human Simpson-Golabi-Behmel syndrome (SGBS) line. However, the cell-to-cell variability of transcriptional changes prior to and during adipogenesis in these models is not well understood. Here, we present a single-cell RNA-Sequencing (scRNA-Seq) dataset collected before and during adipogenic differentiation of 3T3-L1 and SGBS cells. To minimize the effects of experimental variation, we mixed 3T3-L1 and SGBS cells and used computational analysis to demultiplex transcriptomes of mouse and human cells. In both models, adipogenesis results in the appearance of three cell clusters, corresponding to preadipocytes, early and mature adipocytes. These data provide a groundwork for comparative studies on human and mouse adipogenesis, as well as on cell-to-cell variability in gene expression during this process.
]]></description>
<dc:creator>Li, J.</dc:creator>
<dc:creator>Jin, C.</dc:creator>
<dc:creator>Gustafsson, S.</dc:creator>
<dc:creator>Rao, A.</dc:creator>
<dc:creator>Wabitsch, M.</dc:creator>
<dc:creator>Park, C. Y.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Bielczyk-Maczynska, E.</dc:creator>
<dc:creator>Knowles, J. W.</dc:creator>
<dc:date>2023-03-29</dc:date>
<dc:identifier>doi:10.1101/2023.03.27.534456</dc:identifier>
<dc:title><![CDATA[Single-cell transcriptome dataset of human and mouse in vitro adipogenesis models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.12.20.572268v1?rss=1">
<title>
<![CDATA[
Rewriting regulatory DNA to dissect and reprogram gene expression 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.12.20.572268v1?rss=1
</link>
<description><![CDATA[
Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%).
Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.
]]></description>
<dc:creator>Martyn, G. E.</dc:creator>
<dc:creator>Montgomery, M. T.</dc:creator>
<dc:creator>Jones, H.</dc:creator>
<dc:creator>Guo, K.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Linder, J.</dc:creator>
<dc:creator>Chen, Z.</dc:creator>
<dc:creator>Cochran, K.</dc:creator>
<dc:creator>Lawrence, K. A.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Fulco, C. P.</dc:creator>
<dc:creator>Kelley, D. R.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2023-12-21</dc:date>
<dc:identifier>doi:10.1101/2023.12.20.572268</dc:identifier>
<dc:title><![CDATA[Rewriting regulatory DNA to dissect and reprogram gene expression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-12-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.22.541801v1?rss=1">
<title>
<![CDATA[
SLC12A9 is a lysosome-detoxifying ammonium - chloride co-transporter 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.22.541801v1?rss=1
</link>
<description><![CDATA[
Ammonia is a ubiquitous, toxic by-product of cell metabolism. Its high membrane permeability and proton affinity cause ammonia to accumulate inside acidic lysosomes in its poorly membrane-permeant form: ammonium (NH4+). Ammonium buildup compromises lysosomal function, suggesting the existence of mechanisms that protect cells from ammonium toxicity. Here, we identified SLC12A9 as a lysosomal ammonium exporter that preserves lysosomal homeostasis. SLC12A9 knockout cells showed grossly enlarged lysosomes and elevated ammonium content. These phenotypes were reversed upon removal of the metabolic source of ammonium or dissipation of the lysosomal pH gradient. Lysosomal chloride increased in SLC12A9 knockout cells and chloride binding by SLC12A9 was required for ammonium transport. Our data indicate that SLC12A9 is a chloride-driven ammonium co-transporter that is central in an unappreciated, fundamental mechanism of lysosomal physiology that may have special relevance in tissues with elevated ammonia, such as tumors.
]]></description>
<dc:creator>Levin-Konigsberg, R.</dc:creator>
<dc:creator>Mitra, K.</dc:creator>
<dc:creator>Nigam, A.</dc:creator>
<dc:creator>Spees, K.</dc:creator>
<dc:creator>Hivare, P.</dc:creator>
<dc:creator>Liu, K.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Krishnan, Y.</dc:creator>
<dc:creator>Bassik, M.</dc:creator>
<dc:date>2023-05-22</dc:date>
<dc:identifier>doi:10.1101/2023.05.22.541801</dc:identifier>
<dc:title><![CDATA[SLC12A9 is a lysosome-detoxifying ammonium - chloride co-transporter]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.26.525789v1?rss=1">
<title>
<![CDATA[
Molecular mechanisms of coronary artery disease risk at the PDGFD locus 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.26.525789v1?rss=1
</link>
<description><![CDATA[
Platelet derived growth factor (PDGF) signaling has been extensively studied in the context of vascular disease, but the genetics of this pathway remain to be established. Genome wide association studies (GWAS) for coronary artery disease (CAD) have identified a risk locus at 11q22.3, and we have verified with fine mapping approaches that the regulatory variant rs2019090 and PDGFD represent the functional variant and putative functional gene. Further, FOXC1/C2 transcription factor (TF) binding at rs2019090 was found to promote PDGFD transcription through the CAD promoting allele. Employing a constitutive Pdgfd knockout allele along with SMC lineage tracing in a male atherosclerosis mouse model we mapped single cell transcriptomic, cell state, and lesion anatomical changes associated with gene loss. These studies revealed that Pdgfd promotes expansion, migration, and transition of SMC lineage cells to the chondromyocyte phenotype and vascular calcification. This is in contrast to protective CAD genes TCF21, ZEB2, and SMAD3 which we have shown to promote the fibroblast-like cell transition or perturb the pattern or extent of transition to the chondromyocyte phenotype. Further, Pdgfd expressing fibroblasts and pericytes exhibited greater expression of chemokines and leukocyte adhesion molecules, consistent with observed increased macrophage recruitment to the plaque. Despite these changes there was no effect of Pdgfd deletion on SMC contribution to the fibrous cap or overall lesion burden. These findings suggest that PDGFD mediates CAD risk through promoting SMC expansion and migration, in conjunction with deleterious phenotypic changes, and through promoting an inflammatory response that is primarily focused in the adventitia where it contributes to leukocyte trafficking to the diseased vessel wall.
]]></description>
<dc:creator>Kim, H.-J.</dc:creator>
<dc:creator>Cheng, P.</dc:creator>
<dc:creator>Travisano, S.</dc:creator>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Monteiro, J.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Liu, B.</dc:creator>
<dc:creator>Lin, Y.</dc:creator>
<dc:creator>Haldar, S.</dc:creator>
<dc:creator>Jackson, S.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2023-01-27</dc:date>
<dc:identifier>doi:10.1101/2023.01.26.525789</dc:identifier>
<dc:title><![CDATA[Molecular mechanisms of coronary artery disease risk at the PDGFD locus]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-01-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.05.18.492517v1?rss=1">
<title>
<![CDATA[
The epigenomic landscape of single vascular cells reflects developmental origin and identifies disease risk loci 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.05.18.492517v1?rss=1
</link>
<description><![CDATA[
Vascular sites have distinct susceptibility to atherosclerosis and aneurysm, yet the biological underpinning of vascular site-specific disease risk is largely unknown.

Vascular tissues have different developmental origins that may influence global chromatin accessibility, and understanding differential chromatin accessibility, gene expression profiles, and gene regulatory networks (GRN) at single cell resolution may give key insight into vascular site-specific disease risk. Here, we performed single cell chromatin accessibility (scATACseq) and gene expression profiling (scRNAseq) of healthy adult mouse vascular tissue from three vascular sites, 1) aortic root and ascending aorta, 2) brachiocephalic and carotid artery, and 3) descending thoracic aorta. Through a comprehensive analysis at single cell resolution, we discovered key regulatory enhancers to be not only cell type specific but also vascular site specific in vascular smooth muscle (SMC), fibroblasts, and endothelial cells. We identified epigenetic markers of embryonic origin with differential chromatin accessibility of key developmental transcription factors such as Tbx20, Hand2, Gata4, and Hoxb family members and discovered transcription factor motif accessibility to be cell type and vascular site specific. Notably, we found ascending fibroblasts to have distinct epigenomic patterns, highlighting SMAD2/3 function to suggest a differential susceptibility to TGFβ, a finding we confirmed through in vitro culture of primary adventitial fibroblasts. Finally, to understand how vascular site-specific enhancers may regulate human genetic risk for disease, we integrated genome wide association study (GWAS) data for ascending and descending aortic dimension and used ChromBPNet, a base-resolution deep learning model of chromatin accessibility, to predict variant effects in SMC, fibroblasts, and endothelial cells within ascending aorta, carotid, and descending aorta sites of origin. 
We reveal that although cell type remains a primary influence on variant effects, vascular site modifies cell type transcription and highlights genomic regions that are enriched for specific TF motif footprints -- including MEF2A, SMAD3, and HAND2. This work supports a paradigm that the epigenomic and transcriptomic landscape of vascular cells are cell type and vascular site-specific and that site-specific enhancers govern complex genetic drivers of disease risk.
]]></description>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Cheng, P. P.</dc:creator>
<dc:creator>Pedroza, A. J.</dc:creator>
<dc:creator>Dalal, A. R.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Kim, H.-J.</dc:creator>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Kundu, R. K.</dc:creator>
<dc:creator>Fischbein, M. P.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2022-05-18</dc:date>
<dc:identifier>doi:10.1101/2022.05.18.492517</dc:identifier>
<dc:title><![CDATA[The epigenomic landscape of single vascular cells reflects developmental origin and identifies disease risk loci]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-05-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.05.556368v1?rss=1">
<title>
<![CDATA[
Pervasive mislocalization of pathogenic coding variants underlying human disorders 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.05.556368v1?rss=1
</link>
<description><![CDATA[
Widespread sequencing has yielded thousands of missense variants predicted or confirmed as disease-causing. This creates a new bottleneck: determining the functional impact of each variant - largely a painstaking, customized process undertaken one or a few genes or variants at a time. Here, we established a high-throughput imaging platform to assay the impact of coding variation on protein localization, evaluating 3,547 missense variants of over 1,000 genes and phenotypes. We discovered that mislocalization is a common consequence of coding variation, affecting about one-sixth of all pathogenic missense variants, all cellular compartments, and recessive and dominant disorders alike. Mislocalization is primarily driven by effects on protein stability and membrane insertion rather than disruptions of trafficking signals or specific interactions. Furthermore, mislocalization patterns help explain pleiotropy and disease severity and provide insights on variants of unknown significance. Our publicly available resource will likely accelerate the understanding of coding variation in human diseases.
]]></description>
<dc:creator>Lacoste, J.</dc:creator>
<dc:creator>Haghighi, M.</dc:creator>
<dc:creator>Haider, S.</dc:creator>
<dc:creator>Lin, Z.-Y.</dc:creator>
<dc:creator>Segal, D.</dc:creator>
<dc:creator>Reno, C.</dc:creator>
<dc:creator>Qian, W. W.</dc:creator>
<dc:creator>Xiong, X.</dc:creator>
<dc:creator>Shafqat-Abbasi, H.</dc:creator>
<dc:creator>Ryder, P.</dc:creator>
<dc:creator>Senft, R.</dc:creator>
<dc:creator>Cimini, B.</dc:creator>
<dc:creator>Roth, F.</dc:creator>
<dc:creator>Calderwood, M.</dc:creator>
<dc:creator>Hill, D.</dc:creator>
<dc:creator>Vidal, M.</dc:creator>
<dc:creator>Yi, S.</dc:creator>
<dc:creator>Sahni, N.</dc:creator>
<dc:creator>Peng, J.</dc:creator>
<dc:creator>Gingras, A.-C.</dc:creator>
<dc:creator>Singh, S.</dc:creator>
<dc:creator>Carpenter, A.</dc:creator>
<dc:creator>Taipale, M.</dc:creator>
<dc:date>2023-09-05</dc:date>
<dc:identifier>doi:10.1101/2023.09.05.556368</dc:identifier>
<dc:title><![CDATA[Pervasive mislocalization of pathogenic coding variants underlying human disorders]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.24.542036v1?rss=1">
<title>
<![CDATA[
Characterizing glucokinase variant mechanisms using a multiplexed abundance assay 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.24.542036v1?rss=1
</link>
<description><![CDATA[
Amino acid substitutions can perturb protein activity in multiple ways. Understanding their mechanistic basis may pinpoint how residues contribute to protein function. Here, we characterize the mechanisms of human glucokinase (GCK) variants, building on our previous comprehensive study on GCK variant activity. We assayed the abundance of 95% of GCK missense and nonsense variants, and found that 43% of hypoactive variants have a decreased cellular abundance. By combining our abundance scores with predictions of protein thermodynamic stability, we identify residues important for GCK metabolic stability and conformational dynamics. These residues could be targeted to modulate GCK activity, and thereby affect glucose homeostasis.
]]></description>
<dc:creator>Gersing, S.</dc:creator>
<dc:creator>Schulze, T. K.</dc:creator>
<dc:creator>Cagiada, M.</dc:creator>
<dc:creator>Stein, A.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:creator>Lindorff-Larsen, K.</dc:creator>
<dc:creator>Hartmann-Petersen, R.</dc:creator>
<dc:date>2023-05-24</dc:date>
<dc:identifier>doi:10.1101/2023.05.24.542036</dc:identifier>
<dc:title><![CDATA[Characterizing glucokinase variant mechanisms using a multiplexed abundance assay]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.06.527353v1?rss=1">
<title>
<![CDATA[
Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.06.527353v1?rss=1
</link>
<description><![CDATA[
Defects in hydroxymethylbilane synthase (HMBS) can cause Acute Intermittent Porphyria (AIP), an acute neurological disease. Although sequencing-based diagnosis can be definitive, about one-third of clinical HMBS variants are missense variants, and most clinically-reported HMBS missense variants are designated as "variants of uncertain significance" (VUS). Using saturation mutagenesis, en masse selection, and sequencing, we applied a multiplexed validated assay to both the erythroid-specific and ubiquitous isoforms of HMBS, obtaining confident functional impact scores for >84% of all possible amino-acid substitutions. The resulting variant effect maps generally agreed with biochemical expectation. However, the maps showed variants at the dimerization interface to be unexpectedly well tolerated, and suggested residue roles in active site dynamics that were supported by molecular dynamics simulations. Most importantly, these HMBS variant effect maps can help discriminate pathogenic from benign variants, proactively providing evidence even for yet-to-be-observed clinical missense variants.
]]></description>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Sowlati-Hashjin, S.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Hamilton, R.</dc:creator>
<dc:creator>Chawla, A.</dc:creator>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Fresard, L.</dc:creator>
<dc:creator>Mustajoki, S.</dc:creator>
<dc:creator>Pischik, E.</dc:creator>
<dc:creator>Di Pierro, E.</dc:creator>
<dc:creator>Barbaro, M.</dc:creator>
<dc:creator>Floderus, Y.</dc:creator>
<dc:creator>Schmitt, C.</dc:creator>
<dc:creator>Gouya, L.</dc:creator>
<dc:creator>Colavin, A.</dc:creator>
<dc:creator>Nussbaum, R.</dc:creator>
<dc:creator>Friesema, E. C. H.</dc:creator>
<dc:creator>Kauppinen, R.</dc:creator>
<dc:creator>To-Figueras, J.</dc:creator>
<dc:creator>Aarsand, A. K.</dc:creator>
<dc:creator>Desnick, R. J.</dc:creator>
<dc:creator>Garton, M.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2023-02-06</dc:date>
<dc:identifier>doi:10.1101/2023.02.06.527353</dc:identifier>
<dc:title><![CDATA[Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.18.545488v1?rss=1">
<title>
<![CDATA[
Integrating Image and Molecular Profiles for Spatial Transcriptomics Analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.18.545488v1?rss=1
</link>
<description><![CDATA[
The spatially resolved transcriptomics (SRT) field has revolutionized our ability to comprehensively leverage image and molecular profiles to elucidate spatial organization of cellular microenvironments. Current clustering analysis of SRT data primarily relies on molecular information and fails to fully exploit the morphological features present in histology images, leading to compromised accuracy and interpretability. To overcome these limitations, we have developed a multi-stage statistical method called iIMPACT. It includes a finite mixture model to identify and define histology-based spatial domains based on AI-reconstructed histology images and spatial context of gene expression measurements, and a negative binomial regression model to detect domain-specific spatially variable genes. Through multiple case studies, we demonstrate iIMPACT outperformed existing methods, confirmed by ground truth biological knowledge. These findings underscore the accuracy and interpretability of iIMPACT as a new clustering approach, providing valuable insights into the cellular spatial organization and landscape of functional genes within spatial transcriptomics data.
]]></description>
<dc:creator>Jiang, X.</dc:creator>
<dc:creator>Wang, S.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Wen, Z.</dc:creator>
<dc:creator>Jia, L.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2023-06-20</dc:date>
<dc:identifier>doi:10.1101/2023.06.18.545488</dc:identifier>
<dc:title><![CDATA[Integrating Image and Molecular Profiles for Spatial Transcriptomics Analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.20.567880v1?rss=1">
<title>
<![CDATA[
Enhancer regulatory networks globally connect non-coding breast cancer loci to cancer genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.20.567880v1?rss=1
</link>
<description><![CDATA[
Genetic studies have associated thousands of enhancers with breast cancer. However, the vast majority have not been functionally characterized. Thus, it remains unclear how variant-associated enhancers contribute to cancer. Here, we perform single-cell CRISPRi screens of 3,512 regulatory elements associated with breast cancer to measure the impact of these regions on transcriptional phenotypes. Analysis of >500,000 single-cell transcriptomes in two breast cancer cell lines shows that perturbation of variant-associated enhancers disrupts breast cancer gene programs. We observe variant-associated enhancers that directly or indirectly regulate the expression of cancer genes. We also find one-to-multiple and multiple-to-one network motifs where enhancers indirectly regulate cancer genes. Notably, multiple variant-associated enhancers indirectly regulate TP53. Comparative studies illustrate sub-type specific functions between enhancers in ER+ and ER- cells. Finally, we developed the pySpade package to facilitate analysis of single-cell enhancer screens. Overall, we demonstrate that enhancers form regulatory networks that link cancer genes in the genome, providing a more comprehensive understanding of the contribution of enhancers to breast cancer development.
]]></description>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Armendariz, D. A.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Zhao, H.</dc:creator>
<dc:creator>Xie, S.</dc:creator>
<dc:creator>Hon, G. C.</dc:creator>
<dc:date>2023-11-20</dc:date>
<dc:identifier>doi:10.1101/2023.11.20.567880</dc:identifier>
<dc:title><![CDATA[Enhancer regulatory networks globally connect non-coding breast cancer loci to cancer genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.07.531525v1?rss=1">
<title>
<![CDATA[
Dissecting embryonic and extra-embryonic lineage crosstalk with stem cell co-culture 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.07.531525v1?rss=1
</link>
<description><![CDATA[
Faithful embryogenesis requires precise coordination between embryonic and extraembryonic tissues. Although stem cells from embryonic and extraembryonic origins have been generated for several mammalian species (Bogliotti et al., 2018; Choi et al., 2019; Cui et al., 2019; Evans and Kaufman, 1981; Kunath et al., 2005; Li et al., 2008; Martin, 1981; Okae et al., 2018; Tanaka et al., 1998; Thomson et al., 1998; Vandevoort et al., 2007; Vilarino et al., 2020; Yu et al., 2021b; Zhong et al., 2018), they are grown in different culture conditions with diverse media composition, which makes it difficult to study cross-lineage communication. Here, by using the same culture condition that activates FGF, TGF-β and WNT signaling pathways, we derived stable embryonic stem cells (ESCs), extraembryonic endoderm stem cells (XENs) and trophoblast stem cells (TSCs) from all three founding tissues of mouse and cynomolgus monkey blastocysts. This allowed us to establish embryonic and extraembryonic stem cell co-cultures to dissect lineage crosstalk during early mammalian development. Co-cultures of ESCs and XENs uncovered a conserved and previously unrecognized growth inhibition of pluripotent cells by extraembryonic endoderm cells, which is in part mediated through extracellular matrix signaling. Our study unveils a more universal state of stem cell self-renewal stabilized by activation, as opposed to inhibition, of developmental signaling pathways. The embryonic and extraembryonic stem cell co-culture strategy developed here will open new avenues for creating more faithful embryo models and developing more developmentally relevant differentiation protocols.
]]></description>
<dc:creator>Wu, J.</dc:creator>
<dc:creator>Wei, Y.</dc:creator>
<dc:creator>Zhang, E.</dc:creator>
<dc:creator>Yu, L.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Sakurai, M.</dc:creator>
<dc:creator>Takii, S.</dc:creator>
<dc:creator>Schmitz, D.</dc:creator>
<dc:creator>Ding, Y.</dc:creator>
<dc:creator>Zheng, C.</dc:creator>
<dc:creator>Sun, H.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Okamura, D.</dc:creator>
<dc:creator>Ji, W.</dc:creator>
<dc:creator>Tan, T.</dc:creator>
<dc:creator>Zhan, L.</dc:creator>
<dc:creator>Ci, B.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:date>2023-03-07</dc:date>
<dc:identifier>doi:10.1101/2023.03.07.531525</dc:identifier>
<dc:title><![CDATA[Dissecting embryonic and extra-embryonic lineage crosstalk with stem cell co-culture]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.01.10.574997v1?rss=1">
<title>
<![CDATA[
Mechanosensitive genomic enhancers potentiate the cellular response to matrix stiffness 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.01.10.574997v1?rss=1
</link>
<description><![CDATA[
Epigenetic control of cellular transcription and phenotype is influenced by changes in the cellular microenvironment, yet how mechanical cues from these microenvironments precisely influence epigenetic state to regulate transcription remains largely unmapped. Here, we combine genome-wide epigenome profiling, epigenome editing, and phenotypic and single-cell RNA-seq CRISPR screening to identify a new class of genomic enhancers that responds to the mechanical microenvironment. These mechanoenhancers could be active on either soft or stiff extracellular matrix contexts, and regulated transcription to influence critical cell functions including apoptosis, mechanotransduction, proliferation, and migration. Epigenetic editing of mechanoenhancers on rigid materials tuned gene expression to levels observed on softer materials, thereby reprogramming the cellular response to the mechanical microenvironment. These editing approaches may enable the precise alteration of mechanically-driven disease states.
]]></description>
<dc:creator>Cosgrove, B. D.</dc:creator>
<dc:creator>Bounds, L. R.</dc:creator>
<dc:creator>Taylor, C. K.</dc:creator>
<dc:creator>Su, A. L.</dc:creator>
<dc:creator>Rizzo, A. J.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Hoffman, B. D.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2024-01-10</dc:date>
<dc:identifier>doi:10.1101/2024.01.10.574997</dc:identifier>
<dc:title><![CDATA[Mechanosensitive genomic enhancers potentiate the cellular response to matrix stiffness]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-01-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.28.533945v1?rss=1">
<title>
<![CDATA[
Single-cell multi-scale footprinting reveals the modular organization of DNA regulatory elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.28.533945v1?rss=1
</link>
<description><![CDATA[
Cis-regulatory elements control gene expression and are dynamic in their structure, reflecting changes to the composition of diverse effector proteins over time [1-3]. Here we sought to connect the structural changes at cis-regulatory elements to alterations in cellular fate and function. To do this we developed PRINT, a computational method that uses deep learning to correct sequence bias in chromatin accessibility data and identifies multi-scale footprints of DNA-protein interactions. We find that multi-scale footprints enable more accurate inference of TF and nucleosome binding. Using PRINT with single-cell multi-omics, we discover widespread changes to the structure and function of candidate cis-regulatory elements (cCREs) across hematopoiesis, wherein nucleosomes slide, expose DNA for TF binding, and promote gene expression. Activity segmentation using the co-variance across cell states identifies "sub-cCREs" as modular cCRE subunits of regulatory DNA. We apply this single-cell and PRINT approach to characterize the age-associated alterations to cCREs within hematopoietic stem cells (HSCs). Remarkably, we find a spectrum of aging alterations among HSCs corresponding to a global gain of sub-cCRE activity while preserving cCRE accessibility. Collectively, we reveal the functional importance of cCRE structure across cell states, highlighting changes to gene regulation at single-cell and single-base-pair resolution.
]]></description>
<dc:creator>Hu, Y.</dc:creator>
<dc:creator>Ma, S.</dc:creator>
<dc:creator>Kartha, V. K.</dc:creator>
<dc:creator>Duarte, F. M.</dc:creator>
<dc:creator>Horlbeck, M.</dc:creator>
<dc:creator>Zhang, R.</dc:creator>
<dc:creator>Shrestha, R.</dc:creator>
<dc:creator>Labade, A.</dc:creator>
<dc:creator>Kletzien, H.</dc:creator>
<dc:creator>Meliki, A.</dc:creator>
<dc:creator>Castillo, A.</dc:creator>
<dc:creator>Durand, N.</dc:creator>
<dc:creator>Mattei, E.</dc:creator>
<dc:creator>Anderson, L. J.</dc:creator>
<dc:creator>Tay, T.</dc:creator>
<dc:creator>Earl, A. S.</dc:creator>
<dc:creator>Shoresh, N.</dc:creator>
<dc:creator>Epstein, C. B.</dc:creator>
<dc:creator>Wagers, A.</dc:creator>
<dc:creator>Buenrostro, J. D.</dc:creator>
<dc:date>2023-03-29</dc:date>
<dc:identifier>doi:10.1101/2023.03.28.533945</dc:identifier>
<dc:title><![CDATA[Single-cell multi-scale footprinting reveals the modular organization of DNA regulatory elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.10.561642v1?rss=1">
<title>
<![CDATA[
Convergent Epigenetic Evolution Drives Relapse in Acute Myeloid Leukemia 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.10.561642v1?rss=1
</link>
<description><![CDATA[
Relapse of acute myeloid leukemia (AML) is highly aggressive and often treatment refractory. We analyzed previously published AML relapse cohorts and found that 40% of relapses occur without changes in driver mutations, suggesting that non-genetic mechanisms drive relapse in a large proportion of cases. We therefore characterized epigenetic patterns of AML relapse using 26 matched diagnosis-relapse samples with ATAC-seq. This analysis identified a relapse-specific chromatin accessibility signature for mutationally stable AML, suggesting that AML undergoes epigenetic evolution at relapse independent of mutational changes. Analysis of leukemia stem cell (LSC) chromatin changes at relapse indicated that this leukemic compartment underwent significantly less epigenetic evolution than non-LSCs, while epigenetic changes in non-LSCs reflected overall evolution of the bulk leukemia. Finally, we used single-cell ATAC-seq paired with mitochondrial sequencing (mtscATAC) to map clones from diagnosis into relapse along with their epigenetic features. We found that distinct mitochondrially-defined clones exhibit more similar chromatin accessibility at relapse relative to diagnosis, demonstrating convergent epigenetic evolution in relapsed AML. These results demonstrate that epigenetic evolution is a feature of relapsed AML and that convergent epigenetic evolution can occur following treatment with induction chemotherapy.
]]></description>
<dc:creator>Nuno, K.</dc:creator>
<dc:creator>Azizi, A.</dc:creator>
<dc:creator>Koehnke, T.</dc:creator>
<dc:creator>Lareau, C.</dc:creator>
<dc:creator>Ediriwickrema, A.</dc:creator>
<dc:creator>Corces, M. R.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Majeti, R.</dc:creator>
<dc:date>2023-10-10</dc:date>
<dc:identifier>doi:10.1101/2023.10.10.561642</dc:identifier>
<dc:title><![CDATA[Convergent Epigenetic Evolution Drives Relapse in Acute Myeloid Leukemia]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-10-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.11.20.517242v1?rss=1">
<title>
<![CDATA[
Single-cell multi-omics reveals dynamics of purifying selection of pathogenic mitochondrial DNA across human immune cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.11.20.517242v1?rss=1
</link>
<description><![CDATA[
Cells experience intrinsic and extrinsic pressures that affect their proclivity to expand and persist in vivo. In congenital disorders caused by loss-of-function mutations in mitochondrial DNA (mtDNA), metabolic vulnerabilities may result in cell-type specific phenotypes and depletion of pathogenic alleles, contributing to purifying selection. However, the impact of pathogenic mtDNA mutations on the cellular hematopoietic landscape is not well understood. Here, we establish a multi-omics approach to quantify deletions in mtDNA alongside cell state features in single cells derived from Pearson syndrome patients. We resolve the interdependence between pathogenic mtDNA and lineage, including purifying selection against deletions in effector/memory CD8 T-cell populations and recent thymic emigrants and dynamics in other hematopoietic populations. Our mapping of lineage-specific purifying selection dynamics in primary cells from patients carrying pathogenic heteroplasmy provides a new perspective on recurrent clinical phenotypes in mitochondrial disorders, including cancer and infection, with potential broader relevance to age-related immune dysfunction.
]]></description>
<dc:creator>Lareau, C. A.</dc:creator>
<dc:creator>Dubois, S. M.</dc:creator>
<dc:creator>Buquicchio, F. A.</dc:creator>
<dc:creator>Hsieh, Y.-H.</dc:creator>
<dc:creator>Garg, K.</dc:creator>
<dc:creator>Kautz, P.</dc:creator>
<dc:creator>Nitsch, L.</dc:creator>
<dc:creator>Praktiknjo, S. D.</dc:creator>
<dc:creator>Maschmeyer, P.</dc:creator>
<dc:creator>Verboon, J. M.</dc:creator>
<dc:creator>Gutierrez, J. C.</dc:creator>
<dc:creator>Yin, Y.</dc:creator>
<dc:creator>Fiskin, E.</dc:creator>
<dc:creator>Luo, W.</dc:creator>
<dc:creator>Mimitou, E.</dc:creator>
<dc:creator>Muus, C.</dc:creator>
<dc:creator>Malhotra, R.</dc:creator>
<dc:creator>Parikh, S.</dc:creator>
<dc:creator>Fleming, M. D.</dc:creator>
<dc:creator>Oevermann, L.</dc:creator>
<dc:creator>Schulte, J.</dc:creator>
<dc:creator>Eckert, C.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Smibert, P.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Regev, A.</dc:creator>
<dc:creator>Sankaran, V. G.</dc:creator>
<dc:creator>Agarwal, S.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:date>2022-11-20</dc:date>
<dc:identifier>doi:10.1101/2022.11.20.517242</dc:identifier>
<dc:title><![CDATA[Single-cell multi-omics reveals dynamics of purifying selection of pathogenic mitochondrial DNA across human immune cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-11-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.23.537997v1?rss=1">
<title>
<![CDATA[
Codon affinity in mitochondrial DNA shapes evolutionary and somatic fitness 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.23.537997v1?rss=1"
</link>
<description><![CDATA[
Somatic variation contributes to biological heterogeneity by modulating cellular proclivity to differentiate, expand, adapt, or die. While large-scale sequencing efforts have revealed the foundational role of somatic variants in driving human tumor evolution, the contribution of mutations to cellular fitness in non-malignant contexts remains understudied. Here, we identify a mosaic synonymous variant (m.7076A>G) in the mitochondrial DNA (mtDNA) encoded cytochrome c-oxidase subunit 1 gene (MT-CO1, p.Gly391=), which was present at homoplasmy in 47% of immune cells from a healthy donor. Using single-cell multi-omics, we discover highly specific selection against the m.7076G mutant allele in the CD8+ effector memory T cell compartment in vivo, reminiscent of selection observed for pathogenic mtDNA alleles [1, 2] and indicative of lineage-specific metabolic requirements. While the wildtype m.7076A allele is translated via Watson-Crick-Franklin base-pairing, the anticodon diversity of the mitochondrial transfer RNA pool is limited, requiring wobble-dependent translation of the m.7076G mutant allele. Notably, mitochondrial ribosome profiling revealed altered codon-anticodon affinity at the wobble position as evidenced by stalled translation of the synonymous m.7076G mutant allele encoding for glycine. Generalizing this observation, we provide a new ontogeny of the 8,482 synonymous variants in the human mitochondrial genome that enables interpretation of functional mtDNA variation. Specifically, via inter- and intra-species evolutionary analyses, population-level complex trait associations, and the occurrence of germline and somatic mtDNA mutations from large-scale sequencing studies, we demonstrate that synonymous variation impacting codon:anticodon affinity is actively evolving across the entire mitochondrial genome and has broad functional and phenotypic effects.
In summary, our results introduce a new ontogeny for mitochondrial genetic variation and support a model where organismal principles can be discerned from somatic evolution via single-cell genomics.
]]></description>
<dc:creator>Lareau, C. A.</dc:creator>
<dc:creator>Yin, Y.</dc:creator>
<dc:creator>Gutierrez, J. C.</dc:creator>
<dc:creator>Dhindsa, R. S.</dc:creator>
<dc:creator>Gribling-Burrer, A.-S.</dc:creator>
<dc:creator>Hsieh, Y.-H.</dc:creator>
<dc:creator>Nitsch, L.</dc:creator>
<dc:creator>Buquicchio, F. A.</dc:creator>
<dc:creator>Abay, T.</dc:creator>
<dc:creator>Zielinski, S.</dc:creator>
<dc:creator>Stickels, R. R.</dc:creator>
<dc:creator>Ulirsch, J. C.</dc:creator>
<dc:creator>Yan, P.</dc:creator>
<dc:creator>Wang, F.</dc:creator>
<dc:creator>Miao, Z.</dc:creator>
<dc:creator>Sandor, K.</dc:creator>
<dc:creator>Daniel, B.</dc:creator>
<dc:creator>Liu, V.</dc:creator>
<dc:creator>Wang, Q.</dc:creator>
<dc:creator>Hu, F.</dc:creator>
<dc:creator>Smith, K. R.</dc:creator>
<dc:creator>Deevi, S. V. V.</dc:creator>
<dc:creator>Maschmeyer, P.</dc:creator>
<dc:creator>Petrovski, S.</dc:creator>
<dc:creator>Smyth, R. P.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Munschauer, M.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:date>2023-04-23</dc:date>
<dc:identifier>doi:10.1101/2023.04.23.537997</dc:identifier>
<dc:title><![CDATA[Codon affinity in mitochondrial DNA shapes evolutionary and somatic fitness]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.13.566919v1?rss=1">
<title>
<![CDATA[
Identifying genetic variants that influence the abundance of cell states in single-cell data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.13.566919v1?rss=1"
</link>
<description><![CDATA[
To understand genetic mechanisms driving disease, it is essential but difficult to map how risk alleles affect the composition of cells present in the body. Single-cell profiling quantifies granular information about tissues, but variant-associated cell states may reflect diverse combinations of the profiled cell features that are challenging to predefine. We introduce GeNA (Genotype-Neighborhood Associations), a statistical tool to identify cell state abundance quantitative trait loci (csaQTLs) in high-dimensional single-cell datasets. Instead of testing associations to predefined cell states, GeNA flexibly identifies the cell states whose abundance is most associated with genetic variants. In a genome-wide survey of scRNA-seq peripheral blood profiling from 969 individuals [1], GeNA identifies five independent loci associated with shifts in the relative abundance of immune cell states. For example, rs3003-T (p=1.96x10^-11) associates with increased abundance of NK cells expressing TNF-α response programs. This csaQTL colocalizes with increased risk for psoriasis, an autoimmune disease that responds to anti-TNF treatments. Flexibly characterizing csaQTLs for granular cell states may help illuminate how genetic background alters cellular composition to confer disease risk.
]]></description>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Reshef, Y.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Yazar, S.</dc:creator>
<dc:creator>Alquicira-Hernandez, J.</dc:creator>
<dc:creator>Valencia, C.</dc:creator>
<dc:creator>Lagattuta, K. A.</dc:creator>
<dc:creator>Mah-Som, A.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Powell, J. E.</dc:creator>
<dc:creator>Loh, P.-R.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2023-11-15</dc:date>
<dc:identifier>doi:10.1101/2023.11.13.566919</dc:identifier>
<dc:title><![CDATA[Identifying genetic variants that influence the abundance of cell states in single-cell data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.22.521678v1?rss=1">
<title>
<![CDATA[
Uncovering context-specific genetic-regulation of gene expression from single-cell RNA-sequencing using latent-factor models 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.22.521678v1?rss=1"
</link>
<description><![CDATA[
Genetic regulation of gene expression is a complex process, with genetic effects known to vary across cellular contexts such as cell types and environmental conditions. We developed SURGE, a method for unsupervised discovery of context-specific expression quantitative trait loci (eQTLs) from single-cell transcriptomic data. This allows discovery of the contexts or cell types modulating genetic regulation without prior knowledge. Applied to peripheral blood single-cell eQTL data, SURGE contexts capture continuous representations of distinct cell types and groupings of biologically related cell types. We demonstrate the disease-relevance of SURGE context-specific eQTLs using colocalization analysis and stratified LD-score regression.
]]></description>
<dc:creator>Strober, B. J.</dc:creator>
<dc:creator>Tayeb, K.</dc:creator>
<dc:creator>Popp, J.</dc:creator>
<dc:creator>Qi, G.</dc:creator>
<dc:creator>Gordon, M. G.</dc:creator>
<dc:creator>Perez, R.</dc:creator>
<dc:creator>Ye, C. J.</dc:creator>
<dc:creator>Battle, A.</dc:creator>
<dc:date>2022-12-23</dc:date>
<dc:identifier>doi:10.1101/2022.12.22.521678</dc:identifier>
<dc:title><![CDATA[Uncovering context-specific genetic-regulation of gene expression from single-cell RNA-sequencing using latent-factor models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.11.519973v1?rss=1">
<title>
<![CDATA[
Reimagining Gene-Environment Interaction Analysis for Human Complex Traits 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.11.519973v1?rss=1"
</link>
<description><![CDATA[
In this study, we introduce PIGEON, a novel statistical framework for quantifying and estimating polygenic gene-environment interaction (GxE) using a variance component analytical approach. Based on PIGEON, we outline the main objectives in GxE studies, demonstrate the flaws in existing GxE approaches, and introduce an innovative estimation procedure which only requires summary statistics as input. We demonstrate the statistical superiority of PIGEON through extensive theoretical and empirical analyses and showcase its performance in multiple analytic settings, including a quasi-experimental GxE study of health outcomes, gene-by-sex interaction for 530 traits, and gene-by-treatment interaction in a randomized clinical trial. Our results show that PIGEON provides an innovative solution to many long-standing challenges in GxE inference and may fundamentally reshape analytical strategies in future GxE studies.
]]></description>
<dc:creator>Miao, J.</dc:creator>
<dc:creator>Song, G.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Hu, J.</dc:creator>
<dc:creator>Wu, Y.</dc:creator>
<dc:creator>Basu, S.</dc:creator>
<dc:creator>Andrews, J. S.</dc:creator>
<dc:creator>Schaumberg, K.</dc:creator>
<dc:creator>Fletcher, J. M.</dc:creator>
<dc:creator>Schmitz, L. L.</dc:creator>
<dc:creator>Lu, Q.</dc:creator>
<dc:date>2022-12-14</dc:date>
<dc:identifier>doi:10.1101/2022.12.11.519973</dc:identifier>
<dc:title><![CDATA[Reimagining Gene-Environment Interaction Analysis for Human Complex Traits]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.12.520180v1?rss=1">
<title>
<![CDATA[
Leveraging a machine learning derived surrogate phenotype to improve power for genome-wide association studies of partially missing phenotypes in population biobanks 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.12.520180v1?rss=1"
</link>
<description><![CDATA[
Within population biobanks, genetic discovery for specialized phenotypes is often limited by incomplete ascertainment. Machine learning (ML) is increasingly used to impute missing phenotypes from surrogate information. However, imputing missing phenotypes can invalidate statistical inference when the imputation model is misspecified, and proxy analysis of the ML-phenotype can introduce spurious associations. To overcome these limitations, we introduce SynSurr, an approach that jointly analyzes a partially missing target phenotype with a "synthetic surrogate", its predicted value from an ML-model. SynSurr estimates the same genetic effect as standard genome-wide association studies (GWAS) of the target phenotype, but improves power provided the synthetic surrogate is correlated with the target. Unlike imputation or proxy analysis, SynSurr does not require that the synthetic surrogate is obtained from a correctly specified generative model. We perform extensive simulations and an ablation analysis to compare SynSurr with existing methods. We also apply SynSurr to empower GWAS of dual-energy x-ray absorptiometry traits within the UK Biobank, leveraging a synthetic surrogate composed of bioelectrical impedance and anthropometric traits.
]]></description>
<dc:creator>McCaw, Z. R.</dc:creator>
<dc:creator>Gao, J. R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Gronsbell, J.</dc:creator>
<dc:date>2022-12-14</dc:date>
<dc:identifier>doi:10.1101/2022.12.12.520180</dc:identifier>
<dc:title><![CDATA[Leveraging a machine learning derived surrogate phenotype to improve power for genome-wide association studies of partially missing phenotypes in population biobanks]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.25.559307v1?rss=1">
<title>
<![CDATA[
Ensembled best subset selection using summary statistics for polygenic risk prediction 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.25.559307v1?rss=1"
</link>
<description><![CDATA[
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2 penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
]]></description>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Mazumder, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2023-09-26</dc:date>
<dc:identifier>doi:10.1101/2023.09.25.559307</dc:identifier>
<dc:title><![CDATA[Ensembled best subset selection using summary statistics for polygenic risk prediction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.30.564764v1?rss=1">
<title>
<![CDATA[
A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.30.564764v1?rss=1"
</link>
<description><![CDATA[
Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.
]]></description>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Chen, H.</dc:creator>
<dc:creator>Selvaraj, M. S.</dc:creator>
<dc:creator>Van Buren, E.</dc:creator>
<dc:creator>Zhou, H.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Sun, R.</dc:creator>
<dc:creator>McCaw, Z. R.</dc:creator>
<dc:creator>Yu, Z.</dc:creator>
<dc:creator>Arnett, D. K.</dc:creator>
<dc:creator>Bis, J. C.</dc:creator>
<dc:creator>Blangero, J.</dc:creator>
<dc:creator>Boerwinkle, E.</dc:creator>
<dc:creator>Bowden, D. W.</dc:creator>
<dc:creator>Brody, J. A.</dc:creator>
<dc:creator>Cade, B. E.</dc:creator>
<dc:creator>Carson, A. P.</dc:creator>
<dc:creator>Carlson, J. C.</dc:creator>
<dc:creator>Chami, N.</dc:creator>
<dc:creator>Chen, Y.-D. I.</dc:creator>
<dc:creator>Curran, J. E.</dc:creator>
<dc:creator>de Vries, P. S.</dc:creator>
<dc:creator>Fornage, M.</dc:creator>
<dc:creator>Franceschini, N.</dc:creator>
<dc:creator>Freedman, B. I.</dc:creator>
<dc:creator>Gu, C.</dc:creator>
<dc:creator>Heard-Costa, N. L.</dc:creator>
<dc:creator>He, J.</dc:creator>
<dc:creator>Hou, L.</dc:creator>
<dc:creator>Hung, Y.-J.</dc:creator>
<dc:creator>Irvin, M. R.</dc:creator>
<dc:creator>Kaplan, R. C.</dc:creator>
<dc:creator>Kardia, S. L. R.</dc:creator>
<dc:creator>Kelly, T.</dc:creator>
<dc:creator>Konigsberg, I.</dc:creator>
<dc:creator>Kooperberg, C.</dc:creator>
<dc:creator>Kral, B. G.</dc:creator>
<dc:creator>Li, C.</dc:creator>
<dc:creator>Loos, R. J. F.</dc:creator>
<dc:creator>Mahaney, M. C.</dc:creator>
<dc:creator>Martin, L. W.</dc:creator>
<dc:creator>Mathias, R. A.</dc:creator>
<dc:creator>Minster, R. L.</dc:creator>
<dc:creator>Mitchell, B. D.</dc:creator>
<dc:date>2023-11-02</dc:date>
<dc:identifier>doi:10.1101/2023.10.30.564764</dc:identifier>
<dc:title><![CDATA[A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.10.555215v1?rss=1">
<title>
<![CDATA[
Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.10.555215v1?rss=1"
</link>
<description><![CDATA[
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
]]></description>
<dc:creator>Jiang, M.-Z.</dc:creator>
<dc:creator>Gaynor, S. M.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Van Buren, E.</dc:creator>
<dc:creator>Stilp, A.</dc:creator>
<dc:creator>Buth, E.</dc:creator>
<dc:creator>Wang, F. F.</dc:creator>
<dc:creator>Manansala, R.</dc:creator>
<dc:creator>Gogarten, S. M.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Polfus, L. M.</dc:creator>
<dc:creator>Salimi, S.</dc:creator>
<dc:creator>Bis, J. C.</dc:creator>
<dc:creator>Pankratz, N.</dc:creator>
<dc:creator>Yanek, L. R.</dc:creator>
<dc:creator>Durda, P.</dc:creator>
<dc:creator>Tracy, R. P.</dc:creator>
<dc:creator>Rich, S. S.</dc:creator>
<dc:creator>Rotter, J. I.</dc:creator>
<dc:creator>Mitchell, B. D.</dc:creator>
<dc:creator>Lewis, J. P.</dc:creator>
<dc:creator>Psaty, B. M.</dc:creator>
<dc:creator>Pratte, K. A.</dc:creator>
<dc:creator>Silverman, E. K.</dc:creator>
<dc:creator>Kaplan, R. C.</dc:creator>
<dc:creator>Avery, C.</dc:creator>
<dc:creator>North, K.</dc:creator>
<dc:creator>Mathias, R. A.</dc:creator>
<dc:creator>Faraday, N.</dc:creator>
<dc:creator>Lin, H.</dc:creator>
<dc:creator>Wang, B.</dc:creator>
<dc:creator>Carson, A. P.</dc:creator>
<dc:creator>Norwood, A. F.</dc:creator>
<dc:creator>Gibbs, R. A.</dc:creator>
<dc:creator>Kooperberg, C.</dc:creator>
<dc:creator>Lundin, J.</dc:creator>
<dc:creator>Peters, U.</dc:creator>
<dc:creator>Dupuis, J.</dc:creator>
<dc:creator>Hou, L.</dc:creator>
<dc:creator>Fornage, M.</dc:creator>
<dc:creator>Benjamin, E. J.</dc:creator>
<dc:creator>Reiner, A. P.</dc:creator>
<dc:creator>Bowler, R. P.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Auer, P. L.</dc:creator>
<dc:creator>Raf</dc:creator>
<dc:date>2023-09-12</dc:date>
<dc:identifier>doi:10.1101/2023.09.10.555215</dc:identifier>
<dc:title><![CDATA[Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.03.24.485519v1?rss=1">
<title>
<![CDATA[
Novel Methods for Multi-ancestry Polygenic Prediction and their Evaluations in 3.7 Million Individuals of Diverse Ancestry 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.03.24.485519v1?rss=1"
</link>
<description><![CDATA[
Polygenic risk scores (PRS) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRS using ancestry-specific GWAS summary statistics from multi-ancestry training samples, integrating clumping and thresholding, empirical Bayes and super learning. We evaluate CT-SLEB and nine alternative methods with large-scale simulated GWAS (~19 million common variants) and datasets from 23andMe Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across thirteen complex traits. Results demonstrate that CT-SLEB significantly improves PRS performance in non-European populations compared to simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offer insights into sample size requirements and SNP density effects on multi-ancestry risk prediction.
]]></description>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Zhan, J.</dc:creator>
<dc:creator>Jin, J.</dc:creator>
<dc:creator>Ahearn, T. U.</dc:creator>
<dc:creator>Yu, Z.</dc:creator>
<dc:creator>O'Connell, J.</dc:creator>
<dc:creator>Jiang, Y.</dc:creator>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>23andMe Research Team</dc:creator>
<dc:creator>Garcia-Closas, M.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:creator>Koelsch, B. L.</dc:creator>
<dc:creator>Chatterjee, N.</dc:creator>
<dc:date>2022-03-27</dc:date>
<dc:identifier>doi:10.1101/2022.03.24.485519</dc:identifier>
<dc:title><![CDATA[Novel Methods for Multi-ancestry Polygenic Prediction and their Evaluations in 3.7 Million Individuals of Diverse Ancestry]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-03-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.02.08.527759v1?rss=1">
<title>
<![CDATA[
Accurate and Efficient Estimation of Local Heritability using Summary Statistics and LD Matrix 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.02.08.527759v1?rss=1"
</link>
<description><![CDATA[
Existing SNP-heritability estimation methods that leverage GWAS summary statistics produce estimators that are less efficient than the restricted maximum likelihood (REML) estimator using individual-level data under linear mixed models (LMMs). Increasing the precision of a heritability estimator is particularly important for regional analyses, as local genetic variances tend to be small. We introduce a new estimator for local heritability, "HEELS", which attains statistical efficiency comparable to REML (i.e. relative efficiency greater than 92%) but only requires summary-level statistics: Z-scores from the marginal association tests plus the empirical LD matrix. HEELS significantly improves the statistical efficiency of the existing summary-statistics-based heritability estimators; for instance, HEELS produces heritability estimates that are more than 3-fold and 7-fold less variable than GRE and LDSC, respectively. Moreover, we introduce a unified framework to evaluate and compare the performance of different LD approximation strategies. We propose representing the empirical LD as the sum of a low-rank matrix and a banded matrix. This approximation not only reduces the storage and memory cost of using the LD matrix, but also improves the computational efficiency of the HEELS estimation. We demonstrate the statistical efficiency of HEELS and the advantages of our proposed LD approximation strategies both in simulations and through empirical analyses of the UK Biobank data.
]]></description>
<dc:creator>Li, H.</dc:creator>
<dc:creator>Mazumder, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2023-02-09</dc:date>
<dc:identifier>doi:10.1101/2023.02.08.527759</dc:identifier>
<dc:title><![CDATA[Accurate and Efficient Estimation of Local Heritability using Summary Statistics and LD Matrix]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-02-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.11.540401v1?rss=1">
<title>
<![CDATA[
De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.11.540401v1?rss=1"
</link>
<description><![CDATA[
Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD.
This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.
]]></description>
<dc:creator>Alexandari, A. M.</dc:creator>
<dc:creator>Horton, C. A.</dc:creator>
<dc:creator>Shrikumar, A.</dc:creator>
<dc:creator>Shah, N.</dc:creator>
<dc:creator>Li, E.</dc:creator>
<dc:creator>Weilert, M.</dc:creator>
<dc:creator>Pufall, M. A.</dc:creator>
<dc:creator>Zeitlinger, J.</dc:creator>
<dc:creator>Fordyce, P. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2023-05-11</dc:date>
<dc:identifier>doi:10.1101/2023.05.11.540401</dc:identifier>
<dc:title><![CDATA[De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.26.559662v1?rss=1">
<title>
<![CDATA[
5-hydroxymethylcytosines regulate gene expression as a passive DNA demethylation resisting epigenetic mark in proliferative somatic cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.26.559662v1?rss=1
</link>
<description><![CDATA[
Enzymatic erasure of DNA methylation in mammals involves iterative 5-methylcytosine (5mC) oxidation by the ten-eleven translocation (TET) family of DNA dioxygenase proteins. As the most abundant form of oxidized 5mC, the prevailing model considers 5-hydroxymethylcytosine (5hmC) as a key nexus in active DNA demethylation that can either indirectly facilitate replication-dependent depletion of 5mC by inhibiting maintenance DNA methylation machinery (UHRF1/DNMT1), or directly be iteratively oxidized to 5-formylcytosine (5fC) and 5-carboxycytosine (5caC) and restored to cytosine (C) through thymine DNA glycosylase (TDG)-mediated 5fC/5caC excision repair. In proliferative somatic cells, to what extent TET-dependent removal of 5mC entails indirect DNA demethylation via 5hmC-induced replication-dependent dilution or direct iterative conversion of 5hmC to 5fC/5caC is unclear. Here we leverage a catalytic processivity stalling variant of human TET1 (TET1.var: T1662E) to decouple the stepwise generation of 5hmC from subsequent 5fC/5caC generation, excision and repair. By using a CRISPR/dCas9-based epigenome-editing platform, we demonstrate that 5fC/5caC excision repair (by wild-type TET1, TET1.wt), but not 5hmC generation alone (by TET1.var), is requisite for robust restoration of unmodified cytosines and reversal of somatic silencing of the methylation-sensitive, germline-specific RHOXF2B gene promoter. Furthermore, integrated whole-genome multi-modal epigenetic sequencing reveals that hemi-hydroxymethylated CpG dyads predominantly resist replication-dependent depletion of 5mC on the opposing strand in TET1.var-expressing cells. Notably, TET1.var-mediated 5hmC generation is sufficient to induce similar levels of differential gene expression (compared to TET1.wt) without inducing major changes in unmodified cytosine profiles across the genome. 
Our study suggests 5hmC alone plays a limited role in driving replication-dependent DNA demethylation in the presence of functional DNMT1/UHRF1 mechanisms, but can regulate gene expression as a bona fide epigenetic mark in proliferative somatic cells.
]]></description>
<dc:creator>Wei, A.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Qiu, Q.</dc:creator>
<dc:creator>Fabyanic, E. B.</dc:creator>
<dc:creator>Hu, P.</dc:creator>
<dc:creator>Wu, H.</dc:creator>
<dc:date>2023-09-27</dc:date>
<dc:identifier>doi:10.1101/2023.09.26.559662</dc:identifier>
<dc:title><![CDATA[5-hydroxymethylcytosines regulate gene expression as a passive DNA demethylation resisting epigenetic mark in proliferative somatic cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.04.15.537037v1?rss=1">
<title>
<![CDATA[
A transient dermal niche and dual epidermal programs underlie sweat gland development 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.04.15.537037v1?rss=1
</link>
<description><![CDATA[
Eccrine glands are mammalian skin appendages indispensable for human thermoregulation. Like all skin-derived appendages, eccrine glands form from multipotent progenitors in the basal skin epidermis. It remains unclear how epidermal progenitors progressively specialize to specifically form eccrine glands, precluding efforts to regenerate these vital organs. Herein, we applied single nucleus transcriptomics to compare the expression content of wildtype, eccrine-forming mouse skin to that of mice harboring a skin-specific disruption of Engrailed 1 (En1), a transcription factor that promotes the formation of eccrine glands in both humans and mice. We identify two concurrent epidermal transcriptomes in the earliest eccrine anlagen: a predominant transcriptome that is shared with hair follicles, and a vastly underrepresented transcriptome that is En1-dependent and eccrine-specific. We demonstrate that differentiation of the eccrine anlage requires the induction of a transient and transcriptionally unique dermal niche that forms around each developing gland in humans and mice. Our study defines the transcriptional determinants underlying eccrine identity in the epidermis and uncovers the dermal niche required for eccrine developmental progression. By identifying these defining components of the eccrine developmental program, our findings set the stage for directed efforts to regenerate eccrine glands for comprehensive skin repair.
]]></description>
<dc:creator>Dingwall, H. L.</dc:creator>
<dc:creator>Tomizawa, R. R.</dc:creator>
<dc:creator>Aharoni, A.</dc:creator>
<dc:creator>Hu, P.</dc:creator>
<dc:creator>Qiu, Q.</dc:creator>
<dc:creator>Kokalari, B.</dc:creator>
<dc:creator>Martinez, S. M.</dc:creator>
<dc:creator>Donahue, J. C.</dc:creator>
<dc:creator>Aldea, D.</dc:creator>
<dc:creator>Mendoza, M.</dc:creator>
<dc:creator>Glass, I. A.</dc:creator>
<dc:creator>Birth Defects Research Laboratory</dc:creator>
<dc:creator>Wu, H.</dc:creator>
<dc:creator>Kamberov, Y. G.</dc:creator>
<dc:date>2023-04-17</dc:date>
<dc:identifier>doi:10.1101/2023.04.15.537037</dc:identifier>
<dc:title><![CDATA[A transient dermal niche and dual epidermal programs underlie sweat gland development]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-04-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.10.30.564796v1?rss=1">
<title>
<![CDATA[
Decoding Heterogenous Single-cell Perturbation Responses 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.10.30.564796v1?rss=1
</link>
<description><![CDATA[
Understanding diverse responses of individual cells to the same perturbation is central to many biological and biomedical problems. Current methods, however, neither precisely quantify the strength of perturbation responses nor, more importantly, reveal new biological insights from heterogeneity in responses. Here we introduce the perturbation-response score (PS), based on constrained quadratic optimization, to quantify diverse perturbation responses at a single-cell level. Applied to single-cell transcriptomes of large-scale genetic perturbation datasets (e.g., Perturb-seq), PS outperforms existing methods for quantifying partial gene perturbation responses. In addition, PS presents two major advances. First, PS enables large-scale, single-cell-resolution dosage analysis of perturbation, without the need to titrate perturbation strength. By analyzing the dose-response patterns of over 2,000 essential genes in Perturb-seq, we identify two distinct patterns, depending on whether a moderate reduction in their expression induces strong downstream expression alterations. Second, PS identifies intrinsic and extrinsic biological determinants of perturbation responses. We demonstrate the application of PS in contexts such as T cell stimulation, latent HIV-1 expression, and pancreatic cell differentiation. Notably, PS unveiled a previously unrecognized, cell-type-specific role of coiled-coil domain containing 6 (CCDC6) in guiding liver and pancreatic lineage decisions, where CCDC6 knockouts drive the endoderm cell differentiation towards liver lineage, rather than pancreatic lineage. The PS approach provides an innovative method for dose-to-function analysis and will enable new biological discoveries from single-cell perturbation datasets.

One sentence summary: We present a method to quantify diverse perturbation responses and discover novel biological insights in single-cell perturbation datasets.
]]></description>
<dc:creator>Song, B.</dc:creator>
<dc:creator>Liu, D.</dc:creator>
<dc:creator>Dai, W.</dc:creator>
<dc:creator>McMyn, N.</dc:creator>
<dc:creator>Wang, Q.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Krejci, A.</dc:creator>
<dc:creator>Vasilyev, A.</dc:creator>
<dc:creator>Untermoser, N.</dc:creator>
<dc:creator>Loregger, A.</dc:creator>
<dc:creator>Song, D.</dc:creator>
<dc:creator>Williams, B.</dc:creator>
<dc:creator>Rosen, B.</dc:creator>
<dc:creator>Cheng, X.</dc:creator>
<dc:creator>Chao, L.</dc:creator>
<dc:creator>Kale, H.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Diao, Y.</dc:creator>
<dc:creator>Bürckstümmer, T.</dc:creator>
<dc:creator>Siliciano, J. M.</dc:creator>
<dc:creator>Li, J. J.</dc:creator>
<dc:creator>Siliciano, R.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:creator>Li, W.</dc:creator>
<dc:date>2023-11-02</dc:date>
<dc:identifier>doi:10.1101/2023.10.30.564796</dc:identifier>
<dc:title><![CDATA[Decoding Heterogenous Single-cell Perturbation Responses]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.14.544990v1?rss=1">
<title>
<![CDATA[
Discovery of Competent Chromatin Regions in Human Embryonic Stem Cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.14.544990v1?rss=1
</link>
<description><![CDATA[
The mechanisms underlying the ability of embryonic stem cells (ESCs) to rapidly activate lineage-specific genes during differentiation remain largely unknown. Through multiple CRISPR-activation screens, we discovered human ESCs have pre-established transcriptionally competent chromatin regions (CCRs) that support lineage-specific gene expression at levels comparable to differentiated cells. CCRs reside in the same topological domains as their target genes. They lack typical enhancer-associated histone modifications but show enriched occupancy of pluripotent transcription factors, DNA demethylation factors, and histone deacetylases. TET1 and QSER1 protect CCRs from excessive DNA methylation, while HDAC1 family members prevent premature activation. This "push and pull" feature resembles bivalent domains at developmental gene promoters but involves distinct molecular mechanisms. Our study provides new insights into pluripotency regulation and cellular plasticity in development and disease.

One sentence summary: We report a class of distal regulatory regions distinct from enhancers that confer human embryonic stem cells with the competence to rapidly activate the expression of lineage-specific genes.
]]></description>
<dc:creator>Pulecio, J.</dc:creator>
<dc:creator>Tayyebi, Z.</dc:creator>
<dc:creator>Liu, D.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Damodaran, J. R.</dc:creator>
<dc:creator>Kaplan, S.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Murphy, D. J.</dc:creator>
<dc:creator>Rickert, R.</dc:creator>
<dc:creator>Shukla, A.</dc:creator>
<dc:creator>Zhong, A.</dc:creator>
<dc:creator>Gonzalez, F.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Li, W.</dc:creator>
<dc:creator>Zhou, T.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Leslie, C.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:date>2023-06-14</dc:date>
<dc:identifier>doi:10.1101/2023.06.14.544990</dc:identifier>
<dc:title><![CDATA[Discovery of Competent Chromatin Regions in Human Embryonic Stem Cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.03.539283v1?rss=1">
<title>
<![CDATA[
Parallel genome-scale CRISPR screens distinguish pluripotency and self-renewal 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.03.539283v1?rss=1
</link>
<description><![CDATA[
Pluripotent stem cells are defined by their self-renewal capacity, which is the ability of the stem cells to proliferate indefinitely while maintaining the pluripotent identity essential for their ability to differentiate into any somatic cell lineage. However, understanding the mechanisms that control stem cell fitness versus the pluripotent cell identity is challenging. To investigate the interplay between these two aspects of pluripotency, we performed four parallel genome-scale CRISPR-Cas9 loss-of-function screens interrogating stem cell fitness in hPSC self-renewal conditions, and the dissolution of the primed pluripotency identity during early differentiation. Comparative analyses led to the discovery of genes with distinct roles in pluripotency regulation, including mitochondrial and metabolism regulators crucial for stem cell fitness, and chromatin regulators that control pluripotent identity during early differentiation. We further discovered a core set of factors that control both stem cell fitness and pluripotent identity, including a network of chromatin factors that safeguard pluripotency. Our unbiased and systematic screening and comparative analyses disentangle two interconnected aspects of pluripotency, provide rich datasets for exploring pluripotent cell identity versus cell fitness, and offer a valuable model for categorizing gene function in broad biological contexts.
]]></description>
<dc:creator>Rosen, B. P.</dc:creator>
<dc:creator>Li, Q. V.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Liu, D.</dc:creator>
<dc:creator>Yang, D.</dc:creator>
<dc:creator>Graff, S.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Verma, N.</dc:creator>
<dc:creator>Damodaran, J. R.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Sidoli, S.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:date>2023-05-03</dc:date>
<dc:identifier>doi:10.1101/2023.05.03.539283</dc:identifier>
<dc:title><![CDATA[Parallel genome-scale CRISPR screens distinguish pluripotency and self-renewal]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.08.03.551876v1?rss=1">
<title>
<![CDATA[
Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.08.03.551876v1?rss=1
</link>
<description><![CDATA[
Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq-like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional mutations (non-WT-like) clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss-of-function variants but had gene expression profiles indicative of response to neural growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
]]></description>
<dc:creator>Ozturk, K.</dc:creator>
<dc:creator>Panwala, R.</dc:creator>
<dc:creator>Sheen, J.</dc:creator>
<dc:creator>Ford, K.</dc:creator>
<dc:creator>Payne, N.</dc:creator>
<dc:creator>Zhang, D.-E.</dc:creator>
<dc:creator>Hutter, S.</dc:creator>
<dc:creator>Haferlach, T.</dc:creator>
<dc:creator>Ideker, T.</dc:creator>
<dc:creator>Mali, P.</dc:creator>
<dc:creator>Carter, H.</dc:creator>
<dc:date>2023-08-04</dc:date>
<dc:identifier>doi:10.1101/2023.08.03.551876</dc:identifier>
<dc:title><![CDATA[Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-08-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.19.521116v1?rss=1">
<title>
<![CDATA[
Universal chromatin state annotation of the mouse genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.19.521116v1?rss=1
</link>
<description><![CDATA[
Genome-wide chromatin states learned from integrating genome-wide maps of multiple epigenetic marks within the same cell type have been widely used to generate genome annotations of individual cell types. An alternative strategy based on stacked modeling can provide a single universal chromatin state annotation based jointly on data from many cell types. In human, such an approach was recently demonstrated and the resulting chromatin state annotation, denoted full-stack, was shown to have complementary advantages to per-cell-type annotations. However, an analogous annotation has not been previously available in mouse. Here, we produce a chromatin state annotation for mouse based on 901 datasets assaying 14 chromatin marks in 26 different cell or tissue types. To characterize each chromatin state, we relate the states to other external annotations and compare them to analogously defined states in human. We expect the full-stack chromatin state annotation for mouse will be a useful resource for studying the genome of this key mammalian model organism.
]]></description>
<dc:creator>Vu, H. T.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2022-12-20</dc:date>
<dc:identifier>doi:10.1101/2022.12.19.521116</dc:identifier>
<dc:title><![CDATA[Universal chromatin state annotation of the mouse genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.14.549056v1?rss=1">
<title>
<![CDATA[
Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.14.549056v1?rss=1
</link>
<description><![CDATA[
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
]]></description>
<dc:creator>Dincer, T. U.</dc:creator>
<dc:creator>Ernst, J.</dc:creator>
<dc:date>2023-07-15</dc:date>
<dc:identifier>doi:10.1101/2023.07.14.549056</dc:identifier>
<dc:title><![CDATA[Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.27.550836v1?rss=1">
<title>
<![CDATA[
ChromaFold predicts the 3D contact map from single-cell chromatin accessibility 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.27.550836v1?rss=1
</link>
<description><![CDATA[
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
]]></description>
<dc:creator>Gao, V. R.</dc:creator>
<dc:creator>Yang, R.</dc:creator>
<dc:creator>Das, A.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Luo, H.</dc:creator>
<dc:creator>McNally, D. R.</dc:creator>
<dc:creator>Karagiannidis, I.</dc:creator>
<dc:creator>Rivas, M. A.</dc:creator>
<dc:creator>Wang, Z.-m.</dc:creator>
<dc:creator>Barisic, D.</dc:creator>
<dc:creator>Karbalayghareh, A.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Zhan, Y.</dc:creator>
<dc:creator>Chin, C. R.</dc:creator>
<dc:creator>Noble, W. S.</dc:creator>
<dc:creator>Bilmes, J. A.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Kharas, M.</dc:creator>
<dc:creator>Beguelin, W.</dc:creator>
<dc:creator>Viny, A. D.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:creator>Rudensky, A.</dc:creator>
<dc:creator>Melnick, A.</dc:creator>
<dc:creator>Leslie, C. S.</dc:creator>
<dc:date>2023-07-28</dc:date>
<dc:identifier>doi:10.1101/2023.07.27.550836</dc:identifier>
<dc:title><![CDATA[ChromaFold predicts the 3D contact map from single-cell chromatin accessibility]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.03.20.533521v1?rss=1">
<title>
<![CDATA[
Flexible parsing and preprocessing of technical sequences with splitcode 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.03.20.533521v1?rss=1
</link>
<description><![CDATA[
Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. We present a tool called splitcode, that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays.

Availability and Implementation: The splitcode program is free, open source, and available for download at http://github.com/pachterlab/splitcode.
]]></description>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-03-23</dc:date>
<dc:identifier>doi:10.1101/2023.03.20.533521</dc:identifier>
<dc:title><![CDATA[Flexible parsing and preprocessing of technical sequences with splitcode]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-03-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.14.543267v1?rss=1">
<title>
<![CDATA[
Universal preprocessing of single-cell genomics data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.14.543267v1?rss=1
</link>
<description><![CDATA[
We describe a workflow for preprocessing a wide variety of single-cell genomics data types. The approach is based on parsing of machine-readable seqspec assay specifications to customize inputs for kb-python, which uses kallisto and bustools to catalog reads, error correct barcodes, and count reads. The universal preprocessing method is implemented in the Python package cellatlas that is available for download at: https://github.com/cellatlas/cellatlas/.
]]></description>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-09-15</dc:date>
<dc:identifier>doi:10.1101/2023.09.14.543267</dc:identifier>
<dc:title><![CDATA[Universal preprocessing of single-cell genomics data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.11.21.568164v1?rss=1">
<title>
<![CDATA[
kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.11.21.568164v1?rss=1
</link>
<description><![CDATA[
The term "RNA-seq" refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, from single cells, or from single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.
]]></description>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Min, K. H.</dc:creator>
<dc:creator>Hjörleifsson, K. E.</dc:creator>
<dc:creator>Luebbert, L.</dc:creator>
<dc:creator>Holley, G.</dc:creator>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Gustafsson, J.</dc:creator>
<dc:creator>Bray, N. L.</dc:creator>
<dc:creator>Pimentel, H.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Melsted, P.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-11-22</dc:date>
<dc:identifier>doi:10.1101/2023.11.21.568164</dc:identifier>
<dc:title><![CDATA[kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-11-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.12.08.471788v1?rss=1">
<title>
<![CDATA[
Efficient pre-processing of Single-cell ATAC-seq data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.12.08.471788v1?rss=1
</link>
<description><![CDATA[
Single-cell and single-nucleus genomics assays are becoming increasingly complex, with multiple measurements of distinct modalities performed concurrently resulting in "multimodal" readouts. While multimodal single-cell and single-nucleus genomics offers the potential to better understand how distinct cellular processes are coordinated, there can be technical and cost tradeoffs associated with increasing the number of measurement modes. To assess some of the tradeoffs inherent in multimodal assays, we have developed snATAK for preprocessing sequencing-based high-throughput assays that measure single-nucleus chromatin accessibility. Coupled with kallisto bustools for single-nucleus RNA-seq preprocessing, the snATAK workflow can be used for uniform preprocessing of 10x Genomics Multiome and single-nucleus ATAC-seq, SHARE-seq, ISSAAC-seq, spatial ATAC-seq and other chromatin-related assays. Using snATAK, we are able to perform cross-platform comparisons and quantify some of the tradeoffs between Multiome and unregistered single-nucleus RNA-seq/ATAC-seq experiments. We also show that snATAK can be used to assess allele concordance between paired RNA-seq and ATAC-seq. snATAK is available at https://github.com/pachterlab/snATAK/.
]]></description>
<dc:creator>Gao, F.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2021-12-10</dc:date>
<dc:identifier>doi:10.1101/2021.12.08.471788</dc:identifier>
<dc:title><![CDATA[Efficient pre-processing of Single-cell ATAC-seq data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-12-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.17.558131v1?rss=1">
<title>
<![CDATA[
Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.17.558131v1?rss=1
</link>
<description><![CDATA[
Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or clusters present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for clusters through the governing parameters of cellular processes.
]]></description>
<dc:creator>Chari, T.</dc:creator>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-09-17</dc:date>
<dc:identifier>doi:10.1101/2023.09.17.558131</dc:identifier>
<dc:title><![CDATA[Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.07.20.549945v1?rss=1">
<title>
<![CDATA[
Voyager: exploratory single-cell genomics data analysis with geospatial statistics 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.07.20.549945v1?rss=1
</link>
<description><![CDATA[
Exploratory spatial data analysis (ESDA) can be a powerful approach to understanding single-cell genomics datasets, but it is not yet part of standard data analysis workflows. In particular, geospatial analyses, which have been developed and refined for decades, have yet to be fully adapted and applied to spatial single-cell analysis. We introduce the Voyager platform, which systematically brings the geospatial ESDA tradition to (spatial) -omics, with local, bivariate, and multivariate spatial methods not yet commonly applied to spatial -omics, united by a uniform user interface. Using Voyager, we showcase biological insights that can be derived with its methods, such as biologically relevant negative spatial autocorrelation. Underlying Voyager is the SpatialFeatureExperiment data structure, which combines Simple Feature with SingleCellExperiment and AnnData to represent and operate on geometries bundled with gene expression data. Voyager has comprehensive tutorials demonstrating ESDA built on GitHub Actions to ensure reproducibility and scalability, using data from popular commercial technologies. Voyager is implemented in both R/Bioconductor and Python/PyPI, and features compatibility tests to ensure that both implementations return consistent results.
]]></description>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Einarsson, P. H.</dc:creator>
<dc:creator>Jackson, K. C.</dc:creator>
<dc:creator>Luebbert, L.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Antonsson, S. E.</dc:creator>
<dc:creator>Melsted, P.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-07-22</dc:date>
<dc:identifier>doi:10.1101/2023.07.20.549945</dc:identifier>
<dc:title><![CDATA[Voyager: exploratory single-cell genomics data analysis with geospatial statistics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-07-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.01.13.523995v1?rss=1">
<title>
<![CDATA[
Mechanistic modeling with a variational autoencoder for multimodal single-cell RNA sequencing data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.01.13.523995v1?rss=1
</link>
<description><![CDATA[
We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrate bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.
]]></description>
<dc:creator>Carilli, M. T.</dc:creator>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Choi, Y.</dc:creator>
<dc:creator>Chari, T.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-01-14</dc:date>
<dc:identifier>doi:10.1101/2023.01.13.523995</dc:identifier>
<dc:title><![CDATA[Mechanistic modeling with a variational autoencoder for multimodal single-cell RNA sequencing data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-01-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.06.11.495771v1?rss=1">
<title>
<![CDATA[
Monod: mechanistic analysis of single-cell RNA sequencing count data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.06.11.495771v1?rss=1
</link>
<description><![CDATA[
Single-cell RNA sequencing analysis centers on illuminating cell diversity and understanding the transcriptional mechanisms underlying cellular function. These datasets are large, noisy, and complex. Current analyses prioritize noise removal and dimensionality reduction to tackle these challenges and extract biological insight. We propose an alternative, physical approach to leverage the stochasticity, size, and multimodal nature of these data to explicitly distinguish their biological and technical facets while revealing the underlying regulatory processes. With the Python package Monod, we demonstrate how nascent and mature RNA counts, present in most published datasets, can be meaningfully "integrated" under biophysical models of transcription. By utilizing variation in these modalities, we can identify transcriptional modulation not discernible through changes in average gene expression, quantitatively compare mechanistic hypotheses of gene regulation, analyze transcriptional data from different technologies within a common framework, and minimize the use of opaque or distortive normalization and transformation techniques.
]]></description>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2022-06-12</dc:date>
<dc:identifier>doi:10.1101/2022.06.11.495771</dc:identifier>
<dc:title><![CDATA[Monod: mechanistic analysis of single-cell RNA sequencing count data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-06-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.05.17.541250v1?rss=1">
<title>
<![CDATA[
Studying stochastic systems biology of the cell with single-cell genomics data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.05.17.541250v1?rss=1
</link>
<description><![CDATA[
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
]]></description>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Vastola, J. J.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-05-18</dc:date>
<dc:identifier>doi:10.1101/2023.05.17.541250</dc:identifier>
<dc:title><![CDATA[Studying stochastic systems biology of the cell with single-cell genomics data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-05-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.26.591412v1?rss=1">
<title>
<![CDATA[
CRISPR Screening Uncovers a Long-Range Enhancer for ONECUT1 in Pancreatic Differentiation and Links a Diabetes Risk Variant 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.26.591412v1?rss=1
</link>
<description><![CDATA[
Functional enhancer annotation is a valuable first step for understanding tissue-specific transcriptional regulation and prioritizing disease-associated non-coding variants for investigation. However, unbiased enhancer discovery in physiologically relevant contexts remains a major challenge. To discover regulatory elements pertinent to diabetes, we conducted a CRISPR interference screen in the human pluripotent stem cell (hPSC) pancreatic differentiation system. Among the enhancers uncovered, we focused on a long-range enhancer ~664 kb from the ONECUT1 promoter, since coding mutations in ONECUT1 cause pancreatic hypoplasia and neonatal diabetes. Homozygous enhancer deletion in hPSCs was associated with a near-complete loss of ONECUT1 gene expression and compromised pancreatic differentiation. This enhancer contains a confidently fine-mapped type 2 diabetes-associated variant (rs528350911) which disrupts a GATA motif. Introduction of the risk variant into hPSCs revealed substantially reduced binding of key pancreatic transcription factors (GATA4, GATA6 and FOXA2) on the edited allele, accompanied by a slight reduction of ONECUT1 transcription, supporting a causal role for this risk variant in metabolic disease. This work expands our knowledge about transcriptional regulation in pancreatic development through the characterization of a long-range enhancer and highlights the utility of enhancer discovery in disease-relevant settings for understanding monogenic and complex disease.
]]></description>
<dc:creator>Kaplan, S. J.</dc:creator>
<dc:creator>Wong, W.</dc:creator>
<dc:creator>Yan, J.</dc:creator>
<dc:creator>Pulecio, J.</dc:creator>
<dc:creator>Cho, H.</dc:creator>
<dc:creator>Leslie-Iyer, J.</dc:creator>
<dc:creator>Kazakov, J.</dc:creator>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Murphy, D.</dc:creator>
<dc:creator>Luo, R.</dc:creator>
<dc:creator>Dey, K. K.</dc:creator>
<dc:creator>Apostolou, E.</dc:creator>
<dc:creator>Lesie, C. S.</dc:creator>
<dc:creator>Huangfu, D.</dc:creator>
<dc:date>2024-04-29</dc:date>
<dc:identifier>doi:10.1101/2024.04.26.591412</dc:identifier>
<dc:title><![CDATA[CRISPR Screening Uncovers a Long-Range Enhancer for ONECUT1 in Pancreatic Differentiation and Links a Diabetes Risk Variant]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.22.590634v1?rss=1">
<title>
<![CDATA[
Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.22.590634v1?rss=1
</link>
<description><![CDATA[
Genetic studies find hundreds of thousands of noncoding variants associated with psychiatric disorders. Massively parallel reporter assays (MPRAs) and in vivo transgenic mouse assays can be used to assay the impact of these variants. However, the relevance of MPRAs to in vivo function is unknown and transgenic assays suffer from low throughput. Here, we studied the utility of combining the two assays to study the impact of non-coding variants. We carried out an MPRA on over 50,000 sequences derived from enhancers validated in transgenic mouse assays and from multiple fetal neuronal ATAC-seq datasets. We also tested over 20,000 variants, including synthetic mutations in highly active neuronal enhancers and 177 common variants associated with psychiatric disorders. Variants with a high impact on MPRA activity were further tested in mice. We found a strong and specific correlation between MPRA and mouse neuronal enhancer activity including changes in neuronal enhancer activity in mouse embryos for variants with strong MPRA effects. Mouse assays also revealed pleiotropic variant effects that could not be observed in MPRA. Our work provides a large catalog of functional neuronal enhancers and variant effects and highlights the effectiveness of combining MPRAs and mouse transgenic assays.
]]></description>
<dc:creator>Kosicki, M.</dc:creator>
<dc:creator>Cintron, D. L.</dc:creator>
<dc:creator>Page, N. F.</dc:creator>
<dc:creator>Georgakopoulos-Soares, I.</dc:creator>
<dc:creator>Akiyama, J. A.</dc:creator>
<dc:creator>Plajzer-Frick, I.</dc:creator>
<dc:creator>Novak, C. S.</dc:creator>
<dc:creator>Kato, M.</dc:creator>
<dc:creator>Hunter, R. D.</dc:creator>
<dc:creator>von Maydell, K.</dc:creator>
<dc:creator>Barton, S.</dc:creator>
<dc:creator>Godfrey, P.</dc:creator>
<dc:creator>Beckman, E.</dc:creator>
<dc:creator>Sanders, S. J.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2024-04-23</dc:date>
<dc:identifier>doi:10.1101/2024.04.22.590634</dc:identifier>
<dc:title><![CDATA[Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.16.589814v1?rss=1">
<title>
<![CDATA[
Massively parallel jumping assay decodes Alu retrotransposition activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.16.589814v1?rss=1
</link>
<description><![CDATA[
The human genome contains millions of retrotransposons, several of which could become active due to somatic mutations having phenotypic consequences, including disease. However, it is not thoroughly understood how nucleotide changes in retrotransposons affect their jumping activity. Here, we developed a novel massively parallel jumping assay (MPJA) that can test the jumping potential of thousands of transposons en masse. We generated a nucleotide variant library of four selected Alu retrotransposons containing 165,087 different haplotypes and tested them for their jumping ability using MPJA. We found 66,821 unique jumping haplotypes, allowing us to pinpoint domains and variants vital for transposition. Mapping these variants to the Alu-RNA secondary structure revealed stem-loop features that contribute to jumping potential. Combined, our work provides a novel high-throughput assay that assesses the ability of retrotransposons to jump and identifies nucleotide changes that have the potential to reactivate them in the human genome.
]]></description>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Matharu, N.</dc:creator>
<dc:creator>Zhao, J.</dc:creator>
<dc:creator>Sohota, A.</dc:creator>
<dc:creator>Deng, L.</dc:creator>
<dc:creator>Hung, Y.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Sims, J.</dc:creator>
<dc:creator>Rattanasopha, S.</dc:creator>
<dc:creator>Meyer, J.</dc:creator>
<dc:creator>Carbone, L.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:date>2024-04-19</dc:date>
<dc:identifier>doi:10.1101/2024.04.16.589814</dc:identifier>
<dc:title><![CDATA[Massively parallel jumping assay decodes Alu retrotransposition activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.08.602569v1?rss=1">
<title>
<![CDATA[
Smooth muscle expression of RNA editing enzyme ADAR1 controls vascular integrity and progression of atherosclerosis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.08.602569v1?rss=1
</link>
<description><![CDATA[
Mapping the genomic architecture of complex disease has been predicated on the understanding that genetic variants influence disease risk through modifying gene expression. However, recent discoveries have revealed that a significant burden of disease heritability in common autoinflammatory disorders and coronary artery disease (CAD) is mediated through genetic variation modifying post-transcriptional modification of RNA through adenosine-to-inosine (A-to-I) RNA editing. This common RNA modification is catalyzed by ADAR enzymes, where ADAR1 edits specific immunogenic double-stranded RNA (dsRNA) to prevent activation of the dsRNA sensor MDA5 (IFIH1) and stimulation of an interferon-stimulated gene (ISG) response. Multiple lines of human genetic data indicate impaired RNA editing and increased dsRNA sensing by MDA5 to be an important mechanism of CAD risk. Here, we provide a crucial link between observations in human genetics and mechanistic cell biology leading to progression of CAD. Through analysis of human atherosclerotic plaque and culture of human coronary artery vascular smooth muscle cells (SMCs), we implicate the SMC as having a distinct requirement for RNA editing and show that MDA5 activation regulates SMC phenotypic modulation. Through generation of a conditional SMC-specific Adar1 deletion mouse model on a pro-atherosclerosis background with additional constitutive deletion of MDA5 (Ifih1), and with incorporation of single-cell RNA sequencing cellular profiling, we further show that Adar1 controls SMC phenotypic state by regulating Mda5 activation, is required to maintain vascular integrity, and controls progression of atherosclerosis and vascular calcification. Finally, we further corroborate our findings in a large human carotid endarterectomy dataset (Athero-Express) where we show that ISG activation is strongly associated with decreased plaque stability, increased SMC phenotypic modulation, and increased plaque calcification.
Through this work, we describe a fundamental mechanism of CAD, where cell type and context specific RNA editing and sensing of dsRNA mediates disease progression, bridging our understanding of human genetics and disease causality.

One Sentence Summary: Smooth muscle expression of RNA editing enzyme ADAR1 regulates activation of double-stranded RNA sensor MDA5 in a novel mechanism of atherosclerosis.
]]></description>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Monteiro, J. P.</dc:creator>
<dc:creator>Guo, H.</dc:creator>
<dc:creator>Galls, D.</dc:creator>
<dc:creator>Gu, W.</dc:creator>
<dc:creator>Cheng, P. P.</dc:creator>
<dc:creator>Ramste, M.</dc:creator>
<dc:creator>Li, D. Y.</dc:creator>
<dc:creator>Palmisano, B. T.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Worssam, M. D.</dc:creator>
<dc:creator>Zhao, Q.</dc:creator>
<dc:creator>Bhate, A.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Li, J. B.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2024-07-11</dc:date>
<dc:identifier>doi:10.1101/2024.07.08.602569</dc:identifier>
<dc:title><![CDATA[Smooth muscle expression of RNA editing enzyme ADAR1 controls vascular integrity and progression of atherosclerosis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.12.603288v1?rss=1">
<title>
<![CDATA[
Cohesin-mediated 3D contacts tune enhancer-promoter regulation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.12.603288v1?rss=1
</link>
<description><![CDATA[
Enhancers are key drivers of gene regulation thought to act via 3D physical interactions with the promoters of their target genes. However, genome-wide depletions of architectural proteins such as cohesin result in only limited changes in gene expression, despite a loss of contact domains and loops. Consequently, the role of cohesin and 3D contacts in enhancer function remains debated. Here, we developed CRISPRi of regulatory elements upon degron operation (CRUDO), a novel approach to measure how changes in contact frequency impact enhancer effects on target genes by perturbing enhancers with CRISPRi and measuring gene expression in the presence or absence of cohesin. We systematically perturbed all 1,039 candidate enhancers near five cohesin-dependent genes and identified 34 enhancer-gene regulatory interactions. Of 26 regulatory interactions with sufficient statistical power to evaluate cohesin dependence, 18 show cohesin-dependent effects. A decrease in enhancer-promoter contact frequency upon removal of cohesin is frequently accompanied by a decrease in the regulatory effect of the enhancer on gene expression, consistent with a contact-based model for enhancer function. However, changes in contact frequency and regulatory effects on gene expression vary as a function of distance, with distal enhancers (e.g., >50Kb) experiencing much larger changes than proximal ones (e.g., <50Kb). Because most enhancers are located close to their target genes, these observations can explain how only a small subset of genes -- those with strong distal enhancers -- are sensitive to cohesin. Together, our results illuminate how 3D contacts, influenced by both cohesin and genomic distance, tune enhancer effects on gene expression.
]]></description>
<dc:creator>Guckelberger, P.</dc:creator>
<dc:creator>Doughty, B. R.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Rao, S. S. P.</dc:creator>
<dc:creator>Tan, Y.</dc:creator>
<dc:creator>Cai, X. S.</dc:creator>
<dc:creator>Fulco, C. P.</dc:creator>
<dc:creator>Nasser, J.</dc:creator>
<dc:creator>Mualim, K. S.</dc:creator>
<dc:creator>Bergman, D. T.</dc:creator>
<dc:creator>Ray, J.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Munger, C. J.</dc:creator>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Tan, A. S.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Lander, E. S.</dc:creator>
<dc:creator>Meissner, A.</dc:creator>
<dc:creator>Lieberman Aiden, E.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:date>2024-07-12</dc:date>
<dc:identifier>doi:10.1101/2024.07.12.603288</dc:identifier>
<dc:title><![CDATA[Cohesin-mediated 3D contacts tune enhancer-promoter regulation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.10.612293v1?rss=1">
<title>
<![CDATA[
A cell and transcriptome atlas of the human arterial vasculature 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.10.612293v1?rss=1
</link>
<description><![CDATA[
Contiguous arterial segments show different propensities for different vascular pathologies, yet mechanisms explaining these fundamental differences remain unknown. We sought to build a transcriptomic, cellular, and spatial atlas of human arterial cells across multiple different arterial segments to understand these underlying differences.

Analysis of multiple isogenic arterial segments from healthy donors reveals a significant stereotyped pattern of cell type-specific segmental heterogeneity in healthy arteries. Combining single cell analysis with spatial transcriptomic data reveals cellular heterogeneity not captured by commonly used cell-type marker genes. Determinants of arterial transcriptomic identities are predominantly encoded in fibroblasts and smooth muscle cells (SMCs), and their differentially expressed genes are particularly enriched for different vascular disease-associated genetic risk loci and risk genes. Adventitial fibroblast-specific heterogeneity in gene expression coincides with a disproportionately large number of vascular disease genetic signals, suggesting a previously unrecognized role for this cell type in disease risk. Adult arterial cells from different segments cluster not by anatomical proximity, but by embryonic origin. Global regulon analysis of disease-related, segment-specific gene expression programs in fibroblasts and SMCs shows enrichment for binding sites of transcription factors that are developmental master regulators whose expression persists into adulthood, suggesting an important functional role of the same developmental master regulators in adult gene expression and disease. Lastly, non-coding transcriptomes across arterial cells contain extensive variation in lncRNAs expressed in cell type- and segment-specific patterns, rivaling heterogeneity in protein-coding transcriptomes. Differentially expressed lncRNAs demonstrate enrichment for non-coding genetic signals for vascular diseases, suggesting a potential global role of segment-specific lncRNAs in regulating inherited human vascular disease risk.
]]></description>
<dc:creator>Zhao, Q.</dc:creator>
<dc:creator>Pedroza, A.</dc:creator>
<dc:creator>Sharma, D.</dc:creator>
<dc:creator>Gu, W.</dc:creator>
<dc:creator>Dalal, A.</dc:creator>
<dc:creator>Weldy, C.</dc:creator>
<dc:creator>Jackson, W.</dc:creator>
<dc:creator>Li, D. Y.</dc:creator>
<dc:creator>Ryan, Y.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Shad, R.</dc:creator>
<dc:creator>Palmisano, B. T.</dc:creator>
<dc:creator>Monteiro, J. P.</dc:creator>
<dc:creator>Worssam, M.</dc:creator>
<dc:creator>Berezwitz, A.</dc:creator>
<dc:creator>Iyer, M.</dc:creator>
<dc:creator>Shi, H.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Limbu, L.</dc:creator>
<dc:creator>Kim, J. B.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Fischbein, M.</dc:creator>
<dc:creator>Wirka, R.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:creator>Cheng, P.</dc:creator>
<dc:date>2024-09-10</dc:date>
<dc:identifier>doi:10.1101/2024.09.10.612293</dc:identifier>
<dc:title><![CDATA[A cell and transcriptome atlas of the human arterial vasculature]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.13.579700v1?rss=1">
<title>
<![CDATA[
A missense variant effect map for the human tumour suppressor protein CHK2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.13.579700v1?rss=1
</link>
<description><![CDATA[
The tumour suppressor CHEK2 encodes the serine/threonine protein kinase CHK2 which, upon DNA damage, is important for pausing the cell cycle, initiating DNA repair and inducing apoptosis. CHK2 phosphorylation of the tumour suppressor BRCA1 is also important for mitotic spindle assembly and chromosomal stability. Consistent with its cell cycle checkpoint role, both germline and somatic variants in CHEK2 have been linked to breast and multiple other cancer types. Over 90% of clinical germline CHEK2 missense variants are classified as variants of uncertain significance, complicating diagnosis of CHK2-dependent cancer. We therefore sought to test the functional impact of all possible missense variants in CHK2. Using a scalable multiplexed assay based on the ability of human CHK2 to complement DNA sensitivity of a S. cerevisiae lacking its ortholog RAD53, we generated a systematic missense variant effect map for CHEK2 missense variation. Map scores reflect known biochemical features of CHK2 and exhibit good performance in separating pathogenic from benign clinical missense variants. Thus, the missense variant effect map for CHK2 offers value in understanding both known and yet-to-be-observed CHK2 variants.
]]></description>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Zimmerman, D. I.</dc:creator>
<dc:creator>Jiang, R.</dc:creator>
<dc:creator>Nguyen, M.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Li, R.</dc:creator>
<dc:creator>Gavac, M.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Sun, S.</dc:creator>
<dc:creator>Boonen, R. A.</dc:creator>
<dc:creator>Dines, J. N.</dc:creator>
<dc:creator>Wahl, A.</dc:creator>
<dc:creator>Reuter, J.</dc:creator>
<dc:creator>Johnson, B.</dc:creator>
<dc:creator>Fowler, D.</dc:creator>
<dc:creator>van Attikum, H.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2024-02-15</dc:date>
<dc:identifier>doi:10.1101/2024.02.13.579700</dc:identifier>
<dc:title><![CDATA[A missense variant effect map for the human tumour suppressor protein CHK2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.02.597061v1?rss=1">
<title>
<![CDATA[
BIT: Bayesian Identification of Transcriptional Regulators 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.02.597061v1?rss=1
</link>
<description><![CDATA[
Transcriptional regulators (TRs) are master controllers of gene expression and play a critical role in both normal tissue development and disease progression. However, existing computational methods for identification of TRs regulating specific biological processes have significant limitations, such as relying on distance on a linear chromosome or binding motifs that have low specificity. Many also use statistical tests in ways that lack interpretability and rigorous confidence measures. We introduce BIT, a novel Bayesian hierarchical model for in-silico TR identification. Leveraging a comprehensive library of TR ChIP-seq data, BIT offers a fully integrated Bayesian approach to assess genome-wide consistency between user-provided epigenomic profiling data and the TR binding library, enabling the identification of critical TRs while quantifying uncertainty. It avoids estimation and inference in a sequential manner or numerous isolated statistical tests, thereby enhancing accuracy and interpretability. BIT successfully identified critical TRs in perturbation experiments, functionally essential TRs in various cancer types, and cell-type-specific TRs within heterogeneous cell populations, offering deeper biological insights into transcriptional regulation.
]]></description>
<dc:creator>Lu, Z.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:date>2024-06-03</dc:date>
<dc:identifier>doi:10.1101/2024.06.02.597061</dc:identifier>
<dc:title><![CDATA[BIT: Bayesian Identification of Transcriptional Regulators]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.08.30.610571v1?rss=1">
<title>
<![CDATA[
BayeSMART: Bayesian Clustering of Multi-sample Spatially Resolved Transcriptomics Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.08.30.610571v1?rss=1
</link>
<description><![CDATA[
The field of spatially resolved transcriptomics (SRT) has greatly advanced our understanding of cellular microenvironments by integrating spatial information with molecular data collected from multiple tissue sections or individuals. However, methods for multi-sample spatial clustering are lacking, and existing methods primarily rely on molecular information alone. This paper introduces BayeSMART, a Bayesian statistical method designed to identify spatial domains across multiple samples. BayeSMART leverages artificial intelligence (AI)-reconstructed single-cell level information from the paired histology images of multi-sample SRT datasets while simultaneously considering the spatial context of gene expression. The AI integration enables BayeSMART to effectively interpret the spatial domains. We conducted case studies using four datasets from various tissue types and SRT platforms and compared BayeSMART with alternative multi-sample spatial clustering approaches and a number of state-of-the-art methods for single-sample SRT analysis, demonstrating that it surpasses existing methods in terms of clustering accuracy, interpretability, and computational efficiency. BayeSMART offers new insights into the spatial organization of cells in multi-sample SRT data.
]]></description>
<dc:creator>Guo, Y.</dc:creator>
<dc:creator>Zhu, B.</dc:creator>
<dc:creator>Tang, C.</dc:creator>
<dc:creator>Rong, R.</dc:creator>
<dc:creator>Ma, Y.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2024-09-01</dc:date>
<dc:identifier>doi:10.1101/2024.08.30.610571</dc:identifier>
<dc:title><![CDATA[BayeSMART: Bayesian Clustering of Multi-sample Spatially Resolved Transcriptomics Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.04.597391v1?rss=1">
<title>
<![CDATA[
A Regularized Bayesian Dirichlet-multinomial Regression Model for Integrating Single-cell-level Omics and Patient-level Clinical Study Data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.04.597391v1?rss=1
</link>
<description><![CDATA[
Summary: The abundance of various cell types can vary significantly among patients with varying phenotypes and even those with the same phenotype. Recent scientific advancements provide mounting evidence that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
]]></description>
<dc:creator>Guo, Y.</dc:creator>
<dc:creator>Yu, L.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:date>2024-06-06</dc:date>
<dc:identifier>doi:10.1101/2024.06.04.597391</dc:identifier>
<dc:title><![CDATA[A Regularized Bayesian Dirichlet-multinomial Regression Model for Integrating Single-cell-level Omics and Patient-level Clinical Study Data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.01.578316v1?rss=1">
<title>
<![CDATA[
Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.01.578316v1?rss=1
</link>
<description><![CDATA[
This article provides an in-depth review of computational methods for predicting transcriptional regulators with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa perform relatively better. We also point out the limitations of NGS-based methods and explore potential directions for further improvement.

Key points:
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance. BART, ChIP-Atlas, and Lisa are recommended as these methods have overall better performance in evaluated scenarios.
]]></description>
<dc:creator>Lu, Z.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:creator>Zheng, Q.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:date>2024-02-06</dc:date>
<dc:identifier>doi:10.1101/2024.02.01.578316</dc:identifier>
<dc:title><![CDATA[Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.09.612085v1?rss=1">
<title>
<![CDATA[
CRISPR-CLEAR: Nucleotide-Resolution Mapping of Regulatory Elements via Allelic Readout of Tiled Base Editing 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.09.612085v1?rss=1
</link>
<description><![CDATA[
CRISPR tiling screens have advanced the identification and characterization of regulatory sequences but are limited by low resolution arising from the indirect readout of editing via guide RNA sequencing. This study introduces CRISPR-CLEAR, an end-to-end experimental assay and computational pipeline, which leverages targeted sequencing of CRISPR-introduced alleles at the endogenous target locus following dense base-editing mutagenesis. This approach enables the dissection of regulatory elements at nucleotide resolution, facilitating a direct assessment of genotype-phenotype effects.
]]></description>
<dc:creator>Becerra, B.</dc:creator>
<dc:creator>Wittibschlager, S.</dc:creator>
<dc:creator>Patel, Z. M.</dc:creator>
<dc:creator>Kutschat, A.</dc:creator>
<dc:creator>Delano, J.</dc:creator>
<dc:creator>Karjalainen, A.</dc:creator>
<dc:creator>Wu, T.</dc:creator>
<dc:creator>Starrs, M.</dc:creator>
<dc:creator>Jankowiak, M.</dc:creator>
<dc:creator>Bauer, D.</dc:creator>
<dc:creator>Seruggia, D.</dc:creator>
<dc:creator>Pinello, L.</dc:creator>
<dc:date>2024-09-09</dc:date>
<dc:identifier>doi:10.1101/2024.09.09.612085</dc:identifier>
<dc:title><![CDATA[CRISPR-CLEAR: Nucleotide-Resolution Mapping of Regulatory Elements via Allelic Readout of Tiled Base Editing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.04.611293v1?rss=1">
<title>
<![CDATA[
Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.04.611293v1?rss=1
</link>
<description><![CDATA[
Recent technological developments in single-cell RNA-seq CRISPR screens enable high-throughput investigation of the genome. Through transduction of a gRNA library to a cell population followed by transcriptomic profiling by scRNA-seq, it is possible to characterize the effects of thousands of genomic perturbations on global gene expression. A major source of noise in scRNA-seq CRISPR screens is ambient gRNAs, which are contaminating gRNAs that likely originate from other cells. If not properly filtered, ambient gRNAs can result in an excess of false positive gRNA assignments. Here, we utilize CRISPR barnyard assays to characterize ambient gRNA noise in single-cell CRISPR screens. We use these datasets to develop and train CLEANSER, a mixture model that identifies and filters ambient gRNA noise. This model takes advantage of the bimodal distribution between native and ambient gRNAs and includes both gRNA and cell-specific normalization parameters, correcting for confounding technical factors that affect individual gRNAs and cells. The output of CLEANSER is the probability that a gRNA-cell assignment is in the native distribution over the ambient distribution. We find that ambient gRNA filtering methods impact differential gene expression analysis outcomes and that CLEANSER outperforms alternative approaches by increasing gRNA-cell assignment accuracy.

]]></description>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Hamilton, M. C.</dc:creator>
<dc:creator>Cowart, T. N.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Bounds, L. R.</dc:creator>
<dc:creator>Nelson, A. C.</dc:creator>
<dc:creator>Doty, R. W.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Majoros, W. H.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2024-09-04</dc:date>
<dc:identifier>doi:10.1101/2024.09.04.611293</dc:identifier>
<dc:title><![CDATA[Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.19.613754v1?rss=1">
<title>
<![CDATA[
scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.19.613754v1?rss=1
</link>
<description><![CDATA[
Understanding how regulatory DNA elements shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build unifying models of gene regulation capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, the first framework to model scRNA-seq coverage and scATAC-seq insertion profiles along the genome from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi as a foundation model, equip it with a cell-specific decoder, and fine-tune its sequence embeddings. Specifically, we condition the decoder on the cell position in a precomputed single-cell embedding, resulting in strong generalization capability. Applied to a hematopoiesis dataset, scooby recapitulates cell-specific expression levels of held-out genes, and identifies regulators and their putative target genes through in silico motif deletion. Moreover, accurate variant effect prediction with scooby allows for breaking down bulk eQTL effects into single-cell effects and delineating their impact on chromatin accessibility and gene expression. We anticipate that scooby will aid in unraveling the complexities of gene regulation at the resolution of individual cells.
]]></description>
<dc:creator>Gagneur, J.</dc:creator>
<dc:creator>Hingerl, J. C.</dc:creator>
<dc:creator>Martens, L. D.</dc:creator>
<dc:creator>Manz, T.</dc:creator>
<dc:creator>Theis, F. J.</dc:creator>
<dc:creator>Buenrostro, J. D.</dc:creator>
<dc:creator>Karollus, A.</dc:creator>
<dc:date>2024-09-22</dc:date>
<dc:identifier>doi:10.1101/2024.09.19.613754</dc:identifier>
<dc:title><![CDATA[scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.01.16.699909v1?rss=1">
<title>
<![CDATA[
An integrated, scaled approach to resolve TSC2 variants of uncertain significance 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.01.16.699909v1?rss=1
</link>
<description><![CDATA[
Obtaining a precise genetic tuberous sclerosis diagnosis is a challenge as many missense TSC2 variants are variants of uncertain significance (VUS). VUS in TSC2 have been resolved by one-at-a-time functional assays, but these assays cannot scale to the 3,634 TSC2 missense VUS observed so far. To address this challenge, we used massively parallel sequencing to measure the steady-state abundance of almost 9,000 TSC2 missense variants and developed an mTOR pathway activity assay using genome editing and cell sorting to generate activity scores for 391 missense variants. 1,288 of 8,891 (14.49%) missense variants assayed had altered TSC2 abundance, and 69 of 391 (17.65%) missense variants assayed had altered mTOR pathway activity. Calibration and integration of these data into classification of variants identified in a clinical cohort putatively reclassified 212 of 276 (76.8%) TSC2 missense VUS. These datasets will lead to improved genetic diagnosis of tuberous sclerosis with potential positive impacts on the clinical management of patients and their families.
]]></description>
<dc:creator>Biar, C. G.</dc:creator>
<dc:creator>Wang, Z. R.</dc:creator>
<dc:creator>Camp, N. D.</dc:creator>
<dc:creator>Holmes, D. L.</dc:creator>
<dc:creator>Wheelock, M. K.</dc:creator>
<dc:creator>Pendyala, S.</dc:creator>
<dc:creator>McGee, A. V.</dc:creator>
<dc:creator>Gupta, P.</dc:creator>
<dc:creator>McEwen, A. E.</dc:creator>
<dc:creator>Tejura, M.</dc:creator>
<dc:creator>Richardson, M. E.</dc:creator>
<dc:creator>Weyandt, J. D.</dc:creator>
<dc:creator>Coleman, T.</dc:creator>
<dc:creator>Stewart, R.</dc:creator>
<dc:creator>Zeiberg, D.</dc:creator>
<dc:creator>Vandi, A. J.</dc:creator>
<dc:creator>Dawson, S.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Carvill, G. L.</dc:creator>
<dc:creator>James, R. G.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:creator>Calhoun, J. D.</dc:creator>
<dc:date>2026-01-18</dc:date>
<dc:identifier>doi:10.64898/2026.01.16.699909</dc:identifier>
<dc:title><![CDATA[An integrated, scaled approach to resolve TSC2 variants of uncertain significance]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-01-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.25.678548v1?rss=1">
<title>
<![CDATA[
Uniform processing and analysis of IGVF massively parallel reporter assay data with MPRAsnakeflow 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.25.678548v1?rss=1
</link>
<description><![CDATA[
As researchers and clinicians seek to identify human genomic alterations relevant to traits and disorders, identifying and aggregating evidence providing mechanistic support for associations between alterations and phenotypes remains challenging. In particular, the study of non-coding genomic variation remains a major challenge due to the lack of accurate functional annotation for activity in a given context and across alleles. Experimental evidence is critical for prioritizing and interpreting functional effects of genetic alterations. Massively Parallel Reporter Assays (MPRAs) have emerged as a powerful high-throughput approach, enabling quantification of regulatory element activity and allelic effects, and systematic dissection of gene regulatory logic and variant effects across different contexts. However, the diversity of MPRA designs, lack of standardized formats, and many potential processing parameters hamper data integration, reproducibility, and meta-analyses across studies.

To address these challenges, the Impact of Genomic Variation on Function (IGVF) Consortium established an MPRA focus group to develop community standards, including harmonized file formats, and robust analysis pipelines for a wide range of library types and experimental designs. Here, we present these formats and comprehensive computational tools, MPRAlib and MPRAsnakeflow, for uniform processing from raw sequencing reads to counts, processing and visualization. Using diverse MPRA datasets, we characterize technical variability sources including barcode sequence bias, outlier barcodes, and delivery method (episomal vs. lentiviral). Our results establish best practices for MPRA data generation and analysis, facilitating robust, reproducible research and large-scale integration. The presented tools and standards are publicly available, providing a foundation for future collaborative efforts in regulatory genomics.
]]></description>
<dc:creator>Rosen, J. D.</dc:creator>
<dc:creator>Vasanthakumari, A. D.</dc:creator>
<dc:creator>Salomon, K.</dc:creator>
<dc:creator>de Lange, N.</dc:creator>
<dc:creator>Dash, P. M.</dc:creator>
<dc:creator>Keukeleire, P.</dc:creator>
<dc:creator>Hassan, A.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Love, M. I.</dc:creator>
<dc:creator>Schubach, M.</dc:creator>
<dc:date>2025-09-28</dc:date>
<dc:identifier>doi:10.1101/2025.09.25.678548</dc:identifier>
<dc:title><![CDATA[Uniform processing and analysis of IGVF massively parallel reporter assay data with MPRAsnakeflow]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.30.715055v1?rss=1">
<title>
<![CDATA[
High-throughput biochemical phenotyping of SHP2 variants reveals molecular basis of diseases and allosteric drug inhibition 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.30.715055v1?rss=1
</link>
<description><![CDATA[
Interpreting clinical and functional consequences of genetic variants remains challenging due to limited quantitative biochemical data at scale. We applied high-throughput microfluidic enzyme kinetics to profile 190 clinical variants of SHP2, a protein tyrosine phosphatase linked to developmental disorders and cancers. Through >300,000 reaction progress curves, we derived kinetic and thermodynamic parameters quantifying variant effects on catalysis, autoinhibition, stability, phosphopeptide binding, and drug responses. This multidimensional dataset reveals that dysregulated autoinhibition, rather than altered stability or catalysis, predominantly determines SHP2-associated pathogenesis. Thermodynamic modeling reveals that clinical-stage allosteric inhibitors preferentially stabilize a previously underappreciated, partially active conformation over the fully inactive state, leading to variant-dependent drug responses. Our high-throughput biochemical framework establishes a general approach to decipher the biochemical logic connecting protein variants to clinical outcomes.
]]></description>
<dc:creator>Lee, A. A.</dc:creator>
<dc:creator>Mokhtari, D. A.</dc:creator>
<dc:creator>Egan, E. D.</dc:creator>
<dc:creator>Blacklow, S. C.</dc:creator>
<dc:creator>Herschlag, D.</dc:creator>
<dc:creator>Fordyce, P. M.</dc:creator>
<dc:date>2026-04-01</dc:date>
<dc:identifier>doi:10.64898/2026.03.30.715055</dc:identifier>
<dc:title><![CDATA[High-throughput biochemical phenotyping of SHP2 variants reveals molecular basis of diseases and allosteric drug inhibition]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-04-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.08.15.503769v1?rss=1">
<title>
<![CDATA[
Deciphering causal genomic templates of complex molecular phenotypes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.08.15.503769v1?rss=1
</link>
<description><![CDATA[
The genetic code is a formal principle that determines which proteins an organism can produce from only its genome sequence, without mechanistic modeling. Whether similar formal principles govern the relationship between genome sequence and phenotype across scales, from molecules to cells to tissues, is unknown. Here, we show that a single formal principle, structural correspondence, underlies the relationship between phenotype and genome sequence across scales. We represent phenotypes and the genome as graphs and find mappings between them using structure preservation as the sole constraint. Combinatorial richness in phenotypes more tightly constrains which mappings preserve that structure. Thus, phenotypic structure predicts genetic associations independently of covariation with genotype. This principle rediscovers the amino acid code without prior knowledge of translation or coding sequences, using just one protein and genome sequence as input. We benchmark this principle: applied to phenotypes at the cell, tissue and organ scales, the mappings correctly predict established associations and are driven by transcription factor motifs. Applied to cancer tissue images, we find regulators of spatial gene expression in immune cells. We thus offer a first-principles framework to relate genome sequence with phenotypic structure and guide mechanistic discovery across scales.
]]></description>
<dc:creator>Bhate, S. S.</dc:creator>
<dc:creator>Seigal, A.</dc:creator>
<dc:creator>Caicedo, J.</dc:creator>
<dc:date>2022-08-15</dc:date>
<dc:identifier>doi:10.1101/2022.08.15.503769</dc:identifier>
<dc:title><![CDATA[Deciphering causal genomic templates of complex molecular phenotypes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-08-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.26.661849v1?rss=1">
<title>
<![CDATA[
Simultaneous epigenomic profiling and regulatory activity measurement using e2MPRA 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.26.661849v1?rss=1
</link>
<description><![CDATA[
Cis-regulatory elements (CREs) have a major effect on phenotypes including disease. They are identified in a genome-wide manner by analyzing the binding of transcription factors (TFs), various co-factors and histone modifications in DNA using assays such as ChIP-seq, Cut&Tag and ATAC-seq. However, these assays are descriptive and require high-throughput technologies, such as massively parallel reporter assays (MPRAs), to test the functional activity of these sequences and the effects of variants on them. Currently, technologies that can simultaneously analyze both the regulatory function of a specific sequence and the TFs, cofactors and epigenomic modifications that determine it do not exist. Here, we developed enrichment followed by epigenomic profiling MPRA (e2MPRA), a novel technology that utilizes lentivirus-based MPRA to enrich for the integration of specific CREs into the genome followed by Cut&Tag or ATAC-seq targeted specifically for these sequences. This method allows simultaneous, high-throughput analysis of the regulatory activity, protein binding and epigenetic modification of thousands of candidate CREs and their variants. We demonstrate that e2MPRA can be used to dissect the epigenetic functions of TF motifs arranged in synthetic enhancers, as well as to analyze the effect of enhancer sequence variants on epigenetic modifications. In summary, this technology will increase our understanding of the regulatory code, its effect on the epigenome and how its alteration can lead to a variety of phenotypes including human disease.
]]></description>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Georgakopoulos Soares, I.</dc:creator>
<dc:creator>Bourque, G.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:date>2025-06-30</dc:date>
<dc:identifier>doi:10.1101/2025.06.26.661849</dc:identifier>
<dc:title><![CDATA[Simultaneous epigenomic profiling and regulatory activity measurement using e2MPRA]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.11.04.686611v1?rss=1">
<title>
<![CDATA[
Large-scale discovery of neural enhancers for cis-regulation therapies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.11.04.686611v1?rss=1
</link>
<description><![CDATA[
CRISPR-based gene activation (CRISPRa) has emerged as a promising therapeutic approach for neurodevelopmental disorders (NDD) caused by haploinsufficiency. However, scaling this cis-regulatory therapy (CRT) paradigm requires pinpointing which candidate cis-regulatory elements (cCREs) are active in human neurons, and which can be targeted with CRISPRa to yield specific and therapeutic levels of target gene upregulation. Here, we combine Massively Parallel Reporter Assays (MPRAs) and a multiplex single cell CRISPRa screen to discover functional human neural enhancers whose CRISPRa targeting yields specific upregulation of NDD risk genes. First, we tested 5,425 candidate neuronal enhancers with MPRA, identifying 2,422 that are active in human neurons. Selected cCREs also displayed specific, autonomous in vivo activity in the developing mouse central nervous system. Next, we applied multiplex single-cell CRISPRa screening with 15,643 gRNAs to test all MPRA-prioritized cCREs and 761 promoters of NDD genes in their endogenous genomic contexts. We identified hundreds of promoter- and enhancer-targeting CRISPRa gRNAs that upregulated 200 of the 337 NDD genes in human neurons, including 91 novel enhancer-gene pairs. Finally, we confirmed that several of the CRISPRa gRNAs identified here demonstrated selective and therapeutically relevant upregulation of SCN2A, CHD8, CTNND2 and TCF4 when delivered virally to patient cell lines, human cerebral organoids, and a humanized mouse model of hTcf4. Our results provide a comprehensive resource of active, target-linked human neural enhancers for NDD genes and corresponding gRNA reagents for CRT development. More broadly, this work advances understanding of neural gene regulation and establishes a generalizable strategy for discovering CRT gRNA candidates across cell types and haploinsufficient disorders.
]]></description>
<dc:creator>McDiarmid, T. A.</dc:creator>
<dc:creator>Page, N. F.</dc:creator>
<dc:creator>Chardon, F. M.</dc:creator>
<dc:creator>Daza, R. M.</dc:creator>
<dc:creator>Chen, G. T.</dc:creator>
<dc:creator>Kosicki, M.</dc:creator>
<dc:creator>James, L. M.</dc:creator>
<dc:creator>Nourie, H. C.</dc:creator>
<dc:creator>Laboy-Cintron, D.</dc:creator>
<dc:creator>Lee, A. S.</dc:creator>
<dc:creator>Vij, P.</dc:creator>
<dc:creator>Calderon, D.</dc:creator>
<dc:creator>Lalanne, J.-B.</dc:creator>
<dc:creator>Martin, B. K.</dc:creator>
<dc:creator>Fink, K.</dc:creator>
<dc:creator>Talkowski, M. E.</dc:creator>
<dc:creator>Muotri, A. R.</dc:creator>
<dc:creator>Philpot, B. D.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:creator>Geschwind, D. H.</dc:creator>
<dc:creator>Sanders, S. J.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:date>2025-11-05</dc:date>
<dc:identifier>doi:10.1101/2025.11.04.686611</dc:identifier>
<dc:title><![CDATA[Large-scale discovery of neural enhancers for cis-regulation therapies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-11-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.11.658967v1?rss=1">
<title>
<![CDATA[
Capture-C MPRA: A high-throughput method to simultaneously characterize promoter interactions and regulatory activity 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.11.658967v1?rss=1
</link>
<description><![CDATA[
Cis-regulatory elements (CREs) interact with their target promoters over long genomic distances and can be identified using chromatin conformation capture (3C) assays. Their regulatory activity can be functionally characterized in a high-throughput manner using massively parallel reporter assays (MPRAs) that generally test an enhancer alongside a minimal promoter. Here, we developed a novel technology called Capture-C MPRA (ccMPRA) that combines both technologies and can simultaneously obtain chromatin interactions and measure CRE activity alongside their target promoters. We utilized ccMPRA to analyze the regulatory activity of 650 promoters interacting with 42,719 sequences. As C-based techniques also capture isolated promoters, we were able to obtain promoter baseline activity, enabling the identification of both enhancers and silencers. Analysis of CREs interacting with more than one promoter showed significant activity differences depending on the promoter. In summary, ccMPRA can simultaneously characterize chromatin interactions and regulatory activity, allowing further dissection of regulatory grammar.
]]></description>
<dc:creator>Arnould, C.</dc:creator>
<dc:creator>Keukeleire, P.</dc:creator>
<dc:creator>Inoue, F.</dc:creator>
<dc:creator>Cui, X.</dc:creator>
<dc:creator>An, K.</dc:creator>
<dc:creator>Murray, E.</dc:creator>
<dc:creator>Zhang, X.</dc:creator>
<dc:creator>Drmanac, R.</dc:creator>
<dc:creator>Peters, B.</dc:creator>
<dc:creator>Shendure, J.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:creator>Kircher, M.</dc:creator>
<dc:creator>Ahituv, N.</dc:creator>
<dc:date>2025-06-14</dc:date>
<dc:identifier>doi:10.1101/2025.06.11.658967</dc:identifier>
<dc:title><![CDATA[Capture-C MPRA: A high-throughput method to simultaneously characterize promoter interactions and regulatory activity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.29.651326v1?rss=1">
<title>
<![CDATA[
Gene-based calibration of high-throughput functional assays for clinical variant classification 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.29.651326v1?rss=1
</link>
<description><![CDATA[
High-throughput assays measure a broad range of variant effects on gene function and hold promise for supporting genomic medicine. Current clinical guidelines for rare Mendelian diseases rely on establishing gene-specific score thresholds for each assay that separate pathogenic from benign variants. This introduces inconsistencies and subjectivity, ultimately lacking the rigor of calibration; i.e., mapping a variant score to a probability of pathogenicity. To address this problem, we introduce a semi-supervised framework for calibrating experimental assay data and propose Experimental score CALIBRator (ExCALIBR), a method that jointly models pathogenic, benign, synonymous, and population variants using skew normal mixtures to produce variant-specific probabilities of pathogenicity. Evaluated across 80 datasets from 39 genes, all meeting fit quality criteria, ExCALIBR substantially outperformed existing field standards and was further validated on the All of Us biobank data. Our results demonstrate that calibrated experimental assays generate indispensable evidence that will dramatically reduce variants of uncertain significance.
]]></description>
<dc:creator>Zeiberg, D.</dc:creator>
<dc:creator>Tejura, M.</dc:creator>
<dc:creator>McEwen, A. E.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Pejaver, V.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:creator>O'Donnell-Luria, A.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:date>2025-05-04</dc:date>
<dc:identifier>doi:10.1101/2025.04.29.651326</dc:identifier>
<dc:title><![CDATA[Gene-based calibration of high-throughput functional assays for clinical variant classification]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.08.11.669723v1?rss=1">
<title>
<![CDATA[
Functional evidence for G6PD variant classification from mutational scanning 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.08.11.669723v1?rss=1
</link>
<description><![CDATA[
G6PD deficiency is one of the most common enzyme deficiencies worldwide, and increases the likelihood of adverse reactions to certain drugs and foods. Identifying people at risk is challenging, since most are asymptomatic until they encounter a trigger. This is further complicated since over 60% of 1,559 known genetic variants in G6PD are variants of uncertain significance and thus cannot guide drug prescribing and dosing. To resolve which variants are clinically meaningful and avoid harm from adverse drug reactions, we conducted two high-throughput functional assays: one for G6PD activity, and one for abundance. We measured the function of 9,527 missense, nonsense, and synonymous G6PD variants. The patterns of variant effect on activity and abundance confirmed the importance of structural NADP+ for G6PD activity and abundance, and G6PD dimerization for G6PD activity. Based on the ability of our functional assay scores to accurately classify G6PD variants of known clinical effect, we generated evidence that 4,870 missense variants contribute to G6PD deficiency and 2,245 are unlikely to contribute to G6PD deficiency. Our data can be used to deepen our understanding of G6PD as a protein, and to close the gap in classification for variants of uncertain significance to improve implementation of genetic medicine for G6PD deficiency.
]]></description>
<dc:creator>Geck, R. C.</dc:creator>
<dc:creator>Wheelock, M. K.</dc:creator>
<dc:creator>Powell, R. L.</dc:creator>
<dc:creator>Wang, Z. R.</dc:creator>
<dc:creator>Holmes, D. L.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Boyle, G. E.</dc:creator>
<dc:creator>Vandi, A. J.</dc:creator>
<dc:creator>Amorosi, C. J.</dc:creator>
<dc:creator>Moore, N.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:creator>Dunham, M. J.</dc:creator>
<dc:date>2025-08-14</dc:date>
<dc:identifier>doi:10.1101/2025.08.11.669723</dc:identifier>
<dc:title><![CDATA[Functional evidence for G6PD variant classification from mutational scanning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-08-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.17.706269v1?rss=1">
<title>
<![CDATA[
Gene- and domain-aware calibration increases the clinical utility of variant effect predictors 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.17.706269v1?rss=1
</link>
<description><![CDATA[
The utility of clinical genetic testing is limited because around 90% of missense variants in ClinVar remain of uncertain clinical significance. Variant effect predictors (VEPs) can score any missense variant, potentially empowering variant classification. Realizing this potential requires calibration to translate predictor scores into evidence. However, genome-wide calibration ignores predictor heterogeneity across genes, causing evidence misassignment. We developed an automated, data-adaptive framework that optimizes two complementary approaches: gene-specific calibration for genes with enough variants for calibration, and domain-aggregate calibration for other disease-associated genes, which groups variants from protein domains with similar predictor score distributions for calibration. Applied to three predictors across 2,769 genes, this framework assigned evidence to 10.6% more variants on average while generally improving evidence accuracy compared to genome-wide calibration. These calibrations and the resulting calibrated computational evidence are available through the PredictMD portal. Our framework substantially increases the clinical utility of VEPs for variant classification.
]]></description>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Jain, S.</dc:creator>
<dc:creator>Benazouz, M.</dc:creator>
<dc:creator>Sverchkov, Y.</dc:creator>
<dc:creator>Stone, J.</dc:creator>
<dc:creator>Sharma, H.</dc:creator>
<dc:creator>Bergquist, T.</dc:creator>
<dc:creator>Stewart, R.</dc:creator>
<dc:creator>Mooney, S. D.</dc:creator>
<dc:creator>Craven, M.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Starita, L. M.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:creator>Pejaver, V.</dc:creator>
<dc:date>2026-02-18</dc:date>
<dc:identifier>doi:10.64898/2026.02.17.706269</dc:identifier>
<dc:title><![CDATA[Gene- and domain-aware calibration increases the clinical utility of variant effect predictors]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.14.705848v1?rss=1">
<title>
<![CDATA[
A scalable approach to resolving variants of uncertain significance 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.14.705848v1?rss=1
</link>
<description><![CDATA[
Over 90% of missense variants across ~4,000 disease-associated genes are variants of uncertain significance (VUS). Experimental variant effect measurements provide critical evidence about pathogenicity and inform disease biology, but most variants lack data and clinical translation has been limited. The Impact of Genomic Variation on Function Consortium generated experimental data for 62,215 variants across ten genes using multiplexed assays and 1,407 variants across 163 genes using arrayed assays, curated 193,139 additional community-generated variant effect measurements across 30 additional genes, and developed automated calibration methods for translating experimental data and variant effect predictions into clinical evidence. To reduce current VUS, we developed a scalable workflow using only experimental and predictive evidence, enabling reclassification of 75% of the 16,115 VUS in these genes as pathogenic or benign with <1% error. To minimize future VUS, we analyzed >90,000 unobserved variants; 62% had enough evidence to be "preclassified" as pathogenic or benign. We validated our data, evidence and classifications using All of Us and created interactive resources to enable clinical use of the calibrated data. Thus, for 40 genes, representing 1% of the clinical genome, we resolve most existing VUS and future variants, illustrating how systematic use of scalable evidence can empower genomic medicine.
]]></description>
<dc:creator>Tejura, M.</dc:creator>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>McEwen, A. E.</dc:creator>
<dc:creator>Stewart, R.</dc:creator>
<dc:creator>Sverchkov, Y.</dc:creator>
<dc:creator>Laval, F.</dc:creator>
<dc:creator>Woo, I.</dc:creator>
<dc:creator>Zeiberg, D.</dc:creator>
<dc:creator>Shen, R.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Stone, J.</dc:creator>
<dc:creator>Smith, N.</dc:creator>
<dc:creator>Casadei, S.</dc:creator>
<dc:creator>Wang, Z. R.</dc:creator>
<dc:creator>Snyder, M.</dc:creator>
<dc:creator>Capodanno, B. J.</dc:creator>
<dc:creator>Gupta, P.</dc:creator>
<dc:creator>Benazouz, M.</dc:creator>
<dc:creator>Jain, S.</dc:creator>
<dc:creator>Heidl, S.</dc:creator>
<dc:creator>Muffley, L.</dc:creator>
<dc:creator>Dong, S.</dc:creator>
<dc:creator>Lin, K.</dc:creator>
<dc:creator>Hitz, B. C.</dc:creator>
<dc:creator>Gabdank, I.</dc:creator>
<dc:creator>Da, E. Y.</dc:creator>
<dc:creator>Best, S.</dc:creator>
<dc:creator>Grindstaff, S.</dc:creator>
<dc:creator>Reinhart, D.</dc:creator>
<dc:creator>Rodriguez-Salas, L.</dc:creator>
<dc:creator>Seid, O.</dc:creator>
<dc:creator>Vandi, A. J.</dc:creator>
<dc:creator>Wenman, C.</dc:creator>
<dc:creator>Wheelock, M. K.</dc:creator>
<dc:creator>Pendyala, S.</dc:creator>
<dc:creator>Holmes, D.</dc:creator>
<dc:creator>Xu, A.</dc:creator>
<dc:creator>Hosokai, A.</dc:creator>
<dc:creator>Tixhon, M.</dc:creator>
<dc:creator>Reno, C.</dc:creator>
<dc:creator>Ewald, J. D.</dc:creator>
<dc:creator>Spirohn-Fitzgerald, K.</dc:creator>
<dc:creator>Teelucksingh, T.</dc:creator>
<dc:creator>Hao, T.</dc:creator>
<dc:creator>Chen, Z. S.</dc:creator>
<dc:creator>Haghighi, M.</dc:creator>
<dc:creator>Hamid, A. K.</dc:creator>
<dc:date>2026-02-15</dc:date>
<dc:identifier>doi:10.64898/2026.02.14.705848</dc:identifier>
<dc:title><![CDATA[A scalable approach to resolving variants of uncertain significance]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.09.675271v1?rss=1">
<title>
<![CDATA[
Shared and distinct sequence-function signatures define different modes of human TpoR activation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.09.675271v1?rss=1
</link>
<description><![CDATA[
The human thrombopoietin receptor (hTpoR) exists primarily as JAK2-associated monomers that become activated when converted to dimeric forms that support JAK trans-phosphorylation. This can be achieved by several different modes of stimuli, including the natural ligand Tpo, biologic agonists that bind the same site as Tpo, small-molecule drugs that bind the transmembrane (TM) domain, oncogenic mutations in and near the TM domain, and by association with constitutively active JAK V617F or a mutant form of the chaperone protein calreticulin. It is unclear how the dimeric structures induced by synthetic agonists and mutations relate to one another, and whether any of these induce the same active structure as the native ligand Tpo, yet this has important implications both for fundamental cytokine receptor biology and for development of targeted interventions for hTpoR-driven myeloproliferative diseases. Here we used deep mutational scanning (DMS) across the TM and juxtamembrane (JM) regions of hTpoR to extract feature-rich sequence-function signatures across a variety of different activating contexts. While each displayed some unique features, synthetic agonists and activating mutations all exhibited strong dependence on a common TM interface that is consistent with previous models of a left-handed, near-parallel helix dimer with H499 facing lipid. In contrast, Tpo-mediated activation was broadly insensitive to TM-JM substitutions, indicating that it does not rely on the same interface. Modeling with AlphaFold 3 (AF3) consistently yielded a right-handed, "splayed" helix dimer that is close at the extracellular face, contains H499 in the interface and diverges toward the cytosolic face, resting on an intracellular amphipathic JM helix that lies parallel to the membrane, which is also observed in a DMS/AF3 analysis of human erythropoietin receptor. 
This splayed Tpo-bound dimer could be stably inserted into a lipid bilayer with associated JAK2 using molecular dynamics and is supported by experiments showing that most or all of the TM domain can be replaced by poly-valine, with little effect on Tpo-driven activation but catastrophic effects on responses to synthetic ligands. Our data support at least two different structural modes of hTpoR activation that reconcile prior biochemical models, rationalize patient variants, and inform mechanism-based agonist and antagonist design.
]]></description>
<dc:creator>Wu, X.</dc:creator>
<dc:creator>McLeod, H.</dc:creator>
<dc:creator>Malinovitch, A.</dc:creator>
<dc:creator>Hunter, S.</dc:creator>
<dc:creator>Ramesh, S.</dc:creator>
<dc:creator>Go, M.</dc:creator>
<dc:creator>Nguyen, J. V.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Burger, W. A.</dc:creator>
<dc:creator>Blombery, P.</dc:creator>
<dc:creator>Call, M. E.</dc:creator>
<dc:creator>Call, M. J.</dc:creator>
<dc:date>2025-09-13</dc:date>
<dc:identifier>doi:10.1101/2025.09.09.675271</dc:identifier>
<dc:title><![CDATA[Shared and distinct sequence-function signatures define different modes of human TpoR activation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.01.587474v1?rss=1">
<title>
<![CDATA[
Multiplex, multimodal mapping of variant effects in secreted proteins 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.01.587474v1?rss=1
</link>
<description><![CDATA[
Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed Assays of Variant Effect (MAVEs) can measure the function of variants at scale, and are beginning to address this problem. However, MAVEs cannot readily be applied to the ~10% of human genes encoding secreted proteins. We developed a flexible, scalable human cell surface display method, Multiplexed Surface Tethering of Extracellular Proteins (MultiSTEP), to measure secreted protein variant effects. We used MultiSTEP to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease where genetic variation can cause hemophilia B. We combined MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification, measuring a total of 44,816 effects for 436 synonymous variants and 8,528 of the 8,759 possible missense variants. 49.6% of possible F9 missense variants impacted secretion, post-translational modification, or both. We also identified functional constraints on secretion within the signal peptide and for nearly all variants that caused gain or loss of cysteine. Secretion scores correlated strongly with FIX levels in hemophilia B and revealed that loss of secretion variants are particularly likely to cause severe disease. Integration of the secretion and post-translational modification scores enabled reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we showed that MultiSTEP can be applied to a wide variety of secreted proteins. Thus, MultiSTEP is a multiplexed, multimodal, and generalizable method for systematically assessing variant effects in secreted proteins at scale.
]]></description>
<dc:creator>Popp, N. A.</dc:creator>
<dc:creator>Powell, R. L.</dc:creator>
<dc:creator>Wheelock, M. K.</dc:creator>
<dc:creator>Zapp, B. D.</dc:creator>
<dc:creator>Holmes, K. J.</dc:creator>
<dc:creator>Sheldon, K. M.</dc:creator>
<dc:creator>Fletcher, S. N.</dc:creator>
<dc:creator>Wu, X.</dc:creator>
<dc:creator>Fayer, S.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:creator>Lannert, K. W.</dc:creator>
<dc:creator>Chang, A. T.</dc:creator>
<dc:creator>Sheehan, J. P.</dc:creator>
<dc:creator>Johnsen, J. M.</dc:creator>
<dc:creator>Fowler, D. M.</dc:creator>
<dc:date>2024-04-01</dc:date>
<dc:identifier>doi:10.1101/2024.04.01.587474</dc:identifier>
<dc:title><![CDATA[Multiplex, multimodal mapping of variant effects in secreted proteins]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.05.14.653991v1?rss=1">
<title>
<![CDATA[
VEFill: a model for accurate and generalizable deep mutational scanning score imputation across protein domains 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.05.14.653991v1?rss=1
</link>
<description><![CDATA[
Background: Deep Mutational Scanning (DMS) assays can systematically assess the effects of amino acid substitutions on protein function. While DMS datasets have been generated for many targets, they often suffer from incomplete variant coverage due to technical constraints, limiting their utility in variant interpretation and downstream analyses.

Results: We developed VEFill, a gradient boosting model for imputing missing DMS scores across protein domains. VEFill is trained on the Human Domainome 1 dataset, a large, standardized set of DMS experiments using a uniform stability-based assay, and integrates a broad set of additional biologically informative features including ESM-1v sequence embeddings, evolutionary conservation (EVE scores), amino acid substitution matrices, and physicochemical descriptors. The model achieved robust predictive performance (R² = 0.64, Pearson r = 0.80). It also demonstrated reliable generalization to unseen proteins in other stability-based datasets, while showing weaker performance on activity-based assays. Per-protein models further confirmed VEFill's effectiveness under limited-data conditions. A reduced two-feature version using only ESM-1v embeddings and mean DMS scores performed comparably to the full model, suggesting a computationally efficient alternative. However, true zero-shot prediction without positional context remains a challenge, particularly for functionally complex proteins.

Conclusions: VEFill offers an interpretable, scalable framework for DMS score imputation, especially effective in stability-focused and sparse-data settings. It enables systematic mutation prioritization and may support the design of efficient experimental libraries for variant effect studies.
]]></description>
<dc:creator>Polunina, P. V.</dc:creator>
<dc:creator>Maier, W.</dc:creator>
<dc:creator>Rubin, A. F.</dc:creator>
<dc:date>2025-05-14</dc:date>
<dc:identifier>doi:10.1101/2025.05.14.653991</dc:identifier>
<dc:title><![CDATA[VEFill: a model for accurate and generalizable deep mutational scanning score imputation across protein domains]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.09.710372v1?rss=1">
<title>
<![CDATA[
Genome-scale mapping of variant, enhancer and gene function in primary human CD4+ T cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.09.710372v1?rss=1
</link>
<description><![CDATA[
CD4+ T cells harbor a disproportionate enrichment of immune disease risk loci and represent the primary cellular context for immune disease biology, yet the genes and regulatory programs these variants affect remain largely unknown. We combined targeted Perturb-seq of 1,032 cis-regulatory elements (CREs) overlapping 4,724 variants across 14 immune diseases with genome-wide Perturb-seq of all expressed genes in primary human CD4+ T cells, spanning 4.1 million cells. We identified 626 CRE-gene pairs, and connected CRE targets to downstream regulatory cascades. At the TYK2 and DEXI/CLEC16A loci, we resolved target genes and linked noncoding variants to inflammatory and metabolic programs. Across diseases, we revealed that dispersed variants converged on shared and disease-specific programs. Our work provides a framework for tracing variant-to-CRE-to-gene-to-network in disease-relevant primary cells.
]]></description>
<dc:creator>Moonen, D. P.</dc:creator>
<dc:creator>Claringbould, A.</dc:creator>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Schrod, S.</dc:creator>
<dc:creator>Braunger, J.</dc:creator>
<dc:creator>Feng, C.</dc:creator>
<dc:creator>Rauscher, B.</dc:creator>
<dc:creator>Yi, J.</dc:creator>
<dc:creator>Bi, S. Z.</dc:creator>
<dc:creator>Matthess, Y.</dc:creator>
<dc:creator>Kaulich, M.</dc:creator>
<dc:creator>Acob, R. A.</dc:creator>
<dc:creator>Ayer, A.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Velten, B.</dc:creator>
<dc:creator>Stegle, O.</dc:creator>
<dc:creator>Trynka, G.</dc:creator>
<dc:creator>Zaugg, J. B.</dc:creator>
<dc:creator>Schraivogel, D.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:date>2026-03-11</dc:date>
<dc:identifier>doi:10.64898/2026.03.09.710372</dc:identifier>
<dc:title><![CDATA[Genome-scale mapping of variant, enhancer and gene function in primary human CD4+ T cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.25.678665v1?rss=1">
<title>
<![CDATA[
Sensitive, direct detection of non-coding off-target base editor unwinding and editing in primary cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.25.678665v1?rss=1
</link>
<description><![CDATA[
Base editors create precise nucleotide changes in DNA, but their off-target activity remains challenging to quantify. Here, we develop and deploy a direct, in cellulo sequencing assay that simultaneously measures both Cas9-mediated unwinding and deaminase editing of genomic DNA (beCasKAS). Our strategy nominates >460-fold more potential off-target sites than other methods by enriching for Cas9-dependent R-loops immediately preceding editing. Using beCasKAS in primary human T-cells, we observe that mRNA-encoded ABE8e and PAMless ABE8e-SpRY base editors have distinct off-target profiles that can be mitigated by optimizing mRNA dose. Finally, we combine beCasKAS with base-resolution deep learning models to risk-stratify off-target edits by their likelihood of epigenetic dysregulation. Collectively, beCasKAS offers a sensitive and facile tool to optimize the balance between base editor on- and off-target activity.
]]></description>
<dc:creator>Wang, T.</dc:creator>
<dc:creator>Jessa, S.</dc:creator>
<dc:creator>Marinov, G. K.</dc:creator>
<dc:creator>Klemm, S.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:date>2025-09-25</dc:date>
<dc:identifier>doi:10.1101/2025.09.25.678665</dc:identifier>
<dc:title><![CDATA[Sensitive, direct detection of non-coding off-target base editor unwinding and editing in primary cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.16.676677v1?rss=1">
<title>
<![CDATA[
An unbiased survey of distal element-gene regulatory interactions with direct-capture targeted Perturb-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.16.676677v1?rss=1
</link>
<description><![CDATA[
A major challenge in human genetics is to identify all distal regulatory elements and determine their effects on target gene expression in a given cell type. To this end, large-scale CRISPR screens have been conducted to perturb thousands of candidate enhancers. Using these data, predictive models have been developed that aim to generalize such findings to predict which enhancers regulate which genes across the genome. However, existing CRISPR methods and large-scale datasets have limitations in power, scale, or selection bias, with the potential to skew our understanding of the properties of distal regulatory elements and confound our ability to evaluate predictive models. Here, we develop a new framework for highly powered, unbiased CRISPR screens, including an optimized experimental method (Direct-Capture Targeted Perturb-seq (DC-TAP-seq)), a random design strategy, and a comprehensive analytical pipeline that accounts for statistical power. We applied this framework to survey 1,425 randomly selected candidate regulatory elements across two human cell lines. Our results reveal fundamental properties of distal regulatory elements in the human genome. Most element-gene regulatory interactions are estimated to have small effect sizes (<10%), which previous experiments were not powered to detect. Most cis-regulatory interactions occur over short genomic distances (<100 kb). A large fraction of the discovered regulatory elements bind CTCF but do not show chromatin marks typical of classical enhancers. Housekeeping genes have similar frequencies of distal regulatory elements compared to other genes, but with 2-fold weaker effect sizes. Comparisons to the predictions of the ENCODE-rE2G model suggest that, while performance is similar across two cell types, new models will be needed to detect elements with weaker effect sizes, regulatory effects of CTCF sites, and enhancers for housekeeping genes. 
Overall, this study describes the first unbiased, perturbation-based survey of thousands of distal regulatory element-gene connections, and provides a framework for expanding such efforts to build more complete maps of distal regulation in the human genome.
]]></description>
<dc:creator>Ray, J.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Galante, J.</dc:creator>
<dc:creator>Amgalan, D.</dc:creator>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Munger, C. J.</dc:creator>
<dc:creator>Huang, J.</dc:creator>
<dc:creator>Munson, G.</dc:creator>
<dc:creator>Murphy, M.</dc:creator>
<dc:creator>Mattei, E.</dc:creator>
<dc:creator>Barry, T.</dc:creator>
<dc:creator>Singh, V.</dc:creator>
<dc:creator>Baskaran, A.</dc:creator>
<dc:creator>Kang, H.</dc:creator>
<dc:creator>Katsevich, E.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Engreitz, J.</dc:creator>
<dc:date>2025-09-17</dc:date>
<dc:identifier>doi:10.1101/2025.09.16.676677</dc:identifier>
<dc:title><![CDATA[An unbiased survey of distal element-gene regulatory interactions with direct-capture targeted Perturb-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.07.24.666684v1?rss=1">
<title>
<![CDATA[
An automated ATAC-seq method reveals sequence determinants of transcription factor dose response in the open chromatin 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.07.24.666684v1?rss=1
</link>
<description><![CDATA[
Transcription factor (TF) dosage is a critical determinant of cellular identity. However, the quantitative relationship between TF dosage and its regulation of chromatin accessibility and gene expression remains poorly understood. To address this, we developed RoboATAC, a scalable, automated ATAC-seq platform for high-throughput accessibility profiling. We then systematically profiled genome-wide chromatin accessibility and gene expression changes induced by graded overexpression of 22 TFs in HEK293T cells (246 total samples), observing dose-dependent changes in accessibility and aggregate TF footprints. Modeling accessibility as a function of sequence and chromatin states revealed that DNA sequence alone accurately predicts dosage sensitivity at elements that become accessible, with low-affinity motifs requiring higher TF levels to induce accessibility. Interpretable deep learning models revealed contributions of motif orientation, spacing, and flanking bases to accessibility, both recapitulating known motifs and nominating novel dosage-sensitive motif arrangements. Nucleosome positioning analysis uncovered two distinct, TF identity-dependent patterns by which accessibility is established by changing nucleosome position and occupancy.
]]></description>
<dc:creator>Liu, B. B.</dc:creator>
<dc:creator>Shimasawa, M.</dc:creator>
<dc:creator>Vermeulen, S.</dc:creator>
<dc:creator>Kim, S. H.</dc:creator>
<dc:creator>Iremadze, N.</dc:creator>
<dc:creator>Lipson, D.</dc:creator>
<dc:creator>Shipony, Z.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:date>2025-07-27</dc:date>
<dc:identifier>doi:10.1101/2025.07.24.666684</dc:identifier>
<dc:title><![CDATA[An automated ATAC-seq method reveals sequence determinants of transcription factor dose response in the open chromatin]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-07-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.04.655863v1?rss=1">
<title>
<![CDATA[
Vascular smooth muscle cell atherosclerosis trajectories characterized at single cell resolution identify causal transcriptomic and epigenomic mechanisms of disease risk 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.04.655863v1?rss=1
</link>
<description><![CDATA[
Vascular smooth muscle cells (SMC) contribute to heritable coronary artery disease (CAD) risk and undergo complex cell state transitions to multiple disease related phenotypes. To investigate the genetic basis of SMC state trajectories that underlie the SMC component of CAD causality we have developed a dense timecourse single cell transcriptomic and epigenetic map of atherosclerosis in a murine disease animal model. Cellular trajectories were derived from the temporal data and probabilistic fate modeling with Waddington-Optimal Transport (WOT). We created transcription factor (TF) centered regulons mapped across the developmental timeline and through network-based prioritization with WOT predicted TFs and in silico TF perturbation, identified key drivers of cell state changes associated with EMT, vascular development, and circadian clock functions. Integration of mouse disease data with human CAD genetic findings identified transition SMC phenotypes that mediate disease risk and point to causal disease mechanisms. Parallel studies using knockout of the validated CAD gene Tcf21 revealed its impact on SMC transition cellular phenotypes and disease risk genes, due in part to a role regulating the transition of SMC precursor cells in the secondary heart field. Together, these studies characterize atherosclerosis trajectories at single cell resolution and identify genetic causal transcriptomic and epigenomic mechanisms of CAD risk.
]]></description>
<dc:creator>Li, D. Y.</dc:creator>
<dc:creator>Kundu, S.</dc:creator>
<dc:creator>Cheng, P.</dc:creator>
<dc:creator>Gu, W.</dc:creator>
<dc:creator>Jackson, W. R.</dc:creator>
<dc:creator>Zhao, Q.</dc:creator>
<dc:creator>Nguyen, T.</dc:creator>
<dc:creator>Worssam, M.</dc:creator>
<dc:creator>Monteiro, J. P.</dc:creator>
<dc:creator>Caceres, R. D.</dc:creator>
<dc:creator>Dale, S.</dc:creator>
<dc:creator>Palmisano, B.</dc:creator>
<dc:creator>Weldy, C. S.</dc:creator>
<dc:creator>Kundu, R.</dc:creator>
<dc:creator>Wirka, R. C.</dc:creator>
<dc:creator>Quertermous, T.</dc:creator>
<dc:date>2025-06-06</dc:date>
<dc:identifier>doi:10.1101/2025.06.04.655863</dc:identifier>
<dc:title><![CDATA[Vascular smooth muscle cell atherosclerosis trajectories characterized at single cell resolution identify causal transcriptomic and epigenomic mechanisms of disease risk]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.15.680997v1?rss=1">
<title>
<![CDATA[
Deep learning the dynamic regulatory sequence code of cardiac organoid differentiation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.15.680997v1?rss=1
</link>
<description><![CDATA[
Defining temporal gene regulatory programs driving human organogenesis is essential for understanding congenital defects. We combined a time-resolved, single-cell multi-omic atlas of human induced pluripotent stem cell-derived cardiac organoids with deep learning models of chromatin accessibility, enabling systematic discovery of cis-regulatory syntax underlying heart development and disease. This framework identified cell-state-specific motif syntax, linked motif instances to candidate target genes, and resolved programs governing lineage divergence. Integrating cell-state-resolved molecular profiles with computationally predicted variant effects from congenital heart disease (CHD) cases enabled the prioritization of noncoding variants predicted to disrupt developmental transitions, supporting the paradigm that disease etiology derives from perturbations to regulatory networks governing cardiogenesis. Experimental validation demonstrated that an intronic ANGPTL2 variant altered differentiation outcomes, implicating ANGPTL2 in CHD. This study bridges developmental regulation with disease genetics, establishing a framework for discovering the genetic and molecular basis of congenital disorders.
]]></description>
<dc:creator>Metzl-Raz, E.</dc:creator>
<dc:creator>Zhao, R.</dc:creator>
<dc:creator>Deshpande, S.</dc:creator>
<dc:creator>Powell, J.</dc:creator>
<dc:creator>Porter, E. G.</dc:creator>
<dc:creator>Zouaghi, Y.</dc:creator>
<dc:creator>Liu, B. B.</dc:creator>
<dc:creator>Kim, S. H.</dc:creator>
<dc:creator>Abdi, I.</dc:creator>
<dc:creator>Evergreen, I.</dc:creator>
<dc:creator>Agarwal, M.</dc:creator>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Rico, J.</dc:creator>
<dc:creator>Miyamoto, M.</dc:creator>
<dc:creator>Sanchez, J. M.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Greenleaf, W. J.</dc:creator>
<dc:creator>Gifford, C. A.</dc:creator>
<dc:date>2025-10-15</dc:date>
<dc:identifier>doi:10.1101/2025.10.15.680997</dc:identifier>
<dc:title><![CDATA[Deep learning the dynamic regulatory sequence code of cardiac organoid differentiation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.11.23.624931v1?rss=1">
<title>
<![CDATA[
Mapping enhancer-gene regulatory interactions from single-cell data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.11.23.624931v1?rss=1
</link>
<description><![CDATA[
Mapping enhancers and their target genes in specific cell types is crucial for understanding gene regulation and human disease genetics. However, accurately predicting enhancer-gene regulatory interactions from single-cell datasets has been challenging. Here, we introduce a new family of classification models, scE2G, to predict enhancer-gene regulation. These models use features from single-cell ATAC-seq or multiomic RNA and ATAC-seq data and are trained on a CRISPR perturbation dataset including >10,000 evaluated element-gene pairs. We benchmark scE2G models against CRISPR perturbations, fine-mapped eQTLs, and GWAS variant-gene associations and demonstrate state-of-the-art performance at prediction tasks across multiple cell types and categories of perturbations. We apply scE2G to build maps of enhancer-gene regulatory interactions in heterogeneous tissues and interpret noncoding variants associated with complex traits, nominating regulatory interactions linking INPP4B and IL15 to lymphocyte counts. The scE2G models will enable accurate mapping of enhancer-gene regulatory interactions across thousands of diverse human cell types.
]]></description>
<dc:creator>Sheth, M. U.</dc:creator>
<dc:creator>Qiu, W.-L.</dc:creator>
<dc:creator>Ma, X. R.</dc:creator>
<dc:creator>Gschwind, A. R.</dc:creator>
<dc:creator>Jagoda, E.</dc:creator>
<dc:creator>Tan, A. S.</dc:creator>
<dc:creator>Einarsson, H.</dc:creator>
<dc:creator>Gorissen, B. L.</dc:creator>
<dc:creator>Dubocanin, D.</dc:creator>
<dc:creator>McGinnis, C. S.</dc:creator>
<dc:creator>Amgalan, D.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Jones, T. R.</dc:creator>
<dc:creator>Steinmetz, L. M.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Ustun, B.</dc:creator>
<dc:creator>Engreitz, J. M.</dc:creator>
<dc:creator>Andersson, R.</dc:creator>
<dc:date>2024-11-24</dc:date>
<dc:identifier>doi:10.1101/2024.11.23.624931</dc:identifier>
<dc:title><![CDATA[Mapping enhancer-gene regulatory interactions from single-cell data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-11-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.12.20.695738v1?rss=1">
<title>
<![CDATA[
Predicting interaction-specific protein-protein interaction perturbations by missense variants with MutPred-PPI 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.12.20.695738v1?rss=1
</link>
<description><![CDATA[
Disruption of protein-protein interactions (PPIs) is a major mechanism of a variant's deleterious effect. Computational tools are needed to assess such variants at scale, yet existing predictors rarely consider loss of specific interactions, particularly when variants perturb binding interfaces without significantly affecting protein stability. To address this problem, we present MutPred-PPI, a graph attention network that predicts interaction-specific (edgetic) effects of missense variants by operating on AlphaFold 3-based protein complex contact graphs with protein language model embeddings imposed upon nodes. We systematically evaluated our model with stringent group cross-validation as well as benchmark data recently collected within the IGVF Consortium. MutPred-PPI outperformed all baseline methods across all evaluation criteria, achieving an AUC of 0.85 on seen proteins and 0.72 on previously unseen proteins in cross-validation, demonstrating strong generalizability despite scarce training data. To demonstrate biomedical relevance, we applied MutPred-PPI to variants from ClinVar, HGMD, COSMIC, gnomAD, and two de novo neurodevelopmental disorder-linked datasets. Disease-associated variants from ClinVar and HGMD showed strong enrichment for both quasi-null and edgetic effects, whereas population variants from gnomAD increasingly preserved interactions with higher allele frequencies. Notably, we observed a strong edgetic disruption signature in highly recurrent cancer variants from both the full COSMIC dataset and a subset of variants from oncogenes. Recurrent tumor suppressor gene variants and autism spectrum disorder-associated variants exhibited moderate quasi-null enrichment, whilst neurodevelopmental disorder-linked variants showed a weak edgetic disruption signature. These results indicate distinct PPI perturbation mechanisms across disease types and show that MutPred-PPI captures functionally relevant molecular effects of pathogenic variants.
]]></description>
<dc:creator>Stewart, R.</dc:creator>
<dc:creator>Laval, F.</dc:creator>
<dc:creator>Coppin, G.</dc:creator>
<dc:creator>Spirohn-Fitzgerald, K.</dc:creator>
<dc:creator>Tixhon, M.</dc:creator>
<dc:creator>Hao, T.</dc:creator>
<dc:creator>Calderwood, M. A.</dc:creator>
<dc:creator>Mort, M.</dc:creator>
<dc:creator>Cooper, D. N.</dc:creator>
<dc:creator>Vidal, M.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:date>2025-12-23</dc:date>
<dc:identifier>doi:10.64898/2025.12.20.695738</dc:identifier>
<dc:title><![CDATA[Predicting interaction-specific protein-protein interaction perturbations by missense variants with MutPred-PPI]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-12-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.11.18.688830v1?rss=1">
<title>
<![CDATA[
Systematic and proactive evaluation of AIRE missense variant effects 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.11.18.688830v1?rss=1
</link>
<description><![CDATA[
Pathogenic variants in the Autoimmune Regulator (AIRE) gene cause Autoimmune Polyendocrine Syndrome Type 1 (APS-1), a rare primary immunodeficiency disease with symptoms including hypoparathyroidism, adrenal insufficiency, and chronic mucocutaneous candidiasis. AIRE increases the expression and presentation of tissue-specific genes expressing self-antigens in the developing T cell niche, thus triggering the elimination of self-reactive T cells and preventing autoimmunity. Earlier diagnoses can benefit patients, and APS-1 diagnosis by AIRE sequencing is increasingly common. However, two-thirds of reported clinical variants are missense, and more than half of these are "variants of uncertain significance" (VUS). Cell-based variant functional assays can provide strong evidence towards more informative variant classification, but these are carried out reactively, often years after clinical presentation. By contrast, proactively assessing all possible missense variants could provide immediate evidence to guide genetic diagnosis, even for never-before-seen variants. Here we used an insulin promoter-driven reporter to proactively assess the function of 9790 AIRE missense variants. The resulting AIRE variant effect map both validates and extends current biochemical knowledge, concords with pathogenicity annotations, and provides proactive evidence for 70% of previously reported VUS. Placing our map in the context of both an international APS-1 cohort and the UK Biobank revealed quantitative genotype-phenotype correlations. Using current guidelines, we provide classifications for 32% of current VUS. Together, our proactive resource of AIRE variant impacts offers the potential to improve patient outcomes via more rapid and definitive APS-1 diagnosis.
]]></description>
<dc:creator>Axakova, A.</dc:creator>
<dc:creator>Berger, A. H.</dc:creator>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Ding, M. X.</dc:creator>
<dc:creator>Douville, S. V.</dc:creator>
<dc:creator>Tabet, D. R.</dc:creator>
<dc:creator>Cote, A. G.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Johansson, S.</dc:creator>
<dc:creator>Bratland, E.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2025-11-18</dc:date>
<dc:identifier>doi:10.1101/2025.11.18.688830</dc:identifier>
<dc:title><![CDATA[Systematic and proactive evaluation of AIRE missense variant effects]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-11-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.07.14.664734v1?rss=1">
<title>
<![CDATA[
Comprehensively Testing the Function of Missense Variation in the STK11 Tumour Suppressor 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.07.14.664734v1?rss=1
</link>
<description><![CDATA[
The tumor suppressor gene STK11 encoding Serine/Threonine Kinase 11 (STK11) is associated with Peutz-Jeghers Syndrome (PJS), a heritable gastrointestinal disease that increases lifetime cancer risk, and with somatic variation that contributes to ~30% of lung and 20% of cervical cancers. Although identifying pathogenic variants is clinically actionable, over 94% of STK11 missense variants that have been observed clinically lack a definitive classification. We therefore measured the impact of STK11 variants at scale in a mammalian cell-based assay, scoring 6,026 (73% of all possible) amino acid substitutions across the full-length gene. Functional scores (which were consistent with biochemical properties, smaller-scale assays, and pathogenicity annotations) identified a subset of PJS patients with germline STK11 variants diagnosed later in life, as well as somatic STK11 variants found in cancer patients that had comparable overall survival estimates to wild-type STK11. Our scores provided new evidence for 350 annotated VUS STK11 missense variants and ~80% of missense variants that have not yet been reported clinically but that we might expect to observe in the future. Thus, our effect map provides a proactive resource for gaining sequence-structure-function insights and evidence for actionable interpretation of clinical missense variants.
]]></description>
<dc:creator>Zimmerman, D.</dc:creator>
<dc:creator>Cote, A.</dc:creator>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Kishore, N.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Li, R.</dc:creator>
<dc:creator>Reno, C.</dc:creator>
<dc:creator>Marsh, A.</dc:creator>
<dc:creator>Hernandez, F.</dc:creator>
<dc:creator>Shahagadkar, P.</dc:creator>
<dc:creator>Grove, L.</dc:creator>
<dc:creator>Meier, S.</dc:creator>
<dc:creator>Wu, H.-J.</dc:creator>
<dc:creator>Fengolia, S.</dc:creator>
<dc:creator>Ahronian, L.</dc:creator>
<dc:creator>Teng, T.</dc:creator>
<dc:creator>Waters, A. J.</dc:creator>
<dc:creator>Seward, D.</dc:creator>
<dc:creator>Taipale, M.</dc:creator>
<dc:creator>Aronson, M.</dc:creator>
<dc:creator>Richardson, M. E.</dc:creator>
<dc:creator>Adams, D.</dc:creator>
<dc:creator>Roth, F.</dc:creator>
<dc:date>2025-07-18</dc:date>
<dc:identifier>doi:10.1101/2025.07.14.664734</dc:identifier>
<dc:title><![CDATA[Comprehensively Testing the Function of Missense Variation in the STK11 Tumour Suppressor]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-07-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.25.640191v1?rss=1">
<title>
<![CDATA[
Landscapes of missense variant impact for human superoxide dismutase 1 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.25.640191v1?rss=1
</link>
<description><![CDATA[
Amyotrophic lateral sclerosis (ALS) is a progressive motor neuron disease for which important subtypes are caused by variation in the Superoxide Dismutase 1 gene SOD1. Diagnosis based on SOD1 sequencing can not only be definitive but also indicate specific therapies available for SOD1-associated ALS (SOD1-ALS). Unfortunately, SOD1-ALS diagnosis is limited by the fact that a substantial fraction (currently 26%) of ClinVar SOD1 missense variants are classified as "variants of uncertain significance" (VUS). Although functional assays can provide strong evidence for clinical variant interpretation, SOD1 assay validation is challenging, given the current incomplete and controversial understanding of SOD1-ALS disease mechanism. Using saturation mutagenesis and multiplexed cell-based assays, we measured the functional impact of over two thousand SOD1 amino acid substitutions on both enzymatic function and protein abundance. The resulting missense variant effect maps not only reflect prior biochemical knowledge of SOD1 but also provide sequence-structure-function insights. Importantly, our variant abundance assay can discriminate pathogenic missense variation and provides new evidence for 41% of missense variants that had been previously reported as VUS, offering the potential to identify additional patients who would benefit from therapy approved for SOD1-ALS.
]]></description>
<dc:creator>Axakova, A.</dc:creator>
<dc:creator>Ding, M. X.</dc:creator>
<dc:creator>Cote, A. G.</dc:creator>
<dc:creator>Subramaniam, R.</dc:creator>
<dc:creator>Senguttuvan, V.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Weile, J.</dc:creator>
<dc:creator>Douville, S. V.</dc:creator>
<dc:creator>Gebbia, M.</dc:creator>
<dc:creator>Al-Chalabi, A.</dc:creator>
<dc:creator>Wahl, A.</dc:creator>
<dc:creator>Reuter, J.</dc:creator>
<dc:creator>Hurt, J.</dc:creator>
<dc:creator>Mitchell, A.</dc:creator>
<dc:creator>Fradette, S.</dc:creator>
<dc:creator>Andersen, P. M.</dc:creator>
<dc:creator>van Loggerenberg, W.</dc:creator>
<dc:creator>Roth, F. P.</dc:creator>
<dc:date>2025-02-28</dc:date>
<dc:identifier>doi:10.1101/2025.02.25.640191</dc:identifier>
<dc:title><![CDATA[Landscapes of missense variant impact for human superoxide dismutase 1]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.12.15.694070v1?rss=1">
<title>
<![CDATA[
Comprehensive perturbation of transcription factors in human cardiomyocytes reveals the regulatory architecture of congenital heart disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.12.15.694070v1?rss=1
</link>
<description><![CDATA[
Over 100 genes have been implicated in congenital heart disease (CHD), yet the genetic basis for >50% of CHD remains unexplained. A key challenge is to define the regulatory architecture of CHD genes. Here, we systematically perturb 1983 transcription factors (TFs) during cardiomyocyte differentiation of human stem cells. Our analysis links TFs to gene expression phenotypes across scales and nominates TFs in cardiac cell fate commitment, dosage sensitivity, and transposable element regulation. By deriving TF-gene regulatory networks from experimental perturbation, we gain insights into how developmental networks are structured, use this map to interpret the regulatory architecture of CHD, and deorphanize TFs with roles in CHD that have been under-sampled by genetic studies. To extend these networks to include enhancer-TF-gene linkages, we also perturb 981 putative enhancers of TFs. Finally, we develop a deep learning transformer model to accurately predict perturbed TFs from altered transcriptomes, which we apply to interpret CHD patient transcriptomes. This reference map represents a foundational platform to model the functions of TFs and aids in interpreting CHD variants and mechanisms.

Highlights
- Systematic perturbation of 1983 human TFs in cardiac differentiation.
- Defining the regulatory architecture of CHD genes and cardiac cell fate commitment.
- Nominating TFs in dosage sensitivity and regulation of transposable elements.
- Extended enhancer-TF-gene networks improve fine mapping of patient-derived variants.
- Perturbation models nominate TFs driving altered regulatory networks in patients.
]]></description>
<dc:creator>Takeuchi, C.</dc:creator>
<dc:creator>Sivakumar, S.</dc:creator>
<dc:creator>Sundarrajan, A.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Goetsch, S. C.</dc:creator>
<dc:creator>Zhao, H.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Nzima, M.</dc:creator>
<dc:creator>Deng, M.</dc:creator>
<dc:creator>Kulkarni, K. N.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Wu, J.</dc:creator>
<dc:creator>Posner, B. A.</dc:creator>
<dc:creator>Chahrour, M. H.</dc:creator>
<dc:creator>Kraus, W. L.</dc:creator>
<dc:creator>Munshi, N. V.</dc:creator>
<dc:creator>Hon, G. C.</dc:creator>
<dc:date>2025-12-17</dc:date>
<dc:identifier>doi:10.64898/2025.12.15.694070</dc:identifier>
<dc:title><![CDATA[Comprehensive perturbation of transcription factors in human cardiomyocytes reveals the regulatory architecture of congenital heart disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-12-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.01.13.699328v1?rss=1">
<title>
<![CDATA[
3D reconstruction of spatial transcriptomics with spatial pattern enhanced graph convolutional neural network 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.01.13.699328v1?rss=1
</link>
<description><![CDATA[
Spatially resolved transcriptomics (SRT) is a promising new technology that enables simultaneous analysis of gene expression and spatial information for biomedical research. However, the existing statistical and deep learning algorithms used for analyzing SRT data rely solely on two-dimensional (2D) spatial coordinates, which limits their ability to accurately identify spatial domains, spatially variable genes, cell-to-cell communications, and developmental trajectories in a three-dimensional (3D) spatial manner. To address these limitations, we introduced Spa3D, which utilizes the anti-leakage Fourier transform and a graph convolutional neural network model to reconstruct 3D-based spatial structures from multiple 2D SRT slices. We demonstrate that Spa3D is applicable to data from various SRT technology platforms and outperforms state-of-the-art methods by: (I) improving spatial domain identification through 3D reconstruction, (II) elucidating the cell-cell communication landscape in the 3D cellular organization, (III) modeling organ-level tempo-spatial development patterns in a 3D fashion, and (IV) annotating 3D spatial trajectories that are not captured by 2D spatial coordinates.

Key points
- Most existing spatial omics analysis methods rely on 2D data, limiting their ability to capture full spatial and developmental tissue complexity
- Spa3D incorporates physical z-axis distances, enabling accurate 3D modeling even when adjacent slices vary in tissue structure and composition
- Spa3D reconstructs true 3D spatial structures from 2D SRT slices using graph convolutional networks and anti-leakage Fourier transforms
- Spa3D enhances spatial domain detection, revealing detailed cell-cell communication and organ-level development patterns across multiple spatial omics platforms
- Spa3D enables novel biological discoveries by revealing spatial features and trajectories not detectable using traditional 2D transcriptomic analysis approaches
]]></description>
<dc:creator>Tang, C.</dc:creator>
<dc:creator>Zhou, Y.</dc:creator>
<dc:creator>Xiao, X.</dc:creator>
<dc:creator>Dong, L.</dc:creator>
<dc:creator>Yu, L.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:date>2026-01-14</dc:date>
<dc:identifier>doi:10.64898/2026.01.13.699328</dc:identifier>
<dc:title><![CDATA[3D reconstruction of spatial transcriptomics with spatial pattern enhanced graph convolutional neural network]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-01-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.17.638766v1?rss=1">
<title>
<![CDATA[
SpaFun: Discovering Domain-specific Spatial Expression Patterns and New Disease-Relevant Genes using Functional Principal Component Analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.17.638766v1?rss=1
</link>
<description><![CDATA[
SpaFun is a novel, non-model-based method developed to address limitations in existing spatially variable gene (SVG) detection techniques, particularly for large-scale spatially resolved transcriptomics (SRT) datasets. These limitations include computational inefficiency, limited statistical power with increasing data size, and the inability to capture spatial heterogeneity and co-expression patterns among genes. Built on functional principal component analysis (fPCA), SpaFun identifies domain-representative genes (DRGs) with significantly better computational efficiency and greater statistical power while accounting for spatial heterogeneity and co-expression patterns among genes. We applied SpaFun to three SRT datasets and demonstrated that SpaFun outperformed state-of-the-art algorithms for identifying representative genes for tumor regions (e.g., DESeq, edgeR, and limma), as well as recently developed novel algorithms designed for spatial omics to identify the representative genes (e.g., SPARK and CSIDE). This highlights SpaFun's ability to accurately identify genes most representative of each spatial domain (e.g., tumor, immune, or stroma regions). By uncovering novel disease-relevant genes overlooked by existing algorithms, SpaFun could provide insights into new molecular mechanisms and propose innovative therapeutic strategies to improve patient outcomes.

Key Points
- SpaFun is the first method dedicated to identifying DRGs, capturing spatially representative expression patterns within annotated tissue regions, setting it apart from all traditional SVG detection methods.
- It leverages fPCA to model gene expression as a function of spatial location, avoiding reliance on predefined spatial distribution assumptions.
- The non-model-based framework ensures compatibility with different SRT platforms and experimental designs, making it a scalable and widely applicable tool for SRT research.
]]></description>
<dc:creator>Jiang, X.</dc:creator>
<dc:creator>Guo, Y.</dc:creator>
<dc:creator>Guo, L.</dc:creator>
<dc:creator>Zhong, L.</dc:creator>
<dc:creator>Wang, J.</dc:creator>
<dc:creator>Xiao, G.</dc:creator>
<dc:creator>Li, Q.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:date>2025-02-21</dc:date>
<dc:identifier>doi:10.1101/2025.02.17.638766</dc:identifier>
<dc:title><![CDATA[SpaFun: Discovering Domain-specific Spatial Expression Patterns and New Disease-Relevant Genes using Functional Principal Component Analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.01.21.633969v1?rss=1">
<title>
<![CDATA[
Benchmarking and optimizing Perturb-seq in differentiating human pluripotent stem cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.01.21.633969v1?rss=1
</link>
<description><![CDATA[
Perturb-seq is a powerful approach to systematically assess how genes and enhancers impact the molecular and cellular pathways of development and disease. However, technical challenges have limited its application in stem cell-based systems. Here, we benchmarked Perturb-seq across multiple CRISPRi modalities, on diverse genomic targets, in multiple human pluripotent stem cells, during directed differentiation to multiple lineages, and across multiple sgRNA delivery systems. To ensure cost-effective production of large-scale Perturb-seq datasets as part of the Impact of Genomic Variation on Function (IGVF) consortium, our optimized protocol dynamically assesses experiment quality across the weeks-long procedure. Our analysis of 1,996,260 sequenced cells across benchmarking datasets reveals shared regulatory networks linking disease-associated enhancers and genes with downstream targets during cardiomyocyte differentiation. This study establishes open tools and resources for interrogating genome function during stem cell differentiation.
]]></description>
<dc:creator>Sivakumar, S.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Goetsch, S. C.</dc:creator>
<dc:creator>Pandit, V.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Zhao, H.</dc:creator>
<dc:creator>Sundarrajan, A.</dc:creator>
<dc:creator>Armendariz, D. A.</dc:creator>
<dc:creator>Takeuchi, C.</dc:creator>
<dc:creator>Nzima, M.</dc:creator>
<dc:creator>Chen, W.-C.</dc:creator>
<dc:creator>Dederich, A. E.</dc:creator>
<dc:creator>El Hayek, L.</dc:creator>
<dc:creator>Gao, T.</dc:creator>
<dc:creator>Ghazawi, R.</dc:creator>
<dc:creator>Gogate, A.</dc:creator>
<dc:creator>Kaur, K.</dc:creator>
<dc:creator>Kim, H. B.</dc:creator>
<dc:creator>McCoy, M.</dc:creator>
<dc:creator>Niederstrasser, H.</dc:creator>
<dc:creator>Oura, S.</dc:creator>
<dc:creator>Pinzon-Arteaga, C. A.</dc:creator>
<dc:creator>Sanghvi, M.</dc:creator>
<dc:creator>Schmitz, D. A.</dc:creator>
<dc:creator>Yu, L.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Zhou, Q.</dc:creator>
<dc:creator>Kraus, W. L.</dc:creator>
<dc:creator>Xu, L.</dc:creator>
<dc:creator>Wu, J.</dc:creator>
<dc:creator>Posner, B. A.</dc:creator>
<dc:creator>Chahrour, M. H.</dc:creator>
<dc:creator>Hon, G. C.</dc:creator>
<dc:creator>Munshi, N. V.</dc:creator>
<dc:date>2025-01-23</dc:date>
<dc:identifier>doi:10.1101/2025.01.21.633969</dc:identifier>
<dc:title><![CDATA[Benchmarking and optimizing Perturb-seq in differentiating human pluripotent stem cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-01-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.07.710282v1?rss=1">
<title>
<![CDATA[
Reprogramming of neuronal genome function and phenotype by astrocytes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.07.710282v1?rss=1
</link>
<description><![CDATA[
Heterotypic cell-cell interactions are critical to governing cellular physiology, disease progression, and responses to the environment and pharmacologic interventions. For example, neurons and astrocytes engage in intricate interactions that are essential for brain development and function [1-3]. However, the transformation of these extracellular signals into epigenomic regulation that governs cell function is poorly understood. Here, we report that weeks of co-culture between human induced pluripotent stem cell (hiPSC)-derived neurons and mouse cortical astrocytes extensively reprograms gene expression and the chromatin accessibility landscape in neurons, affecting thousands of genes and putative gene regulatory elements (REs), including many transcription factors (TFs). These genes are enriched for functions implicated in neuronal differentiation and maturation, and tend to be impacted in schizophrenia and autosomal dominant Alzheimer's disease. Through complementary CRISPR interference and activation screens, we recapitulated hundreds of astrocyte-induced transcriptional and chromatin remodeling events in mono-cultured neurons at both promoters and distal REs of TF genes. We discovered functional REs for ~50 astrocyte-responsive TF genes, providing a map of gene regulatory network control. Astrocyte-responsive TF genes fall into groups that exert independent or counter-balancing transcriptional effects, highlighting the complex coordination of the neuronal response to astrocytes. Functional effects of specific TFs, including POU3F2 and TFAP2E, on neurite morphology and neuronal electrophysiology are consistent with transcriptional effects, demonstrating the capacity of direct epigenetic control to mimic heterotypic cellular signals.
This work illuminates the regulation of neurodevelopment- and disease-relevant gene modules by neuron-astrocyte interactions, and provides a blueprint for applying modern functional genomics to uncover the links between cell microenvironment and epigenomic programming.

Highlights
- Neuronal gene expression and chromatin accessibility landscape are profoundly remodeled by astrocytes over weeks of co-culture
- Astrocyte-responsive neuronal gene modules and neuron-responsive astrocytic gene modules are enriched for genes associated with schizophrenia and familial Alzheimer's disease
- Single-cell CRISPR interference and activation screens of astrocyte-responsive gene regulatory elements identified dozens of functional regulatory elements of TF genes in neurons
- Single-cell CRISPR interference and activation screens of >200 astrocyte-responsive TF genes uncovered discrete functional clusters that promote neuronal maturity or stemness
- Astrocyte-responsive TF genes reprogram neuronal electrophysiology and neurite morphology
]]></description>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Hagy, K.</dc:creator>
<dc:creator>Safi, A.</dc:creator>
<dc:creator>Beer, M. A.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Geraghty, S.</dc:creator>
<dc:creator>Rai, R.</dc:creator>
<dc:creator>Pederson, A. N.</dc:creator>
<dc:creator>Reisman, S. J.</dc:creator>
<dc:creator>Love, M. I.</dc:creator>
<dc:creator>Sullivan, P. F.</dc:creator>
<dc:creator>Eroglu, C.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2026-03-07</dc:date>
<dc:identifier>doi:10.64898/2026.03.07.710282</dc:identifier>
<dc:title><![CDATA[Reprogramming of neuronal genome function and phenotype by astrocytes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.01.703124v1?rss=1">
<title>
<![CDATA[
Functional Annotation of the Major Histocompatibility Complex Locus 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.01.703124v1?rss=1
</link>
<description><![CDATA[
The human major histocompatibility complex (MHC) locus has the greatest density of disease associations in the human genome, including links to over 100 polygenic disorders. Its complex haplotype structure, high gene density, and high degree of linkage disequilibrium combine to make deciphering the gene regulatory logic of the MHC locus extremely challenging. Employing complementary high-throughput CRISPR interference (CRISPRi) and activation (CRISPRa) epigenetic screens coupled with single-cell transcriptome profiling across three distinct human cell types, we identified hundreds of new connections between cis-regulatory elements (CREs) and their target genes in this locus. These CRE-gene links are largely cell type-specific, and the CREs largely act as enhancers. Additionally, some CREs have complex features, including harboring both active and repressive histone marks, lacking chromatin accessibility, targeting multiple genes, or acting as silencers. Computational methods fail to predict a majority of these CRE-gene connections. These findings emphasize the potential for functional perturbation experiments to dissect complex loci and reveal shared and cell type-specific regulatory mechanisms relevant to the genomics of complex diseases. Collectively, this study provides a unique resource for understanding the complex regulatory landscape within the MHC locus and supports the need for creating new models that encompass CRE-gene interactions, cell type-specific gene expression, and disease genetics in the noncoding genome.
]]></description>
<dc:creator>Bounds, L. R.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>ter Weele, M.</dc:creator>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Wu, E.</dc:creator>
<dc:creator>Li, S.</dc:creator>
<dc:creator>Venukuttan, R.</dc:creator>
<dc:creator>Rai, R.</dc:creator>
<dc:creator>Mu, W.</dc:creator>
<dc:creator>Iglesias, N.</dc:creator>
<dc:creator>Giusti-Rodriguez, P.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Li, Y.</dc:creator>
<dc:creator>Gordan, R.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Love, M. I.</dc:creator>
<dc:creator>Sullivan, P. F.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2026-02-03</dc:date>
<dc:identifier>doi:10.64898/2026.02.01.703124</dc:identifier>
<dc:title><![CDATA[Functional Annotation of the Major Histocompatibility Complex Locus]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.08.05.668745v1?rss=1">
<title>
<![CDATA[
Higher eQTL power reveals signals that boost GWAS colocalization 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.08.05.668745v1?rss=1
</link>
<description><![CDATA[
Expression quantitative trait locus (eQTL) studies in human cohorts typically detect at least one regulatory signal per gene, and have been proposed as a way to explain mechanisms of genetic liability for other traits, as discovered in genome-wide association studies (GWAS). In particular, eQTL signals may colocalize with GWAS signals, suggesting gene expression as a possible mediator. However, recent studies have noted colocalization occurs infrequently, even when expression is measured in biologically relevant tissues. Most eQTL studies to date include only hundreds of individuals, and are underpowered to discover distal regulatory signals explaining smaller fractions of gene expression variance. We integrate evidence from recent eQTL studies and demonstrate that limited statistical power due to sample size skews the detection of eQTL signals identified at various signal strengths. We estimate that a sample size of 500 detects <0.1 to 60% of eQTL for a range of signal strengths and that a sample size of 2,000 would detect 36.8% of all eQTL. We show that eQTL signals that can only be discovered in larger studies exhibit characteristics more similar to those of GWAS signals, including greater distance to the regulated gene and higher probability of loss intolerance. Finally, using results from recent eQTL studies and meta-analyses, we observe a large increase in detected colocalizations with GWAS signals compared to previous studies. These findings caution against overinterpreting the absence of colocalization in underpowered studies and provide guidance for designing future eQTL experiments, to improve power and complement perturbation-based approaches in characterizing gene-trait mechanisms.
]]></description>
<dc:creator>Rosen, J. D.</dc:creator>
<dc:creator>Broadaway, K. A.</dc:creator>
<dc:creator>Brotman, S. M.</dc:creator>
<dc:creator>Mohlke, K. L.</dc:creator>
<dc:creator>Love, M. I.</dc:creator>
<dc:date>2025-08-05</dc:date>
<dc:identifier>doi:10.1101/2025.08.05.668745</dc:identifier>
<dc:title><![CDATA[Higher eQTL power reveals signals that boost GWAS colocalization]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-08-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.27.714794v1?rss=1">
<title>
<![CDATA[
Structured Pooling Improves Detection of Rare Regulatory Mutations in Population-Scale Reporter Assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.27.714794v1?rss=1
</link>
<description><![CDATA[
Identifying genetic variants in noncoding DNA that impact gene expression and thereby contribute to disease risk remains a difficult but important challenge in genomic medicine. Modern reporter assays such as STARR-seq and MPRA provide an efficient and effective means of testing, in very high throughput, millions of variants captured directly from patient genomes. While these assays have previously been scaled to whole genomes and, separately, to populations, we report findings from the first whole-genome population-scale STARR-seq experiment performed on 100 individuals. In order to achieve that scale we devised a novel experimental design that partitions samples into pools so as to increase allele frequencies within pools and thereby reduce expected dropout and increase signal-to-noise ratio in experimental readouts. We show that this design produces more accurate estimates of variant effect sizes, and we provide a Bayesian model for robust estimation of those effect sizes that also reports full posterior distributions for assessment of confidence in estimates. Together, these methodological innovations facilitate the detection of functional regulatory variants, particularly rare variants, with much higher accuracy and at greater scale than previously possible. We demonstrate the utility of this approach on the task of functional annotation of quantitative trait loci such as eQTLs and caQTLs, and show concordance with patterns of constraint in transcription factor binding profiles.
]]></description>
<dc:creator>Dura, K.</dc:creator>
<dc:creator>Siklenka, K.</dc:creator>
<dc:creator>Strouse, K. P.</dc:creator>
<dc:creator>Morrow, S.</dc:creator>
<dc:creator>Zhang, C.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Majoros, W. H.</dc:creator>
<dc:date>2026-03-31</dc:date>
<dc:identifier>doi:10.64898/2026.03.27.714794</dc:identifier>
<dc:title><![CDATA[Structured Pooling Improves Detection of Rare Regulatory Mutations in Population-Scale Reporter Assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-31</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.27.714770v1?rss=1">
<title>
<![CDATA[
Modeling gene regulatory perturbations via deep learning from high-throughput reporter assays 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.27.714770v1?rss=1
</link>
<description><![CDATA[
Assessing likely variant effects on phenotypes is of critical importance in diagnostic settings, and while much progress has been made in interpreting genic mutations based on our understanding of coding sequence, noncoding variants can be much more challenging to reliably interpret based on DNA sequence alone. High-throughput reporter assays such as STARR-seq and MPRA have shown utility in experimentally measuring regulatory effects of noncoding variants present in samples but provide no readout for variants not present in the assay inputs. However, whole-genome reporter assays provide copious data that can be used to train predictive models for prioritizing variants not directly observed in the experiment. We describe a retrainable predictive modeling framework, BlueSTARR, for this task, and present results of training several models with this framework on whole-genome STARR-seq data from two cell lines and one drug treatment. Using these models, we uncover a global signature across the human genome consistent with purifying selection against both loss-of-function and gain-of-function regulatory variants, with the latter showing a significant bias consistent with selection against gains of cis regulatory function in closed chromatin proximal to genes. By testing the model on synthetic enhancers with binding motifs for transcription factors GR and AP-1, we find that when trained on drug perturbation data, the model is able to learn distance-dependent and treatment-dependent binding patterns and their resulting reporter gene activation. These results demonstrate that lightweight, easily retrainable models such as ours have utility in probing latent signals present in novel experimental data. 
Finally, we find only modest differences in performance between deep-learning architectures when trained on this single data modality, and while somewhat greater predictive accuracy can be achieved with much larger models trained at great expense on many terabytes of data, there is still copious room for improvement even for industrial-strength, state-of-the-art models.
]]></description>
<dc:creator>Venukuttan, R.</dc:creator>
<dc:creator>Doty, R.</dc:creator>
<dc:creator>Thomson, A.</dc:creator>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>Li, B.</dc:creator>
<dc:creator>Duan, Y.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Dura, K.</dc:creator>
<dc:creator>Ko, K.-Y.</dc:creator>
<dc:creator>Lapp, H.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Majoros, W. H.</dc:creator>
<dc:date>2026-03-31</dc:date>
<dc:identifier>doi:10.64898/2026.03.27.714770</dc:identifier>
<dc:title><![CDATA[Modeling gene regulatory perturbations via deep learning from high-throughput reporter assays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-31</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.01.703129v1?rss=1">
<title>
<![CDATA[
Mismatch tolerance of a gRNA for CRISPR-based gene activation confers broad activity critical for cell reprogramming 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.01.703129v1?rss=1
</link>
<description><![CDATA[
CRISPR activation and interference systems (CRISPRa/i) are widely used for programmable transcriptional control. Although these technologies are capable of highly specific single-gene activity, some applications of transcriptional network reprogramming require broad, genome-wide effects. Here, we identify a CRISPRa gRNA that robustly reprograms astrocyte transcriptional state. Unexpectedly, this activity arises from extensive off-target binding that induces expression changes in thousands of genes, unlike neighboring gRNAs targeting the same intended on-target site. We leverage this promiscuous gRNA to dissect determinants of gRNA-driven off-target dCas9 binding in the context of transcriptional reprogramming. Using ChIP-seq, high-throughput protein-binding microarrays, and gRNA-variant library screening in cells, we demonstrate that PAM-proximal bases are primary determinants of genomic binding, mismatch tolerance is both gRNA- and base-specific, and targeted mutations within the PAM-proximal region can tune gRNA specificity. We further demonstrate that CRISPRa-driven phenotypes can reflect combined contributions from widespread off-target activity and dose-dependent on-target effects. These findings highlight the potentially widespread impacts of CRISPRa off-target activity, underscore the need to account for cryptic effects when selecting and evaluating gRNAs for programming cell phenotypes, and demonstrate that multi-site binding by CRISPRa systems can be exploited as a feature for network-level perturbations in cell reprogramming.
]]></description>
<dc:creator>Reisman, S. J.</dc:creator>
<dc:creator>Zhu, W.</dc:creator>
<dc:creator>Miller, S. E.</dc:creator>
<dc:creator>Halabi, D.</dc:creator>
<dc:creator>Sangvai, N.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Gordan, R.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2026-02-03</dc:date>
<dc:identifier>doi:10.64898/2026.02.01.703129</dc:identifier>
<dc:title><![CDATA[Mismatch tolerance of a gRNA for CRISPR-based gene activation confers broad activity critical for cell reprogramming]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.11.681828v1?rss=1">
<title>
<![CDATA[
Comprehensive profiling of transcription factors for reprogramming human astrocytes to neuronal cells through endogenous CRISPR-based gene activation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.11.681828v1?rss=1
</link>
<description><![CDATA[
Neuronal loss is a hallmark of neurodegeneration and brain injury. Direct reprogramming of astrocytes into neurons has emerged as a promising approach to restore lost neurons. Comprehensive mapping and characterization of candidate astrocyte-to-neuron reprogramming factors is an essential step to realizing the potential of this strategy. Here, we established a CRISPR activation (CRISPRa)-based approach for neuronal reprogramming of primary human astrocytes. We conducted high-throughput CRISPRa screens of all human genes encoding transcription factors (TFs) to identify novel and efficient reprogramming factors. scRNA-seq characterization of top hits revealed that single TFs reprogram primary human astrocytes into multiple neuronal subtypes with distinct cell type-specific gene signatures. We demonstrate that INSM1 reprograms astrocytes to a glutamatergic neuron-like state and has broad neurogenic activity across different cell types and across human and mouse contexts. Finally, we conduct paired CRISPRa screens to identify cofactors that cooperate with INSM1 to enhance neuronal reprogramming and subtype specification, and elucidate genomic mechanisms of interaction and downstream regulators.
]]></description>
<dc:creator>Reisman, S. J.</dc:creator>
<dc:creator>Halabi, D.</dc:creator>
<dc:creator>Miller, S. E.</dc:creator>
<dc:creator>Song, L.</dc:creator>
<dc:creator>Geraghty, S.</dc:creator>
<dc:creator>Sangvai, N.</dc:creator>
<dc:creator>Rice, G.</dc:creator>
<dc:creator>Safi, A.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2025-10-12</dc:date>
<dc:identifier>doi:10.1101/2025.10.11.681828</dc:identifier>
<dc:title><![CDATA[Comprehensive profiling of transcription factors for reprogramming human astrocytes to neuronal cells through endogenous CRISPR-based gene activation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.11.681829v1?rss=1">
<title>
<![CDATA[
RELB Reprograms Exhausted Tumor-Infiltrating Lymphocytes for Improved Adoptive Cell Therapy 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.11.681829v1?rss=1
</link>
<description><![CDATA[
Tumor-infiltrating lymphocytes (TILs) are a promising autologous cell therapy to treat solid tumors. TILs are manufactured by expanding and reinfusing tumor-reactive T cells from tumor biopsies. Efficacy of TIL therapies has been limited by the heterogeneity of expanded TIL products and the high prevalence of dysfunctional exhausted CD8+ T cells (TEX). While a subset of CD8+ TILs co-expressing CD103 and CD39 are enriched for tumor-reactive TILs across multiple cancer types, these cells are often in the TEX state with low proliferative potential. To identify regulators of human TIL proliferation, we screened an open reading frame library encoding for all human transcription factors (TFs). RELB emerged as the dominant driver of human TIL expansion with a skew towards CD8+ cells. TCR diversity was maintained after multiple days of in vitro expansion driven by RELB. Transcriptome profiling of multiple RELB-expressing TIL subtypes revealed a shift towards a memory/costimulatory-like phenotype. Using a HER2-targeting CAR and tumor co-culture model, RELB conferred improved persistence after multiple tumor challenges in vitro and improved solid tumor control in mouse xenografts in vivo. Finally, co-culture of RELB-overexpressing TILs with patient-matched tumor organoids showed an increase in TIL product polyfunctionality, tumor reactivity, and tumor killing. Collectively these results support promoting RELB expression as a strategy for broadly enabling TIL therapy for treating solid tumors.
]]></description>
<dc:creator>McRoberts Amador, C. D.</dc:creator>
<dc:creator>Conover, R. E.</dc:creator>
<dc:creator>Brown, M. C.</dc:creator>
<dc:creator>Lyniv, L. S.</dc:creator>
<dc:creator>Noldner, P. K.</dc:creator>
<dc:creator>Zhou, Y.</dc:creator>
<dc:creator>Gao, A. R.</dc:creator>
<dc:creator>McCutcheon, S. R.</dc:creator>
<dc:creator>Antonia, S. J.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2025-10-12</dc:date>
<dc:identifier>doi:10.1101/2025.10.11.681829</dc:identifier>
<dc:title><![CDATA[RELB Reprograms Exhausted Tumor-Infiltrating Lymphocytes for Improved Adoptive Cell Therapy]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.07.19.665672v1?rss=1">
<title>
<![CDATA[
A gene regulatory element modulates myosin expression and controls cardiomyocyte response to stress 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.07.19.665672v1?rss=1
</link>
<description><![CDATA[
A hallmark of heart disease is gene dysregulation and reactivation of fetal gene programs. Reactivation of these fetal programs has compensatory effects during heart failure, depending on the type and stage of the underlying cardiomyopathy. Thousands of putative cardiac gene regulatory elements have been identified that may control these programs, but their functions are largely unknown. We profile genome-wide changes to gene expression and chromatin structure in cardiomyocytes derived from human pluripotent stem cells. We identify and characterize a gene regulatory element essential for the regulation of MYH6, which encodes human fetal myosin. Using chromatin conformation assays in combination with epigenome editing, we find that gene regulation is mediated by direct interaction between MYH6 and the enhancer. We also find that enhancer activation alters cardiomyocyte response to the hypertrophy-inducing peptide endothelin-1. Enhancer activation prevents polyploidization and changes in calcium dynamics following stress with endothelin-1. Collectively, these results identify regulatory mechanisms of cardiac gene expression programs that modulate cardiomyocyte maturation and cellular stress response and that could serve as potential therapeutic targets.
]]></description>
<dc:creator>Anglen, T.</dc:creator>
<dc:creator>Kaplow, I. M.</dc:creator>
<dc:creator>Choi, B.</dc:creator>
<dc:creator>Dewars, E.</dc:creator>
<dc:creator>Perelli, R. M.</dc:creator>
<dc:creator>Hagy, K. T.</dc:creator>
<dc:creator>Tran, D.</dc:creator>
<dc:creator>Ramaker, M. E.</dc:creator>
<dc:creator>Shah, S.</dc:creator>
<dc:creator>Jung, I.</dc:creator>
<dc:creator>Landstrom, A. P.</dc:creator>
<dc:creator>Karra, R.</dc:creator>
<dc:creator>Diao, Y.</dc:creator>
<dc:creator>Gersbach, C. A.</dc:creator>
<dc:date>2025-07-20</dc:date>
<dc:identifier>doi:10.1101/2025.07.19.665672</dc:identifier>
<dc:title><![CDATA[A gene regulatory element modulates myosin expression and controls cardiomyocyte response to stress]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-07-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.08.434470v1?rss=1">
<title>
<![CDATA[
Genome-wide annotation of gene regulatory elements linked to cell fitness 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.08.434470v1?rss=1
</link>
<description><![CDATA[
Noncoding regulatory elements control gene expression and thus govern nearly all biological processes. Epigenomic profiling assays have identified millions of putative regulatory elements, but systematically determining the function of those regulatory elements remains a substantial challenge. Here we adapt CRISPR screening by epigenetic repression to screen all 111,619 putative non-coding regulatory elements defined by open chromatin sites in human K562 leukemia cells for their role in regulating essential cellular processes and proliferation. In an initial screen containing 1,084,704 gRNAs, we implemented an analysis framework to quantify perturbation effects, and nominate 1,108 regulatory elements that strongly impact cell fitness. We tested 8,845 of the primary screen elements in a secondary screen, evaluated their cell-type specificity in a second cancer cell line, and then used a single-cell RNA-seq CRISPR screen to discover 63 connections between distal regulatory elements and target genes. This comprehensive and quantitative genome-wide map of essential gene regulatory elements presents a framework for extensive characterization of noncoding regulatory elements that drive complex cell phenotypes and for prioritizing non-coding genetic variants that may contribute to common traits and disease risk.
]]></description>
<dc:creator>Klann, T.</dc:creator>
<dc:creator>Barrera, A.</dc:creator>
<dc:creator>Ettyreddy, A.</dc:creator>
<dc:creator>Rickels, R.</dc:creator>
<dc:creator>Bryois, J.</dc:creator>
<dc:creator>Jiang, S.</dc:creator>
<dc:creator>Adkar, S.</dc:creator>
<dc:creator>Iglesias, N.</dc:creator>
<dc:creator>Sullivan, P.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Gersbach, C.</dc:creator>
<dc:date>2021-03-09</dc:date>
<dc:identifier>doi:10.1101/2021.03.08.434470</dc:identifier>
<dc:title><![CDATA[Genome-wide annotation of gene regulatory elements linked to cell fitness]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-03-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.05.14.654043v1?rss=1">
<title>
<![CDATA[
Cell modeling and rescue of a novel non-coding genetic cause of Glycogen Storage Disease IX 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.05.14.654043v1?rss=1
</link>
<description><![CDATA[
Delayed diagnosis of Mendelian disease substantially prevents early therapeutic intervention that could improve symptoms and prognosis. One major contributing challenge is the functional interpretation of non-coding variants that cause disease by altering splicing and/or gene expression. We identified two siblings with glycogen storage disease (GSD) type IX γ2, both of whom had a classic clinical presentation, enzyme deficiency, and a known pathogenic splice acceptor variant on one allele of PHKG2. Despite the autosomal recessive nature of the disease, no variant on the second allele was identified by gene panel sequencing. To identify a potential missing second pathogenic variant, we completed whole genome sequencing (WGS) and detected a putative deep intronic splicing variant in PHKG2 in both siblings. We confirmed the functional splicing effects of this variant using short-read and long-read RNA-seq on patient blood and a HEK293T cell model in which we installed the variant using CRISPR editing. Using the cell model, we demonstrated multiple biochemical and cellular impacts that are consistent with GSD IX γ2, and a reversal of aberrant splicing using antisense splice-switching oligonucleotides. In doing so, we demonstrate a novel and robust pathway for detecting, validating, and reversing the impacts of novel non-coding causes of rare disease.
]]></description>
<dc:creator>Iyengar, A. K.</dc:creator>
<dc:creator>Zou, X.</dc:creator>
<dc:creator>Dai, J.</dc:creator>
<dc:creator>Francis, R. A.</dc:creator>
<dc:creator>Safi, A.</dc:creator>
<dc:creator>Patterson, K.</dc:creator>
<dc:creator>Koch, R. L.</dc:creator>
<dc:creator>Clarke, S.</dc:creator>
<dc:creator>Beaman, M. M.</dc:creator>
<dc:creator>Chong, J. X.</dc:creator>
<dc:creator>Bamshad, M. J.</dc:creator>
<dc:creator>Majoros, W. H.</dc:creator>
<dc:creator>Rehder, R. C.</dc:creator>
<dc:creator>Bali, D. S.</dc:creator>
<dc:creator>Allen, A. S.</dc:creator>
<dc:creator>Crawford, G. E.</dc:creator>
<dc:creator>Kishnani, P. S.</dc:creator>
<dc:creator>Reddy, T. E.</dc:creator>
<dc:date>2025-05-17</dc:date>
<dc:identifier>doi:10.1101/2025.05.14.654043</dc:identifier>
<dc:title><![CDATA[Cell modeling and rescue of a novel non-coding genetic cause of Glycogen Storage Disease IX]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.06.555206v1?rss=1">
<title>
<![CDATA[
Enhancer-driven regulatory network of forebrain human development provides insights into autism 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.06.555206v1?rss=1
</link>
<description><![CDATA[
Cell differentiation is orchestrated by transcription factors (TFs) binding to enhancers, shaping gene regulatory networks that drive neuronal lineage specification. Deciphering these enhancer-driven networks in human forebrain development is essential for understanding the genetic basis of neurodevelopmental disorders. Through integrative epigenomic and transcriptomic analyses of human forebrain organoids derived from 10 individuals with autism spectrum disorder (ASD) and their neurotypical fathers, we constructed a comprehensive enhancer-driven gene regulatory network (GRN) of early neurodevelopment. This GRN revealed hierarchical regulatory transitions guiding neuronal differentiation and was experimentally validated via CRISPR interference (CRISPRi) and loss-of-function analyses. A subnetwork linked ASD-associated transcriptomic alterations to dysregulated TF activity, implicating FOXG1, BHLHE22, EOMES, and NEUROD2 as key regulators of excitatory neuron specification in macrocephalic ASD. These findings suggest that ASD disrupts enhancer-driven regulatory frameworks, altering neuronal cell fate decisions in the developing fetal brain.
]]></description>
<dc:creator>Vaccarino, F. M.</dc:creator>
<dc:creator>Jourdon, A.</dc:creator>
<dc:creator>Mariani, J.</dc:creator>
<dc:creator>Wu, F.</dc:creator>
<dc:creator>Capauto, D.</dc:creator>
<dc:creator>Norton, S.</dc:creator>
<dc:creator>Tomasini, L.</dc:creator>
<dc:creator>Amiri, A.</dc:creator>
<dc:creator>Schreiner, J.</dc:creator>
<dc:creator>Nguyen, C. K.</dc:creator>
<dc:creator>Nolan, N.</dc:creator>
<dc:creator>Szekely, A.</dc:creator>
<dc:creator>McPartland, J. C.</dc:creator>
<dc:creator>Pelphrey, K.</dc:creator>
<dc:creator>Chawarska, K.</dc:creator>
<dc:creator>Ventola, P.</dc:creator>
<dc:creator>Abyzov, A.</dc:creator>
<dc:date>2023-09-08</dc:date>
<dc:identifier>doi:10.1101/2023.09.06.555206</dc:identifier>
<dc:title><![CDATA[Enhancer-driven regulatory network of forebrain human development provides insights into autism]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.13.638099v1?rss=1">
<title>
<![CDATA[
Clonal memory of colitis accumulates and promotes tumor growth 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.13.638099v1?rss=1
</link>
<description><![CDATA[
Chronic inflammation is a well-established risk factor for cancer, but the underlying molecular mechanisms remain unclear. Using a mouse model of colitis, we demonstrate that colonic stem cells retain an epigenetic memory of inflammation following disease resolution, characterized by a cumulative gain of activator protein 1 (AP-1) transcription factor activity. Further, we develop SHARE-TRACE, a method that simultaneously profiles gene expression, chromatin accessibility and clonal history in single cells, enabling high-resolution tracking of epigenomic memory. This reveals that inflammatory memory is propagated cell-intrinsically and inherited through stem cell lineages, with certain clones demonstrating dramatically stronger memory than others. Finally, we show that colitis primes stem cells for amplified expression of regenerative gene programs following oncogenic mutation, and that these programs accelerate tumor growth. This includes a subpopulation of tumors that have exceptionally high AP-1 activity and the additional upregulation of pro-oncogenic programs. Together, our findings provide a mechanistic link between chronic inflammation and malignancy, revealing how long-lived epigenetic alterations in regenerative tissues may contribute to disease susceptibility and suggesting potential therapeutic strategies to mitigate cancer risk in patients with chronic inflammatory conditions.
]]></description>
<dc:creator>Nagaraja, S.</dc:creator>
<dc:creator>Ojeda-Miron, L.</dc:creator>
<dc:creator>Zhang, R.</dc:creator>
<dc:creator>Oreskovic, E.</dc:creator>
<dc:creator>Hu, Y.</dc:creator>
<dc:creator>Zeve, D.</dc:creator>
<dc:creator>Sharma, K.</dc:creator>
<dc:creator>Hyman, R. R.</dc:creator>
<dc:creator>Zhang, Q.</dc:creator>
<dc:creator>Castillo, A.</dc:creator>
<dc:creator>Breault, D. T.</dc:creator>
<dc:creator>Yilmaz, O. H.</dc:creator>
<dc:creator>Buenrostro, J. D.</dc:creator>
<dc:date>2025-02-17</dc:date>
<dc:identifier>doi:10.1101/2025.02.13.638099</dc:identifier>
<dc:title><![CDATA[Clonal memory of colitis accumulates and promotes tumor growth]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.13.711357v1?rss=1">
<title>
<![CDATA[
Comparing bulk and single-cell methodologies and models to profile gene expression, chromatin accessibility and regulatory links in endothelial cells treated with TNFα 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.13.711357v1?rss=1
</link>
<description><![CDATA[
Genome-wide association studies (GWAS) have identified thousands of non-coding variants associated with complex traits and diseases. However, it remains challenging to pinpoint the causal genes that are regulated by associated genetic variants. Connecting causal non-coding variants with genes can rely on methods that identify direct physical interactions (e.g. chromosome conformation capture) or on probabilistic models that predict regulatory links. These statistical models take advantage of gene expression and chromatin accessibility profiles generated in cells and tissues by bulk or single-cell (sc) methodologies. Here, we tested whether using bulk or sc RNAseq/ATACseq data and corresponding predictive enhancer-to-gene models impacts the prioritization of causal GWAS genes. Using non-treated and TNF-treated human endothelial cells in vitro as a well-controlled experimental system, we show that bulk and sc RNAseq/ATACseq profiles are similar and highlight the same biology (e.g. biological pathways). Despite these similarities, we show using GWAS results for coronary artery disease (CAD) and diastolic blood pressure that applying enhancer-to-gene models designed for bulk or sc methodologies can yield differences in terms of captured heritability, fine-mapped variants and linked genes. For instance, at one CAD locus, the bulk-based ABC model predicts a regulatory link with BCAR1, whereas the sc-based model scE2G prioritizes a different gene (CFDP1). On the same experimental model, our results indicate that choosing between a bulk or sc approach will influence regulatory link model predictions; this should be considered when planning functional experiments to characterize GWAS discoveries.
]]></description>
<dc:creator>Zevounou, J.</dc:creator>
<dc:creator>Lo, K. S.</dc:creator>
<dc:creator>McGinnis, C. S.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Lettre, G.</dc:creator>
<dc:date>2026-03-16</dc:date>
<dc:identifier>doi:10.64898/2026.03.13.711357</dc:identifier>
<dc:title><![CDATA[Comparing bulk and single-cell methodologies and models to profile gene expression, chromatin accessibility and regulatory links in endothelial cells treated with TNFα]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.05.709922v1?rss=1">
<title>
<![CDATA[
Massive-scale single-nucleus multi-omics identifies novel rare noncoding drivers of Parkinson's disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.05.709922v1?rss=1
</link>
<description><![CDATA[
Most genetic variants contributing to complex diseases reside in the noncoding genome. While common variants uncovered by genome-wide association studies often fail to explain much of the observed heritability of these diseases, rare variants often have higher effect sizes and cumulatively explain a larger portion of heritability. However, rare variants, particularly rare noncoding variants, have remained under-characterized largely due to the difficulties of accurately predicting variant functionality at scale, given that each individual carries an average of ~10,000 rare variants. Here, we generated multi-omic data from >3.3 million nuclei sampled from five brain regions across a cohort of 80 individuals with Parkinson's disease (PD) and 21 neurologically normal control individuals with matched 30x whole-genome sequencing. We use this data to identify cell type-specific features of PD, map cell type-specific chromatin accessibility and expression quantitative trait loci, and train machine learning models to predict the effect of variants on gene regulation. We identify rare noncoding variants statistically associated with sporadic PD and extend our approaches to predict drivers of familial PD of unknown genetic origin. Our results underscore the significance of rare noncoding variants in complex diseases and provide a roadmap for applying similar approaches in other disease systems.
]]></description>
<dc:creator>Menon, S.</dc:creator>
<dc:creator>Turner, A. W.</dc:creator>
<dc:creator>Chang, S. H.</dc:creator>
<dc:creator>Johnson, A. W.</dc:creator>
<dc:creator>Chang, H. H.</dc:creator>
<dc:creator>Shah, A. J.</dc:creator>
<dc:creator>Zeng, Y.</dc:creator>
<dc:creator>Strohlein, C. E.</dc:creator>
<dc:creator>Kampman, L.</dc:creator>
<dc:creator>Colston, C.</dc:creator>
<dc:creator>Kozlenkov, A.</dc:creator>
<dc:creator>Dracheva, S.</dc:creator>
<dc:creator>Avenali, M.</dc:creator>
<dc:creator>Palermo, G.</dc:creator>
<dc:creator>Ceravolo, R.</dc:creator>
<dc:creator>Valente, E. M.</dc:creator>
<dc:creator>Gabbert, C.</dc:creator>
<dc:creator>Trinh, J.</dc:creator>
<dc:creator>Serrano, G. E.</dc:creator>
<dc:creator>Beach, T. G.</dc:creator>
<dc:creator>Global Parkinson's Genetic Program (GP2)</dc:creator>
<dc:creator>Shulman, J. M.</dc:creator>
<dc:creator>Blauwendraat, C.</dc:creator>
<dc:creator>Montine, T. J.</dc:creator>
<dc:creator>Fang, Z.-H.</dc:creator>
<dc:creator>Belloy, M. E.</dc:creator>
<dc:creator>Corces, M. R.</dc:creator>
<dc:date>2026-03-05</dc:date>
<dc:identifier>doi:10.64898/2026.03.05.709922</dc:identifier>
<dc:title><![CDATA[Massive-scale single-nucleus multi-omics identifies novel rare noncoding drivers of Parkinson's disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.05.680562v1?rss=1">
<title>
<![CDATA[
Base-editing a single missense mutation in A20 enhances CAR-T cell efficacy 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.05.680562v1?rss=1
</link>
<description><![CDATA[
T cell exhaustion limits the efficacy of cancer immunotherapies. Here, we performed genome-wide loss-of-function screening in repetitively stimulated human T cells and identified the multifunctional ubiquitin-modifying protein A20/TNFAIP3 as a major negative regulator of exhausted T cell persistence. Protein large language modeling, deep base-editing mutagenesis, and studies in immunocompetent mice with domain-specific inactivating mutations revealed A20's non-enzymatic M1 ubiquitin-binding zinc finger 7 (A20ZF7) motif as critical to suppression of anti-tumor immunity. A20ZF7-deficient CD8+ tumor-infiltrating lymphocytes (TILs) resisted terminal exhaustion and circumvented an unappreciated mechanism restraining perforin degranulation in terminally exhausted cells. Human chimeric antigen receptor (CAR)-T cells engineered via base-editing to inactivate A20ZF7 via a single missense mutation also resisted exhaustion, secreted more perforin and robustly suppressed cancer in vivo. These studies pinpoint A20ZF7 as a novel T cell checkpoint and reveal precision base-editing of missense mutations as an effective approach to enhance CAR-T cell therapy.
]]></description>
<dc:creator>Blaisdell, A.</dc:creator>
<dc:creator>Bachl, S.</dc:creator>
<dc:creator>Sandoval, L. R.</dc:creator>
<dc:creator>Ching, C.</dc:creator>
<dc:creator>Bowman, C. J.</dc:creator>
<dc:creator>Kale, N.</dc:creator>
<dc:creator>Prabandham, M.</dc:creator>
<dc:creator>Diolaiti, M.</dc:creator>
<dc:creator>Havig, C.</dc:creator>
<dc:creator>Advincula, R.</dc:creator>
<dc:creator>Lenci, N.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Yamashita, E.</dc:creator>
<dc:creator>Wang, C. H.</dc:creator>
<dc:creator>Zhang, S.</dc:creator>
<dc:creator>Liu, Q.</dc:creator>
<dc:creator>Achacoso, P.</dc:creator>
<dc:creator>Stibor, D.</dc:creator>
<dc:creator>Oynebraten, I.</dc:creator>
<dc:creator>Seo, J.</dc:creator>
<dc:creator>Ashworth, A.</dc:creator>
<dc:creator>Marson, A.</dc:creator>
<dc:creator>Ye, C. J.</dc:creator>
<dc:creator>Malynn, B. A.</dc:creator>
<dc:creator>Eyquem, J.</dc:creator>
<dc:creator>Carnevale, J.</dc:creator>
<dc:creator>Ma, A.</dc:creator>
<dc:date>2025-10-06</dc:date>
<dc:identifier>doi:10.1101/2025.10.05.680562</dc:identifier>
<dc:title><![CDATA[Base-editing a single missense mutation in A20 enhances CAR-T cell efficacy]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.17.675534v1?rss=1">
<title>
<![CDATA[
Cryo-mtscATAC-seq for single-cell mitochondrial DNA genotyping and clonal tracing in archived human tissues 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.17.675534v1?rss=1
</link>
<description><![CDATA[
High-throughput clonal tracing of primary human samples relies on naturally occurring barcodes, such as somatic mitochondrial DNA (mtDNA) mutations detected via single-cell ATAC-seq (mtscATAC-seq). Fresh-frozen clinical specimens preserve tissue architecture but compromise cell integrity, thereby precluding their use in multi-omic approaches such as mitochondrial genotyping at single-cell resolution. Here, we introduce Cryo-mtscATAC-seq, a broadly applicable method for diverse pathophysiological contexts to isolate nuclei with their associated mitochondria ("CryoCells") from frozen samples for high-throughput clonal analysis. We applied Cryo-mtscATAC-seq to the neurodegenerated human brain, glioblastoma (GBM), pediatric neuroblastoma, and human aorta, and implemented mitobender, a computational tool to reduce ambient mtDNA in single-cell assays. Our approach revealed regional clonal gliogenesis and microglial expansions in amyotrophic lateral sclerosis (ALS), persistence of oligodendrocyte progenitor cell (OPC)-like clones in GBM recurrence, mtDNA depth heterogeneity after neuroblastoma chemotherapy, and oligoclonal proliferation of smooth muscle cells in human aorta. In conclusion, Cryo-mtscATAC-seq broadly extends mtDNA genotyping to archival frozen specimens across tissue types, opening new avenues for investigation of cell state-informed clonality in human health and disease.
]]></description>
<dc:creator>Salla, M.</dc:creator>
<dc:creator>Obermayer, B.</dc:creator>
<dc:creator>Cotta, M.</dc:creator>
<dc:creator>Friebel, E.</dc:creator>
<dc:creator>Campo-Garcia, J.</dc:creator>
<dc:creator>Charalambous, G.</dc:creator>
<dc:creator>Bueno, R. J.</dc:creator>
<dc:creator>Lieu, D.</dc:creator>
<dc:creator>Dabek, P.</dc:creator>
<dc:creator>Helmuth, A.</dc:creator>
<dc:creator>Tellides, G.</dc:creator>
<dc:creator>Assi, R.</dc:creator>
<dc:creator>Bankov, K.</dc:creator>
<dc:creator>Lodrini, M.</dc:creator>
<dc:creator>Deubzer, H.</dc:creator>
<dc:creator>Chung, H.</dc:creator>
<dc:creator>Beule, D.</dc:creator>
<dc:creator>Radbruch, H.</dc:creator>
<dc:creator>Capper, D.</dc:creator>
<dc:creator>Heppner, F.</dc:creator>
<dc:creator>Starossom, S. C.</dc:creator>
<dc:creator>Lareau, C.</dc:creator>
<dc:creator>Liu, I.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:date>2025-09-20</dc:date>
<dc:identifier>doi:10.1101/2025.09.17.675534</dc:identifier>
<dc:title><![CDATA[Cryo-mtscATAC-seq for single-cell mitochondrial DNA genotyping and clonal tracing in archived human tissues]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.03.26.645108v1?rss=1">
<title>
<![CDATA[
Defining the host dependencies and the transcriptional landscape of RSV infection and bystander activation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.03.26.645108v1?rss=1
</link>
<description><![CDATA[
Respiratory syncytial virus (RSV) is a globally prevalent pathogen, causes severe disease in older adults, and is the leading cause of bronchiolitis and pneumonia in the United States for children during their first year of life [1]. Despite its prevalence worldwide, RSV-specific treatments remain unavailable for most infected patients. Here, we leveraged a combination of genome-wide CRISPR knockout screening and single-cell RNA sequencing to improve our understanding of the host determinants of RSV infection and the host response in both infected cells and uninfected bystanders. These data reveal temporal transcriptional patterns that are markedly different between RSV infected and bystander activated cells. Our data show that expression of interferon-stimulated genes is primarily observed in bystander activated cells, while genes implicated in the unfolded protein response and cellular stress are upregulated specifically in RSV infected cells. Furthermore, genome-wide CRISPR screens identified multiple host factors important for viral infection, findings which we contextualize relative to 29 previously published screens across 17 additional viruses. These unique data complement and extend prior studies that investigate the proinflammatory response to RSV infection, and juxtaposed to other viral infections, provide a rich resource for further hypothesis testing.

Importance: Respiratory syncytial virus (RSV) is a leading cause of lower respiratory tract infection in infants and the elderly. Despite its substantial global health burden, RSV-targeted treatments remain unavailable for the majority of individuals. While vaccine development is underway, a detailed understanding of the host response to RSV infection and identification of required human host factors for RSV may provide insight into combatting this pathogen. Here, we utilized single-cell RNA sequencing and functional genomics to understand the host response in both RSV infected and bystander cells, identify what host factors mediate infection, and contextualize these findings relative to dozens of previously reported screens across 17 additional viruses.
]]></description>
<dc:creator>Sunshine, S.</dc:creator>
<dc:creator>Puschnik, A.</dc:creator>
<dc:creator>Retallack, H.</dc:creator>
<dc:creator>Laurie, M. T.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:creator>Peng, D.</dc:creator>
<dc:creator>Knopp, K.</dc:creator>
<dc:creator>Zinter, M. S.</dc:creator>
<dc:creator>Ye, C. J.</dc:creator>
<dc:creator>DeRisi, J. L.</dc:creator>
<dc:date>2025-03-26</dc:date>
<dc:identifier>doi:10.1101/2025.03.26.645108</dc:identifier>
<dc:title><![CDATA[Defining the host dependencies and the transcriptional landscape of RSV infection and bystander activation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-03-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.07.16.665245v1?rss=1">
<title>
<![CDATA[
Clonal lineage tracing of innate immune cells in human cancer 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.07.16.665245v1?rss=1
</link>
<description><![CDATA[
Innate immune cells constitute the majority of the tumor microenvironment (TME), where they mediate both natural anti-tumor immunity and immunotherapy responses. While single-cell T- and B-cell receptor sequencing has provided fundamental insights into the clonal dynamics of human adaptive immunity, the lack of appropriate tools has precluded similar analysis of innate immune cells. Here, we describe a method that leverages somatic mitochondrial DNA (mtDNA) mutations to reconstruct clonal lineage relationships between single cells across cell types in native human tissues. We jointly sequenced single-cell transposase-accessible chromatin and mtDNA to profile n=124,958 cells from matched tumor, non-involved lung tissue (NILT), and peripheral blood of early-stage non-small cell lung cancer (NSCLC) patients, as well as n=93,757 cells from matched tumor and peripheral blood of ovarian cancer patients. Single-cell concomitant profiling of lineage and cell states of thousands of immune cells resolved clonality across cell types, tissue sites, and malignancies. Clonal tracing of innate immune cells demonstrates that TME-resident myeloid subsets, including macrophages and type 3 dendritic cells (DC3), are clonally linked to both circulating and tissue-infiltrating monocytes. Further, we identify distinct DC-biased and macrophage-biased myeloid clones, enriched in the tumor and NILT, respectively, and find that their circulating monocyte precursors exhibit distinct epigenetic profiles, suggesting that myeloid differentiation fate may be predetermined before TME infiltration. These results delineate the clonal pathways of intratumoral myeloid cell recruitment and differentiation in human cancer and suggest that remodeling of the tumor myeloid compartment may be peripherally programmed.
]]></description>
<dc:creator>Liu, V.</dc:creator>
<dc:creator>Sandor, K.</dc:creator>
<dc:creator>Yan, P. K.</dc:creator>
<dc:creator>Miao, M.</dc:creator>
<dc:creator>Yin, Y.</dc:creator>
<dc:creator>Stickels, R. R.</dc:creator>
<dc:creator>Chen, A. Y.</dc:creator>
<dc:creator>Hiam-Galvez, K.</dc:creator>
<dc:creator>Gutierrez, J.</dc:creator>
<dc:creator>Zhang, W.</dc:creator>
<dc:creator>Sajjath, S. M.</dc:creator>
<dc:creator>Valbuena, R.</dc:creator>
<dc:creator>Wang, S.</dc:creator>
<dc:creator>Daniel, B.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:creator>Howitt, B. E.</dc:creator>
<dc:creator>Lareau, C. A.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:date>2025-07-21</dc:date>
<dc:identifier>doi:10.1101/2025.07.16.665245</dc:identifier>
<dc:title><![CDATA[Clonal lineage tracing of innate immune cells in human cancer]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-07-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.23.649973v1?rss=1">
<title>
<![CDATA[
Cell cycle-coupled transcriptional network orchestrates human B cell fate bifurcation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.23.649973v1?rss=1
</link>
<description><![CDATA[
Antibody responses are determined by activated B cells bifurcating into plasmablasts (PBs) and germinal center B cells (GCBCs). Gene regulatory networks (GRNs) underlying human B cell fate choice remain uncharted. Temporally resolved single-cell multi-omics, computational modeling and CRISPR-based perturbations were used to assemble, simulate and test high-resolution GRNs underlying PB and GC fates. The results converged with orthogonal predictions of transcription factor (TF) action at single-nucleotide resolution, revealing dominant and reciprocal actions of IRF4 and its binding partners at simple and composite IRF motifs. Single-cell perturbation analysis of these TFs demonstrated multiple reciprocal negative feedback loops controlling the bifurcation. Additionally, IRF4 and BLIMP1 co-repressed the cell cycle regulators MYC and CCND2. G0/G1 lengthening accelerated the switching of cells to an IRF4hiBLIMP1hi regulatory state and enhanced the probability of PB specification, thereby uncovering a self-reinforcing regulatory module that couples cell cycle dynamics to B cell fate choice.
]]></description>
<dc:creator>Pease, N. A.</dc:creator>
<dc:creator>Fan, J.</dc:creator>
<dc:creator>Keshari, S.</dc:creator>
<dc:creator>Stratton, J.</dc:creator>
<dc:creator>Gerges, P.</dc:creator>
<dc:creator>Ann Varghese, B.</dc:creator>
<dc:creator>Nampoothiri VP, N.</dc:creator>
<dc:creator>McGinnis, C. S.</dc:creator>
<dc:creator>Zhang, W.</dc:creator>
<dc:creator>Geirlack, S. B.</dc:creator>
<dc:creator>Swaminathan, T.</dc:creator>
<dc:creator>Sachan, A.</dc:creator>
<dc:creator>Manakkat Vijay, G.</dc:creator>
<dc:creator>Mena Hernandez, L.</dc:creator>
<dc:creator>Heidari Rarani, Z.</dc:creator>
<dc:creator>Macedo, C.</dc:creator>
<dc:creator>Metes, D.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Jain, A. K.</dc:creator>
<dc:creator>Sahni, N.</dc:creator>
<dc:creator>Stallaert, W.</dc:creator>
<dc:creator>Das, J.</dc:creator>
<dc:creator>Singh, H.</dc:creator>
<dc:date>2025-04-24</dc:date>
<dc:identifier>doi:10.1101/2025.04.23.649973</dc:identifier>
<dc:title><![CDATA[Cell cycle-coupled transcriptional network orchestrates human B cell fate bifurcation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-04-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.03.27.587039v1?rss=1">
<title>
<![CDATA[
Transcript-specific enrichment enables profiling rare cell states via scRNA-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.03.27.587039v1?rss=1
</link>
<description><![CDATA[
Single-cell genomics technologies have accelerated our understanding of cell-state heterogeneity in diverse contexts. Although single-cell RNA sequencing (scRNA-seq) identifies many rare populations of interest that express specific marker transcript combinations, traditional flow sorting limits our ability to enrich these populations for further profiling, including requiring cell surface markers with high-fidelity antibodies. Additionally, many single-cell studies require the isolation of nuclei from tissue, eliminating the ability to enrich learned rare cell states based on extranuclear protein markers. To address these limitations, we describe Programmable Enrichment via RNA Flow-FISH by sequencing (PERFF-seq), a scalable assay that enables scRNA-seq profiling of subpopulations from complex cellular mixtures defined by the presence or absence of specific RNA transcripts. Across immune populations (n = 141,227 cells) and fresh-frozen and formalin-fixed paraffin-embedded brain tissue (n = 29,522 nuclei), we demonstrate the sorting logic that can be used to enrich for cell populations via RNA-based cytometry followed by high-throughput scRNA-seq. Our approach provides a rational, programmable method for studying rare populations identified by one or more marker transcripts.
]]></description>
<dc:creator>Abay, T.</dc:creator>
<dc:creator>Stickels, R. R.</dc:creator>
<dc:creator>Takizawa, M. T.</dc:creator>
<dc:creator>Nalbant, B. N.</dc:creator>
<dc:creator>Hsieh, Y.-H.</dc:creator>
<dc:creator>Hwang, S.</dc:creator>
<dc:creator>Snopkowski, C.</dc:creator>
<dc:creator>Yu, K. K. H.</dc:creator>
<dc:creator>Abou-Mrad, Z.</dc:creator>
<dc:creator>Tabar, V.</dc:creator>
<dc:creator>Ludwig, L. S.</dc:creator>
<dc:creator>Chaligne, R.</dc:creator>
<dc:creator>Satpathy, A. T.</dc:creator>
<dc:creator>Lareau, C. A.</dc:creator>
<dc:date>2024-03-27</dc:date>
<dc:identifier>doi:10.1101/2024.03.27.587039</dc:identifier>
<dc:title><![CDATA[Transcript-specific enrichment enables profiling rare cell states via scRNA-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-03-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.04.02.716195v1?rss=1">
<title>
<![CDATA[
Hybrid crosses reveal a cell-type-specific landscape of mouse regulatory variation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.04.02.716195v1?rss=1
</link>
<description><![CDATA[
Understanding the genetic architecture of gene expression is fundamental to evolutionary biology and medicine. As part of the IGVF Consortium, we present a single-nucleus RNA-seq resource of 6.7 million nuclei across eight tissue groups, featuring seven F1 hybrids from C57BL/6J dams crossed with the other Collaborative Cross founder strains for comparison against parental strains. We identify 25,777 genes (91% of those detected) exhibiting non-conserved regulatory behavior in at least one of 92 cell types in one or more crosses. Our results show that while cis-acting variation primarily drives divergence, trans-acting effects are substantially more cell-type specific and sensitive to tissue environment. Notably, bulk tissue analyses frequently mask these signals, particularly in smaller populations such as astrocytes. Furthermore, increasing genetic divergence primarily expands the landscape of cis-acting variation, while trans-acting effects remain stable across genetic distances within species. This atlas establishes a foundational framework for decoding the complex interplay between genetic variation and cell-type-specific regulation across the mammalian body.
]]></description>
<dc:creator>Weber, R.</dc:creator>
<dc:creator>Carilli, M.</dc:creator>
<dc:creator>Rebboah, E.</dc:creator>
<dc:creator>Filimban, G.</dc:creator>
<dc:creator>Liang, H. Y.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Duffield, M.</dc:creator>
<dc:creator>Mahdipoor, P.</dc:creator>
<dc:creator>Taghizadeh, E.</dc:creator>
<dc:creator>Fattahi, N.</dc:creator>
<dc:creator>Mojaverzargar, R.</dc:creator>
<dc:creator>Kawauchi, S.</dc:creator>
<dc:creator>Williams, B. A.</dc:creator>
<dc:creator>MacGregor, G.</dc:creator>
<dc:creator>Wold, B.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:creator>Hallgrimsdottir, I. B.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2026-04-04</dc:date>
<dc:identifier>doi:10.64898/2026.04.02.716195</dc:identifier>
<dc:title><![CDATA[Hybrid crosses reveal a cell-type-specific landscape of mouse regulatory variation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-04-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.04.709349v1?rss=1">
<title>
<![CDATA[
Single-Cell Genomics Decontamination with CellSweep 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.04.709349v1?rss=1
</link>
<description><![CDATA[
Single-cell genomics technologies enable high-throughput cell profiling, but technical contamination remains an obstacle to accurate downstream analysis. Free-floating ambient molecules released from lysed cells and global bulk contamination introduced during library preparation can distort molecular profiles. These artifacts can obscure cellular identities and reduce the reliability of differential analysis or clustering results. We present an efficient and effective approach to removing ambient and bulk contamination that can be applied to data generated from a wide variety of technologies. We show that our tool, CellSweep, outperforms other methods to remove artifacts using numerous benchmarks.
]]></description>
<dc:creator>Caskey, M.</dc:creator>
<dc:creator>Rich, J.</dc:creator>
<dc:creator>Weber, R.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:creator>Hallgrimsdottir, I. B.</dc:creator>
<dc:date>2026-03-06</dc:date>
<dc:identifier>doi:10.64898/2026.03.04.709349</dc:identifier>
<dc:title><![CDATA[Single-Cell Genomics Decontamination with CellSweep]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.13.603403v1?rss=1">
<title>
<![CDATA[
Estimating cis and trans contributions to differences in gene regulation
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.13.603403v1?rss=1
</link>
<description><![CDATA[
We describe a coordinate system and associated hypothesis testing framework for determining whether cis or trans regulation is responsible for differences in gene expression between two homozygous strains or species. We apply our framework to data from single replicate studies on yeast strains and human-chimpanzee hybrid cells, as well as to data from a mouse study with replicates, showing marked differences between our gene regulatory assignments and those previously reported. We also show how our multi-sample framework can determine the context dependency of cis and trans effects as well as explicitly model different hypotheses regarding the underlying mechanism of trans regulation.
]]></description>
<dc:creator>Hallgrimsdottir, I. B.</dc:creator>
<dc:creator>Carilli, M.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2024-07-16</dc:date>
<dc:identifier>doi:10.1101/2024.07.13.603403</dc:identifier>
<dc:title><![CDATA[Estimating cis and trans contributions to differences in gene regulation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.11.21.689845v1?rss=1">
<title>
<![CDATA[
Determining gene specificity from multivariate single-cell RNA sequencing data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.11.21.689845v1?rss=1
</link>
<description><![CDATA[
An important application of single-cell genomics experiments is to identify genes specific to biological categories or experimental conditions. Although numerous approaches have been proposed to identify such genes, we consider an axiomatic approach based on defining properties that a specificity measure should have. This leads us to develop ember (Entropy Metrics for Biological ExploRation), which we show is the only method satisfying four key desired properties for a specificity measure. Applying ember to eight tissues from eight founder mouse strains, we find that gene specificity is often unintuitive: canonical markers can be supplanted, housekeeping genes are context-dependent, and mouse strain can drive unexpected cell type switching. Unsupervised learning on entropy metrics uncovers shared genes specialized to male gonads and kidney, as well as genes specific to non-consecutive developmental stages in the kidney. To facilitate further exploration of gene specificity in mice, we have also developed a comprehensive specificity database, along with a web interface and API. Extending ember to a human PBMC dataset collected from 255 diverse individuals, we find that variation in PBMCs is largely localized to classical monocytes. We also find genes with unique specificity by sex, age and ancestral background. Together, these applications establish ember as a powerful tool and provide a roadmap for elucidating the impact of human genetic variation using the murine model.
]]></description>
<dc:creator>Swarna, N. P.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Rebboah, E.</dc:creator>
<dc:creator>Gordon, M. G.</dc:creator>
<dc:creator>Kathail, P.</dc:creator>
<dc:creator>Li, T.</dc:creator>
<dc:creator>Alvarez, M.</dc:creator>
<dc:creator>Ye, C. J.</dc:creator>
<dc:creator>Wold, B. J.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2025-11-24</dc:date>
<dc:identifier>doi:10.1101/2025.11.21.689845</dc:identifier>
<dc:title><![CDATA[Determining gene specificity from multivariate single-cell RNA sequencing data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-11-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.06.28.546949v1?rss=1">
<title>
<![CDATA[
Quantitative assessment of single-cell RNA-seq clustering with CONCORDEX 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.06.28.546949v1?rss=1
</link>
<description><![CDATA[
The rapid advancement of spatially resolved transcriptomics (SRT) technologies has facilitated exploration of how gene expression varies across tissues. However, identifying spatially variable genes remains challenging due to confounding variation introduced by the spatial distribution of cell types. We introduce a new approach to identifying spatial domains that are homogeneous with respect to cell-type composition that facilitates the decomposition of gene expression patterns by cell-type and spatial variation. Our method, called concordex, is efficient and effective across technological platforms and tissue types, and using several biological datasets we show that it can be used to identify genes with subtle variation patterns that are missed when considering only cell-type variation, or spatial variation, alone. The concordex tool is freely available at https://github.com/pachterlab/concordexR.
]]></description>
<dc:creator>Jackson, K. C.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Galvez-Merchan, A.</dc:creator>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Chari, T.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2023-06-30</dc:date>
<dc:identifier>doi:10.1101/2023.06.28.546949</dc:identifier>
<dc:title><![CDATA[Quantitative assessment of single-cell RNA-seq clustering with CONCORDEX]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-06-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.04.657941v1?rss=1">
<title>
<![CDATA[
Dogme: A nextflow pipeline for reprocessing nanopore RNA and DNA modifications 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.04.657941v1?rss=1
</link>
<description><![CDATA[
Motivation: The Oxford Nanopore Technologies (ONT) platform allows for the direct detection of RNA and DNA modifications from unamplified nucleic acids, which is a significant advantage over other platforms. However, the rapid updates to ONT basecalling models and the evolving landscape of computational tools for modification detection bring about challenges for reproducible and standardized analyses. To address these challenges, we developed Dogme, which is a Nextflow-based workflow that automates the processing of ONT data, including basecalling, alignment, modification detection, and transcript quantification. Dogme automates the reprocessing of ONT POD5 files by integrating basecalling using Dorado, read mapping using minimap2 and subsequent analysis steps such as running modkit. The pipeline supports three major types of ONT sequencing data - direct RNA (dRNA), complementary DNA (cDNA), and genomic DNA (gDNA) - enabling comprehensive analyses across different library preparations. Dogme facilitates detection of diverse RNA modifications supported by Dorado such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), inosine, pseudouridine, 2'-O-methylation (Nm) and DNA methylation, while concurrently quantifying full-length transcript isoforms for dRNA and cDNA using LR-Kallisto.

Results: We applied Dogme to three separate mouse C2C12 myoblast replicates using direct RNA sequencing on MinION flow cells. We detected an average of 147,879 m6A, 86,673 m5C, 21,242 inosine, 24,540 pseudouridine, and 83,841 2'-O-methylation sites per replicate with 96,581 m6A, 43,446 m5C, 8,825 inosine, 10,048 pseudouridine, and 30,157 2'-O-methylation sites detected in all three biological replicates. The pipeline produced reproducible modification profiles and transcript expression levels across replicates, demonstrating its utility for integrative long-read transcriptomic and epigenomic analyses.

Availability: Dogme is implemented in Nextflow and is freely available under the MIT license at https://github.com/mortazavilab/dogme, with documentation provided for installation and usage.
]]></description>
<dc:creator>Abdollahzadeh, E.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2025-06-08</dc:date>
<dc:identifier>doi:10.1101/2025.06.04.657941</dc:identifier>
<dc:title><![CDATA[Dogme: A nextflow pipeline for reprocessing nanopore RNA and DNA modifications]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.05.11.653354v1?rss=1">
<title>
<![CDATA[
Pseudoassembly of k-mers 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.05.11.653354v1?rss=1
</link>
<description><![CDATA[
We introduce a pseudoassembly approach to identifying variation in sets of genomic sequences via colored de Bruijn graphs. Our pseudoassembly method is implemented in a program called klue that assembles k-mers into sequences compatible with a variant-aware extension of pseudoalignment. We show that this approach can be used to identify cell-type specific de novo variants from single-cell RNA-seq in a mouse melanoma model.
]]></description>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Boffelli, M.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2025-05-13</dc:date>
<dc:identifier>doi:10.1101/2025.05.11.653354</dc:identifier>
<dc:title><![CDATA[Pseudoassembly of k-mers]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.21.649844v1?rss=1">
<title>
<![CDATA[
Systematic cell-type resolved transcriptomes of 8 tissues in 8 lab and wild-derived mouse strains captures global and local expression variation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.21.649844v1?rss=1
</link>
<description><![CDATA[
Mapping the impact of genomic variation on gene expression facilitates an understanding of the molecular basis of complex phenotypic traits and disease predisposition. Mouse models provide a controlled and reproducible framework for capturing the breadth of genomic variation observed in different genotypes across a wide variety of tissues. As part of the IGVF consortium's effort to catalog the effects of genetic variation, we uniformly characterized the transcriptomes of eight tissues from each mouse founder strain used to derive the Collaborative Cross strains, comprising five classical laboratory inbred strains and three wild-derived inbred strains. We sequenced samples from four male and four female replicates per tissue using single-nucleus RNA-seq to generate an "8-cube" dataset of 5.2 million nuclei across 106 cell types and cell states. As expected, the overall extent of transcriptome variation correlates positively with genetic divergence across the strains with the greatest differential between PWK/PhJ and CAST/EiJ. At the individual tissue level, heart and brain are relatively more similar across strains compared with gonads, adrenal, skeletal muscle, kidney, and liver. Further analyses revealed substantial strain variation, often concentrated in a few cell types as well as cell-state signatures that especially reflect strain-associated immune and metabolic trait differences. The founder 8-cube dataset provides rich transcriptome variation signatures to help explain strain-specific phenotypic traits and disease states, as illustrated by examples in tissue-resident immune cells, muscle degeneration, kidney sex differences, and the hypothalamic-pituitary-adrenal axis. This data further provides a systematic foundation for the analysis of these tissues in the founder strains as well as the Collaborative Cross.
]]></description>
<dc:creator>Rebboah, E.</dc:creator>
<dc:creator>Weber, R.</dc:creator>
<dc:creator>Abdollahzadeh, E.</dc:creator>
<dc:creator>Swarna, N. P.</dc:creator>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Liang, H. Y.</dc:creator>
<dc:creator>Filimban, G.</dc:creator>
<dc:creator>Mahdipour, P.</dc:creator>
<dc:creator>Duffield, M.</dc:creator>
<dc:creator>Mojaverzargar, R.</dc:creator>
<dc:creator>Taghizadeh, E.</dc:creator>
<dc:creator>Fattahi, N.</dc:creator>
<dc:creator>Mojgani, N.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Loving, R. K.</dc:creator>
<dc:creator>Carilli, M.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Kawauchi, S.</dc:creator>
<dc:creator>Hallgrimsdottir, I. B.</dc:creator>
<dc:creator>Williams, B. A.</dc:creator>
<dc:creator>MacGregor, G.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:creator>Wold, B.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:date>2025-04-24</dc:date>
<dc:identifier>doi:10.1101/2025.04.21.649844</dc:identifier>
<dc:title><![CDATA[Systematic cell-type resolved transcriptomes of 8 tissues in 8 lab and wild-derived mouse strains captures global and local expression variation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-04-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.24.640007v1?rss=1">
<title>
<![CDATA[
Geospatially informed representation of spatial genomics data with SpatialFeatureExperiment 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.24.640007v1?rss=1
</link>
<description><![CDATA[
SpatialFeatureExperiment is a Bioconductor package that leverages the versatility of Simple Features for spatial data analysis and SpatialExperiment for single-cell -omics to provide an expansive and convenient S4 class for working with spatial -omics data.

SpatialFeatureExperiment can be used to store and analyze a variety of spatial -omics data types, including data from the Visium, Xenium, MERFISH, SeqFish, and Slide-seq platforms, bringing spatial operations to the SingleCellExperiment ecosystem.
]]></description>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Huseynov, A.</dc:creator>
<dc:creator>Rich, J.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2025-02-27</dc:date>
<dc:identifier>doi:10.1101/2025.02.24.640007</dc:identifier>
<dc:title><![CDATA[Geospatially informed representation of spatial genomics data with SpatialFeatureExperiment]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.19.604364v1?rss=1">
<title>
<![CDATA[
Long-read sequencing transcriptome quantification with lr-kallisto 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.19.604364v1?rss=1
</link>
<description><![CDATA[
RNA abundance quantification has become routine and affordable thanks to high-throughput "short-read" technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. "Long-read" sequencing platforms now produce data types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read data types, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.
]]></description>
<dc:creator>Loving, R. K.</dc:creator>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Reese, F.</dc:creator>
<dc:creator>Rebboah, E.</dc:creator>
<dc:creator>Sakr, J.</dc:creator>
<dc:creator>Rezaie, N.</dc:creator>
<dc:creator>Liang, H. Y.</dc:creator>
<dc:creator>Filimban, G.</dc:creator>
<dc:creator>Kawauchi, S.</dc:creator>
<dc:creator>Oakes, C.</dc:creator>
<dc:creator>Trout, D.</dc:creator>
<dc:creator>Williams, B. A.</dc:creator>
<dc:creator>MacGregor, G.</dc:creator>
<dc:creator>Wold, B.</dc:creator>
<dc:creator>Mortazavi, A.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2024-07-19</dc:date>
<dc:identifier>doi:10.1101/2024.07.19.604364</dc:identifier>
<dc:title><![CDATA[Long-read sequencing transcriptome quantification with lr-kallisto]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.04.602131v1?rss=1">
<title>
<![CDATA[
Stochastic Modeling of Biophysical Responses to Perturbation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.04.602131v1?rss=1
</link>
<description><![CDATA[
Recent advances in high-throughput, multi-condition experiments allow for genome-wide investigation of how perturbations affect transcription and translation in the cell across multiple biological entities or modalities, from chromatin and mRNA information to protein production and spatial morphology. This presents an unprecedented opportunity to unravel how the processes of DNA and RNA regulation direct cell fate determination and disease response. Most methods designed for analyzing large-scale perturbation data focus on the observational outcomes, e.g., expression; however, many potential transcriptional mechanisms, such as transcriptional bursting or splicing dynamics, can underlie these complex and noisy observations. In this analysis, we demonstrate how a stochastic biophysical modeling approach to interpreting high-throughput perturbation data enables deeper investigation of the "how" behind such molecular measurements. Our approach takes advantage of modalities already present in data produced with current technologies, such as nascent and mature mRNA measurements, to illuminate transcriptional dynamics induced by perturbation, predict kinetic behaviors in new perturbation settings, and uncover novel populations of cells with distinct kinetic responses to perturbation.
]]></description>
<dc:creator>Chari, T.</dc:creator>
<dc:creator>Gorin, G.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2024-07-06</dc:date>
<dc:identifier>doi:10.1101/2024.07.04.602131</dc:identifier>
<dc:title><![CDATA[Stochastic Modeling of Biophysical Responses to Perturbation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.04.588111v1?rss=1">
<title>
<![CDATA[
The impact of package selection and versioning on single-cell RNA-seq analysis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.04.588111v1?rss=1
</link>
<description><![CDATA[
Standard single-cell RNA-sequencing analysis (scRNA-seq) workflows consist of converting raw read data into cell-gene count matrices through sequence alignment, followed by analyses including filtering, highly variable gene selection, dimensionality reduction, clustering, and differential expression analysis. Seurat and Scanpy are the most widely-used packages implementing such workflows, and are generally thought to implement individual steps similarly. We investigate in detail the algorithms and methods underlying Seurat and Scanpy and find that there are, in fact, considerable differences in the outputs of Seurat and Scanpy. The extent of differences between the programs is approximately equivalent to the variability that would be introduced in benchmarking scRNA-seq datasets by sequencing less than 5% of the reads or analyzing less than 20% of the cell population. Additionally, distinct versions of Seurat and Scanpy can produce very different results, especially during parts of differential expression analysis. Our analysis highlights the need for users of scRNA-seq to carefully assess the tools on which they rely, and the importance of developers of scientific software to prioritize transparency, consistency, and reproducibility for their tools.
]]></description>
<dc:creator>Rich, J. M.</dc:creator>
<dc:creator>Moses, L.</dc:creator>
<dc:creator>Einarsson, P. H.</dc:creator>
<dc:creator>Jackson, K.</dc:creator>
<dc:creator>Luebbert, L.</dc:creator>
<dc:creator>Booeshaghi, A. S.</dc:creator>
<dc:creator>Antonsson, S.</dc:creator>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Bray, N.</dc:creator>
<dc:creator>Melsted, P.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2024-04-05</dc:date>
<dc:identifier>doi:10.1101/2024.04.04.588111</dc:identifier>
<dc:title><![CDATA[The impact of package selection and versioning on single-cell RNA-seq analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2022.12.02.518832v1?rss=1">
<title>
<![CDATA[
Accurate quantification of single-nucleus and single-cell RNA-seq transcripts 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2022.12.02.518832v1?rss=1
</link>
<description><![CDATA[
In single-cell and single-nucleus RNA sequencing, the coexistence of nascent (unprocessed) and mature (processed) mRNA poses challenges in accurate read mapping and the interpretation of count matrices. The traditional transcriptome reference, defining the "region of interest" in bulk RNA-seq, restricts its focus to mature mRNA transcripts. This restriction leads to two problems: reads originating outside of the "region of interest" are prone to mismapping within this region, and additionally, such external reads cannot be matched to specific transcript targets. Expanding the "region of interest" to encompass both nascent and mature mRNA transcript targets provides a more comprehensive framework for RNA-seq analysis. Here, we introduce the concept of distinguishing flanking k-mers (DFKs) to improve mapping of sequencing reads. We have developed an algorithm to identify DFKs, which serve as a sophisticated "background filter", enhancing the accuracy of mRNA quantification. This dual strategy of an expanded region of interest coupled with the use of DFKs enhances the precision in quantifying both mature and nascent mRNA molecules, as well as in delineating reads of ambiguous status.
]]></description>
<dc:creator>Eldjarn Hjörleifsson, K.</dc:creator>
<dc:creator>Sullivan, D. K.</dc:creator>
<dc:creator>Holley, G.</dc:creator>
<dc:creator>Melsted, P.</dc:creator>
<dc:creator>Pachter, L.</dc:creator>
<dc:date>2022-12-02</dc:date>
<dc:identifier>doi:10.1101/2022.12.02.518832</dc:identifier>
<dc:title><![CDATA[Accurate quantification of single-nucleus and single-cell RNA-seq transcripts]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2022-12-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2023.09.07.556700v1?rss=1">
<title>
<![CDATA[
Organ-specific prioritization and annotation of non-coding regulatory variants in the human genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2023.09.07.556700v1?rss=1
</link>
<description><![CDATA[
Identifying non-coding regulatory variants in the human genome remains a challenging task in genomics. Recently, we released the second version of our leading regulatory variant database, RegulomeDB. Building upon this comprehensive database, we developed a novel machine-learning architecture, TLand, which utilizes RegulomeDB-derived features to predict regulatory variants at the cell- or organ-specific level. In our holdout benchmarking, TLand consistently outperformed state-of-the-art models, demonstrating its ability to generalize to new cell lines or organs. We trained three types of organ-specific TLand models to overcome the common model bias toward high data availability cell lines or organs. These models accurately prioritize relevant organs for 2 million GWAS SNPs associated with GWAS traits. Moreover, our analysis of top-scoring variants in specific organ models showed a high enrichment of relevant GWAS traits. We expect that TLand and RegulomeDB will further advance our ability to understand human regulatory variants genome-wide.
]]></description>
<dc:creator>Zhao, N.</dc:creator>
<dc:creator>Dong, S.</dc:creator>
<dc:creator>Boyle, A. P.</dc:creator>
<dc:date>2023-09-08</dc:date>
<dc:identifier>doi:10.1101/2023.09.07.556700</dc:identifier>
<dc:title><![CDATA[Organ-specific prioritization and annotation of non-coding regulatory variants in the human genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2023-09-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.05.16.654442v1?rss=1">
<title>
<![CDATA[
Long-term Reprogramming and Altered Ontogeny of Classical Monocytes Mediates Enhanced Lung Injury in Sepsis Survivor Mice 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.05.16.654442v1?rss=1
</link>
<description><![CDATA[
Patients who survive sepsis are predisposed to new hospitalizations for respiratory failure, but the underlying mechanisms are unknown. Using a murine model in which prior sepsis predisposes to enhanced lung injury, we previously discovered that classical monocytes persist in the lungs after long-term recovery from sepsis and exhibit enhanced cytokine expression after secondary challenge with intra-nasal lipopolysaccharide. Here, we hypothesized that immune reprogramming of post-sepsis monocytes and altered ontogeny predispose to enhanced lung injury. Monocyte depletion and/or adoptive transfer was performed three weeks and three months after sepsis. Monocytes from post-sepsis mice were necessary and sufficient for enhanced LPS-induced lung injury and promoted neutrophil degranulation. Prior sepsis enhanced JAK-STAT signaling and AP-1 binding in monocytes and shifted monocytes toward the neutrophil-like monocyte lineage. In human sepsis and/or pneumonia survivors, monocytes were predictive of 90-day mortality and exhibit transcriptional and proteomic neutrophil-like signatures. We conclude that sepsis reprograms monocytes into a pro-inflammatory phenotype and skews bone marrow progenitors and monocytes toward the neutrophil-like lineage, predisposing to neutrophil degranulation and lung injury.
]]></description>
<dc:creator>Denstaedt, S. J.</dc:creator>
<dc:creator>McBean, B.</dc:creator>
<dc:creator>Boyle, A. P.</dc:creator>
<dc:creator>Arenberg, B. C.</dc:creator>
<dc:creator>Mack, M.</dc:creator>
<dc:creator>Moore, B. B.</dc:creator>
<dc:creator>Newstead, M. W.</dc:creator>
<dc:creator>Singer, B. H.</dc:creator>
<dc:creator>Cano, J.</dc:creator>
<dc:creator>Prescott, H. C.</dc:creator>
<dc:creator>Goodridge, H. S.</dc:creator>
<dc:creator>Zemans, R. L.</dc:creator>
<dc:date>2025-05-21</dc:date>
<dc:identifier>doi:10.1101/2025.05.16.654442</dc:identifier>
<dc:title><![CDATA[Long-term Reprogramming and Altered Ontogeny of Classical Monocytes Mediates Enhanced Lung Injury in Sepsis Survivor Mice]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.05.08.652986v1?rss=1">
<title>
<![CDATA[
Developing a general AI model for integrating diverse genomic modalities and comprehensive genomic knowledge 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.05.08.652986v1?rss=1
</link>
<description><![CDATA[
Advances in next-generation sequencing technologies have vastly expanded the availability of diverse genomic, epigenomic and transcriptomic data, presenting the opportunity to develop a general AI model that integrates comprehensive genomic knowledge into a unified model. Unlike previous predictive models, which are typically specialized to certain tasks, our general AI model unifies a wide range of genomic modalities, such as nascent RNA and ultra-high-resolution chromatin organization, within a multi-task architecture. Using ATAC-seq and DNA sequences as inputs, we incorporated diverse genomic modalities as outputs, and the model exhibits strong generalizability across different cell types and tissues in all tasks on which it was trained. It accurately predicts gene-level transcription measured by various nascent RNA assays, and effectively captures enhancer-associated transcription. It also accurately captures the potential functions of non-coding genetic variants and regulatory elements. Additionally, we extended the model trained on human data to a mouse general model, achieving accurate predictions of genomic modalities, such as high-resolution chromatin contact maps with limited data availability, which are further validated using an established mouse inner-ear study. This comprehensive approach offers a powerful tool for understanding genome regulation in both human and mouse species.
]]></description>
<dc:creator>Zhang, Z.</dc:creator>
<dc:creator>Bao, X.</dc:creator>
<dc:creator>Jiang, L.</dc:creator>
<dc:creator>Luo, X.</dc:creator>
<dc:creator>Wang, Y.</dc:creator>
<dc:creator>Comai, A.</dc:creator>
<dc:creator>Waldhaus, J.</dc:creator>
<dc:creator>Hansen, A. S.</dc:creator>
<dc:creator>Li, W.</dc:creator>
<dc:creator>Liu, J.</dc:creator>
<dc:date>2025-05-14</dc:date>
<dc:identifier>doi:10.1101/2025.05.08.652986</dc:identifier>
<dc:title><![CDATA[Developing a general AI model for integrating diverse genomic modalities and comprehensive genomic knowledge]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.07.579365v1?rss=1">
<title>
<![CDATA[
Enhancing Portability of Trans-Ancestral Polygenic Risk Scores through Tissue-Specific Functional Genomic Data Integration 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.07.579365v1?rss=1
</link>
<description><![CDATA[
Portability of trans-ancestral polygenic risk scores is often confounded by differences in linkage disequilibrium and genetic architecture between ancestries. Recent literature has shown that prioritizing GWAS SNPs with functional genomic evidence over strong association signals can improve model portability. We leveraged three RegulomeDB-derived functional regulatory annotations - SURF, TURF, and TLand - to construct polygenic risk models across a set of quantitative and binary traits highlighting functional mutations tagged by trait-associated tissue annotations. Tissue-specific prioritization by TURF and TLand provides a significant improvement in model accuracy over standard polygenic risk score (PRS) models across all traits. We developed the Trans-ancestral Iterative Tissue Refinement (TITR) algorithm to construct PRS models that prioritize functional mutations across multiple trait-implicated tissues. TITR-constructed PRS models show increased predictive accuracy over single tissue prioritization. This indicates our TITR approach captures a more comprehensive view of regulatory systems across implicated tissues that contribute to variance in trait expression.
]]></description>
<dc:creator>Crone, B.</dc:creator>
<dc:creator>Boyle, A. P.</dc:creator>
<dc:date>2024-02-09</dc:date>
<dc:identifier>doi:10.1101/2024.02.07.579365</dc:identifier>
<dc:title><![CDATA[Enhancing Portability of Trans-Ancestral Polygenic Risk Scores through Tissue-Specific Functional Genomic Data Integration]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.01.20.700519v1?rss=1">
<title>
<![CDATA[
Federated single-cell QTL meta-analysis reveals novel disease mechanisms 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.01.20.700519v1?rss=1
</link>
<description><![CDATA[
Genetic effects on gene expression are often cell type-specific and obscured in bulk analyses. To resolve this context-dependent regulation, we performed a federated cis-eQTL meta-analysis across 12 PBMC datasets (2,032 individuals, 2.5 million cells). Across six immune cell types, we identified cis-eQTLs for 6,592 genes and fine-mapped 14,985 independent loci. Notably, the 42% of eQTLs that were undetected in a bulk eQTL study on 43,301 whole blood samples also showed stronger enrichment for disease GWAS loci. We further identified three genome-wide significant and 65 suggestive loci affecting the abundance of (rare) immune cell types and validated these using previously reported hematological GWAS and bulk-derived trans-eQTLs. Integrating single-cell cis-eQTLs with bulk trans-eQTLs enabled us to anchor 6,382 trans-eGenes (37.2% novel) to upstream regulators and reconstruct directed gene regulatory relationships. For example, a hemorrhoidal disease-associated variant showed a CD4+ T cell-specific cis-eQTL on BACH1 that colocalized with 45 immune and metabolic trans-eGenes. These results demonstrate the power of single-cell QTL meta-analysis in interpreting complex trait genetics.
]]></description>
<dc:creator>Kaptijn, D.</dc:creator>
<dc:creator>Michielsen, L.</dc:creator>
<dc:creator>Neavin, D.</dc:creator>
<dc:creator>Ripoll-Cladellas, A.</dc:creator>
<dc:creator>Alquicira-Hernandez, J. E.</dc:creator>
<dc:creator>Korshevniuk, M.</dc:creator>
<dc:creator>Lee, J. T. H.</dc:creator>
<dc:creator>Oelen, R.</dc:creator>
<dc:creator>Vochteloo, M.</dc:creator>
<dc:creator>Warmerdam, R.</dc:creator>
<dc:creator>Ando, Y.</dc:creator>
<dc:creator>Ban, M.</dc:creator>
<dc:creator>Bayaraa, O.</dc:creator>
<dc:creator>Berg, M.</dc:creator>
<dc:creator>van Blokland, I.</dc:creator>
<dc:creator>Considine, D.</dc:creator>
<dc:creator>Dieng, M. M.</dc:creator>
<dc:creator>Edahiro, R.</dc:creator>
<dc:creator>Gordon, M. G.</dc:creator>
<dc:creator>Groot, H. E.</dc:creator>
<dc:creator>van der Harst, P.</dc:creator>
<dc:creator>Heinig, M.</dc:creator>
<dc:creator>Hon, C.-C.</dc:creator>
<dc:creator>Idaghdour, Y.</dc:creator>
<dc:creator>Kathail, P.</dc:creator>
<dc:creator>de Klein, N.</dc:creator>
<dc:creator>Li, W.</dc:creator>
<dc:creator>Li, Y.</dc:creator>
<dc:creator>Losert, C.</dc:creator>
<dc:creator>Manikanda, V.</dc:creator>
<dc:creator>Moody, J.</dc:creator>
<dc:creator>Naeem, H.</dc:creator>
<dc:creator>Mokrab, Y.</dc:creator>
<dc:creator>Nawijn, M. C.</dc:creator>
<dc:creator>Netea, M.</dc:creator>
<dc:creator>Niewold, J.</dc:creator>
<dc:creator>Okada, Y.</dc:creator>
<dc:creator>Sawcer, S.</dc:creator>
<dc:creator>Soulama, I.</dc:creator>
<dc:creator>Stegle, O.</dc:creator>
<dc:creator>Tsepilov, Y.</dc:creator>
<dc:creator>Park, W.-Y.</dc:creator>
<dc:creator>Rajagopalan, D.</dc:creator>
<dc:creator>Shahin, T.</dc:creator>
<dc:date>2026-01-23</dc:date>
<dc:identifier>doi:10.64898/2026.01.20.700519</dc:identifier>
<dc:title><![CDATA[Federated single-cell QTL meta-analysis reveals novel disease mechanisms]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-01-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.02.703413v1?rss=1">
<title>
<![CDATA[
Farm animal evolution demonstrates hidden molecular basis of human traits 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.02.703413v1?rss=1
</link>
<description><![CDATA[
Most human variants identified by genome-wide association studies are believed to affect traits by altering gene expression. This belief is supported by considerable circumstantial evidence, but statistical methods are unable to link most trait-associated variants to gene expression, a problem we refer to as "missing regulation." Many explanations have been proposed, including the possibility that natural selection on gene expression limits power. Here, we take a novel approach to the question of missing regulation, beginning with the observation that the majority of trait-associated variants alter gene expression in two non-human species: cattle and pigs. We explain this discrepancy by comparing the species' evolutionary histories. The observed differences in regulatory variants are consistent with selection on human gene regulation and increased genetic drift due to agricultural breeding. The differences are not limited to specific genes and reflect increased ascertainment of regulatory variants that are distal to genes. Additionally, we show that trait-associated gene regulation in cattle and pigs matches observed patterns from complex-trait genetics in humans, and may reflect currently unobserved trait-associated regulation in humans.
]]></description>
<dc:creator>Connally, N. J.</dc:creator>
<dc:creator>Sunyaev, S.</dc:creator>
<dc:date>2026-02-04</dc:date>
<dc:identifier>doi:10.64898/2026.02.02.703413</dc:identifier>
<dc:title><![CDATA[Farm animal evolution demonstrates hidden molecular basis of human traits]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.30.651435v1?rss=1">
<title>
<![CDATA[
Cellular and Regional Vulnerability Shapes the Molecular Landscape of Psychosis in Alzheimer's Disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.30.651435v1?rss=1
</link>
<description><![CDATA[
Approximately 40 percent of Alzheimer's disease patients develop psychosis (AD+P), yet the molecular and cellular basis of these symptoms remains poorly understood. Here we profiled single-nucleus transcriptomes and epigenomes from 48 postmortem Alzheimer's brains stratified by psychiatric diagnosis. Across cell types, AD+P was distinguished by transcriptional programs in upper-layer cortical pyramidal neurons consistent with re-engagement of developmental and structural plasticity pathways. These neurons exhibited greater loss in AD+P cortex, indicating that such programs emerge in a context of heightened vulnerability. Integrating these findings with functional perturbation screens in stem-cell-derived brain organoids, we found that activation of these programs alters cortico-cortical network connectivity and can exacerbate network dysfunction. Our data suggest that compensatory neuronal plasticity, shaped by glial inflammatory responses, may paradoxically contribute to circuit instability and selective vulnerability underlying neuropsychiatric symptoms in dementia.

Highlights:
- Cell-type- and brain-region-specific transcriptional changes in AD with psychosis (AD+P)
- Upper-layer pyramidal dysfunction and metabolic vulnerability mark the pathophysiology of AD+P
- Circuit wiring programs are evoked in AD+P as maladaptive compensatory responses
- AD+P-associated IL-6 signaling impairs neuronal network function in brain organoids
]]></description>
<dc:creator>Victor, M. B.</dc:creator>
<dc:creator>Sun, N.</dc:creator>
<dc:creator>Galani, K.</dc:creator>
<dc:creator>Leary, N.</dc:creator>
<dc:creator>Tanigawa, Y.</dc:creator>
<dc:creator>Scannail, A. N.</dc:creator>
<dc:creator>Ho, L.-L.</dc:creator>
<dc:creator>Prosper, S.</dc:creator>
<dc:creator>Liu, L.</dc:creator>
<dc:creator>Kofler, J. K.</dc:creator>
<dc:creator>Sweet, R.</dc:creator>
<dc:creator>Tsai, L.-H.</dc:creator>
<dc:creator>Kellis, M.</dc:creator>
<dc:date>2025-05-07</dc:date>
<dc:identifier>doi:10.1101/2025.04.30.651435</dc:identifier>
<dc:title><![CDATA[Cellular and Regional Vulnerability Shapes the Molecular Landscape of Psychosis in Alzheimer's Disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-05-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.01.07.631688v1?rss=1">
<title>
<![CDATA[
NERINE reveals rare variant associations in gene networks across multiple phenotypes and implicates an SNCA-PRL-LRRK2 subnetwork in Parkinson's disease 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.01.07.631688v1?rss=1"
</link>
<description><![CDATA[
There are two primary approaches to studying the genetic basis of human phenotypes. Experiments in model systems generate interpretable gene networks but, in isolation, do not establish relevance to the human condition. Statistical genetics identifies relevant association signals at the variant or gene level but lacks tools to test specific mechanistic models, as existing methods do not incorporate the topology of gene-gene interactions. We bridge these two strategies by introducing a method that competitively tests network hypotheses with rare variant associations. NERINE, a hierarchical model-based association test, for the first time incorporates gene network topology while remaining resilient to network inaccuracies. We demonstrate NERINE's ability to test network hypotheses derived from both canonical pathway databases and model system screens. A comprehensive database-wide search of pathway networks with NERINE uncovers compelling associations for breast cancer, cardiovascular diseases, and type II diabetes, which are undetected by single-gene tests. Testing bespoke networks from experimental screens targeting key Parkinson's disease (PD) pathologies, namely dopaminergic neuron survival and α-synuclein pathobiology, NERINE highlights rare variant burden in gene modules related to autophagy, vesicle trafficking, and protein homeostasis. Genome-scale CRISPRi screening of α-synuclein toxicity modifiers in human neurons and NERINE converge on PRL, revealing an intraneuronal α-synuclein/prolactin stress response that may impact resilience to PD pathologies.
]]></description>
<dc:creator>Nazeen, S.</dc:creator>
<dc:creator>Wang, X.</dc:creator>
<dc:creator>Morrow, A.</dc:creator>
<dc:creator>Strom, R.</dc:creator>
<dc:creator>Ethier, E.</dc:creator>
<dc:creator>Ritter, D.</dc:creator>
<dc:creator>Henderson, A.</dc:creator>
<dc:creator>Afroz, J.</dc:creator>
<dc:creator>Stitziel, N. O.</dc:creator>
<dc:creator>Gupta, R. M.</dc:creator>
<dc:creator>Luk, K.</dc:creator>
<dc:creator>Studer, L.</dc:creator>
<dc:creator>Khurana, V.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2025-01-10</dc:date>
<dc:identifier>doi:10.1101/2025.01.07.631688</dc:identifier>
<dc:title><![CDATA[NERINE reveals rare variant associations in gene networks across multiple phenotypes and implicates an SNCA-PRL-LRRK2 subnetwork in Parkinson's disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-01-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.01.09.631797v1?rss=1">
<title>
<![CDATA[
Inherent instability of simple DNA repeats shapes an evolutionarily stable distribution of repeat lengths 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.01.09.631797v1?rss=1"
</link>
<description><![CDATA[
Using the Telomere-to-Telomere reference, we assembled the distribution of simple repeat lengths present in the human genome. Analyzing over two hundred mammalian genomes, we found remarkable consistency in the shape of the distribution across evolutionary epochs. All observed genomes harbor an excess of long repeats, which are potentially prone to developing into repeat expansion disorders. We measured mutation rates for repeat length instability, quantitatively modeled the per-generation action of mutations, and observed the corresponding long-term behavior shaping the repeat length distribution. We found that short repetitive sequences appear to be a straightforward consequence of random substitution. Evolving largely independently, longer repeats (above roughly 10 nt) emerge and persist in a rapidly mutating dynamic balance between expansion, contraction and interruption. These mutational processes, collectively, are sufficient to explain the abundance of long repeats, without invoking natural selection. Our analysis constrains properties of molecular mechanisms responsible for maintaining genome fidelity that underlie repeat instability.
]]></description>
<dc:creator>McGinty, R. J.</dc:creator>
<dc:creator>Balick, D. J.</dc:creator>
<dc:creator>Mirkin, S. M.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2025-01-10</dc:date>
<dc:identifier>doi:10.1101/2025.01.09.631797</dc:identifier>
<dc:title><![CDATA[Inherent instability of simple DNA repeats shapes an evolutionarily stable distribution of repeat lengths]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-01-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.02.680094v1?rss=1">
<title>
<![CDATA[
Segregating DNA lesions point to high selective advantage of tumor initiating cells 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.02.680094v1?rss=1"
</link>
<description><![CDATA[
The difficulty of identifying cells at the origin of cancer and tracking their early divisions impedes studies of cancer initiation. Recently, it was shown that some DNA lesions generated by a pulse of damage-inducing mutagen persist over multiple rounds of replication. Segregation of DNA lesions in the early genealogy of an expanding clone leaves a statistically interpretable footprint of cancer-initiating events. Specifically, it allows for estimating the number of cell divisions between the initiating DNA lesion and the most recent common ancestor of the tumor. Here, we analyze footprints of segregating lesions from a previously published experimental mouse system, as well as post-chemotherapy human metastatic tumors and the blood of chemotherapy-treated patients. In all contexts, clones tend to start early, usually within the span of 4 cell generations from mutagen exposure. Using a branching process model, we show that the fitness advantage of early cancer drivers exceeds 30%, with each early division leading to at least 1.3 self-renewing cells. We highlight an example of a blood-derived single-cell phylogeny with major subclones separated by just two cell divisions. Broadly, our approach allows inference of tumor initiation and growth parameters based on events preceding the most recent common ancestor of the initiating clone as opposed to characteristics of fully grown tumors.
]]></description>
<dc:creator>Seplyarskiy, V.</dc:creator>
<dc:creator>Shady, M.</dc:creator>
<dc:creator>Andrianova, M. A.</dc:creator>
<dc:creator>Chapman, M. S.</dc:creator>
<dc:creator>Van Allen, E.</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2025-10-04</dc:date>
<dc:identifier>doi:10.1101/2025.10.02.680094</dc:identifier>
<dc:title><![CDATA[Segregating DNA lesions point to high selective advantage of tumor initiating cells]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.09.23.677874v1?rss=1">
<title>
<![CDATA[
Functional and dysfunctional T regulatory cell states in human tissues in RA and other autoimmune arthritic diseases 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.09.23.677874v1?rss=1"
</link>
<description><![CDATA[
Regulatory T cells (Tregs), characterized by FOXP3 expression, are essential for maintaining immune homeostasis by controlling inflammation. However, in autoimmune diseases such as rheumatoid arthritis (RA), impaired Treg function contributes to immune dysregulation and disease pathology. While most studies of human Tregs have focused on blood, here we analyzed Tregs in synovial tissues from RA patients using single cell RNA sequencing (scRNAseq). We identified two predominant Treg states, CD25hiCXCR6pos Tregs with strong suppressive function, and CD25loAREGpos Tregs, a dysfunctional state exclusively enriched in synovial tissues but not in blood. Computational and in vitro analyses revealed that cortisol induced AREG expression, suppressed glycolysis, and impaired the suppressive function of CD25loAREGpos Tregs. In turn, AREG promoted an IL-33+ inflammatory phenotype in synovial fibroblasts. Importantly, we found that TNFR2 engagement can prevent or reverse this dysfunctional Treg state. In contrast to CD25loAREGpos Tregs, CD25hiCXCR6pos Tregs were highly suppressive, showed coordinated abundance with macrophages in synovial tissue, and functionally interacted with membrane-bound TNF expressed by macrophages, which promoted their functional suppressive state. These two Treg subsets were similarly found in the synovial tissue in Juvenile Idiopathic Arthritis (JIA), another inflammatory arthritic disorder, indicating conserved mechanisms across arthritic diseases. Together, our findings define distinct pathways driving divergent functional and dysfunctional Treg states in inflamed tissues and point to interventions that may prevent or reverse the development of the dysfunctional state.
]]></description>
<dc:creator>Koh, B.</dc:creator>
<dc:creator>Gal Oz, S. T.</dc:creator>
<dc:creator>Sato, R.</dc:creator>
<dc:creator>Nguyen, H.</dc:creator>
<dc:creator>Dunlap, G. S.</dc:creator>
<dc:creator>Mahony, C.</dc:creator>
<dc:creator>Bolton, C.</dc:creator>
<dc:creator>Wedderburn, L. R.</dc:creator>
<dc:creator>Croft, A.</dc:creator>
<dc:creator>Donlin, L.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Korsunskiy, I.</dc:creator>
<dc:creator>Rao, D. A.</dc:creator>
<dc:creator>Brenner, M. B.</dc:creator>
<dc:date>2025-09-25</dc:date>
<dc:identifier>doi:10.1101/2025.09.23.677874</dc:identifier>
<dc:title><![CDATA[Functional and dysfunctional T regulatory cell states in human tissues in RA and other autoimmune arthritic diseases]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-09-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.08.20.671277v1?rss=1">
<title>
<![CDATA[
TCR germline diversity reveals evidence of natural selection on variable and joining alpha chain genes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.08.20.671277v1?rss=1"
</link>
<description><![CDATA[
T cell receptors (TCRs) orchestrate adaptive immunity, yet the complex, repetitive architecture of the TCR loci has impeded systematic characterization of human genetic variation in the genes encoding the TCR. Using public long-read sequencing data from 2,668 donors, we build a near-complete map of common alleles in TCR V, D, and J genes, revealing amino acid variation at almost every position within V genes. We discover pervasive evidence of natural selection on TCR genes, including balancing selection on a TRAJ gene recognizing an immunodominant influenza epitope and positive selection on a TRAV gene. We find TCR allelic polymorphism alters core functional properties of T cells, including thymic fate commitment, phenotypes in diseased tissues, and cell-surface receptor abundance. Collectively, our findings position inherited variation in TCR genes as a key axis of immunological diversity that may shape interindividual differences in immune responses.
]]></description>
<dc:creator>Mantena, S.</dc:creator>
<dc:creator>Akbari, A.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2025-08-24</dc:date>
<dc:identifier>doi:10.1101/2025.08.20.671277</dc:identifier>
<dc:title><![CDATA[TCR germline diversity reveals evidence of natural selection on variable and joining alpha chain genes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-08-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.02.646871v1?rss=1">
<title>
<![CDATA[
Defining effective strategies to integrate multi-sample single-nucleus ATAC-seq datasets via a multimodal-guided approach 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.02.646871v1?rss=1"
</link>
<description><![CDATA[
Background: Chromatin accessibility, measured via single-nucleus Assay for Transposase-Accessible Chromatin with sequencing (snATAC-seq), can reveal the underpinnings of transcriptional regulation across heterogeneous cell states. As the number and scale of snATAC-seq datasets increase, we need robust computational pipelines to integrate samples within a dataset and datasets across studies. These integration pipelines should correct cell-state-obfuscating technical effects while conserving underlying biological cell states, as has been shown for single-cell RNA-seq (scRNA-seq) pipelines. However, scRNA-seq integration methods have performed inconsistently on snATAC-seq datasets, potentially due to sparsity and genomic feature differences.

Results: Using single-nucleus multimodal datasets profiling ATAC and RNA simultaneously, we can measure snATAC-seq integration method performance by comparison to independently integrated snRNA-seq gold standard embeddings and annotations. Here, we benchmark 58 pipelines, incorporating 7 integration methods plus 1 embedding correction method with 5 feature sets. Using our command-line tool, we assessed 5 multimodal datasets at 3 different resolutions using 2 novel metrics to determine the best practices for multi-sample snATAC-seq integration. ATAC features outperformed Gene Activity Score (GAS) features, and embedding correction with Harmony was generally useful. SnapATAC2, PeakVI, and ArchR's iterative Latent Semantic Indexing (LSI) performed well.

Conclusions: We recommend SnapATAC2 + Harmony with pre-defined ENCODE candidate cis-regulatory element (cCRE) features as a first-pass pipeline given its metric performance, generalizability of features, and method resource-efficiency. This and other high-performing pipelines will guide future comprehensive gene regulation maps.
]]></description>
<dc:creator>Weinand, K.</dc:creator>
<dc:creator>Langan, E. M.</dc:creator>
<dc:creator>Curtis, M.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2025-04-03</dc:date>
<dc:identifier>doi:10.1101/2025.04.02.646871</dc:identifier>
<dc:title><![CDATA[Defining effective strategies to integrate multi-sample single-nucleus ATAC-seq datasets via a multimodal-guided approach]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-04-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.24.639351v1?rss=1">
<title>
<![CDATA[
Early and late RNA eQTL are driven by different genetic mechanisms 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.24.639351v1?rss=1"
</link>
<description><![CDATA[
Understanding the genetic regulation of RNA abundance is essential to defining disease mechanisms. However, conventional expression quantitative trait loci (eQTL) studies quantify RNA molecules across the transcript lifecycle. While most eQTL likely affect transcription by altering promoter or enhancer function within the nucleus, it is also possible that they modulate processes after transcription, including chemical modifications and RNA stability in the cytosol. To elucidate distinct eQTL mechanisms of early versus late RNA, we compared eQTL from mature cellular RNA and nascent nuclear RNA in the brain and the kidney. Across tissues, we identified different causal variants for cellular and nuclear eQTL for the same eGene. Cellular eQTL were enriched in transcribed regions (P = 3.3 × 10^-126), suggesting the importance of post-transcriptional regulation. Conversely, nuclear eQTL were enriched in distal regulatory elements (P = 7.0 × 10^-32), highlighting the role of DNA transcriptional regulation. For example, we identified stop-gain eQTL variants likely acting through nonsense-mediated decay in cellular eQTL that had no effect in nuclear eQTL. Cellular eQTL were enriched for loci with multiple causal variants in linkage disequilibrium within the transcribed regions, where they may in concert affect RNA stability. We also identified examples of nuclear eQTL variants within enhancers that had no effect in cellular eQTL. We show that such eQTL (e.g., TUBGCP4) sometimes uniquely colocalize with disease alleles (schizophrenia). This study reveals key differences in the genetic mechanisms of cellular and nuclear eQTL.
]]></description>
<dc:creator>Sakaue, S.</dc:creator>
<dc:creator>Accelerating Medicines Partnership: RA/SLE Network</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2025-02-26</dc:date>
<dc:identifier>doi:10.1101/2025.02.24.639351</dc:identifier>
<dc:title><![CDATA[Early and late RNA eQTL are driven by different genetic mechanisms]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.01.06.631510v1?rss=1">
<title>
<![CDATA[
Wnt signaling drives stromal inflammation in inflammatory arthritis 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.01.06.631510v1?rss=1"
</link>
<description><![CDATA[
The concept that fibroblasts are critical mediators of inflammation is an emerging paradigm. In rheumatoid arthritis (RA), they are the main producers of IL-6 as well as a host of other cytokines and chemokines. Their pathologic activation also directly causes cartilage and bone degradation. Yet, therapeutic agents specifically targeting fibroblasts are not available. Here, we find that Wnt receptors and modulators are predominantly expressed in stromal populations in the synovium. Importantly, non-canonical Wnt activation induces robust inflammatory gene expression including an abundance of cytokines and chemokines in synovial fibroblasts in vitro. Strikingly, the addition of Wnt ligands or inhibition of Wnt secretion exacerbates or reduces arthritis severity, respectively, in vivo in a murine model of inflammatory arthritis. These observations are relevant in human disease, as Wnt activation signatures are enhanced in fibroblasts derived from inflamed RA synovial tissue as well as fibroblasts across other inflammatory diseases. Together, these findings implicate Wnt signaling as a major driver of fibroblast-mediated inflammation and joint pathology. They further suggest that targeting the Wnt pathway is a therapeutically relevant approach to rheumatoid arthritis, particularly in patients who do not respond to conventional treatments and who often express fibroblast-predominant synovial phenotypes.
]]></description>
<dc:creator>Mueller, A. A.</dc:creator>
<dc:creator>Zou, A. E.</dc:creator>
<dc:creator>Marsh, L.-J.</dc:creator>
<dc:creator>Kemble, S.</dc:creator>
<dc:creator>Nayar, S.</dc:creator>
<dc:creator>Watts, G. F. M.</dc:creator>
<dc:creator>Murphy, C. L.</dc:creator>
<dc:creator>Taylor, E.</dc:creator>
<dc:creator>Major, T.</dc:creator>
<dc:creator>Gardner, D.</dc:creator>
<dc:creator>Buckley, C. D.</dc:creator>
<dc:creator>Wei, K.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Korsunsky, I.</dc:creator>
<dc:creator>Filer, A.</dc:creator>
<dc:creator>Croft, A. P.</dc:creator>
<dc:creator>Brenner, M. B.</dc:creator>
<dc:date>2025-01-08</dc:date>
<dc:identifier>doi:10.1101/2025.01.06.631510</dc:identifier>
<dc:title><![CDATA[Wnt signaling drives stromal inflammation in inflammatory arthritis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-01-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.13.580158v1?rss=1">
<title>
<![CDATA[
Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.13.580158v1?rss=1"
</link>
<description><![CDATA[
Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform "N-of-1" analyses on individual patients with ultra-rare diseases. The increasing size of ultra-rare disease cohorts internationally newly enables cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development. The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers under the same umbrella across the United States to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole genome sequencing data of UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, enabling automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by a joint genomic analysis across cohorts.
]]></description>
<dc:creator>Kobren, S. N.</dc:creator>
<dc:creator>Moldovan, M. A.</dc:creator>
<dc:creator>Reimers, R.</dc:creator>
<dc:creator>Traviglia, D.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Barnum, D.</dc:creator>
<dc:creator>Veit, A.</dc:creator>
<dc:creator>Willett, J.</dc:creator>
<dc:creator>Berselli, M.</dc:creator>
<dc:creator>Ronchetti, W.</dc:creator>
<dc:creator>Sherwood, R.</dc:creator>
<dc:creator>Krier, J.</dc:creator>
<dc:creator>Kohane, I. S.</dc:creator>
<dc:creator>Undiagnosed Diseases Network</dc:creator>
<dc:creator>Sunyaev, S. R.</dc:creator>
<dc:date>2024-02-16</dc:date>
<dc:identifier>doi:10.1101/2024.02.13.580158</dc:identifier>
<dc:title><![CDATA[Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.05.03.592310v1?rss=1">
<title>
<![CDATA[
Reproducible single cell annotation of programs underlying T-cell subsets, activation states, and functions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.05.03.592310v1?rss=1"
</link>
<description><![CDATA[
T-cells recognize antigens and induce specialized gene expression programs (GEPs) enabling functions including proliferation, cytotoxicity, and cytokine production. Traditionally, different classes of helper T-cells have been thought to express mutually exclusive responses, for example, Th1, Th2, and Th17 programs. However, new single-cell RNA sequencing (scRNA-Seq) experiments have revealed a continuum of T-cell states without discrete clusters corresponding to these subsets, implying the need for new analytical frameworks. Here, we advance the characterization of T-cells with T-CellAnnoTator (TCAT), a pipeline that simultaneously quantifies pre-defined GEPs capturing activation states and cellular subsets. From 1,700,000 T-cells from 700 individuals across 38 tissues and five diverse disease contexts, we discover 46 reproducible GEPs reflecting the known core functions of T-cells including proliferation, cytotoxicity, exhaustion, and T helper effector states. We experimentally characterize several novel activation programs and apply TCAT to describe T-cell activation and exhaustion in COVID-19 and cancer, providing insight into T-cell function in these diseases.
]]></description>
<dc:creator>Kotliar, D.</dc:creator>
<dc:creator>Curtis, M.</dc:creator>
<dc:creator>Agnew, R.</dc:creator>
<dc:creator>Weinand, K.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Baglaenko, Y.</dc:creator>
<dc:creator>Zhao, Y.</dc:creator>
<dc:creator>Sabeti, P. C.</dc:creator>
<dc:creator>Rao, D. A.</dc:creator>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:date>2024-05-05</dc:date>
<dc:identifier>doi:10.1101/2024.05.03.592310</dc:identifier>
<dc:title><![CDATA[Reproducible single cell annotation of programs underlying T-cell subsets, activation states, and functions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-05-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.02.20.581100v1?rss=1">
<title>
<![CDATA[
Modeling heterogeneity in single-cell perturbation states enhances detection of response eQTLs 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.02.20.581100v1?rss=1"
</link>
<description><![CDATA[
Identifying response expression quantitative trait loci (reQTLs) can help to elucidate mechanisms of disease associations. Typically, such studies model the effect of perturbation as discrete conditions. However, perturbation experiments usually affect perturbed cells heterogeneously. We demonstrate that modeling the per-cell perturbation state enhances power to detect reQTLs. We use public single-cell peripheral blood mononuclear cell (PBMC) data to study the effect of perturbations with Influenza A virus (IAV), Candida albicans (CA), Pseudomonas aeruginosa (PA), and Mycobacterium tuberculosis (MTB) on gene regulation. We found on average 36.9% more reQTLs by accounting for single-cell heterogeneity compared to the standard discrete reQTL model. For example, we detected a decrease in the eQTL effect of rs11721168 for PXK in IAV. Furthermore, we found that on average 25% of reQTLs have cell-type-specific effects. For example, in IAV the increase of the eQTL effect of rs10774671 for OAS1 was stronger in CD4+ T and B cells. Similarly, in all four perturbation experiments, the reQTL effect for RPS26 was stronger in B cells. Our work provides a general model for more accurate reQTL identification and underscores the value of modeling cell-level variation.
]]></description>
<dc:creator>Raychaudhuri, S.</dc:creator>
<dc:creator>Valencia, C.</dc:creator>
<dc:creator>Nathan, A.</dc:creator>
<dc:creator>Kang, J. B.</dc:creator>
<dc:creator>Rumker, L.</dc:creator>
<dc:creator>Lee, H.</dc:creator>
<dc:date>2024-02-22</dc:date>
<dc:identifier>doi:10.1101/2024.02.20.581100</dc:identifier>
<dc:title><![CDATA[Modeling heterogeneity in single-cell perturbation states enhances detection of response eQTLs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-02-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.14.705914v1?rss=1">
<title>
<![CDATA[
Ancestry-specific performance of variant effect predictors in clinical variant classification 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.14.705914v1?rss=1
</link>
<description><![CDATA[
Predicting the effects of genetic variants and assessing prediction performance are key computational tasks in genomic medicine. It has been shown that well-calibrated variant effect predictors can be reliably used as evidence towards establishing pathogenicity (or benignity) of missense variants, thereby rendering these variants suitable for use in (or exclusion from) the genetic diagnosis of rare Mendelian conditions. However, most predictors have been trained or calibrated on data that may not be sufficiently representative to lead to similar performance across all genetic ancestries. This raises questions about the responsible deployment of these tools to improve human health. To better understand the utility of computational predictors, we set out to assess their ancestry-specific performance in terms of accuracy and evidence strength according to the ACMG/AMP guidelines. First, we determined that the expected count of rare variants in an individual's genome and the allele frequency distribution of these variants are the key confounders when evaluating a predictor's performance across different genetic ancestries. Second, we found that a predictor's accuracy itself inversely correlates with the allele frequency of the rare variant. After stratifying according to allele frequency, we show that established methods for predicting the pathogenicity of missense variants have comparable performance levels across major ancestry groups. Our results therefore support the wide deployment of such models in the context of genetic diagnosis and related applications.
]]></description>
<dc:creator>Hoffing, R.</dc:creator>
<dc:creator>Zeiberg, D.</dc:creator>
<dc:creator>Stenton, S. L.</dc:creator>
<dc:creator>Mort, M.</dc:creator>
<dc:creator>Cooper, D. N.</dc:creator>
<dc:creator>Hahn, M. W.</dc:creator>
<dc:creator>O'Donnell-Luria, A.</dc:creator>
<dc:creator>Ward, L. D.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:date>2026-02-17</dc:date>
<dc:identifier>doi:10.64898/2026.02.14.705914</dc:identifier>
<dc:title><![CDATA[Ancestry-specific performance of variant effect predictors in clinical variant classification]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.17.611902v1?rss=1">
<title>
<![CDATA[
Calibration of additional computational tools expands ClinGen recommendation options for variant classification with PP3/BP4 criteria 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.17.611902v1?rss=1
</link>
<description><![CDATA[
Purpose: We previously developed an approach to calibrate computational tools for clinical variant classification, updating recommendations for the reliable use of variant impact predictors to provide evidence strength up to Strong. A new generation of tools using distinctive approaches has since been released, and these methods must be independently calibrated for clinical application.

Method: Using our local posterior probability-based calibration and our established data set of ClinVar pathogenic and benign variants, we determined the strength of evidence provided by three new tools (AlphaMissense, ESM1b, VARITY) and calibrated scores meeting each evidence strength.

Results: All three tools reached the Strong level of evidence for variant pathogenicity and Moderate for benignity, though sometimes for few variants. Compared to previously recommended tools, these yielded at best only modest improvements in the tradeoffs of evidence strength and false positive predictions.

Conclusion: At calibrated thresholds, three new computational predictors provided evidence for variant pathogenicity at similar strength to the four previously recommended predictors (and comparable with functional assays for some variants). This calibration broadens the scope of computational tools for application in clinical variant classification. Their new approaches offer promise for future advancement of the field.
]]></description>
<dc:creator>Bergquist, T.</dc:creator>
<dc:creator>Stenton, S. L.</dc:creator>
<dc:creator>Nadeau, E. A. W.</dc:creator>
<dc:creator>Byrne, A. B.</dc:creator>
<dc:creator>Greenblatt, M. S.</dc:creator>
<dc:creator>Harrison, S. M.</dc:creator>
<dc:creator>Tavtigian, S. V.</dc:creator>
<dc:creator>O'Donnell-Luria, A.</dc:creator>
<dc:creator>Biesecker, L. G.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Brenner, S. E.</dc:creator>
<dc:creator>Pejaver, V.</dc:creator>
<dc:creator>ClinGen Sequence Variant Interpretation Working Group</dc:creator>
<dc:date>2024-09-21</dc:date>
<dc:identifier>doi:10.1101/2024.09.17.611902</dc:identifier>
<dc:title><![CDATA[Calibration of additional computational tools expands ClinGen recommendation options for variant classification with PP3/BP4 criteria]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-21</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.05.16.594558v1?rss=1">
<title>
<![CDATA[
Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.05.16.594558v1?rss=1
</link>
<description><![CDATA[
Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams, and evaluated additional predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided small but statistically significant improvement in predictive performance compared to less elaborate techniques. These findings underscore the utility of variant effect prediction, and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.
]]></description>
<dc:creator>Jain, S.</dc:creator>
<dc:creator>Trinidad, M.</dc:creator>
<dc:creator>Nguyen, T. B.</dc:creator>
<dc:creator>Jones, K.</dc:creator>
<dc:creator>Diaz Neto, S.</dc:creator>
<dc:creator>Ge, F.</dc:creator>
<dc:creator>Glagovsky, A.</dc:creator>
<dc:creator>Jones, C.</dc:creator>
<dc:creator>Moran, G.</dc:creator>
<dc:creator>Wang, B.</dc:creator>
<dc:creator>Rahimi, K.</dc:creator>
<dc:creator>Zeynep Calici, S.</dc:creator>
<dc:creator>Cedillo, L. R.</dc:creator>
<dc:creator>Berardelli, S.</dc:creator>
<dc:creator>Ozden, B.</dc:creator>
<dc:creator>Chen, K.</dc:creator>
<dc:creator>Katsonis, P.</dc:creator>
<dc:creator>Williams, A.</dc:creator>
<dc:creator>Lichtarge, O.</dc:creator>
<dc:creator>Rana, S.</dc:creator>
<dc:creator>Pradhan, S.</dc:creator>
<dc:creator>Srinivasan, R.</dc:creator>
<dc:creator>Sajeed, R.</dc:creator>
<dc:creator>Joshi, D.</dc:creator>
<dc:creator>Faraggi, E.</dc:creator>
<dc:creator>Jernigan, R.</dc:creator>
<dc:creator>Kloczkowski, A.</dc:creator>
<dc:creator>Xu, J.</dc:creator>
<dc:creator>Song, Z.</dc:creator>
<dc:creator>Ozkan, S.</dc:creator>
<dc:creator>Padilla, N.</dc:creator>
<dc:creator>de la Cruz, X.</dc:creator>
<dc:creator>Acuna-Hidalgo, R.</dc:creator>
<dc:creator>Grafmuller, A.</dc:creator>
<dc:creator>Jimenez Barron, L. T.</dc:creator>
<dc:creator>Manfredi, M.</dc:creator>
<dc:creator>Savojardo, C.</dc:creator>
<dc:creator>Babbi, G.</dc:creator>
<dc:creator>Martelli, P. L.</dc:creator>
<dc:creator>Casadio, R.</dc:creator>
<dc:creator>Sun, Y.</dc:creator>
<dc:creator>Zhu, S.</dc:creator>
<dc:creator>Shen, Y.</dc:creator>
<dc:creator>Pucci, F.</dc:creator>
<dc:creator>Rooman, M.</dc:creator>
<dc:creator>Cia, G.</dc:creator>
<dc:creator>R</dc:creator>
<dc:date>2024-05-19</dc:date>
<dc:identifier>doi:10.1101/2024.05.16.594558</dc:identifier>
<dc:title><![CDATA[Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-05-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.06.597828v1?rss=1">
<title>
<![CDATA[
Critical assessment of missense variant effect predictors on disease-relevant variant data 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.06.597828v1?rss=1
</link>
<description><![CDATA[
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
]]></description>
<dc:creator>Rastogi, R.</dc:creator>
<dc:creator>Chung, R.</dc:creator>
<dc:creator>Li, S.</dc:creator>
<dc:creator>Li, C.</dc:creator>
<dc:creator>Lee, K.</dc:creator>
<dc:creator>Woo, J.</dc:creator>
<dc:creator>Kim, D.-W.</dc:creator>
<dc:creator>Keum, C.</dc:creator>
<dc:creator>Babbi, G.</dc:creator>
<dc:creator>Martelli, P. L.</dc:creator>
<dc:creator>Savojardo, C.</dc:creator>
<dc:creator>Casadio, R.</dc:creator>
<dc:creator>Chennen, K.</dc:creator>
<dc:creator>Weber, T.</dc:creator>
<dc:creator>Poch, O.</dc:creator>
<dc:creator>Ancien, F.</dc:creator>
<dc:creator>Cia, G.</dc:creator>
<dc:creator>Pucci, F.</dc:creator>
<dc:creator>Raimondi, D.</dc:creator>
<dc:creator>Vranken, W.</dc:creator>
<dc:creator>Rooman, M.</dc:creator>
<dc:creator>Marquet, C.</dc:creator>
<dc:creator>Olenyi, T.</dc:creator>
<dc:creator>Rost, B.</dc:creator>
<dc:creator>Andreoletti, G.</dc:creator>
<dc:creator>Kamandula, A.</dc:creator>
<dc:creator>Peng, Y.</dc:creator>
<dc:creator>Bakolitsa, C.</dc:creator>
<dc:creator>Mort, M.</dc:creator>
<dc:creator>Cooper, D. N.</dc:creator>
<dc:creator>Bergquist, T.</dc:creator>
<dc:creator>Pejaver, V.</dc:creator>
<dc:creator>Liu, X.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Brenner, S. E.</dc:creator>
<dc:creator>Ioannidis, N. M.</dc:creator>
<dc:date>2024-06-08</dc:date>
<dc:identifier>doi:10.1101/2024.06.06.597828</dc:identifier>
<dc:title><![CDATA[Critical assessment of missense variant effect predictors on disease-relevant variant data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.04.11.588920v1?rss=1">
<title>
<![CDATA[
The landscape of regional missense mutational intolerance quantified from 125,748 exomes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.04.11.588920v1?rss=1
</link>
<description><![CDATA[
Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation. Here, we leverage the patterns of rare missense variation in 125,748 exome-sequenced individuals in the Genome Aggregation Database (gnomAD v2.1.1) against a null mutational model to identify transcripts with regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders, and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP variant classification guidelines, we establish that variants within regions with <36% of their expected missense variation achieve moderate support for pathogenicity. We integrate this regional constraint measure into a missense deleteriousness metric (named MPC) that effectively stratifies rare and de novo missense variants in individuals with early-onset developmental conditions from controls. These results provide additional tools to aid in missense variant interpretation.
]]></description>
<dc:creator>Chao, K. R.</dc:creator>
<dc:creator>Wang, L.</dc:creator>
<dc:creator>Panchal, R.</dc:creator>
<dc:creator>Liao, C.</dc:creator>
<dc:creator>Abderrazzaq, H.</dc:creator>
<dc:creator>Ye, R.</dc:creator>
<dc:creator>Schultz, P.</dc:creator>
<dc:creator>Compitello, J.</dc:creator>
<dc:creator>Grant, R. H.</dc:creator>
<dc:creator>Kosmicki, J. A.</dc:creator>
<dc:creator>Weisburd, B.</dc:creator>
<dc:creator>Phu, W.</dc:creator>
<dc:creator>Wilson, M. W.</dc:creator>
<dc:creator>Laricchia, K. M.</dc:creator>
<dc:creator>Goodrich, J. K.</dc:creator>
<dc:creator>Goldstein, D.</dc:creator>
<dc:creator>Goldstein, J. I.</dc:creator>
<dc:creator>Vittal, C.</dc:creator>
<dc:creator>Poterba, T.</dc:creator>
<dc:creator>Baxter, S.</dc:creator>
<dc:creator>Watts, N. A.</dc:creator>
<dc:creator>Solomonson, M.</dc:creator>
<dc:creator>gnomAD consortium</dc:creator>
<dc:creator>Tiao, G.</dc:creator>
<dc:creator>Rehm, H. L.</dc:creator>
<dc:creator>Neale, B. M.</dc:creator>
<dc:creator>Talkowski, M. E.</dc:creator>
<dc:creator>MacArthur, D. G.</dc:creator>
<dc:creator>O'Donnell-Luria, A.</dc:creator>
<dc:creator>Karczewski, K. J.</dc:creator>
<dc:creator>Radivojac, P.</dc:creator>
<dc:creator>Daly, M. J.</dc:creator>
<dc:creator>Samocha, K. E.</dc:creator>
<dc:date>2024-04-13</dc:date>
<dc:identifier>doi:10.1101/2024.04.11.588920</dc:identifier>
<dc:title><![CDATA[The landscape of regional missense mutational intolerance quantified from 125,748 exomes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-04-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.03.20.712202v1?rss=1">
<title>
<![CDATA[
A graph-based learning approach to predict the effects of gene perturbations on molecular phenotypes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.03.20.712202v1?rss=1
</link>
<description><![CDATA[
Motivation: Large-scale gene knockdown/knockout screens have been used to gain insight into a wide array of phenotypes and biological processes. However, conducting such experiments is expensive and labor-intensive. In this work, we present a general graph-based machine-learning approach that can predict the effects of gene perturbations on molecular phenotypes of interest given some measured phenotypic effects of other gene perturbations. The motivation for learning models that can predict the effects of gene perturbations is fourfold. Such models can (1) predict effects for unmeasured genes in cases in which cost or technical barriers preclude perturbing every gene, (2) prioritize unmeasured genes or sets of genes for subsequent perturbation experiments, (3) hypothesize mechanisms that underlie the relationships between the perturbed genes and their effects, and (4) generalize to other unmeasured phenotypes of interest.

Results: We evaluate our approach by applying it, in conjunction with four different learning methods, to learn models for four varied phenotypes. Our empirical evaluation demonstrates that the learned models (1) show relatively high levels of predictive accuracy across the four phenotypes, (2) have better predictive accuracy than several standard baselines, (3) can often learn accurate models with small training sets, (4) benefit from having multiple sources of evidence in the input representation, and (5) can, in many cases, transfer their predictive value to other phenotypes.

Data availability: The assembled data sets and source code for this work are available at: https://github.com/Craven-Biostat-Lab/graph-molecular-phenotype-prediction

Author summary: One general approach for gaining insight into the genes involved in a specific biological process is to conduct an experiment in which individual genes are perturbed and the effect on the process is measured for each perturbation. Large-scale experiments of this type have provided important biological insights, but they are often expensive and labor-intensive to perform. As a result, it is not always feasible to measure the effects of perturbing every gene. In this article, we present a machine-learning approach to predicting the effects of gene perturbations using available experimental data and biological network information. Our method can estimate the effects of genes that have not yet been experimentally measured, helping researchers identify promising genes to study next. In addition, the models can suggest hypotheses about the molecular interactions that link genes to the biological process of interest. Approaches like this may help guide experimental studies and accelerate the discovery of gene-phenotype relationships.
]]></description>
<dc:creator>Jin, Y.</dc:creator>
<dc:creator>Sverchkov, Y.</dc:creator>
<dc:creator>Sushkova, A.</dc:creator>
<dc:creator>Ohtake, M.</dc:creator>
<dc:creator>Emfinger, C.</dc:creator>
<dc:creator>Craven, M.</dc:creator>
<dc:date>2026-03-23</dc:date>
<dc:identifier>doi:10.64898/2026.03.20.712202</dc:identifier>
<dc:title><![CDATA[A graph-based learning approach to predict the effects of gene perturbations on molecular phenotypes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-03-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.12.12.693799v1?rss=1">
<title>
<![CDATA[
SEEK-VEC: Robust Latent Structure Discovery via Ensemble Topic Modeling 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.12.12.693799v1?rss=1
</link>
<description><![CDATA[
Count data are ubiquitous across many applications in which understanding hidden patterns, or latent structure, is of interest. Topic modeling is a powerful tool for detecting latent structure in count data. However, standard topic modeling methods are often constrained by their restrictive assumptions, susceptible to noise, and sensitive to misspecification of the number of topics, which is particularly of concern when analyzing non-text data. Here, we introduce SEEK-VEC (Spectral Ensembling of topic models with Eigenscore for K-agnostic Vocabulary Embedding and Classification), an ensemble framework for count data that integrates insights from multiple candidate topic models through a spectral ensembling procedure. This approach automatically reinforces signal and mitigates noise to generate a consensus low-dimensional embedding of the data. SEEK-VEC produces prioritization scores and grouping scores that enable variable classification, interactive pattern discovery, and model diagnostics. Through simulations, we demonstrate that SEEK-VEC is robust under realistic settings and outperforms state-of-the-art oracle methods, particularly when signal strength is weak. Applied to diverse real-world datasets, including self-reported psychopathology symptom data, food preference questionnaires, and single-cell transcriptomics, SEEK-VEC reveals latent structures that provide scientifically meaningful insights.
]]></description>
<dc:creator>Danning, R.</dc:creator>
<dc:creator>Ke, Z. T.</dc:creator>
<dc:creator>Ma, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2025-12-14</dc:date>
<dc:identifier>doi:10.64898/2025.12.12.693799</dc:identifier>
<dc:title><![CDATA[SEEK-VEC: Robust Latent Structure Discovery via Ensemble Topic Modeling]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-12-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.02.636107v1?rss=1">
<title>
<![CDATA[
Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.02.636107v1?rss=1
</link>
<description><![CDATA[
The transcriptome provides a highly informative molecular phenotype to connect genotype to phenotype and is most frequently measured by RNA-sequencing (RNA-seq). Therefore, an ultimate goal is to perturb every gene and measure changes in the transcriptome. However, this remains challenging, especially in intact organisms due to different experimental and computational challenges. Here, we present Worm Perturb-Seq (WPS), which provides high-resolution RNA-seq profiles for hundreds of replicate perturbations at a time in a living animal. WPS introduces multiple experimental advances that combine strengths of bulk and single-cell RNA-seq, and further provides an analytical framework, EmpirDE, that leverages the unique power of the large WPS datasets. EmpirDE identifies differentially expressed genes (DEGs) by using gene-specific empirical null distributions, rather than control conditions alone, thereby systematically removing technical biases and improving statistical rigor. We applied WPS to 103 Caenorhabditis elegans nuclear hormone receptors (NHRs) to delineate a Gene Regulatory Network (GRN) and found that this GRN presents a striking pairwise modularity where pairs of NHRs regulate shared target genes. We envision that the experimental and analytical advances of WPS should be useful not only for C. elegans, but will be broadly applicable to other models, including human cells.
]]></description>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Song, D.</dc:creator>
<dc:creator>Yukselen, O.</dc:creator>
<dc:creator>Nanda, S.</dc:creator>
<dc:creator>Kucukural, A.</dc:creator>
<dc:creator>Li, J. J.</dc:creator>
<dc:creator>Garber, M.</dc:creator>
<dc:creator>Walhout, A. J. M.</dc:creator>
<dc:date>2025-02-03</dc:date>
<dc:identifier>doi:10.1101/2025.02.02.636107</dc:identifier>
<dc:title><![CDATA[Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.10.14.618256v1?rss=1">
<title>
<![CDATA[
SPLENDID incorporates continuous genetic ancestry in biobank-scale data to improve polygenic risk prediction across diverse populations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.10.14.618256v1?rss=1
</link>
<description><![CDATA[
Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across diverse populations. Recent methods leverage large-scale multi-ancestry data to improve accuracy in under-represented populations but require labelling individuals by ancestry for prediction. This poses challenges for practical use, as clinical practices are typically not based on ancestry. We propose SPLENDID, a novel penalized regression framework for diverse biobank-scale data. Our method utilizes ancestry principal component interactions to model genetic ancestry as a continuum within a single prediction model for all ancestries, eliminating the need for discrete labels. In extensive simulations and analyses of 9 traits from the All of Us Research Program (N=224,364) and UK Biobank (N=340,140), SPLENDID significantly outperformed existing methods in prediction accuracy and model sparsity. By directly incorporating continuous genetic ancestry in model training, SPLENDID stands as a valuable tool for robust risk prediction across diverse populations and fairer clinical implementation.
]]></description>
<dc:creator>Chen, T.</dc:creator>
<dc:creator>Zhang, H.</dc:creator>
<dc:creator>Mazumder, R.</dc:creator>
<dc:creator>Lin, X.</dc:creator>
<dc:date>2024-10-17</dc:date>
<dc:identifier>doi:10.1101/2024.10.14.618256</dc:identifier>
<dc:title><![CDATA[SPLENDID incorporates continuous genetic ancestry in biobank-scale data to improve polygenic risk prediction across diverse populations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-10-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.04.23.650307v1?rss=1">
<title>
<![CDATA[
cellSTAAR: Incorporating single-cell-sequencing-based functional data to boost power in rare variant association testing of non-coding regions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.04.23.650307v1?rss=1
</link>
<description><![CDATA[
Whole genome sequencing (WGS) studies have identified hundreds of millions of rare variants (RVs) and have enabled RV association tests (RVATs) of these variants with complex traits and diseases. Analysis of non-coding variants is challenged by the considerable variability in regulatory function that candidate Cis-Regulatory Elements (cCREs) exhibit across cell types. We propose cellSTAAR, which integrates WGS data with single-cell ATAC-seq data to capture variability in chromatin accessibility across cell types via the construction of cell-type-specific functional annotations and variant sets. To reflect the uncertainty in cCRE-gene linking, cellSTAAR also links cCREs to their target genes using an omnibus framework that aggregates results from a variety of popular linking approaches. We applied cellSTAAR to Freeze 8 (N = 60,000) of the NHLBI Trans-Omics for Precision Medicine (TOPMed) consortium data for four lipid phenotypes: LDL cholesterol, a binary variable corresponding to high LDL cholesterol, HDL cholesterol, and triglycerides. We also provide replication results for all four phenotypes using UK Biobank (N = 190,000). Evidence from simulation studies and our real data analysis demonstrates that cellSTAAR boosts power and improves interpretation of RVATs of cCREs.
]]></description>
<dc:creator>Van Buren, E.</dc:creator>
<dc:creator>Zhang, Y.</dc:creator>
<dc:creator>Li, X.</dc:creator>
<dc:creator>Selvaraj, M. S.</dc:creator>
<dc:creator>Li, Z.</dc:creator>
<dc:creator>Zhou, H.</dc:creator>
<dc:creator>Palmer, N. D.</dc:creator>
<dc:creator>Arnett, D. K.</dc:creator>
<dc:creator>Blangero, J.</dc:creator>
<dc:creator>Boerwinkle, E.</dc:creator>
<dc:creator>Cade, B. E.</dc:creator>
<dc:creator>Carlson, J. C.</dc:creator>
<dc:creator>Carson, A. P.</dc:creator>
<dc:creator>Chen, Y.-D. I.</dc:creator>
<dc:creator>Curran, J.</dc:creator>
<dc:creator>Duggirala, R.</dc:creator>
<dc:creator>Fornage, M.</dc:creator>
<dc:creator>Franceschini, N.</dc:creator>
<dc:creator>Graff, M.</dc:creator>
<dc:creator>Gu, C.</dc:creator>
<dc:creator>Guo, X.</dc:creator>
<dc:creator>He, J.</dc:creator>
<dc:creator>Heard-Cosa, N.</dc:creator>
<dc:creator>Hou, L.</dc:creator>
<dc:creator>Hung, Y.-J.</dc:creator>
<dc:creator>Kalyani, R. R.</dc:creator>
<dc:creator>Kardia, S. L. R.</dc:creator>
<dc:creator>Kooperberg, C.</dc:creator>
<dc:creator>Kral, B. G.</dc:creator>
<dc:creator>Lange, L.</dc:creator>
<dc:creator>Li, C.</dc:creator>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Lloyd-Jones, D.</dc:creator>
<dc:creator>Loos, R. J. F.</dc:creator>
<dc:creator>Manichaikul, A. W.</dc:creator>
<dc:creator>Martin, L. W.</dc:creator>
<dc:creator>Mathias, R.</dc:creator>
<dc:creator>Minster, R.</dc:creator>
<dc:creator>Mitchell, B. D.</dc:creator>
<dc:creator>Mychaleckyj, J. C.</dc:creator>
<dc:creator>Naseri, T.</dc:creator>
<dc:creator>North, K.</dc:creator>
<dc:creator>O'Connell, J.</dc:creator>
<dc:creator>Perry, J. A.</dc:creator>
<dc:creator>Peyse</dc:creator>
<dc:date>2025-04-26</dc:date>
<dc:identifier>doi:10.1101/2025.04.23.650307</dc:identifier>
<dc:title><![CDATA[cellSTAAR: Incorporating single-cell-sequencing-based functional data to boost power in rare variant association testing of non-coding regions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-04-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2026.02.05.703637v1?rss=1">
<title>
<![CDATA[
Short-Context Regulatory DNA Language Models with Motif-Discovery Regularization 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2026.02.05.703637v1?rss=1
</link>
<description><![CDATA[
Self-supervised DNA language models (DNALMs) are typically trained at massive scale on whole genomes and long contexts. However, regulatory sequence features are sparse, heterogeneous, and dominated by poorly conserved flexible syntax of short motifs, which can be difficult to learn from genome-wide self-supervision. As a result, annotation-agnostic, long-context DNALMs struggle to learn regulatory syntax and can underperform simpler baseline models on key regulatory tasks. We therefore introduce ARSENAL, a short-context masked DNA language model trained on a functionally enriched regulatory corpus and augmented with a novel regularizer that encourages motif discovery. ARSENAL improves recovery of diverse transcription factor motifs de novo and prediction of regulatory variant effects in the zero-shot setting compared to other DNALMs. Incorporating ARSENAL embeddings also improves supervised chromatin accessibility prediction over strong ab-initio baselines across multiple cell types and yields improved regulatory variant scoring. Finally, ARSENAL serves as a practical generative prior, enabling targeted regulatory sequence design under downstream functional constraints.

All code can be found at https://github.com/kundajelab/regulatory_lm, and models and data can be found at https://sageb.io/4ZpEnk
]]></description>
<dc:creator>Patel, A.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2026-02-06</dc:date>
<dc:identifier>doi:10.64898/2026.02.05.703637</dc:identifier>
<dc:title><![CDATA[Short-Context Regulatory DNA Language Models with Motif-Discovery Regularization]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2026-02-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.12.19.695550v1?rss=1">
<title>
<![CDATA[
High false sign rates in transcriptome-wide association studies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.12.19.695550v1?rss=1
</link>
<description><![CDATA[
Transcriptome-wide association studies (TWAS) are widely used to identify genes involved in complex traits and to infer the direction of gene effects on traits. However, despite their popularity, it remains unclear how accurately TWAS recover the true direction of a gene's effect on a trait. Here, we estimate the false sign rate (FSR) of TWAS for plasma proteins, leveraging the expectation that increased gene expression should generally increase protein expression. We then extend this framework to complex traits, where loss-of-function burden tests provide the expected direction-of-effect. In both analyses, we observe high discordance with expectations, with TWAS showing an FSR of 23% for plasma proteins and 33% for complex traits. While colocalization-based filtering reduced the FSR, substantial discordance remained, and the filtering came with a substantial loss of recall. However, when we restricted gene-direction assignments for plasma proteins to only relevant tissues in combination with colocalization-based filtering, the FSR dropped to 11%, and to just 5% if we excluded brain-specific proteins. We propose that much of the sign discordance arises when eQTLs in non-trait-relevant tissues tag GWAS-associated haplotypes via distinct, tightly-linked regulatory variants, yielding spurious TWAS associations with the correct genes but with unreliable direction-of-effect. These findings show that TWAS-based direction-of-effect estimates should be interpreted with caution and raise concerns about the reliability of TWAS more broadly.
]]></description>
<dc:creator>Gerlach, P. A.</dc:creator>
<dc:creator>Milind, N.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2025-12-20</dc:date>
<dc:identifier>doi:10.64898/2025.12.19.695550</dc:identifier>
<dc:title><![CDATA[High false sign rates in transcriptome-wide association studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-12-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.10.03.680360v1?rss=1">
<title>
<![CDATA[
Gradient-aware modeling advances AI-driven prediction of genetic perturbation effects 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.10.03.680360v1?rss=1
</link>
<description><![CDATA[
Predicting the transcriptional effects of genetic perturbations across diverse contexts is a central challenge in functional genomics. While single-cell perturbational assays such as Perturb-seq have generated valuable datasets, exhaustively profiling all perturbations is infeasible, underscoring the need for predictive models. We present GARM (Gradient Aligned Regression with Multi-decoder), a machine learning (ML) framework that leverages gradient-aware supervision to capture both absolute and relative perturbational effects. Across multiple large-scale datasets, GARM consistently outperforms leading approaches--including GEARS, scGPT, and GenePert--in predicting responses to unseen perturbations within and across contexts. Complementing this, we show that widely used evaluation metrics substantially overestimate performance, allowing trivial models to appear predictive. To address this, we introduce perturbation-ranking criteria (PrtR) that better reflect model utility for experimental design. Finally, we provide insight into gene-specific predictability, revealing pathways and gene classes systematically easier or harder to predict, with implications for model development and biological interpretation. Together, these advances establish a unified methodological and conceptual framework that improves perturbation modeling, sets rigorous evaluation standards, and provides biological insight into gene-specific predictability in functional genomics.
]]></description>
<dc:creator>Jerby, L.</dc:creator>
<dc:creator>Zhu, D.</dc:creator>
<dc:date>2025-10-05</dc:date>
<dc:identifier>doi:10.1101/2025.10.03.680360</dc:identifier>
<dc:title><![CDATA[Gradient-aware modeling advances AI-driven prediction of genetic perturbation effects]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-10-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.08.12.669924v1?rss=1">
<title>
<![CDATA[
Regulatory network topology and the genetic architecture of gene expression 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.08.12.669924v1?rss=1
</link>
<description><![CDATA[
In human populations, most of the genetic variance in gene expression can be attributed to trans-acting expression quantitative trait loci (eQTLs) spread across the genome. However, in practice it is difficult to discover these eQTLs, and their cumulative effects on gene expression and complex traits are yet to be fully understood. Here, we assess how properties of the genetic architecture of gene expression constrain the space of plausible gene regulatory networks. We describe a structured causal model of gene expression regulation and consider how it interacts with biologically relevant properties of the gene regulatory network to alter the genomic distribution of expression heritability. Under our model, we find that the genetic architecture of gene expression is shaped in large part by local network motifs and by hub regulators that shorten paths through the network and act as key sources of trans-acting variance. Further, simulated networks with an enrichment of motifs and hub regulators best recapitulate the distribution of cis and trans heritability of gene expression as measured in a recent twin study. Taken together, our results suggest that the architecture of gene expression is sparser and more pleiotropic across genes than would be suggested by naive models of regulatory networks, which has important implications for future studies of complex traits.
]]></description>
<dc:creator>Aguirre, M.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Sella, G.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2025-08-13</dc:date>
<dc:identifier>doi:10.1101/2025.08.12.669924</dc:identifier>
<dc:title><![CDATA[Regulatory network topology and the genetic architecture of gene expression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-08-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.05.658097v1?rss=1">
<title>
<![CDATA[
Robust self-supervised machine learning for single cell embeddings and annotations 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.05.658097v1?rss=1
</link>
<description><![CDATA[
Dimensionality reduction and clustering are critical steps in single-cell and spatial genomics studies. Here, we show that existing dimensionality reduction and clustering methods suffer from: (1) overfitting to the dominant patterns while missing unique ones, which impairs the detection and annotation of rare cell types and states, and (2) fitting to technical noise over biological signal. To address this, we developed DR-GEM, a self-supervised meta-algorithm that combines principles in distributionally robust optimization with balanced consensus machine learning. DR-GEM supervises itself by (1) using the reconstruction error to identify and reorient its attention to samples/cells that are otherwise poorly embedded, and (2) using balanced consensus learning as a mechanism to increase robustness and mitigate the impact of low-quality samples/cells. Applied to synthetic and real-world single cell omics data, single cell resolution spatial transcriptomics, and Perturb-seq datasets, DR-GEM markedly and consistently outperforms existing methods in obtaining reliable embeddings, recovering rare cell types, filtering noise, and uncovering the underlying biology. In summary, this study surfaces and addresses a gap in single cell genomics and brings self-supervision to the realm of dimensionality reduction and clustering to better support data-driven discoveries.
]]></description>
<dc:creator>Yeh, C. Y.</dc:creator>
<dc:creator>Sun, M. W.</dc:creator>
<dc:creator>Zhu, D.</dc:creator>
<dc:creator>Jerby, L.</dc:creator>
<dc:date>2025-06-08</dc:date>
<dc:identifier>doi:10.1101/2025.06.05.658097</dc:identifier>
<dc:title><![CDATA[Robust self-supervised machine learning for single cell embeddings and annotations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.06.06.658175v1?rss=1">
<title>
<![CDATA[
Focus on single gene effects limits discovery and interpretation of complex trait-associated variants 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.06.06.658175v1?rss=1
</link>
<description><![CDATA[
Standard QTL mapping approaches consider variant effects on a single gene at a time, despite abundant evidence for allelic pleiotropy, where a single variant can affect multiple genes simultaneously. While allelic pleiotropy describes variant effects on both local and distal genes or a mixture of molecular effects on a single gene, here we specifically investigate allelic expression "proxitropy": where a single variant influences the expression of multiple, neighboring genes. We introduce a multi-gene eQTL mapping framework--cis-principal component expression QTL (cis-pc eQTL or pcQTL)--to identify variants associated with shared axes of expression variation across a cluster of neighboring genes. We perform pcQTL mapping in 13 GTEx human tissues and discover novel loci undetected by single-gene approaches. In total, we identify an average of 1396 pcQTLs/tissue, 27% of which were not discovered by single-gene methods. These novel pcQTLs colocalized with an additional 142 GWAS trait-associated variants and increased the number of colocalizations by 34% over single-gene QTL mapping. These findings highlight that moving beyond single-gene-at-a-time approaches toward multi-gene methods can offer a more comprehensive view of gene regulation and complex trait-associated variation.
]]></description>
<dc:creator>Lawrence, K. A.</dc:creator>
<dc:creator>Gjorgjieva, T.</dc:creator>
<dc:creator>Montgomery, S. B.</dc:creator>
<dc:date>2025-06-06</dc:date>
<dc:identifier>doi:10.1101/2025.06.06.658175</dc:identifier>
<dc:title><![CDATA[Focus on single gene effects limits discovery and interpretation of complex trait-associated variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-06-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.03.21.644686v1?rss=1">
<title>
<![CDATA[
Redirecting cytotoxic lymphocytes to breast cancer tumors via metabolite-sensing receptors 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.03.21.644686v1?rss=1
</link>
<description><![CDATA[
Insufficient infiltration of cytotoxic lymphocytes to solid tumors limits the efficacy of immunotherapies and cell therapies. Here, we report a programmable mechanism to mobilize Natural Killer (NK) and T cells to breast cancer tumors by engineering these cells to express orphan and metabolite-sensing G protein-coupled receptors (GPCRs). First, in vivo and in vitro CRISPR activation screens in NK-92 cells identified GPR183, GPR84, GPR34, GPR18, FPR3, and LPAR2 as top enhancers of both tumor infiltration and chemotaxis to breast cancer. These genes equip NK and T cells with the ability to sense and migrate to chemoattracting metabolites such as 7α,25-dihydroxycholesterol and other factors released from breast cancer. Based on Perturb-seq and functional investigations, GPR183 also enhances effector functions, such that engineering NK and CAR NK cells to express GPR183 enhances their ability to migrate to, infiltrate, and control breast cancer tumors. Our study uncovered metabolite-based tumor immune recruitment mechanisms, opening avenues for spatially targeted cell therapies.
]]></description>
<dc:creator>Kim, Y. M.</dc:creator>
<dc:creator>Akana, R.</dc:creator>
<dc:creator>Sun, C.</dc:creator>
<dc:creator>Laveroni, O.</dc:creator>
<dc:creator>Jerby, L.</dc:creator>
<dc:date>2025-03-25</dc:date>
<dc:identifier>doi:10.1101/2025.03.21.644686</dc:identifier>
<dc:title><![CDATA[Redirecting cytotoxic lymphocytes to breast cancer tumors via metabolite-sensing receptors]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-03-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.02.18.638922v1?rss=1">
<title>
<![CDATA[
Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.02.18.638922v1?rss=1
</link>
<description><![CDATA[
Whole genome sequencing has identified over a billion non-coding variants in humans, while GWAS has revealed the non-coding genome as a significant contributor to disease. However, prioritizing causal common and rare non-coding variants in human disease, and understanding how selective pressures have shaped the non-coding genome, remain significant challenges. Here, we predicted the effects of 15 million variants with deep learning models trained on single-cell ATAC-seq across 132 cellular contexts in adult and fetal brain and heart, producing nearly two billion context-specific predictions. Using these predictions, we distinguish candidate causal variants underlying human traits and diseases and their context-specific effects. While common variant effects are more cell-type-specific, rare variants exert more cell-type-shared regulatory effects, with selective pressures particularly targeting variants affecting fetal brain neurons. To prioritize de novo mutations with extreme regulatory effects, we developed FLARE, a context-specific functional genomic model of constraint. FLARE outperformed other methods in prioritizing case mutations from autism-affected families near syndromic autism-associated genes; for example, identifying mutation outliers near CNTNAP2 that would be missed by alternative approaches. Overall, our findings demonstrate the potential of integrating single-cell maps with population genetics and deep learning-based variant effect prediction to elucidate mechanisms of development and disease, ultimately supporting the notion that genetic contributions to neurodevelopmental disorders are predominantly rare.
]]></description>
<dc:creator>Marderstein, A. R.</dc:creator>
<dc:creator>Kundu, S.</dc:creator>
<dc:creator>Padhi, E. M.</dc:creator>
<dc:creator>Deshpande, S.</dc:creator>
<dc:creator>Wang, A.</dc:creator>
<dc:creator>Robb, E.</dc:creator>
<dc:creator>Sun, Y.</dc:creator>
<dc:creator>Yun, C. M.</dc:creator>
<dc:creator>Pomales-Matos, D.</dc:creator>
<dc:creator>Xie, Y.</dc:creator>
<dc:creator>Nachun, D.</dc:creator>
<dc:creator>Jessa, S.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Montgomery, S. B.</dc:creator>
<dc:date>2025-02-19</dc:date>
<dc:identifier>doi:10.1101/2025.02.18.638922</dc:identifier>
<dc:title><![CDATA[Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-02-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.01.22.634424v1?rss=1">
<title>
<![CDATA[
Causal modeling of gene effects from regulators to programs to traits: integration of genetic associations and Perturb-seq 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.01.22.634424v1?rss=1
</link>
<description><![CDATA[
Genetic association studies provide a unique tool for identifying causal links from genes to human traits and diseases. However, it is challenging to determine the biological mechanisms underlying most associations, and we lack genome-scale approaches for inferring causal mechanistic pathways from genes to cellular functions to traits. Here we propose new approaches to bridge this gap by combining quantitative estimates of gene-trait relationships from loss-of-function burden tests with gene-regulatory connections inferred from Perturb-seq experiments in relevant cell types. By combining these two forms of data, we aim to build causal graphs in which the directional associations of genes with a trait can be explained by their regulatory effects on biological programs or direct effects on the trait. As a proof-of-concept, we constructed a causal graph of the gene regulatory hierarchy that jointly controls three partially co-regulated blood traits. We propose that perturbation studies in trait-relevant cell types, coupled with gene-level effect sizes for traits, can bridge the gap between genetics and biology.
]]></description>
<dc:creator>Ota, M.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Zeng, T.</dc:creator>
<dc:creator>Dann, E.</dc:creator>
<dc:creator>Marson, A.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2025-01-24</dc:date>
<dc:identifier>doi:10.1101/2025.01.22.634424</dc:identifier>
<dc:title><![CDATA[Causal modeling of gene effects from regulators to programs to traits: integration of genetic associations and Perturb-seq]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-01-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.12.25.630221v1?rss=1">
<title>
<![CDATA[
ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.12.25.630221v1?rss=1
</link>
<description><![CDATA[
Despite extensive mapping of cis-regulatory elements (cREs) across cellular contexts with chromatin accessibility assays, the sequence syntax and genetic variants that regulate transcription factor (TF) binding and chromatin accessibility at context-specific cREs remain elusive. We introduce ChromBPNet, a deep learning DNA sequence model of base-resolution accessibility profiles that detects, learns and deconvolves assay-specific enzyme biases from regulatory sequence determinants of accessibility, enabling robust discovery of compact TF motif lexicons, cooperative motif syntax and precision footprints across assays and sequencing depths. Extensive benchmarks show that ChromBPNet, despite its lightweight design, is competitive with much larger contemporary models at predicting variant effects on chromatin accessibility, pioneer TF binding and reporter activity across assays, cell contexts and ancestry, while providing interpretation of disrupted regulatory syntax. ChromBPNet also helps prioritize and interpret regulatory variants that influence complex traits and rare diseases, thereby providing a powerful lens to decode regulatory DNA and genetic variation.
]]></description>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Shcherbina, A.</dc:creator>
<dc:creator>Kvon, E.</dc:creator>
<dc:creator>Kosicki, M.</dc:creator>
<dc:creator>Nair, S.</dc:creator>
<dc:creator>Kundu, S.</dc:creator>
<dc:creator>Kathiria, A. S.</dc:creator>
<dc:creator>Risca, V. I.</dc:creator>
<dc:creator>Kuningas, K.</dc:creator>
<dc:creator>Alasoo, K.</dc:creator>
<dc:creator>Greenleaf, W.</dc:creator>
<dc:creator>Pennacchio, L.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2024-12-25</dc:date>
<dc:identifier>doi:10.1101/2024.12.25.630221</dc:identifier>
<dc:title><![CDATA[ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-12-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.12.12.628073v1?rss=1">
<title>
<![CDATA[
Specificity, length, and luck: How genes are prioritized by rare and common variant association studies 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.12.12.628073v1?rss=1
</link>
<description><![CDATA[
Standard genome-wide association studies (GWAS) and rare variant burden tests are essential tools for identifying trait-relevant genes. Although these methods are conceptually similar, we show by analyzing association studies of 209 quantitative traits in the UK Biobank that they systematically prioritize different genes. This raises the question of how genes should ideally be prioritized. We propose two prioritization criteria: 1) trait importance -- how much a gene quantitatively affects a trait; and 2) trait specificity -- a gene's importance for the trait under study relative to its importance across all traits. We find that GWAS prioritize genes near trait-specific variants, while burden tests prioritize trait-specific genes. Because non-coding variants can be context specific, GWAS can prioritize highly pleiotropic genes, while burden tests generally cannot. Both study designs are also affected by distinct trait-irrelevant factors, complicating their interpretation. Our results illustrate that burden tests and GWAS reveal different aspects of trait biology and suggest ways to improve their interpretation and usage.
]]></description>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Mostafavi, H.</dc:creator>
<dc:creator>Ota, M.</dc:creator>
<dc:creator>Milind, N.</dc:creator>
<dc:creator>Gjorgjieva, T.</dc:creator>
<dc:creator>Smith, C. J.</dc:creator>
<dc:creator>Simons, Y. B.</dc:creator>
<dc:creator>Sella, G.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2024-12-16</dc:date>
<dc:identifier>doi:10.1101/2024.12.12.628073</dc:identifier>
<dc:title><![CDATA[Specificity, length, and luck: How genes are prioritized by rare and common variant association studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-12-16</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.12.01.621925v1?rss=1">
<title>
<![CDATA[
Directionality of Transcriptional Regulatory Elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.12.01.621925v1?rss=1
</link>
<description><![CDATA[
Divergent transcription is a critical marker of active transcriptional regulatory elements (TREs), including enhancers and promoters, in mammals. However, distal elements with unidirectional transcriptional patterns are often overlooked, leaving their identity and function poorly understood. Here, we performed a systematic comparison between divergent and unidirectional elements, revealing their distinct architectural and functional features. Our analysis also shows that unidirectional elements have younger sequence ages and are under weaker evolutionary constraints than divergent elements, indicating that they may represent a unique category of genomic regulatory function with more recent origins. Notably, we observed that some transcription factors, including CTCF, AP1, SP, and NFY, exhibit dual roles in modulating the directionality of TREs, either activating or repressing nascent transcription in a position-dependent manner. Overall, the elucidation of directionality enhances our understanding of the diverse architectural models, functional features, evolutionary dynamics, and regulatory logic of TREs.
]]></description>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>Shah, S. R.</dc:creator>
<dc:creator>Leung, A.</dc:creator>
<dc:creator>Paramo, M. I.</dc:creator>
<dc:creator>Cochran, K.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Clark, A. G.</dc:creator>
<dc:creator>Lis, J. T.</dc:creator>
<dc:creator>Yu, H.</dc:creator>
<dc:date>2024-12-02</dc:date>
<dc:identifier>doi:10.1101/2024.12.01.621925</dc:identifier>
<dc:title><![CDATA[Directionality of Transcriptional Regulatory Elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-12-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.05.28.596138v1?rss=1">
<title>
<![CDATA[
Dissecting the cis-regulatory syntax of transcription initiation with deep learning 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.05.28.596138v1?rss=1
</link>
<description><![CDATA[
Despite extensive characterization of mammalian Pol II transcription, the DNA sequence determinants of transcription initiation at a third of human promoters and most enhancers remain poorly understood. We trained and interpreted a neural network called ProCapNet that accurately models base-resolution initiation profiles from PRO-cap experiments using local DNA sequence. ProCapNet learns sequence motifs with distinct effects on initiation rates and TSS positioning and uncovers context-specific cryptic initiator elements intertwined within other TF motifs. ProCapNet annotates predictive motifs in nearly all actively transcribed regulatory elements across multiple cell lines, revealing a shared cis-regulatory logic across promoters and enhancers and a highly epistatic sequence syntax of cooperative and competitive motif interactions. ProCapNet models of steady-state RAMPAGE profiles distill initiation signals on par with models trained directly on PRO-cap profiles. ProCapNet learns a largely cell-type-agnostic cis-regulatory code of initiation complementing sequence drivers of cell-type-specific chromatin state critical for accurate prediction of cell-type-specific transcription initiation.
]]></description>
<dc:creator>Cochran, K.</dc:creator>
<dc:creator>Yin, M.</dc:creator>
<dc:creator>Mantripragada, A.</dc:creator>
<dc:creator>Schreiber, J.</dc:creator>
<dc:creator>Marinov, G. K.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:date>2024-06-01</dc:date>
<dc:identifier>doi:10.1101/2024.05.28.596138</dc:identifier>
<dc:title><![CDATA[Dissecting the cis-regulatory syntax of transcription initiation with deep learning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.07.04.602130v1?rss=1">
<title>
<![CDATA[
Gene regulatory network structure informs the distribution of perturbation effects 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.07.04.602130v1?rss=1
</link>
<description><![CDATA[
Gene regulatory networks (GRNs) govern many core developmental and biological processes underlying human complex traits. Even with broad-scale efforts to characterize the effects of molecular perturbations and interpret gene coexpression, it remains challenging to infer the architecture of gene regulation in a precise and efficient manner. Key properties of GRNs, like hierarchical structure, modular organization, and sparsity, provide both challenges and opportunities for this objective. Here, we seek to better understand properties of GRNs using a new approach to simulate their structure and model their function. We produce realistic network structures with a novel generating algorithm based on insights from small-world network theory, and we model gene expression regulation using stochastic differential equations formulated to accommodate modeling molecular perturbations. With these tools, we systematically describe the effects of gene knockouts within and across GRNs, finding a subset of networks that recapitulate features of a recent genome-scale perturbation study. With deeper analysis of these exemplar networks, we consider future avenues to map the architecture of gene expression regulation using data from cells in perturbed and unperturbed states, finding that while perturbation data are critical to discover specific regulatory interactions, data from unperturbed cells may be sufficient to reveal regulatory programs.
]]></description>
<dc:creator>Aguirre, M.</dc:creator>
<dc:creator>Spence, J. P.</dc:creator>
<dc:creator>Sella, G.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2024-07-05</dc:date>
<dc:identifier>doi:10.1101/2024.07.04.602130</dc:identifier>
<dc:title><![CDATA[Gene regulatory network structure informs the distribution of perturbation effects]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-07-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.09.06.611737v1?rss=1">
<title>
<![CDATA[
Mutagenesis Sensitivity Mapping of Human Enhancers In Vivo 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.09.06.611737v1?rss=1
</link>
<description><![CDATA[
Distant-acting enhancers are central to human development. However, our limited understanding of their functional sequence features prevents the interpretation of enhancer mutations in disease. Here, we determined the functional sensitivity to mutagenesis of human developmental enhancers in vivo. Focusing on seven enhancers active in the developing brain, heart, limb and face, we created over 1700 transgenic mice for over 260 mutagenized enhancer alleles. Systematic mutation of 12-basepair blocks collectively altered each sequence feature in each enhancer at least once. We show that 69% of all blocks are required for normal in vivo activity, with mutations more commonly resulting in loss (60%) than in gain (9%) of function. Using predictive modeling, we annotated critical nucleotides at base-pair resolution. The vast majority of motifs predicted by these machine learning models (88%) coincided with changes to in vivo function, and the models showed considerable sensitivity, identifying 59% of all functional blocks. Taken together, our results reveal that human enhancers contain a high density of sequence features required for their normal in vivo function and provide a rich resource for further exploration of human enhancer logic.
]]></description>
<dc:creator>Kosicki, M.</dc:creator>
<dc:creator>Zhang, B.</dc:creator>
<dc:creator>Pampari, A.</dc:creator>
<dc:creator>Akiyama, J. A.</dc:creator>
<dc:creator>Playzer-Frick, I.</dc:creator>
<dc:creator>Novak, C. S.</dc:creator>
<dc:creator>Tran, S.</dc:creator>
<dc:creator>Zhu, Y.</dc:creator>
<dc:creator>Kato, M.</dc:creator>
<dc:creator>Hunter, R. D.</dc:creator>
<dc:creator>von Maydell, K.</dc:creator>
<dc:creator>Barton, S.</dc:creator>
<dc:creator>Beckman, E.</dc:creator>
<dc:creator>Kundaje, A.</dc:creator>
<dc:creator>Dickel, D. E.</dc:creator>
<dc:creator>Visel, A.</dc:creator>
<dc:creator>Pennacchio, L. A.</dc:creator>
<dc:date>2024-09-08</dc:date>
<dc:identifier>doi:10.1101/2024.09.06.611737</dc:identifier>
<dc:title><![CDATA[Mutagenesis Sensitivity Mapping of Human Enhancers In Vivo]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-09-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.06.17.599448v1?rss=1">
<title>
<![CDATA[
A model for accurate quantification of CRISPR effects in pooled FACS screens 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.06.17.599448v1?rss=1
</link>
<description><![CDATA[
CRISPR screens are powerful tools to identify key genes that underlie biological processes. One important type of screen uses fluorescence-activated cell sorting (FACS) to sort perturbed cells into bins based on the expression level of marker genes, followed by guide RNA (gRNA) sequencing. Analysis of these data presents several statistical challenges due to multiple factors, including the discrete nature of the bins and typically small numbers of replicate experiments. To address these challenges, we developed a robust and powerful Bayesian random effects model and software package called Waterbear. Furthermore, we used Waterbear to explore how various experimental design parameters affect statistical power to establish principled guidelines for future screens. Finally, we experimentally validated the findings of our experimental design model: when Waterbear is used for analysis, high power is maintained even at low cell coverage and a high multiplicity of infection. We anticipate that Waterbear will be of broad utility for analyzing FACS-based CRISPR screens.
]]></description>
<dc:creator>Pimentel, H.</dc:creator>
<dc:creator>Freimer, J.</dc:creator>
<dc:creator>Arce, M. M.</dc:creator>
<dc:creator>Garrido, C. M.</dc:creator>
<dc:creator>Marson, A.</dc:creator>
<dc:creator>Pritchard, J. K.</dc:creator>
<dc:date>2024-06-18</dc:date>
<dc:identifier>doi:10.1101/2024.06.17.599448</dc:identifier>
<dc:title><![CDATA[A model for accurate quantification of CRISPR effects in pooled FACS screens]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-06-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.05.01.590171v1?rss=1">
<title>
<![CDATA[
regionalpcs: improved discovery of DNA methylation associations with complex traits 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.05.01.590171v1?rss=1
</link>
<description><![CDATA[
We have developed the regional principal components (rPCs) method, a novel approach for summarizing gene-level methylation. rPCs address the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease (AD). In contrast to traditional averaging, rPCs leverage principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrated a 54% improvement in sensitivity over averaging in simulations, offering a robust framework for identifying subtle epigenetic variations. Applying rPCs to the AD brain methylation data in ROSMAP, combined with cell type deconvolution, we uncovered 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci (meQTL) with genome-wide association studies (GWAS) identified 17 genes with potential causal roles in AD, including MS4A4A and PICALM. Our approach is available in the Bioconductor package regionalpcs, opening avenues for research and facilitating a deeper understanding of the epigenetic landscape in complex diseases.
]]></description>
<dc:creator>Eulalio, T.</dc:creator>
<dc:creator>Sun, M. W.</dc:creator>
<dc:creator>Gevaert, O.</dc:creator>
<dc:creator>Greicius, M. D.</dc:creator>
<dc:creator>Montine, T. J.</dc:creator>
<dc:creator>Nachun, D.</dc:creator>
<dc:creator>Montgomery, S. B.</dc:creator>
<dc:date>2024-05-01</dc:date>
<dc:identifier>doi:10.1101/2024.05.01.590171</dc:identifier>
<dc:title><![CDATA[regionalpcs: improved discovery of DNA methylation associations with complex traits]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-05-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.11.13.688264v1?rss=1">
<title>
<![CDATA[
Characterizing spatial functional microniches with SpaceTravLR 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.11.13.688264v1?rss=1
</link>
<description><![CDATA[
The advent of spatial omics has revolutionized our understanding of tissue biology; however, these technologies remain largely descriptive and do not capture how changes in gene regulation propagate across spatial neighborhoods. While in-silico perturbation methods and foundation models aim to model the impact of genetic perturbations, these methods are limited to single-cell approaches that lack spatial resolution. Other studies can delineate morphological domains based on transcriptional similarity, but not spatial functional microniches. We address this major unmet need by developing SpaceTravLR (Spatially perturbing Transcription factors, Ligands and Receptors), a novel interpretable machine learning approach that generalizes across tissues and species, uncovering spatial features linked to functional outcomes, thereby capturing functional microniches with spatial resolution. SpaceTravLR infers how single or combinatorial genetic perturbations rewire signals across the tissue neighborhood, by propagating effects through underlying spatially resolved molecular networks, thereby modeling how perturbations can reshape both the targeted cell and its surrounding neighborhood. SpaceTravLR defines novel spatial microniches across a range of tissues at different scales of organization (niches, neighborhoods and tissues), disease and developmental contexts. SpaceTravLR's perturbation predictions are made solely from spatial omics data and closely align with experimental validation or known outcomes based on mechanistic studies. Critically, our approach enables the generation of mechanistic hypotheses underlying identified niches. We show SpaceTravLR discovered a novel mechanism for Ccr4 that drives the spatial location of a pathogenic population of allergen-specific T helper 2 (Th2) cells as they develop in the lymph node, which was experimentally validated in a murine model.
Overall, SpaceTravLR provides a novel interpretable and experimentally validated framework for uncovering how genes act individually and combinatorially through cell-intrinsic and cell-extrinsic circuits to shape spatial tissue organization and function.
]]></description>
<dc:creator>Ramjattun, K.</dc:creator>
<dc:creator>Wang, A.</dc:creator>
<dc:creator>Lee, H.</dc:creator>
<dc:creator>Giri, S.</dc:creator>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>MacDonald, W. A.</dc:creator>
<dc:creator>Lord, N.</dc:creator>
<dc:creator>Poholek, A. C.</dc:creator>
<dc:creator>Lee, Y.</dc:creator>
<dc:creator>Das, J.</dc:creator>
<dc:date>2025-11-14</dc:date>
<dc:identifier>doi:10.1101/2025.11.13.688264</dc:identifier>
<dc:title><![CDATA[Characterizing spatial functional microniches with SpaceTravLR]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-11-14</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.08.19.670906v1?rss=1">
<title>
<![CDATA[
A unified network systems approach uncovers a core novel program underlying T follicular helper cell differentiation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.08.19.670906v1?rss=1
</link>
<description><![CDATA[
T follicular helper (Tfh) cells are central to the adaptive immune response and exhibit remarkable functional diversity and plasticity. The complex nature of Tfh cell populations, inconsistent findings across experimental systems and potential differences across species have fueled ongoing debate regarding core regulatory pathways that govern Tfh differentiation. Many studies have experimentally investigated individual proteins and circuits involved in Tfh differentiation in limited contexts, each providing only a partial understanding of the process. To address this, we adopted a novel multi-scale network systems approach that incorporates both regulatory and protein-protein interactions. Our approach integrates diverse data types, captures regulation across multiple levels of immune system organization, and recapitulates known drivers. Further, we discover a core Tfh gene set that is conserved across tissue types and disease contexts, and is consistent across data modalities: bulk, single-cell and spatial. While components of this set have been individually reported, a novel aspect of our work lies in the discovery, characterization, and connectivity of this core signature using a single unbiased approach. Using this method, we also uncover a novel function of IL-12, a molecule with reported conflicting functions, in the regulation of Tfh differentiation. Notably, we find that, in both humans and mice, IL-12 is permissive for the differentiation of Tfh precursors, but blocks subsequent differentiation into GC Tfh cells. Overall, this work elucidates novel networks with unexplored roles in governing Tfh cell differentiation across species and tissues, paving the way for novel therapeutic interventions.
]]></description>
<dc:creator>Omelchenko, A. A.</dc:creator>
<dc:creator>Rahman, S. A.</dc:creator>
<dc:creator>Viswanadham, V. V.</dc:creator>
<dc:creator>Yuen, G. J.</dc:creator>
<dc:creator>Del Rio Estrada, P. M.</dc:creator>
<dc:creator>D'Onofrio, V.</dc:creator>
<dc:creator>Chen, Y.</dc:creator>
<dc:creator>Sun, N.</dc:creator>
<dc:creator>Mattoo, H.</dc:creator>
<dc:creator>Varma, C. G.</dc:creator>
<dc:creator>Salgado, G.</dc:creator>
<dc:creator>Nava, M. S.</dc:creator>
<dc:creator>Ruiz, L. C.</dc:creator>
<dc:creator>Rivera, D. D.</dc:creator>
<dc:creator>Rios, S. A.</dc:creator>
<dc:creator>Kasturi, S. P.</dc:creator>
<dc:creator>Ribeiro, S. P.</dc:creator>
<dc:creator>Shlomchik, M. J.</dc:creator>
<dc:creator>Poholek, A. C.</dc:creator>
<dc:creator>Pillai, S. S.</dc:creator>
<dc:creator>Elsner, R. A.</dc:creator>
<dc:creator>Mahajan, V. S.</dc:creator>
<dc:creator>Das, J.</dc:creator>
<dc:date>2025-08-24</dc:date>
<dc:identifier>doi:10.1101/2025.08.19.670906</dc:identifier>
<dc:title><![CDATA[A unified network systems approach uncovers a core novel program underlying T follicular helper cell differentiation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-08-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2025.07.09.662874v1?rss=1">
<title>
<![CDATA[
Expanding the DNA Motif Lexicon of the Transcriptional Regulatory Code 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2025.07.09.662874v1?rss=1
</link>
<description><![CDATA[
Transcriptional regulatory sequences in metazoans contain intricate combinations of transcription factor (TF) motifs. Stereospecific arrangements of simple motifs constitute composite elements (CEs) that enhance DNA-protein interaction specificity and enable combinatorial regulatory logic. Despite their importance, CEs remain underexplored. We advance CE discovery and functional characterization by developing an integrated framework that combines computational prediction, experimental testing and deep learning. The extended TF motif catalog comprises both synergistic and counteracting CEs, which are supported by evidence of TF binding in vivo and in vitro. A deep learning model, GRACE, trained on customized massively parallel reporter assays, learns the lexicon of CEs at single-nucleotide resolution. Comparative analysis with a neural network model trained on chromatin accessibility demonstrates striking convergence and distinctions within the expanded regulatory lexicon, enabling joint predictions of motif contributions and the impact of variants on chromatin structure and transcriptional activity in diverse cellular contexts.
]]></description>
<dc:creator>Fan, J.</dc:creator>
<dc:creator>Chaudhri, V. K.</dc:creator>
<dc:creator>Bisht, D.</dc:creator>
<dc:creator>Pease, N. A.</dc:creator>
<dc:creator>Hall, D. R.</dc:creator>
<dc:creator>Gerges, P.</dc:creator>
<dc:creator>Yi, V. F.</dc:creator>
<dc:creator>Kales, S.</dc:creator>
<dc:creator>Ho, C.-H.</dc:creator>
<dc:creator>Das, J.</dc:creator>
<dc:creator>Ray, J. P.</dc:creator>
<dc:creator>Tewhey, R.</dc:creator>
<dc:creator>Meyer, C. A.</dc:creator>
<dc:creator>Sahni, N.</dc:creator>
<dc:creator>Singh, H.</dc:creator>
<dc:date>2025-07-15</dc:date>
<dc:identifier>doi:10.1101/2025.07.09.662874</dc:identifier>
<dc:title><![CDATA[Expanding the DNA Motif Lexicon of the Transcriptional Regulatory Code]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2025-07-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2024.05.01.592062v1?rss=1">
<title>
<![CDATA[
Sliding Window INteraction Grammar (SWING): a generalized interaction language model for peptide and protein interactions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2024.05.01.592062v1?rss=1
</link>
<description><![CDATA[
The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), corresponding sequences are either co-embedded followed by post-hoc integration or the sequences are concatenated prior to embedding. Interestingly, no method utilizes a language representation of the interaction itself. We developed an interaction LM (iLM), which uses a novel language to represent interactions between protein/peptide sequences. Sliding Window Interaction Grammar (SWING) leverages differences in amino acid properties to generate an interaction vocabulary. This vocabulary is the input into an LM followed by a supervised prediction step where the LM's representations are used as features.

SWING was first applied to predicting peptide:MHC (pMHC) interactions. SWING was not only successful at generating Class I and Class II models that have comparable prediction to state-of-the-art approaches, but the unique Mixed Class model was also successful at jointly predicting both classes. Further, the SWING model trained only on Class I alleles was predictive for Class II, a complex prediction task not attempted by any existing approach. For de novo data, using only Class I or Class II data, SWING also accurately predicted Class II pMHC interactions in murine models of SLE (MRL/lpr model) and T1D (NOD model), which were validated experimentally.

To further evaluate SWING's generalizability, we tested its ability to predict the disruption of specific protein-protein interactions by missense mutations. Although modern methods like AlphaMissense and ESM1b can predict interfaces and variant effects/pathogenicity per mutation, they are unable to predict interaction-specific disruptions. SWING was successful at accurately predicting the impact of both Mendelian mutations and population variants on PPIs. This is the first generalizable approach that can accurately predict interaction-specific disruptions by missense mutations with only sequence information. Overall, SWING is a first-in-class generalizable zero-shot iLM that learns the language of PPIs.
]]></description>
<dc:creator>Omelchenko, A. A.</dc:creator>
<dc:creator>Siwek, J. C.</dc:creator>
<dc:creator>Chhibbar, P.</dc:creator>
<dc:creator>Arshad, S.</dc:creator>
<dc:creator>Nazarali, I.</dc:creator>
<dc:creator>Nazarali, K.</dc:creator>
<dc:creator>Rosengart, A.</dc:creator>
<dc:creator>Rahimikollu, J.</dc:creator>
<dc:creator>Tilstra, J.</dc:creator>
<dc:creator>Shlomchik, M. J.</dc:creator>
<dc:creator>Koes, D. R.</dc:creator>
<dc:creator>Joglekar, A. V.</dc:creator>
<dc:creator>Das, J.</dc:creator>
<dc:date>2024-05-04</dc:date>
<dc:identifier>doi:10.1101/2024.05.01.592062</dc:identifier>
<dc:title><![CDATA[Sliding Window INteraction Grammar (SWING): a generalized interaction language model for peptide and protein interactions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2024-05-04</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
