<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<channel rdf:about="https://biorxiv.org">
<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
<title>bioRxiv Subject Collection: Genomics Bioinformatics</title>
<link>https://biorxiv.org</link>
<description>
This feed contains articles for bioRxiv Subject Collection "Genomics Bioinformatics"
</description>

<items>
<rdf:Seq>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.11.717967v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717286v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717310v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717563v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.07.717122v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717277v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717343v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717777v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.715765v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717544v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717302v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717501v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717357v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717557v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717766v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717080v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717200v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717429v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717550v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717340v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717246v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717236v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717220v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.714730v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717207v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717212v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717021v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717199v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.702570v1?rss=1"/>
</rdf:Seq>
</items>
<prism:eIssn/>
<prism:publicationName>bioRxiv</prism:publicationName>
<prism:issn/>

<image rdf:resource=""/>
</channel>
<image rdf:about="">
<title>bioRxiv</title>
<url>https://www.biorxiv.org/sites/default/files/bioRxiv_article.jpg</url>
<link>https://www.biorxiv.org</link>
</image>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.11.717967v1?rss=1">
<title>
<![CDATA[
Scalable genotyping in fixed transcriptomes resolves clonal heterogeneity via single-cell sequencing 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.11.717967v1?rss=1
</link>
<description><![CDATA[
Single-cell transcriptomics has revolutionized our understanding of heterogeneous cell populations. However, technical limitations of widely-used platforms have limited our ability to link transcriptional states to somatic mutations within the same cells at scale. Here, we introduce Genotyping in Fixed Transcriptomes (GIFT), a novel assay for simultaneous detection of hundreds of targeted genetic variants and whole transcriptome profiles in single cells. The core innovation of GIFT is a rationally designed gapfilling reaction between adjacent single-stranded DNA (ssDNA) probes that barcodes native transcript sequence to enable highly-specific targeted mutation detection. GIFT achieves >99% genotyping accuracy and flexible capture of hundreds of mutations per cell, including in FFPE (Formalin-Fixed Paraffin-Embedded) tissue, enabling clonal lineage tracing in heterogeneous settings. We demonstrate the unique scalability of GIFT by profiling >700,000 cells from 35 donors with myeloproliferative neoplasms (MPN), revealing mutation-dependent hematopoietic responses to systemic inflammation associated with the characteristic JAK2V617 mutation, including an allelic dose gradient of interferon-associated transcriptional programs and transcriptional priming of hematopoietic stem cells that develop into divergent disease states. Together, the unique technical advantages of GIFT enable direct resolution of genotype-to-phenotype relationships via clonal lineage tracing with comprehensive cell state measurements at single-cell resolution.
]]></description>
<dc:creator><![CDATA[ Blattman, S. B., Maslah, N., Varela, A. A., Kumpaitis, K., Nalbant, B., Snopkowski, C., Mariani, M., Kida, L. C., Takizawa, M., Ratnayeke, N., Yu, K. K. H., Fernandes, S., Mousavi, N., Borgstrom, E., Vallejo, D., Boghospor, L., Xin, R., Mignardi, M., Wu, S., Scarlott, N., Delgado-Rivera, L., Kumar, P., Krishnan, S., Giraudier, S., Kiladjian, J.-J., Howitt, B. E., Kohlway, A., Lund, P., Pe'er, D., Chaligne, R., Lareau, C. A. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.11.717967</dc:identifier>
<dc:title><![CDATA[Scalable genotyping in fixed transcriptomes resolves clonal heterogeneity via single-cell sequencing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717286v1?rss=1">
<title>
<![CDATA[
Metabolomic Fingerprinting from Dried Blood Spots Enables Individual Identification Across 1,257 Participants at 94% User-Level Accuracy 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717286v1?rss=1
</link>
<description><![CDATA[
Background. Constructing digital twins in healthcare requires biological data sources that are simultaneously informative, dynamic, and practical for routine collection. Dried blood spot (DBS) sampling combined with untargeted metabolomics is well suited to meet these requirements: DBS can be self-collected at home and mailed at ambient temperature, while untargeted LC-MS/MS captures thousands of metabolites reflecting individual physiology, lifestyle, and exposures. We previously demonstrated proof-of-concept individual identification from DBS-derived metabolomic profiles in 277 volunteers (80-92% accuracy). Here, we report a large-scale validation on a substantially expanded cohort. Methods. We collected 18,288 DBS samples from 1,257 individuals across 134 analytical batches over 15 months. Samples were self-collected at home, mailed via standard postal service, and analyzed by untargeted LC-MS/MS on a high-resolution Orbitrap platform in positive ESI mode. Our classification pipeline comprises batch-aware normalization, supervised feature selection, biological signal filtering, dimensionality reduction, and user-level majority voting across all available samples. This voting reflects the real-world use case: participants contribute multiple self-collected DBS cards over time, taken at different times of day and under varying conditions. We employed GroupKFold cross-validation with group=batch to ensure zero batch leakage between training and testing sets. Results. In 10-fold GroupKFold cross-validation (group=batch, zero batch leakage), our pipeline achieved 94.1% user-level identification accuracy (85.5% sample-level). In a fully held-out validation on 17 future batches, with all feature selection, normalization, and model fitting performed exclusively on training data, performance was even stronger: 96.1% user-level and 92.6% sample-level across 1,134 classes (chance level: 0.088%). Feature selection stability was confirmed via bootstrap analysis. 
We identified batch leakage as a critical methodological pitfall for the field: naive random splitting inflated accuracy by sharing 92.8% of test samples' (user, batch) pairs with the training set. The top discriminative metabolites span biologically relevant pathways including amino acid metabolism, fatty acid transport, and sphingolipid biosynthesis. Conclusions. Untargeted metabolomics from dried blood spots supports batch-aware, closed-set individual identification in a single-laboratory setting, with potential relevance for longitudinal sample-to-person linkage in future digital twin workflows. Keywords: dried blood spots, untargeted metabolomics, digital twin, individual identification, metabolic fingerprinting, LC-MS/MS, batch effect, precision medicine
]]></description>
<dc:creator><![CDATA[ Hauguel, P., Anctil, N., Noel, L. P. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717286</dc:identifier>
<dc:title><![CDATA[Metabolomic Fingerprinting from Dried Blood Spots Enables Individual Identification Across 1,257 Participants at 94% User-Level Accuracy]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717310v1?rss=1">
<title>
<![CDATA[
Evaluation of somatic variant calling methods on high coverage tumour-only amplicon sequencing data in a clinical environment 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717310v1?rss=1
</link>
<description><![CDATA[
Amplicon-based targeted sequencing panels are one of the current workhorses of next-generation sequencing in clinical molecular diagnostics laboratories for profiling somatic mutations in tumours. Many open-source somatic variant callers are available; however, their use in clinical applications remains underexplored. Therefore, we integrated outputs of six variant callers (FreeBayes, MuTect2, Pisces, Platypus, VarDict and VarScan) into a Snakemake pipeline and evaluated tumour-only data from the HD789 commercial reference standard sequenced in triplicate on three different sequencing runs using the Illumina AmpliSeq Focus panel on MiSeq and NextSeq 2000. A 1:4 dilution sample was sequenced for evaluating limits of variant detection. The called variants were analysed along depth, allele frequency, and other sequencing metrics. The variant callers were evaluated by their level of concordance and performance on known somatic variants. FreeBayes consistently called the largest number of somatic variants in each sample but also included more potential artifacts. Overall, FreeBayes, VarScan, MuTect2, and Pisces had the best performance on HD789 data.
]]></description>
<dc:creator><![CDATA[ Bharne, D., Gaston, D. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717310</dc:identifier>
<dc:title><![CDATA[Evaluation of somatic variant calling methods on high coverage tumour-only amplicon sequencing data in a clinical environment]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717563v1?rss=1">
<title>
<![CDATA[
TFBindFormer: A Cross-Attention Transformer for Transcription Factor-DNA Binding Prediction
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717563v1?rss=1
</link>
<description><![CDATA[
Transcription factors (TFs) are central regulators of gene expression, and their selective recognition of genomic DNA underlies various biological processes. Experimental profiling of TF-DNA interactions using chromatin immunoprecipitation followed by sequencing (ChIP-seq) provides high-resolution maps of in vivo TF-DNA binding but remains costly, labor-intensive, and inherently low-throughput, limiting its scalability across different transcription factors, cell types, and regulatory conditions. Computational modeling therefore plays an essential role in inferring TF-DNA interactions at genome scale. However, most existing computational models rely solely on DNA sequence and chromatin features to predict TF-DNA binding, neglecting TF-specific protein information. This omission limits their ability to capture protein-dependent binding specificity. Here, we present TFBindFormer, a hybrid cross-attention transformer that explicitly integrates genomic DNA features with TF-specific representations derived from protein sequences and structures. By modeling protein-conditioned, position-specific TF-DNA interactions, TFBindFormer enables direct learning of molecular determinants underlying DNA recognition. Evaluated across hundreds of cell-type-specific TFs and hundreds of millions of genome-wide DNA bins, TFBindFormer consistently outperforms DNA-only baselines, achieving substantial gains in both area under the precision-recall curve (AUPRC) and area under the receiver operating characteristic curve (AUROC). Together, these results demonstrate that integrating TF and DNA features via cross-attention enables TFBindFormer to serve as an effective and scalable framework for large-scale TF-DNA binding prediction.
]]></description>
<dc:creator><![CDATA[ Liu, P., Wang, L., Basnet, S., Cheng, J. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717563</dc:identifier>
<dc:title><![CDATA[TFBindFormer: A Cross-Attention Transformer for Transcription Factor-DNA Binding Prediction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.07.717122v1?rss=1">
<title>
<![CDATA[
FM-GPT: Bayesian fine mapping for phenome-wide transcriptome-wide association studies 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.07.717122v1?rss=1
</link>
<description><![CDATA[
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies with expression quantitative trait locus reference panels to identify genes associated with traits of interest. However, linkage disequilibrium and correlated gene expression can induce spurious TWAS signals, motivating fine mapping methods to prioritize putatively causal genes within associated loci. The rapid growth of large-scale phenomic resources (e.g., electronic health records (EHRs)) has shifted genetic studies from single-trait analyses to phenome-wide investigations that jointly evaluate many closely related phenotypes. We introduce FM-GPT (Fine-mapping of causal Genes for Phenome-wide Transcriptome-wide association studies), a novel Bayesian fine mapping method for prioritizing causal genes across multiple correlated phenotypes with potentially mixed outcome types (e.g., binary, count or continuous) in phenome-wide TWAS. FM-GPT performs gene-guided dimension reduction of the phenotypes and reveals pleiotropic or phenotype-specific effects of the identified genes. In simulations, FM-GPT identified true causal genes more accurately than other fine mapping methods while controlling false positives. We applied FM-GPT in two settings using data from UK Biobank: a brain-wide genetic analysis of MRI-derived regional cortical thickness measures and a phenome-wide genetic analysis of clinical phenotypes derived from EHR data. FM-GPT greatly reduced the set of putatively causal genes and identified (1) genes with pleiotropic effects on regional cortical thickness across the cerebral cortex, including five genes (BCAS3, LRRC37A, NOS2P3, ARL17B and UBB) on chromosome 17 regulating neuronal morphology and cortical organization; and (2) genes that influence multiple medical conditions across the circulatory, metabolic, digestive, respiratory and genitourinary systems, revealing two major axes of variation among these conditions that point to a potential trade-off in gene regulation between immune and metabolic functions. These results highlight FM-GPT's power to disentangle complex gene-phenotype relationships in large-scale phenome-wide studies, uncovering shared biological mechanisms across diverse human traits and advancing translational and comorbidity research.
]]></description>
<dc:creator><![CDATA[ Canida, T., Ye, Z., Wang, S.-H., Huang, H.-H., Pan, Y., Liang, M., Chen, S., Ma, T. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.07.717122</dc:identifier>
<dc:title><![CDATA[FM-GPT: Bayesian fine mapping for phenome-wide transcriptome-wide association studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717277v1?rss=1">
<title>
<![CDATA[
A Large Yield Model for Crop Production and Design in Western Canada 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717277v1?rss=1
</link>
<description><![CDATA[
With a changing climate, disease pressure, and other production threats, it is critical to ensure that crop producers are well-positioned to protect and optimize yields. In this work we present LYM-1, the first large-scale, multi-crop model for the prediction of yield performance in the Canadian prairies. This is enabled by a large dataset containing over 4.7 million yield observations across 10 different crop types, distributed over 23 growing years. Leveraging additional data sources for weather and soil properties allows the model to reason about the complex interactions between genetics, environment, and management which underlie yield. The trained model is not only effective at predicting the yield for held-out data, but also reveals scientifically and agronomically relevant effects such as the interaction between solar radiation and nitrogen uptake. We anticipate that large yield models can be used both by producers to optimize crop production and by plant breeders and industry for crop design.
]]></description>
<dc:creator><![CDATA[ Ubbens, J., Loliencar, P., Kagale, S. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717277</dc:identifier>
<dc:title><![CDATA[A Large Yield Model for Crop Production and Design in Western Canada]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717343v1?rss=1">
<title>
<![CDATA[
RNA Folding Nearest Neighbor Parameters Including the Modification 1-Methyl-Pseudouridine 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717343v1?rss=1
</link>
<description><![CDATA[
Nearest neighbor analysis is commonly used to estimate RNA folding stabilities. In this contribution, we report a set of RNA folding nearest neighbor parameters for estimating free energy change for RNA sequences including 1-methyl-pseudouridine. Development of mRNA vaccines has identified 1-methyl-pseudouridine as a key nucleobase modification for suppressing innate immune responses. However, the contributions of these modifications to RNA folding stability were unclear. Our new parameters provide helical terms for 1-methyl-pseudouridine-adenine and 1-methyl-pseudouridine-guanine base pairs. The parameters also estimate loop stabilities for loops with 1-methyl-pseudouridine or a combination of 1-methyl-pseudouridine and uridine. These parameters are derived using 208 optical melting experiments and tested against an additional 16 optical melting experiments. On average, we find that substitution of uridine with 1-methyl-pseudouridine stabilizes RNA folding, with the extent of stabilization depending on adjacent sequence. The estimation of tRNA folding ensembles for tRNA sequences with 1-methyl-pseudouridine was significantly improved using the new nearest neighbor parameters. The new nearest neighbor parameters are provided as part of the RNAstructure software package. With these parameters, the secondary structures of natural sequences with 1-methyl-pseudouridine and mRNA therapeutics fully substituted with 1-methyl-pseudouridine can be modeled.
]]></description>
<dc:creator><![CDATA[ Kierzek, E., Shabangu, T. S., Hiltke, O. M., Miaro, M., Arteaga, S., Znosko, B. M., Jolley, E. A., Bevilacqua, P. C., SantaLucia, J., SantaLucia, H. A., Lin, H., Metkar, M., Aviran, S., Soszynska-Jozwiak, M., Kierzek, R., Mathews, D. H. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717343</dc:identifier>
<dc:title><![CDATA[RNA Folding Nearest Neighbor Parameters Including the Modification 1-Methyl-Pseudouridine]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717777v1?rss=1">
<title>
<![CDATA[
Generative design of intrinsically disordered protein regions with IDiom 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717777v1?rss=1
</link>
<description><![CDATA[
Intrinsically disordered protein regions are ubiquitous across all kingdoms of life. These structurally heterogeneous regions play central roles in cellular processes such as transcriptional regulation, cellular signaling, and subcellular organization, yet they have remained largely inaccessible to rational design. Structure-based generative methods are not applicable to proteins that lack a stable fold, and existing sequence-based approaches for disordered regions rely on sampling methods that do not capture the evolutionary statistics of natural disordered regions. Here, we introduce IDiom, an autoregressive protein language model trained on 37 million intrinsically disordered region sequences curated from the AlphaFold Database. Trained using a fill-in-the-middle data augmentation, IDiom generates disordered region sequences conditioned on their surrounding structured context, as well as fully disordered proteins without any context. The model generates diverse sequences that recapitulate biologically relevant sequence features of natural disordered regions, and we demonstrate that post-training via reinforcement learning with a subcellular localization reward model produces sequences with features which are consistent with known sequence determinants of compartment-specific localization. These results establish IDiom as a general platform for the generative design of intrinsically disordered proteins and regions.
]]></description>
<dc:creator><![CDATA[ Liu, J., Ibarraran, S., Hu, F., Park, A., Dunn, A., Rotskoff, G. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717777</dc:identifier>
<dc:title><![CDATA[Generative design of intrinsically disordered protein regions with IDiom]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.715765v1?rss=1">
<title>
<![CDATA[
A unified spatial transcriptome profiling of ten mouse organs 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.715765v1?rss=1
</link>
<description><![CDATA[
Spatial transcriptomics has enabled numerous deep learning models in this area, and training them requires large amounts of high-quality data, especially expression matrices paired with histological images. Here, we present a unified spatial transcriptomic dataset generated using the Stereo-seq platform, covering 10 mouse organs (brain, kidney, lung, thymus, large intestine, skin, spleen, ovary, testis, and uterus) and encompassing 23 tissue sections generated from 21 chips, each with matched ssDNA or H&E staining images. The dataset comprises single-cell-resolution (cell-bin) or square bin-50 (25 µm x 25 µm) expression matrices for each sample, accompanied by corresponding cell type annotations. Annotation robustness was further supported by concordance across different sections of the same tissue and corroboration with canonical marker gene expression patterns. Finally, we compared the characteristics of the cell-bin and bin-50 expression matrices and demonstrated the advantages of cell-bin resolution for cell annotation. This dataset provides a standardized resource for spatial transcriptomics method development, benchmarking, and multimodal analysis.
]]></description>
<dc:creator><![CDATA[ Ren, X., Lv, T., Liu, N., Shi, C., Fang, J., Zhao, N., Kang, Q., Wang, D. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.715765</dc:identifier>
<dc:title><![CDATA[A unified spatial transcriptome profiling of ten mouse organs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717544v1?rss=1">
<title>
<![CDATA[
Living by the sea: chromosome-scale genome assembly and salt gland transcriptomes provide insights into ion regulatory mechanisms in the saline-tolerant mosquito Aedes togoi 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717544v1?rss=1
</link>
<description><![CDATA[
The coastal rock pool mosquito, Aedes togoi, is among the few saline-tolerant mosquito species that lay their eggs in seawater pools, where the larvae develop in water ranging from dilute freshwater to hyper-saline conditions. Ae. togoi is found in a relatively restricted range spanning the North Pacific coast of North America and coastal regions of Asia from subtropical to subarctic latitudes. Here, we present a de novo chromosome-scale genome assembly and gene annotation for Ae. togoi, highlighting its relatively small genome size and novel chromosomal arrangements compared to other available genomes of Aedine mosquitoes. As part of the annotation process, we detail repeat content and distribution and curate several key multi-gene families, focusing on ion-transport proteins enriched in the larval salt-secreting gland that are candidates for facilitating hyperosmotic urine formation during development in saline water. Using these new resources, we gain mechanistic insight into the ion regulatory capabilities that power the remarkable saline tolerance of the larvae of Ae. togoi. Altogether, we have contributed to the growing body of genomic and transcriptomic resources for diverse mosquito species and provided mechanistic insights into the molecular adaptations required for an insect to thrive in highly dynamic environments such as coastal rock pools.
]]></description>
<dc:creator><![CDATA[ Chiang, J., Khodikian, E., Phelan, O., Parra, A. K., Peach, D. A. H., Durant, A. C., Matthews, B. J. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717544</dc:identifier>
<dc:title><![CDATA[Living by the sea: chromosome-scale genome assembly and salt gland transcriptomes provide insights into ion regulatory mechanisms in the saline-tolerant mosquito Aedes togoi]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717302v1?rss=1">
<title>
<![CDATA[
A segmental duplication-mediated deletion leads to neocentromere formation in orangutans 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717302v1?rss=1
</link>
<description><![CDATA[
Centromeres ensure faithful chromosome segregation, yet how new centromeres arise and replace canonical ones remains poorly understood. Here, we investigate a polymorphic centromere repositioning event on the orangutan chromosome 10 using near-telomere-to-telomere assemblies, epigenetic profiling, and population-scale data. We identify striking heterogeneity in canonical centromeres, ranging from large, higher-order repeat α-satellite arrays to short, monomeric α-satellite tracts, alongside the emergence of neocentromeres lacking α-satellite DNA. We show a segmental duplication-mediated deletion of 3.6 Mbp that removed the higher-order repeat array, promoting centromere repositioning and neocentromere formation. Phylogenetic analyses reveal complex evolutionary dynamics, including introgression and incomplete lineage sorting in orangutan lineages. These findings demonstrate that centromere identity can evolve through structural variation and epigenetic reprogramming, highlighting its remarkable plasticity in primate genomes.
]]></description>
<dc:creator><![CDATA[ De Gennaro, L., Yoo, D., Pistacchia, L., Magrone, R., Daponte, A., Perrone, F., Ravasini, F., Mastrorosa, K. F., Oshima, K. K., Polano, C., Hoekzema, K., Munson, K. M., Wertz, J., Marroni, F., Catacchio, C. R., Antonacci, F., Noordermeer, D., Montinaro, F., Logsdon, G. A., Trombetta, B., Eichler, E. E., Ventura, M. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717302</dc:identifier>
<dc:title><![CDATA[A segmental duplication-mediated deletion leads to neocentromere formation in orangutans]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1?rss=1">
<title>
<![CDATA[
EVEE: Interpretable variant effect prediction from genomic foundation model embeddings 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1?rss=1
</link>
<description><![CDATA[
Predicting the clinical significance of genetic variants remains a central challenge in genomic medicine, with most observed variants classified as variants of uncertain significance. Here we show that representations from Evo 2, a 7-billion-parameter genomic foundation model, support accurate and interpretable pathogenicity prediction across variant types from a single framework. An embedding-based classifier, or "probe", trained on Evo 2 embeddings achieves state-of-the-art performance across single nucleotide variant consequence types (0.997 overall AUROC on 839k ClinVar variants) and generalizes zero-shot to indels (0.991 AUROC), outperforming bioinformatic meta-predictors, protein models, and existing foundation model approaches. Performance is robust across conservation levels and transfers to deep mutational scanning datasets for BRCA1, BRCA2, TP53, and LDLR. To make these predictions interpretable, we train supervised annotation probes to quantify predicted disruptions caused by each variant, then synthesize these disruption profiles into natural language explanations using a frontier reasoning model. We provide pre-computed predictions and on-demand explanations for all 4.2 million ClinVar variants through the Evo Variant Effect Explorer (EVEE), an interactive web resource for the community. This work establishes that representations from genomic foundation models can serve as a unified substrate for both accurate variant effect prediction and mechanistic interpretation, reframing interpretability in computational genomics from a trade-off into a complementary product of learned biological structure.
]]></description>
<dc:creator><![CDATA[ Pearce, M. T., Dooms, T., Yamamoto, R., Meehl, J., Molnar, C., Bissell, M., Hazra, D., Fang, C., Nguyen, N., Anderson, M., Osborne, C., Duffy, P., Toomey, B., Klee, E., Myasoedova, E., Ryu, A., Ayanian, S., Korfiatis, P., Redlon, M., Jain, A., Balsam, D., Wang, N. K. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717844</dc:identifier>
<dc:title><![CDATA[EVEE: Interpretable variant effect prediction from genomic foundation model embeddings]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717501v1?rss=1">
<title>
<![CDATA[
Genomic insights into bacterial isolates dominating honeypot ant crop microbiomes reveal metabolically distinct Fructilactobacillus sp. 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717501v1?rss=1
</link>
<description><![CDATA[
Honeypot ants engage in a convergently evolved phenotype called repletism, where specialized workers expand their crops and gasters to store vast amounts of food internally. They then store that food for months to support colonies during times of food scarcity. This fascinating phenotype is not well understood and very little is known about the microbial interactions happening within the fructose-rich replete crop. Previous research using amplicon sequencing showed that Fructilactobacillus makes up nearly 100% of the crop microbiomes of Myrmecocystus mexicanus repletes. This striking result and the successful isolation of those strains led to the present investigation into the phylogenetic diversity of these strains and any clues to the nature of the symbiotic relationship between them and the ant host. We find that the isolates from these repletes represent two evolutionary lineages, both most closely related to F. fructivorans. One of those lineages was also found to be phylogenetically and metabolically distinct from all other Fructilactobacillus reference genomes used in this study. This discovery is in a genus of bacteria that is highly relevant for fermented human foods, and it will also lay the groundwork for future understanding of the convergent evolutionary mechanisms of repletism in ants.
]]></description>
<dc:creator><![CDATA[ Oiler, I. M., Francoeur, C., Grigaitis, P., LeBoeuf, A. C., Cicconardi, F., Montgomery, S. H., Khadempour, L. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717501</dc:identifier>
<dc:title><![CDATA[Genomic insights into bacterial isolates dominating honeypot ant crop microbiomes reveal metabolically distinct Fructilactobacillus sp.]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717357v1?rss=1">
<title>
<![CDATA[
Palaeogenomics-informed inferences of European dog admixture enables scalable dingo conservation 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717357v1?rss=1
</link>
<description><![CDATA[
Dingoes, mainland Australia's sole terrestrial apex mammal for over 3,000 years, are important components of many ecosystems and Indigenous cultural heritage. Yet conflicts with farmers over livestock predation following European colonisation led to widespread lethal control. These measures are further reinforced by perceptions of hybrid ancestry with European dogs. Accurate estimation of European dog ancestry is therefore essential for effective conservation, but existing tests yield highly conflicting results. Leveraging pre-colonial dingo palaeogenomes and a robust ancestry modelling framework, we reassess the genetic ancestry of contemporary populations. Our approach corrects limitations and biases in existing methods, producing consistent estimates even with as few as 10,000 genome-wide transversion genetic markers. Accounting for admixture uncovers population structure that has persisted for over two millennia and reveals patterns of genetic admixture coinciding with human activity during the colonial era. This study underscores the value of palaeogenomes as a vital conservation tool, offering insights unattainable from modern DNA alone. By clarifying ancestry and population structure, our study offers a robust foundation for effective regionally informed dingo management across Australia.
]]></description>
<dc:creator><![CDATA[ Ravishankar, S., Nguyen, N. C., Taufik, L., Michielsen, N. M., Bergström, A., Tobler, R., Fordham, D., Brüniche-Olsen, A., Rahbek, C., Llamas, B., Souilmi, Y. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717357</dc:identifier>
<dc:title><![CDATA[Palaeogenomics-informed inferences of European dog admixture enables scalable dingo conservation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717557v1?rss=1">
<title>
<![CDATA[
Flanking DNA sequences determine DNA methylation maintenance in proliferation, cancer and aging 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717557v1?rss=1
</link>
<description><![CDATA[
DNA methylation is a stable epigenetic modification essential for promoter silencing, retrotransposon silencing, genomic imprinting, and X-chromosome inactivation. Symmetrical DNA methylation at CpG dinucleotides is maintained after every round of cell division by the DNMT1-UHRF1 maintenance methyltransferase complex. Here we define a conserved rank order of DNA hexanucleotide sequences surrounding CpG sites that determines baseline DNA methylation levels in cells and the probability that DNA methylation is retained across cell divisions. This rank order is conserved in vertebrates and does not depend on TET enzymatic activity. CpG sites in hexanucleotide sequences less favored by DNMT1 are more susceptible to replication-dependent loss of DNA methylation over time; consequently, the methylation status of these motifs serves as a marker of cumulative cell divisions, biological age and cancer progression. Thus, the intrinsic vulnerability stemming from the sequence preference of the DNMT1-UHRF1 complex compromises the long-term stability of DNA methylation, especially at heterochromatic sites in proliferating cells, and contributes to the epigenetic dysregulation observed in cancer and aging.
]]></description>
<dc:creator><![CDATA[ Lopez-Moyado, I. F., Hernandez-Espinosa, L., Angel, J. C., Modat, A., Lleshi, E., Crawford, R., Faulkner, G. J., Rao, A. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717557</dc:identifier>
<dc:title><![CDATA[Flanking DNA sequences determine DNA methylation maintenance in proliferation, cancer and aging]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717766v1?rss=1">
<title>
<![CDATA[
Suppression of upstream ORF translation is not a widespread mechanism of translational stimulation by yeast helicase Ded1 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717766v1?rss=1
</link>
<description><![CDATA[
Ded1 is an essential DEAD-box helicase in yeast that broadly stimulates translation initiation and is critical for mRNAs with structured 5' UTRs. We have evaluated the proposal that Ded1 stimulates translation primarily by preventing initiation at upstream ORFs (uORFs) associated with stable secondary structures. By Ribo-Seq analysis under experimental conditions designed to suppress artifactual 5' UTR translation, we found that reduced translation of the main open reading frames (mORFs) in native mRNAs is generally not accompanied by increased 5' UTR translation in ded1 mutant cells, and that the presence of translated uORFs in yeast mRNAs generally does not confer heightened dependence on Ded1 for efficient translation of mORFs. Results from a high-throughput reporter assay examining native 5' UTRs reinforce the importance of Ded1 in initiation from structured 5' UTRs and show that impairing Ded1 has minimal effects on translational repression by uORFs. Our results demonstrate that, in cells growing vegetatively in rich medium, translational stimulation by suppression of inhibitory uORFs is restricted to a minority of Ded1 targets, and that unwinding of 5' UTR secondary structures per se is the principal mechanism for Ded1 stimulation of translation initiation.
]]></description>
<dc:creator><![CDATA[ Kumar, R., May, G., Sen, N. D., McManus, J., Hinnebusch, A. G. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717766</dc:identifier>
<dc:title><![CDATA[Suppression of upstream ORF translation is not a widespread mechanism of translational stimulation by yeast helicase Ded1]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717080v1?rss=1">
<title>
<![CDATA[
scMultiPreDICT: A single-cell predictive framework with transcriptomic and epigenetic signatures 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717080v1?rss=1
</link>
<description><![CDATA[
Cellular responses to genetic perturbations depend on both transcriptional programs and the epigenetic landscape. While single-cell multiomics technologies enable simultaneous profiling of gene expression and chromatin accessibility, the relative contribution of each regulatory layer to gene expression remains unclear. Existing computational approaches focus on data integration and gene regulatory network inference but do not systematically compare the predictive performance of transcriptional versus epigenetic features on a gene-by-gene basis. We present scMultiPreDICT, a computational framework for comparative predictive modeling of gene expression using single-cell multiomics data. scMultiPreDICT benchmarks RNA-only, ATAC-only and multimodal feature sets across six machine learning models, including regression, tree-based learning and deep learning, using multiple biological datasets. We show that RNA-derived features generally provide strong predictive power, whereas chromatin accessibility alone yields modest performance. Surprisingly, multimodal integration does not uniformly improve prediction accuracy; instead, its benefit is gene-specific and context-dependent. Feature importance analysis reveals that transcriptional features dominate for most genes, whereas chromatin accessibility contributes meaningfully for a subset of genes in specific cellular contexts. Overall, the results demonstrate that regulatory layers contribute differently to gene expression. scMultiPreDICT provides a systematic framework for identifying the relative contributions of transcriptional and epigenetic regulation across genes and cellular contexts, guiding the design of targeted perturbation studies and the prioritization of regulatory layers for therapeutic interventions. scMultiPreDICT is implemented in R and available at https://github.com/UzunLab/scMultiPreDICT/.
]]></description>
<dc:creator><![CDATA[ Manful, E.-E., Uzun, Y. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717080</dc:identifier>
<dc:title><![CDATA[scMultiPreDICT: A single-cell predictive framework with transcriptomic and epigenetic signatures]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717200v1?rss=1">
<title>
<![CDATA[
A Joint Promoterome-Proteome Atlas Highlights the Molecular Diversity of Human Skeletal Muscles 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717200v1?rss=1
</link>
<description><![CDATA[
More than 600 distinct skeletal muscles constitute up to 40% of the total mass of the human body. Human skeletal muscles differ in anatomical position, morphology, origin, and function, but the diversity of their molecular phenotypes, that is, their gene expression and protein abundance profiles, remains poorly explored. Here, we report the large-scale CAGE-Seq promoterome profiling of 75 human skeletal muscles, complemented by 22 matched proteomes obtained with mass spectrometry. We identified 37001 transcribed regulatory elements and 1804 protein groups encompassing 1895 proteins, 80% of which demonstrated non-uniform expression across different muscles. The skeletal muscles of the eye, tongue, and diaphragm had the most distinctive molecular phenotypes, while the overall diversity was driven by hundreds of transcription factors with tissue-specific activity. By analyzing the allelic imbalance of CAGE-Seq reads, we discovered 6653 allele-specific single-nucleotide variants often coinciding with muscle-related GWAS SNPs, including SNPs associated with muscle volume. Finally, we provide an interactive online atlas of transcriptomic and proteomic molecular phenotypes, facilitating further studies of gene regulation and heritable pathologies of skeletal muscles.
]]></description>
<dc:creator><![CDATA[ Buyan, A., Gazizova, G., Zgoda, V. G., Vavilov, N. E., Gryzunov, N., Eliseeva, I. A., Nozdrin, V., Sergeeva, Y., Titova, A., Shigapova, L., Erina, A. V., Mescheryakov, G., Murtazina, A., Deviatiiarov, R., Forrest, A. R. R., Makeev, V., Hayashizaki, Y., Popov, D., Shagimardanova, E., Kulakovskiy, I. V., Gusev, O. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717200</dc:identifier>
<dc:title><![CDATA[A Joint Promoterome-Proteome Atlas Highlights the Molecular Diversity of Human Skeletal Muscles]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717429v1?rss=1">
<title>
<![CDATA[
DIANA: Deep Learning Identification and Assessment of Ancient DNA 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717429v1?rss=1
</link>
<description><![CDATA[
The field of ancient metagenomics provides insights into past microbiomes, but with a growing dataset size, methods that rely on reference databases have limited scope. Here, we introduce DIANA, a multi-task neural network that predicts key metadata categories from unitig abundances. Trained on 2,597 run accessions (1.72 Tbp of assembled unitig sequences), DIANA accurately identifies sample host (94.6%), community type (90.0%), and material (88.9%) on held-out test data and demonstrates robust generalisation on an independent validation set. A key innovation is DIANA's ability to perform semantic generalisation, correctly classifying samples with labels unseen during training, such as novel subspecies, to their appropriate parent categories. By leveraging both known and uncharacterized genomic sequences, DIANA provides a rapid, data-driven system for metadata validation and quality control, accelerating discovery in ancient metagenomics research.
]]></description>
<dc:creator><![CDATA[ Duitama Gonzalez, C., Lopopolo, M., Nishimura, L., Faure, R., Duchene, S. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717429</dc:identifier>
<dc:title><![CDATA[DIANA: Deep Learning Identification and Assessment of Ancient DNA]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717550v1?rss=1">
<title>
<![CDATA[
Divergent landscapes of positive and negative selection signatures across residue-resolved human-virus protein-protein interaction interfaces 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717550v1?rss=1
</link>
<description><![CDATA[
Virus-targeted host proteins evolve under dual selective pressures. Negative selection preserves within-host interactions, while positive selection promotes adaptive changes to evade viral engagement. Viral and endogenous within-host partners can compete for binding, bringing distinct pressures together on the same interaction interface. Yet, the spatial organization of distinct selective pressures across virus-targeted host proteins, and how such pressures manifest across diverse interaction contexts, remains largely unknown. Here, we integrate an evolutionarily annotated map of human-virus protein-protein interactions (PPIs) with intra-protein residue-residue contact maps to probe the spatial organization of residue-level selective pressures across PPI interfaces of virus-targeted host proteins. Across all PPI interfaces collectively, we find that residues under positive selection are spatially clustered, whereas those under negative selection are broadly dispersed, with additional spatial segregation between positively and strongly negatively selected sites. Moreover, while positive selection is unevenly distributed across interfaces bound exclusively by viral proteins (exogenous-specific), it is more uniformly distributed across interfaces shared between viral and within-host partners (mimic-targeted), suggesting that adaptive pressure from viral targeting acts on the entire mimic-targeted interface, whereas it acts on only a subset of the exogenous-specific interface. Strikingly, clustering of positively selected residues is more pronounced between mimic-targeted and other interface types than within exogenous- or endogenous-specific interfaces alone, suggesting that mimic-targeted interfaces may serve as focal points of adaptive evolution.
Overall, our multiscale framework of PPI interfaces and residue-level contacts reveals heterogeneous, context-dependent landscapes of selective pressures across virus-targeted host proteins, providing a high-resolution view of how adaptation and constraint are intricately balanced and coordinated within the host.
]]></description>
<dc:creator><![CDATA[ Su, W.-C., Xia, Y. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717550</dc:identifier>
<dc:title><![CDATA[Divergent landscapes of positive and negative selection signatures across residue-resolved human-virus protein-protein interaction interfaces]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717340v1?rss=1">
<title>
<![CDATA[
Genomic insights into polyketide toxin synthesis and algal symbiosis using high-quality genome sequences of the early divergent hexacorallian genus Palythoa (Cnidaria, Zoantharia) 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717340v1?rss=1
</link>
<description><![CDATA[
Palytoxin, first isolated from Palythoa toxica, is among the most potent marine toxins known. Despite decades of biochemical investigation, the genetic bases underlying its potential biosynthesis in Palythoa remain unresolved. Here we present four high-quality genome assemblies of Palythoa species, including Palythoa cf. toxica, and integrate these with a chromosome-scale genome assembly of P. caribaeorum. Performing comparative genomic analyses, we screened for candidate genes potentially involved in palytoxin biosynthesis and examined patterns of genome evolution. Unexpectedly, we identified only two classes of ketosynthase (KS) domain-containing genes in Palythoa: fatty acid synthases (FAS) and bacterial-like polyketide synthases (PKSs). In contrast to other anthozoans, animal FAS-like PKS (AFPK) genes common to all Palythoa species were not detected. We found no evidence for lineage-specific expansion of PKS genes unique to Palythoa, suggesting that if palytoxin/palytoxin-like molecule biosynthesis is host-encoded, it may involve functional modification or co-option of pre-existing FAS and/or bacterial-like PKS pathways. Comparative analyses revealed expansions of gene families associated with transport and binding functions in Palythoa, potentially reflecting molecular adaptations linked to their sand-incorporating body structure. We identified TPT1 and CLEC4A as rapidly evolving genes in multiple Palythoa species, consistent with possible roles in growth regulation and host-microbe interactions. Additionally, comparison between azooxanthellate and zooxanthellate species revealed mutations within conserved protein domains of LePin, which has been implicated in cnidarian endosymbiosis, suggesting lineage-specific modifications associated with symbiotic state.
This study establishes a foundation for zoantharian genomic research, provides insights into lineage-specific genomic signatures, and advances molecular and evolutionary biological knowledge of this ecologically important group.
]]></description>
<dc:creator><![CDATA[ Yoshioka, Y., Shoguchi, E., Chiu, Y.-L., Kawamitsu, M., Reimer, J. D., Yamashita, H. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717340</dc:identifier>
<dc:title><![CDATA[Genomic insights into polyketide toxin synthesis and algal symbiosis using high-quality genome sequences of the early divergent hexacorallian genus Palythoa (Cnidaria, Zoantharia)]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717246v1?rss=1">
<title>
<![CDATA[
Aimea gen. nov. defines a novel plant-associated yeast genus in Microbotryomycetes with three novel species 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717246v1?rss=1
</link>
<description><![CDATA[
Plant tissues and surfaces are among the largest microbial habitats on Earth, and commensal yeasts are common members of these communities, where they can contribute to plant-microbe interactions including the biological control of plant diseases. Here, we describe a novel genus, Aimea, of unpigmented, plant-associated basidiomycete yeasts, in the class Microbotryomycetes, and name three new species (A. erigeronia, A. cardamina, and A. sorghi) represented by four isolates from leaves and roots of multiple hosts. We characterize these taxa through analyses of metabolic requirements, tolerance to differences in osmolarity, pH, and temperature, and enzymatic activities. In parallel, we generate near-chromosome-scale hybrid genomes annotated with transcriptome data. We employ whole-genome and multilocus phylogenetic approaches to infer the placement of these species within a monophyletic clade. We use comparative genomics to examine how the gene content of these yeasts differs from that of other members of the Microbotryomycetes, including an apparent proliferation of retrotransposons. We further demonstrate the genetic transformability of these taxa using Agrobacterium tumefaciens-mediated transformation. The description of these new species, together with high-quality genome resources and a genetic transformation protocol, establishes a foundation for experimental studies of these novel plant-associated yeasts and their interactions with hosts and other microbes.
]]></description>
<dc:creator><![CDATA[ Liber, J. A., Coelho, M. A., He, S. Y. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717246</dc:identifier>
<dc:title><![CDATA[Aimea gen. nov. defines a novel plant-associated yeast genus in Microbotryomycetes with three novel species]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717236v1?rss=1">
<title>
<![CDATA[
The Rayleigh Quotient and Contrastive Principal Component Analysis II 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717236v1?rss=1
</link>
<description><![CDATA[
Contrastive principal component analysis (PCA) methods are effective approaches to dimensionality reduction where variance of a target dataset is maximized while variance of a background dataset is minimized. We previously described how contrastive PCA problems can be written as solutions to generalized eigenvalue problems that maximize particular instantiations of the Rayleigh quotient. Here, we discuss two extensions of contrastive PCA: we use kernel weighting from spatial PCA (k-ρPCA) to contrast spatial and non-spatial axes of variation, and separately solve the Rayleigh quotient in the space of basis function coefficients (f-ρPCA) to find modes of variation in functional data. Together, these extensions expand the scope of contrastive PCA while unifying disparate fields of spatial and functional methods within a single conceptual and mathematical framework. We showcase the utility of these extensions with several examples drawn from genomics, analyzing gene expression in cancer and immune response to vaccination.
]]></description>
<dc:creator><![CDATA[ Jackson, K. C., Carilli, M. T., Pachter, L. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717236</dc:identifier>
<dc:title><![CDATA[The Rayleigh Quotient and Contrastive Principal Component Analysis II]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717220v1?rss=1">
<title>
<![CDATA[
TopicVI: A Knowledge-guided deep interpretable model for resolving context-specific gene programs 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717220v1?rss=1
</link>
<description><![CDATA[
Mechanistic insights from single-cell and spatial transcriptomics largely rely on cell clustering, differential expression analysis, and interpretation through prior biological knowledge. However, this approach is often limited by the reliance on curated biological priors that fail to capture context-specific gene programs, particularly in complex disease states. To address this gap, we introduce TopicVI, a deep interpretable model that integrates established biological knowledge with data-driven refinement to discover context-dependent gene programs in single-cell and spatial transcriptomic data. TopicVI jointly infers cell clusters and gene topics using optimal transport to flexibly align prior gene programs with observed data while permitting context-specific refinements. Comprehensive benchmarking demonstrates that TopicVI outperforms existing methods in biological conservation, batch correction, topic coherence, and rare cell identification. TopicVI effectively disentangles multiple sources of biological variation, such as separating anatomy-specific expression patterns from disease-associated signatures in spatial transcriptomics. Applying TopicVI to glioblastoma datasets, we identify gene topics related to cell cycle regulation and EGFR signaling that reveal convergent tumor states across distinct drug perturbations. By integrating prior knowledge with data-driven discovery, TopicVI enables identification of interpretable gene programs that illuminate biological processes and therapeutic mechanisms in complex transcriptomics data.
]]></description>
<dc:creator><![CDATA[ Cai, G., Zhao, W., Zhu, X., Lin, Y., Zhou, B., Cao, J., He, Q., Yang, B., Gu, X., Xiong, X., Zhou, Z. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717220</dc:identifier>
<dc:title><![CDATA[TopicVI: A Knowledge-guided deep interpretable model for resolving context-specific gene programs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.714730v1?rss=1">
<title>
<![CDATA[
PERREO: An integrated pipeline for repetitive elements analysis enables the repeatome expression profiling in cancer 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.714730v1?rss=1
</link>
<description><![CDATA[
Transcriptome-wide profiling of repetitive element expression reveals transposable element-derived transcripts that are deregulated in diverse biological contexts, including cancer. However, most RNA-seq pipelines are optimized for annotated genes and substantially undercount repeat RNA molecules, limiting their discovery and characterization. Here we present PERREO, a comprehensive, user-friendly pipeline for analyzing repetitive RNA elements from short- and long-read sequencing data. PERREO performs quality control, repeat-aware alignment and quantification, differential expression analysis, co-expression network analysis, and de novo transcript assembly with minimal computational expertise required. We validate PERREO across cell lines, tumor tissues and liquid biopsies, demonstrating superior sensitivity to repetitive RNA signatures compared with standard RNA-seq approaches. PERREO integrates predictive modelling to identify biological associations and generates publication-ready visualizations. By removing the bioinformatic barrier to repetitive RNA discovery, this pipeline enables broader investigation of the repeatome's role in cellular biology and disease, yielding valuable results that, for specific analytical objectives, outperform certain existing tools and pipelines.
]]></description>
<dc:creator><![CDATA[ Rodriguez-Martin, F., Masero-Leon, M., Gomez-Cabello, D. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.714730</dc:identifier>
<dc:title><![CDATA[PERREO: An integrated pipeline for repetitive elements analysis enables the repeatome expression profiling in cancer]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717207v1?rss=1">
<title>
<![CDATA[
BrightEyes-FFS: an open-source platform for comprehensive analysis of fluorescence fluctuation spectroscopy experiments with small detector arrays 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717207v1?rss=1
</link>
<description><![CDATA[
Fluorescence fluctuation spectroscopy (FFS) is an ensemble of techniques for quantitative measurement of molecular dynamics and interactions. Recently, the introduction of small-format array detectors has opened up a new range of spatiotemporal information, allowing for more detailed analysis of system kinetics. However, there is currently no open-source software available for analyzing the high-dimensional FFS data sets. We present BrightEyes-FFS, an open-source Python-based environment for FFS analysis with array detectors. The environment includes a Python package for reading raw FFS data, computing auto- and cross-correlations using various algorithms, and fitting the correlations to several models. A graphical user interface (GUI), available as a standalone executable, makes the analysis fast and user-friendly. An automated Jupyter Notebook writing tool enables transition from the GUI to Jupyter Notebook for custom analysis. We believe that BrightEyes-FFS will enable a wider community to study diffusion, flow, and interaction dynamics.
]]></description>
<dc:creator><![CDATA[ Slenders, E., Perego, E., Zappone, S., Vicidomini, G. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717207</dc:identifier>
<dc:title><![CDATA[BrightEyes-FFS: an open-source platform for comprehensive analysis of fluorescence fluctuation spectroscopy experiments with small detector arrays]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717212v1?rss=1">
<title>
<![CDATA[
Statistical Principles Define an Open-Source Differential Analysis Workflow for Mass Spectrometry Imaging Experiments with Complex Designs 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717212v1?rss=1
</link>
<description><![CDATA[
Mass spectrometry imaging (MSI) characterizes the spatial heterogeneity of molecular abundances in biological samples. Experiments with complex designs, involving multiple conditions and multiple samples, provide particularly useful insight into differential abundance of analytes. However, analyses of these experiments require attention to details such as signal processing, selection of regions of interest, and statistical methodology. This manuscript contributes a statistical analysis workflow for detecting differentially abundant analytes in MSI experiments with complex designs. Using a case study of histologic samples of human tibial plateaus from knees of osteoarthritis patients and cadaveric controls, as well as simulated datasets, we illustrate the impact of the analysis decisions. We illustrate the importance of signal processing and feature aggregation for preserving biological relevance and alleviating the stringency of multiple testing. We further demonstrate the importance of selecting regions of interest in ways that are compatible with differential analysis. Finally, we contrast several common statistical models for differential analysis, showcase the appropriate use of replication, and demonstrate model-based calculation of sample size for follow-up investigations. The discussion is accompanied by detailed recommendations and an open-source R-based implementation that can be reused in other investigations.
]]></description>
<dc:creator><![CDATA[ Rogers, E. B. T., Lakkimsetty, S. S., Bemis, K. A., Schurman, C. A., Angel, P. A., Schilling, B., Vitek, O. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717212</dc:identifier>
<dc:title><![CDATA[Statistical Principles Define an Open-Source Differential Analysis Workflow for Mass Spectrometry Imaging Experiments with Complex Designs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717021v1?rss=1">
<title>
<![CDATA[
Deep learning enables direct HLA typing from immunopeptidomics data 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717021v1?rss=1
</link>
<description><![CDATA[
The immune system eliminates malignant and infected cells through T-cell-mediated recognition of peptides presented by human leukocyte antigen molecules. Mass spectrometry-based immunopeptidomics enables unbiased identification of naturally presented HLA-restricted peptides and has become central to the development of T-cell-based immunotherapies. However, immunopeptidomics data reflects the combined peptide presentation of multiple HLA alleles, and determining which allotypes are represented in this multi-allelic complexity remains an unmet computational challenge. Here, we introduce immunotype, a deep learning-based ensemble predictor for HLA class I allotyping directly from immunopeptidomics data. Immunotype integrates peptide and HLA sequence information through transformer encoders and a graph neural network, complemented by a curated mono-allelic reference of known peptide-HLA binding preferences. Immunotype achieves an overall accuracy of 87.2% at protein-level resolution across diverse tissues and thereby enables rapid, cost-effective HLA typing of large-scale immunopeptidomics datasets.
]]></description>
<dc:creator><![CDATA[ Pilz, M., Scheid, J., Bauer, A., Lemke, S., Sachsenberg, T., Bauer, J., Nelde, A., Stadelmaier, J., Walter, A., Rammensee, H.-G., Nahnsen, S., Kohlbacher, O., Walz, J. S. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717021</dc:identifier>
<dc:title><![CDATA[Deep learning enables direct HLA typing from immunopeptidomics data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717199v1?rss=1">
<title>
<![CDATA[
A computational model for quantifying instability of tandem repeats across the genome 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717199v1?rss=1
</link>
<description><![CDATA[
Tandem repeats (TRs) exhibit high levels of somatic mosaicism, which is increasingly recognized as an important modifier of repeat expansion disorders. Long-read sequencing can capture full-length repeat alleles, yet robust frameworks for quantifying instability across TRs genome-wide are still needed. Here, we introduce a general-purpose model that quantifies TR instability in a given long-read sequencing dataset without explicitly distinguishing biological mosaicism from technical noise, and that is broadly applicable to both simple and structurally complex loci. This model accurately characterizes allelic instability at each TR locus by representing the distribution of read-to-consensus deviations for each allele. Using HiFi sequencing data from 256 HPRC cell line samples, we fitted models for 617,007 TR loci, including known pathogenic repeats. We observe that instability levels are generally low but vary substantially across individual TRs, and are driven more strongly by repeat composition than by overall repeat length. Furthermore, we applied our method to targeted PureTarget long-read data from samples with known repeat expansions and identified significant mosaicism in the majority of expanded alleles. Our model offers a practical way to quantify instability of tandem repeats across the genome and to detect unusually unstable repeat alleles.
]]></description>
<dc:creator><![CDATA[ Dolzhenko, E., English, A., Mokveld, T., de Sena Brandine, G., Kronenberg, Z., Wright, G., Drogemoller, B., Rowell, W. J., Wenger, A. M., Bennett, M. F., Weisburd, B., Erwin, G. S., Jin, P., Nelson, D. L., Dashnow, H., Sedlazeck, F., Eberle, M. A. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717199</dc:identifier>
<dc:title><![CDATA[A computational model for quantifying instability of tandem repeats across the genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.702570v1?rss=1">
<title>
<![CDATA[
Eco-physiological and transcriptomic plasticity of Dianthus inoxianus in response to drought 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.702570v1?rss=1
</link>
<description><![CDATA[
Phenotypic plasticity is a key mechanism by which plants adjust their traits to environmental changes. These phenotypic adjustments are driven by plastic changes in gene expression regulated by gene regulatory networks. Drought, a major selective force in Mediterranean ecosystems, provides a powerful context to examine how genomic plasticity translates into phenotypic responses. Here, we used Dianthus inoxianus, a drought-tolerant Mediterranean carnation, to characterize phenotypic and transcriptomic plasticity in response to drought stress, combining ecophysiological measurements with RNA-seq, gene co-expression and gene regulatory network analyses. Most of the phenotypic traits exhibited low plasticity in response to drought, except water and osmotic potential. At the transcriptome level, we identified 57 plastic genes, suggesting that drought tolerance in D. inoxianus relies predominantly on constitutive gene expression. These plastic genes were enriched in processes typically related to drought response, such as cell wall components and abscisic acid (ABA) signaling. Some plastic genes belonged to drought-responsive modules, while others were hubs in different modules acting as inter-modular connectors. Furthermore, the regulatory network revealed that these plastic genes were strongly regulated by multiple stress-responsive transcription factors, and that drought-associated modules were regulated through both ABA-dependent and ABA-independent pathways. In addition, we identified contrasting patterns of canalization and decanalization, with immune and post-transcriptional regulation remaining canalized under drought, whereas photosynthesis and amino acid metabolism became decanalized, potentially releasing cryptic genetic variation. Overall, our results emphasise that drought tolerance in D. inoxianus emerges from a strategy combining preadaptation with targeted plasticity in key molecular pathways.
]]></description>
<dc:creator><![CDATA[ Parra, A. R., Balao, F. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.702570</dc:identifier>
<dc:title><![CDATA[Eco-physiological and transcriptomic plasticity of Dianthus inoxianus in response to drought]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
