<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<channel rdf:about="https://biorxiv.org">
<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
<title>bioRxiv Subject Collection: Genomics Bioinformatics</title>
<link>https://biorxiv.org</link>
<description>
This feed contains articles for bioRxiv Subject Collection "Genomics Bioinformatics"
</description>

<items>
<rdf:Seq>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717138v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717460v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.11.717915v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717280v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717332v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717547v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.11.717967v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717286v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717310v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717563v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.07.717122v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717277v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717343v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717777v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.715765v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717544v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717302v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717501v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717357v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717557v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717766v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717080v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717200v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.09.717429v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.10.717550v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717340v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717246v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717236v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.04.08.717220v1?rss=1"/>
</rdf:Seq>
</items>
<prism:eIssn/>
<prism:publicationName>bioRxiv</prism:publicationName>
<prism:issn/>

<image rdf:resource=""/>
</channel>
<image rdf:about="">
<title>bioRxiv</title>
<url>https://www.biorxiv.org/sites/default/files/bioRxiv_article.jpg</url>
<link>https://www.biorxiv.org</link>
</image>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717138v1?rss=1">
<title>
<![CDATA[
TB-Bench: A Systematic Benchmark of Machine Learning and Deep Learning Methods for Second-Line TB Drug Resistance Prediction 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717138v1?rss=1
</link>
<description><![CDATA[
Drug-resistant tuberculosis (TB), characterized by prolonged treatment regimens and suboptimal treatment outcomes, remains a major obstacle to global TB elimination. Advances in sequencing technologies have enabled the development of machine-learning (ML) approaches, including deep-learning (DL) methods, to predict drug resistance directly from genomic data. However, a significant gap remains in translating these advances into clinical practice. While current approaches reliably predict resistance to first-line drugs, they show consistently lower and more variable performance for second-line drugs compared with traditional drug-susceptibility testing. To characterize these limitations and assess practical utility, we conducted a comprehensive survey and standardized benchmarking of current approaches for predicting TB drug resistance using whole-genome sequencing (WGS) data. Using systematic selection criteria, we identified 20 traditional ML and DL models from 8 studies and evaluated drug-specific versions across 14 second-line drugs within a unified framework. To account for methodological heterogeneity, the models were evaluated using three distinct feature sets reflecting variability in input representations. We trained and evaluated the models on different subsets of the WHO dataset, comprising 50,801 samples, and assessed generalizability using an external validation dataset comprising 1,199 samples. In the internal evaluation on the held-out WHO test dataset, traditional ML models using binary features achieved higher predictive performance than DL models. For example, XGBoost achieved the highest area under the precision-recall curve (PRAUC) scores (46%-93%) for 10 of the 14 drugs. However, performance varied substantially across drugs. Notably, the superior performance of traditional ML models, even with limited feature sets, highlights their applicability in low-resource settings. 
When evaluated on the external validation dataset, the performance of traditional ML and DL models was comparable, and neither class of models demonstrated substantial improvement over catalogue-based approaches, underscoring challenges in cross-dataset generalization. Overall, this benchmarking study provides a comprehensive and systematic evaluation of current approaches, establishes a rigorous evaluation framework for future comparisons, and identifies key methodological considerations necessary to advance robust drug resistance prediction in clinical settings. To enhance reproducibility and facilitate the application of TB-Bench to additional datasets and models, we have made the source code publicly available at https://github.com/BIRDSgroup/TB-Bench.
]]></description>
<dc:creator><![CDATA[ VP, B., Jaiswal, S., Meshram, A., PVS, D., S C, S., Narayanan, M. ]]></dc:creator>
<dc:date>2026-04-13</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717138</dc:identifier>
<dc:title><![CDATA[TB-Bench: A Systematic Benchmark of Machine Learning and Deep Learning Methods for Second-Line TB Drug Resistance Prediction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717460v1?rss=1">
<title>
<![CDATA[
Revisiting Reconstruction Likelihood: Variational Autoencoders for Biological and Biomedical Data Clustering 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717460v1?rss=1
</link>
<description><![CDATA[
Background and Objective: Variational Autoencoders (VAEs) offer a powerful framework for unsupervised anomaly detection and data clustering, often surpassing traditional methods. A core strength of VAEs lies in their ability to model data distributions probabilistically, enabling robust identification of anomalies and clusters through reconstruction likelihood, a stochastic metric providing a principled alternative to deterministic error scores. Methods: We investigated how different VAE architectures, combining reconstruction likelihood with a learnable or data-driven prior, performed in a clustering task on a toy dataset such as MNIST. Results were verified using dimensionality reduction techniques like t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), alongside clustering algorithms such as k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Results: The VAE's encoder inherently maps data points into a latent space exhibiting discernible cluster structure, as evidenced by alignment with ground truth labels. While dimensionality reduction techniques (both t-SNE and UMAP) facilitated the application of clustering algorithms (k-means and HDBSCAN), these methods were primarily used to visualize and interpret the latent space organization. Conclusions: This study demonstrates that VAEs effectively cluster data by implicitly encoding assignments in their latent representations. Determining cluster membership from encoder output, combined with reconstruction likelihood using semantic features, offers a principled approach for identifying typical samples and anomalies. Future research should focus on leveraging this inherent clustering capability of VAEs to enhance interpretability and facilitate clinical application.
]]></description>
<dc:creator><![CDATA[ Korenic, A., Özkaya, U., Capar, A. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717460</dc:identifier>
<dc:title><![CDATA[Revisiting Reconstruction Likelihood: Variational Autoencoders for Biological and Biomedical Data Clustering]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.11.717915v1?rss=1">
<title>
<![CDATA[
HiReS: A Method for Automated Morphometric Trait Extraction from High-Resolution Plankton Images 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.11.717915v1?rss=1
</link>
<description><![CDATA[
Trait-based analyses in plankton ecology require measurements from large numbers of individuals, yet morphometric data are typically collected manually from small subsets. Although deep learning methods enable automated detection and segmentation, extracting quantitative trait data from full-resolution images remains challenging due to memory limitations. We present HiReS (High-Resolution Segmentation), an open-source workflow for automated morphometric trait extraction from large plankton images. HiReS partitions images into overlapping chunks, performs YOLO-based instance segmentation, reconstructs polygon annotations in full-image space, removes truncated and duplicate detections, and computes geometric descriptors. We evaluated the workflow using manually annotated and automated segmentations of Daphnia pulex, Daphnia galeata, and Simocephalus vetulus. Automated measurements reproduced the structure of manual trait distributions and showed strong agreement at both sample and individual levels. A consistent positive bias was observed, reflecting a multiplicative scaling offset rather than distortion of relative trait structure. After centering, residual disagreement was low and sample-level relationships were preserved. Subsampling analyses further showed that model-derived medians can outperform manual estimates at low sampling depths. HiReS provides a reproducible and computationally efficient framework for extracting morphometric traits from full-resolution plankton images.
]]></description>
<dc:creator><![CDATA[ Mavrianos, S., Teurlincx, S., Declerck, S. A., Otte, K. A. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.11.717915</dc:identifier>
<dc:title><![CDATA[HiReS: A Method for Automated Morphometric Trait Extraction from High-Resolution Plankton Images]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717280v1?rss=1">
<title>
<![CDATA[
Cyclome: Large-scale replica-exchange dynamics of 930 cyclic peptide reveal thermal stability and critical metal-binding behavior 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717280v1?rss=1
</link>
<description><![CDATA[
Cyclic peptides are recognized as versatile scaffolds for therapeutic and functional applications due to their structural stability and resistance to degradation. Despite this promise, systematic analysis and prediction of their thermal stability remain limited by fragmented data resources, inadequate sequence comparison methods, and the lack of cyclicity-aware computational models. We provide a comprehensive, multi-scale computational framework to characterize cyclic peptides. First, we unified four fragmented public repositories of cyclic peptides into the largest single curated resource, Cyclome930, comprising 930 cyclic peptides. This integrates cyclic topology, sequence, experimental structural coordinates, and source organism annotations into a consistently featurized dataset. Cyclome930 thus expands the dataset of annotated cyclic peptides by ~3.4-fold (from 276 to 930). Second, we developed a novel cyclic sequence alignment algorithm that explicitly accounts for rotational symmetry and knot topology, enabling more accurate scoring of sequence similarity than conventional linear alignments. Third, we investigated the thermal stability of cyclic peptides using extensive all-atom replica-exchange molecular dynamics (REMD; 100 ns) simulations, which allow conformational sampling from 298 K to 400 K, and tracked their stress tensors with increasing temperature. Fourth, these simulation-derived thermo-stability metrics were used to train a machine learning model to predict cyclic peptide melting points from sequence and topology (STop2Melt). Crucially, the model introduces cyclicity-aware embeddings derived from ESMc representations coupled with a cyclic offset vector, capturing the peptide's knot topology. STop2Melt achieved strong predictive performance on held-out peptides and outperformed baseline methods that neglect cyclic structure. Finally, we scored Cyclome930 (cyclic ligands) for critical mineral metal binding using a multi-classifier model (CritiCL). 
To our knowledge, Cyclome930 represents the first effort in the peptide literature to integrate physics-based temperature-ramped simulations, cyclic sequence similarity scoring, machine learning for thermal stability prediction, and scoring for critical metal binding. Cyclicity-aware computational toolchains (cyclome930.studio/) provide a foundational resource for computational design of stable cyclic peptide prototype libraries, thereby annotating and expanding genomic islands linked to critical mineral recovery.
]]></description>
<dc:creator><![CDATA[ Sajeevan, K. A., Gates, H., Raghunath, V. S., Tan, C. P. H., Danurdoro, R., Young, J., Chowdhury, R. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717280</dc:identifier>
<dc:title><![CDATA[Cyclome: Large-scale replica-exchange dynamics of 930 cyclic peptide reveal thermal stability and critical metal-binding behavior]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717332v1?rss=1">
<title>
<![CDATA[
Pipette: Encoding scientific literature into an executable Skill Graph for multi-agent bioinformatics 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717332v1?rss=1
</link>
<description><![CDATA[
The cost of genomic sequencing has fallen by several orders of magnitude, yet data analysis remains a bottleneck concentrated among researchers with specialized computational expertise. While Large Language Models can generate bioinformatics code, they frequently produce incoherent multi-step workflows due to the absence of domain-specific analytical constraints. Here, we present Pipette, a multi-agent AI framework that orchestrates end-to-end bioinformatics workflows through natural language interaction, guided by a literature-derived Skill Graph. This directed, edge-weighted knowledge graph, extracted from over 20,000 peer-reviewed publications, constrains workflow generation to biologically valid analytical transitions, preventing incomplete or incoherent workflows. We benchmarked Pipette across four biological domains using published datasets: single-cell RNA-seq analysis of peripheral blood mononuclear cells and a human pancreas atlas, bulk RNA-seq differential expression in rice under environmental stress, and two structure-based computational drug design workflows. In ablation against two LLMs operating without Skill Graph constraints, Pipette matched or exceeded both baselines across all quantitative metrics while uniquely completing multi-step cross-domain transitions. We further evaluated Pipette on a clinical genomics task, where it executed an ACMG/AMP-compliant variant classification on a reference human genome. In all cases, Pipette recapitulated established biological and clinical findings while generating a fully reproducible, machine-readable provenance record. By reducing the computational expertise required to execute standard genomic analyses, Pipette lowers the barrier between sequencing data and biological insight for bench scientists. Pipette is available at https://pipette.bio.
]]></description>
<dc:creator><![CDATA[ Gupta, C., Sharma, A. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717332</dc:identifier>
<dc:title><![CDATA[Pipette: Encoding scientific literature into an executable Skill Graph for multi-agent bioinformatics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717547v1?rss=1">
<title>
<![CDATA[
Interpretable Antibody-Antigen Structural Interface Prediction via Adaptive Graph Learning and Cyclic Transfer 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717547v1?rss=1
</link>
<description><![CDATA[
Experimental structural methods can identify antibody-antigen interfaces with high precision, but they remain time-consuming and resource-intensive, limiting their application across the rapidly expanding space of antibody and antigen sequences. Computational models capable of predicting these interfaces could therefore accelerate antibody discovery and provide insight into the principles governing immune recognition. However, this problem remains challenging due to limited structural datasets, severe class imbalance, and the complex, non-local nature of biomolecular interactions. Here we present VASCIF (Variable-domain Antibody-antigen Structural Complex Interface Finder), a structure-aware framework built on a Masked Graph Attention (MGA) architecture that represents protein complexes as residue graphs and captures long-range structural dependencies through attention-based message passing. The framework is straightforward to implement and enables efficient inference, allowing substantially faster predictions than existing structure-based approaches. Evaluated on curated structural complexes across multiple benchmark datasets using rigorous cross-validation, VASCIF achieves state-of-the-art performance for residue-level interface prediction. Interpretability analyses reveal that the model recovers biophysically meaningful interaction patterns consistent with known principles of antibody recognition, and redefining interfaces using larger residue distance thresholds (~10 Å) significantly improves predictive performance. Together, VASCIF provides a practical predictive framework and new insights into antibody-antigen molecular recognition.
]]></description>
<dc:creator><![CDATA[ Liu, X., Kantorow, J., Chattopadhyay, A. K., Chakraborty, S. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717547</dc:identifier>
<dc:title><![CDATA[Interpretable Antibody-Antigen Structural Interface Prediction via Adaptive Graph Learning and Cyclic Transfer]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.11.717967v1?rss=1">
<title>
<![CDATA[
Scalable genotyping in fixed transcriptomes resolves clonal heterogeneity via single-cell sequencing 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.11.717967v1?rss=1
</link>
<description><![CDATA[
Single-cell transcriptomics has revolutionized our understanding of heterogeneous cell populations. However, technical limitations of widely used platforms have limited our ability to link transcriptional states to somatic mutations within the same cells at scale. Here, we introduce Genotyping in Fixed Transcriptomes (GIFT), a novel assay for simultaneous detection of hundreds of targeted genetic variants and whole transcriptome profiles in single cells. The core innovation of GIFT is a rationally designed gap-filling reaction between adjacent single-stranded DNA (ssDNA) probes that barcodes native transcript sequence to enable highly specific targeted mutation detection. GIFT achieves >99% genotyping accuracy and flexible capture of hundreds of mutations per cell, including in FFPE (Formalin-Fixed Paraffin-Embedded) tissue, enabling clonal lineage tracing in heterogeneous settings. We demonstrate the unique scalability of GIFT by profiling >700,000 cells from 35 donors with myeloproliferative neoplasms (MPN), revealing mutation-dependent hematopoietic responses to systemic inflammation associated with the characteristic JAK2V617 mutation, including an allelic dose gradient of interferon-associated transcriptional programs and transcriptional priming of hematopoietic stem cells that develop into divergent disease states. Together, the unique technical advantages of GIFT enable direct resolution of genotype-to-phenotype relationships via clonal lineage tracing with comprehensive cell state measurements at single-cell resolution.
]]></description>
<dc:creator><![CDATA[ Blattman, S. B., Maslah, N., Varela, A. A., Kumpaitis, K., Nalbant, B., Snopkowski, C., Mariani, M., Kida, L. C., Takizawa, M., Ratnayeke, N., Yu, K. K. H., Fernandes, S., Mousavi, N., Borgstrom, E., Vallejo, D., Boghospor, L., Xin, R., Mignardi, M., Wu, S., Scarlott, N., Delgado-Rivera, L., Kumar, P., Krishnan, S., Giraudier, S., Kiladjian, J.-J., Howitt, B. E., Kohlway, A., Lund, P., Pe'er, D., Chaligne, R., Lareau, C. A. ]]></dc:creator>
<dc:date>2026-04-12</dc:date>
<dc:identifier>doi:10.64898/2026.04.11.717967</dc:identifier>
<dc:title><![CDATA[Scalable genotyping in fixed transcriptomes resolves clonal heterogeneity via single-cell sequencing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717286v1?rss=1">
<title>
<![CDATA[
Metabolomic Fingerprinting from Dried Blood Spots Enables Individual Identification Across 1,257 Participants at 94% User-Level Accuracy 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717286v1?rss=1
</link>
<description><![CDATA[
Background. Constructing digital twins in healthcare requires biological data sources that are simultaneously informative, dynamic, and practical for routine collection. Dried blood spot (DBS) sampling combined with untargeted metabolomics is well suited to meet these requirements: DBS can be self-collected at home and mailed at ambient temperature, while untargeted LC-MS/MS captures thousands of metabolites reflecting individual physiology, lifestyle, and exposures. We previously demonstrated proof-of-concept individual identification from DBS-derived metabolomic profiles in 277 volunteers (80-92% accuracy). Here, we report a large-scale validation on a substantially expanded cohort. Methods. We collected 18,288 DBS samples from 1,257 individuals across 134 analytical batches over 15 months. Samples were self-collected at home, mailed via standard postal service, and analyzed by untargeted LC-MS/MS on a high-resolution Orbitrap platform in positive ESI mode. Our classification pipeline comprises batch-aware normalization, supervised feature selection, biological signal filtering, dimensionality reduction, and user-level majority voting across all available samples. This voting reflects the real-world use case: participants contribute multiple self-collected DBS cards over time, taken at different times of day and under varying conditions. We employed GroupKFold cross-validation with group=batch to ensure zero batch leakage between training and testing sets. Results. In 10-fold GroupKFold cross-validation (group=batch, zero batch leakage), our pipeline achieved 94.1% user-level identification accuracy (85.5% sample-level). In a fully held-out validation on 17 future batches, with all feature selection, normalization, and model fitting performed exclusively on training data, performance was even stronger: 96.1% user-level and 92.6% sample-level across 1,134 classes (chance level: 0.088%). Feature selection stability was confirmed via bootstrap analysis. 
We identified batch leakage as a critical methodological pitfall for the field: naive random splitting inflated accuracy by sharing 92.8% of test samples' (user, batch) pairs with the training set. The top discriminative metabolites span biologically relevant pathways including amino acid metabolism, fatty acid transport, and sphingolipid biosynthesis. Conclusions. Untargeted metabolomics from dried blood spots supports batch-aware, closed-set individual identification in a single-laboratory setting, with potential relevance for longitudinal sample-to-person linkage in future digital twin workflows. Keywords: dried blood spots, untargeted metabolomics, digital twin, individual identification, metabolic fingerprinting, LC-MS/MS, batch effect, precision medicine
]]></description>
<dc:creator><![CDATA[ Hauguel, P., Anctil, N., Noel, L. P. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717286</dc:identifier>
<dc:title><![CDATA[Metabolomic Fingerprinting from Dried Blood Spots Enables Individual Identification Across 1,257 Participants at 94% User-Level Accuracy]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717310v1?rss=1">
<title>
<![CDATA[
Evaluation of somatic variant calling methods on high coverage tumour-only amplicon sequencing data in a clinical environment 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717310v1?rss=1
</link>
<description><![CDATA[
Amplicon-based targeted sequencing panels are among the current workhorses of next-generation sequencing in clinical molecular diagnostics laboratories for profiling somatic mutations in tumours. Many open-source somatic variant callers are available; however, their use in clinical applications remains underexplored. Therefore, we integrated outputs of six variant callers (FreeBayes, MuTect2, Pisces, Platypus, VarDict and VarScan) into a Snakemake pipeline and evaluated tumour-only data from the HD789 commercial reference standard sequenced in triplicate on three different sequencing runs using the Illumina AmpliSeq Focus panel on MiSeq and NextSeq 2000. A 1:4 dilution sample was sequenced for evaluating limits of variant detection. The called variants were analysed with respect to depth, allele frequency, and other sequencing metrics. The variant callers were evaluated by their level of concordance and performance on known somatic variants. FreeBayes consistently called the largest number of somatic variants in each sample but also included more potential artifacts. Overall, FreeBayes, VarScan, MuTect2, and Pisces had the best performance on HD789 data.
]]></description>
<dc:creator><![CDATA[ Bharne, D., Gaston, D. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717310</dc:identifier>
<dc:title><![CDATA[Evaluation of somatic variant calling methods on high coverage tumour-only amplicon sequencing data in a clinical environment]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717563v1?rss=1">
<title>
<![CDATA[
TFBindFormer: A Cross-Attention Transformer for Transcription Factor-DNA Binding Prediction 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717563v1?rss=1
</link>
<description><![CDATA[
Transcription factors (TFs) are central regulators of gene expression, and their selective recognition of genomic DNA underlies various biological processes. Experimental profiling of TF-DNA interactions using chromatin immunoprecipitation followed by sequencing (ChIP-seq) provides high-resolution maps of in vivo TF-DNA binding but remains costly, labor-intensive, and inherently low-throughput, limiting its scalability across different transcription factors, cell types, and regulatory conditions. Computational modeling therefore plays an essential role in inferring TF-DNA interactions at genome scale. However, most existing computational models rely solely on DNA sequence and chromatin features to predict TF-DNA binding, neglecting TF-specific protein information. This omission limits their ability to capture protein-dependent binding specificity. Here, we present TFBindFormer, a hybrid cross-attention transformer that explicitly integrates genomic DNA features with TF-specific representations derived from protein sequences and structures. By modeling protein-conditioned, position-specific TF-DNA interactions, TFBindFormer enables direct learning of molecular determinants underlying DNA recognition. Evaluated across hundreds of cell-type-specific TFs and hundreds of millions of genome-wide DNA bins, TFBindFormer consistently outperforms DNA-only baselines, achieving substantial gains in both area under the precision-recall curve (AUPRC) and area under the receiver operating characteristic curve (AUROC). Together, these results demonstrate that integrating TF and DNA features via cross-attention enables TFBindFormer to serve as an effective and scalable framework for large-scale TF-DNA binding prediction.
]]></description>
<dc:creator><![CDATA[ Liu, P., Wang, L., Basnet, S., Cheng, J. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717563</dc:identifier>
<dc:title><![CDATA[TFBindFormer: A Cross-Attention Transformer for Transcription Factor-DNA Binding Prediction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.07.717122v1?rss=1">
<title>
<![CDATA[
FM-GPT: Bayesian fine mapping for phenome-wide transcriptome-wide association studies 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.07.717122v1?rss=1
</link>
<description><![CDATA[
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies with expression quantitative trait locus reference panels to identify genes associated with traits of interest. However, linkage disequilibrium and correlated gene expression can induce spurious TWAS signals, motivating fine mapping methods to prioritize putatively causal genes within associated loci. The rapid growth of large-scale phenomic resources (e.g., electronic health records (EHRs)) has shifted genetic studies from single-trait analyses to phenome-wide investigations that jointly evaluate many closely related phenotypes. We introduce FM-GPT (Fine-mapping of causal Genes for Phenome-wide Transcriptome-wide association studies), a novel Bayesian fine mapping method for prioritizing causal genes across multiple correlated phenotypes with potentially mixed outcome types (e.g., binary, count or continuous) in phenome-wide TWAS. FM-GPT performs gene-guided dimension reduction of the phenotypes and reveals pleiotropic or phenotype-specific effects of the identified genes. In simulations, FM-GPT identified true causal genes more accurately than other fine mapping methods while controlling false positives. We applied FM-GPT in two analyses of UK Biobank data: a brain-wide genetic analysis of MRI-derived regional cortical thickness measures and a phenome-wide genetic analysis of clinical phenotypes derived from EHR data. FM-GPT substantially narrowed the set of putatively causal genes and identified (1) genes with pleiotropic effects on regional cortical thickness across the cerebral cortex, including five genes on chromosome 17 (BCAS3, LRRC37A, NOS2P3, ARL17B and UBB) regulating neuronal morphology and cortical organization; and (2) genes that influence multiple medical conditions across the circulatory, metabolic, digestive, respiratory and genitourinary systems, revealing two major axes of variation among these conditions that point to a potential trade-off in gene regulation between immune and metabolic functions. These results highlight FM-GPT's power to disentangle complex gene-phenotype relationships in large-scale phenome-wide studies, uncovering shared biological mechanisms across diverse human traits and advancing translational and comorbidity research.
]]></description>
<dc:creator><![CDATA[ Canida, T., Ye, Z., Wang, S.-H., Huang, H.-H., Pan, Y., Liang, M., Chen, S., Ma, T. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.07.717122</dc:identifier>
<dc:title><![CDATA[FM-GPT: Bayesian fine mapping for phenome-wide transcriptome-wide association studies]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717277v1?rss=1">
<title>
<![CDATA[
A Large Yield Model for Crop Production and Design in Western Canada 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717277v1?rss=1
</link>
<description><![CDATA[
With a changing climate, disease pressure, and other production threats, it is critical to ensure that crop producers are well-positioned to protect and optimize yields. In this work we present LYM-1, the first large-scale, multi-crop model for the prediction of yield performance in the Canadian prairies. This is enabled by a large dataset containing over 4.7 million yield observations across 10 different crop types, distributed over 23 growing years. Leveraging additional data sources for weather and soil properties allows the model to reason about the complex interactions between genetics, environment, and management which underlie yield. The trained model is not only effective at predicting the yield for held-out data, but also reveals scientifically and agronomically relevant effects such as the interaction between solar radiation and nitrogen uptake. We anticipate that large yield models can be used both by producers to optimize crop production and by plant breeders and industry for crop design.
]]></description>
<dc:creator><![CDATA[ Ubbens, J., Loliencar, P., Kagale, S. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717277</dc:identifier>
<dc:title><![CDATA[A Large Yield Model for Crop Production and Design in Western Canada]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717343v1?rss=1">
<title>
<![CDATA[
RNA Folding Nearest Neighbor Parameters Including the Modification 1-Methyl-Pseudouridine 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717343v1?rss=1
</link>
<description><![CDATA[
Nearest neighbor analysis is commonly used to estimate RNA folding stabilities. In this contribution, we report a set of RNA folding nearest neighbor parameters for estimating free energy change for RNA sequences including 1-methyl-pseudouridine. Development of mRNA vaccines has identified 1-methyl-pseudouridine as a key nucleobase modification for suppressing innate immune responses. However, the contributions of these modifications to RNA folding stability were unclear. Our new parameters provide helical terms for 1-methyl-pseudouridine-adenine and 1-methyl-pseudouridine-guanine base pairs. The parameters also estimate loop stabilities for loops with 1-methyl-pseudouridine or a combination of 1-methyl-pseudouridine and uridine. These parameters are derived using 208 optical melting experiments and tested against an additional 16 optical melting experiments. On average, we find that substitution of uridine with 1-methyl-pseudouridine stabilizes RNA folding, with the extent of stabilization depending on adjacent sequence. The estimation of tRNA folding ensembles for tRNA sequences with 1-methyl-pseudouridine was significantly improved using the new nearest neighbor parameters. The new nearest neighbor parameters are provided as part of the RNAstructure software package. With these parameters, the secondary structures of natural sequences with 1-methyl-pseudouridine and mRNA therapeutics fully substituted with 1-methyl-pseudouridine can be modeled.
]]></description>
<dc:creator><![CDATA[ Kierzek, E., Shabangu, T. S., Hiltke, O. M., Miaro, M., Arteaga, S., Znosko, B. M., Jolley, E. A., Bevilacqua, P. C., SantaLucia, J., SantaLucia, H. A., Lin, H., Metkar, M., Aviran, S., Soszynska-Jozwiak, M., Kierzek, R., Mathews, D. H. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717343</dc:identifier>
<dc:title><![CDATA[RNA Folding Nearest Neighbor Parameters Including the Modification 1-Methyl-Pseudouridine]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717777v1?rss=1">
<title>
<![CDATA[
Generative design of intrinsically disordered protein regions with IDiom 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717777v1?rss=1
</link>
<description><![CDATA[
Intrinsically disordered protein regions are ubiquitous across all kingdoms of life. These structurally heterogeneous regions play central roles in cellular processes such as transcriptional regulation, cellular signaling, and subcellular organization, yet they have remained largely inaccessible to rational design. Structure-based generative methods are not applicable to proteins that lack a stable fold, and existing sequence-based approaches for disordered regions rely on sampling methods that do not capture the evolutionary statistics of natural disordered regions. Here, we introduce IDiom, an autoregressive protein language model trained on 37 million intrinsically disordered region sequences curated from the AlphaFold Database. Trained with fill-in-the-middle data augmentation, IDiom generates disordered region sequences conditioned on their surrounding structured context, as well as fully disordered proteins without any context. The model generates diverse sequences that recapitulate biologically relevant sequence features of natural disordered regions, and we demonstrate that post-training via reinforcement learning with a subcellular localization reward model produces sequences with features that are consistent with known sequence determinants of compartment-specific localization. These results establish IDiom as a general platform for the generative design of intrinsically disordered proteins and regions.
]]></description>
<dc:creator><![CDATA[ Liu, J., Ibarraran, S., Hu, F., Park, A., Dunn, A., Rotskoff, G. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717777</dc:identifier>
<dc:title><![CDATA[Generative design of intrinsically disordered protein regions with IDiom]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.715765v1?rss=1">
<title>
<![CDATA[
A unified spatial transcriptome profiling of ten mouse organs 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.715765v1?rss=1
</link>
<description><![CDATA[
Spatial transcriptomics has given rise to numerous deep learning models, and training them requires large amounts of high-quality data, especially expression matrices paired with histological images. Here, we present a unified spatial transcriptomic dataset generated using the Stereo-seq platform, covering 10 mouse organs (brain, kidney, lung, thymus, large intestine, skin, spleen, ovary, testis, and uterus) and encompassing 23 tissue sections generated from 21 chips, each with matched ssDNA or H&E staining images. The dataset comprises single-cell-resolution (cell-bin) or square bin-50 (25 µm × 25 µm) expression matrices for each sample, accompanied by corresponding cell type annotations. Annotation robustness was further supported by concordance across different sections of the same tissue and corroboration with canonical marker gene expression patterns. Finally, we compared the characteristics of the cell-bin and bin-50 expression matrices and demonstrated the advantages of cell-bin resolution for cell annotation. This dataset provides a standardized resource for spatial transcriptomics method development, benchmarking, and multimodal analysis.
]]></description>
<dc:creator><![CDATA[ Ren, X., Lv, T., Liu, N., Shi, C., Fang, J., Zhao, N., Kang, Q., Wang, D. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.715765</dc:identifier>
<dc:title><![CDATA[A unified spatial transcriptome profiling of ten mouse organs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717544v1?rss=1">
<title>
<![CDATA[
Living by the sea: chromosome-scale genome assembly and salt gland transcriptomes provide insights into ion regulatory mechanisms in the saline-tolerant mosquito Aedes togoi 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717544v1?rss=1
</link>
<description><![CDATA[
The coastal rock pool mosquito, Aedes togoi, is among the few saline-tolerant mosquito species that lay their eggs in seawater pools, where their larvae develop in water that ranges from dilute freshwater to hyper-saline conditions. Ae. togoi is found in a relatively restricted range spanning the North Pacific coast of North America and coastal regions of Asia from subtropical to subarctic latitudes. Here, we present a de novo chromosome-scale genome assembly and gene annotation for Ae. togoi, highlighting its relatively small genome size and novel chromosomal arrangements compared to other available genomes of Aedine mosquitoes. As part of the annotation process, we detail repeat content and distribution and curate several key multi-gene families, focusing on ion-transport proteins enriched in the larval salt-secreting gland that are candidates for facilitating hyperosmotic urine formation during development in saline water. Using these new resources, we gain mechanistic insight into the ion regulatory capabilities that power the remarkable saline tolerance of the larvae of Ae. togoi. Altogether, we have contributed to the growing body of genomic and transcriptomic resources for diverse mosquito species and provided mechanistic insights into the molecular adaptations required for an insect to thrive in highly dynamic environments such as coastal rock pools.
]]></description>
<dc:creator><![CDATA[ Chiang, J., Khodikian, E., Phelan, O., Parra, A. K., Peach, D. A. H., Durant, A. C., Matthews, B. J. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717544</dc:identifier>
<dc:title><![CDATA[Living by the sea: chromosome-scale genome assembly and salt gland transcriptomes provide insights into ion regulatory mechanisms in the saline-tolerant mosquito Aedes togoi]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717302v1?rss=1">
<title>
<![CDATA[
A segmental duplication-mediated deletion leads to neocentromere formation in orangutans 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717302v1?rss=1
</link>
<description><![CDATA[
Centromeres ensure faithful chromosome segregation, yet how new centromeres arise and replace canonical ones remains poorly understood. Here, we investigate a polymorphic centromere repositioning event on orangutan chromosome 10 using near-telomere-to-telomere assemblies, epigenetic profiling, and population-scale data. We identify striking heterogeneity in canonical centromeres, ranging from large, higher-order repeat α-satellite arrays to short, monomeric α-satellite tracts, alongside the emergence of neocentromeres lacking α-satellite DNA. We show a segmental duplication-mediated deletion of 3.6 Mbp that removed the higher-order repeat array, promoting centromere repositioning and neocentromere formation. Phylogenetic analyses reveal complex evolutionary dynamics, including introgression and incomplete lineage sorting in orangutan lineages. These findings demonstrate that centromere identity can evolve through structural variation and epigenetic reprogramming, highlighting its remarkable plasticity in primate genomes.
]]></description>
<dc:creator><![CDATA[ De Gennaro, L., Yoo, D., Pistacchia, L., Magrone, R., Daponte, A., Perrone, F., Ravasini, F., Mastrorosa, K. F., Oshima, K. K., Polano, C., Hoekzema, K., Munson, K. M., Wertz, J., Marroni, F., Catacchio, C. R., Antonacci, F., Noordermeer, D., Montinaro, F., Logsdon, G. A., Trombetta, B., Eichler, E. E., Ventura, M. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717302</dc:identifier>
<dc:title><![CDATA[A segmental duplication-mediated deletion leads to neocentromere formation in orangutans]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1?rss=1">
<title>
<![CDATA[
EVEE: Interpretable variant effect prediction from genomic foundation model embeddings 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1?rss=1
</link>
<description><![CDATA[
Predicting the clinical significance of genetic variants remains a central challenge in genomic medicine, with most observed variants classified as variants of uncertain significance. Here we show that representations from Evo 2, a 7-billion-parameter genomic foundation model, support accurate and interpretable pathogenicity prediction across variant types from a single framework. An embedding-based classifier, or "probe", trained on Evo 2 embeddings achieves state-of-the-art performance across single nucleotide variant consequence types (0.997 overall AUROC on 839k ClinVar variants) and generalizes zero-shot to indels (0.991 AUROC), outperforming bioinformatic meta-predictors, protein models, and existing foundation model approaches. Performance is robust across conservation levels and transfers to deep mutational scanning datasets for BRCA1, BRCA2, TP53, and LDLR. To make these predictions interpretable, we train supervised annotation probes to quantify predicted disruptions caused by each variant, then synthesize these disruption profiles into natural language explanations using a frontier reasoning model. We provide pre-computed predictions and on-demand explanations for all 4.2 million ClinVar variants through the Evo Variant Effect Explorer (EVEE), an interactive web resource for the community. This work establishes that representations from genomic foundation models can serve as a unified substrate for both accurate variant effect prediction and mechanistic interpretation, reframing interpretability in computational genomics from a trade-off into a complementary product of learned biological structure.
]]></description>
<dc:creator><![CDATA[ Pearce, M. T., Dooms, T., Yamamoto, R., Meehl, J., Molnar, C., Bissell, M., Hazra, D., Fang, C., Nguyen, N., Anderson, M., Osborne, C., Duffy, P., Toomey, B., Klee, E., Myasoedova, E., Ryu, A., Ayanian, S., Korfiatis, P., Redlon, M., Jain, A., Balsam, D., Wang, N. K. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717844</dc:identifier>
<dc:title><![CDATA[EVEE: Interpretable variant effect prediction from genomic foundation model embeddings]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717501v1?rss=1">
<title>
<![CDATA[
Genomic insights into bacterial isolates dominating honeypot ant crop microbiomes reveal metabolically distinct Fructilactobacillus sp. 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717501v1?rss=1
</link>
<description><![CDATA[
Honeypot ants exhibit a convergently evolved phenotype called repletism, in which specialized workers expand their crops and gasters to store vast amounts of food internally. They then store that food for months to support colonies during times of food scarcity. This fascinating phenotype is not well understood, and very little is known about the microbial interactions happening within the fructose-rich replete crop. Previous research using amplicon sequencing showed that Fructilactobacillus makes up nearly 100% of the crop microbiomes of Myrmecocystus mexicanus repletes. This striking result and the successful isolation of those strains led to the present investigation into the phylogenetic diversity of these strains and any clues to the nature of the symbiotic relationship between them and the ant host. We find that the isolates from these repletes represent two evolutionary lineages, both most closely related to F. fructivorans. One of those lineages was also found to be phylogenetically and metabolically distinct from all other Fructilactobacillus reference genomes used in this study. This discovery, in a genus of bacteria that is highly relevant for fermented human foods, will also lay the groundwork for future understanding of the convergent evolutionary mechanisms of repletism in ants.
]]></description>
<dc:creator><![CDATA[ Oiler, I. M., Francoeur, C., Grigaitis, P., LeBoeuf, A. C., Cicconardi, F., Montgomery, S. H., Khadempour, L. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717501</dc:identifier>
<dc:title><![CDATA[Genomic insights into bacterial isolates dominating honeypot ant crop microbiomes reveal metabolically distinct Fructilactobacillus sp.]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717357v1?rss=1">
<title>
<![CDATA[
Palaeogenomics-informed inferences of European dog admixture enables scalable dingo conservation 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717357v1?rss=1
</link>
<description><![CDATA[
Dingoes, mainland Australia's sole terrestrial apex mammal for over 3,000 years, are important components of many ecosystems and Indigenous cultural heritage. Yet conflicts with farmers over livestock predation following European colonisation led to widespread lethal control. These measures are further reinforced by perceptions of hybrid ancestry with European dogs. Accurate estimation of European dog ancestry is therefore essential for effective conservation, but existing tests yield highly conflicting results. Leveraging pre-colonial dingo palaeogenomes and a robust ancestry modelling framework, we reassess the genetic ancestry of contemporary populations. Our approach corrects limitations and biases in existing methods, producing consistent estimates even with as few as 10,000 genome-wide transversion genetic markers. Accounting for admixture uncovers population structure that has persisted for over two millennia and reveals patterns of genetic admixture coinciding with human activity during the colonial era. This study underscores the value of palaeogenomes as a vital conservation tool, offering insights unattainable from modern DNA alone. By clarifying ancestry and population structure, our study offers a robust foundation for effective regionally informed dingo management across Australia.
]]></description>
<dc:creator><![CDATA[ Ravishankar, S., Nguyen, N. C., Taufik, L., Michielsen, N. M., Bergström, A., Tobler, R., Fordham, D., Brüniche-Olsen, A., Rahbek, C., Llamas, B., Souilmi, Y. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717357</dc:identifier>
<dc:title><![CDATA[Palaeogenomics-informed inferences of European dog admixture enables scalable dingo conservation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717557v1?rss=1">
<title>
<![CDATA[
Flanking DNA sequences determine DNA methylation maintenance in proliferation, cancer and aging 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717557v1?rss=1
</link>
<description><![CDATA[
DNA methylation is a stable epigenetic modification essential for promoter silencing, retrotransposon silencing, genomic imprinting, and X-chromosome inactivation. Symmetrical DNA methylation at CpG dinucleotides is maintained after every round of cell division by the DNMT1-UHRF1 maintenance methyltransferase complex. Here we define a conserved rank order of DNA hexanucleotide sequences surrounding CpG sites that determines baseline DNA methylation levels in cells and the probability that DNA methylation is retained across cell divisions. This rank order is conserved in vertebrates and does not depend on TET enzymatic activity. CpG sites in hexanucleotide sequences less favored by DNMT1 are more susceptible to replication-dependent loss of DNA methylation over time; consequently, the methylation status of these motifs serves as a marker of cumulative cell divisions, biological age and cancer progression. Thus, the intrinsic vulnerability stemming from the sequence preference of the DNMT1-UHRF1 complex compromises the long-term stability of DNA methylation, especially at heterochromatic sites in proliferating cells, and contributes to the epigenetic dysregulation observed in cancer and aging.
]]></description>
<dc:creator><![CDATA[ Lopez-Moyado, I. F., Hernandez-Espinosa, L., Angel, J. C., Modat, A., Lleshi, E., Crawford, R., Faulkner, G. J., Rao, A. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717557</dc:identifier>
<dc:title><![CDATA[Flanking DNA sequences determine DNA methylation maintenance in proliferation, cancer and aging]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717766v1?rss=1">
<title>
<![CDATA[
Suppression of upstream ORF translation is not a widespread mechanism of translational stimulation by yeast helicase Ded1 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717766v1?rss=1
</link>
<description><![CDATA[
Ded1 is an essential DEAD-box helicase in yeast that broadly stimulates translation initiation and is critical for mRNAs with structured 5' UTRs. We have evaluated the proposal that Ded1 stimulates translation primarily by preventing initiation at upstream ORFs (uORFs) associated with stable secondary structures. By Ribo-Seq analysis under experimental conditions designed to suppress artifactual 5' UTR translation, we found that reduced translation of the main open reading frames (mORFs) in native mRNAs is generally not accompanied by increased 5' UTR translation in ded1 mutant cells, and that the presence of translated uORFs in yeast mRNAs generally does not confer heightened dependence on Ded1 for efficient translation of mORFs. Results from a high-throughput reporter assay examining native 5' UTRs reinforce the importance of Ded1 in initiation from structured 5' UTRs and show that impairing Ded1 has minimal effects on translational repression by uORFs. Our results demonstrate that, in cells growing vegetatively in rich medium, translational stimulation by suppression of inhibitory uORFs is restricted to a minority of Ded1 targets, and that unwinding of 5' UTR secondary structures per se is the principal mechanism for Ded1 stimulation of translation initiation.
]]></description>
<dc:creator><![CDATA[ Kumar, R., May, G., Sen, N. D., McManus, J., Hinnebusch, A. G. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717766</dc:identifier>
<dc:title><![CDATA[Suppression of upstream ORF translation is not a widespread mechanism of translational stimulation by yeast helicase Ded1]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717080v1?rss=1">
<title>
<![CDATA[
scMultiPreDICT: A single-cell predictive framework with transcriptomic and epigenetic signatures 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717080v1?rss=1
</link>
<description><![CDATA[
Cellular responses to genetic perturbations depend on both transcriptional programs and the epigenetic landscape. While single-cell multiomics technologies enable simultaneous profiling of gene expression and chromatin accessibility, the relative contribution of each regulatory layer to gene expression remains unclear. Existing computational approaches focus on data integration and gene regulatory network inference but do not systematically compare the predictive performance of transcriptional versus epigenetic features on a gene-by-gene basis. We present scMultiPreDICT, a computational framework for comparative predictive modeling of gene expression using single-cell multiomics data. scMultiPreDICT benchmarks RNA-only, ATAC-only and multimodal feature sets across six machine learning models, including regression, tree-based learning and deep learning, using multiple biological datasets. We show that RNA-derived features generally provide strong predictive power, whereas chromatin accessibility alone yields modest performance. Surprisingly, multimodal integration does not uniformly improve prediction accuracy; instead, its benefit is gene-specific and context-dependent. Feature importance analysis reveals that transcriptional features dominate for most genes, whereas chromatin accessibility contributes meaningfully for a subset of genes in specific cellular contexts. Overall, the results demonstrate that regulatory layers contribute differently to gene expression. scMultiPreDICT provides a systematic framework for identifying the relative contributions of transcriptional and epigenetic regulation across genes and cellular contexts, guiding the design of targeted perturbation studies and the prioritization of regulatory layers for therapeutic interventions. scMultiPreDICT is implemented in R and available at https://github.com/UzunLab/scMultiPreDICT/.
]]></description>
<dc:creator><![CDATA[ Manful, E.-E., Uzun, Y. ]]></dc:creator>
<dc:date>2026-04-11</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717080</dc:identifier>
<dc:title><![CDATA[scMultiPreDICT: A single-cell predictive framework with transcriptomic and epigenetic signatures]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717200v1?rss=1">
<title>
<![CDATA[
A Joint Promoterome-Proteome Atlas Highlights the Molecular Diversity of Human Skeletal Muscles 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717200v1?rss=1
</link>
<description><![CDATA[
More than 600 distinct skeletal muscles constitute up to 40% of the total mass of the human body. Human skeletal muscles differ in anatomical position, morphology, origin, and function, but the diversity of their molecular phenotypes (their gene expression and protein abundance profiles) remains poorly explored. Here, we report the large-scale CAGE-Seq promoterome profiling of 75 human skeletal muscles, complemented by 22 matched proteomes obtained with mass spectrometry. We identified 37001 transcribed regulatory elements and 1804 protein groups encompassing 1895 proteins, 80% of which demonstrated non-uniform expression across different muscles. The skeletal muscles of the eye, tongue, and diaphragm had the most distinctive molecular phenotypes, while the overall diversity was driven by hundreds of transcription factors with tissue-specific activity. By analyzing the allelic imbalance of CAGE-Seq reads, we discovered 6653 allele-specific single-nucleotide variants often coinciding with muscle-related GWAS SNPs, including those for muscle volume. Finally, we provide an interactive online atlas of transcriptomic and proteomic molecular phenotypes, facilitating further studies of gene regulation and heritable pathologies of skeletal muscles.
]]></description>
<dc:creator><![CDATA[ Buyan, A., Gazizova, G., Zgoda, V. G., Vavilov, N. E., Gryzunov, N., Eliseeva, I. A., Nozdrin, V., Sergeeva, Y., Titova, A., Shigapova, L., Erina, A. V., Mescheryakov, G., Murtazina, A., Deviatiiarov, R., Forrest, A. R. R., Makeev, V., Hayashizaki, Y., Popov, D., Shagimardanova, E., Kulakovskiy, I. V., Gusev, O. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717200</dc:identifier>
<dc:title><![CDATA[A Joint Promoterome-Proteome Atlas Highlights the Molecular Diversity of Human Skeletal Muscles]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.09.717429v1?rss=1">
<title>
<![CDATA[
DIANA: Deep Learning Identification and Assessment of Ancient DNA 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.09.717429v1?rss=1
</link>
<description><![CDATA[
The field of ancient metagenomics provides insights into past microbiomes, but with a growing dataset size, methods that rely on reference databases have limited scope. Here, we introduce DIANA, a multi-task neural network that predicts key metadata categories from unitig abundances. Trained on 2,597 run accessions (1.72 Tbp of assembled unitig sequences), DIANA accurately identifies sample host (94.6%), community type (90.0%), and material (88.9%) on held-out test data and demonstrates robust generalisation on an independent validation set. A key innovation is DIANA's ability to perform semantic generalisation, correctly classifying samples with labels unseen during training, such as novel subspecies, to their appropriate parent categories. By leveraging both known and uncharacterized genomic sequences, DIANA provides a rapid, data-driven system for metadata validation and quality control, accelerating discovery in ancient metagenomics research.
]]></description>
<dc:creator><![CDATA[ Duitama Gonzalez, C., Lopopolo, M., Nishimura, L., Faure, R., Duchene, S. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.09.717429</dc:identifier>
<dc:title><![CDATA[DIANA: Deep Learning Identification and Assessment of Ancient DNA]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.10.717550v1?rss=1">
<title>
<![CDATA[
Divergent landscapes of positive and negative selection signatures across residue-resolved human-virus protein-protein interaction interfaces 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.10.717550v1?rss=1
</link>
<description><![CDATA[
Virus-targeted host proteins evolve under dual selective pressures. Negative selection preserves within-host interactions, while positive selection promotes adaptive changes to evade viral engagement. Viral and endogenous within-host partners can compete for binding, bringing distinct pressures together on the same interaction interface. Yet, the spatial organization of distinct selective pressures across virus-targeted host proteins, and how such pressures manifest across diverse interaction contexts, remain largely unknown. Here, we integrate an evolutionarily annotated map of human-virus protein-protein interactions (PPIs) with intra-protein residue-residue contact maps to probe the spatial organization of residue-level selective pressures across PPI interfaces of virus-targeted host proteins. Across all PPI interfaces collectively, we find that residues under positive selection are spatially clustered, whereas those under negative selection are broadly dispersed, with additional spatial segregation between positive and strongly negatively selected sites. Moreover, while positive selection is unevenly distributed across interfaces bound exclusively by viral proteins (exogenous-specific), it is more uniformly distributed across interfaces shared between viral and within-host partners (mimic-targeted), suggesting that adaptive pressure from viral targeting acts on the entire mimic-targeted interface, whereas it acts on only a subset of the exogenous-specific interface. Strikingly, clustering of positively selected residues is more pronounced between mimic-targeted and other interface types than within exogenous- or endogenous-specific interfaces alone, suggesting that mimic-targeted interfaces may serve as focal points of adaptive evolution.
Overall, our multiscale framework of PPI interfaces and residue-level contacts reveals heterogeneous, context-dependent landscapes of selective pressures across virus-targeted host proteins, providing a high-resolution view of how adaptation and constraint are intricately balanced and coordinated within the host.
]]></description>
<dc:creator><![CDATA[ Su, W.-C., Xia, Y. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.10.717550</dc:identifier>
<dc:title><![CDATA[Divergent landscapes of positive and negative selection signatures across residue-resolved human-virus protein-protein interaction interfaces]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717340v1?rss=1">
<title>
<![CDATA[
Genomic insights into polyketide toxin synthesis and algal symbiosis using high-quality genome sequences of the early divergent hexacorallian genus Palythoa (Cnidaria, Zoantharia) 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717340v1?rss=1
</link>
<description><![CDATA[
Palytoxin, first isolated from Palythoa toxica, is among the most potent marine toxins known. Despite decades of biochemical investigation, the genetic bases underlying its potential biosynthesis in Palythoa remain unresolved. Here we present four high-quality genome assemblies of Palythoa species, including Palythoa cf. toxica, and integrate these with a chromosome-scale genome assembly of P. caribaeorum. Performing comparative genomic analyses, we screened for candidate genes potentially involved in palytoxin biosynthesis and examined patterns of genome evolution. Unexpectedly, we identified only two classes of ketosynthase (KS) domain-containing genes in Palythoa: fatty acid synthases (FAS) and bacterial-like polyketide synthases (PKSs). In contrast to other anthozoans, animal FAS-like PKS (AFPK) genes common to all Palythoa species were not detected. We found no evidence for lineage-specific expansion of PKS genes unique to Palythoa, suggesting that if palytoxin/palytoxin-like molecule biosynthesis is host-encoded, it may involve functional modification or co-option of pre-existing FAS and/or bacterial-like PKS pathways. Comparative analyses revealed expansions of gene families associated with transport and binding functions in Palythoa, potentially reflecting molecular adaptations linked to their sand-incorporating body structure. We identified TPT1 and CLEC4A as rapidly evolving genes in multiple Palythoa species, consistent with possible roles in growth regulation and host-microbe interactions. Additionally, comparison between azooxanthellate and zooxanthellate species revealed mutations within conserved protein domains of LePin, which has been implicated in cnidarian endosymbiosis, suggesting lineage-specific modifications associated with symbiotic state.
This study establishes a foundation for zoantharian genomic research, provides insights into lineage-specific genomic signatures, and advances molecular and evolutionary biological knowledge of this ecologically important group.
]]></description>
<dc:creator><![CDATA[ Yoshioka, Y., Shoguchi, E., Chiu, Y.-L., Kawamitsu, M., Reimer, J. D., Yamashita, H. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717340</dc:identifier>
<dc:title><![CDATA[Genomic insights into polyketide toxin synthesis and algal symbiosis using high-quality genome sequences of the early divergent hexacorallian genus Palythoa (Cnidaria, Zoantharia)]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717246v1?rss=1">
<title>
<![CDATA[
Aimea gen. nov. defines a novel plant-associated yeast genus in Microbotryomycetes with three novel species 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717246v1?rss=1
</link>
<description><![CDATA[
Plant tissues and surfaces are among the largest microbial habitats on Earth, and commensal yeasts are common members of these communities, where they can contribute to plant-microbe interactions including the biological control of plant diseases. Here, we describe a novel genus, Aimea, of unpigmented, plant-associated basidiomycete yeasts, in the class Microbotryomycetes, and name three new species (A. erigeronia, A. cardamina, and A. sorghi) represented by four isolates from leaves and roots of multiple hosts. We characterize these taxa through analyses of metabolic requirements, tolerance to differences in osmolarity, pH, and temperature, and enzymatic activities. In parallel, we generate near-chromosome-scale hybrid genomes annotated with transcriptome data. We employ whole-genome and multilocus phylogenetic approaches to infer the placement of these species within a monophyletic clade. We use comparative genomics to examine how the gene content of these yeasts differs from that of other members of the Microbotryomycetes, including an apparent proliferation of retrotransposons. We further demonstrate the genetic transformability of these taxa using Agrobacterium tumefaciens-mediated transformation. The description of these new species, together with high-quality genome resources and a genetic transformation protocol, establishes a foundation for experimental studies of these novel plant-associated yeasts and their interactions with hosts and other microbes.
]]></description>
<dc:creator><![CDATA[ Liber, J. A., Coelho, M. A., He, S. Y. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717246</dc:identifier>
<dc:title><![CDATA[Aimea gen. nov. defines a novel plant-associated yeast genus in Microbotryomycetes with three novel species]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717236v1?rss=1">
<title>
<![CDATA[
The Rayleigh Quotient and Contrastive Principal Component Analysis II 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717236v1?rss=1
</link>
<description><![CDATA[
Contrastive principal component analysis (PCA) methods are effective approaches to dimensionality reduction where variance of a target dataset is maximized while variance of a background dataset is minimized. We previously described how contrastive PCA problems can be written as solutions to generalized eigenvalue problems that maximize particular instantiations of the Rayleigh quotient. Here, we discuss two extensions of contrastive PCA: we use kernel weighting from spatial PCA (k-ρPCA) to contrast spatial and non-spatial axes of variation, and separately solve the Rayleigh quotient in the space of basis function coefficients (f-ρPCA) to find modes of variation in functional data. Together, these extensions expand the scope of contrastive PCA while unifying disparate fields of spatial and functional methods within a single conceptual and mathematical framework. We showcase the utility of these extensions with several examples drawn from genomics, analyzing gene expression in cancer and immune response to vaccination.
]]></description>
<dc:creator><![CDATA[ Jackson, K. C., Carilli, M. T., Pachter, L. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717236</dc:identifier>
<dc:title><![CDATA[The Rayleigh Quotient and Contrastive Principal Component Analysis II]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.04.08.717220v1?rss=1">
<title>
<![CDATA[
TopicVI: A Knowledge-guided deep interpretable model for resolving context-specific gene programs 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.04.08.717220v1?rss=1
</link>
<description><![CDATA[
Mechanistic insights from single-cell and spatial transcriptomics largely rely on cell clustering, differential expression analysis, and interpretation through prior biological knowledge. However, this approach is often limited by the reliance on curated biological priors that fail to capture context-specific gene programs, particularly in complex disease states. To address this gap, we introduce TopicVI, a deep interpretable model that integrates established biological knowledge with data-driven refinement to discover context-dependent gene programs in single-cell and spatial transcriptomic data. TopicVI jointly infers cell clusters and gene topics using optimal transport to flexibly align prior gene programs with observed data while permitting context-specific refinements. Comprehensive benchmarking demonstrates that TopicVI outperforms existing methods in biological conservation, batch correction, topic coherence, and rare cell identification. TopicVI effectively disentangles multiple sources of biological variation, such as separating anatomy-specific expression patterns from disease-associated signatures in spatial transcriptomics. Applying TopicVI to glioblastoma datasets, we identify gene topics related to cell cycle regulation and EGFR signaling that reveal convergent tumor states across distinct drug perturbations. By integrating prior knowledge with data-driven discovery, TopicVI enables identification of interpretable gene programs that illuminate biological processes and therapeutic mechanisms in complex transcriptomics data.
]]></description>
<dc:creator><![CDATA[ Cai, G., Zhao, W., Zhu, X., Lin, Y., Zhou, B., Cao, J., He, Q., Yang, B., Gu, X., Xiong, X., Zhou, Z. ]]></dc:creator>
<dc:date>2026-04-10</dc:date>
<dc:identifier>doi:10.64898/2026.04.08.717220</dc:identifier>
<dc:title><![CDATA[TopicVI: A Knowledge-guided deep interpretable model for resolving context-specific gene programs]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-04-10</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
