<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<channel rdf:about="https://biorxiv.org">
<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
<title>bioRxiv Subject Collection: Genomics Bioinformatics</title>
<link>https://biorxiv.org</link>
<description>
This feed contains articles for bioRxiv Subject Collection "Genomics Bioinformatics"
</description>

<items>
<rdf:Seq>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.07.723456v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723212v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723284v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723039v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.722287v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.09.723953v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723289v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.722337v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723051v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723015v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.01.721633v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.08.723698v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723290v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.722973v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723246v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723091v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.722876v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723123v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723059v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723100v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723148v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723092v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.721805v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.722404v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.06.723370v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723027v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723040v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.723010v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.722940v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.05.05.722888v1?rss=1"/>
</rdf:Seq>
</items>
<prism:eIssn/>
<prism:publicationName>bioRxiv</prism:publicationName>
<prism:issn/>

<image rdf:resource=""/>
</channel>
<image rdf:about="">
<title>bioRxiv</title>
<url>https://www.biorxiv.org/sites/default/files/bioRxiv_article.jpg</url>
<link>https://www.biorxiv.org</link>
</image>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.07.723456v1?rss=1">
<title>
<![CDATA[
InterScale reveals multi-scale cellular interaction programs in spatial transcriptomics 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.07.723456v1?rss=1
</link>
<description><![CDATA[
Tissue homeostasis and disease emerge from cell-cell interactions operating across spatial scales: from autocrine and juxtacrine signals within micrometers to paracrine gradients coordinating responses across tissues. While these can be read out from spatial transcriptomics, existing computational methods capture either local adjacency-based or long-range dependencies, but rarely both within a single framework. We introduce InterScale, a graph-transformer approach that jointly models local and global cellular interactions from spatial transcriptomics data. By integrating a Graph Convolutional Network as a local component with a global transformer encoder, InterScale learns multi-scale representations of cellular communication. A downstream workflow enables scale-resolved interpretation of interactions from gene to tissue level. Applied to Sonic Hedgehog morphogen patterning in neural organoids, InterScale resolves spatially restricted neuronal differentiation programs and broader progenitor regulatory states along the morphogen gradient. In a human pancreatic dataset contrasting healthy and type 1 diabetic tissue, it reveals disease-associated spatial reorganization and tissue remodeling. InterScale's modular architecture supports diverse spatial transcriptomics platforms and provides a scalable, unbiased, and biologically interpretable framework for studying cellular interactions across scales.
]]></description>
<dc:creator><![CDATA[ Drummer, F. K., Jimenez, S., Marco, F. D., Schaar, A. C., Pentimalli, T. M., Beckmann, J., Rajewsky, N., Theis, F. J. ]]></dc:creator>
<dc:date>2026-05-11</dc:date>
<dc:identifier>doi:10.64898/2026.05.07.723456</dc:identifier>
<dc:title><![CDATA[InterScale reveals multi-scale cellular interaction programs in spatial transcriptomics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723212v1?rss=1">
<title>
<![CDATA[
SRSA-VAE: Self-Attention-Based Feature Learning for Single-Cell Multimodal Clustering 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723212v1?rss=1
</link>
<description><![CDATA[
Clustering plays a critical role in the analysis of single-cell omics data for identifying cellular heterogeneity and uncovering biological mechanisms. However, the high dimensionality, sparsity, and multimodal nature of single-cell datasets such as single-cell RNA sequencing (scRNA-seq) and Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) pose significant challenges for effective feature learning and representation learning. Traditional dimensionality reduction methods often rely on linear transformations and fail to capture complex nonlinear relationships between gene and protein expression profiles. In this work, we propose SRSA-VAE, a scalable variational autoencoder framework that integrates a residual self-attention encoder for context-aware feature learning and multimodal representation learning. The proposed model dynamically contextualizes gene and protein representations through a self-attention mechanism, enabling the encoder to capture inter-cell relationships and emphasize biologically informative signals. A scalable residual connection further stabilizes training and preserves essential input information during latent representation learning. We evaluate SRSA-VAE on five large-scale publicly available single-cell datasets, including both scRNA-seq and CITE-seq data, and compare its performance with established deep generative models. Experimental results demonstrate that SRSA-VAE consistently outperforms existing methods in Adjusted Rand Index (ARI) across benchmark datasets, with particularly strong gains on complex immune cell populations. Ablation studies further confirm the importance of the self-attention mechanism and residual connection in enhancing model stability and clustering accuracy. The proposed model offers a generalizable, robust, and scalable solution for single-cell clustering tasks.
]]></description>
<dc:creator><![CDATA[ Das, R., Dey, A., Maulik, U., Bandyopadhyay, S. ]]></dc:creator>
<dc:date>2026-05-11</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723212</dc:identifier>
<dc:title><![CDATA[SRSA-VAE: Self-Attention-Based Feature Learning for Single-Cell Multimodal Clustering]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723284v1?rss=1">
<title>
<![CDATA[
Temperate phage microdiversity reflects infant gut microbiome maturation independent of chronic undernutrition 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723284v1?rss=1
</link>
<description><![CDATA[
The assembly and maturation of the infant gut microbiome is a critical developmental process. Yet the dynamics of the viral community, particularly in the context of stunting (chronic malnutrition), remain underexplored. Leveraging longitudinal fecal metagenomes from Zimbabwean infants with normal and stunted growth trajectories, we characterized the development of the gut bacterial and temperate phage communities from birth to 18 months of age. We found that infant gut temperate phages target hallmark early-life bacterial taxa, such as Bifidobacteriaceae, and exhibit an age-dependent maturation that parallels bacterial succession. Notably, both bacterial and temperate phage alpha diversity increased with age. This contrasts with previous studies focused on the extracellular viral fraction and highlights a strong coupling between early-life prophage dynamics and bacterial gut colonization. Using abundance-based maturation models, we identified successional phases of colonization for both bacteria and their associated temperate viral clusters. Importantly, a viral microdiversity maturation model predicted chronological age better than the viral abundance-based model, revealing within-phage genomic variation as a key signal of virome assembly, particularly around weaning. Contrary to findings in wasting (or severe acute malnutrition), stunted growth trajectories were not associated with a significant delay in either bacterial or temperate phage maturation. These results demonstrate that viral genomic variation is a new, informative dimension of early-life gut microbial assembly and that stunting may not impair the infant gut maturation process.
]]></description>
<dc:creator><![CDATA[ Camelo Valera, L. C. C., Reyes, A., Maurice, C. F. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723284</dc:identifier>
<dc:title><![CDATA[Temperate phage microdiversity reflects infant gut microbiome maturation independent of chronic undernutrition]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723039v1?rss=1">
<title>
<![CDATA[
Deep Computational Anatomy via Latent-Aligned Multiview Normalizing Flows 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723039v1?rss=1
</link>
<description><![CDATA[
In modeling complex probability distributions, normalizing flows provide exact-likelihood, bijective mappings between empirical data and tractable latent spaces. Building on this foundation, latent-aligned multiview normalizing (LAMNr) flows leverage these salient properties to learn shared latent subspaces across heterogeneous, multimodal datasets while simultaneously topologically unfolding the sampled data manifold into a continuous vector space. Formal latent-alignment constraints are used to model shared structural features separately from view-specific variations, coordinating latent projections into a shared geometric subspace. Applied to biological imaging, this transformation provides a potential basis for a deep learning interpretation of foundational computational anatomy concepts, such as the population template, latent distances, and geodesic pairwise image interpolation. Additionally, the proposed framework enables closed-form conditional modeling for exact cross-view imputation and other latent space manipulations. Evaluations and illustrations on both imaging-derived phenotypes (IDPs) and multimodal MRI demonstrate the proposed framework and its potential applications. To further motivate our work, we provide a robust and comprehensive 2D and 3D open-source implementation in PyTorch, natively integrated with the ANTsX ecosystem (i.e., ANTsTorch) for efficient training and subsequent data transformation, manipulation, and analysis.
]]></description>
<dc:creator><![CDATA[ Tustison, N. J., Avants, B. B., Cook, P. A., Gee, J. C., Stone, J. R. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723039</dc:identifier>
<dc:title><![CDATA[Deep Computational Anatomy via Latent-Aligned Multiview Normalizing Flows]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.722287v1?rss=1">
<title>
<![CDATA[
AI-enabled virtual immunopeptidomics links quantitative neoantigen presentation to immunogenicity 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.722287v1?rss=1
</link>
<description><![CDATA[
An effective anti-tumor T cell response depends on both neoantigen quality (non-selfness) and quantity (abundance). However, existing methods for neoantigen prioritization largely overlook peptide abundance because it is difficult to measure and model. To bridge this gap, we developed epiVIP, a deep learning framework that predicts the abundance of individual HLA-I peptides from widely available (sc)RNA-seq data. Trained on 1.7 million immune peptides paired with gene expression profiles, epiVIP demonstrated strong generalizability across unseen samples. Analyzing 33,711 neoantigens from clinical datasets revealed a compensatory relationship between abundance and non-selfness in determining antigenicity, providing quantitative support for the TCR avidity theory. Importantly, abundance independently predicted tumor reactivity and patient survival across multiple neoantigen vaccine cohorts and immune checkpoint blockade cohorts. Mechanistic interpretation of epiVIP further identified directional regulation of MAGEA3 epitope presentation by PSME4, which was validated experimentally using T cell functional assays. Together, these findings establish AI-enabled virtual immunopeptidomics as a powerful strategy for improving cancer immunotherapy.
]]></description>
<dc:creator><![CDATA[ Tan, Y., Yang, Z., Wang, T., Hu, H., Fleming, J., Pan, M., Eisenlohr, L. C., Li, B. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.722287</dc:identifier>
<dc:title><![CDATA[AI-enabled virtual immunopeptidomics links quantitative neoantigen presentation to immunogenicity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.09.723953v1?rss=1">
<title>
<![CDATA[
Deciphering the epitranscriptomic code of RNA degradation with nanopore direct RNA sequencing 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.09.723953v1?rss=1
</link>
<description><![CDATA[
The precise regulation of RNA degradation is crucial for gene expression homeostasis, yet how multiple molecular features coordinate on a single transcript remains poorly understood. Here, we use nanopore direct RNA sequencing (DRS) to simultaneously track alternative isoforms, m6A modifications, and poly(A) tail dynamics at single-molecule resolution during a time course of RNA decay. We show that m6A regulates RNA degradation in a stoichiometry-dependent manner, where modification levels quantitatively modulate decay kinetics. Mechanistically, m6A is functionally coupled to deadenylation, promoting accelerated poly(A) tail shortening and coordinated RNA turnover. At the isoform level, we identify regional m6A clusters (RMCs) as structural elements that associate with isoform-selective degradation and remodel protein-coding potential. Furthermore, transcript splicing architecture is associated with distinct m6A deposition patterns, suggesting that gene structure encodes RNA decay kinetics through m6A-mediated regulation. A machine learning model integrating these multi-modal features highlights the central contribution of m6A and deadenylation in shaping RNA decay, while revealing substantial regulatory heterogeneity across transcripts. Collectively, our study deciphers the multi-layered, cooperative principles of RNA degradation and provides an epitranscriptomic perspective for understanding how RNA fate is encoded at single-molecule resolution.
]]></description>
<dc:creator><![CDATA[ Zhang, Z., Wang, C.-L., Zhang, Z.-H., Huang, Y.-F., Xie, Y.-Y., Zhong, Z.-D., Tang, G.-R., Ren, Z.-H., Lan, Y.-L., Kong, J.-W., Qiao, Z.-S., Su, T.-W., Chen, H.-X., Wang, Q.-Y., Luo, R.-J., He, J.-T., Liu, W.-Q., Wu, F., Luo, G.-Z. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.09.723953</dc:identifier>
<dc:title><![CDATA[Deciphering the epitranscriptomic code of RNA degradation with nanopore direct RNA sequencing]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723289v1?rss=1">
<title>
<![CDATA[
Fatty Acid Oxidation Suppression Reprograms Fibroblasts in Fibrostenotic Crohn's Disease 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723289v1?rss=1
</link>
<description><![CDATA[
Fibrostenotic complications represent a major cause of morbidity in Crohn's disease (CD), yet the cellular mechanisms that drive intestinal fibrosis independent of active inflammation remain poorly understood. Here, we identify impaired fatty acid oxidation (FAO) as a defining metabolic feature of fibroblasts in fibrostenotic CD. Untargeted lipidomics of non-inflamed colonic tissue from CD patients demonstrated enrichment of triacylglycerols and long-chain acylcarnitines, suggesting altered lipid utilization. Across three independent RNA-sequencing cohorts, including treatment-naive pediatric ileal biopsies, FAO genes (CPT1A, CPT2, SLC25A20) were selectively downregulated in patients with, or destined to develop, fibrostenotic disease. Single-cell RNA-sequencing localized these transcriptional alterations specifically to fibroblasts within strictured ileum. Primary fibroblasts derived from fibrostenotic CD exhibited increased neutral lipid accumulation, impaired mitochondrial fatty acid trafficking, and diminished responsiveness to PPARγ-mediated suppression of TGF-β-induced myofibroblast activation. Together, these findings demonstrate that FAO impairment is a conserved, fibroblast-specific metabolic program associated with intestinal fibrosis in CD and suggest that metabolic modulation of stromal cells represents a potential therapeutic strategy for fibrostenotic disease.
]]></description>
<dc:creator><![CDATA[ Jihad Aljabban, J., Awad, A., McMichael, B. D., Gartner, V., Thomas, V., Huan, B., Weaver, D., Lian, G., Beasley, C., Lau, G. W.-J., Silverstein, S., Kapadia, M., Salvador, A. C., Rieder, F., Thaxton, J. E., Furey, T. S., Bhatt, A. P., Sheikh, S. Z. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723289</dc:identifier>
<dc:title><![CDATA[Fatty Acid Oxidation Suppression Reprograms Fibroblasts in Fibrostenotic Crohn's Disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.722337v1?rss=1">
<title>
<![CDATA[
scPlOver: inferring DNA content from amplification-free single-cell WGS using fragment overlaps 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.722337v1?rss=1
</link>
<description><![CDATA[
Correctly inferring copy-number aberrations from single-cell DNA sequencing data requires estimating cell DNA content, which is unidentifiable from read counts alone. In tagmentation-based sequencing, sequenced fragments are distinct DNA molecules, so overlap counts are directly linked to copy number. We present a theoretical model of fragment overlaps as a function of copy number and coverage and introduce scPlOver, a method that uses this model to infer DNA content. scPlOver outperforms existing methods on simulated and experimental data and, across a cohort of 41 patients, identifies thousands of ovarian cancer cells with higher DNA content than previously estimated.
]]></description>
<dc:creator><![CDATA[ Myers, M. A., Satas, G., Shah, S., Mcpherson, A. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.722337</dc:identifier>
<dc:title><![CDATA[scPlOver: inferring DNA content from amplification-free single-cell WGS using fragment overlaps]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723051v1?rss=1">
<title>
<![CDATA[
Rubus armeniacus genome sequence reveals the secrets of blackberry anthocyanin biosynthesis 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723051v1?rss=1
</link>
<description><![CDATA[
Here, we present a comprehensive multiomics analysis of anthocyanin biosynthesis in Rubus armeniacus, a blackberry known for its dark fruits. A phased genome sequence of this tetraploid blackberry was generated from Oxford Nanopore Technologies (ONT) sequencing, with an assembly size of 1.2 Gbp and an N50 of 34 Mbp. BUSCO analysis of the total assembly indicates a high completeness of 99.1%. The assembly was separated into 4 pseudohaplophases, with pseudohaplophase A representing the R. armeniacus genome in 7 chromosome-scale contigs, with an N50 of 46 Mbp and 98.8% of conserved BUSCO genes. A total of 118,183 protein-coding genes were annotated within the genome assembly, and all relevant genes encoding enzymes and transcriptional regulators of the anthocyanin biosynthesis pathway were identified within each pseudohaplophase. To further understand the underlying cause of dark pigmentation, gene expression was analysed across different stages of berry development, revealing a strong induction of anthocyanin biosynthesis genes, including the anthocyanin-activating subgroup 6 MYB transcription factors, during berry ripening. Further, quantification of cyanidin-3-O-glucoside in methanolic berry extracts by UHPLC-HRAM-MS revealed an approximately 500-fold increase from green to black fruit, indicating that dark pigmentation in R. armeniacus results from high anthocyanin accumulation.
]]></description>
<dc:creator><![CDATA[ Wolff, K., Nowak, M. S., Thoben, C., Beuerle, T., Pucker, B. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723051</dc:identifier>
<dc:title><![CDATA[Rubus armeniacus genome sequence reveals the secrets of blackberry anthocyanin biosynthesis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723015v1?rss=1">
<title>
<![CDATA[
The reliability and accuracy of recombination inferred by Shapeit2 duoHMM on whole genome sequence 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723015v1?rss=1
</link>
<description><![CDATA[
Few studies have assessed the performance of population-based phasing combined with parental genotypes for inferring recombination from whole genome sequence (WGS) data. In this study, our objective was to evaluate whether Shapeit2 duoHMM, a hidden Markov model using parental genotypes, infers recombination events reliably on WGS and with narrower intervals than on SNP arrays. We based our analysis on the overlap between recombination events inferred by Merlin on SNP genotypes and by Shapeit2 on WGS and SNP genotypes. We used a sample of 61 extended families from the GeneSTAR study with TOPMed freeze 8 WGS on 580 sequenced subjects (60% of the sample). Shapeit2 was run on WGS with a window size of 500 kilobases and 200 states. To mimic a SNP array, we extracted genotypes of the 355,112 autosomal markers on the Illumina OmniExpress array. The number of recombination events per meiosis inferred by Shapeit2 on the WGS data (36.8) was consistent with the expected number over autosomes (35.7), whereas Merlin overestimated this number (115.0). Of the Shapeit2 recombination events on WGS, 73% were detected by Merlin, a proportion rising to 91% when restricting to events also inferred by Shapeit2 on OmniExpress genotypes. Furthermore, Shapeit2 recombination intervals were narrower on WGS than on OmniExpress genotypes (median of 4,530 bp vs. 49,458 bp). This suggests that Shapeit2 on WGS is a reliable and accurate method for inferring recombination events.
]]></description>
<dc:creator><![CDATA[ Oubninte, S., Ruczinski, I., Yanek, L. R., Mathias, R., Bureau, A. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723015</dc:identifier>
<dc:title><![CDATA[The reliability and accuracy of recombination inferred by Shapeit2 duoHMM on whole genome sequence]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.01.721633v1?rss=1">
<title>
<![CDATA[
LIVIA: a browser-based tool for assessing and visualizing predicted protein interactions 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.01.721633v1?rss=1
</link>
<description><![CDATA[
As protein structure prediction tools become widely adopted across biology, there is a growing need for accessible methods to assess and visualize predicted protein-protein interactions (PPIs). Here we present LIVIA (Local Interaction Visualization and Analysis), a browser-based tool that computes local PPI confidence metrics across multiple prediction platforms, identifies predicted interface residues, embeds an interactive Mol-star 3D viewer, and generates visualization scripts for ChimeraX and PyMOL. The tool automatically detects prediction formats, and all parsing and computation occur locally on the user's machine. LIVIA is freely available at https://flyark.github.io/LIVIA.
]]></description>
<dc:creator><![CDATA[ Kim, A.-R., Perrimon, N. ]]></dc:creator>
<dc:date>2026-05-10</dc:date>
<dc:identifier>doi:10.64898/2026.05.01.721633</dc:identifier>
<dc:title><![CDATA[LIVIA: a browser-based tool for assessing and visualizing predicted protein interactions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.08.723698v1?rss=1">
<title>
<![CDATA[
Wolves in black: multiple introgressions and natural selection may explain melanism in Italian wolves 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.08.723698v1?rss=1
</link>
<description><![CDATA[
Hybridisation between wild and domestic taxa can favour the spread of domestic alleles into wild populations through backcrossing. The complex interplay of random genetic drift, recombination, and selection can shape the fate of introgressed alleles. Maladaptive domestic variants are likely to be purged by natural selection, but others may persist across generations. It has long been known that the Apennine Italian wolf population, exposed to large numbers of free-ranging dogs, has experienced extensive introgression. The unusually high frequency of black wolves observed in Italy, compared to other European populations, may parallel patterns documented in North American wolves, where the melanistic KB allele at the CBD103 gene, of domestic origin, has spread over thousands of years of introgression. We tested whether the KB mutation entered the peninsular Italian wolf population via hybridisation and spread through adaptive introgression. Genome-wide analyses of black and wild-type (grey-coated) Apennine wolves showed no clear signatures of recent dog ancestry in most melanistic animals. Our ancestry reconstruction approaches identified two distinct KB haplogroups of domestic origin, suggesting multiple introgression events. Notably, we found molecular evidence consistent with balancing selection on the KB haplotypes, whose functional role, nonetheless, warrants further research. Therefore, the microevolutionary genomic and ecological consequences of wolf-dog hybridisation in Italy should be carefully investigated to inform appropriate science-based conservation management strategies.
]]></description>
<dc:creator><![CDATA[ Fabbri, G., Battilani, D., Mattucci, F., Galaverni, M., Stronen, A. V., Musiani, M., Godinho, R., Lobo, D., Scandura, M., Randi, E., Fabbri, E., Caniglia, R. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.08.723698</dc:identifier>
<dc:title><![CDATA[Wolves in black: multiple introgressions and natural selection may explain melanism in Italian wolves]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723290v1?rss=1">
<title>
<![CDATA[
Deciphering the Molecular Structure of the Type III Secretion System in Chlamydia trachomatis for Structure-Based Therapeutic Targeting 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723290v1?rss=1
</link>
<description><![CDATA[
Chlamydia trachomatis is an obligate intracellular Gram-negative pathogen responsible for sexually transmitted infections and trachoma in humans. Although antibiotics are generally effective against acute infections, persistent chlamydial forms often exhibit reduced susceptibility during chronic infection. Chlamydia relies on its type III secretion system (T3SS) to inject effector proteins into host cells, making T3SS proteins attractive targets for antivirulence therapeutics. In this study, we employed an integrated computational pipeline to model and assemble the C. trachomatis T3SS constituent proteins. Template-based modeling using crystallographic structures of homologs from other Gram-negative bacteria revealed a highly conserved structural architecture despite low sequence identity (18-46%). Stereochemical validation confirmed high model quality, and most T3SS proteins exhibited favorable protein-protein interactions (PPIs). Since the activity of the T3SS complex relies on extensive PPIs, we targeted these PPIs as a promising approach to attenuate bacterial virulence. CdsN, the ATPase of the T3SS, is a hexamer, and we targeted its dimerization interface. Structure-based virtual screening of compounds from the e-Drug3D and IMPPAT libraries against predicted hotspot residues and the identified druggable pocket at the CdsN dimeric interface, followed by ADMET screening, yielded three promising candidates: M Roflumilast (Drug ID: 1537), Elacestrant (Drug ID: 2081), and Tecovirimat (Drug ID: 1889). All three ligands formed thermodynamically stable complexes with the CdsN dimer, with Elacestrant demonstrating the most favourable binding free energy. This was also confirmed by a 100 ns molecular dynamics simulation. This study provides new insights into the molecular architecture of C. trachomatis T3SS and identifies M Roflumilast, Elacestrant, and Tecovirimat as potential drug candidates against chlamydial infection.
]]></description>
<dc:creator><![CDATA[ Panda, A., Kapoor, J., Rajagopal, R., Kumar, S., Bandyopadhyay, A. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723290</dc:identifier>
<dc:title><![CDATA[Deciphering the Molecular Structure of the Type III Secretion System in Chlamydia trachomatis for Structure-Based Therapeutic Targeting]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.722973v1?rss=1">
<title>
<![CDATA[
Know Your Alphabet: Conformational Noise, Latent-Space Encodings, and the Future of Structural Phylogenetics 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.722973v1?rss=1
</link>
<description><![CDATA[
Structural alphabets have transformed protein phylogenetics by enabling sequence-style alignment and maximum-likelihood inference to be applied directly to structural data. However, a coordinate-explicit alphabet, in which character states are derived from three-dimensional atomic positions, encodes not only evolutionary signal but also the conformational variability inherent to protein structure. This source of noise has not previously been quantified in a phylogenetic context, and no framework exists for comparing alphabets with respect to their conformational sensitivity. Here, we introduce the Normalised Noise Index (NNI), a Shannon entropy-based metric for quantifying conformational sensitivity in structural alphabet encodings, and apply it alongside ensemble-wide Robinson–Foulds (RF) variance as a framework for characterising the impact of conformational noise on phylogenetic inference. Across 3,749 single-chain NMR ensembles from the Protein Data Bank, we show that 3Di character variability is a pervasive feature of experimentally observed conformational spread, with NNI negatively correlated with within-ensemble structural stability. A 100 ns molecular dynamics simulation of myoglobin confirmed that thermal fluctuations alone are sufficient to generate comparable 3Di character variation and, in 2.9% of cases, to redirect maximum-likelihood tree search away from the expected topology in a 4-taxon globin benchmark with independently established relationships. Exhaustive enumeration of 4,800 conformational replicates across three NMR ensembles, comprising 11,517,600 pairwise RF comparisons, revealed that topological variance under 3Di encoding is approximately 1.7-fold greater than under structural distance, a source of uncertainty that standard bootstrap analysis cannot detect.
By contrast, TEA, a sequence-derived structure-aware alphabet inferred from ESM-2 embeddings rather than directly from atomic coordinates, is insulated from conformational sampling by construction and yields zero topological variance across all conformational replicates, serving here as a noise-insulated reference rather than a proposed replacement for 3Di. Together, these results demonstrate that alphabet choice is a methodological variable in structural phylogenetics, and that the NNI metric and RF variance framework introduced here provide a practical basis for principled noise characterisation as new structural alphabets continue to emerge.
]]></description>
<dc:creator><![CDATA[ Schmid, M., Liu, Y., Malik, A. J., Ascher, D. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.722973</dc:identifier>
<dc:title><![CDATA[Know Your Alphabet: Conformational Noise, Latent-Space Encodings, and the Future of Structural Phylogenetics]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723246v1?rss=1">
<title>
<![CDATA[
Environmental Regulation and Gene-by-Environment Interaction Influence RAP1 Activity and its Impact on Gene Expression 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723246v1?rss=1
</link>
<description><![CDATA[
Gene-by-environment (GxE) interactions play a major role in shaping both phenotypic and molecular variation, with important implications for human health and disease. In this study, we used the doxycycline (Dox)-regulated, tetracycline-responsive (Tet-Off) promoter system to titrate expression of the essential yeast transcription factor Repressor Activator Protein 1 (RAP1) in a stepwise series analogous to a hypomorphic allelic series, across three distinct environments: Yeast Peptone Dextrose (YPD) media, YPD media with heat shock (HS), and Yeast Peptone Acetate (YPAC) media. We then performed RNA sequencing (RNA-seq) to assess global transcriptional responses to RAP1 reduction in each growth environment. Our analysis first focused on the independent effects of varying RAP1 expression levels within and across environments. We then explored GxE interactions, revealing a subset of genes whose expression responded significantly to reduced RAP1 levels in an environment-specific manner. Notably, many genes showed opposite responses to RAP1 titration in YPAC media compared with YPD media and/or HS, suggesting an environment-dependent regulatory architecture. This design reveals how cells integrate internal transcriptional and regulatory changes with external environmental cues, providing a deeper view of GxE architecture. Using Weighted Gene Co-expression Network Analysis (WGCNA), we identified co-regulated gene modules and, by combining this with transcription factor motif enrichment tests, identified candidate regulators driving their dynamics. Our findings demonstrate that gene regulatory networks can vary dramatically with the environmental context an organism experiences, which in turn can influence the phenotypes produced by a given genetic perturbation.
This illustrates the complexity of genotype-environment interactions and the importance of studying gene function in multiple environments to gain a comprehensive understanding of a gene's sometimes numerous and diverse functions.
]]></description>
<dc:creator><![CDATA[ Kalra, S., Sanchez, G., Stubin, A., Le, A., Bakshian, A., Ortiz Diaz, B., Mark, B. M., Pena, C., Parker, E., Johnston, E., Hsu, E., Brangham, G., Bala-Mehta, I., Perez, L., Milrod, M., Stanten, M., Nakamura, M., Hwang, P., Ptaszynska, S., Cander, S., Park, S., Tan, T. L., Zhou, Y., Coolon, J. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723246</dc:identifier>
<dc:title><![CDATA[Environmental Regulation and Gene-by-Environment Interaction Influence RAP1 Activity and its Impact on Gene Expression]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723091v1?rss=1">
<title>
<![CDATA[
A structural grammar of truncation across the human homodimer landscape 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723091v1?rss=1
</link>
<description><![CDATA[
Alternative splicing and proteolytic truncation generate tens of thousands of protein isoforms in the human proteome, but the structural consequences for quaternary state, the level at which most signaling, enzymatic and regulatory function operates, have largely been examined one molecule at a time. Leveraging the recent expansion of the AlphaFold Database to predicted human homodimers, we systematically compared 5,168 canonical-versus-truncated homodimer pairs across the human proteome. In high-confidence canonical homodimers, truncation is associated with predicted structural conservation in 56.4% of pairs (mean 85 residues lost), complete interface ablation in 26.1% (mean 178 residues lost), and partial destabilization in 17.5% (mean 134 residues lost); a distinct fourth class (4.0% of the dataset, n = 208) shows truncation-associated emergence of a predicted high-confidence interface from a sub-threshold canonical baseline. Two reproducible rules govern these transitions: a topological asymmetry in which N-terminal losses are preferentially enriched ~1.6-fold in interface preservation while C-terminal losses are rare overall (~6% of pairs) and modestly under-represented in the conservation class, and a biophysical rule in which emergence-class proteins show substantially elevated intrinsic disorder content relative to ablation-class proteins, as measured by both AlphaFold pLDDT-defined disorder of the canonical structure (Cohen's d ≈ 1.39) and AIUPred peak binding propensity of the truncated isoform (Cohen's d ≈ 0.65). Formal pathway enrichment recovered only a small nucleotide-metabolism signal, indicating that these rules operate across diverse gene-functional categories. Truncation-associated remodeling of homodimer architecture thus constitutes a structural grammar of the human proteome rather than a specialty of any single regulatory family.
]]></description>
<dc:creator><![CDATA[ Karagöl, T., Karagöl, A. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723091</dc:identifier>
<dc:title><![CDATA[A structural grammar of truncation across the human homodimer landscape]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.722876v1?rss=1">
<title>
<![CDATA[
Building an open ecosystem for molecular neuroimaging: standards and tools from the OpenNeuroPET initiative 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.722876v1?rss=1
</link>
<description><![CDATA[
Molecular neuroimaging with positron emission tomography (PET) and single-photon emission computed tomography (SPECT) enables quantification of specific molecular targets in the living brain. Despite its scientific impact, molecular neuroimaging research has historically faced challenges due to high costs, small sample sizes, laboratory-specific analysis pipelines, and limited large-scale data sharing. These factors have hindered reproducibility and the broader reuse of valuable PET datasets. The OpenNeuroPET initiative was established to address these barriers by developing standards, infrastructure, and open-source tools for organizing, sharing, and analyzing molecular neuroimaging data. Through collaborations across Europe and North America, OpenNeuroPET has supported the PET extension of the Brain Imaging Data Structure (PET-BIDS), providing a standardized framework for PET datasets and metadata. Building on PET-BIDS, tools such as PET2BIDS, ezBIDS, and BIDScoin facilitate data conversion and curation. In parallel, OpenNeuro now hosts PET-BIDS datasets for open sharing, while complementary platforms such as PublicnEUro enable GDPR-compliant controlled access. Emerging open-source workflows and BIDS applications further support automated, reproducible PET preprocessing and quantitative analysis, promoting harmonized processing across centers. Together, these developments mark an important step toward an open molecular neuroimaging ecosystem in which datasets, software, and workflows can be transparently shared, reused, and scaled for collaborative research.
]]></description>
<dc:creator><![CDATA[ Ganz, M., Norgaard, M., Pernet, C., Matheson, G. J., Galassi, A., Ceballos, E. G., Wighton, P., Bilgel, M., Eierud, C., Gonzalez-Escamilla, G., Buckholtz, J., Blair, R., Markiewicz, C. J., Hardcastle, N., Greve, D. N., Thomas, A. G., Poldrack, R. A., Calhoun, V. D., Innis, R. B., Knudsen, G. M. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.722876</dc:identifier>
<dc:title><![CDATA[Building an open ecosystem for molecular neuroimaging: standards and tools from the OpenNeuroPET initiative]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723123v1?rss=1">
<title>
<![CDATA[
A Fractal-Dimension Framework for Quantifying Self-Similarity in Chromatin Folding 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723123v1?rss=1
</link>
<description><![CDATA[
The three-dimensional folding of DNA is essential for genome function, but its organization remains difficult to summarize quantitatively across genomic scales. Here, we study DNA folding from Hi-C contact data using a network-based notion of fractal dimension. In this representation, genomic loci are treated as nodes, and observed Hi-C contacts define weighted edges, so that frequently interacting loci are closer in the resulting network. We then estimate fractal dimension using two complementary graph-based methods: the correlation dimension and the sandbox dimension. Validation on synthetic networks shows that the proposed estimators detect clear scaling behavior in hierarchical fractal-like networks, while distinguishing them from networks with local clustering but no stable multiscale self-similarity. Applied to intrachromosomal Hi-C data from the IMR90 human cell line, the method reveals approximate linear scaling regimes on log-log plots, suggesting fractal-like organization in chromatin contact networks. At the chromosome level, estimated fractal dimension tends to increase with chromosome size: larger chromosomes often have dimensions closer to 3, consistent with more compact and space-filling organization, whereas shorter chromosomes tend to have lower dimensions, closer to 1, consistent with simpler and more open folding patterns. A sliding-window analysis at 5 kb resolution further shows that fractal organization varies substantially along chromosomes rather than remaining uniform across genomic position. These results suggest that graph-based fractal dimension provides an interpretable summary of DNA folding complexity at both global and local scales. More broadly, the proposed framework offers a quantitative way to study multiscale genome organization from Hi-C data using tools from network geometry.
]]></description>
<dc:creator><![CDATA[ El-Yaagoubi, A., Balubaid, A. O., Chung, M. K., Tegner, J., Ombao, H. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723123</dc:identifier>
<dc:title><![CDATA[A Fractal-Dimension Framework for Quantifying Self-Similarity in Chromatin Folding]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723059v1?rss=1">
<title>
<![CDATA[
Machine learning cross-platform proteomic imputation enables protein quality scoring and replication of epidemiological associations 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723059v1?rss=1
</link>
<description><![CDATA[
High-throughput affinity-based proteomics has advanced biomedical research, yet fundamental, persistent discordance between mainstream platforms (SomaScan and Olink) routinely undermines the replication of findings. This platform-driven non-replication complicates downstream biological validation and biomarker prioritization. Here, we develop a machine learning-based framework for cross-platform protein value imputation to resolve this translational bottleneck. Using paired proteomic data measured by both SomaScan and Olink from 5,325 participants of the Multi-Ethnic Study of Atherosclerosis, we developed models to impute cross-platform measurements and applied them to two independent and demographically distinct cohorts (Cardiovascular Health Study [N=3,171] and UK Biobank [UKB; N=41,405]) for external validation. Our bi-directional model 1) established an imputation performance-based protein fidelity index, validated against gold-standard measurements from the Atherosclerosis Risk in Communities study (N=101) and the Nurses' Health Study (N=54), 2) enabled imputation of platform-exclusive protein measurements, and 3) facilitated calibration of overlapping proteins. We demonstrate the utility of this framework through three applications: 1) fidelity-informed analyses enhanced the replication of biomarker discovery, 2) recovery of SomaScan signals that were previously inaccessible in UKB's original Olink measurements, and 3) improved replication performance for overlapping proteins. Our study offers a translational roadmap that allows researchers to achieve reliable epidemiological replication, target specific assays for future optimization, and prioritize biological signal over platform noise.
]]></description>
<dc:creator><![CDATA[ Li, L., Alaa, A., Tan, Y., Demirel, I., Friedman, S., Zha, Q., Trac, R. P., Taylor, K. D., Yu, B., Ballantyne, C. M., Deo, R., Dubin, R., Tsai, M. Y., Peloso, G. M., Brody, J., Austin, T., Psaty, B. M., Nicholas, J., Raffield, L. M., Tahir, U., Coresh, J., Hornsby, W., Chan, A., Rich, S. S., Rotter, J. I., Ganz, P., Gerszten, R., Philippakis, A., Natarajan, P., Yu, Z. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723059</dc:identifier>
<dc:title><![CDATA[Machine learning cross-platform proteomic imputation enables protein quality scoring and replication of epidemiological associations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723100v1?rss=1">
<title>
<![CDATA[
Cross Dataset Transcriptomic Analysis Identifies Oxidative Stress Inflammation Gene Networks Modulated by Nutrigenomic Interventions in Parkinson Disease 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723100v1?rss=1
</link>
<description><![CDATA[
Inflammation and oxidative stress (OS) are key contributors to Parkinson's disease (PD). We performed a cross-dataset integrative transcriptomic analysis to identify OS- and inflammation-related hub genes persistently dysregulated in PD and to evaluate their response to nutrigenomic interventions using publicly available datasets. Four GEO datasets (GSE7621, GSE20141, GSE20146, GSE49036) were analysed to identify differentially expressed genes (DEGs), which were intersected with GeneCards OS- and inflammation-related gene sets. Functional enrichment analyses, including gene ontology (GO), pathway over-representation analysis (ORA), and protein-protein interaction (PPI) analysis, were used to identify key pathways and hub genes. Gene-food bioactive compound (FBC) associations were explored by integrating PD signatures with nutrigenomic profiles from NutriGenomeDB. We identified 183 DEGs in PD, enriched in synaptic, dopaminergic, OS, and inflammatory pathways. Intersection analysis yielded 26 OS- and inflammation-related genes and 10 central regulators, including TH, DDC, SNCA, LRRK2, HSPB1, and HSPA1B. Integration with NutriGenomeDB profiles revealed opposing transcriptional patterns, with several FBCs suppressing stress-related genes and upregulating dopaminergic markers such as TH, GCH1, and DDC. Overall, this integrative analysis highlights OS- and inflammation-related gene networks in PD and identifies candidate diet-gene interactions that warrant further experimental validation.
]]></description>
<dc:creator><![CDATA[ Rafiee, M., Abaj, F., Mahdevar, M., Rashidian, A., Ghaedi, K., Ghiasvand, R. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723100</dc:identifier>
<dc:title><![CDATA[Cross Dataset Transcriptomic Analysis Identifies Oxidative Stress Inflammation Gene Networks Modulated by Nutrigenomic Interventions in Parkinson Disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723148v1?rss=1">
<title>
<![CDATA[
Sex-biased Fibroblast Subpopulations and Transcriptional Programs Reveal Mechanisms of Skin Lesion Development in Systemic Sclerosis 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723148v1?rss=1
</link>
<description><![CDATA[
Systemic sclerosis (SSc) is an autoimmune connective tissue disease with pronounced sex differences: females are more frequently affected and males develop more severe skin fibrosis. The cellular mechanisms of this disparity remain unclear. Here we use single-cell transcriptomics of lesional, non-lesional, and healthy skin to define fibroblast states and sex-biased transcriptional programs during lesion development. We identify a sex-dependent divergence in SSc fibrotic regulation. Female fibroblasts exhibit heightened inflammatory signaling and canonical TGF-β-driven extracellular matrix production, whereas male fibroblasts preferentially engage non-canonical TGF-β pathways, mechanotransduction, and MYC-associated stress programs. We further reveal that the fibrotic lesional environment shows sex differences: SFRP2DPP4 fibroblasts predominate in females and COL11A1/COCH in males. Our findings uncover cellular mechanisms underlying sex differences in SSc fibrosis, highlight opportunities for sex-informed therapeutic strategies and underscore the necessity of integrating biological sex into precision medicine frameworks to identify divergent molecular drivers of fibrotic disease.
]]></description>
<dc:creator><![CDATA[ Khantham, C., Rodriguez-Martin, I., Kerick, M., Villanueva-Martin, G., Callejas, J. L., Ortego-Centeno, N., Guillen-Del-Castillo, A., Simeon-Aznar, C. P., Ruiz-Villaverde, R., Andres-Leon, E., Martin, J., Acosta-Herrera, M. ]]></dc:creator>
<dc:date>2026-05-09</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723148</dc:identifier>
<dc:title><![CDATA[Sex-biased Fibroblast Subpopulations and Transcriptional Programs Reveal Mechanisms of Skin Lesion Development in Systemic Sclerosis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723092v1?rss=1">
<title>
<![CDATA[
PromptBio-Bench: Benchmarking LLM-based Bioinformatics Agents for End-to-End Data Analysis 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723092v1?rss=1
</link>
<description><![CDATA[
Large language model (LLM)-based agents hold transformative potential for automating bioinformatics workflows; however, systematic evaluations of their capabilities remain limited, hindering a clear assessment of their readiness for real-world application. We introduce PromptBio-Bench, a comprehensive evaluation suite of 194 expert-curated tasks spanning bioinformatics and data science at varied difficulty levels, and an evaluation framework for structured file comparison and scoring against expert reference answers. Benchmarking three state-of-the-art agents revealed that Biomni and ToolsGenie achieved comparable performance, and accuracy declined markedly at higher difficulty levels across all agents. As foundation models and agent frameworks continue to evolve, PromptBio-Bench provides a valuable benchmark infrastructure for the community to systematically track the progress of agentic bioinformatics.
]]></description>
<dc:creator><![CDATA[ Guo, W., Zhang, M., Han, B., Ma, Y., Leng, Y., Hebbar, S., Zhou, X., Gu, W., Yang, X., Dhar, S. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723092</dc:identifier>
<dc:title><![CDATA[PromptBio-Bench: Benchmarking LLM-based Bioinformatics Agents for End-to-End Data Analysis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.721805v1?rss=1">
<title>
<![CDATA[
Structural bias in machine learning-guided peptide design 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.721805v1?rss=1
</link>
<description><![CDATA[
Machine learning continues to accelerate peptide and protein design through the rapid prediction and generation of sequences with desired characteristics. Many applications focus on predicting properties, functions, and structures, as well as generating point mutations and de novo designs. Nevertheless, many models prove less generalizable than initially claimed. Most predictors and generators are trained on sequence datasets, where imbalances can be addressed during preprocessing. In contrast, structural bias, a subtype of algorithmic bias arising from uneven representation of structural classes in training datasets, and the limitations of early protein structure predictors have frequently gone undetected and uncorrected. The recent surge in powerful protein structure prediction tools, such as the AlphaFold and RosettaFold series and their variants, now presents opportunities to mitigate this issue. We hypothesize that such structural sampling biases influence the downstream performance of ML models. Using antimicrobial peptides as a case study, we audited the structural biases in 16 state-of-the-art predictors for antimicrobial activity and tested whether structural information constrains their predictions. Our analysis revealed that models trained explicitly on sequence data still produce predictions biased by uneven fold representation and data leakage. These findings highlight the importance of integrating balanced structural data or implementing bias-mitigating strategies to develop structure-agnostic models that maximize bioactive protein discovery and multi-objective optimization.
]]></description>
<dc:creator><![CDATA[ Aldas-Bulos, V. D., Plisson, F. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.721805</dc:identifier>
<dc:title><![CDATA[Structural bias in machine learning-guided peptide design]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.722404v1?rss=1">
<title>
<![CDATA[
Open-Rosalind: Tool-First Biomedical LLM Agents with Process-Aware Benchmarking 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.722404v1?rss=1
</link>
<description><![CDATA[
Large language models are increasingly used as scientific agents, yet the flexibility that benefits general-purpose agents can conflict with the accountability required in biomedical research. We study whether biomedical agents can be organized around auditable constraints rather than unconstrained autonomy. We present Open-Rosalind, a tool-first bio-agent system designed around four operational principles: evidence-grounded outputs, trace completeness, workflow-constrained execution, and explicit tool mediation for factual claims. To evaluate these principles, we introduce Open-Rosalind BioBench, a process-aware benchmark that measures not only task accuracy but also tool correctness, citation presence, trace completeness, and failure rate. On a strict in-house benchmark, the reference pipeline achieves 81.4% accuracy with complete execution traces. In multi-model ablations and paired replications, removing tools reduces accuracy by 19.3 to 26.4 percentage points, indicating that tool-first execution is the strongest and most stable contributor to performance. Constrained workflows also reduce lower-tail failures for models that are weak at free-form tool use. However, an author-independent 30-task hold-out initially revealed severe external-validity collapse on the deployment model. After diagnosing five routing and normalization failures and applying targeted fixes, hold-out accuracy improved from 17.8% to 53.3%, and the most concerning negative comparison against a no-tool baseline disappeared. Taken together, these results frame Open-Rosalind as an empirical study of auditable biomedical agents, rather than as a claim that protocol constraints alone guarantee superior performance.
]]></description>
<dc:creator><![CDATA[ Wang, L. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.722404</dc:identifier>
<dc:title><![CDATA[Open-Rosalind: Tool-First Biomedical LLM Agents with Process-Aware Benchmarking]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.06.723370v1?rss=1">
<title>
<![CDATA[
vartracker: an end-to-end tool for pathogen longitudinal variant analysis and visualisation 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.06.723370v1?rss=1
</link>
<description><![CDATA[
Longitudinal sequencing can reveal fine-grained pathogen evolution during acute and chronic infections and inform public health responses. However, integrating ordered pathogen genomic data into a coherent evolutionary and clinical framework can be tedious and error-prone. We present vartracker, an open-source tool for longitudinal pathogen variant analysis and visualisation. Given an ordered sample manifest, vartracker supports three entry points: raw sequence reads, reference-aligned BAM files, or user-supplied VCF and coverage inputs. Raw-read and BAM inputs are processed through an integrated Snakemake workflow, whereas VCF mode starts from precomputed files. Variants are normalised and annotated relative to a reference genome, tracked across timepoints, and classified as original or newly emerging and as transient or persistent. Inferred amino acid changes are reported, and for SARS-CoV-2 analyses, relevant published literature for key mutations can be automatically linked through a functional database. vartracker outputs a schema-documented results table, provenance metadata for reproducibility, publication-quality static figures, and an interactive heatmap for data exploration. Although packaged with SARS-CoV-2 reference assets and initially developed for SARS-CoV-2 datasets, vartracker is pathogen-agnostic when appropriate reference data are supplied. We demonstrate its utility using SARS-CoV-2 and respiratory syncytial virus A (RSV-A) datasets. vartracker is freely available through GitHub, PyPI and Bioconda.
]]></description>
<dc:creator><![CDATA[ Foster, C. S. P., Rawlinson, W. D. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.06.723370</dc:identifier>
<dc:title><![CDATA[vartracker: an end-to-end tool for pathogen longitudinal variant analysis and visualisation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723027v1?rss=1">
<title>
<![CDATA[
BART-spatial unravels biologically significant transcriptional regulators from spatial omics data 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723027v1?rss=1
</link>
<description><![CDATA[
Transcriptional regulators (TRs) are crucial determinants of cell fate, activating or repressing lineage-specific genes and integrating environmental signals with intrinsic networks. Identifying functional TRs is essential for understanding development, tissue organization, and disease. Emerging spatial transcriptomics and epigenomics technologies now map genomic features at near-single-cell resolution while preserving each cell's physical location and microenvironment, both of which influence TR activity. Despite these advances, identifying active TRs in spatial data remains challenging because TR expression is often low and TR activity frequently does not correlate directly with mRNA levels. Moreover, existing tools, designed mainly for non-spatial single-cell data, overlook spatial heterogeneity. To bridge this gap, we developed BART-spatial (Binding Analysis for Regulation of Transcription for spatial omics), a computational method to infer functional TRs from spatial omics data. BART-spatial integrates spatial variability and pseudo-temporal information with publicly available TR binding profiles. Applied to multiple spatial datasets from diverse platforms, including 10X Visium, Visium HD, Atera, and spatial RNA-ATAC-seq, BART-spatial consistently outperforms existing methods, identifying stage-specific TRs and revealing regulators undetectable by expression alone. Its compatibility with spatial epigenomics data further strengthens its utility and enables cross-validation. Overall, BART-spatial provides a powerful and robust tool for decoding spatially resolved gene regulatory programs.
]]></description>
<dc:creator><![CDATA[ Wang, J., Zhang, H., Wang, Z., Zang, C. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723027</dc:identifier>
<dc:title><![CDATA[BART-spatial unravels biologically significant transcriptional regulators from spatial omics data]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723040v1?rss=1">
<title>
<![CDATA[
RAPID: an interactive R/Shiny platform for end-to-end 16S rRNA and ITS amplicon sequence analysis using DADA2 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723040v1?rss=1
</link>
<description><![CDATA[
Motivation: Amplicon sequencing of 16S rRNA and internal transcribed spacer (ITS) gene regions is the most widely used approach for characterizing bacterial and fungal communities, respectively. The DADA2 pipeline has become a standard for inferring amplicon sequence variants (ASVs), offering single-nucleotide resolution over traditional OTU clustering. However, executing the full DADA2 workflow requires proficiency in R programming and manual coordination of multiple sequential steps, presenting a substantial barrier for researchers in clinical, environmental, and agricultural sciences who lack computational training. Results: We present RAPID (R-based Amplicon Pipeline for Interactive DADA2), a pair of R/Shiny applications providing complete graphical user interfaces for 16S rRNA and ITS amplicon sequence analysis. The 16S application implements a 10-step guided workflow from raw paired-end FASTQ files through quality filtering, error learning, dereplication, paired-read merging, chimera removal, taxonomy assignment (SILVA), phyloseq construction with data transformation (rarefaction, relative abundance, or CLR), interactive visualization (rarefaction curves, alpha diversity, NMDS, PCoA, taxonomic abundance), PERMANOVA, and ANCOM-BC2 differential abundance analysis. The ITS application extends this to an 11-step workflow, adding an automated primer removal step using cutadapt with support for multiple primers and length-variable amplicons, and uses the UNITE database for fungal taxonomy. Both applications feature asynchronous background processing, session persistence, real-time progress monitoring, publication-ready figure export, and comprehensive result downloads. Availability: RAPID is freely available at https://github.com/beantkapoor786/RAPID. Both applications can be installed locally on any system with R (version 4.0 or higher) and run as local web applications accessible through a standard browser.
Keywords: 16S rRNA, ITS, amplicon sequencing, DADA2, microbiome, mycobiome, graphical user interface, Shiny, phyloseq, ASV, PERMANOVA, ANCOM-BC2
]]></description>
<dc:creator><![CDATA[ Kapoor, B., Cregger, M. A., Ranjan, P. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723040</dc:identifier>
<dc:title><![CDATA[RAPID: an interactive R/Shiny platform for end-to-end 16S rRNA and ITS amplicon sequence analysis using DADA2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.723010v1?rss=1">
<title>
<![CDATA[
STARMAP: A 3D-informed framework for mapping functional regions in proteins to regulatory and cellular phenotypes 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.723010v1?rss=1
</link>
<description><![CDATA[
Artificial intelligence (AI) has transformed biology by revealing patterns in large-scale datasets and predicting regulatory relationships. Yet even the most advanced models often fail to distinguish biologically meaningful mechanisms from mere statistical associations. This limitation arises not from algorithmic capacity but from the lack of mechanistically grounded input features. Our structure-informed framework, Structure-based Topological Analysis of Regulatory and Molecular Activity Patterns (STARMAP), embeds protein three-dimensional structure and population-scale functional genomics data into a unified representation for mechanistic inference. By mapping over 1.5 million naturally occurring variants across ~1,700 cancer cell lines onto protein structures, STARMAP identified spatial clusters of variation associated with shifts in transcriptional regulatory networks and drug response phenotypes. This approach transforms natural genetic variation into a large-scale, structure-informed screen, enabling systematic discovery of regulatory relationships across the proteome and providing interpretable and testable models of cellular regulation.
]]></description>
<dc:creator><![CDATA[ Shukla, K., Castro, J., Cheng, D., Holley, L., Brunk, E. C. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.723010</dc:identifier>
<dc:title><![CDATA[STARMAP: A 3D-informed framework for mapping functional regions in proteins to regulatory and cellular phenotypes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.722940v1?rss=1">
<title>
<![CDATA[
TopoFuseNet: Hierarchical Graph Representation Learning with Multi-Scale Topological Features for Accurate Drug Synergy Prediction 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.722940v1?rss=1
</link>
<description><![CDATA[
Accurate prediction of drug synergy is paramount for developing effective combination therapies and advancing personalized medicine. Although methods based on graph neural networks (GNNs) have become a prevalent approach, they often treat molecules as flat graphs of connected atoms, thus overlooking their inherent hierarchical structure (i.e., atoms forming functional groups) and the critical topological information that governs molecular interactions. To address this limitation, we introduce TopoFuseNet, a novel hierarchical graph representation learning framework that integrates multi-scale topological features. The core innovations of TopoFuseNet include: 1) The first-ever application of "Group Centrality" from network science to cheminformatics, enabling the identification and quantification of functional groups crucial to drug activity; 2) A systematic, multi-path strategy to seamlessly integrate node-level (atom) and group-level (functional group) topological features into a Graph Attention Network (GAT) via feature augmentation, attention biasing, and hierarchical pooling; 3) A Differential Transformer module to deeply fuse multi-modal features learned from sequences, fingerprints, and our proposed hierarchical graph representations.
]]></description>
<dc:creator><![CDATA[ Wang, Q., Shi, X. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.722940</dc:identifier>
<dc:title><![CDATA[TopoFuseNet: Hierarchical Graph Representation Learning with Multi-Scale Topological Features for Accurate Drug Synergy Prediction]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.05.05.722888v1?rss=1">
<title>
<![CDATA[
A Differentiable dFBA Simulator for Scalable Bayesian Inference over Microbial Metabolic Models 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.05.05.722888v1?rss=1
</link>
<description><![CDATA[
Medium optimisation for bioprocess design remains challenging and costly: fermentation recipes typically contain ten or more components, the design space expands combinatorially as ingredients are added, and each batch experiment requires over 24 hours. High-throughput 96-well plate screening can reduce experimental cost, but extracting actionable predictions from growth curves requires a mechanistic model that links medium composition to cellular metabolism. In this paper, we present a differentiable simulator for dynamic flux balance analysis (dFBA) that enables scalable Bayesian inference over microbial metabolic models. A distinguishing feature is that inference is driven entirely by OD600 measurements, a simple optical proxy for biomass, without substrate or product assays; internal fluxes, substrate consumption, and secreted metabolite profiles are recovered as latent variables constrained by the metabolic network stoichiometry. We resolve the core differentiability barrier of classical dFBA by reformulating the per-step linear or quadratic programme (LP/QP) as a smooth continuous ODE (the Relaxed Interior-Point ODE, R-iODE), establishing the mathematical framework for end-to-end gradient propagation through long fermentation trajectories in JAX; full gradient validation is ongoing. The result is a framework for principled inference over thousands of batch fermentations, providing a path toward model-guided medium design, cross-strain parameter transfer, and scale-up prediction from plate data.
]]></description>
<dc:creator><![CDATA[ Diederen, T., Merzbacher, C., Patz, M. ]]></dc:creator>
<dc:date>2026-05-08</dc:date>
<dc:identifier>doi:10.64898/2026.05.05.722888</dc:identifier>
<dc:title><![CDATA[A Differentiable dFBA Simulator for Scalable Bayesian Inference over Microbial Metabolic Models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
