bioRxiv Subject Collection: Genomics Bioinformatics

https://biorxiv.org
This feed contains articles for bioRxiv Subject Collection "Genomics Bioinformatics".

A community machine learning challenge to predict the effects of gene perturbations on T cell differentiation for cancer immunotherapy
https://www.biorxiv.org/content/10.64898/2026.05.21.726863v1?rss=1
2026-05-22 | doi:10.64898/2026.05.21.726863 | Cold Spring Harbor Laboratory

BPabZIP, a new bZIP protein motif that promotes binding near, and displacement of, nucleosomes
https://www.biorxiv.org/content/10.64898/2026.05.20.725981v1?rss=1
2026-05-22 | doi:10.64898/2026.05.20.725981 | Cold Spring Harbor Laboratory

Min-frame transformation enables more sensitive viral genome alignment
https://www.biorxiv.org/content/10.64898/2026.05.20.726535v1?rss=1
2026-05-22 | doi:10.64898/2026.05.20.726535 | Cold Spring Harbor Laboratory

RANKOR: Direct Drug Prioritization from Bulk and Single-Cell Transcriptomic Signatures
https://www.biorxiv.org/content/10.64898/2026.05.20.726471v1?rss=1
2026-05-21 | doi:10.64898/2026.05.20.726471 | Cold Spring Harbor Laboratory

Benchmarking full-length ITS metabarcoding across Illumina 2x500, PacBio, and Oxford Nanopore sequencing using mock and soil communities
https://www.biorxiv.org/content/10.64898/2026.05.20.726443v1?rss=1
500 bases) offer better taxonomic resolution than shorter ones. Still, the choice of sequencing platforms and bioinformatics pipelines may strongly affect inferred diversity due to various technical biases.
We assessed the relative performance of Illumina MiSeq i100 (2x500 paired-end), PacBio Revio, and Oxford Nanopore MinION sequencing, together with associated bioinformatics pipelines, using full-length ITS amplicon datasets from a 103-species mock community and 45 composite soil samples. Despite numerous low-quality reads, PacBio yielded the lowest overall error rate and the highest number of taxa. Illumina showed the highest proportion of chimeric and index-switched reads, along with a strong bias towards shorter amplicons. MinION data analysed using PRONAME and Minovar (a bioinformatics pipeline presented here) had the largest proportion of low-quality data, and rare taxa were lost during the filtering and read-polishing steps. Although Minovar enabled amplicon sequence variant (ASV)-level precision for common taxa, we recommend clustering ASVs into OTUs. For PacBio, standard filtering approaches outperformed the ASV approach because they retained rare taxa. For Illumina, a stringent ASV approach or removal of rare OTUs would limit artefacts. Across all platforms, excess PCR cycles promoted chimeric and low-quality reads and reduced the quantitative accuracy of biodiversity assessments. Although effect sizes differed moderately, all analytical approaches supported the conclusion that sampling design determines how soil biodiversity responses to land use are perceived. For biodiversity surveys based on full-length ITS metabarcoding, we recommend PacBio sequencing with standard, non-ASV pipelines.
2026-05-21 | doi:10.64898/2026.05.20.726443 | Cold Spring Harbor Laboratory

MolCodon: A Codon-Based Molecular Language for Interpretable Structural Representation and Similarity Search
https://www.biorxiv.org/content/10.64898/2026.05.20.726468v1?rss=1
2026-05-21 | doi:10.64898/2026.05.20.726468 | Cold Spring Harbor Laboratory

A cross-tissue POSTN+ fibroblast atlas links periodontal, tumor, and fibrotic stromal niches
https://www.biorxiv.org/content/10.64898/2026.05.20.726414v1?rss=1
0.5 within an atlas-integrated Leiden cluster, combined with per-cell POSTN > 0) identified 11,451 POSTN+ cells (20.2% of the atlas) recurring across all six contexts at frequencies ranging from 6.2% (periodontal ligament) to 55.1% (liver fibrosis). Within-fibroblast differential expression yielded a 102-gene shared core program (collagen biosynthesis, ECM crosslinkers, and matricellular markers including POSTN, SPARC, BGN, FN1, MMP2, and CTHRC1), interpreted as POSTN-specific transcriptional amplification of an activated ECM-remodelling module. KLF4, hypothesized a priori as a POSTN+ co-marker, was upregulated in only one of six contexts, consistent with its role as a quiescence brake released during activation [3]. Three pre-registered sensitivity analyses (Harmony parameter, three alternative definitions, dataset exclusion) and an independent Puram-2017 OSCC cohort (1,422 fibroblasts; 101/102 core genes recovered; primary vs lymph-node metastasis Mann-Whitney p = 0.005) support the robustness of these findings across integration parameters, definitions, dataset inclusion, and platform.
2026-05-21 | doi:10.64898/2026.05.20.726414 | Cold Spring Harbor Laboratory

Joint enzyme-reaction retrieval and catalytic optima prediction via multimodal fusion
https://www.biorxiv.org/content/10.64898/2026.05.19.726405v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726405 | Cold Spring Harbor Laboratory

MirMachine 2: a scalable, evolutionarily informed pipeline for microRNA annotation and comparative genomics across thousands of animal genomes
https://www.biorxiv.org/content/10.64898/2026.05.19.726197v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726197 | Cold Spring Harbor Laboratory

Transcriptomics of cold stress and recovery reveal strongly tissue-specific responses
https://www.biorxiv.org/content/10.64898/2026.05.19.725261v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.725261 | Cold Spring Harbor Laboratory

NanoCortex: A Unified Agentic System for Nanopore Sequencing Analysis
https://www.biorxiv.org/content/10.64898/2026.05.19.726254v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726254 | Cold Spring Harbor Laboratory

A Plasmodium falciparum Pangenome Resource to Drive Structural Variant Discovery and to assist Malaria Control
https://www.biorxiv.org/content/10.64898/2026.05.19.726271v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726271 | Cold Spring Harbor Laboratory

A unified framework for batch correction and missing data handling in large-scale and single-cell mass spectrometry proteomics
https://www.biorxiv.org/content/10.64898/2026.05.19.726178v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726178 | Cold Spring Harbor Laboratory

Antimicrobial peptide databases and prediction tools: Toward a standard evaluation framework
https://www.biorxiv.org/content/10.64898/2026.05.19.726290v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726290 | Cold Spring Harbor Laboratory

zFISHer: Automated 3D Registration, Detection, and Colocalization with Interactive Curation for Sequential Multiplexed FISH
https://www.biorxiv.org/content/10.64898/2026.05.19.726314v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726314 | Cold Spring Harbor Laboratory

ParaDISM: Precise mapping of short reads to genes with highly homologous regions
https://www.biorxiv.org/content/10.64898/2026.05.19.726275v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726275 | Cold Spring Harbor Laboratory

Comparative somatic genomics reveals divergent development of cell lineages across scleractinian corals
https://www.biorxiv.org/content/10.64898/2026.05.19.726040v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726040 | Cold Spring Harbor Laboratory

Efficient and Robust Genomic DNA Isolation and Next-Generation Sequencing Library Preparation from Recalcitrant Wild Grape Species
https://www.biorxiv.org/content/10.64898/2026.05.19.713680v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.713680 | Cold Spring Harbor Laboratory

Combinatorial pioneer transcription factor binding reinforces bivalent epigenetic states to preserve lineage fidelity
https://www.biorxiv.org/content/10.64898/2026.05.19.726322v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726322 | Cold Spring Harbor Laboratory

Differential Gene Expression in the Tropical House Cricket and Its Iridovirus in Healthy versus Diseased Specimens
https://www.biorxiv.org/content/10.64898/2026.05.19.726264v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726264 | Cold Spring Harbor Laboratory

Structural Pockets and Interacting RNA-Associated Ligands (SPIRAL): A DSSR-enabled Meta-Analysis of RNA-Small Molecule Recognition
https://www.biorxiv.org/content/10.64898/2026.05.19.726393v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726393 | Cold Spring Harbor Laboratory

Multi-layer transcriptomic characterization of age-related immune dynamics
https://www.biorxiv.org/content/10.64898/2026.05.19.726397v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726397 | Cold Spring Harbor Laboratory

S-IGTD: supervised tabular-to-image topology learning via between-group correlation for multiclass classification of biological data
https://www.biorxiv.org/content/10.64898/2026.05.19.726105v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726105 | Cold Spring Harbor Laboratory

A framework for peptide identification on commercial nanopore sequencing platforms
https://www.biorxiv.org/content/10.64898/2026.05.19.726067v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726067 | Cold Spring Harbor Laboratory

Spectral Prompting: Unsupervised Recovery of Human Hair Follicle Cell-Type and Multiscale Systems Architecture from Bulk and Single-Cell RNA-Seq Datasets via Single-Gene Seeded Spectral Unfolding
https://www.biorxiv.org/content/10.64898/2026.05.19.726151v1?rss=1
1 manifold constructions. With this approach, I recover previously unresolved gene expression programmes from bulk data, including, but not limited to, epithelial hair follicle stem cell (eHFSC), hair shaft, dermal papilla, and endothelial gene expression signatures. When KRT15, a human anagen bulge eHFSC and progenitor marker, was queried, raw output from individual spectral prompts during testing recovered known eHFSC-associated genes, including LGR5, LHX2, and CXCL14, and identified new candidate human eHFSC- and progenitor-associated markers, such as RGMA and MUCL1, which were validated in situ. Finally, I briefly demonstrate that the technique can also be applied to single-cell data (GSE129611): a KRT15 gene prompt on a combined expression matrix mapped to a KRT15+/CXCL14+/LHX2+/DIO2+/SFRP1+ cell population (31 of 6,000 cells) without relying on standard clustering tools.
Building on this foundation, the method will next be developed to study how the latent gene expression space shifts following perturbation or pathology.
2026-05-21 | doi:10.64898/2026.05.19.726151 | Cold Spring Harbor Laboratory

Heterogeneity-driven adaptive scale graph learning for subcellular spatial transcriptomics
https://www.biorxiv.org/content/10.64898/2026.05.19.726162v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726162 | Cold Spring Harbor Laboratory

BioRAG-DRAG: A Multimodal Biological Retrieval Layer for Local-First Biomedical Agents
https://www.biorxiv.org/content/10.64898/2026.05.19.726174v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726174 | Cold Spring Harbor Laboratory

A phylogeny-guided framework for decoding mechanisms of human endogenous retrovirus regulation in health and disease
https://www.biorxiv.org/content/10.64898/2026.05.19.726217v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726217 | Cold Spring Harbor Laboratory

Lifestyles of Gypsy-family transposons shape their regulatory mechanisms
https://www.biorxiv.org/content/10.64898/2026.05.19.726053v1?rss=1
2026-05-21 | doi:10.64898/2026.05.19.726053 | Cold Spring Harbor Laboratory

geneML: Gene annotation across diverse fungal species using deep learning
https://www.biorxiv.org/content/10.64898/2026.05.18.725946v1?rss=1
2026-05-21 | doi:10.64898/2026.05.18.725946 | Cold Spring Harbor Laboratory