<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<channel rdf:about="https://biorxiv.org">
<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
<title>bioRxiv Subject Collection: Genomics</title>
<link>https://biorxiv.org</link>
<description>
This feed contains articles for bioRxiv Subject Collection "Genomics"
</description>

<items>
<rdf:Seq>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.07.01.735920v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735536v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735631v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.07.02.735893v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735261v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735270v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735415v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735306v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.734743v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735570v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.734667v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735624v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.28.733875v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.07.01.729829v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735274v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735038v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.27.734930v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735218v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735170v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.735224v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.07.01.735800v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.29.730585v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.28.735079v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.26.734910v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.26.734884v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.26.733976v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.26.734379v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.28.735102v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.26.734692v1?rss=1"/>
<rdf:li rdf:resource="https://www.biorxiv.org/content/10.64898/2026.06.30.735607v1?rss=1"/>
</rdf:Seq>
</items>
<prism:eIssn/>
<prism:publicationName>bioRxiv</prism:publicationName>
<prism:issn/>

<image rdf:resource=""/>
</channel>
<image rdf:about="">
<title>bioRxiv</title>
<url>https://www.biorxiv.org/sites/default/files/bioRxiv_article.jpg</url>
<link>https://www.biorxiv.org</link>
</image>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.07.01.735920v1?rss=1">
<title>
<![CDATA[
Cohesin residence time gates 3D genome response to histone hyperacetylation 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.07.01.735920v1?rss=1
</link>
<description><![CDATA[
Cohesin-mediated loop extrusion and chromatin state-dependent compartmentalization are major drivers of three-dimensional (3D) genome organization. Although epigenomic perturbations are widely assumed to reshape chromatin architecture, the mechanisms that determine how changes in chromatin state are translated into structural reorganization remain poorly understood. Here, we identify cohesin residence time as a key regulator of the genome's architectural response to histone hyperacetylation induced by histone deacetylase inhibition (HDACi). Acute depletion of RAD21 or CTCF weakens chromatin loops but preserves HDACi-induced changes in compartmentalization, contact-scaling behavior, and loop density. In contrast, perturbation of cohesin loading or release produces opposing effects: NIPBL depletion sensitizes and amplifies architectural responses to HDACi, whereas WAPL loss renders the genome largely refractory to HDACi-induced remodeling, suppressing changes in compartments and loop density while stabilizing CTCF-anchored loops. These distinct architectural outcomes occur despite comparable levels of HDACi-induced histone hyperacetylation across genotypes, indicating that differential epigenomic input is not responsible for the observed effects. Together, our findings demonstrate that dynamic cohesin turnover, rather than cohesin chromatin association alone, governs whether epigenomic perturbations are converted into higher-order genome reorganization. These results establish cohesin residence time as a molecular gate linking chromatin state to 3D genome architecture and reveal a previously unrecognized principle underlying chromatin architecture plasticity.
]]></description>
<dc:creator><![CDATA[ Smith, R. G., Schiela, K. L., Wilson, H. M., Williams, R. A., Johnson, J., Cohen, C. B., Yueh, W.-T., Whitaker, A. M., Johnson, N., Kanemaki, M. T., Liu, Y. ]]></dc:creator>
<dc:date>2026-07-04</dc:date>
<dc:identifier>doi:10.64898/2026.07.01.735920</dc:identifier>
<dc:title><![CDATA[Cohesin residence time gates 3D genome response to histone hyperacetylation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735536v1?rss=1">
<title>
<![CDATA[
Accurate, comprehensive gene annotation and ortholog identification across thousands of vertebrate genomes with TOGA2 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735536v1?rss=1
</link>
<description><![CDATA[
Inferring orthologs and annotating coding genes remain central challenges in genomics, as evidenced by the growing gap between assembled and annotated genomes. TOGA (Tool to infer Orthologs from Genome Alignments) addresses this challenge by integrating gene annotation and orthology inference. Here, we present TOGA2, the next generation of TOGA, which substantially improves annotation completeness, accuracy, scalability, and orthology inference. TOGA2 leverages exon-level orthology and introduces an exon-wise annotation procedure that reduces memory usage 513-fold and runtime 6.1-fold. We show that human-trained deep learning models for splice site prediction generalize across vertebrates. Integrating these predictions enables robust handling of evolutionary changes in exon-intron structure, including splice site shifts, intron deletions, and exonization of introns. A new gene tree reconciliation step refines orthology inference, and UTR annotation improves gene model completeness. Across mammals, birds, turtles, and percomorph fishes, TOGA2 annotations generally achieve higher gene completeness than transcriptome-informed RefSeq annotations. TOGA2 identifies previously unannotated exons in mouse, assigns informative gene symbols, and annotates V(D)J segments of antigen receptors. TOGA2 scales to thousands of genomes, which we demonstrate by generating comprehensive comparative genomics resources for 2,162 vertebrate assemblies, including gene annotations, ortholog sets, gene losses and duplications, retrogene candidates, and outputs supporting downstream analyses. Together, TOGA2 provides a scalable and versatile framework for comparative genomics that bridges the genome annotation gap.
]]></description>
<dc:creator><![CDATA[ Malovichko, Y. V., Bein, B., Gonzales-Irribarren, A., Leushkin, E., Hilgers, L., Stephen, A., Yi, X., Albertini, M., Stadager, T., Zumpt, M., Hoppach, L., Goetz, F., Himstedt, N., Koch, L., VGP, Hiller, M. ]]></dc:creator>
<dc:date>2026-07-04</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735536</dc:identifier>
<dc:title><![CDATA[Accurate, comprehensive gene annotation and ortholog identification across thousands of vertebrate genomes with TOGA2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735631v1?rss=1">
<title>
<![CDATA[
Saturation-seq integrates single-cell saturation genome editing and RNA-seq to quantify NFE2L2 (NRF2) variant effects 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735631v1?rss=1
</link>
<description><![CDATA[
Interpreting the functional consequences of variants remains one of the central unsolved problems in genomics and clinical genetics. Compounding this, most existing approaches rely on reductive, one-dimensional proxies such as cell growth to score variant effects, which can be a poor substitute for the rich, multidimensional phenotyping that is ultimately needed to understand how variants alter biology. This is especially true for variants known to act through gain-of-function/neomorphic mechanisms. We developed Saturation-seq, a high-throughput platform that combines saturation genome editing with single-cell DNA and RNA profiling to systematically map variant effects. Using CRISPR-based editing in a barcoded haploid cell line, we install hundreds of variants directly into endogenous genomic loci, testing them in multiplex and preserving the native coding and regulatory context. Single-cell amplicon and transcriptome sequencing enables direct linkage of each genomic edit to its transcriptional impact. We apply Saturation-seq to comprehensively characterize 230 variants in the recurrently mutated N-terminal region of NFE2L2 (NRF2), a master regulator of oxidative stress and an oncogene mutated in lung cancer. We define variant function with disruption scores computed from misregulation of known NRF2 targets in single-cell transcriptomes; scores separate pathogenic/benign truthset variants with >90% accuracy and enabled interpretation of TCGA and TRACERx patient tumor data, as well as a rare NFE2L2 germline variant linked to a developmental syndrome. Thus, we establish a broadly applicable high-resolution single-cell variant-to-function platform with a rich phenotypic readout.
]]></description>
<dc:creator><![CDATA[ Strauss, M. E., Waters, A. J., Roberston, H., Brendler-Spaeth, T., Gontarczyk, A., Gupta, P., Kataria, S., Gitterman, D., Ntereke, T., Wells, L., Billington, J., Bassett, A., Cooper, S., Adams, D. J. ]]></dc:creator>
<dc:date>2026-07-04</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735631</dc:identifier>
<dc:title><![CDATA[Saturation-seq integrates single-cell saturation genome editing and RNA-seq to quantify NFE2L2 (NRF2) variant effects]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-04</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.07.02.735893v1?rss=1">
<title>
<![CDATA[
Long-read sequencing maps transposable element variation and its regulatory and epigenetic effects in the human brain 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.07.02.735893v1?rss=1
</link>
<description><![CDATA[
Transposable elements (TEs) are mobile DNA sequences that shape genome architecture and gene regulation, yet their roles in the human brain remain largely unresolved. Short-read sequencing lacks the resolution to accurately map TE insertions, detect associated structural variants, and resolve highly repetitive regions. Here, we leverage long-read whole-genome sequencing to profile germline TE insertions in postmortem brain tissue from two ancestrally diverse cohorts: the North American Brain Expression Consortium (NABEC; European ancestry, n = 205) and the Human Brain Collection Core (HBCC; African and African-admixed ancestry, n = 146). We identified 2,842 and 1,660 high-confidence non-reference insertions in HBCC and NABEC, respectively, spanning Alu, LINE-1, and SVA elements. We further characterized complex short tandem repeat and variable number tandem repeat variation within reference SVA and Alu loci. Reference TEs were also found to mediate complex structural variants at loci implicated in brain development and neurodegenerative disease, with several showing ancestry-specific patterns. Integration of bulk RNA-sequencing data identified TE expression quantitative trait loci, including insertions that modulate neuronal gene expression. Single-nucleus RNA sequencing revealed cell-type-specific effects of TE regulation across cortical populations. Long-read methylation profiling further demonstrated age-associated epigenetic regulation of both reference and non-reference Alu elements. As a community resource, we release a catalog of TE insertions, allele frequencies, and ancestry-specific distributions to enable future functional and disease-focused investigations. Together, these findings highlight the widespread regulatory and epigenetic influence of TEs in the human brain and establish long-read sequencing as a powerful approach for uncovering cell-type- and population-specific TE dynamics.
]]></description>
<dc:creator><![CDATA[ Ayuketah, A., Meredith, M., Groza, C., Moller, A., Daida, K., Catching, A., Weller, C., Kouam, C., Paulin, L., Malik, L., Baker, B., Hu, F., Bromberek, S., Jerez, P. A., Paquette, K., Izydorczyk, M., Gu, B., Chaisson, M. J. P., Middlehurst, B., Bubb, V. J., Quinn, J. P., Price, E., Singleton, A. B., Jain, M., Blauwendraat, C., Nalls, M. A., Cookson, M. R., Reed, X., Sedlazeck, F. J., Goubert, C., Billingsley, K. J. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.07.02.735893</dc:identifier>
<dc:title><![CDATA[Long-read sequencing maps transposable element variation and its regulatory and epigenetic effects in the human brain]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735261v1?rss=1">
<title>
<![CDATA[
Spatial and functional mapping of the human pancreas reveals endocrine and exocrine cell states in health and metabolic disease 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735261v1?rss=1
</link>
<description><![CDATA[
The human pancreas contains diverse endocrine and exocrine cell populations whose spatial organization is essential for tissue physiology. While single-cell and spatial transcriptomics revealed molecular heterogeneity across pancreatic cell types, linking these states to physiological activity in situ has remained challenging. Here, we combined single-nucleus RNA sequencing, spatial transcriptomics, and functional calcium imaging across pancreatic samples in health and metabolic disease. We identified heterogeneous endocrine and exocrine cell states associated with obesity and diabetes, including inflammatory remodeling of acinar and ductal populations. To directly couple tissue physiology with molecular state, we developed Slice-seq, which integrates calcium imaging with spatial transcriptomics in acute pancreatic slices. Slice-seq linked local endocrine composition and transcriptional programs with β cell activity and identified extra-islet β cells with reduced glucose responsiveness and mitochondrial oxidative metabolism. Together, our study provides a framework for linking pancreatic cell states to tissue organization and physiological activity in health and disease.
]]></description>
<dc:creator><![CDATA[ Xie, Y., Postic, S., Pereyra, D., Ferrara, R., Mullins, A., Pfabe, J., Dalman, M., Hallin, K., Ingvast, S., Smith, N., Suleiman, M., Tesi, M., Gyoeri, G., Dingfelder, J., Gironella-Torrent, M., Starlinger, P. P., Marselli, L., Marchetti, P., MacDonald, P. E., Rupnik, M. S., Korsgren, O., Camunas-Soler, J. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735261</dc:identifier>
<dc:title><![CDATA[Spatial and functional mapping of the human pancreas reveals endocrine and exocrine cell states in health and metabolic disease]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735270v1?rss=1">
<title>
<![CDATA[
Long-range regulatory target prediction reveals shared genetic background across ulcerative colitis, Crohn's disease, primary sclerosing cholangitis and ankylosing spondylitis 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735270v1?rss=1
</link>
<description><![CDATA[
Common variants detected by genome-wide association studies (GWAS) provide a wealth of knowledge about the genetic component of individual traits and diseases. Elucidating the molecular mechanisms behind these variants, the vast majority of which are non-coding, remains a largely unsolved task, especially when distal and pleiotropic interactions between gene promoters and the regulatory elements in which these variants occur are taken into account. Focusing on four diseases with immune-mediated mechanisms, namely ulcerative colitis, Crohn's disease, primary sclerosing cholangitis and ankylosing spondylitis, we demonstrate the utility of the targPred tool, which predicts the genes targeted by regulatory variants. We show that, by taking evolutionary and comparative genomic data into account, previously unobserved mechanistic trends (the platelet, vascular and sterol clusters) can be detected among the genes targeted by regulatory elements containing common variants. Some of these trends are shared between all four diseases, while others are specific to subsets of diseases, e.g. the two IBD phenotypes. We also identify a clinically relevant target, COG6, shared between IBD and PSC, as well as a range of other target genes missed by conventional SNP-to-gene assignment methods.
]]></description>
<dc:creator><![CDATA[ Dulcic, D., Mandic, K., Hrsak, D., Baresic, A. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735270</dc:identifier>
<dc:title><![CDATA[Long-range regulatory target prediction reveals shared genetic background across ulcerative colitis, Crohn's disease, primary sclerosing cholangitis and ankylosing spondylitis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735415v1?rss=1">
<title>
<![CDATA[
From junk to deleterious: Natural subtelomeric repeat amplifications impact fitness and cellular phenotypes in yeast 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735415v1?rss=1
</link>
<description><![CDATA[
Eukaryotic genomes exhibit astounding levels of complexity. Much of this complexity resides in repetitive DNA thought to evolve neutrally, meaning that its impact on fitness is so small that natural selection cannot act efficiently to favor or purge it. Yet, repetitive DNA greatly facilitates the generation of structural variants (SVs), which fuel evolution with both adaptive and deleterious variation. How SVs involving initially neutral repetitive DNA can bring new evolutionarily meaningful impacts is not well understood. This is in part because finding and interpreting molecular signatures of these transitions using comparative genomics over long evolutionary timescales is challenging. Here, we document one such transition over a microevolutionary timescale using budding yeast population genomics. We characterize multiple massive amplifications of the Y' element, a highly polymorphic and dispensable subtelomeric tandem repeat. We uncover extreme structural diversity in Y' tandem amplifications among near-isogenic strains, and show that these amplifications bring a significant fitness cost. We further link Y' amplifications with transcriptome rewiring, heightened DNA replication stress sensitivity and DNA damage response activation. Together, our results support a model by which massive subtelomeric tandem amplification pushed a repetitive DNA family outside of effective neutrality to become deleterious.
]]></description>
<dc:creator><![CDATA[ Henault, M., Fogg, V., Heasley, L. R. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735415</dc:identifier>
<dc:title><![CDATA[From junk to deleterious: Natural subtelomeric repeat amplifications impact fitness and cellular phenotypes in yeast]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735306v1?rss=1">
<title>
<![CDATA[
Variable latency between the founder genetic event and rhabdoid tumor expansion 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735306v1?rss=1
</link>
<description><![CDATA[
Rhabdoid tumors are rare, highly aggressive pediatric cancers that affect very young children and have poor survival. They are characterized by the bi-allelic loss of SMARCB1 or SMARCA4, which is suspected to occur prenatally. However, their genomic evolution is not well understood. Here we assembled the largest cohort of whole-genome sequenced rhabdoid tumors to date, comprising 97 tumors from 88 children. We discovered that, in 42% of cases, the bi-allelic inactivation of the driver gene occurred via copy-number-neutral loss of heterozygosity (CN-LOH). We exploited these CN-LOH events and the steady accumulation of age-related mutations in the tumor genomes to estimate the age of donors at the time of the driver event and at the time of emergence of the clonal expansion. Across all cases with CN-LOH, the loss of the driver gene occurred very early during prenatal development. However, the clonal expansion that ultimately gave rise to the tumor occurred at different times during infancy, in some cases several years after the acquisition of the founder event. These results suggest that factors other than the genetic driver event are likely required to promote rhabdoid tumorigenesis.
]]></description>
<dc:creator><![CDATA[ Sanchez-Guixe, M., Cebria-Xart, A., Fabre, N., Rodriguez-Hernandez, C. J., Pinheiro-Santin, M., Lavarino, C., Drost, J., Van Boxtel, R., Lopez-Bigas, N., Avgustinova, A., Gonzalez-Perez, A. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735306</dc:identifier>
<dc:title><![CDATA[Variable latency between the founder genetic event and rhabdoid tumor expansion]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.734743v1?rss=1">
<title>
<![CDATA[
Systematic benchmarking of low-input whole exome sequencing workflows for longitudinal ctDNA profiling in pancreatic ductal adenocarcinoma 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.734743v1?rss=1
</link>
<description><![CDATA[
Whole exome sequencing (WES) of circulating tumour DNA (ctDNA) enables longitudinal monitoring of tumour dynamics, evolution and treatment response but remains technically challenging in low-input, low-shedding settings such as pancreatic ductal adenocarcinoma (PDAC). Here, we systematically compared three commercially available low-input WES workflows incorporating Agilent (V6, V8) and Qiagen exome capture designs using ultra-low input cfDNAs extracted from multiple matched longitudinal plasma samples from PDAC patients. Using predefined performance metrics including coverage, duplication rate and variant detection and additional metrics relevant for clinical genomic profiling in patient care, we show that all three workflows produced high-quality sequencing data, even from very low input cfDNA. Within the conditions tested here, the Agilent V8 workflow provided the most favourable balance of coverage uniformity, sequencing efficiency and hotspot coverage for low input, low tumour fraction cfDNA WES. These findings demonstrate that workflow design, including capture footprint, substantially influences ctDNA WES performance in low-input clinical contexts. These findings are particularly relevant in early stage and/or minimal residual disease settings, where tumour fractions are low and recovery of genomic information from limited-input samples is critical.
]]></description>
<dc:creator><![CDATA[ James, L. G., Thorn, G. J., Morel, C., PCRFTB, Kocher, H. M., Ross-Adams, H. E., Chelala, C. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.734743</dc:identifier>
<dc:title><![CDATA[Systematic benchmarking of low-input whole exome sequencing workflows for longitudinal ctDNA profiling in pancreatic ductal adenocarcinoma]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735570v1?rss=1">
<title>
<![CDATA[
Recombinogenic G-quadruplexes in the Newtonian DNA Sequence Space 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735570v1?rss=1
</link>
<description><![CDATA[
The universe of possible nucleotide sequences expands combinatorially with sequence length, vastly exceeding the fraction sampled by real genomes. Yet genomic sequences exhibit reproducible compositional symmetries and recurrent structural motifs, indicating that biological sequence space is shaped by strong organizing constraints. Here, we introduce an explicit framework for constructing and visualizing the complete sequence universe using the Newtonian polynomial for a four-letter alphabet, and for identifying biologically relevant subsets through the application of fundamental filters. Three filters of biological relevance are formulated: (i) the constraint that DNA predominantly exists as an antiparallel-stranded double helix, (ii) the second Chargaff parity rule, which enforces approximate strand symmetry in single-stranded sequence composition, and (iii) genome shadows, reflecting the imprint of concerted sequence changes. Successive application of these filters dramatically reduces the accessible sequence space and reveals distinct symmetry classes. Among these, mirror-symmetric sequences occupy a privileged position because they are invariant under strand reversal and therefore compatible with both antiparallel and parallel strand orientations. This dual compatibility enables such sequences to bridge otherwise disjoint structural subspaces of DNA. G-rich members of this class are shown to have a strong propensity to form G-quadruplex architectures that incorporate parallel-stranded domains while remaining compatible with duplex DNA. We propose that this structural versatility provides a mechanistic basis for the recurrent association of G-rich mirror-symmetric sequences with recombination hotspots and genome rearrangements. Together, these results establish a symmetry-based framework for understanding how combinatorial sequence space is filtered into biologically functional DNA motifs.
]]></description>
<dc:creator><![CDATA[ Kuryavyi, V. V. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735570</dc:identifier>
<dc:title><![CDATA[Recombinogenic G-quadruplexes in the Newtonian DNA Sequence Space]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.734667v1?rss=1">
<title>
<![CDATA[
glmmDMR reveals replicate-level methylation variance as a major determinant of false-positive DMR detection 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.734667v1?rss=1
</link>
<description><![CDATA[
Background: Accurate identification of differentially methylated regions (DMRs) is fundamental to epigenomic research but remains challenging due to biological variability among replicates, heterogeneous effect sizes, and the tendency of adjacent cytosines to share similar methylation states. Many existing methods aggregate methylation measurements before statistical testing or do not explicitly account for replicate-level variability, contributing to elevated false-positive rates. Results: We developed glmmDMR, a DMR detection framework that combines generalized linear mixed models with a seed-based strategy for reconstructing DMRs from locally high-confidence signals while explicitly modeling replicate-level variability. Using simulated datasets with known ground-truth DMRs, we demonstrate that false-positive detections are more strongly associated with methylation variance among biological replicates than with the magnitude of methylation differences between groups. glmmDMR achieved higher precision than existing approaches while maintaining competitive recall, particularly for subtle methylation differences. Site-level modeling with beta regression provided the strongest overall performance, and seed-based region construction reduced artificial DMR fragmentation, improving recovery of true DMR boundaries and producing more contiguous, biologically interpretable DMRs. Applied to Arabidopsis thaliana ddm1 methylomes and a rice DEMETER-LIKE DNA demethylase mutant (Osdml3a-1), glmmDMR identified biologically meaningful DMRs, revealing widespread TE-associated hypomethylation and subtle TE-family-specific hypermethylation. Conclusions: Replicate-level methylation variance is an important determinant of DMR detection performance, and explicitly modeling this variance improves discrimination of biologically meaningful methylation changes from high-variance signals. 
By combining variance-aware statistical modeling with seed-based region construction, glmmDMR provides a robust framework for identifying contiguous, biologically interpretable DMRs across diverse methylome datasets.
]]></description>
<dc:creator><![CDATA[ Daito, Y., Uechi, M., Kinoshita, T., Tonosaki, K. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.734667</dc:identifier>
<dc:title><![CDATA[glmmDMR reveals replicate-level methylation variance as a major determinant of false-positive DMR detection]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735624v1?rss=1">
<title>
<![CDATA[
Higher-order Architecture Shapes Concerted Evolution in a Y-linked repeat array 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735624v1?rss=1
</link>
<description><![CDATA[
The maintenance of functional repeat arrays on nonrecombining sex chromosomes presents an evolutionary paradox: tandem repeats are intrinsically unstable yet must preserve sequence identity and copy number to remain functional. The Y-linked Suppressor of Stellate (Su(Ste)) locus in Drosophila melanogaster is a large tandem array that produces piRNAs to silence the X-linked meiotic driver Stellate, but how such arrays are maintained remains unclear. Here, we reconstruct and compare repeat-resolved assemblies of the Su(Ste)/PCKR tandem array across three strains and show that the array is partitioned into discrete domains of elevated sequence identity. These domains exhibit an alternating pattern of similarity, in which nonadjacent regions are more similar to each other than to neighboring regions, and this organization is conserved across strains. Copy-number variation occurs primarily within specific domains, while overall array architecture remains stable. These results indicate that concerted evolution in the Su(Ste) array operates within structurally defined domains rather than uniformly across the array. The association of domain boundaries with inverted repeat elements suggests that higher-order structure constrains gene conversion, shaping both sequence homogenization and copy-number dynamics. In contrast, Y-linked rDNA arrays show uniform sequence similarity across long genomic distances, indicating a distinct mode of homogenization. Together, our findings demonstrate that gene conversion on nonrecombining chromosomes is structured by higher-order array architecture, providing a general framework for the maintenance of functional repeat arrays.
]]></description>
<dc:creator><![CDATA[ Delgado, A. A., Samano, A., Chakraborty, M. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735624</dc:identifier>
<dc:title><![CDATA[Higher-order Architecture Shapes Concerted Evolution in a Y-linked repeat array]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.28.733875v1?rss=1">
<title>
<![CDATA[
Genome-Wide Markers Predict Metribuzin Tolerance in Southern Soft Red Winter Wheat 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.28.733875v1?rss=1
</link>
<description><![CDATA[
Metribuzin is a versatile herbicide effective against various annual grasses and broadleaf weeds found in wheat fields. However, it can cause foliar damage to wheat, impacting plant health and yield. A clearer understanding of the genetic architecture associated with metribuzin tolerance is necessary to guide marker-based breeding strategies. This study evaluated 351 historic Gulf Atlantic Wheat Nursery (GAWN) wheat breeding lines representative of southern US soft red winter wheat (SRWW) germplasm. Field trials were conducted at Winnsboro (WN) and Baton Rouge (BR), Louisiana, in 2016 and 2017. Metribuzin was applied at specific growth stages, and tolerance was assessed based on visual foliar damage. Genomic data from 6,252 filtered single nucleotide polymorphism (SNP) markers were used to estimate narrow-sense heritability, conduct genome-wide association studies (GWAS), and assess genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP). Broad-sense heritability ranged from 0.54 to 0.69 within environments and reached 0.77 across environments, while narrow-sense heritability ranged from 0.35 to 0.47, indicating moderate additive genetic control. No SNP surpassed the significance threshold, but genomic prediction (GP) showed moderate to strong predictive ability (PA) across environments, with the highest accuracy (r = 0.62) observed between BR17 and WN17. These results indicate that metribuzin tolerance in SRWW is primarily controlled by multiple small-effect loci and that genomic selection (GS) provides a more effective breeding strategy than marker-assisted selection for improving tolerance in southern wheat germplasm.
]]></description>
<dc:creator><![CDATA[ Sellani, J., Anzueto, H., Arcenaux, K., Price, P. T., Brown-Guedira, G., Harrison, S., DeWitt, N. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.28.733875</dc:identifier>
<dc:title><![CDATA[Genome-Wide Markers Predict Metribuzin Tolerance in Southern Soft Red Winter Wheat]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.07.01.729829v1?rss=1">
<title>
<![CDATA[
Evo 2's Perception of Single Nucleotide Substitutions in the Genes of Two Plant Model Organisms 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.07.01.729829v1?rss=1
</link>
<description><![CDATA[
Although DNA Large Language Models (DNA-LLMs) offer a path to decoding genetic complexity, our ability to evaluate these models is constrained by our incomplete understanding of the very same genetic syntax and functional logic that these models are trained to learn. In this study we use single nucleotide substitutions that have or have not been observed in living organisms, to evaluate how the DNA-LLM Evo 2 interprets gene sequences from two plant model organisms, Arabidopsis thaliana and Oryza sativa japonica. Using perplexity as a measure of the model's confidence, we observe that alleles containing simulated substitutions are perceived, on average, as less likely than those observed in vivo. Although the size of the effect is modest, the effect is statistically significant and robust, suggesting that Evo 2 is aligned with our current understanding of evolutionary selective constraints. This approach is designed to be model-agnostic and species-agnostic and could serve as a generic framework for evaluating the performance of DNA-LLMs.
]]></description>
<dc:creator><![CDATA[ Mantegazza, O., Bertolini, L., Leoni, G., Colaiacovo, M., Petrillo, M., Bonfini, L., Savini, C., Ceresa, M., Zaoui, X. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.07.01.729829</dc:identifier>
<dc:title><![CDATA[Evo 2's Perception of Single Nucleotide Substitutions in the Genes of Two Plant Model Organisms]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735274v1?rss=1">
<title>
<![CDATA[
Comparative genomics reveals shared accessory regions between members of two Fusarium species complexes virulent on garden pea 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735274v1?rss=1
</link>
<description><![CDATA[
The contribution of accessory or conditionally dispensable chromosomes to host-specific virulence was first demonstrated in members of the Fusarium solani species complex (FSSC) that are pathogens of garden pea, Pisum sativum L. The phenomenon has since been shown to exist in many fungal plant pathogens, including the closely related F. oxysporum species complex (FOSC). Genome analysis of members of the FSSC and FOSC pathogenic on pea revealed a diverse size range of the accessory genome of these fungi. Despite ~65 million years of divergence, regions on a chromosome known to carry host-specific virulence factors for pea, including the cytochrome P450 pisatin demethylase (PDA) and other pea pathogenicity (PEP) genes, were present in all genomes of these pea pathogens. Genes directly involved in virulence on pea - PEP2, PDA, and PEP5 - were the most frequently clustered together. Transcriptome analysis of fungal mycelia treated with the pea phytoalexin pisatin identified 1,155 differentially expressed genes, many of which were involved in cellular stress responses. As wilt pathogens that invade host xylem, members of the FOSC encode more putative effectors than those in the FSSC, and several FOSC effectors were identified to confer race specificity. The conservation of part of the accessory genomes across two evolutionarily diverged species complexes suggests a common origin. Horizontal transfer of accessory chromosomes containing genetic loci involved in pathogenesis for garden pea offers a parsimonious explanation of the polyphyletic origin of host specificity.
]]></description>
<dc:creator><![CDATA[ Pokhrel, A., Haridas, S., Calhoun, S., Kuo, A., Lipzen, A., Riley, R., LaButti, K., Pangilinan, J., Andreopoulos, B., He, G., Yan, M., Barry, K., Ma, L.-J., Geiser, D. M., Freitag, M., Grigoriev, I. V., Coleman, J. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735274</dc:identifier>
<dc:title><![CDATA[Comparative genomics reveals shared accessory regions between members of two Fusarium species complexes virulent on garden pea]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735038v1?rss=1">
<title>
<![CDATA[
Ultra-accurate sequencing reveals an extreme transmission bottleneck in a deep-sea clam symbiosis 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735038v1?rss=1
</link>
<description><![CDATA[
Vertically transmitted symbionts experience progressive genome degradation driven by transmission bottlenecks each host generation that reduce genetic diversity and promote fixation of deleterious mutations. Direct estimates remain rare because inference requires scarce parent-offspring samples and sequencing sensitive enough to detect rare variants. Here, we investigate symbiont transmission bottlenecks in a vesicomyid clam by deeply sampling within-host endosymbiont genetic diversity using two ultra-accurate sequencing methods. Demographic modeling revealed an effective bottleneck size of approximately eight symbionts (95% CI: 1-17 genomes) per host generation. This estimate is sharply reduced relative to prior cytological estimates of bottleneck census size, with important implications for understanding the rate and dynamics of endosymbiont genome degradation.
]]></description>
<dc:creator><![CDATA[ Mirchandani, C., Pepper-Tunick, E., Gozashti, L., Russell, S., Corbett-Detig, R. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735038</dc:identifier>
<dc:title><![CDATA[Ultra-accurate sequencing reveals an extreme transmission bottleneck in a deep-sea clam symbiosis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.27.734930v1?rss=1">
<title>
<![CDATA[
From mountaintops to metacollections: using genomics to evaluate ex situ conservation collections. A case study from tropical montane cloud forest plants 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.27.734930v1?rss=1
</link>
<description><![CDATA[
A core aim of ex situ conservation is to represent wild genetic diversity in managed living collections. For the climate-threatened tropical montane cloud forest (TMCF) flora of northeast Australia, an ex situ metacollection of plants and seeds has been established by the Tropical Mountain Plant Science (TroMPS) project. In this study we used reduced-representation sequencing (DArTseq) of wild, herbarium, and ex situ material alongside provenance information for ten species, to pursue two central aims: to characterise landscape-scale genetic structure across species' ranges, and to evaluate how well the assembled metacollections represent that wild diversity. Analyses revealed consistent patterns of genetic differentiation among mountain top populations across multiple species, reflecting the isolating influence of lowland gaps between upland habitats, with the degree of differentiation varying among species. These results provide the first genetic baseline for Australian TMCF flora and reinforce the importance of treating individual mountain top populations as distinct units for conservation management. Additionally, the project provided valuable insights into the logistical challenges of coordinated multi-institutional collecting, informing strategies for metacollection design more broadly. Evaluation of the metacollection revealed both strengths and gaps in representation across species, providing an evidence base to refine the current holdings and guide future targeted collecting to strengthen their long-term conservation value.
]]></description>
<dc:creator><![CDATA[ Cascini, M., Simpson, L., Worboys, S., Worboys, W., Guja, L., Knapp, Z., Bredell, P., Percival, J., Rossetto, M., Crayn, D. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.27.734930</dc:identifier>
<dc:title><![CDATA[From mountaintops to metacollections: using genomics to evaluate ex situ conservation collections. A case study from tropical montane cloud forest plants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735218v1?rss=1">
<title>
<![CDATA[
A chromosome-level genome assembly of the Eurasian great grey owl, Strix nebulosa lapponica (Thunberg 1798) 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735218v1?rss=1
</link>
<description><![CDATA[
We present a chromosome-level genome assembly of a female great grey owl (Strix nebulosa lapponica). The assembly comprises two pseudo-haplotypes of 1554 Mb and 1242 Mb, with 83.2% and 91.4% scaffolded into 40 autosomal chromosomes, in addition to the W and Z sex chromosomes both placed in hap1. Assembly completeness is high (BUSCO 99.2% and 94.8%), with 18,493 and 17,279 annotated protein-coding genes for hap1 and hap2, respectively. This genome establishes a reference for investigating genetic variation and chromosome evolution in great grey owls. Compared with the previous S. nebulosa assembly, this assembly includes both sex chromosomes, separates regions that were previously collapsed, and resolves 82 chromosomes total. While larger chromosomes show broadly conserved synteny across owl assemblies, the recovery of additional conserved microchromosome-associated genes suggests that ONT reads improved resolution of the smallest chromosomes relative to HiFi-based assemblies.
]]></description>
<dc:creator><![CDATA[ Strand, M. A., Steindal, I. A. F., Ragnhildstveit, E., Solheim, R., Torresen, O. K., Skage, M., Ferrari, G., Tooming-Klunderud, A., Jakobsen, K. S. ]]></dc:creator>
<dc:date>2026-07-02</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735218</dc:identifier>
<dc:title><![CDATA[A chromosome-level genome assembly of the Eurasian great grey owl, Strix nebulosa lapponica (Thunberg 1798)]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735170v1?rss=1">
<title>
<![CDATA[
HiFi-ST: High-Fidelity Reconstruction of Continuous Spatial Transcriptomic Expression Fields via Conditional Neural Fields 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735170v1?rss=1
</link>
<description><![CDATA[
Spatial transcriptomics characterizes tissue-scale gene expression patterns, yet its observations are sparse discrete samples of an underlying continuous molecular field, leading to spatial aliasing and sub-resolution information loss. Existing methods usually formulate this task as spot-level point regression, making it difficult to capture both expression continuity and the regional nature of observation. Here, we propose HiFi-ST, a conditional neural field framework for continuous spatial transcriptomics modeling. HiFi-ST formulates spatial gene expression prediction as continuous expression field learning, models each spot as a regional observation over a finite support domain, approximates local integration through Monte Carlo sampling, and integrates multiscale tissue feature extraction with FiLM-based conditional modulation to improve modeling of complex spatial heterogeneity and consistency with the underlying measurement process. Systematic evaluation on three independent datasets (HER2+, cSCC, and Alex_NatGen) showed that HiFi-ST outperformed mclSTExp, BLEEP, THItoGene, His2ST, and HisToGene on key metrics. On HER2+, HiFi-ST achieved an average PCC improvement of 65.1% and an average MSE reduction of 40.9%; on cSCC, PCC improved by 10.2% and MSE decreased by 51.2%; on Alex_NatGen, PCC improved by 80.0% and MSE decreased by 16.3%. In addition, the learned multiscale tissue representations supported downstream spatial immunoanalysis, including assisted identification of candidate TLS regions. Overall, HiFi-ST provides a unified framework bridging discrete measurements and continuous expression field reconstruction for tumor microenvironment analysis and spatial immune structure characterization.
]]></description>
<dc:creator><![CDATA[ Li, H., Tang, L., Han, W., Yang, X., Chen, X. ]]></dc:creator>
<dc:date>2026-07-02</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735170</dc:identifier>
<dc:title><![CDATA[HiFi-ST: High-Fidelity Reconstruction of Continuous Spatial Transcriptomic Expression Fields via Conditional Neural Fields]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.735224v1?rss=1">
<title>
<![CDATA[
Scalable multi-group nonnegative spatial factorization for spatial genomics data with cell-type heterogeneity 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.735224v1?rss=1
</link>
<description><![CDATA[
Spatial transcriptomics (ST) technologies enable the study of gene expression within the spatial context of tissues, providing insights into tissue structure, cellular interactions, and disease progression. However, existing dimension reduction methods often overlook spatial information or struggle to distinguish spatial gene patterns from those driven by cell-type differences, limiting biological interpretability by convolving differences in gene expression patterns with differences in cell-type proportions. To address these challenges, we introduce the scalable multi-group nonnegative spatial factorization (smNSF), a computationally-tractable probabilistic framework that integrates spatial coordinates and cell-type labels into a unified matrix factorization model. By using multi-group Gaussian processes (MGGPs) as priors, our model captures complex spatial variation in a cell-type specific way while enforcing nonnegativity to enhance interpretability. We develop a variational inference framework for MGGPs that supports scalable optimization and improves the numerical stability of smNSF. Across seven spatial transcriptomics datasets spanning diverse technologies and tissues, smNSF recovers sparse, interpretable spatial factors and, through its cell-type conditional posteriors, organizes them into cell-type enriched, cell-type specific, and universal spatial programs that are not apparent from marginal factors alone. Given cell-type labels in ST data, smNSF enables cell-type aware spatial decompositions and supports cell-type conditional posteriors for in silico exploration of relationships between spatial patterns and cellular identity.
]]></description>
<dc:creator><![CDATA[ Chumpitaz-Diaz, L., Shrestha, P., Engelhardt, B. E. ]]></dc:creator>
<dc:date>2026-07-03</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.735224</dc:identifier>
<dc:title><![CDATA[Scalable multi-group nonnegative spatial factorization for spatial genomics data with cell-type heterogeneity]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.07.01.735800v1?rss=1">
<title>
<![CDATA[
Initiation codon context governs translation-coupled mRNA decay and coordinated expression in the human parasite Leishmania 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.07.01.735800v1?rss=1
</link>
<description><![CDATA[
In the absence of canonical, promoter-based transcriptional regulation, Leishmania has evolved alternative regulatory mechanisms for adaptive gene expression, including post-transcriptional control via differential mRNA turnover. While this mechanism is recognized as critical in Leishmania, fundamental aspects of transcript stability in these parasites remain to be elucidated, such as the role of translation initiation-mediated mRNA decay. We addressed this important gap by investigating the role of the initiation codon context (Kozak sequence) in gene expression in L. donovani. Mapping Kozak sequences across the trypanosomatid genomes revealed important differences in nucleotide preference across the genus and sub-genus levels, suggesting important cis-regulatory function. Within a single species, only a small subset of possible Kozak sequences is associated with several start codons, further supporting their role in expression control. Transgenic L. donovani lines expressing EGFP under the control of distinct Kozak variants indeed demonstrated that the nucleotide context of the start codon directly modulates both protein expression and mRNA stability, which was associated with increased recruitment of mRNA to heavy polysomes. Parasite exposure to the translation inhibitor cycloheximide restored EGFP expression driven by a weak Kozak sequence, revealing a direct link between mRNA stability and Kozak-mediated translatability. RNA-seq analysis of parasites arrested for transcription or translation elongation revealed transcripts enriched for the GO terms RNA modification and pseudouridine synthesis as key targets for translation-dependent mRNA turnover. The segregation of these transcripts into functional clusters with distinct Kozak profiles further suggests that Kozak sequence composition defines Kozak-governed regulons in Leishmania. 
Within this regulatory framework, the -3 nucleotide is identified as the key positional determinant driving differential transcript abundance. Our work uncovers a key role for translation initiation-coupled mRNA decay in Leishmania gene expression regulation adding a previously underappreciated layer of post-transcriptional regulation in parasite adaptation.
]]></description>
<dc:creator><![CDATA[ SANTI, A. M. M., COKELAER, T., PIPOLI DA FONSECA, J., JENKINS, P., TIKHONOVA, E., RODRIGUEZ-ALMONACID, C., KARAMYSHEV, A., KARAMYSHEVA, Z., SPÄTH, G. ]]></dc:creator>
<dc:date>2026-07-02</dc:date>
<dc:identifier>doi:10.64898/2026.07.01.735800</dc:identifier>
<dc:title><![CDATA[Initiation codon context governs translation-coupled mRNA decay and coordinated expression in the human parasite Leishmania]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.29.730585v1?rss=1">
<title>
<![CDATA[
Genomic impact of the second plague pandemic on three human populations 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.29.730585v1?rss=1
</link>
<description><![CDATA[
The second plague pandemic (early 14th-early 19th centuries), which was caused by Yersinia pestis, had a profound demographic, socio-economic and cultural impact across Eurasia and North Africa. Many regions in Europe and the Middle East are estimated to have lost 40-60% of their human populations, with some areas suffering even higher mortality. Whether exposure to Y. pestis drove strong positive selection on protective genetic variants in the human genome, and how it shaped migration patterns, remains debated, despite several recent studies based on ancient DNA. Here, we analyse a markedly larger, higher coverage, and geographically diverse dataset based on shotgun sequencing of genomes from 529 ancient individuals to a mean depth of 8.8x, dating to either before or after the arrival of the pandemic at three sites in northern Europe: Trondheim (Norway), Lund (Sweden) and Vilnius (Lithuania). Genome-wide scans for signatures of selection provide no evidence for strong positive selection acting on specific genetic variants driven by Y. pestis exposure: we neither replicate selection signatures reported by previous studies nor identify new genome-wide significant candidates. However, for all three sites, we observe evidence for a reduction in long-range immigration, indicated by a drop in the diversity of ancestry that followed the arrival of Y. pestis and broadly coincided with the end of the Viking Age, Christianisation and the onset of the Little Ice Age. Our results shed important light on the demographic impact of major sociohistorical changes that occurred during the late Medieval period in Scandinavia and the Baltic region and link Christianisation to increased diversity in ancestry before the pandemic.
]]></description>
<dc:creator><![CDATA[ Liu, X., Moore, K., Ebenesersdottir, S. S., Arcini, C., Walker, G.-T., Denham, S. D., Slavin, P., Sotofte, M. B., Nielsen, S. D., Ellegaard, M. R., Vagene, A. J., Margaryan, A., Iraeta-Orbegozo, M., Mylopotamitaki, D., Laffoon, J., Philippsen, B., Rydahl, M. C., Jankauskas, R., Kozakaite, J., Alfredsson, L., Cavalleri, G. L., Chen, H. S., Cheronet, O., Demetz, L., Emeruem, D. N., Fernandes, D., Gelabert, P., Gilbert, E., Hansen, T. F., Hovig, E., Kockum, I., Llanos-Lizcano, A., Oberreiter, V., Olsson, T., Pinhasi, R., Praxmarer, E., Skar, B., Werge, T., Stenoien, H. K., Schroeder, H., Martin ]]></dc:creator>
<dc:date>2026-07-02</dc:date>
<dc:identifier>doi:10.64898/2026.06.29.730585</dc:identifier>
<dc:title><![CDATA[Genomic impact of the second plague pandemic on three human populations]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.28.735079v1?rss=1">
<title>
<![CDATA[
simSOMA: a cell-lineage based simulator of the somatic VAF spectrum in plants 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.28.735079v1?rss=1
</link>
<description><![CDATA[
Plants accumulate somatic mutations during growth, and some of these mutations can spread from local cell lineages into branches, organs, or reproductive tissues. There is growing interest in these variants because they can underlie bud-sport traits in crops, contribute to within-organism somatic selection, and provide genetic variation that may be transmitted vegetatively or sexually to future generations. Recent genomic sequencing of bulk and layer-enriched plant tissues has shown that de novo somatic variants can generate complex variant allele-frequency (VAF) spectra. Interpreting these spectra requires understanding how mutations arising during mitotic cell division are filtered or amplified through shoot growth, branching, and organ formation. Because these processes interact across multiple scales, their combined effects are difficult to derive analytically. Here, we present simSOMA, a modular simulator that links rooted plant topologies to explicit cell-lineage dynamics. simSOMA models somatic mutation accumulation during stem-cell self-renewal in the shoot apical meristem, clonal expansion from the stem-cell niche to the meristem periphery, branch founding, and organ formation. Applying simSOMA across diverse growth scenarios revealed how individual processes can be isolated, varied, and combined to assess their effects on organ-level VAF spectra and among-organ variant sharing. The same simulated spectra can also be transformed to represent bulk or layer-enriched sampling and phased or unphased variant readouts, separating effects of developmental history from those introduced by tissue composition and allele counting. Because simSOMA is organized around modules with defined input-output interfaces, individual developmental components can be replaced or extended as new empirical information becomes available. 
This makes simSOMA a flexible tool for testing alternative models of somatic mosaicism in plants and for guiding the design and interpretation of VAF-based sequencing studies. The simulator is available at https://github.com/jlab-code/simSOMA.
]]></description>
<dc:creator><![CDATA[ Johannes, F. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.28.735079</dc:identifier>
<dc:title><![CDATA[simSOMA: a cell-lineage based simulator of the somatic VAF spectrum in plants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.26.734910v1?rss=1">
<title>
<![CDATA[
The E2F1 regulon orchestrates a proliferative emergency and vascular programming in fetal endothelial progenitors exposed to GDM: a sex-stratified systems medicine approach 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.26.734910v1?rss=1
</link>
<description><![CDATA[
Maternal Gestational Diabetes Mellitus (GDM) and obesity are major drivers of the Developmental Origins of Health and Disease (DOHaD), predisposing offspring to premature cardiovascular disease. However, the specific molecular pathways that program this sex-specific vascular risk remain poorly defined due to the cellular complexity of the placenta. We sought to identify the primary regulatory engines of fetal vascular programming in a sex-stratified neonatal cohort. We analyzed purified neonatal Endothelial Colony Forming Cells (ECFCs)--the fundamental progenitors of the fetal vasculature--from pregnancies complicated by GDM and pre-pregnancy obesity. Using a sex-stratified regulatory inference framework, we decoupled the priming effects of obesity from the acute transcriptomic insult of GDM. Our findings reveal a profound functional asymmetry in fetal vascular adaptation. While male progenitors maintain metabolic resilience through AKT3-mediated buffering, the female fetal-placental interface undergoes a systemic proliferative emergency. This maladaptive state is driven by a massive unshackling of the E2F1-regulon (NES = 16.86), triggered by a maternal-fetal surge in CDK/MAPK signaling. This female-specific program prioritizes unscheduled cell-cycle progression at the metabolic expense of angiogenic maturation and innate immune surveillance. GDM imposes a sex-specific epigenetic scar on female fetal endothelial progenitors, characterized by a quantity-over-quality trade-off in vascular development. This identification of the E2F1-pathway as a driver of fetal vascular exhaustion provides a mechanistic basis for the increased cardiovascular vulnerability in female offspring and identifies the cell cycle as a potential therapeutic target for mitigating the long-term sequelae of GDM.
]]></description>
<dc:creator><![CDATA[ Adegbaju, M. S., Babayeju, O., Morenikeji, O. B., Ojurongbe, O., Thomas, B. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.26.734910</dc:identifier>
<dc:title><![CDATA[The E2F1 regulon orchestrates a proliferative emergency and vascular programming in fetal endothelial progenitors exposed to GDM: a sex-stratified systems medicine approach]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.26.734884v1?rss=1">
<title>
<![CDATA[
Unveiling the Hidden Rules: Enhancing NMD Prediction for Protein-Truncating Variants 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.26.734884v1?rss=1
</link>
<description><![CDATA[
Nonsense-mediated decay (NMD) is a conserved RNA quality-control pathway that degrades transcripts containing premature termination codons. Because roughly a third of pathogenic variants in ClinVar can lead to truncated protein synthesis, predicting whether such transcripts undergo NMD is central to interpreting variant effects, yet the canonical 50-55 nucleotide rule explains only about half of observed outcome variability. Using paired whole-genome and RNA-sequencing from 10,306 individual samples in the Trans-Omics for Precision Medicine (TOPMed) program, we quantified NMD efficiency for 5,749 germline truncating variants via allele-specific expression and trained a gradient-boosting classifier, TrunCat, that distinguished NMD-sensitive from NMD-escape transcripts with ~78% ROC-AUC (Receiver Operating Characteristic - Area Under the Curve). A reduced model using the ten features with the highest mean SHAP (SHapley Additive exPlanations) values, a measure of each feature's average contribution to predictions, nearly matched this performance. Applied across large variant databases and a rare-disease cohort, the model produced NMD outcome predictions, with variants of uncertain significance showing higher predicted escape than pathogenic ones. This framework confirms the canonical rule, identifies non-canonical determinants, and offers a scalable resource for interpreting protein-truncating variants.
]]></description>
<dc:creator><![CDATA[ Egab, I., Schmidt, J., Cortazar, M., Xu, J., Orchard, P., Bozkurt-Yozgatli, T., Dawood, M., Koh, J., Mestroni, L., Taylor, M., Yi, S. S., Calame, D., Posey, J., Gibbs, R. A., Boerwinkle, E., Reiner, A. P., de Vries, P. S., Morrison, A., Shaw, C. A., Lupski, J. R., Carvalho, C. M. B., Montgomery, S. B., Jagannathan, S., Coban Akdemir, Z. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.26.734884</dc:identifier>
<dc:title><![CDATA[Unveiling the Hidden Rules: Enhancing NMD Prediction for Protein-Truncating Variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.26.733976v1?rss=1">
<title>
<![CDATA[
Nitrogen use efficiency in pigs is associated with transcriptomic signatures related to amino acid metabolism, immune activity, and nutrient partitioning 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.26.733976v1?rss=1
</link>
<description><![CDATA[
Dietary protein restriction challenges the allocation of amino acids to growth and other physiological functions and therefore requires coordinated metabolic adaptation. Domestic pigs provide an informative system in which to study such responses, because nitrogen retention directly affects lean growth and can be quantified accurately under controlled feeding and housing conditions. Under reduced-protein diets, pigs differ in how effectively they retain nitrogen, and this variation has a genetic basis, making them well suited to investigate the molecular regulation of nitrogen use efficiency (NUE). Here, we characterise differential gene expression and enriched pathways in liver and skeletal muscle of more than 80 pigs with two divergent NUE phenotypes (high and low) maintained under the same protein-reduced, ad libitum dietary conditions. The two NUE phenotypes were clearly distinct at the transcriptomic level, with 177 differentially expressed genes in the liver and 133 in the muscle. In the liver, differential expression and enrichment analyses indicate reduced amino acid catabolism, lower inflammatory and detoxification activity, and a metabolic state that favours lipid processing and insulin-related regulation over the use of amino acids as energy sources. In skeletal muscle, they point to reduced lipid uptake, lower reliance on amino acid oxidation, and a greater emphasis on protein synthesis, translational regulation, mitochondrial energy metabolism, and growth-related processes. These gene-level patterns were supported and extended by pathway and gene-set enrichment analyses. Together, the results suggest that high and low-NUE pigs differ through coordinated, tissue-specific molecular adaptations. Overall, variation in NUE appears to reflect coordinated, tissue-specific differences in how nutrients are allocated between energy use, storage, and lean tissue growth.
]]></description>
<dc:creator><![CDATA[ Monney, B., Ewaoluwagbemiga, E. O., Kasper, C. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.26.733976</dc:identifier>
<dc:title><![CDATA[Nitrogen use efficiency in pigs is associated with transcriptomic signatures related to amino acid metabolism, immune activity, and nutrient partitioning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.26.734379v1?rss=1">
<title>
<![CDATA[
Pericystic brain transcriptomics reveals molecular signatures of immune activation and neurovascular remodelling in viable and post-treatment porcine neurocysticercosis 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.26.734379v1?rss=1
</link>
<description><![CDATA[
Neurocysticercosis (NCC), the infection of the central nervous system by Taenia solium larvae, is a leading cause of acquired epilepsy in endemic regions. While viable cysticerci can persist asymptomatically for extended periods, their spontaneous or drug-induced degradation triggers marked perilesional inflammation and severe neurological symptoms. Despite well-documented histopathological characterisation of these lesion states, the host transcriptional programmes associated with viable parasite persistence and early post-treatment lesion disruption remain poorly understood. To address this gap, we performed the first bulk RNA sequencing of pericystic brain tissue using a physiologically relevant porcine model of NCC. Comparing uninfected controls (n = 3), infected untreated pigs with intact viable cysts (n = 6), and antiparasitic-treated pigs with disrupted cysts (n = 3), we identified distinct transcriptional signatures associated with each disease state. Viable infection was associated with broad transcriptional changes (461 upregulated and 175 downregulated genes), characterised by local immune activation alongside suppression of blood-brain barrier (BBB) remodelling, vascular, and neuronal signalling molecular signatures. The post-treatment state with confirmed BBB disruption was associated with a smaller but directionally distinct response (160 upregulated and 57 downregulated genes), marked by inflammatory signalling and increased expression of genes associated with endothelial activation, vascular regulation, and BBB-associated remodelling. Together, these findings suggest that, while immune engagement is a feature shared across both lesion states, the BBB-associated transcriptional axis shifts substantially following treatment. 
These results provide an exploratory transcriptomic framework for understanding parasite persistence, treatment-induced neuroinflammation, and neurovascular remodelling in NCC, and highlight candidate pathways and genes for future mechanistic investigation.

Author Summary: Neurocysticercosis is a major cause of epilepsy in regions where Taenia solium is endemic. Brain cysts can remain viable for long periods with limited symptoms, but parasite degeneration, whether spontaneous or drug-induced, can trigger damaging neuroinflammation. In this study, we used RNA sequencing in a pig model that closely resembles human disease to characterise how brain tissue responds to viable cysts and to early treatment-induced cyst disruption. We found that viable infection was associated with local immune activation alongside reduced expression of genes involved in blood-brain barrier function. Following antiparasitic treatment, disrupted lesions showed an increased expression of genes linked to vascular and barrier remodelling. These findings suggest that the host transcriptional environment changes substantially after parasite disruption, and highlight molecular pathways that may contribute to neuroinflammation, blood-brain barrier changes, and neurological disease in NCC. As an exploratory first transcriptomic survey in this model, these results provide a candidate framework for future studies aimed at identifying biomarkers and adjunctive therapeutic targets in NCC.
]]></description>
<dc:creator><![CDATA[ Apaza-Quiroz, C. A., Rojas-Portocarrero, C. C., Gutierrez Guarnizo, S. A., Ponce-Nakatahara, E. K., Bustos, J. A., Arroyo, G., Gilman, R. H., Garcia, H. H., Zimic, M. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.26.734379</dc:identifier>
<dc:title><![CDATA[Pericystic brain transcriptomics reveals molecular signatures of immune activation and neurovascular remodelling in viable and post-treatment porcine neurocysticercosis]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.28.735102v1?rss=1">
<title>
<![CDATA[
Fly Viral Atlas: A single-nucleus transcriptomic atlas of RNA viruses and transposable elements (TEs) in Drosophila melanogaster 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.28.735102v1?rss=1
</link>
<description><![CDATA[
Drosophila RNA viruses often persist in wild and lab populations, yet their tissue and cellular tropism is poorly understood. In the Fly Cell Atlas (a comprehensive Drosophila single-nucleus transcriptome) data, we detected four RNA virus infections: Nora virus, Drosophila A virus, Drosophila C virus, and Newfield virus. Nora virus and Drosophila A virus were the most abundant and widespread across tissues and cell types, while Drosophila C virus and Newfield virus RNA transcripts were found only in oenocyte and fat body tissues. We found transcriptional changes associated with viral infection in canonical viral immunity genes (e.g. Vago, vir-1). Additionally, we observed that during persistent viral infections, transposable element (TE) transcripts were upregulated in somatic cells. TEs are traditionally associated with the germline, but recent studies and our data suggest they are also expressed in somatic cells. Using the Fly Cell Atlas data, we found that distinct somatic cell types express specific TE subtypes, indicating regulated and cell-type-specific TE activity often overlooked in transcriptomic studies. We present Fly Viral Atlas (https://flyviralatlas.shinyapps.io/home/), a single-nucleus level atlas of RNA virus and TE expression in Drosophila, providing new insights into viral tropism and TE dynamics across cell types and tissues.
]]></description>
<dc:creator><![CDATA[ Roy, N., Unckless, R. L. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.28.735102</dc:identifier>
<dc:title><![CDATA[Fly Viral Atlas: A single-nucleus transcriptomic atlas of RNA viruses and transposable elements (TEs) in Drosophila melanogaster]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.26.734692v1?rss=1">
<title>
<![CDATA[
Evolutionary Stratification of Codon Usage Bias In Plants Arises from GC3 Composition and Translational Optimization 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.26.734692v1?rss=1
</link>
<description><![CDATA[
Codon usage bias, the non-random, preferential use of synonymous codons, is a fundamental genomic characteristic. It is a major determinant of translational efficiency, gene regulation, and molecular evolution. However, the evolutionary bias and functional relevance of codon usage bias across the plant lineage are poorly defined, and it remains unclear which factors are chiefly responsible for relative synonymous codon usage (RSCU) in genomes and how codon usage bias influences gene regulation and the molecular evolution of genomes. A genome-wide codon usage bias study of the coding DNA sequences of 262 plant genomes was conducted. It encompassed more than 4.6 billion codons from more than 11 million coding sequences. Relative synonymous codon usage, codon adaptation index, codon-anticodon mapping, effective number of codons (ENC)-GC3, GC1,2-GC3, parity rule 2 (PR2-bias), molecular economy, and machine learning approaches were used for the study. Codon usage bias was found to be strongly non-random and to exhibit clear phylogenetic structuring. Higher plants favoured A/T-ending codons, whereas early-diverging lineages were enriched in G/C-ending codons. Analysis of RSCU, codon adaptation index, and codon-anticodon pairing indicated that translational selection is mediated by tRNA availability, which helps sustain these molecular patterns. Machine-learning approaches identified a small subset of codons with an outsized influence on genome-wide codon usage landscapes. Further analysis revealed a robust inverse relationship between the effective number of codons and GC content at synonymous third positions. Neutrality analysis revealed that approximately 61% of variation was driven by mutational pressure, tempered by selective constraints. Phylogenetic reconstruction showed a progressive relaxation of codon bias from algae to angiosperms while maintaining a conserved molecular economy cost of ~30 ATP per codon across the lineages.
The study revealed that codon usage bias is a lineage-specific, evolutionarily conserved trait governed by mutation, selection, and translational optimization.

Research Highlights:
- Largest genome-wide study of codon usage in the plant kingdom, covering approximately 4.6 billion codons from 262 plant species spanning algae to angiosperms.
- Showed non-random, AT-based codon usage with a lineage-specific pattern in which GC-ending codons are more frequent in the lower plants and AT-ending codons in the higher plants.
- GC3-associated mutational pressure was the primary driver of codon usage bias, with approximately 61% of the variation explained by mutation.
- GC3 showed an inverse relationship with the effective number of codons (ENC), establishing GC3 as a major axis of evolutionary divergence.
- Translational selection and codon adaptation shaped genome-wide codon and amino acid usage.
- Codon-anticodon co-evolution is shaped by lineage-specific optimization driven by tRNA anticodon availability.
- Approximately 30 ATP per codon are required as translational energy cost, revealing strong evolutionary constraints on molecular economy.

Author Summary: Although the genetic code is degenerate, synonymous codons are not used uniformly across the genome. Codon bias is recognised as one of the major determinants of translation and molecular evolution, yet its genome-wide evolutionary architecture across the plant kingdom remains unresolved. Here, the author presents a large and comprehensive analysis of genome-wide codon usage bias across the plant kingdom, integrating more than 4.6 billion codons from over 11 million sequences and 262 plant species. Codon usage bias was found to be deeply conserved, with an evolutionary stratification that tracks the plant phylogeny. The early-diverging lineages, algae and bryophytes, preferred GC-ending codons, while higher plants preferred AT-ending codons. GC3 composition was found to play a major role in the divergence of codon usage, while mutational pressure accounted for approximately 61% of genome-wide variation. Translational selection also added to the lineage-specific constraints. By integrating codon-anticodon mapping, codon adaptation index, and machine-learning approaches, it was found that translational optimization is tightly regulated by tRNA availability. A small subset of codons showed disproportionate influence on the global codon usage landscape. Despite extensive diversification in genome composition and codon preference, plants showed a conserved energetic investment of approximately 30 ATP per codon. This indicates a strongly conserved evolutionary constraint on translational economy. Collectively, the study demonstrates that codon usage bias is a property that is both evolutionarily conserved and dependent on lineage. These patterns result from the interplay between mutation, selection, and translational optimization.
The study provides a comprehensive framework for understanding the evolution of synonymous codon usage in plants and can inform future efforts in evolutionary genomics, synthetic biology, and transgene design.
]]></description>
<dc:creator><![CDATA[ Mohanta, T. K. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.26.734692</dc:identifier>
<dc:title><![CDATA[Evolutionary Stratification of Codon Usage Bias In Plants Arises from GC3 Composition and Translational Optimization]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://www.biorxiv.org/content/10.64898/2026.06.30.735607v1?rss=1">
<title>
<![CDATA[
A conserved architectural domain shapes centromere evolution in Drosophila 
]]>
</title>
<link>
https://www.biorxiv.org/content/10.64898/2026.06.30.735607v1?rss=1
</link>
<description><![CDATA[
Centromeres ensure faithful chromosome segregation despite being embedded within rapidly evolving repetitive DNA, a contradiction known as the centromere paradox. While centromere identity is defined by the histone variant CENP-A, how conserved function is maintained amid rapid DNA turnover remains unclear. Here, we generate highly contiguous genome assemblies from single Drosophila melanogaster individuals that, for the first time, resolve a chromosome through its centromere, linking the chromosome 3 arms within a continuous sequence. Comparative assemblies from wild-derived strains reveal extensive structural variation in pericentromeric satellites, including large-scale expansions, contractions, and sequence divergence. Despite this variation, the CENP-A-associated centromeric core exhibits conserved organization across strains. Integration of Hi-C interaction maps with sequence analyses shows that flanking dodeca satellite arrays form a spatially interacting domain that bridges both sides of the centromere, whereas adjacent Prodsat arrays are more variable and show weaker interactions. These results support a model in which rapidly evolving centromeric DNA is constrained by conserved higher-order architecture, providing a framework for reconciling the rapid evolution of centromere sequence with its conserved function.
]]></description>
<dc:creator><![CDATA[ Samano, A., Chakraborty, M. ]]></dc:creator>
<dc:date>2026-07-01</dc:date>
<dc:identifier>doi:10.64898/2026.06.30.735607</dc:identifier>
<dc:title><![CDATA[A conserved architectural domain shapes centromere evolution in Drosophila]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory</dc:publisher>
<prism:publicationDate>2026-07-01</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
