	<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
	<channel rdf:about="https://biorxiv.org">
	<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
	<title>bioRxiv Channel: Biology of Genomes 2016 #BOG16</title>
	<link>https://biorxiv.org</link>
	<description>
	This feed contains articles for bioRxiv Channel "Biology of Genomes 2016 #BOG16"
	</description>

		<items>
	<rdf:Seq>
		</rdf:Seq>
	</items>
	<prism:eIssn/>
	<prism:publicationName>bioRxiv</prism:publicationName>
	<prism:issn/>

	<image rdf:resource=""/>
	</channel>
	<image rdf:about="">
	<title>bioRxiv</title>
	<url/>
	<link>https://biorxiv.org</link>
	</image>
	<item rdf:about="https://biorxiv.org/cgi/content/short/036434v1?rss=1">
<title>
<![CDATA[
Variation in the molecular clock of primates 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/036434v1?rss=1
</link>
<description><![CDATA[
Events in primate evolution are often dated by assuming a "molecular clock", i.e., a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that substantial variation exists in yearly substitution rates. Such variation is to be expected from differences in life-history traits, suggesting that it should also be found among primates. Motivated by these considerations, we analyze whole genomes from ten primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs) and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are ~65% higher in lineages leading from the hominoid-NWM ancestor to NWMs than to apes. Within apes, rates are ~2% higher in chimpanzees and ~7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however: in particular, transitions at CpG sites exhibit a more clock-like behavior than do other types, presumably due to their non-replicative origin. Thus, not only the total rate but also the mutational spectrum varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate that the average time to the most recent common ancestor of human and chimpanzee is 12.1 million years and that their split time is 7.9 million years.

Significance statement
Much of our understanding of the chronology of human evolution relies on the "molecular clock", i.e., a constant rate of substitutions per unit time. To evaluate the validity of this assumption, we analyze whole genome sequences from ten primate species. We find that there is substantial variation in the molecular clock between apes and monkeys, and rates even differ within hominoids. Importantly, not all mutation types behave similarly: notably, transitions at CpG sites exhibit a more clock-like behavior than other substitutions, presumably due to their non-replicative origin. Thus, the mutation spectra, and not just the overall substitution rates, are changing across primates. This finding further suggests that events in primate evolution are most reliably dated using CpG transitions.
]]></description>
<dc:creator>Priya Moorjani</dc:creator>
<dc:creator>Carlos Eduardo G. Amorim</dc:creator>
<dc:creator>Peter F. Arndt</dc:creator>
<dc:creator>Molly Przeworski</dc:creator>
<dc:date>2016-01-11</dc:date>
<dc:identifier>doi:10.1101/036434</dc:identifier>
<dc:title><![CDATA[Variation in the molecular clock of primates]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-01-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/045260v1?rss=1">
<title>
<![CDATA[
Integrating tissue specific mechanisms into GWAS summary results 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/045260v1?rss=1
</link>
<description><![CDATA[
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations were tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
]]></description>
<dc:creator>Alvaro Barbeira</dc:creator>
<dc:creator>Scott P Dickinson</dc:creator>
<dc:creator>Jason M Torres</dc:creator>
<dc:creator>Eric S Torstenson</dc:creator>
<dc:creator>Jiamao Zheng</dc:creator>
<dc:creator>Heather E Wheeler</dc:creator>
<dc:creator>Kaanan P Shah</dc:creator>
<dc:creator>Todd Edwards</dc:creator>
<dc:creator>GTEx Consortium</dc:creator>
<dc:creator>Dan Nicolae</dc:creator>
<dc:creator>Nancy J Cox</dc:creator>
<dc:creator>Hae Kyung Im</dc:creator>
<dc:date>2016-03-23</dc:date>
<dc:identifier>doi:10.1101/045260</dc:identifier>
<dc:title><![CDATA[Integrating tissue specific mechanisms into GWAS summary results]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-03-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/043653v1?rss=1">
<title>
<![CDATA[
Survey of the Heritability and Sparsity of Gene Expression Traits Across Human Tissues 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/043653v1?rss=1
</link>
<description><![CDATA[
Understanding the genetic architecture of gene expression traits is key to elucidating the underlying mechanisms of complex traits. Here, for the first time, we perform a systematic survey of the heritability and the distribution of effect sizes across all representative tissues in the human body. We find that local h² can be characterized relatively well, with 59% of expressed genes showing significant h² (FDR < 0.1) in the DGN whole blood cohort. However, current sample sizes (n ≤ 922) do not allow us to compute distal h². Bayesian Sparse Linear Mixed Model (BSLMM) analysis provides strong evidence that the genetic contribution to local expression traits is dominated by a handful of genetic variants rather than by the collective contribution of a large number of variants each of modest size. In other words, the local architecture of gene expression traits is sparse rather than polygenic across all 40 tissues (from DGN and GTEx) examined. This result is confirmed by the sparsity of optimally performing gene expression predictors obtained via elastic net modeling. To further explore tissue context specificity, we decompose the expression traits into cross-tissue and tissue-specific components using a novel Orthogonal Tissue Decomposition (OTD) approach. Through a series of simulations we show that the cross-tissue and tissue-specific components are identifiable via OTD. Heritability and sparsity estimates of these derived expression phenotypes show similar characteristics to the original traits. Consistent properties relative to prior GTEx multi-tissue analysis results suggest that these traits reflect the expected biology. Finally, we apply this knowledge to develop prediction models of gene expression traits for all tissues. The prediction models, heritability, and prediction performance R² for original and decomposed expression phenotypes are made publicly available (https://github.com/hakyimlab/PrediXcan).

Author Summary
Gene regulation is known to contribute to the underlying mechanisms of complex traits. The GTEx project has generated RNA-Seq data on hundreds of individuals across more than 40 tissues, providing a comprehensive atlas of gene expression traits. Here, we systematically examined the local versus distal heritability as well as the sparsity versus polygenicity of protein coding gene expression traits in tissues across the entire human body. To determine tissue context specificity, we decomposed the expression levels into cross-tissue and tissue-specific components. Regardless of tissue type, we found that local heritability, but not distal heritability, can be well characterized with current sample sizes. We found that the distribution of effect sizes is more consistent with a sparse local architecture in all tissues. We also show that the cross-tissue and tissue-specific expression phenotypes constructed with our orthogonal tissue decomposition model recapitulate complex Bayesian multi-tissue analysis results. This knowledge was applied to develop prediction models of gene expression traits for all tissues, which we make publicly available.
]]></description>
<dc:creator>Heather E Wheeler</dc:creator>
<dc:creator>Kaanan P Shah</dc:creator>
<dc:creator>Jonathon Brenner</dc:creator>
<dc:creator>Tzintzuni Garcia</dc:creator>
<dc:creator>Keston Aquino-Michaels</dc:creator>
<dc:creator>GTEx Consortium</dc:creator>
<dc:creator>Nancy J Cox</dc:creator>
<dc:creator>Dan L Nicolae</dc:creator>
<dc:creator>Hae Kyung Im</dc:creator>
<dc:date>2016-03-15</dc:date>
<dc:identifier>doi:10.1101/043653</dc:identifier>
<dc:title><![CDATA[Survey of the Heritability and Sparsity of Gene Expression Traits Across Human Tissues]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-03-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/043752v1?rss=1">
<title>
<![CDATA[
Genetic predictors of gene expression associated with risk of bipolar disorder 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/043752v1?rss=1
</link>
<description><![CDATA[
Bipolar disorder (BD) affects the quality of life of approximately 1% of the population and represents a major public health concern. It is known to be highly heritable, but large-scale genome-wide association studies (GWAS) have discovered only a handful of markers associated with the disease. Furthermore, the biological mechanisms underlying these markers remain to be elucidated. We recently published a gene-level association test, PrediXcan, which integrates transcriptome regulation data to characterize the function of these markers in a tissue-specific manner. In this study, we developed prediction models for mRNA levels in 10 brain regions using data from the GTEx project and performed PrediXcan analysis in WTCCC as well as in an independent cohort, GAIN. We replicate the association between predicted expression of PTPRE and BD risk in whole blood and recapitulate the association in brain tissues. PTPRE encodes the protein tyrosine phosphatase, receptor type E, which is known to be involved in RAS signaling and activation of voltage-gated K+ channels. We also found a new genome-wide significant association between lower predicted expression of BBX (bobby sox homolog) in the anterior cingulate cortex region of the brain and increased risk of BD (p_WTCCC = 7.02 × 10^-6, p_GAIN = 4.68 × 10^-3, p_meta = 1.11 × 10^-7). In sum, we used our mechanistically informed approach, PrediXcan, to identify and replicate two novel genome-wide significant genes using existing GWAS datasets.
]]></description>
<dc:creator>Kaanan Shah</dc:creator>
<dc:creator>Heather E Wheeler</dc:creator>
<dc:creator>Eric R Gamazon</dc:creator>
<dc:creator>Dan L Nicolae</dc:creator>
<dc:creator>Nancy J Cox</dc:creator>
<dc:creator>Hae Kyung Im</dc:creator>
<dc:date>2016-03-15</dc:date>
<dc:identifier>doi:10.1101/043752</dc:identifier>
<dc:title><![CDATA[Genetic predictors of gene expression associated with risk of bipolar disorder]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-03-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/049130v1?rss=1">
<title>
<![CDATA[
High-resolution interrogation of functional elements in the noncoding genome 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/049130v1?rss=1
</link>
<description><![CDATA[
The noncoding genome plays a major role in gene regulation and disease, yet we lack tools for rapid identification and manipulation of noncoding elements. Here, we develop a large-scale CRISPR screen employing ~18,000 sgRNAs to target, in an unbiased manner, >700 kb of noncoding sequence surrounding three genes (NF1, NF2, and CUL3) involved in resistance to the BRAF inhibitor vemurafenib in the BRAF-mutant melanoma cell line A375. We identify specific noncoding locations near genes that modulate drug resistance when mutated. These sites have predictive hallmarks of noncoding function, such as physical interaction with gene promoters, evolutionary conservation, and tissue-specific chromatin accessibility. At a subset of identified elements at the CUL3 locus, we show that engineered mutations lead to a loss of gene expression associated with changes in transcription factor occupancy and in long-range and local epigenetic environments, implicating these sites in gene regulation and chemotherapeutic resistance. This demonstration of an unbiased mutagenesis screen across large noncoding regions expands the potential of pooled CRISPR screens for fundamental genomic discovery and for elucidating biologically relevant mechanisms of gene regulation.
]]></description>
<dc:creator>Neville E Sanjana</dc:creator>
<dc:creator>Jason Wright</dc:creator>
<dc:creator>Kaijie Zheng</dc:creator>
<dc:creator>Ophir Shalem</dc:creator>
<dc:creator>Pierre Fontanillas</dc:creator>
<dc:creator>Julia Joung</dc:creator>
<dc:creator>Christine Cheng</dc:creator>
<dc:creator>Aviv Regev</dc:creator>
<dc:creator>Feng Zhang</dc:creator>
<dc:date>2016-04-18</dc:date>
<dc:identifier>doi:10.1101/049130</dc:identifier>
<dc:title><![CDATA[High-resolution interrogation of functional elements in the noncoding genome]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-04-18</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/047464v1?rss=1">
<title>
<![CDATA[
Genome-wide generalized additive models 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/047464v1?rss=1
</link>
<description><![CDATA[
Motivation
Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. However, current methods for differential occupancy of ChIP-Seq data rely on binning or sliding window techniques, for which the choice of window and bin sizes is subjective.

Results
Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating the ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling the type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays.

Availability
Software is available from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/GenoGAM.html

Contact
gagneur@in.tum.de

Supplementary information
Supplementary information is available at Bioinformatics online.
]]></description>
<dc:creator>Georg Stricker</dc:creator>
<dc:creator>Alexander Engelhardt</dc:creator>
<dc:creator>Daniel Schulz</dc:creator>
<dc:creator>Matthias Schmid</dc:creator>
<dc:creator>Achim Tresch</dc:creator>
<dc:creator>Julien Gagneur</dc:creator>
<dc:date>2016-04-06</dc:date>
<dc:identifier>doi:10.1101/047464</dc:identifier>
<dc:title><![CDATA[Genome-wide generalized additive models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-04-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/041020v1?rss=1">
<title>
<![CDATA[
Accurate promoter and enhancer identification in 127 ENCODE and Roadmap Epigenomics cell types and tissues by GenoSTAN 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/041020v1?rss=1
</link>
<description><![CDATA[
Accurate maps of promoters and enhancers are required for understanding transcriptional regulation. Promoters and enhancers are usually mapped by integration of chromatin assays charting histone modifications, DNA accessibility, and transcription factor binding. However, current algorithms are limited by unrealistic data distribution assumptions. Here we propose GenoSTAN (Genomic STate ANnotation), a hidden Markov model overcoming these limitations. We map promoters and enhancers for 127 cell types and tissues from the ENCODE and Roadmap Epigenomics projects, today's largest compendium of chromatin assays. Extensive benchmarks demonstrate that GenoSTAN consistently identifies promoters and enhancers with significantly higher accuracy than previous methods. Moreover, GenoSTAN-derived promoters and enhancers showed significantly higher enrichment of complex trait-associated genetic variants than current annotations. Altogether, GenoSTAN provides an easy-to-use tool to define promoters and enhancers in any system, and our annotation of human transcriptional cis-regulatory elements constitutes a rich resource for future research in biology and medicine.
]]></description>
<dc:creator>Benedikt Zacher</dc:creator>
<dc:creator>Margaux Michel</dc:creator>
<dc:creator>Bjoern Schwalb</dc:creator>
<dc:creator>Patrick Cramer</dc:creator>
<dc:creator>Achim Tresch</dc:creator>
<dc:creator>Julien Gagneur</dc:creator>
<dc:date>2016-02-23</dc:date>
<dc:identifier>doi:10.1101/041020</dc:identifier>
<dc:title><![CDATA[Accurate promoter and enhancer identification in 127 ENCODE and Roadmap Epigenomics cell types and tissues by GenoSTAN]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-02-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/048819v1?rss=1">
<title>
<![CDATA[
A model of compound heterozygous, loss-of-function alleles is broadly consistent with observations from complex-disease GWAS datasets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/048819v1?rss=1
</link>
<description><![CDATA[
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin- and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation.

Author Summary
Gene action determines how mutations affect phenotype. When placed in an evolutionary context, the details of the genotype-to-phenotype model can impact the maintenance of genetic variation for complex traits. Likewise, non-equilibrium demographic history may affect patterns of genetic variation. Here, we explore the impact of the genetic model and population growth on the distribution of genetic variance across the allele frequency spectrum underlying risk for a complex disease. Using forward-in-time population genetic simulations, we show that the genetic model has important impacts on the composition of variation for complex disease risk in a population. We explicitly simulate genome-wide association studies (GWAS) and perform heritability estimation on population samples. A particular model of gene-based partial recessivity, based on allelic non-complementation, aligns well with empirical results. This model is congruent with the dominance variance estimates from both SNPs and twins, and with the minor allele frequency distribution of GWAS hits.
]]></description>
<dc:creator>Jaleal Sanjak</dc:creator>
<dc:creator>Anthony D Long</dc:creator>
<dc:creator>Kevin R Thornton</dc:creator>
<dc:date>2016-04-15</dc:date>
<dc:identifier>doi:10.1101/048819</dc:identifier>
<dc:title><![CDATA[A model of compound heterozygous, loss-of-function alleles is broadly consistent with observations from complex-disease GWAS datasets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-04-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/038463v1?rss=1">
<title>
<![CDATA[
Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/038463v1?rss=1
</link>
<description><![CDATA[
Bacterial genomes vary extensively in terms of both gene content and gene sequence; this plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER scales to tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant, previously characterised resistance determinants for several antibiotics and discovers potentially novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
]]></description>
<dc:creator>John A Lees</dc:creator>
<dc:creator>Minna Vehkala</dc:creator>
<dc:creator>Niko Välimäki</dc:creator>
<dc:creator>Simon R Harris</dc:creator>
<dc:creator>Claire Chewapreecha</dc:creator>
<dc:creator>Nicholas J Croucher</dc:creator>
<dc:creator>Pekka Marttinen</dc:creator>
<dc:creator>Mark R Davies</dc:creator>
<dc:creator>Andrew C Steer</dc:creator>
<dc:creator>Stephen Y C Tong</dc:creator>
<dc:creator>Antti Honkela</dc:creator>
<dc:creator>Julian Parkhill</dc:creator>
<dc:creator>Stephen D Bentley</dc:creator>
<dc:creator>Jukka Corander</dc:creator>
<dc:date>2016-02-02</dc:date>
<dc:identifier>doi:10.1101/038463</dc:identifier>
<dc:title><![CDATA[Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-02-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/019166v1?rss=1">
<title>
<![CDATA[
Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals. 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/019166v1?rss=1
</link>
<description><![CDATA[
Gene duplication is a fundamental process in genome evolution. However, most young duplicates are degraded into pseudogenes by loss-of-function mutations, and the factors that allow some duplicate pairs to survive long-term remain controversial. One class of models to explain duplicate retention invokes sub- or neofunctionalization, especially through evolution of gene expression, while other models focus on sharing of gene dosage. While studies of whole-genome duplications tend to support dosage sharing, the primary mechanisms in mammals, where duplications are small-scale and thus disrupt dosage balance, are unclear. Using RNA-seq data from 46 human and 26 mouse tissues, we find that subfunctionalization of expression evolves slowly and is rare among duplicates that arose within the placental mammals. A major impediment to subfunctionalization is that tandem duplicates tend to be co-regulated by shared genomic elements, in contrast to the standard assumption of modularity of gene expression. Instead, consistent with the dosage-sharing hypothesis, most young duplicates are down-regulated to match expression of outgroup singleton genes. Our data suggest that dosage sharing of expression is a key factor in the initial survival of mammalian duplicates, followed by slower functional adaptation enabling long-term preservation.
]]></description>
<dc:creator>Xun Lan</dc:creator>
<dc:creator>Jonathan K Pritchard</dc:creator>
<dc:date>2015-05-10</dc:date>
<dc:identifier>doi:10.1101/019166</dc:identifier>
<dc:title><![CDATA[Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals.]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2015-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/019067v1?rss=1">
<title>
<![CDATA[
Rail-RNA: Scalable analysis of RNA-seq splicing and coverage 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/019067v1?rss=1
</link>
<description><![CDATA[
RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it is difficult to reproduce the exact analysis without access to original computing resources. We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 hours for US$0.91 per sample. Rail-RNA produces alignments and base-resolution bigWig coverage files, ready for use with downstream packages for reproducible statistical analysis. We identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounders. Rail-RNA is open-source software available at http://rail.bio.
]]></description>
<dc:creator>Abhinav Nellore</dc:creator>
<dc:creator>Leonardo Collado-Torres</dc:creator>
<dc:creator>Andrew E Jaffe</dc:creator>
<dc:creator>José Alquicira-Hernández</dc:creator>
<dc:creator>Jacob Pritt</dc:creator>
<dc:creator>James Morton</dc:creator>
<dc:creator>Jeffrey T Leek</dc:creator>
<dc:creator>Ben Langmead</dc:creator>
<dc:date>2015-05-07</dc:date>
<dc:identifier>doi:10.1101/019067</dc:identifier>
<dc:title><![CDATA[Rail-RNA: Scalable analysis of RNA-seq splicing and coverage]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2015-05-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/051557v1?rss=1">
<title>
<![CDATA[
Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/051557v1?rss=1
</link>
<description><![CDATA[
Translational stop codon readthrough was virtually unknown in eukaryotic genomes until recent developments in comparative genomics and new experimental techniques revealed evidence of readthrough in hundreds of fly genes and several human, worm, and yeast genes. Here, we use the genomes of 21 species of Anopheles mosquitoes and improved comparative techniques to identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and 51 additional Drosophila melanogaster stop codons, with several cases of double and triple readthrough including readthrough of two adjacent stop codons, supporting our earlier prediction of abundant readthrough in Pancrustacea genomes. Comparisons between Anopheles and Drosophila allow us to transcend the static picture provided by single-clade analysis to explore the evolutionary dynamics of abundant readthrough. We find that most differences between the readthrough repertoires of the two species are due to readthrough gain or loss in existing genes, rather than to birth of new genes or to gene death; that RNA structures are sometimes gained or lost while readthrough persists; and that readthrough is more likely to be lost at TAA and TAG stop codons. We also determine which characteristic properties of readthrough predate readthrough and which are clade-specific. We estimate that there are more than 600 functional readthrough stop codons in A. gambiae and 900 in D. melanogaster. We find evidence that readthrough is used to regulate peroxisomal targeting in two genes. Finally, we use the sequenced centipede genome to refine the phylogenetic extent of abundant readthrough.
]]></description>
<dc:creator>Irwin Jungreis</dc:creator>
<dc:creator>Clara S Chan</dc:creator>
<dc:creator>Robert M Waterhouse</dc:creator>
<dc:creator>Gabriel Fields</dc:creator>
<dc:creator>Michael F Lin</dc:creator>
<dc:creator>Manolis Kellis</dc:creator>
<dc:date>2016-05-03</dc:date>
<dc:identifier>doi:10.1101/051557</dc:identifier>
<dc:title><![CDATA[Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-05-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/044925v1?rss=1">
<title>
<![CDATA[
Assemblytics: a web analytics tool for the detection of assembly-based variants 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/044925v1?rss=1
</link>
<description><![CDATA[
Summary: Assemblytics is a web app for detecting and analyzing structural variants from a de novo genome assembly aligned to a reference genome. It incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures. Assemblytics can be applied both to compare aberrant genomes, such as human cancers, to a reference and to identify differences between related species. Multiple interactive visualizations enable in-depth exploration of the genomic distributions of variants.

Availability and Implementation: http://qb.cshl.edu/assemblytics, https://github.com/marianattestad/assemblytics

Contact: mnattest@cshl.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
]]></description>
<dc:creator>Maria Nattestad</dc:creator>
<dc:creator>Michael C Schatz</dc:creator>
<dc:date>2016-03-20</dc:date>
<dc:identifier>doi:10.1101/044925</dc:identifier>
<dc:title><![CDATA[Assemblytics: a web analytics tool for the detection of assembly-based variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-03-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/035287v1?rss=1">
<title>
<![CDATA[
Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/035287v1?rss=1
</link>
<description><![CDATA[
Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyze dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.

Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running it on 9,662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyze dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.

Availability: Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/.

Contact: anellore@gmail.com, langmea@cs.jhu.edu
]]></description>
<dc:creator>Abhinav Nellore</dc:creator>
<dc:creator>Christopher Wilks</dc:creator>
<dc:creator>Kasper D Hansen</dc:creator>
<dc:creator>Jeffrey T Leek</dc:creator>
<dc:creator>Ben Langmead</dc:creator>
<dc:date>2015-12-24</dc:date>
<dc:identifier>doi:10.1101/035287</dc:identifier>
<dc:title><![CDATA[Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2015-12-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/038224v1?rss=1">
<title>
<![CDATA[
Human splicing diversity across the Sequence Read Archive 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/038224v1?rss=1
</link>
<description><![CDATA[
We aligned 21,504 publicly available Illumina-sequenced human RNA-seq samples from the Sequence Read Archive (SRA) to the human genome and compared detected exon-exon junctions with junctions in several recent gene annotations. A total of 56,865 junctions (18.6%) found in at least 1,000 samples were not annotated, and their expression was associated with tissue type. Newer samples contributed few novel well-supported junctions: 96.1% of junctions detected in at least 20 reads across samples were already present in samples from before 2013. Junction data are compiled into a resource called intropolis, available at http://intropolis.rail.bio. We discuss an application of this resource to cancer involving a recently validated isoform of the ALK gene.
]]></description>
<dc:creator>Abhinav Nellore</dc:creator>
<dc:creator>Andrew E Jaffe</dc:creator>
<dc:creator>Jean-Philippe Fortin</dc:creator>
<dc:creator>José Alquicira-Hernández</dc:creator>
<dc:creator>Leonardo Collado-Torres</dc:creator>
<dc:creator>Siruo Wang</dc:creator>
<dc:creator>Robert A Phillips</dc:creator>
<dc:creator>Nishika Karbhari</dc:creator>
<dc:creator>Kasper D Hansen</dc:creator>
<dc:creator>Ben Langmead</dc:creator>
<dc:creator>Jeffrey T Leek</dc:creator>
<dc:date>2016-01-29</dc:date>
<dc:identifier>doi:10.1101/038224</dc:identifier>
<dc:title><![CDATA[Human splicing diversity across the Sequence Read Archive]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-01-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/039297v1?rss=1">
<title>
<![CDATA[
Persisting fetal clonotypes influence the structure and overlap of adult human T cell receptor repertoires 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/039297v1?rss=1
</link>
<description><![CDATA[
The diversity of T-cell receptors (TCRs) recognizing foreign pathogens is generated through a highly stochastic recombination process, making the independent production of the same sequence rare. Yet unrelated individuals do share receptors, which together constitute a "public" repertoire of abundant clonotypes. The TCR repertoire is initially formed prenatally, when the enzyme inserting random nucleotides is downregulated, producing a limited-diversity subset. By statistically analyzing deep sequencing T-cell repertoire data from twins, unrelated individuals of various ages, and cord blood, we show that T-cell clones generated before birth persist and maintain high abundances in adult organisms for decades, slowly decaying with age. Our results suggest that large, low-diversity public clones are created during pregnancy and survive over long periods, providing the basis of the public repertoire.
]]></description>
<dc:creator>Mikhail V Pogorelyy</dc:creator>
<dc:creator>Yuval Elhanati</dc:creator>
<dc:creator>Quentin Marcou</dc:creator>
<dc:creator>Anastasia L Sycheva</dc:creator>
<dc:creator>Ekaterina A Komech</dc:creator>
<dc:creator>Vadim I Nazarov</dc:creator>
<dc:creator>Olga V Britanova</dc:creator>
<dc:creator>Dmitriy M Chudakov</dc:creator>
<dc:creator>Ilgar Z Mamedov</dc:creator>
<dc:creator>Yuri B Lebedev</dc:creator>
<dc:creator>Thierry Mora</dc:creator>
<dc:creator>Aleksandra M Walczak</dc:creator>
<dc:date>2016-02-09</dc:date>
<dc:identifier>doi:10.1101/039297</dc:identifier>
<dc:title><![CDATA[Persisting fetal clonotypes influence the structure and overlap of adult human T cell receptor repertoires]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-02-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/052084v1?rss=1">
<title>
<![CDATA[
Detection of human adaptation during the past 2,000 years 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/052084v1?rss=1
</link>
<description><![CDATA[
Detection of recent natural selection is a challenging problem in population genetics, as standard methods generally integrate over long timescales. Here we introduce the Singleton Density Score (SDS), a powerful measure to infer very recent changes in allele frequencies from contemporary genome sequences. When applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past 2,000 years. We see strong signals of selection at lactase and HLA, and in favor of blond hair and blue eyes. Turning to signals of polygenic adaptation we find, remarkably, that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we report suggestive new evidence for polygenic shifts affecting many other complex traits. Our results suggest that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.
]]></description>
<dc:creator>Yair Field</dc:creator>
<dc:creator>Evan A Boyle</dc:creator>
<dc:creator>Natalie Telis</dc:creator>
<dc:creator>Ziyue Gao</dc:creator>
<dc:creator>Kyle J Gaulton</dc:creator>
<dc:creator>David Golan</dc:creator>
<dc:creator>Loic Yengo</dc:creator>
<dc:creator>Ghislain Rocheleau</dc:creator>
<dc:creator>Philippe Froguel</dc:creator>
<dc:creator>Mark I McCarthy</dc:creator>
<dc:creator>Jonathan K Pritchard</dc:creator>
<dc:date>2016-05-07</dc:date>
<dc:identifier>doi:10.1101/052084</dc:identifier>
<dc:title><![CDATA[Detection of human adaptation during the past 2,000 years]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-05-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/030338v1?rss=1">
<title>
<![CDATA[
Analysis of protein-coding genetic variation in 60,706 humans 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/030338v1?rss=1
</link>
<description><![CDATA[
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). The resulting catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We show that this catalogue can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human "knockout" variants in protein-coding genes.
]]></description>
<dc:creator>Exome Aggregation Consortium</dc:creator>
<dc:creator>Monkol Lek</dc:creator>
<dc:creator>Konrad Karczewski</dc:creator>
<dc:creator>Eric Minikel</dc:creator>
<dc:creator>Kaitlin Samocha</dc:creator>
<dc:creator>Eric Banks</dc:creator>
<dc:creator>Timothy Fennell</dc:creator>
<dc:creator>Anne O'Donnell-Luria</dc:creator>
<dc:creator>James Ware</dc:creator>
<dc:creator>Andrew Hill</dc:creator>
<dc:creator>Beryl Cummings</dc:creator>
<dc:creator>Taru Tukiainen</dc:creator>
<dc:creator>Daniel Birnbaum</dc:creator>
<dc:creator>Jack Kosmicki</dc:creator>
<dc:creator>Laramie Duncan</dc:creator>
<dc:creator>Karol Estrada</dc:creator>
<dc:creator>Fengmei Zhao</dc:creator>
<dc:creator>James Zou</dc:creator>
<dc:creator>Emma Pierce-Hoffman</dc:creator>
<dc:creator>Joanne Berghout</dc:creator>
<dc:creator>David Cooper</dc:creator>
<dc:creator>Nicole Deflaux</dc:creator>
<dc:creator>Mark DePristo</dc:creator>
<dc:creator>Ron Do</dc:creator>
<dc:creator>Jason Flannick</dc:creator>
<dc:creator>Menachem Fromer</dc:creator>
<dc:creator>Laura Gauthier</dc:creator>
<dc:creator>Jackie Goldstein</dc:creator>
<dc:creator>Namrata Gupta</dc:creator>
<dc:creator>Daniel Howrigan</dc:creator>
<dc:creator>Adam Kiezun</dc:creator>
<dc:creator>Mitja Kurki</dc:creator>
<dc:creator>Ami Levy Moonshine</dc:creator>
<dc:creator>Pradeep Natarajan</dc:creator>
<dc:creator>Lorena Orozco</dc:creator>
<dc:creator>Gina Peloso</dc:creator>
<dc:creator>Ryan Poplin</dc:creator>
<dc:creator>Manuel Rivas</dc:creator>
<dc:creator>Valentin Ruano-Rubio</dc:creator>
<dc:creator>Samuel Rose</dc:creator>
<dc:creator>Douglas</dc:creator>
<dc:date>2015-10-30</dc:date>
<dc:identifier>doi:10.1101/030338</dc:identifier>
<dc:title><![CDATA[Analysis of protein-coding genetic variation in 60,706 humans]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2015-10-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/039529v1?rss=1">
<title>
<![CDATA[
Rare variant phasing and haplotypic expression from RNA-sequencing with phASER 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/039529v1?rss=1
</link>
<description><![CDATA[
Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis, and functional genomic analysis of allelic activity. Here we present phASER, a fast and accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA-sequencing (RNA-seq), which often span multiple exons due to splicing. This provides 1) dramatically more accurate phasing of rare and de novo variants compared to population-based phasing; 2) phasing of variants in the same gene up to hundreds of kilobases away, which cannot be obtained from DNA-sequencing reads; and 3) high-confidence measures of haplotypic expression, greatly improving power for allelic expression studies.
]]></description>
<dc:creator>Stephane E Castel</dc:creator>
<dc:creator>Pejman Mohammadi</dc:creator>
<dc:creator>Wendy K Chung</dc:creator>
<dc:creator>Yufeng Shen</dc:creator>
<dc:creator>Tuuli Lappalainen</dc:creator>
<dc:date>2016-02-12</dc:date>
<dc:identifier>doi:10.1101/039529</dc:identifier>
<dc:title><![CDATA[Rare variant phasing and haplotypic expression from RNA-sequencing with phASER]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-02-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/030486v1?rss=1">
<title>
<![CDATA[
Resources for the comprehensive discovery of functional RNA elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/030486v1?rss=1
</link>
<description><![CDATA[
Transcriptome-wide maps of RNA binding protein (RBP)-RNA interactions by immunoprecipitation (IP)-based methods such as RNA IP (RIP) and crosslinking and IP (CLIP) are key starting points for evaluating the molecular roles of the thousands of human RBPs. A significant bottleneck to the application of these methods in diverse cell lines, tissues and developmental stages is the availability of validated IP-quality antibodies. Using IP followed by immunoblot assays, we have developed a validated repository of 438 commercially available antibodies that interrogate 365 unique RBPs. In parallel, 362 short-hairpin RNA (shRNA) constructs against 276 unique RBPs were used to confirm the specificity of these antibodies. These antibodies can also characterize subcellular RBP localization. With the burgeoning interest in the roles of RBPs in cancer, neurobiology and development, these resources are invaluable to the broad scientific community. Detailed information about these resources is publicly available at the ENCODE portal (https://www.encodeproject.org/).

Highlights
- Antibodies against 365 unique RBPs successfully immunoprecipitate the RBPs
- Short-hairpin RNAs against 276 unique RBPs confirm the specificity of RBP antibodies
- Antibodies characterize subcellular localization of RBPs
- Antibody and hairpin RNA information is provided at https://www.encodeproject.org/
]]></description>
<dc:creator>Balaji Sundararaman</dc:creator>
<dc:creator>Lijun Zhan</dc:creator>
<dc:creator>Steven Blue</dc:creator>
<dc:creator>Rebecca Stanton</dc:creator>
<dc:creator>Keri Elkins</dc:creator>
<dc:creator>Sara Olson</dc:creator>
<dc:creator>Xintao Wei</dc:creator>
<dc:creator>Eric L Van Nostrand</dc:creator>
<dc:creator>Stephanie C Huelga</dc:creator>
<dc:creator>Brendan M Smalec</dc:creator>
<dc:creator>Xiaofeng Wang</dc:creator>
<dc:creator>Eurie L Hong</dc:creator>
<dc:creator>Jean M Davidson</dc:creator>
<dc:creator>Eric Lecuyer</dc:creator>
<dc:creator>Brenton R Graveley</dc:creator>
<dc:creator>Gene W Yeo</dc:creator>
<dc:date>2015-11-03</dc:date>
<dc:identifier>doi:10.1101/030486</dc:identifier>
<dc:title><![CDATA[Resources for the comprehensive discovery of functional RNA elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2015-11-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/041863v1?rss=1">
<title>
<![CDATA[
Vcfanno: fast, flexible annotation of genetic variants 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/041863v1?rss=1
</link>
<description><![CDATA[
Background: The integration of genome annotations and reference databases is critical to the identification of genetic variants that may be of interest in studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods.

Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. By leveraging a new parallel "chromosome sweeping" algorithm, it enables rapid annotation of both whole-exome and whole-genome datasets. We demonstrate this performance by annotating over 85.3 million variants in less than 17 minutes (>85,000 variants per second) with 50 attributes from 17 commonly used genome annotation resources.

Conclusions: Vcfanno is a flexible software package that provides researchers with the ability to annotate genetic variation with a wide range of datasets and reference databases in diverse genomic formats.

Availability: The vcfanno source code is available at https://github.com/brentp/vcfanno under the MIT license, and platform-specific binaries are available at https://github.com/brentp/vcfanno/releases. Detailed documentation is available at http://brentp.github.io/vcfanno/, and the code underlying the analyses presented can be found at https://github.com/brentp/vcfanno/tree/master/scripts/paper.
]]></description>
<dc:creator>Brent Pedersen</dc:creator>
<dc:creator>Ryan Layer</dc:creator>
<dc:creator>Aaron Quinlan</dc:creator>
<dc:date>2016-02-29</dc:date>
<dc:identifier>doi:10.1101/041863</dc:identifier>
<dc:title><![CDATA[Vcfanno: fast, flexible annotation of genetic variants]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-02-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/052571v1?rss=1">
<title>
<![CDATA[
Chromatin variation associated with liver metabolism is mediated by transposable elements 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/052571v1?rss=1
</link>
<description><![CDATA[
Background: Functional regulatory regions in eukaryotic genomes are characterized by the disruption of nucleosomes, leading to accessible chromatin. The modulation of chromatin accessibility is one of the key mediators of transcriptional regulation, and variation in chromatin accessibility across individuals has been linked to complex traits and disease susceptibility. While mechanisms responsible for chromatin variation across individuals have been investigated, the overwhelming majority of chromatin variation remains unexplained. Furthermore, the processes through which the variation of chromatin accessibility contributes to phenotypic diversity remain poorly understood.

Results: We profiled chromatin accessibility in liver from seven strains of mice with phenotypic diversity in response to a high-fat/high-sucrose (HF/HS) diet and identified reproducible chromatin variation across the genome. We found that sites of variable chromatin accessibility were more likely to coincide with particular classes of transposable elements (TEs) than sites with common chromatin features. Evolutionarily younger long interspersed nuclear elements (LINEs) are particularly enriched for variable chromatin sites. These younger LINEs are enriched for binding sites of immune-associated transcription factors, whereas older LINEs are enriched for liver-specific transcription factors. Genomic region enrichment analysis indicates that variable chromatin sites at TEs contribute to liver metabolic pathways. Finally, we show that polymorphism of TEs and differential DNA methylation at TEs can both contribute to chromatin variation.

Conclusions: Our results demonstrate that specific classes of TEs contribute to chromatin accessibility variation across strains of mice that display phenotypic diversity in response to a HF/HS diet. These results indicate that regulatory variation at TEs is an important contributor to phenotypic variation among populations.
]]></description>
<dc:creator>Juan Du</dc:creator>
<dc:creator>Amy Leung</dc:creator>
<dc:creator>Candi Trac</dc:creator>
<dc:creator>Brian W. Parks</dc:creator>
<dc:creator>Aldons J. Lusis</dc:creator>
<dc:creator>Rama Natarajan</dc:creator>
<dc:creator>Dustin E. Schones</dc:creator>
<dc:date>2016-05-10</dc:date>
<dc:identifier>doi:10.1101/052571</dc:identifier>
<dc:title><![CDATA[Chromatin variation associated with liver metabolism is mediated by transposable elements]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-05-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/052225v1?rss=1">
<title>
<![CDATA[
Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/052225v1?rss=1
</link>
<description><![CDATA[
Single-cell RNA-seq technologies enable high-throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. Here, we propose a novel similarity-learning framework, SIMLR (single-cell interpretation via multi-kernel learning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization applications. Benchmarking against state-of-the-art methods for these applications, we used SIMLR to reanalyze seven representative single-cell data sets, including high-throughput droplet-based data sets with tens of thousands of cells. We show that SIMLR greatly improves clustering sensitivity and accuracy, as well as the visualization and interpretability of the data.
]]></description>
<dc:creator>Bo Wang</dc:creator>
<dc:creator>Junjie Zhu</dc:creator>
<dc:creator>Emma Pierson</dc:creator>
<dc:creator>Daniele Ramazzotti</dc:creator>
<dc:creator>Serafim Batzoglou</dc:creator>
<dc:date>2016-05-09</dc:date>
<dc:identifier>doi:10.1101/052225</dc:identifier>
<dc:title><![CDATA[Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2016-05-09</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/010934v1?rss=1">
<title>
<![CDATA[
When is selection effective? 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/010934v1?rss=1
</link>
<description><![CDATA[
Deleterious alleles can reach high frequency in small populations because of random fluctuations in allele frequency. This may lead, over time, to reduced average fitness. In that sense, selection is more effective in larger populations. Recent studies have considered whether the different demographic histories across human populations have resulted in differences in the number, distribution, and severity of deleterious variants, leading to an animated debate. This article first seeks to clarify some terms of the debate by identifying differences in definitions and assumptions used in recent studies. We argue that variants of Morton, Crow, and Muller's total mutational damage provide the soundest and most practical basis for such comparisons. Using simulations, analytical calculations, and 1000 Genomes data, we provide an intuitive and quantitative explanation for the observed similarity in genetic load across populations. We show that recent demography has likely modulated the effect of selection, and still affects it, but the net result of the accumulated differences is small. Direct observation of differential efficacy of selection for specific allele classes is nevertheless possible with contemporary datasets. By contrast, identifying average genome-wide differences in the efficacy of selection across populations will require many modelling assumptions, and is unlikely to provide much biological insight about human populations.
]]></description>
<dc:creator>Simon Gravel</dc:creator>
<dc:date>2014-10-30</dc:date>
<dc:identifier>doi:10.1101/010934</dc:identifier>
<dc:title><![CDATA[When is selection effective?]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2014-10-30</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
