	<rdf:RDF xmlns:admin="http://webns.net/mvcb/" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://purl.org/rss/1.0/modules/prism/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
	<channel rdf:about="https://biorxiv.org">
	<admin:errorReportsTo rdf:resource="mailto:biorxiv@cshlpress.edu"/>
	<title>bioRxiv Channel: Rosetta Commons</title>
	<link>https://biorxiv.org</link>
	<description>
	This feed contains articles for bioRxiv Channel "Rosetta Commons"
	</description>

		<items>
	<rdf:Seq>
		</rdf:Seq>
	</items>
	<prism:eIssn/>
	<prism:publicationName>bioRxiv</prism:publicationName>
	<prism:issn/>

	<image rdf:resource=""/>
	</channel>
	<image rdf:about="">
	<title>bioRxiv</title>
	<url/>
	<link>https://biorxiv.org</link>
	</image>
	<item rdf:about="https://biorxiv.org/cgi/content/short/2020.06.25.171371v1?rss=1">
<title>
<![CDATA[
Structural basis for peptide substrate specificities of glycosyltransferase GalNAc-T2 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.06.25.171371v1?rss=1
</link>
<description><![CDATA[
The polypeptide N-acetylgalactosaminyl transferase (GalNAc-T) enzyme family initiates O-linked mucin-type glycosylation. The family comprises 20 isozymes in humans, an unusually large number that is unique to O-glycosylation. GalNAc-Ts exhibit both redundancy and finely tuned specificity for a wide range of peptide substrates. In this work, we deciphered the sequence and structural motifs that determine the peptide substrate preferences of the GalNAc-T2 isoform. Our approach involved sampling and characterizing peptide-enzyme conformations obtained from Rosetta Monte Carlo-minimization-based flexible docking. We computationally scanned 19 amino acid residues at positions -1 and +1 of an eight-residue peptide substrate, yielding a dataset of 361 (19x19) peptides with previously characterized experimental GalNAc-T2 glycosylation efficiencies. The calculations recapitulated experimental specificity data, successfully discriminating between glycosylatable and non-glycosylatable peptides with a probability of 96.5% (ROC-AUC score), a balanced accuracy of 85.5% and a false positive rate of 7.3%. The glycosylatable peptide substrates, viz. peptides with proline, serine, threonine, or alanine at the -1 position, preferentially exhibited cognate sequon-like conformations. The preference for specific residues at the -1 position of the peptide was regulated by enzyme residues R362, K363, Q364, H365 and W331, which modulate the pocket size and specific enzyme-peptide interactions. At the +1 position of the peptide, enzyme residues K281 and K363 formed gating interactions with aromatics and glutamines, leading to peptide-binding modes that are suboptimal for catalysis. Overall, our work revealed enzyme features that lead to the finely tuned specificity observed for a broad range of peptide substrates for the GalNAc-T2 enzyme.
We anticipate that the key sequence and structural motifs can be extended to analyze specificities of other isoforms of the GalNAc-T family and can be used to guide design of variants with tailored specificity.
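
The ROC-AUC, balanced accuracy and false positive rate quoted above are standard binary-classification metrics. The following is a minimal Python sketch of how such metrics are computed. It assumes boolean labels (True = glycosylatable) and a per-peptide score where a higher score means "more likely glycosylatable"; the function names and toy numbers are hypothetical and are not the study's code or data. The AUC uses its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive outscores a randomly chosen negative, which is why it can be stated as a probability.

```python
from itertools import product


def confusion_counts(labels, predictions):
    """Return (TP, FN, FP, TN) for parallel lists of booleans (True = glycosylatable)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, yhat in pairs if y and yhat)
    fn = sum(1 for y, yhat in pairs if y and not yhat)
    fp = sum(1 for y, yhat in pairs if not y and yhat)
    tn = sum(1 for y, yhat in pairs if not y and not yhat)
    return tp, fn, fp, tn


def balanced_accuracy(tp, fn, fp, tn):
    # Mean of sensitivity (TP rate) and specificity (TN rate); robust to class imbalance.
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def false_positive_rate(fp, tn):
    # Fraction of truly non-glycosylatable peptides predicted to be glycosylatable.
    return fp / (fp + tn)


def roc_auc(pos_scores, neg_scores):
    """P(random positive scores higher than random negative); ties count as 1/2."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy example with made-up scores, not data from the paper.
print(roc_auc([3.0, 2.0], [1.0, 2.0]))  # 0.875
```

Balanced accuracy is shown alongside plain accuracy because it stays informative when glycosylatable and non-glycosylatable peptides are unevenly represented.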
]]></description>
<dc:creator>Mahajan, S. P.</dc:creator>
<dc:creator>Srinivasan, Y.</dc:creator>
<dc:creator>Labonte, J. W.</dc:creator>
<dc:creator>DeLisa, M. P.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2020-06-27</dc:date>
<dc:identifier>doi:10.1101/2020.06.25.171371</dc:identifier>
<dc:title><![CDATA[Structural basis for peptide substrate specificities of glycosyltransferase GalNAc-T2]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-06-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.04.04.438423v1?rss=1">
<title>
<![CDATA[
Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.04.04.438423v1?rss=1
</link>
<description><![CDATA[
Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.
]]></description>
<dc:creator>Koehler Leman, J.</dc:creator>
<dc:creator>Lyskov, S.</dc:creator>
<dc:creator>Lewis, S.</dc:creator>
<dc:creator>Adolf-Bryfogle, J.</dc:creator>
<dc:creator>Alford, R. F.</dc:creator>
<dc:creator>Barlow, K.</dc:creator>
<dc:creator>Ben-Aharon, Z.</dc:creator>
<dc:creator>Farrell, D.</dc:creator>
<dc:creator>Fell, J.</dc:creator>
<dc:creator>Hansen, W. A.</dc:creator>
<dc:creator>Harmalkar, A.</dc:creator>
<dc:creator>Jeliazkov, J.</dc:creator>
<dc:creator>Krys, J. D.</dc:creator>
<dc:creator>Kuenze, G.</dc:creator>
<dc:creator>Ljubetic, A.</dc:creator>
<dc:creator>Loshbaugh, A. L.</dc:creator>
<dc:creator>Maguire, J.</dc:creator>
<dc:creator>Moretti, R.</dc:creator>
<dc:creator>Mulligan, V. K.</dc:creator>
<dc:creator>Nguyen, P. T.</dc:creator>
<dc:creator>OConchuir, S.</dc:creator>
<dc:creator>Roy Burman, S. S.</dc:creator>
<dc:creator>Smith, S. T.</dc:creator>
<dc:creator>Teets, F.</dc:creator>
<dc:creator>Tiemann, J. K.</dc:creator>
<dc:creator>Watkins, A.</dc:creator>
<dc:creator>Woods, H.</dc:creator>
<dc:creator>Yachnin, B. J.</dc:creator>
<dc:creator>Bahl, C. D.</dc:creator>
<dc:creator>Bailey-Kellogg, C.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>Das, R.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:creator>Khare, S. D.</dc:creator>
<dc:creator>Kortemme, T.</dc:creator>
<dc:creator>Labonte, J. W.</dc:creator>
<dc:creator>Lindorff-Larsen, K.</dc:creator>
<dc:creator>Meiler, J.</dc:creator>
<dc:creator>Schief, W.</dc:creator>
<dc:creator>Schueler-Furman, O.</dc:creator>
<dc:creator>Siegel, J.</dc:creator>
<dc:creator>Stein, A.</dc:creator>
<dc:date>2021-04-05</dc:date>
<dc:identifier>doi:10.1101/2021.04.04.438423</dc:identifier>
<dc:title><![CDATA[Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-04-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.06.23.168021v1?rss=1">
<title>
<![CDATA[
Diverse scientific benchmarks for implicit membrane energy functions 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.06.23.168021v1?rss=1
</link>
<description><![CDATA[
Energy functions are fundamental to biomolecular modeling. Their success depends on robust physical formalisms, efficient optimization, and high-resolution data for training and validation. Over the past 20 years, progress in each area has advanced soluble protein energy functions. Yet, energy functions for membrane proteins lag behind due to sparse and low-quality data, leading to overfit tools. To overcome this challenge, we assembled a suite of 12 tests on independent datasets varying in size, diversity, and resolution. The tests probe an energy function's ability to capture membrane protein orientation, stability, sequence, and structure. Here, we present the tests and use the franklin2019 energy function to demonstrate them. We then present a vision for transforming these "small" datasets into "big data" that can be used for more sophisticated energy function optimization. The tests are available through the Rosetta Benchmark Server (https://benchmark.graylab.jhu.edu/) and GitHub (https://github.com/rfalford12/Implicit-Membrane-Energy-Function-Benchmark).
]]></description>
<dc:creator>Alford, R. F.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2020-06-24</dc:date>
<dc:identifier>doi:10.1101/2020.06.23.168021</dc:identifier>
<dc:title><![CDATA[Diverse scientific benchmarks for implicit membrane energy functions]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-06-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.05.26.116210v1?rss=1">
<title>
<![CDATA[
Robustification of RosettaAntibody and Rosetta SnugDock 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.05.26.116210v1?rss=1
</link>
<description><![CDATA[
In recent years, the observed antibody sequence space has grown exponentially due to advances in high-throughput sequencing of immune receptors. The rise in sequences has not been mirrored by a rise in structures, as experimental structure determination techniques have remained low-throughput. Computational modeling, however, has the potential to close the sequence-structure gap. To achieve this goal, computational methods must be robust, fast, easy to use, and accurate. Here we report on the latest advances made in RosettaAntibody and Rosetta SnugDock, methods for antibody structure prediction and antibody-antigen docking. We simplified the user interface, expanded and automated the template database, generalized the kinematics of antibody-antigen docking (which enabled modeling of single-domain antibodies) and incorporated new loop modeling techniques. To evaluate the effects of our updates on modeling accuracy, we developed rigorous tests under a new scientific benchmarking framework within Rosetta. Benchmarking revealed that more structurally similar templates could be identified in the updated database and that SnugDock broadened its applicability without losing accuracy. However, there are further advances to be made, including increasing the accuracy and speed of CDR-H3 loop modeling, before computational approaches can accurately model any antibody.
]]></description>
<dc:creator>Jeliazkov, J. R.</dc:creator>
<dc:creator>Frick, R.</dc:creator>
<dc:creator>Zhou, J.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2020-05-26</dc:date>
<dc:identifier>doi:10.1101/2020.05.26.116210</dc:identifier>
<dc:title><![CDATA[Robustification of RosettaAntibody and Rosetta SnugDock]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-05-26</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.02.09.940254v1?rss=1">
<title>
<![CDATA[
Geometric Potentials from Deep Learning Improve Prediction of CDR H3 Loop Structures 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.02.09.940254v1?rss=1
</link>
<description><![CDATA[
Antibody structure is largely conserved, except for a complementarity-determining region featuring six variable loops. Five of these loops adopt canonical folds which can typically be predicted with existing methods, while the remaining loop (CDR H3) remains a challenge due to its highly diverse set of observed conformations. In recent years, deep neural networks have proven to be effective at capturing the complex patterns of protein structure. This work proposes DeepH3, a deep residual neural network that learns to predict inter-residue distances and orientations from antibody heavy and light chain sequence. The output of DeepH3 is a set of probability distributions over distances and orientation angles between pairs of residues. These distributions are converted to geometric potentials and used to discriminate between decoy structures produced by RosettaAntibody. When evaluated on the Rosetta Antibody Benchmark dataset of 49 targets, DeepH3-predicted potentials identified better, same, and worse structures (measured by root-mean-squared distance [RMSD] from the experimental CDR H3 loop structure) than the standard Rosetta energy function for 30, 13, and 6 targets, respectively, and improved the average RMSD of predictions by 21.3% (0.48 Å). Analysis of individual geometric potentials revealed that inter-residue orientations were more effective than inter-residue distances for discriminating near-native CDR H3 loop structures.
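
The abstract says DeepH3's per-pair probability distributions are converted to geometric potentials that rank decoys, but it does not spell out the conversion. The following is a minimal Python sketch that assumes the common negative-log-probability conversion, E = -ln p, and uses distance bins only (orientation angles would be binned and scored the same way under this assumption). The function names, bin edges and probabilities are hypothetical. A lower total energy marks a decoy whose geometry the network considers more probable.

```python
import bisect
import math


def bin_index(value, bin_edges):
    """Index of the bin [edge_i, edge_i+1) holding value, clamped to the first/last bin."""
    i = bisect.bisect_right(bin_edges, value) - 1
    return min(max(i, 0), len(bin_edges) - 2)


def geometric_energy(pair_distances, pair_probs, bin_edges, eps=1e-6):
    """Sum of -ln p over residue pairs, using the predicted probability of each decoy's bin.

    pair_distances: {(res_i, res_j): distance measured in the decoy}
    pair_probs:     {(res_i, res_j): list of predicted bin probabilities}
    """
    total = 0.0
    for pair, dist in pair_distances.items():
        p = pair_probs[pair][bin_index(dist, bin_edges)]
        total += -math.log(p + eps)  # eps avoids log(0) for empty bins
    return total


# Hypothetical 3-bin distance distribution for a single residue pair.
edges = [0.0, 4.0, 8.0, 12.0]
probs = {(1, 5): [0.1, 0.8, 0.1]}
decoys = {"decoy_a": {(1, 5): 5.0}, "decoy_b": {(1, 5): 10.0}}
best = min(decoys, key=lambda name: geometric_energy(decoys[name], probs, edges))
print(best)  # decoy_a
```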
]]></description>
<dc:creator>Ruffolo, J. A.</dc:creator>
<dc:creator>Guerra, C.</dc:creator>
<dc:creator>Mahajan, S. P.</dc:creator>
<dc:creator>Sulam, J.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2020-02-10</dc:date>
<dc:identifier>doi:10.1101/2020.02.09.940254</dc:identifier>
<dc:title><![CDATA[Geometric Potentials from Deep Learning Improve Prediction of CDR H3 Loop Structures]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-02-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.05.27.119354v1?rss=1">
<title>
<![CDATA[
PRosettaC: Rosetta based modeling of PROTAC mediated ternary complexes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.05.27.119354v1?rss=1
</link>
<description><![CDATA[
Proteolysis-targeting chimeras (PROTACs), which induce degradation by recruitment of an E3 ligase to a target protein, are gaining much interest as a new pharmacological modality. However, designing PROTACs is challenging. Formation of a ternary complex between the protein target, the PROTAC and the recruited E3 ligase is considered paramount for successful degradation. A structural model of this ternary complex could in principle inform rational PROTAC design. Unfortunately, only a handful of structures are available for such complexes, necessitating tools for their modeling. We developed a combined protocol that alternates between sampling of the protein-protein interaction space and the PROTAC molecule conformational space. Application of this protocol, PRosettaC, to a benchmark of known PROTAC ternary complexes results in near-native predictions, often with atomic-accuracy prediction of the protein chains as well as the PROTAC binding moieties. It allowed the modeling of a CRBN/BTK complex that recapitulated experimental results for a series of PROTACs. PRosettaC-generated models may be used to design PROTACs for new targets, as well as to improve PROTACs for existing targets, potentially cutting down time and synthesis efforts.
]]></description>
<dc:creator>Zaidman, D.</dc:creator>
<dc:creator>London, N.</dc:creator>
<dc:date>2020-05-30</dc:date>
<dc:identifier>doi:10.1101/2020.05.27.119354</dc:identifier>
<dc:title><![CDATA[PRosettaC: Rosetta based modeling of PROTAC mediated ternary complexes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-05-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.11.29.402743v1?rss=1">
<title>
<![CDATA[
Design of proteins presenting discontinuous functional sites using deep learning 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.11.29.402743v1?rss=1
</link>
<description><![CDATA[
An outstanding challenge in protein design is the design of binders against therapeutically relevant target proteins via scaffolding the discontinuous binding interfaces present in their often large and complex binding partners. There is currently no method for sampling through the almost unlimited number of possible protein structures for those capable of scaffolding a specified discontinuous functional site; instead, current approaches make the sampling problem tractable by restricting search to structures composed of pre-defined secondary structural elements. Such restriction of search has the disadvantage that considerable trial and error can be required to identify architectures capable of scaffolding an arbitrary discontinuous functional site, and only a tiny fraction of possible architectures can be explored. Here we build on recent advances in de novo protein design by deep network hallucination to develop a solution to this problem which eliminates the need to pre-specify the structure of the scaffolding in any way. We use the trRosetta residual neural network, which maps input sequences to predicted inter-residue distances and orientations, to compute a loss function which simultaneously rewards recapitulation of a desired structural motif and the ideality of the surrounding scaffold, and generate diverse structures harboring the desired binding interface by optimizing this loss function by gradient descent. We illustrate the power and versatility of the method by scaffolding binding sites from proteins involved in key signaling pathways with a wide range of secondary structure compositions and geometries. The method should be broadly useful for designing small stable proteins containing complex functional sites.
]]></description>
<dc:creator>Tischer, D.</dc:creator>
<dc:creator>Lisanza, S.</dc:creator>
<dc:creator>Wang, J.</dc:creator>
<dc:creator>Dong, R.</dc:creator>
<dc:creator>Anishchenko, I. K.</dc:creator>
<dc:creator>Milles, L.</dc:creator>
<dc:creator>Ovchinnikov, S.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-11-29</dc:date>
<dc:identifier>doi:10.1101/2020.11.29.402743</dc:identifier>
<dc:title><![CDATA[Design of proteins presenting discontinuous functional sites using deep learning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-11-29</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/630715v1?rss=1">
<title>
<![CDATA[
Protein structure prediction and design in a biologically-realistic implicit membrane 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/630715v1?rss=1
</link>
<description><![CDATA[
Protein design is a powerful tool for elucidating mechanisms of function and engineering new therapeutics and nanotechnologies. While soluble protein design has advanced, membrane protein design remains challenging due to difficulties in modeling the lipid bilayer. In this work, we developed an implicit approach that captures the anisotropic structure, shape of water-filled pores, and nanoscale dimensions of membranes with different lipid compositions. The model improves performance in computational benchmarks against experimental targets including prediction of protein orientations in the bilayer, ΔΔG calculations, native structure discrimination, and native sequence recovery. When applied to de novo protein design, this approach designs sequences with an amino acid distribution near the native amino acid distribution in membrane proteins, overcoming a critical flaw in previous membrane models that were prone to generating leucine-rich designs. Further, the proteins designed in the new membrane model exhibit native-like features including interfacial aromatic side chains, hydrophobic lengths compatible with bilayer thickness, and polar pores. Our method advances high-resolution membrane protein structure prediction and design toward tackling key biological questions and engineering challenges.

Significance Statement
Membrane proteins participate in many life processes including transport, signaling, and catalysis. They constitute over 30% of all proteins and are targets for over 60% of pharmaceuticals. Computational design tools for membrane proteins will transform the interrogation of basic science questions such as membrane protein thermodynamics and the pipeline for engineering new therapeutics and nanotechnologies. Existing tools are either too expensive to compute or rely on manual design strategies. In this work, we developed a fast and accurate method for membrane protein design. The tool is available to the public and will accelerate the experimental design pipeline for membrane proteins.
]]></description>
<dc:creator>Alford, R. F.</dc:creator>
<dc:creator>Fleming, P. J.</dc:creator>
<dc:creator>Fleming, K. G.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2019-05-08</dc:date>
<dc:identifier>doi:10.1101/630715</dc:identifier>
<dc:title><![CDATA[Protein structure prediction and design in a biologically-realistic implicit membrane]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-05-08</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/752485v1?rss=1">
<title>
<![CDATA[
Designing Peptides on a Quantum Computer 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/752485v1?rss=1
</link>
<description><![CDATA[
Although a wide variety of quantum computers are currently being developed, actual computational results have been largely restricted to contrived, artificial tasks. Finding ways to apply quantum computers to useful, real-world computational tasks remains an active research area. Here we describe our mapping of the protein design problem to the D-Wave quantum annealer. We present a system whereby Rosetta, a state-of-the-art protein design software suite, interfaces with the D-Wave quantum processing unit to find amino acid side chain identities and conformations to stabilize a fixed protein backbone. Our approach, which we call the QPacker, uses a large side-chain rotamer library and the full Rosetta energy function, and in no way reduces the design task to a simpler format. We demonstrate that quantum annealer-based design can be applied to complex real-world design tasks, producing designed molecules comparable to those produced by widely adopted classical design approaches. We also show through large-scale classical folding simulations that the results produced on the quantum annealer can inform wet-lab experiments. For design tasks that scale exponentially on classical computers, the QPacker achieves nearly constant runtime performance over the range of problem sizes that could be tested. We anticipate better than classical performance scaling as quantum computers mature.
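
Mapping side-chain packing to a quantum annealer generally means casting it as a QUBO (quadratic unconstrained binary optimization) problem. The following is a minimal Python sketch of one common encoding, which is not necessarily QPacker's exact Hamiltonian. It uses one binary variable per (position, rotamer), puts one-body energies on the diagonal and pair energies off the diagonal, and adds a penalty term that enforces exactly one rotamer per position. All names and energies are hypothetical, and an exhaustive search stands in for the annealer, so this works only on toy sizes and involves no D-Wave software.

```python
from itertools import product


def build_qubo(one_body, two_body, penalty):
    """Encode rotamer packing as a QUBO: energy(x) = offset + sum Q[a, b] * x[a] * x[b].

    one_body: one list of rotamer energies per designable position.
    two_body: {((i, r), (j, s)): pair energy} for rotamer r at i and s at j, i != j.
    penalty:  weight of the one-rotamer-per-position constraint; must exceed the energy scale.
    """
    variables = [(i, r) for i, rots in enumerate(one_body) for r in range(len(rots))]
    index = {v: k for k, v in enumerate(variables)}
    Q = {}

    def add(a, b, value):
        key = (min(a, b), max(a, b))
        Q[key] = Q.get(key, 0.0) + value

    # penalty * (1 - sum_r x_ir)^2 expands (using x^2 = x) to
    # penalty * (1 - sum_r x_ir + 2 * sum_{r<s} x_ir * x_is).
    for i, rots in enumerate(one_body):
        for r, energy in enumerate(rots):
            add(index[(i, r)], index[(i, r)], energy - penalty)
            for s in range(r + 1, len(rots)):
                add(index[(i, r)], index[(i, s)], 2.0 * penalty)
    for (vi, vj), energy in two_body.items():
        add(index[vi], index[vj], energy)
    offset = penalty * len(one_body)  # the constant "1" term, once per position
    return variables, Q, offset


def qubo_energy(x, Q, offset):
    return offset + sum(c * x[a] * x[b] for (a, b), c in Q.items())


def brute_force_minimum(variables, Q, offset):
    """Exhaustive search over all bit strings; a stand-in for the annealer on tiny problems."""
    best = min(product((0, 1), repeat=len(variables)), key=lambda x: qubo_energy(x, Q, offset))
    chosen = [variables[k] for k, bit in enumerate(best) if bit]
    return chosen, qubo_energy(best, Q, offset)


# Two positions with two rotamers each; made-up energies.
one_body = [[0.0, 1.0], [0.5, 0.0]]
two_body = {((0, 0), (1, 1)): 2.0}
chosen, energy = brute_force_minimum(*build_qubo(one_body, two_body, penalty=10.0))
print(chosen, energy)  # [(0, 0), (1, 0)] 0.5
```

For any valid assignment (exactly one rotamer per position) the penalty terms cancel to zero, so the QUBO energy equals the packing energy; invalid assignments are pushed up by at least roughly the penalty weight.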
]]></description>
<dc:creator>Mulligan, V. K.</dc:creator>
<dc:creator>Melo, H.</dc:creator>
<dc:creator>Merritt, H. I.</dc:creator>
<dc:creator>Slocum, S.</dc:creator>
<dc:creator>Weitzner, B. D.</dc:creator>
<dc:creator>Watkins, A. M.</dc:creator>
<dc:creator>Renfrew, P. D.</dc:creator>
<dc:creator>Pelissier, C.</dc:creator>
<dc:creator>Arora, P. S.</dc:creator>
<dc:creator>Bonneau, R.</dc:creator>
<dc:date>2019-09-02</dc:date>
<dc:identifier>doi:10.1101/752485</dc:identifier>
<dc:title><![CDATA[Designing Peptides on a Quantum Computer]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-09-02</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/618603v1?rss=1">
<title>
<![CDATA[
Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/618603v1?rss=1
</link>
<description><![CDATA[
Highly-coordinated water molecules are frequently an integral part of protein-protein and protein-ligand interfaces. We introduce an updated energy model that efficiently captures the energetic effects of these highly-coordinated water molecules on the surfaces of proteins. A two-stage protocol is developed in which polar groups arranged in geometries suitable for water placement are first identified, then a modified Monte Carlo simulation allows highly coordinated waters to be placed on the surface of a protein while simultaneously sampling amino acid side chain orientations. This "semi-explicit" water model is implemented in Rosetta and is suitable for both structure prediction and protein design. We show that our new approach and energy model yield significant improvements in native structure recovery of protein-protein and protein-ligand docking.
]]></description>
<dc:creator>Pavlovicz, R. E.</dc:creator>
<dc:creator>Park, H.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:date>2019-04-25</dc:date>
<dc:identifier>doi:10.1101/618603</dc:identifier>
<dc:title><![CDATA[Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-04-25</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/764449v1?rss=1">
<title>
<![CDATA[
FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/764449v1?rss=1
</link>
<description><![CDATA[
Methods to predict RNA 3D structures from sequence are needed to understand the exploding number of RNA molecules being discovered across biology. As assessed during community-wide RNA-Puzzles trials, Rosetta's Fragment Assembly of RNA with Full-Atom Refinement (FARFAR) enables accurate prediction of complex folds, but it remains unclear how much human intervention and experimental guidance is needed to achieve this performance. Here, we present FARFAR2, a protocol integrating recent innovations with updated RNA fragment libraries and helix modeling. In 16 of 21 RNA-Puzzles revisited without experimental data or expert intervention, FARFAR2 recovers structures that are more accurate than the original models submitted by our group and other participants during the RNA-Puzzles trials. In five prospective tests, pre-registered FARFAR2 models for riboswitches and adenovirus VA-I achieved 3-8 Å RMSD accuracies. Finally, we present a server and three large model archives (FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles) to guide future applications and advances.
]]></description>
<dc:creator>Das, R.</dc:creator>
<dc:creator>Watkins, A. M.</dc:creator>
<dc:date>2019-09-10</dc:date>
<dc:identifier>doi:10.1101/764449</dc:identifier>
<dc:title><![CDATA[FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-09-10</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.18.989657v1?rss=1">
<title>
<![CDATA[
Prediction of protein mutational free energy: benchmark and sampling improvements increase classification accuracy 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.18.989657v1?rss=1
</link>
<description><![CDATA[
Software to predict the change in protein stability upon point mutation is a valuable tool for a number of biotechnological and scientific problems. To facilitate the development of such software and provide easy access to the available experimental data, the ProTherm database was created. Biases in the methods and types of information collected have led to disparities in the types of mutations for which experimental data are available. For example, mutations to alanine are hugely overrepresented, whereas those involving charged residues, especially from one charged residue to another, are underrepresented. ProTherm subsets created as benchmark sets without accounting for this have often underrepresented certain mutational types. This issue introduces systematic biases into the ability of previously published protocols to accurately predict the change in folding energy for these classes of mutations. To resolve this issue, we have generated a new benchmark set with these problems corrected. We then used the benchmark set to test a number of improvements to the point mutation energetics tools in the Rosetta software suite.
]]></description>
<dc:creator>Frenz, B.</dc:creator>
<dc:creator>Lewis, S.</dc:creator>
<dc:creator>King, I.</dc:creator>
<dc:creator>Park, H.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:creator>Song, Y.</dc:creator>
<dc:date>2020-03-20</dc:date>
<dc:identifier>doi:10.1101/2020.03.18.989657</dc:identifier>
<dc:title><![CDATA[Prediction of protein mutational free energy: benchmark and sampling improvements increase classification accuracy]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-20</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.07.22.211482v1?rss=1">
<title>
<![CDATA[
De novo protein design by deep network hallucination 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.07.22.211482v1?rss=1
</link>
<description><![CDATA[
There has been considerable recent progress in protein structure prediction using deep neural networks to infer distance constraints from amino acid residue co-evolution (1-3). We investigated whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generated random amino acid sequences and input them into the trRosetta structure prediction network to predict starting distance maps, which, as expected, are quite featureless. We then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (KL-divergence) between the distance distributions predicted by the network and the background distribution. Optimization from different random starting points resulted in a wide range of proteins with diverse sequences and all alpha, all beta sheet, and mixed alpha-beta structures. We obtained synthetic genes encoding 129 of these network hallucinated sequences, expressed and purified the proteins in E. coli, and found that 27 folded to monomeric stable structures with circular dichroism spectra consistent with the hallucinated structures. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physically based models, to the de novo design of proteins with new functions.
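
The objective described above is a KL-divergence "contrast" between predicted and background distance distributions, maximized by Monte Carlo moves in sequence space. The following is a minimal Python sketch of the scoring and acceptance steps only. It assumes the predicted and background distributions are given as per-residue-pair lists of bin probabilities and that the per-pair KL values are averaged. The trRosetta network itself is not modeled, and all names and numbers are hypothetical.

```python
import math
import random


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats for two discrete distributions over the same distance bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_contrast(predicted, background):
    """Average KL divergence over residue pairs (dicts: pair -> bin probabilities)."""
    return sum(kl_divergence(predicted[k], background[k]) for k in predicted) / len(predicted)


def metropolis_accept(new_score, old_score, temperature, rng=random):
    """Metropolis criterion for *maximizing* the contrast score after a sequence mutation."""
    if new_score >= old_score:
        return True
    return rng.random() < math.exp((new_score - old_score) / temperature)


# Toy check: a peaked prediction has higher contrast against a flat background.
flat = {(0, 1): [0.25, 0.25, 0.25, 0.25]}
peaked = {(0, 1): [0.85, 0.05, 0.05, 0.05]}
print(mean_contrast(flat, flat) < mean_contrast(peaked, flat))  # True
```

A featureless (background-like) prediction scores near zero, which matches the observation that random starting sequences give featureless distance maps; sharper, more confident maps raise the score.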
]]></description>
<dc:creator>Anishchenko, I.</dc:creator>
<dc:creator>Chidyausiku, T. M.</dc:creator>
<dc:creator>Ovchinnikov, S.</dc:creator>
<dc:creator>Pellock, S. J.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-07-23</dc:date>
<dc:identifier>doi:10.1101/2020.07.22.211482</dc:identifier>
<dc:title><![CDATA[De novo protein design by deep network hallucination]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-07-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/597872v1?rss=1">
<title>
<![CDATA[
Integrative protein modeling in RosettaNMR from sparse paramagnetic restraints 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/597872v1?rss=1
</link>
<description><![CDATA[
Computational methods to predict protein structure from nuclear magnetic resonance (NMR) restraints that only require assignment of backbone signals hold great potential to study larger proteins and complexes. Additionally, computational methods designed to work with sparse data add atomic detail that is missing in the experimental restraints, allowing application to systems that are difficult to investigate. While specific frameworks in the Rosetta macromolecular modeling suite support the use of certain NMR restraint types, use of all commonly measured restraint types together is precluded. Here, we introduce a comprehensive framework into Rosetta that reconciles CS-Rosetta, PCS-Rosetta and RosettaNMR into a single framework that, in addition to backbone chemical shifts and nuclear Overhauser effect distance restraints, leverages NMR restraints derived from paramagnetic labeling. Specifically, RosettaNMR incorporates pseudocontact shifts, residual dipolar couplings, and paramagnetic relaxation enhancements, measured at multiple tagging sites. We further showcase the generality of RosettaNMR for various modeling challenges and benchmark it on 28 structure prediction cases, eight symmetric assemblies, two protein-protein and three protein-ligand docking examples. Paramagnetic restraints generated more accurate models for 85% of the benchmark proteins and, when combined with chemical shifts, sampled high-accuracy models (≤ 2 Å) in 50% of the cases.

Significance Statement
Computational methods such as Rosetta can assist NMR structure determination by employing efficient conformational search algorithms alongside physically realistic energy functions to model protein structure from sparse experimental data. We have developed a framework in Rosetta that leverages paramagnetic NMR data in addition to chemical shift and nuclear Overhauser effect restraints and extends RosettaNMR calculations to the prediction of symmetric assemblies, protein-protein and protein-ligand complexes. RosettaNMR generated high-accuracy models (≤ 2 Å) in 50% of cases for a benchmark set of 28 monomeric and eight symmetric proteins and predicted protein-protein and protein-ligand interfaces with up to 1 Å accuracy. The method expands Rosetta's rich toolbox for integrative data-driven modeling and promises to be broadly useful in structural biology.
]]></description>
<dc:creator>Kuenze, G.</dc:creator>
<dc:creator>Bonneau, R.</dc:creator>
<dc:creator>Koehler Leman, J.</dc:creator>
<dc:creator>Meiler, J.</dc:creator>
<dc:date>2019-04-03</dc:date>
<dc:identifier>doi:10.1101/597872</dc:identifier>
<dc:title><![CDATA[Integrative protein modeling in RosettaNMR from sparse paramagnetic restraints]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-04-03</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.12.435185v1?rss=1">
<title>
<![CDATA[
Large-scale design and refinement of stable proteins using sequence-only models 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.12.435185v1?rss=1
</link>
<description><![CDATA[
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we report a neural network model that predicts protein stability based only on sequences of amino acids, and demonstrate its performance by evaluating the stability of almost 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We also report a second neural network model that is able to generate novel stable proteins. Finally, we show that the predictive model can be used to substantially increase the stability of both expert-designed and model-generated proteins.
]]></description>
<dc:creator>Singer, J. M.</dc:creator>
<dc:creator>Novotney, S.</dc:creator>
<dc:creator>Strickland, D.</dc:creator>
<dc:creator>Haddox, H. K.</dc:creator>
<dc:creator>Leiby, N.</dc:creator>
<dc:creator>Rocklin, G. J.</dc:creator>
<dc:creator>Chow, C. M.</dc:creator>
<dc:creator>Roy, A.</dc:creator>
<dc:creator>Bera, A. K.</dc:creator>
<dc:creator>Motta, F. C.</dc:creator>
<dc:creator>Cao, L.</dc:creator>
<dc:creator>Strauch, E.-M.</dc:creator>
<dc:creator>Chidyausiku, T. M.</dc:creator>
<dc:creator>Ford, A.</dc:creator>
<dc:creator>Ho, E.</dc:creator>
<dc:creator>Mackenzie, C. O.</dc:creator>
<dc:creator>Eramian, H.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:creator>Grigoryan, G.</dc:creator>
<dc:creator>Vaughn, M.</dc:creator>
<dc:creator>Stewart, L. J.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>Klavins, E.</dc:creator>
<dc:date>2021-03-12</dc:date>
<dc:identifier>doi:10.1101/2021.03.12.435185</dc:identifier>
<dc:title><![CDATA[Large-scale design and refinement of stable proteins using sequence-only models]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-03-12</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.03.10.434454v1?rss=1">
<title>
<![CDATA[
Sampling of Structure and Sequence Space of Small Protein Folds 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.03.10.434454v1?rss=1
</link>
<description><![CDATA[
Nature samples only a small fraction of sequence space, yet many more amino acid combinations can fold into stable proteins. Furthermore, small structural variations within a single fold, which may differ by only a few amino acids from the nearest homolog, define molecular function. Hence, designing proteins with novel molecular functionalities, such as molecular recognition, requires methods to control and sample shape diversity. To explore this space, we developed and experimentally validated a computational platform that can design a wide variety of small protein folds while sampling high shape diversity. We designed and evaluated about 30,000 de novo protein designs of 7 different folds. Among these designs, about 6,200 stable proteins were identified, with predicted structures that include a first-of-its-kind minimalized thioredoxin. The data revealed additional protein folding rules, such as helix-connecting loops, which were in nature. Beyond providing a resource database for protein engineering, our data present a large training data set for machine learning. We developed a high-accuracy classifier to predict the stability of our designed proteins. The methods and the wide range of new protein shapes provide a basis for designing new protein functions without compromising stability.
]]></description>
<dc:creator>Linsky, T. W.</dc:creator>
<dc:creator>Noble, K.</dc:creator>
<dc:creator>Tobin, A.</dc:creator>
<dc:creator>Crow, R.</dc:creator>
<dc:creator>Carter, L. P.</dc:creator>
<dc:creator>Urbauer, J. L.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>Strauch, E.-M.</dc:creator>
<dc:date>2021-03-11</dc:date>
<dc:identifier>doi:10.1101/2021.03.10.434454</dc:identifier>
<dc:title><![CDATA[Sampling of Structure and Sequence Space of Small Protein Folds]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-03-11</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.12.21.423882v1?rss=1">
<title>
<![CDATA[
Single Layers of Attention Suffice to Predict Protein Contacts 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.12.21.423882v1?rss=1
</link>
<description><![CDATA[
The established approach to unsupervised protein contact prediction estimates co-evolving positions using undirected graphical models. This approach trains a Potts model on a multiple sequence alignment, then predicts that the edges with the highest weight correspond to contacts in the 3D structure. On the other hand, increasingly large Transformers are being pretrained on protein sequence databases but have demonstrated mixed results for downstream tasks, including contact prediction. This has sparked discussion about the role of scale and attention-based models in unsupervised protein representation learning. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce a simplified attention layer, factored attention, and show that it achieves performance comparable to Potts models while sharing parameters both within and across families. Further, we extract contacts from the attention maps of a pretrained Transformer and show that they perform competitively with the other two approaches. This provides evidence that large-scale pretraining can learn meaningful protein features when presented with unlabeled and unaligned data. We contrast factored attention with the Transformer to indicate that the Transformer leverages hierarchical signal in protein family databases not captured by our single-layer models. This raises the exciting possibility of developing powerful structured models of protein family databases.
]]></description>
<dc:creator>Bhattacharya, N.</dc:creator>
<dc:creator>Thomas, N.</dc:creator>
<dc:creator>Rao, R.</dc:creator>
<dc:creator>Dauparas, J.</dc:creator>
<dc:creator>Koo, P.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>Song, Y. S.</dc:creator>
<dc:creator>Ovchinnikov, S.</dc:creator>
<dc:date>2020-12-22</dc:date>
<dc:identifier>doi:10.1101/2020.12.21.423882</dc:identifier>
<dc:title><![CDATA[Single Layers of Attention Suffice to Predict Protein Contacts]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-12-22</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.07.17.209643v1?rss=1">
<title>
<![CDATA[
Improved protein structure refinement guided by deep learning based accuracy estimation 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.07.17.209643v1?rss=1
</link>
<description><![CDATA[
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts, and it outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryo-EM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and for identifying specific regions likely to be in error. Incorporating the accuracy predictions at multiple stages of the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve the search for global energy minima of biomolecules.
]]></description>
<dc:creator>Hiranuma, N.</dc:creator>
<dc:creator>Park, H.</dc:creator>
<dc:creator>Anishchenko, I.</dc:creator>
<dc:creator>Baek, M.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-07-19</dc:date>
<dc:identifier>doi:10.1101/2020.07.17.209643</dc:identifier>
<dc:title><![CDATA[Improved protein structure refinement guided by deep learning based accuracy estimation]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-07-19</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.10.22.346965v1?rss=1">
<title>
<![CDATA[
De novo design of transmembrane beta-barrels 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.10.22.346965v1?rss=1
</link>
<description><![CDATA[
The ability of naturally occurring transmembrane β-barrel proteins (TMBs) to spontaneously insert into lipid bilayers and form stable transmembrane pores is a remarkable feat of protein evolution and has been exploited in biotechnology for applications ranging from single-molecule DNA and protein sequencing to biomimetic filtration membranes. Because it has not been possible to design TMBs from first principles, these efforts have relied on re-engineering naturally occurring TMBs, which generally have a biological function very different from that desired. Here we leverage the power of de novo computational design coupled with a "hypothesis, design and test" approach to determine principles underlying TMB structure and folding. We find that, unlike almost all other classes of protein, locally destabilizing sequences in both the β-turns and β-strands facilitate TMB expression and global folding by modulating the kinetics of folding and the competition between soluble misfolding and proper folding into the lipid bilayer. We use these principles to design new eight-stranded TMBs with sequences unrelated to any known TMB and show that they insert and fold into detergent micelles and synthetic lipid membranes. The designed proteins fold more rapidly and reversibly in lipid membranes than the TMB domain of the model native protein OmpA, and high-resolution NMR and X-ray crystal structures of one of the designs are very close to the computational model. The ability to design TMBs from first principles opens the door to custom design of TMBs for biotechnology and demonstrates the value of de novo design for investigating basic protein folding problems that are otherwise hidden by evolutionary history.

One-sentence summary: Success in de novo design of transmembrane β-barrels reveals geometric and sequence constraints on the fold and paves the way to the design of custom pores for sequencing and other single-molecule analytical applications.
]]></description>
<dc:creator>Vorobieva, A. A.</dc:creator>
<dc:creator>White, P.</dc:creator>
<dc:creator>Liang, B.</dc:creator>
<dc:creator>Horne, J. E.</dc:creator>
<dc:creator>Bera, A. K.</dc:creator>
<dc:creator>Chow, C. M.</dc:creator>
<dc:creator>Gerben, S. R.</dc:creator>
<dc:creator>Marx, S.</dc:creator>
<dc:creator>Kang, A.</dc:creator>
<dc:creator>Stiving, A. Q.</dc:creator>
<dc:creator>Harvey, S. R.</dc:creator>
<dc:creator>Marx, D. C.</dc:creator>
<dc:creator>Khan, N.</dc:creator>
<dc:creator>Fleming, K. G.</dc:creator>
<dc:creator>Wysocki, V. H.</dc:creator>
<dc:creator>Brockwell, D. J.</dc:creator>
<dc:creator>Tamm, L. K.</dc:creator>
<dc:creator>Radford, S. E.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-10-23</dc:date>
<dc:identifier>doi:10.1101/2020.10.22.346965</dc:identifier>
<dc:title><![CDATA[De novo design of transmembrane beta-barrels]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-10-23</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.09.06.285239v1?rss=1">
<title>
<![CDATA[
Learning a force field from small-molecule crystal lattice predictions enables consistent sub-Angstrom protein-ligand docking 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.09.06.285239v1?rss=1
</link>
<description><![CDATA[
Accurate and rapid calculation of protein-small molecule interaction energies is critical for computational drug discovery. Because of the large chemical space spanned by drug-like molecules, classical force fields contain thousands of parameters describing atom-pair distance and torsional preferences; each parameter is typically optimized independently on simple representative molecules. Here we describe a new approach in which small-molecule force field parameters are jointly optimized, guided by the rich source of information contained within thousands of available small-molecule crystal structures. We optimize parameters by requiring that the experimentally determined molecular lattice arrangements have lower energy than all alternative lattice arrangements. Thousands of independent crystal lattice-prediction simulations were run on each of 1,386 small-molecule crystal structures, and energy function parameters of an implicit solvent energy model were optimized so that native crystal lattice arrangements had the lowest energy. The resulting energy model was implemented in Rosetta, together with a rapid genetic algorithm docking method employing grid-based scoring and receptor flexibility. The success rate of bound structure recapitulation in cross-docking on 1,112 complexes improved by more than 10% over previously published methods, with solutions within 1 Å in over half of the cases. Our results demonstrate that small-molecule crystal structures are a rich source of information for systematically improving computational drug discovery.
]]></description>
<dc:creator>Park, H.</dc:creator>
<dc:creator>Zhou, G.</dc:creator>
<dc:creator>Baek, M.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:date>2020-09-07</dc:date>
<dc:identifier>doi:10.1101/2020.09.06.285239</dc:identifier>
<dc:title><![CDATA[Learning a force field from small-molecule crystal lattice predictions enables consistent sub-Angstrom protein-ligand docking]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-09-07</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.07.27.221333v1?rss=1">
<title>
<![CDATA[
Hierarchical design of multi-scale protein complexes by combinatorial assembly of oligomeric helical bundle and repeat protein building blocks 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.07.27.221333v1?rss=1
</link>
<description><![CDATA[
A goal of de novo protein design is to develop a systematic and robust approach to generating complex nanomaterials from stable building blocks. Owing to their structural regularity and simplicity, a wide range of monomeric repeat proteins and oligomeric helical bundle structures have been designed and characterized. Here we describe a stepwise hierarchical approach to building up multi-component symmetric protein assemblies using these structures. We first connect designed helical repeat proteins (DHRs) to designed helical bundle proteins (HBs) to generate a large library of heterodimeric and homooligomeric building blocks; the latter have cyclic symmetries ranging from C2 to C6. All of the building blocks have repeat proteins with accessible termini, which we take advantage of in a second round of architecture-guided rigid helical fusion (WORMS) to generate larger symmetric assemblies, including C3 and C5 cyclic and D2 dihedral rings, a tetrahedral cage, and a 120-subunit icosahedral cage. Characterization of the structures by small-angle X-ray scattering, X-ray crystallography, and cryo-electron microscopy demonstrates that the hierarchical design approach can accurately and robustly generate a wide range of macromolecular assemblies. With a diameter of 43 nm, the icosahedral nanocage is the largest structurally validated designed cage to date. The computational methods and building block sets described here provide a very general route to new de novo designed symmetric protein nanomaterials.
]]></description>
<dc:creator>Hsia, Y.</dc:creator>
<dc:creator>Mout, R.</dc:creator>
<dc:creator>Sheffler, W.</dc:creator>
<dc:creator>Edman, N. I.</dc:creator>
<dc:creator>Vulovic, I.</dc:creator>
<dc:creator>Park, Y.-J.</dc:creator>
<dc:creator>Redler, R. L.</dc:creator>
<dc:creator>Bick, M. J.</dc:creator>
<dc:creator>Bera, A. K.</dc:creator>
<dc:creator>Courbet, A.</dc:creator>
<dc:creator>Kang, A.</dc:creator>
<dc:creator>Brunette, T.</dc:creator>
<dc:creator>Nattermann, U.</dc:creator>
<dc:creator>Tsai, E.</dc:creator>
<dc:creator>Saleem, A.</dc:creator>
<dc:creator>Chow, C. M.</dc:creator>
<dc:creator>Ekiert, D. C.</dc:creator>
<dc:creator>Bhabha, G.</dc:creator>
<dc:creator>Veesler, D.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-07-28</dc:date>
<dc:identifier>doi:10.1101/2020.07.27.221333</dc:identifier>
<dc:title><![CDATA[Hierarchical design of multi-scale protein complexes by combinatorial assembly of oligomeric helical bundle and repeat protein building blocks]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-07-28</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.07.23.218917v1?rss=1">
<title>
<![CDATA[
Protein sequence design by explicit energy landscape optimization 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.07.23.218917v1?rss=1
</link>
<description><![CDATA[
The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the lowest energy conformation is that structure. As this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest energy amino acid sequence for the desired structure. They often check by protein structure prediction in a second step that the desired structure is indeed the lowest energy conformation for the designed sequence, and discard the designed sequences (in many cases a large fraction) for which this is not the case. Here we show that by backpropagating gradients through the trRosetta structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures, and in one calculation explicitly design amino acid sequences predicted to fold into the desired structure and not any other. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by landscape optimization to the standard fixed-backbone sequence design methodology in Rosetta, and show that the results of the former, but not the latter, are sensitive to the presence of competing low-lying states. We further show that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.

Significance: Computational protein design has primarily focused on finding sequences that have very low energy in the target designed structure. However, what is most relevant during folding is not the absolute energy of the folded state, but the energy difference between the folded state and the lowest-lying alternative states. We describe a deep learning approach that captures the entire folding landscape, and show that it can enhance current protein design methods.
]]></description>
<dc:creator>Norn, C.</dc:creator>
<dc:creator>Wicky, B. I. M.</dc:creator>
<dc:creator>Juergens, D.</dc:creator>
<dc:creator>Liu, S.</dc:creator>
<dc:creator>Kim, D.</dc:creator>
<dc:creator>Koepnick, B.</dc:creator>
<dc:creator>Anishchenko, I.</dc:creator>
<dc:creator>Foldit Players,</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>Ovchinnikov, S.</dc:creator>
<dc:date>2020-07-24</dc:date>
<dc:identifier>doi:10.1101/2020.07.23.218917</dc:identifier>
<dc:title><![CDATA[Protein sequence design by explicit energy landscape optimization]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-07-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.06.17.156646v1?rss=1">
<title>
<![CDATA[
Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.06.17.156646v1?rss=1
</link>
<description><![CDATA[
In aqueous solution, polar groups make hydrogen bonds with water, and hence burial of such groups in the interior of a protein is unfavorable unless the loss of hydrogen bonds with water is compensated by the formation of new ones with other protein groups. Hence, buried "unsatisfied" polar groups making no hydrogen bonds are very rare in proteins. Efficiently representing the energetic cost of unsatisfied hydrogen bonds with a pairwise-decomposable energy term during protein design is challenging, since whether or not a group is satisfied depends on all of its neighbors. Here we describe a method for assigning a pairwise-decomposable energy to sidechain rotamers such that, after combinatorial sidechain packing, buried unsatisfied polar atoms are penalized. The penalty can be any quadratic function of the number of unsatisfied polar groups and can be computed very rapidly. We show that including this term in Rosetta sidechain packing calculations substantially reduces the number of buried unsatisfied polar groups.
]]></description>
<dc:creator>Coventry, B.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-06-17</dc:date>
<dc:identifier>doi:10.1101/2020.06.17.156646</dc:identifier>
<dc:title><![CDATA[Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-06-17</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.03.23.003913v1?rss=1">
<title>
<![CDATA[
A generative algorithm for de novo design of proteins with diverse pocket structures 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.03.23.003913v1?rss=1
</link>
<description><![CDATA[
To create new enzymes and biosensors from scratch, precise control over the structure of small-molecule binding sites is of paramount importance, but systematically designing arbitrary protein pocket shapes and sizes remains an outstanding challenge. Using the NTF2-like structural superfamily as a model system, we developed a generative algorithm for creating a virtually unlimited number of de novo proteins supporting diverse pocket structures. The generative algorithm was tested and refined through feedback from two rounds of large-scale experimental testing. In total, this involved the assembly of synthetic genes encoding 7,896 generated designs and assessment of their stability on the yeast cell surface, detailed biophysical characterization of 64 designs, and crystal structures of 5 designs. The refined algorithm generates proteins that remain folded at high temperatures and exhibit more pocket diversity than naturally occurring NTF2-like proteins. We expect this approach to transform the design of small-molecule sensors and enzymes by enabling the creation of binding and active site geometries far better suited to specific design challenges than those accessible by repurposing the limited number of naturally occurring NTF2-like proteins.
]]></description>
<dc:creator>Basanta, B.</dc:creator>
<dc:creator>Bick, M. J.</dc:creator>
<dc:creator>Bera, A. K.</dc:creator>
<dc:creator>Norn, C.</dc:creator>
<dc:creator>Chow, C. M.</dc:creator>
<dc:creator>Carter, L. P.</dc:creator>
<dc:creator>Goreshnick, I.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-03-24</dc:date>
<dc:identifier>doi:10.1101/2020.03.23.003913</dc:identifier>
<dc:title><![CDATA[A generative algorithm for de novo design of proteins with diverse pocket structures]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-03-24</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.05.27.445982v1?rss=1">
<title>
<![CDATA[
Antibody structure prediction using interpretable deep learning 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.05.27.445982v1?rss=1
</link>
<description><![CDATA[
Therapeutic antibodies make up a rapidly growing segment of the biologics market. However, rational design of antibodies is hindered by reliance on experimental methods for determining antibody structures. In recent years, deep learning methods have driven significant advances in general protein structure prediction. Here, we present DeepAb, a deep learning method for predicting accurate antibody Fv structures from sequence. We evaluate DeepAb on two benchmark sets (one balanced for structural diversity and the other composed of clinical-stage therapeutic antibodies) and find that our method consistently outperforms the leading alternatives. Previous deep learning methods have operated as "black boxes" and offered few insights into their predictions. By introducing a directly interpretable attention mechanism, we show that our network attends to physically important residue pairs. For example, in predicting the conformation of one CDR H3 residue, the network attends to proximal aromatics and a key hydrogen bonding interaction that constrain the loop conformation. Finally, we present a novel mutant scoring metric derived from network confidence and show that for a particular antibody, all eight of the top-ranked mutations improve binding affinity. These results suggest that this model will be useful for a broad range of antibody prediction and design tasks.

Significance: Accurate structure models are critical for understanding the properties of potential therapeutic antibodies. Conventional methods for protein structure determination require significant investments of time and resources and may fail. Although greatly improved, methods for general protein structure prediction still cannot consistently provide the accuracy necessary to understand or design antibodies. We present a deep learning method for antibody structure prediction and demonstrate improvement over alternatives on diverse, therapeutically relevant benchmarks. In addition to its improved accuracy, our method reveals interpretable outputs about specific amino acids and residue interactions that should facilitate the design of novel therapeutic antibodies.
]]></description>
<dc:creator>Ruffolo, J. A.</dc:creator>
<dc:creator>Sulam, J.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2021-05-27</dc:date>
<dc:identifier>doi:10.1101/2021.05.27.445982</dc:identifier>
<dc:title><![CDATA[Antibody structure prediction using interpretable deep learning]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-05-27</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.06.14.448402v1?rss=1">
<title>
<![CDATA[
Accurate prediction of protein structures and interactions using a 3-track network 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.06.14.448402v1?rss=1
</link>
<description><![CDATA[
DeepMind presented remarkably accurate protein structure predictions at the CASP14 conference. We explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate models of protein-protein complexes from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

One-Sentence Summary: Accurate protein structure modeling enables rapid solution of structure determination problems and provides insights into biological function.
]]></description>
<dc:creator>Baek, M.</dc:creator>
<dc:creator>DiMaio, F.</dc:creator>
<dc:creator>Anishchenko, I.</dc:creator>
<dc:creator>Dauparas, J.</dc:creator>
<dc:creator>Ovchinnikov, S.</dc:creator>
<dc:creator>Lee, G. R.</dc:creator>
<dc:creator>Wang, J.</dc:creator>
<dc:creator>Cong, Q.</dc:creator>
<dc:creator>Kinch, L. N.</dc:creator>
<dc:creator>Schaeffer, R. D.</dc:creator>
<dc:creator>Millan, C.</dc:creator>
<dc:creator>Park, H.</dc:creator>
<dc:creator>Adams, C.</dc:creator>
<dc:creator>Glassman, C. R.</dc:creator>
<dc:creator>DeGiovanni, A.</dc:creator>
<dc:creator>Pereira, J. H.</dc:creator>
<dc:creator>Rodrigues, A. V.</dc:creator>
<dc:creator>van Dijk, A. A.</dc:creator>
<dc:creator>Ebrecht, A. C.</dc:creator>
<dc:creator>Opperman, D. J.</dc:creator>
<dc:creator>Sagmeister, T.</dc:creator>
<dc:creator>Buhlheller, C.</dc:creator>
<dc:creator>Pavkov-Keller, T.</dc:creator>
<dc:creator>Rathinaswamy, M. K.</dc:creator>
<dc:creator>Dalwadi, U.</dc:creator>
<dc:creator>Yip, C. K.</dc:creator>
<dc:creator>Burke, J. E.</dc:creator>
<dc:creator>Garcia, K. C.</dc:creator>
<dc:creator>Grishin, N. V.</dc:creator>
<dc:creator>Adams, P. D.</dc:creator>
<dc:creator>Read, R. J.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2021-06-15</dc:date>
<dc:identifier>doi:10.1101/2021.06.14.448402</dc:identifier>
<dc:title><![CDATA[Accurate prediction of protein structures and interactions using a 3-track network]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-06-15</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.09.29.319103v1?rss=1">
<title>
<![CDATA[
Epistasis on the stability landscape of de novo TIM barrels explored by a modular design approach 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.09.29.319103v1?rss=1
</link>
<description><![CDATA[
The ability to design stable proteins with custom-made functions is a major goal in biochemistry with practical relevance for our environment and society. High conformational stability lowers protein sensitivity to mutations and changes in the environment; thus, understanding and manipulating protein stability will expand the applications of de novo proteins. Since the (β/α)8-barrel or TIM-barrel fold is one of the most common functional scaffolds, in this work we designed a collection of stable de novo TIM barrels (DeNovoTIMs) using a computational fixed-backbone and modular approach based on improved hydrophobic packing of sTIM11, the first validated de novo TIM barrel. DeNovoTIMs navigate a region of the stability landscape previously uncharted by natural TIM barrels, with variations across the designs spanning 60 °C in melting temperature and 22 kcal/mol in conformational stability. Significant non-additive, or epistatic, effects were observed when stabilizing mutations from different regions of the barrel were combined. The molecular basis of epistasis in DeNovoTIMs appears to be related to the extension of the hydrophobic cores. This study is an important step towards the fine-tuned modulation of protein stability by design.

Significance Statement: De novo protein design expands our knowledge about protein structure and stability. The TIM barrel is a highly relevant fold used in nature to host a rich variety of catalytic functions. Here, we follow a modular approach to design and characterize a collection of de novo TIM barrels, subjecting them to a thorough folding analysis. Non-additive effects modulate the increase in stability when different regions of the barrel are mutated, yielding a wide variety of thermodynamic properties that allow the designs to navigate a region of the stability landscape not explored by natural TIM barrels. The design of stable proteins increases the applicability of de novo proteins and provides crucial information on the molecular determinants that modulate structure and stability.

One-Sentence Summary: A family of designed TIM barrels with diverse thermodynamic properties shows epistatic effects on its stability landscape.
]]></description>
<dc:creator>Romero-Romero, S.</dc:creator>
<dc:creator>Costas, M.</dc:creator>
<dc:creator>Silva, D.-A.</dc:creator>
<dc:creator>Kordes, S.</dc:creator>
<dc:creator>Rojas-Ortega, E.</dc:creator>
<dc:creator>Guerra, Y.</dc:creator>
<dc:creator>Tapia, C.</dc:creator>
<dc:creator>Shanmugaratnam, S.</dc:creator>
<dc:creator>Rodriguez-Romero, A.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:creator>Höcker, B.</dc:creator>
<dc:creator>Fernandez-Velasco, D. A.</dc:creator>
<dc:date>2020-10-01</dc:date>
<dc:identifier>doi:10.1101/2020.09.29.319103</dc:identifier>
<dc:title><![CDATA[Epistasis on the stability landscape of de novo TIM barrels explored by a modular design approach]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-10-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2020.12.01.406611v1?rss=1">
<title>
<![CDATA[
Designed proteins assemble antibodies into modular nanocages 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2020.12.01.406611v1?rss=1
</link>
<description><![CDATA[
Antibodies are widely used in biology and medicine, and there has been considerable interest in multivalent antibody formats to increase binding avidity and enhance signaling pathway agonism. However, there are currently no general approaches for forming precisely oriented antibody assemblies with controlled valency. We describe the computational design of two-component nanocages that overcome this limitation by uniting form and function. One structural component is any antibody or Fc fusion, and the second is a designed Fc-binding homo-oligomer that drives nanocage assembly. Structures of 8 antibody nanocages, determined by electron microscopy and spanning dihedral, tetrahedral, octahedral, and icosahedral architectures with 2, 6, 12, and 30 antibodies per nanocage, match the corresponding computational models. Antibody nanocages targeting cell-surface receptors enhance signaling compared to free antibodies or Fc fusions in DR5-mediated apoptosis, Tie2-mediated angiogenesis, CD40 activation, and T cell proliferation; nanocage assembly also increases SARS-CoV-2 pseudovirus neutralization by anti-SARS-CoV-2 monoclonal antibodies and Fc-ACE2 fusion proteins. We anticipate that the ability to assemble arbitrary antibodies, without the need for covalent modification, into highly ordered assemblies with different geometries and valencies will have broad impact in biology and medicine.
]]></description>
<dc:creator>Divine, R.</dc:creator>
<dc:creator>Dang, H. V.</dc:creator>
<dc:creator>Ueda, G.</dc:creator>
<dc:creator>Fallas, J. A.</dc:creator>
<dc:creator>Vulovic, I.</dc:creator>
<dc:creator>Sheffler, W.</dc:creator>
<dc:creator>Saini, S.</dc:creator>
<dc:creator>Zhao, Y. T.</dc:creator>
<dc:creator>Raj, I. X.</dc:creator>
<dc:creator>Morawski, P. A.</dc:creator>
<dc:creator>Jennewein, M. F.</dc:creator>
<dc:creator>Homad, L. J.</dc:creator>
<dc:creator>Wan, Y.-H.</dc:creator>
<dc:creator>Tooley, M. R.</dc:creator>
<dc:creator>Seeger, F.</dc:creator>
<dc:creator>Fahning, M. L.</dc:creator>
<dc:creator>Etemadi, A.</dc:creator>
<dc:creator>Lazarovits, J.</dc:creator>
<dc:creator>Roederer, A.</dc:creator>
<dc:creator>Walls, A. C.</dc:creator>
<dc:creator>Stewart, L.</dc:creator>
<dc:creator>Mazloomi, M.</dc:creator>
<dc:creator>King, N. P.</dc:creator>
<dc:creator>Campbell, D. J.</dc:creator>
<dc:creator>McGuire, A. T.</dc:creator>
<dc:creator>Stamatatos, L.</dc:creator>
<dc:creator>Ruohola-Baker, H.</dc:creator>
<dc:creator>Mathieu, J.</dc:creator>
<dc:creator>Veesler, D.</dc:creator>
<dc:creator>Baker, D.</dc:creator>
<dc:date>2020-12-01</dc:date>
<dc:identifier>doi:10.1101/2020.12.01.406611</dc:identifier>
<dc:title><![CDATA[Designed proteins assemble antibodies into modular nanocages]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2020-12-01</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.04.12.437963v1?rss=1">
<title>
<![CDATA[
A memetic algorithm enables global all-atom protein-protein docking with sidechain flexibility 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.04.12.437963v1?rss=1
</link>
<description><![CDATA[
Protein complex formation is encoded by specific interactions at the atomic scale, but the computational cost of modeling proteins at this level often requires the use of simplified energy models and limited conformational flexibility. In particular, combining all-atom energy functions with backbone and sidechain flexibility results in rugged energy landscapes that are difficult to explore. In this study, we develop a protein-protein docking algorithm, EvoDOCK, that combines the strength of a differential evolution algorithm for efficient exploration of the global search space with the benefits of a local optimization method to refine detailed atomic interactions. EvoDOCK enabled accurate and fast local and global protein-protein docking using an all-atom energy function with side-chain flexibility. Comparison with a standard method built on Monte Carlo optimization demonstrated improved accuracy and up to 35-fold increases in computational speed. The evolutionary algorithm also enabled efficient atomistic docking with backbone flexibility.
]]></description>
<dc:creator>Varela, D.</dc:creator>
<dc:creator>Andre, I.</dc:creator>
<dc:date>2021-04-13</dc:date>
<dc:identifier>doi:10.1101/2021.04.12.437963</dc:identifier>
<dc:title><![CDATA[A memetic algorithm enables global all-atom protein-protein docking with sidechain flexibility]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-04-13</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/2021.05.05.442729v1?rss=1">
<title>
<![CDATA[
XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/2021.05.05.442729v1?rss=1
</link>
<description><![CDATA[
Graph representations are traditionally used to describe protein structures in sequence design protocols where the folding pattern is known. This practice rarely extends to machine learning projects because existing graph convolution algorithms have shortcomings when representing protein environments. One reason for this is the lack of emphasis on edge attributes during message-passing operations. Another reason is the traditionally shallow nature of graph neural network architectures. Here we introduce an improved message-passing operation that is better equipped to model local kinematics problems such as protein design. Our approach, XENet, pays special attention to both incoming and outgoing edge attributes.

We compare XENet against existing graph convolutions in an attempt to decrease rotamer sample counts in Rosetta's rotamer substitution protocol. This use case is motivating because it allows larger protein design problems to fit onto near-term quantum computers. XENet outperformed competing models while also displaying a greater tolerance for deeper architectures than competing convolutions. We found that XENet was able to decrease rotamer counts by 40% without loss in quality, which decreased the problem size of our use case by more than a factor of 3.

Author summary: Graph data structures are ubiquitous in the field of protein design and are at the core of the recent advances in artificial intelligence brought forth by graph neural networks (GNNs). GNNs have led to some impressive results in modeling protein interactions but are not as common as other tensor representations.

Most GNN architectures place little or no emphasis on the information stored on edges; however, protein modeling tools often use edges to represent vital geometric relationships between interacting residue pairs. In this paper, we show that more advanced processing of edge attributes can lead to considerable benefits when modeling chemical data.

We introduce XENet, a new member of the GNN family that shows an improved ability to model protein residue environments based on chemical and geometric data. We use XENet to intelligently simplify the optimization problem that is solved when designing proteins. This task is important to us and others because it allows larger proteins to be designed on near-term quantum computers. We show that XENet trains on our protein modeling data more effectively than existing methods, resulting in a dramatic decrease in protein design sample space with no loss in quality.
]]></description>
<dc:creator>Maguire, J. B.</dc:creator>
<dc:creator>Grattarola, D.</dc:creator>
<dc:creator>Klyshko, E.</dc:creator>
<dc:creator>Mulligan, V. K.</dc:creator>
<dc:creator>Melo, H.</dc:creator>
<dc:date>2021-05-05</dc:date>
<dc:identifier>doi:10.1101/2021.05.05.442729</dc:identifier>
<dc:title><![CDATA[XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2021-05-05</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/749317v1?rss=1">
<title>
<![CDATA[
Novel sampling strategies and a coarse-grained score function for docking homomers, flexible heteromers, and oligosaccharides using Rosetta in CAPRI Rounds 37-45 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/749317v1?rss=1
</link>
<description><![CDATA[
CAPRI Rounds 37 through 45 introduced larger complexes, new macromolecules, and multi-stage assemblies. For these rounds, we used and expanded docking methods in Rosetta to model 23 target complexes. We successfully predicted 14 target complexes and recognized and refined near-native models generated by other groups for two further targets. Notably, for targets T110 and T136, we achieved the closest prediction of any CAPRI participant. We created several innovative approaches during these rounds. Since Round 39 (target 122), we have used the new RosettaDock 4.0, which has a revamped coarse-grained energy function and the ability to perform conformer selection during docking with hundreds of pre-generated protein backbones. Ten of the complexes had some degree of symmetry in their interactions, so we tested Rosetta SymDock, realized its shortcomings, and developed the next-generation symmetric docking protocol, SymDock2, which includes docking of multiple backbones and induced-fit refinement. Since the last CAPRI assessment, we also developed methods for modeling and designing carbohydrates in Rosetta, and we used them to successfully model oligosaccharide-protein complexes in Round 41. While the results were broadly encouraging, they also highlighted the pressing need to invest in (1) flexible docking algorithms with the ability to model loop and linker motions and in (2) new sampling and scoring methods for oligosaccharide-protein interactions.
]]></description>
<dc:creator>Roy Burman, S. S.</dc:creator>
<dc:creator>Nance, M. L.</dc:creator>
<dc:creator>Jeliazkov, J. R.</dc:creator>
<dc:creator>Labonte, J. W.</dc:creator>
<dc:creator>Lubin, J. H.</dc:creator>
<dc:creator>Biswas, N.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2019-08-30</dc:date>
<dc:identifier>doi:10.1101/749317</dc:identifier>
<dc:title><![CDATA[Novel sampling strategies and a coarse-grained score function for docking homomers, flexible heteromers, and oligosaccharides using Rosetta in CAPRI Rounds 37-45]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2019-08-30</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/409730v1?rss=1">
<title>
<![CDATA[
Flexible backbone assembly and refinement of symmetrical homomeric complexes 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/409730v1?rss=1
</link>
<description><![CDATA[
Symmetrical homomeric proteins are ubiquitous in every domain of life, and information about their structure is essential to decipher function. The size of these complexes often makes them intractable to high-resolution structure determination experiments. Computational docking algorithms offer a promising alternative for modeling large complexes with arbitrary symmetry. The accuracy of existing algorithms, however, is limited by backbone inaccuracies when homology-modeled monomers are used. Here, we present Rosetta SymDock2, which broadly searches symmetrical conformational space using a six-dimensional coarse-grained score function and then applies an all-atom flexible-backbone refinement; we demonstrate that this refinement is essential for physically realistic modeling of tightly packed complexes. In global docking of a benchmark set of complexes of different point symmetries, starting from homology-modeled monomers, we successfully dock (defined as predicting three near-native structures in the five top-scoring models) 19 out of 31 cyclic complexes and 5 out of 12 dihedral complexes.

Highlights: SymDock2 is an algorithm to assemble symmetric protein structures from monomers. A coarse-grained score function discriminates near-native conformations. Flexible-backbone refinement is necessary to create realistic all-atom models. Results improve six-fold and outperform other symmetric docking algorithms.
]]></description>
<dc:creator>Roy Burman, S. S.</dc:creator>
<dc:creator>Yovanno, R. A.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2018-09-06</dc:date>
<dc:identifier>doi:10.1101/409730</dc:identifier>
<dc:title><![CDATA[Flexible backbone assembly and refinement of symmetrical homomeric complexes]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2018-09-06</prism:publicationDate>
<prism:section></prism:section>
</item>
<item rdf:about="https://biorxiv.org/cgi/content/short/223511v1?rss=1">
<title>
<![CDATA[
Efficient Flexible Backbone Protein-Protein Docking for Challenging Targets 
]]>
</title>
<link>
https://biorxiv.org/cgi/content/short/223511v1?rss=1
</link>
<description><![CDATA[
Computational prediction of protein-protein complex structures facilitates a fundamental understanding of biological mechanisms and enables therapeutics design. Binding-induced conformational changes challenge all current computational docking algorithms by exponentially increasing the conformational space to be explored. To restrict this search to relevant space, some computational docking algorithms exploit the inherent flexibility of the protein monomers to simulate conformational selection from pre-generated ensembles. As the ensemble size expands with increased protein flexibility, these methods struggle with efficiency and high false positive rates. Here, we develop and benchmark a method that efficiently samples large conformational ensembles of flexible proteins and docks them using a novel, six-dimensional, coarse-grained score function. A strong discriminative ability allows an eight-fold higher enrichment of near-native candidate structures in the coarse-grained phase compared to a previous method. Further, the method adapts to the diversity of backbone conformations in the ensemble by modulating sampling rates. It samples 100 conformations each of the ligand and the receptor backbone while increasing computational time by only 20-80%. In a benchmark set of 88 proteins of varying degrees of flexibility, the expected success rate for blind predictions after resampling is 77% for rigid complexes, 49% for moderately flexible complexes, and 31% for highly flexible complexes. These success rates on flexible complexes are a substantial step forward from all existing methods. Additionally, for highly flexible proteins, we demonstrate that when a suitable conformer generation method exists, RosettaDock 4.0 can dock the complex successfully.

Significance: Predicting binding-induced conformational plasticity in protein backbones remains a principal challenge in computational protein-protein docking. To date, there are no methods that can reliably dock proteins whose interface-residue backbones undergo more than 1 Å root-mean-square deviation (RMSD) upon binding. Here, we present a method that samples backbone motions and scores conformations rapidly, obtaining, for the first time, successful docking of nearly 50% of flexible target complexes with backbone conformational change of up to 2.2 Å RMSD. This method will be applicable to a broader range of protein docking problems, which in turn will help us understand biomolecular assembly and protein function.
]]></description>
<dc:creator>Marze, N. A.</dc:creator>
<dc:creator>Roy Burman, S. S.</dc:creator>
<dc:creator>Sheffler, W.</dc:creator>
<dc:creator>Gray, J. J.</dc:creator>
<dc:date>2017-11-22</dc:date>
<dc:identifier>doi:10.1101/223511</dc:identifier>
<dc:title><![CDATA[Efficient Flexible Backbone Protein-Protein Docking for Challenging Targets]]></dc:title>
<dc:publisher>Cold Spring Harbor Laboratory Press</dc:publisher>
<prism:publicationDate>2017-11-22</prism:publicationDate>
<prism:section></prism:section>
</item>
</rdf:RDF>
