Parallelizing Protein Folding

Current Contributors: Shawna Thomas, Lawrence Rauchwerger, Nancy Amato

Supported By: King Abdullah University of Science and Technology (KAUST), NSF, Dept. of Education, Texas Higher Education Coordinating Board

The protein-folding problem is a study of how a protein dynamically folds to its so-called native state — an energetically stable, three-dimensional conformation. Understanding this process is of great practical importance since some devastating diseases such as Alzheimer's and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. We have developed a new computational technique for studying protein folding that is based on probabilistic roadmap methods for motion planning. Our technique yields an approximate map of a protein's potential energy landscape that contains thousands of feasible folding pathways. We have validated our method against known experimental results. Other simulation techniques, such as molecular dynamics or Monte Carlo methods, require many orders of magnitude more time to produce a single, partial trajectory.
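As a rough illustration of the roadmap idea described above, the following C++ sketch builds a probabilistic roadmap over a hypothetical two-coordinate "conformation" space with a toy energy function. It then extracts a minimum-cost folding pathway from an unfolded start to the native state with Dijkstra's algorithm. The energy function, the acceptance rule `exp(-E/6)`, the edge weights, and all names (`Conformation`, `buildRoadmap`, `foldingPath`) are assumptions made for illustration. A real protein conformation has far more degrees of freedom, and this is not the authors' implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <random>
#include <utility>
#include <vector>

// Hypothetical reduced conformation: two abstract degrees of freedom.
struct Conformation { double x, y; };

// Hypothetical toy potential whose minimum (the "native state") is at (0, 0).
double energy(const Conformation& c) { return c.x * c.x + c.y * c.y; }

struct Roadmap {
  std::vector<Conformation> nodes;
  std::vector<std::vector<std::pair<std::size_t, double>>> adj;  // (target, weight)
};

// Directed edge cost: step length plus any energy increase, so uphill moves cost more.
double edgeWeight(const Conformation& a, const Conformation& b) {
  return std::hypot(a.x - b.x, a.y - b.y) + std::max(0.0, energy(b) - energy(a));
}

// Node 0 is the native state and node 1 an unfolded start; the rest are sampled,
// favoring low-energy conformations, then linked to their k nearest neighbors.
Roadmap buildRoadmap(std::size_t samples, std::size_t k, unsigned seed) {
  assert(samples >= 2);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> coord(-3.0, 3.0), unit(0.0, 1.0);
  Roadmap rm;
  rm.nodes = {{0.0, 0.0}, {3.0, 3.0}};
  while (rm.nodes.size() < samples) {
    Conformation c{coord(rng), coord(rng)};
    if (unit(rng) < std::exp(-energy(c) / 6.0)) rm.nodes.push_back(c);  // assumed rule
  }
  const std::size_t n = rm.nodes.size();
  rm.adj.assign(n, {});
  for (std::size_t i = 0; i < n; ++i) {
    std::vector<std::pair<double, std::size_t>> byDist;
    for (std::size_t j = 0; j < n; ++j)
      if (j != i)
        byDist.push_back({std::hypot(rm.nodes[i].x - rm.nodes[j].x,
                                     rm.nodes[i].y - rm.nodes[j].y), j});
    const std::size_t m = std::min(k, byDist.size());
    std::partial_sort(byDist.begin(), byDist.begin() + m, byDist.end());
    for (std::size_t t = 0; t < m; ++t) {
      const std::size_t j = byDist[t].second;
      rm.adj[i].push_back({j, edgeWeight(rm.nodes[i], rm.nodes[j])});
      rm.adj[j].push_back({i, edgeWeight(rm.nodes[j], rm.nodes[i])});
    }
  }
  return rm;
}

// Dijkstra's algorithm: lowest-cost node sequence from start to goal (empty if unreachable).
std::vector<std::size_t> foldingPath(const Roadmap& rm, std::size_t start, std::size_t goal) {
  const std::size_t n = rm.nodes.size();
  assert(start < n && goal < n);
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(n, inf);
  std::vector<std::size_t> prev(n, n);  // n marks "no predecessor"
  using Item = std::pair<double, std::size_t>;
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
  dist[start] = 0.0;
  pq.push({0.0, start});
  while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;  // stale queue entry
    if (u == goal) break;
    for (auto [v, w] : rm.adj[u]) {
      if (dist[u] + w < dist[v]) {
        dist[v] = dist[u] + w;
        prev[v] = u;
        pq.push({dist[v], v});
      }
    }
  }
  if (dist[goal] == inf) return {};
  std::vector<std::size_t> path;
  for (std::size_t v = goal; v != n; v = prev[v]) path.push_back(v);
  std::reverse(path.begin(), path.end());
  return path;
}
```

For example, `foldingPath(buildRoadmap(200, 8, 42), 1, 0)` returns the node indices of the lowest-cost route from the unfolded start to the native state, or an empty vector if the sampled roadmap happens to be disconnected. Because one roadmap contains many such routes, the same graph can be queried repeatedly to obtain many pathways.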

In this work we report on our experiences parallelizing our method using STAPL (Standard Template Adaptive Parallel Library), which is being developed in the Parasol Lab at Texas A&M. An efficient parallel version will enable us to study larger proteins with increased accuracy. We demonstrate how STAPL enables portable efficiency without user code modification, and we obtained good speedups on platforms ranging from small Linux clusters, to distributed shared-memory machines, to massively parallel machines such as IBM's BlueGene/L.
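The sampling phase of roadmap construction is largely data-parallel, because each conformation can be generated and evaluated independently. The sketch below illustrates that structure with standard C++ `std::async` rather than STAPL. It does not use or imitate STAPL's API, and `sampleChunk`, `parallelSample`, and the toy energy function are hypothetical. Each task gets its own seeded random stream, so the gathered result does not depend on thread scheduling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <random>
#include <vector>

struct Conformation { double x, y; };

// Hypothetical toy energy function (not the authors' potential).
double energy(const Conformation& c) { return c.x * c.x + c.y * c.y; }

// One task's share of the sampling phase, using its own independent RNG stream.
std::vector<Conformation> sampleChunk(std::size_t count, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> coord(-3.0, 3.0), unit(0.0, 1.0);
  std::vector<Conformation> out;
  out.reserve(count);
  while (out.size() < count) {
    Conformation c{coord(rng), coord(rng)};
    // Assumed acceptance rule: low-energy conformations are kept more often.
    if (unit(rng) < std::exp(-energy(c) / 6.0)) out.push_back(c);
  }
  return out;
}

// Split `total` samples across `workers` asynchronous tasks and gather them in task order.
std::vector<Conformation> parallelSample(std::size_t total, std::size_t workers, unsigned seed) {
  assert(workers > 0);
  std::vector<std::future<std::vector<Conformation>>> tasks;
  for (std::size_t w = 0; w < workers; ++w) {
    // Spread the remainder so chunk sizes differ by at most one.
    const std::size_t count = total / workers + (w < total % workers ? 1 : 0);
    tasks.push_back(std::async(std::launch::async, sampleChunk, count,
                               seed + static_cast<unsigned>(w)));
  }
  std::vector<Conformation> all;
  all.reserve(total);
  for (auto& t : tasks) {
    std::vector<Conformation> part = t.get();
    all.insert(all.end(), part.begin(), part.end());
  }
  return all;
}
```

For example, `parallelSample(100, 4, 7)` returns 100 conformations produced in four chunks of 25. The connection phase, in which nodes look for neighbors that may live in other partitions, requires communication and is not shown here.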

The following are performance results for three proteins (CTXIII, Protein A, and Protein G) on four platforms:

Speedups for Linux Cluster A

It consists of four boards, each with two processors and 2 GB of RAM. Two boards have 1 GHz processors with 256 KB caches, and two have 1.1 GHz processors with 512 KB caches. The boards are connected by a dedicated Gigabit Ethernet switch.

Figures: total time and per-phase time for Protein CTXIII

Speedups for SGI Altix 3700

A distributed shared-memory machine in the Texas A&M University Supercomputing Facility. It contains 32 nodes, each with two pairs of 1.3 GHz 64-bit processors, and 256 GB of RAM.

Figures: total time and per-phase time for Protein A

Speedups for MCR

A large, dedicated Linux cluster at Lawrence Livermore National Laboratory. It has 1152 nodes, each with two 2.4 GHz processors and 4 GB of RAM, connected by a Gigabit Ethernet switch.

Figures: total time and per-phase time for Protein G

Speedups for BlueGene/L

A scalable, massively parallel 180-teraflop machine designed for up to 65,536 compute nodes configured as a 64x32x32 three-dimensional torus. Each node has a single ASIC and 256 MB of memory.

Figures: total time and per-phase time for Protein CTXIII
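Speedup is not defined explicitly in these results. Assuming the conventional strong-scaling definitions, speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, the helper below computes both from measured wall-clock times; per-phase times can be treated the same way. The names `ScalingPoint` and `scaling` are hypothetical, and the timings in the example are made up, not results from this work.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Strong-scaling metrics for a fixed problem size.
struct ScalingPoint {
  std::size_t procs;
  double speedup;     // S(p) = T(1) / T(p)
  double efficiency;  // E(p) = S(p) / p
};

// `serialTime` is T(1); each run is (p, T(p)) measured on the same input.
std::vector<ScalingPoint> scaling(double serialTime,
                                  const std::vector<std::pair<std::size_t, double>>& runs) {
  assert(serialTime > 0.0);
  std::vector<ScalingPoint> out;
  for (const auto& [procs, time] : runs) {
    assert(procs > 0 && time > 0.0);
    const double s = serialTime / time;
    out.push_back({procs, s, s / static_cast<double>(procs)});
  }
  return out;
}
```

For instance, hypothetical timings of 100 s on one processor, 50 s on two, and 25 s on eight give speedups of 2 and 4 and efficiencies of 1.0 and 0.5.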

Related Publications

Parallel Protein Folding with STAPL, Shawna Thomas, Gabriel Tanase, Lucia K. Dale, Jose E. Moreira, Lawrence Rauchwerger, Nancy M. Amato, Concurrency and Computation: Practice and Experience, Vol: 17, Issue: 14, pp. 1643-1656, Dec 2005. DOI: https://doi.org/10.1002/cpe.950
Keywords: Parallel Planning, Protein Folding, STAPL

BibTeX

@article{https://doi.org/10.1002/cpe.950,
author = {Thomas, Shawna and Tanase, Gabriel and Dale, Lucia K. and Moreira, Jose E. and Rauchwerger, Lawrence and Amato, Nancy M.},
title = {Parallel protein folding with STAPL},
journal = {Concurrency and Computation: Practice and Experience},
volume = {17},
number = {14},
pages = {1643--1656},
keywords = {protein folding, motion planning, parallel libraries, C++},
doi = {10.1002/cpe.950},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.950},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.950},
abstract = {The protein-folding problem is a study of how a protein dynamically folds to its so-called native state, an energetically stable, three-dimensional conformation. Understanding this process is of great practical importance since some devastating diseases such as Alzheimer's and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. We have developed a new computational technique for studying protein folding that is based on probabilistic roadmap methods for motion planning. Our technique yields an approximate map of a protein's potential energy landscape that contains thousands of feasible folding pathways. We have validated our method against known experimental results. Other simulation techniques, such as molecular dynamics or Monte Carlo methods, require many orders of magnitude more time to produce a single, partial trajectory. In this paper we report on our experiences parallelizing our method using STAPL (Standard Template Adaptive Parallel Library) that is being developed in the Parasol Lab at Texas A\&M. An efficient parallel version will enable us to study larger proteins with increased accuracy. We demonstrate how STAPL enables portable efficiency across multiple platforms, ranging from small Linux clusters to massively parallel machines such as IBM's BlueGene/L, without user code modification. Copyright © 2005 John Wiley \& Sons, Ltd.},
year = {2005}
}

Abstract

The protein-folding problem is a study of how a protein dynamically folds to its so-called native state, an energetically stable, three-dimensional conformation. Understanding this process is of great practical importance since some devastating diseases such as Alzheimer's and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. We have developed a new computational technique for studying protein folding that is based on probabilistic roadmap methods for motion planning. Our technique yields an approximate map of a protein's potential energy landscape that contains thousands of feasible folding pathways. We have validated our method against known experimental results. Other simulation techniques, such as molecular dynamics or Monte Carlo methods, require many orders of magnitude more time to produce a single, partial trajectory. In this paper we report on our experiences parallelizing our method using STAPL (Standard Template Adaptive Parallel Library), which is being developed in the Parasol Lab at Texas A&M. An efficient parallel version will enable us to study larger proteins with increased accuracy. We demonstrate how STAPL enables portable efficiency across multiple platforms, ranging from small Linux clusters to massively parallel machines such as IBM's BlueGene/L, without user code modification.

Parallel Protein Folding with STAPL, Shawna Thomas, Nancy M. Amato, 18th International Parallel and Distributed Processing Symposium (IPDPS), pp. 189-, Santa Fe, NM, USA, Apr 2004. DOI: 10.1109/IPDPS.2004.1303204
Keywords: Parallel Planning, Protein Folding, STAPL

BibTeX

@INPROCEEDINGS{1303204,
author={S. {Thomas} and N. M. {Amato}},
booktitle={18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.},
title={Parallel protein folding with STAPL},
year={2004},
pages={189-},
doi={10.1109/IPDPS.2004.1303204}}

Abstract

Summary form only given. The protein folding problem is to study how a protein dynamically folds to its so-called native state, an energetically stable, three-dimensional configuration. Understanding this process is of great practical importance since some devastating diseases such as Alzheimer's and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. In our group, we have developed a new computational technique for studying protein folding that is based on probabilistic roadmap methods for motion planning. Our technique yields an approximate map of a protein's potential energy landscape that contains thousands of feasible folding pathways. We have validated our method against known experimental results. Other simulation techniques, such as molecular dynamics or Monte Carlo methods, require many orders of magnitude more time to produce a single, partial trajectory. We report on our experiences parallelizing our method using STAPL (the Standard Template Adaptive Parallel Library), which is being developed in the Parasol Lab at Texas A&M. An efficient parallel version enables us to study larger proteins with increased accuracy. We demonstrate how STAPL enables portable efficiency across multiple platforms without user code modification. We show performance gains on two systems: a dedicated Linux cluster and an extremely heterogeneous multiuser Linux cluster.