Hopes and facts in evaluating the performance of HPC-I/O on a cloud environment
Keywords: application I/O model, I/O system, Cloud Cluster, I/O phases, I/O access pattern, I/O configuration
There is currently growing interest in cloud platforms within the High Performance Computing (HPC) community, and parallel I/O for high performance systems is no exception. On cloud platforms, the user must take into account not only the execution time but also the cost, which can be one of the most important issues. In this paper, we propose a methodology to quickly evaluate the performance and cost of Virtual Clusters for parallel scientific applications that use parallel I/O. From the application I/O model, which is extracted automatically with our tool PAS2P-IO, we obtain the I/O requirements; the user can then select the Virtual Cluster that meets those requirements. The application I/O model does not depend on the underlying I/O system. One of the main benefits of our methodology is that it is not necessary to execute the application in order to select a Virtual Cluster on the cloud. Finally, the costs and performance-cost ratios of the Virtual Clusters are provided to support decisions on the selection of resources on a cloud platform.
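The selection step described above can be illustrated with a minimal sketch. It is not the PAS2P-IO tool or the paper's actual procedure: the class names, the bandwidth and price figures, and the definition of the performance-cost ratio (estimated bandwidth divided by hourly price) are all assumptions made for illustration. The only idea taken from the text is that the I/O requirements come from an application model, so candidates can be compared without running the application on each Virtual Cluster.

```python
from dataclasses import dataclass


@dataclass
class IORequirements:
    """I/O volume taken from an application I/O model (hypothetical fields)."""
    write_mb: float
    read_mb: float


@dataclass
class VirtualCluster:
    """Candidate cluster; all values are illustrative, not measured."""
    name: str
    write_mbps: float      # assumed sustained write bandwidth, MB/s
    read_mbps: float       # assumed sustained read bandwidth, MB/s
    cost_per_hour: float   # assumed price, USD per hour


def estimated_io_time(req: IORequirements, vc: VirtualCluster) -> float:
    """Estimated I/O time in seconds, assuming volume / bandwidth per direction."""
    return req.write_mb / vc.write_mbps + req.read_mb / vc.read_mbps


def performance_cost_ratio(req: IORequirements, vc: VirtualCluster) -> float:
    """Assumed metric: estimated effective bandwidth (MB/s) per USD per hour."""
    effective_mbps = (req.write_mb + req.read_mb) / estimated_io_time(req, vc)
    return effective_mbps / vc.cost_per_hour


def rank_by_ratio(req: IORequirements, candidates: list) -> list:
    """Return candidates sorted from best to worst performance-cost ratio."""
    return sorted(candidates,
                  key=lambda vc: performance_cost_ratio(req, vc),
                  reverse=True)


# Hypothetical requirements and candidates, for illustration only.
req = IORequirements(write_mb=1000.0, read_mb=1000.0)
vc_a = VirtualCluster("vc_a", write_mbps=100.0, read_mbps=200.0, cost_per_hour=1.0)
vc_b = VirtualCluster("vc_b", write_mbps=200.0, read_mbps=400.0, cost_per_hour=3.0)

for vc in rank_by_ratio(req, [vc_a, vc_b]):
    print(vc.name,
          round(estimated_io_time(req, vc), 2),
          round(performance_cost_ratio(req, vc), 2))
```

With these made-up numbers the faster cluster is not the one with the better ratio, which is the kind of trade-off between execution time and cost that the user has to weigh.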