Using AWS EC2 as a Test-Bed Infrastructure for I/O System Configuration of HPC Applications
Keywords: cloud computing, parallel file system, PVFS2, MPI applications
In recent years, the use of public cloud platforms as infrastructure has gained popularity in many scientific areas, and High Performance Computing (HPC) is no exception. System administrators can use such platforms as test-bed systems to evaluate and detect performance inefficiencies in the I/O subsystem, and to make decisions about the configuration parameters that influence application performance, without compromising the production HPC system. In this paper, we propose a methodology for evaluating parallel applications using virtual clusters as a test system. Our experimental validation indicates that virtual clusters are a quick and easy solution for system administrators to analyze the impact of the I/O system on the I/O kernels of parallel applications and to make performance decisions in a controlled environment.