Solving a big-data problem with GPU: the network traffic analysis
The number of devices connected to the Internet has increased significantly and, owing to lower costs, is expected to grow exponentially in the near future. In the coming years, Internet data traffic is expected to reach the order of zettabytes. As a consequence of this increase, data traffic is growing faster than the capacity to process it. In recent years, identifying the Internet traffic generated by different applications has become one of the major challenges for telecommunications networks. This characterization relies on understanding the composition and dynamics of Internet traffic in order to improve network performance. Analysing the huge amount of data generated by network traffic in real time requires more computing power and capacity. A good option is to apply High Performance Computing (HPC) techniques to this problem, specifically Graphics Processing Units (GPUs). Their main characteristics are high computational power, constant development, and low cost. In addition, NVIDIA provides CUDA, a programming toolkit that offers a GPU-CPU interface, thread synchronization, and data types, among other features. In this paper we present the causes of the growing data volumes circulating on the network, current data analysis and monitoring techniques, and the feasibility of combining data mining techniques with GPUs to solve this problem and speed up turnaround times.
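Per-packet traffic identification is data-parallel: each packet can be labelled independently, which is what makes it a natural fit for the CUDA thread model described above. The following is a minimal CPU-side sketch in Python, not the authors' implementation, that emulates a one-dimensional CUDA grid. Each logical thread computes its global index as `blockIdx.x * blockDim.x + threadIdx.x`, handles one packet, and adds that packet's bytes to a per-application counter (an `atomicAdd` on a real GPU). The `Packet` record, the `classify` and `launch` helpers, and the port-based labelling rule are illustrative assumptions; port-based classification is only a simple baseline, since many applications use non-standard ports.

```python
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record holding only the fields this sketch needs."""
    src_port: int
    dst_port: int
    length: int  # bytes


# Illustrative table of well-known ports; real classifiers also need
# payload inspection or flow statistics.
WELL_KNOWN_PORTS = {22: "ssh", 53: "dns", 80: "http", 443: "https"}


def classify(pkt: Packet) -> str:
    """Label one packet by the first well-known port it uses, else 'other'."""
    for port in (pkt.dst_port, pkt.src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "other"


def launch(packets: list[Packet], block_dim: int = 4) -> dict[str, int]:
    """Emulate a 1-D CUDA grid with one logical thread per packet.

    The nested loops run sequentially here; on a GPU all threads would
    run concurrently and update the shared counters with atomicAdd.
    """
    grid_dim = (len(packets) + block_dim - 1) // block_dim  # ceiling division
    byte_counts: Counter[str] = Counter()
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global thread index
            if i < len(packets):  # guard: the last block may be partly empty
                pkt = packets[i]
                byte_counts[classify(pkt)] += pkt.length  # atomicAdd on a GPU
    return dict(byte_counts)


if __name__ == "__main__":
    sample = [
        Packet(51000, 443, 1500),
        Packet(53, 40000, 120),
        Packet(50000, 80, 800),
        Packet(50001, 443, 500),
        Packet(6000, 7000, 60),
    ]
    print(launch(sample))  # {'https': 2000, 'dns': 120, 'http': 800, 'other': 60}
```

The bounds check `i < len(packets)` mirrors the guard a real kernel needs when the number of packets is not a multiple of the block size. The result does not depend on `block_dim`, because each packet is processed exactly once whatever the grid shape.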