Subquery allocations in distributed databases using genetic algorithms
Minimization of query execution time is an important performance objective in distributed databases design. While total time is to be minimized for On Line Transaction Processing (OLTP) type queries, response time has to be minimized in Decision Support type queries. Thus different allocations of subqueries to sites and their execution plans are optimal based on the query type. We formulate the subquery allocation problem and provide analytical cost models for these two objective functions. Since the problem is NP-hard, we solve the problem using genetic algorithm (GA). Our results indicate query execution plans with total minimization objective are inefficient for response time objective and vice versa. The GA procedure is tested with simulation experiments using complex queries of up to 20 joins. Comparison of results with exhaustive enumeration indicates that GA produced optimal solutions in all cases in much less time.
 Baiao, F., Mattoso, M. & Zaverucha, G. 2004. A Distribution Design Methodology for Object DBMS. Journal of Distributed and Parallel Databases. 16 (1), 45-90.
 Bergsten, B., Couprie, M. & Valduriez, P. 1993. Overview of Parallel Architectures for Database. The Computer Journal, 36, 734-740.
 Carey, M. J. & Livny, M. 1991. Conflict Detection Tradeoffs for Replicated Data. ACM Transactions on Database Systems. 16, 703-746.
 Cheng, C-H, Lee, W-K, & Wong, K-F. 2002. A Genetic Algorithm-Based Clustering Approach for Database Partitioning. IEEE Transactions on Systems, Man, and Cybernetics, 32(3), 215-230.
 Cornell, D.W. & Yu, P.S. 1989. On optimal site assignment for relations in the distributed database environment. IEEE Transactions on Software Engineering, 15 (8), 1004-1009.
 Davis, L. 1991. Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, N.Y.
 Du, J, Alhajj, R, & Barker, K. 2006. Genetic algorithms based approach to database vertical partitioning. Journal of Intelligent Information Systems, 26 (2), 167-183
 Frieder, O. & Baru, C. 1994. Site and Query Scheduling Policies in Multicomputer Database Systems. IEEE Transactions on Knowledge and Data Engineering, 6(4), 609-619.
 Gog, A. & Grebla, H-A. 2005. Evolutionary Tuning for Distributed Database Performance. The 4th International Symposium on Parallel and Distributed Computing, 275-281.
 Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing.
 Gorla, N. 2001. An Object-Oriented Database Design for Improved Performance. Data and Knowledge Engineering, 37, 117-138.
 Graefe, G. 1993. Query Evaluation Techniques for Large Databases. ACM Comp. Surveys, 25, 73-90.
 Johansson, J M, March, S T, & Naumann, J D. 2003. Modeling Network Latency and Parallel Processing in Distributed Database Design. Decision Sciences, 34 (4), 677-706.
 Kossmann, D. 2000. The State of the Art in Distributed Query Processing. ACM Computing Surveys. 32(4), 422-469.
 Li, B. & Jiang, W. 2000. A novel stochastic optimization algorithm. IEEE Trans. on Systems, Man, and Cybernetics, Part B, 30(1).
 Lim, S-J & Ng, Y-K. 1997. Vertical Fragmentation and Allocation in Distributed Deductive Database Systems. Information Systems, 22(1), 1-24.
 Ma, H. & Kirchberg, M. 2008, Cost based fragmentation for distributed complex value databases. Lecture Notes in Comp. Sci., 4801, 72-86
 March, S.T. & Rho, S. 1995. Allocating Data and Operations to Nodes in Distributed Database Design.IEEE Trans. on Knowledge and Data Engg, 7(2).
 Martin, T., Lam K. & Russel, J. 1990. An Evaluation of Site Selection Algorithms for Distributed Query Processing. The Computer Journal, 33(1), 61-70.
 Sacco, G. & Yao, S.B. 1982. Query Optimization in Distributed Data Base Systems. Advances in Computer, 21, 225-273.
 Song, S-K & Gorla, N. 2000. A Genetic Algorithm for Vertical Fragmentation and Access Path Selection. The Computer Journal, 43(1), 81-93.
 Schaffer, J. D., Caruana, R. A., Eshlman, L. J., & Das, R. 1989. A Study of Control Parameters Affecting Online Performance of Genetic Algorithms for Function Optimization, In J. D. Schaffer, (ed.), Proceedings of the Third International Conference on Genetic Algorithms, 51-60
 Tam, K.Y. 1992. Genetic algorithms, function optimization, and facility layout design. European Journal of Operations Research, 63(2).
 Tamhankar, A.M. & Ram, S. 1998. Database Fragmentation and Allocation: An Integrated Methodology and Case Study. IEEE Trans. on Systems, Man, and Cybernetics, 28(3), 288-305.
 Kulkarni, U R, & Jain, H K. 1993. Interaction Between Concurrent Transactions in the Design of Distributed Databases. Decision Sciences, 24(2), 253-277
 Yu, C.T. & Chang, C.C. 1994. Distributed Query Processing. ACM Computing Surveys, 16, 399-433.
 Yu, C. T., Chang, C., Templeton, M., Brin, D. & Lund, E. 1985. Query Processing in a Fragmented Relational Distributed System: Mermaid. IEEE Transactions on Software Engineering, 11, 795-809.
Copyright (c) 2010 Narasimhaiah Gorla, Suk-Kyu Song
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.