Multipath Fault-Tolerant Routing Policies to deal with Dynamic Link Failures in High Speed Interconnection Networks
In this thesis, we present fault-tolerant routing policies, based on the concepts of adaptivity and deadlock freedom, that can serve interconnection networks affected by a large number of dynamic link failures. The main strength of this thesis is that it provides a simple yet complete solution to the problem of dynamic fault tolerance in interconnection networks. The proposed solution does not require any information about network faults when the system is started or restarted. Throughout the thesis, we present the conception, design, implementation and evaluation of two contributions. The first is Fault-Tolerant Distributed Routing Balancing (FT-DRB), an adaptive multipath routing method designed to exploit the communication path redundancy available in many network topologies, so that interconnection networks keep operating in the presence of a large number of faults. The second is Non-blocking Adaptive Cycles (NAC), a scalable deadlock avoidance technique specifically designed for interconnection networks suffering from a large number of failures. NAC has been designed and implemented to guarantee deadlock freedom in the proposed fault-tolerant routing method, FT-DRB.
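The idea of exploiting path redundancy without any fault information at start-up can be illustrated with a small, generic example. The sketch below is not FT-DRB: it is a centralized breadth-first search on a hypothetical k x k 2D torus, with hypothetical helper names (`torus_neighbors`, `find_path`). The set of failed links starts empty and only grows as failures are detected at run time, and the search returns an alternative shortest path whenever the topology still provides one.

```python
from collections import deque


def torus_neighbors(node, k):
    """Yield the four neighbours of `node` in a hypothetical k x k 2D torus."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        yield ((x + dx) % k, (y + dy) % k)


def find_path(src, dst, k, failed):
    """Breadth-first search for a shortest path that avoids failed links.

    `failed` holds undirected links as frozensets of two nodes. It starts
    empty (no fault information at start-up) and grows as faults appear.
    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in torus_neighbors(node, k):
            if nxt not in prev and frozenset((node, nxt)) not in failed:
                prev[nxt] = node
                queue.append(nxt)
    return None  # the faults have disconnected src from dst


if __name__ == "__main__":
    k = 4
    failed = set()  # nothing is known about faults when the system starts
    print(find_path((0, 0), (2, 0), k, failed))  # [(0, 0), (1, 0), (2, 0)]
    failed.add(frozenset(((0, 0), (1, 0))))      # a link fails at run time
    print(find_path((0, 0), (2, 0), k, failed))  # [(0, 0), (3, 0), (2, 0)]
```

In the 4 x 4 torus the wrap-around links give a second minimal path between (0, 0) and (2, 0), so the single failure costs no extra hops. A distributed, adaptive method such as FT-DRB must additionally handle congestion and deadlock freedom, which this sketch deliberately ignores.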
 G. Zarza, D. Lugones, D. Franco, and E. Luque. A Multipath Fault-Tolerant Routing Method for High-Speed Interconnection Networks. In Euro-Par 2009: Proceedings of the 15th International Euro-Par Conference on Parallel Processing, volume 5704 of LNCS, pages 1078--1088, August 2009. doi:10.1007/978-3-642-03869-3_99.
 G. Zarza, D. Lugones, D. Franco, and E. Luque. Deadlock Avoidance for Interconnection Networks with Multiple Dynamic Faults. In 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 276--280, February 2010. doi:10.1109/PDP.2010.82.
 G. Zarza, D. Lugones, D. Franco, and E. Luque. FT-DRB: A Method for Tolerating Dynamic Faults in High-Speed Interconnection Networks. In 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 77--84, February 2010. doi:10.1109/PDP.2010.65.
 G. Zarza, D. Lugones, D. Franco, and E. Luque. Fault-tolerant Routing for Multiple Permanent and Non-permanent Faults in HPC Systems. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pages 144--150, Las Vegas, NV, USA, July 2010.
 G. Zarza, D. Lugones, D. Franco, and E. Luque. Non-blocking Adaptive Cycles: Deadlock Avoidance for Fault-tolerant Interconnection Networks. In IEEE International Conference on Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), pages 1--4, September 2010. doi:10.1109/CLUSTERWKSP.2010.5613085.