Memory disambiguation hardware: a Review

Authors

  • Fernando Castro, ArTeCS Group, Department of Computer Architecture, Complutense University, Madrid, Spain
  • Daniel Chaver, ArTeCS Group, Department of Computer Architecture, Complutense University, Madrid, Spain
  • Luis Piñuel, ArTeCS Group, Department of Computer Architecture, Complutense University, Madrid, Spain
  • Manuel Prieto, ArTeCS Group, Department of Computer Architecture, Complutense University, Madrid, Spain
  • Francisco Tirado Fernández, ArTeCS Group, Department of Computer Architecture, Complutense University, Madrid, Spain

Keywords:

LSQ, memory disambiguation, energy-efficiency, filtering, hardware simplification

Abstract

One of the main challenges of modern processor design is the implementation of scalable and efficient mechanisms to detect memory access order violations caused by out-of-order execution. Conventional structures that perform this task are complex, inefficient and power-hungry, which has motivated a large body of work on optimizing address-based memory disambiguation logic, namely the load-store queue. In this paper we review the most significant proposals in this research field, focusing on our own contributions.

References

[1] J. Tendler, J. Dodson, J. Fields, H. Le and B. Sinharoy, “Power4 System Microarchitecture”, IBM Journal of Research and Development, Vol. 46, No. 1, 2002, pp. 5-25.
[2] R. Kessler, “The Alpha 21264 Microprocessor”, IEEE Micro, Vol. 19, No. 2, 1999, pp. 24-36.
[3] A. Moshovos, S. Breach, T. Vijaykumar and G. Sohi. “Dynamic Speculation and Synchronization of Data Dependences”. In Int’l Symp. on Computer Architecture, 1997, pp. 181-193.
[4] G. Chrysos and J. Emer. “Memory Dependence Prediction using Store Sets”. In Int’l Symp. on Computer Architecture, 1998, pp. 142-153.
[5] S. Subramaniam and G. Loh. “Store Vectors for Scalable Memory Dependence Prediction and Scheduling”. In Int’l Symp. on High-Performance Computer Architecture, 2006, pp. 65-76.
[6] M. Goshima, K. Nishino, Y. Nakashima, S. Mori, T. Kitamura and S. Tomita. “A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors”. In Int’l Symp. on Microarchitecture, 2001, pp. 225-236.
[7] C. Fang, S. Carr, S. Onder and Z. Wang. “Feedback-Directed Memory Disambiguation through Store Distance Analysis”. In Int’l Conference on Supercomputing, 2006, pp. 278-287.
[8] S. Sethumadhavan, R. Desikan, D. Burger, C. R. Moore, S. W. Keckler. “Scalable Hardware Memory Disambiguation for High ILP Processors”. In Int’l Symp. on Microarchitecture, 2003, pp. 399-410.
[9] B. Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, Communications of the ACM, Vol. 13, No. 7, 1970, pp. 422-426.
[10] I. Park, C. L. Ooi, T. N. Vijaykumar. “Reducing Design Complexity of the Load-Store Queue”. In Int’l Symp. on Microarchitecture, 2003, pp. 411-422.
[11] T. Sha, M. M. K. Martin, A. Roth. “Scalable Store–Load Forwarding via Store Queue Index Prediction”. In Int’l Symp. on Microarchitecture, 2005, pp. 159-170.
[12] L. Baugh and C. Zilles, “Decomposing the Load-Store Queue by Function for Power Reduction and Scalability”, IBM Journal of Research and Development, Vol. 50, No. 2-3, 2006, pp. 287-298.
[13] A. Roth. “A High-Bandwidth Load-Store Unit for Single- and Multi-Threaded Processors”. Technical report (CIS), Department of Computer and Information Science, University of Pennsylvania, 2004.
[14] S. S. Stone, K. M. Woley and M. I. Frank. “Address-Indexed Memory Disambiguation and Store-to-Load Forwarding”. In Int’l Symp. on Microarchitecture, 2005, pp. 171-182.
[15] H. Akkary, R. Rajwar and S. Srinivasan. “Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors”. In Int’l Symp. on Microarchitecture, 2003, pp. 423-434.
[16] E. Torres, P. Ibañez, V. Viñals and J. Llaberia. “Store Buffer Design in First-Level Multibanked Data Caches”. In Int’l Symp. on Computer Architecture, 2005, pp. 469-480.
[17] S. Sethumadhavan, F. Roesner, J. S. Emer, D. Burger and S. W. Keckler. “Late-Binding: Enabling Unordered Load-Store Queues”. In Int’l Symp. on Computer Architecture, 2007, pp. 347-357.
[18] H. W. Cain and M. H. Lipasti. “Memory Ordering: a Value-Based Approach”. In Int’l Symp. on Computer Architecture, 2004, pp. 90-101.
[19] A. Roth. “Store Vulnerability Window (SVW): ReExecution Filtering for Enhanced Load Optimization”. In Int’l Symp. on Computer Architecture, 2005, pp. 458-468.
[20] S. Subramaniam and G. Loh. “Fire-and-Forget: Load-Store Scheduling with no Store Queue”. In Int’l Symp. on Microarchitecture, 2006, pp. 273-284.
[21] F. Castro, D. Chaver, L. Piñuel, M. Prieto, M. Huang and F. Tirado. “Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism”. In Int’l Conference on Computer Design, 2005, pp. 617-624.
[22] A. Garg, F. Castro, M. Huang, L. Piñuel, D. Chaver and M. Prieto. “Substituting Associative Load Queue with Simple Hash Table in Out-of-Order Microprocessors”. In Int’l Symp. on Low-Power Electronics, 2006, pp. 268-273.
[23] F. Castro, L. Piñuel, D. Chaver, M. Prieto, M. Huang and F. Tirado. “DMDC: Delayed Memory Dependence Checking through Age-Based Filtering”. In Int’l Symp. on Microarchitecture, 2006, pp. 297-308.
[24] F. Castro, R. Noor, A. Garg, D. Chaver, M. Huang, L. Piñuel, M. Prieto and F. Tirado. “Replacing Associative Load Queues: a Timing-Centric Approach”. To appear in IEEE Transactions on Computers, 2008.

Published

2008-10-01

How to Cite

Castro, F., Chaver, D., Piñuel, L., Prieto, M., & Tirado Fernández, F. (2008). Memory disambiguation hardware: a Review. Journal of Computer Science and Technology, 8(03), p. 132–138. Retrieved from https://journal.info.unlp.edu.ar/JCST/article/view/754

Section

Invited Articles