21. Bibliography

[gem14]Gem5 Visualization. 2014. http://www.m5sim.org/Visualization.
[hwa15]The Hwacha Project. 2015. http://hwacha.org.
[roc16]Rocket Microarchitectural Implementation of RISC-V ISA. 2016. https://github.com/ucb-bar/rocket.
[Asa98]K. Asanovic. Vector Microprocessors, PhD thesis. Technical Report, U.C. Berkeley, 1998.
[ABC+06]Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006. URL: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html.
[ALE02]Todd Austin, Eric Larson, and Dan Ernst. SimpleScalar: An Infrastructure for Computer System Modeling. Computer, 35(2):59–67, 2002. doi:http://dx.doi.org/10.1109/2.982917.
[CWB05]Chen Chang, John Wawrzynek, and Robert W. Brodersen. BEE2: A High-End Reconfigurable Computing System. IEEE Design & Test of Computers, 22(2):114–125, 2005. URL: http://dblp.uni-trier.de/db/journals/dt/dt22.html#ChangWB05.
[EAB+02]Joel Emer, Pritpal Ahuja, Eric Borch, Artur Klauser, Chi-Keung Luk, Srilatha Manne, Shubhendu S. Mukherjee, Harish Patil, Steven Wallace, Nathan Binkert, Roger Espasa, and Toni Juan. Asim: A Performance Model Framework. Computer, 35(2):68–76, 2002. doi:http://doi.ieeecomputersociety.org/10.1109/2.982918.
[GLL+90]Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. SIGARCH Comput. Archit. News, 18(2SI):15–26, May 1990. URL: http://doi.acm.org/10.1145/325096.325102, doi:10.1145/325096.325102.
[KAR13]E. K. Ardestani and J. Renau. ESESC: A Fast Multicore Simulator Using Time-Based Sampling. In International Symposium on High Performance Computer Architecture, HPCA’19. 2013.
[KMK+18]Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, and Krste Asanović. FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud. In Proceedings of the 45th Annual International Symposium on Computer Architecture, ISCA ’18, 29–42. Piscataway, NJ, USA, 2018. IEEE Press. URL: https://doi.org/10.1109/ISCA.2018.00014, doi:10.1109/ISCA.2018.00014.
[Kes99]R.E. Kessler. The Alpha 21264 Microprocessor. IEEE Micro, 19(2):24–36, 1999. doi:http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=755465.
[KBH+04]Ronny Krashinsky, Christopher Batten, Mark Hampton, Steve Gerding, Brian Pharris, Jared Casper, and Krste Asanovic. The vector-thread architecture. IEEE Micro, 24(6):84–90, 2004. doi:http://doi.ieeecomputersociety.org/10.1109/MM.2004.90.
[KSW+07]Alex Krasnov, Andrew Schultz, John Wawrzynek, Greg Gibeling, and Pierre-Yves Droz. RAMP Blue: A Message-Passing Manycore System in FPGAs. In International Conference on Field Programmable Logic and Applications. August 2007. URL: http://www.gigascale.org/pubs/1033.html.
[LNA08]Jae W. Lee, Man Cheuk Ng, and Krste Asanovic. Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks. In ISCA ’08: Proceedings of the 35th International Symposium on Computer Architecture, 89–100. Washington, DC, USA, 2008. IEEE Computer Society. doi:http://dx.doi.org/10.1109/ISCA.2008.31.
[MCE+02]P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: a full system simulation platform. IEEE Computer, 2002.
[NALS06]Kyle J. Nesbit, Nidhi Aggarwal, James Laudon, and James E. Smith. Fair Queuing Memory Systems. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 208–222. Washington, DC, USA, 2006. IEEE Computer Society. doi:http://dx.doi.org/10.1109/MICRO.2006.24.
[PRA97]Vijay S. Pai, Parthasarathy Ranganathan, and Sarita V. Adve. RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors. In Proceedings of the Third Workshop on Computer Architecture Education. 1997.
[PHA09]Heidi Pan, Benjamin Hindman, and Krste Asanovic. Lithe: Enabling Efficient Composition of Parallel Libraries. In Workshop on Hot Topics in Parallelism (HotPar-09). USENIX, March 2009. URL: http://www.gigascale.org/pubs/1870.html.
[PEA07]Michael Pellauer, Joel Emer, and Arvind. HAsim: Implementing a Partitioned Performance Model on an FPGA. 2007. http://publications.csail.mit.edu/abstracts/abstracts07/pellauer-abstract/hasim.html.
[SWC+06]Njuguna Njoroge, Sewook Wee, Jared Casper, Justin Burdick, Yuriy Teslyar, Christos Kozyrakis, and Kunle Olukotun. Building and Using the ATLAS Transactional Memory System. In Proceedings of the Workshop on Architecture Research using FPGA Platforms, held at HPCA-12. 2006.
[Sez11]André Seznec. A new case for the TAGE branch predictor. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 117–127. ACM, 2011.
[SFKS02]André Seznec, Stephen Felix, Venkata Krishnan, and Yiannakis Sazeides. Design tradeoffs for the Alpha EV8 conditional branch predictor. In Computer Architecture, 2002. Proceedings. 29th Annual International Symposium on, 295–306. IEEE, 2002.
[SM06]André Seznec and Pierre Michaud. A case for (partially) tagged geometric history length branch prediction. Journal of Instruction Level Parallelism, 8:1–23, 2006.
[SWL+15]Chen Sun, Mark T Wade, Yunsup Lee, Jason S Orcutt, Luca Alloatti, Michael S Georgas, Andrew S Waterman, Jeffrey M Shainline, Rimas R Avizienis, Sen Lin, and others. Single-chip microprocessor that communicates directly using light. Nature, 528(7583):534–538, 2015.
[Wil08]S. Williams. Autotuning Performance on Multicore Computers, PhD thesis. Technical Report, U.C. Berkeley, 2008.
[WWP09]Samuel Williams, Andrew Waterman, and David Patterson. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM, 52(4):65–76, 2009. doi:http://doi.acm.org/10.1145/1498765.1498785.
[WWFH06]Roland E. Wunderlich, Thomas F. Wenisch, Babak Falsafi, and James C. Hoe. Statistical sampling of microarchitecture simulation. ACM Transactions on Modeling and Computer Simulation, 16(3):197–224, 2006. doi:10.1145/1147224.1147225.
[Yea96]K.C. Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro, 16(2):28–41, 1996. doi:http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=491460.
[CadenceDSystems]Cadence Design Systems. Palladium Accelerator/Emulator. http://www.cadence.com/products/functional_ver/palladium/.
[MicrosoftResearch]Microsoft Research. Berkeley Emulation Engine 3. http://research.microsoft.com/en-us/projects/BEE3/.