Title: Future Server Processor & Memory Architecture
Speaker: Babak Falsafi, Professor, IEEE Fellow, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland and Carnegie Mellon University, USA
Server processor and memory architecture is now undergoing a major transformation, driven on the one hand by technological constraints and the end of Dennard Scaling, and on the other by the unprecedented demand for processing massive amounts of data and the emergence of Big Data.
This course will cover topics ranging from on-chip/off-chip processor, accelerator, memory, and interconnect architecture to emerging technologies such as die stacking, all aimed at bridging the gap between Big Data and efficiency.
The course will proceed as follows:
Multiprocessor Server Simulation & Design Evaluation
Computer architects have long relied on software simulation to measure dynamic performance metrics (e.g., CPI) of a proposed design. Unfortunately, with the ever-growing size and complexity of modern microprocessors, detailed software simulators have become four or more orders of magnitude slower than their hardware counterparts. The low simulation throughput is especially prohibitive for large-scale multiprocessor server systems, because simulation turnaround for these systems grows at least linearly with the number of processors. Slow simulation has barred researchers from running complete benchmarks, full input sets, or realistic system sizes on detailed simulators. In this lecture I will go over the only practical cycle-accurate full-system simulation technology for multiprocessor servers available to academics.
Various forms of the technologies discussed here are in broader use in industry.
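The abstract does not spell out how practical simulation turnaround is achieved, but the sampling methodologies mentioned later in the speaker's bio rest on a standard statistical idea: estimate CPI from many short, randomly placed detailed-measurement windows rather than simulating the whole run in detail. The sketch below is illustrative only (the window CPI values and target error are made-up numbers, not from the lecture); it shows how a pilot sample's variance dictates how many windows a target confidence interval requires.

```python
import math
import random

def required_windows(cpi_samples, rel_error=0.03, z=1.96):
    """Return (windows needed, estimated CPI) so that the CPI
    estimate falls within rel_error of the true mean at ~95%
    confidence (z = 1.96), using the pilot sample's variance."""
    n = len(cpi_samples)
    mean = sum(cpi_samples) / n
    var = sum((x - mean) ** 2 for x in cpi_samples) / (n - 1)
    stderr_target = rel_error * mean / z       # allowed standard error
    return math.ceil(var / stderr_target ** 2), mean

# Hypothetical pilot: per-window CPI from short detailed simulation
# bursts scattered uniformly across a benchmark's execution.
random.seed(42)
pilot = [random.gauss(1.5, 0.3) for _ in range(100)]

n_needed, cpi_est = required_windows(pilot)
print(f"estimated CPI ~ {cpi_est:.2f}; detailed windows needed: {n_needed}")
```

Because only the sampled windows run in detailed mode (the rest can be fast-forwarded functionally), total simulation time shrinks by roughly the ratio of window length to benchmark length times the number of windows.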
Server Chip Cache Hierarchies & Interconnects I & II
Server workloads spend a majority of their execution time and resources in the memory system, including both on-chip and off-chip hierarchies. There are a number of opportunities to identify common traits in server workloads and optimize the various components of the cache hierarchies and on-chip interconnects, improving performance and reducing cost in terms of real estate and power.
In a series of two lectures, I will first present detailed server workload characterization results identifying these opportunities. I will then present a number of technologies for on-chip cache hierarchies, coherence directories, wait-free memory models and on-chip interconnects to exploit these opportunities and improve design.
Reference: CloudSuite & STeMS
Scaling Trends, Bandwidth & Dark Silicon
This lecture covers server processor and memory scaling trends.
I will go over the two fundamental emerging bottlenecks in server processor and memory systems, namely memory bandwidth and power. Even with techniques to circumvent the bandwidth limitation, the ultimate showstopper is power, eventually leading to dark silicon. I will then go over a scaling model for future server chips and present an analysis of these emerging fundamental bottlenecks, followed by results from our group and our peers on designs for dark silicon.
Nikos Hardavellas et al., Towards Dark Silicon in Servers, IEEE Micro, 2011.
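The dark-silicon argument above can be made concrete with a toy post-Dennard scaling model. The numbers below are illustrative assumptions, not figures from the lecture or the cited paper: per technology generation, transistor count doubles, but because supply voltage no longer scales down, switching power per transistor falls only modestly (here ~25% per generation instead of the classical ~50%). Under a fixed chip power envelope, the fraction of the chip that can be powered at once then shrinks each generation.

```python
def dark_fraction(generations, power_drop_per_gen=0.75, budget=1.0):
    """Fraction of the chip that must stay dark at a fixed power
    envelope after `generations` node shrinks (toy model).

    Assumptions (hypothetical): transistor count doubles per
    generation; power per transistor falls by power_drop_per_gen
    per generation; budget is the power envelope relative to a
    fully lit generation-0 chip.
    """
    transistors = 2.0 ** generations            # relative to gen 0
    power_per_t = power_drop_per_gen ** generations
    total_power = transistors * power_per_t     # power if fully lit
    lit = min(1.0, budget / total_power)        # powerable fraction
    return 1.0 - lit

for g in range(5):
    print(f"generation {g}: dark fraction = {dark_fraction(g):.0%}")
```

With these assumptions the dark fraction grows from 0% to roughly 80% in four generations, which is why the lecture turns to designs that make productive use of otherwise-dark area.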
Scale-Out Processors & Big Data Accelerators
The Vertically-Integrated Server Architecture (VISA) project targets design for dark silicon, in which an integrated hardware/software approach to specialization implements performance- and energy-hungry services with minimal energy. Specialization allows future technologies to utilize dark silicon effectively and maintain a constant power envelope by keeping only the needed services active on-chip, monitoring and shutting off unneeded resources. Specialization maximizes transistor efficiency and makes better use of available real estate, achieving two or more orders of magnitude reduction in energy through a hand-in-hand collaboration of software and hardware.
I will first present technologies for specialized server processors and die-stacked memory that run an existing server software stack transparently, without any modifications. Then I will go over accelerator designs for specific data services in server chips.
Babak is a Professor in the School of Computer and Communication Sciences at EPFL and the founding director of the EcoCloud research center, which pioneers future energy-efficient and environmentally friendly cloud technologies. He has made numerous contributions to computer system design and evaluation, including a scalable multiprocessor architecture that was prototyped by Sun Microsystems (now Oracle), snoop filters and temporal stream prefetchers that are incorporated into IBM BlueGene/P and BlueGene/Q, and computer system simulation sampling methodologies that have been in use by AMD and HP for research and product development. His most notable contribution has been to show, first and contrary to conventional wisdom, that the multiprocessor memory programming models -- known as memory consistency models -- prevalent in all modern systems are neither necessary nor sufficient to achieve high performance. He is a recipient of an NSF CAREER award, IBM Faculty Partnership Awards, and an Alfred P. Sloan Research Fellowship. He is a fellow of IEEE.