"111" Talent Introduction Platform Lecture Series: Seminars on Advanced Parallel Computer Architecture
Advanced Topics in Computer Architecture

Talk title: Toward Exascale Resilience

Speaker: Mattan Erez, Associate Professor, University of Texas at Austin, USA






The march toward supercomputers that reach exaflops performance, and toward scientific applications that scale to use them, continues at a steady pace. Along the way, resilience concerns are growing in importance and prominence. In this short course I will give an overview of different fault, error, and failure modes, and discuss their importance along with some common, widely diverging estimates of their expected rates. I will then discuss potential solutions, ranging from incremental to more revolutionary. I will describe the memory system in detail, as it plays a major role in current resilient platforms, and then describe various proposed techniques that span multiple system layers, from hardware to runtimes to the programmer, algorithm, and tools. This short course will be research-oriented in nature, as this is a relatively new and rapidly evolving topic.


Mattan Erez is an Associate Professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin. His research focuses on improving the performance, efficiency, and scalability of computing systems through advances in hardware architecture, software systems, and programming models. His vision is to increase cooperation across system layers and to develop flexible and adaptive mechanisms for proportional resource usage. Mattan received a B.Sc. in Electrical Engineering and a B.A. in Physics from the Technion, Israel Institute of Technology, and his M.S. and Ph.D. in Electrical Engineering from Stanford University. He is a recipient of a 2012 Presidential Early Career Award for Scientists and Engineers (PECASE; awarded 2014), a 2012 Early Career Research Award from the Department of Energy, and a 2010 NSF CAREER Award.

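Mattan Erez is an associate professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin, USA. His research interests center on improving the performance, efficiency, and scalability of computing systems by advancing hardware architecture, software systems, and programming models, with broad cooperation across system layers to obtain flexible, adaptive mechanisms for balanced resource usage. Mattan received dual bachelor's degrees in electrical engineering and physics from the Technion, Israel Institute of Technology, and his master's and doctoral degrees in electrical engineering from Stanford University. He is a recipient of the 2012 Presidential Early Career Award for Scientists and Engineers, a Department of Energy Early Career Research Award, and a 2010 NSF CAREER Award.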


