Figure 1 Stress distribution in a metal line (Courtesy of Dr. Sukharev)
Figure 2 The EM-induced stress development in the metal wire over time.
Dr. Sheldon Tan (PI), Xin Huang, Yan Zhu, Sahana Swarup, Taeyoung Kim, Zao Liu (Intel Corp), Xuexin Liu (Synopsys)
Industry liaisons:
1. Dr. Valeriy Sukharev, Mentor Graphics Corporation
2. Dr. Ashish X. Gupta, Intel Corporation
3. Dr. Jinjun Xiong, IBM Research
4. Dr. Logendran Bharatham, Freescale Semiconductor, Inc.
Academic Collaborators:
Dr. Hai Wang, The University of Electronic Science and Technology of China, Chengdu, China
Dr. Haibao Chen, Shanghai Jiaotong University, Shanghai, China
We thank the following funding agencies for their generous support of this project.
1. National Science Foundation, NSF FRS (Failure Resistant Systems) program (CCF-1255899), "Thermal-Sensitive System-Level Reliability Analysis and Management for Multi-Core and 3D Microprocessors", $180K, April 1, 2013 to March 31, 2016. PI (single PI).
2. Semiconductor Research Corporation, NSF/SRC Multi-core Program (SRC 2013-TJ-2417), "Thermal-Sensitive System-Level Reliability Analysis and Management for Multi-Core and 3D Microprocessors", $120K, April 1, 2013 to March 30, 2016. PI.
Awards:
1. Dr. Valeriy Sukharev received the prestigious SRC Mahboob Khan Outstanding Industry Liaison Award!
a. The Mahboob Khan Outstanding Industry Liaison/Associate Awards recognize those individuals who demonstrate outstanding commitment and effectiveness in facilitation of university research, mentoring of graduate students, and dissemination of knowledge and research results to industry.
b. Dr. Sukharev has been selected as a recipient of one of the 2014 Mahboob Khan Outstanding Industry Liaison/Associate Awards. His dedication and personal contributions as a liaison to SRC research programs under the direction of Dr. Sheldon Tan, University of California Riverside, on SRC research #2417.001 - Thermal-Sensitive System-Level Reliability Analysis and Management for Multi-Core and 3D Microprocessors have served to strengthen our industry. SRC lauds his efforts and holds his accomplishments as a role model for others.
c. The Mahboob Khan Outstanding Industry Liaison/Associate Awards will be presented at the SRC TECHCON 2014 banquet on Monday, September 8th in Austin, TX.
2. X. Huang, T. Yu, V. Sukharev, S. X.-D. Tan, "Physics-based electromigration assessment for power grid networks", Proc. IEEE/ACM Design Automation Conference (DAC'14), San Francisco, June 2014. (Best Paper Award Nomination (12 out of 787 submissions, 1.5%))
3. H. Chen, S. X.-D. Tan, X. Huang, V. Sukharev, "New electromigration modeling and analysis considering time-varying temperature and current densities", Proc. Asia South Pacific Design Automation Conference (ASP-DAC'15), Chiba, Japan, Jan. 2015. (Best Paper Award Nomination)
Reliability has become a significant challenge for current multi-core and emerging 3D microprocessor designs. Aggressive transistor scaling and increasing processor power density lead to excessive on-chip temperatures and increase the risk that microprocessors will fail. Many long-term failure mechanisms, such as electromigration, stress migration, and thermal cycling, are very sensitive to temperature or temperature changes. The elevated temperatures and temperature gradients caused by continued integration in multi-core and emerging 3D microprocessors have significant adverse effects on these reliability issues. Wear-out based long-term reliability issues were traditionally addressed in the process and manufacturing stages. But as reliability becomes a major design constraint for nanometer VLSI systems, it must be addressed at different layers. As a result, there is an urgent need for reliability awareness and optimization at the micro-architectural design stage. Since temperature has an exponential impact on many failure mechanisms, it is crucial to have accurate and fast thermal estimation for reliability analysis and optimization at the architecture and package levels.
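To illustrate the exponential temperature dependence mentioned above, here is a minimal sketch based on Black's equation, the classic empirical electromigration lifetime model, MTTF = A * j^(-n) * exp(Ea / (k*T)). This page does not say the project uses this model (the publications listed here describe physics-based alternatives), so read it only as a rough illustration. The function name `mttf_ratio` and the activation energy of 0.9 eV are assumptions chosen for the example, not values from the project.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def mttf_ratio(temp_c, ref_temp_c, activation_energy_ev=0.9):
    """Return MTTF(temp_c) / MTTF(ref_temp_c) under Black's equation.

    The current density is assumed identical at both temperatures, so the
    prefactor A and the j^(-n) term cancel out of the ratio.
    The 0.9 eV default is an illustrative assumption.
    """
    temp_k = temp_c + 273.15
    ref_temp_k = ref_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / temp_k - 1.0 / ref_temp_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Relative lifetime at several temperatures, normalized to 85 C.
    for temp in (85, 95, 105, 125):
        print(f"{temp} C: relative lifetime {mttf_ratio(temp, 85):.3f}")
```

Under these assumptions, a rise of 20 degrees Celsius shortens the estimated lifetime several-fold, which is why an error of a few degrees in thermal estimation can noticeably skew a reliability prediction.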
This project addresses the fundamental challenges in
system-level reliability modeling, analysis and optimization. The project
consists of the following thrusts:
First, we propose to develop architecture-level full-chip reliability modeling and analysis techniques that consider the new structures arising from new integration techniques and the dominant hard failure mechanisms. We will then develop reliability-aware dynamic thermal management techniques for multi-core and 3D stacked microprocessors, focusing on techniques based on task migration and dynamic voltage and frequency scaling.
Second, we propose to develop full-chip thermal estimation and prediction techniques for run-time system-level reliability analysis and optimization under realistic conditions, such as a limited number of physical thermal sensors and errors in the thermal and power models. For fast thermal analysis and estimation at the design stage, we also propose a module-based hierarchical thermal analysis technique, which promises both accuracy and efficiency.
We expect the following results from this research:
1. Development of architecture-level full-chip reliability modeling and analysis techniques.
2. Development of reliability-aware dynamic thermal management techniques for multi-core and 3D stacked microprocessors.
3. Design of full-chip thermal estimation and prediction techniques for run-time thermal management and optimization that account for practical constraints such as limited thermal sensors and noise errors.
1. Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore, Singapore, "Thermal Modeling, Estimation and Prediction for Package Design and On-Chip Temperature Regulation", Aug. 16, 2011.
3. Mentor Graphics Corp, Calibre Group, Fremont, CA, "Thermal Modeling and Analysis Research for High-Performance Package and Chip Design", Dec. 14, 2011.
4. MediaTek Singapore Pte Ltd, Singapore, "Thermal Analysis and Runtime Management Research for Multi-core Microprocessors", July 27, 2012.
8. The University of Hong Kong, Department of Electrical and Electronic Engineering, Hong Kong, China, "New More Physics-Based Full-Chip Electron-migration Modeling and Analysis", Jan. 24, 2014. Host: Prof. Ngai Wong of Univ. of HK.
9. The University of California at San Diego, Department of Electrical and Computer Engineering, San Diego, CA, "New Physics-Based Full-Chip Electron-Migration Analysis and System-level Reliability Management", April 23, 2014. Host: Prof. Chung-Kuan Cheng of UCSD.
10. The Institute of Computing Technologies, State Key Lab of Computer Architecture, Chinese Academy of Science, Beijing, China, "Physics-Based Full-Chip Electron-Migration Analysis and System-level Reliability Management", July 4, 2014. Host: Prof. Yu Hu of ICT, CAS.
11. 2nd International Workshop on Cross-layer Resiliency (IWCR 2014), USC Information Science Institute (ISI), Marina del Rey, CA, "Physics-Based Full-Chip Electron-Migration Modeling and System-level Reliability Management", July 28, 2014.
12. EDA workshop, Daejeon Convention Center, Daejeon, Korea, "Physics-Based Full-Chip Electron-Migration Modeling and Cross-Layer Reliability Management", August 26, 2014.
J1 D. Li, S. X.-D. Tan, E. H. Pacheco, M. Tirumala, "Parameterized architecture-level thermal modeling for multi-core microprocessors", ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 15, no. 2, pp. 1-22, February 2010 (one of the top 10 downloaded ACM TODAES articles published in 2010).
J2 T. Eguia, S. X.-D. Tan, R. Shen, D. Li, E. H. Pacheco, M. Tirumala, L. Wang, "General parameterized thermal modeling for high-performance microprocessor design", IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), vol. 20, no. 2, pp. 221-224, Feb. 2012. DOI: 10.1109/TVLSI.2010.2098054.
J3 H. Wang, S. X.-D. Tan, D. Li, A. Gupta, Y. Yuan, "Composable Thermal Modeling and Simulation for Architecture-Level Thermal Designs of Multi-core Microprocessors", ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 18, no. 2, March 2013.
J4 Z. Liu, S. X.-D. Tan, H. Wang, Y. Hua, and A. Gupta, "Compact thermal modeling for packaged microprocessor design with practical power maps", Integration, The VLSI Journal, vol. 47, no. 1, January 2014. (One of the most downloaded papers in 2014 after its publication, 178 downloads in 3 months.) See: http://www.journals.elsevier.com/integration-the-vlsi-journal/most-downloaded-articles/ Online access: http://www.sciencedirect.com/science/article/pii/S0167926013000412
J5 Z. Liu, S. X.-D. Tan, X. Huang and H. Wang, "Task migrations for distributed thermal management considering transient effects", IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), (in press).
J6 Z. Liu, S. Swarup, S. X.-D. Tan, H. Chen, H. Wang, "Compact lateral thermal resistance model of TSVs for fast finite-difference based thermal analysis of 3D stacked ICs", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 33, no. 10, Oct. 2014.
C1 H. Wang, S. X.-D. Tan, X. Liu, A. Gupta, "Runtime power estimator calibration for high-performance microprocessors", Proc. Design, Automation and Test in Europe (DATE'12), pp. 352-357, Dresden, Germany, March 2012.
C2 Z. Liu, S. X.-D. Tan, H. Wang, A. Gupta, and S. Swarup, "Compact nonlinear thermal modeling of packaged integrated systems", Proc. Asia South Pacific Design Automation Conference (ASP-DAC'13), pp. 157-162, Yokohama, Japan, Jan. 2013.
C3 Z. Liu, T. Xu, S. X.-D. Tan, and H. Wang, "Dynamic thermal management for multi-core microprocessors considering transient thermal effects", Proc. Asia South Pacific Design Automation Conference (ASP-DAC'13), pp. 473-478, Yokohama, Japan, Jan. 2013.
C4 H. Wang, S. X.-D. Tan, S. Swarup, and X. Liu, "A power-driven thermal sensor placement algorithm for dynamic thermal management", Proc. Design, Automation and Test in Europe (DATE'13), pp. 1215-1220, Grenoble, France, March 2013.
C5 Z. Liu, S. Swarup, and S. X.-D. Tan, "Compact lateral thermal resistance modeling and characterization for TSV and TSV array", Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD'13), San Jose, CA, Nov. 2013.
C6 Z. Liu, X. Huang, S. X.-D. Tan, H. Wang, H. Tang, "Distributed task migration for thermal hot spot reduction in many-core microprocessors", Proc. International Conference on ASIC (ASICON'13), Shenzhen, China, Oct. 2013.
C7 Y. Chi, S. X.-D. Tan, T. Yu, X. Huang and N. Wong, "Direct finite-element-based solver for 3D-IC thermal analysis via H-matrix representation", Proc. Int. Symposium on Quality Electronic Design (ISQED'14), San Jose, CA, March 2014.
C8 X. Huang, T. Yu, V. Sukharev, S. X.-D. Tan, "Physics-based electromigration assessment for power grid networks", Proc. IEEE/ACM Design Automation Conference (DAC'14), San Francisco, June 2014. (Best Paper Award Nomination (12 out of 787 submissions, 1.5%))
C9 Z. Liu, X. Huang, V. Sukharev and S. X.-D. Tan, "EM-reliability system modeling and performance optimization for high-performance microprocessors", TECHCON'2014, Austin, TX, Sept. 2014.
C10 V. Sukharev, X. Huang, H. Chen and S. X.-D. Tan, "IR-drop based electromigration assessment: parametric failure chip-scale analysis", Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD'14), San Jose, CA, Nov. 2014.
C11 T. Kim, B. Zheng, H. Chen, Q. Zhu, V. Sukharev and S. X.-D. Tan, "Lifetime optimization for real-time embedded systems considering electromigration effects", Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD'14), San Jose, CA, Nov. 2014.
C12 H. Chen, S. X.-D. Tan, X. Huang, V. Sukharev, "New electromigration modeling and analysis considering time-varying temperature and current densities", Proc. Asia South Pacific Design Automation Conference (ASP-DAC'15), Chiba, Japan, Jan. 2015. (Best Paper Award Nomination)