USRC at DSN 2017

USRC’s Lissa Baseman, Dr. Li Tan, and Olena Tkachenko are at The 47th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2017) where they are presenting on resilience and machine learning research at USRC.

Lissa Baseman presented the paper Automating DRAM Fault Mitigation By Learning From Experience (slides) in Industry Track III: Dependability Data and Security. USRC intern Olena Tkachenko provided much of the analysis for this work, and the paper is a collaboration with AMD and Sandia National Laboratories.

Dr. Tan presented at RADIANCE (International Workshop on Recent Advances in the DependabIlity AssessmeNt of Complex systEms). His presentation, entitled RSVP: Soft Error Resilient Power Savings at Near-Threshold Voltage using Register Vulnerability (slides), was co-authored by other USRC members.

USRC at HPDC 2017

This week is the 26th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC). Opening the conference this week was the 7th Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop. FTXS was co-created by USRC’s Dr. Nathan DeBardeleben, who has run it ever since.

In the International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2017) on Tuesday, the paper UNITY: Unified Memory and File Space will be presented. This work includes contributions by USRC’s Mike Lang, Latchesar Ionkov, and Doug Otstott.

USRC’s Dr. Qiang Guan and Dr. Nathan DeBardeleben have a paper in the main conference (19% acceptance rate) primarily authored by USRC alumnus Bo Fang, entitled LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures (slides).

The poster session included work by USRC alumnus Song Huang and work in progress by current USRC PhD student Zongze Li.

USRC at ISC HPC 2017

USRC members Dr. Nathan DeBardeleben (resilience) and Dr. John Bent (storage) are at ISC HPC 2017 this week.

John Bent (ISC person page) presented at BoF 08: The Virtual Institute for I/O & the IO-500 Tuesday morning (slides “IO-500”).

Nathan DeBardeleben (ISC person page) presented Evaluating Parallel Application Resiliency with the Software Fault Injector, PFSEFI (slides) on Wednesday at Fault Tolerance for Next Generation High Performance Computing.

Nathan presented on work by Dr. Li Tan on injecting faults into the FleCSALE parallel application.