LANL Collaborators

USRC’s LANL collaborators drive research and development in a number of computer science areas. Below is the current list of LANL collaborators.

Dr. Nathan DeBardeleben : USRC Director and Senior Research Scientist

Nathan conducts resilience and fault-tolerance research with research scientists, students, and visiting professors. His interests include hardware reliability, algorithmic design, and computing on (perhaps extremely) unreliable hardware. He is project lead of the Fine-Grained Soft Error Fault Injection (F-SEFI) framework, a tool for exploring how real applications running on real systems tolerate emulated soft errors. The tool injects soft errors with extreme precision at specific points in a running application on real hardware, with real OS kernels and real middleware (it is not PIN-based or LLVM-based and does not rely on source code modification). F-SEFI builds on an open source virtual machine and processor emulator to emulate faulty hardware, but does so only in ways that affect the application of interest, making it more tractable to study how applications respond to specific types of soft errors. Additionally, Nathan conducts field studies on Department of Energy supercomputers to study memory and processor resilience, including correctable faults and uncorrectable errors.
Mike Lang : Research Scientist

Michael has been working with UNIX systems for over twenty years. From 1999 to 2010 he was a member of the Performance and Architecture Lab (PAL) at LANL, focusing on the performance of large-scale systems. He is currently the team leader for Ultrascale Systems Research, focusing on resilient, scalable systems software for large-scale systems. He received his MS in Electrical Engineering and his BS in Computer Engineering from the University of New Mexico (UNM).
Dr. Howard Pritchard : Research Scientist

Howard Pritchard is a researcher in HPC network software. He is actively involved in the Open MPI project and the OpenFabrics Interfaces Working Group. He is also involved in the OpenSHMEM community and leads a project to combine this programming model with the Habanero asynchronous task-based runtime. Before joining USRC and LANL, Howard was a Principal Engineer at Cray Inc., where he worked on the design and implementation of various components of the Cray XE and XC network software stack.
Dave Montoya : Research Scientist

Dave works at the intersection of applications and architectures. HPC software environments encompass what is needed by users, developers, and systems personnel. He uses workflow characterization and quantification, together with captured performance metrics, to map those needs and to inform the direction of that community as well as vendor architecture efforts. Dave is also involved in cross-lab programming environment open-source projects, monitoring efforts, and university projects.
Sean Blanchard : Research Scientist

Sean works on kernel-level support of research and production systems at Los Alamos. At USRC he researches soft-error resilience and scalable system software.
Dr. Bradley Settlemyer : Research Scientist

Brad Settlemyer is a storage systems researcher and systems programmer specializing in high-performance computing. He received his Ph.D. in computer engineering from Clemson University in 2009 and works as a research scientist in Los Alamos National Laboratory's HPC Design group. He has published papers on emerging storage systems, long-distance data movement, network modeling, and storage system algorithms.
Dr. Laura Monroe : Research Scientist

Laura is a researcher in resilience and novel computing techniques, especially probabilistic computing. Her current interest is the design of algorithms and systems that address expected increases in hardware fault rates in a probabilistic manner. Another interest is the application of discrete mathematics to the design and understanding of computing systems. She also led the production visualization effort at LANL for many years and was the originator and project leader of the recent redesign and redeployment of the LANL visualization corridor, encompassing the computing systems, networking, and display systems used for LANL ASC large-scale visualization. She served on the design teams for the Cielo and Trinity supercomputers and was one of the designers of the Viewmaster visualization compute cluster. She has published in the areas of probabilistic computing and algorithms, resilience, error-correcting codes, virtual reality, and visualization. She received her Ph.D. in Mathematics and Computer Science in the field of error-correcting codes, working with Dr. Vera Pless.
Hugh Greenberg : Research Scientist

Hugh participated in the design and implementation of the Linux Noise Detective, a Linux kernel module and GUI that collect process data directly from the kernel (on multiple cluster nodes simultaneously) and analyze the data to determine the sources of system noise. He also participated in the design and development of the XGet file transfer software. XGet scalably transfers files to nodes within a cluster by building a tree of participants and delegating serving duties to optimal slave nodes. He also participated in the development of the XCPU cluster management system. XCPU keeps the state of the cluster distributed across all nodes, allowing easy configuration of hot-spare management nodes and graceful failover that doesn't require canceling running jobs if the head node fails.
Lissa Baseman : Research Scientist

Lissa is an applied machine learning researcher and data scientist working on the resilience and fault-tolerance team. At USRC, her work spans using statistical relational models for fault characterization and mitigation as well as developing anomaly detection techniques for large-scale monitoring of supercomputing facilities. Before joining USRC, Lissa contributed to quantum algorithms for machine learning at LANL’s Center for Nonlinear Studies. Her background, including work on social network analysis with the Human Language Technology group at MIT Lincoln Laboratory and a short time at a startup back in Massachusetts, is primarily in the development and application of probabilistic graphical models to new relational and/or temporal domains. Lissa received her MS in Computer Science from the University of Massachusetts Amherst and her BA, also in Computer Science, from Amherst College.
Lucho Ionkov : Research Scientist

Lucho co-developed the v9fs filesystem, which is now a standard part of the Linux kernel distribution. His previous work includes the CellFS programming model and the XCPU and XCPU2 process-management systems, which addressed issues of large-scale system complexity, resiliency, and manageability. At USRC, Lucho works on scalable system software and accelerated access to application data.
Dr. Qiang Guan : Research Scientist

Dr. Qiang Guan has been a computer scientist at Los Alamos National Laboratory and the Ultrascale Systems Research Center (USRC) since November 2015. He obtained his Ph.D. degree in Computer Science and Engineering from the University of North Texas, Denton, Texas, in 2014 (Ph.D. advisor: Dr. Song Fu). He received his M.S. degree in Information Engineering from Myongji University, Seoul, South Korea, in 2008 and his B.S. degree in Communication Engineering from Northeastern University, Shenyang, China, in 2005. His research interests include soft error fault injection, data visualization, virtualization, resilience, cloud performance modeling and optimization, cloud dependability and reliability, power management and green computing, resource management, data mining and machine learning, and signal and image processing.