Subasi, Omer (Date of defense: 2016-10-27)
As high performance computing (HPC) systems continue to grow, their fault rate increases. Applications running on these systems have to deal with mean times between faults on the order of hours or days. Furthermore, some ...
Jokanović, Ana (Date of defense: 2014-12-19)
Network interference from nearby jobs has recently been identified as the dominant reason for the high performance variability of parallel applications running on High Performance Computing (HPC) systems. ...