In recent years, grid technology has emerged as an important tool for solving compute-intensive problems within the scientific community and in industry. To further the development and adoption of this technology, researchers and practitioners from different disciplines have collaborated to produce standard specifications for implementing large-scale, interoperable grid-systems. The focus of this activity has been the Open Grid Forum, but other standards development organizations have also produced specifications that are used in grid systems. To date, these specifications have been used to build operational systems that provide basic grid functions. However, to fully realize the potential of grid technology, it also will be critical to ensure that grid systems are highly reliable and that specifications used to build these systems fully support reliable grid services. Similarly, it will be necessary to ensure that grid systems continue to be reliable under conditions of scale, heterogeneity, and dynamism. This study surveys work on grid reliability that has been done in recent years and reviews progress made toward achieving these goals. The survey identifies important issues and problems that researchers are working to overcome in order to develop reliability methods for large-scale, heterogeneous, dynamic environments. The survey also illuminates reliability issues relating to standard specifications used in grid systems, identifying existing specifications that may need to be evolved and areas where new specifications are needed to better support reliability.