The simple answer is a question: what have you done to avoid failures?
I’ve spent years helping customers resolve loss of service in their internal and public applications, and I have to say it is a little surprising how little understanding there is of what is really happening, or of the limitations (not only technical) that exist.
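Websites and portals are complex applications exposed to critical consumption patterns. In addition, technology solutions quite often follow Conway’s law: the design of a system ends up mirroring the communication structure of the organization that built it. The result is a funny mess.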
Nowadays you have plenty of technologies, trained people, and strategies and approaches for improving your applications, but keep the following in mind: resources are not infinite.