I'm very impressed by how they take responsibility for this, in their words: "HE...

theoj · on April 27, 2011

Do they have another option? If no site had survived the outage, then they may have been on to something. But with some sites surviving the outage, they just have no excuse.

Sure, you can blame AWS because they said multiple availability zones in the same region would work. But at the same time, there is an expectation that a site like Heroku is knowledgeable and sophisticated enough to intelligently process what AWS says and determine what's appropriate for them.

jarin · on April 27, 2011

It's sort of counter-intuitive, but taking responsibility for something (even if it's not directly your fault) often has the effect of deflecting some of the anger from your customers/clients/boss/etc.

Personally, I prefer to just get the blame part out of the way by taking responsibility and concentrate on the important things: fixing the problem and making sure it doesn't happen again.

I think that deep down, people aren't that concerned with whose fault it was. They just want to know that someone is going to fix it.

ghshephard · on April 27, 2011

The reason you don't want to take responsibility is that liability comes along with it. If Heroku had taken the position that the AWS outage was a force majeure event, their liability for recompense to their customers would have been minimized.

By taking responsibility, they also put themselves in a position where they have to make good on all of the downtime their customers experienced.

Short term - that will be an expensive decision. Long term, I think it's the right thing to do. It certainly builds up my confidence level in them.

kowsik · on April 27, 2011

As a PaaS vendor, they are supposed to abstract away IaaS failures, and they should not have used a single region to host all their apps. I love Heroku and will continue to use them as long as I have the option to add affinity to my dynos and workers to spread them across multiple regions of my choice. Coupled with anycast DNS support, this will be a very compelling offering, if they can pull it off. During the outage, all of our scale engines (http://blitz.io) and our CouchDB cluster across the other AWS regions held up, but since the web tier was down, the whole app went down.