If you are a Blackberry user (I am), you probably discovered sometime early this morning that you were not receiving email messages… and then you no doubt learned that pretty much all Blackberries in the entire Western hemisphere had been offline since last night. For email, that is… they still worked as phones, but I mean, you don’t really get a Blackberry for the phone aspect. At this point, basically every major news outlet is covering the story, and I’m sure we can expect the stories to continue for quite some time. The service seems to be back up now (mine is, anyway), but I’m sure it will take a bit for it to be restored everywhere.
Working in my home office today, I actually didn’t notice the outage until I did one of my very occasional scans of Twitter and saw Chris Brogan complaining about being stuck on a train without email access. Knowing Chris, I figured I’d give him a quick call and was rewarded with this great quote about his trip around New York:
“Yeah, I was just in Penn Station and there were all these dudes in suits looking down at their hands and getting increasingly frantic!” (Chris Brogan)
Indeed! Given how much the financial industry (as well as the US government!) relies on Blackberries, I’m sure there were a heck of a lot of frantic calls being made all morning.
As a network technology geek, I’ll be curious to see what information comes out later about the cause. ComputerWorld is running speculation that it may have had to do with issues with one of RIM’s Network Operations Centers (NOCs), but that is, at the moment, purely speculation and may be a red herring. (Although it does raise another issue of why RIM has two NOCs that are both located in Canada. With a global service such as they have, I would have thought that they would have gone for greater geographic distribution!)
In any event, something like this will definitely serve to remind people of how addicted they are to “push e-mail” and will undoubtedly cause larger customers to ask RIM serious questions about network availability (and perhaps to consider other alternatives). Having some friends working at RIM (with whom I have not touched base), I can only hope they get it all sorted out rather soon.
Since it was a multiple-carrier outage, it was certainly an outage at RIM’s NOC. Unlike a lot of mobile-messaging things, RIM still handles the processing that you’d expect that the carrier was doing (even for enterprise BlackBerrys!), so if it affects two unrelated carriers, it’s just a matter of following things upstream.
But don’t confuse geographic distribution with international distribution. We made that mistake once 🙂 The main advantage of distributing things internationally is legal; if country A decides to interfere with your business, you move it to country B where you have infrastructure ready to go.
I don’t know where RIM’s second NOC is (I thought there was only one, in Waterloo), but you get much better geographic diversity without the hassle of international regulation, shipping, and so on by locating in, say, Toronto and Vancouver than you do in Toronto and Boston.
When I had systems colocated in opposite-coast US cities I didn’t realize any advantages at all over having them in equivalently-equipped colocation in major Canadian cities, but I did have problems getting gear shipped around and getting into the US to do work on the equipment.
(Of course, even then geographic distribution only helps if you suffer a failure related to geography. The sort of failure that local redundancy wouldn’t address is a pretty catastrophic one, and it’s almost always more practical to build in redundancy in one place than it is to build two data centres and keep one on hot standby.)
Rich,
Yes, you are quite right… geography doesn’t really matter (unless, of course, there’s a major power outage affecting the entire region) as much as, say, *network* distribution, i.e. if there’s a network outage, you’d ideally like to have your other NOC(s) on different networks/backbones/etc. so that they are not affected by the outage.
Thanks for the comment. Given that you have some rather direct experience in this area, I appreciate your insight.
Dan
When I built a disaster recovery data center for my previous employer, we ended up putting it three towns away. Everyone admitted that it was silly. The second site had slightly different power grid routes, and it used two completely different telco fibre rings from two vendors, two COs, etc. But really. Three towns away? The disaster would’ve whacked both most likely.
Cost. It cost too much to keep a staff at a far site, too much to make the far site cold. So we made a hot split three towns away and dealt.
No comment about BlackBerry, but just thoughts from a frustrated dude looking at his palm. : )