Tuesday, September 25, 2012

Whose mess is this anyway?

If your organization has a security incident of some kind, who's responsible? During the incident and most of the response, assigning responsibility likely isn't useful. In the aftermath, though, as much as we would all like to hold hands and sing, there are a number of good reasons to identify who's at fault. I realize this isn't a popular viewpoint (especially with folks who suspect they might be the ones blamed), but I would like to propose that:

  • It's not always obvious who is at fault
  • In some cases "blame" resides with the organization (or lack thereof)
  • The cause for blame isn't always as simple as you might think

It's pretty easy to point the finger at the IT person responsible for information security. Obviously, if they had been doing their job, the incident would not have happened, right? Wrong, or maybe mostly wrong. No doubt there are plenty of times when the CISO, CSO, Director of IT Security, etc. fall asleep at the switch. I did it myself back in 1985, and it wasn't a pleasant day for me or anyone working on our systems. I had followed the instructions for setting up an anonymous FTP server on HP-UX, but failed to clean out the stub password file required in the chrooted directory. The passwords were encrypted, but we had no password quality measures in place (see my earlier post on the subject), and a few of them were easily cracked. My fault, quickly remedied, and no serious harm done.

A few years later, in 1988, the Morris worm hit, and its (fairly benign) payload infected our sendmail servers. In that case, I won't as readily accept the blame. It was the FIRST worm of its type to propagate across the Internet, it was the original "zero day" worm, and there were no good defenses, natural or otherwise. As far as I'm concerned, this one falls under the category "Defenses Fail." I suppose I could have been better prepared for it, and today I would be.

Not many folks have experience in root cause analysis, and nearly every time I suggest the "5 Whys," it gets a reflexive sixth: "Why do this?" Folks shy away from getting to the bottom of what happened for a lot of reasons; here are a few:


  1. If it was MY screw-up, I'd rather not be blamed. Understandable, but not helpful from an organizational standpoint. I'd rather a team member owned up to the issue so I could help them avoid the next one. Sure, some places may have a zero-tolerance policy for this sort of thing, but in those cases you should be using the company printers to collate and bind your resume anyway.
  2. If I know I'M not to blame, I still don't want to throw Bob under the bus. The same principle applies here as in #1: what we don't identify, we can't fix. Bob's worth helping, and organizationally we're money ahead if we can fix Bob in place rather than eventually looking for his replacement.
  3. Blame doesn't help anyone. Here again, from an organizational point of view, understanding why an adverse event happened, so that we can address the underlying issue, clearly helps almost everyone. We'd all like to get on with our lives after something like this happens, but we'd also prefer not to deal with the same problem tomorrow.
  4. We don't have time to figure it out. My favorite. The event just cost 45 hours of team resources, and since we don't know exactly why it happened, there's a good chance it could happen again (and again). We can't spend 5 hours trying to prevent a stream of 45-hour events?

I'm reminded of an incident from a few years ago where a client of mine was being relentlessly attacked and DoSed. That kind of thing happens all the time, and there's often no easily determined proximate cause, but in this case we eventually discovered that an employee had (while sitting at his desk, using the company network and, more importantly, the company's external IP address) REALLY annoyed someone online. That someone decided to shut him up and see if he could get him fired in the process. There was no existing policy to prevent what had happened (an organizational failing), so while adding a new section to the existing policy documents, we also had a little conversation about common sense with the employee.

In a well-run organization, root cause analysis and remediation work shouldn't make people fear for their jobs or be prohibitively expensive. It should simply be an expected part of any response to an adverse event.
