Transitioning from traditional Windows auditing to the new auditing framework

Beginning with Vista, Windows auditing moves from 9 “legacy” categories to 52 subcategories nested beneath those 9 legacy categories. And as if that weren’t enough change, Vista also renumbers *every* security event ID, changing many of the events, combining quite a few, and enhancing the quality and quantity of information within many events. This combination of changes can leave a Windows administrator lost when looking at security logs.

So that we are all on the same page, let’s review how things worked prior to Vista.

Traditionally, via group policy (or local policy), you configure which categories are audited (for success, failure, or both), and then, where applicable, you configure a SACL on the resources to generate that kind of audit in your security log. For some types of events, there is no SACL to configure; simply enabling the category results in security events being generated when the applicable action happens. Domain administrators traditionally configure categories in a sane manner for their entire domain, leaving out configurations which they deem to generate noise. And there was quite a bit of collaborative work around “good configurations” from various organizations focused on this legacy approach to auditing. But there is almost no work yet based on the new auditing paradigm.

By now, you are probably very curious to see a list of the new subcategories. And the nice thing is that you can ask your Vista/WS2008 box what they are. Run “auditpol.exe /get /category:*” at a command prompt to see that list. And as a bonus, you’ll also see what the active auditing policy is for that box.
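
If you want to compare that report across many boxes, it helps to turn it into a data structure. The following is a minimal Python sketch, assuming the English-language report layout (category names at the left margin, subcategories indented beneath them, setting in the right-hand column). The SAMPLE text is a hand-written illustration rather than captured output, and parse_auditpol is a hypothetical helper name.

```python
import re

# Hand-written illustration of the report layout (an assumption, not captured output).
SAMPLE = """\
System audit policy
Category/Subcategory                      Setting
Logon/Logoff
  Logon                                   Success and Failure
  Logoff                                  Success
  IPsec Main Mode                         No Auditing
Account Management
  User Account Management                 Success
"""

HEADER_PREFIXES = ("System audit policy", "Category/Subcategory")


def parse_auditpol(text):
    """Return {category: {subcategory: setting}} parsed from the text report."""
    policy = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.startswith(HEADER_PREFIXES):
            continue
        if not line[0].isspace():
            # Unindented line: start of a new category.
            current = line.strip()
            policy[current] = {}
        elif current is not None:
            # Indented line: subcategory name, a run of spaces, then the setting.
            parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
            if len(parts) == 2:
                policy[current][parts[0]] = parts[1]
    return policy


if __name__ == "__main__":
    for category, subs in parse_auditpol(SAMPLE).items():
        for name, setting in subs.items():
            print(f"{category} / {name}: {setting}")
```

On a real box you would feed the function the captured output of the auditpol command instead of SAMPLE.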

If you continue using your legacy auditing category settings with Vista and beyond, those legacy categories apply to all the subcategories nested beneath them. That is, if I enable success audits for the logon/logoff category, then all the logon/logoff subcategories are enabled (and those subcategories are: Account Lockout, IPsec Extended Mode, IPsec Main Mode, IPsec Quick Mode, Logoff, Logon, Network Policy Server, Other Logon/Logoff Events, Special Logon). In other words, you’ll see lots of noisy IPsec events, even if all you want is logon events. What is worse is that as of today, there is no direct ability via group policy to configure subcategories. And as I already pointed out, there are high volume subcategories which are mostly noise within categories where high interest events occur, so you want to be able to configure at the subcategory level.
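
To make the subcategory idea concrete, here is a small Python sketch that takes the logon/logoff subcategories listed above and builds one local “auditpol.exe /set” command per subcategory, keeping the ones you want and disabling the rest. It only builds the argument lists and does not run anything. The function names are hypothetical, and the choice of which subcategories to keep is just an example, not a recommended baseline.

```python
# Subcategory names as listed in the paragraph above.
LOGON_LOGOFF_SUBCATEGORIES = [
    "Account Lockout",
    "IPsec Extended Mode",
    "IPsec Main Mode",
    "IPsec Quick Mode",
    "Logoff",
    "Logon",
    "Network Policy Server",
    "Other Logon/Logoff Events",
    "Special Logon",
]


def _flag(enabled):
    return "enable" if enabled else "disable"


def auditpol_set_args(subcategory, success, failure):
    """Build the argument list for one 'auditpol.exe /set' call (not executed here)."""
    return [
        "auditpol.exe",
        "/set",
        f"/subcategory:{subcategory}",
        f"/success:{_flag(success)}",
        f"/failure:{_flag(failure)}",
    ]


def plan_logon_logoff(wanted, success=True, failure=False):
    """One command per subcategory: wanted ones get the given settings, the rest are disabled."""
    plan = []
    for name in LOGON_LOGOFF_SUBCATEGORIES:
        keep = name in wanted
        plan.append(auditpol_set_args(name, success and keep, failure and keep))
    return plan


if __name__ == "__main__":
    # Example choice only: keep logon-related success audits, drop the IPsec noise.
    for args in plan_logon_logoff({"Logon", "Logoff", "Special Logon"}):
        print(" ".join(args))
```

Each list could be handed to subprocess.run from an elevated prompt on a Vista or later box. Settings made this way are local to that machine.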

There are indirect methods to use group policy to configure subcategories (using scripts and scheduled tasks), and there is a group policy setting which tells Vista and beyond to ignore the legacy category settings they get from group policy and use whatever is set locally. Moving to a local setting model can be dangerous from a security perspective, and is very hard to manage at any scale. And the indirect group policy method is unreasonably complex. However, with WS2008 R2, there is the ability to directly set subcategories via group policy. Existing webpages indicate this functionality will only work on WS2008 R2 and Windows 7 clients, but I suspect that information is incorrect, so I’m personally waiting a bit longer to see whether this new functionality improves the story before moving forward on the indirect group policy method.

Earlier I mentioned that there was next to no work based on the new auditing paradigm. There are a few exceptions:

Randy Franklin Smith’s Ultimate Windows Security website is an open collaboration around documenting Windows security events, settings, and auditing. Randy recommends a subcategory-focused auditing baseline on this website, http://www.ultimatewindowssecurity.com/wiki/RecommendedBaselineAuditPolicyforWindowsServer2008.ashx.

Eric Fitz at Microsoft is a master of auditing. See his ‘Windows Security Logging and Other Esoterica’ blog at http://blogs.msdn.com/ericfitz/default.aspx.

Ned Pyle, on the MS AskDS blog, writes about some cool auditing tricks at http://blogs.technet.com/askds/archive/2007/11/16/cool-auditing-tricks-in-vista-and-2008.aspx. This post includes tricks for:

  • finding when users are elevating via UAC
  • finding out who is making AD changes unrelated to Accounts, aka who the heck has been messing with group policy?!
  • finding out what is changing a registry value at random intervals

And if you’d rather not wait around to see if Microsoft fixes the subcategory group policy issue, you can implement an indirect, complex group policy based method by following the directions at http://support.microsoft.com/kb/921469.

Moving on from auditing policies to all the other changes …

There are now some auditing events which can *NOT* be turned off. These are high security sorts of things like clearing the security log and service shutdown. Hooray!

With Vista and beyond all the event IDs are new. So you will never see a 528 or a 540 or a 680 event ID on a Vista box. Instead of a 528 or a 540 you’d see a 4624 event (yep, both of those are smashed into a 4624 now), and instead of a 680 you’d see a 4776. In general, you can find the newer event ID from the old ones you might be familiar with by adding 4096 to the number. But this isn’t true across the board. Check out Randy’s website to see what has been verified, along with detailed explanations. You might also check out the Microsoft version of the new security events at http://www.microsoft.com/downloads/details.aspx?FamilyID=82e6d48f-e843-40ed-8b10-b3b716f6b51b&DisplayLang=en. But I wouldn’t put much trust in the MS version; I’ve already seen several mistakes and many omissions in it. Randy’s website is more accurate.
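
The add-4096 rule of thumb, plus a table of exceptions, can be sketched in a few lines of Python. This is only an illustration: new_event_id is a hypothetical helper, and the exception table is deliberately tiny, holding just the 540 case mentioned above. Fill it in from a verified reference before relying on it.

```python
LEGACY_OFFSET = 4096

# Cases where "old ID + 4096" does NOT give the new ID.
# Deliberately incomplete: only the merge mentioned in the text above.
EXCEPTIONS = {
    540: 4624,  # network logon, merged with 528 into 4624
}


def new_event_id(legacy_id):
    """Best-guess Vista-era event ID for a pre-Vista security event ID."""
    return EXCEPTIONS.get(legacy_id, legacy_id + LEGACY_OFFSET)


if __name__ == "__main__":
    for old in (528, 540, 672):
        print(old, "->", new_event_id(old))
```

Treat any result that comes from the plain offset as a guess to verify, since the rule does not hold across the board.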

In general, the quality of the information in these new security events is also much, much better. For example, in the old events, you rarely got an IP address on the login events. In the new events, I haven’t seen a case yet where the IP address is missing. The new events also seem to have better consistency.

One of the nicest enhancements is in the Event Viewer GUI. If you’ve got a bunch of the same event filtered there, and you want to look at the same part of each event’s message body, you can scroll to the right part on the first message and then browse through all the messages. The message body stays focused on that same area instead of jumping back to the beginning of the body and forcing you to scroll back down for each one.

Another area which changed quite a bit is Directory Services auditing. In events generated from this category, both the old and new values of whatever changed are logged. And what is logged makes sense:
- for multi-value attributes, only what changes is logged
- for new objects, only the initial values are logged
- for moved objects, the paths are logged
- for undeleted objects, the new path is logged

You can view more details about the DS auditing changes at http://technet2.microsoft.com/windowsserver2008/en/library/a9c25483-89e2-4202-881c-ea8e02b4b2a51033.mspx.

Of course, to see the DS audit events, you’ll need to enable the right category (or subcategories), and also set a SACL on the directory object(s) you want to see those events for. A SACL is the part of the security descriptor which tells the host whose actions on that resource object should generate audit events. Typically, you set a SACL for Everyone when you want auditing for a given object.

There is also a little-known Special Groups auditing feature that came with Vista. This feature allows you to specify a list of SIDs, and when a user logs in with a token that has one of those SIDs, then a special event is raised. You might use this feature to keep track of where sensitive accounts were being used, and to help ensure that they weren’t used in the wrong place. See http://support.microsoft.com/default.aspx?scid=kb;EN-US;947223 for more on this.

To see more details about all the eventlog related changes that came with Vista, see http://technet.microsoft.com/en-us/library/cc766042.aspx.

On the management side of things, Microsoft added Audit Collection Services (ACS) to the System Center Operations Manager (SCOM) product.

ACS allows you to collect security events centrally to a SQL database, run reports on those events, and alert on serious issues. ACS receives those events from the same layer as the eventlog (i.e. ACS doesn’t read events out of the eventlog itself, but instead gets them from the same source that the eventlog gets them from). So if a hacker clears your eventlog to cover his tracks, ACS still gets a copy of the events. ACS also provides filtering capabilities, so if your audit policies are noisy, or you don’t really care to collect/report on certain kinds of events, you can filter them from getting into ACS. In many ways, ACS is the Windows equivalent of the syslog daemon that we Windows admins have always envied our Unix brethren for.

So, lots has changed in this space, and I’ll likely have more posts about auditing in the future.

Tracking down a perpetrator of Windows account lockouts or how to pull your hair out multiple ways

This is gonna be a lengthy post aimed primarily at Windows administrators. Consider yourself forewarned. But to entice you to the end, I’ll also mention that there is a lot of good, technically useful info in here, including a few details which aren’t well discussed anywhere.

So there are lots of bad guys out there. And some number of them like to try and brute-force your Windows accounts. And most of us take sensible precautions by enforcing strong passwords and configuring account lockouts. But you can quickly get into a denial of service situation if you aren’t careful with the account lockout settings.

We use a 5 minute lockout after 150 failed logins during a 5 minute period. This avoids most denial of service situations, while disrupting the brute force attacks very effectively.
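
To see why those numbers stop a fast brute-force run but leave slow or occasional failures alone, here is a simplified sliding-window model in Python. It is a sketch of the policy as described above (150 failures within 5 minutes triggers a 5 minute lockout), not a reproduction of how Windows counts bad passwords internally. The LockoutPolicy class and its parameter names are made up for illustration.

```python
from collections import deque


class LockoutPolicy:
    """Simplified model: lock for lockout_s once threshold failures land within window_s."""

    def __init__(self, threshold=150, window_s=300, lockout_s=300):
        self.threshold = threshold
        self.window_s = window_s
        self.lockout_s = lockout_s
        self.failures = deque()   # timestamps (seconds) of recent failed logons
        self.locked_until = None  # time at which the current lockout ends

    def is_locked(self, now):
        return self.locked_until is not None and now < self.locked_until

    def record_failure(self, now):
        """Register a failed logon at time `now`; return True if the account is locked."""
        if self.is_locked(now):
            return True
        # Drop failures that have aged out of the observation window.
        while self.failures and now - self.failures[0] >= self.window_s:
            self.failures.popleft()
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.locked_until = now + self.lockout_s
            self.failures.clear()
            return True
        return False


if __name__ == "__main__":
    fast = LockoutPolicy()
    locked_at = next(t for t in range(1000) if fast.record_failure(t))
    print("one guess per second locks the account at t =", locked_at)
```

In this model an attacker guessing once per second is locked out on the 150th guess, while anything slower than one failure every two seconds never trips the threshold.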

But during this past year we ran into a situation where these settings didn’t prevent persistent account lockouts. So we had to track down why the account was getting locked out, and where from.

First we tried a low-overhead solution: rebuilding the user’s computer. Often account lockouts are due to some process on the user’s computer having an old password cached, and endlessly trying to log in with that old password. An event on the user’s computer seemed to support this theory:

Event Type: Warning
Event Source: Tcpip
Event Category: None
Event ID: 4226
Date: 11/20/2006
Time: 4:26:32 PM
User: N/A
Computer: blahblah
Description: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts.

This is logged when something on that computer is trying to establish TCP connections faster than the OS will allow. In more detail, this means the number of simultaneous incomplete (half-open) outbound connection attempts has hit the limit the OS imposes. In other words, something is badly misbehaving. What that something is isn’t clear; sometimes legitimate applications can cause this, sometimes it’s malware. You could run Sysinternals TCPView (or another similar tool) to figure out which app is doing this. However, you’d need to catch the behavior while it was happening. Instead, we just went ahead and rebuilt the user’s computer.

But that didn’t end the problem.

So then we searched all our DCs for event IDs 672, 680, 539, and 644 in an effort to locate where this was being perpetrated from.

As you know, 539 and 672 events contain source IP info, whereas 644 and 680 events contain source workstation name (NetBIOS) info. The nature of the process causing the endless account lockouts was such that only 644 and 680 events were generated. So we had only a NetBIOS name to work with.

Since this computer name was not in our domain/forest nor in WINS or DDNS (and being a university, we have an open network), this was a dead-end.

We tried analyzing netstat output from the DCs getting the 680 & 644 events, correlating the open sessions at the time of the failures in order to obtain an IP address. We further eliminated all connections from known domain computers. This gave us no leads, which was incredibly mysterious.

We then tried sniffing network traffic at the DCs. This also gave us no leads, which again was very mysterious.

I then stumbled upon a fairly old Microsoft webcast, http://support.microsoft.com/default.aspx?scid=%2Fservicedesks%2Fwebcasts%2Fen%2Fwc022703%2Fwct022703.asp&SD=GN. In that webcast, Microsoft suggests (but doesn’t explain why) that turning on “netlogon logging” would be beneficial in tracking down account lockout events. We tried that, hoping an IP address would be logged there. It wasn’t. But it did help.

Specifically, the netlogon logs told us where the failures were being “chained” from. We discovered that the audit failures were being chained from a member server to a DC to the PDC emulator. That explained why netstat & sniffing at the DC level failed (along with the fact that secure channel communications between domain computers obscure searchable details, like the username, in a network sniff). At the member server, we used a network sniff to obtain the IP of the offending computer, and the rest was history.

Turning on netlogon logging is done by running the following command:

“nltest /dbflag:2080ffff”

plus a bounce of the netlogon service.

The hex string specifies the verbosity. The value above is full verbosity. According to the webcast, this generates at most 40MB of log, which is nothing. They recommend turning it on for all DCs, which is our current practice.

The log file this generates is located at c:\windows\debug\netlogon.log.
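
Once the log exists, the “chained from” detail can be pulled out with a short script. The Python sketch below assumes SamLogon result lines shaped like the hand-written SAMPLE inside it (account, source workstation, and a “via” server, followed by a hex status). The exact wording varies between Windows versions, so treat the regular expression as an assumption to adjust against your own netlogon.log. All names in the sample are invented.

```python
import re
from collections import Counter

# Hand-written illustration of netlogon.log lines (an assumption, not captured output).
SAMPLE = r"""
11/20 16:26:31 [LOGON] SamLogon: Transitive Network logon of EXAMPLE\jdoe from ROGUEPC (via MEMBERSRV1) Entered
11/20 16:26:31 [LOGON] SamLogon: Transitive Network logon of EXAMPLE\jdoe from ROGUEPC (via MEMBERSRV1) Returns 0xC000006A
11/20 16:26:32 [LOGON] SamLogon: Transitive Network logon of EXAMPLE\jdoe from ROGUEPC (via MEMBERSRV1) Returns 0xC0000234
11/20 16:26:40 [LOGON] SamLogon: Network logon of EXAMPLE\asmith from DESKTOP7 (via DC2) Returns 0x0
"""

LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[LOGON\] (?:\[\d+\] )?"
    r"SamLogon: (?P<kind>.+?) logon of (?P<account>\S+) from (?P<workstation>\S*) "
    r"\(via (?P<via>[^)]+)\) Returns (?P<status>0x[0-9A-Fa-f]+)$"
)


def failed_logons(log_text):
    """Count non-zero SamLogon results per (account, workstation, via) triple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group("status"), 16) != 0:
            key = (match.group("account"), match.group("workstation"), match.group("via"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (account, workstation, via), n in failed_logons(SAMPLE).most_common():
        print(f"{n} failures for {account} from {workstation} via {via}")
```

The “via” column is the useful part: it names the machine that passed the logon along, which is where to go sniffing next.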


I’d also mention that this resource, http://www.ultimatewindowssecurity.com/encyclopedia.html, is invaluable. It’s so much more complete than Microsoft’s security event documentation.