Tracking down a perpetrator of Windows account lockouts or how to pull your hair out multiple ways

This is gonna be a lengthy post aimed primarily at Windows administrators. Consider yourself forewarned. But to entice you to the end, I’ll also mention that there is a lot of good, technically useful info in here, including a few details which aren’t well discussed anywhere.

So there are lots of bad guys out there. And some number of them like to try to brute-force your Windows accounts. Most of us take the sensible precautions of enforcing strong password requirements and configuring account lockouts. But you can quickly get into a denial-of-service situation if you aren’t careful with the account lockout settings.

We use a 5-minute lockout after 150 failed logins within a 5-minute period. This avoids most denial-of-service situations while still disrupting brute-force attacks very effectively.
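For reference, these thresholds correspond to the standard account lockout policy settings. A quick way to inspect or set them from a command prompt (the values below mirror our policy; in practice you’d normally manage this through Group Policy rather than the command line) is something like:

```shell
REM Show the current account lockout policy.
net accounts

REM Set it from the command line: lock out after 150 bad attempts,
REM keep the account locked for 5 minutes, and reset the bad-password
REM counter after a 5-minute observation window. /domain applies the
REM change domain-wide instead of to the local machine.
net accounts /lockoutthreshold:150 /lockoutduration:5 /lockoutwindow:5 /domain
```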

But during this past year we ran into a situation where these settings didn’t prevent a persistent account lockout. So we had to track down why the account was getting locked out, and from where.

First we tried a low-overhead solution of rebuilding the user’s computer. Often account lockouts are due to some process on the user’s computer having an old password cached, and endlessly trying to login with that old password. An event on the user’s computer seemed to support this theory:

Event Type: Warning
Event Source: Tcpip
Event Category: None
Event ID: 4226
Date: 11/20/2006
Time: 4:26:32 PM
User: N/A
Computer: blahblah
Description: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts.

This is logged when something on that computer is trying to establish TCP connections faster than the OS will allow. In more detail, it means the pool of dynamic TCP ports is exhausted before prior connection attempts have timed out. In other words, something is badly misconfigured. What that something is isn’t always clear; sometimes legitimate applications cause this, sometimes it’s malware. You could run Sysinternals TcpView (or a similar tool) to figure out which app is responsible, but you’d need to catch the behavior while it was happening. Instead, we just went ahead and rebuilt the user’s computer.
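If you do manage to catch it live, you don’t even need TcpView; built-in tools will identify the culprit. A sketch (the PID shown is a placeholder for whatever the first command turns up):

```shell
REM A process stuck in a connect loop shows up as a pile of half-open
REM connection attempts; -a lists all, -n skips name resolution, and
REM -o adds the owning process ID in the last column.
netstat -ano | findstr SYN_SENT

REM Map the PID from the last column back to a process name.
REM (1234 is a placeholder; substitute the PID netstat reported.)
tasklist /fi "PID eq 1234"
```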

But that didn’t end the problem.

So we then searched all our DCs for event IDs 672, 680, 539, and 644 in an effort to locate where this was being perpetrated from.

As you know, 539 and 672 events contain source IP info, whereas 644 and 680 events contain source workstation name (NetBIOS) info. The nature of the process causing the endless account lockouts was such that only 644 and 680 events were generated, so we had only a NetBIOS name to work with.
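For anyone wanting to repeat this kind of search, here is a sketch of pulling those events out of the Security log from the command line (event ID 680 used as the example; repeat for the others):

```shell
REM On Windows Server 2003-era DCs, the built-in eventquery.vbs script
REM can filter the Security log by event ID:
cscript %SystemRoot%\system32\eventquery.vbs /L Security /FI "ID eq 680"

REM On Vista/Server 2008 and later, the same search uses wevtutil
REM (here limited to the 20 most recent matches, rendered as text):
wevtutil qe Security /q:"*[System[(EventID=680)]]" /f:text /c:20
```

Note that the pre-Vista event IDs (672, 680, etc.) were renumbered in Vista/2008, so on newer DCs you would be searching for their modern equivalents instead.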

Since this computer name was not in our domain/forest nor in WINS or DDNS (and being a university, we have an open network), this was a dead-end.

We tried analyzing netstat output from the DCs receiving the 680 and 644 events, correlating the open sessions at the time of the failures in the hope of obtaining an IP address. We further eliminated all connections from known domain computers. This gave us no leads, which was incredibly mysterious.

We then tried sniffing network traffic at the DCs. This also gave us no leads, which again was very mysterious.

I then stumbled upon a fairly old Microsoft webcast. In it, Microsoft suggests (but doesn’t explain why) that turning on “netlogon logging” is helpful in tracking down account lockout events. We tried that, hoping an IP address would be logged there. It wasn’t. But it did help.

Specifically, the netlogon logs told us where the failures were being “chained” from. We discovered that the audit failures were being chained from a member server to a DC to the PDC emulator. That explained why netstat and sniffing at the DC level failed (along with the fact that secure channel communications between domain computers obscured the searchable details, like the username, in a network sniff). At the member server, we used a network sniff to obtain the IP of the offending computer, and the rest was history.
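To give a flavor of what the chaining looks like, here is an illustrative excerpt in the style of a netlogon.log from that era (all names are made up; the “(via SERVER)” portion is what reveals which machine forwarded the failed logon, and 0xC000006A is the NTSTATUS code for a bad password):

```
11/20 16:26:32 [LOGON] SamLogon: Transitive Network logon of OURDOM\jdoe from BADHOST (via MEMBERSRV) Entered
11/20 16:26:32 [LOGON] SamLogon: Transitive Network logon of OURDOM\jdoe from BADHOST (via MEMBERSRV) Returns 0xC000006A
```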

Turning on netlogon logging is done by running the following command:

“nltest /dbflag:2080ffff”

plus a bounce of the netlogon service.

The hex string specifies the verbosity; the value above is full verbosity. According to the webcast, this generates at most 40MB of logs, which is nothing. Microsoft recommends turning it on for all DCs, which is our current practice.

The log file this generates is located at c:\windows\debug\netlogon.log.
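Putting the whole procedure together, a sketch of the end-to-end commands (the account name “jdoe” is a placeholder for the locked-out account; `0x` prefix on the flag is optional):

```shell
REM Enable verbose netlogon logging, then bounce the service
REM so the setting takes effect.
nltest /dbflag:0x2080ffff
net stop netlogon
net start netlogon

REM Once the failures recur, search the log for the locked-out account,
REM or for the relevant NTSTATUS codes: 0xC000006A = bad password,
REM 0xC0000234 = account locked out. findstr treats the space-separated
REM strings as an OR search.
findstr /i "jdoe" %SystemRoot%\debug\netlogon.log
findstr /i "0xC000006A 0xC0000234" %SystemRoot%\debug\netlogon.log

REM Turn logging back off when you're done.
nltest /dbflag:0x0
```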

I’d also mention that this resource:

is invaluable. It’s so much more complete than Microsoft’s security event documentation.