Fun with Two Hop Kerberos

So as part of a recent Datawarehouse initiative here at the UW, there’s been quite a bit of activity around Windows authentication delegation, sometime more well known as Kerberos two hop authentication. I know the Law School has been using two-hop authentication for awhile now, and recently had a problem so I think this post is likely relevant to quite a few.

To explain what two hop authentication is, we’ll need to jump back to Windows authentication basics to make sure we are all on the same page. If you already understand it, then jump down to “The Meat”.

So when you login, you give your password (or some other credentials) to the lsass.exe process on what is usually a (physically) local (to you) computer. The lsass.exe process on your computer hashes some other info (a timestamp) using the password to create the hash, then sends that hash over the wire to a domain controller for verification. Note that the info on the wire doesn’t contain any form of your password. The domain controller compares that hash to what it expects, and if successful, passes back a login token that can be used. Depending on the details of the authentication scenario, that login token might have additional stuff (usually domain local & local groups) added to it before you receive it. Then you can use that token to access stuff on the local computer and over the network. The reason you can use it to access stuff off the local computer is because the token itself has been marked as re-usable, and the local lsass.exe process considers that mark as inviolable.

Sometimes you access stuff over the network, and you are challenged for your credentials. When that happens, you actually do send your password over the wire, and the lsass.exe process on the remote computer takes your password, does the same dance with the domain controller, *except* this time the token doesn’t get the re-usable mark. This is because that remote computer doesn’t need your login token except for resources local to it. It uses that token, and we say the remote computer is impersonating you to gain access to resources (local to itself) on your behalf. In Windows terminology this is called Impersonation. Impersonation can also happen without a password challenge, and in that case, your local lsass.exe which has a re-usable copy of your login token passes that token to the remote computer, which then uses that token to ask the domain controllers for a non-reusable token. You might also think of this scenario as one-hop, as the login token is one hop removed from where the user physically is.

Now, say that the remote computer needed to access network resources that aren’t local to it, as you. That’s the scenario we are concerned with here. If it helps, imagine a web service that needs to access a sql service as you to provide the right data. In this two hop scenario, you pass your creds (either password or login token) to the first remote server. That remote server has something special about it. The user account that is running the network service process has been granted a special ability called delegation. The user account might be SYSTEM, in which case the user account is the computer object in Active Directory, or it might be some specific service account. Using delegation, the first remote server can take the creds you’ve provided to get a login token that is re-usable. It then can reach out to the 2nd remote server, and provide a non-reusable token to access whatever it needs on that 2nd remote server. There are two levels of delegation: unconstrained delegation and constrained delegation. With unconstrained, the 1st remote server can get a re-usable login token that can be used to access *any* network services that token has access to. With constrained, the 1st remote server is limited so that the re-uable login token can only be used locally and with specific network services. Obviously unconstrained delegation is more secure and therefore preferable to unconstrained.

A few relevant factoids about delegation:

  • Delegation relies on Kerberos authentication. If you can’t do Kerberos to the 1st remote server, then you can’t use delegation to achieve the second hop.
  • Kerberos authentication relies on a bunch of pre-requisites, so it can sometimes be tricky to achieve.
  • You can have as many hops as you’d like, as long as each server in the chain has delegation privileges to the next server in the chain.
  • Granting the delegation privilege is practically an all or nothing thing. If you grant it to user serviceX, it means that *every* user who passes creds to serviceX will have a re-usable login token available to serviceX. If serviceX is insecure or not trustworthy, then really bad things can happen. Aside from the constrained level, there is one check on this privilege–you can mark certain user accounts as being “sensitive”. This means that they can not be used via delegation at all. You will want to mark all your domain admin accounts as sensitive, and likely quite a few others too.

So that’s the basics, and now we’ll move onto the more interesting stuff.

The Meat

So I was saying that the datawarehouse project here has chosen an architecture design that relies on two-hop authentication. The primary components which do this are sql servers that via the sql linked server functionality bring data from many sql servers together into a view. Complicating this picture is the fact that our user accounts and the sql servers themselves are in two different forests.

We had a lot of problems getting this to work correctly. For Kerberos to work correctly, you have to make sure you have all the service principal names registered correctly. You also have to ensure all the computers are within a given time window. And that they all are trying to use Kerberos. And that you have a forest trust, not a domain trust.

In the course of all these problems, we finally asked PSS to come help us. They sent two consultants on site on two separate occasions, but both left stumped. We were left with all kinds of additional (undocumented) claims from the two different consultants, in some cases contradictory to each other.

We eventually figured out both the problems which were ailing us.

The most serious problem was that in marking a wide variety of accounts as sensitive, we (actually it was me) accidentally marked a special built-in Windows account as sensitive. That account is the KrbTgt account. This account has a very special function. It issues *every* login token for your Windows domain. So, obviously, it’s very important. By marking the KrbTgt account sensitive, apparently every login token it issues is also marked sensitive. This is undocumented behavior, but from a logical perspective it makes sense. So for the span of about 3 months I can say with definitive authority that there was absolutely no delegation used from that domain–because every account was effectively marked as sensitive. Fortunately, not many folks are using delegation from that domain as of yet.

Note that for some domains this might be desired behavior, and that it’s really a shame that this is undocumented behavior. I’d imagine that quite a few Windows security organizations might want to add this to their locked down configuration guides.

We also had a sporadic problem on certain servers with hostnames where the DNS suffix of those server’s hostnames corresponds to a MIT Kerberos realm which happens to have a Kerberos trust from one (and only one) of the forests involved. That problem happens because Kerberos uses mutual authentication–meaning one computer verifies that any other computer it talks to is who it claims to be. For this, it uses what are called servicePrincipalNames (SPNs). But you have to find the right authority for a given SPN, and of course, the Windows logic assumes that the DNS suffix on a SPN is meaningful even though that isn’t necessarily true. It turns out that if the servers involved have the registry keys needed to resolve the KDCs for a MIT Kerberos realm in this scenario, then Windows works as you’d like. In other words, if it can find the MIT Kerberos realm, then it can check it for the SPNs, find out that they aren’t there, and then look elsewhere for the SPNs. But if it can’t find the KDCs for that MIT Kerberos realm, then it gets stuck. Putting the registry keys for resolving the MIT Kerberos realm on all relevant computers is one fix, another is not using that DNS suffix in any server hostnames.

Put another way:

Windows domain has a Kerberos realm trust with Windows domain has a server named in it. Out of the box Windows clients in *can’t* negotiate Kerberos with sql1. Windows clients with the appropriate KDC registry keys referencing the Kerberos realm *can* negotiate Kerberos with sql1.

In other words, because you have that Kerberos realm trust, you can’t plan on having Kerberos auth to any computers with a DNS suffix that matches that realm unless all your clients have got the KDC reg keys to that realm. Somewhere in the background it’s likely that there’s an error happening which won’t give up and allow the local Windows KDC to issue a TGS for a host with that DNS suffix, unless it can contact the external Kerberos realm KDCs to see if they have a more authoritative SPN.

If you do want to read up on this technology, my favorite blog site, the MS Directory Services blog, has a very useful post that you can add to your reading list:

Leave a Reply

Your email address will not be published. Required fields are marked *