ADFS and Office Modern Authentication, What Could Possibly Go Wrong?

We’ve recently thrown the load balancer switch to send users to our new ADFS 4.0 farm rather than the old ADFS 2.x farm. My first baby steps in this process were documented in a prior post.1 It turns out that was just the beginning of this long, tortured journey. Things got very complicated when we started getting errors from Outlook users connecting to Exchange Online in Office 365. In my prior post I explained how SAML Tracer can be helpful. However, you can’t use a browser-based HTTP debugger/tracer with a thick client like Outlook; in these cases Fiddler is your friend.

Many of the Office 2016 apps (and some of the Office 2013 apps with the right updates and registry settings) can use what Microsoft likes to call Modern Authentication. This is nothing but a lame pseudonym for OpenID Connect. OIDC, as it is abbreviated, uses a web-API-friendly exchange to authenticate users. This is in contrast with the older, well-established SAML and WS-Trust authentication protocols (the latter being SOAP-based). We don’t (yet) use MFA with Office 365, so the settings I discussed in the prior article don’t apply to it.

Older versions of the Office thick clients use basic authentication with Office 365. The app puts up a credential dialog and then sends the user’s credentials to the O365 service where the actual authentication against Azure AD takes place. The user credentials are protected by TLS. This means that the user has to enter their credentials each time they start the app unless they choose to have the credentials stored locally. The biggest downside to this is that those locally stored credentials can easily be harvested by malware.

How does OIDC change the authentication flow? Newer Office apps open a window that hosts a browser which the app directs to the address of the OIDC provider (OP) configured during auto-discovery. The OP puts up a web form to collect the user’s credentials and, after validating them, returns two JSON web tokens. One is an app authentication token, the other is a refresh token which can be used by the app to request a new auth token when the current one expires. Thus the user’s credentials are never stored locally. The app and refresh tokens could be replayed but they are bound to the app so their loss would be far less damaging.

The Office 365 OP is the familiar https://login.windows.net and/or login.microsoftonline.com which both sit in front of Azure Active Directory (AAD). Things get more complicated when ADFS is in the mix and it really is a bit of a mess when your ADFS is using a SAML Claims Trust Provider (CTP). The UW, like many higher-ed institutions, uses the community developed Shibboleth SAML IdP and our ADFS is configured with it as the CTP. This means we get an authentication flow that transitions between 3 different protocols. The initial step from the Office app uses OIDC. AAD then calls ADFS using WS-Trust. ADFS then translates the WS-Trust call into a SAML protocol call to Shibboleth and the whole process unwinds as the security tokens are returned.2 As you can see there are lots of places where things can go haywire.

Our first sign of something amiss was users reporting this error when they attempted to sign on after the switch to ADFS 4.0.

[Screenshot: Error shown by ADFS]

Ah-ha, there is an Activity ID. I can look that up in the ADFS event logs to get more detail. Except that the logs didn’t say anything other than there had been an authentication failure. Not very helpful. Here is where breaking out Fiddler becomes necessary. As an aside I recommend running Fiddler from an otherwise unused machine because it captures all network traffic. Your typical workstation is going to be way too noisy which will clutter the network capture with lots of extraneous traffic. I use a virtual machine for this purpose so that nothing is running other than the app (usually Outlook) and Fiddler.

I’m not going to spend time describing how to use Fiddler. There are lots of web articles on that topic.3 What I discovered is that Azure AD (AAD) was sending a WS-Trust request to ADFS with a URL query parameter of:4

wauth=http://schemas.microsoft.com/ws/2008/06/identity/authenticationmethod/password

ADFS then sends this same URI as the SAML AuthnContextClassRef to Shibboleth. This value is not part of the SAML spec, so Shib returned an error and ADFS then displayed its own error.

If you recall from my prior post there is an ADFS CTP property called CustomMfaUri that gets applied in the MFA case. Unfortunately Microsoft did not create a corresponding non-MFA property. I’ve asked them to consider creating a CustomDefaultUri property. We’ll see if that gains any traction.

At this point I called Microsoft Premier Support to find out what, if anything, could be done to fix this. The support engineer took a look at the Fiddler trace and pointed out something I hadn’t noticed. Outlook 2016 was adding a “prompt=login” parameter to its OIDC login request. AAD translates this into the WS-Trust wauth parameter. He told me that this AAD behavior is configurable and that I should follow example 3 of this article https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/operations/ad-fs-prompt-login. That indeed fixed the issue. The wauth parameter is optional in WS-Trust, and password authentication is the default in its absence. The SAML AuthnRequest created by ADFS when there is no wauth has no AuthnContextClassRef, so Shib also defaults to password authentication.
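For reference, the knob in question lives on the federated domain settings in Azure AD (MSOnline PowerShell module). A sketch of the change, with the domain name as a placeholder:

```powershell
# Connect-MsolService first; 'example.edu' is a placeholder domain.
# NativeSupport passes prompt=login through to ADFS 2016 instead of
# translating it into the wauth query parameter (Disabled ignores it entirely).
Set-MsolDomainFederationSettings -DomainName example.edu -PromptLoginBehavior NativeSupport
```

See the linked article for the exact behavior value that fits your ADFS version.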

Our celebration of success was short-lived as other users continued to have similar login problems. What we discovered is that different versions of Office and Outlook 2016 have different “modern auth” behavior. The click-to-run version downloaded from the Office 365 site sends the prompt=login parameter. However, the MSI volume-licensed version we distribute to campus is an older build of Office 2016 and it instead sends a different OIDC parameter: “amr_values=pwd”. There is no AAD property to configure the behavior for this parameter; it results in the above wauth being sent to ADFS. As far as we can tell there is no update that can be applied to the MSI version to have it change its behavior to send the “prompt=login” parameter. At this point the MS support engineer had no suggestions.

I’m thinking we have 3 options. The first is to convince everyone with the MSI version to uninstall it and install the c2r version. This is a non-starter because campus IT has no real authority; there is no way to force people to upgrade. We still have thousands of people running Windows 7 and a few even with XP! The second option was writing an F5 load balancer iRule to do a URL rewrite that removes the wauth parameter. That solution would not work in the long run because we want to use AAD conditional access to do a structured roll-out of MFA, and removing the wauth would negate the requirement to use MFA. We could detect this specific wauth and only remove it, but now the iRule is becoming fairly complex. So the third option was to ask the Shibboleth team to accept the Microsoft URI as a legitimate proxy for the PasswordProtectedTransport URI which is defined by the SAML spec.5

The Shib team made the change and now things are working properly. To quote Jim, the lead engineer: “We frequently have to make accommodations for vendor apps that are not spec compliant.” A better solution would be for Microsoft to more fully support SAML-spec IdPs as CTPs. Another would be to do password hash sync from AD to AAD so that AAD could handle the login without ADFS or Shibboleth in the mix. However, we are concerned about introducing a second login portal to our campus users. We have enough of a problem with phishing and this would only complicate matters. We’ll see which way we end up going.

Endnotes

Modifying a Collection While Iterating Over It

I was recently bitten by a bug in some code I had modified. The error message was:

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

Here is the offending code:

PropertyValueCollection existingProxyAddresses = ADEntry.Properties["proxyAddresses"];
foreach (string proxy in existingProxyAddresses) {
    if (proxy.StartsWith("smpt:")) {
        // Throws InvalidOperationException: this removes from the same
        // collection the foreach is enumerating.
        ADEntry.Properties["proxyAddresses"].Remove(proxy);
    }
}

A little head scratching preceded the “Ah ha!” moment when I realized that the initial assignment of the proxyAddresses property to existingProxyAddresses was a simple alias, not a copy of the collection. The fix, of course, was to iterate over a copy of the collection.

The fact that this is AD code is not really relevant except that the PropertyValueCollection type is a subclass of a collection primitive that doesn’t support LINQ. I would have preferred to use a LINQ extension to create the copy of the collection. PropertyValueCollection does have a CopyTo method so I used that instead, but that requires pre-creating the target collection thus:

var existingProxyAddresses = new object[ADEntry.Properties[AdAttributes.ProxyAddress].Count];
ADEntry.Properties[AdAttributes.ProxyAddress].CopyTo(existingProxyAddresses, 0);

Note that the CopyTo method expects an array as the target type. Also note that although I show a string for the property name in the first code snippet, I am actually using a string constant in a class I’ve declared for that purpose.
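Putting the two pieces together, the corrected loop looks something like this (a sketch; AdAttributes.ProxyAddress is the string-constant class mentioned above):

```csharp
// Copy the live collection into a detached array first.
var existingProxyAddresses = new object[ADEntry.Properties[AdAttributes.ProxyAddress].Count];
ADEntry.Properties[AdAttributes.ProxyAddress].CopyTo(existingProxyAddresses, 0);

foreach (string proxy in existingProxyAddresses) {
    if (proxy.StartsWith("smpt:")) {
        // Safe: we remove from the live collection while
        // enumerating the detached copy.
        ADEntry.Properties[AdAttributes.ProxyAddress].Remove(proxy);
    }
}
```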

So why can’t you modify a collection while you are iterating over it? Mechanically, most .NET collections keep an internal version number that is bumped by any structural change; the IEnumerator.MoveNext() method, which is used by the foreach statement, checks that version on each call and throws if it has changed. The underlying reason is that removing elements makes the enumeration position undefined. Suppose the enumerator’s internal order was A, B, C, D and you removed C from the underlying collection: what should happen when the enumerator reaches C? The result would be undefined, thus you are prevented from doing this.

 

Upgrade to Windows 10 Broke Hyper-V

Sometimes Microsoft’s efforts to help just make things worse. I had Hyper-V configured on my computer that was running Windows 8.1. It was a simple setup with one virtual switch which allowed my VMs to connect to the University’s network. This is what Hyper-V terms an external network. I also had a dozen different VMs configured. I upgraded to Windows 10 1709 and was asked during the install if I wanted to configure Hyper-V now or later. Not knowing what this actually entailed, I chose later. After the upgrade I found that all of my virtual machines were gone and the networking was broken. Apparently “configure Hyper-V later” means “rebuild it from scratch.”

It turns out the VHDs were still there but I had to create new VMs and use the existing disks. Not a huge deal but why did it wipe out the VMs during the upgrade?
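Re-attaching a surviving disk is at least quick with PowerShell; something like this per VM, with the name and path as placeholders:

```powershell
# Recreate a VM around an existing virtual disk; name and path are placeholders.
New-VM -Name "MyRestoredVM" -MemoryStartupBytes 2GB `
    -VHDPath "D:\Hyper-V\Virtual Hard Disks\MyVM.vhdx"
```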

The networking was more thoroughly hosed. There is a new virtual switch that is added automatically by the 1709 version of Hyper-V. It is called “Default Switch” and is an internal network. It also cannot be modified or removed. This serves no purpose for me and is just a nuisance. Worse, the existing external virtual switch was converted into a private switch by the upgrade. I tried to modify it to make it an external switch and the modification failed with an invalid-parameter error. I tried to remove it and got the same error. I did some searching and found that the virtual switches are listed in the registry at “HKLM\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters\SwitchList.”1 I deleted the two entries after making a registry backup by exporting the SwitchList node and then rebooted. The broken switch was now gone but the default switch was automatically recreated.

I still didn’t have networking for my VMs so I went to create a new external virtual switch. I have only one NIC so I checked the “Allow management operating system to share this network adapter” box. Hyper-V Manager put up a warning when I clicked Apply that my existing network configuration might be impacted (or words to that effect). I’m running a static IP and knew that I couldn’t rely on DHCP to restore all of the networking settings, so I ran “ipconfig /all” so that the settings would be displayed in case any needed to be restored. Yup, after I clicked OK to proceed I lost network connectivity. A bit of poking around revealed that the static IP was still configured but it had lost both the DNS and WINS settings. I restored those and now have network connectivity on both the host and the guest systems.
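The same switch can be created from PowerShell; a sketch, with the adapter name as a placeholder:

```powershell
# Create an external vSwitch bound to the physical NIC and share it with
# the host. The adapter name is a placeholder; Get-NetAdapter lists yours.
New-VMSwitch -Name "External" -NetAdapterName "Ethernet" -AllowManagementOS $true
```

The same caveat applies: binding the switch can disrupt the host's existing IP configuration, so record your settings first.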

I admit I’m neither a networking nor a Hyper-V guru, but one shouldn’t have to be to make basic use of these utilities. None of this should have been necessary; the upgrade should have just left my Hyper-V configuration alone (or silently done whatever changes were needed). I’ve lost a lot of time rebuilding my Hyper-V setup. That’s not my idea of software increasing one’s productivity.

  1. See https://social.technet.microsoft.com/Forums/windows/en-US/e49df568-4f4c-47b7-b30c-952d1e26ca58/cant-remove-failed-virtual-switch-from-hypervs-virtual-switch-manager

ADFS 4.0, Shibboleth and MFA

The University of Washington uses the InCommon Shibboleth SAML identity provider for web SSO. We run ADFS as a proxy between Office 365/Azure AD and our on-premise identity systems. Our ADFS is configured to use our Shib IdP as an additional “Claims Trust Provider” (CTP). We do this for two reasons: we want all web SSO to have the same login experience and we provide multi-factor authentication through our Shib service. The problem I was working to solve was how to configure ADFS 4.0 to require MFA through our Shib instance.

We initially set up what is known as ADFS 2.0. This was a downloadable upgrade to the original version of ADFS that shipped with Windows Server 2008. ADFS normally shows a “Home Realm Discovery” (HRD) page if there is more than one CTP (with AD being the default CTP). We wanted our ADFS relying parties (RPs) to go straight to Shibboleth, so we modified the HRD page code to effect this. We also used this modified code to require MFA for certain RPs. When ADFS 3.0 was released with WS 2012 R2 it threw a monkey wrench into this design: ADFS 3.0 no longer ran as an IIS web site, so the HRD page code was no longer accessible to be modified. We discovered that you can configure RPs to go to a specific CTP, but we were stymied as to how to require MFA. In the interim ADFS 4.0 was released with WS 2016 and yet the solution to the MFA problem remained elusive. I finally opened a support request with Microsoft to seek an answer to this problem. Here is what I’ve learned.

It turns out this is a two-step solution. The first step is to configure your CTP for MFA. The second is to configure RPs to require MFA.

Almost all advanced ADFS settings are accessed via PowerShell. There are new parameters to the familiar PS cmdlets that are of interest. Here are those new commands and how they would be applied to solve our conundrum.

Configuring the Shibboleth CTP

You need to tell ADFS how to invoke MFA through the CTP. Our Shibboleth IdP will require MFA if it receives an AuthnRequest containing an AuthnContextClassRef bearing our MFA signaling URI.

Note that this is the value that our Shibboleth is configured to look for. Your Shib installation may use a different URI to signal MFA.

Hint: use Firefox and its SAML Tracer plugin to trace a login session. SAML Tracer helpfully decodes the SAML token so you can examine it along with the rest of the HTTP exchange.

A WS-Federation authentication request can effect this request for MFA by setting the wauth parameter to that URI. ADFS translates the WS-Fed request into a SAML AuthnRequest carrying the URI as the required AuthnContextClassRef value. If you have a WS-Fed RP that can’t be configured to send the wauth parameter, or if you need to enforce MFA at the IdP, then this won’t work. Instead you configure ADFS so it knows how to make this request using the following PowerShell.

First, to get the current CTP state you can call
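Something like this, where the display name is whatever you gave your CTP (shown here as a placeholder):

```powershell
# Dump the current claims provider trust settings; the name is a placeholder.
Get-AdfsClaimsProviderTrust -Name "UW Shibboleth" | Format-List *
```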

The parameter of interest is CustomMFAUri. Use the following code to set it.
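A sketch of that call; the CTP display name and MFA URI below are placeholders for your environment’s values:

```powershell
# Point ADFS at the AuthnContextClassRef URI that signals MFA to your IdP.
Set-AdfsClaimsProviderTrust -TargetName "UW Shibboleth" `
    -CustomMFAUri "https://your-idp.example.edu/mfa"
```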

Now ADFS knows how to ask for MFA from our SAML IdP.

Configuring Your Relying Parties

We want to send all of our RPs to our Shib CTP. Use the following PowerShell to do this.
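Roughly this, per RP (the RP and CTP display names are placeholders):

```powershell
# Send this RP's logins straight to the Shibboleth CTP,
# bypassing the Home Realm Discovery page.
Set-AdfsRelyingPartyTrust -TargetName "Some RP" `
    -ClaimsProviderName @("UW Shibboleth")
```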

You could script this by sending the output of Get-AdfsRelyingPartyTrust to the above command. Note also that since we set a display name on the CTP we had to use that rather than the URN identifier.

The next step is configuring those RPs for which you require MFA. That is done with this command:
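Likely along these lines (again, the RP name is a placeholder):

```powershell
# Tell ADFS to request MFA from the claims provider for this RP.
Set-AdfsRelyingPartyTrust -TargetName "Some RP" `
    -RequestMFAFromClaimsProviders $true
```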

Now an incoming authentication request to the RP will result in our Shib prompting for the second factor after the entry of the correct user name and password.

I did a search on the RequestMFAFromClaimsProviders parameter after being told about its existence and didn’t find much. The MS documentation is useless and gives no examples of the use of this and the related parameters. I did find one non-MS blog post here, but it was rather general in nature. I hope these detailed instructions help those who want to use Shibboleth as their institutional identity provider via ADFS.

What is a Web Service?

The earliest computers didn’t talk to one-another. They were islands of information. A lot has changed since those pioneering days. Web services are the current state-of-the-art in computer to computer communications. I present below a brief history to illustrate and help explain this transformation from isolation to connectedness.

Let’s Talk! Connecting Computers

Networking technologies were developed to enable inter-computer communications. At that point you could connect two or more computers together but it still wasn’t easy to share information. There were initially no standard ways to represent or manipulate data.

Multiple, competing efforts progressed to standardize network communications. TCP/IP emerged as the primary way to interconnect networks and enabled the Internet. SMTP saw increasing adoption as an electronic mail protocol. Things were not as simple in the world of client-server communications. DCE/RPC and CORBA competed for attention, with Microsoft settling on the former. While providing a framework for client-server computing, these are still low-level binary network protocols that are not easy to use nor are they firewall-friendly. By that I mean those protocols require a large number of TCP ports to be open, which nullifies most of the security gained from a firewall.

Web Services

The next major advancement in network communications was SOAP. SOAP is not a wire-level protocol meaning that SOAP messages can be transmitted via a variety of application layer protocols including HTTP and SMTP. SOAP also standardized on XML as the data representation model. Both of these concepts were transformational in that now you could use a set of ports that are usually left open on firewalls and the data could be interpreted without an understanding of a complex binary layout. Major vendors jumped on SOAP and produced a raft of web service specifications (WS-*). Why were these called “web services?” Because they used the same underlying protocol as the World-Wide-Web: HTTP!

Except this is not a completely accurate timeline. SOAP was developed after HTTP, and it turns out that HTTP itself makes a great client-server computing protocol. The HTTP protocol was developed by early Internet luminaries including Tim Berners-Lee, Paul Leach and Roy Fielding. The latter published a revolutionary dissertation in 2000 in which he did an analysis of networking architecture. Within his dissertation Dr. Fielding presented Representational State Transfer, a.k.a. REST. This is an architectural pattern for client-server communications that uses the rich semantics of HTTP. However, given the large investment that had been made in the WS-* suite, it took a long time for folks to realize the inherent advantages of REST over SOAP.

RESTful Web Services

Although SOAP-based services used HTTP, they did not and could not fully leverage all of the features of HTTP. All SOAP message exchanges use the POST HTTP verb. It doesn’t matter what you want to do; the SOAP client POSTs a request to the SOAP server. This is incredibly inefficient. The majority of network transactions are data reads (I don’t have any handy references for this but I believe it to be true). HTTP has a built-in verb for fetching data: GET. HTTP GETs are by definition stateless, idempotent and without side-effects. This enables two very powerful features: scale-out and caching. Because the requests are stateless you can use a load balancer to spread them out to a farm of servers. This also enables caching of requests on intermediate nodes of the Internet such as proxies and gateways. These combined capabilities have enabled the creation of Content Delivery Networks (CDNs).

Details of REST

REST is a resource-centric architecture which gives it the following characteristics.

  • Each distinct resource is named by a unique URL path.
    • e.g. https://myservice.example.com/stuff/things/mything22
    • The leaf element is the resource name while the intervening path elements can be thought of as containers or collections; thus the leaf element name need only be unique within the specific path hierarchy.
  • CRUD (create, read, update, delete) operations map directly onto the HTTP verbs: POST (or PUT) to create, GET to read, PUT/PATCH to update, and DELETE to delete.
  • Stateless – as noted above this enables Internet-scale services
  • Standard MIME media-types for payload encoding (JSON, XML, etc.)
  • Searching for resources is rooted in a container path and employs URL parameters to describe the search query
    • e.g. https://myservice.example.com/stuff/things?$filter=thingnum lt 22
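As a sketch of what calling such a service looks like (the myservice.example.com endpoints above are hypothetical), PowerShell’s Invoke-RestMethod maps neatly onto these conventions:

```powershell
# Read a single resource (GET) from the hypothetical service above.
$thing = Invoke-RestMethod -Method Get `
    -Uri 'https://myservice.example.com/stuff/things/mything22'

# Delete it (DELETE) using the same URL that names it.
Invoke-RestMethod -Method Delete `
    -Uri 'https://myservice.example.com/stuff/things/mything22'
```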

While all of this is cool, REST isn’t an actual protocol. Rather, it is a set of architectural styles or conventions. Several competing implementation protocols have evolved as a result. The two dominant REST API description languages are OData and OpenAPI (formerly Swagger). The former is being pushed heavily by Microsoft, which may explain why some in the open source community prefer the latter (and I’m sure there are lots of other good reasons). In any case they both aspire to the same goals: providing a standard way for a service to describe its capabilities (the service description endpoint) and the schema of its data (the service metadata endpoint).

Examples of RESTful Web Services

Where to start? They are all around us. Facebook, Amazon, Google, Microsoft all expose resources via web services. I have code that calls the Amazon AWS Simple Queue Service for event message delivery. I am developing code to call the Microsoft Azure Active Directory Graph API (AAD Graph for short).

My employer, the University of Washington, hosts a number of RESTful web services. One that has been in use for a while is the Groups Web Service. A new middleware service is being developed to provide a standardized way to access University data. This is known as the Enterprise Integration Platform.

My next post will dive into making web service calls using the PowerShell scripting language.

Addenda

Example PowerShell and a PowerPoint deck at https://github.com/erickool/ws-powershell

Kerberos Delegation in Active Directory

The topic of Active Directory Kerberos delegation seems rather retro given that it is as old as AD itself. However, this is a very confusing and complex subject which has resulted in much misinformation out on the Internet. I am hoping that my explanation will be useful to a broad audience.

What is Kerberos Authentication?

Kerberos is an authentication protocol. It facilitates users proving their identity to services via the exchange of “tickets” mediated by the AD domain controllers. It is also a mutual authentication mechanism that allows services to prove their identities to users. Much has been written about Kerberos, so suffice it to say that it is one of the most secure authentication protocols in wide use. The protocol is defined in https://tools.ietf.org/html/rfc4120.

What is Kerberos Delegation?

Kerberos delegation is used in multi-tier application/service situations. A common scenario would be a web server application making calls to a database running on another server. The first tier is the user who browses to the web site’s URL. The second tier is the web site. The third or data tier would be the database. Delegation allows the database to know who is actually accessing its data.

One way to set this up is to run the web site using a domain service account. Let’s call this service account WebServerAcct. The database is running on a different server under its own service account. In many cases the database is run by a separate team from the web application so that the web application team must request database access for their WebServerAcct service account. The database admins would need to grant sufficient access to the WebServerAcct account for all possible actions of the web application. This means that the web application developers and/or admins determine who can access the application and by extension the data in the back end. This situation may be unacceptable to the database admins as they cannot control who ultimately has access to the data. The solution is to use Kerberos delegation.

Kerberos delegation would be configured on the WebServerAcct service account which grants it permission to delegate to the database service account. What does this actually mean? When a user accesses the web site they authenticate with Windows Integrated Authentication. This results in the WebServerAcct application receiving a request from the user that is accompanied by the user’s Kerberos ticket (I’m glossing over lots of details here in order to keep the scenario relatively simple). The user’s ticket contains a list of the user’s AD group memberships. The WebServerAcct application can examine the user’s group memberships and only allow access if the user is in a specific group. With delegation configured, the WebServerAcct service can request a Kerberos ticket to the database as the user rather than as itself. IOW, the database would receive a Kerberos ticket from the user rather than from the WebServerAcct application. This allows the database to examine the user’s groups to see if there is a membership in a group that is permitted access to the database. Without delegation the database would have no idea what user is actually accessing the data since it would have to give blanket access to the WebServerAcct account.

A concrete example of the above scenario is running SQL Server Reporting Services (SSRS) on a computer that is separate from the SQL Server database that provides the report data. The SSRS developer/admin can limit access to reports to specific users or groups. However, this does not actually grant those users/groups access to the data in the database. With delegation the database admins can control which users or groups can actually access the data rather than giving unlimited access to the SSRS service account.

Constrained Versus Unconstrained Delegation

Unconstrained delegation (a.k.a. basic delegation) was introduced with Active Directory in Windows 2000. It has the rather severe shortcoming that it allows a user/service to request delegated tickets to any other service. This capability can be abused as an elevation-of-privilege attack vector. It was, however, the only reliable way to do delegation across a domain-trust boundary until Server 2012. Constrained delegation imposes limits on which service accounts a delegating account can delegate to. This vastly reduces the potential for abuse of the delegating service account’s privileges.

There are actually two flavors of constrained delegation.

Original Constrained Delegation

This initial form of constrained delegation was introduced in Server 2003. With this type of delegation you explicitly list the services that the first-tier account is allowed to delegate to. Using the above example, you would set constrained delegation on the WebServerAcct account. The Active Directory Users and Computers (ADUC) user property sheet has a page for configuring delegation. This form of constrained delegation may not be used across a domain/forest trust; both the middle-tier and back-end services must be in the same domain.1 There are two other caveats around this form of constrained delegation. 1) The delegation tab will only appear on user and computer objects that have Service Principal Names (SPNs) set; if you expect a delegation tab and it isn’t there, that means no SPNs are configured. 2) The delegation tab has shortcomings in supporting service accounts that are user accounts: it will only list services running as a computer’s local accounts (Network Service, etc.). Thus to delegate to a user-object service account one must directly edit the msDS-AllowedToDelegateTo (A2D2) attribute.

SPNs are discussed in many places on the web, so I won’t dwell on them here.

Resource-Based Delegation

The new form of delegation was introduced in Server 2012. It is designed to overcome several limitations of A2D2 delegation. First, it allows for delegation across a trust. Second, it changes how delegation is controlled. Rather than configuring the middle tier account to enable delegation, you configure the data-tier (resource) account to specify who can delegate to it. Additionally it does not require domain-administrator privilege to configure. The admin who has the ability to manage the resource service account can make the delegation changes. This change introduced the msDS-AllowedToActOnBehalfOfOtherIdentity attribute which would be configured on the resource service account.
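In PowerShell terms this might look like the following (the account names are placeholders; the -PrincipalsAllowedToDelegateToAccount parameter writes msDS-AllowedToActOnBehalfOfOtherIdentity for you):

```powershell
# On the resource (data-tier) service account, authorize the middle tier
# to delegate to it. 'DbSvcAcct' and 'WebServerAcct' are placeholder names.
Set-ADUser DbSvcAcct `
    -PrincipalsAllowedToDelegateToAccount (Get-ADUser WebServerAcct)
```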

This article is a good in-depth explanation of the Kerberos S4U2Proxy extension that enables constrained delegation and the changes introduced with Server 2012: http://windowsitpro.com/security/how-windows-server-2012-eases-pain-kerberos-constrained-delegation-part-1 (with more technical details in the second part).

I don’t believe the proffered advantages are as compelling in a real world situation. First, the domain is not a security boundary (see Security Bulletin MS02-001). I understand that there are a lot of legacy setups in the wild, but if you aren’t thinking about domain consolidation you really ought to be. Second, the data custodians/DBAs still need to control access to the databases by limiting access to specific groups. Do you really gain much by giving DBAs the additional ability to limit access to specific apps/services through this second delegation option? Regardless, there are certainly scenarios where these features will be useful.

Sensitive to Delegation?

This may be the most confusing part of Kerberos delegation. What exactly does the user account option “Account is sensitive and cannot be delegated” do? Does it control whether an account can request delegated tickets to another account? NO! It has no bearing on whether an account can do delegation! Rather, it means that a service account cannot request a delegated ticket for an account with this setting.

I think an example is in order. First, what would be a sensitive account? That means an account with elevated privilege in AD. An obvious example is Domain Admins. You would not want a service to request a delegated ticket for a domain admin; that would elevate the service’s privilege to that of domain admin. It is a best practice to set AD ACLs to limit the access of ordinary user accounts (e.g. ordinary users should not be able to log into and configure servers). Service accounts and systems-admin accounts often need additional privileges to do what they do. Thus you should also stamp these accounts with the “Account is sensitive” setting.

A note on AD security: do not grant ordinary user accounts elevated privileges! Create clearly named separate accounts for those administrative tasks. Never use highly privileged domain or enterprise admin accounts for tasks that do not require that level of privilege! If you do server administration, browse the web, or read email with an account with EA/DA privs, the hackers will own you. ‘Nuff said.

Technical Details

All AD security principals have the attribute userAccountControl. This attribute is a bit field: each binary digit carries a specific meaning. The bit values are defined in the Windows SDK in lmaccess.h. We are interested in three of these “flag” values:

#define UF_TRUSTED_FOR_DELEGATION                     0x80000
#define UF_NOT_DELEGATED                             0x100000
#define UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION   0x1000000

The UF_NOT_DELEGATED bit is set when you select the “Account is sensitive and cannot be delegated” checkbox.

The UF_TRUSTED_FOR_DELEGATION bit specifies unconstrained delegation. It is set when you select “Trust this user/computer for delegation to any service (Kerberos only)” in the Delegation tab. The only accounts that should have this bit set are the domain controller computer accounts. We have to trust our DCs; we’d rather not extend this level of trust to anyone else!

The UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION bit enables constrained delegation with protocol transition. It is set automatically when you choose “Use any authentication protocol” in the Delegation UI; if you choose “Use Kerberos only,” this bit is left clear and only msDS-AllowedToDelegateTo is populated.

As I mentioned earlier, the msDS-AllowedToDelegateTo attribute enables constrained delegation to the named servers/services. The entries in this attribute must match the SPN(s) set on the corresponding server or service account. If you manually modify this attribute and rely on protocol transition, you must also ensure that the UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION bit is set in userAccountControl.

The msDS-AllowedToActOnBehalfOfOtherIdentity attribute controls the newer, resource-based form of constrained delegation. It is set on the back-end (data-tier) service account and contains a security descriptor granting the named middle-tier accounts the right to request delegated tickets to the back-end service.

LDAP and PowerShell Techniques for Managing Delegation

It is a good policy to periodically scan your AD accounts to see which have delegation enabled. To make this an effective control, though, you’d need a table of the accounts that have actually been granted permission to delegate. That lets you spot accounts whose delegation authorization has expired or that were never given administrative authorization in the first place. Similarly, it is a good idea to scan privileged and service accounts to ensure that they have the “Account is sensitive” bit set.

Searching AD for accounts with one of these bits set in userAccountControl is straightforward but certainly not obvious. The first challenge is understanding LDAP query filter structure, which is based on prefix notation. This means that the logical operators that combine query clauses are placed before the clauses, and each LDAP query clause is enclosed in parentheses. If you have clause A and clause B and you want both to be true to satisfy the query, it is structured as (&(A)(B)) rather than the more conventional programming infix notation of (A & B).

The second hurdle is searching for specific bits in a bit field. This requires a “custom” query operator, identified by an OID (an Object ID). OIDs are a bit like GUIDs except that they have a hierarchical namespace (digit space?) and are regulated by a standards body. At any rate, the OID for the LDAP bitwise-AND matching rule is “1.2.840.113556.1.4.803”. Another thing to keep in mind is that this bit-match query expects a decimal (base-10) number rather than the hexadecimal (base-16) number used in lmaccess.h.

  • Unconstrained delegation (UF_TRUSTED_FOR_DELEGATION 0x80000) = 524288 decimal
  • Sensitive to delegation (UF_NOT_DELEGATED 0x100000) = 1048576 decimal

To search for all accounts that are enabled for unconstrained delegation use the LDAP query filter of:

(userAccountControl:1.2.840.113556.1.4.803:=524288)

To search for accounts that should have “Sensitive to delegation” but don’t:

(&(name=$userPrefix)(!userAccountControl:1.2.840.113556.1.4.803:=1048576))

Note the exclamation point in front of userAccountControl. That means to find all accounts that don’t have that bit set. The $userPrefix is a placeholder for a user filter expression that would apply to your AD. We create all of our admin and service accounts with specific prefixes to make them easy to identify. Thus you could have (name=a_*) to search for all accounts that start with a_.
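The hex-to-decimal bookkeeping is easy to get wrong, so it can help to generate these filters programmatically. Here is a minimal Python sketch; the flag values come from lmaccess.h, but the helper function names are my own invention:

```python
# Flag values from lmaccess.h, as used in userAccountControl
UF_TRUSTED_FOR_DELEGATION = 0x80000                    # 524288 decimal
UF_NOT_DELEGATED = 0x100000                            # 1048576 decimal
UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION = 0x1000000

# OID for the LDAP bitwise-AND matching rule
BIT_AND = "1.2.840.113556.1.4.803"

def bit_match(flag):
    """Clause matching accounts whose userAccountControl has `flag` set.
    Note the matching rule expects the flag in decimal, not hex; Python's
    f-string renders the int in base 10 for us."""
    return f"(userAccountControl:{BIT_AND}:={flag})"

def missing_sensitive_filter(prefix):
    """Prefix-notation filter: name matches `prefix` AND UF_NOT_DELEGATED is NOT set."""
    return f"(&(name={prefix})(!{bit_match(UF_NOT_DELEGATED)}))"

print(bit_match(UF_TRUSTED_FOR_DELEGATION))
# (userAccountControl:1.2.840.113556.1.4.803:=524288)
print(missing_sensitive_filter("a_*"))
# (&(name=a_*)(!(userAccountControl:1.2.840.113556.1.4.803:=1048576)))
```

This version wraps the negated clause in its own parentheses, i.e. (!(…)), which is the RFC-standard form; AD accepts both that and the terser (!… ) shown above.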

You can use these query filters with a tool like LDP (ldp.exe) or ldifde. I’ll show how to make these queries from PowerShell. The first example searches for user accounts starting with a specific prefix that don’t have UF_NOT_DELEGATED set.

# Find all user accounts matching the prefix that don't have "Sensitive to delegation" set
param([string]$userPrefix = "a_*")
Import-Module ActiveDirectory
$filter = "(&(name=$userPrefix)(!userAccountControl:1.2.840.113556.1.4.803:=1048576))"
# Wrap in @() so .Count works even when zero or one account is returned
$users = @(Get-ADUser -LDAPFilter $filter -Properties userAccountControl)
Write-Host "$($users.Count) accounts found without UF_NOT_DELEGATED set"
foreach ($user in $users) {
    # Remediate or simply log each account, e.g.:
    # Set-ADAccountControl -Identity $user -AccountNotDelegated $true
}

This script searches for all accounts (user, computer, gMSA, etc.) that have unconstrained delegation (UF_TRUSTED_FOR_DELEGATION) set.

# Find all accounts that are enabled for unconstrained delegation
$filter = "(userAccountControl:1.2.840.113556.1.4.803:=524288)"
$objects = Get-ADObject -LDAPFilter $filter
$objects | Select-Object Name

To search for objects with constrained delegation, you look for non-empty msDS-AllowedToDelegateTo attributes with this query filter:

$filter = "(msDS-AllowedToDelegateTo=*)"

If you want to change the userAccountControl value of accounts that are out of compliance, there is a PowerShell cmdlet for doing this.

Set-ADAccountControl

This cmdlet does not require bit manipulation; you specify the settings you want as individual parameters (for example, -AccountNotDelegated $true). See https://technet.microsoft.com/en-us/library/ee617249.aspx. There does not appear to be a corresponding Get-ADAccountControl, which I find a little strange.
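Absent a Get-ADAccountControl, you end up decoding the bits yourself. Here is a small Python sketch of what such a decoder might look like; the flag names and values are from lmaccess.h, but the function itself is purely illustrative:

```python
# Delegation-related flags from lmaccess.h
DELEGATION_FLAGS = {
    "UF_TRUSTED_FOR_DELEGATION": 0x80000,
    "UF_NOT_DELEGATED": 0x100000,
    "UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION": 0x1000000,
}

def decode_uac(uac):
    """Return the names of the delegation flags set in a userAccountControl value."""
    return [name for name, bit in DELEGATION_FLAGS.items() if uac & bit]

# A normal enabled user (0x200) with "Account is sensitive" set:
print(decode_uac(0x200 | 0x100000))  # ['UF_NOT_DELEGATED']
```

The same bitwise-AND test is what the 1.2.840.113556.1.4.803 matching rule performs server-side.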

Conclusion

Wow, this ended up being much longer than I expected. I hope that this information is useful and leads to less confusion over the topic of Kerberos delegation.

Addenda

Additional resources:

  • Microsoft’s overview of the new Server 2012 delegation features: https://technet.microsoft.com/en-us/library/jj553400(v=ws.11).aspx
  • A deep dive into the details of Kerberos: https://technet.microsoft.com/en-us/library/4a1daa3e-b45c-44ea-a0b6-fe8910f92f28

The above post updated on 2016/11/16 to clarify several points.

  1. I’ve seen references to doing constrained delegation across a domain trust using versions of Windows Server prior to 2012. However, I’ve not found a definitive explanation of how this would work. At the very least it would require an up-level trust that supports Kerberos and all of the related configuration to enable Kerberos referrals to work properly. The second addendum-linked article, which is for pre-Server 2012, says “Constrained delegation is the only delegation mode supported with protocol transition and only works in the boundary of a domain.”

UW Group Sync Code Now On BitBucket

Yesterday I placed the UW Windows Infrastructure (UWWI) AD group synchronization source code into a publicly accessible repository on BitBucket.org. The UW has made this code available for perusal and reuse under an Apache 2.0 license. You can find it at https://bitbucket.org/uwitiam/group-sync.

The UW Groups Service places all change events on an Amazon SNS topic. The UWWI Group Sync agent reads an Amazon SQS queue that is attached to this topic. I gave a presentation on this event-driven architecture for distributing group changes at last year’s InCommon Identity Week conference. You can find that presentation here.

The UWWI Group Sync Agent updates the UWWI Active Directory based on these group change events. This is an extremely complex task that requires detailed knowledge of AD. For example, making rapid successive changes to an AD object can be problematic if you don’t make all of the reads and writes through the same domain controller. AD replication is eventually consistent, which means that simultaneous reads from multiple DCs may not yield the same results. The solution is DC affinity: always bind to the same DC when making a series of reads and writes.
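The DC-affinity pattern can be sketched in a few lines. This is not the sync agent’s actual code — the class and method names are invented for illustration — but it shows the idea of binding once and reusing that binding:

```python
import random

class DcAffinityBinder:
    """Pick one domain controller and reuse it for every read and write in a
    logical session, so successive operations see a consistent view."""

    def __init__(self, dcs):
        self._dcs = list(dcs)
        self._sticky = None  # chosen DC, fixed after the first bind

    def bind(self):
        if self._sticky is None:
            self._sticky = random.choice(self._dcs)  # first call picks a DC
        return self._sticky  # later calls reuse the same one

binder = DcAffinityBinder(["dc1.example.com", "dc2.example.com"])
assert binder.bind() == binder.bind()  # all operations hit the same DC
```

A real implementation would also handle the sticky DC going offline, typically by re-binding and retrying the whole read-modify-write sequence.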

Another point of complexity is due to the changes that were made to the UWWI AD to achieve FERPA compliance. FERPA is a Federal statute that requires that student data be kept confidential. Active Directory was designed for a corporate environment where ease of access would grease the skids of commerce. IOW, any authenticated user can read a wide set of properties on any other object, including group memberships. Unfortunately this design assumption leads to a violation of FERPA, where the names, classes, and other details of students could be readable by any AD user. Brian Arkills, the UWWI Service Manager, designed a set of changes to AD to remove this default behavior. You can read about these changes here. Thus the Group Sync agent must ascertain the type of group, public or restricted, and set the appropriate ACLs.

UW Groups can be Exchange-enabled. That is, they can act both as security groups that gate access to resources and as email distribution groups. Exchange-enabling a group is a multi-step process. The first step is choosing to Exchange-enable the group in the UW Groups Service. One important part of this step is deciding on the email address for the group; the Groups Service checks that the chosen address is well-formed and unique. The Group Sync Agent then takes this info from the Groups Service change event message and creates or modifies the AD group, adding all of the attributes required for an Exchange-enabled group.

You can see more on the Group Sync Agent here.

Excessive CPU Use on Win8.1 Redux

The changes I made to solve the “Immersive Shell” DCOM errors did clear up the System event log. However, the high CPU usage persisted so I did more digging. I eventually found several references to problems with a Windows Backup scheduled task and its sdclt.exe process. I started Resource Monitor and saw many instances of sdclt.exe. Some were running and many more were recently terminated. There is a scheduled task that is designed to notify the user that Windows Backup has not been configured. For some reason the sdclt.exe process is repeatedly restarted and this ends up using considerable system resources.

The fix is to disable the task. It is located in Task Scheduler under Microsoft -> Windows -> WindowsBackup and is called ConfigNotification. Select it and disable it. Unfortunately a reboot is necessary to actually stop the incessant sdclt.exe restarts.

I have not found an official Microsoft acknowledgement that this is a problem nor have I seen any postulations as to why sdclt.exe is behaving in this fashion. The only common thread is that it occurs on Win8.1. Was this scheduled task introduced in Win8.1 or were there changes made to it with the Win8.1 upgrade? As far as I can tell the high CPU usage started after I installed the 2013-12-13 Windows Updates but I’ve no idea what those updates may have changed.

Regardless, I have a hunch as to what’s going on. I am one of what is certainly a very small number of people who run with User Account Control turned off. A few people turn UAC off because they don’t want to be nagged about running programs with full admin privileges. My reasons are more pragmatic. I have a home (Documents) folder that is redirected to a UNC share. I also run Visual Studio with Administrator privilege because that is the only way to enable debugging. Unfortunately folder redirection does not play nicely with UAC, and this was causing all sorts of weird errors in Visual Studio. Thus I turned UAC off.

There is a major Win8/Win8.1 consequence to turning UAC off: modern apps won’t run. This didn’t seem to me like much of an issue because I couldn’t stand them anyway. The reason they won’t run is that they are configured to run only in a partially trusted application domain, and with UAC off you can only run managed code in full trust mode. I’m guessing that the Windows Backup notification was written in partial-trust managed code. If this is the case, it certainly won’t run with UAC turned off. Apparently running the system with UAC off is not part of the Microsoft test matrix.

This brings up an old beef of mine. Why doesn’t the redirector have better support for UAC? It is a total pain that a redirection made as ordinary (limited privilege) user can’t be accessed by that same user with a full local administrator token. I’m sure there is some use case that I’m being protected against but I can’t figure out what it is since the file system ACLs will still be applied. Yeah, I know I am in the extreme minority of power users who push the system to its limits. That’s the standard argument for not accommodating corner cases.

At any rate, I’m sure glad I got the CPU usage issue sorted out. Boy I can’t wait to see what surprises are in the next round of updates!

 

Excessive CPU use on Windows 8.1

I upgraded my work desktop from Windows 7 to Win8 because of the new Hyper-V features. I am a software developer so it is a real asset to have the ability to run multiple virtual machines simultaneously which was not possible with Win7. I was unhappy with the new Win8 “modern” UI, so I upgraded to Win8.1 as soon as it was available. Things seemed OK for a while until I installed the last round of updates from Microsoft on 2013/12/13. My keyboard stopped working after I installed the “Keyboard and Mouse Control Center” (or whatever it was called). I had to log in remotely to uninstall that thing. Then I noticed my CPU usage creeping up unexpectedly. It finally got so bad I was rebooting every few days. I looked at the running processes and tried stopping a few that looked suspicious, but found no real relief. I decided to look into the Event Log. I saw that the System log was absolutely full of the same event.

Log Name: System
Source: Microsoft-Windows-DistributedCOM
Date: 1/13/2014 9:24:44 AM
Event ID: 10016
Level: Error
User: LOCAL SERVICE
Description:
The machine-default permission settings do not grant Local Activation permission
for the COM Server application with CLSID {C2F03A33-21F5-47FA-B4BB-156362A2F239}
and APPID {316CDED5-E4AE-4B15-9113-7055D84DCC97} to the user NT AUTHORITY\LOCAL
SERVICE SID (S-1-5-19) from address LocalHost (Using LRPC) running in the
application container Unavailable SID (Unavailable). This security permission can
be modified using the Component Services administrative tool.

I found the CLSID in the registry and found that it was assigned to something called “Immersive Shell”. Did some searching and discovered a couple of MS articles. In a nutshell, you need to grant local Administrators ownership and full control of the CLSID key and the AppID key in the registry. Once that is done you can go to the Component Services tool, navigate to the DCOM config for the local computer, and then find the Immersive Shell object. Open its properties and under the Security tab choose to Customize the Launch and Activation Permissions. Click the Edit button, add Local Service and grant it Local Launch. Click OK, close everything and reboot. Voila, the Event Log messages stop.

I gleaned some of the above from this post: Weather Application.

I’m not certain this cured all of the excessive CPU consumption. It is still at around 25% with nothing going on other than typing into this WordPress window. I’ll post a follow up if I discover more Win8.1 CPU-eating culprits.

 

Hosting a Shibboleth SP Web Site in Azure, Conclusion

This is the last in a series of posts about using the Shibboleth Service Provider to implement SAML SSO authentication in an Azure cloud service web site. The first three posts present background information. There are two posts that are specific to using the Shibboleth SP with Azure and then there is this concluding post.

The information I am presenting comes from three sources. The first is the official documentation: Azure documentation from Microsoft and Shibboleth documentation from the Shibboleth Consortium. Each covers its respective area, but there is no overlap between them. My second source is what I learned while working on the Azure team at Microsoft. That was over a year ago and many things have changed in the interim, but it gave me a foundation for understanding how an Azure cloud service works. Finally, I did a lot of experimenting, trying things, seeing what worked and what didn’t. I’ve found nothing on the web about hosting the Shibboleth SP in Azure, so I believe I am blazing a trail of sorts. I hope this information may be of some value to others. Having said that, I must offer some caveats.

The first caveat simply underscores the fact that Microsoft is updating Azure at a frantic pace. My explorations that produced this web application were done in June 2013, and some of the details will certainly change or become obsolete over time. For example, I noted that an Azure web site did not support SSL; I’ve since seen an announcement of an SSL preview for the web site feature.

Unresolved and Unexplained Issues

Mysterious Site ID in IIS

I used the remote desktop to look at IIS Manager on my Azure role instance and saw that the web site ID was 1273337584. I thought “it’s assigning a random number” and expected it to change with subsequent deployments. It didn’t. So I deleted the deployment and redeployed instead of just doing an update. It remained the same number. Then I deleted the entire cloud service and created a new one with a new name. The web site ID remained the same. What can I conclude from this? Nothing really. I don’t know if this is a fixed number, used for all web role instances (remember that each role instance gets its own VM, so there is no chance of site ID collisions), or if there is some algorithm that could be related to my subscription or some other value.

I looked into using appcmd to read the site ID assigned by Azure thinking I could then modify the shibboleth2.xml file on the fly. Then I discovered that the web site hasn’t been created at the time the startup script is running. The only option is to have a method override in my app code for the role start notification. This is a bit of a chicken-and-egg problem because I’d have to restart the Shibboleth service after updating the config file and might also have to restart the web site – from within the web site code. So this issue remains without a good resolution.

Azure Load Balancer Affinity

An SSO exchange involves several browser redirects, initiated first by the SP code and then by the IdP code back to the SP, with an interactive user login in between. All of the Azure documentation stresses that your web applications must be stateless. If you have multiple instances of a web role running for load-handling reasons, you have no control over which instance will receive a request. Will this cause problems during the authentication sequence?

I found one non-Microsoft blog post that said the Azure load balancer would have session affinity for a given source-IP/port pair. My own understanding from when I worked in Azure was that the load balancer would maintain a session for up to one minute of inactivity. I’ve seen no official confirmation of either notion. I’ve not spun up more than one instance yet to test this issue. Considering that Microsoft provides an SSO federation service in Azure’s Access Control Service, which uses the same sort of redirect sequence for authentication (more actually because of the extra federation hops), I’d have to believe that this is not an issue. It would be nice to know for sure though.

Conclusion

Of course this raises the question: why doesn’t Microsoft natively support SAML authentication? That is, why isn’t there a Windows Identity Foundation protocol handler for the SAML profiles and bindings? That would eliminate the need to jump through these hoops. I’ve asked some of my former Microsoft colleagues who are in a position to know and have received no response. I know the official line is not to comment on unreleased products or product plans, so the lack of a response is not surprising.

There is also the option of updating the Shibboleth SP implementation so it can act as a WIF protocol handler. It is open source and community developed. I might be able to contribute. Stay tuned.