Locking down Azure AD, a closer look at the value proposition

Introduction

Microsoft does not yet provide fine-grain access controls for Azure Active Directory. This is in stark contrast with Active Directory Domain Services, which shipped in 2000 with Windows Server 2000, where you can restrict access down to the individual attribute on an object.

Microsoft has been investing heavily in improving the role-based access controls (RBAC) infrastructure underlying Azure Active Directory, but to date, the features provided have primarily focused on reducing the number of people in the global admins role–a role which can do just about anything in your Azure AD tenant.

If you want to limit access to the data in Azure AD, you have only a few options:

  1. Don’t put it in Azure AD. We exercise this option with course groups and other groups whose membership is considered confidential. This is in stark contrast with our Active Directory, where the fine-grain access controls available allow us to include course groups.
  2. Obscure data in Azure AD. With this solution, we change the real data to avoid undesired consequences of that data being generally available.
  3. Only allow a limited number of users & apps to have read permissions to anything. This option puts you in the role of gatekeeper for any scenario which requires read access to objects in your Azure AD, i.e. it introduces a huge amount of friction, which can only lead to less interest and use of Azure AD based technologies. This includes owners of Azure subscriptions, 3rd party applications trying to do low-risk OAuth activities, and IT professionals trying to get their job done. Many folks with perfectly good business intentions who run into the problems this creates will give up.

Discussion

It’s worth noting that there are a couple special purpose solutions that Microsoft has created, e.g. by default guest users can only see their own account and nothing else in your Azure AD tenant. This special purpose solution suggests that Microsoft has the capability to implement fine-grain access control solutions but hasn’t yet prioritized the ability for customers to craft custom solutions which meet their own purposes.

Solution #1 – don’t put data in Azure AD

Solution #1 is not really a solution–it’s a dodge. If you avoid putting data in Azure AD, you aren’t getting the value Azure AD might provide.

Everyone loses with this solution–there is no value–just avoiding the bad outcomes associated with data being exposed.

Solution #2 – obscure data in Azure AD

With this solution, we change the real data to avoid undesired consequences of that data being generally available. The scenarios where this type of solution are most appropriate are not unique to Azure AD–they usually involve obscuring the data at the source system, and informing those who consume that data with ways to decode that presented information. An example might be a person’s name, where a person at risk has a different name presented than their actual name.

This type of solution generally is best when implemented at the source system and the affected parties have consented to the solution–not in or at the boundary of Azure AD. When implemented on a per-system basis like only in Azure AD, you set up a paradigm where one system has false data but others do not. This paradigm will create support overhead and disincentive to use that system.

This solution is only valuable for specific, limited use cases, not for generally solving Azure AD access control limitations.

Solution #3 – Conditional Access generally prevents read access via undocumented option

Solution #3 is predicated on limiting access to the user interfaces that can be used to query Azure AD information. By blocking the UIs, you block access to the data. You probably already saw the problem–what if a new UI shows up? This solution is predicated on rocky ground which assumes Microsoft won’t introduce something new–which in this constantly changing environment is absolutely not a safe bet.

You implement solution #3 by implementing a Conditional Access policy that prevents issuing access tokens (OAuth) to the Azure Management application in your tenant. Microsoft has never documented what the Azure Management application actually represents. So by implementing this policy you are taking an unknown risk that essentially says:

  • I’m able to determine all the impacts myself
  • Microsoft won’t change what this means on me
  • I will identify all the valid business reasons for a user/application to have access beyond themselves and manually manage exceptions to this policy
  • The chilling effect this policy will have is an acceptable cost

To dig a bit deeper on this, based on other customer reports it is believed that Azure Management includes the following:

  • Azure Portal (portal.azure.com)
  • Office Portal (portal.office.com)
  • Azure AD Graph API
  • Microsoft Graph API
  • PowerShell modules which interact with Azure AD, including but not limited to:
    • MSOnline
    • AzureAD

Whether there are additional affected interfaces is unknown.

The most widely used of these interfaces is the Azure Portal. A customer would have to identify all subscriptions in their tenant, and for each subscription, identify the owners or other role holders which might need to access the Azure Portal, and have some acceptance process to allow them to be excluded from the Conditional Access policy which blocked access to Azure Management.

Similar mitigation steps would be needed for the other capabilities included in Azure Management. Of the others, the hardest to address would be 3rd party applications with OAuth permissions. These applications operate on a consent basis–if a user grants consent, they are allowed to read designated information about the user. The Conditional Access policy would block such 3rd party applications from retrieving that data, unless they were excluded. Depending on how this was approached, this has the potential to recreate a gatekeeper situation for Azure AD applications. At the University of Washington, we spent several years in contention over a gatekeeper approach to Azure AD applications, and reverting the progress made is undesirable.

A problem with this approach is that it won’t necessarily entirely lock down access to the data you’ve put in Azure AD. Some of Microsoft’s applications (which they sometimes call 1st party applications) have been granted broad read access to a variety of data in your Azure AD. For example, Exchange Online can read a lot of Azure AD based data. To really be effective, you’d also need to analyze every one of these Microsoft applications for whether it exposed data and then mitigate those issues, if you can. In some cases, you may find that you can’t mitigate the behavior of an application which Microsoft controls other than to stop using that application. If the application were Exchange Online, not many organizations would seriously consider stopping use of Exchange Online.

The initial effort of implementing solution #3, without also analyzing and mitigating all 1st party Microsoft applications, is estimated at 200 hours. Ongoing maintenance is estimated at 2h/week. Additionally analyzing 1st party Microsoft applications is likely to add at least 600 hours to the effort.

Summary

While you could implement a Conditional Access policy, the consequences of doing so are broad and would require significant effort to implement in a well-planned manner. There are chilling effects and other unknown costs. For these reasons, unless the need is very significant, this solution is not recommended.

Bloodhound – AD Attack Resilience Methodology

Last month I was introduced to BloodHound and the Active Directory Adversary Resilience Methodology via a special workshop put on by SpecterOps.

While a lot of the time and technical nit-picky details center on the Cypher query language, the overall technology and approach is so awesome that I found myself not really caring that it took awhile to figure out how to express what I wanted.

Here’s the punch-line: as a defender, with this approach you have a really excellent tool to figure out how attackers might compromise the high-value targets protected by your Active Directory. This includes a visual map of their potential path, and a way to model how possible mitigations might change what paths are left. The tool itself has an excellent command of the possible exposures in an AD environment–which I can almost guarantee will exceed your awareness and ability to track within your environment.

With a tool and approach like this, you can:

  • identify weak points in your environment which need extra attention
  • have a quantitative way to evaluate possible mitigations or changes proposed
  • quantitatively compare your security posture between two points in time

The tool is not perfect, but for something that is an open-source labor of love that has been released in the last year–it’s pretty impressive–especially when you note the scientific methodology behind the tool.

Here’s how it generally works:

  1. You setup a neo4j database (and web interface) – https://www.youtube.com/watch?v=o22EMEUbrNk (walkthrough)
  2. You setup bloodhound to use that neo4j db – https://github.com/BloodHoundAD/BloodHound/releases/tag/2.0.3 (check for newer versions)
  3. You run bloodhound’s data collector in your environment to populate bloodhound’s db (look for smarthound in the above repo)
  4. You use a combination of the bloodhound UI and the neo4j web interface to explore your environment and the possible attack paths

Neo4j is a graph database, with nodes and edges (relationships between nodes). This allows the modeling needed to happen in an efficient way. Bloodhound defines a great set of AD related nodes and edges in its schema, and the data collector goes about discovering that data in your environment.

Once you’ve got a database with data from your environment, you can use the bloodhound UI or the neo4j web interface (using the cypher query language) to identify attack paths to high-value targets in your environment. A very obvious path you might want to find are ways to go from domain users to domain admins. Finding all such paths in a single query isn’t really practical–instead you might find all paths which are the shortest in terms of the number of hops from node to node. For example, a domain user might be able to log into a computer where a domain admin has a session (i.e. has logged in)–that’s a short path to escalate to domain admin.

You have the power to manually manipulate the nodes and edges in the database. You can add nodes or edges. You can remove nodes or edges. You might do this to simulate what the environment would look like if you applied that mitigation in your real environment.

By iteratively applying manual manipulations and re-running queries for the shortest attack paths you can identify as many weak points in your environment as you care to find. This gives you a laboratory-like environment where you can explore and test a hypothesis using a scientific methodology.

If you layer an analysis tool like PowerBI on top of this, you can put together a dashboard which gives you an objective sense of the overall security stance of your environment in a potential configuration. You’d applying manual manipulations to your bloodhound database and check the PowerBI dashboard to see how much improvement resulted. Likewise, you might use this kind of approach to provide an independent analysis of the security risk profile to any proposed change in your environment before it actually is implemented.

There are some approaches you’ll need to figure out if you use bloodhound as a risk evaluation tool.

Database management

If you are making manual changes, you are likely to want some way to copy and roll-back your database to the a given state. Alternatively, you can save your sharphound data collection and re-import to a fresh database. Do you need multiple copies of the database to support more than one purpose?

Data collection

How often do you do a fresh run of sharphound? And import it?
Which account runs sharphound? From which host(s) is sharphound run?
Do you notify others in your environment that they may see connections from the sharphound host(s)?

Basic reading

https://posts.specterops.io/introducing-the-adversary-resilience-methodology-part-one-e38e06ffd604
https://posts.specterops.io/introducing-the-adversary-resilience-methodology-part-two-279a1ed7863d
https://posts.specterops.io/bloodhound-2-0-bc5117c45a99

Intermediate reading

Sharphound rewrite: https://blog.cptjesus.com/posts/newbloodhoundingestor
Low-level technical details on sharphound: https://blog.cptjesus.com/posts/sharphoundtechnical
Intro to Cypher: https://blog.cptjesus.com/posts/introtocypher
Cypher cheat sheet: https://neo4j.com/docs/cypher-refcard/current/

Engaging

Slack channel for Bloodhound: https://bloodhoundgang.herokuapp.com/

Following

Andy Robbins Twitter: @_wald0
Rohan Vazarkar Twitter: @CptJesus
@SpecterOps

Azure AD tokens and Windows token binding

This blog post is an attempt to capture and share a variety of information that is not well-documented by Microsoft, spanning the two topics in the subject line.

I feel these topics are pretty critical to understanding the fundamentals of modern Azure AD and Windows security, and invaluable for troubleshooting. I do not consider myself an expert on these topics, and certainly not on the protocols via which one might get a token. But I’ve run into these topics enough to recognize there is a big gap in understanding and documentation, so this is my attempt to fill that gap.

First, here’s a table I created for a presentation in April 2018 for the Microsoft Technology community of practice at the UW.

1 Revocation is a complex topic; don’t rely on this too much w/o a deeper understanding.

These are the different types of Azure AD tokens that Microsoft has described in a variety of sources, along with a brief description of what they are good for, and the restrictions associated with each. Some references for this table are:

  • https://docs.microsoft.com/en-us/azure/active-directory/active-directory-configurable-token-lifetimes
  • https://jairocadena.com/2016/11/08/how-sso-works-in-windows-10-devices/
  • https://blogs.technet.microsoft.com/educloud/2017/06/14/how-to-kill-an-active-user-session-in-office-365/

Some key things to understand:

  1. Most users are now familiar with the “Keep Me Signed In” (KMSI) setting. The KMSI dialog only governs the browser cookie, in other words, when you choose no, that means the cookie is browser session bound. And when you choose yes, that cookie persists across browser sessions. I’m saying browser cookie, but as you see in the table that is synonymous with SSO token. And yes, Microsoft sprinkles use of both when talking about its tokens. So KMSI only governs the SSO token or browser cookie.
  2. Many of the AAD tokens have long lifetimes. You can read more about that at the first URL reference above.
  3. Browsers are not the only software managing your Azure AD tokens, e.g.
    1. if on iOS, the app you are using might manage the token, unless you’ve installed MS Authenticator, in which case, it manages AAD tokens
    2. if on Windows, it depends on the OS & Office version. And yes, this is one of the places where Microsoft has down a really poor job documenting or explaining. I might be able to share a few more details here, but it is complicated and I have incomplete details, so I fear that won’t be useful.
    3. if your AAD tokens are federated, then you’ve got upstream tokens. For example, at the UW, we federate our AAD to ADFS, which in turn is federated to Shibboleth. So there are multiple upstream tokens, and each of those tokens may be managed by something other than the browser, and may affect my AAD tokens.
  4. Getting rid of a cached AAD token is a problem. You need to know the specific client details & the recipe for that specific scenario. This is deeply entwined with #3.
    1. At the UW, in early 2018, we moved from ADFS 2 to ADFS 4. During that, we had a broad “Outlook” incident, where there were quite a few instances of corrupt cached AAD tokens that had to be manually deleted to enable users to use Outlook to connect to Exchange Online. In most cases, getting rid of those cached tokens was accomplished via the Windows credential manager.
  5. Apps have to actually enforce token lifetime. And many do not. So don’t necessarily trust the lifetimes to be solid.
  6. How token binding works is complicated. I don’t know how it works on non-Windows platforms. But this Twitter thread has some good details on the Windows platform token binding. Some key takeaways from that thread:
    1. The obvious: with token binding, you can’t do lateral movement–the token is only good on the computer it is on.
    2. You can further protect the token with Windows 10’s Key Guard, a hypervisor key isolation service
    3. Edge, IE, and the HTTP stack on Windows 10 all support token binding
    4. There are downsides to token binding: No 0-RTT, you can’t share tokens :), and proxies might break/strip your access.
    5. In the near-future, you can add FIDO as an additional layer of protection, which gives you a portable hardware token you can bind your AAD token to, in addition to the client computer binding.
  7. AAD token revocation is complicated.
    1. The ability to revoke is limited to specific AAD roles and you must use one of two PowerShell cmdlets to do it.
    2. Disabling the AAD user is usually a good idea in combination with revocation. If you don’t want the user to remain disabled, then have some process where the user must call into your service desk to get out of that state. You’ll need to design your approach based on the common use cases behind why you need to revoke.
    3. Revocation will be ineffective in some scenarios–in particular when a PRT is in play–and a PRT can only be in play if you have Azure AD domain joined devices. To make a PRT unusable, you have to disable or delete the AAD device. No amount of revocations will affect it. Of course, you could just disable the AAD account and then that PRT can’t be used to get new tokens.
    4. Conditional access is another way to prevent an issued token from being useful. It might be used in situations where you want to block access to some/most apps, but still need to allow access to a few.

User on Mac can’t login to AD, but can login from Windows

For about a month, we’ve been struggling with a pesky problem which boils down to the above description. As the above description suggests, the password is known to the user and the account is functional.

Other details:

  • Many others could login from all the macs tried—just the one user was having problems
  • We ensured that different groups setup the macs to rule out some odd configuration decision
  • We tried a couple semi-obvious things, like changing the password and ensuring the length and use of special characters shouldn’t cause problems on a Mac
  • We also looked closely at the AD account itself to see if there was anything broken or odd about it
  • We also tried inspecting the Mac-side logs, but that didn’t lead anywhere—but admittedly, our Mac expertise may not have been deep enough.

None of these produced much of anything and we were starting to create very far-fetched theories, and approaching what we thought was an option of last resort—deleting and recreating the AD user (which we now know probably wouldn’t have worked).

Finally, via a fortunate circumstance, one of our team members discovered there was another account with a displayName value which matched the user’s samAccountName value. We manually changed the displayName value on that other account, and logons for the person from Macs now worked!

The connection between a displayName and what the Mac client calls the username may seem arbitrary, but there is a fairly straightforward explanation:

  • Apple’s Mac AD client takes the username field value supplied and does an AD query using what is called ‘ambiguous name resolution’ (or ANR). This is a query against a set of pre-defined set of naming attributes. It’s what Outlook uses when you do an Address Book query. The ANR set includes displayName. You can find the other AD attributes it includes via an internet search of ‘ambiguous name resolution’. That’s a little funny design decision, but given the broad set of circumstances Apple might find in various Active Directories, it’s pretty understandable.

We believe the following to be true but there is a little speculation on our part:

  • Apple’s Mac AD client then takes all results from that query and silently chooses one and tries to authenticate using that one. If it fails, it doesn’t try any others and raises an error. Assuming our speculation is correct, this is a really bad design decision on Apple’s part.

The takeaway is there is a known problem where if your samAccountName matches the displayName or sn of another UW NetID, you may have problems logging in from a Mac.

I have never heard anyone else share anything remotely like this, nor have I found anything on the internet which suggests someone else has previously traced this problem to this cause. So I thought I’d share.