ILM fixes and displayName explained in detail

Over the past 4 months, I’ve been working on improving our ILM implementation. Earlier, I wrote a post titled “A technical intro to ILM.” In that post, I talked about some of the “gotchas” in our existing implementation.

I’m now back to report that most of those have been fixed. 🙂

I spoke of the high number of disconnectors, which resulted in a 3-hour cycle in our environment. That problem has been solved. The solution was to switch which management agent (MA) projects objects to the metaverse (MV). Instead of the NETID Active Directory MA projecting, the PDS custom MA now projects to the MV. This means that instead of re-evaluating about half a million disconnectors (from the PDS MA) every time it runs, ILM only re-evaluates a small handful (from the NETID MA). The road to this fixed state was a bit bumpy on the back end, and I learned quite a bit along the way.
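To make the disconnector math concrete, here's a toy model in Python. It rests on simplifying assumptions: every object in the projecting MA projects, and objects join purely on a shared key. Real ILM join and projection rules are more involved, and the object names below are made up.

```python
# Toy model (hypothetical data): which objects end up as disconnectors
# depending on which management agent (MA) is allowed to project.

def disconnectors(projecting: set[str], joining: set[str]) -> set[str]:
    """Objects in the non-projecting MA that find no MV object to join.

    Every object in the projecting MA becomes an MV object, so an object in
    the other MA is only left over if its key is absent from the projecting MA.
    """
    return joining - projecting

# Hypothetical keys: PDS holds far more entries than have a NETID account.
pds = {f"person{i}" for i in range(1000)}
netid = {f"person{i}" for i in range(100)} | {"svc-account"}

# Old design: NETID projects, so PDS entries without a NETID account are left over.
print(len(disconnectors(projecting=netid, joining=pds)))  # 900
# New design: PDS projects, so only NETID objects without a PDS entry are left over.
print(len(disconnectors(projecting=pds, joining=netid)))  # 1
```

Scale the 1,000 up to roughly half a million PDS entries and you get the old behavior we were fighting.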

For example, once a given MV object has been projected, removing the projection rule that caused its projection does not remove the MV object’s relationship with the original MA object. This can cause lots of awful behavior, especially when it happens in large numbers. One fix is to delete the MA’s connector space, reimport everything for that MA, and rerun a sync. Deleting the underlying connector space object removes the relationship. But it also incurs an unexpected penalty, because the MV then needs to re-evaluate everything. And that particular MV object may get deleted as a result, then get re-projected by the other MA.

But the results are rather dramatic: our entire sync cycle now takes about 10 seconds to run. I’ve said elsewhere that it’s about a 1200% improvement, because we went from a 3-hour scheduled cycle to a 15-minute scheduled cycle (12 times as often). But in reality it’s more like a 600-fold improvement, because the actual run time went from 100 minutes to 10 seconds. Regardless, it’s very good.
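For anyone checking my arithmetic, here are both comparisons expressed as simple ratios, using the numbers above.

```python
# Speedup factors for the numbers quoted above.
old_schedule_min, new_schedule_min = 180, 15   # 3-hour vs 15-minute schedule
old_run_s, new_run_s = 100 * 60, 10            # 100 minutes vs ~10 seconds

schedule_factor = old_schedule_min / new_schedule_min
runtime_factor = old_run_s / new_run_s

print(f"schedule: {schedule_factor:.0f}x")  # schedule: 12x
print(f"run time: {runtime_factor:.0f}x")   # run time: 600x
```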

Another gotcha I’ve fixed: we used to pull in only objects of the uwPerson objectclass from PDS. There was really no reason to limit which PDS classes contributed info, so we did away with that limitation. Along the way, we also added uid synchronization to our ILM feed to UWWI. Combined with the non-uwPerson objectclass fix, this means that every account in UWWI that has been provisioned with a UW uid will have that value in its uidNumber attribute.

However, I do need to report that one of the more visible gotchas is still outstanding. The name situation remains as I reported previously. A couple of new problems have been reported, and awareness of this issue seems to be spreading, but it hasn’t yet reached the critical mass needed to prioritize it over existing projects. I’m hopeful, though, that this will happen within the next 6 months.

My previous coverage of that problem was mostly just an overview, so it’s probably worth taking the time now to cover the problem in greater detail.

Here’s the skinny:

Upon provisioning, our account creation agent, fuzzy_kiwi, applies complicated name-parsing logic similar to what I’ll describe below. For UW NetIDs where it doesn’t find a PDS entry, or where it isn’t allowed to publish the name, it stamps the UW NetID value on the name attributes. And for some UW NetIDs, that initial value is where things end.

As you know, ILM connects NETID users with PDS objects, and keeps the name attributes in synchronization according to some complicated name parsing logic.

The recent fix to include non-uwPerson objects from PDS doesn’t really change the name story at all, because the name attributes in PDS that ILM uses as seed information are not present on non-uwPerson objects.

This is an important point to understand. non-uwPerson objects don’t have any name info that ILM wants, so ILM does nothing with them.

In contrast, uwPerson objects do have the naming attributes ILM cares about. However, only two classes of users can modify those naming attributes: UW employees and students. Employees can use ESS to modify that name info, while students must contact the Registrar (yes, by phone call or an in-person visit; there’s nothing online). Generalized into an understandable form, the logic for these accounts depends on two pieces of info. First, a flag that indicates whether your info is publishable. Second, the name you’ve given ESS or the Registrar. If you’ve agreed to publish, then parsing happens, and your displayName comes out as “Brian D. Arkills” or “Brian Arkills”, depending on how many substrings there are in what you gave ESS/Registrar. If you didn’t agree to publish, then parsing happens and your displayName comes out in a format like “B. Arkills”. In some odd cases, your displayName can come out as just “Arkills” or “Brian”. There isn’t much flexibility here.
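To make those rules a bit more concrete, here’s a rough Python approximation. This is not fuzzy_kiwi’s or ILM’s actual code. The function name, the exact substring-count rule, and the NetID fallback are my assumptions based on the description above.

```python
from typing import Optional

def display_name(uwnetid: str, name: Optional[str], publishable: bool) -> str:
    """Hypothetical approximation of the displayName rules described above.

    `name` is the free-form name given to ESS or the Registrar, or None when
    no usable PDS name exists.
    """
    if not name:
        # No PDS name to work with: fall back to the UW NetID, like the
        # value stamped at provisioning time.
        return uwnetid
    parts = name.split()
    if len(parts) == 1:
        # One of the "odd cases": only a single substring survives.
        return parts[0]
    first, last = parts[0], parts[-1]
    if not publishable:
        return f"{first[0]}. {last}"            # e.g. "B. Arkills"
    if len(parts) == 3:
        return f"{first} {parts[1][0]}. {last}"  # e.g. "Brian D. Arkills"
    return f"{first} {last}"                    # e.g. "Brian Arkills"

print(display_name("barkills", "Brian David Arkills", True))   # Brian D. Arkills
print(display_name("barkills", "Brian Arkills", True))         # Brian Arkills
print(display_name("barkills", "Brian David Arkills", False))  # B. Arkills
print(display_name("barkills", None, False))                   # barkills
```

The NetID “barkills” is just an illustrative value.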

People who aren’t UW employees or students have no ability to create the editable naming attributes ILM cares about. In other words, affiliates and sponsored UW NetIDs, shared NetIDs, temporary NetIDs, etc. have no way to change their UWWI displayName. So what displayName do they end up with? All of these accounts end up with a displayName in the “B. Arkills” format. There is no flexibility here at all; there is no back-end setting that someone can edit to fix this for a given user. The entire system needs to be overhauled to fix this.

Yes, this is an awful state of affairs, and yes, I agree this should get fixed sooner rather than later. And if you agree, you should talk with your UW Exchange representative and ask them about raising the priority of this feature.

So, how or why did things get designed this way?

Well … it turns out that back when we rolled out Exchange, the MSCA initiative was limited to *employees*. No students, no non-employees. And all the engineering inquiries into this design constraint came back saying that it was a firm limit. But then, rather quickly, that limitation fell by the wayside, and engineering wasn’t given the time to revisit and refactor. So the combination of poor design constraints and a scope change after launch, without revisiting the solution, was a pretty big contributing factor.

Another factor was that our primary engineer was convinced there was significant value in having consistent displayName formatting within the Exchange GAL. So he wanted only “Brian D. Arkills” or “B. Arkills”. This is why almost every displayName ends up in one of those two formats.

Another factor was that the name source information situation isn’t pretty today. There’s the official name info, which is case-insensitive. Folks with mid-name capitalization lose out there, and it’s rather hard to edit this piece of info. Then there’s the editable name info coming from HEPPS (the HR source system) and SDB (the Registrar source system). But the info coming from those sources has no input validation and is not guaranteed to follow any format. So while the user has lots of control, it’s a nightmare to figure out whether a name will come in as “Brian Arkills”, “Arkills, Brian”, “Brian David Arkills”, “Brian David Joe-Bob Arkills”, “Mr. Brian Arkills”, “Mr. Brian Arkills Sr.”, etc. Obviously the number of permutations is endless, and it’s impossible to predict what format the data will be in.
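Here’s a quick illustration of why free-form input is such a headache. A naive first-token/last-token split handles only some of the formats listed above. This hypothetical parser is not what ILM does.

```python
def naive_split(raw: str) -> tuple[str, str]:
    """Hypothetical naive parser: first whitespace token is the first name,
    last token is the last name."""
    parts = raw.split()
    return parts[0], parts[-1]

samples = [
    "Brian Arkills",
    "Arkills, Brian",
    "Brian David Arkills",
    "Brian David Joe-Bob Arkills",
    "Mr. Brian Arkills",
    "Mr. Brian Arkills Sr.",
]

for raw in samples:
    first, last = naive_split(raw)
    print(f"{raw!r:32} -> first={first!r}, last={last!r}")
```

Only three of the six samples come out as first=Brian, last=Arkills. The comma form swaps the names, and titles and suffixes get treated as first or last names.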

And that’s it.

But in a very real sense, there is a bigger-picture problem here: there are many source systems, and each of them does things differently. Getting all those source systems to implement naming information, input validation, publishing flags, etc. is an uphill battle. And when someone is in more than one of those source systems, you have to choose which source wins.

The solution we imagine bringing to this situation is a UW NetID-level solution: you assert your name from the UW NetID Manage page. This eliminates the problems that come from multiple source systems, as well as the person vs. non-person issues. If you have a UW NetID, you’d be able to set the name. Period. It also allows us to implement input validation in a single place and restrict the formatting to something reasonable and predictable. Obviously, UWWI would be one of the first consumers of such a mechanism, and hopefully other source systems will begin to see the value in having a single name across all systems and leverage it as well. We imagine a state of affairs where people incrementally populate this new bit of name information, and UWWI continues to use the existing logic unless this new info is present.
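Here’s a sketch of how a NetID-level name could slot in. Everything here is hypothetical, since no such attribute exists yet. The validation rules (letters plus space, period, apostrophe and hyphen, and a 64-character limit) are placeholders for whatever “reasonable and predictable” ends up meaning.

```python
from typing import Optional

MAX_LEN = 64                      # hypothetical length limit
_ALLOWED_PUNCT = set(" .'-")      # hypothetical allowed non-letter characters

def validate_asserted_name(raw: str) -> str:
    """Validate and normalize a self-asserted name (hypothetical rules)."""
    name = " ".join(raw.split())  # trim and collapse runs of whitespace
    if not 1 <= len(name) <= MAX_LEN:
        raise ValueError("name length out of range")
    if not all(ch.isalpha() or ch in _ALLOWED_PUNCT for ch in name):
        raise ValueError("name contains disallowed characters")
    if not any(ch.isalpha() for ch in name):
        raise ValueError("name must contain letters")
    return name

def choose_display_name(asserted: Optional[str], legacy: str) -> str:
    """Prefer the NetID-level asserted name; otherwise keep the existing result."""
    if asserted:
        return validate_asserted_name(asserted)
    return legacy

print(choose_display_name(None, "B. Arkills"))                  # B. Arkills
print(choose_display_name("  Brian   D. Arkills ", "ignored"))  # Brian D. Arkills
```

Note that “Arkills, Brian” would be rejected outright because of the comma. That is exactly the kind of predictability a single validation point buys.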

So in summary, things in the UWWI directory synchronization space are much better, but we’ve still got this name blot on our balance sheet. And hopefully we’ll get fixing that prioritized soon.

DC changes

A minor update …

As mentioned in the announcement about moving the DCs to p172, we’re in the midst of rebuilding and adding additional domain controllers to UWWI.

Last week, we added yoda as a new, additional DC.

Today, we demoted lando in preparation for rebuilding it with WS2008 and re-promoting it afterward.

We’ll also be adding obiwan as a new, additional DC soon.

And later we’ll be demoting chewie and luke, rebuilding them with WS2008, and re-promoting.

None of this activity will result in an announcement or outage notice, as none of it should be user-visible.

However, when we are done with all this activity, we will be making an announcement prior to moving the domain and forest to WS2008 functional level, as this enables new functionality that y’all might care about.

Ldp.exe, my favorite Windows tool

I should start by saying that I really like to work with Active Directory. And I freely admit that I’m somewhat rare, being more familiar with LDAP than your usual geek. But regardless, I think more people should be using ldp.exe.

If you’ve worked with Active Directory for very long, you know that the usual mmc snap-in tools leave a lot to be desired. The biggest problem with them is that they regularly hide information from you in the interest of “helping” you. And sometimes, in the interest of making things more foolproof, they arbitrarily limit what you can do. In general, I hate most of the AD mmc snap-ins. I do use them occasionally, especially for ACL work, because the alternatives for ACL work are very, very ugly. So in my opinion, they are good for a few things, but in general, I use ldp.exe instead.

Ldp.exe takes a bit of getting used to, and it’s not for the casual admin. If you only occasionally need to administer AD, then ldp.exe might help you out of a rough patch, but it likely won’t be something you’d use day to day.

Ldp.exe takes a more LDAP centric approach to AD. You connect, you bind, you execute other LDAP operations. You have access to specify LDAP controls that modify what the basic LDAP operations do. You have the ability to specify which attributes are returned, and the ability to directly set a filter so you can view objects which are in many different containers at the same time (unlike ADUC).

One of my favorite things about ldp.exe is that it enables me to see what is happening beneath the surface. And if I can see what is happening beneath the surface, then I am better able to understand what mechanisms are involved in any given technology, and better able to troubleshoot problems. It removes the blinders that the other mmc snap-ins throw on.

Now … that removal exposes a lot of info, some of which is not especially useful. But you’d be surprised at how much of the info that is typically hidden by, say, ADUC is very useful. For example, pwdLastSet. I find it very useful to know when someone last set their password, especially if they are claiming that they just set their password and it doesn’t work anymore. Does ADUC tell me this? No. And badPasswordTime tells me when the last unsuccessful password attempt happened, which in the above scenario might help me determine that the user is mistyping their username or the domain. Again, you won’t see this info in ADUC.
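pwdLastSet and badPasswordTime are stored as large integers: the number of 100-nanosecond intervals since January 1, 1601 (UTC), which is a Windows FILETIME. If you run into the raw number, for example in your own code, here’s a small Python sketch for converting it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# AD stores pwdLastSet, badPasswordTime, lastLogon, etc. as the number of
# 100-nanosecond intervals since 1601-01-01 00:00 UTC (a Windows FILETIME).
_EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(value: int) -> Optional[datetime]:
    """Convert a FILETIME-style AD attribute value to an aware UTC datetime."""
    if value == 0:
        # 0 means "never"; for pwdLastSet it means "must change at next logon".
        return None
    return _EPOCH_1601 + timedelta(microseconds=value // 10)

print(filetime_to_datetime(116444736000000000))  # 1970-01-01 00:00:00+00:00
```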

As you become more aware of what is under the surface, you’ll begin to find that there are ways to accomplish tasks that the mmc snap-ins won’t allow. For example, if you want to configure an account for Kerberos delegation, specifying that it is permitted to delegate to a service on a computer that is outside your forest, you are left high & dry by ADUC. But by paying attention to what is under the surface, you see that the msDS-AllowedToDelegateTo attribute is where the trusted delegation information is stored. And so you can directly modify that attribute, adding the values needed.
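If you’d rather script that change than use ldp.exe’s Modify dialog, one option is an LDIF file imported with ldifde (`ldifde -i -f <file>`). The sketch below only builds the LDIF text. The account DN and the SPN are hypothetical, and it doesn’t handle values that LDIF would require to be base64-encoded (for example, values with non-ASCII characters or a leading space or colon).

```python
def delegation_ldif(account_dn: str, spns: list[str]) -> str:
    """Build an LDIF modify record adding SPNs to msDS-AllowedToDelegateTo.

    Assumes plain ASCII values; LDIF base64 encoding is not handled.
    """
    lines = [
        f"dn: {account_dn}",
        "changetype: modify",
        "add: msDS-AllowedToDelegateTo",
    ]
    lines += [f"msDS-AllowedToDelegateTo: {spn}" for spn in spns]
    lines.append("-")  # ends this modification
    return "\n".join(lines) + "\n"

print(delegation_ldif(
    "CN=websvc,OU=Services,DC=example,DC=edu",  # hypothetical service account
    ["HTTP/app.partner.example.org"],           # hypothetical SPN in another forest
))
```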

But one of the most beautiful things about ldp.exe is the ability to find all the objects that meet some specific criteria. Say I want to find all the objects that have a uid set. Can I do that with ADUC? No, because uidNumber was not included in the advanced find functionality. But with ldp.exe I simply set a filter of (uidNumber=*), maybe specify that I only want the DN attribute (so I’m not deluged by too much info), and I see the list of all the objects with a uid. ADUC so rarely has what I want in its search options that I don’t use its search functionality at all.
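If you end up building filters like (uidNumber=*) in code, remember that literal values need escaping per RFC 4515: `*`, `(`, `)`, `\` and NUL become `\2a`, `\28`, `\29`, `\5c` and `\00`. Here’s a minimal Python sketch. The helper names are my own, not from any library.

```python
def escape_filter_value(value: str) -> str:
    """Escape an assertion value per RFC 4515 (string filter representation)."""
    out = []
    for ch in value:
        if ch in "*()\\\0":
            out.append(f"\\{ord(ch):02x}")  # e.g. "(" -> "\28"
        else:
            out.append(ch)
    return "".join(out)

def presence(attr: str) -> str:
    return f"({attr}=*)"

def equals(attr: str, value: str) -> str:
    return f"({attr}={escape_filter_value(value)})"

def all_of(*filters: str) -> str:
    return "(&" + "".join(filters) + ")"

print(presence("uidNumber"))
# (uidNumber=*)
print(all_of(presence("uidNumber"), equals("objectClass", "user")))
# (&(uidNumber=*)(objectClass=user))
print(equals("displayName", "Arkills (test)"))
# (displayName=Arkills \28test\29)
```

The resulting strings are the same kind of filter you’d type into ldp.exe’s Search dialog.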

Another thing I like about ldp.exe is that it lets me find out the critical bits so I can write code that does something useful. Granted, not everyone writes code, and certainly not many people write code against AD. But if you do, I can’t imagine getting along without ldp.exe.

You can also use ldp.exe to connect to other LDAP directories which aren’t AD. For example, you might want to connect to the UW whitepages directory. Or to the UW email forwarding directory. More related to this below …

I should say a few things about some other tools.

Adsiedit.msc is nearly as useful as ldp.exe. It also gives you increased access to all the info. And it comes with the GUI ACL interface, which can be very useful, especially if you have a security problem in your configuration partition (rare, but it happens). And you can use it to enumerate all the *possible* attributes for a given object, which is a much harder task with any other tool. But it lacks the searching power and configuration abilities that ldp.exe has, so I only call upon it occasionally. But I don’t sneer at it the way I do at ADUC. 🙂

A short time ago, Mark Russinovich released an AD management tool called AD Explorer. It’s interesting in that it allows you to work with multiple domains, even across forests, at the same time. But I find that it has a SQL-based approach, and this tends to limit its functionality. I also find that it is much slower than ldp.exe. It does simplify some things, but ultimately, I gave up on it.

I tried Softerra’s LDAP Browser 2.6 a while back. It also allows you to work with multiple domains or LDAP directories. It does have an LDAP-based approach, but I didn’t really like the way information was returned. My main goal in trying this tool was to see if it supported certificate-based authentication.

Which brings me to a final point. None of the tools I’ve mentioned provide certificate-based authentication. As you might know, both PDS and GDS require cert-based authentication. To my knowledge, there are no free Windows-based GUI tools that provide cert authN support.

As I’ve developed stuff that synchronizes with PDS and GDS, this gap has driven me a bit crazy. For the longest time, I used the .NET code I had developed to access PDS and GDS, running it from Visual Studio for troubleshooting and lookups. Then one day, I realized that I could make my own tool to address this gap. So I did. At this point, it isn’t very fancy, and it certainly is not a GUI-based tool; it is command-line based and runs on Windows. I’d be happy to share this tool (or the code) with anyone who needs something like this. The tool or code does not magically give you access to GDS or PDS, however. You will still need to request access via a certificate and run the tool from a computer that has that cert installed (with access to the private key granted to the user running the tool).

Moving your DCs to p172

Quite some time ago, I wrote a webpage about this process after Scott Barker and the iSchool piloted it. But in the course of time, that page has fallen into disuse mostly because it got lost in various linking shuffles.

That webpage is at: http://www.washington.edu/computing/support/windows/UWdomains/p172.html

And the info in it is still mostly valid.

But to keep things fresh, I thought I’d review here what we did in the recent UWWI p172 change.

  1. Most DNS A records have an 86400-second TTL, i.e. 24 hours. So one day beforehand, lower the TTLs on key A records from the default of 24 hours to something quite low. The key A records are the one for each DC and the one for the Windows domain itself, e.g. luke.netid.washington.edu and netid.washington.edu.

    The SRV records and CNAME records for a Windows domain all point at the A records, so no changes are needed there (yet).

    This is the step I messed up on, and the reason why the work was delayed one day. 🙁

  2. Find p172 addresses for each DC. The p172 equivalent IP may not be available, so you may have to find another open IP on the p172 equivalent network.

    P172 equivalent networks are:
    128.95.x  -> 172.25.x
    128.208.y -> 172.28.y
    140.142.z -> 172.22.z

    Ask the NOC to reserve these IP addresses, and make sure the NOC will be available to make the urgent DNS change you plan on making.

    Again, this is a step I messed up on, assuming the p172 equivalent IP would be available (for lando). 🙁

  3. One day later, RDP to each DC. Add the p172 equivalent address (and the p172 gateway). Disconnect.
  4. RDP to the p172 address you just added (NOT the DNS name). Remove the public address (and the public gateway). The RDP session will “flash” when you click OK on the network settings.
  5. Send request that all the A records be changed to the new p172 address, and that all SRV and CNAME records that reference those A records be moved to the internal only/ private DNS zone file.
  6. Wait for changes. Use dig to verify changes have happened.
  7. Reboot each DC, being careful to have only one DC down at a time.
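Two bits of the checklist above lend themselves to a quick script: mapping a public address to its p172 equivalent (step 2), and working out when the old 24-hour TTL has aged out of caches (step 1). Here’s a Python sketch. It assumes the p172 equivalent keeps the last two octets, and the example address is made up. As step 2 notes, the computed address may already be taken, so you still need the NOC to reserve something.

```python
import ipaddress
from datetime import datetime, timedelta

# Public-to-p172 network equivalents from step 2 (first two octets).
P172_EQUIVALENTS = {
    (128, 95): (172, 25),
    (128, 208): (172, 28),
    (140, 142): (172, 22),
}

def p172_equivalent(public_ip: str) -> str:
    """Return the p172 address with the same last two octets (an assumption)."""
    octets = ipaddress.IPv4Address(public_ip).packed  # 4 bytes -> ints on indexing
    prefix = (octets[0], octets[1])
    if prefix not in P172_EQUIVALENTS:
        raise ValueError(f"{public_ip} is not on a network with a p172 equivalent")
    a, b = P172_EQUIVALENTS[prefix]
    return f"{a}.{b}.{octets[2]}.{octets[3]}"

def earliest_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int = 86400) -> datetime:
    """Resolvers may keep caching the old record for up to the old TTL,
    so don't change the A records before this time."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)

print(p172_equivalent("128.95.12.34"))                 # 172.25.12.34
print(earliest_cutover(datetime(2024, 1, 1, 9, 0)))    # 2024-01-02 09:00:00
```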

Alternatively, you might move one DC to p172 at a time, asking for DNS changes between each move. This would be a lot more complicated though, because there will be changes in the public DNS zone and changes in the private DNS zone, and any given SRV/CNAME record will have different states in those two zone files. In other words, this added complexity is likely to mean more opportunity for mistakes. So I’d advise against it.

You might also swap the order of steps #4 and 5, taking care to RDP into all the DCs via the p172 address first. This might provide a better client experience.

Windows domains with domain-based DFS will want to schedule this work for a time where clients are less likely to be accessing network files.

Of course, if you have trusts with other domains/forests, and they have firewall rules, you’ll want to keep them in the loop.

And that’s it. Enjoy!