Monday, August 31, 2015

There's no time like Windows time...

I haven't been able to post in a while due to my workload.  I thought this was important enough to post in the event anyone else ran into it.  From my research, the Windows time service is something that will bite EVERYONE at one time or another.   For years I've been battling with time sync in the various domains I've managed, and it's always a nightmare. Considering that Bill Gates is the only person I've ever found who has successfully altered the flow of time (Windows file copy dialog status bar sitting at 98% for hours...), this should never have been a problem.

But, I think I just found the answer.

Microsoft has plenty of documentation about W32TM and I cannot recall seeing this noted anywhere. I find it really hard to believe that Microsoft has never hammered all the nails in the coffin of this service.  I've read KB article after KB article, blog after blog, and never found the definitive answer to what the heck is going on with Windows time, or how to make things sync properly in a domain.  This time I feel like I may have finally slain the domain time beast.

Turns out the fix was simple; it only took around 8 years and hundreds of blogs and support sites to finally stumble over the answer.  Why in heaven's name this isn't pasted all over in big red crayon letters I don't know (but there they are, big red letters :-P ).   And to make matters worse I found this bombshell:  https://support.microsoft.com/en-us/kb/939322.  I am quoting from the article at that link:
We do not guarantee and we do not support the accuracy of the W32Time service between nodes on a network. The W32Time service is not a full-featured NTP solution that meets time-sensitive application needs. The W32Time service is primarily designed to do the following: 
  • Make the Kerberos version 5 authentication protocol work. 
  • Provide loose sync time for client computers.
The W32Time service cannot reliably maintain sync time to the range of 1 to 2 seconds. Such tolerances are outside the design specification of the W32Time service.
WHAT???  They don't support it?  No wonder.

Anyway, after wiping the blood from my forehead from banging it against the desk, I returned to my research on answers for the Windows error "A good time server could not be located" (and no, there are no night clubs or bars involved here...).  I found a blog entry over at Pete Long's blog in the UK discussing this exact issue.  The first thing he discusses is running DCDIAG and noting a failure in the DC advertising within AD.  I was seeing the same thing.  He recommended removing any and all GPO settings for the time service on policies for the DCs, so what the heck, I gave it a try.

Turns out when you use a Windows domain controller as a time source you CANNOT (or should not) use any GPOs to apply time settings to it, or it screws things up.  Huh?  I always use GPOs to set the time settings on DCs, I always have, it's the way I was taught to do it. Unfortunately, when you do, it stops the server from properly advertising as a reliable NTP time source in AD.

By design, when any Windows computer joins a domain it adopts the domain PDC as its master time source.  As a fallback it will try various sources including time.windows.com, and then default to the local hardware clock for reference.  We block NTP at our firewall, so no system can use time.windows.com to sync.  And since there didn't appear to be a valid time server on the domain, our systems were using their local hardware clock as a fallback. Ugh.

The culprit in our case was that the PDC server wasn’t advertising itself as a reliable time source in AD.  It was getting valid time, and would return that if queried, but if it doesn’t advertise in AD none of the domain members can use it as a time source.  Once all domain policies for the time service were disabled on the PDC domain controller and the local settings manually reset on it, everything started working.
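
For anyone wanting to check what a DC is set to advertise, here is a minimal Python sketch that decodes the W32Time AnnounceFlags value (the setting that controls how the service announces itself). The bit meanings are the ones Microsoft documents; tying our problem to this particular value is my assumption, not something I confirmed on our servers, and the function names are just for illustration.

```python
# Bit meanings of the W32Time AnnounceFlags value, per Microsoft's documentation.
ANNOUNCE_FLAGS = {
    0x01: "always advertise as a time server",
    0x02: "advertise as a time server automatically",
    0x04: "always advertise as a reliable time server",
    0x08: "advertise as a reliable time server automatically",
}


def describe_announce_flags(value):
    """Return a description for every bit set in an AnnounceFlags value."""
    if value == 0:
        return ["not a time server"]
    return [text for bit, text in sorted(ANNOUNCE_FLAGS.items()) if value & bit]


def advertises_reliably(value):
    """True when the 'always reliable time server' bit (0x04) is set."""
    return bool(value & 0x04)


# 5 is the value commonly set on a PDC; 10 is the usual default elsewhere.
print(describe_announce_flags(5))
print(describe_announce_flags(10))
```

A value of 5 has both "always" bits set, while 10 leaves the decision to the service itself.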

So there it is.  After all that ranting and researching, just removing the GPO settings fixed it.

Testing things is simple.  From an Admin CMD prompt run these commands:

net time  
This should return the time at the domain controller you authenticated from.  Ex:
Current time at \\DC02.mydomain.int is 8/31/2015 2:29:05 PM
If there is no time source advertised in AD you see this:
Could not locate a time-server.
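
If you want to run this check across a bunch of machines, a small Python sketch like the one below can tell the two outcomes apart. It only parses text you have already captured from net time; the function name is hypothetical and the pattern assumes the English-language output shown above.

```python
import re

# Matches the success line shown above, e.g.
# Current time at \\DC02.mydomain.int is 8/31/2015 2:29:05 PM
_CURRENT_TIME = re.compile(r"Current time at \\\\(?P<server>\S+) is (?P<stamp>.+)")


def parse_net_time(output):
    """Return (server, timestamp_text), or None if no time server answered."""
    match = _CURRENT_TIME.search(output)
    if match is None:
        return None
    return match.group("server"), match.group("stamp").strip()


good = r"Current time at \\DC02.mydomain.int is 8/31/2015 2:29:05 PM"
bad = "Could not locate a time-server."

print(parse_net_time(good))
print(parse_net_time(bad))
```

Anything that comes back as None is a machine that could not find a time source advertised in AD.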

w32tm /query /source
This will list the domain controller you are getting time sync from. It should be the PDC.

w32tm /resync /nowait
This will force your local NTP client to immediately sync from the domain time source.

w32tm /resync /rediscover
This forces the local NTP client to query for a valid domain source and then resync.

w32tm /query /status
This reports the current status of your local NTP client.  Note the items in red.
Leap Indicator: 0(no warning)
Stratum: 2 (secondary reference - syncd by (S)NTP)
Precision: -6 (15.625ms per tick)
Root Delay: 0.0312500s
Root Dispersion: 16.0100000s
ReferenceId: 0xC0A85514 (source IP:  192.168.0.20)
Last Successful Sync Time: 8/31/2015 2:32:45 PM
Source: DC01.mydomain.int
Poll Interval: 10 (1024s)
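
A couple of those numbers are powers of two, which is easy to miss. The Python sketch below parses the status text above into fields and works out the poll interval (2^10 seconds) and the clock precision (2^-6 seconds). The helper names are hypothetical, and the two source strings in LOCAL_CLOCK_SOURCES are the ones I understand w32tm to report when it has fallen back to the local clock, so treat that list as an assumption.

```python
import re

# Sources that mean the machine is NOT syncing from the domain (assumed strings).
LOCAL_CLOCK_SOURCES = {"Local CMOS Clock", "Free-running System Clock"}


def parse_status(output):
    """Split each 'Key: value' line of w32tm /query /status into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_int(text):
    """Return the integer at the start of a field such as '10 (1024s)'."""
    return int(re.match(r"-?\d+", text).group())


SAMPLE = """\
Leap Indicator: 0(no warning)
Stratum: 2 (secondary reference - syncd by (S)NTP)
Precision: -6 (15.625ms per tick)
Root Delay: 0.0312500s
Root Dispersion: 16.0100000s
ReferenceId: 0xC0A85514 (source IP:  192.168.0.20)
Last Successful Sync Time: 8/31/2015 2:32:45 PM
Source: DC01.mydomain.int
Poll Interval: 10 (1024s)
"""

status = parse_status(SAMPLE)
poll_seconds = 2 ** leading_int(status["Poll Interval"])   # 2**10 seconds
tick_ms = 2.0 ** leading_int(status["Precision"]) * 1000   # 2**-6 s, in ms
on_local_clock = status["Source"] in LOCAL_CLOCK_SOURCES

print(status["Source"], poll_seconds, tick_ms, on_local_clock)
# prints: DC01.mydomain.int 1024 15.625 False
```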

w32tm /query /configuration

This reports your current configuration.  (long output not listed here)
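
Since the whole problem turned out to be GPO-applied settings, the useful part of that long output is the origin tag at the end of each setting: (Local) or (Policy). Here is a minimal Python sketch that picks out the policy-applied ones from captured output. The sample text is an illustrative excerpt I made up to show the shape of the output, not a capture from our servers, and the function name is hypothetical.

```python
import re

# Each setting line ends with its origin in parentheses: (Local) or (Policy).
_SETTING = re.compile(
    r"^(?P<name>[^:]+):\s*(?P<value>.*?)\s*\((?P<origin>Local|Policy)\)\s*$"
)


def policy_settings(output):
    """Return (name, value) pairs for every setting whose origin is (Policy)."""
    found = []
    for line in output.splitlines():
        match = _SETTING.match(line.strip())
        if match and match.group("origin") == "Policy":
            found.append((match.group("name").strip(), match.group("value")))
    return found


# Illustrative excerpt only (assumed values).
SAMPLE = """\
[Configuration]
AnnounceFlags: 10 (Policy)
MaxPollInterval: 10 (Local)

[TimeProviders]
NtpClient (Local)
Type: NT5DS (Policy)
NtpServer: time.windows.com,0x9 (Local)
"""

print(policy_settings(SAMPLE))
# prints: [('AnnounceFlags', '10'), ('Type', 'NT5DS')]
```

On a DC that is supposed to be free of time-service GPOs, the list should come back empty.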

Below is a short list of the many references I've used to isolate this issue:
NTP time source lists:
If you are looking for a freeware local time sync daemon I recommend the one from here: https://www.meinbergglobal.com/english/sw/ntp.htm.  It is based on the open source client from ntp.org and is a very professional package.  It installs as a configurable Windows service.

I hope all this helps someone else.  I spent plenty of time locating it; perhaps it will save you some.