I have a very unusual problem with all our SLES 11 SP3 servers (and one SP2 server, with identical symptoms). We host some 40 XEN virtual servers on bare metal blade servers running SLES 11 SP3. Twelve of the virtual servers run OES11 SP1.
All servers use ntpd to synchronize time. The bare metal servers and all the OES11 virtual servers have perfect time sync. The OES time is tightly monitored, and no issues have been detected in long-term logs.
A typical config file is as follows:
# path for drift file
# Authentication stuff
keys /etc/ntp.keys # path for keys file
trustedkey 1 # define trusted keys
requestkey 1 # key (7) for accessing server variables
A couple of days ago, we noticed that all SLES 11 SP3 servers (XEN virtual only) were about 5 minutes behind. You could reset the time using 'rcntpd ntptimeset' and the time would be corrected, but within a minute it would revert to 5 minutes behind.
After hours of searching, I finally decided to try a patch update. (The servers are only ever a couple of months behind in the patch cycle.) With all SLES 11 SP3 servers (excluding the bare metal) updated to the latest patches and restarted, the same symptom persists, except that the servers are now all 1 minute 28 seconds fast. I can run 'rcntpd ntptimeset' and the time will be correct. Within a minute, the time suddenly jumps ahead by nearly a minute and a half.
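To pin down exactly when the clock jumps after a reset, one option is to log the offset against a time server once per second. The following is only a rough sketch, not something from my setup: `log_offsets` is a hypothetical helper name, `QUERY_CMD` is an override I made up for illustration, and it assumes `ntpdate` is installed on the guest.

```shell
#!/bin/sh
# Sketch: print one timestamped offset sample per interval so the jump shows up in the log.
# log_offsets and QUERY_CMD are hypothetical names, not part of any ntp package.
# Arguments: <time server> [number of samples, default 90] [seconds between samples, default 1]
log_offsets() {
    server="$1"
    samples="${2:-90}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        # ntpdate -q only queries and never sets the clock;
        # -u uses an unprivileged port so a running ntpd does not block the query.
        result="$(${QUERY_CMD:-ntpdate -u -q} "$server" | tail -n 1)"
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$result"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running 'rcntpd ntptimeset' and then calling `log_offsets` with one of the configured time servers should show the offset staying near zero and then changing abruptly, which would give the time of the jump to within about a second.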
This is new on a production system that is otherwise running very well. I don't believe it is a configuration issue or a time server issue, as all configs are similar or identical and the time servers are the same for all servers, both those in sync and those not. I suspect the patch cycle before the current one introduced a bug!
Linux svl-bne-xen-16 3.0.101-0.7.17-xen #1 SMP Tue Feb 4 13:24:49 UTC 2014 (90aac76) x86_64 x86_64 x86_64 GNU/Linux
Is anyone else seeing this problem?