Leap Second Smearing with NTP
-----------------------------

By Martin Burnicki
with some edits by Harlan Stenn

The NTP protocol and its reference implementation, ntpd, were originally
designed to distribute UTC time over a network as accurately as possible.

Unfortunately, leap seconds are scheduled to be inserted into or deleted
from the UTC time scale at irregular intervals to keep the UTC time scale
synchronized with the Earth's rotation. Deletions haven't happened yet, but
insertions have happened more than 25 times since 1972.

The problem is that POSIX requires 86400 seconds in a day, and there is no
prescribed way to handle leap seconds in POSIX.

Whenever a leap second is to be handled, ntpd either:

- passes the leap second announcement down to the OS kernel (if the OS
supports this) and the kernel handles the leap second automatically, or

- applies the leap second correction itself.

NTP servers also pass a leap second warning flag down to their clients via
the normal NTP packet exchange, so clients also become aware of an
approaching leap second and can handle it appropriately.


The Problem on Unix-like Systems
--------------------------------
If a leap second is to be inserted, the OS kernel on most Unix-like systems
just steps the time back by 1 second at the beginning of the leap second, so
the last second of the UTC day is repeated and duplicate timestamps can
occur.

Unfortunately, lots of applications get confused if the system time is
stepped back, e.g. due to a leap second insertion. Thus, many users have
been looking for ways to avoid this, and have tried workarounds which may or
may not work properly.

So even though these Unix kernels can normally handle leap seconds, the way
they do this is not optimal for applications.
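
A short illustration of the POSIX problem, using only Python's standard
library: calendar.timegm() converts a UTC date and time to a POSIX timestamp
by assuming exactly 86400 seconds per day, so the leap second label 23:59:60
of 30 June 2015 has no timestamp of its own and lands on the same value as
00:00:00 of the next day. A stepping kernel repeats 23:59:59 instead, but
either way two different UTC seconds share one POSIX timestamp.

```python
import calendar

# UTC labels around the leap second inserted at the end of 2015-06-30.
# calendar.timegm() counts exactly 86400 s per day, like POSIX time.
last_regular = calendar.timegm((2015, 6, 30, 23, 59, 59, 0, 0, 0))
leap = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))
next_day = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))

print(leap, next_day)           # 1435708800 1435708800
print(leap == next_day)         # True: 23:59:60 has no distinct timestamp
print(next_day - last_regular)  # 1, although 2 SI seconds elapsed
```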

One good way to handle the leap second is to use ntp_gettime() instead of
the usual calls, because ntp_gettime() includes a "clock state" variable
that tells you whether the time you are receiving is OK or not and, if it
is OK, whether the current second is an in-progress leap second. But even
though this mechanism has been available for about 20 years, almost nobody
uses it.


NTP Client for Windows Contains a Workaround
--------------------------------------------
The Windows system time knows nothing about leap seconds, so for many years
the Windows port of ntpd has provided a workaround where the system time is
slewed by the client to compensate for the leap second.

Thus it is not required to use a smearing NTP server for Windows clients,
but of course the smearing server approach also works.


The Leap Smear Approach
-----------------------
For the reasons mentioned above, support for leap smearing has recently been
implemented in ntpd. This means that to insert a leap second, an NTP server
adds a certain increasing "smear" offset to the real UTC time sent to its
clients, so that after some predefined interval the leap second offset is
compensated. The smear interval should be long enough (e.g. several hours)
that NTP clients can easily follow the clock drift caused by the smeared
time.

While the leap smear is being performed, ntpd includes a specially
formatted 'refid' in time packets that contain "smeared" time. This refid
is of the form 254.x.y.z, where x.y.z are 24 encoded bits of the smear
value.

With this approach the time an NTP server sends to its clients still matches
UTC before the leap second, up to the beginning of the smear interval, and
again corresponds to UTC after the insertion of the leap second has
finished, at the end of the smear interval.
By examining the first byte of the refid, one can also determine whether the
server is offering smeared time.

Of course, clients which receive "smeared" time from an NTP server don't
have to (and in fact must not) care about the leap second anymore. Smearing
is transparent to the clients, and the clients don't even notice there's a
leap second.


Pros and Cons of the Smearing Approach
--------------------------------------
The disadvantages of this approach are:

- During the smear interval the time provided by smearing NTP servers
differs significantly from UTC, and thus from the time provided by normal,
non-smearing NTP servers. The difference can be up to 1 second, depending
on the smear algorithm.

- Since smeared time differs from true UTC, and many applications require
correct legal time (UTC), there may be legal consequences to using smeared
time. Check whether this requirement affects you.

However, for applications where it's only important that all computers have
the same time and a temporary offset of up to 1 s to UTC is acceptable, a
better approach may be to slew the time in a well-defined way over a
certain interval, which is what we call smearing the leap second.


The Motivation to Implement Leap Smearing
-----------------------------------------
Here is some historical background for ntpd, related to smearing/slewing
time.

Up to ntpd 4.2.4, if kernel support for leap seconds was either not
available or not enabled, ntpd didn't care about the leap second at all.
So if ntpd was run with -x and thus kernel support wasn't used, ntpd saw a
sudden 1 s offset after the leap second and normally would have stepped the
time by -1 s a few minutes later. However, 'ntpd -x' does not step the time
but "slews" the 1-second correction, which takes 33 minutes and 20 seconds
to complete.
This could be considered a bug, but in any case it was accidental behavior.

However, as we learned in the discussion in http://bugs.ntp.org/2745, this
behavior was very much appreciated since the time was indeed never stepped
back, even though the start of the slewing was somewhat undefined and
depended on the poll interval: the system time was off by 1 second for
several minutes before slewing even started.

In ntpd 4.2.6 some code was added which let ntpd step the time at UTC
midnight to insert a leap second, if kernel support was not used.
Unfortunately this also happened if ntpd was started with -x, so the folks
who expected that the time was never stepped when ntpd was run with -x found
this was no longer true, and again from the discussion in NTP bug 2745 we
learn that some folks even patched ntpd to get the 4.2.4 behavior back.

In 4.2.8 the leap second code was rewritten and some enhancements were
introduced, but the resulting code still showed the behavior of 4.2.6,
i.e. ntpd with -x would still step the time. This has only recently been
fixed in the current ntpd stable code, but this fix is only available with a
certain patch level of ntpd 4.2.8.

So a possible solution for users who were looking for a way to get through
the leap second without the time being stepped could have been to check the
version of ntpd installed on each of their systems. If it's still 4.2.4, be
sure to start the client ntpd with -x. If it's 4.2.6 or 4.2.8 it won't work
anyway unless you have a patched ntpd version instead of the original
version, so you'd need to upgrade to the current -stable code to be able to
run ntpd with -x and get the desired result. Either way, you'd still have to
check, update, and configure every single machine in your network that runs
ntpd.

Google's leap smear approach is a very efficient solution for this, for
sites that do not require correct timestamps for legal purposes. You just
have to make sure your NTP servers support leap smearing and configure
those few servers accordingly. If the smear interval is long enough that
NTP clients can follow the smeared time, it doesn't matter at all which
version of ntpd is installed on a client machine; it just works, and it even
works around kernel bugs related to the leap second.

Since all clients follow the same smeared time, the time difference between
the clients during the smear interval is much smaller than with the -x
approach, where each client starts slewing at a different time. The current
leap second code in ntpd determines the point in system time when the leap
second is to be inserted. Given a particular smear interval, it's easy to
determine the start point of the smearing, and the smearing is finished when
the leap second ends, i.e. when the next UTC day begins.

The maximum error doesn't exceed what you'd get with the old smearing caused
by -x in ntpd 4.2.4, so users who could accept the old behavior should also
be able to accept smearing on the server side.

To affect local timekeeping as little as possible, the leap smear support
currently implemented in ntpd does not affect the internal system time at
all. Only the timestamps and refid in outgoing reply packets *to clients*
are modified by the smear offset, which makes sure the basic functionality
of ntpd is not accidentally broken. Peer packets exchanged with other NTP
servers are also based on the real UTC system time and the normal refid, as
usual.

The leap smear implementation is optionally available in ntp-4.2.8p3 and
later, and the changes can be tracked via http://bugs.ntp.org/2855.
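
Because the smear ends when the leap second ends and starts one smear
interval earlier, the schedule is easy to compute. The sketch below (Python,
standard library only) derives the window from a leapsec= value in the
YYYYMMDDhhmm format shown in the ntpq -c rv example later in this document,
and evaluates a hypothetical linear smear curve. The function names are
illustrative, and the linear shape is an assumption: ntpd's own curve may
differ, and the sketch is not meant to reproduce the leapsmearoffset value
shown in that example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def smear_window(leapsec: str, interval_s: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of a smear that ends when the next UTC day begins.

    `leapsec` uses the YYYYMMDDhhmm format of the leapsec= field, e.g.
    "201507010000" for the leap second at the end of 30 June 2015.
    """
    end = datetime.strptime(leapsec, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
    return end - timedelta(seconds=interval_s), end


def linear_smear_offset(now: datetime, start: datetime, end: datetime) -> float:
    """Hypothetical linear curve: 0 s at `start`, approaching -1 s at `end`.

    After `end` the server's own clock already contains the leap second, so
    no further offset is applied. The inserted second itself is not modeled.
    """
    if now <= start or now >= end:
        return 0.0
    elapsed = (now - start).total_seconds()
    return -elapsed / (end - start).total_seconds()


start, end = smear_window("201507010000", 86400)
print(start.isoformat())  # 2015-06-30T00:00:00+00:00
noon = datetime(2015, 6, 30, 12, tzinfo=timezone.utc)
print(linear_smear_offset(noon, start, end))  # -0.5
```

With the recommended 86400 s interval, the smear in this sketch runs for the
whole UTC day that ends with the leap second.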


Using NTP's Leap Second Smearing
--------------------------------
- Leap Second Smearing MUST NOT be used for public servers, e.g. servers
provided by metrology institutes, or servers participating in the NTP pool
project. There would be a high risk that NTP clients get the time from a
mixture of smearing and non-smearing NTP servers, which could result in
undefined client behavior. Instead, leap second smearing should only be
configured on time servers providing dedicated clients with time, if all
those clients can accept smeared time.

- Leap Second Smearing is NOT configured by default. The only way to get
this behavior is to invoke the ./configure script from the NTP source code
package with the --enable-leap-smear parameter before the executables are
built.

- Even if ntpd has been compiled with leap smearing support, leap smearing
is only done if explicitly configured.

- The leap smear interval should be at least several hours long, and at
most 1 day (86400 s). If the interval is too short, the smear offset is
applied too quickly for clients to follow. 86400 s (1 day) is a good
choice.

- If several NTP servers are set up for leap smearing, the *same* smear
interval should be configured on each server.

- Smearing NTP servers DO NOT send a leap second warning flag in replies to
client time requests. Since the leap second is applied gradually, the
clients don't even notice there's a leap second being inserted, and thus no
log message or similar indication of the leap second will be visible on the
clients.

- Since clients don't (and must not) become aware of the leap second at all,
clients getting the time from a smearing NTP server MUST NOT be configured
to use a leap second file.
If they had a leap second file, they would apply the leap second twice: the
smeared one from the server, plus another one inserted by themselves due to
the leap second file. The spurious additional correction would then soon be
detected as an offset from the server's time and corrected again.

- Clients MUST NOT be configured to poll both smearing and non-smearing NTP
servers at the same time. During the smear interval they would get
different times from different servers and wouldn't know which server(s) to
accept.


Setting Up A Smearing NTP Server
--------------------------------
If an NTP server should perform leap smearing, the leap smear interval (in
seconds) needs to be specified in the NTP configuration file ntp.conf,
e.g.:

    leapsmearinterval 86400

Please keep in mind that the leap smear interval should be between several
hours and 24 hours long. With shorter values clients may not be able to
follow the drift caused by the smeared time, and with longer values the
discrepancy between system time and UTC will cause more problems when
reconciling timestamp differences.
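
One way to make "able to follow the drift" concrete is to express the smear
as the frequency offset clients must track. Assuming a linear smear, 1 s
spread over the interval is a constant rate of 1/interval; the sketch below
compares that with 500 ppm, the maximum frequency offset ntpd's clock
discipline accepts. The same arithmetic matches the 33 minutes and 20
seconds quoted above for the 4.2.4 -x slew: 1 s slewed at 500 ppm takes
2000 s. Staying far below the limit is a rule of thumb, not a threshold
taken from ntpd, and the helper names are hypothetical.

```python
MAX_SLEW_PPM = 500.0  # ntpd's frequency/slew limit, in parts per million


def smear_rate_ppm(interval_s: float, leap_s: float = 1.0) -> float:
    """Constant frequency offset (ppm) a linear smear of `leap_s` needs."""
    return leap_s / interval_s * 1e6


def slew_duration_s(offset_s: float, rate_ppm: float) -> float:
    """Time needed to slew away `offset_s` seconds at `rate_ppm`."""
    return offset_s * 1e6 / rate_ppm


for hours in (2, 6, 12, 24):
    rate = smear_rate_ppm(hours * 3600)
    print(f"{hours:2d} h smear: {rate:6.2f} ppm "
          f"({rate / MAX_SLEW_PPM:.1%} of the {MAX_SLEW_PPM:.0f} ppm limit)")

print(slew_duration_s(1.0, MAX_SLEW_PPM))  # 2000.0 s = 33 min 20 s
```

A 24-hour smear needs only about 11.6 ppm, well within what clients already
correct for as ordinary oscillator drift.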

When ntpd starts and a smear interval has been specified, a log message
is generated, e.g.:

    ntpd[31120]: config: leap smear interval 86400 s

While ntpd is running with a leap smear interval specified, the command:

    ntpq -c rv

reports the smear status, e.g.:

    # ntpq -c rv
    associd=0 status=4419 leap_add_sec, sync_uhf_radio, 1 event, leap_armed,
    version="ntpd 4.2.8p3-RC1@1.3349-o Mon Jun 22 14:24:09 UTC 2015 (26)",
    processor="i586", system="Linux/3.7.1", leap=01, stratum=1,
    precision=-18, rootdelay=0.000, rootdisp=1.075, refid=MRS,
    reftime=d93dab96.09666671 Tue, Jun 30 2015 23:58:14.036,
    clock=d93dab9b.3386a8d5 Tue, Jun 30 2015 23:58:19.201, peer=2335,
    tc=3, mintc=3, offset=-0.097015, frequency=44.627, sys_jitter=0.003815,
    clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,
    expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087

In the example above, 'leapsmearinterval' always reports the configured
leap smear interval, while the 'leapsmearoffset' value is 0 outside the
interval and goes from 0 to -1000 ms over the interval. So this can be used
to monitor whether and how the time sent to clients is smeared. With a
leapsmearoffset of -932.087 ms (-0.932087 s), the refid reported in smeared
packets would be 254.196.88.176.
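
The refid in the example can be reproduced from the leapsmearoffset value.
The sketch below assumes the 24 bits hold the smear offset in NTP fixed
point: the offset is converted to 64-bit two's complement with 32 integer
and 32 fraction bits, rounded, and reduced to the low 2 integer bits plus
the top 22 fraction bits. This layout is an assumption that happens to
reproduce the example above, so check ntpd's source before relying on it;
the function names are hypothetical. The decoder also shows the first-byte
check for smeared time.

```python
from typing import Optional

MASK64 = (1 << 64) - 1


def smear_refid(offset_s: float) -> str:
    """Encode a smear offset in seconds as a 254.x.y.z refid (assumed layout)."""
    lfp = round(offset_s * (1 << 32)) & MASK64  # 32.32 fixed point, two's complement
    lfp = (lfp + 0x200) & MASK64                # round before dropping 10 fraction bits
    integral, fraction = lfp >> 32, lfp & 0xFFFFFFFF
    bits = ((integral & 0x3) << 22) | (fraction >> 10)  # 2 + 22 = 24 bits
    return f"254.{bits >> 16}.{(bits >> 8) & 0xFF}.{bits & 0xFF}"


def decode_smear_refid(refid: str) -> Optional[float]:
    """Return the smear offset in seconds, or None if refid is not 254.x.y.z."""
    parts = refid.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return None  # e.g. a text refid such as "MRS"
    octets = [int(p) for p in parts]
    if octets[0] != 254:
        return None  # first byte is not 254: not smeared time
    bits = (octets[1] << 16) | (octets[2] << 8) | octets[3]
    integral = bits >> 22
    if integral >= 2:  # 2-bit two's complement: 2 -> -2, 3 -> -1
        integral -= 4
    return integral + (bits & 0x3FFFFF) / (1 << 22)


print(smear_refid(-0.932087))                # 254.196.88.176
print(decode_smear_refid("254.196.88.176"))  # approximately -0.932087
print(decode_smear_refid("MRS"))             # None
```

The 22 fraction bits give a resolution of about 0.24 microseconds, far finer
than needed to monitor a smear of up to 1 s.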