Leap Second Smearing with NTP
-----------------------------

By Martin Burnicki
with some edits by Harlan Stenn

The NTP software protocol and its reference implementation, ntpd, were
originally designed to distribute UTC time over a network as accurately as
possible.

Unfortunately, leap seconds are scheduled to be inserted into or deleted
from the UTC time scale at irregular intervals to keep the UTC time scale
synchronized with the Earth's rotation.  Deletions haven't happened yet, but
insertions have happened more than 25 times.

The problem is that POSIX requires 86400 seconds in a day, and there is no
prescribed way to handle leap seconds in POSIX.

Whenever a leap second is to be handled, ntpd either:

- passes the leap second announcement down to the OS kernel (if the OS
supports this) and the kernel handles the leap second automatically, or

- applies the leap second correction itself.

NTP servers also pass a leap second warning flag down to their clients via
the normal NTP packet exchange, so clients also become aware of an
approaching leap second, and can handle the leap second appropriately.


The Problem on Unix-like Systems
--------------------------------
If a leap second is to be inserted then in most Unix-like systems the OS
kernel just steps the time back by 1 second at the beginning of the leap
second, so the last second of the UTC day is repeated and thus duplicate
timestamps can occur.

Unfortunately there are lots of applications which get confused if the
system time is stepped back, e.g. due to a leap second insertion.  Thus,
many users have been looking for ways to avoid this, and have tried to
introduce workarounds which may or may not work properly.

So even though these Unix kernels normally can handle leap seconds, the way
they do this is not optimal for applications.

One good way to handle the leap second is to use ntp_gettime() instead of
the usual calls, because ntp_gettime() includes a "clock state" variable
that will actually tell you if the time you are receiving is OK or not and,
if it is OK, whether the current second is an in-progress leap second.  But
even though this mechanism has been available for about 20 years, almost
nobody uses it.


NTP Client for Windows Contains a Workaround
--------------------------------------------
The Windows system time knows nothing about leap seconds, so for many years
the Windows port of ntpd has provided a workaround where the system time is
slewed by the client to compensate for the leap second.

Thus it is not required to use a smearing NTP server for Windows clients,
but of course the smearing server approach also works.


The Leap Smear Approach
-----------------------
For the reasons mentioned above, some support for leap smearing has
recently been implemented in ntpd.  This means that to insert a leap second
an NTP server adds a certain increasing "smear" offset to the real UTC time
sent to its clients, so that after some predefined interval the leap second
offset is compensated.  The smear interval should be long enough,
e.g. several hours, so that NTP clients can easily follow the clock drift
caused by the smeared time.

During the period while the leap smear is being performed, ntpd will include
a specially-formatted 'refid' in time packets that contain "smeared" time.
This refid is of the form 254.x.y.z, where x.y.z are 24 encoded bits of the
smear value.
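
As a rough illustration of the refid format just described, the following
Python sketch checks whether a refid marks smeared time and decodes the
offset from it.  The bit layout used here (a signed fixed-point value in
seconds with 2 integer bits and 22 fraction bits) is an assumption that is
not stated in this document, and the helper names are hypothetical.

```python
SMEAR_PREFIX = "254"  # first refid byte that marks smeared time


def is_smeared(refid: str) -> bool:
    """Return True if a refid string has the dotted form 254.x.y.z."""
    parts = refid.split(".")
    return len(parts) == 4 and parts[0] == SMEAR_PREFIX


def decode_smear_refid(refid: str) -> float:
    """Decode the smear offset in seconds from a 254.x.y.z refid.

    Assumed layout: x.y.z is a 24-bit two's-complement fixed-point
    number with 2 integer bits and 22 fraction bits.
    """
    if not is_smeared(refid):
        raise ValueError("refid does not carry a smear offset")
    x, y, z = (int(part) for part in refid.split(".")[1:])
    raw = (x << 16) | (y << 8) | z
    if raw & 0x800000:      # sign bit set: negative offset
        raw -= 1 << 24
    return raw / float(1 << 22)


print(is_smeared("MRS"))                                # False
print(round(decode_smear_refid("254.196.88.176"), 6))   # -0.932087
```

Under this assumption the resolution of the encoded offset is 2^-22 s, i.e.
roughly a quarter of a microsecond.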

With this approach the time an NTP server sends to its clients still matches
UTC before the leap second, up to the beginning of the smear interval, and
again corresponds to UTC after the insertion of the leap second has
finished, at the end of the smear interval.  By examining the first byte of
the refid, one can also determine if the server is offering smeared time or
not.

Of course, clients which receive the "smeared" time from an NTP server don't
have to (and even must not) care about the leap second anymore.  Smearing is
just transparent to the clients, and the clients don't even notice there's a
leap second.


Pros and Cons of the Smearing Approach
--------------------------------------
The disadvantages of this approach are:

- During the smear interval the time provided by smearing NTP servers
differs significantly from UTC, and thus from the time provided by normal,
non-smearing NTP servers.  The difference can be up to 1 second, depending
on the smear algorithm.

- Since smeared time differs from true UTC, and many applications require
correct legal time (UTC), there may be legal consequences to using smeared
time.  Make sure you check to see if this requirement affects you.

However, for applications where it's only important that all computers have
the same time and a temporary offset of up to 1 s from UTC is acceptable, a
better approach may be to slew the time in a well defined way, over a
certain interval, which is what we call smearing the leap second.


The Motivation to Implement Leap Smearing
-----------------------------------------
Here is some historical background for ntpd, related to smearing/slewing
time.

Up to ntpd 4.2.4, if kernel support for leap seconds was either not
available or was not enabled, ntpd didn't care about the leap second at all.
So if ntpd was run with -x and thus kernel support wasn't used, ntpd saw a
sudden 1 s offset after the leap second and normally would have stepped the
time by -1 s a few minutes later.  However, 'ntpd -x' does not step the time
but "slews" the 1-second correction, which takes 33 minutes and 20 seconds
(2000 s at ntpd's maximum slew rate of 500 ppm) to complete.  This could be
considered a bug, but certainly this was only an accidental behavior.

However, as we learned in the discussion in http://bugs.ntp.org/2745, this
behavior was very much appreciated since indeed the time was never stepped
back, even though the start of the slewing was somewhat undefined and
depended on the poll interval.  The system time was off by 1 second for
several minutes before slewing even started.

In ntpd 4.2.6 some code was added which let ntpd step the time at UTC
midnight to insert a leap second, if kernel support was not used.
Unfortunately this also happened if ntpd was started with -x, so the folks
who expected that the time was never stepped when ntpd was run with -x found
this wasn't true anymore, and again from the discussion in NTP bug 2745 we
learn that there were even some folks who patched ntpd to get the 4.2.4
behavior back.

In 4.2.8 the leap second code was rewritten and some enhancements were
introduced, but the resulting code still showed the behavior of 4.2.6,
i.e. ntpd with -x would still step the time.  This has only recently been
fixed in the current ntpd stable code, but this fix is only available with a
certain patch level of ntpd 4.2.8.

So a possible solution for users who were looking for a way to get past the
leap second without the time being stepped could have been to check the
version of ntpd installed on each of their systems.  If it's still 4.2.4, be
sure to start the client ntpd with -x.  If it's 4.2.6 or 4.2.8 it won't work
anyway unless you have a patched ntpd version instead of the original
version, so you'd need to upgrade to the current -stable code to be able to
run ntpd with -x and get the desired result.  Either way, you'd still have
the requirement to check/update/configure every single machine in your
network that runs ntpd.

Google's leap smear approach is a very efficient solution for this, for
sites that do not require correct timestamps for legal purposes.  You just
have to take care that your NTP servers support leap smearing and configure
those few servers accordingly.  If the smear interval is long enough so that
NTP clients can follow the smeared time it doesn't matter at all which
version of ntpd is installed on a client machine; it just works, and it even
works around kernel bugs due to the leap second.

Since all clients follow the same smeared time the time difference between
the clients during the smear interval is as small as possible, compared to
the -x approach.  The current leap second code in ntpd determines the point
in system time when the leap second is to be inserted, and given a
particular smear interval it's easy to determine the start point of the
smearing, and the smearing is finished when the leap second ends, i.e. the
next UTC day begins.
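
The timing described in the paragraph above can be sketched in a few lines
of Python.  This is only an illustration that assumes a simple linear ramp;
the actual curve depends on the smear algorithm, which this document does
not specify, and the function and parameter names are hypothetical.

```python
def smear_offset(now: float, leap_end: float, interval: float) -> float:
    """Smear offset in seconds for an inserted leap second.

    now       current time in seconds (any consistent time scale)
    leap_end  instant at which the leap second ends, i.e. the start of
              the next UTC day
    interval  configured smear interval in seconds

    Assumption: the offset ramps linearly from 0 at the start of the
    interval towards -1 s at its end, and is 0 outside the interval.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    start = leap_end - interval          # smearing starts this early
    if now <= start or now >= leap_end:
        return 0.0                       # outside the smear interval
    return -(now - start) / interval


# Halfway through a one-day smear interval the offset is half a second.
print(smear_offset(43200.0, 86400.0, 86400.0))   # -0.5
```

Because the start point is derived from the end of the leap second and the
interval, servers using the same interval compute the same offset at the
same instant.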

The maximum error doesn't exceed what you'd get with the old smearing caused
by -x in ntpd 4.2.4, so if users could accept the old behavior they would
also accept the smearing at the server side.

In order to affect the local timekeeping as little as possible the leap
smear support currently implemented in ntpd does not affect the internal
system time at all.  Only the timestamps and refid in outgoing reply packets
*to clients* are modified by the smear offset, so this makes sure the basic
functionality of ntpd is not accidentally broken.  Also peer packets
exchanged with other NTP servers are based on the real UTC system time and
the normal refid, as usual.

The leap smear implementation is optionally available in ntp-4.2.8p3 and
later, and the changes can be tracked via http://bugs.ntp.org/2855.


Using NTP's Leap Second Smearing
--------------------------------
- Leap Second Smearing MUST NOT be used for public servers, e.g. servers
provided by metrology institutes, or servers participating in the NTP pool
project.  There would be a high risk that NTP clients get the time from a
mixture of smearing and non-smearing NTP servers which could result in
undefined client behavior.  Instead, leap second smearing should only be
configured on time servers providing dedicated clients with time, if all
those clients can accept smeared time.

- Leap Second Smearing is NOT configured by default.  The only way to get
this behavior is to invoke the ./configure script from the NTP source code
package with the --enable-leap-smear parameter before the executables are
built.

- Even if ntpd has been compiled to enable leap smearing support, leap
smearing is only done if explicitly configured.

- The leap smear interval should be at least several hours long, and up to
1 day (86400s).  If the interval is too short then the smear offset is
applied too quickly for clients to follow.  86400s (1 day) is a good
choice.

- If several NTP servers are set up for leap smearing then the *same* smear
interval should be configured on each server.

- Smearing NTP servers DO NOT send a leap second warning flag to client time
requests.  Since the leap second is applied gradually the clients don't even
notice there's a leap second being inserted, and thus no log message or
similar related to the leap second will be visible on the clients.

- Since clients don't (and must not) become aware of the leap second at all,
clients getting the time from a smearing NTP server MUST NOT be configured
to use a leap second file.  If they had a leap second file they would apply
the leap second twice: the smeared one from the server, plus another one
inserted by themselves due to the leap second file.  As a result, the
additional correction would soon be detected and corrected/adjusted.

- Clients MUST NOT be configured to poll both smearing and non-smearing NTP
servers at the same time.  During the smear interval they would get
different times from different servers and wouldn't know which server(s) to
accept.


Setting Up A Smearing NTP Server
--------------------------------
If an NTP server should perform leap smearing then the leap smear interval
(in seconds) needs to be specified in the NTP configuration file ntp.conf,
e.g.:

 leapsmearinterval 86400

Please keep in mind the leap smear interval should be between several hours
and 24 hours long.  With shorter values clients may not be able to follow
the drift caused by the smeared time, and with longer values the discrepancy
between system time and UTC will cause more problems when reconciling
timestamp differences.
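
To get a feeling for the drift that clients have to follow, the average
frequency offset caused by smearing one second over a given interval can be
computed.  The sketch below is only an illustration; the function name is
hypothetical, and the 500 ppm value used for comparison is ntpd's maximum
slew rate, which matches the 33 minutes and 20 seconds mentioned earlier.

```python
MAX_SLEW_PPM = 500.0  # ntpd's maximum slew rate: 1 s takes 2000 s


def smear_rate_ppm(interval_s: float, leap_s: float = 1.0) -> float:
    """Average frequency offset in ppm when leap_s seconds are spread
    evenly over interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return leap_s / interval_s * 1e6


# Compare a few candidate intervals against the slew limit.
for interval in (2000, 6 * 3600, 86400):
    rate = smear_rate_ppm(interval)
    verdict = "at the limit" if rate >= MAX_SLEW_PPM - 1e-6 else "well below it"
    print(f"{interval:6d} s -> {rate:8.3f} ppm ({verdict})")
```

With the recommended 86400 s the average drift is about 11.6 ppm, which is
small compared to the slew limit, while an interval of 2000 s would already
require the full 500 ppm.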

When ntpd starts and a smear interval has been specified then a log message
is generated, e.g.:

 ntpd[31120]: config: leap smear interval 86400 s

While ntpd is running with a leap smear interval specified the command:

 ntpq -c rv

reports the smear status, e.g.:

# ntpq -c rv
associd=0 status=4419 leap_add_sec, sync_uhf_radio, 1 event, leap_armed,
version="ntpd 4.2.8p3-RC1@1.3349-o Mon Jun 22 14:24:09 UTC 2015 (26)",
processor="i586", system="Linux/3.7.1", leap=01, stratum=1,
precision=-18, rootdelay=0.000, rootdisp=1.075, refid=MRS,
reftime=d93dab96.09666671 Tue, Jun 30 2015 23:58:14.036,
clock=d93dab9b.3386a8d5 Tue, Jun 30 2015 23:58:19.201, peer=2335,
tc=3, mintc=3, offset=-0.097015, frequency=44.627, sys_jitter=0.003815,
clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,
expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087

In the example above 'leapsmearinterval' reports the configured leap smear
interval all the time, while the 'leapsmearoffset' value is 0 outside the
interval and goes from 0 to -1000 ms over the interval.  So this can be
used to monitor if and how the time sent to clients is smeared.  With a
leapsmearoffset of -932.087 ms (-0.932087 s), the refid reported in smeared
packets would be 254.196.88.176.
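
The monitoring idea can be sketched in Python: collect the variables printed
by 'ntpq -c rv' and, as a cross-check, compute the refid that corresponds to
the reported offset.  The encoding used here (signed fixed point in seconds
with 2 integer bits and 22 fraction bits) is an assumption, chosen because
it reproduces the 254.196.88.176 value; all function names are hypothetical.

```python
import re


def parse_rv(text: str) -> dict:
    """Collect name=value pairs from 'ntpq -c rv' output into a dict."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}


def smear_refid(offset_s: float) -> str:
    """Encode a smear offset in seconds as a 254.x.y.z refid.

    Assumed layout: 24-bit two's complement, 22 fraction bits.
    """
    raw = round(offset_s * (1 << 22)) & 0xFFFFFF
    return "254.%d.%d.%d" % (raw >> 16, (raw >> 8) & 0xFF, raw & 0xFF)


# Last line of the sample output shown above.
sample = "expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087"
rv = parse_rv(sample)
offset_s = float(rv["leapsmearoffset"]) / 1000.0   # reported in milliseconds
print(smear_refid(offset_s))                       # 254.196.88.176
```

In practice the text passed to parse_rv() would be the captured output of
the ntpq command rather than a fixed string.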