<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<title>NTP Debugging Techniques</title>
<link href="scripts/style.css" type="text/css" rel="stylesheet">
</head>

<body>
<h3>NTP Debugging Techniques</h3>
<img src="pic/pogo.gif" alt="gif" align="left"><a href="http://www.eecis.udel.edu/%7emills/pictures.html">from <i>Pogo</i>, Walt Kelly</a>
<p>We make house calls and bring our own bugs.</p>
<p>Last update: <csobj format="ShortTime" h="25" locale="00000409" region="0" t="DateTime" w="97">10:23 PM</csobj> UTC <csobj format="LongDate" h="25" locale="00000409" region="0" t="DateTime" w="266">Tuesday, August 05, 2003</csobj></p>
<br clear="left">
<h4>More Help</h4>
<script type="text/javascript" language="javascript" src="scripts/links12.txt"></script>
<hr>
<p>Once the NTP software distribution has been compiled and installed and the configuration file constructed, the next step is to verify correct operation and fix any bugs that may result. Usually, the command line that starts the daemon is included in the system startup file, so it is executed only at system boot time; however, the daemon can be stopped and restarted by root at any time. Normally, no command-line arguments are required, unless the special actions described on the <a href="ntpd.html"><tt>ntpd</tt> - Network Time Protocol (NTP) daemon</a> page are needed.
Once started, the daemon will begin sending and receiving messages, as specified in the configuration file.</p>
<h4>Initial Startup</h4>
<p>When the daemon is started for the first time, the frequency file, usually called <tt>ntp.drift</tt>, has not yet been created. The daemon switches to a special training routine designed to quickly determine the system clock frequency offset of the particular machine. The routine first measures the current clock offset and sets the clock, then continues for up to twenty minutes before measuring the clock offset a second time, which might involve setting the clock again. The two measurements are used to compute the initial frequency offset and the daemon continues in regular operation, during which the frequency offset is continuously updated. Once each hour the daemon writes the current frequency offset to the <tt>ntp.drift</tt> file. When restarted after that, the daemon reads the frequency offset from the <tt>ntp.drift</tt> file and avoids the training routine.</p>
<p>Note that the daemon requires at least four packet exchanges when first started in any case. This is required in order for the mitigation algorithms to ensure valid and accurate measurements and defend against network delay spikes and accidental or malicious errors induced by the servers selected in the configuration file. It normally takes less than four minutes to set the clock when first started, but this can be reduced to less than ten seconds with the <tt>iburst</tt> configuration option.</p>
<p>The best way to verify correct operation is using the <a href="ntpq.html"><tt>ntpq</tt> - standard NTP query program</a> and <a href="ntpdc.html"><tt>ntpdc</tt> - special NTP query program</a> utility programs, either on the server itself or from another machine elsewhere in the network.
The <tt>ntpq</tt> program implements the management functions specified in the NTP specification <a href="http://www.eecis.udel.edu/%7emills/database/rfc/rfc1305/rfc1305c.ps">RFC-1305, Appendix A</a>. The <tt>ntpdc</tt> program implements additional functions not provided in the standard. Both programs can be used to inspect the state variables defined in the specification and, in the case of <tt>ntpdc</tt>, additional ones intended for serious debugging. In addition, the <tt>ntpdc</tt> program can be used to selectively reconfigure and enable or disable some functions while the daemon is running.</p>
<p>In extreme cases with elusive bugs, the daemon can operate in two modes, depending on the presence of the <tt>-d</tt> command-line debug switch. If not present, the daemon detaches from the controlling terminal and proceeds autonomously. If one or more <tt>-d</tt> switches are present, the daemon does not detach and generates special output useful for debugging. In general, interpretation of this output requires reference to the sources. However, a single <tt>-d</tt> does produce only mildly cryptic output and can be very useful in finding problems with configuration and network troubles. With a little experience, the volume of output can be reduced by piping the output to <tt>grep</tt> and specifying the keyword of the trace you want to see.</p>
<p>Some problems are immediately apparent when the daemon first starts running. The most common of these is the lack of a UDP port for NTP (123) in the Unix <tt>/etc/services</tt> file (or equivalent in some systems). <b>Note that NTP does not use TCP in any form.
Also note that NTP requires port 123 for both source and destination ports.</b> These facts should be pointed out to firewall administrators.</p>
<p>Other problems are apparent in the system log, which ordinarily shows the startup banner, some cryptic initialization data and the computed precision value. Error messages at startup and during regular operation are sent to the system log. In real emergencies the daemon will send a terminal error message to the system log and then cease operation.</p>
<p>The next most common problem is incorrect DNS names. Check that each DNS name used in the configuration file exists and that the address responds to the Unix <tt>ping</tt> command. The Unix <tt>traceroute</tt> or Windows <tt>tracert</tt> utility can be used to verify that a partial or complete path exists. Most problems reported to the NTP newsgroup are not NTP problems, but problems with the network or firewall configuration.</p>
<p>When first started, the daemon polls the servers listed in the configuration file at 64-s intervals. In order to allow a sufficient number of samples for the NTP algorithms to reliably discriminate between truechimer servers and possible falsetickers, at least four valid messages from at least one server or peer listed in the configuration file are required before the daemon can set the clock. However, if the difference between the client time and server time is greater than the panic threshold, which defaults to 1000 s, the daemon sends a message to the system log and shuts down without setting the clock. It is necessary to set the local clock to within the panic threshold first, either manually by eyeball and wristwatch and the Unix <tt>date</tt> command, or by the <tt>ntpdate</tt> or <tt>ntpd -q</tt> commands. The panic threshold can be changed by the <tt>tinker panic</tt> command described on the <a href="miscopt.html">Miscellaneous Options</a> page.
The panic threshold can be disabled for the first measurement by the <tt>-g</tt> command line option described on the <a href="ntpd.html"><tt>ntpd</tt> - Network Time Protocol (NTP) daemon</a> page.</p>
<p>If the difference between local time and server time is less than the panic threshold but greater than the step threshold, which defaults to 128 ms, the daemon will perform a step adjustment; otherwise, it will gradually slew the clock to the nominal time. Step adjustments are extremely rare in ordinary operation, usually as the result of reboot or hardware failure. The step threshold can be changed to 300 s using the <tt>-x</tt> command line option described on the <tt>ntpd</tt> page. This is usually sufficient to avoid a step after reboot or when the operator has set the system clock to within five minutes by eyeball-and-wristwatch. In extreme cases the step threshold can be changed by the <tt>tinker step</tt> command described on the <a href="miscopt.html">Miscellaneous Options</a> page. If set to zero, the clock will never be stepped; however, users should understand the implications of doing this in a distributed data network where all processing must be tightly synchronized. See the <a href="http://www.eecis.udel.edu/%7emills/leap.html">NTP Timescale and Leap Seconds</a> page for further information. If a step adjustment is made, the clock discipline algorithm will start all over again, requiring another round of at least four messages as before. This is necessary so that all servers and peers operate on the same set of time values.</p>
<p>The clock discipline algorithm is designed to avoid large noise spikes that might occur on a congested network or access line. If an offset sample exceeds the step threshold, it is ignored and a timer is started. If a later sample is below the step threshold, the timer is reset and operation continues normally.
However, if the timer exceeds the stepout threshold, which defaults to 900 s, the next sample will step the time as directed. The stepout threshold can be changed by the <tt>tinker stepout</tt> command described on the Miscellaneous Options page.</p>
<p>If for some reason the hardware clock oscillator frequency error is very large, say over 400 PPM, the time offset when the daemon is started for the first time may increase over time until exceeding the step threshold, which requires a frequency adjustment and another step correction. However, due to provisions that reduce vulnerability to noise spikes, the second correction will not be done until after the stepout threshold. When the frequency error is very large, it may take a number of cycles like this until converging to the nominal frequency correction and writing the <tt>ntp.drift</tt> file. If the frequency error is over 500 PPM, convergence will never occur and occasional step adjustments will occur indefinitely.</p>
<h4>Verifying Correct Operation</h4>
<p>After starting the daemon, run the <tt>ntpq</tt> program using the <tt>-n</tt> switch, which will avoid possible distractions due to name resolution problems. Use the <tt>pe</tt> command to display a billboard showing the status of configured peers and possibly other clients poking the daemon. After operating for a few minutes, the display should be something like:</p>
<pre>
ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
=====================================================================
-isipc6.cairn.ne .GPS1.           1 u   18   64  377   65.592   -5.891   0.044
+saicpc-isiepc2. pogo.udel.edu    2 u  241  128  370   10.477   -0.117   0.067
+uclpc.cairn.net pogo.udel.edu    2 u   37   64  177  212.111   -0.551   0.187
*pogo.udel.edu   .GPS1.           1 u   95  128  377    0.607    0.123   0.027
</pre>
<p>The host names or addresses shown in the <tt>remote</tt> column correspond to the server and peer entries listed in the configuration file; however, the DNS names might not agree if the names listed are not the canonical DNS names. IPv4 addresses are shown in dotted quad notation, while IPv6 addresses are shown alarmingly. The <tt>refid</tt> column shows the current source of synchronization, while the <tt>st</tt> column reveals the stratum, <tt>t</tt> the type (<tt>u</tt> = unicast, <tt>m</tt> = multicast, <tt>l</tt> = local, <tt>-</tt> = don't know), and <tt>poll</tt> the poll interval in seconds. The <tt>when</tt> column shows the time since the peer was last heard in seconds, while the <tt>reach</tt> column shows the status of the reachability register (see RFC-1305) in octal. The remaining entries show the latest delay, offset and jitter in milliseconds. Note that in NTP Version 4 what used to be the <tt>dispersion</tt> column has been replaced by the <tt>jitter</tt> column.</p>
<p>As per the NTP specification RFC-1305, when the <tt>stratum</tt> is between 0 and 15 for an NTP server, the <tt>refid</tt> field shows the server DNS name or, if not found, the IP address in dotted-quad. When the <tt>stratum</tt> is any value for a reference clock, this field shows the identification string assigned to the clock. However, until the client has synchronized to a server, or when the <tt>stratum</tt> for an NTP server is 0 (appears as 16 in the billboards), the status cannot be determined. As a help in debugging, the <tt>refid</tt> field is set to a four-character string called the kiss code.
The current kiss codes are as follows.</p>
<p>Peer Kiss Codes</p>
<dl>
<dt><tt>ACST</tt>
<dd>The association belongs to an anycast server.
<dt><tt>AUTH</tt>
<dd>Server authentication failed. Please wait while the association is restarted.
<dt><tt>AUTO</tt>
<dd>Autokey sequence failed. Please wait while the association is restarted.
<dt><tt>BCST</tt>
<dd>The association belongs to a broadcast server.
<dt><tt>CRYP</tt>
<dd>Cryptographic authentication or identification failed. The details should be in the system log file or the <tt>cryptostats</tt> statistics file, if configured. No further messages will be sent to the server.
<dt><tt>DENY</tt>
<dd>Access denied by remote server. No further messages will be sent to the server.
<dt><tt>DROP</tt>
<dd>Lost peer in symmetric mode. Please wait while the association is restarted.
<dt><tt>RSTR</tt>
<dd>Access denied due to local policy. No further messages will be sent to the server.
<dt><tt>INIT</tt>
<dd>The association has not yet synchronized for the first time.
<dt><tt>MCST</tt>
<dd>The association belongs to a manycast server.
<dt><tt>NKEY</tt>
<dd>No key found. Either the key was never installed or is not trusted.
<dt><tt>RATE</tt>
<dd>Rate exceeded.
The server has temporarily denied access because the client exceeded the rate threshold.
<dt><tt>RMOT</tt>
<dd>Somebody is tinkering with the association from a remote host running <tt>ntpdc</tt>. Not to worry unless some rascal has stolen your keys.
<dt><tt>STEP</tt>
<dd>A step change in system time has occurred, but the association has not yet resynchronized.
</dl>
<p>System Kiss Codes</p>
<dl>
<dt><tt>INIT</tt>
<dd>The system clock has not yet synchronized for the first time.
<dt><tt>STEP</tt>
<dd>A step change in system time has occurred, but the system clock has not yet resynchronized.
</dl>
<p>The tattletale symbol at the left margin displays the synchronization status of each peer. The currently selected peer is marked <tt>*</tt>, while additional peers designated acceptable for synchronization are marked <tt>+</tt>. Peers marked <tt>*</tt> and <tt>+</tt> are included in the weighted average computation to set the local clock; the data produced by peers marked with other symbols are discarded. See the <tt>ntpq</tt> page for the meaning of these symbols.</p>
<p>Additional details for each peer separately can be determined by the following procedure.
First, use the <tt>as</tt> command to display an index of association identifiers, such as</p>
<pre>
ntpq> as
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 50252  f314   yes   yes   ok     outlyer   reachable  1
  2 50253  f414   yes   yes   ok    candidat   reachable  1
  3 50254  f414   yes   yes   ok    candidat   reachable  1
  4 50255  f614   yes   yes   ok    sys.peer   reachable  1
</pre>
<p>Each line in this billboard is associated with the corresponding line in the <tt>pe</tt> billboard above. The <tt>assID</tt> shows the unique identifier for each mobilized association, while the <tt>status</tt> column shows the peer status word in hex, as defined in the NTP specification. Next, use the <tt>rv</tt> command and the respective <tt>assID</tt> identifier to display a detailed synopsis for the selected peer, such as</p>
<pre>
ntpq> rv 50253
status=f414 reach, conf, auth, sel_candidat, 1 event, event_reach,
srcadr=saicpc-isiepc2.cairn.net, srcport=123, dstadr=140.173.1.46,
dstport=123, keyid=3816249004, stratum=2, precision=-27,
rootdelay=10.925, rootdispersion=12.848, refid=pogo.udel.edu,
reftime=bd11b225.133e1437 Sat, Jul 8 2000 13:59:01.075, delay=10.550,
offset=-1.357, jitter=0.074, dispersion=1.444, reach=377, valid=7,
hmode=1, pmode=1, hpoll=6, ppoll=7, leap=00, flash=00 ok,
org=bd11b23c.01385836 Sat, Jul 8 2000 13:59:24.004,
rec=bd11b23c.02dc8fb8 Sat, Jul 8 2000 13:59:24.011,
xmt=bd11b21a.ac34c1a8 Sat, Jul 8 2000 13:58:50.672,
filtdelay=  10.45  10.50  10.63  10.40  10.48  10.43  10.49  11.26,
filtoffset= -1.18  -1.26  -1.26  -1.35  -1.35  -1.42  -1.54  -1.81,
filtdisp=    0.51   1.47   2.46   3.45   4.40   5.34   6.33   7.28,
hostname="miro.time.saic.com", signature=md5WithRSAEncryption, flags=0x83f01, initsequence=61, initkey=0x287b649c,
timestamp=3172053041
</pre>
<p>A detailed explanation of the fields in this billboard is beyond the scope of this discussion; however, most variables defined in the NTP Version 3 specification RFC-1305 are available along with others defined for NTPv4 on the <tt>ntpq</tt> page. This particular example was chosen to illustrate probably the most complex configuration involving symmetric modes and public-key cryptography. As the result of debugging experience, the names and values of these variables may change from time to time.</p>
<p>A useful indicator of miscellaneous problems is the <tt>flash</tt> value, which reveals the state of the various sanity tests on incoming packets. There are currently 12 bits, one for each test, numbered from the right, which is for test 1. If a test fails, the corresponding bit is set to one; otherwise it is zero. If any bit is set after a processing step, the packet is discarded. The meaning of each test is described on the <tt>ntpq</tt> page.</p>
<p>The three lines identified as <tt>filtdelay</tt>, <tt>filtoffset</tt> and <tt>filtdisp</tt> reveal the roundtrip delay, clock offset and dispersion for each of the last eight measurement rounds, all in milliseconds. Note that the dispersion, which is an estimate of the error, increases as the age of the sample increases.
From these data, it is usually possible to determine the incidence of severe packet loss, network congestion, and unstable local clock oscillators. There are no hard and fast rules here, since every case is unique; however, if one or more of the rounds show large values or change radically from one round to another, the network is probably congested or lossy.</p>
<p>Once the daemon has set the local clock, it will continuously track the discrepancy between local time and NTP time and adjust the local clock accordingly. There are two components of this adjustment, time and frequency. These adjustments are automatically determined by the clock discipline algorithm, which functions as a hybrid phase/frequency feedback loop. The behavior of this algorithm is carefully controlled to minimize residual errors due to network jitter and frequency variations of the local clock hardware oscillator that normally occur in practice. However, when started for the first time, the algorithm may take some time to converge on the intrinsic frequency error of the host machine.</p>
<p>The state of the local clock itself can be determined using the <tt>rv</tt> command (without the argument), such as</p>
<pre>
ntpq> rv
status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,
version="ntpd 4.0.99j4-r Fri Jul 7 23:38:17 GMT 2000 (1)",
processor="i386", system="FreeBSD3.4-RELEASE", leap=00, stratum=2,
precision=-27, rootdelay=0.552, rootdispersion=12.532, peer=50255,
refid=pogo.udel.edu,
reftime=bd11b220.ac89f40a Sat, Jul 8 2000 13:58:56.673, poll=6,
clock=bd11b225.ee201472 Sat, Jul 8 2000 13:59:01.930, state=4,
phase=0.179, frequency=44.298, jitter=0.022, stability=0.001,
hostname="barnstable.udel.edu", signature=md5WithRSAEncryption,
flags=0x80011, hostkey=3171372095, refresh=3172016539
cert="grundoon.udel.edu grundoon.udel.edu 0x3 3233600829"
cert="whimsy.udel.edu whimsy.udel.edu 0x5 3233682156"
</pre>
<p>Most of these variables are explained in the RFC-1305 specification. The most useful ones include <tt>clock</tt>, which shows when the clock was last adjusted, and <tt>reftime</tt>, which shows when the server clock of <tt>refid</tt> was last adjusted. The <tt>version</tt>, <tt>processor</tt> and <tt>system</tt> values are very helpful when included in bug reports. The mean millisecond time offset (<tt>phase</tt>) and deviation (<tt>jitter</tt>) monitor the clock quality, while the mean PPM frequency offset (<tt>frequency</tt>) and deviation (<tt>stability</tt>) monitor the clock stability and serve as a useful diagnostic tool. It has been the experience of NTP operators over the years that these data represent useful environment and hardware alarms. If the motherboard fan freezes up or some hardware bit sticks, the system clock is usually the first to notice it.</p>
<p>Among the new variables added for NTP Version 4 are <tt>hostname</tt>, <tt>signature</tt>, <tt>flags</tt>, <tt>hostkey</tt>, <tt>refresh</tt> and <tt>cert</tt>, which are used for the Autokey public-key cryptography described on the <a href="authopt.html">Authentication Options</a> page. The numeric values show the filestamps, in NTP seconds, at which the associated media files were created. These are useful in diagnosing problems with cryptographic key consistency and ordering principles.</p>
<p>When nothing seems to happen in the <tt>pe</tt> billboard after some minutes, there may be a network problem.
One common network problem is an access controlled router on the path to the selected peer or an access controlled server using methods described on the <a href="accopt.html">Access Control Options</a> page. Another common problem is that the server is down or running in unsynchronized mode due to a local problem. Use the <tt>ntpq</tt> program to spy on the server variables in the same way you can spy on your own.</p>
<p>Normally, the daemon will adjust the local clock in small steps in such a way that system and user programs are unaware of its operation. The adjustment process operates continuously unless the apparent clock error exceeds the step threshold for a period longer than the stepout threshold, which for most Internet paths is a very rare event. If the event is simply an outlyer due to an occasional network delay spike, the correction is simply discarded; however, if the apparent time error persists for longer than the stepout threshold of about 17 minutes, the local clock is stepped or slewed to the new value as directed. This behavior is designed to resist errors due to severely congested network paths, as well as errors due to confused radio clocks upon the epoch of a leap second.</p>
<h4>Large Frequency Errors</h4>
<p>The frequency tolerance of computer clock oscillators can vary widely, which can put a strain on the daemon's ability to compensate for the intrinsic frequency error. While the daemon can handle frequency errors up to 500 parts-per-million (PPM), or 43 seconds per day, values much above 100 PPM reduce the headroom and increase the time to learn the particular value and record it in the <tt>ntp.drift</tt> file.
In extreme cases, before the particular oscillator frequency error has been determined, the residual system time offsets can sweep from one extreme to the other of the 128-ms tracking window, only for the behavior to repeat at 900-s intervals until the measurements have converged.</p>
<p>In order to determine if excessive frequency error is a problem, observe the nominal <tt>filtoffset</tt> values for a number of rounds and divide by the poll interval. If the result is something approaching 500 PPM, there is a good chance that NTP will not work properly until the frequency error is reduced by some means. A common cause is the hardware time-of-year (TOY) clock chip, which must be disabled when NTP disciplines the software clock. For some systems this can be done using the <tt><a href="tickadj.html">tickadj</a></tt> utility and the <tt>-s</tt> command line argument. For other systems this can be done using a command in the system startup file.</p>
<p>If the TOY chip is not the cause, the hardware clock frequency may simply be too slow or too fast. In some systems this might require tweaking a trimmer capacitor on the motherboard. For other systems the clock frequency can be adjusted in increments of 100 PPM using the <tt>tickadj</tt> utility and the <tt>-t</tt> command line argument. Note that <tt>tickadj</tt> alters certain kernel variables and, while the utility attempts to figure out an acceptable way to do this, there are many cases where <tt>tickadj</tt> is incompatible with a running kernel.</p>
<h4>Access Controls</h4>
<p>Provisions are included in <tt>ntpd</tt> for access controls which deflect unwanted traffic from selected hosts or networks. The controls described on the <a href="accopt.html">Access Control Options</a> page include detailed packet filter operations based on source address and address mask.
Normally, filtered packets are dropped without notice other than to increment tally counters. However, the server can be configured to send a "kiss-o'-death" (KOD) packet to the client either when explicitly configured or when cryptographic authentication fails for some reason. The client association is permanently disabled, the access denied bit (TEST4) is set in the flash variable and a message is sent to the system log.</p>
<p>The access control provisions include a limit on the packet rate from a host or network. If an incoming packet exceeds the limit, it is dropped and a KOD sent to the source. If this occurs after the client association has synchronized, the association is not disabled, but a message is sent to the system log. See the <a href="accopt.html">Access Control Options</a> page for further information.</p>
<h4>Large Delay Variations</h4>
<p>In some reported scenarios an access line may show low to moderate network delays during some period of the day and moderate to high delays during other periods. Often the delay on one direction of transmission dominates, which can result in large time offset errors, sometimes in the range up to a few seconds. It is not usually convenient to run <tt>ntpd</tt> throughout the day in such scenarios, since this could result in several time steps, especially if the condition persists for greater than the stepout threshold.</p>
<p>Specific provisions have been built into <tt>ntpd</tt> to cope with these problems. The scheme is called "huff-'n-puff" and is described on the <a href="miscopt.html">Miscellaneous Options</a> page. An alternative approach in such scenarios is first to calibrate the local clock frequency error by running <tt>ntpd</tt> in continuous mode during the quiet interval and let it write the frequency to the <tt>ntp.drift</tt> file.
Then, run <tt>ntpd -q</tt> from a cron job each day at some time in the quiet interval. In systems with the nanokernel or microkernel performance enhancements, including Solaris, Tru64, Linux and FreeBSD, the kernel continuously disciplines the frequency so that the residual correction produced by <tt>ntpd</tt> is usually less than a few milliseconds.</p>
<h4>Cryptographic Authentication</h4>
<p>Reliable source authentication requires the use of symmetric key or public key cryptography, as described on the <a href="authopt.html">Authentication Options</a> page. In symmetric key cryptography, servers and clients share session keys contained in a secret key file. In public key cryptography, which requires the OpenSSL software library, the server has a private key, never shared, and a public key with unrestricted distribution. The cryptographic media required are produced by the <a href="keygen.html"><tt>ntp-keygen</tt></a> program.</p>
<p>Problems with symmetric key authentication are usually due to mismatched keys or improper use of the <tt>trustedkey</tt> command. A simple way to check for problems is to use the trace facility, which is enabled using the <tt>ntpd -d</tt> command line. As each packet is received a trace line is displayed which shows the authentication status in the <tt>auth</tt> field. A status of 1 indicates the packet was successfully authenticated; otherwise it has failed.</p>
<p>A common misconception is the implication of the <tt>auth</tt> bit in the <tt>enable</tt> and <tt>disable</tt> commands.
<b>This bit does not affect authentication in any way other than to enable or disable mobilization of a new persistent association in broadcast/multicast client, manycast client or symmetric passive modes.</b> If enabled, which is the default, these associations require authentication; if not, an association is mobilized even if not authenticated. Users are cautioned that running with authentication disabled is very dangerous, since an intruder can easily strike up an association and inject false time values.</p>
<p>Public key cryptography is supported in NTPv4 using the Autokey protocol, which is described in briefings on the NTP Project page linked from www.ntp.org. Development of this protocol is mature and the <tt>ntpd</tt> implementation is basically complete. Autokey version 2, which is the latest and current version, includes provisions to hike certificate trails, operate as certificate authorities and verify identity using challenge/response identification schemes. Further details of the protocol are on the <a href="authopt.html">Authentication Options</a> page. Common problems with configuration and key generation are mismatched key files, broken links and a missing or broken random seed file.</p>
<p>As in the symmetric key cryptography case, the trace facility is a good way to verify correct operation. A statistics file <tt>cryptostats</tt> records protocol transactions and error messages. The daemon requires a random seed file, public/private key file and a valid certificate file; otherwise it exits immediately with a message to the system log. As each file is loaded a trace message appears with its filestamp. There are a number of checks to ensure that only consistent data are used and that the certificate is valid. When the protocol is in operation a number of checks are done to verify the server has the expected credentials and its filestamps and timestamps are consistent.
Errors found are reported using NTP control and monitoring protocol traps with extended trap codes shown on the Authentication Options page.</p>
  <p>To assist debugging, every NTP extension field is displayed in the trace along with the Autokey operation code. Every extension field carrying a verified signature is identified and displayed along with filestamp and timestamp where meaningful. In all except broadcast/multicast client mode, correct operation of the protocol is confirmed by the absence of extension fields and an <tt>auth</tt> value of one. It is normal in broadcast/multicast client mode for the broadcast server to use one extension field to show the host name, status word and association ID.</p>
  <h4>Debugging Checklist</h4>
  <p>If the <tt>ntpq</tt> or <tt>ntpdc</tt> programs do not show that messages are being received by the daemon, or if received messages do not result in correct synchronization, verify the following:</p>
  <ol>
   <li>Verify that the <tt>/etc/services</tt> file on the host machine is configured to accept UDP packets on the NTP port 123. NTP is specifically designed to use UDP and does not respond to TCP.
   <li>Check the system log for <tt>ntpd</tt> messages about configuration errors, name-lookup failures or initialization problems. Common system log messages are summarized on the <a href="msyslog.html"><tt>ntpd</tt> System Log Messages</a> page. Check to be sure that only one copy of <tt>ntpd</tt> is running.
   <li>Verify using <tt>ping</tt> or another utility that packets actually do make the round trip between the client and server. Verify using <tt>nslookup</tt> or another utility that the DNS server names do exist and resolve to valid Internet addresses.

   <li>Check that the remote NTP server is up and running.
The usual evidence that it is not is a <tt>Connection refused</tt> message.
   <li>Using the <tt>ntpdc</tt> program, verify that the packets received and packets sent counters are incrementing. If the sent counter does not increment and the configuration file includes configured servers, something may be wrong in the host network or interface configuration. If the sent counter does increment but the received counter does not, something may be wrong in the network, the server NTP daemon may not be running, or the server itself may be down or not responding.
   <li>If both the sent and received counters do increment, but the <tt>reach</tt> values in the <tt>ntpq</tt> <tt>pe</tt> billboard continue to show zero, received packets are probably being discarded for some reason. If this is the case, the cause should be evident from the <tt>flash</tt> variable as discussed above and on the <tt>ntpq</tt> page. It could be that the server has disabled access for the client address, in which case the refid field in the <tt>ntpq pe</tt> billboard will show a kiss code. See earlier on this page for a list of kiss codes and their meaning.
   <li>If the <tt>reach</tt> values in the <tt>pe</tt> billboard show the servers are alive and responding, note the tattletale symbols at the left margin, which indicate the status of each server resulting from the various grooming and mitigation algorithms. The interpretation of these symbols is discussed on the <tt>ntpq</tt> page. After a few minutes of operation, one or another of the reachable server candidates should show a * tattletale symbol. If this doesn't happen, the intersection algorithm, which classifies the servers as truechimers or falsetickers, may be unable to find a majority of truechimers among the server population.
   <li>If all else fails, see the FAQ and/or the discussion and briefings at the NTP Project page.
  </ol>
  <hr>
  <script type="text/javascript" language="javascript" src="scripts/footer.txt"></script>
 </body>

</html>