.\" Copyright (c) 1985 Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd January 26, 2005
.Dt IEEE 3
.Os
.Sh NAME
.Nm ieee
.Nd IEEE standard 754 for floating-point arithmetic
.Sh DESCRIPTION
The IEEE Standard 754 for Binary Floating-Point Arithmetic
defines representations of floating-point numbers and abstract
properties of arithmetic operations relating to precision,
rounding, and exceptional cases, as described below.
.Ss IEEE STANDARD 754 Floating-Point Arithmetic
Radix: Binary.
.Pp
Overflow and underflow:
.Bd -ragged -offset indent -compact
Overflow goes by default to a signed \*(If.
Underflow is
.Em gradual .
.Ed
.Pp
Zero is represented ambiguously as +0 or \-0.
.Bd -ragged -offset indent -compact
Its sign transforms correctly through multiplication or
division, and is preserved by addition of zeros
with like signs; but x\-x yields +0 for every
finite x.
The only operations that reveal zero's
sign are division by zero and
.Fn copysign x \(+-0 .
In particular, comparison (x > y, x \(>= y, etc.)\&
cannot be affected by the sign of zero; but if
finite x = y then \*(If = 1/(x\-y) \(!= \-1/(y\-x) = \-\*(If.
.Ed
.Pp
Infinity is signed.
.Bd -ragged -offset indent -compact
It persists when added to itself
or to any finite number.
Its sign transforms
correctly through multiplication and division, and
(finite)/\(+-\*(If\0=\0\(+-0
(nonzero)/0 = \(+-\*(If.
But
\*(If\-\*(If, \*(If\(**0 and \*(If/\*(If
are, like 0/0 and sqrt(\-3),
invalid operations that produce \*(Na. ...
.Ed
.Pp
Reserved operands (\*(Nas):
.Bd -ragged -offset indent -compact
An \*(Na is
.Em ( N Ns ot Em a N Ns umber ) .
Some \*(Nas, called Signaling \*(Nas, trap any floating-point operation
performed upon them; they are used to mark missing
or uninitialized values, or nonexistent elements
of arrays.
The rest are Quiet \*(Nas; they are
the default results of Invalid Operations, and
propagate through subsequent arithmetic operations.
If x \(!= x then x is \*(Na; every other predicate
(x > y, x = y, x < y, ...) is FALSE if \*(Na is involved.
.Ed
.Pp
Rounding:
.Bd -ragged -offset indent -compact
Every algebraic operation (+, \-, \(**, /,
\(sr)
is rounded by default to within half an
.Em ulp ,
and when the rounding error is exactly half an
.Em ulp
then
the rounded value's least significant bit is zero.
(An
.Em ulp
is one
.Em U Ns nit
in the
.Em L Ns ast
.Em P Ns lace . )
This kind of rounding is usually the best kind,
sometimes provably so; for instance, for every
x = 1.0, 2.0, 3.0, 4.0, ..., 2.0**52, we find
(x/3.0)\(**3.0 == x and (x/10.0)\(**10.0 == x and ...
even though both the quotients and the products
have been rounded.
Only rounding like IEEE 754 can do that.
But no single kind of rounding can be
proved best for every circumstance, so IEEE 754
provides rounding towards zero or towards
+\*(If or towards \-\*(If
at the programmer's option.
.Ed
.Pp
Exceptions:
.Bd -ragged -offset indent -compact
IEEE 754 recognizes five kinds of floating-point exceptions,
listed below in declining order of probable importance.
.Bl -column -offset indent "Invalid Operation" "Gradual Underflow"
.It Em Exception Ta Em "Default Result"
.It Invalid Operation Ta \*(Na, or FALSE
.It Overflow Ta \(+-\*(If
.It Divide by Zero Ta \(+-\*(If
.It Underflow Ta Gradual Underflow
.It Inexact Ta Rounded value
.El
.Pp
NOTE: An Exception is not an Error unless handled
badly.
What makes a class of exceptions exceptional
is that no single default response can be satisfactory
in every instance.
On the other hand, if a default
response will serve most instances satisfactorily,
the unsatisfactory instances cannot justify aborting
computation every time the exception occurs.
.Ed
.Ss Data Formats
Single-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt float
.Pp
Wordsize: 32 bits.
.Pp
Precision: 24 significant bits,
roughly like 7 significant decimals.
.Pp
If x and x' are consecutive positive single-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bl -column "XXX" -compact
5.9e\-08 < 0.5**24 < (x'\-x)/x \(<= 0.5**23 < 1.2e\-07.
.El
.Pp
.Bl -column "XXX" -compact
Range:	Overflow threshold  = 2.0**128 = 3.4e38
	Underflow threshold = 0.5**126 = 1.2e\-38
.El
.Pp
Underflowed results round to the nearest
integer multiple of
.Bl -column "XXX" -compact
0.5**149 = 1.4e\-45.
.El
.Ed
.Pp
Double-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt double
.Po On some architectures,
.Vt long double
is the same as
.Vt double
.Pc
.Pp
Wordsize: 64 bits.
.Pp
Precision: 53 significant bits,
roughly like 16 significant decimals.
.Pp
If x and x' are consecutive positive double-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bl -column "XXX" -compact
1.1e\-16 < 0.5**53 < (x'\-x)/x \(<= 0.5**52 < 2.3e\-16.
.El
.Pp
.Bl -column "XXX" -compact
Range:	Overflow threshold  = 2.0**1024 = 1.8e308
	Underflow threshold = 0.5**1022 = 2.2e\-308
.El
.Pp
Underflowed results round to the nearest
integer multiple of
.Bl -column "XXX" -compact
0.5**1074 = 4.9e\-324.
.El
.Ed
.Pp
Extended-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt long double
(when supported by the hardware)
.Pp
Wordsize: 96 bits.
.Pp
Precision: 64 significant bits,
roughly like 19 significant decimals.
.Pp
If x and x' are consecutive positive extended-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bl -column "XXX" -compact
5.4e\-20 < 0.5**64 < (x'\-x)/x \(<= 0.5**63 < 1.1e\-19.
.El
.Pp
.Bl -column "XXX" -compact
Range:	Overflow threshold  = 2.0**16384 = 1.2e4932
	Underflow threshold = 0.5**16382 = 3.4e\-4932
.El
.Pp
Underflowed results round to the nearest
integer multiple of
.Bl -column "XXX" -compact
0.5**16445 = 3.6e\-4951.
.El
.Ed
.Pp
Quad-extended-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt long double
(when supported by the hardware)
.Pp
Wordsize: 128 bits.
.Pp
Precision: 113 significant bits,
roughly like 34 significant decimals.
.Pp
If x and x' are consecutive positive quad-extended-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bl -column "XXX" -compact
9.6e\-35 < 0.5**113 < (x'\-x)/x \(<= 0.5**112 < 2.0e\-34.
.El
.Pp
.Bl -column "XXX" -compact
Range:	Overflow threshold  = 2.0**16384 = 1.2e4932
	Underflow threshold = 0.5**16382 = 3.4e\-4932
.El
.Pp
Underflowed results round to the nearest
integer multiple of
.Bl -column "XXX" -compact
0.5**16494 = 6.5e\-4966.
.El
.Ed
.Ss Additional Information Regarding Exceptions
For each kind of floating-point exception, IEEE 754
provides a Flag that is raised each time its exception
is signaled, and stays raised until the program resets
it.
Programs may also test, save and restore a flag.
Thus, IEEE 754 provides three ways by which programs
may cope with exceptions for which the default result
might be unsatisfactory:
.Bl -enum
.It
Test for a condition that might cause an exception
later, and branch to avoid the exception.
.It
Test a flag to see whether an exception has occurred
since the program last reset its flag.
.It
Test a result to see whether it is a value that only
an exception could have produced.
.Pp
CAUTION: The only reliable ways to discover
whether Underflow has occurred are to test whether
products or quotients lie closer to zero than the
underflow threshold, or to test the Underflow
flag.
(Sums and differences cannot underflow in
IEEE 754; if x \(!= y then x\-y is correct to
full precision and certainly nonzero regardless of
how tiny it may be.)
Products and quotients that
underflow gradually can lose accuracy gradually
without vanishing, so comparing them with zero
(as one might on a VAX) will not reveal the loss.
Fortunately, if a gradually underflowed value is
destined to be added to something bigger than the
underflow threshold, as is almost always the case,
digits lost to gradual underflow will not be missed
because they would have been rounded off anyway.
So gradual underflows are usually
.Em provably
ignorable.
The same cannot be said of underflows flushed to 0.
.El
.Pp
At the option of an implementor conforming to IEEE 754,
other ways to cope with exceptions may be provided:
.Bl -enum
.It
ABORT.
This mechanism classifies an exception in
advance as an incident to be handled by means
traditionally associated with error-handling
statements like "ON ERROR GO TO ...".
Different
languages offer different forms of this statement,
but most share the following characteristics:
.Bl -dash
.It
No means is provided to substitute a value for
the offending operation's result and resume
computation from what may be the middle of an
expression.
An exceptional result is abandoned.
.It
In a subprogram that lacks an error-handling
statement, an exception causes the subprogram to
abort within whatever program called it, and so
on back up the chain of calling subprograms until
an error-handling statement is encountered or the
whole task is aborted and memory is dumped.
.El
.It
STOP.
This mechanism, requiring an interactive
debugging environment, is more for the programmer
than the program.
It classifies an exception in
advance as a symptom of a programmer's error; the
exception suspends execution as near as it can to
the offending operation so that the programmer can
look around to see how it happened.
Quite often
the first several exceptions turn out to be quite
unexceptionable, so the programmer ought ideally
to be able to resume execution after each one as if
execution had not been stopped.
.It
\&... Other ways lie beyond the scope of this document.
.El
.Pp
Ideally, each
elementary function should act as if it were indivisible, or
atomic, in the sense that ...
.Bl -enum
.It
No exception should be signaled that is not deserved by
the data supplied to that function.
.It
Any exception signaled should be identified with that
function rather than with one of its subroutines.
.It
The internal behavior of an atomic function should not
be disrupted when a calling program changes from
one to another of the five or so ways of handling
exceptions listed above, although the definition
of the function may be correlated intentionally
with exception handling.
.El
.Pp
The functions in
.Nm libm
are only approximately atomic.
They signal no inappropriate exception except possibly ...
.Bl -tag -width indent -offset indent -compact
.It Xo
Over/Underflow
.Xc
when a result, if properly computed, might have lain barely within range, and
.It Xo
Inexact in
.Fn cabs ,
.Fn cbrt ,
.Fn hypot ,
.Fn log10
and
.Fn pow
.Xc
when it happens to be exact, thanks to fortuitous cancellation of errors.
.El
Otherwise, ...
.Bl -tag -width indent -offset indent -compact
.It Xo
Invalid Operation is signaled only when
.Xc
any result but \*(Na would probably be misleading.
.It Xo
Overflow is signaled only when
.Xc
the exact result would be finite but beyond the overflow threshold.
.It Xo
Divide-by-Zero is signaled only when
.Xc
a function takes exactly infinite values at finite operands.
.It Xo
Underflow is signaled only when
.Xc
the exact result would be nonzero but tinier than the underflow threshold.
.It Xo
Inexact is signaled only when
.Xc
greater range or precision would be needed to represent the exact result.
.El
.Sh SEE ALSO
.Xr fenv 3 ,
.Xr ieee_test 3 ,
.Xr math 3
.Pp
An explanation of IEEE 754 and its proposed extension p854
was published in the IEEE magazine MICRO in August 1984 under
the title "A Proposed Radix- and Word-length-independent
Standard for Floating-point Arithmetic" by
.An "W. J. Cody"
et al.
The manuals for Pascal, C and BASIC on the Apple Macintosh
document the features of IEEE 754 pretty well.
Articles in the IEEE magazine COMPUTER vol.\& 14 no.\& 3 (Mar.\&
1981), and in the ACM SIGNUM Newsletter Special Issue of
Oct.\& 1979, may be helpful although they pertain to
superseded drafts of the standard.
.Sh STANDARDS
.St -ieee754
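.Sh EXAMPLES
The rules given in DESCRIPTION for signed zero, signed Infinity and NaN
can be checked with a few lines of C.
What follows is only a minimal sketch, assuming a C99 compiler, an
IEEE 754 double and the default round-to-nearest mode.
The helper names is_negative_zero, is_nan_by_compare, reciprocal and
ieee_demo_check are hypothetical and are not part of libm.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true only for a zero whose sign bit is set. */
static bool is_negative_zero(double x)
{
    return x == 0.0 && signbit(x);
}

/* Hypothetical helper: x != x holds for a NaN and for nothing else. */
static bool is_nan_by_compare(double x)
{
    return x != x;
}

/* Hypothetical helper: 1/x turns the sign of a zero into a signed Infinity. */
static double reciprocal(double x)
{
    volatile double one = 1.0;  /* volatile keeps the division at run time */
    return one / x;
}

/* Hypothetical self-check of the properties described in DESCRIPTION. */
static void ieee_demo_check(void)
{
    double x = 3.5;

    assert(-0.0 == 0.0);                  /* comparison ignores zero's sign */
    assert(!is_negative_zero(x - x));     /* x - x is +0 for finite x */
    assert(reciprocal(0.0) == INFINITY);  /* division by zero reveals the sign */
    assert(reciprocal(-0.0) == -INFINITY);
    assert(is_nan_by_compare(INFINITY - INFINITY)); /* invalid operation */
    assert(!(NAN < 1.0));                 /* ordered predicates are false */
}
```

Calling ieee_demo_check() from main() should return silently on a
conforming system; a failed assertion would suggest a non-default
floating-point environment, such as flush-to-zero or fast-math
compilation.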