9 -------------------------------------------------------------------------------
12 SoftFloat is a software implementation of floating-point that conforms to
13 the IEC/IEEE Standard for Binary Floating-Point Arithmetic. As many as four
14 formats are supported: single precision, double precision, extended double
15 precision, and quadruple precision. All operations required by the standard
20 IEC/IEEE Floating-Point Standard. Details about the standard are available
24 -------------------------------------------------------------------------------
28 SoftFloat header files assume an ISO/ANSI-style C compiler. No attempt
29 has been made to accommodate compilers that are not ISO-conformant. In
33 Support for the extended double-precision and quadruple-precision formats
34 depends on a C compiler that implements 64-bit integer arithmetic. If the
36 limited to only single and double precisions. When that is the case, all
37 references in this document to the extended double precision, quadruple
38 precision, and 64-bit integers should be ignored.
41 -------------------------------------------------------------------------------
50 Extended Double-Precision Rounding Precision
56 Round-to-Integer Functions
59 Raise-Exception Function
64 -------------------------------------------------------------------------------
70 provided by the National Science Foundation under grant MIP-9311980. The
72 a fixed-point vector processor in collaboration with the University of
82 -------------------------------------------------------------------------------
85 When 64-bit integers are supported by the compiler, the `softfloat.h' header
86 file defines four types: `float32' (single precision), `float64' (double
87 precision), `floatx80' (extended double precision), and `float128'
88 (quadruple precision). The `float32' and `float64' types are defined in
89 terms of 32-bit and 64-bit integer types, respectively, while the `float128'
90 type is defined as a structure of two 64-bit integers, taking into account
92 is defined as a structure containing one 16-bit and one 64-bit integer, with
96 When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
98 ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
100 `float64' type is defined as a structure of two 32-bit integers, with the
104 implements the usual C `float' and `double' types according to the IEC/IEEE
106 in memory from the native `float' and `double' types. (On the other hand,
109 native `float' and `double' types.)
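The phrase ``defined in terms of 64-bit integer types'' can be made concrete. The sketch below models the SoftFloat types as described above and pulls the sign, exponent, and fraction fields out of a double-precision bit pattern. All names ending in `_sketch', and the helper functions, are hypothetical and are not taken from `softfloat.h'. The struct field order ignores the byte-order handling the header performs, and `float64_from_native' assumes the native `double' uses the IEC/IEEE double-precision layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the header's types. The struct field order
   ignores the byte-order adjustments the real header makes. */
typedef uint32_t float32_sketch;
typedef uint64_t float64_sketch;
typedef struct { uint16_t high; uint64_t low; } floatx80_sketch;
typedef struct { uint64_t high, low; } float128_sketch;

/* Copy a native double's bytes into a float64_sketch. Assumes the native
   `double' uses the IEC/IEEE double-precision layout. */
static float64_sketch float64_from_native(double d)
{
    float64_sketch bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Double-precision layout: 1 sign bit, 11 exponent bits, 52 fraction bits. */
static unsigned float64_sign_bit(float64_sketch a)  { return (unsigned)(a >> 63); }
static unsigned float64_exp_field(float64_sketch a) { return (unsigned)((a >> 52) & 0x7FF); }
static uint64_t float64_frac_field(float64_sketch a)
{
    return a & UINT64_C(0x000FFFFFFFFFFFFF);
}
```

Because `float64_sketch' is an ordinary integer type, C operators such as `+' act on the bit patterns rather than on the represented values, so arithmetic on such types has to go through function calls.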
113 -- Conversions among all the floating-point formats, and also between
114 integers (32-bit and 64-bit) and any of the floating-point formats.
116 -- The usual add, subtract, multiply, divide, and square root operations
117 for all floating-point formats.
119 -- For each format, the floating-point remainder operation defined by the
122 -- For each floating-point format, a ``round to integer'' operation that
123 rounds to the nearest integer value in the same format. (The floating-
126 -- Comparisons between two values in the same floating-point format.
132 -------------------------------------------------------------------------------
143 -------------------------------------------------------------------------------
144 Extended Double-Precision Rounding Precision
146 For extended double precision (`floatx80') only, the rounding precision
153 operations are rounded (as usual) to the full precision of the extended
154 double-precision format. Setting `floatx80_rounding_precision' to 32
155 or to 64 causes the operations listed to be rounded to reduced precision
156 equivalent to single precision (`float32') or to double precision
157 (`float64'), respectively. When rounding to reduced precision, additional
164 -------------------------------------------------------------------------------
179 where `<exception>' is the appropriate name. To raise a floating-point
192 -------------------------------------------------------------------------------
195 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
198 All conversions among the floating-point formats are supported, as are all
199 conversions between a floating-point format and 32-bit and 64-bit signed
218 returns one result. Conversions from a smaller to a larger floating-point
219 format are always exact and so require no rounding. Conversions from 32-bit
220 integers to double precision and larger formats are also exact, and likewise
221 for conversions from 64-bit integers to extended double and quadruple
224 Conversions from floating-point to integer raise the invalid exception if
226 size (32 or 64 bits). If the floating-point operand is a NaN, the largest
230 On conversions to integer, if the floating-point operand is not already an
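The exactness claims for integer conversions can be checked with native C types standing in for SoftFloat's. This assumes `double' is IEC/IEEE double precision and that conversions are not evaluated with extra precision. A conversion is exact precisely when converting back returns the original integer. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumes `double' is IEC/IEEE double precision and that conversions are
   not carried out with extra precision (true on x86-64 with SSE2, not
   guaranteed with x87 arithmetic). */

/* Every 32-bit integer fits in double's 53-bit significand. */
static int int32_converts_exactly(int32_t v)
{
    double d = (double)v;
    return (int32_t)d == v;
}

/* A 64-bit integer may need more than 53 significant bits. */
static int int64_converts_exactly(int64_t v)
{
    double d = (double)v;
    /* v near INT64_MAX can round up to 2^63, which int64_t cannot hold;
       converting that back would be undefined, and it is inexact anyway. */
    if (d >= 9223372036854775808.0)
        return 0;
    return (int64_t)d == v;
}
```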
244 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
257 Rounding of the extended double-precision (`floatx80') functions is affected
259 section _Extended_Double-Precision_Rounding_Precision_.
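Rounding to a reduced precision amounts to clearing low-order significand bits under the usual rounding rule. Precision 32 keeps 24 significant bits, and precision 64 keeps 53. The sketch below assumes round-to-nearest-even and a 64-bit significand with an explicit leading bit, as in the x87-style extended format. `round_significand' is a hypothetical helper, not a SoftFloat function, and it does not model the exponent or the exception flags.

```c
#include <stdint.h>

/* Round a 64-bit significand (explicit leading bit assumed) to `bits'
   significant bits, 1 <= bits <= 64, using round-to-nearest-even. The
   kept bits stay in place and the discarded low bits become zero. If
   rounding overflows the significand, *carry is set, the result is
   0x8000000000000000, and the caller would add one to the exponent. */
static uint64_t round_significand(uint64_t sig, int bits, int *carry)
{
    uint64_t unit, half, discarded, kept;

    *carry = 0;
    if (bits >= 64)
        return sig;                         /* full precision: nothing to drop */

    unit = UINT64_C(1) << (64 - bits);      /* weight of the lowest kept bit */
    half = unit >> 1;
    discarded = sig & (unit - 1);
    kept = sig & ~(unit - 1);

    /* Round up when above halfway, or exactly halfway with an odd kept part. */
    if (discarded > half || (discarded == half && (kept & unit) != 0)) {
        kept += unit;
        if (kept == 0) {                    /* wrapped past 2^64 */
            *carry = 1;
            kept = UINT64_C(1) << 63;
        }
    }
    return kept;
}
```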
261 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
274 the value x - n*y, where n is the integer closest to x/y. If x/y is exactly
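The remainder definition can be illustrated on integer operands, where every step is exact. Under the IEC/IEEE rule, a quotient that falls exactly halfway between two integers selects the even one. `nearest_remainder' is a hypothetical name. The sketch assumes y is nonzero and both operands are far from the limits of `int64_t'.

```c
#include <stdint.h>

/* x - n*y, where n is the integer nearest x/y and ties go to the even n.
   Assumes y != 0 and |x|, |y| well below INT64_MAX / 2. */
static int64_t nearest_remainder(int64_t x, int64_t y)
{
    int64_t n = x / y;              /* C division truncates toward zero */
    int64_t r = x - n * y;          /* |r| < |y|, sign of x (or zero) */
    int64_t twice_abs_r = 2 * (r < 0 ? -r : r);
    int64_t abs_y = y < 0 ? -y : y;

    /* Step n one unit away from zero when the fractional part of x/y
       exceeds one half in magnitude, or equals one half and n is odd. */
    if (twice_abs_r > abs_y || (twice_abs_r == abs_y && n % 2 != 0)) {
        n += ((x < 0) == (y < 0)) ? 1 : -1;
        r = x - n * y;
    }
    return r;
}
```

For native `double' operands, the C99 `remainder' function in <math.h> computes this same IEC 60559 remainder. It differs from `fmod', whose result always has the sign of x because its n is x/y truncated toward zero.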
283 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
284 Round-to-Integer Functions
286 For each format, SoftFloat implements the round-to-integer function
294 Each function takes a single floating-point operand and returns a result of
297 the resulting integer value is returned in the same floating-point format.
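Here is a minimal sketch of round-to-integer for native doubles. It assumes IEC/IEEE double arithmetic evaluated in double precision and the default round-to-nearest-even mode: adding and then subtracting 2^52 pushes the fraction bits out of the significand. The rounding direction is fixed here instead of being taken from a rounding-mode setting, and `round_to_integer_even' is a hypothetical name. As described above, the result stays in the floating-point format.

```c
/* Round to the nearest integer value, ties to even, returning a double.
   Assumes IEC/IEEE doubles, no extra evaluation precision, the default
   round-to-nearest-even mode, and no -ffast-math style reassociation. */
static double round_to_integer_even(double x)
{
    const double two52 = 4503599627370496.0;    /* 2^52 */
    double r;

    /* NaN, infinity, zero, and |x| >= 2^52 (already integral) pass through. */
    if (!(x > -two52 && x < two52) || x == 0.0)
        return x;

    if (x > 0.0)
        r = (x + two52) - two52;
    else
        r = (x - two52) + two52;

    if (r == 0.0 && x < 0.0)
        r = -0.0;                               /* keep the operand's sign */
    return r;
}
```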
299 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
302 The following floating-point comparison functions are provided:
314 The standard greater-than (>), greater-than-or-equal (>=), and not-equal
316 not-equal function is just the logical complement of the equal function.
317 The greater-than-or-equal function is identical to the less-than-or-equal
318 function with the operands reversed; and the greater-than function can be
319 obtained from the less-than function in the same way.
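The derivations just described can be written out directly. In the sketch below, native doubles stand in for a SoftFloat format, and the `cmp_' names are hypothetical. The three primitives are false whenever either operand is a NaN. As a result, `cmp_ne' is true for a NaN operand, while `cmp_ge' and `cmp_gt' are false, like the primitives they are built from.

```c
#include <stdbool.h>

/* Primitives: equal, less-than-or-equal, less-than. With IEC/IEEE doubles
   each is false when either operand is a NaN. */
static bool cmp_eq(double a, double b) { return a == b; }
static bool cmp_le(double a, double b) { return a <= b; }
static bool cmp_lt(double a, double b) { return a < b; }

/* Derived relations, as described in the text. */
static bool cmp_ne(double a, double b) { return !cmp_eq(a, b); }  /* complement of equal */
static bool cmp_ge(double a, double b) { return cmp_le(b, a); }   /* operands reversed */
static bool cmp_gt(double a, double b) { return cmp_lt(b, a); }   /* operands reversed */
```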
321 The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
337 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
340 The following functions test whether a floating-point value is a signaling
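For double precision, a signaling-NaN test can be sketched on the raw bit pattern, so the value never passes through hardware that might quiet it. The sketch assumes the common IEC/IEEE convention: the most significant fraction bit is clear for a signaling NaN and set for a quiet one. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* NaN: exponent field all ones and a nonzero fraction. */
static bool f64_bits_is_nan(uint64_t b)
{
    return ((b >> 52) & 0x7FF) == 0x7FF
        && (b & UINT64_C(0x000FFFFFFFFFFFFF)) != 0;
}

/* Signaling NaN: a NaN whose most significant fraction bit (bit 51) is
   clear, under the convention assumed above. */
static bool f64_bits_is_signaling_nan(uint64_t b)
{
    return f64_bits_is_nan(b) && ((b >> 51) & 1) == 0;
}
```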
351 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
352 Raise-Exception Function
354 SoftFloat provides a function for raising floating-point exceptions:
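The following is a minimal model of raising exceptions, not the declarations from `softfloat.h'. It assumes each exception is one bit in a sticky flag word that stays set until the program clears it. The flag names, bit values, `raise_flags', and `clear_flags' are all hypothetical.

```c
/* Hypothetical flag bits, one per IEC/IEEE exception. */
enum {
    FLAG_INEXACT   = 0x01,
    FLAG_UNDERFLOW = 0x02,
    FLAG_OVERFLOW  = 0x04,
    FLAG_DIVBYZERO = 0x08,
    FLAG_INVALID   = 0x10
};

/* Sticky flags: raising sets bits, and they stay set until cleared. */
static int exception_flags = 0;

/* Record one or more exceptions by OR-ing them into the sticky flags.
   A fuller implementation could also trap here; this only records. */
static void raise_flags(int flags)
{
    exception_flags |= flags;
}

/* Clear the given flags, as a program might after inspecting them. */
static void clear_flags(int flags)
{
    exception_flags &= ~flags;
}
```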
362 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
365 -------------------------------------------------------------------------------
368 At the time of this writing, the most up-to-date information about