softfloat.c - OpenGrok cross reference for /linux/arch/arm/nwfpe/softfloat.c

4 This C source file is part of the SoftFloat IEC/IEEE Floating-point
10 National Science Foundation under grant MIP-9311980.  The original version
11 of this code was written as part of a project to build a fixed-point vector
15 http://www.jhauser.us/arithmetic/SoftFloat-2b/SoftFloat-source.txt
18 has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
19 TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
25 include prominent notice akin to these three paragraphs for those parts of
38 -------------------------------------------------------------------------------
39 Primitive arithmetic functions, including multi-word arithmetic, and
40 division and square root approximations.  (Can be specialized to target if
42 -------------------------------------------------------------------------------
44 #include "softfloat-macros"
47 -------------------------------------------------------------------------------
48 Functions and definitions to determine:  (1) whether tininess for underflow
52 are propagated from function inputs to output.  These details are target-
54 -------------------------------------------------------------------------------
56 #include "softfloat-specialize"
59 -------------------------------------------------------------------------------
60 Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
61 and 7, and returns the properly rounded 32-bit integer corresponding to the
63 to an integer.  Bit 63 of `absZ' must be zero.  Ordinarily, the fixed-point
64 input is simply rounded to an integer, with the inexact exception raised if
65 the input cannot be represented exactly as an integer.  If the fixed-point
68 -------------------------------------------------------------------------------
77     roundingMode = roundData->mode;  in roundAndPackInt32()
98     if ( zSign ) z = - z;  in roundAndPackInt32()
100         roundData->exception |= float_flag_invalid;  in roundAndPackInt32()
103     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackInt32()
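The rounding described above can be sketched in portable C. This is a minimal sketch of the round-to-nearest-even case only, assuming `absZ` carries 7 fraction bits as the comment states. The function name `round_pack_int32_rne` and its `inexact`/`invalid` out-parameters are hypothetical stand-ins for `roundAndPackInt32` and `roundData->exception`, and the directed rounding modes are left out. Instead of negating first and checking the sign afterwards, the sketch uses an explicit range check so it does not depend on signed wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round-to-nearest-even path only (hypothetical helper).  absZ is a
   fixed-point magnitude with 7 fraction bits (binary point between
   bits 6 and 7); bit 63 must be zero. */
static int32_t round_pack_int32_rne(bool zSign, uint64_t absZ,
                                    bool *inexact, bool *invalid)
{
    assert((absZ >> 63) == 0);
    unsigned roundBits = (unsigned)(absZ & 0x7F); /* fraction bits to drop */
    absZ = (absZ + 0x40) >> 7;                    /* add one half, truncate */
    if (roundBits == 0x40)
        absZ &= ~(uint64_t)1;                     /* exact tie: make it even */

    *invalid = false;
    *inexact = false;
    /* Magnitude must fit: 2^31 - 1 for positive, 2^31 for negative. */
    if (absZ > (zSign ? UINT64_C(0x80000000) : UINT64_C(0x7FFFFFFF))) {
        *invalid = true;
        return zSign ? INT32_MIN : INT32_MAX;
    }
    *inexact = roundBits != 0;
    if (zSign)
        return absZ == UINT64_C(0x80000000) ? INT32_MIN : -(int32_t)absZ;
    return (int32_t)absZ;
}
```

With 7 fraction bits, 2.5 is passed as 320 (2.5 × 128) and rounds to 2, while 3.5 (448) rounds to 4; both raise inexact.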
109 -------------------------------------------------------------------------------
110 Returns the fraction bits of the single-precision floating-point value `a'.
111 -------------------------------------------------------------------------------
121 -------------------------------------------------------------------------------
122 Returns the exponent bits of the single-precision floating-point value `a'.
123 -------------------------------------------------------------------------------
133 -------------------------------------------------------------------------------
134 Returns the sign bit of the single-precision floating-point value `a'.
135 -------------------------------------------------------------------------------
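The three extraction helpers amount to fixed masks and shifts over the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits). A minimal sketch, assuming `uint32_t` stands in for the file's `float32` type and that the host `float` is binary32, so a value's bits can be copied out with `memcpy`; all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes the host float is IEEE 754 binary32. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* uint32_t stands in for float32 (a raw bit pattern). */
static uint32_t f32_frac(uint32_t a) { return a & 0x007FFFFF; }            /* 23 fraction bits */
static int      f32_exp(uint32_t a)  { return (int)((a >> 23) & 0xFF); }   /* biased exponent  */
static int      f32_sign(uint32_t a) { return (int)(a >> 31); }            /* 1 = negative     */

/* Copy a host float's bits without violating aliasing rules. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```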
147 -------------------------------------------------------------------------------
148 Normalizes the subnormal single-precision floating-point value represented
150 significand are stored at the locations pointed to by `zExpPtr' and
152 -------------------------------------------------------------------------------
159     shiftCount = countLeadingZeros32( aSig ) - 8;  in normalizeFloat32Subnormal()
161     *zExpPtr = 1 - shiftCount;  in normalizeFloat32Subnormal()
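Normalization shifts the fraction left until its leading 1 reaches bit 23, the position of the implicit bit, and lowers the exponent by the same amount. The exponent starts from 1 because a subnormal behaves as if its exponent field were 1. A sketch under stated assumptions: `clz32` is a hypothetical portable stand-in for `countLeadingZeros32`, and the function name is illustrative. The normalized pair satisfies zSig · 2^(zExp − 150) = aSig · 2^(−149), the subnormal's value.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a nonzero 32-bit value. */
static int clz32(uint32_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 0x80000000u)) { x <<= 1; ++n; }
    return n;
}

/* Move the leading 1 of a subnormal fraction to bit 23 and
   compensate in the exponent (hypothetical helper). */
static void normalize_f32_subnormal(uint32_t aSig, int *zExp, uint32_t *zSig)
{
    int shiftCount = clz32(aSig) - 8;   /* 8 = 31 - 23 */
    *zSig = aSig << shiftCount;
    *zExp = 1 - shiftCount;
}
```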
166 -------------------------------------------------------------------------------
168 single-precision floating-point value, returning the result.  After being
170 together to form the result.  This means that any integer portion of `zSig'
172 will have an integer portion equal to 1, the `zExp' input should be 1 less
175 -------------------------------------------------------------------------------
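Because the fields are combined with `+` rather than `|`, a significand whose integer bit (bit 23) is set adds 1 to the exponent field. That is why callers pass `zExp` one less than the intended exponent. A hypothetical sketch of this packing behaviour (the routine is called as `packFloat32` elsewhere in this listing) on raw `uint32_t` bits:

```c
#include <assert.h>
#include <stdint.h>

/* Pack sign, exponent and significand by plain addition; a set bit 23
   in zSig carries into the exponent field (hypothetical helper). */
static uint32_t pack_f32(int zSign, int zExp, uint32_t zSig)
{
    assert(zExp >= 0 && zExp <= 0xFF);
    return ((uint32_t)zSign << 31) + ((uint32_t)zExp << 23) + zSig;
}
```

For example, `pack_f32(0, 0x7E, 0x800000)` yields 0x3F800000, the encoding of 1.0f, even though 0x7E is one below 1.0's biased exponent 0x7F.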
195 -------------------------------------------------------------------------------
196 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
197 and significand `zSig', and returns the proper single-precision floating-
198 point value corresponding to the abstract input.  Ordinarily, the abstract
199 value is simply rounded and packed into the single-precision format, with
203 returned.  If the abstract value is too small, the input value is rounded to
205 the abstract input cannot be represented exactly as a subnormal single-
206 precision floating-point number.
207     The input significand `zSig' has its binary point between bits 30
208 and 29, which is 7 bits to the left of the usual location.  This shifted
212 normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
214 Binary Floating-point Arithmetic.
215 -------------------------------------------------------------------------------
224     roundingMode = roundData->mode;  in roundAndPackFloat32()
247             roundData->exception |= float_flag_overflow | float_flag_inexact;  in roundAndPackFloat32()
248             return packFloat32( zSign, 0xFF, 0 ) - ( roundIncrement == 0 );  in roundAndPackFloat32()
253                 || ( zExp < -1 )  in roundAndPackFloat32()
255             shift32RightJamming( zSig, - zExp, &zSig );  in roundAndPackFloat32()
258             if ( isTiny && roundBits ) roundData->exception |= float_flag_underflow;  in roundAndPackFloat32()
261     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloat32()
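The rounding step of `roundAndPackFloat32` uses the 7 extra low bits that the shifted binary point provides: add half an ulp (0x40), shift right by 7, and clear the low bit on an exact tie. A minimal sketch of the round-to-nearest-even case for results that do not underflow; the name and the `inexact` out-parameter are hypothetical, and the overflow flag and subnormal path are omitted. In nearest-even mode overflow always gives infinity; the `- ( roundIncrement == 0 )` term in the listing turns 0x7F800000 into the largest finite value 0x7F7FFFFF only for the mode/sign combinations whose increment is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Nearest-even rounding and packing (hypothetical helper).
   zSig: binary point between bits 30 and 29, bit 31 clear.
   zExp: 1 less than the true biased exponent, not negative here. */
static uint32_t round_pack_f32_rne(bool zSign, int zExp, uint32_t zSig,
                                   bool *inexact)
{
    const uint32_t roundIncrement = 0x40;   /* half of the last kept bit */
    uint32_t roundBits = zSig & 0x7F;
    assert(zExp >= 0);                       /* underflow path omitted */
    assert(zSig < 0x80000000u);
    *inexact = roundBits != 0;
    /* Overflow: exponent too large, or rounding would carry to 0xFF. */
    if (zExp > 0xFD || (zExp == 0xFD && zSig + roundIncrement >= 0x80000000u)) {
        *inexact = true;
        return ((uint32_t)zSign << 31) + ((uint32_t)0xFF << 23);  /* infinity */
    }
    zSig = (zSig + roundIncrement) >> 7;
    if (roundBits == 0x40) zSig &= ~1u;      /* tie: round to even */
    if (zSig == 0) zExp = 0;
    return ((uint32_t)zSign << 31) + ((uint32_t)zExp << 23) + zSig;
}
```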
270 -------------------------------------------------------------------------------
271 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
272 and significand `zSig', and returns the proper single-precision floating-
273 point value corresponding to the abstract input.  This routine is just like
274 `roundAndPackFloat32' except that `zSig' does not have to be normalized in
275 any way.  In all cases, `zExp' must be 1 less than the ``true'' floating-
276 point exponent.
277 -------------------------------------------------------------------------------
284     shiftCount = countLeadingZeros32( zSig ) - 1;  in normalizeRoundAndPackFloat32()
285     return roundAndPackFloat32( roundData, zSign, zExp - shiftCount, zSig<<shiftCount );  in normalizeRoundAndPackFloat32()
290 -------------------------------------------------------------------------------
291 Returns the fraction bits of the double-precision floating-point value `a'.
292 -------------------------------------------------------------------------------
302 -------------------------------------------------------------------------------
303 Returns the exponent bits of the double-precision floating-point value `a'.
304 -------------------------------------------------------------------------------
314 -------------------------------------------------------------------------------
315 Returns the sign bit of the double-precision floating-point value `a'.
316 -------------------------------------------------------------------------------
328 -------------------------------------------------------------------------------
329 Normalizes the subnormal double-precision floating-point value represented
331 significand are stored at the locations pointed to by `zExpPtr' and
333 -------------------------------------------------------------------------------
340     shiftCount = countLeadingZeros64( aSig ) - 11;  in normalizeFloat64Subnormal()
342     *zExpPtr = 1 - shiftCount;  in normalizeFloat64Subnormal()
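Double precision follows the same scheme with a 1-bit sign, 11-bit exponent and 52-bit fraction, so the subnormal shift becomes `countLeadingZeros64( aSig ) - 11`, placing the leading 1 on bit 52. A sketch combining field extraction and subnormal normalization, assuming `uint64_t` stands in for `float64`, the host `double` is IEEE 754 binary64, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");

static uint64_t f64_frac(uint64_t a) { return a & UINT64_C(0x000FFFFFFFFFFFFF); } /* 52 bits */
static int      f64_exp(uint64_t a)  { return (int)((a >> 52) & 0x7FF); }         /* 11 bits */
static int      f64_sign(uint64_t a) { return (int)(a >> 63); }

/* Portable count-leading-zeros for a nonzero 64-bit value. */
static int clz64(uint64_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x >> 63)) { x <<= 1; ++n; }
    return n;
}

/* Leading 1 to bit 52 (11 = 63 - 52), exponent compensated. */
static void normalize_f64_subnormal(uint64_t aSig, int *zExp, uint64_t *zSig)
{
    int shiftCount = clz64(aSig) - 11;
    *zSig = aSig << shiftCount;
    *zExp = 1 - shiftCount;
}

static uint64_t f64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```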
347 -------------------------------------------------------------------------------
349 double-precision floating-point value, returning the result.  After being
351 together to form the result.  This means that any integer portion of `zSig'
353 will have an integer portion equal to 1, the `zExp' input should be 1 less
356 -------------------------------------------------------------------------------
366 -------------------------------------------------------------------------------
367 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
368 and significand `zSig', and returns the proper double-precision floating-
369 point value corresponding to the abstract input.  Ordinarily, the abstract
370 value is simply rounded and packed into the double-precision format, with
374 returned.  If the abstract value is too small, the input value is rounded to
376 the abstract input cannot be represented exactly as a subnormal double-
377 precision floating-point number.
378     The input significand `zSig' has its binary point between bits 62
379 and 61, which is 10 bits to the left of the usual location.  This shifted
383 normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
385 Binary Floating-point Arithmetic.
386 -------------------------------------------------------------------------------
395     roundingMode = roundData->mode;  in roundAndPackFloat64()
420             roundData->exception |= float_flag_overflow | float_flag_inexact;  in roundAndPackFloat64()
421             return packFloat64( zSign, 0x7FF, 0 ) - ( roundIncrement == 0 );  in roundAndPackFloat64()
426                 || ( zExp < -1 )  in roundAndPackFloat64()
428             shift64RightJamming( zSig, - zExp, &zSig );  in roundAndPackFloat64()
431             if ( isTiny && roundBits ) roundData->exception |= float_flag_underflow;  in roundAndPackFloat64()
434     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloat64()
443 -------------------------------------------------------------------------------
444 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
445 and significand `zSig', and returns the proper double-precision floating-
446 point value corresponding to the abstract input.  This routine is just like
447 `roundAndPackFloat64' except that `zSig' does not have to be normalized in
448 any way.  In all cases, `zExp' must be 1 less than the ``true'' floating-
449 point exponent.
450 -------------------------------------------------------------------------------
457     shiftCount = countLeadingZeros64( zSig ) - 1;  in normalizeRoundAndPackFloat64()
458     return roundAndPackFloat64( roundData, zSign, zExp - shiftCount, zSig<<shiftCount );  in normalizeRoundAndPackFloat64()
465 -------------------------------------------------------------------------------
466 Returns the fraction bits of the extended double-precision floating-point
468 -------------------------------------------------------------------------------
478 -------------------------------------------------------------------------------
479 Returns the exponent bits of the extended double-precision floating-point
481 -------------------------------------------------------------------------------
491 -------------------------------------------------------------------------------
492 Returns the sign bit of the extended double-precision floating-point value
494 -------------------------------------------------------------------------------
504 -------------------------------------------------------------------------------
505 Normalizes the subnormal extended double-precision floating-point value
507 and significand are stored at the locations pointed to by `zExpPtr' and
509 -------------------------------------------------------------------------------
518     *zExpPtr = 1 - shiftCount;  in normalizeFloatx80Subnormal()
523 -------------------------------------------------------------------------------
525 extended double-precision floating-point value, returning the result.
526 -------------------------------------------------------------------------------
540 -------------------------------------------------------------------------------
541 Takes an abstract floating-point value having sign `zSign', exponent `zExp',
543 and returns the proper extended double-precision floating-point value
544 corresponding to the abstract input.  Ordinarily, the abstract value is
545 rounded and packed into the extended double-precision format, with the
549 returned.  If the abstract value is too small, the input value is rounded to
552 double-precision floating-point number.
553     If `roundingPrecision' is 32 or 64, the result is rounded to the same
555 result is rounded to the full precision of the extended double-precision
561 Floating-point Arithmetic.
562 -------------------------------------------------------------------------------
573     roundingMode = roundData->mode;  in roundAndPackFloatx80()
574     roundingPrecision = roundData->precision;  in roundAndPackFloatx80()
604     if ( 0x7FFD <= (bits32) ( zExp - 1 ) ) {  in roundAndPackFloatx80()
615             shift64RightJamming( zSig0, 1 - zExp, &zSig0 );  in roundAndPackFloatx80()
618             if ( isTiny && roundBits ) roundData->exception |= float_flag_underflow;  in roundAndPackFloatx80()
619             if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
630     if ( roundBits ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
658     if ( 0x7FFD <= (bits32) ( zExp - 1 ) ) {  in roundAndPackFloatx80()
667             roundData->exception |= float_flag_overflow | float_flag_inexact;  in roundAndPackFloatx80()
682             shift64ExtraRightJamming( zSig0, zSig1, 1 - zExp, &zSig0, &zSig1 );  in roundAndPackFloatx80()
684             if ( isTiny && zSig1 ) roundData->exception |= float_flag_underflow;  in roundAndPackFloatx80()
685             if ( zSig1 ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
705     if ( zSig1 ) roundData->exception |= float_flag_inexact;  in roundAndPackFloatx80()
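The guard `0x7FFD <= (bits32) ( zExp - 1 )` folds two range checks into one unsigned comparison. When `zExp <= 0`, `zExp - 1` is negative and converts to a very large unsigned value; otherwise the test fires once `zExp >= 0x7FFE`. A single branch therefore catches both possible underflow and possible overflow. A sketch checking that equivalence, with hypothetical names and `int32_t`/`uint32_t` in place of the file's types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One unsigned comparison: for zExp <= 0, zExp - 1 is negative and
   converts (modulo 2^32) to a value far above 0x7FFD. */
static bool x80_exp_needs_care(int32_t zExp)
{
    assert(zExp > INT32_MIN);            /* keep zExp - 1 in range */
    return 0x7FFD <= (uint32_t)(zExp - 1);
}

/* The same predicate as two signed comparisons. */
static bool x80_exp_needs_care_explicit(int32_t zExp)
{
    return zExp <= 0 || zExp >= 0x7FFE;
}
```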
724 -------------------------------------------------------------------------------
725 Takes an abstract floating-point value having sign `zSign', exponent
727 and returns the proper extended double-precision floating-point value
728 corresponding to the abstract input.  This routine is just like
729 `roundAndPackFloatx80' except that the input significand does not have to be
731 -------------------------------------------------------------------------------
743         zExp -= 64;  in normalizeRoundAndPackFloatx80()
747     zExp -= shiftCount;  in normalizeRoundAndPackFloatx80()
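`normalizeRoundAndPackFloatx80` works on a two-word significand (`zSig0` high, `zSig1` low). The `zExp -= 64` line covers a zero high word, where the low word is moved up; the pair is then shifted left by the leading-zero count of the high word (`zExp -= shiftCount`). A sketch of just that normalization, with hypothetical names and a portable `clz64`:

```c
#include <assert.h>
#include <stdint.h>

static int clz64(uint64_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x >> 63)) { x <<= 1; ++n; }
    return n;
}

/* Normalize a 128-bit significand (hi, lo) so bit 63 of hi is set,
   adjusting the exponent to keep the value unchanged. */
static void normalize128(int32_t *zExp, uint64_t *hi, uint64_t *lo)
{
    assert(*hi != 0 || *lo != 0);
    if (*hi == 0) {                 /* high word empty: move low word up */
        *hi = *lo;
        *lo = 0;
        *zExp -= 64;
    }
    int shiftCount = clz64(*hi);
    if (shiftCount != 0) {          /* avoid an undefined shift by 64 */
        *hi = (*hi << shiftCount) | (*lo >> (64 - shiftCount));
        *lo <<= shiftCount;
    }
    *zExp -= shiftCount;
}
```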
756 -------------------------------------------------------------------------------
757 Returns the result of converting the 32-bit two's complement integer `a' to
758 the single-precision floating-point format.  The conversion is performed
759 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
760 -------------------------------------------------------------------------------
769     return normalizeRoundAndPackFloat32( roundData, zSign, 0x9C, zSign ? - a : a );  in int32_to_float32()
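`int32_to_float32` hands the magnitude to `normalizeRoundAndPackFloat32` with exponent 0x9C. That constant is one less than 0x9D = 0x7F + 30, the biased exponent of 2^30, because the normalizer puts the leading 1 on bit 30 (`countLeadingZeros32( zSig ) - 1`) and lowers the exponent by the shift. A sketch of the whole chain in round-to-nearest-even mode, returning the binary32 bit pattern; the function name is hypothetical, and INT32_MIN is handled up front in this sketch because negating it would overflow `int32_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int clz32(uint32_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 0x80000000u)) { x <<= 1; ++n; }
    return n;
}

/* int32 -> binary32 bits, nearest-even (hypothetical helper). */
static uint32_t i32_to_f32_bits(int32_t a)
{
    if (a == 0) return 0;
    if (a == INT32_MIN) return 0xCF000000u;       /* -2^31 is exact */
    bool zSign = a < 0;
    uint32_t zSig = zSign ? (uint32_t)(-a) : (uint32_t)a;
    int shiftCount = clz32(zSig) - 1;             /* leading 1 to bit 30 */
    int zExp = 0x9C - shiftCount;
    zSig <<= shiftCount;
    uint32_t roundBits = zSig & 0x7F;
    zSig = (zSig + 0x40) >> 7;                    /* may carry into bit 24 */
    if (roundBits == 0x40) zSig &= ~1u;           /* ties to even */
    return ((uint32_t)zSign << 31) + ((uint32_t)zExp << 23) + zSig;
}
```

Values above 2^24 need rounding: 16777217 (2^24 + 1) is a tie and rounds to the even neighbour 16777216, giving 0x4B800000.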
774 -------------------------------------------------------------------------------
775 Returns the result of converting the 32-bit two's complement integer `a' to
776 the double-precision floating-point format.  The conversion is performed
777 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
778 -------------------------------------------------------------------------------
789     absA = aSign ? - a : a;  in int32_to_float64()
792     return packFloat64( aSign, 0x432 - shiftCount, zSig<<shiftCount );  in int32_to_float64()
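No rounding appears in `int32_to_float64`: every 32-bit integer fits in binary64's 53-bit significand, so the code only shifts the leading 1 to bit 52 (`countLeadingZeros32( absA ) + 21`) and packs. 0x432 is 0x3FF + 51; after subtracting the shift it leaves an exponent one below the true one, and the implicit bit carries it up during packing. A sketch on raw bits, assuming an IEEE 754 binary64 host `double` for the cross-check and hypothetical helper names; the magnitude is computed in unsigned arithmetic so INT32_MIN stays well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int clz32(uint32_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 0x80000000u)) { x <<= 1; ++n; }
    return n;
}

/* Exact int32 -> binary64 bits (hypothetical helper). */
static uint64_t i32_to_f64_bits(int32_t a)
{
    if (a == 0) return 0;
    uint64_t aSign = a < 0;
    uint32_t absA = aSign ? 0u - (uint32_t)a : (uint32_t)a;
    int shiftCount = clz32(absA) + 21;             /* leading 1 to bit 52 */
    uint64_t zSig = (uint64_t)absA << shiftCount;  /* includes hidden bit */
    return (aSign << 63) + ((uint64_t)(0x432 - shiftCount) << 52) + zSig;
}

static uint64_t f64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```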
799 -------------------------------------------------------------------------------
800 Returns the result of converting the 32-bit two's complement integer `a'
801 to the extended double-precision floating-point format.  The conversion
802 is performed according to the IEC/IEEE Standard for Binary Floating-point
804 -------------------------------------------------------------------------------
815     absA = zSign ? - a : a;  in int32_to_floatx80()
818     return packFloatx80( zSign, 0x403E - shiftCount, zSig<<shiftCount );  in int32_to_floatx80()
825 -------------------------------------------------------------------------------
826 Returns the result of converting the single-precision floating-point value
827 `a' to the 32-bit two's complement integer format.  The conversion is
828 performed according to the IEC/IEEE Standard for Binary Floating-point
829 Arithmetic---which means in particular that the conversion is rounded
830 according to the current rounding mode.  If `a' is a NaN, the largest
833 -------------------------------------------------------------------------------
847     shiftCount = 0xAF - aExp;  in float32_to_int32()
856 -------------------------------------------------------------------------------
857 Returns the result of converting the single-precision floating-point value
858 `a' to the 32-bit two's complement integer format.  The conversion is
859 performed according to the IEC/IEEE Standard for Binary Floating-point
864 -------------------------------------------------------------------------------
876     shiftCount = aExp - 0x9E;  in float32_to_int32_round_to_zero()
888     z = aSig>>( - shiftCount );  in float32_to_int32_round_to_zero()
892     return aSign ? - z : z;  in float32_to_int32_round_to_zero()
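For the truncating conversion, 0x9E is 0x7F + 31, so `shiftCount = aExp - 0x9E` is non-negative exactly when |a| >= 2^31 (or `a` is infinite or a NaN), with -2^31 as the one representable value in that range. Otherwise the significand, with its implicit bit restored and moved to bit 31, is shifted right by `- shiftCount`, and any bits shifted out make the result inexact. A minimal sketch on raw bits; the name and out-parameters are hypothetical stand-ins for the exception flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* binary32 bits -> int32, rounding toward zero (hypothetical helper). */
static int32_t f32_bits_to_i32_rz(uint32_t a, bool *inexact, bool *invalid)
{
    bool aSign = a >> 31;
    int aExp = (int)((a >> 23) & 0xFF);
    uint32_t aSig = a & 0x007FFFFF;
    int shiftCount = aExp - 0x9E;                  /* 0x9E = 0x7F + 31 */

    *inexact = false;
    *invalid = false;
    if (shiftCount >= 0) {                         /* too large, inf or NaN */
        if (a == 0xCF000000u) return INT32_MIN;    /* exactly -2^31 */
        *invalid = true;
        if (!aSign || (aExp == 0xFF && aSig != 0)) return INT32_MAX;
        return INT32_MIN;
    }
    if (aExp <= 0x7E) {                            /* |a| < 1 */
        *inexact = (aExp | aSig) != 0;
        return 0;
    }
    aSig = (aSig | 0x00800000) << 8;               /* implicit bit to bit 31 */
    uint32_t z = aSig >> -shiftCount;              /* shift of 1..31 */
    *inexact = (uint32_t)(aSig << (32 + shiftCount)) != 0;  /* lost bits */
    assert(z <= INT32_MAX);
    return aSign ? -(int32_t)z : (int32_t)z;
}
```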
897 -------------------------------------------------------------------------------
898 Returns the result of converting the single-precision floating-point value
899 `a' to the double-precision floating-point format.  The conversion is
900 performed according to the IEC/IEEE Standard for Binary Floating-point
902 -------------------------------------------------------------------------------
920         --aExp;  in float32_to_float64()
929 -------------------------------------------------------------------------------
930 Returns the result of converting the single-precision floating-point value
931 `a' to the extended double-precision floating-point format.  The conversion
932 is performed according to the IEC/IEEE Standard for Binary Floating-point
934 -------------------------------------------------------------------------------
961 -------------------------------------------------------------------------------
962 Rounds the single-precision floating-point value `a' to an integer, and
963 returns the result as a single-precision floating-point value.  The
964 operation is performed according to the IEC/IEEE Standard for Binary
965 Floating-point Arithmetic.
966 -------------------------------------------------------------------------------
983     roundingMode = roundData->mode;  in float32_round_to_int()
986         roundData->exception |= float_flag_inexact;  in float32_round_to_int()
1002     lastBitMask <<= 0x96 - aExp;  in float32_round_to_int()
1003     roundBitsMask = lastBitMask - 1;  in float32_round_to_int()
1015     if ( z != a ) roundData->exception |= float_flag_inexact;  in float32_round_to_int()
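For 1 <= |a| < 2^23 (0x7F <= aExp < 0x96), `lastBitMask` is the bit of the encoding worth 1.0 and `roundBitsMask` covers the fraction bits below it. Rounding to nearest-even then needs only integer operations on the encoding: add half of `lastBitMask`, clear the units bit if the fraction came out exactly zero (which happens only for a tie), and clear the fraction bits. A carry out of the fraction simply increments the exponent. A sketch of that branch only, with a hypothetical name; the other exponent ranges and rounding modes are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest-even round-to-integer on binary32 bits, 1 <= |a| < 2^23
   only (hypothetical helper). */
static uint32_t f32_round_to_int_rne(uint32_t a)
{
    int aExp = (int)((a >> 23) & 0xFF);
    assert(aExp >= 0x7F && aExp < 0x96);
    uint32_t lastBitMask = (uint32_t)1 << (0x96 - aExp); /* weight 1.0 */
    uint32_t roundBitsMask = lastBitMask - 1;            /* fraction bits */
    uint32_t z = a + (lastBitMask >> 1);                 /* add one half */
    if ((z & roundBitsMask) == 0) z &= ~lastBitMask;     /* tie: go even */
    z &= ~roundBitsMask;                                 /* drop fraction */
    return z;
}
```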
1021 -------------------------------------------------------------------------------
1022 Returns the result of adding the absolute values of the single-precision
1023 floating-point values `a' and `b'.  If `zSign' is true, the sum is negated
1025 addition is performed according to the IEC/IEEE Standard for Binary
1026 Floating-point Arithmetic.
1027 -------------------------------------------------------------------------------
1039     expDiff = aExp - bExp;  in addFloat32Sigs()
1048             --expDiff;  in addFloat32Sigs()
1067         shift32RightJamming( aSig, - expDiff, &aSig );  in addFloat32Sigs()
1082     --zExp;  in addFloat32Sigs()
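Before adding, `addFloat32Sigs` aligns the operand with the smaller exponent using `shift32RightJamming`. A jamming shift is a right shift that ORs any nonzero bits shifted out into bit 0 (a sticky bit), so the later rounding step still sees that the exact result was inexact. A sketch of that behaviour under a hypothetical name; the real macro lives in `softfloat-macros` and is not shown in this listing.

```c
#include <assert.h>
#include <stdint.h>

/* Right shift that keeps a sticky bit for anything shifted out. */
static uint32_t shift32_right_jamming(uint32_t a, int count)
{
    assert(count >= 0);
    if (count == 0) return a;
    if (count < 32) return (a >> count) | ((a << (32 - count)) != 0);
    return a != 0;      /* everything shifted out: only the sticky bit */
}
```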
1093 -------------------------------------------------------------------------------
1094 Returns the result of subtracting the absolute values of the single-
1095 precision floating-point values `a' and `b'.  If `zSign' is true, the
1097 result is a NaN.  The subtraction is performed according to the IEC/IEEE
1098 Standard for Binary Floating-point Arithmetic.
1099 -------------------------------------------------------------------------------
1111     expDiff = aExp - bExp;  in subFloat32Sigs()
1118         roundData->exception |= float_flag_invalid;  in subFloat32Sigs()
1127     return packFloat32( roundData->mode == float_round_down, 0, 0 );  in subFloat32Sigs()
1139     shift32RightJamming( aSig, - expDiff, &aSig );  in subFloat32Sigs()
1142     zSig = bSig - aSig;  in subFloat32Sigs()
1152         --expDiff;  in subFloat32Sigs()
1160     zSig = aSig - bSig;  in subFloat32Sigs()
1163     --zExp;  in subFloat32Sigs()
1169 -------------------------------------------------------------------------------
1170 Returns the result of adding the single-precision floating-point values `a'
1171 and `b'.  The operation is performed according to the IEC/IEEE Standard for
1172 Binary Floating-point Arithmetic.
1173 -------------------------------------------------------------------------------
1191 -------------------------------------------------------------------------------
1192 Returns the result of subtracting the single-precision floating-point values
1193 `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
1194 for Binary Floating-point Arithmetic.
1195 -------------------------------------------------------------------------------
1213 -------------------------------------------------------------------------------
1214 Returns the result of multiplying the single-precision floating-point values
1215 `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
1216 for Binary Floating-point Arithmetic.
1217 -------------------------------------------------------------------------------
1239             roundData->exception |= float_flag_invalid;  in float32_mul()
1247             roundData->exception |= float_flag_invalid;  in float32_mul()
1260     zExp = aExp + bExp - 0x7F;  in float32_mul()
1267         --zExp;  in float32_mul()
1274 -------------------------------------------------------------------------------
1275 Returns the result of dividing the single-precision floating-point value `a'
1276 by the corresponding value `b'.  The operation is performed according to the
1277 IEC/IEEE Standard for Binary Floating-point Arithmetic.
1278 -------------------------------------------------------------------------------
1297             roundData->exception |= float_flag_invalid;  in float32_div()
1309                 roundData->exception |= float_flag_invalid;  in float32_div()
1312             roundData->exception |= float_flag_divbyzero;  in float32_div()
1321     zExp = aExp - bExp + 0x7D;  in float32_div()
1341 -------------------------------------------------------------------------------
1342 Returns the remainder of the single-precision floating-point value `a'
1343 with respect to the corresponding value `b'.  The operation is performed
1344 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1345 -------------------------------------------------------------------------------
1367         roundData->exception |= float_flag_invalid;  in float32_rem()
1376             roundData->exception |= float_flag_invalid;  in float32_rem()
1385     expDiff = aExp - bExp;  in float32_rem()
1392             if ( expDiff < -1 ) return a;  in float32_rem()
1396         if ( q ) aSig -= bSig;  in float32_rem()
1401             q >>= 32 - expDiff;  in float32_rem()
1403             aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q;  in float32_rem()
1411         if ( bSig <= aSig ) aSig -= bSig;  in float32_rem()
1414         expDiff -= 64;  in float32_rem()
1417             q64 = ( 2 < q64 ) ? q64 - 2 : 0;  in float32_rem()
1418             aSig64 = - ( ( bSig * q64 )<<38 );  in float32_rem()
1419             expDiff -= 62;  in float32_rem()
1423         q64 = ( 2 < q64 ) ? q64 - 2 : 0;  in float32_rem()
1424         q = q64>>( 64 - expDiff );  in float32_rem()
1426         aSig = ( ( aSig64>>33 )<<( expDiff - 1 ) ) - bSig * q;  in float32_rem()
1431         aSig -= bSig;  in float32_rem()
1438     if ( zSign ) aSig = - aSig;  in float32_rem()
1444 -------------------------------------------------------------------------------
1445 Returns the square root of the single-precision floating-point value `a'.
1446 The operation is performed according to the IEC/IEEE Standard for Binary
1447 Floating-point Arithmetic.
1448 -------------------------------------------------------------------------------
1463         roundData->exception |= float_flag_invalid;  in float32_sqrt()
1468         roundData->exception |= float_flag_invalid;  in float32_sqrt()
1475     zExp = ( ( aExp - 0x7F )>>1 ) + 0x7E;  in float32_sqrt()
1485             rem = ( ( (bits64) aSig )<<32 ) - term;  in float32_sqrt()
1487                 --zSig;  in float32_sqrt()
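The square-root exponent follows from halving the unbiased exponent. Writing a = m · 2^e with 1 <= m < 2, sqrt(a) has exponent floor(e / 2) (for odd e the extra factor of 2 moves into the significand), so its biased exponent is floor(e / 2) + 0x7F, and `roundAndPackFloat32` wants one less, hence `+ 0x7E`. The `>>1` computes floor(e / 2) only if right-shifting a negative `int` is an arithmetic shift, which C leaves implementation-defined. A sketch comparing the expression with a portable floor division (hypothetical names):

```c
#include <assert.h>

/* zExp for sqrt as in the listing; assumes >> on a negative int is
   an arithmetic (flooring) shift. */
static int sqrt_zexp(int aExp)
{
    assert(aExp < 0xFF);               /* NaN/infinity handled earlier */
    return ((aExp - 0x7F) >> 1) + 0x7E;
}

/* Portable reference: floor(e / 2), rebias, subtract 1. */
static int sqrt_zexp_reference(int aExp)
{
    int e = aExp - 0x7F;
    int half = (e >= 0) ? e / 2 : -((-e + 1) / 2);   /* floor(e / 2) */
    return half + 0x7F - 1;
}
```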
1499 -------------------------------------------------------------------------------
1500 Returns 1 if the single-precision floating-point value `a' is equal to the
1502 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1503 -------------------------------------------------------------------------------
1521 -------------------------------------------------------------------------------
1522 Returns 1 if the single-precision floating-point value `a' is less than or
1523 equal to the corresponding value `b', and 0 otherwise.  The comparison is
1524 performed according to the IEC/IEEE Standard for Binary Floating-point
1526 -------------------------------------------------------------------------------
1546 -------------------------------------------------------------------------------
1547 Returns 1 if the single-precision floating-point value `a' is less than
1549 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1550 -------------------------------------------------------------------------------
1570 -------------------------------------------------------------------------------
1571 Returns 1 if the single-precision floating-point value `a' is equal to the
1574 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
1575 -------------------------------------------------------------------------------
1591 -------------------------------------------------------------------------------
1592 Returns 1 if the single-precision floating-point value `a' is less than or
1593 equal to the corresponding value `b', and 0 otherwise.  Quiet NaNs do not
1594 cause an exception.  Otherwise, the comparison is performed according to the
1595 IEC/IEEE Standard for Binary Floating-point Arithmetic.
1596 -------------------------------------------------------------------------------
1617 -------------------------------------------------------------------------------
1618 Returns 1 if the single-precision floating-point value `a' is less than
1620 exception.  Otherwise, the comparison is performed according to the IEC/IEEE
1621 Standard for Binary Floating-point Arithmetic.
1622 -------------------------------------------------------------------------------
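Away from NaNs, the comparisons can work directly on the encodings. Within one sign, unsigned order of the bits matches numeric order for positive values and is reversed for negative ones, and `((a | b) << 1) == 0` detects the +0/-0 pair, which must compare equal. A sketch of `eq`, `le` and `lt` on raw `uint32_t` bits, assuming neither operand is a NaN (the NaN checks and the invalid flag are omitted); the names are hypothetical, and the expressions follow the usual SoftFloat formulation rather than code visible in this listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All-ones exponent with a nonzero fraction. */
static bool f32_is_nan(uint32_t a) { return (uint32_t)(a << 1) > 0xFF000000u; }

/* True when a and b are +0/-0 in any combination. */
static bool both_zero(uint32_t a, uint32_t b) { return (uint32_t)((a | b) << 1) == 0; }

static bool f32_eq(uint32_t a, uint32_t b)
{
    assert(!f32_is_nan(a) && !f32_is_nan(b));
    return a == b || both_zero(a, b);
}

static bool f32_le(uint32_t a, uint32_t b)
{
    assert(!f32_is_nan(a) && !f32_is_nan(b));
    bool aSign = a >> 31, bSign = b >> 31;
    if (aSign != bSign) return aSign || both_zero(a, b);
    return a == b || (aSign ^ (a < b));   /* reversed order if negative */
}

static bool f32_lt(uint32_t a, uint32_t b)
{
    assert(!f32_is_nan(a) && !f32_is_nan(b));
    bool aSign = a >> 31, bSign = b >> 31;
    if (aSign != bSign) return aSign && !both_zero(a, b);
    return a != b && (aSign ^ (a < b));
}
```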
1642 -------------------------------------------------------------------------------
1643 Returns the result of converting the double-precision floating-point value
1644 `a' to the 32-bit two's complement integer format.  The conversion is
1645 performed according to the IEC/IEEE Standard for Binary Floating-point
1646 Arithmetic---which means in particular that the conversion is rounded
1647 according to the current rounding mode.  If `a' is a NaN, the largest
1650 -------------------------------------------------------------------------------
1663     shiftCount = 0x42C - aExp;  in float64_to_int32()
1670 -------------------------------------------------------------------------------
1671 Returns the result of converting the double-precision floating-point value
1672 `a' to the 32-bit two's complement integer format.  The conversion is
1673 performed according to the IEC/IEEE Standard for Binary Floating-point
1678 -------------------------------------------------------------------------------
1690     shiftCount = 0x433 - aExp;  in float64_to_int32_round_to_zero()
1703     if ( aSign ) z = - z;  in float64_to_int32_round_to_zero()
1717 -------------------------------------------------------------------------------
1718 Returns the result of converting the double-precision floating-point value
1719 `a' to the 32-bit unsigned integer format.  The conversion
1720 is performed according to the IEC/IEEE Standard for Binary Floating-point
1721 Arithmetic---which means in particular that the conversion is rounded
1722 according to the current rounding mode.  If `a' is a NaN, the largest
1725 -------------------------------------------------------------------------------
1738     shiftCount = 0x42C - aExp;  in float64_to_uint32()
1744 -------------------------------------------------------------------------------
1745 Returns the result of converting the double-precision floating-point value
1746 `a' to the 32-bit unsigned integer format.  The conversion is
1747 performed according to the IEC/IEEE Standard for Binary Floating-point
1751 -------------------------------------------------------------------------------
1763     shiftCount = 0x433 - aExp;  in float64_to_uint32_round_to_zero()
1776     if ( aSign ) z = - z;  in float64_to_uint32_round_to_zero()
1789 -------------------------------------------------------------------------------
1790 Returns the result of converting the double-precision floating-point value
1791 `a' to the single-precision floating-point format.  The conversion is
1792 performed according to the IEC/IEEE Standard for Binary Floating-point
1794 -------------------------------------------------------------------------------
1814         aExp -= 0x381;  in float64_to_float32()
1823 -------------------------------------------------------------------------------
1824 Returns the result of converting the double-precision floating-point value
1825 `a' to the extended double-precision floating-point format.  The conversion
1826 is performed according to the IEC/IEEE Standard for Binary Floating-point
1828 -------------------------------------------------------------------------------
1856 -------------------------------------------------------------------------------
1857 Rounds the double-precision floating-point value `a' to an integer, and
1858 returns the result as a double-precision floating-point value.  The
1859 operation is performed according to the IEC/IEEE Standard for Binary
1860 Floating-point Arithmetic.
1861 -------------------------------------------------------------------------------
1880         roundData->exception |= float_flag_inexact;  in float64_round_to_int()
1882         switch ( roundData->mode ) {  in float64_round_to_int()
1897     lastBitMask <<= 0x433 - aExp;  in float64_round_to_int()
1898     roundBitsMask = lastBitMask - 1;  in float64_round_to_int()
1900     roundingMode = roundData->mode;  in float64_round_to_int()
1911     if ( z != a ) roundData->exception |= float_flag_inexact;  in float64_round_to_int()
1917 -------------------------------------------------------------------------------
1918 Returns the result of adding the absolute values of the double-precision
1919 floating-point values `a' and `b'.  If `zSign' is true, the sum is negated
1921 addition is performed according to the IEC/IEEE Standard for Binary
1922 Floating-point Arithmetic.
1923 -------------------------------------------------------------------------------
1935     expDiff = aExp - bExp;  in addFloat64Sigs()
1944             --expDiff;  in addFloat64Sigs()
1963         shift64RightJamming( aSig, - expDiff, &aSig );  in addFloat64Sigs()
1978     --zExp;  in addFloat64Sigs()
1989 -------------------------------------------------------------------------------
1990 Returns the result of subtracting the absolute values of the double-
1991 precision floating-point values `a' and `b'.  If `zSign' is true, the
1993 result is a NaN.  The subtraction is performed according to the IEC/IEEE
1994 Standard for Binary Floating-point Arithmetic.
1995 -------------------------------------------------------------------------------
2007     expDiff = aExp - bExp;  in subFloat64Sigs()
2014         roundData->exception |= float_flag_invalid;  in subFloat64Sigs()
2023     return packFloat64( roundData->mode == float_round_down, 0, 0 );  in subFloat64Sigs()
2035     shift64RightJamming( aSig, - expDiff, &aSig );  in subFloat64Sigs()
2038     zSig = bSig - aSig;  in subFloat64Sigs()
2048         --expDiff;  in subFloat64Sigs()
2056     zSig = aSig - bSig;  in subFloat64Sigs()
2059     --zExp;  in subFloat64Sigs()
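The line `return packFloat64( roundData->mode == float_round_down, 0, 0 );` encodes the IEEE 754 rule for an exact zero difference. Such a zero is +0 in every rounding mode except rounding toward negative infinity, where it is -0. In that case `zSign` plays no part. The sketch below assumes hypothetical enum names in place of the real `float_round_*` constants, and assumes packing works by adding shifted fields.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the rounding-mode values read from roundData->mode. */
enum rounding_mode { round_nearest_even, round_to_zero, round_up, round_down };

/* Pack sign, biased exponent and fraction into a float64 bit pattern. */
static uint64_t pack_float64(int sign, int exp, uint64_t sig)
{
    return ((uint64_t)sign << 63) + ((uint64_t)exp << 52) + sig;
}

/* Result of x - x for finite x: +0, except -0 when rounding toward -infinity. */
uint64_t exact_zero_difference(enum rounding_mode mode)
{
    return pack_float64(mode == round_down, 0, 0);
}
```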
2065 -------------------------------------------------------------------------------
2066 Returns the result of adding the double-precision floating-point values `a'
2067 and `b'.  The operation is performed according to the IEC/IEEE Standard for
2068 Binary Floating-point Arithmetic.
2069 -------------------------------------------------------------------------------
2087 -------------------------------------------------------------------------------
2088 Returns the result of subtracting the double-precision floating-point values
2089 `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
2090 for Binary Floating-point Arithmetic.
2091 -------------------------------------------------------------------------------
2109 -------------------------------------------------------------------------------
2110 Returns the result of multiplying the double-precision floating-point values
2111 `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
2112 for Binary Floating-point Arithmetic.
2113 -------------------------------------------------------------------------------
2133             roundData->exception |= float_flag_invalid;  in float64_mul()
2141             roundData->exception |= float_flag_invalid;  in float64_mul()
2154     zExp = aExp + bExp - 0x3FF;  in float64_mul()
2161         --zExp;  in float64_mul()
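In float64_mul() the exponent is formed as `zExp = aExp + bExp - 0x3FF`. The product of two significands in [1, 2) lies in [1, 4), and when it falls below 2 one normalising left shift is needed, which is the `--zExp` above. The sketch below re-creates a portable 64×64→128-bit multiply in the style of the `mul64To128` helper (used elsewhere in this file) and predicts the final biased exponent before rounding. It assumes normal, nonzero inputs and the pre-shifts `<< 10` and `<< 11`, which are not among the lines shown here. It also assumes a packer that adds the significand's integer bit into the exponent field. `float64_mul_exponent` is a hypothetical name.

```c
#include <stdint.h>

/* Full 64 x 64 -> 128-bit product from four 32 x 32 -> 64 partial products;
   *z0Ptr receives the high word and *z1Ptr the low word. */
static void mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    uint64_t aHigh = a >> 32, aLow = a & 0xFFFFFFFFu;
    uint64_t bHigh = b >> 32, bLow = b & 0xFFFFFFFFu;
    uint64_t z1 = aLow * bLow;
    uint64_t zMiddleA = aLow * bHigh;
    uint64_t zMiddleB = aHigh * bLow;
    uint64_t z0 = aHigh * bHigh;

    zMiddleA += zMiddleB;                                  /* may carry out */
    z0 += ((uint64_t)(zMiddleA < zMiddleB) << 32) + (zMiddleA >> 32);
    zMiddleA <<= 32;
    z1 += zMiddleA;
    z0 += (z1 < zMiddleA);                                 /* carry into high */
    *z0Ptr = z0;
    *z1Ptr = z1;
}

/* Hypothetical: biased exponent of a*b (bit patterns of normal, nonzero
   doubles) before rounding. */
int float64_mul_exponent(uint64_t a, uint64_t b)
{
    int aExp = (int)((a >> 52) & 0x7FF), bExp = (int)((b >> 52) & 0x7FF);
    uint64_t aSig = ((a & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL) << 10;
    uint64_t bSig = ((b & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL) << 11;
    uint64_t zSig0, zSig1;
    int zExp = aExp + bExp - 0x3FF;

    mul64To128(aSig, bSig, &zSig0, &zSig1);
    /* Significand product below 2 leaves bit 62 of zSig0 clear: one
       normalising left shift costs one from the exponent. */
    if ((zSig0 & 0x4000000000000000ULL) == 0)
        --zExp;
    /* The integer bit carries into the exponent field when packed. */
    return zExp + 1;
}
```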
2168 -------------------------------------------------------------------------------
2169 Returns the result of dividing the double-precision floating-point value `a'
2170 by the corresponding value `b'.  The operation is performed according to
2171 the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2172 -------------------------------------------------------------------------------
2193             roundData->exception |= float_flag_invalid;  in float64_div()
2205                 roundData->exception |= float_flag_invalid;  in float64_div()
2208             roundData->exception |= float_flag_divbyzero;  in float64_div()
2217     zExp = aExp - bExp + 0x3FD;  in float64_div()
2229             --zSig;  in float64_div()
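The division exponent `zExp = aExp - bExp + 0x3FD` is two less than the plain bias-corrected difference. This sketch shows one way the books balance. It assumes a pre-alignment step, not among the lines shown here, that halves the dividend's significand (and increments `zExp`) whenever it is not smaller than the divisor's, so the quotient significand stays in [1, 2). It also assumes the packer adds the integer bit into the exponent field. The quotient itself, including the `--zSig` correction, is not computed. `float64_div_exponent` is a hypothetical name, and inputs must be normal and nonzero.

```c
#include <stdint.h>

/* Hypothetical: biased exponent of a/b (bit patterns of normal, nonzero
   doubles) before rounding. */
int float64_div_exponent(uint64_t a, uint64_t b)
{
    int aExp = (int)((a >> 52) & 0x7FF), bExp = (int)((b >> 52) & 0x7FF);
    uint64_t aSig = ((a & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL) << 10;
    uint64_t bSig = ((b & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL) << 11;
    int zExp = aExp - bExp + 0x3FD;

    /* aSig is scaled by 2^62 and bSig by 2^63, so this tests whether the
       dividend significand is at least the divisor significand. */
    if (bSig <= aSig + aSig) {
        aSig >>= 1;             /* keep the quotient significand below 2 */
        ++zExp;
    }
    return zExp + 1;            /* the integer bit carries in when packed */
}
```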
2239 -------------------------------------------------------------------------------
2240 Returns the remainder of the double-precision floating-point value `a'
2241 with respect to the corresponding value `b'.  The operation is performed
2242 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2243 -------------------------------------------------------------------------------
2263         roundData->exception |= float_flag_invalid;  in float64_rem()
2272             roundData->exception |= float_flag_invalid;  in float64_rem()
2281     expDiff = aExp - bExp;  in float64_rem()
2285         if ( expDiff < -1 ) return a;  in float64_rem()
2289     if ( q ) aSig -= bSig;  in float64_rem()
2290     expDiff -= 64;  in float64_rem()
2293         q = ( 2 < q ) ? q - 2 : 0;  in float64_rem()
2294         aSig = - ( ( bSig>>2 ) * q );  in float64_rem()
2295         expDiff -= 62;  in float64_rem()
2300         q = ( 2 < q ) ? q - 2 : 0;  in float64_rem()
2301         q >>= 64 - expDiff;  in float64_rem()
2303         aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q;  in float64_rem()
2312         aSig -= bSig;  in float64_rem()
2319     if ( zSign ) aSig = - aSig;  in float64_rem()
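The IEEE remainder differs from C's `%` and fmod(). It subtracts `n*b`, where `n` is `a/b` rounded to the *nearest* integer (ties to even), so the result can be negative even for positive operands, and |r| ≤ |b|/2. In float64_rem() the quotient is developed in chunks (`expDiff -= 62`), and each estimated chunk is reduced by 2 (`q = ( 2 < q ) ? q - 2 : 0`) so that it never overshoots. The sketch below skips that machinery and shows only the final nearest/ties-to-even selection, restricted to integer operands. `ieee_remainder_i64` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical: a - n*b with n = a/b rounded to nearest, ties to even.
   Requires b != 0 and excludes a == INT64_MIN with b == -1. */
int64_t ieee_remainder_i64(int64_t a, int64_t b)
{
    int64_t q = a / b;                 /* quotient truncated toward zero */
    int64_t r = a - q * b;             /* remainder with the sign of a */
    uint64_t absR = r < 0 ? -(uint64_t)r : (uint64_t)r;
    uint64_t absB = b < 0 ? -(uint64_t)b : (uint64_t)b;

    /* Step the quotient one unit away from zero when |r| is more than half
       of |b|, or exactly half and the truncated quotient is odd. */
    if (2 * absR > absB || (2 * absR == absB && (q & 1))) {
        if ((r < 0) == (b < 0))
            r -= b;                    /* a/b positive: q + 1 */
        else
            r += b;                    /* a/b negative: q - 1 */
    }
    return r;
}
```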
2325 -------------------------------------------------------------------------------
2326 Returns the square root of the double-precision floating-point value `a'.
2327 The operation is performed according to the IEC/IEEE Standard for Binary
2328 Floating-point Arithmetic.
2329 -------------------------------------------------------------------------------
2345         roundData->exception |= float_flag_invalid;  in float64_sqrt()
2350         roundData->exception |= float_flag_invalid;  in float64_sqrt()
2357     zExp = ( ( aExp - 0x3FF )>>1 ) + 0x3FE;  in float64_sqrt()
2361     aSig <<= 9 - ( aExp & 1 );  in float64_sqrt()
2372                 --zSig;  in float64_sqrt()
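The exponent handling in float64_sqrt() halves the exponent, and the significand gets one extra left shift when the exponent is odd (`aSig <<= 9 - ( aExp & 1 )`), so the halving is exact. The sketch below shows the same idea, but swaps in a plain digit-by-digit integer square root in place of the estimate-and-correct loop seen in the `--zSig` lines. It therefore returns a truncated result with about 32 significant bits rather than a correctly rounded one. It assumes a positive normal input. `isqrt64`, `pow2_normal` and `sqrt_sketch` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* floor(sqrt(n)) by the binary digit-by-digit method. */
static uint64_t isqrt64(uint64_t n)
{
    uint64_t res = 0, bit = (uint64_t)1 << 62;   /* highest power of 4 */

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* 2^k as a double; valid only for -1022 <= k <= 1023. */
static double pow2_normal(int k)
{
    uint64_t u = (uint64_t)(k + 1023) << 52;
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Truncated square root (about 32 significant bits) of a positive normal x. */
double sqrt_sketch(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    int t = (int)((u >> 52) & 0x7FF) - 1023 - 52;           /* x == sig * 2^t */
    uint64_t sig = (u & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL;
    int s = (t & 1) ? 11 : 10;      /* pre-shift so that t - s is even */
    uint64_t root = isqrt64(sig << s);

    /* sqrt(sig * 2^s * 2^(t - s)) == sqrt(sig << s) * 2^((t - s) / 2) */
    return (double)root * pow2_normal((t - s) / 2);
}
```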
2386 -------------------------------------------------------------------------------
2387 Returns 1 if the double-precision floating-point value `a' is equal to the
2389 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2390 -------------------------------------------------------------------------------
2408 -------------------------------------------------------------------------------
2409 Returns 1 if the double-precision floating-point value `a' is less than or
2410 equal to the corresponding value `b', and 0 otherwise.  The comparison is
2411 performed according to the IEC/IEEE Standard for Binary Floating-point
2413 -------------------------------------------------------------------------------
2433 -------------------------------------------------------------------------------
2434 Returns 1 if the double-precision floating-point value `a' is less than
2436 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2437 -------------------------------------------------------------------------------
2457 -------------------------------------------------------------------------------
2458 Returns 1 if the double-precision floating-point value `a' is equal to the
2461 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2462 -------------------------------------------------------------------------------
2478 -------------------------------------------------------------------------------
2479 Returns 1 if the double-precision floating-point value `a' is less than or
2480 equal to the corresponding value `b', and 0 otherwise.  Quiet NaNs do not
2481 cause an exception.  Otherwise, the comparison is performed according to the
2482 IEC/IEEE Standard for Binary Floating-point Arithmetic.
2483 -------------------------------------------------------------------------------
2504 -------------------------------------------------------------------------------
2505 Returns 1 if the double-precision floating-point value `a' is less than
2507 exception.  Otherwise, the comparison is performed according to the IEC/IEEE
2508 Standard for Binary Floating-point Arithmetic.
2509 -------------------------------------------------------------------------------
2531 -------------------------------------------------------------------------------
2532 Returns the result of converting the extended double-precision floating-
2533 point value `a' to the 32-bit two's complement integer format.  The
2534 conversion is performed according to the IEC/IEEE Standard for Binary
2535 Floating-point Arithmetic---which means in particular that the conversion
2536 is rounded according to the current rounding mode.  If `a' is a NaN, the
2539 -------------------------------------------------------------------------------
2551     shiftCount = 0x4037 - aExp;  in floatx80_to_int32()
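Here the shift by `0x4037 - aExp` appears to line the significand up with the fixed-point input of roundAndPackInt32(). According to the comment near the top of the file, that input has its binary point between bits 6 and 7, so 7 fraction bits remain for rounding. The sketch below models only the round-to-nearest-even path of such a helper, with saturation on overflow. `round_pack_int32_nearest_even` is a hypothetical name, and the flags are returned through pointers instead of `roundData`.

```c
#include <stdint.h>

/* absZ: magnitude with 7 fraction bits; bit 63 must be zero.
   Rounds to nearest, ties to even, applies the sign, saturates on overflow. */
int32_t round_pack_int32_nearest_even(int sign, uint64_t absZ,
                                      int *inexact, int *invalid)
{
    uint64_t roundBits = absZ & 0x7F;
    uint64_t limit = sign ? 0x80000000u : 0x7FFFFFFFu;

    *inexact = 0;
    *invalid = 0;
    absZ = (absZ + 0x40) >> 7;         /* add one half, drop the fraction */
    if (roundBits == 0x40)             /* exact tie: clear the low bit */
        absZ &= ~(uint64_t)1;
    if (absZ > limit) {
        *invalid = 1;
        return sign ? INT32_MIN : INT32_MAX;
    }
    *inexact = roundBits != 0;
    return (int32_t)(sign ? -(int64_t)absZ : (int64_t)absZ);
}
```

For example, 2.5 in this format is 2.5 × 128 = 320, which rounds to 2, while 3.5 (448) rounds to 4.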
2559 -------------------------------------------------------------------------------
2560 Returns the result of converting the extended double-precision floating-
2561 point value `a' to the 32-bit two's complement integer format.  The
2562 conversion is performed according to the IEC/IEEE Standard for Binary
2563 Floating-point Arithmetic, except that the conversion is always rounded
2567 -------------------------------------------------------------------------------
2579     shiftCount = 0x403E - aExp;  in floatx80_to_int32_round_to_zero()
2591     if ( aSign ) z = - z;  in floatx80_to_int32_round_to_zero()
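The extended format stores its integer bit explicitly in bit 63 of the significand. A value therefore equals `sig * 2^(exp - 0x403E)`, and a right shift by `0x403E - aExp` truncates it toward zero in one step. The sketch below assumes a hypothetical unpacked struct `x80_parts` and omits NaN handling. Values with an exponent above 0x401E (magnitude at least 2^32) are rejected up front, which also keeps the shift in range.

```c
#include <stdint.h>

/* Hypothetical unpacked extended value; value = sig * 2^(exp - 0x403E). */
typedef struct {
    int sign;
    int exp;        /* biased by 0x3FFF */
    uint64_t sig;   /* explicit integer bit in bit 63 */
} x80_parts;

/* Convert to int32, always rounding toward zero (NaNs not handled). */
int32_t x80_to_int32_round_to_zero(x80_parts a, int *inexact, int *invalid)
{
    uint64_t mag;
    uint64_t limit = a.sign ? 0x80000000u : 0x7FFFFFFFu;

    *inexact = 0;
    *invalid = 0;
    if (a.exp < 0x3FFF) {               /* |a| < 1 */
        *inexact = (a.exp != 0 || a.sig != 0);
        return 0;
    }
    if (a.exp > 0x401E) {               /* |a| >= 2^32: out of range */
        *invalid = 1;
        return a.sign ? INT32_MIN : INT32_MAX;
    }
    mag = a.sig >> (0x403E - a.exp);    /* drop the fraction bits */
    if (mag > limit) {
        *invalid = 1;
        return a.sign ? INT32_MIN : INT32_MAX;
    }
    if ((mag << (0x403E - a.exp)) != a.sig)
        *inexact = 1;                   /* some fraction bits were nonzero */
    return (int32_t)(a.sign ? -(int64_t)mag : (int64_t)mag);
}
```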
2605 -------------------------------------------------------------------------------
2606 Returns the result of converting the extended double-precision floating-
2607 point value `a' to the single-precision floating-point format.  The
2608 conversion is performed according to the IEC/IEEE Standard for Binary
2609 Floating-point Arithmetic.
2610 -------------------------------------------------------------------------------
2628     if ( aExp || aSig ) aExp -= 0x3F81;  in floatx80_to_float32()
2634 -------------------------------------------------------------------------------
2635 Returns the result of converting the extended double-precision floating-
2636 point value `a' to the double-precision floating-point format.  The
2637 conversion is performed according to the IEC/IEEE Standard for Binary
2638 Floating-point Arithmetic.
2639 -------------------------------------------------------------------------------
2657     if ( aExp || aSig ) aExp -= 0x3C01;  in floatx80_to_float64()
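The offsets 0x3F81 and 0x3C01 are the bias differences (0x3FFF - 0x7F = 0x3F80 and 0x3FFF - 0x3FF = 0x3C00) plus one. The extra one is consistent with a packer that adds a significand whose integer bit is still present: that bit carries one into the exponent field, so the rounding helpers are handed an exponent one below the final biased value. The sketch below assumes such an adding packer and omits rounding. `pack_float32`, `pack_float64`, `x80_exp_to_float32` and `x80_exp_to_float64` are hypothetical names.

```c
#include <stdint.h>

/* Pack by addition: sig includes the integer bit (bit 23 / bit 52), which
   carries one into the exponent field. */
static uint32_t pack_float32(int sign, int exp, uint32_t sig)
{
    return ((uint32_t)sign << 31) + ((uint32_t)exp << 23) + sig;
}

static uint64_t pack_float64(int sign, int exp, uint64_t sig)
{
    return ((uint64_t)sign << 63) + ((uint64_t)exp << 52) + sig;
}

/* Re-bias an extended exponent and pack a positive, already-rounded
   significand (24 or 53 bits, integer bit set). */
uint32_t x80_exp_to_float32(int x80Exp, uint32_t sig24)
{
    return pack_float32(0, x80Exp - 0x3F81, sig24);
}

uint64_t x80_exp_to_float64(int x80Exp, uint64_t sig53)
{
    return pack_float64(0, x80Exp - 0x3C01, sig53);
}
```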
2663 -------------------------------------------------------------------------------
2664 Rounds the extended double-precision floating-point value `a' to an integer,
2665 and returns the result as an extended quadruple-precision floating-point
2666 value.  The operation is performed according to the IEC/IEEE Standard for
2667 Binary Floating-point Arithmetic.
2668 -------------------------------------------------------------------------------
2690         roundData->exception |= float_flag_inexact;  in floatx80_round_to_int()
2692         switch ( roundData->mode ) {  in floatx80_round_to_int()
2713     lastBitMask <<= 0x403E - aExp;  in floatx80_round_to_int()
2714     roundBitsMask = lastBitMask - 1;  in floatx80_round_to_int()
2716     roundingMode = roundData->mode;  in floatx80_round_to_int()
2731     if ( z.low != a.low ) roundData->exception |= float_flag_inexact;  in floatx80_round_to_int()
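For the extended format the same mask scheme applies, with `0x403E - aExp` in place of `0x433 - aExp`. Because the significand fills all 64 bits, rounding the magnitude up can overflow it. The sketch below treats that case by incrementing the exponent and resetting the significand to the integer bit alone. That carry handling is an assumption of the sketch, not something the lines shown here confirm. Only the three directed modes are covered. `x80_parts`, `directed_mode` and `x80_round_directed` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the directed rounding modes. */
enum directed_mode { to_zero, toward_pos_inf, toward_neg_inf };

/* Hypothetical unpacked extended value: explicit integer bit in bit 63. */
typedef struct {
    int sign;
    int exp;        /* biased by 0x3FFF */
    uint64_t sig;
} x80_parts;

/* Round a finite value with 0x3FFF <= exp <= 0x403D (1 <= |x| < 2^63). */
x80_parts x80_round_directed(x80_parts a, enum directed_mode mode)
{
    uint64_t lastBitMask = (uint64_t)1 << (0x403E - a.exp);
    uint64_t roundBitsMask = lastBitMask - 1;
    x80_parts z = a;

    /* Round the magnitude up when the mode points away from zero for this
       sign: +inf for positives, -inf for negatives. */
    if (mode != to_zero && (a.sign ^ (mode == toward_pos_inf)))
        z.sig += roundBitsMask;
    z.sig &= ~roundBitsMask;
    if (z.sig == 0) {               /* significand overflowed 64 bits */
        ++z.exp;
        z.sig = 0x8000000000000000ULL;
    }
    return z;
}
```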
2737 -------------------------------------------------------------------------------
2738 Returns the result of adding the absolute values of the extended double-
2739 precision floating-point values `a' and `b'.  If `zSign' is true, the sum is
2741 The addition is performed according to the IEC/IEEE Standard for Binary
2742 Floating-point Arithmetic.
2743 -------------------------------------------------------------------------------
2755     expDiff = aExp - bExp;  in addFloatx80Sigs()
2761         if ( bExp == 0 ) --expDiff;  in addFloatx80Sigs()
2771         shift64ExtraRightJamming( aSig, 0, - expDiff, &aSig, &zSig1 );  in addFloatx80Sigs()
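addFloatx80Sigs() aligns the smaller significand with shift64ExtraRightJamming(). Unlike the single-word jamming shift, it keeps the shifted-out bits in a second 64-bit word (`zSig1`), presumably so they remain available to the extended-precision rounding step. Only bits that fall off the end of that extra word are folded into its sticky low bit. The following re-creation is written for illustration. The authoritative definition is in softfloat-macros.

```c
#include <stdint.h>

/* Illustrative re-creation: shift a0 right by count, catching shifted-out
   bits in *z1Ptr.  Bits lost past the extra word, and any nonzero incoming
   a1, are ORed into the least significant bit of *z1Ptr. */
static void shift64ExtraRightJamming(uint64_t a0, uint64_t a1, int count,
                                     uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    int negCount = (-count) & 63;
    uint64_t z0, z1;

    if (count == 0) {
        z1 = a1;
        z0 = a0;
    } else if (count < 64) {
        z1 = (a0 << negCount) | (a1 != 0);
        z0 = a0 >> count;
    } else {
        if (count == 64)
            z1 = a0 | (a1 != 0);
        else
            z1 = ((a0 | a1) != 0);
        z0 = 0;
    }
    *z0Ptr = z0;
    *z1Ptr = z1;
}
```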
2806 -------------------------------------------------------------------------------
2808 double-precision floating-point values `a' and `b'.  If `zSign' is true,
2810 result is a NaN.  The subtraction is performed according to the IEC/IEEE
2811 Standard for Binary Floating-point Arithmetic.
2812 -------------------------------------------------------------------------------
2825     expDiff = aExp - bExp;  in subFloatx80Sigs()
2832         roundData->exception |= float_flag_invalid;  in subFloatx80Sigs()
2845     return packFloatx80( roundData->mode == float_round_down, 0, 0 );  in subFloatx80Sigs()
2852     shift128RightJamming( aSig, 0, - expDiff, &aSig, &zSig1 );  in subFloatx80Sigs()
2863     if ( bExp == 0 ) --expDiff;  in subFloatx80Sigs()
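In subFloatx80Sigs() the aligned operand carries extra bits in `zSig1` after shift128RightJamming(), so the difference has to be formed across two 64-bit words, with the borrow from the low word taken out of the high word. The sketch below shows that two-word subtraction. `sub128` follows SoftFloat's macro naming, but this body is a re-creation for illustration.

```c
#include <stdint.h>

/* (a0:a1) - (b0:b1) on (high, low) word pairs, modulo 2^128. */
static void sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
                   uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    uint64_t z1 = a1 - b1;
    uint64_t z0 = a0 - b0 - (a1 < b1);  /* borrow out of the low word */

    *z0Ptr = z0;
    *z1Ptr = z1;
}
```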
2876 -------------------------------------------------------------------------------
2877 Returns the result of adding the extended double-precision floating-point
2878 values `a' and `b'.  The operation is performed according to the IEC/IEEE
2879 Standard for Binary Floating-point Arithmetic.
2880 -------------------------------------------------------------------------------
2898 -------------------------------------------------------------------------------
2899 Returns the result of subtracting the extended double-precision floating-
2900 point values `a' and `b'.  The operation is performed according to the
2901 IEC/IEEE Standard for Binary Floating-point Arithmetic.
2902 -------------------------------------------------------------------------------
2920 -------------------------------------------------------------------------------
2921 Returns the result of multiplying the extended double-precision floating-
2922 point values `a' and `b'.  The operation is performed according to the
2923 IEC/IEEE Standard for Binary Floating-point Arithmetic.
2924 -------------------------------------------------------------------------------
2952             roundData->exception |= float_flag_invalid;  in floatx80_mul()
2968     zExp = aExp + bExp - 0x3FFE;  in floatx80_mul()
2972         --zExp;  in floatx80_mul()
2981 -------------------------------------------------------------------------------
2982 Returns the result of dividing the extended double-precision floating-point
2984 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
2985 -------------------------------------------------------------------------------
3018                 roundData->exception |= float_flag_invalid;  in floatx80_div()
3024             roundData->exception |= float_flag_divbyzero;  in floatx80_div()
3033     zExp = aExp - bExp + 0x3FFE;  in floatx80_div()
3043         --zSig0;  in floatx80_div()
3051             --zSig1;  in floatx80_div()
3063 -------------------------------------------------------------------------------
3064 Returns the remainder of the extended double-precision floating-point value
3065 `a' with respect to the corresponding value `b'.  The operation is performed
3066 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
3067 -------------------------------------------------------------------------------
3097             roundData->exception |= float_flag_invalid;  in floatx80_rem()
3111     expDiff = aExp - bExp;  in floatx80_rem()
3114         if ( expDiff < -1 ) return a;  in floatx80_rem()
3119     if ( q ) aSig0 -= bSig;  in floatx80_rem()
3120     expDiff -= 64;  in floatx80_rem()
3123         q = ( 2 < q ) ? q - 2 : 0;  in floatx80_rem()
3127         expDiff -= 62;  in floatx80_rem()
3132         q = ( 2 < q ) ? q - 2 : 0;  in floatx80_rem()
3133         q >>= 64 - expDiff;  in floatx80_rem()
3134         mul64To128( bSig, q<<( 64 - expDiff ), &term0, &term1 );  in floatx80_rem()
3136         shortShift128Left( 0, bSig, 64 - expDiff, &term0, &term1 );  in floatx80_rem()
3163 -------------------------------------------------------------------------------
3164 Returns the square root of the extended double-precision floating-point
3165 value `a'.  The operation is performed according to the IEC/IEEE Standard
3166 for Binary Floating-point Arithmetic.
3167 -------------------------------------------------------------------------------
3189         roundData->exception |= float_flag_invalid;  in floatx80_sqrt()
3199     zExp = ( ( aExp - 0x3FFF )>>1 ) + 0x3FFF;  in floatx80_sqrt()
3210         --zSig0;  in floatx80_sqrt()
3225             --zSig1;  in floatx80_sqrt()
3240 -------------------------------------------------------------------------------
3241 Returns 1 if the extended double-precision floating-point value `a' is
3242 equal to the corresponding value `b', and 0 otherwise.  The comparison is
3243 performed according to the IEC/IEEE Standard for Binary Floating-point
3245 -------------------------------------------------------------------------------
3271 -------------------------------------------------------------------------------
3272 Returns 1 if the extended double-precision floating-point value `a' is
3273 less than or equal to the corresponding value `b', and 0 otherwise.  The
3274 comparison is performed according to the IEC/IEEE Standard for Binary
3275 Floating-point Arithmetic.
3276 -------------------------------------------------------------------------------
3305 -------------------------------------------------------------------------------
3306 Returns 1 if the extended double-precision floating-point value `a' is
3308 is performed according to the IEC/IEEE Standard for Binary Floating-point
3310 -------------------------------------------------------------------------------
3339 -------------------------------------------------------------------------------
3340 Returns 1 if the extended double-precision floating-point value `a' is equal
3341 to the corresponding value `b', and 0 otherwise.  The invalid exception is
3343 according to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
3344 -------------------------------------------------------------------------------
3367 -------------------------------------------------------------------------------
3368 Returns 1 if the extended double-precision floating-point value `a' is less
3369 than or equal to the corresponding value `b', and 0 otherwise.  Quiet NaNs
3371 to the IEC/IEEE Standard for Binary Floating-point Arithmetic.
3372 -------------------------------------------------------------------------------
3401 -------------------------------------------------------------------------------
3402 Returns 1 if the extended double-precision floating-point value `a' is less
3404 an exception.  Otherwise, the comparison is performed according to the
3405 IEC/IEEE Standard for Binary Floating-point Arithmetic.
3406 -------------------------------------------------------------------------------