s_cbrt.c - OpenGrok history log for /freebsd/lib/msun/src/s

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# 0dd5a560	28-Jan-2024	Steve Kargl <kargl@FreeBSD.org>	lib/msun: Cleanup after $FreeBSD$ removal Remove no longer needed explicit inclusion of sys/cdefs.h. PR: 276669 MFC after: 1 week
# dc36d6f9	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	lib: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s lib: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix show more ...
Revision tags: release/14.0.0
# 1d386b48	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
Revision tags: release/13.2.0, release/12.4.0
# 369ea052	04-Aug-2022	Steve Kargl <kargl@FreeBSD.org>	[libm] Correct comments in s_cbrt[l].c Damian McGuckin <damianm at esi dot com dot au> noted that the accuracy claims in the code for cbrt(3) and cbrtl(3) were incorrect. Fix the comments to more ac [libm] Correct comments in s_cbrt[l].c Damian McGuckin <damianm at esi dot com dot au> noted that the accuracy claims in the code for cbrt(3) and cbrtl(3) were incorrect. Fix the comments to more accurately describe the accuracies. PR: 265603 MFC after: 3 days show more ...
Revision tags: release/13.1.0, release/12.3.0, release/13.0.0, release/12.2.0, release/11.4.0, release/12.1.0, release/11.3.0
# 003fdafb	28-Dec-2018	Justin Hibbits <jhibbits@FreeBSD.org>	libm: Include float.h to get LDBL_MANT_DIG The long double aliases of double functions are only exposed as aliases if LDBL_MANT_DIG is 53 (same as DBL_MANT_DIG). Without float.h included these file libm: Include float.h to get LDBL_MANT_DIG The long double aliases of double functions are only exposed as aliases if LDBL_MANT_DIG is 53 (same as DBL_MANT_DIG). Without float.h included these files were not exposing weak aliases as expected, leading to link failures if programs use the *l functions. This should fix editors/calligra on targets with 64-bit long double, which uses erfl and erfcl. Found on powerpc64. Reviewed by: kargl@ show more ...
Revision tags: release/12.0.0, release/11.2.0, release/10.4.0, release/11.1.0, release/11.0.1, release/11.0.0
# 75f46cf6	01-May-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	lib: minor spelling fixes in comments. No functional change.
Revision tags: release/10.3.0, release/10.2.0, release/10.1.0, release/9.3.0, release/10.0.0, release/9.2.0, release/8.4.0, release/9.1.0, release/8.3.0_cvs, release/8.3.0, release/9.0.0
# dfe5233b	12-Mar-2011	Steve Kargl <kargl@FreeBSD.org>	Implement the long double version for the cube root function, cbrtl. The algorithm uses Newton's iterations with a crude estimate of the cube root to converge to a result. Reviewed by: bde Approved Implement the long double version for the cube root function, cbrtl. The algorithm uses Newton's iterations with a crude estimate of the cube root to converge to a result. Reviewed by: bde Approved by: das show more ...
# 6bccea7c	21-Feb-2011	Rebecca Cran <brucec@FreeBSD.org>	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days
Revision tags: release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0
# 5aa554c7	22-Feb-2008	David Schultz <das@FreeBSD.org>	s/rcsid/__FBSDID/
Revision tags: release/6.3.0_cvs, release/6.3.0, release/6.2.0_cvs, release/6.2.0, release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0
# 5776f433	20-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Extract the high and low words together. With gcc-3.4 on uniformly distributed non-large args, this saves about 14 of 134 cycles for Athlon64s and about 5 of 199 cycles for AthlonXPs. Moved the che Extract the high and low words together. With gcc-3.4 on uniformly distributed non-large args, this saves about 14 of 134 cycles for Athlon64s and about 5 of 199 cycles for AthlonXPs. Moved the check for x == 0 inside the check for subnormals. With gcc-3.4 on uniformly distributed non-large args, this saves another 5 cycles on Athlon64s and loses 1 cycle on AthlonXPs. Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first approximation from bits to double. With gcc-3.4 on uniformly distributed non-large args, this saves another 4 cycles on both Athlon64s and and AthlonXPs. Accessing doubles as 2 words may be an optimization on old CPUs, but on current CPUs it tends to cause extra operations and pipeline stalls, especially for writes, even when only 1 of the words needs to be accessed. Removed an unused variable. show more ...
# c5964538	19-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Use a minimax polynomial approximation instead of a Pade rational function approximation for the second step. The polynomial has degree 2 for cbrtf() and 4 for cbrt(). These degrees are minimal for Use a minimax polynomial approximation instead of a Pade rational function approximation for the second step. The polynomial has degree 2 for cbrtf() and 4 for cbrt(). These degrees are minimal for the final accuracy to be essentially the same as before (slightly smaller). Adjust the rounding between steps 2 and 3 to match. Unfortunately, for cbrt(), this breaks the claimed accuracy slightly although incorrect rounding doesn't. Claim less accuracy since its not worth pessimizing the polynomial or relying on exhaustive testing to get insignificantly more accuracy. This saves about 30 cycles on Athlons (mainly by avoiding 2 divisions) so it gives an overall optimization in the 10-25% range (a larger percentage for float precision, especially in 32-bit mode, since other overheads are more dominant for double precision, surprisingly more in 32-bit mode). show more ...
# ce804bff	18-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Fixed code to match comments and the algorithm: - in preparing for the third approximation, actually make t larger in magnitude than cbrt(x). After chopping, t must be incremented by 2 ulps to m Fixed code to match comments and the algorithm: - in preparing for the third approximation, actually make t larger in magnitude than cbrt(x). After chopping, t must be incremented by 2 ulps to make it larger, not 1 ulp since chopping can reduce it by almost 1 ulp and it might already be up to half a different-sized-ulp smaller than cbrt(x). I have not found any cases where this is essential, but the think-time error bound depends on it. The relative smallness of the different-sized-ulp limited the bug. If there are cases where this is essential, then the final error bound would be 5/6+epsilon instead of of 4/6+epsilon ulps (still < 1). - in preparing for the third approximation, round more carefully (but still sloppily to avoid branches) so that the claimed error bound of 0.667 ulps is satisfied in all cases tested for cbrt() and remains satisfied in all cases for cbrtf(). There isn't enough spare precision for very sloppy rounding to work: - in cbrt(), even with the inadequate increment, the actual error was 0.6685 in some cases, and correcting the increment increased this a little. The fix uses sloppy rounding to 25 bits instead of very sloppy rounding to 21 bits, and starts using uint64_t instead of 2 words for bit manipulation so that rounding more bits is not much costly. - in cbrtf(), the 0.667 bound was already satisfied even with the inadequate increment, but change the code to almost match cbrt() anyway. There is not enough spare precision in the Newton approximation to double the inadequate increment without exceeding the 0.667 bound, and no spare precision to avoid this problem as in cbrt(). The fix is to round using an increment of 2 smaller-ulps before chopping so that an increment of 1 ulp is enough. In cbrt(), we essentially do the same, but move the chop point so that the increment of 1 is not needed. Fixed comments to match code: - in cbrt(), the second approximation is good to 25 bits, not quite 26 bits. - in cbrt(), don't claim that the second approximation may be implemented in single precision. Single precision cannot handle the full exponent range without minor but pessimal changes to renormalize, and although single precision is enough, 25 bit precision is now claimed and used. Added comments about some of the magic for the error bound 4/6+epsilon. I still don't understand why it is 4/6+ and not 6/6+ ulps. Indent comments at the right of code more consistently. show more ...
# 7aac169e	15-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Added comments about the apparently-magic rational function used in the second step of approximating cbrt(x). It turns out to be neither very magic not nor very good. It is just the (2,2) Pade appr Added comments about the apparently-magic rational function used in the second step of approximating cbrt(x). It turns out to be neither very magic not nor very good. It is just the (2,2) Pade approximation to 1/cbrt(r) at r = 1, arranged in a strange way to use fewer operations at a cost of replacing 4 multiplications by 1 division, which is an especially bad tradeoff on machines where some of the multiplications can be done in parallel. A Remez rational approximation would give at least 2 more bits of accuracy, but the (2,2) Pade approximation already gives 6 more bits than needed. (Changed the comment which essentially says that it gives 3 more bits.) Lower order Pade approximations are not quite accurate enough for double precision but are plenty for float precision. A lower order Remez rational approximation might be enough for double precision too. However, rational approximations inherently require an extra division, and polynomial approximations work well for 1/cbrt(r) at r = 1, so I plan to switch to using the latter. There are some technical complications that tend to cost a division in another way. show more ...
# ec761d75	13-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Optimize by not doing excessive conversions for handling the sign bit. This gives an optimization of between 9 and 22% on Athlons (largest for cbrt() on amd64 -- from 205 to 159 cycles). We extracte Optimize by not doing excessive conversions for handling the sign bit. This gives an optimization of between 9 and 22% on Athlons (largest for cbrt() on amd64 -- from 205 to 159 cycles). We extracted the sign bit and worked with \|x\|, and restored the sign bit as the last step. We avoided branches to a fault by using accesses to FP values as bits to clear and restore the sign bit. Avoiding branches is usually good, but the bit access macros are not so good (especially for setting FP values), and here they always caused pipeline stalls on Athlons. Even using branches would be faster except on args that give perfect branch misprediction, since only mispredicted branches cause stalls, but it possible to avoid touching the sign bit in FP values at all (except to preserve it in conversions from bits to FP not related to the sign bit). Do this. The results are identical except in 2 of the 3 unsupported rounding modes, since all the approximations use odd rational functions so they work right on strictly negative values, and the special case of -0 doesn't use an approximation. show more ...
# 7d5a4821	13-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Fixed some especially horrible style bugs (indentation that is neither KNF nor fdlibmNF combined with multiple statements per line).
# af7f9913	11-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Added comments about the magic behind <cbrt(x) in bits> ~= <x in bits>/3 + BIAS. Keep the large comments only in the double version as usual. Fixed some style bugs (mainly grammar and spelling e Added comments about the magic behind <cbrt(x) in bits> ~= <x in bits>/3 + BIAS. Keep the large comments only in the double version as usual. Fixed some style bugs (mainly grammar and spelling errors in comments). show more ...
Revision tags: release/6.0.0_cvs, release/6.0.0, release/5.4.0_cvs, release/5.4.0, release/4.11.0_cvs, release/4.11.0, release/5.3.0_cvs, release/5.3.0, release/4.10.0_cvs, release/4.10.0, release/5.2.1_cvs, release/5.2.1, release/5.2.0_cvs, release/5.2.0, release/4.9.0_cvs, release/4.9.0, release/5.1.0_cvs, release/5.1.0, release/4.8.0_cvs, release/4.8.0, release/5.0.0_cvs, release/5.0.0, release/4.7.0_cvs, release/4.6.2_cvs, release/4.6.2, release/4.6.1, release/4.6.0_cvs
# 59b19ff1	28-May-2002	Alfred Perlstein <alfred@FreeBSD.org>	Fix formatting, this is hard to explain, so I'll show one example. - float ynf(int n, float x) /* wrapper ynf / +float +ynf(int n, float x) / wrapper ynf / This is because the __S Fix formatting, this is hard to explain, so I'll show one example. - float ynf(int n, float x) / wrapper ynf / +float +ynf(int n, float x) / wrapper ynf */ This is because the __STDC__ stuff was indented. Reviewed by: md5 show more ...
# 2dcc2286	28-May-2002	Alfred Perlstein <alfred@FreeBSD.org>	Assume __STDC__, remove non-__STDC__ code. Reviewed by: md5
Revision tags: release/4.5.0_cvs, release/4.4.0_cvs, release/4.3.0_cvs, release/4.3.0, release/4.2.0, release/4.1.1_cvs, release/4.1.0, release/3.5.0_cvs, release/4.0.0_cvs, release/3.4.0_cvs, release/3.3.0_cvs
# 7f3dea24	28-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
Revision tags: release/3.2.0, release/3.1.0, release/3.0.0, release/2.2.8, release/2.2.7, release/2.2.6, release/2.2.5_cvs, release/2.2.2_cvs, release/2.2.1_cvs, release/2.2.0, release/2.1.7_cvs
# 7e546392	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Revert $FreeBSD$ to $Id$
Revision tags: release/2.1.6_cvs, release/2.1.6.1
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise. show more ...
Revision tags: release/2.1.5_cvs, release/2.1.0_cvs, release/2.0.5_cvs
# 6c06b4e2	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
Revision tags: release/2.0
# 3a8617a8	19-Aug-1994	Jordan K. Hubbard <jkh@FreeBSD.org>	J.T. Conklin's latest version of the Sun math library. -- Begin comments from J.T. Conklin: The most significant improvement is the addition of "float" versions of the math functions that take float J.T. Conklin's latest version of the Sun math library. -- Begin comments from J.T. Conklin: The most significant improvement is the addition of "float" versions of the math functions that take float arguments, return floats, and do all operations in floating point. This doesn't help (performance) much on the i386, but they are still nice to have. The float versions were orginally done by Cygnus' Ian Taylor when fdlibm was integrated into the libm we support for embedded systems. I gave Ian a copy of my libm as a starting point since I had already fixed a lot of bugs & problems in Sun's original code. After he was done, I cleaned it up a bit and integrated the changes back into my libm. -- End comments Reviewed by: jkh Submitted by: jtc show more ...
Revision tags: release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0
# 5aa554c7	22-Feb-2008	David Schultz <das@FreeBSD.org>	s/rcsid/__FBSDID/
Revision tags: release/6.3.0_cvs, release/6.3.0, release/6.2.0_cvs, release/6.2.0, release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0
# 5776f433	20-Dec-2005	Bruce Evans <bde@FreeBSD.org>	Extract the high and low words together. With gcc-3.4 on uniformly distributed non-large args, this saves about 14 of 134 cycles for Athlon64s and about 5 of 199 cycles for AthlonXPs. Moved the che Extract the high and low words together. With gcc-3.4 on uniformly distributed non-large args, this saves about 14 of 134 cycles for Athlon64s and about 5 of 199 cycles for AthlonXPs. Moved the check for x == 0 inside the check for subnormals. With gcc-3.4 on uniformly distributed non-large args, this saves another 5 cycles on Athlon64s and loses 1 cycle on AthlonXPs. Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first approximation from bits to double. With gcc-3.4 on uniformly distributed non-large args, this saves another 4 cycles on both Athlon64s and and AthlonXPs. Accessing doubles as 2 words may be an optimization on old CPUs, but on current CPUs it tends to cause extra operations and pipeline stalls, especially for writes, even when only 1 of the words needs to be accessed. Removed an unused variable. show more ...
12