#
2b2cd978 |
| 13-Apr-2024 |
Henri Chataing <henrichataing@google.com> |
msun: Fix math error in comment explaining y reduction
x = k + y for some integer k and |y| < 1/2 exp2(x) = exp2(k + y) = exp2(k) * exp2(y) which can be written as 2**k * exp2(y)
The original had x
msun: Fix math error in comment explaining y reduction
x = k + y for some integer k and |y| < 1/2 exp2(x) = exp2(k + y) = exp2(k) * exp2(y) which can be written as 2**k * exp2(y)
The original had x = 2**k + y, which is has an extra 2** in it which isn't correct.
Confirmed by forumula 2 in Gal and Bachelis referenced in the comments for the source of this method https://dl.acm.org/doi/pdf/10.1145/103147.103151
The actual code is correct.
Reviewed by: imp (who added s_exp2.c and wrote the commit message) Pull Request: https://github.com/freebsd/freebsd-src/pull/1127
show more ...
|
Revision tags: release/13.3.0 |
|
#
0dd5a560 |
| 28-Jan-2024 |
Steve Kargl <kargl@FreeBSD.org> |
lib/msun: Cleanup after $FreeBSD$ removal
Remove no longer needed explicit inclusion of sys/cdefs.h.
PR: 276669 MFC after: 1 week
|
Revision tags: release/14.0.0 |
|
#
1d386b48 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
4d846d26 |
| 10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
show more ...
|
Revision tags: release/13.2.0, release/12.4.0, release/13.1.0, release/12.3.0, release/13.0.0, release/12.2.0, release/11.4.0, release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0 |
|
#
5e53a4f9 |
| 26-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
lib: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I was using mis-identified many licenses so this was mostly a manual - error pr
lib: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I was using mis-identified many licenses so this was mostly a manual - error prone - task.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
show more ...
|
Revision tags: release/10.4.0, release/11.1.0, release/11.0.1, release/11.0.0, release/10.3.0, release/10.2.0, release/10.1.0, release/9.3.0, release/10.0.0, release/9.2.0 |
|
#
d1d01586 |
| 05-Sep-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge from head
|
#
40f65a4d |
| 07-Aug-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r254014
|
#
552311f4 |
| 17-Jul-2013 |
Xin LI <delphij@FreeBSD.org> |
IFC @253398
|
#
cfe30d02 |
| 19-Jun-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge fresh head.
|
Revision tags: release/8.4.0 |
|
#
7dbbb6dd |
| 27-May-2013 |
David Schultz <das@FreeBSD.org> |
Fix some regressions caused by the switch from gcc to clang. The fixes are workarounds for various symptoms of the problem described in clang bugs 3929, 8100, 8241, 10409, and 12958.
The regression
Fix some regressions caused by the switch from gcc to clang. The fixes are workarounds for various symptoms of the problem described in clang bugs 3929, 8100, 8241, 10409, and 12958.
The regression tests did their job: they failed, someone brought it up on the mailing lists, and then the issue got ignored for 6 months. Oops. There may still be some regressions for functions we don't have test coverage for yet.
show more ...
|
Revision tags: release/9.1.0, release/8.3.0_cvs, release/8.3.0, release/9.0.0, release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0 |
|
#
fab324df |
| 22-Feb-2008 |
David Schultz <das@FreeBSD.org> |
Remove an unused variable.
|
#
f01bfe5c |
| 13-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Fix exp2*(x) on signaling NaNs by returning x+x as usual.
This has the side effect of confusing gcc-4.2.1's optimizer into more often doing the right thing. When it does the wrong thing here, it se
Fix exp2*(x) on signaling NaNs by returning x+x as usual.
This has the side effect of confusing gcc-4.2.1's optimizer into more often doing the right thing. When it does the wrong thing here, it seems to be mainly making too many copies of x with dependency chains. This effect is tiny on amd64, but in some cases on i386 it is enormous. E.g., on i386 (A64) with -O1, the current version of exp2() should take about 50 cycles, but took 83 cycles before this change and 66 cycles after this change. exp2f() with -O1 only speeded up from 51 to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with -O1 slowed down from 155 cycles to 123 for some args; this is unimportant since the i386 exp2l() is a fake; the wrong thing for it seems to involve branch misprediction.
show more ...
|
#
828f7b4a |
| 13-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Rearrange the polynomial evaluation for better parallelism. This is faster on all machines tested (old Celeron (P2), A64 (amd64 and i386) and ia64) except on ia64 when compiled with -O1. It takes 2
Rearrange the polynomial evaluation for better parallelism. This is faster on all machines tested (old Celeron (P2), A64 (amd64 and i386) and ia64) except on ia64 when compiled with -O1. It takes 2 more multiplications, so it will be slower on old machines. The speedup is about 8 cycles = 17% on A64 (amd64 and i386) with best CFLAGS and some parallelism in the caller.
Move the evaluation of 2**k up a bit so that it doesn't compete too much with the new polynomial evaluation. Unlike the previous optimization, this rearrangement cannot change the result, so compilers and CPU schedulers can do it, but they don't do it quite right yet. This saves a whole 1 or 2 cycles on A64.
show more ...
|
#
51f86873 |
| 11-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Use double precision for z and thus for the entire calculation of exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication and addition. This fixes the code to match the comment that the
Use double precision for z and thus for the entire calculation of exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication and addition. This fixes the code to match the comment that the maximum error is 0.5010 ulps (except on machines that evaluate float expressions in extra precision, e.g., i386's, where the evaluation was already in extra precision).
Fix and expand the comment about use of double precision.
The relative roundoff error from evaluating p(z) in non-extra precision was about 16 times larger than in exp2() because the interval length is 16 times smaller. Its maximum was at least P1 * (1.0 ulps) * max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the addition in (1 + P1*z) with a cancelation error when z ~= -1/32). The actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have come from the additional roundoff error in p(z). I can't explain why the additional roundoff error was almost 3/2 times larger than the rough estimate.
show more ...
|
#
a373e66b |
| 07-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Use a better method of scaling by 2**k. Instead of adding to the exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply
Use a better method of scaling by 2**k. Instead of adding to the exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply by it. This tends to be much faster if the construction of 2**k is actually in parallel, and might be faster even with no parallelism since adjustment of the exponent requires a read-modify-wrtite at an unfortunate time for pipelines.
In some cases involving exp2* on amd64 (A64), this change saves about 40 cycles or 30%. I think it is inherently only about 12 cycles faster in these cases and the rest of the speedup is from partly-accidentally avoiding compiler pessimizations (the construction of 2**k is now manually scheduled for good results, and -O2 doesn't always mess this up). In most cases on amd64 (A64) and i386 (A64) the speedup is about 20 cycles. The worst case that I found is expf on ia64 where this change is a pessimization of about 10 cycles or 5%. The manual scheduling for plain exp[f] is harder and not as tuned.
This change ld128/s_exp2l.c has not been tested.
show more ...
|
#
b134ea72 |
| 28-Jan-2008 |
David Schultz <das@FreeBSD.org> |
Adjust the exponent before converting the result from double to float precision. This fixes some double rounding problems for subnormals and simplifies things a bit.
|
#
684217d8 |
| 19-Jan-2008 |
Bruce Evans <bde@FreeBSD.org> |
Use STRICT_ASSIGN() for exp2f() and exp2() instead of a volatile variable hack for exp2f() only.
The volatile variable had a surprisingly large cost for exp2f() -- 19 cycles or 15% on i386 in the wo
Use STRICT_ASSIGN() for exp2f() and exp2() instead of a volatile variable hack for exp2f() only.
The volatile variable had a surprisingly large cost for exp2f() -- 19 cycles or 15% on i386 in the worst case observed. This is only partly explained by there being several references to the variable, only one of which benefited from it being volatile. Arches that have working assignment are likely to benefit even more from not having any volatile variable.
exp2() now has a chance of working with extra precision on i386.
exp2() has even more references to the variable, so it would have been pessimized more by simply declaring the variable as volatile. Even the temporary volatile variable for STRICT_ASSIGN costs 5-10% on i386, (A64) so I will change STRICT_ASSIGN() to do an ordinary assignment until i386 defaults to extra precision.
show more ...
|
#
86c2e0c0 |
| 18-Jan-2008 |
David Schultz <das@FreeBSD.org> |
Use volatile hacks to make sure these functions generate an underflow exception when they're supposed to. Previously, gcc -O2 was optimizing away the statement that generated it.
|
Revision tags: release/6.3.0_cvs, release/6.3.0, release/6.2.0_cvs, release/6.2.0, release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0, release/6.0.0_cvs, release/6.0.0, release/5.4.0_cvs, release/5.4.0 |
|
#
f8d6ede6 |
| 05-Apr-2005 |
David Schultz <das@FreeBSD.org> |
Implement exp2() and exp2f().
|
Revision tags: release/9.1.0, release/8.3.0_cvs, release/8.3.0, release/9.0.0, release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0 |
|
#
fab324df |
| 22-Feb-2008 |
David Schultz <das@FreeBSD.org> |
Remove an unused variable.
|
#
f01bfe5c |
| 13-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Fix exp2*(x) on signaling NaNs by returning x+x as usual.
This has the side effect of confusing gcc-4.2.1's optimizer into more often doing the right thing. When it does the wrong thing here, it se
Fix exp2*(x) on signaling NaNs by returning x+x as usual.
This has the side effect of confusing gcc-4.2.1's optimizer into more often doing the right thing. When it does the wrong thing here, it seems to be mainly making too many copies of x with dependency chains. This effect is tiny on amd64, but in some cases on i386 it is enormous. E.g., on i386 (A64) with -O1, the current version of exp2() should take about 50 cycles, but took 83 cycles before this change and 66 cycles after this change. exp2f() with -O1 only speeded up from 51 to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with -O1 slowed down from 155 cycles to 123 for some args; this is unimportant since the i386 exp2l() is a fake; the wrong thing for it seems to involve branch misprediction.
show more ...
|
#
828f7b4a |
| 13-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Rearrange the polynomial evaluation for better parallelism. This is faster on all machines tested (old Celeron (P2), A64 (amd64 and i386) and ia64) except on ia64 when compiled with -O1. It takes 2
Rearrange the polynomial evaluation for better parallelism. This is faster on all machines tested (old Celeron (P2), A64 (amd64 and i386) and ia64) except on ia64 when compiled with -O1. It takes 2 more multiplications, so it will be slower on old machines. The speedup is about 8 cycles = 17% on A64 (amd64 and i386) with best CFLAGS and some parallelism in the caller.
Move the evaluation of 2**k up a bit so that it doesn't compete too much with the new polynomial evaluation. Unlike the previous optimization, this rearrangement cannot change the result, so compilers and CPU schedulers can do it, but they don't do it quite right yet. This saves a whole 1 or 2 cycles on A64.
show more ...
|
#
51f86873 |
| 11-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Use double precision for z and thus for the entire calculation of exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication and addition. This fixes the code to match the comment that the
Use double precision for z and thus for the entire calculation of exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication and addition. This fixes the code to match the comment that the maximum error is 0.5010 ulps (except on machines that evaluate float expressions in extra precision, e.g., i386's, where the evaluation was already in extra precision).
Fix and expand the comment about use of double precision.
The relative roundoff error from evaluating p(z) in non-extra precision was about 16 times larger than in exp2() because the interval length is 16 times smaller. Its maximum was at least P1 * (1.0 ulps) * max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the addition in (1 + P1*z) with a cancelation error when z ~= -1/32). The actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have come from the additional roundoff error in p(z). I can't explain why the additional roundoff error was almost 3/2 times larger than the rough estimate.
show more ...
|
#
a373e66b |
| 07-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Use a better method of scaling by 2**k. Instead of adding to the exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply
Use a better method of scaling by 2**k. Instead of adding to the exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply by it. This tends to be much faster if the construction of 2**k is actually in parallel, and might be faster even with no parallelism since adjustment of the exponent requires a read-modify-wrtite at an unfortunate time for pipelines.
In some cases involving exp2* on amd64 (A64), this change saves about 40 cycles or 30%. I think it is inherently only about 12 cycles faster in these cases and the rest of the speedup is from partly-accidentally avoiding compiler pessimizations (the construction of 2**k is now manually scheduled for good results, and -O2 doesn't always mess this up). In most cases on amd64 (A64) and i386 (A64) the speedup is about 20 cycles. The worst case that I found is expf on ia64 where this change is a pessimization of about 10 cycles or 5%. The manual scheduling for plain exp[f] is harder and not as tuned.
This change ld128/s_exp2l.c has not been tested.
show more ...
|
#
b134ea72 |
| 28-Jan-2008 |
David Schultz <das@FreeBSD.org> |
Adjust the exponent before converting the result from double to float precision. This fixes some double rounding problems for subnormals and simplifies things a bit.
|