History log of /freebsd/lib/msun/src/s_exp2f.c (Results 1 – 25 of 28)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 2b2cd978 13-Apr-2024 Henri Chataing <henrichataing@google.com>

msun: Fix math error in comment explaining y reduction

x = k + y for some integer k and |y| < 1/2
exp2(x) = exp2(k + y) = exp2(k) * exp2(y)
which can be written as 2**k * exp2(y)

The original had x

msun: Fix math error in comment explaining y reduction

x = k + y for some integer k and |y| < 1/2
exp2(x) = exp2(k + y) = exp2(k) * exp2(y)
which can be written as 2**k * exp2(y)

The original had x = 2**k + y, which is has an extra 2** in it which
isn't correct.

Confirmed by forumula 2 in Gal and Bachelis referenced in the comments
for the source of this method
https://dl.acm.org/doi/pdf/10.1145/103147.103151

The actual code is correct.

Reviewed by: imp (who added s_exp2.c and wrote the commit message)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1127

show more ...


Revision tags: release/13.3.0
# 0dd5a560 28-Jan-2024 Steve Kargl <kargl@FreeBSD.org>

lib/msun: Cleanup after $FreeBSD$ removal

Remove no longer needed explicit inclusion of sys/cdefs.h.

PR: 276669
MFC after: 1 week


Revision tags: release/14.0.0
# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix

show more ...


Revision tags: release/13.2.0, release/12.4.0, release/13.1.0, release/12.3.0, release/13.0.0, release/12.2.0, release/11.4.0, release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0
# 5e53a4f9 26-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

lib: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
pr

lib: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

show more ...


Revision tags: release/10.4.0, release/11.1.0, release/11.0.1, release/11.0.0, release/10.3.0, release/10.2.0, release/10.1.0, release/9.3.0, release/10.0.0, release/9.2.0
# d1d01586 05-Sep-2013 Simon J. Gerraty <sjg@FreeBSD.org>

Merge from head


# 40f65a4d 07-Aug-2013 Peter Grehan <grehan@FreeBSD.org>

IFC @ r254014


# 552311f4 17-Jul-2013 Xin LI <delphij@FreeBSD.org>

IFC @253398


# cfe30d02 19-Jun-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Merge fresh head.


Revision tags: release/8.4.0
# 7dbbb6dd 27-May-2013 David Schultz <das@FreeBSD.org>

Fix some regressions caused by the switch from gcc to clang. The fixes
are workarounds for various symptoms of the problem described in clang
bugs 3929, 8100, 8241, 10409, and 12958.

The regression

Fix some regressions caused by the switch from gcc to clang. The fixes
are workarounds for various symptoms of the problem described in clang
bugs 3929, 8100, 8241, 10409, and 12958.

The regression tests did their job: they failed, someone brought it
up on the mailing lists, and then the issue got ignored for 6 months.
Oops. There may still be some regressions for functions we don't have
test coverage for yet.

show more ...


Revision tags: release/9.1.0, release/8.3.0_cvs, release/8.3.0, release/9.0.0, release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0
# fab324df 22-Feb-2008 David Schultz <das@FreeBSD.org>

Remove an unused variable.


# f01bfe5c 13-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Fix exp2*(x) on signaling NaNs by returning x+x as usual.

This has the side effect of confusing gcc-4.2.1's optimizer into more
often doing the right thing. When it does the wrong thing here, it
se

Fix exp2*(x) on signaling NaNs by returning x+x as usual.

This has the side effect of confusing gcc-4.2.1's optimizer into more
often doing the right thing. When it does the wrong thing here, it
seems to be mainly making too many copies of x with dependency chains.
This effect is tiny on amd64, but in some cases on i386 it is enormous.
E.g., on i386 (A64) with -O1, the current version of exp2() should
take about 50 cycles, but took 83 cycles before this change and 66
cycles after this change. exp2f() with -O1 only speeded up from 51
to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in
either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with
-O1 slowed down from 155 cycles to 123 for some args; this is unimportant
since the i386 exp2l() is a fake; the wrong thing for it seems to
involve branch misprediction.

show more ...


# 828f7b4a 13-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Rearrange the polynomial evaluation for better parallelism. This is
faster on all machines tested (old Celeron (P2), A64 (amd64 and i386)
and ia64) except on ia64 when compiled with -O1. It takes 2

Rearrange the polynomial evaluation for better parallelism. This is
faster on all machines tested (old Celeron (P2), A64 (amd64 and i386)
and ia64) except on ia64 when compiled with -O1. It takes 2 more
multiplications, so it will be slower on old machines. The speedup
is about 8 cycles = 17% on A64 (amd64 and i386) with best CFLAGS
and some parallelism in the caller.

Move the evaluation of 2**k up a bit so that it doesn't compete too
much with the new polynomial evaluation. Unlike the previous
optimization, this rearrangement cannot change the result, so compilers
and CPU schedulers can do it, but they don't do it quite right yet.
This saves a whole 1 or 2 cycles on A64.

show more ...


# 51f86873 11-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Use double precision for z and thus for the entire calculation of
exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication
and addition. This fixes the code to match the comment that the

Use double precision for z and thus for the entire calculation of
exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication
and addition. This fixes the code to match the comment that the maximum
error is 0.5010 ulps (except on machines that evaluate float expressions
in extra precision, e.g., i386's, where the evaluation was already
in extra precision).

Fix and expand the comment about use of double precision.

The relative roundoff error from evaluating p(z) in non-extra precision
was about 16 times larger than in exp2() because the interval length
is 16 times smaller. Its maximum was at least P1 * (1.0 ulps) *
max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the
addition in (1 + P1*z) with a cancelation error when z ~= -1/32). The
actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have
come from the additional roundoff error in p(z). I can't explain why
the additional roundoff error was almost 3/2 times larger than the rough
estimate.

show more ...


# a373e66b 07-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Use a better method of scaling by 2**k. Instead of adding to the
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply

Use a better method of scaling by 2**k. Instead of adding to the
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply by
it. This tends to be much faster if the construction of 2**k is
actually in parallel, and might be faster even with no parallelism
since adjustment of the exponent requires a read-modify-wrtite at an
unfortunate time for pipelines.

In some cases involving exp2* on amd64 (A64), this change saves about
40 cycles or 30%. I think it is inherently only about 12 cycles faster
in these cases and the rest of the speedup is from partly-accidentally
avoiding compiler pessimizations (the construction of 2**k is now
manually scheduled for good results, and -O2 doesn't always mess this
up). In most cases on amd64 (A64) and i386 (A64) the speedup is about
20 cycles. The worst case that I found is expf on ia64 where this
change is a pessimization of about 10 cycles or 5%. The manual
scheduling for plain exp[f] is harder and not as tuned.

This change ld128/s_exp2l.c has not been tested.

show more ...


# b134ea72 28-Jan-2008 David Schultz <das@FreeBSD.org>

Adjust the exponent before converting the result from double to
float precision. This fixes some double rounding problems for
subnormals and simplifies things a bit.


# 684217d8 19-Jan-2008 Bruce Evans <bde@FreeBSD.org>

Use STRICT_ASSIGN() for exp2f() and exp2() instead of a volatile
variable hack for exp2f() only.

The volatile variable had a surprisingly large cost for exp2f() -- 19
cycles or 15% on i386 in the wo

Use STRICT_ASSIGN() for exp2f() and exp2() instead of a volatile
variable hack for exp2f() only.

The volatile variable had a surprisingly large cost for exp2f() -- 19
cycles or 15% on i386 in the worst case observed. This is only partly
explained by there being several references to the variable, only one
of which benefited from it being volatile. Arches that have working
assignment are likely to benefit even more from not having any volatile
variable.

exp2() now has a chance of working with extra precision on i386.

exp2() has even more references to the variable, so it would have been
pessimized more by simply declaring the variable as volatile. Even
the temporary volatile variable for STRICT_ASSIGN costs 5-10% on i386,
(A64) so I will change STRICT_ASSIGN() to do an ordinary assignment
until i386 defaults to extra precision.

show more ...


# 86c2e0c0 18-Jan-2008 David Schultz <das@FreeBSD.org>

Use volatile hacks to make sure these functions generate an underflow
exception when they're supposed to. Previously, gcc -O2 was optimizing
away the statement that generated it.


Revision tags: release/6.3.0_cvs, release/6.3.0, release/6.2.0_cvs, release/6.2.0, release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0, release/6.0.0_cvs, release/6.0.0, release/5.4.0_cvs, release/5.4.0
# f8d6ede6 05-Apr-2005 David Schultz <das@FreeBSD.org>

Implement exp2() and exp2f().


Revision tags: release/9.1.0, release/8.3.0_cvs, release/8.3.0, release/9.0.0, release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0
# fab324df 22-Feb-2008 David Schultz <das@FreeBSD.org>

Remove an unused variable.


# f01bfe5c 13-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Fix exp2*(x) on signaling NaNs by returning x+x as usual.

This has the side effect of confusing gcc-4.2.1's optimizer into more
often doing the right thing. When it does the wrong thing here, it
se

Fix exp2*(x) on signaling NaNs by returning x+x as usual.

This has the side effect of confusing gcc-4.2.1's optimizer into more
often doing the right thing. When it does the wrong thing here, it
seems to be mainly making too many copies of x with dependency chains.
This effect is tiny on amd64, but in some cases on i386 it is enormous.
E.g., on i386 (A64) with -O1, the current version of exp2() should
take about 50 cycles, but took 83 cycles before this change and 66
cycles after this change. exp2f() with -O1 only speeded up from 51
to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in
either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with
-O1 slowed down from 155 cycles to 123 for some args; this is unimportant
since the i386 exp2l() is a fake; the wrong thing for it seems to
involve branch misprediction.

show more ...


# 828f7b4a 13-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Rearrange the polynomial evaluation for better parallelism. This is
faster on all machines tested (old Celeron (P2), A64 (amd64 and i386)
and ia64) except on ia64 when compiled with -O1. It takes 2

Rearrange the polynomial evaluation for better parallelism. This is
faster on all machines tested (old Celeron (P2), A64 (amd64 and i386)
and ia64) except on ia64 when compiled with -O1. It takes 2 more
multiplications, so it will be slower on old machines. The speedup
is about 8 cycles = 17% on A64 (amd64 and i386) with best CFLAGS
and some parallelism in the caller.

Move the evaluation of 2**k up a bit so that it doesn't compete too
much with the new polynomial evaluation. Unlike the previous
optimization, this rearrangement cannot change the result, so compilers
and CPU schedulers can do it, but they don't do it quite right yet.
This saves a whole 1 or 2 cycles on A64.

show more ...


# 51f86873 11-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Use double precision for z and thus for the entire calculation of
exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication
and addition. This fixes the code to match the comment that the

Use double precision for z and thus for the entire calculation of
exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication
and addition. This fixes the code to match the comment that the maximum
error is 0.5010 ulps (except on machines that evaluate float expressions
in extra precision, e.g., i386's, where the evaluation was already
in extra precision).

Fix and expand the comment about use of double precision.

The relative roundoff error from evaluating p(z) in non-extra precision
was about 16 times larger than in exp2() because the interval length
is 16 times smaller. Its maximum was at least P1 * (1.0 ulps) *
max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the
addition in (1 + P1*z) with a cancelation error when z ~= -1/32). The
actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have
come from the additional roundoff error in p(z). I can't explain why
the additional roundoff error was almost 3/2 times larger than the rough
estimate.

show more ...


# a373e66b 07-Feb-2008 Bruce Evans <bde@FreeBSD.org>

Use a better method of scaling by 2**k. Instead of adding to the
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply

Use a better method of scaling by 2**k. Instead of adding to the
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply by
it. This tends to be much faster if the construction of 2**k is
actually in parallel, and might be faster even with no parallelism
since adjustment of the exponent requires a read-modify-wrtite at an
unfortunate time for pipelines.

In some cases involving exp2* on amd64 (A64), this change saves about
40 cycles or 30%. I think it is inherently only about 12 cycles faster
in these cases and the rest of the speedup is from partly-accidentally
avoiding compiler pessimizations (the construction of 2**k is now
manually scheduled for good results, and -O2 doesn't always mess this
up). In most cases on amd64 (A64) and i386 (A64) the speedup is about
20 cycles. The worst case that I found is expf on ia64 where this
change is a pessimization of about 10 cycles or 5%. The manual
scheduling for plain exp[f] is harder and not as tuned.

This change ld128/s_exp2l.c has not been tested.

show more ...


# b134ea72 28-Jan-2008 David Schultz <das@FreeBSD.org>

Adjust the exponent before converting the result from double to
float precision. This fixes some double rounding problems for
subnormals and simplifies things a bit.


12