#
0dd5a560 |
| 28-Jan-2024 |
Steve Kargl <kargl@FreeBSD.org> |
lib/msun: Cleanup after $FreeBSD$ removal
Remove no longer needed explicit inclusion of sys/cdefs.h.
PR: 276669
MFC after: 1 week
|
Revision tags: release/14.0.0 |
|
#
1d386b48 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0, release/12.4.0, release/13.1.0, release/12.3.0, release/13.0.0, release/12.2.0, release/11.4.0, release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0, release/10.4.0, release/11.1.0, release/11.0.1, release/11.0.0, release/10.3.0, release/10.2.0, release/10.1.0, release/9.3.0, release/10.0.0, release/9.2.0, release/8.4.0, release/9.1.0, release/8.3.0_cvs, release/8.3.0, release/9.0.0, release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0, release/7.2.0_cvs, release/7.2.0, release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0 |
|
#
e822ea5b |
| 25-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Inline __ieee754_rem_pio2f(). On amd64 (A64) and i386 (A64), this gives an average speedup of about 12 cycles or 17% for 9pi/4 < |x| <= 2**19pi/2 and a smaller speedup for larger x, and a small slowdown for |x| <= 9pi/4 (only 1-2 cycles on average, but that is 4%).
Inlining this is less likely to bust caches than inlining the float version since it is much smaller (about 220 bytes text and rodata) and has many fewer branches. However, the float version was already large due to its manual inlining of the branches and also the polynomial evaluations.
|
#
70d818a2 |
| 25-Feb-2008 |
Bruce Evans <bde@FreeBSD.org> |
Change __ieee754_rem_pio2f() to return double instead of float so that this function and its callers cosf(), sinf() and tanf() don't waste time converting values from doubles to floats and back for |x| > 9pi/4. All these functions were optimized a few years ago to mostly use doubles internally and across the __kernel*() interfaces but not across the __ieee754_rem_pio2f() interface.
This saves about 40 cycles in cosf(), sinf() and tanf() for |x| > 9pi/4 on amd64 (A64), and about 20 cycles on i386 (A64) (except for cosf() and sinf() in the upper range). 40 cycles is about 35% for 9pi/4 < |x| <= 2**19pi/2 and about 5% for |x| > 2**19pi/2. The saving is much larger on amd64 than on i386 since the conversions are not easy to optimize except on i386, where some of them are automatic and others are optimized invalidly. amd64 is still about 10% slower in cosf() and tanf() in the lower range due to conversion overhead.
This also gives a tiny speedup for |x| <= 9pi/4 on amd64 (by simplifying the code). It also avoids compiler bugs and/or additional slowness in the conversions on (not yet supported) machines where double_t != double.
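To make the interface change concrete, here is a minimal sketch of a medium-range reduction for a float argument that hands the remainder back as a double, so the caller never rounds it to float. It is not the FreeBSD source: the name rem_pio2f_sketch and the cast-based rounding are hypothetical, and the constants are simply pi/2 split into a 25-bit head and a double tail. The range limit |x| <= 2**19pi/2 comes from the message above; within it fn * pio2_1 needs at most 45 bits, so the first subtraction is exact.

```c
/* pi/2 split into a short head and a double tail (illustrative constants). */
static const double
invpio2_sk = 6.36619772367581382433e-01,	/* 2/pi */
pio2_1_sk  = 1.57079631090164184570e+00,	/* pi/2 to 25 bits: 0x3FF921FB50000000 */
pio2_1t_sk = 1.58932547735281966916e-08;	/* pi/2 - pio2_1_sk */

/*
 * Hypothetical sketch: returns n, the number of quarter turns, and stores
 * x - n*pi/2 in *y as a double (no conversion back to float).
 * Assumes |x| <= 2**19 * pi/2; larger args need a full reduction.
 */
static int
rem_pio2f_sketch(float x, double *y)
{
	double t = (double)x * invpio2_sk;
	/* Round to nearest (ties away from zero) without calling libm. */
	double fn = (double)(long)(t >= 0 ? t + 0.5 : t - 0.5);

	/* fn * pio2_1_sk is exact, so cancellation happens in the first term. */
	*y = ((double)x - fn * pio2_1_sk) - fn * pio2_1t_sk;
	return (int)fn;
}
```

A caller such as a sinf() variant would then pass the double *y straight into a double-precision kernel, selecting sine or cosine and the sign from n & 3.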
|
Revision tags: release/7.0.0_cvs, release/7.0.0 |
|
#
5aa554c7 |
| 22-Feb-2008 |
David Schultz <das@FreeBSD.org> |
s/rcsid/__FBSDID/
|
Revision tags: release/6.3.0_cvs, release/6.3.0, release/6.2.0_cvs, release/6.2.0, release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0 |
|
#
960d3da0 |
| 28-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Changed spelling of the request-to-inline macro name to match the change of the function name.
Added my (non-)copyright.
In k_tanf.c, added the first set of redundant parentheses to control grouping which was claimed to be added in the previous commit.
|
#
94a5f9be |
| 23-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Use only double precision for "kernel" tanf (except for returning float). This is a minor interface change. The function is renamed from __kernel_tanf() to __kernel_tandf() so that misuses of it will cause link errors and not crashes.
This version is a routine translation with no special optimizations for accuracy or efficiency. It gives an unimportant increase in accuracy, from ~0.9 ulps to 0.5285 ulps. Almost all of the error is from the minimax polynomial (~0.03 ulps) and the final rounding step (< 0.5 ulps). It gives strange differences in efficiency in the -5 to +10% range, with -O1 fairly consistently becoming faster and -O2 slower on AXP and A64 with gcc-3.3 and gcc-3.4.
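A sketch of what "only double precision, except for returning float" means for a tangent kernel. Assumptions: the name kernel_tandf_sketch is hypothetical, and the polynomial is a truncated Taylor series rather than the minimax polynomial the message refers to, so it is only accurate for small |x| (roughly |x| <= 0.25), not the whole reduced range. The iy argument is assumed to follow the fdlibm convention of selecting tan(x) (iy == 1) or -1/tan(x) (iy == -1).

```c
/*
 * Hypothetical sketch: all arithmetic in double, one rounding to float at
 * the end. Truncated Taylor series, NOT the real minimax coefficients.
 */
static float
kernel_tandf_sketch(double x, int iy)
{
	double z = x * x;
	/* tan(x) ~= x + x^3/3 + 2x^5/15 + 17x^7/315 + 62x^9/2835 + 1382x^11/155925 */
	double r = x + x * z * (1.0 / 3 + z * (2.0 / 15 + z * (17.0 / 315 +
	    z * (62.0 / 2835 + z * (1382.0 / 155925)))));

	/* iy == 1: tan(x); iy == -1: -1/tan(x), still computed in double. */
	return (float)(iy == 1 ? r : -1.0 / r);
}
```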
|
#
4ce51209 |
| 21-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Mess up the "kernel" float trig function .c files with ifdefs so that they can be #included in other .c files to give inline functions, and use them to inline the functions in most callers (not in e_lgammaf_r.c). __kernel_tanf() is too large and complicated for gcc to inline very well.
On Athlons, this gives a speed increase under favourable pipeline conditions of about 10% overall (larger for AXP, smaller for A64). E.g., on AXP, sinf() on uniformly distributed args in [-2pi, 2pi] now takes 30-56 cycles; it used to take 45-61 cycles; hardware fsin takes 65-129.
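The #include-for-inlining trick can be sketched in a single file. Normally the kernel lives in its own .c file and a caller #defines the request macro and then #includes that file; here both halves are shown together. The macro name INLINE_KERNEL_SINDF_SKETCH and the function are hypothetical, and the polynomial is a truncated Taylor series, not the real kernel's coefficients.

```c
/* --- In the caller (e.g. a sinf() source file), before the #include: --- */
#define INLINE_KERNEL_SINDF_SKETCH	/* request an inline copy */

/* --- What the hypothetical kernel file would contain: --- */
#ifdef INLINE_KERNEL_SINDF_SKETCH
static inline		/* private inline copy in each including file */
#endif
float
kernel_sindf_sketch(double x)
{
	double z = x * x;

	/* sin(x) ~= x - x^3/6 + x^5/120 - x^7/5040 + x^9/362880 */
	return (float)(x * (1.0 + z * (-1.0 / 6 + z * (1.0 / 120 +
	    z * (-1.0 / 5040 + z * (1.0 / 362880))))));
}
```

Compiled without the macro, the same file yields an ordinary external function, which is what a caller that is not inlined would link against.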
|
#
23f6483e |
| 20-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Restored a cleanup in rev.1.9 that was lost in rev.1.10.
|
#
8299eb7e |
| 19-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Moved all the optimizations for |x| <= 9pi/2 from __ieee754_rem_pio2f() to its 3 callers and manually inline them.
On Athlons, with favourable compiler flags and optimizations and favourable pipeline conditions, this gives a speedup of 30-40 cycles for cosf(), sinf() and tanf() on the range pi/4 < |x| <= 9pi/4, so these functions are now significantly faster than the hardware trig functions in many cases. E.g., in a benchmark with uniformly distributed x in [-2pi, 2pi], A64 hardware fcos took 72-129 cycles and cosf() took 37-55 cycles. Out-of-order execution is needed to get both of these times. The optimizations in this commit apparently work more by removing 1 serialization point than by reducing latency.
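One sub-range of this manual inlining can be sketched as follows; the split point 3pi/4 and the helper names are illustrative assumptions, not taken from the message. For pi/4 < |x| <= 3pi/4, sin(x) = cos(x - pi/2) when x > 0 and -cos(x + pi/2) when x < 0, so the caller can subtract a double pi/2 itself and call a cosine kernel directly instead of going through __ieee754_rem_pio2f(). Doing the subtraction in double keeps far more precision than a float result needs. The cosine kernel is a truncated Taylor series.

```c
static const double pio2_sk = 1.57079632679489661923;	/* pi/2 in double */

/* Hypothetical cos kernel for |y| <= pi/4 (truncated Taylor series). */
static float
kernel_cosdf_sketch(double y)
{
	double z = y * y;

	/* cos(y) ~= 1 - z/2 + z^2/24 - z^3/720 + z^4/40320 - z^5/3628800 */
	return (float)(1.0 + z * (-0.5 + z * (1.0 / 24 + z * (-1.0 / 720 +
	    z * (1.0 / 40320 + z * (-1.0 / 3628800))))));
}

/*
 * Hypothetical inlined fast path of sinf() for pi/4 < |x| <= 3pi/4.
 * The caller is responsible for checking the range first.
 */
static float
sinf_mid_range_sketch(float x)
{
	if (x > 0)
		return kernel_cosdf_sketch((double)x - pio2_sk);
	return -kernel_cosdf_sketch((double)x + pio2_sk);
}
```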
|
#
75ff209c |
| 17-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Minor cleanups:
s_cosf.c and s_sinf.c: Use a non-bogus magic constant for the threshold of pi/4. It was 2 ulps smaller than pi/4 rounded down, but its value is not critical so it should be the result of natural rounding.
s_cosf.c and s_tanf.c: Use a literal 0.0 instead of an unnecessary variable initialized to [(float)]0.0. Let the function prototype convert to 0.0F.
Improved wording in some comments.
Attempted to improve indentation of comments.
|
Revision tags: release/6.0.0_cvs, release/6.0.0 |
|
#
cb92d4d5 |
| 02-Nov-2005 |
Bruce Evans <bde@FreeBSD.org> |
Moved the optimization for tiny x from __kernel_tan[f](x) to tan[f](x) so that it can be faster for tiny x and avoided for reduced x.
This improves things a little differently than for cosine and sine. We still need to reclassify x in the "kernel" functions, but we get an extra optimization for tiny x, and an overall optimization since tiny reduced x rarely happens. We also get optimizations for space and style. A large block of poorly duplicated code to fix a special case is no longer needed. This supersedes the fixes in k_sin.c revs 1.9 and 1.11 and k_sinf.c 1.8 and 1.10.
Fixed wrong constant for the cutoff for "tiny" in tanf(). It was 2**-28, but should be almost the same as the cutoff in sinf() (2**-12). The incorrect cutoff protected us from the bugs fixed in k_sinf.c 1.8 and 1.10, except 4 cases of reduced args passed the cutoff and needed special handling in theory although not in practice. Now we essentially use a cutoff of 0 for the case of reduced args, so we now have 0 special args instead of 4.
This change makes no difference to the results for sinf() (since it only changes the algorithm for the 4 special args and the results for those happen not to change), but it changes lots of results for sin(). Exhaustive testing is impossible for sin(), but exhaustive testing for sinf() (relative to a version with the old algorithm and a fixed cutoff) shows that the changes in the error are either reductions or from 0.5-epsilon ulps to 0.5+epsilon ulps. The new method just uses some extra terms in approximations so it tends to give more accurate results, and there are apparently no problems from having extra accuracy. On amd64 with -O1, on all float args the error range in ulps is reduced from (0.500, 0.665] to [0.335, 0.500) in 24168 cases and increased from 0.500-epsilon to 0.500+epsilon in 24 cases. Non-exhaustive testing by ucbtest shows no differences.
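A back-of-envelope check of the 2**-12 cutoff: for |x| < 2**-12, tan(x) = x(1 + x^2/3 + ...), and the relative correction x^2/3 is below 2**-24/3, which is smaller than half an ulp of a float (at least 2**-25 relative). So returning x is already the correctly rounded result. The hypothetical helper below shows the fdlibm-style way of testing the cutoff on the bit pattern; its name is an illustration, and 0x39800000 is simply the binary32 encoding of 2**-12.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: is |x| below the "tiny" cutoff 2**-12? */
static int
tanf_is_tiny_sketch(float x)
{
	uint32_t ix;

	memcpy(&ix, &x, sizeof(ix));	/* read the bits without aliasing issues */
	ix &= 0x7fffffff;		/* clear the sign bit, giving |x| */
	return ix < 0x39800000;		/* 0x39800000 == 2**-12 as a float */
}
```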
|
Revision tags: release/5.4.0_cvs, release/5.4.0, release/4.11.0_cvs, release/4.11.0, release/5.3.0_cvs, release/5.3.0, release/4.10.0_cvs, release/4.10.0, release/5.2.1_cvs, release/5.2.1, release/5.2.0_cvs, release/5.2.0, release/4.9.0_cvs, release/4.9.0, release/5.1.0_cvs, release/5.1.0, release/4.8.0_cvs, release/4.8.0, release/5.0.0_cvs, release/5.0.0, release/4.7.0_cvs, release/4.6.2_cvs, release/4.6.2, release/4.6.1, release/4.6.0_cvs |
|
#
59b19ff1 |
| 28-May-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Fix formatting, this is hard to explain, so I'll show one example.
-    float ynf(int n, float x)    /* wrapper ynf */
+float
+ynf(int n, float x)    /* wrapper ynf */
This is because the __STDC__ stuff was indented.
Reviewed by: md5
|
#
2dcc2286 |
| 28-May-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Assume __STDC__, remove non-__STDC__ code.
Reviewed by: md5
|
Revision tags: release/4.5.0_cvs, release/4.4.0_cvs, release/4.3.0_cvs, release/4.3.0, release/4.2.0, release/4.1.1_cvs, release/4.1.0, release/3.5.0_cvs, release/4.0.0_cvs, release/3.4.0_cvs, release/3.3.0_cvs |
|
#
7f3dea24 |
| 28-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
Revision tags: release/3.2.0, release/3.1.0, release/3.0.0, release/2.2.8, release/2.2.7, release/2.2.6, release/2.2.5_cvs, release/2.2.2_cvs, release/2.2.1_cvs, release/2.2.0, release/2.1.7_cvs |
|
#
7e546392 |
| 22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Revert $FreeBSD$ to $Id$
|
Revision tags: release/2.1.6_cvs, release/2.1.6.1 |
|
#
1130b656 |
| 14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
Revision tags: release/2.1.5_cvs, release/2.1.0_cvs, release/2.0.5_cvs |
|
#
6c06b4e2 |
| 30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
Revision tags: release/2.0 |
|
#
3a8617a8 |
| 19-Aug-1994 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
J.T. Conklin's latest version of the Sun math library.
-- Begin comments from J.T. Conklin: The most significant improvement is the addition of "float" versions of the math functions that take float arguments, return floats, and do all operations in floating point. This doesn't help (performance) much on the i386, but they are still nice to have.
The float versions were originally done by Cygnus' Ian Taylor when fdlibm was integrated into the libm we support for embedded systems. I gave Ian a copy of my libm as a starting point since I had already fixed a lot of bugs & problems in Sun's original code. After he was done, I cleaned it up a bit and integrated the changes back into my libm. -- End comments
Reviewed by: jkh
Submitted by: jtc
|