History log of /freebsd/lib/msun/src/k_tanf.c (Results 1 – 25 of 62)
Revision  Date  Author  Comments
# 0dd5a560 28-Jan-2024 Steve Kargl <kargl@FreeBSD.org>

lib/msun: Cleanup after $FreeBSD$ removal

Remove no longer needed explicit inclusion of sys/cdefs.h.

PR: 276669
MFC after: 1 week


Revision tags: release/14.0.0
# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


Revision tags: release/13.2.0, release/12.4.0
# 82007616 10-Sep-2022 Gordon Bergling <gbe@FreeBSD.org>

msun: Remove a double word in a source code comment

- s/to to/to/

MFC after: 3 days


Revision tags: release/13.1.0, release/12.3.0, release/13.0.0, release/12.2.0, release/11.4.0, release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0, release/10.4.0, release/11.1.0, release/11.0.1, release/11.0.0, release/10.3.0, release/10.2.0, release/10.1.0, release/9.3.0, release/10.0.0, release/9.2.0, release/8.4.0, release/9.1.0
# e477abf7 27-Nov-2012 Alexander Motin <mav@FreeBSD.org>

MFC @ r241285


# a10c6f55 11-Nov-2012 Neel Natu <neel@FreeBSD.org>

IFC @ r242684


# 23090366 04-Nov-2012 Simon J. Gerraty <sjg@FreeBSD.org>

Sync from head


# 24bf3585 04-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Merge head r233826 through r240095.


# 2b795b29 11-Aug-2012 Dimitry Andric <dim@FreeBSD.org>

Change a few extern inline functions in libm to static inline, since
they need to refer to static constants, which C99 does not allow for
extern inline functions.

While here, change a comment in e_rem_pio2f.c to mention the correct
number of bits.

Reviewed by: bde
MFC after: 1 week
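
For context, a minimal sketch of the C99 rule this commit works around (the names here are invented for illustration and are not from libm): an inline definition with external linkage may not refer to an identifier with internal linkage, so a function that reads a file-scope static constant has to be static inline rather than extern inline.

    /* illustration only, not libm source */
    static const double half = 0.5;         /* internal linkage */

    /*
     * C99 6.7.4p3: an inline definition with external linkage must not
     * reference an identifier with internal linkage, so this has to be
     * static inline, not extern inline.
     */
    static inline double
    scale_by_half(double x)
    {
        return (x * half);
    }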



Revision tags: release/8.3.0_cvs, release/8.3.0, release/9.0.0, release/7.4.0_cvs, release/8.2.0_cvs, release/7.4.0, release/8.2.0, release/8.1.0_cvs, release/8.1.0, release/7.3.0_cvs, release/7.3.0, release/8.0.0_cvs, release/8.0.0
# 10b3b545 17-Sep-2009 Dag-Erling Smørgrav <des@FreeBSD.org>

Merge from head


# 7d4b968b 17-Sep-2009 Dag-Erling Smørgrav <des@FreeBSD.org>

Merge from head up to r188941 (last revision before the USB stack switch)


# 7e857dd1 12-Jun-2009 Oleksandr Tymoshenko <gonzo@FreeBSD.org>

- Merge from HEAD


# 2e03f452 04-Jun-2009 Jung-uk Kim <jkim@FreeBSD.org>

Resync with head.


# b492f289 03-Jun-2009 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 style inline semantics in msun.

Because we use ISO C99 nowadays, we can just get rid of enforcing
GNU89-style inlining.


Revision tags: release/7.2.0_cvs, release/7.2.0
# bad3b688 18-Jan-2009 Oleksandr Tymoshenko <gonzo@FreeBSD.org>

Sync with head


# 4630140c 13-Jan-2009 David Schultz <das@FreeBSD.org>

Use __gnu89_inline so that these files will compile with newer versions
of gcc, where the meaning of 'inline' was changed to match C99.

Noticed by: rdivacky
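
A hedged sketch of the kind of definition the commit refers to (the real macro lives in sys/cdefs.h; the spelling below is illustrative and uses the underlying GCC attribute directly): gnu_inline restores the GNU89 meaning of extern inline even when the compiler follows C99 inline semantics.

    /* illustrative only, not the sys/cdefs.h definition */
    #if defined(__GNUC__)
    #define MY_GNU89_INLINE inline __attribute__((__gnu_inline__))
    #else
    #define MY_GNU89_INLINE inline
    #endif

    /*
     * With gnu_inline, "extern inline" behaves as in GNU89: the body is
     * available for inlining, but no out-of-line definition is emitted
     * here; a separate non-inline definition provides the symbol.
     */
    extern MY_GNU89_INLINE double
    twice(double x)
    {
        return (x + x);
    }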


Revision tags: release/7.1.0_cvs, release/7.1.0, release/6.4.0_cvs, release/6.4.0, release/7.0.0_cvs, release/7.0.0
# 5aa554c7 22-Feb-2008 David Schultz <das@FreeBSD.org>

s/rcsid/__FBSDID/


Revision tags: release/6.3.0_cvs, release/6.3.0, release/6.2.0_cvs, release/6.2.0, release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0
# 1dd21062 28-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Rearranged the polynomial evaluation some more to reduce dependencies.
Instead of echoing the code in a comment, try to describe why we split
up the evaluation in a special way.

The new optimization is mostly to move the evaluation of w = z*z later
so that everything else (except z = x*x) doesn't have to wait for w.
On Athlons, FP multiplication has a latency of 4 cycles so this
optimization saves 4 cycles per call provided no new dependencies are
introduced. Tweaking the other terms to reduce dependencies saves
a couple more cycles in some cases (more on AXP than on A64; up to 8
cycles out of 56 altogether in some cases). The previous version had
a similar optimization for s = z*x. Special optimizations like these
probably have a larger effect than the simple 2-way vectorization
permitted (but not activated by gcc) in the old version, since 2-way
vectorization is not enough and the polynomial's degree is so small
in the float case that non-vectorizable dependencies dominate.

On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 34-55 cycles (was 39-59 cycles).
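
A rough sketch of the restructuring described above (the coefficients are truncated Taylor-series values standing in for the fitted minimax ones, and the real code differs in detail): the independent partial sums are formed first, and w = z*z is computed late so that nothing stalls waiting for it.

    /* sketch only; placeholder coefficients, not msun's T[] */
    static const double T[6] = {
        0.33333, 0.13333, 0.05397, 0.02187, 0.00886, 0.00359
    };

    static double
    tan_poly_split(double x)
    {
        double z, r, t, u, s, w;

        z = x * x;
        r = T[4] + z * T[5];    /* independent partial sums that   */
        t = T[2] + z * T[3];    /* can issue in parallel           */
        u = T[0] + z * T[1];
        s = z * x;
        w = z * z;              /* formed late: only the final     */
                                /* combination below waits on it   */
        return ((x + s * u) + (s * w) * (t + w * r));
    }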



# 960d3da0 28-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Changed spelling of the request-to-inline macro name to match the change
of the function name.

Added my (non-)copyright.

In k_tanf.c, added the first set of redundant parentheses to control
grouping which was claimed to be added in the previous commit.



# 833f0e1a 24-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Minor cleanups and optimizations:

- Remove dead code that I forgot to remove in the previous commit.

- Calculate the sum of the lower terms of the polynomial (divided by
x**5) in a single expression (sum of odd terms) + (sum of even terms)
with parentheses to control grouping. This is clearer and happens to
give better instruction scheduling for a tiny optimization (an
average of about ~0.5 cycles/call on Athlons).

- Calculate the final sum in a single expression with parentheses to
control grouping too. Change the grouping from
first_term + (second_term + sum_of_lower_terms) to
(first_term + second_term) + sum_of_lower_terms. Normally the first
grouping must be used for accuracy, but extra precision makes any
grouping give a correct result so we can group for efficiency. This
is a larger optimization (average 3-4 cycles/call or 5%).

- Use parentheses to indicate that the C order of left to right evaluation
is what is wanted (for efficiency) in a multiplication too.

The old fdlibm code has several optimizations related to these. Two
involve doing an extra operation that can be done almost in parallel
on some superscalar machines but are pessimizations on sequential
machines. Others involve statement ordering or expression grouping.
All of these except the ordering for combining the sums of the odd
and even terms seem to be ideal for Athlons, but parallelism is still
limited so all of these optimizations combined together with the ones
in this commit save only ~6-8 cycles (~10%).

On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 39-59 cycles. I don't know of any more optimizations for tanf()
short of writing it all in asm with very MD instruction scheduling.
Hardware fsin takes 122-138 cycles. Most of the optimizations for
tanf() don't work very well for tan[l](). fdlibm tan() now takes
145-365 cycles.
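
To illustrate the grouping point (the function and variable names are invented; u, v and w stand for the first term, the second term and the sum of the lower terms, already computed in double precision):

    /* illustration only */
    static float
    sum3(double u, double v, double w)
    {
        /*
         * In float-only arithmetic the accuracy-first grouping
         * u + (v + w) would be needed.  Per the commit above, the
         * ~29 extra bits of the double intermediates make either
         * grouping give a correct result, so the grouping below is
         * used for efficiency: u + v does not have to wait for the
         * lower terms in w.
         */
        return ((float)((u + v) + w));
    }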



# 16638b55 24-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Optimized by eliminating the special case for 0.67434 <= |x| < pi/4.

A single polynomial approximation for tan(x) works in infinite precision
up to |x| < pi/2, but in finite precision, to restrict the accumulated
roundoff error to < 1 ulp, |x| must be restricted to less than about
sqrt(0.5/((1.5+1.5)/3)) ~= 0.707. We restricted it a bit more to
give a safety margin including some slop for optimizations. Now that
we use double precision for the calculations, the accumulated roundoff
error is in double-precision ulps so it can easily be made almost 2**29
times smaller than a single-precision ulp. Near x = pi/4 its maximum
is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.

The minimax polynomial needs to be different to work for the larger
interval. I didn't increase its degree; the old degree is just large
enough to keep the final error less than 1 ulp and increasing the
degree would be a pessimization. The maximum error is now ~0.80
ulps instead of ~0.53 ulps.

The speedup from this optimization for uniformly distributed args in
[-2pi, 2pi] is 28-43% on athlons, depending on how badly gcc selected
and scheduled the instructions in the old version. The old version
has some int-to-float conversions that are apparently difficult to schedule
well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
gcc-3.4, with the difference especially large on AXPs. On A64s, the
problem seems to be related to documented penalties for moving single
precision data to undead xmm registers. With this version, the speed
in cycles is almost independent of the athlon and gcc version despite
the large differences in instruction selection to use the FPU on AXPs
and SSE on A64s.



# 94a5f9be 23-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Use only double precision for "kernel" tanf (except for returning float).
This is a minor interface change. The function is renamed from
__kernel_tanf() to __kernel_tandf() so that misuses of it will cause
link errors and not crashes.

This version is a routine translation with no special optimizations
for accuracy or efficiency. It gives an unimportant increase in
accuracy, from ~0.9 ulps to 0.5285 ulps. Almost all of the error is
from the minimax polynomial (~0.03 ulps) and the final rounding step
(< 0.5 ulps). It gives strange differences in efficiency in the -5
to +10% range, with -O1 fairly consistently becoming faster and -O2
slower on AXP and A64 with gcc-3.3 and gcc-3.4.
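
A schematic sketch of the interface described (the body is a placeholder; only the shape of the signature is the point): a double reduced argument, double-precision internals, and a float result, with iy selecting tan(x) or -1/tan(x).

    /* schematic only, not the msun implementation */
    static float
    kernel_tandf_sketch(double x, int iy)
    {
        double z = x * x;
        double r = x + x * z * 0.33333;   /* stand-in for the real polynomial */

        /* iy == 1 requests tan(x); otherwise -1/tan(x) (fdlibm convention) */
        return ((iy == 1) ? (float)r : (float)(-1.0 / r));
    }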



# 4ce51209 21-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Mess up the "kernel" float trig function .c files with ifdefs so that
they can be #included in other .c files to give inline functions, and
use them to inline the functions in most callers (not in e_lgammaf_r.c).
__kernel_tanf() is too large and complicated for gcc to inline very well.

On athlons, this gives a speed increase under favourable pipeline
conditions of about 10% overall (larger for AXP, smaller for A64).
E.g., on AXP, sinf() on uniformly distributed args in [-2Pi, 2Pi]
now takes 30-56 cycles; it used to take 45-61 cycles; hardware fsin
takes 65-129.
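
A hedged sketch of the arrangement described (file and macro names here are invented; the real msun macros differ): the kernel's .c file switches between an external definition and an inline one depending on a macro that the including caller defines.

    /* k_foo.c -- kernel source, compilable on its own or #included */
    #ifdef INLINE_KERNEL_FOO
    static inline      /* the original used GNU89 extern inline; later
                          changed to static inline (see the 2012 entry) */
    #endif
    float
    kernel_foo(double x)
    {
        return ((float)(x * x));    /* stand-in body */
    }

    /* s_foo.c -- caller that wants the kernel inlined */
    #define INLINE_KERNEL_FOO
    #include "k_foo.c"

    float
    foo(float x)
    {
        return (kernel_foo((double)x));
    }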



# 58652034 21-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Use double precision to simplify and optimize a long division.

On athlons, this gives a speedup of 10-20% for tanf() on uniformly
distributed args in [-2Pi, 2Pi]. (It only directly applies for 43%
of the args and gives a 16-20% speedup for these (more for AXP than
A64) and this gives an overall speedup of 10-12% which is all that it
should; however, it gives an overall speedup of 17-20% with gcc-3.3
on AXP-A64 by mysteriously affecting cases where it isn't executed.)

I originally intended to use double precision for all internals of
float trig functions and will probably still do this, but benchmarking
showed that converting to double precision and back is a pessimization
in cases where a simple float precision calculation works, so it may
be optimal to switch precisions only when using extra precision is
much simpler.



# e96c4fd9 12-Nov-2005 Bruce Evans <bde@FreeBSD.org>

Improved comments for the minimax polynomial.

Removed an unused variable.

Fixed some wrong comments and some nearby misformatting.


# c01611e4 10-Nov-2005 Bruce Evans <bde@FreeBSD.org>

As for __kernel_cosf() and __kernel_sinf(), use a fairly optimal minimax
polynomial for __kernel_tanf(). The old one was the double-precision
polynomial with coefficients truncated to float. Truncation is not
a good way to convert minimax polynomials to lower precision. Optimize
for efficiency and use the lowest-degree polynomial that gives a
relative error of less than 1 ulp. It has degree 13 instead of 27,
and happens to be 2.5 times more accurate (in infinite precision) than
the old polynomial (the maximum error is 0.017 ulps instead of 0.041
ulps).

Unlike for cosf and sinf, the old accuracy was close to being inadequate
-- the polynomial for double precision has a max error of 0.014 ulps
and nearly this small an error is needed. The new accuracy is also a
bit small, but exhaustive checking shows that even the old accuracy
was enough. The increased accuracy reduces the maximum relative error
in the final result on amd64 -O1 from 0.9588 ulps to 0.9044 ulps.
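
For orientation, a sketch of the shape of a degree-13 odd polynomial for tan (the coefficients below are truncated Taylor-series values used as placeholders; the committed minimax coefficients are different): six coefficients cover degrees 3 through 13, where a degree-27 polynomial needs thirteen.

    /* placeholder coefficients, not the fitted minimax ones */
    static const double T[6] = {
        0.33333, 0.13333, 0.05397, 0.02187, 0.00886, 0.00359
    };

    /* tan(x) ~= x + T[0]*x^3 + T[1]*x^5 + ... + T[5]*x^13 for |x| < pi/4 */
    static double
    tan_poly13(double x)
    {
        double z = x * x;
        double p = T[5];
        int i;

        for (i = 4; i >= 0; i--)
            p = T[i] + z * p;    /* Horner's rule in z = x*x */
        return (x + x * z * p);
    }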


