 +---------------------------------------------------------------------------+
 |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
 |                                                                           |
 | Copyright (C) 1992,1993,1994,1995,1996,1997,1999                          |
 |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
 |                       Australia.  E-mail billm@melbpc.org.au              |
 |                                                                           |
 |    This program is free software; you can redistribute it and/or modify   |
 |    it under the terms of the GNU General Public License version 2 as      |
 |    published by the Free Software Foundation.                             |
 |                                                                           |
 |    This program is distributed in the hope that it will be useful,        |
 |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
 |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
 |    GNU General Public License for more details.                           |
 |                                                                           |
 |    You should have received a copy of the GNU General Public License      |
 |    along with this program; if not, write to the Free Software            |
 |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
 |                                                                           |
 +---------------------------------------------------------------------------+



wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which was my 80387 emulator for early versions of djgpp (gcc under
msdos); wm-emu387 was in turn based upon emu387 which was written by
DJ Delorie for djgpp.  The interface to the Linux kernel is based upon
the original Linux math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486's. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there are always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU,
but is very close.  See "Limitations" later in this file for a list of
some differences.

Please report bugs, etc to me at:
       billm@melbpc.org.au
or     b.metzenthen@medoto.unimelb.edu.au

For more information on the emulator and on floating point topics, see
my web pages, currently at  http://www.suburbia.net/~billm/


--Bill Metzenthen
  December 1999


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
    "optimal" polynomial approximations. My definition of "optimal" was
    based upon getting good accuracy with reasonable speed.
(5) The argument reducing code for the trig function effectively uses
    a value of pi which is accurate to more than 128 bits. As a consequence,
    the reduced argument is accurate to more than 64 bits for arguments up
    to a few pi, and accurate to more than 64 bits for most arguments,
    even for arguments approaching 2^63. This is far superior to an
    80486, which uses a value of pi which is accurate to 66 bits.

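As an illustration of item (3), the following is a minimal sketch of
Newton's method applied to an integer square root. It is written in
Python for readability and is not the emulator's code (which is 80386
assembler); the function name newton_isqrt and the choice of starting
guess are assumptions made for this example only.

```python
def newton_isqrt(n):
    """Return floor(sqrt(n)) for a non-negative integer n.

    Hypothetical helper for illustration; not part of wm-FPU-emu.
    """
    if n < 0:
        raise ValueError("square root of a negative number")
    if n == 0:
        return 0
    # Start at a power of two that is guaranteed to be >= sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        # Newton step for f(x) = x*x - n, using integer division.
        y = (x + n // x) // 2
        if y >= x:
            # The estimate stopped decreasing: x is floor(sqrt(n)).
            return x
        x = y


# Square root of a 64-bit significand scaled up by 2^64, so that the
# integer result carries 64 bits.
significand = 0xC000000000000000          # 1.5 as a 64-bit fixed-point value
root = newton_isqrt(significand << 64)
```

Once the estimate is close, each Newton step roughly doubles the number
of correct bits, so only a few steps are needed for a 64-bit result.
How the emulator exploits this property is not described in this file.
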
The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby perhaps changing static
variables. The code which accesses user memory is confined to five
files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
As of version 1.12 of the emulator, no static variables are used
(apart from those in the kernel's per-process tables). The emulator is
therefore now fully re-entrant, rather than having just the restricted
form of re-entrancy which is required by the Linux kernel.

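The hazard described above can be sketched with a toy model. The
Python fragment below is purely illustrative: the generator "yield"
stands in for the emulator being suspended during a user-memory access,
and every name in it (shared_scratch, emulate_with_static,
emulate_with_context, run_interleaved) is hypothetical and does not
correspond to anything in the emulator source.

```python
# Stands in for a static variable shared by every user of the emulator.
shared_scratch = {"value": None}


def emulate_with_static(operand):
    """Toy 'instruction' that keeps its intermediate state in a static."""
    shared_scratch["value"] = operand
    yield                      # suspended, e.g. while a page is swapped in
    return shared_scratch["value"] * 2


def emulate_with_context(operand):
    """Same toy 'instruction', with its state held per invocation."""
    context = {"value": operand}
    yield                      # suspended, but the state is private
    return context["value"] * 2


def run_interleaved(emulate, a, b):
    """Start two emulations, suspend both, then let each one finish."""
    tasks = [emulate(a), emulate(b)]
    for task in tasks:
        next(task)             # run up to the suspension point
    results = []
    for task in tasks:
        try:
            next(task)
        except StopIteration as done:
            results.append(done.value)
    return results


broken = run_interleaved(emulate_with_static, 1, 2)
correct = run_interleaved(emulate_with_context, 1, 2)
```

In this model the first task's operand is overwritten by the second
task while the first is suspended, so only the version with private
state gives each task its own result.
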
----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version 2.01) and the 80486 FPU (apart from bugs).  The differences
are fewer than those which applied to the 1.xx series of the emulator.
Some of the more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
    operands were rounded to the current precision before the arithmetic
    operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

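For readers unfamiliar with the unsupported encodings named in the NOTE
above, the following Python sketch classifies an 80-bit Extended Real
from its 15-bit biased exponent and 64-bit significand (bit 63 being
the explicit integer bit). It follows the standard x87 format
definitions rather than any code in the emulator; the function name
classify_extended is hypothetical, and the sign bit is ignored.

```python
def classify_extended(exponent, significand):
    """Name the class of an x87 Extended Real encoding.

    exponent    -- 15-bit biased exponent (0 .. 0x7FFF)
    significand -- 64-bit significand, bit 63 is the explicit integer bit
    Hypothetical helper for illustration; not part of wm-FPU-emu.
    """
    integer_bit = (significand >> 63) & 1
    fraction = significand & ((1 << 63) - 1)

    if exponent == 0x7FFF:
        if integer_bit:
            return "infinity" if fraction == 0 else "NaN"
        # Maximum exponent with the integer bit clear: unsupported.
        return "pseudo-infinity" if fraction == 0 else "pseudo-NaN"

    if exponent == 0:
        if integer_bit:
            return "pseudo-denormal"
        return "zero" if fraction == 0 else "denormal"

    # Ordinary exponent: the integer bit must be set for a valid number.
    return "normal" if integer_bit else "unnormal"


UNSUPPORTED = {"pseudo-NaN", "pseudo-infinity", "unnormal"}
```

A caller could reject a value before using it by testing
classify_extended(exponent, significand) in UNSUPPORTED.
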
Self-modifying code can cause the emulator to fail. An example of such
code is:
          movl %esp,[%ebx]
          fld1
The FPU instruction may be (usually will be) loaded into the prefetch
queue of the CPU before the movl instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault. Address offsets
greater than 0xffff appear to be illegal in vm86 mode but are quite
acceptable (and work) in real mode. A small test program developed to
check the addressing, and which runs successfully in real mode,
crashes dosemu under Linux and also brings Windows down with a general
protection fault message when run under the MS-DOS prompt of Windows
3.1. (The program simply reads data from a valid address).

The emulator supports 16-bit protected mode, with one difference from
an 80486DX.  An 80486DX will allow some floating point instructions to
write a few bytes below the lowest address of the stack.  The emulator
will not allow this in 16-bit protected mode: no instructions are
allowed to write outside the bounds set by the protection.

----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the FPU instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos, the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu

   +          60.5           154.8              76.5          139.4
   -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
   *          71.0           190.8              79.6          146.6
   /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1

 sin()        310.8          4692.0            319.0          398.5
 cos()        284.4          4855.2            308.0          388.7
 tan()        495.0          8807.1            394.9          504.7
 atan()       328.9          4866.4            601.1          419.5-491.9

 sqrt()       128.7          crashed           145.2          227.0
 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
 exp()        479.1          6619.2            469.1          850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

 [ Linus' note: I changed look-ahead to be the default under linux, as
   there was no reason not to use it after I had edited it to be
   disabled during tracing ]

            wm-FPU-emu w     original w
            look-ahead       'soft' lib
   +         106.4             190.2
   -         108.6-111.6      192.4-216.2
   *         113.4             193.1
   /         108.8-124.4      700.1-706.2

 sin()       390.5            2642.0
 cos()       381.5            2767.4
 tan()       496.5            3153.3
 atan()      367.2-435.5     2439.4-3396.8

 sqrt()      195.1            4732.5
 log()       358.0-387.5     3359.2-3390.3
 exp()       619.3            4046.4


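To make the comparison in the look-ahead table easier to read, the
ratios can be worked out directly. The short Python sketch below does
this for the rows which give a single time rather than a range; the
figures are copied from the table above, while the names TIMES_US and
speedup are hypothetical.

```python
# (wm-FPU-emu with look-ahead, original emulator with 'soft' lib),
# both in microseconds, copied from the table above.
TIMES_US = {
    "+":      (106.4, 190.2),
    "*":      (113.4, 193.1),
    "sin()":  (390.5, 2642.0),
    "cos()":  (381.5, 2767.4),
    "tan()":  (496.5, 3153.3),
    "sqrt()": (195.1, 4732.5),
    "exp()":  (619.3, 4046.4),
}


def speedup(function):
    """Ratio of the original emulator's time to the wm-FPU-emu time."""
    wm_time, original_time = TIMES_US[function]
    return original_time / wm_time


for name in TIMES_US:
    print(f"{name:8s} {speedup(name):6.2f}x")
```

The simple operations come out at under 2 times faster, whereas sqrt()
is roughly 24 times faster, consistent with the remark that relative
performance is best for the instructions which require most computation.
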
These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.


----------------------- Accuracy of wm-FPU-emu -----------------------


The accuracy of the emulator is in almost all cases equal to or better
than that of an Intel 80486 FPU.

The results of the basic arithmetic functions (+,-,*,/), and fsqrt
match those of an 80486 FPU. They are the best possible; the error for
these never exceeds 1/2 an lsb. The fprem and fprem1 instructions
return exact results; they have no error.


The following table compares the emulator accuracy for the sqrt(),
trig and log functions against the Turbo C "emulator". For this table,
each function was tested at about 400 points. Ideal worst-case results
would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being related to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of
about 64 + log2(cos(x)) = 31 bits.


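The estimate of 31 bits quoted above can be checked numerically. The
Python sketch below evaluates 64 + log2(cos(x)) for x = pi/2 - 1e-10;
it uses the identity cos(pi/2 - d) = sin(d) so that the small result is
not disturbed by the rounding of pi/2 in double precision. The function
name expected_cos_bits is hypothetical.

```python
import math


def expected_cos_bits(delta, argument_bits=64):
    """Estimate the relative accuracy (in bits) of cos(pi/2 - delta)
    when the argument itself is only accurate to argument_bits bits.

    Uses cos(pi/2 - delta) == sin(delta) to evaluate the cosine.
    """
    return argument_bits + math.log2(math.sin(delta))


bits = expected_cos_bits(1e-10)
print(f"{bits:.1f} bits")
```

The result is a little under 31 bits, in line with the figure in the
text and with the 31.9 bits measured for Turbo C in the table.
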
Function      Tested x range            Worst result                Turbo C
                                        (relative bits)

sqrt(x)       1 .. 2                    64.1                         63.2
atan(x)       1e-10 .. 200              64.2                         62.8
cos(x)        0 .. pi/2-(1e-10)         64.4 (x <= pi/4)             62.4
                                        64.1 (x = pi/2-(1e-10))      31.9
sin(x)        1e-10 .. pi/2             64.0                         62.8
tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (x <= pi/4)             62.1
                                        64.1 (x = pi/2-(1e-10))      31.9
exp(x)        0 .. 1                    63.1 **                      62.9
log(x)        1+1e-6 .. 2               63.8 **                      62.1

** The accuracy for exp() and log() is low because the FPU (emulator)
does not compute them directly; two operations are required.


The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or
later) for 'float' variables (24 bit precision numbers) when precision
control is set to 24, 53 or 64 bits, and for 'double' variables (53
bit precision numbers) when precision control is set to 53 bits (a
properly performing FPU cannot pass the 'paranoia' tests for 'double'
variables when precision control is set to 64 bits).

The code for reducing the argument for the trig functions (fsin, fcos,
fptan and fsincos) has been improved and now effectively uses a value
for pi which is accurate to more than 128 bits precision. As a
consequence, the accuracy of these functions for large arguments has
been dramatically improved (and is now very much better than an 80486
FPU). There is also now no degradation of accuracy for fcos and fptan
for operands close to pi/2. Measured results are (note that the
definition of accuracy has changed slightly from that used for the
above table):

Function      Tested x range          Worst result
                                     (absolute bits)

cos(x)        0 .. 9.22e+18              62.0
sin(x)        1e-16 .. 9.22e+18          62.1
tan(x)        1e-16 .. 9.22e+18          61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer number
           8227740058411162616.0
is within about 1e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.

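The effect of the precision of pi on this example can be reproduced
with ordinary high-precision arithmetic. The Python sketch below is not
the emulator's reduction code; it simply reduces the argument modulo pi
twice, once with pi to 50 decimal places and once with pi cut down to
66 bits. Treating "accurate to 66 bits" as truncation to 2 integer bits
plus 64 fraction bits is an assumption made here, and the names
PI_50_DIGITS, truncate_fraction_bits and reduce_mod_pi are hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100   # ample working precision for this example

# pi to 50 decimal places (about 166 bits).
PI_50_DIGITS = Decimal("3.14159265358979323846264338327950288419716939937510")


def truncate_fraction_bits(value, fraction_bits):
    """Truncate value to the given number of binary fraction bits."""
    scale = Decimal(2) ** fraction_bits
    return Decimal(int(value * scale)) / scale


def reduce_mod_pi(x, pi):
    """Return x minus the nearest integer multiple of the supplied pi."""
    x = Decimal(x)
    multiple = (x / pi).to_integral_value()
    return x - multiple * pi


X = 8227740058411162616

# Assumption: a 66-bit pi has 2 integer bits and 64 fraction bits.
PI_66_BITS = truncate_fraction_bits(PI_50_DIGITS, 64)

reduced_full = reduce_mod_pi(X, PI_50_DIGITS)
reduced_66 = reduce_mod_pi(X, PI_66_BITS)

print("reduced with 50-digit pi:", float(reduced_full))
print("reduced with 66-bit pi:  ", float(reduced_66))
```

For reduced arguments this small the tangent is practically equal to
the reduced argument itself, so the two values printed can be compared
directly with the correct result and the 80486 result quoted above.
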
For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.


Prior to version 1.20 of the emulator, the accuracy of the results for
the transcendental functions (in their principal range) was not as
good as the results from an 80486 FPU. From version 1.20, the accuracy
has been considerably improved and these functions now give measured
worst-case results which are better than the worst-case results given
by an 80486 FPU.

The following table gives the measured results for the emulator. The
number of randomly selected arguments in each case is about half a
million.  The group of three columns gives the frequency of the given
accuracy in number of times per million, thus the second of these
columns shows that an accuracy of between 63.80 and 63.89 bits was
found at a rate of 133 times per one million measurements for fsin.
The results show that the fsin, fcos and fptan instructions return
results which are in error (i.e. less accurate than the best possible
result (which is 64 bits)) for about one per cent of all arguments
between -pi/2 and +pi/2.  The other instructions have a lower
frequency of results which are in error.  The last two columns give
the worst accuracy which was found (in bits) and the approximate value
of the argument which produced it.

                                frequency (per M)
                               -------------------   ---------------
instr   arg range    # tests   63.7   63.8    63.9   worst   at arg
                               bits   bits    bits    bits
-----  ------------  -------   ----   ----   -----   -----  --------
fsin     (0,pi/2)     547756      0    133   10673   63.89  0.451317
fcos     (0,pi/2)     547563      0    126   10532   63.85  0.700801
fptan    (0,pi/2)     536274     11    267   10059   63.74  0.784876
fpatan  4 quadrants   517087      0      8    1855   63.88  0.435121 (4q)
fyl2x     (0,20)      541861      0      0    1323   63.94  1.40923  (x)
fyl2xp1 (-.293,.414)  520256      0      0    5678   63.93  0.408542 (x)
f2xm1     (-1,1)      538847      4    481    6488   63.79  0.167709


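The "about one per cent" figure can be recovered from the table by
adding up the three frequency columns. The Python sketch below does
this, on the assumption that those three columns account for every
result below 64 bits (which agrees with the worst-case column, where
nothing falls below 63.7 bits). The names EMULATOR_FREQ_PER_MILLION and
percent_in_error are hypothetical.

```python
# Frequencies per million for the 63.7, 63.8 and 63.9 bit columns,
# copied from the table above.
EMULATOR_FREQ_PER_MILLION = {
    "fsin":    (0, 133, 10673),
    "fcos":    (0, 126, 10532),
    "fptan":   (11, 267, 10059),
    "fpatan":  (0, 8, 1855),
    "fyl2x":   (0, 0, 1323),
    "fyl2xp1": (0, 0, 5678),
    "f2xm1":   (4, 481, 6488),
}


def percent_in_error(instruction):
    """Percentage of results less accurate than the ideal 64 bits."""
    per_million = sum(EMULATOR_FREQ_PER_MILLION[instruction])
    return per_million / 1_000_000 * 100


for name in EMULATOR_FREQ_PER_MILLION:
    print(f"{name:8s} {percent_in_error(name):5.2f} %")
```

fsin, fcos and fptan each come out slightly above one per cent, and the
remaining instructions below one per cent, as stated in the text.
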
Tests performed on an 80486 FPU showed results of lower accuracy. The
following table gives the results which were obtained with an AMD
486DX2/66 (other tests indicate that an Intel 486DX produces
identical results).  The tests were basically the same as those used
to measure the emulator (the values, being random, were in general not
the same).  The total number of tests for each instruction is given
at the end of the table; in each case about 100k tests were performed.
Another line of figures at the end of the table shows that most of the
instructions return results which are in error for more than 10
percent of the arguments tested.

The numbers in the body of the table give the approx number of times a
result of the given accuracy in bits (given in the left-most column)
was obtained per one million arguments. For three of the instructions,
two columns of results are given:
 * The second column for f2xm1 gives the number of cases where the
   results of the first column were for a positive argument; this shows
   that this instruction gives better results for positive arguments
   than it does for negative.
 * In the cases of fcos and fptan, the first column gives the results
   when all cases where arguments greater than 1.5 were removed from
   the results given in the second column.
Unlike the emulator, an 80486 FPU returns results of relatively poor
accuracy for these instructions when the argument approaches pi/2. The
table does not show those cases when the accuracy of the results was
less than 62 bits, which occurs quite often for fsin and fptan when the
argument approaches pi/2. This poor accuracy is discussed above in
relation to the Turbo C "emulator", and the accuracy of the value of pi.


bits   f2xm1  f2xm1 fpatan   fcos   fcos  fyl2x fyl2xp1  fsin  fptan  fptan
62.0       0      0      0      0    437      0      0      0      0    925
62.1       0      0     10      0    894      0      0      0      0   1023
62.2      14      0      0      0   1033      0      0      0      0    945
62.3      57      0      0      0   1202      0      0      0      0   1023
62.4     385      0      0     10   1292      0     23      0      0   1178
62.5    1140      0      0    119   1649      0     39      0      0   1149
62.6    2037      0      0    189   1620      0     16      0      0   1169
62.7    5086     14      0    646   2315     10    101     35     39   1402
62.8    8818     86      0    984   3050     59    287    131    224   2036
62.9   11340   1355      0   2126   4153     79    605    357    321   1948
63.0   15557   4750      0   3319   5376    246   1281    862    808   2688
63.1   20016   8288      0   4620   6628    511   2569   1723   1510   3302
63.2   24945  11127     10   6588   8098   1120   4470   2968   2990   4724
63.3   25686  12382     69   8774  10682   1906   6775   4482   5474   7236
63.4   29219  14722     79  11109  12311   3094   9414   7259   8912  10587
63.5   30458  14936    393  13802  15014   5874  12666   9609  13762  15262
63.6   32439  16448   1277  17945  19028  10226  15537  14657  19158  20346
63.7   35031  16805   4067  23003  23947  18910  20116  21333  25001  26209
63.8   33251  15820   7673  24781  25675  24617  25354  24440  29433  30329
63.9   33293  16833  18529  28318  29233  31267  31470  27748  29676  30601

Per cent with error:
        30.9           3.2          18.5    9.8   13.1   11.6          17.4
Total arguments tested:
       70194  70099 101784 100641 100641 101799 128853 114893 102675 102675


------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
who I may have forgotten, please forgive me):

Linus Torvalds
Tommy.Thorn@daimi.aau.dk
Andrew.Tridgell@anu.edu.au
Nick Holloway, alfie@dcs.warwick.ac.uk
Hermano Moura, moura@dcs.gla.ac.uk
Jon Jagger, J.Jagger@scp.ac.uk
Lennart Benschop
Brian Gallew, geek+@CMU.EDU
Thomas Staniszewski, ts3v+@andrew.cmu.edu
Martin Howell, mph@plasma.apana.org.au
M Saggaf, alsaggaf@athena.mit.edu
Peter Barker, PETER@socpsy.sci.fau.edu
tom@vlsivie.tuwien.ac.at
Dan Russel, russed@rpi.edu
Daniel Carosone, danielce@ee.mu.oz.au
cae@jpmorgan.com
Hamish Coleman, t933093@minyos.xx.rmit.oz.au
Bruce Evans, bde@kralizec.zeta.org.au
Timo Korvola, Timo.Korvola@hut.fi
Rick Lyons, rick@razorback.brisnet.org.au
Rick, jrs@world.std.com

...and numerous others who responded to my request for help with
a real 80486.