 +---------------------------------------------------------------------------+
 |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
 |                                                                           |
 |  Copyright (C) 1992,1993,1994,1995,1996,1997,1999                         |
 |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
 |                       Australia.  E-mail   billm@melbpc.org.au            |
 |                                                                           |
 |    This program is free software; you can redistribute it and/or modify   |
 |    it under the terms of the GNU General Public License version 2 as      |
 |    published by the Free Software Foundation.                             |
 |                                                                           |
 |    This program is distributed in the hope that it will be useful,        |
 |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
 |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
 |    GNU General Public License for more details.                           |
 |                                                                           |
 |    You should have received a copy of the GNU General Public License      |
 |    along with this program; if not, write to the Free Software            |
 |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
 |                                                                           |
 +---------------------------------------------------------------------------+



wm-FPU-emu is an FPU emulator for Linux.
It is derived from wm-emu387, which was my 80387 emulator for early
versions of djgpp (gcc under msdos); wm-emu387 was in turn based upon
emu387, which was written by DJ Delorie for djgpp. The interface to the
Linux kernel is based upon the original Linux math emulator by Linus
Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486's. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there are always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU,
but is very close. See "Limitations" later in this file for a list of
some differences.

Please report bugs, etc to me at:
       billm@melbpc.org.au
or     b.metzenthen@medoto.unimelb.edu.au

For more information on the emulator and on floating point topics, see
my web pages, currently at  http://www.suburbia.net/~billm/


--Bill Metzenthen
  December 1999


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
    "optimal" polynomial approximations.
    My definition of "optimal" was based upon getting good accuracy
    with reasonable speed.
(5) The argument reducing code for the trig function effectively uses
    a value of pi which is accurate to more than 128 bits. As a consequence,
    the reduced argument is accurate to more than 64 bits for arguments up
    to a few pi, and accurate to more than 64 bits for most arguments,
    even for arguments approaching 2^63. This is far superior to an
    80486, which uses a value of pi which is accurate to 66 bits.

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby perhaps changing static
variables. The code which accesses user memory is confined to five
files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
As from version 1.12 of the emulator, no static variables are used
(apart from those in the kernel's per-process tables).
The emulator is therefore now fully re-entrant, rather than having just
the restricted form of re-entrancy which is required by the Linux kernel.

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version 2.01) and the 80486 FPU (apart from bugs). The differences
are fewer than those which applied to the 1.xx series of the emulator.
Some of the more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
    operands were rounded to the current precision before the arithmetic
    operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

Self modifying code can cause the emulator to fail. An example of such
code is:
          movl %esp,(%ebx)
          fld1
The FPU instruction may be (usually will be) loaded into the prefetch
queue of the CPU before the movl instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault.
Address offsets greater than 0xffff appear to be illegal in vm86 mode
but are quite acceptable (and work) in real mode. A small test program
developed to check the addressing, and which runs successfully in real
mode, crashes dosemu under Linux and also brings Windows down with a
general protection fault message when run under the MS-DOS prompt of
Windows 3.1. (The program simply reads data from a valid address.)

The emulator supports 16-bit protected mode, with one difference from
an 80486DX. An 80486DX will allow some floating point instructions to
write a few bytes below the lowest address of the stack. The emulator
will not allow this in 16-bit protected mode: no instructions are
allowed to write outside the bounds set by the protection.

----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the FPU instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions.
All times are in microseconds, measured on a 33MHz 386 with 64k cache.
The Turbo C tests were under ms-dos; the next two columns are for
emulators running with the djgpp ms-dos extender. The final column is
for wm-FPU-emu in Linux 0.97, using libm4.0 (hard).

function      Turbo C        djgpp 1.06        WM-emu387      wm-FPU-emu

   +          60.5           154.8             76.5           139.4
   -          61.1-65.5      157.3-160.8       76.2-79.5      142.9-144.7
   *          71.0           190.8             79.6           146.6
   /          61.2-75.0      261.4-266.9       75.3-91.6      142.2-158.1

 sin()        310.8          4692.0            319.0          398.5
 cos()        284.4          4855.2            308.0          388.7
 tan()        495.0          8807.1            394.9          504.7
 atan()       328.9          4866.4            601.1          419.5-491.9

 sqrt()       128.7          crashed           145.2          227.0
 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
 exp()        479.1          6619.2            469.1          850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

 [ Linus' note: I changed look-ahead to be the default under linux, as
   there was no reason not to use it after I had edited it to be
   disabled during tracing ]

              wm-FPU-emu w     original w
              look-ahead       'soft' lib
   +          106.4            190.2
   -          108.6-111.6      192.4-216.2
   *          113.4            193.1
   /          108.8-124.4      700.1-706.2

 sin()        390.5            2642.0
 cos()        381.5            2767.4
 tan()        496.5            3153.3
 atan()       367.2-435.5      2439.4-3396.8

 sqrt()       195.1            4732.5
 log()        358.0-387.5      3359.2-3390.3
 exp()        619.3            4046.4


These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.


----------------------- Accuracy of wm-FPU-emu -----------------------


The accuracy of the emulator is in almost all cases equal to or better
than that of an Intel 80486 FPU.

The results of the basic arithmetic functions (+,-,*,/), and fsqrt
match those of an 80486 FPU. They are the best possible; the error for
these never exceeds 1/2 an lsb.
The fprem and fprem1 instructions return exact results; they have no
error.


The following table compares the emulator accuracy for the sqrt(),
trig and log functions against the Turbo C "emulator". For this table,
each function was tested at about 400 points. Ideal worst-case results
would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being related to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of
about 64 + log2(cos(x)) = 31 bits.


Function      Tested x range            Worst result                 Turbo C
                                        (relative bits)

sqrt(x)       1 .. 2                    64.1                         63.2
atan(x)       1e-10 .. 200              64.2                         62.8
cos(x)        0 .. pi/2-(1e-10)         64.4 (x <= pi/4)             62.4
                                        64.1 (x = pi/2-(1e-10))      31.9
sin(x)        1e-10 .. pi/2             64.0                         62.8
tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (x <= pi/4)             62.1
                                        64.1 (x = pi/2-(1e-10))      31.9
exp(x)        0 .. 1                    63.1 **                      62.9
log(x)        1+1e-6 .. 2               63.8 **                      62.1

** The accuracy for exp() and log() is low because the FPU (emulator)
does not compute them directly; two operations are required.
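
As a rough check of the 31 bit estimate quoted before the table, the
rule of thumb 64 + log2(cos(x)) can be evaluated directly. The sketch
below is only an illustration in Python and is not part of the
emulator; the function name expected_relative_bits is made up for this
example, and ordinary double precision is assumed to be good enough to
evaluate the estimate itself.

```python
import math


def expected_relative_bits(x, argument_bits=64):
    """Rule-of-thumb relative accuracy (in bits) of cos(x).

    Assumes the only error source is the precision of the argument x,
    as in the text: bits = argument_bits + log2(|cos(x)|).
    """
    return argument_bits + math.log2(abs(math.cos(x)))


# Argument used in the table: cos(x) is about 1e-10 here, so roughly
# 33 bits of relative accuracy are lost.
x = math.pi / 2 - 1e-10
print(round(expected_relative_bits(x)))  # 31
```

For x <= pi/4 the same estimate stays above 63 bits, which is why only
the arguments close to pi/2 show the large loss in the Turbo C column.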


The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or
later) for 'float' variables (24 bit precision numbers) when precision
control is set to 24, 53 or 64 bits, and for 'double' variables (53
bit precision numbers) when precision control is set to 53 bits (a
properly performing FPU cannot pass the 'paranoia' tests for 'double'
variables when precision control is set to 64 bits).

The code for reducing the argument for the trig functions (fsin, fcos,
fptan and fsincos) has been improved and now effectively uses a value
for pi which is accurate to more than 128 bits precision. As a
consequence, the accuracy of these functions for large arguments has
been dramatically improved (and is now very much better than an 80486
FPU). There is also now no degradation of accuracy for fcos and fptan
for operands close to pi/2. Measured results are (note that the
definition of accuracy has changed slightly from that used for the
above table):

Function      Tested x range          Worst result
                                      (absolute bits)

cos(x)        0 .. 9.22e+18           62.0
sin(x)        1e-16 .. 9.22e+18       62.1
tan(x)        1e-16 .. 9.22e+18       61.8

It is possible with some effort to find very large arguments which
give much degraded precision.
For example, the integer number
           8227740058411162616.0
is within about 1e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.


Prior to version 1.20 of the emulator, the accuracy of the results for
the transcendental functions (in their principal range) was not as
good as the results from an 80486 FPU. From version 1.20, the accuracy
has been considerably improved and these functions now give measured
worst-case results which are better than the worst-case results given
by an 80486 FPU.

The following table gives the measured results for the emulator. The
number of randomly selected arguments in each case is about half a
million.
The group of three columns gives the frequency of the given
accuracy in number of times per million, thus the second of these
columns shows that an accuracy of between 63.80 and 63.89 bits was
found at a rate of 133 times per one million measurements for fsin.
The results show that the fsin, fcos and fptan instructions return
results which are in error (i.e. less accurate than the best possible
result (which is 64 bits)) for about one per cent of all arguments
between -pi/2 and +pi/2. The other instructions have a lower
frequency of results which are in error. The last two columns give
the worst accuracy which was found (in bits) and the approximate value
of the argument which produced it.

                                 frequency (per M)
                                -------------------   ---------------
instr   arg range     # tests   63.7   63.8    63.9   worst  at arg
                                bits   bits    bits    bits
-----   ------------  -------   ----   ----   -----   -----  --------
fsin    (0,pi/2)       547756      0    133   10673   63.89  0.451317
fcos    (0,pi/2)       547563      0    126   10532   63.85  0.700801
fptan   (0,pi/2)       536274     11    267   10059   63.74  0.784876
fpatan  4 quadrants    517087      0      8    1855   63.88  0.435121 (4q)
fyl2x   (0,20)         541861      0      0    1323   63.94  1.40923  (x)
fyl2xp1 (-.293,.414)   520256      0      0    5678   63.93  0.408542 (x)
f2xm1   (-1,1)         538847      4    481    6488   63.79  0.167709


Tests performed on an 80486 FPU
showed results of lower accuracy. The following table gives the
results which were obtained with an AMD 486DX2/66 (other tests indicate
that an Intel 486DX produces identical results). The tests were
basically the same as those used to measure the emulator (the values,
being random, were in general not the same). The total number of tests
for each instruction is given at the end of the table; in each case
about 100k tests were performed. Another line of figures at the end of
the table shows that most of the instructions return results which are
in error for more than 10 percent of the arguments tested.

The numbers in the body of the table give the approx number of times a
result of the given accuracy in bits (given in the left-most column)
was obtained per one million arguments. For three of the instructions,
two columns of results are given:
* The second column for f2xm1 gives the number of cases where the
  results of the first column were for a positive argument; this shows
  that this instruction gives better results for positive arguments
  than it does for negative.
* In the cases of fcos and fptan, the first column gives the results
  when all cases with arguments greater than 1.5 were removed from the
  results given in the second column.
Unlike the emulator, an 80486 FPU returns results of relatively poor
accuracy for these instructions when the argument approaches pi/2. The
table does not show those cases when the accuracy of the results was
less than 62 bits, which occurs quite often for fsin and fptan when the
argument approaches pi/2. This poor accuracy is discussed above in
relation to the Turbo C "emulator", and the accuracy of the value of pi.


bits   f2xm1   f2xm1  fpatan    fcos    fcos   fyl2x fyl2xp1    fsin   fptan   fptan
62.0       0       0       0       0     437       0       0       0       0     925
62.1       0       0      10       0     894       0       0       0       0    1023
62.2      14       0       0       0    1033       0       0       0       0     945
62.3      57       0       0       0    1202       0       0       0       0    1023
62.4     385       0       0      10    1292       0      23       0       0    1178
62.5    1140       0       0     119    1649       0      39       0       0    1149
62.6    2037       0       0     189    1620       0      16       0       0    1169
62.7    5086      14       0     646    2315      10     101      35      39    1402
62.8    8818      86       0     984    3050      59     287     131     224    2036
62.9   11340    1355       0    2126    4153      79     605     357     321    1948
63.0   15557    4750       0    3319    5376     246    1281     862     808    2688
63.1   20016    8288       0    4620    6628     511    2569    1723    1510    3302
63.2   24945   11127      10    6588    8098    1120    4470    2968    2990    4724
63.3   25686   12382      69    8774   10682    1906    6775    4482    5474    7236
63.4   29219   14722      79   11109   12311    3094    9414    7259    8912   10587
63.5   30458   14936     393   13802   15014    5874   12666    9609   13762   15262
63.6   32439   16448    1277   17945   19028   10226   15537   14657   19158   20346
63.7   35031   16805    4067   23003   23947   18910   20116   21333   25001   26209
63.8   33251   15820    7673   24781   25675   24617   25354   24440   29433   30329
63.9   33293   16833   18529   28318   29233   31267   31470   27748   29676   30601

Per cent with error:
        30.9             3.2            18.5     9.8    13.1    11.6            17.4
Total arguments tested:
       70194   70099  101784  100641  100641  101799  128853  114893  102675  102675


------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine.
Contributors include (to those people who I may have forgotten,
please forgive me):

Linus Torvalds
Tommy.Thorn@daimi.aau.dk
Andrew.Tridgell@anu.edu.au
Nick Holloway, alfie@dcs.warwick.ac.uk
Hermano Moura, moura@dcs.gla.ac.uk
Jon Jagger, J.Jagger@scp.ac.uk
Lennart Benschop
Brian Gallew, geek+@CMU.EDU
Thomas Staniszewski, ts3v+@andrew.cmu.edu
Martin Howell, mph@plasma.apana.org.au
M Saggaf, alsaggaf@athena.mit.edu
Peter Barker, PETER@socpsy.sci.fau.edu
tom@vlsivie.tuwien.ac.at
Dan Russel, russed@rpi.edu
Daniel Carosone, danielce@ee.mu.oz.au
cae@jpmorgan.com
Hamish Coleman, t933093@minyos.xx.rmit.oz.au
Bruce Evans, bde@kralizec.zeta.org.au
Timo Korvola, Timo.Korvola@hut.fi
Rick Lyons, rick@razorback.brisnet.org.au
Rick, jrs@world.std.com

...and numerous others who responded to my request for help with
a real 80486.