Lines Matching +full:y +full:- +full:rp
27 * ------------------------------------------
29 * ------------------------------------------
33 * Scale x to y in [1,4) with even powers of 2:
34 * find an integer k such that 1 <= (y=x*2^(2k)) < 4, then
35 * sqrt(x) = 2^k * sqrt(y)
37 * Let q = sqrt(y) truncated to i bit after binary point (q = 1),
40 * s = 2*q , and y = 2 * ( y - q ). (1)
46 * -(i+1) 2
47 * (q + 2 ) <= y. (2)
49 * -(i+1)
55 * -(i+1)
56 * s + 2 <= y (3)
59 * The advantage of (3) is that s and y can be computed by
64 * s = s , y = y ; (4)
68 * -i -(i+1)
69 * s = s + 2 , y = y - s - 2 (5)
74 * it does not necessary to do a full (53-bit) comparison
82 * huge + tiny is equal to huge, and whether huge - tiny is
86 * sqrt(+-0) = +-0 ... exact
88 * sqrt(-ve) = NaN ... with invalid signal
92 *---------------
95 static const double one = 1.0, tiny=1.0e-300;
110 sqrt(-inf)=sNaN */ in sqrt()
114 if(((ix0&(~sign))|ix1)==0) return x;/* sqrt(+-0) = +-0 */ in sqrt()
116 return (x-x)/(x-x); /* sqrt(-ve) = sNaN */ in sqrt()
122 m -= 21; in sqrt()
126 m -= i-1; in sqrt()
127 ix0 |= (ix1>>(32-i)); in sqrt()
130 m -= 1023; /* unbias exponent */ in sqrt()
148 ix0 -= t; in sqrt()
163 ix0 -= t; in sqrt()
164 if (ix1 < t1) ix0 -= 1; in sqrt()
165 ix1 -= t1; in sqrt()
175 z = one-tiny; /* trigger inexact flag */ in sqrt()
200 Other methods (use floating-point arithmetic)
201 -------------
214 standard (IEEE 754-1985). The ability to perform shift, add,
215 subtract and logical AND operations upon 32-bit words is needed
222 Let x0 and x1 be the leading and the trailing 32-bit words of
226 ------------------------------------------------------
228 ------------------------------------------------------
232 ------------------------ ------------------------
234 ------------------------ ------------------------
237 as integers), we obtain an 8-bit approximation of sqrt(x) as
241 y0 := k - T1[31&(k>>15)]. ... y ~ sqrt(x) to 8 bits
242 Here k is a 32-bit integer and T1[] is an integer array containing
243 correction terms. Now magically the floating value of y (y's
244 leading 32-bit word is y0, the value of its trailing word is 0)
245 approximates sqrt(x) to almost 8-bit.
256 Apply Heron's rule three times to y, we have y approximates
259 y := (y+x/y)/2 ... almost 17 sig. bits
260 y := (y+x/y)/2 ... almost 35 sig. bits
261 y := y-(y-x/y)/2 ... within 1 ulp
265 Another way to improve y to within 1 ulp is:
267 y := (y+x/y) ... almost 17 sig. bits to 2*sqrt(x)
268 y := y - 0x00100006 ... almost 18 sig. bits to sqrt(x)
271 (x-y )*y
272 y := y + 2* ---------- ...within 1 ulp
274 3y + x
280 expression 3y*y+x. Hence it is not recommended uless division
286 By twiddling y's last bit it is possible to force y to be
290 use the expression y+-ulp for the next representable floating
291 numbers (up and down) of y. Note that y+-ulp = either fixed
292 point y+-1, or multiply y by nextafter(1,+-inf) in chopped
296 R := RZ; ... set rounding mode to round-toward-zero
297 z := x/y; ... chopped quotient, possibly inexact
299 if(z=y) {
302 return sqrt(x):=y.
304 z := z - ulp; ... special rounding
308 If (r=RN) then z=z+ulp ... rounded-to-nearest
309 If (r=RP) then { ... round-toward-+inf
310 y = y+ulp; z=z+ulp;
312 y := y+z; ... chopped sum
313 y0:=y0-0x00100000; ... y := y/2 is correctly rounded.
316 return sqrt(x):=y.
320 Square root of +inf, +-0, or NaN is itself;
328 Let x0 and x1 be the leading and the trailing 32-bit words of
331 we obtain a 7.8-bit approximation of 1/sqrt(x) as follows.
333 k := 0x5fe80000 - (x0>>1);
334 y0:= k - T2[63&(k>>14)]. ... y ~ 1/sqrt(x) to 7.8 bits
336 Here k is a 32-bit integer and T2[] is an integer array
338 value of y (y's leading 32-bit word is y0, the value of
340 to almost 7.8-bit.
355 Apply Reciproot iteration three times to y and multiply the
358 -1ulp < sqrt(x)-z<1.0625ulp.
360 ... set rounding mode to Round-to-nearest
361 y := y*(1.5-0.5*x*y*y) ... almost 15 sig. bits to 1/sqrt(x)
362 y := y*((1.5-2^-30)+0.5*x*y*y)... about 29 sig. bits to 1/sqrt(x)
364 z := x*y ... 29 bits to sqrt(x), with z*y<1
365 z := z + 0.5*z*(1-z*y) ... about 1 ulp to sqrt(x)
367 Remark 2. The constant 1.5-2^-30 is chosen to bias the error so that
368 (a) the term z*y in the final iteration is always less than 1;
370 -1 ulp < sqrt(x) - z < 1.0625 ulp
371 instead of |sqrt(x)-z|<1.03125ulp.
375 By twiddling y's last bit it is possible to force y to be
379 use the expression y+-ulp for the next representable floating
380 numbers (up and down) of y. Note that y+-ulp = either fixed
381 point y+-1, or multiply y by nextafter(1,+-inf) in chopped
384 R := RZ; ... set rounding mode to round-toward-zero
386 case RN: ... round-to-nearest
387 if(x<= z*(z-ulp)...chopped) z = z - ulp; else
390 case RZ:case RM: ... round-to-zero or round-to--inf
391 R:=RP; ... reset rounding mod to round-to-+inf
392 if(x<z*z ... rounded up) z = z - ulp; else
395 case RP: ... round-to-+inf
414 j := 1 - [(x0>>20)&1] ... j = logb(x) mod 2
415 k := z1 >> 26; ... get z's 25-th and 26-th
434 --------------------
436 --------------------
442 -------------------------------------------------
444 -------------------------------------------------
450 -------------------------------------------------