wiki:FlonumsCowan

Version 18 (modified by cowan, 12 months ago)

Abstract

Flonums are a subset of the inexact real numbers provided by a Scheme implementation. In most Schemes, the flonums and the inexact reals are the same. Two flonums that are equal in the sense of = must also be equal in the sense of eqv?. For example, if 12.0f0 is a 32-bit inexact number and 12.0 is a 64-bit inexact number, they cannot both be flonums. In this situation, it is recommended that the 64-bit numbers be flonums.

Rationale

Flonum arithmetic is already supported by many systems, mainly to remove type-dispatching overhead. Standardizing flonum arithmetic increases the portability of code that uses it. Standardizing the range of flonums would make flonum operations inefficient on some systems, which would defeat their purpose. Therefore, this SRFI specifies some of the semantics of flonums, but makes the range implementation-dependent.

The sources of the procedures in this SRFI are R7RS-small, SRFI 141, the R6RS flonum library, and the C99/POSIX library <math.h>, which should be available directly or indirectly to Scheme implementers. (The C90 version of <math.h> lacks asinh, acosh, atanh, erf, and tgamma.)

Specification

Flonum operations must be at least as accurate as their generic counterparts applied to flonum arguments. It is an error, except as otherwise noted, for an argument not to be a flonum. In some cases, operations should be more accurate than their naive generic expansions because they have a smaller total roundoff error. If the generic result is a non-real number, the result is +nan.0 if the implementation supports that number, or an arbitrary flonum if not.

This SRFI uses x, y, z as parameter names for flonum arguments, and ix, iy, iz as names for integer-valued flonum arguments, i.e., flonums for which the integer? predicate returns true.

Constants

The following C99 constants are provided as Scheme variables.

fl-e

Value of the mathematical constant e. (C99 M_E)

fl-log2-e

Value of (fllog fl-e 2.0). (C99 M_LOG2E)

fl-log10-e

Value of (fllog fl-e 10.0). (C99 M_LOG10E)

fl-ln-2

Value of (fllog 2.0). (C99 M_LN2)

fl-ln-10

Value of (fllog 10.0). (C99 M_LN10)

fl-pi

Value of the mathematical constant π. (C99 M_PI)

fl-pi/2

Value of (fl/ fl-pi 2.0). (C99 M_PI_2)

fl-pi/4

Value of (fl/ fl-pi 4.0). (C99 M_PI_4)

fl-1/pi

Value of (fl/ 1.0 fl-pi). (C99 M_1_PI)

fl-2/pi

Value of (fl/ 2.0 fl-pi). (C99 M_2_PI)

fl-2/sqrt-pi

Value of (fl/ 2.0 (flsqrt fl-pi)). (C99 M_2_SQRTPI)

fl-sqrt-2

Value of (flsqrt 2.0). (C99 M_SQRT2)

fl-1/sqrt-2

Value of (fl/ 1.0 (flsqrt 2.0)). (C99 M_SQRT1_2)

fl-maximum

Value of the largest finite flonum.

fl-fast-multiply-add

Equal to #t if (fl+* x y z) is known to be faster than (fl+ (fl* x y) z), or #f otherwise. (C99 FP_FAST_FMA)

fl-integer-exponent-zero

Value of (flinteger-exponent 0.0). (C99 FP_ILOGB0)

fl-integer-exponent-nan

Value of (flinteger-exponent +nan.0). (C99 FP_ILOGBNAN)

fl-radix

Value of the floating-point radix (2 on most machines).

Constructors

(flonum number)

Returns the closest flonum equivalent to number in the sense of = and <.

(fladjacent x y)

Returns a flonum adjacent to x in the direction of y. Specifically: if x < y, returns the smallest flonum larger than x; if x > y, returns the largest flonum smaller than x; if x = y, returns x. (C99 nextafter)

(flcopysign x y)

Returns a flonum whose magnitude is the magnitude of x and whose sign is the sign of y. (C99 copysign)

Predicates

(flonum? obj)

Returns #t if obj is a flonum and #f otherwise.

(fl= x y z ...)

(fl< x y z ...)

(fl> x y z ...)

(fl<= x y z ...)

(fl>= x y z ...)

These procedures return #t if their arguments are (respectively): equal, monotonically increasing, monotonically decreasing, monotonically nondecreasing, or monotonically nonincreasing; they return #f otherwise. These predicates must be transitive.

(flinteger? x)

(flzero? x)

(flpositive? x)

(flnegative? x)

(flodd? ix)

(fleven? ix)

(flfinite? x)

(flinfinite? x)

(flnan? x)

These numerical predicates test a flonum for a particular property, returning #t or #f. The flinteger? procedure tests whether the flonum is an integer, flzero? tests whether it is fl= to zero, flpositive? tests whether it is greater than zero, flnegative? tests whether it is less than zero, flodd? tests whether it is odd, fleven? tests whether it is even, flfinite? tests whether it is neither an infinity nor a NaN, flinfinite? tests whether it is an infinity, and flnan? tests whether it is a NaN.

Note that (flnegative? -0.0) must return #f; otherwise it would lose the correspondence with (fl< -0.0 0.0), which is #f according to IEEE 754.
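The distinction between comparing against zero and inspecting the sign can be illustrated in C, assuming IEEE 754 doubles. The helper names fl_negative and fl_sign_bit_set are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical counterpart of (flnegative? x): an ordinary comparison,
   so it is false for -0.0 (and for NaN). */
bool fl_negative(double x)
{
    return x < 0.0;
}

/* Tests the sign bit instead, which does distinguish -0.0 from 0.0. */
bool fl_sign_bit_set(double x)
{
    return signbit(x) != 0;
}
```

Code that needs to detect a negative zero therefore has to look at the sign, for example through flcopysign, rather than rely on flnegative?.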

Arithmetic

(fl+ x ...)

(fl* x ...)

Return the flonum sum or product of their flonum arguments. In general, they should return the flonum that best approximates the mathematical sum or product. (For implementations that represent flonums using IEEE binary floating point, the meaning of "best" is defined by the IEEE standards.)

(fl+* x y z)

Returns (fl+ (fl* x y) z), possibly faster. If the constant fl-fast-multiply-add is #t, it will definitely be faster. (C99 fma)

(fl- x y ...)

(fl/ x y ...)

With two or more arguments, these procedures return the difference or quotient of their arguments, associating to the left. With one argument, however, they return the additive or multiplicative inverse of their argument.

In general, they should return the flonum that best approximates the mathematical difference or quotient. For undefined quotients, fl/ behaves as specified by the IEEE standards. For implementations that represent flonums using IEEE binary floating point, the meaning of "best" is reasonably well-defined by the IEEE standards.

(flmax x ...)

(flmin x ...)

Return the maximum/minimum argument. If there are no arguments, these procedures return -inf.0/+inf.0 (the identity elements for maximum and minimum) if the implementation provides these numbers, and the negation of fl-maximum / fl-maximum otherwise.

(flabs x)

Returns the absolute value of x.

(flabsdiff x y)

Returns (flabs (fl- x y)). (Compare C99 fdim, which returns the positive difference: x - y if x > y, and +0 otherwise.)

(flsgn x)

Returns (flcopysign 1.0 x).

(flnumerator x)

(fldenominator x)

Returns the numerator/denominator of x as a flonum; the result is computed as if x were represented as a fraction in lowest terms. The denominator is always positive. The denominator of 0.0 is defined to be 1.0.

(flfloor x)

(flceiling x)

(flround x)

(fltruncate x)

These procedures return integral flonums for flonum arguments that are not infinities or NaNs. For such arguments, flfloor returns the largest integral flonum not larger than x. The flceiling procedure returns the smallest integral flonum not smaller than x. The fltruncate procedure returns the integral flonum closest to x whose absolute value is not larger than the absolute value of x. The flround procedure returns the closest integral flonum to x, rounding to even when x represents a number halfway between two integers.

Although infinities and NaNs are not integers, these procedures return an infinity when given an infinity as an argument, and a NaN when given a NaN.

Exponents and logarithms

Scheme name	C signature	Comments
flexp	double exp(double)	e^x
fl2	double exp2(double)	base-2 exponential, 2^x
flexp-minus-1	double expm1(double)	e^x - 1
flsquare
flsqrt
flcbrt	double cbrt(double)	cube root
flhypot	double hypot(double, double)	sqrt(x^2 + y^2)
flexpt
fl2exp	double ldexp(double x, int n)	x * 2^n
flexponent	double logb(double x)	the exponent of x, which is the integral part of log_r(|x|), as a signed floating-point value, for non-zero x, where r is the radix of the machine's floating-point arithmetic
flfraction-exponent	double modf(double, double *)	returns two values, the fractional part and the integral part of x
flinteger-exponent	int ilogb(double)	binary log as int
fllog
fllog2	double log2(double)	log base 2
fllog10	double log10(double)	log base 10
fllog+1	double log1p(double x)	log(1 + x)
flnormalized-fraction-exponent	double frexp(double, int *)	returns two values, fraction in range [1/2, 1) and int exponent
flscalbn	double scalbn(double x, int y)	x * r^y, where r is the machine float radix
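The relationship between the frexp-style and ldexp-style entries can be sketched in C. The struct and function names below are hypothetical, and the sketch assumes radix-2 doubles.

```c
#include <math.h>

/* Result of splitting a double into fraction * 2^exponent,
   with 0.5 <= |fraction| < 1 for finite non-zero input. */
struct fraction_exponent {
    double fraction;
    int exponent;
};

/* Hypothetical counterpart of flnormalized-fraction-exponent (C99 frexp). */
struct fraction_exponent fl_normalized_fraction_exponent(double x)
{
    struct fraction_exponent result;
    result.fraction = frexp(x, &result.exponent);
    return result;
}

/* Inverse operation: fraction * 2^exponent (C99 ldexp). */
double fl_scale_by_power_of_two(double fraction, int exponent)
{
    return ldexp(fraction, exponent);
}
```

For 8.0 the decomposition is fraction 0.5 and exponent 4. The exponent reported by ilogb and logb for the same value is 3, because those functions normalize the fraction to the range [1, 2) rather than [1/2, 1).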

The flsqrt procedure returns the principal square root of x. For -0.0, flsqrt should return -0.0.

The flexpt procedure returns x raised to the power y if x is non-negative or y is an integral flonum. If x is negative and y is not an integer, the result may be a NaN, or may be some unspecified flonum. If x is zero and y is positive, then the result is zero.

Trigonometric functions

(flsin x) (C99 sin)

(flcos x) (C99 cos)

(fltan x) (C99 tan)

(flasin x) (C99 asin)

(flacos x) (C99 acos)

(flatan x [y]) (C99 atan and atan2)

(flsinh x) (C99 sinh)

(flcosh x) (C99 cosh)

(fltanh x) (C99 tanh)

(flasinh x) (C99 asinh)

(flacosh x) (C99 acosh)

(flatanh x) (C99 atanh)

These are the usual trigonometric and hyperbolic functions and their inverses. Note that if the result is not a real number, +nan.0 is returned if available, or if not, an arbitrary flonum. The flatan function, when passed two arguments, returns (flatan (/ y x)) without requiring the use of complex numbers, with the quadrant of the result determined by the signs of x and y as in C99 atan2. Implementations that use IEEE binary floating-point arithmetic should follow the relevant standards for these procedures.

(flremquo x y)

Returns two values, the result of (flround-remainder x y) and the low-order n bits (as a correctly signed exact integer) of the rounded quotient. The value of n is implementation-dependent but at least 3. This function can be used to reduce the argument of the inverse trigonometric functions, while preserving the correct quadrant or octant.

Integer division

The following procedures are the flonum counterparts of procedures from SRFI 141:

flfloor/ flfloor-quotient flfloor-remainder
flceiling/ flceiling-quotient flceiling-remainder
fltruncate/ fltruncate-quotient fltruncate-remainder
flround/ flround-quotient flround-remainder
fleuclidean/ fleuclidean-quotient fleuclidean-remainder
flbalanced/ flbalanced-quotient flbalanced-remainder

They have the same arguments and semantics as their generic counterparts, except that it is an error if the arguments are not flonums.

Special functions

Scheme name	C signature	Comments
flcomplementary-error-function	double erfc(double)	-
flerror-function	double erf(double)	-
flfirst-bessel	double jn(int n, double)	Bessel function of the first kind, order n
flgamma	double tgamma(double)	-
fllog-gamma	double lgamma(double)	returns two values, log(|gamma(x)|) and sgn(gamma(x))
flsecond-bessel	double yn(int n, double)	Bessel function of the second kind, order n