wiki:ComparatorsCowan

Comparators

Please see SRFI 114 for the predecessor to this proposal. SRFI 114 has been deprecated by its author. Here is the first draft of a simpler and better replacement, which will eventually become a new SRFI.

Abstract

This SRFI provides comparators, procedure bundles that represent one or more of an equality predicate, an ordering predicate, and a hash function. By packaging these procedures together, along with a type test predicate, they can be treated as a single item for use in the implementation of data structures.

Issues

None at present.

Rationale

The four procedures above have complex dependencies on one another, and it is inconvenient to have to pass them individually to other procedures that might or might not make use of all of them. For example, a set implementation by nature requires only an equality predicate, but if it is implemented using a hash table, an appropriate hash function is also required if the implementation does not provide one; alternatively, if it is implemented using a tree, an ordering predicate is required. By passing a comparator rather than a bare equality predicate, the set implementation can make use of whatever procedures are available and useful to it.

This SRFI is a simplified and enhanced rewrite of SRFI 114, and shares some of its design rationale and all of its acknowledgements. The largest change is the replacement of the comparison procedure with the ordering procedure. This allowed most of the special-purpose comparators to be removed. In addition, many of the more specialized

Special thanks to Taylan Ulrich Bayırlı/Kammer, whose insistence that SRFI 114 was over-complicated but also inadequate inspired this redesign.

Specification

The procedures in this SRFI are in the (srfi ???) library (or (srfi :???) on R6RS), but the sample implementation currently places them in the (comparators) library. This means it can't be used alongside SRFI 114, but there's no reason for anyone to do that.

Definitions

A comparator is an object of a disjoint type. It is a bundle of procedures that are useful for comparing two objects either for equality or for ordering. There are four procedures in the bundle:

  • The type test predicate returns #t if its argument has the correct type to be passed as an argument to the other three procedures, and #f otherwise.

  • The equality predicate returns #t if the two objects are the same in the sense of the comparator, and #f otherwise. It is the programmer's responsibility to ensure that it is reflexive, symmetric, transitive, and can handle any arguments that satisfy the type test predicate.

  • The ordering predicate returns #t if the first object precedes the second in a total order, and #f otherwise. Note that if it is true, the equality predicate must be false. It is the programmer's responsibility to ensure that it is irreflexive, asymmetric, transitive, and can handle any arguments that satisfy the type test predicate.

  • The hash function takes one or more arguments and returns an exact non-negative integer; the first argument is the object to be hashed. It is the programmer's responsibility to ensure that it can handle any argument that satisfies the type test predicate, and that it returns the same value on two objects if the equality predicate says they are the same (but not necessarily the converse).

    If a second argument is provided, it is a non-negative exact integer, a hint from the caller of the hash function that the returned value should be bounded from above by the second argument. There is no requirement that a hash function support this hint, and the caller cannot count on its being applied. Supplying #f as the second argument is equivalent to providing no argument.

    If a third argument is provided, it is a positive exact integer that serves as a salt value to be mixed in along with the first argument when performing the hash calculation. If the caller is careful to provide a sufficiently random salt, attacks on the hash function become more difficult. In addition, callers that need more than one hash function can easily obtain it by varying the salt. Like the upper bound, the salt is a hint: no hash function is required to make use of it.

It is also the programmer's responsibility to ensure that all four procedures provide the same result whenever they are applied to the same object(s) (in the sense of eqv?), unless the object(s) have been mutated since the last invocation. In particular, they must not depend in any way on memory addresses in implementations where the garbage collector can move objects in memory.

Limitations

The comparator objects defined in this SRFI are not applicable to circular structure or to NaNs, or to objects containing any of these. Attempts to pass any such objects to any procedure defined here, or to any procedure that is part of a comparator defined here, is an error except as otherwise noted.

Index

  • Predicates: comparator?, comparator-ordered?, comparator-hashable?

  • Constructors: make-comparator, make-pair-comparator, make-list-comparator, make-vector-comparator, make-eq-comparator, make-eqv-comparator, make-equal-comparator

  • Standard hash functions: boolean-hash, char-hash, char-ci-hash, string-hash, string-ci-hash, symbol-hash, number-hash

  • Default comparators: default-comparator, comparator-register-default!

  • Accessors: comparator-type-test-procedure, comparator-equality-predicate, comparator-ordering-predicate, comparator-hash-function, comparator-check-type

  • Comparison predicates: =?, <?, >?, <=?, >=?

Predicates

(comparator? obj)

Returns #t if obj is a comparator, and #f otherwise.

(comparator-ordered? comparator)

Returns #t if comparator has a supplied ordering predicate, and #f otherwise.

(comparator-hashable? comparator)

Returns #t if comparator has a supplied hash function, and #f otherwise.

Constructors

The following comparator constructors all supply appropriate type test functions, equality predicates, ordering predicates, and hash functions based on the supplied arguments. They are allowed to cache their results: they need not return a newly allocated object, since comparators are pure and functional.

(make-comparator type-test equality ordering hash)

Returns a comparator which bundles the type-test, equality, ordering, and hash procedures provided. As a convenience, if ordering or hash is #f, a procedure is provided that signals an error on application. The predicates comparator-ordered? and/or comparator-hashable?, respectively, will return #f in these cases.

Here are calls on make-comparator that will return useful comparators for standard Scheme types:

  • (make-comparator boolean? boolean=? (lambda (x y) (and (not x) y)) (lambda (x) (if x 1 0))) will return a comparator for booleans, expressing the ordering #f < #t and a plausible hash function.

  • (make-comparator real? = < (lambda (x) (exact (abs x)))) will return a comparator expressing the natural ordering of real numbers and a plausible (but not optimal) hash function.

  • (make-comparator string? string=? string<? string-hash) will return a comparator expressing the implementation's ordering of strings. On R5RS systems this is a lexicographic extension of some implementation-defined ordering of characters; on R6RS systems it is a lexicographic extension of Unicode codepoint order; on R7RS systems it is implementation-defined.

  • (make-comparator string? string-ci=? string-ci<? string-ci-hash will return a comparator expressing the implementation's case-insensitive ordering of strings.

(make-pair-comparator [ car-comparator cdr-comparator ])

This comparator compares pairs using car-comparator (a default comparator by default) on their cars. If the cars are not equal, that value is returned. If they are equal, cdr-comparator (also a default comparator by default) is used on their cdrs and that value is returned. Hashing is done on both the car and cdr sides, and the results are hashed together.

(make-list-comparator [ element-comparator [ type-test empty? head tail ] ])

Returns a comparator which lexicographically compares two sequences that satisfy type-test (list by default), as follows:

  • The empty sequence, as determined by calling empty? (null? by default) compares equal to itself.
  • The empty sequence compares less than any non-empty sequence.
  • Two non-empty sequences are compared by comparing the result of calling the head procedure (car by default). If the heads are not equal when compared using element-comparator (a default comparator by default), then the result is the result of that comparison. Otherwise, the results of calling the tail procedure (cdr by default) are compared recursively.

In particular, (make-list-comparator) returns a comparator that compares lists using a default comparator for the elements, and (make-list-comparator element-comparator) returns a comparator that compares lists using element-comparator for the elements.

(make-vector-comparator [ element-comparator [ type-test length ref ] ])

Returns a comparator which compares two sequences that satisfy type-test (vector? by default), using the length procedure (vector-length by default) to determine the lengths of the sequences, and the ref procedure (vector-ref by default) to access a particular elements. If one sequence is shorter than the other, it compares less than the other. If the sequences are the same length, they are compared element-wise using element-comparator (a default comparator by default) until they are exhausted.

In particular, (make-vector-comparator) returns a comparator that compares vectors using a default comparator for the elements, and (make-vector-comparator element-comparator) returns a comparator that compares vectors using element-comparator. Similarly, (make-vector-comparator (make-default-comparator) bytevector? bytevector-length bytevector-u8-ref) returns a comparator that compares bytevectors.

(make-eq-comparator)

(make-eqv-comparator)

(make-equal-comparator)

The equality predicates of these comparators are eq?, eqv?, and equal? respectively. When their ordering procedures are applied to non-equal objects, their behavior is implementation-defined. The type test predicates always return #t.

These comparators accept circular structure (in the case of equal-comparator, provided the implementation's equal? predicate does so) and NaNs.

Standard hash functions

These are hash functions for standard Scheme types, suitable for passing to make-comparator.

(boolean-hash obj [ bound [ salt ] ] )

(char-hash obj [ bound [ salt ] ] )

(char-ci-hash obj [ bound [ salt ] ] )

(string-hash obj [ bound [ salt ] ] )

(string-ci-hash obj [ bound [ salt ] ] )

(symbol-hash obj [ bound [ salt ] ] )

(number-hash obj [ bound [ salt ] ] )

These are suitable hash functions for the specified types. The hash functions char-ci-hash and string-ci-hash treat their first argument case-insensitively.

Default comparators

(make-default-comparator)

Returns a comparator known as a default comparator that accepts any two Scheme values (with the exceptions listed in the Limitations section) and orders them in some implementation-defined way, subject to the following conditions:

  • The following arbitrary ordering between types must hold: the empty list precedes pairs, which precede booleans, which precede characters, which precede strings, which precede symbols, which precede numbers, which precede vectors, which precede bytevectors, which precede all other objects.

  • When applied to pairs, it must behave the same as a comparator returned by make-pair-comparator when applied to no arguments.

  • When applied to booleans, it must compare them using the total order #f < #t and hash them using the hash function boolean-hash.

  • When applied to characters, it must compare them using char=? and char<?. In R5RS, this is an implementation-dependent order that is typically the same as Unicode codepoint order; in R6RS and R7RS, it is Unicode codepoint order. It must also hash them using the hash function char-hash.

  • When applied to strings, it must compare them using string=? and string<?. In R5RS, this is lexicographic order on the implementation-dependent order defined by char<?; in R6RS it is lexicographic order order on Unicode codepoint order; in R7RS it is an implementation-defined order. It must also hash them using the hash function string-hash.

  • When applied to symbols, it must compare them using an implementation-dependent total order. One possibility is to use the order obtained applying symbol->string to the symbols and comparing them using the total order implied by string<?. It must also hash them using the hash function symbol-hash. It is not a requirement that symbol hash functions be consistent with string hash functions, however.

  • When applied to numbers where either number is complex, since non-real numbers cannot be compared with <, the following least-surprising ordering is defined: If the real parts are < or >, so are the numbers; otherwise, the numbers are ordered by their imaginary parts. This can still produce somewhat surprising results if one real part is exact and the other is inexact. It must also hash them using the hash functionnumber-hash.

  • When applied to real numbers, it must compare them using = and <, and hash them using the hash function number-hash.

  • When applied to vectors, it must behave the same as a comparator returned by make-vector-comparator with default arguments.

  • When applied to bytevectors, it must behave the same as a comparator created by (make-vector-comparator (make-comparator < = (lambda (x) x)) bytevector? bytevector-length bytevector-u8-ref).

  • When applied to types registered with comparator-register-default!: it must behave the same as the comparator registered using that function.

  • Given disjoint types a and b, one of three conditions must hold:

    • All objects of type a compare less than all objects of type b.
    • All objects of type a compare greater than all objects of type b.
    • All objects of either type a or type b compare equal to each other. This is not permitted for any of the standard types listed above.

(comparator-register-default! comparator)

Registers comparator for use by default comparators, such that if the objects being compared both satisfy the type test predicate of comparator, it will be employed by default comparators to compare them.

Accessors

(comparator-type-test-procedure comparator)

Returns the type test predicate of comparator.

(comparator-equality-predicate comparator)

Returns the equality predicate of comparator.

(comparator-ordering-predicate comparator)

Returns the ordering predicate of comparator.

(comparator-hash-function comparator)

Returns the hash function of comparator.

(comparator-check-type comparator obj)

Invokes the type test predicate of comparator on obj and returns true if it returns true and signals an error otherwise.

Comparison predicates

(=? comparator object1 object2 object3 ...)

(<? comparator object1 object2 object3 ...)

(>? comparator object1 object2 object3 ...)

(<=? comparator object1 object2 object3 ...)

(>=? comparator object1 object2 object3 ...)

These procedures are analogous to the number, character, and string comparison predicates of Scheme. They allow the convenient use of comparators to handle arbitrary data types.

These procedures apply the comparison procedure of comparator to the objects as follows. If the specified relation returns #t for all objecti and objectj where n is the number of objects and 1 <= i < j <= n, then the procedures return #t, but otherwise #f.

The order in which the values are compared is unspecified. Because the relations are transitive, it suffices to compare each object with its successor.

Implementation

The sample implementation contains the following files: FIXME

  • comparators-impl.scm - the record type definition and most of the procedures
  • default.scm - a simple implementation of the default constructor, which should be improved by implementers to handle records and implementation-specific types
  • hash.scm - built-in hash functions
  • r7rs-shim.scm - procedures for R7RS compatibility, including a trivial implementation of bytevectors on top of SRFI 4 u8vectors
  • complex-shim.scm - a trivial implementation of real-part and imag-part for Schemes that don't have complex numbers
  • comparators.sld - an R7RS library
  • comparators.scm - a Chicken library
  • comparators-test.scm - a test file using the Chicken test egg
Last modified 4 months ago Last modified on 10/25/15 20:31:56