wiki:HashTablesCowan

Version 74 (modified by cowan, 4 years ago) (diff)

--

Hash tables

This WG2 proposal defines an interface to hash tables, which are widely recognized as a fundamental data structure for a wide variety of applications. A hash table is a data structure that:

  • Is disjoint from all other types.
  • Provides a mapping from objects known as keys to corresponding objects known as values.
    • Keys may be any Scheme objects in some kinds of hash tables, but are restricted in other kinds.
    • Values may be any Scheme objects.
  • Has no intrinsic order for the key-value associations it contains.
  • Provides an equality predicate which defines when a proposed key is the same as an existing key. No table may contain more than one value for a given key.
  • Provides a hash function which maps a candidate key into a non-negative exact integer.
  • Supports mutation as the primary means of setting the contents of a table.
  • Assumes that keys are immutable; mutating a key leads to unspecified behavior.
  • Provides key lookup and destructive update in (expected) amortized constant time, provided a satisfactory hash function is available.
  • Does not guarantee that whole-table operations work in the presence of concurrent mutation of the whole hash table (values may be mutated).

Issues

  1. (resolved)
  1. (resolved)

Rationale

Hash tables themselves don't really need defending: almost all dynamically typed languages, from awk to JavaScript to Lua to Perl to Python to Common Lisp, and including many Scheme implementations, provide them in some form as a fundamental data structure. Therefore, what needs to be defended is not the data structure but the procedures. The present proposal is at an intermediate level. It supports a great many convenience procedures on top of the basic hash table interfaces provided by SRFI 69 and R6RS. Nothing in it adds power to what those interfaces provide, but it does add convenience in the form of pre-debugged routines to do various common things, and even some things not so commonly done but useful.

There is no mandated support for thread safety, immutability, or weakness, though there is a semi-portable hook for specifying these features.

This specification accepts separate equality predicates and hash functions for backward compatibility, but strongly encourages the use of SRFI 114 comparators, which package a type test, an equality predicate, and a hash function into a single bundle.

SRFI 69 compatibility

The names used in the present proposal are mostly derived from SRFI 69, with the following changes:

  • The hash-table-exists? procedure seems to ask if its argument is an existing hash table. It has been renamed hash-table-contains?, as in R6RS.
  • The hash-table-walk procedure places the hash table argument first and the walker procedure second. It has been renamed hash-table-for-each and given the same argument order as for-each, string-for-each, and so on. However, unlike those procedures it can only accept one hash table: coordinated access to multiple hash tables is meaningless, given that hash tables are unordered.
  • The SRFI 69 hash-table-fold procedure places the hash table argument first. For the same reasons as the above, in this proposal it appears last.
  • The hash-table-merge! procedure has been renamed hash-table-union!, since it causes the first hash table to become the union of the two hash table arguments. A third argument specifying how to merge the values has been added.

R6RS compatibility

The relatively few hash table procedures in R6RS are all available in the present proposal under somewhat different names. The present proposal adopts SRFI 69's term hash-table rather than R6RS's hashtable, because of the universal use of "hash table" rather than "hashtable" in other languages and in technical prose generally. Besides, the English word hashtable obviously means something that can be ... hashted.

In addition, the hashtable-ref and hashtable-update! of R6RS correspond to the hash-table-ref/default and hash-table-update!/default of both SRFI 69 and the present proposal.

It would be trivial to provide the R6RS names (or for that matter the SRFI 69 names) on top of the present proposal. The only substantive difference is that R6RS hashtable-values and hashtable-entries return vectors, whereas in the present proposal hash-table-values and hash-table-entries return lists.

Reflection and hash-function procedures

SRFI 69 provides reflective procedures that, given a hash table, returns its equality predicate and hash function, as well as procedures that expose the implementation's hash functions suitable for the equality predicates eq?, equal?, string=?, and string-ci=?. The second of these can also be used for eqv?. However, if the eq? hash function directly exposes the address of the key, and the garbage collector moves the hash table, it must also rehash its keys. In such implementations, the hash-by-identity procedure is not idempotent, which makes it dangerous to use outside the context of implementation-provided hash tables.

R6RS eliminates this issue by providing separate constructors for eq? and eqv? hash tables, and refusing to expose the hash functions for them. However, the presents proposal takes the radical option of providing neither reflection nor implementation-based hash functions. They can of course be provided by implementations as extensions with suitable warnings against inappropriate uses.

Common Lisp compatibility

As usual, the Common Lisp names are completely different from the Scheme names. Common Lisp provides the following capabilities that are not in the present proposal:

  • The constructor allows specifying the current capacity (as opposed to size), rehash size, and rehash threshold of the new hash table. There are also accessors and mutators for these.
  • There are hash tables based on equalp (which is not in Scheme).
  • With-hash-table-iterator is a hash table external iterator implemented as a local macro.
  • Sxhash is a stable hash function.

Sources

The procedures in the present proposal are drawn primarily from SRFI 69 and R6RS. In addition, the following sources are acknowledged:

  • The hash-table-extend!, hash-table-extend/default, hash-table-map->alist, hash-table->alist, and alist->hash-table are from Racket, renamed in the style of this proposal.
  • The hash-table-push! and hash-table-pop! procedures are from Gauche.
  • The hash-table-find procedure is from Gambit.
  • The procedure hash-table-set-entries! was suggested by the Common Lisp function pairlis.
  • The procedures hash-table-unfold and hash-table-count were suggested by SRFI 1.
  • The procedures hash-table-accessor and hash-table-accessor/default were loosely inspired by the finite functions of Owl Lisp.
  • The hash table comparison procedures, and the procedures hash-table-map, hash-table-map-entries, hash-table-filter, and hash-table-partition, were suggested by Haskell's Data.Map.Strict module.

The procedures hash-table=?, hash-table-set-entries!, hash-table-replace!, hash-table-replace!/default, hash-table-search!, hash-table-update-each!, hash-table-intersection!, and hash-table-difference! were added for completeness.

The native hash tables of MIT, Guile, SISC, Bigloo, Scheme48, SLIB, RScheme, Scheme 7, Scheme 9, Rep, and FemtoLisp were also investigated, but no additional procedures were incorporated.

Pronunciation

The slash in the names of some procedures can be pronounced "with".

Acknowledgements

Some of the language of the present proposal is copied from SRFI 69 with thanks to its author, Panu Kalliokoski. However, he is not responsible for what I have made of it.

Specification

The procedures in the present proposal are in the (srfi xxx) library (or (srfi :xxx) on R6RS). However, the library in the sample implementation is currently named (hash-tables).

All references to "executing in expected amortized constant time" presuppose that a satisfactory hash function has been supplied by the user. Arbitrary or non-idempotent hash functions can make a hash of the implementation.

Hash tables are allowed to cache the results of calling the equality predicate and hash function, so programs cannot rely on the hash function being called exactly once for every primitive hash table operation: it may be called zero, one, or more times.

It is an error if the procedure argument of hash-table-find, hash-table-count, hash-table-map, hash-table-map-entries, hash-table-for-each, hash-table-update-each!, hash-table-map->list, or hash-table-fold mutates the hash table being walked.

Index

  • Constructors: make-hash-table, hash-table, hash-table-unfold
  • Predicates: hash-table?, hash-table-contains?, hash-table=?, hash-table<?, hash-table>?, hash-table<=?, hash-table>=?
  • Accessors: hash-table-ref, hash-table-ref/default
  • Mutators: hash-table-set!, hash-table-set-all!, hash-table-set-entries!, hash-table-set-alist!, hash-table-delete!, hash-table-delete-keys!, hash-table-extend!, hash-table-extend!/default, hash-table-replace!, hash-table-replace!/default, hash-table-update!, hash-table-update!/default, hash-table-push!, hash-table-pop!, hash-table-search!
  • The whole hash table: hash-table-clear!, hash-table-size, hash-table-keys, hash-table-values, hash-table-entries, hash-table-find, hash-table-count
  • Mapping and folding: hash-table-map, hash-table-for-each, hash-table-map-entries, hash-table-update-each!, hash-table-map->list, hash-table-fold, hash-table-filter, hash-table-partition

Constructors

(make-hash-table comparator [ arg ... ])

(make-hash-table equality-predicate [ hash-function ] [ arg ... ])

Returns a newly allocated hash table whose equality predicate and hash function are extracted from comparator. Alternatively, for backward compatibility the equality predicate and hash function can be passed as separate arguments; this usage is deprecated. If an equality predicate rather than a comparator is provided, the ability to omit the hash argument is severely limited. The implementation must provide hash functions appropriate for use with the predicates eq?, eqv?, equal?, string=?, and string-ci=?, and may extend this list. But if any unknown equality predicate is provided without a hash function, an error should be signaled.

It is an error unless the equality predicate accepts two arguments and returns a truth value, and the hash function accepts one argument in the domain of the equality predicate and returns a non-negative exact integer. It is the programmer's responsibility to ensure that if two objects are the same in the sense of equality predicate, then they return the same value when passed to the hash function. However, the converse is not required.

If the equality predicate, whether passed as part of a comparator or explicitly, is more fine-grained (in the sense of R7RS-small section 6.1) than equal?, the implementation is free to ignore the programmer-specified hash function and use something implementation-dependent. This allows the use of addresses as hashes, in which case the keys must be rehashed if they are moved by the garbage collector.

The meaning of any further arguments is implementation-dependent. However, implementations which support the ability to specify the initial capacity of a hash table should interpret a non-negative exact integer as the specification of that capacity. In addition, if the symbols immutable, thread-safe, weak-keys or weak-values are present, implementations should create immutable hash tables, mutable thread-safe hash tables, hash tables with weak keys, and hash tables with weak values respectively. In an implementation which does not support these features, an error should be signaled if they are requested. To avoid collision with the hash argument, none of these arguments can be procedures.

(SRFI 69 make-hash-table; R6RS make-eq-hashtable, make-eqv-hashtable, and make-hashtable; Common Lisp make-hash-table)

(hash-table comparator ( key value ) ...)

Returns a newly allocated hash table, created as if by make-hash-table. For each pair of arguments, an association is added to the new hash table with key as its key and value as its value.

(hash-table-unfold continue? mapper successor seed comparator ])

Create a new hash table as if by make-hash-table. If the result of applying the predicate continue? to seed is #f, return the hash table. Otherwise, apply the procedure mapper to seed. Mapper returns two values, which are inserted into the hash table as the key and the value respectively. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.

Predicates

(hash-table? obj)

Returns #t if obj is a hash table, and #f otherwise. (SRFI 69 hash-table?; R6RS hashtable?; Common Lisp hash-table-p)

(hash-table-contains? hash-table key)

Returns #t if there is any association to key in hash-table, and #f otherwise. Must execute in expected amortized constant time. (SRFI 69 hash-table-exists?; R6RS hashtable-contains?)

Note: The following three predicates do not obey the trichotomy law and therefore do not constitute a total order on hash tables.

(hash-table=? value=? hash-table1 hash-table2)

Returns #t if hash-table1 and hash-table2 have the same keys (in the sense of their common equality predicate) and each key has the same values (in the sense of the value=? predicate), and #f otherwise. It is an error to compare two hash tables that have distinct comparators or distinct equality predicates.

(hash-table<? value=? hash-table1 hash-table2)

Returns #t if there are fewer keys in hash-table1 than in hash-table2, all the keys in hash-table1 also exist in hash-table2 (in the sense of their common equality predicate), and each such key has the same value (in the sense of the value=? predicate), and #f otherwise. It is an error to compare two hash tables that have distinct comparators or distinct equality predicates.

(hash-table>? value=? hash-table1 hash-table2)

Returns #t if there are more keys in hash-table1 than in hash-table2, all the keys in hash-table2 also exist in hash-table1 (in the sense of their common equality predicate), and each such key has the same value (in the sense of the value=? predicate), and #f otherwise. It is an error to compare two hash tables that have distinct comparators or distinct equality predicates.

(hash-table<=? value=? hash-table1 hash-table2)

Returns #t if either hash-table=? or hash-table<? returns #t.

(hash-table>=? value=? hash-table1 hash-table2)

Returns #t if either hash-table=? or hash-table>? returns #t.

Accessors

The following procedures, given a key, return the corresponding value.

(hash-table-ref hash-table key [ failure [ success ] ])

Extracts the value associated to key in hash-table, invokes the procedure success on it, and returns its result; if success is not provided, then the value itself is returned. Otherwise, hash-table-ref invokes the procedure failure on no arguments and returns its result; if failure is not provided, then an error that satisfies hash-table-key-not-found? is signaled. Must execute in expected amortized constant time, not counting the time to call the procedures. (SRFI 69 hash-table-ref)

(hash-table-ref/default hash-table key default)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-ref hash-table key (lambda () default))

(SRFI 69 hash-table-ref/default, R6RS hashtable-ref; Common Lisp gethash)

Mutators

The following procedures alter the associations in a hash table either unconditionally, or conditionally on the presence or absence of a specified key.

(hash-table-set! hash-table key value)

Creates a new association in hash-table that associates key with value. If there is a previous association for key, it is deleted. It is an error if the type check procedure of the comparator of hash-table, when invoked on key, does not return #t. Likewise, it is an error if key is not a valid argument to the equality predicate of hash-table. Returns an unspecified value. Must execute in expected amortized constant time. (SRFI 69 hash-table-set!; R6RS hashtable-set!)

(hash-table-set-all! hash-table ( key value ) ...)

Repeatedly mutates hash-table, setting each key to the value that follows it.

(hash-table-set-entries! hash-table keys-list values-list)

Repeatedly mutates hash-table, setting each element of keys-list to the corresponding element of values-list in the order in which they are specified. Excess keys or values are ignored.

(hash-table-set-alist! hash-table alist)

Repeatedly mutates hash-table using the associations of alist in the reverse of the order in which they appear in alist.

(hash-table-delete! hash-table key)

Deletes any association to key in hash-table and returns true if there was an association for key, false if not. Must execute in expected amortized constant time. (SRFI 69 hash-table-delete!; R6RS hashtable-delete!; Common Lisp remhash)

(hash-table-delete-keys! hash-table keys-list)

Semantically equivalent to, but may be implemented more efficiently than, the following code:

(for-each (lambda (key) (hash-table-delete hash-table key)) keys-list)

(hash-table-extend! hash-table key [ failure [ success ] ])

Invokes hash-table-ref with the given arguments and returns what it returns. If key was not found in hash-table, its value is set to the result being returned.

(hash-table-extend!/default hash-table key default)

Invokes hash-table-ref/default with the given arguments and returns what it returns. If key was not found in hash-table, its association is set to the result being returned.

(hash-table-replace! hash-table key [ failure [ success ] ])

Invokes hash-table-ref with the given arguments and returns what it returns. If key was found in hash-table, its value is set to the result being returned.

(hash-table-replace!/default hash-table key [ default ])

Invokes hash-table-ref/default with the given arguments and returns what it returns. If key was found in hash-table, its value is set to the result being returned.

(hash-table-update! hash-table key updater [ failure [ success ] ])

Semantically equivalent to, but may be implemented more efficiently than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref hash-table key failure success)))

Must execute in expected amortized constant time. Returns an unspecified value. (SRFI 69 hash-table-update!/default; R6RS hashtable-update!)

(hash-table-update!/default hash-table key updater default)

Semantically equivalent to, but may be implemented more efficiently than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref/default hash-table key default)))

Must execute in expected amortized constant time. Returns an unspecified value. (SRFI 69 hash-table-update!)

(hash-table-push! hash-table key value)

Semantically equivalent to, but may be implemented more efficiently than, the following code:

(hash-table-update!/default hash-table key (lambda (x) (cons value x)) (list value))

(hash-table-pop! hash-table key)

If an association with key is found in hash-table, then return the car of the value, and update the value to its own cdr. If the value is not found, an error satisfied by hash-table-key-not-found is signaled. It is an error if the value is not a pair.

(hash-table-search! hash-table key failure success)

Search hash-table for key. If it is not found, invoke the failure procedure with one argument, a unary add procedure which if invoked will insert a new association between key and its argument. If key is found, invoke the success procedure with three arguments: the value found, a unary set procedure that updates the association to have the value specified as its argument, and a nullary delete procedure that deletes the association. Returns whatever success or failure returns, as the case may be.

The whole hash table

These procedures process the associations of the hash table in arbitrary order.

(hash-table-clear! hash-table)

Delete all the associations from hash-table. Should execute in expected amortized constant time. (R6RS hashtable-clear!; Common Lisp clrhash)

(hash-table-size hash-table)

Returns the number of associations in hash-table as an exact integer. Should execute in expected amortized constant time. (SRFI 69 hash-table-size, R6RS hashtable-size; Common Lisp hash-table-count.)

(hash-table-keys hash-table)

Returns a newly allocated list of all the keys in hash-table in an unspecified order. (SRFI 69 hash-table-keys; R6RS hashtable-keys returns a vector)

(hash-table-values hash-table)

Returns a newly allocated list of all the keys in hash-table in an unspecified order, not necessarily the same order as hash-table-keys. (SRFI 69 hash-table-values)

(hash-table-entries hash-table)

Returns two values, a newly allocated list of all the keys in hash-table in an unspecified order and a newly allocated of all the values in hash-table in the corresponding order. (R6RS hash-table-entries returns vectors)

(hash-table-find hash-table proc)

For each association of hash-table, invoke proc on its key and value in an unspecified order. If proc returns true on a value, then return that value. If all the calls to proc return #f, return #f.

(hash-table-count hash-table pred)

For each association of hash-table, invoke pred on its key and value in an unspecified order. Return the number of calls to pred which returned true.

Mapping and folding

These procedures process the associations of the hash table in arbitrary order.

(hash-table-map comparator proc merger hash-table)

Returns a newly allocated hash table as if by make-hash-table. Calls proc for every association in hash-table with the value of the association. The key of the association and the result of invoking proc are entered into the new hash table.

When the key being added already exists in the hash table, the procedure merger is called with arguments oldkey oldvalue newkey newvalue and returns the value to be associated with that key.

(hash-table-for-each proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is discarded.Returns an unspecified value. (SRFI 69 hash-table-walk with the arguments reversed; Common Lisp maphash)

(hash-table-map-entries comparator proc merger hash-table)

Returns a newly allocated hash table as if by make-hash-table. Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The two values returned by proc are inserted into the new hash table as a key and value.

When the key being added already exists in the hash table, the procedure merger is called with arguments oldkey oldvalue newkey newvalue and returns the value to be associated with that key.

(hash-table-update-each! proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is used to update the value of the association. Returns an unspecified value.

(hash-table-map->list proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is accumulated into a list, which is returned.

(hash-table-fold proc init hash-table)

Calls proc for every association in hash-table with three arguments: the key of the association, the value of the association, and an accumulated value val. Val is init for the first invocation of procedure, and for subsequent invocations of procedure, the returned value of the previous invocation. The value returned by hash-table-fold is the return value of the last invocation of proc. (SRFI 69 hash-table-fold has the hash-table as the first argument)

(hash-table-filter comparator proc hash-table)

Returns a newly allocated hash table using comparator. Calls proc for every association in hash-table with two arguments, the key and the value of the association, and enters all associations for which proc returns true into the new hash table.

(hash-table-remove comparator proc hash-table)

Returns a newly allocated hash table using comparator. Calls proc for every association in hash-table with two arguments, the key and the value of the association, and enters all associations for which proc returns false into the new hash table.

(hash-table-filter! comparator proc hash-table)

Calls proc for every association in hash-table with two arguments, the key and the value of the association, and removes all associations for which proc returns false from hash-table, which is returned.

(hash-table-partition comparator proc hash-table)

Returns two values, both newly allocated hash tables using comparator. Calls proc for every association in hash-table with two arguments, the key and the value of the association, and enters all associations for which proc returns true into the first hash table and all other associations into the second hash table.

(hash-table-any? comparator proc hash-table)

Calls proc for as many associations in hash-table as necessary with two arguments, the key and the value of the association. If any invocation of proc returns true, hash-table-any? returns #t; otherwise it returns #f.

(hash-table-every? comparator proc hash-table)

Calls proc for as many associations in hash-table as necessary with two arguments, the key and the value of the association. If every invocation of proc returns true, hash-table-any? returns #t; otherwise it returns #f.

Copying and conversion

(hash-table-copy hash-table)

Returns a newly allocated hash table with the same properties and associations as hash-table. (SRFI 69 hash-table-copy; R6RS hashtable-copy, passing #t as the second argument)

(hash-table->alist hash-table)

Returns an alist with the same associations as hash-table in an unspecified order.

(alist->hash-table alist comparator)

Returns a newly allocated hash-table as if by make-hash-table, initializing it from the associations of alist. (SRFI 69 alist->hash-table)

Hash tables as functions

The following procedures allow hash tables to be used as functions with mutable behavior. In this way, for example, lists can be processed by map using the procedure returned from a hash table by hash-table-accessor.

(hash-table-accessor hash-table [ failure [ success ] ])

Curried version of hash-table-ref. Returns a procedure of one argument, a key, which returns what hash-table-ref returns when invoked with the hash-table argument, the passed key, and the failure and success arguments.

(hash-table-accessor/default hash-table default)

Curried version of hash-table-ref/default. Returns a procedure of one argument, a key, which returns what hash-table-ref/default returns when invoked with the hash-table argument, the passed key, and the default argument.

Hash tables as sets

(hash-table-union! hash-table1 hash-table2 [ merger ])

Adds the associations of hash-table2 to hash-table1 and returns hash-table1. The values of keys that appear in both hash tables are set to the result of invoking the merger procedure, which accepts the arguments key1 value1 key2 value2 and defaults to (lambda (key1 value1 key2 value2) value2). (SRFI 69 hash-table-merge!)

(hash-table-intersection! hash-table1 hash-table2 [ merger ])

Deletes the associations from hash-table1 which don't also appear in hash-table2 and returns hash-table1, and sets the values of those keys that appear in both hash tables to the result of invoking the merger procedure, which accepts the arguments key1 value1 key2 value2 and defaults to (lambda (key1 value1 key2 value2) value2).

(hash-table-difference! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2 and returns hash-table1.

Exceptions

(hash-table-key-not-found? obj)

Returns #t if obj is an object raised by the hash-table-ref procedure or any other procedure that accesses hash tables when the key is not found and there is no failure procedure, and #f otherwise.

Implementation

The sample implementation (not yet written) is designed to be easily layered over any hash table implementation that supports either SRFI 69 or R6RS (there is an implementation of SRFI 69 on top of R6RS). It was originally intended to support all the native hash table systems mentioned in Sources above as well. However, this turned out not to be practical for the following reasons:

  • Gauche does not support arbitrary equality predicates, only eq?, eqv?, equal?, and string=?.
  • S7 does not support arbitrary equality predicates: the implementation chooses a predicate based on the nature of the keys.
  • Guile supports arbitrary equality predicates, but the equality predicate and hash function must be passed explicitly to all the primitive procedures. Consequently, hash tables corresponding to the present proposal would have to be records containing a Guile hash table, an equality predicate, and a hash function, which means that they could not interoperate directly with naked Guile hash tables.
  • SISC, Scheme48/scsh, RScheme, Scheme 9, and Rep do not provide any procedure that copies a hash table, nor any way of inspecting it to determine its equality predicates and hash functions so that it can be re-created.
  • SLIB hash tables are vectors, not disjoint objects.
  • FemtoLisp supports only equal? as the equality predicate.

As a result, the sample implementation assumes the existence of a SRFI 69 implementation, and imports just the core procedures make-hash-table, hash-table?, hash-table-ref/default, hash-table-set!, hash-table-delete!, hash-table-size, hash-table-walk, and hash-table-copy. New implementers who need to support the present proposal from scratch can take the implementations of these procedures from the SRFI 69 implementation and add them to this implementation.

To layer the implementation directly on top of R6RS, the corresponding procedures need to be imported from (rnrs hashtables). See the file hash-tables.sls in the implementation. It should also be possible to adapt the sample implementation for use on Racket, MIT, Gambit, and Bigloo native hash tables, all of which provide at least the above core.