This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home. For a version of this page that may be more recent, see HashTablesCowan in WG2's repo for R7RS-large.

Hash­Tables­Cowan

cowan
2015-06-21 10:50:07
121Flushing hash-table-search!history
source

Hash tables

This WG2 proposal defines an interface to hash tables, which are widely recognized as a fundamental data structure for a wide variety of applications. A hash table is a data structure that:

In addition, procedures for use on bimaps (bidirectional hash tables) are defined.

Issues

Rationale

Hash tables themselves don't really need defending: almost all dynamically typed languages, from awk to JavaScript to Lua to Perl to Python to Common Lisp, and including many Scheme implementations, provide them in some form as a fundamental data structure. Therefore, what needs to be defended is not the data structure but the procedures. The present proposal is at an intermediate level. It supports a great many convenience procedures on top of the basic hash table interfaces provided by SRFI 69 and R6RS. Nothing in it adds power to what those interfaces provide, but it does add convenience in the form of pre-debugged routines to do various common things, and even some things not so commonly done but useful.

There is no mandated support for thread safety, immutability, or weakness, though there are semi-portable hooks for specifying these features.

This specification accepts separate equality predicates and hash functions for backward compatibility, but strongly encourages the use of SRFI 114 comparators, which package a type test, an equality predicate, and a hash function into a single bundle.

Bimaps are just a convenience structure based on a pair of hash tables, one that maps keys to unique values, the other that maps the values back to their keys. By providing mutation procedures, it becomes trivial to keep the two hash tables consistent.

SRFI 69 compatibility

Names

The names used in the present proposal are mostly derived from SRFI 69, with the following changes:

Calling conventions

Reflection and hash-function procedures

SRFI 69 provides reflective procedures that, given a hash table, returns its equality predicate and hash function, as well as procedures that expose the implementation's hash functions suitable for the equality predicates eq?, equal?, string=?, and string-ci=?. The second of these can also be used for eqv?. However, if the the value of eq? hash function is not idempotent but depends on the memory address of the key, and the garbage collector moves such a key, it must also rehash every hash table containing that key. In such implementations, the hash-by-identity procedure is unsafe to use outside the context of implementation-provided hash tables.

R6RS eliminates this issue by providing separate constructors for eq? and eqv? hash tables, and refusing to expose the hash functions for them. However, the present proposal takes the radical option of exposing neither reflection nor implementation-based hash functions. Instead, implementations are permitted to ignore user-provided hash functions in certain circumstances if they have address-based hash functions available. They can of course be exposed by implementations as extensions, with suitable warnings against inappropriate uses.

R6RS compatibility

The relatively few hash table procedures in R6RS are all available in the present proposal under somewhat different names. The only substantive difference is that R6RS hashtable-values and hashtable-entries return vectors, whereas in the present proposal hash-table-values and hash-table-entries return lists. The present proposal adopts SRFI 69's term hash-table rather than R6RS's hashtable, because of the universal use of "hash table" rather than "hashtable" in other languages and in technical prose generally. Besides, the English word hashtable obviously means something that can be ... hashted.

In addition, the hashtable-ref and hashtable-update! of R6RS correspond to the hash-table-ref/default and hash-table-update!/default of both SRFI 69 and the present proposal.

It would be trivial to provide the R6RS names (or for that matter the SRFI 69 names) on top of the present proposal.

Common Lisp compatibility

As usual, the Common Lisp names are completely different from the Scheme names. Common Lisp provides the following capabilities that are not in the present proposal:

Sources

The procedures in the present proposal are drawn primarily from SRFI 69 and R6RS. In addition, the following sources are acknowledged:

The predicate hash-table-empty?, as well as the mutation procedures hash-table-set-entries!, hash-table-replace!, hash-table-replace!/default, hash-table-map!, hash-table-intersection!, and hash-table-difference!, were added for completeness.

The native hash tables of MIT, Guile, SISC, Bigloo, Scheme48, SLIB, RScheme, Scheme 7, Scheme 9, Rep, and FemtoLisp were also investigated, but no additional procedures were incorporated.

Pronunciation

The slash in the names of some procedures can be pronounced "with".

Acknowledgements

Some of the language of the present proposal is copied from SRFI 69 with thanks to its author, Panu Kalliokoski. However, he is not responsible for what I have done with it.

Specification

The procedures in the present proposal are in the (srfi xxx) library (or (srfi :xxx) on R6RS).

All references to "executing in expected amortized constant time" presuppose that a satisfactory hash function is available. Arbitrary or non-idempotent hash functions can make a hash of any implementation.

Hash tables are allowed to cache the results of calling the equality predicate and hash function, so programs cannot rely on the hash function being called exactly once for every primitive hash table operation: it may be called zero, one, or more times.

It is an error if the procedure argument of hash-table-find, hash-table-count, hash-table-map, hash-table-map-values, hash-table-for-each, hash-table-map!, hash-table-collect, or hash-table-fold mutates the hash table being walked.

It is an error to operate on two hash tables that have different comparators or equality predicates.

It is an error to mutate a key during or after its insertion into a hash table.

Index

Constructors

(make-hash-table comparator [ arg ... ])

(make-hash-table equality-predicate [ hash-function ] [ arg ... ])

Returns a newly allocated hash table whose equality predicate and hash function are extracted from comparator. Alternatively, for backward compatibility with SRFI 69 the equality predicate and hash function can be passed as separate arguments; this usage is deprecated. Note that unlike the hash functions packaged as part of SRFI 114 comparators, SRFI 69 hash functions accept two arguments, the object to be hashed and a non-negative integer that bounds the hash code.

If an equality predicate rather than a comparator is provided, the ability to omit the hash-function argument is severely limited. The implementation must provide hash functions appropriate for use with the predicates eq?, eqv?, equal?, string=?, and string-ci=?, and may extend this list. But if any unknown equality predicate is provided without a hash function, an error should be signaled.

It is an error if the equality predicate does not accept two arguments and return a truth value. It is also an error if the hash function does not accept one argument in the domain of the equality predicate and return a non-negative exact integer. It is the programmer's responsibility to ensure that if two objects are the same in the sense of the equality predicate, then they return the same value when passed to the hash function. However, the converse is not required.

If the equality predicate, whether passed as part of a comparator or explicitly, is more fine-grained (in the sense of R7RS-small section 6.1) than equal?, the implementation is free -- indeed, is encouraged -- to ignore the programmer-specified hash function and use something implementation-dependent. This allows the use of addresses as hashes, in which case the keys must be rehashed if they are moved by the garbage collector.

The meaning of any further arguments is implementation-dependent. However, implementations which support the ability to specify the initial capacity of a hash table should interpret a non-negative exact integer as the specification of that capacity. In addition, if the symbols thread-safe, weak-keys or weak-values are present, implementations should create thread-safe hash tables, hash tables with weak keys, and hash tables with weak values respectively. In an implementation which does not support these features, an error should be signaled if they are requested. To avoid collision with the hash-function argument, none of these arguments can be procedures.

(SRFI 69 make-hash-table; R6RS make-eq-hashtable, make-eqv-hashtable, and make-hashtable; Common Lisp make-hash-table)

(hash-table comparator ( key value ) ...)

Returns a newly allocated hash table, created as if by make-hash-table using comparator. For each pair of arguments, an association is added to the new hash table with key as its key and value as its value. If the implementation supports immutable hash tables, this procedure returns an immutable hash table.

(hash-table-tabulate comparator n proc)

Returns a newly allocated hash table, created as if by make-hash-table using comparator. For each integer from 0 to n - 1, proc is invoked on it, returning two values. The values are used as the key and value of an association added to the new hash table. If the implementation supports immutable hash tables, this procedure returns an immutable hash table.

(hash-table-unfold stop? mapper successor seed comparator arg ... ])

Create a new hash table as if by make-hash-table using comparator and the args. If the result of applying the predicate stop? to seed is true, return the hash table. Otherwise, apply the procedure mapper to seed. Mapper returns two values, which are inserted into the hash table as the key and the value respectively. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.

(alist->hash-table alist comparator arg ...)

(alist->hash-table alist equality-predicate [ hash-function ] arg ...)

Returns a newly allocated hash-table as if by make-hash-table using comparator and the args. It is then initialized from the associations of alist. Associations earlier in the list take precedence over those that come later. The second form is for compatibility with SRFI 69, and is deprecated. (SRFI 69 alist->hash-table)

Predicates

(hash-table? obj)

Returns #t if obj is a hash table, and #f otherwise. (SRFI 69 hash-table?; R6RS hashtable?; Common Lisp hash-table-p)

(hash-table-contains? hash-table key)

Returns #t if there is any association to key in hash-table, and #f otherwise. Must execute in expected amortized constant time. (SRFI 69 hash-table-exists?; R6RS hashtable-contains?)

(hash-table-empty? hash-table)

Returns #t if hash-table contains no associations, and #f otherwise.

(hash-table=? value-comparator hash-table1 hash-table2)

Returns #t if hash-table1 and hash-table2 have the same keys (in the sense of their common equality predicate) and each key has the same value (in the sense of value-comparator), and #f otherwise.

(hash-table-mutable? hash-table)

Returns #t if the hash table is mutable. Implementations may or may not support immutable hash tables. (R6RS hashtable-mutable?)

Accessors

The following procedures, given a key, return the corresponding value.

(hash-table-ref hash-table key [ failure [ success ] ])

Extracts the value associated to key in hash-table, invokes the procedure success on it, and returns its result; if success is not provided, then the value itself is returned. If key is not contained in hash-table and failure is supplied, then failure is invoked on no arguments and its result is returned. Otherwise, it is an error. Must execute in expected amortized constant time, not counting the time to call the procedures. (SRFI 69 hash-table-ref does not support the success procedure)

(hash-table-ref/default hash-table key default)

Semantically equivalent to, but may be more efficient than, the following code: (hash-table-ref hash-table key (lambda () default))

(SRFI 69 hash-table-ref/default, R6RS hashtable-ref; Common Lisp gethash)

Mutators

The following procedures alter the associations in a hash table either unconditionally, or conditionally on the presence or absence of a specified key. It is an error to add an association to a hash table whose key does not satisfy the type test predicate of the comparator used to create the hash table.

(hash-table-set! hash-table ( key value ) ...)

Repeatedly mutates hash-table, creating new associations in hash-table that associates each key with the value that follows it. If there is a previous association for key, it is deleted. It is an error if the type check procedure of the comparator of hash-table, when invoked on key, does not return #t. Likewise, it is an error if key is not a valid argument to the equality predicate of hash-table. Returns an unspecified value. Must execute in expected amortized constant time per key. (SRFI 69 hash-table-set!, R6RS hashtable-set!, and Common Lisp (setf gethash) do not handle multiple associations)

(hash-table-set-entries! hash-table keys-list values-list)

Repeatedly mutates hash-table, setting each element of keys-list to the corresponding element of values-list in the order in which they are specified. Excess keys or values are ignored.

(hash-table-delete! hash-table key ...)

Deletes any association to each key in hash-table and returns the number of keys that had associations. Must execute in expected amortized constant time per key. (SRFI 69 hash-table-delete!, R6RS hashtable-delete!, and Common Lisp remhash do not handle multiple associations)

(hash-table-delete-keys! hash-table keys-list)

Semantically equivalent to, but may be more efficient than, the following code:

(for-each (lambda (key) (hash-table-delete! hash-table key)) keys-list)

(hash-table-extend! hash-table key [ failure [ success ] ])

Effectively invokes hash-table-ref with the given arguments and returns what it returns. If key was not found in hash-table, its value is set to the result being returned. Must execute in expected amortized constant time.

(hash-table-extend!/default hash-table key default)

Effectively invokes hash-table-ref/default with the given arguments and returns what it returns. If key was not found in hash-table, its association is set to the result being returned. Must execute in expected amortized constant time.

(hash-table-replace! hash-table key [ failure [ success ] ])

Effectively invokes hash-table-ref with the given arguments and returns what it returns. If key was found in hash-table, its value is set to the result being returned. Must execute in expected amortized constant time.

(hash-table-replace!/default hash-table key default)

Effectively invokes hash-table-ref/default with the given arguments and returns what it returns. If key was found in hash-table, its value is set to the result being returned. Must execute in expected amortized constant time.

(hash-table-update! hash-table key updater [ failure [ ''success ] ])

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref hash-table key failure success)))

Must execute in expected amortized constant time. Returns an unspecified value. (SRFI 69 hash-table-update!/default and R6RS hashtable-update! do not support the success procedure)

(hash-table-update!/default hash-table key updater default)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref/default hash-table key default)))

Must execute in expected amortized constant time. Returns an unspecified value. (SRFI 69 hash-table-update!)

(hash-table-push! hash-table key value failure)

If an association with key is found in hash-table, then update the value associated with key to the result of invoking cons on value and the original value. If the key is not found, a new association between key and the result of invoking the thunk failure is added to hash-table. The return value of hash-table-push! is unspecified. It is an error if an existing value is not a pair.

(hash-table-pop! hash-table key failure)

If an association with key is found in hash-table, then return the car of the value, and update the value to its own cdr. If the value is not found, then the result of invoking the thunk failure is returned. It is an error if the existing value is not a pair.

(hash-table-clear! hash-table)

Delete all the associations from hash-table. (R6RS hashtable-clear!; Common Lisp clrhash)

The whole hash table

These procedures process the associations of the hash table in an unspecified order.

(hash-table-size hash-table)

Returns the number of associations in hash-table as an exact integer. Should execute in constant time. (SRFI 69 hash-table-size, R6RS hashtable-size; Common Lisp hash-table-count.)

(hash-table-find hash-table proc failure)

For each association of hash-table, invoke proc on its key and value. If proc returns true, then hash-table-find returns what proc returns. If all the calls to proc return #f, return the result of invoking the thunk failure.

(hash-table-count hash-table pred)

For each association of hash-table, invoke pred on its key and value. Return the number of calls to pred which returned true.

(hash-table-any proc hash-table)

Calls proc for as many associations in hash-table as necessary with two arguments, the key and the value of the association. If any invocation of proc returns true, hash-table-any immediately returns whatever that invocation returns; otherwise it returns #f.

(hash-table-every proc hash-table)

Calls proc for as many associations in hash-table as necessary with two arguments, the key and the value of the association. If any invocation of proc returns false, hash-table-every immediately returns #f; otherwise it returns #t.

Mapping and folding

These procedures process the associations of the hash table in an unspecified order.

(hash-table-map proc comparator merger hash-table)

Returns a newly allocated hash table as if by make-hash-table. Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The two values returned by proc are inserted into the new hash table as a key and value.

When the key being added is equal (in the sense of comparator) to a key already inserted in the new hash table, the procedure merger is called with arguments oldkey oldvalue newkey newvalue and returns the key to be associated with the new value.

(hash-table-map-values proc comparator hash-table)

Returns a newly allocated hash table as if by make-hash-table. Calls proc for every association in hash-table with the value of the association. The key of the association and the result of invoking proc are entered into the new hash table.

(hash-table-for-each proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is discarded. Returns an unspecified value. (SRFI 69 hash-table-walk with the arguments reversed; Common Lisp maphash)

(hash-table-map! proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is used to update the value of the association. Returns an unspecified value.

(hash-table-collect proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The values returned by the invocations of proc are accumulated into a list, which is returned.

(hash-table-fold proc init hash-table)

Calls proc for every association in hash-table with three arguments: the key of the association, the value of the association, and an accumulated value val. Val is init for the first invocation of procedure, and for subsequent invocations of procedure, the returned value of the previous invocation. The value returned by hash-table-fold is the return value of the last invocation of proc. (SRFI 69 hash-table-fold has the hash-table as the first argument)

(hash-table-filter! proc hash-table)

Calls proc for every association in hash-table with two arguments, the key and the value of the association, and removes all associations for which proc returns false from hash-table, which is returned.

(hash-table-remove! proc hash-table)

Calls proc for every association in hash-table with two arguments, the key and the value of the association, and removes all associations for which proc returns true from hash-table, which is returned.

Copying and conversion

(hash-table-copy hash-table [ mutable? ])

Returns a newly allocated hash table with the same properties and associations as hash-table. If the second argument is present and is true, the new hash table is mutable. Otherwise it is immutable provided that the implementation supports immutable hash tables. (SRFI 69 hash-table-copy does not support a second argument; R6RS hashtable-copy)

(hash-table-keys hash-table)

Returns a newly allocated list of all the keys in hash-table. (SRFI 69 hash-table-keys; R6RS hashtable-keys returns a vector)

(hash-table-values hash-table)

Returns a newly allocated list of all the keys in hash-table. (SRFI 69 hash-table-values)

(hash-table-entries hash-table)

Returns two values, a newly allocated list of all the keys in hash-tableand a newly allocated list of all the values in hash-table in the corresponding order. (R6RS hash-table-entries returns vectors)

(hash-table->alist hash-table)

Returns an alist with the same associations as hash-table in an unspecified order. (SRFI 69)

Hash tables as functions

The following procedures provide functions with mutable behavior based on hash tables. In this way, for example, lists can be processed by map using the procedure returned from a hash table by hash-table-accessor.

(hash-table-accessor hash-table [ failure [ success ] ])

Curried version of hash-table-ref. Returns a procedure of one argument, a key, which returns what hash-table-ref returns when invoked with the the passed arguments.

(hash-table-accessor/default hash-table default)

Curried version of hash-table-ref/default. Returns a procedure of one argument, a key, which returns what hash-table-ref/default returns when invoked with the passed arguments.

Hash tables as sets

(hash-table-union! hash-table1 hash-table2)

Adds the associations of hash-table2 to hash-table1 and returns hash-table1. If a key appears in both hash tables, its value is set to the value appearing in hash-table1. Returns 'hash-table,,1,,''. (SRFI 69 hash-table-merge!)

(hash-table-intersection! hash-table1 hash-table2)

Deletes the associations from hash-table1 which don't also appear in hash-table2 and returns hash-table1.

(hash-table-difference! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2 and returns hash-table1.

(hash-table-xor! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2, and then adds the associations of hash-table2 whose keys are not present in 'hash-table1,,'' to 'hash-table1,,. Returns hash-table,,1,,''.

Bimaps

A bimap is built by starting with a hash table which represents the forward mapping from keys to values; it is an error if the values are not unique. A second hash table is constructed and populated with the reverse mapping. It is possible to retrieve either underlying hash table for read-only operations, but it is an error to mutate them, so bimap mutation procedures are provided.

(make-bimap hash-table comparator arg ...)

Returns a newly allocated bimap whose forward hash table is hash-table and whose reverse hash table is newly allocated using comparator and any args. It is an error if any value in hash-table is not unique in the sense of comparator. It is an error to mutate hash-table after this procedure returns, as it shares structure with the bimap.

(bimap? obj)

Returns #t if obj is a bimap and #f otherwise.

(bimap-forward-hash-table bimap)

Returns the original hash table passed to the bimap. It is an error to mutate this hash table.

(bimap-reverse-hash-table bimap)

Returns the reverse hash table created by make-bimap. It is an error to mutate this hash table.

(bimap-contains? bimap key)

Returns #t if key is contained in one of the associations of the bimap, and #f otherwise.

(bimap-contains-value? bimap value)

Returns #t if value is contained in one of the associations of the bimap, and #f otherwise.

(bimap=? bimap1 bimap2)

Returns #t if bimap1 and bimap2 contain the same key-value associations. Two associations key1/value1 and key2/value2 are the same if and only if the reverse hash tables of both bimap1 and bimap2 map value,,1,,' and value,,2,,'' to keys that are the same in the sense of the equivalence procedures of both tables. Returns #f otherwise.

(bimap-ref bimap key [ failure [ success ] ])

Returns what (hash-table-ref (bimap-forward-hash-table bimap) key failure success) returns.

(bimap-ref/default bimap key default)

Returns what (hash-table-ref (bimap-forward-hash-table bimap) key default) returns.

(bimap-value-ref bimap value [ failure [ success ] ])

Returns what (hash-table-ref (bimap-reverse-hash-table bimap) value failure success) returns.

(bimap-value-ref/default bimap value default)

Returns what (hash-table-ref (bimap-reverse-hash-table bimap) value default) returns.

(bimap-copy bimap [ immutable? ])

Returns a newly allocated bimap with the same properties and associations as bimap. If the second argument is present and is true, the new bimap is mutable. Otherwise it is made from immutable hash tables and itself immutable, provided the implementation supports immutable hash tables.

The mutation procedures bimap-set!, bimap-set-entries!, bimap-delete!, bimap-delete-keys!, bimap-extend!, bimap-extend/default!, bimap-replace!, bimap-replace/default!, bimap-update!, bimap-update/default!, bimap-clear!, bimap-filter!, and bimap-remove! have the same behavior as their hash table analogues, mutating both hash tables appropriately.

Implementation

The sample implementation is easily layered over any hash table implementation that supports either SRFI 69 or R6RS. The sample implementation is built directly on top of (r6rs hashtables) (available at http:snow-fort.org/pkg), whose portable implementation uses (rnrs hashtables) if that library is available, or SRFI 69 if that library is available, or a slightly improved version of the SRFI 69 reference implementation if no better alternative is available. There is also an [https:github.com/larcenists/larceny/blob/master/lib/SRFI/srfi-69.sch implementation of SRFI 69] on top of the (rnrs hashtables) library of R6RS, which shows how to construct an implementation of SRFI 69 that would be fully compatible with the sample implementation of this proposal.

The hash tables specified here are fully compatible with hash tables as specified by the R6RS, but there may be conflicts when particular implementations of the two libraries (this proposal and R6RS) are imported into a single library or program. There will be no conflicts when the sample implementation of this proposal is imported along with the reference implementation of (r6rs hashtables) used by that sample implementation.

The hash tables specified here are mostly compatible with hash tables as specified by SRFI 69, but there will almost certainly be conflicts when a library or program imports both the library specified here and an implementation of SRFI 69. It should be possible to construct an implementation of SRFI 69 that does not conflict with the sample implementation of this proposal or with (r6rs hashtables), but all non-conflicting implementations of SRFI 69 must

As of this writing, there are no fully compatible implementations of SRFI 69.

The sample implementation was originally intended to support all the native hash table systems mentioned in [#Sources Sources] above as well. However, this turned out not to be practical for the following reasons:

Native Guile hash tables are a special case. The equivalents of hash-table-ref/default, hash-table-set!, and hash-table-delete require the equality predicate and hash function to be passed to them explicitly (although there are utility functions for eq?, eqv?, and equal? hash tables). Consequently, hash tables corresponding to the present proposal would have to be records containing a Guile hash table, an equality predicate, and a hash function, which means they could not interoperate directly with native Guile hash tables.