Version 3 (modified by cowan, 6 years ago) (diff) |
---|

## Hash tables

This proposal defines an interface to hash tables, which are widely recognized as a fundamental data structure for a wide variety of applications. A hash table is a data structure that:

- Is disjoint from all other types.
- Provides a mapping from some set of
*keys*to some set of*values*associated to those keys.- Keys may be any Scheme objects in some kinds of hash tables, but are restricted in other kinds.
- Values may be any Scheme objects.

- Has no intrinsic order for the key-value
*associations*it contains. - Provides an
*equivalence function*which defines when a proposed key is the same as an existing key. No table may contain more than one value for a given key. - Supports mutation as the primary means of setting the contents of a table.
- Assumes that keys are immutable; mutating a key leads to unspecified behavior.
- Provides key lookup and destructive update in amortised constant time.

Incorporating this proposal as a standard feature in WG1 Scheme implementations makes it possible to write efficient and portable programs that use hash tables. (After reading a paper about SNOBOL and realizing that it had hash tables as a standard feature in 1969, six years before Scheme was born, and that every other modern language now has them, I decided it was intolerable for even WG1 Scheme not to have them.)

I have added equivalences to SRFI 69 and R6RS facilities. Systems supporting either should be able to support these hash tables trivially; it will be necessary to write `hash-table-fold` for R6RS systems. I have adopted SRFI 69's term `hash-table` rather than R6RS's `hashtable` because of our decision in #40. Besides, the word *hashtable* obviously means something that can be ... hashted.

The main annoyances of this proposal for SRFI 69 programmers will be remembering to supply a third argument to `hash-table-ref`, and for R6RS programmers will be remembering to insert a hyphen in `hashtable`. Both will have to get used to the new constructors.

## Constructors

These constructors provide suitable hash functions for the equivalence function specified as part of the constructor name. This proposal does not allow (semi-)arbitrary equivalence and hash functions to be specified.

`(make-eq-hash-table)`

Creates a new table with no associations whose equivalence function is `eq?`. (SRFI-69 `(make-hash-table eq? hash-by-identity)`; R6RS `(make-eq-hashtable)`)

`(make-equal-hash-table)`

Creates a new table with no associations whose equivalence function is `equal?`. (SRFI-69 `(make-hash-table equal? hash)`; R6RS `(make-hashtable equal? equal-hash)`)

`(make-string-hash-table)`

Creates a new table with no associations whose equivalence function is `string=?`. (SRFI-69 `(make-hash-table string=? string-hash)`; R6RS `(make-hashtable string=? string-hash)`)

`(make-string-ci-hash-table)`

Creates a new table with no associations whose equivalence function is `string-ci=?`. (SRFI-69 `(make-hash-table string-ci=? string-ci-hash)`; R6RS `(make-hashtable string-ci=? string-ci-hash)`)

Note that there are no hash tables whose equivalence function is `eqv?`, because SRFI 69 does not support them. Users will have to live with `eq?` or `equal?` hash tables as the case may be.

## Copiers

`(hash-table-copy `*hash-table*`)`

Creates a new hash table with the same equivalence predicate and associations as *hash-table*. (SRFI-69 `hash-table-copy`; R6RS `(hashtable-copy `*hash-table*` #t)`)

## Predicates

`(hash-table? `*obj*`)`

Returns `#t` if *obj* is a hash table. (SRFI-69 `hash-table?`; R6RS `hashtable?`)

`(hash-table-contains? `*hash-table*` `*key*`)`

Returns `#t` if there is any association to *key* in *hash-table*. Must execute in amortized O(1) time. (SRFI-69 `hash-table-exists?`; R6RS `hashtable-contains?`)

## Accessors

`(hash-table-ref `*hash-table*` `*key* *default*`)`

Returns the value associated to *key* in *table*. If no value is associated to *key*, *default* is returned. Must execute in amortized O(1) time. (SRFI-69 `hash-table-ref/default`; R6RS `hashtable-ref`)

`(hash-table-size `*table*`)`

Returns the number of associations in *hash-table*. Must execute in amortized O(1) time. (SRFI-69 `hash-table-size`; R6RS `hashtable-size`)

## Mutators

`(hash-table-set! `*hash-table*` `*key*` `*value*`)`

Creates a new association in *hash-table* that associates *key* with *value*. The previous association (if any) is removed. It is an error if *table* is a string or string-ci hash table and *key* is not a string. Must execute in amortized O(1) time. (SRFI-69 `hash-table-set!`; R6RS `hashtable-set!`)

`(hash-table-delete! `*hash-table*` `*key*`)`

Removes any association to *key* in *hash-table*. It is not an error if no association for that *key* exists. Must execute in amortized O(1) time. (SRFI-69 `hash-table-delete!`; R6RS `hashtable-delete!`)

`(hash-table-update! `*hash-table*` `*key*` `*procedure* *default* ]`)`

Semantically equivalent to, but may be implemented more efficiently than, the following code:

`(hash-table-set! `*hash-table*` `*key*` (`*procedure* `(hash-table-ref `*hash-table*` `*key*` `*default*`)))`

Must execute in amortized O(1) time. (SRFI-69 `hash-table-update!/default`; R6RS `hashtable-update!`)

## Iterators

`(hash-table-fold `*hash-table*` `*procedure*` `*init*`)`

Calls *procedure* for every association in *hash-table* with three arguments: the key of the association, the value of the association, and an accumulated value *val*. *Val* is *init* for the first invocation of *procedure*, and for subsequent invocations of *procedure*, the return value of the previous invocation. The value returned by `hash-table-fold` is the return value of the last invocation of f. The order in which *procedure* is called for different associations is unspecified. (SRFI-69 `hash-table-fold; not in R6RS but easily implemented: see below.)

## Excluded features

The following features are **not** part of this proposal for various reasons.

### Incompatible return values

`Hash-table-keys` is present in both SRFI 69 and R6RS, but returns a list of keys in the first, a vector of keys in the second.

`Hash-table-values` returns a list of values in SRFI 69; `hashtable-entries` returns two values, a vector of keys and a vector of values in R6RS.

`Hash-table-hash-function` always returns the hash function of a hash table in SRFI 69, but returns `#f` in the case of `eq?` (and `eqv?`) hash tables in R6RS.

`Hash-table-equivalence-function` returns the equivalence function, but in R6/R7RS you can't reliably check functions for identity, so you don't know what you've got.

### In R6RS but not in SRFI 69

`Hashtable-clear!`, which removes all associations from a hash table.

Specifying the initial capacity of a hash table.

Immutable hash tables.

Hash tables based on `eqv?`.

### In SRFI 69 but not in R6RS

`Hash-table-walk`, which is the analogue of `for-each` for hash tables.

Conversion from a-lists to hash tables and back.

`Hash-table-merge`, which destructively adds all associations in a source hash table to a destination hash table.

Default values specified in the form of a thunk to call.

### In Common Lisp but not Scheme

The current capacity (as opposed to size) of a hash table.

Rehash size and threshold.

Hash tables based on `equalp` (which is not in Scheme).

`With-hash-table-iterator`, a hash table external iterator implemented as a local macro.

`Sxhash`, a stable hash function.

## R6RS implementation of `hash-table-fold`

(define (hash-table-fold h proc init) (let-values (((k v) (hashtable-entries h)) ((s) (hashtable-size h))) (do ((i 0 (+ 1 i)) (val init (proc (vector-ref k i) (vector-ref v i) val))) ((= i s) val))))