ImmutableDataStructuresWortman

Introduction

This proposal defines immutable data structures for queues, sets, and maps. A structure is immutable when all its operations leave the structure unchanged. Note that none of the procedures specified here ends with a !.

Immutable structures are sometimes called persistent and are closely related to pure-functional (a.k.a. pure) structures. The availability of immutable data structures facilitates writing efficient programs in the pure-functional style.

Efficiency bounds

We specify required time efficiency upper bounds using big-O notation. We note when, in some cases, there is "slack" between the required bound and the theoretically optimal bound for an operation. Implementations may use data structures with amortized time bounds, but should document which bounds hold in only an amortized sense. The use of randomized data structures with expected time bounds is discouraged.

Disjoint types

Deques, sets, and maps are disjoint from all other Scheme types, and deques are disjoint from sets and maps. It is unspecified whether sets and maps are disjoint.

Deque

A deque (conventionally pronounced "deck") is a queue data structure that supports efficient manipulation of either of its ends. It may be used in place of a (LIFO) stack or (FIFO) queue.

The unlabeled finger tree data structure can meet all these requirements rather conveniently. A pair of lists is also a suitable implementation.

Construction

(ideque [ element ...])

Returns a deque containing the elements. The leftmost element (if any) will be at the front of the deque and the rightmost element (if any) will be at the back. Takes O(n) time, where n is the number of elements.

(ideque-tabulate n proc)

Invokes proc successively on the integers 0 through n - 1 inclusive, and returns a deque containing the results in the given order. Takes O(n) time.

(ideque-unfold stop? mapper successor seed)

Invokes the predicate stop? on seed. If it returns false, generate the next result by applying mapper to seed, generate the next seed by applying successor to seed, and repeat the algorithm with the new seed. If stop? returns true, return a deque containing the results in order of accumulation. Takes O(n) time.

(ideque-unfold-right stop? mapper successor seed)

Predicates

(ideque? x)

Returns #t if x is an ideque, and #f otherwise. Takes O(1) time.

(ideque-empty? deque)

Returns #t if deque contains zero elements, and #f otherwise. Takes O(1) time.

Queue operations

(ideque-front deque)

(ideque-back deque)

Returns the front/back element of deque. It is an error for deque to be empty. Each takes O(1) time.

(ideque-remove-front deque)

(ideque-remove-back deque)

Returns a deque with the front/back element of deque removed. It is an error for deque to be empty. Each takes O(1) time.

(ideque-add-front deque obj)

(ideque-add-back deque obj)

Returns a deque with obj pushed to the front/back of deque. Each takes O(1) time.

Other accessors

(ideque-take deque n)

(ideque-take-right deque n)

Returns a deque containing the first/last n elements of deque. Takes O(n) time.

(ideque-drop deque n)

(ideque-drop-right deque n)

Returns a deque containing all but the first/last n elements of deque. Takes O(n) time.

(ideque-split-at deque n)

Returns two values, the results of (ideque-take deque n) and (ideque-drop deque n) respectively. Takes O(n) time.

The whole deque

(ideque-length deque)

Returns the length of deque as an exact integer. May take O(n) time, though O(1) is optimal.

(ideque-append deque ...)

Returns a deque with the contents of the first argument deque followed by the others. Takes O(kn) time, where k is the number of deques and n is the number of elements involved, though O(k log n) is possible.

(ideque-concatenate list-of-deques)

Returns a deque with the contents of the first deque in list-of-deques followed by the others. This is provided for Schemes in which the number of arguments which can be passed to apply is limited. Takes O(kn) time, where k is the number of deques and n is the number of elements involved, though O(k log n) is possible.

(ideque-reverse deque)

Returns a deque containing the elements of deque in reverse order. Takes O(n) time.

(ideque-count pred deque)

Returns the number of elements of deque which satisfy pred as an exact integer. Takes O(n) time.

(ideque-delete pred deque)

Returns a deque containing the elements of deque, except for those which satisfy pred. Takes O(n) time.

(ideque-zip deque ...)

Returns a deque of lists (not deques) each of which contains the corresponding element of the argument deques in the order specified. Processing stops when all the elements of any deque have been seen. Takes O(kn) time, where k is the number of deques and n is the number of elements involved.

Mapping

(ideque-map proc deque)

(ideque-for-each proc deque)

(ideque-fold proc deque)

(ideque-fold-right proc deque)

(ideque-append-map proc deque)

Filtering

(ideque-filter proc deque)

(ideque-remove proc deque)

(ideque-partition proc deque)

Searching

(ideque-find proc deque)

(ideque-find-right proc deque)

(ideque-take-while proc deque)

(ideque-take-while-right proc deque)

(ideque-drop-while proc deque)

(ideque-drop-while-right proc deque)

(ideque-map proc deque)

(ideque-span proc deque)

(ideque-break proc deque)

(ideque-any? proc deque)

(ideque-every? proc deque)

Conversion

(list->ideque list)

(ideque->list deque)

Conversion between deque and list structures. FIFO order is preserved, so the front of a list corresponds to the front of a deque. Each operation takes O(n) time.

Set

A sorted set data structure stores a finite collection of unique elements with a defined comparator.

These requirements can be satisfied by many flavors of self-balancing binary trees. Red-black trees, 1-2 brother trees, and labeled 2-3 finger trees are particularly convenient in an immutable context.

Construction

` (iset/merger comparator merger [element...]) (iset comparator [element...]) ` Returns a set using containing elements element.... iset uses set-merger-right. O(n log n) time.

Set-wide queries

` (iset? x) (iset-empty? set) (iset-size set) (iset-comparator set) `

iset-size may take O(n) time, though O(1) is optimal. The other procedures take O(1) time.

Set-theoretic operations

` (iset-union set... [merger]) (iset-intersection set... [merger]) (iset-difference left set...) (iset-xor set1 set2) `

Returns a set containing the union/intersection/difference/symmetric difference of the arguments. All the arguments must be sets sharing an equivalent order predicate. The set operator is applied in a left-associative order. May take O(kn log n) time, where k is the number of sets and n is the number of elements involved, though O(kn) time is optimal.

Element queries

` (iset-member? set x) `

Returns non-#false iff x is an element of set, or #false otherwise. Takes O(log n) time.

` (iset-min set) (iset-max set) `

Returns the least/greatest element of set. It is an error for set to be empty. Takes O(log n) time; O(1) is optimal.

Element operations

` (iset-update set x present-proc absent-proc) `

A continuation-based universal update procedure. Attempts to find an element match equal to x in set according to the order predicate. When such a match is found, calls ` (present-proc match update remove) ` which must either tail-call ` (update new-element ret) ` to replace match with new-element, or tail-call ` (remove ret) ` to remove match from set. In either case ret will be one of the values returned from the enclosing iset-update call.

When no such match is found, iset-update calls ` (absent-proc insert ignore) ` which should either tail-call ` (insert ret) ` to insert x into set, or else tail-call ` (ignore ret) ` to leave set unchanged. Again, ret will be returned from the enclosing iset-update call.

In any case, iset-update returns ` (values new-set ret) ` where new-set is a set reflecting the indicated modification, if any; and ret is the returns value produced by one of the continuations. It runs in O(log n) time.

(This procedure is based on an analogous procedure for hash tables suggested by Alexey Radul and attributed to Taylor Campbell.)

` (iset-find set x [absent-thunk]) ` Returns the element match equal to x in set, or the result of evaluating (absent-thunk) if no such element exists. If absent-thunk is unspecified then a default procedure that always returns #false is used. Takes O(log n) time.

` (iset-include set x [merger]) (iset-exclude set x) ` Returns a set that certainly does/doesn't include x. merger is called when set already includes an element equal to x according to the set's comparator, with left as the object already in set and right as x. Each operation takes O(log n) time.

` (iset-predecessor set x [absent-thunk]) (iset-successor set x [absent-thunk]) ` Returns the element that immediately precedes/succeeds x in set's ordering. If no such element exists because x is the minimum/maximum element, or because set is empty, returns the result of evaluating (absent-thunk). If absent-thunk is unspecified then a default procedure that always returns #false is used. Takes O(log n) time.

Range queries

` (iset-between set least include-least most include-most) `

Returns a set containing the elements of set in the interval between least and most. If include-least/include-most is non-#false then the result includes an element equal to least/most respectively; otherwise those elements are not included. Takes O(k log k + log n) time, where n is the number of elements in the set and k is the number of elements returned; O(k + log n) is optimal.

` (iset-range< set x) (iset-range<= set x) (iset-range= set x) (iset-range>= set x) (iset-range> set x) `

Returns a set containing only the elements of set that are less/less-or-equal/equal/greater-or-equal/greater than x. Takes O(k log k + log n) time, where n is the number of elements in the set and k is the number of elements returned; O(k + log n) is optimal.

Note that since set elements are unique, iset-range= returns a set of at most one element.

Filter, fold, map

` (iset-filter predicate? set) `

Returns a set containing only those elements x for which (predicate? x) returns non-#false. Takes O(n log n) time; O(n) is optimal.

` (iset-fold proc base set) ` The fundamental set iterator. Equivalent to, but may be more efficient than, ` (fold proc base (iset->ordered-list set)) ` Takes O(n) time.

` (iset-map/monotone proc set [comparator]) ` Returns a set containing the elements (proc x) for every x in set. proc must be a monotone unary procedure that preserves the order of set elements. Observe that mapping a set of unique elements with a monotone function yields a set of unique elements, so element uniqueness follows from the monotonicity assumption. If comparator is given, it is the comparator of the set; otherwise the set uses the same order predicate as set. Takes O(n) time.

` (iset-map proc set [merger [comparator]]) ` As iset-map/monotone, except the proc is not required to be monotone. merger is used to select among any duplicates that might be created, and defaults to set-merger-right. O(n log n) time.

Conversion

` (iset->ordered-list set) `

Returns a list containing the elements of set in increasing order. O(n) time.

` (ordered-list->iset comparator list) `

Returns a set containing the elements of list and using comparator comparator. It is an error for list to be anything other than a proper list of elements in increasing order. O(n log n) time; O(n) is optimal.

` (list->iset comparator list [merger]) `

Returns a set containing the elements of list and using comparator comparator. list must be a proper list, but may contain duplicates and need not be in order. merger defaults to set-merger-right. O(n log n) time.

Maps

A map data structure stores key-value associations from a set of keys with an order predicate to arbitrary value objects. It is an alternative to an association list or hash table, which also store key-value associations, but with different key constraints and efficiency guarantees.

In the same way that a list of key-value dotted pairs can implement an association list, a set of key-value dotted pairs can implement a map. Implementations may use this approach, or may implement a distinct data structure specific to maps.

Construction

Map-wide queries

` (imap? x) (imap-empty? map) (imap-size map) (imap-order-predicate map) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

Set-theoretic operations

` (imap-union map...) (imap-intersection map...) (imap-difference map...) (imap-xor map1 map2) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

Element queries

` (imap-member? map key) (imap-min-key map) (imap-max-key map) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

` (imap-min-value map) (imap-max-value map) `

Returns the value associated with (imap-min-key map) and (imap-max-key map) respectively. O(log n) time; O(1) is optimal.

Element operations

` (imap-update map key absent-proc present-proc) `

A continuation-based universal update procedure. Attempts to find a key key-match equal to key in map, and its associated value. When such no such key-match is found, returns the result of calling ` (absent-proc insert) ` . insert is a procedure such that ` (insert new-value) ` returns a version of map in which key is associated with new-value.

When key-match exists, returns the result of calling ` (present-proc key-match value replace remove) ` . replace is a procedure such that ` (replace new-value) ` returns a new version of map in which the key maps to new-value instead of value.

remove is a thunk such that ` (remove) ` returns a new version of map with no association for key.

The imap-update, insert, replace, and remove procedures may take O(log n) time each. Thus if absent-proc and present-proc take constant (or even O(log n)) time and call their passed-in continuations at most once (or even any constant number of times), the entire update process takes O(log n) time. ` (imap-find map key [absent-thunk]) ` Returns the value associated with key in map, or the result of evaluating (absent-thunk) if no such value exists. If absent-thunk is unspecified then a default procedure that always returns #false is used. O(log n) time.

` (imap-include map key value) (imap-exclude map key) ` Returns a set that certainly does/doesn't include an association from key. imap-include will replace any prior association from key that might exist in map. O(log n) time.

` (imap-key-predecessor map key [absent-thunk]) (imap-key-successor map key [absent-thunk]) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

Range queries

` (imap-between map least include-least most include-most) `

The behavior and efficiency of these operations is the same as the analogous set procedures. (Least and most are keys.)

` (imap-range< map key) (imap-range<= map key) (imap-range= map key) (imap-range>= map key) (imap-range> map key) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

Filter, fold, map

` (imap-filter proc set) `

Returns a set containing only those associations for which (proc key value) returns non-#false. O(n log n) time; O(n) is optimal.

` (imap-fold proc base map) ` The fundamental map iterator. Calls ` (proc key value accum) ` successively to accumulate a value initialized to base. O(n) time.

` (imap-map-values proc map) ` Returns a map where each value is replaced with the result of evaluating ` (proc key value) ` for that value's key. Takes O(n) time.

` (imap-map/monotone proc map [precedes?]) ` Returns a map containing the association values returned by calls to ` (proc key value) ` for each key-value association in map. proc must returns (values new-key new-value). The key mapping must be a monotone, preserving the order and uniqueness of keys. The value mapping need not be monotone. If precedes? is given, it is the key order predicate of the map; otherwise the map uses the same order predicate as map. Takes O(n) time.

` (imap-map proc map [precedes?]) ` As imap-map/monotone, except the proc is not required to be monotone. When multiple keys are mapped to the same new-key, only the association originating from the greatest pre-mapped key is retained. O(n log n) time.

Conversion

` (imap->ordered-alist map) `

Returns an association list containing the associations of map in increasing order by key. O(n) time.

` (imap-keys map) ` Returns a set containing the keys of map. O(n log n) time; O(n) is optimal.

` (imap-values map) ` Returns a list containing the values of map in increasing order by key. O(n) time.

` (ordered-alist->imap precedes? alist) `

Returns a map containing the associations of alist and using order predicate precedes?. It is an error for list to be anything other than a proper association list in increasing order by key. O(n log n) time; O(n) is optimal.

` (alist->imap precedes? alist) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

Key comparison

` (imap-key<? map x y) (imap-key=? map x y) (imap-key>? map x y) `

The behavior and efficiency of these operations is the same as the analogous set procedures.

Immutable­Data­Structures­Wortman

Introduction

Efficiency bounds

Disjoint types

Deque

Construction

Predicates

Queue operations

Other accessors

The whole deque

Mapping

Filtering

Searching

Conversion

Set

Construction

Set-wide queries

Set-theoretic operations

Element queries

Element operations

Range queries

Filter, fold, map

Conversion

Maps

Construction

Map-wide queries

Set-theoretic operations

Element queries

Element operations

Range queries

Filter, fold, map

Conversion

Key comparison

ImmutableDataStructuresWortman