This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home.

Source for wiki ImmutableDataStructuresWortman version 3

author

kevinwortman

comment

Add maps, and other small changes

ipnr

127.10.177.1

name

ImmutableDataStructuresWortman

readonly

text

= Introduction =

(TODO)

= Efficiency bounds =

We specify required time efficiency upper bounds using big-O notation. We note when, in some cases, there is "slack" between the required bound and the theoretically optimal bound for an operation. Implementations may use data structures with amortized time bounds, but should document which bounds hold in only an amortized sense. The use of randomized data structures with expected time bounds is discouraged.

= Deque =

A deque (conventionally pronounced "deck") is a queue data structure that supports efficient manipulation of either of its ends. It may be used in place of a (LIFO) stack or (FIFO) queue.

The ''unlabeled finger tree'' data structure can meet all these requirements rather conveniently.

=== Construction ===

{{{#!scheme
(ideque [element...])
}}}

Create a new deque containing {{{element...}}}. The leftmost element (if any) will be at the front of the deque and the rightmost element (if any) will be at the back. Takes O(n) time, where n is the number of elements.

=== Queries ===

{{{#!scheme
(ideque? x)
(ideque-empty? deque)
(ideque-size deque)
}}}

{{{ideque?}}} and {{{ideque-empty?}}} take O(1) time; {{{ideque-size}}} may take O(n) time, though O(1) is optimal.

=== Queue operations ===

{{{#!scheme
(ideque-front deque)
(ideque-back deque)
}}}
Return the front/back element. It is an error for {{{deque}}} to be empty. Each takes O(1) time.

{{{#!scheme
(ideque-pop-front deque)
(ideque-pop-back deque)
}}}
Return a new deque with the front/back element removed. It is an error for {{{deque}}} to be empty. Each takes O(1) time.

{{{#!scheme
(ideque-push-front deque x)
(ideque-push-back deque x)
}}}
Return a new deque with {{{x}}} pushed to the front/back of {{{deque}}}. Each takes O(1) time.

=== Miscellaneous ===

{{{#!scheme
(ideque-append deque...)
}}}
Return a new deque with the contents of the first argument deque followed by the others. Takes O(kn) time, where k is the number of deques and n is the number of elements involved, though O(k log n) is possible.

=== Conversion ===

{{{#!scheme
(list->ideque list)
(ideque->list deque)
}}}

Conversion between deque and list structures. FIFO order is preserved, so the front of a list corresponds to the front of a deque. Each operation takes O(n) time.

= Set =

A sorted set data structure stores a finite collection of unique orderable (a.k.a. comparable or sortable) elements.

These requirements can be satisfied by many flavors of ''self-balancing binary trees.'' Red-black trees, 1-2 brother trees, and labeled 2-3 finger trees are particularly convenient in an immutable context.

=== Order predicates ===

An ''order predicate'' for a set of orderable elements is a procedure {{{precedes?}}} such that, for set elements {{{x}}} and {{{y}}},
{{{#!scheme
(precedes? x y)
}}}
returns non-{{{#false}}} iff {{{x}}} precedes {{{y}}} in the set's ordering. It returns {{{#false}}} when {{{x}}} succeeds (comes after) {{{y}}} or when the elements are tied for the same position.

It is essential that order predicates return {{{#false}}} in the case of equal elements, as implementations may test for element equality with expressions of the form
{{{#!scheme
(and (not (precedes? x y))
     (not (precedes? y x)))
}}}

Note that {{{<}}}, {{{char<?}}}, and {{{string<?}}} are valid order predicates for sets of numbers, characters, and strings, respectively.

The time bounds stated below are all based on the assumption that evaluating an order predicate takes O(1) time.

=== Construction ===

{{{#!scheme
(iset precedes? [element...])
}}}
Return a new set using the order predicate {{{precedes?}}} and containing elements {{{element...}}}. If any subset of element arguments are equal according to {{{precedes?}}} then only the leftmost of them is a member of the returned set. Takes O(n log n) time.

=== Set-wide queries ===

{{{#!scheme
(iset? x)
(iset-empty? set)
(iset-size set)
(iset-order-predicate set)
}}}

{{{iset-size}}} may take O(n) time, though O(1) is optimal. The other procedures take O(1) time.

=== Set-theoretic operations ===

{{{#!scheme
(iset-union set...)
(iset-intersection set...)
(iset-difference left set...)
(iset-symmetric-difference set...)
}}}

Return a new set containing the union/intersection/difference/symmetric difference of the arguments. All the arguments must be sets sharing an equivalent order predicate. Runs in O(kn log n) time, where k is the number of sets and n is the number of elements involved; O(kn) time is optimal.

=== Element queries ===

{{{#!scheme
(iset-member? set x)
}}}

Returns non-{{{#false}}} iff {{{x}}} is an element of {{{set}}}, or {{{#false}}} otherwise. Takes O(log n) time.

{{{#!scheme
(iset-min set)
(iset-max set)
}}}

Returns the least/greatest element of {{{set}}}. It is an error for {{{set}}} to be empty. Takes O(log n) time; O(1) is optimal.

=== Element operations ===

{{{#!scheme
(iset-update set x present-proc absent-proc)
}}}

A continuation-based universal update procedure. Attempts to find an element {{{match}}} equal to {{{x}}} in {{{set}}}. When such a {{{match}}} is found, calls
{{{#!scheme
(present-proc match update remove)
}}}
which must either tail-call
{{{#!scheme
(update new-element ret)
}}}
to replace {{{match}}} with {{{new-element}}}, or tail-call
{{{#!scheme
(remove ret)
}}}
to remove {{{match}}} from {{{set}}}. In either case {{{ret}}} will be returned from the enclosing {{{iset-update}}} call.

When no such {{{match}}} is found, {{{iset-update}}} calls
{{{#!scheme
(absent-proc insert ignore)
}}}
which should either tail-call
{{{#!scheme
(insert ret)
}}}
to insert {{{x}}} into {{{set}}}, or else tail-call
{{{#!scheme
(ignore ret)
}}}
to leave {{{set}}} unchanged. Again, {{{ret}}} will be returned from the enclosing {{{iset-update}}} call.

In any case, {{{iset-update}}} returns
{{{#!scheme
(values new-set ret)
}}}
where {{{new-set}}} is a new set reflecting the indicated modification, if any; and {{{ret}}} is the return value produced by one of the continuations. It runs in O(log n) time.

(This procedure is based on an analogous procedure for hash tables suggested by Alexey Radul and attributed to Taylor Campbell.)

{{{#!scheme
(iset-find iset x [absent-thunk])
}}}
Returns the element {{{y}}} equal to {{{x}}} in {{{set}}}, or the result of evaluating {{{(absent-thunk)}}} if no such element exists. If {{{absent-thunk}}} is unspecified then a default procedure that always returns {{{#false}}} is used. Takes O(log n) time.

{{{#!scheme
(iset-include set x)
(iset-exclude set x)
}}}
Return a new set that certainly does/doesn't include {{{x}}}. Each operation takes O(log n) time.

{{{#!scheme
(iset-predecessor set x [absent-thunk])
(iset-successor set x [absent-thunk])
}}}
Returns the element that immediately precedes/succeeds {{{x}}} in {{{set}}}'s ordering. If no such element exists because {{{x}}} is the minimum/maximum element, or because {{{set}}} is empty, returns the result of evaluating {{{(absent-thunk)}}}. If {{{absent-thunk}}} is unspecified then a default procedure that always returns {{{#false}}} is used. Takes O(log n) time.

=== Range queries ===

{{{#!scheme
(iset-between set least include-least most include-most)
}}}

Returns a new set containing the elements of {{{set}}} in the interval between {{{least}}} and {{{most}}}. If {{{include-least}}}/{{{include-most}}} is non-{{{#false}}} then the result includes elements equal to {{{least}}}/{{{most}}} respectively; otherwise those elements are not included. Takes O(k log k + log n) time, where n is the number of elements in the set and k is the number of elements returned; O(k + log n) is optimal.

{{{#!scheme
(iset-range< set x)
(iset-range<= set x)
(iset-range= set x)
(iset-range>= set x)
(iset-range> set x)
}}}

Returns a new set containing only the elements of {{{set}}} that are less/less-or-equal/equal/greater-or-equal/greater than {{{x}}}. Takes O(k log k + log n) time, where n is the number of elements in the set and k is the number of elements returned; O(k + log n) is optimal.

Note that since set elements are unique, {{{iset-range=}}} returns a set of at most one element.

=== Filter, fold, map ===

{{{#!scheme
(iset-filter predicate? set)
}}}

Returns a new set containing only those elements {{{x}}} for which {{{(predicate? x)}}} returns non-{{{#false}}}. Takes O(n log n) time; O(n) is optimal.

{{{#!scheme
(iset-fold proc base set)
}}}
The fundamental set iterator. Equivalent to, but may be more efficient than,
{{{#!scheme
(fold proc base (iset->ordered-list set))
}}}
Takes O(n) time.

{{{#!scheme
(iset-map/monotone proc set [precedes?])
}}}
Returns a new set containing the elements {{{(proc x)}}} for every {{{x}}} in {{{set}}}. {{{proc}}} must be a ''monotone'' unary procedure that preserves the order and uniqueness of set elements. If {{{precedes?}}} is given, it is the order predicate of the new set; otherwise the new set uses the same order predicate as {{{set}}}. Takes O(n) time.

{{{#!scheme
(iset-map proc set [precedes?])
}}}
As {{{iset-map/monotone}}}, except the {{{proc}}} is not required to be monotone. When multiple elements are mapped to the same value, only the value originating from the least pre-mapped element is retained. Takes O(n log n) time.

=== Conversion ===

{{{#!scheme
(iset->ordered-list set)
}}}

Returns a list containing the elements of {{{set}}} in increasing order. Takes O(n) time.

{{{#!scheme
(ordered-list->iset precedes? list)
}}}

Returns a set containing the elements of {{{list}}} and using order predicate {{{precedes?}}}. It is an error for {{{list}}} to be anything other than a proper list of elements in increasing order. Takes O(n log n) time; O(n) is optimal.

{{{#!scheme
(list->iset precedes? list)
}}}

Returns a set containing the elements of {{{list}}} and using order predicate {{{precedes?}}}. {{{list}}} must be a proper list, but may contain duplicates and need not be in order. When a subset of input elements are equal according to {{{precedes?}}}, only the first such element in {{{list}}} is a member of the returned set. Takes O(n log n) time.

=== Element comparison ===

{{{#!scheme
(iset-element<? set x y)
(iset-element=? set x y)
(iset-element>? set x y)
}}}

These procedures compare {{{x}}} to {{{y}}} according to {{{set}}}'s order predicate and the conventional meaning of {{{<}}}, {{{=}}}, and {{{>}}}.

= Maps =

A map data structure, as specified here, stores key-value associations between a set of keys with an order predicate and arbitrary value objects. It is an alternative to an association list or hash table, which also store key-value associations, but with different key constraints and efficiency guarantees.

In the same way that a list of key-value dotted pairs can implement an association list, a set of key-value dotted pairs can implement a map. Implementations may use this approach, or implement a distinct data structure specific to maps.

=== Construction ===

=== Map-wide queries ===

{{{#!scheme
(imap? x)
(imap-empty? map)
(imap-size map)
(imap-order-predicate map)
}}}

The behavior and efficiency of these operations is the same as the analogous set procedures.

=== Set-theoretic operations ===

{{{#!scheme
(imap-union map...)
(imap-intersection map...)
(imap-difference map...)
(imap-symmetric-difference map...)
}}}

The behavior and efficiency of these operations is the same as the analogous set procedures.

=== Element queries ===

{{{#!scheme
(imap-member? map key)
(imap-min-key map)
(imap-max-key map)
}}}

The behavior and efficiency of these operations is the same as the analogous set procedures.

{{{#!scheme
(imap-min-value map)
(imap-max-value map)
}}}

Returns the value associated with {{{(imap-min-key map)}}} and {{{(imap-max-key map)}}} respectively. Takes O(log n) time; O(1) is optimal.

=== Element operations ===

{{{#!scheme
(imap-update set key present-proc absent-proc)
}}}

A continuation-based universal update procedure. Attempts to find a key {{{key-match}}} equal to {{{key}}} in {{{set}}}. When such a {{{key-match}}} is found, calls
{{{#!scheme
(present-proc key-match value update remove)
}}}
which must either tail-call
{{{#!scheme
(update new-key new-value ret)
}}}
to replace {{{key}}} with {{{new-key}}} and the corresponding value with {{{new-value}}}, or tail-call
{{{#!scheme
(remove ret)
}}}
to remove {{{key-match}}} and its corresponding value from {{{map}}}. In either case {{{ret}}} will be returned from the enclosing {{{imap-update}}} call.

When no such {{{key-match}}} is found, {{{imap-update}}} calls
{{{#!scheme
(absent-proc insert ignore)
}}}
which should either tail-call
{{{#!scheme
(insert new-key value ret)
}}}
to insert {{{new-key}}} and {{{value}}} into {{{map}}}, or else tail-call
{{{#!scheme
(ignore ret)
}}}
to leave {{{map}}} unchanged. Again, {{{ret}}} will be returned from the enclosing {{{imap-update}}} call.

In any case, {{{imap-update}}} returns
{{{#!scheme
(values new-map ret)
}}}
where {{{new-map}}} is a new map reflecting the indicated modification, if any; and {{{ret}}} is the return value produced by one of the continuations. It runs in O(log n) time.

(This procedure is based on an analogous procedure for hash tables suggested by Alexey Radul and attributed to Taylor Campbell.)

{{{#!scheme
(imap-find map key [absent-thunk])
}}}
Returns the value associated with {{{key}}} in {{{map}}}, or the result of evaluating {{{(absent-thunk)}}} if no such value exists. If {{{absent-thunk}}} is unspecified then a default procedure that always returns {{{#false}}} is used. Takes O(log n) time.

{{{#!scheme
(imap-include map key value)
(imap-exclude map key)
}}}
Return a new set that certainly does/doesn't include {{{key}}} and the associated {{{value}}}. {{{imap-include}}} will replace any prior association to {{{key}}} that might exist in {{{map}}}. Each operation takes O(log n) time.

{{{#!scheme
(imap-key-predecessor map key [absent-thunk])
(imap-key-successor map key [absent-thunk])
}}}

The behavior and efficiency of these operations is the same as the analogous set procedures.

=== Range queries ===

{{{#!scheme
(imap-between map least include-least most include-most)
}}}

Returns a new map containing the associations of {{{map}}} whose keys are in the interval between {{{least}}} and {{{most}}}. If {{{include-least}}}/{{{include-most}}} is non-{{{#false}}} then the result includes associations with keys equal to {{{least}}}/{{{most}}} respectively; otherwise those associations are not included. Takes O(k log k + log n) time, where n is the number of elements in the set and k is the number of elements returned; O(k + log n) is optimal.

{{{#!scheme
(imap-range< map key)
(imap-range<= map key)
(imap-range= map key)
(imap-range>= map key)
(imap-range> map key)
}}}

The behavior and efficiency of these operations is the same as the analogous set procedures.

=== Filter, fold, map ===

{{{#!scheme
(imap-filter proc set)
}}}

Returns a new set containing only those associations for which {{{(proc key value)}}} returns non-{{{#false}}}. Takes O(n log n) time; O(n) is optimal.

{{{#!scheme
(imap-fold proc base map)
}}}
The fundamental map iterator. Calls
{{{#!scheme
(proc key value accum)
}}}
successively to accumulate a value initialized to {{{base}}}. Takes O(n) time.

{{{#!scheme
(imap-map-values proc map)
}}}
Returns a new map where each {{{value}}} is replaced with the result of evaluating
{{{#!scheme
(proc key value)
}}}
for that value's key. Takes O(n) time.

{{{#!scheme
(imap-map/monotone proc map [precedes?])
}}}
Returns a new map containing the association values returned by calls to
{{{#!scheme
(proc key value)
}}}
for each key-value association in {{{map}}}. {{{proc}}} must return {{{(values new-key new-value)}}}. The key mapping must be a ''monotone,'' preserving the order and uniqueness of keys. The value mapping need not be monotone. If {{{precedes?}}} is given, it is the key order predicate of the new map; otherwise the new map uses the same order predicate as {{{map}}}. Takes O(n) time.

{{{#!scheme
(imap-map proc map [precedes?])
}}}
As {{{imap-map/monotone}}}, except the {{{proc}}} is not required to be monotone. When multiple keys are mapped to the same {{{new-key}}}, only the association originating from the least pre-mapped key is retained. Takes O(n log n) time.

=== Conversion ===

{{{#!scheme
(imap->ordered-alist map)
}}}

Returns an association list containing the associations of {{{map}}} in increasing order by key. Takes O(n) time.

{{{#!scheme
(imap-keys map)
}}}
Returns a set containing the keys of {{{map}}}. Takes O(n log n) time; O(n) is optimal.

{{{#!scheme
(imap-values map)
}}}
Returns a list containing the values of {{{map}}} in increasing order by key. Takes O(n) time.

{{{#!scheme
(ordered-alist->imap precedes? alist)
}}}

Returns a map containing the associations of {{{alist}}} and using order predicate {{{precedes?}}}. It is an error for {{{list}}} to be anything other than a proper association list in increasing order by key. Takes O(n log n) time; O(n) is optimal.

{{{#!scheme
(alist->imap precedes? alist)
}}}

The behavior and efficiency of these operations is the same as the analogous set procedures.

=== Key comparison ===

{{{#!scheme
(imap-key<? map x y)
(imap-key=? map x y)
(imap-key>? map x y)
}}}

These procedures compare key {{{x}}} to key {{{y}}} according to {{{set}}}'s order predicate and the conventional meaning of {{{<}}}, {{{=}}}, and {{{>}}}.

= Open questions =

* Should there be an explicitly-immutable pair and list?
* "imap" is a name clash with the IMAP email protocol. Is this a dealbreaker?

time

2013-06-05 02:42:52