This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home. For a version of this page that may be more recent, see ImmutableSetsWortman in WG2's repo for R7RS-large.

Immutable­Sets­Wortman

cowan
2015-12-12 22:36:40
4history
source

Introduction

This proposal defines immutable data structures for sets and bags. A structure is immutable when all its operations leave the structure unchanged and still available to any procedure that holds a pointer to it. Note that none of the procedures specified here ends with a !.

Immutable structures are sometimes called persistent and are closely related to pure-functional (a.k.a. pure) structures. The availability of immutable data structures facilitates writing efficient programs in the pure-functional style.

Rationale

Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++.

Bags are useful for counting anything from a fixed set of possibilities, e.g. the number of each type of error in a log file or the number of uses of each word in a lexicon drawn from a body of documents. Although other data structures can serve the same purpose, using bags clearly expresses the programmer's intent and allows for optimization.

Insofar as possible, the names in this SRFI are harmonized with the names used for SRFI 113. SRFI 113 sets and bags are linear-update: that is, a modification to the data structure returns a modified version and may or may not mutate the original version. In this SRFI, the data structures are guaranteed not to be mutated.

It's possible to use the general sets of this SRFI to contain characters, but the use of the linear-update sets of SRFI 14 is recommended instead.

Immutable sets and bags do not have a lexical syntax representation. It's possible to use SRFI 108 quasi-literal constructors to create them in code, but this SRFI does not standardize how that is done.

The interface to immutable sets and bags depends on SRFI 128 comparators. Comparators conveniently package the equality predicate of the set with the comparison procedure needed to implement the set efficiently.

Specification

Immutable sets (also known as isets) and immutable bags (also known as ibags) are disjoint from all other Scheme types with the possible exception of immutable maps. They may or may not be disjoint from one another.

We specify required time efficiency upper bounds using big-O notation. We note when, in some cases, there is "slack" between the required bound and the theoretically optimal bound for an operation. Implementations may use data structures with amortized time bounds, but should document which bounds hold in only an amortized sense. The use of randomized data structures with expected time bounds is discouraged.

A sorted set data structure stores a finite collection of unique elements with a defined comparator.

The following conditions are errors:

These requirements can be satisfied by many flavors of self-balancing binary trees. Red-black trees, 1-2 brother trees, and labeled 2-3 finger trees are particularly convenient in an immutable context.

Constructors

If two elements are inserted into a set that are equal in the sense of the set's comparator but are not eqv?, the first to be specified or generated prevails.

(iset comparator [ element ...])

Returns a set containing elements, where comparator provides the criterion of identity and ordering. Takes O(n log n) time.

(iset-unfold stop? mapper successor seed)

Invokes the predicate stop? on seed. If it returns false, generate the next result by applying mapper to seed, generate the next seed by applying successor to seed, and repeat this algorithm with the new seed. If stop? returns true, return a set containing the results. Takes O(n log n) time.

Predicates

(iset? obj)

Returns #t if obj is a set, and #f otherwise. Takes O(1) time.

(iset-empty? set)

Returns #t if set contains zero elements, and #f otherwise. Takes O(1) time.

(iset-member? set obj)

Returns #t if set contains obj, and #f otherwise. Takes O(log n) time.

Accessors

(iset-min set)

(iset-max set)

Returns the least/greatest element of set. It is an error for set to be empty. Takes O(log n) time; O(1) is optimal.

(iset-comparator set)

Returns the comparator of set. Takes O(1) time.

(iset-predecessor set obj failure)

(iset-successor set obj failure)

Returns the element that immediately precedes/succeeds obj in set's ordering. If no such element exists because obj is the minimum/maximum element, or because set is empty, returns the result of invoking the thunk failure. Takes O(log n) time.

Functional update

(iset-adjoin set obj)

Returns a set which contains the elements of set and obj as well. If there is already an element of set that is equal (in the sense of the comparator) to obj, the existing element of set prevails. Takes O(log n) time.

(iset-adjoin-all set list)

Returns a set which contains the elements of set and the elements of list as well. It is an error if the elements of list are not distinct and increasing in the sense of the comparator of set. If there is already an element of set which is equal (in the sense of the comparator) to an element of list, the element of set prevails. Takes O(k log n) time, where k is the length of list.

(iset-replace set obj)

Returns a set which contains the elements of set and obj as well. If there is already an element of set that is equal (in the sense of the comparator) to obj, obj prevails. Takes O(log n) time.

(iset-delete set obj)

Returns a set which contains the elements of set with the exception of obj, if present. If there is already an element of set that is equal (in the sense of the comparator) to obj, the element of set prevails. Takes O(log n) time.

(iset-delete-keys set list)

Returns a set which contains the elements of set, excluding the elements of list. It is an error if the elements of list are not distinct and increasing in the sense of the comparator of set. Takes O(k log n) time, where k is the length of list.

The whole set

(iset-size set)

Returns the size of set as an exact integer. May take O(n) time, though O(1) is optimal.

(iset-find set obj failure)

Returns the element equal (in the sense of the comparator of set) to obj in set, or the result of invoking the thunk failure if no such element exists. Takes O(log n) time.

(iset-count pred set)

Returns the number of elements in set which satisfy pred as an exact integer. Takes O(n) time.

(iset-any pred set)

(iset-every pred set)

Invokes pred on the elements of set until one of them returns a true/false value, which is then returned. If there are no such elements, returns #f/#t. Takes O(n) time.

Filtering

(iset-range= set obj)

(iset-range< set obj)

(iset-range> set obj)

(iset-range<= set obj)

(iset-range>= set obj)

Returns a set containing only the elements of set that are equal to / less than / greater than / less than or equal to / greater than or equal to obj. Takes O(log n) time, where n is the number of elements in the set.

Note that since set elements are unique, iset-range= returns a set of at most one element.

(iset-filter pred set)

(iset-remove pred set)

Returns a set containing only those elements on which pred returns true/false. Takes O(n log n) time; O(n) is optimal.

(iset-partitionpred set)

Returns two values, (iset-filter pred set) and (iset-remove pred set) respectively, but may be more efficient.

Folding and mapping

(iset-fold proc nil set)

The fundamental set iterator. Equivalent to, but may be more efficient than, (fold proc base (iset->increasing-list set)). Takes O(n) time.

(iset-fold-right proc nil set)

The fundamental set iterator. Equivalent to, but may be more efficient than, (fold-right proc base (iset->increasing-list set)). Takes O(n) time.

(iset-map/monotone proc set [ comparator ])

Returns a set containing the result of invoking proc on every element in set. It is an error unless proc is a monotone unary procedure that preserves the order of set elements. Observe that mapping a set of unique elements with a monotone function yields a set of unique elements, so element uniqueness follows from the monotonicity assumption. If comparator is given, it is the comparator of the result; otherwise the result uses the same comparator as set. Takes O(n) time.

(iset-mapproc set [ comparator [ merger ] ])

Like iset-map/monotone, except that proc is not required to be monotone. The merger procedure is used to select among any duplicate elements (in the sense of the comparator of set) that might be created; it returns the value to be used; if absent, the element chosen is implementation-specific. Takes O(n log n) time.

(iset-for-each proc set)

Invokes proc on every element in set. The result is unspecified. Takes O(n) time.

Subsets

Note: The following three predicates do not obey the trichotomy law and therefore do not constitute a total order on sets.

(iset=? set1 set2 ...)

Returns #t if each set contains the same elements, and #f otherwise.

(iset<? set1 set2 ...)

Returns #t if each set other than the last is a proper subset of the following set, and #f otherwise.

(iset>? set1 set2 ...)

Returns #t if each set other than the last is a proper superset of the following set, and #f otherwise.

(iset<=? set1 set2 ...)

Returns #t if each set other than the last is a subset of the following set, and #f otherwise.

(iset>=? set1 set2 ...)

Returns #t if each set other than the last is a superset of the following set, and #f otherwise.

Conversion

(iset->list set)

Returns a list containing the elements of set in increasing order. Takes O(n) time.

(increasing-list->iset comparator list)

Returns a set containing the elements of list and using comparator. It is an error for list to be anything other than a proper list of elements in increasing order. Takes O(n log n) time; O(n) is optimal.

(list->iset comparator list [ merger'' ])

Returns a set containing the elements of list and using comparator. It is an error if list is not a proper list, but it may contain duplicates and need not be in order. The merger procedure is used to select among any duplicate elements (in the sense of the comparator of set) that might be created; it accepts the existing and new elements and returns the value to be used. Takes O(n log n) time.

(generator->iset generator`)

(iset->generator iset)

Converts a SRFI 121 generator to and from a set.

Set-theoretic operations

(iset-union set ... )

(iset-intersection set ... )

(iset-difference set ... )

(iset-xor set1 set2)

Returns a set containing the union/intersection/difference/symmetric difference of the arguments. All the arguments must be sets sharing an equivalent comparator. The set operator is applied in a left-associative order. If an element is found in more than one set, the first set in the argument list prevails. May take O(kn log n) time, where k is the number of sets and n is the number of elements involved, though O(kn) time is optimal.

Bags

Bags are like sets, but can contain the same object more than once. However, if two elements that are the same in the sense of the comparator, but not in the sense of eqv?, are both included, it is not guaranteed that they will remain distinct when retrieved from the bag. It is an error for a single procedure to be invoked on bags with different comparators.

The procedures for creating and manipulating bags are the same as those for sets, except that iset is replaced by ibag in their names, and that adjoining an element to a bag is effective even if the bag already contains the element. If two elements in a bag are the same in the sense of the bag's comparator, the implementation may in fact store just one of them.

The ibag-union, ibag-intersection, ibag-difference, and ibag-xor procedures behave as follows when both bags contain elements that are equal in the sense of the bags' comparator:

Additional bag procedures

(ibag-sum set1 set2 ... )

The ibag-sum procedure returns a bag containing all the unique elements in all the bags, such that the count of each unique element in the result is equal to the sum of the counts of that element in the arguments. It differs from ibag-union by treating identical elements as potentially distinct rather than attempting to match them up.

(ibag-product n bag)

The bag-product procedure returns a bag containing all the unique elements in bag, where the count of each unique element in the bag is equal to the count of that element in bag multiplied by n.

(ibag-unique-size bag)

Returns the number of unique elements of bag as an exact integer.

(ibag-element-count bag element)

Returns an exact integer representing the number of times that element appears in bag.

(ibag-for-each-unique proc bag)

Applies proc to each unique element of bag in arbitrary order, passing the element and the number of times it occurs in bag, and discarding the returned values. Returns an unspecified result.

(ibag-fold-unique proc nil bag)

Invokes proc on each unique element of bag in arbitrary order, passing the number of occurrences as a second argument and the result of the previous invocation as a third argument. For the first invocation, nil is used as the third argument. Returns the result of the last invocation, or nil if there is none.

(ibag-increment bag element count)

(ibag-decrement bag element count)

Procedures that return a bag with the same elements as bag, but with the element count of element in bag increased or decreased by the exact integer count (but not less than zero).

(ibag->iset bag)

(iset->ibag set)

The ibag->iset procedure returns a set containing the unique elements (in the sense of the equality predicate) of bag. The iset->ibag procedure returns a bag containing the elements of set. In all cases, the comparator of the result is the same as the comparator of the argument or arguments.

(ibag->alist bag)

(alist->ibag comparator alist)

The ibag->alist procedure returns a newly allocated alist whose keys are the unique elements of bag and whose values are the number of occurrences of each element. The alist->ibag returning a bag based on comparator, where the keys of alist specify the elements and the corresponding values of alist specify how many times they occur.