This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home. For a version of this page that may be more recent, see GeneratorsGauche in WG2's repo for R7RS-large.

Generators­Gauche

cowan
2014-09-21 10:59:28
1history
source

Generators

A generator is merely a procedure with no arguments and works as a source of a series of values. Every time it is called, it yeilds a value. The EOF value indicates the generator is exhausted. For example, read-char can be seen as a generator that generates characters from the current input port.

It is common practice to abstract the source of values in such a way, so it is useful to define utility procedures that work on the generators. This module provides them.

A generator is very lightweight, and handy to implement simple on-demand calculations. However, keep in mind that it is side-effecting construct; you can’t safely backtrack, for example..

You can construct and combine various generators with this module. Eventually, your code need to consume the generated value. There are several built-in procedures for that; see Folding generated values.

Generator constructors

A generator isn’t a special datatype but just an ordinary procedure, so you can make a generator with lambdas. This module provides some common generator constructors for the convenience.

If you want to use your procedure as a generator, note that a generator can be invoked many times even after it returns EOF once. You have to code it so that once it returns EOF, it keeps returning EOF for the subsequent calls.

The result of generator constructors is merely a procedure, and printing it doesn’t show much. In the examples in this section we use generator->list to convert the generator to the list. See Generator operations for the description of generator->list.

Function: null-generator

An empty generator. Returns just an EOF object when called.

Function: circular-generator arg …

Returns an infinite generator that repeats the given arguments.

 
(generator->list (circular-generator 1 2 3) 10)
  ⇒ (1 2 3 1 2 3 1 2 3 1)

Note that the above example limits the length of the converted list by 10; otherwise generator->list won’t return.

Function: giota :optional (count +inf.0) (start 0) (step 1)

Like iota, creates a generator of a series of count numbers, starting from start and increased by step.

 
(generator->list (giota 10 3 2))
  ⇒ (3 5 7 9 11 13 15 17 19 21)

If both start and step are exact, the generator yields exact numbers; otherwise it yields inexact numbers.

 
(generator->list (giota +inf.0 1/2 1/3) 6)
  ⇒ (1/2 5/6 7/6 3/2 11/6 13/6)
(generator->list (giota +inf.0 1.0 2.0) 5)
  ⇒ (1.0 3.0 5.0 7.0 9.0)
Function: grange start :optional (end +inf.0) (step 1)

Similar to giota, creates a generator of a series of numbers. The series begins with start, increased by step, and continues while the number is below end.

 
(generator->list (grange 3 8))
  ⇒ (3 4 5 6 7)
Function: generate proc

Creates a generator from coroutine.

The proc argument is a procedure that takes one argument, yield. When called, generate immediately returns a generator G. When G is called, the proc runs until it calls yield. Calling yield causes to suspend the execution of proc and G returns the value passed to yield.

Once proc returns, it is the end of the series—G returns eof object from then on. The return value of proc is ignored.

The following code creates a generator that produces a series 0, 1, and 2 (effectively the same as (giota 3) and binds it to g.

 
(define g
  (generate
   (^[yield] (let loop ([i 0])
               (when (< i 3) (yield i) (loop (+ i 1)))))))

(generator->list g) ⇒ (0 1 2)
Function: list->generator lis :optional start end
Function: vector->generator vec :optional start end
Function: reverse-vector->generator vec :optional start end
Function: string->generator str :optioanl start end

Returns a generator that yields each item in the given argument. A generator returned from reverse-* procedures runs in reverse order.

 
(generator->list (list->generator '(1 2 3 4 5)))
  ⇒ (1 2 3 4 5)
(generator->list (vector->generator '#(1 2 3 4 5)))
  ⇒ (1 2 3 4 5)
(generator->list (reverse-vector->generator '#(1 2 3 4 5)))
  ⇒ (5 4 3 2 1)
(generator->list (string->generator "abcde"))
  ⇒ (#\a #\b #\c #\d #\e)

By default the generator is exhausted once all items are retrieved; the optional start and end arguments can limit the range the generator walks across; start specifies the left bound and end specifies the right bound.

For forward generators, the first value the generator yields is start-th element, and it ends right before end-th element. For reverse generators, the first value is the item right next to the end-th element, and the last value is the start-th element. at the last element, and reverse generators ends at the first element.

 
(generator->list (vector->generator '#(a b c d e) 2))
  ⇒ (c d e)
(generator->list (vector->generator '#(a b c d e) 2 4))
  ⇒ (c d)
(generator->list (reverse-vector->generator '#(a b c d e) 2))
  ⇒ (e d c b)
(generator->list (reverse-vector->generator '#(a b c d e) 2 4))
  ⇒ (d c)
(generator->list (reverse-vector->generator '#(a b c d e) #f 2))
  ⇒ (b a)
Function: bits->generator n :optional start end
Function: reverse-bits->generator n :optional start end

These procedures take an exact integer and treat it as a sequence of boolean values (0 for false and 1 for true), as integer->list does. Bits->generator takes bits from LSB, while reverse-bits->generator takes them from MSB.

 
(generator->list (bits->generator #b10110))
 ⇒ (#f #t #t #f #t)
(generator->list (reverse-bits->generator #b10110))
 ⇒ (#t #f #t #t #f)

The optional start and/or end arguments are used to specify the range of bitfield, LSB being 0. Unlike list->generator etc, start specifies the rightmost position (inclusive) and end specfies the leftmost position (exclusive). It is consistent with other procedures that accesses bit fields in integers.

 
(generator->list (bits->generator #x56 0 4)
  ⇒ (#f #t #t #f)  ; takes bit 0, 1, 2 and 3
(generator->list (bits->generator #x56 4 8)
  ⇒ (#t #f #t #f)  ; takes bit 4, 5, 6 and 7

(generator->list (reverse-bits->generator #x56 4 8)
  ⇒ (#f #t #f #t)  ; takes bit 7, 6, 5 and 4 
Function: port->sexp-generator input-port
Function: port->line-generator input-port
Function: port->char-generator input-port
Function: port->byte-generator input-port

Returns a generator that reads characters or bytes from the given port, respectively. They’re just (cut read input-port), (cut read-line input-port), (cut read-char input-port) and (cut read-byte input-port), respectively, but we provide them for completeness.

Generic function: x->generator obj

A generic version to convert any collection obj to a generator that walks across the obj. Besides, if obj is an input port, port->char-generator is called.

Function: file->generator filename reader . open-args

Opens a file filename, and returns a generator that reads items from the file by a procedure reader, which takes one argument, an input port. The arguments open-args are passed to open-input-file

The file is closed when the generator is exhausted. If a generator is abandoned before being read to the end, then the file is kept open until the generator is garbage-collected. If you want to make sure the file is closed by a certain point of time, you might want to use a reader procedure as a generator within the dynamic extent of with-input-from-file etc.

Function: file->sexp-generator filename . open-args
Function: file->char-generator filename . open-args
Function: file->line-generator filename . open-args
Function: file->byte-generator filename . open-args

Returns a generator that reads a series of sexps, characters, lines and bytes from a file filename, respectively. These are versions of file->generator specialized by read, read-char, read-line and read-byte as the reader argument.

Like file->generator, open-args are passed to open-input-file. The file is closed when the generator is exhausted.

Function: gunfold p f g seed :optional tail-gen

A generator constructor similar to unfold.

P is a predicate that takes a seed value and determines where to stop. F is a procedure that calculates a value from a seed value. G is a procedure that calculates the next seed value from the current seed value. Tail-gen is a procedure that takes the last seed value and returns a generator that generates the tail.

For each call of the resulting generator, p is called with the current seed value. If it returns a true, then we see we’ve done, and tail-gen is called (if it is given) to get a generator for the tail. Otherwise, we apply f on the current seed value to get the value to generate, and use g to update the seed value.

 
(generator->list (gunfold (^s (> s 5)) (^s (* s 2)) (^s (+ s 1)) 0))
  ⇒ '(0 2 4 6 8 10)

Generator operations

Function: generator->list generator :optional k

Reads items from generator and returns a list of them. By default, this reads until the generator is exhausted. If an optional argument k is given, it must be a nonnegative integer, and the list ends either k items are read, or the generator is exhausted.

Be careful not to pass an infinite generator to this without specifying k—this procedure won’t return but hogs all the memory before crash.

Macro: glet* (binding …) body body2 …

This captures a monadic pattern frequently appears in the generator code. It is in a similar spirit of and-let*, but returns as soon as the evaluating expression returns EOF, instead of #f as and-let* does.

The binding part can be either (var expr) or ( expr ). The actual definition will explain this syntax clearly.

 
(define-syntax glet*
  (syntax-rules ()
    [(_ () body body2 ...) (begin body body2 ...)]
    [(_ ([var gen-expr] more-bindings ...) . body)
     (let1 var gen-expr
       (if (eof-object? var)
         var
         (glet* (more-bindings ...) . body)))]
    [(_ ([ gen-expr ] more-bindings ...) . body)
     (let1 var gen-expr
       (if (eof-object? var)
         var
         (glet* (more-bindings ...) . body)))]))
Macro: glet1 var expr body body2 …

This is to glet* as let1 is to let*. In other words, it is (glet* ([var expr]) body body2 …).

Macro: do-generator (var gexpr) body …

This is a generator version of dolist and dotimes.

Gexpr is an expression that yields a generator. It is evaluated once. The resulting generator is called repeatedly until it returns EOF. Every time the generator is called, body … are evaluated in the scope where var is bound to the value yielded from the generator.

Like dolist and dotimes, this macro exists for side-effects. You can write the same thing with for-each families, but sometimes this macro makes the imperative code more readable:

 
(do-generator [line (file->line-generator "filename")]
  ;; do some side-effecting stuff with line
  )

The following procedures take generators (noted as gen and gen2) and return a generator. For the convenience, they also accept any collection to gen and gen2 parameters; if a collection is passed where a generator is expected, it is implicitly coerced into a generator.

Function: gcons* item … gen

Returns a generator that adds items in front of gen.

 
(generator->list (gcons* 'a 'b (giota 2)))
 ⇒ (a b 0 1)
Function: gappend gen …

Returns a generator that yields the items from the first given generator, and once it is exhausted, use the second generator, and so on.

 
(generator->list (gappend (giota 3) (giota 2)))
 ⇒ (0 1 2 0 1)

(generator->list (gappend))
 ⇒ ()
Function: gconcatenate gen

The gen argument should generate generators and/or sequences. Returns a generator that yields elements from the first generator/sequence, then the second one, then the third, etc.

It is similar to (apply gappend (generator->list gen)), except that gconcatenate can work even gen generates infinite number of generators.

 
($ generator->list $ gconcatenate
   $ list->generator `(,(giota 3) ,(giota 2)))
 ⇒ (0 1 2 0 1)
Function: gmerge less-than gen gen2 …

Creates a generator that yields elements out of input generators, with the order determined by a procedure less-than. The procedure is called as (less-than a b) and must return #t iff the element a must precede the element b.

Each input generator must yield an ordered elements by itself; otherwise the result won’t be ordered.

If only one generator is given, it is just returned (after coercing the input to a generator). In that case, less-than won’t be called at all.

 
(generator->list (gmerge < '(1 3 8) '(5) '(2 4)))
  ⇒ '(1 2 3 4 5 8)
Function: gmap proc gen gen2 …

Returns a generator that yields a value returned by proc applied on the values from given generators. The returned generator terminates when any of the given generator is exhausted.

NB: This differs from generator-map, which consumes all values at once and returns the results as a list, while gmap returns a generator immediately without consuming input.

Function: gmap-accum proc seed gen gen2 …

A generator version of map-accum, mapping with states.

The proc argument is a procedure that takes as many arguments as the input generators plus one. It is called as (proc v v2 … seed) where v, v2, … are the values yielded from the input generators, and seed is the current seed value. It must return two values, the yielding value and the next seed.

Function: gfilter pred gen

Returns a generator that yield the items from the source generator, except those who makes pred answers false.

Function: gfilter-map proc gen gen2 …

Works the same as (gfilter values (gmap proc gen gen2 …)), only slightly efficiently.

Function: gstate-filter proc seed gen

This allows stateful filtering of a series. The proc argument must be a procedure that takes a value V from the source generator and a seed value. It should return two values, a boolean flag and the next seed value. If it returns true for the boolean flag, the generator yields V. Otherwise, the generator keeps calling proc, with updating the seed value, until it sees the true flag value or the source generator is exhausted.

The following example takes a generator of oscillating values and yields only values that are greater than their previous value.

 
(generator->list
 (gstate-filter (^[v s] (values (< s v) v)) 0
                (list->generator '(1 2 3 2 1 0 1 2 3 2 1 0 1 2 3))))
 ⇒ (1 2 3 1 2 3 1 2 3)
Function: gbuffer-filter proc seed gen :optional tail-gen

This procedure allows n-to-m mapping between elements in input and output— that is, you can take a look at serveral input elements to generate one or more output elements.

The procedure proc receives the next input element and accumulated seed value. It returns two values: A list of output values, and the next seed value. If you need to look at more input to generate output, you can return an empty list as the first value.

If the input reaches the end, tail-gen is called with the current seed value; it should return a list of last output values. If omitted, the output ends when the output of the last call to proc is exhausted (the last seed value is discarded).

Suppose you have a text file. Each line contains a command, but if the line ends with backslash, next line is treated as a continuation of the current line. The following code creates a generator that returns logical lines, that is, the lines after such line continuations are taken care of.

 
(gbuffer-filter (^[v s]
                  (if-let1 m (#/\\$/ v)
                    (values '() (cons (m 'before) s))
                    (values `(,(string-concatenate-reverse (cons v s))) '())))
                '()
                (file->line-generator "input-file.txt")
                (^[s] `(,(string-concatenate-reverse s))))
Function: gtake gen k :optional (fill? #f) (padding #f)
Function: gdrop gen k

The generator version of take* and drop*; gtake returns a generator that yields (at most) first k items of the source generator, while gdrop returns a generator that skips the first k items of the source generator.

Those won’t complain if the source generator is exhausted before generating k items. By default, the generator returned by gtake terminates as the source ends, but if you give a true value to fill?, then the returned generator does yield k items, using the padding value to fill the rest.

Note: If you pass true to fill?, gtake always returns a generator that generates exactly k elements even the input generator is already exhausted—there’s no general way to know whether you’ve reached the end of the input. If you need to take k items repeatedly from the input generator, you may want to use gslices below.

Function: gtake-while pred gen
Function: gdrop-while pred gen

The generator version of take-while and drop-while. The generator returned from gtake-while yields items from the source generator as far as pred returns true for each. The generator returned from gdrop-while first reads items from the source generator while pred returns true for them, then start yielding items.

Function: gslices gen k :optional (fill? #f) (padding #f)

The generator version of slices. This returns a generator, that yields a list of k items from the input generator gen at a time.

 
(generator->list (gslices (giota 7) 3))
  ⇒ ((0 1 2) (3 4 5) (6))

The fill? and padding arguments controls how the end of input is handled, just like gtake. When fill? is #f (default), the last item from output generator may not have k items if the input is short to fill them, as shown in the above example. If fill? is true and the input is short to complete k items, padding argument is used to fill the rest.

 
(generator->list (gslices (giota 6) 3 #t 'x))
  ⇒ ((0 1 2) (3 4 5))
(generator->list (gslices (giota 7) 3 #t 'x))
  ⇒ ((0 1 2) (3 4 5) (6 x x))

Folding generated values

Generated values needs to be consumed eventually. Here we provide several procedures to do that. These are useful when combined with input procedures like read, so we have them built-in instead of putting them in a separate module.

Function: generator-fold proc seed gen gen2 …

Works like fold on the generated values by generator procedures gen gen2 ….

When one generator is given, for each value v generated by gen, proc is called as (proc v r), where r is the current accumulated result; the initial value of the accumulated result is seed, and the return value from proc becomes the next accumulated result. When gen returns EOF, the accumulated result at that time is returned from generator-fold.

When more than one generator is given, proc is called as (proc v1 v2r), where v1, v2 … are the values yielded from gen, gen2, …, respectively, and r is the current accumulated result. The iteration terminates when any one of the generators returns EOF.

 
(with-input-from-string "a b c d e"
  (cut generator-fold cons 'z read))
  ⇒ (e d c b a . z)
Function: generator-fold-right proc seed gen gen2 …

Works like fold-right on the generated values by generator procedures gen gen2 …=.

This is provided for completeness, but it isn’t a good way to handle generators; in order to combine values right-associatively, we should read all the values from the generators (until any one of the generator returns EOF), then start calling proc as

 
(proc v0_0 v1_0 ... (proc v0_1 v1_1 ... (proc v0_n v1_n ... seed) ...))

where vn_m is the m-th value yielded by n-th generator.

 
(with-input-from-string "a b c d e"
  (cut generator-fold-right cons 'z read))
  ⇒ (a b c d e . z)

As you see, keeping all intermediate values kind of defeats the benefit of generators.

Function: generator-for-each proc gen gen2 …

A generator version of for-each. Repeatedly applies proc on the values yielded by gen, gen2 … until any one of the generators yields EOF. The values returned from proc are discarded.

This is a handy procedure to consume generated values with side effects.

Function: generator-map proc gen gen2 …

A generator version of map. Repeatedly applies proc on the values yielded by gen, gen2 … until any one of the generators yields EOF. The values returned from proc are collected into a list and returned.

 
(with-input-from-string "a b c d e"
  (cut generator-map symbol->string read))
  ⇒ ("a" "b" "c" "d" "e")

The same effects can be achieved by combining generator->list and gmap. This procedure is provided for the backward compatibility.

 
(generator->list (gmap proc gen gen2 …))
Function: generator-find pred gen

Returns the first item from the generator gen that satisfies the predicate pred.

The following example returns the first line matching the regexp #/XYZ/ from the file ‘foo.txt’.

 
(with-input-from-file "foo.txt"
  (cut generator-find #/XYZ/ read-line))