This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home. For a version of this page that may be more recent, see BlobAPI in WG2's repo for R7RS-large.

Blob­API

cowan
2011-02-03 04:39:51
5history
source

This is a proposal for a WG2 blob (bytevector) API. Blobs are a disjoint type. They have no native interpretation, but parts of them can be interpreted as one of a variety of binary types. The conceit is that everything is a separate procedure with minimal arguments; this makes for a lot of procedures, but each one can be easily inlined by even a very dumb compiler, providing high efficiency.

Basic procedures

(blob? obj)
Returns #t if obj is a blob.
(make-blob n)
Returns a newly allocated blob containing n bytes.
(blob-length blob)
Returns the length of blob in bytes.
(copy-blob blob)
Returns a newly allocated blob containing the same bytes as blob.
(partial-blob blob from to)
Returns a newly allocated blob containing the bytes in blob starting at start (inclusive) and ending at end (exclusive).
(copy-blob! from to)
Copy blob from on top of blob to, which must not be shorter.
(copy-partial-blob! from start end to at)
Copy the part of from starting at start and ending at end into to starting at at.

Because there is no preferred way to interpret the data in a blob, there is no blob function analogous to list or vector and no second argument to make-blob.

Numeric procedures

(blob-<type><endian>-ref blob n)
Returns a Scheme number corresponding to the binary value encoded according to type beginning at offset n in blob.
(blob-<type><endian>-set! blob n v)
Converts v to a binary value encoded according to type and places it into blob beginning at offset n.

The types are:

u8
unsigned 8-bit integer
s8
signed 8-bit integer
u16
unsigned 16-bit integer
s16
signed 16-bit integer
u32
unsigned 32-bit integer
s32
signed 32-bit integer
u64
unsigned 64-bit integer
s64
signed 64-bit integer
u128
unsigned 128-bit integer
s128
signed 128-bit integer
f32
32-bit IEEE float
fn32
32-bit native float (may not be IEEE)
f64
64-bit IEEE float in native endianism
fn64
64-bit native float (may not be IEEE)
c64
64-bit complex number (two 32-bit IEEE floats)
cn64
64-bit complex number (two 32-bit native floats, may not be IEEE)
c128
128-bit complex number (two 64-bit IEEE floats)
cn128
128-bit complex number (two 64-bit native floats, may not be IEEE)

The endianism values are:

(empty)
Native endianism (system-dependent)
le
Little-endianism
be
Big-endianism

Endianism is not applicable to the following types: s8 u8 fn32 fn64 cn64 cn128

String procedures

(blob-<encoding>-ref blob n l)
Returns a Scheme string corresponding to the binary value encoded according to encoding beginning at offset n in blob and continuing for l bytes.
(blob-<encoding>-set! blob n v)
Converts v to a binary string encoded according to encoding and places it into blob beginning at offset n. Returns the number of bytes encoded.

The encodings are:

utf8
UTF-8 encoding
utf16
UTF-16 encoding (respects BOM if present, defaults to native encoding otherwise)
utf16be
UTF-16BE encoding (treats BOM as a normal character)
utf16le
UTF-16LE encoding (treats BOM as a normal character)

Issues

Pick one:

  1. Offsets are in bytes and can be arbitrary
  2. Offsets are in bytes but must be naturally aligned (divisible by n for an n-byte value)
  3. Offsets are in n-byte sub-blobs (forces natural alignment, SRFI-4 style)

Should blob=? be provided?

I've trashed the UTF-32 conversions because nobody uses UTF-32. They can come back if somebody needs them.

WG1

I propose that WG1 provide blob?, make-blob, blob-length, copy-blob, copy-blob!, blob-u8-ref, blob-u8-set!, partial-blob, copy-partial-blob!.