wiki:PortsCowan

Version 10 (modified by cowan, 7 years ago) (diff)

Clarifications and cleanups

Ports

Here's my proposal for WG1 Scheme ports and I/O facilities. It's fully upward compatible with R5RS, but takes ideas from R6RS, SRFI 91 and SRFI 6. Many of the concepts are present in R6RS under other names.

The new features beyond R5RS are:

  • partial control of buffering, character encoding, newline translation, and Scheme case sensitivity
  • string ports (SRFI-6-compatible)
  • binary ports and blob ports (the binary version of string ports)
  • current-error-port, flush-output-port, read-line, delete-file, and file-exists? from R6RS

Port model

In this proposal, there are two kinds of ports, binary ports and character ports. Unusually, every binary port is automatically a character port, though not vice versa. Character ports therefore support character I/O operations, and binary ports support both character I/O operations and binary I/O operations. Indeed, all I/O operations which are valid on a character port are also valid on a binary port.

This implies that some character encoding must be associated with each binary port so that character I/O can be performed on it. The only encoding that implementations MUST support is ASCII, so this is relatively cost-free in the simplest case, since ASCII encoding does not require a separate character buffer or encoding translation table, only a few bits for newline translation and case sensitivity.

However, not all character ports are binary ports. For example, string ports have no concept of character-to-byte encoding, because they only deal with sequences of characters. Although not specified in this proposal, a further generalization is object ports, whose fundamental I/O unit is the Scheme object. All character ports are object ports, because there is a standard encoding of (most) Scheme objects to characters.

This proposal does not specify any way to create a bidirectional port, but allows for their possible existence in an implementation. Sockets, terminals, and pipes are all possible examples.

Filename model

In this proposal, a filename may be specified either as a string or as a settings list, which is a list of alternating keys and values where every key is a symbol. Specifying a string is equivalent to specifying the settings list (path string). Implementations MUST support the following keys:

path

Specifies the filename. The interpretation of filenames is implementation-dependent. There is no default value, but implementations MAY accept other keys in lieu of this one for opening files or file-like objects that don't have string names.

buffering

Specifies what kind of buffering is present. The value #f means no buffering is employed; binary means that there is a binary buffer but no character buffer; #t means there are both character and binary buffering. Other values MAY be specified by an implementation. Buffer sizes are implementation-dependent. The default value is implementation-dependent.

encoding

Specifies what character encoding to use on a binary port. The (case insensitive) value US-ASCII MUST be supported. The values ISO-8859-1 and UTF-8 SHOULD be supported if the implementation contains the appropriate repertoire of characters. Other values MAY be supported, which SHOULD appear in the IANA list of encodings. The default value is implementation-dependent.

If a BOM (Byte Order Mark, U+FEFF) is present at the beginning of input on a port encoded as UTF-8, it is skipped. A BOM is not automatically written on output. Implementations MAY provide a way around this.

newline

Specifies how to translate newlines. The value #f means that no translation is performed. Any other value causes all of CR, LF, CR+LF, NEL (U+0085), CR+NEL, and LS (U+2028) to be translated to #\newline on input. On output, the translation is implementation-dependent. Other values MAY be specified by an implementation. The default value is implementation-dependent.

case-sensitive

Specifies if Scheme symbols are read from the port case-sensitively or not. The value #f means that upper-case letters in symbols are translated to lower case unless escaped; #t means that no translation is done. The default value is implementation-dependent.

Implementations MAY support other keys, SHOULD warn if they detect keys they do not understand, and MAY signal an error in such cases.

Port object procedures

(input-port? obj)

(output-port? obj)

Same as R5RS, but also return #t on bidirectional ports if the implementation provides them.

(port? obj)

Mentioned in R5RS section 3 but not in section 6.6. Part of R6RS.

(current-input-port)

(current-output-port)

Same as R5RS. These are binary ports whose character encoding is implementation-dependent.

(current-error-port)

From R6RS. This is a binary port whose character encoding is implementation-dependent.

(flush-output-port [ output-port ] [ character-only? ])

Same as R6RS, except that if the output-port is omitted, the default port is the current output port. Drains the character buffer of output-port, if any. Then, if the port is a binary port, drains the binary buffer, if any.

(close-input-port port)

(close-output-port port)

From R5RS. Close-output-port implicitly calls flush-output-port first.

(close-port)

Closes both sides of a bidirectional port, if the implementation provides them; otherwise the same as close-input-port or close-output-port as the case may be. It is harmless to close a part, or one side of it, if it is already closed.

(eof-object? obj)

Same as R5RS.

(port-settings-list port)

Return the settings list of port as a list in no particular order. Additional implementation-defined keys or default values may be present.

Character I/O procedures

(character-port? obj)

Returns #t if obj is a character port. SRFI 91 calls this char-port. Implementations may define other kinds of character ports.

(read-char [ character-input-port ])

(write-char [ character-output-port ])

(newline [ character-output-port ])

(peek-char [ character-input-port ])

(char-ready? [ character-input-port ])

Same as R5RS. It is an error to output characters not present in the encoding of a character-output-port.

(read-line [ character-input-port ])

Same as R6RS. Reads a line from port (or the current input port) terminated by a #\newline character (which may be the result of newline conversion in the port). The #\newline is not part of the returned string. This is a convenience function.

(open-input-string string)

(open-output-string)

(get-output-string output-string-port)

Same as SRFI 6.

Binary I/O procedures

String ports are character ports, but not binary ports, so these procedures do not apply to them. Implementations MAY support other kinds of binary ports such as process ports or stream socket ports.

Mixing binary I/O with character I/O on the same port is safe if there is no character buffering on that port, but produces undefined behavior otherwise.

(binary-port? obj)

Returns #t if obj is a binary port. SRFI 91 calls this byte-port?.

(read-u8 [ binary-input-port ])

(write-u8 [ binary-output-port ])

(peek-u8 [ binary-input-port ])

(u8-ready? [ binary-input-port ])

The direct binary analogues of read-char, write-char, newline, peek-char, and char-ready? respectively. They return an exact integer between 0 and 255 rather than a character. SRFI 91 talks of byte rather than u8.

(open-input-blob blob)

(open-output-blob)

(get-output-blob output-blob-port)

Blobs are (possibly specialized) vectors containing integers from 0 to 255 inclusive. These procedures are the binary analogues of the SRFI 6 string port procedures. The term blob is subject to change.

File Module

This is a separate module because some implementations will not have access to a file system.

(call-with-input-file filename proc)

(call-with-output-file filename proc)

(with-input-from-file filename thunk)

(with-output-to-file filename thunk)

(open-input-file filename)

(open-output-file filename)

Same as R5RS, except that any filename argument may be a string or a settings list.

(delete-file filename)

(file-exists? filename)

Same as R6RS. Only string filenames are supported.

Read and Write Modules

These procedures are not in the core because many systems, especially embedded ones, don't require the ability to read or write general Scheme objects, and very small implementations may not want the overhead of a Scheme parser. Writing may be useful even when reading is not, which is why there are two modules.

Note that implementations may provide ports that are not character ports, such as directory ports or vector ports, and extend these procedures to work on them.

Read Module

(read [ input-port ])

Same as R5RS.

Write Module

(write obj [ output-port ])

Same as R5RS, but specifies that only ASCII characters may be output (for re-readability). Non-ASCII characters in symbols, strings, and character literals MUST be escaped.

(display obj [ port ])

Same as R5RS. It is an error to output characters not present in the encoding of output-port.

Thanks

Thanks to the R5RS and R6RS editors; to Marc Feeley, author of SRFI 91 and Gambit-C; and to Will Clinger, author of SRFI 6.