Changes between Version 1 and Version 2 of StringBytevectorConversionCowan


Timestamp:
11/22/12 12:32:32 (5 years ago)
Author:
cowan
Comment:

--

  • StringBytevectorConversionCowan

    v1 v2  
    1 = TBD = 
     1This proposal extends the `(scheme base)` procedures `utf8->string` and `string->utf8` to additional encodings, providing procedures that translate between strings and bytevectors that encode those strings using the UTF-16, UTF-16LE, and UTF-16BE encodings. 
    22 
     3== Procedures == 
    34 
    4 Stuff reclaimed from old BlobAPI 
     5`(utf16->string `''bytevector start end''`)` 
    56 
    6 == String procedures == 
     7`(utf16le->string `''bytevector start end''`)` 
    78 
    8 `(bytevector-<encoding>-ref` ''bytevector'' [ [ ''start'' ] ''end'' ] `)` 
     9`(utf16be->string `''bytevector start end''`)` 
    910 
    10 Returns a newly allocated Scheme string corresponding to the binary value encoded according to ''encoding'' beginning at offset ''n'' in ''bytevector'' and continuing for ''l'' bytes. 
     11Decodes the bytes of ''bytevector'' between ''start'' and ''end'' and returns the corresponding string.  It is an error for ''bytevector'' to contain byte sequences representing characters that are forbidden in strings.  If the first two bytes of the bytevector are either `#xFE #xFF` or `#xFF #xFE`, then `utf16->string` interprets them as a byte order mark, and uses that to interpret the rest of the bytevector as big-endian or little-endian respectively.  If no byte order mark is present, the assumed byte order is implementation-defined.  The other two procedures always interpret the encoded form as little-endian or big-endian respectively, and do not recognize any byte order mark. 
    1112 
    12 `(bytevector-<encoding>-set!` ''bytevector n v''`)` 
     13`(string->utf16 `''string start end''`)` 
    1314 
    14 Converts ''v'' to a binary string encoded according to ''encoding'' and places it into ''bytevector'' beginning at offset ''n''.  Returns the number of bytes encoded. 
     15`(string->utf16le `''string start end''`)` 
    1516 
    16 == String encodings == 
     17`(string->utf16be `''string start end''`)` 
    1718 
    18  `utf8`:: 
    19   UTF-8 encoding 
    20  `utf16`:: 
    21   UTF-16 encoding (respects BOM if present, defaults to native encoding otherwise) 
    22  `utf16be`:: 
    23   UTF-16BE encoding (treats BOM as a normal character) 
    24  `utf16le`:: 
    25   UTF-16LE encoding (treats BOM as a normal character) 
     19Encodes the characters of ''string'' between ''start'' and ''end'' and returns the corresponding bytevector.  `String->utf16` always generates a byte order mark, but the byte order it generates is implementation-defined.  The other two procedures always generate the encoded form as little-endian or big-endian respectively, and do not generate any byte order mark.
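
The byte order mark rules described for the v2 procedures can be illustrated by analogy. The sketch below uses Python's standard codecs, which happen to follow the same conventions; it is not the proposed Scheme API, and the variable names are made up for illustration. One assumed difference: with no byte order mark, Python's generic `utf-16` decoder uses the platform's native order, whereas the proposal leaves that choice implementation-defined.

```python
import codecs

# "hi" encoded in both byte orders, without a byte order mark (BOM).
HI_BE = b"\x00h\x00i"
HI_LE = b"h\x00i\x00"

# Decoding, analogous to utf16->string: a leading BOM is consumed and
# selects the byte order (FE FF = big-endian, FF FE = little-endian).
from_be_bom = (codecs.BOM_UTF16_BE + HI_BE).decode("utf-16")
from_le_bom = (codecs.BOM_UTF16_LE + HI_LE).decode("utf-16")

# Analogous to utf16le->string: the fixed-order decoder does not recognize
# a BOM, so the same two bytes decode as the ordinary character U+FEFF.
kept_bom = (codecs.BOM_UTF16_LE + HI_LE).decode("utf-16-le")

# Encoding, analogous to string->utf16: a BOM is always generated, in an
# order chosen by the implementation (Python uses the native order).
with_bom = "hi".encode("utf-16")

# Analogous to string->utf16le and string->utf16be: fixed order, no BOM.
le = "hi".encode("utf-16-le")
be = "hi".encode("utf-16-be")

print(repr(from_be_bom), repr(from_le_bom), repr(kept_bom))
print(with_bom.hex(), le.hex(), be.hex())
```

Because the generic encoder always writes a byte order mark, its output can be decoded by the generic decoder regardless of which byte order the implementation chose. Output from the fixed-order encoders carries no such marker, so the reader must know the byte order in advance.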