Changes between Version 5 and Version 6 of UnicodeCowan


Timestamp:
08/31/10 12:26:57 (7 years ago)
Author:
cowan
Comment:

Formatting

  • UnicodeCowan

    v5 v6  
     7  7  1. The `char->integer` procedure must return an exact integer between `0` and `#xD7FF` or between `#xE000` and `#x10FFFF` when applied to a character supported by the implementation and belonging to the Unicode repertoire.  This integer must be the Unicode scalar value of the character.  Integers between `#xD800` and `#xDFFF` do not correspond to any Unicode character.
     8  8
     9      This is independent of the implementation's internal representation.  For example, a Scheme supporting a repertoire of basic Latin and modern Greek characters only might use the ISO 8859-7 encoding internally, in which lower-case lambda is represented as `#xEB`, but char->integer must still return `#x03BB` on that character.
        9  This is independent of the implementation's internal representation.  For example, a Scheme supporting a repertoire of basic Latin and modern Greek characters only might use the ISO 8859-7 encoding internally, in which lower-case lambda is represented as `#xEB`, but `char->integer` must still return `#x03BB` on that character.
    10 10
    11      An ASCII-only Scheme satisfies this requirement automatically, provided it does not deliberately scramble the natural result.  (Schemes on EBCDIC systems already have ASCII conversion tables readily available.)
       11  An ASCII-only Scheme satisfies this requirement automatically, provided it does not deliberately scramble the natural mapping.  (Schemes on EBCDIC systems already have ASCII conversion tables readily available.)
    12 12
    13 13  If the implementation supports non-Unicode characters, then `char->integer` must return an exact integer greater than `#x10FFFF` when applied to such characters.  For example, characters with "bucky bits" could be implemented in this way.
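
The requirement in item 1 can be illustrated outside Scheme. The following Python sketch models a hypothetical implementation that stores each character internally as one ISO 8859-7 byte. The names `is_unicode_scalar_value` and `char_to_integer` are invented for this illustration and are not part of the proposal; the point is only that the returned integer is the Unicode scalar value, whatever the internal byte happens to be.

```python
def is_unicode_scalar_value(n: int) -> bool:
    """True for integers in the two ranges item 1 allows for Unicode characters."""
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF


def char_to_integer(internal_byte: int) -> int:
    """Model of char->integer for a character stored as one ISO 8859-7 byte.

    Assumption (not from the proposal): the internal representation is a
    single ISO 8859-7 byte. The result is the Unicode scalar value of the
    character, not the internal byte.
    """
    ch = bytes([internal_byte]).decode("iso8859_7")
    n = ord(ch)
    assert is_unicode_scalar_value(n)
    return n


print(hex(char_to_integer(0xEB)))  # lower-case lambda: prints 0x3bb
print(hex(char_to_integer(0x41)))  # ASCII "A": byte and scalar value coincide
```

Bytes that ISO 8859-7 leaves undefined raise `UnicodeDecodeError` in this sketch; a real implementation would simply not have such characters.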
     
    23 23  5. The procedures `char-{alphabetic,numeric,whitespace,upper-case, lower-case}?` return `#t` if their arguments have the Unicode properties Alphabetic, Numeric, White_Space, Uppercase, or Lowercase respectively.  Note that many alphabetic characters (though no ASCII ones) are neither upper nor lower case.
    24 24
    25      6. The `char-downcase` procedure, given an argument that forms the uppercase part of a Unicode upper/lower-case pair, must return the lowercase member of the pair, provided that both characters are supported by the Scheme implementation.  Turkic casing pairs are ignored.  If the argument is not the uppercase part of such a pair, it is returned.
       25  6. The `char-downcase` procedure, given an argument that forms the uppercase part of a Unicode upper/lower-case pair, must return the lowercase member of the pair, provided that both characters are supported by the Scheme implementation.  (Turkic casing pairs are ignored.)  If the argument is not the uppercase part of such a pair, it is returned.
    26 26
    27 27  7. The `char-upcase` procedure works the same way, ''mutatis mutandis''.  Note that many Unicode lowercase characters don't have uppercase equivalents.
    28 28
    29      8. The `char-foldcase` procedure (an extension to R5RS) applies the Unicode simple case-folding algorithm to its argument, ignoring the Turkic mappings.  Mappings that don't accept or don't produce single characters are ignored.
       29  8. The `char-foldcase` procedure (an extension to R5RS) applies the Unicode simple case-folding algorithm to its argument (ignoring the Turkic mappings).  Mappings that don't accept or don't produce single characters are ignored.
    30 30
    31 31  In an ASCII-only Scheme, this is equivalent to the `char-downcase` procedure.
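
The single-character behaviour described in items 6 to 8 can be approximated in Python as a rough sketch. Python's `str.lower`, `str.upper`, and `str.casefold` apply the full Unicode case mappings, so the hypothetical helpers below discard any result that is not exactly one character and return the argument unchanged instead. This only approximates the simple mappings the proposal refers to and may differ for a few characters (for example U+0130); the function names are invented for this illustration.

```python
def _single(mapped: str, original: str) -> str:
    """Keep a mapping only if it yields exactly one character."""
    return mapped if len(mapped) == 1 else original


def char_downcase(ch: str) -> str:
    # Approximation: full lowercase mapping, ignored if it is multi-character.
    return _single(ch.lower(), ch)


def char_upcase(ch: str) -> str:
    # Approximation: full uppercase mapping, ignored if it is multi-character.
    return _single(ch.upper(), ch)


def char_foldcase(ch: str) -> str:
    # Approximation: full case folding, ignored if it is multi-character.
    return _single(ch.casefold(), ch)


# A character that is not the uppercase part of a pair is returned unchanged.
print(char_downcase("\u039B"), char_downcase("\u03BB"))
# U+0138 (kra) is lowercase but has no uppercase equivalent.
print(char_upcase("\u0138") == "\u0138")
# For ASCII, folding and downcasing agree.
print(all(char_foldcase(chr(i)) == char_downcase(chr(i)) for i in range(128)))
```

The sharp s (U+00DF) shows why multi-character mappings are dropped: its full uppercase and folded forms are two characters long, so all three helpers return it unchanged.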
     
    41 41  12. In addition to the identifier characters of the ASCII repertoire specified by R5RS, Scheme implementations may permit any additional repertoire of Unicode characters to be employed in identifiers, provided that each character has a Unicode general category of Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co, or is U+200C or U+200D (the zero-width non-joiner and joiner, respectively, which are needed for correct spelling in Persian, Hindi, and other languages).  No non-Unicode characters may be used in identifiers.
    42 42
    43      13. All Scheme implementations shall permit the sequence `\x<hexdigits>;` to appear in Scheme identifiers.  If the character with the given Unicode scalar value is supported by the implementation, identifiers containing such a sequence are equivalent to identifiers containing the corresponding character.
       43  13. All Scheme implementations shall permit the sequence `\x<hexdigits>;` to appear in Scheme identifiers.  If the character with the given Unicode scalar value is supported by the implementation, identifiers containing such a sequence are equivalent to identifiers containing the corresponding character.
    44 44
    45 45  Note that what is said of ASCII also applies to ISO 8859-1 (Latin-1), but not to Windows code page 1252 or other encodings.
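
Items 12 and 13 describe two mechanical checks, sketched below in Python using the standard `unicodedata` and `re` modules. The names `may_extend_identifiers` and `expand_hex_escapes` are hypothetical, and the sketch assumes an implementation that supports every Unicode scalar value; an implementation with a smaller repertoire would leave escapes for unsupported characters alone.

```python
import re
import unicodedata

# General categories that item 12 allows for additional identifier characters.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
    "Pd", "Pc", "Po", "Sc", "Sm", "Sk", "So", "Co",
}
# Zero-width non-joiner and joiner, allowed explicitly by item 12.
JOINERS = {"\u200C", "\u200D"}

_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]+);")


def may_extend_identifiers(ch: str) -> bool:
    """True if item 12 lets an implementation permit ch in identifiers."""
    return ch in JOINERS or unicodedata.category(ch) in ALLOWED_CATEGORIES


def expand_hex_escapes(identifier: str) -> str:
    """Replace each \\x<hexdigits>; sequence with the character it names.

    Assumption: every Unicode scalar value is supported. Sequences that do
    not name a scalar value (surrogates, values above #x10FFFF) are left
    untouched.
    """
    def replace(match: re.Match) -> str:
        n = int(match.group(1), 16)
        if 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF:
            return chr(n)
        return match.group(0)

    return _HEX_ESCAPE.sub(replace, identifier)


# The escaped and literal spellings name the same identifier.
print(expand_hex_escapes(r"\x3bb;x") == "\u03BBx")
```

Comparing identifiers after expansion is one way to obtain the equivalence item 13 asks for; a reader could equally expand escapes while tokenizing.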