

5.6 Character Sets

MIT/GNU Scheme's character-set abstraction represents groups of characters, such as letters or digits. A character set may contain any Unicode character.

— procedure: char-set? object

Returns #t if object is a character set; otherwise returns #f.

— variable: char-set:upper-case
— variable: char-set:lower-case
— variable: char-set:alphabetic
— variable: char-set:numeric
— variable: char-set:alphanumeric
— variable: char-set:whitespace
— variable: char-set:not-whitespace
— variable: char-set:graphic
— variable: char-set:not-graphic
— variable: char-set:standard

These variables contain predefined character sets. At present, these character sets contain only ISO-8859-1 characters; in the future they will contain all the relevant Unicode characters. To see the contents of one of these sets, use char-set->scalar-values.

Alphabetic characters are the 52 upper-case and lower-case letters. Numeric characters are the 10 decimal digits. Alphanumeric characters are those in the union of these two sets. Whitespace characters are #\space, #\tab, #\page, #\linefeed, and #\return. Graphic characters are the printing characters and #\space. Standard characters are the printing characters, #\space, and #\newline. These are the printing characters:

          ! " # $ % & ' ( ) * + , - . /
          0 1 2 3 4 5 6 7 8 9
          : ; < = > ? @
          A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
          [ \ ] ^ _ `
          a b c d e f g h i j k l m n o p q r s t u v w x y z
          { | } ~
     
— procedure: char-upper-case? char
— procedure: char-lower-case? char
— procedure: char-alphabetic? char
— procedure: char-numeric? char
— procedure: char-alphanumeric? char
— procedure: char-whitespace? char
— procedure: char-graphic? char
— procedure: char-standard? object

These predicates are defined in terms of the respective character sets defined above.
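
As an illustration, the following sketch applies a few of these predicates to literal characters. It assumes the predicates and sets behave as described in this section; the last expression checks that a predicate agrees with a direct membership test on the corresponding set.

```scheme
;; Each predicate tests membership in the matching predefined set.
(char-alphabetic? #\a)        ; => #t, a letter
(char-numeric? #\7)           ; => #t, a decimal digit
(char-whitespace? #\space)    ; => #t
(char-upper-case? #\a)        ; => #f, lower case

;; The predicate and the explicit membership test give the same answer.
(eq? (char-numeric? #\7)
     (char-set-member? char-set:numeric #\7))   ; => #t
```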

— procedure: char-set-member? char-set char

Returns #t if char is in char-set; otherwise returns #f.
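
A minimal sketch of a membership test follows. Note the argument order: the character set comes first, then the character. The name vowels is hypothetical and used only for this example.

```scheme
;; A small set built from explicit characters (hypothetical name).
(define vowels (char-set #\a #\e #\i #\o #\u))

(char-set-member? vowels #\e)   ; => #t
(char-set-member? vowels #\z)   ; => #f
```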

— procedure: char-set=? char-set-1 char-set-2

Returns #t if char-set-1 and char-set-2 contain exactly the same characters; otherwise returns #f.

— procedure: char-set char ...

Returns a character set consisting of the specified characters. With no arguments, char-set returns an empty character set.

— procedure: chars->char-set chars

Returns a character set consisting of chars, which must be a list of characters. This is equivalent to (apply char-set chars).

— procedure: string->char-set string

Returns a character set consisting of all the characters that occur in string.
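
The three constructors above can build the same set from different inputs. The sketch below compares the results with char-set=?; the names set-a, set-b, and set-c are hypothetical. Because a character set has no notion of order or multiplicity, the ordering of the list and the repeated characters in the string make no difference.

```scheme
(define set-a (char-set #\a #\b #\c))                 ; from individual characters
(define set-b (chars->char-set (list #\c #\b #\a)))   ; from a list, in any order
(define set-c (string->char-set "abcabc"))            ; from a string, duplicates ignored

(char-set=? set-a set-b)                        ; => #t
(char-set=? set-a set-c)                        ; => #t
(char-set=? (char-set) (string->char-set ""))   ; => #t, both are empty
```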

— procedure: scalar-values->char-set items

Returns a character set containing the Unicode scalar values described by items. Items must satisfy well-formed-scalar-values-list?.

— procedure: char-set->scalar-values char-set

Returns a well-formed scalar-values list that describes the Unicode scalar values represented by char-set.

— procedure: well-formed-scalar-values-list? object

Returns #t if object is a well-formed scalar-values list, otherwise returns #f. A well-formed scalar-values list is a proper list, each element of which is either a Unicode scalar value or a pair of Unicode scalar values. A pair of Unicode scalar values represents a contiguous range of Unicode scalar values. The car of the pair is the inclusive lower limit, and the cdr is the exclusive upper limit. The lower limit must be less than or equal to the upper limit.
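
The following sketch shows what does and does not count as a well-formed scalar-values list, and builds a set from a range. It assumes the range semantics documented above (inclusive lower limit, exclusive upper limit); the name lower-case-ascii is hypothetical. The scalar values 97 and 123 are the code points of #\a and #\{ respectively, so the range covers a through z.

```scheme
;; A single scalar value plus a range: well formed.
(well-formed-scalar-values-list? '(65 (97 . 123)))

;; Not well formed: the lower limit exceeds the upper limit.
(well-formed-scalar-values-list? '((123 . 97)))

;; Not well formed: a bare pair is not a proper list.
(well-formed-scalar-values-list? '(65 . 66))

;; The range (97 . 123) includes 97 (#\a) and excludes 123 (#\{).
(define lower-case-ascii (scalar-values->char-set '((97 . 123))))
(char-set-member? lower-case-ascii #\z)   ; in the set: 122 is below the upper limit
(char-set-member? lower-case-ascii #\{)   ; not in the set: 123 is the exclusive limit
```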

— procedure: char-set-invert char-set

Returns a character set consisting of the characters that are not in char-set.

— procedure: char-set-difference char-set1 char-set ...

Returns a character set consisting of the characters that are in char-set1 but aren't in any of the char-sets.

— procedure: char-set-intersection char-set ...

Returns a character set consisting of the characters that are in all of the char-sets.

— procedure: char-set-union char-set ...

Returns a character set consisting of the characters that are in at least one of the char-sets.
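
The set operations above can be checked against small hand-built sets. In this sketch the names letters and vowels are hypothetical, and each result is compared with the expected set using char-set=?.

```scheme
(define letters (string->char-set "abcdef"))
(define vowels  (string->char-set "aeiou"))

;; Characters in both sets.
(char-set=? (char-set-intersection letters vowels)
            (string->char-set "ae"))            ; => #t

;; Characters in letters but not in vowels.
(char-set=? (char-set-difference letters vowels)
            (string->char-set "bcdf"))          ; => #t

;; Characters in either set.
(char-set=? (char-set-union letters vowels)
            (string->char-set "abcdefiou"))     ; => #t

;; The inverted set contains exactly the characters missing from vowels.
(char-set-member? (char-set-invert vowels) #\a) ; => #f
(char-set-member? (char-set-invert vowels) #\b) ; => #t
```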

— procedure: 8-bit-char-set? char-set

Returns #t if char-set contains only 8-bit scalar values (i.e., ISO-8859-1 characters); otherwise returns #f.
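
A brief sketch, assuming the behavior described above: a set of ASCII letters contains only 8-bit scalar values, while a set containing scalar value 955 (U+03BB, Greek small letter lambda) does not.

```scheme
;; All members fit in 8 bits.
(8-bit-char-set? (string->char-set "abc"))

;; 955 (U+03BB) lies outside the ISO-8859-1 range.
(8-bit-char-set? (scalar-values->char-set '(955)))
```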

— procedure: ascii-range->char-set lower upper

This procedure is obsolete. Instead use

          (scalar-values->char-set (list (cons lower upper)))
     
— procedure: char-set-members char-set

This procedure is obsolete; instead use char-set->scalar-values.

Returns a newly allocated list of the ISO-8859-1 characters in char-set. If char-set contains any characters outside of the ISO-8859-1 range, they will not be in the returned list.