Structured System variables -- ^$CHARACTER

^$C[HARACTER]

Introduced in the 1995 ANSI M[UMPS] language standard.

This structured system variable provides information about character sets.
(Note that all usage of characters and strings in M[UMPS] is defined in terms of characters, not in terms of bytes; for the M[UMPS] language, it is not relevant whether a character is stored in a single byte, or in multiple bytes.)

In most character sets, the first 128 codes correspond to the ASCII set:

     0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
  0  Null SOH  STX  ETX  EOT  ENQ  ACK  Bell BS   HT   LF   VT   FF   CR   SO   SI
 16  DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
 32  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
 48  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
 64  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
 80  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
 96  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
112  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL

(The code of a character is its row label plus its column number; SP denotes the space character.)

Many character sets contain 256 characters; the 'upper' 128, however, differ considerably between the various sets:

(Note: the HyperText medium that is used to reproduce this text imposes certain restrictions on which symbols can be displayed. Those special symbols that cannot be rendered within the confines of the HyperText browser will appear as two asterisks (**). The printed version of this text will display all special symbols as the correct graphics.)

ISO-8859-1-USA: see the separate table.

DOS™: see the separate table.

DEC™: see the separate table.

Apple Macintosh™: see the separate table.

EBCDIC™: see the separate table.

 WRITE !,"The following character sets are available:"
 SET SET=""
 FOR  SET SET=$ORDER(^$CHARACTER(SET)) QUIT:SET=""  DO
 . WRITE !?5,SET
 . QUIT

^$CHARACTER("DEC","INPUT","DOS")="$$DOS2DEC^MYSET"
Convert a value that is found in a "DOS" encoded variable, so that it may be manipulated in a "DEC" encoded environment.
Internal expansion: SET work=$$DOS2DEC^MYSET(fetch)
This conversion will be executed implicitly when one is working in a "DEC" encoded environment, and a command like SET X=^XXX(subs) is executed, while global variable ^XXX is "DOS" encoded.
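
As a minimal sketch of what such an input conversion function might look like (an illustration only, not text from the standard), the routine below reverses a few of the one-to-one pairs from the DEC2DOS example in this document. The label DOS2DEC and the routine name MYSET are the hypothetical names used in the node above. Characters that the output conversion flattens to plain letters (for instance À to A) cannot be recovered and are left out.

```mumps
DOS2DEC(STRING) ; Convert DOS to DEC (sketch: a few one-to-one pairs only)
 NEW DOS,DEC
 ;
 ; Character:
 ;              Ä   Å   Æ   Ç   É   Ñ   Ö   Ü
 SET DOS=$CHAR(142,143,146,128,144,165,153,154)
 SET DEC=$CHAR(196,197,198,199,201,209,214,220)
 ;
 ; Every code in DOS is replaced by the code at the same position in DEC
 QUIT $TRANSLATE(STRING,DOS,DEC)
```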

$GET(^$CHARACTER("MYSET","INPUT","OTHERSET"))=""
When no input conversion algorithm is specified in the structured system variable, no implicit conversion takes place when moving information from one type of environment to the other.

^$CHARACTER("DEC","OUTPUT","DOS")="$$DEC2DOS^MYSET"
Convert a value that is being manipulated in a "DEC" encoded environment, so that it may be stored in a "DOS" encoded variable.
Internal expansion: SET store=$$DEC2DOS^MYSET(work)
This conversion will be executed implicitly when one is working in a "DEC" encoded environment, and a command like SET ^XXX(subs)=X is executed, while global variable ^XXX is "DOS" encoded.

DEC2DOS(STRING) ; Convert DEC to DOS
 NEW DOS,DEC
 ;
 ; Character:
 ;              À   Á   Â   Ã   Ä   Å   Æ   Ç   È
 SET DOS=$CHAR(065,065,065,065,142,143,146,128,069)
 SET DEC=$CHAR(192,193,194,195,196,197,198,199,200)
 ;
 ;                  É   Ê   Ë   Ì   Í   Î   Ï   Ñ
 SET DOS=DOS_$CHAR(144,069,069,073,073,073,073,165)
 SET DEC=DEC_$CHAR(201,202,203,204,205,206,207,209)
 ;
 ;                  Ò   Ó   Ô   Õ   Ö   Œ   Ø   Ù
 SET DOS=DOS_$CHAR(079,079,079,079,153,079,079,085)
 SET DEC=DEC_$CHAR(210,211,212,213,214,215,216,217)
 ;
 ;                  Ú   Û   Ü   Ÿ   ß   à   á   â
 SET DOS=DOS_$CHAR(085,085,154,089,225,133,160,131)
 SET DEC=DEC_$CHAR(218,219,220,221,223,224,225,226)
 ;
 ;                  ã   ä   å   æ   ç   è   é   ê
 SET DOS=DOS_$CHAR(097,132,134,145,135,138,130,136)
 SET DEC=DEC_$CHAR(227,228,229,230,231,232,233,234)
 ;
 ;                  ë   ì   í   î   ï   ñ   ò   ó
 SET DOS=DOS_$CHAR(137,141,161,140,139,164,149,162)
 SET DEC=DEC_$CHAR(235,236,237,238,239,241,242,243)
 ;
 ;                  ô   õ   ö   œ   ø   ù   ú   û
 SET DOS=DOS_$CHAR(147,111,148,111,237,151,163,150)
 SET DEC=DEC_$CHAR(244,245,246,247,248,249,250,251)
 ;
 ;                  ü   ÿ
 SET DOS=DOS_$CHAR(129,152)
 SET DEC=DEC_$CHAR(252,253)
 ;
 QUIT $TRANSLATE(STRING,DEC,DOS)

In this example, input conversion and output conversion look very much alike. Things get more interesting when something that is a single character in one set translates into multiple characters in the other set (umlauts, ligatures, characters with a special form at the end of a word, and so on).
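
The following sketch illustrates the one-to-many case. It assumes a hypothetical output conversion to a character set that lacks the ligatures Æ, æ and ß; the label EXPAND and the replacement strings are illustrative choices, not something the standard prescribes. Because $TRANSLATE can only map one character to one character, the string is rebuilt character by character.

```mumps
EXPAND(STRING) ; Hypothetical output conversion: expand ligatures
 NEW FROM,I,OUT,P,TO
 SET FROM="Ææß"       ; characters to expand
 SET TO="AE,ae,ss"    ; replacements, in the same order as FROM
 SET OUT=""
 FOR I=1:1:$LENGTH(STRING) DO
 . ; $FIND returns the position just after the match, or 0 if there is none
 . SET P=$FIND(FROM,$EXTRACT(STRING,I))
 . IF P SET OUT=OUT_$PIECE(TO,",",P-1) QUIT
 . SET OUT=OUT_$EXTRACT(STRING,I)
 . QUIT
 QUIT OUT
```

Under these assumptions, $$EXPAND("Straße") should yield "Strasse", so the result can be longer than the original value.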

$GET(^$CHARACTER("MYSET","OUTPUT","OTHERSET"))=""
When no output conversion algorithm is specified in the structured system variable, no implicit conversion takes place when moving information from one type of environment to the other.

^$CHARACTER("MYSET","IDENT")="$$IDENT^MYSET"
Check whether a character may occur in a name. This applies to characters other than %, the upper case and lower case alphabetics and the digits 0 through 9, which are always valid.
Internal expansion: SET check=$$IDENT^MYSET($ASCII(char))

IDENTM(ASCII) ; For M
 QUIT 0

IDENTDOS(ASCII) ; For DOS
 IF ASCII>127,ASCII<166 QUIT 1
 QUIT 0

IDENTDEC(ASCII) ; For DEC
 IF ASCII>191,ASCII<222,ASCII'=208 QUIT 1
 IF ASCII>222,ASCII<254,ASCII'=240 QUIT 1
 QUIT 0

$GET(^$CHARACTER("MYSET","IDENT"))=""
If no identification algorithm is specified, the characters that are used for identifiers in the ASCII (or "M") character set are assumed.

^$CHARACTER("MYSET","PATCODE","P")="$$PATP^MYSET"
To verify whether a character matches the patcode P, the function that is specified in this node of ^$CHARACTER is used.
Internal expansion: SET check=$$PATP^MYSET($ASCII(char))
This check will be executed implicitly when an expression like X?1P is evaluated.

Valid codes for the third subscript (the patcode) are any single 'identifier' character other than Y and Z, including the codes that are pre-defined in the ANSI standard. See ^$CHARACTER(..."IDENT") for the specification of which characters can be used in identifiers.
Codes like ZxxxZ (xxx may be any sequence of valid 'identifier' characters) are reserved for implementation-specific extensions.
Codes like YxxxY (xxx may be any sequence of valid 'identifier' characters) are reserved for application-specific extensions.

PATU(ASCII) ; For DEC
 IF ASCII>64,ASCII<91 QUIT 1
 IF ASCII>191,ASCII<222,ASCII'=208 QUIT 1
 QUIT 0

PATL(ASCII) ; For DEC
 IF ASCII>96,ASCII<123 QUIT 1
 IF ASCII>223,ASCII<254,ASCII'=240 QUIT 1
 QUIT 0
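
In the same spirit as the PATU and PATL examples, a check function for the patcode P could be sketched as follows. It assumes only the punctuation ranges of the plain ASCII character set (codes 32 through 47, 58 through 64, 91 through 96 and 123 through 126); a function for a 256-character set would also have to accept that set's own punctuation codes.

```mumps
PATP(ASCII) ; For ASCII (sketch: 7-bit punctuation only)
 IF ASCII>31,ASCII<48 QUIT 1    ; space through /
 IF ASCII>57,ASCII<65 QUIT 1    ; : through @
 IF ASCII>90,ASCII<97 QUIT 1    ; [ through `
 IF ASCII>122,ASCII<127 QUIT 1  ; { through ~
 QUIT 0
```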

$GET(^$CHARACTER("MYSET","PATCODE",patcode))=""
When no pattern check algorithm is defined for a certain pattern code, no characters in the character set will match that pattern code.

^$CHARACTER("MYSET","COLLATE")="$$COLLATE^MYSET"
Convert a string to an internal format that is used for establishing a collation sequence.
Internal expansion: SET intern=$$COLLATE^MYSET(string)

$GET(^$CHARACTER("MYSET","COLLATE"))=""
When no specific collating transformation is defined, the string itself is used for collating purposes.

NOCASE(STRING) ; Case insensitive collating
 NEW LO,UP
 SET UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 SET LO="abcdefghijklmnopqrstuvwxyz"
 QUIT $TRANSLATE(STRING,LO,UP)
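
To make the effect of such a transformation visible, the sketch below applies $$NOCASE by hand: each name is stored under its transformed value, and $ORDER then walks the transformed values. This only imitates what an implementation would do implicitly; the label DEMO, the local array SORTED and the sample names are hypothetical, and the code is assumed to live in the same routine as NOCASE.

```mumps
DEMO ; Hypothetical demonstration of a collation key
 NEW KEY,NAME,SORTED
 ; Store each name under its collation key
 FOR NAME="delta","Alpha","Charlie","bravo" SET SORTED($$NOCASE(NAME))=NAME
 ; Walk the keys in collating order and show the original names
 SET KEY=""
 FOR  SET KEY=$ORDER(SORTED(KEY)) QUIT:KEY=""  WRITE !,SORTED(KEY)
 QUIT
```

Under these assumptions the names are written in the order Alpha, bravo, Charlie, delta, whereas collating on the strings themselves would place both capitalized names before the lower case ones.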

FRENCH(STRING) ; French collating
 NEW ACCNT,CHARI,CHARN,FIRST,LO,P1,P2,SECOND,THIRD,TMP,UP
 ; Collating according to the algorithm by
 ; Alain LaBonté
 ; As published by ISO on 12 August 1988
 SET LO="abcdefghijklmnopqrstuvwxyz"
 SET UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 SET TMP=$LENGTH(STRING)+2
 ; Expand the ligatures into two letters each
 FOR  QUIT:STRING'["Æ"  DO
 . SET P1=$PIECE(STRING,"Æ",1)
 . SET P2=$PIECE(STRING,"Æ",2,TMP)
 . SET STRING=P1_"AE"_P2 QUIT
 FOR  QUIT:STRING'["æ"  DO
 . SET P1=$PIECE(STRING,"æ",1)
 . SET P2=$PIECE(STRING,"æ",2,TMP)
 . SET STRING=P1_"ae"_P2 QUIT
 FOR  QUIT:STRING'["Œ"  DO
 . SET P1=$PIECE(STRING,"Œ",1)
 . SET P2=$PIECE(STRING,"Œ",2,TMP)
 . SET STRING=P1_"OE"_P2 QUIT
 FOR  QUIT:STRING'["œ"  DO
 . SET P1=$PIECE(STRING,"œ",1)
 . SET P2=$PIECE(STRING,"œ",2,TMP)
 . SET STRING=P1_"oe"_P2 QUIT
 ; CHARI: accented letters, CHARN: their base letters, ACCNT: accent codes
 SET CHARI="ÂâÀàÇçÉéÊêÈèËëÎîÏïÔôÛûÙùÜüŸÿ"
 SET CHARN="AaAaCcEeEeEeEeIiIiOoUuUuUuYy"
 SET ACCNT="3322551133224433443333224444"
 ; FIRST: base letters, all in lower case
 SET THIRD=$TRANSLATE(STRING,CHARI,CHARN)
 SET FIRST=$TRANSLATE(THIRD,UP,LO)
 ; SECOND: accent codes of the accented letters, in reverse order
 SET TMP=$TRANSLATE(STRING,$TRANSLATE(STRING,CHARI))
 SET SECOND=$REVERSE($TRANSLATE(TMP,CHARI,ACCNT))
 ; THIRD: case pattern (8 for lower case, 9 for upper case)
 SET TMP=$TRANSLATE($JUSTIFY("",26)," ",8)
 SET THIRD=$TRANSLATE(THIRD,LO,TMP)
 SET TMP=$TRANSLATE(TMP,8,9)
 SET THIRD=$TRANSLATE(THIRD,UP,TMP)
 QUIT FIRST_SECOND_THIRD

The 1995 ANSI M[UMPS] language specification defines the character set profiles for the character sets "ASCII" (based on ANSI X3.4-1990), "M" (which is identical to "ASCII", except for the collation order) and "JIS90" (based on JIS X0201-1990 and JIS X0208-1990).

Additions in a future ANSI M[UMPS] language specification.

A number of characters have been added to the list of valid characters in the name of a character set (but not as the first character of that name): "-" (hyphen, dash), "_" (underscore), "%", "*", ".", "/", ":", "$", "!" and "@".

The collating algorithm for ISO-8859-1-USA has been defined as a three-stage process: first the "base" letter counts, then the case, and then the diacritical marks. "Æ" is collated as if it were "AE", "æ" is collated as if it were "ae" and "ß" is collated as if it were "ss". This selection of collating ligatures conforms to ISO 6937.

Pattern matches conform to ISO's rules, i.e. "**", "**", "**", "**", "**", "**", "**" and "**" are defined as punctuation characters, not as numeric characters, and "**" is defined as a punctuation character, not as a lower case alphabetic.

Two character set profiles are added to the language: "ISO-8859-USA" and "ISO-8859-USA/M". "ISO-8859-USA" collates in the same way as "ASCII" and "ISO-8859-USA/M" collates in the same way as "M".

The order (weight) of the diacritical marks is:

1: ligature (Æ, æ, ß)
2: none
3: stroke (Ð, ð, Þ, þ)
4: acute (Á, á, É, é, Í, í, Ó, ó, Ú, ú, Ý, ý)
5: grave (À, à, È, è, Ì, ì, Ò, ò, Ù, ù)
6: caret or circumflex (Â, â, Ê, ê, Î, î, Ô, ô, Û, û)
7: diaeresis, trema or umlaut (Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü, Ÿ, ÿ)
8: tilde (Ã, ã, Ñ, ñ, Õ, õ)
9: ring (Å, å)
10: cedilla (Ç, ç)
11: slash (Ø, ø)
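
The weight table above can be expressed as a small lookup function. This is only a sketch: the label ACCWGT and the local array LIST are hypothetical names, the function is assumed to be called with a single character, and it covers only the third stage (diacritical marks) of the collating process, not the base letter or case stages.

```mumps
ACCWGT(CHAR) ; Hypothetical helper: weight of the diacritical mark on CHAR
 NEW I,LIST,W
 IF CHAR="" QUIT 2    ; an empty string would match every list below
 SET LIST(1)="Ææß"
 SET LIST(3)="ÐðÞþ"
 SET LIST(4)="ÁáÉéÍíÓóÚúÝý"
 SET LIST(5)="ÀàÈèÌìÒòÙù"
 SET LIST(6)="ÂâÊêÎîÔôÛû"
 SET LIST(7)="ÄäËëÏïÖöÜüŸÿ"
 SET LIST(8)="ÃãÑñÕõ"
 SET LIST(9)="Åå"
 SET LIST(10)="Çç"
 SET LIST(11)="Øø"
 SET W=2              ; weight 2: no diacritical mark
 SET I=""
 FOR  SET I=$ORDER(LIST(I)) QUIT:I=""  IF LIST(I)[CHAR SET W=I QUIT
 QUIT W
```

With these definitions, $$ACCWGT("é") returns 4 (acute) and $$ACCWGT("e") returns 2 (none).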


This document is © Ed de Moel, 1995-2005.
It is part of a book by Ed de Moel that is published under the title "M[UMPS] by Example" (ISBN 0-918118-42-5).
Printed copies of the book are no longer available.

This document describes the various special (system) variables that are defined in the M[UMPS] language standard (ANSI X11.1, ISO 11756).

The information in this document is NOT authoritative and is subject to change at any time.
Please consult the appropriate (draft) language standard for an authoritative definition.

In this document, information is included that will appear in future standards.
The MDC cannot guarantee that these 'next' standards will indeed appear.