^$CHARACTER
M[UMPS] by Example
Introduced in the 1995 ANSI M[UMPS] language standard.
This structured system variable provides information about
character sets.
(Note that all usage of characters and strings in M[UMPS] is
defined in terms of characters, not in terms of bytes; for the
M[UMPS] language, it is not relevant whether a character is
stored in a single byte, or in multiple bytes.)
In most character sets, the first 128 codes correspond to the ASCII set:
Base | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Null | SOH | STX | ETX | EOT | ENQ | ACK | Bell | BS | HT | LF | VT | FF | CR | SO | SI |
16 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
32 | Space | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
48 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
64 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
80 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
96 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
112 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
Many character sets contain 256 characters; the ‘upper’ 128, however, are quite different in the various sets. Each of the following sets has its own table:
ISO–8859–1-USA
DOS™
DEC™
Apple Macintosh™
EBCDIC™
 Write !,"The following character sets are available:"
 Set SET=""
 For  Set SET=$Order(^$Character(SET)) Quit:SET=""  Do
 . Write !?5,SET
 . Quit
^$Character("DEC","INPUT","DOS")="$$DOS2DEC^MYSET"
Convert a value that is found in a “DOS” encoded variable, so
that it may be manipulated in a “DEC” encoded environment.
Internal expansion: Set work=$$DOS2DEC^MYSET(fetch)
This conversion will be executed implicitly when one is working
in a “DEC” encoded environment, and a command like Set
X=^XXX(subs) is executed, while global variable
^XXX is “DOS” encoded.
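The routine named in this node is not shown on this page. A minimal sketch follows, assuming the same DOS and DEC code points as the DEC2DOS example and covering only the accented letters that have a code of their own in the DOS set. The choice of which characters to include is an assumption made for illustration.

```mumps
DOS2DEC(STRING) ; Convert DOS to DEC (sketch; label taken from the node value above)
 New DOS,DEC
 ; DOS codes 128-141: Ç ü é â ä à å ç ê ë è ï î ì
 Set DOS=$Char(128,129,130,131,132,133,134,135,136,137,138,139,140,141)
 Set DEC=$Char(199,252,233,226,228,224,229,231,234,235,232,239,238,236)
 ; DOS codes 142-154: Ä Å É æ Æ ô ö ò û ù ÿ Ö Ü
 Set DOS=DOS_$Char(142,143,144,145,146,147,148,149,150,151,152,153,154)
 Set DEC=DEC_$Char(196,197,201,230,198,244,246,242,251,249,253,214,220)
 ; DOS codes 160-165 and 225: á í ó ú ñ Ñ ß
 Set DOS=DOS_$Char(160,161,162,163,164,165,225)
 Set DEC=DEC_$Char(225,237,243,250,241,209,223)
 ; Every other character is passed through unchanged
 Quit $TRanslate(STRING,DOS,DEC)
```

Because several DEC characters are mapped onto one plain letter on the way out (for instance À, Á, Â and Ã all become A), a round trip through both conversions cannot restore those characters.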
$Get(^$Character("MYSET","INPUT","OTHERSET"))=""
When no input conversion algorithm is specified in the structured
system variable, no implicit conversion takes place when moving
information from one type of environment to the other.
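The implicit behaviour described above can be imitated by hand, which may help when experimenting. The sketch below assumes a helper named CONVERT (hypothetical, not part of the standard) that looks up the node, returns the value untouched when no algorithm is specified, and otherwise applies the named function through indirection.

```mumps
CONVERT(VALUE,ENV,SRC) ; Hypothetical helper: apply the input conversion explicitly
 New FN,OUT
 ; $Get yields "" when the node does not exist
 Set FN=$Get(^$Character(ENV,"INPUT",SRC))
 ; No algorithm specified: no conversion takes place
 If FN="" Quit VALUE
 ; Otherwise execute, for example, Set OUT=$$DOS2DEC^MYSET(VALUE)
 Set @("OUT="_FN_"(VALUE)")
 Quit OUT
```

With the node shown earlier in place, Set work=$$CONVERT(fetch,"DEC","DOS") would have the same effect as the internal expansion given there.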
^$Character("DEC","OUTPUT","DOS")="$$DEC2DOS^MYSET"
Convert a value that is being manipulated in a “DEC” encoded
environment, so that it may be stored in a “DOS” encoded
variable.
Internal expansion: Set store=$$DEC2DOS^MYSET(work)
This conversion will be executed implicitly when one is working
in a “DEC” encoded environment, and a command like Set
^XXX(subs)=X is executed, while global variable
^XXX is “DOS” encoded.
DEC2DOS(STRING) ; Convert DEC to DOS
 New DOS,DEC
 ;
 ; Character:
 ;              À   Á   Â   Ã   Ä   Å   Æ   Ç   È
 Set DOS=$Char(065,065,065,065,142,143,146,128,069)
 Set DEC=$Char(192,193,194,195,196,197,198,199,200)
 ;
 ;                  É   Ê   Ë   Ì   Í   Î   Ï   Ñ
 Set DOS=DOS_$Char(144,069,069,073,073,073,073,165)
 Set DEC=DEC_$Char(201,202,203,204,205,206,207,209)
 ;
 ;                  Ò   Ó   Ô   Õ   Ö   Œ   Ø   Ù
 Set DOS=DOS_$Char(079,079,079,079,153,079,079,085)
 Set DEC=DEC_$Char(210,211,212,213,214,215,216,217)
 ;
 ;                  Ú   Û   Ü   Ÿ   ß   à   á   â
 Set DOS=DOS_$Char(085,085,154,089,225,133,160,131)
 Set DEC=DEC_$Char(218,219,220,221,223,224,225,226)
 ;
 ;                  ã   ä   å   æ   ç   è   é   ê
 Set DOS=DOS_$Char(097,132,134,145,135,138,130,136)
 Set DEC=DEC_$Char(227,228,229,230,231,232,233,234)
 ;
 ;                  ë   ì   í   î   ï   ñ   ò   ó
 Set DOS=DOS_$Char(137,141,161,140,139,164,149,162)
 Set DEC=DEC_$Char(235,236,237,238,239,241,242,243)
 ;
 ;                  ô   õ   ö   œ   ø   ù   ú   û
 Set DOS=DOS_$Char(147,111,148,111,237,151,163,150)
 Set DEC=DEC_$Char(244,245,246,247,248,249,250,251)
 ;
 ;                  ü   ÿ
 Set DOS=DOS_$Char(129,152)
 Set DEC=DEC_$Char(252,253)
 ;
 Quit $TRanslate(STRING,DEC,DOS)
In this example, input conversion and output conversion look very much alike. Things get more interesting when an idiom that is a single character in one set translates into multiple characters in the other set (umlauts, ligatures, characters with a special form at the end of a word, et cetera).
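$TRanslate can only map one character onto one character, so a one-to-many conversion needs a different idiom. A minimal sketch follows, assuming a helper named EXPAND (hypothetical) and a non-empty FROM argument. It rebuilds the string piece by piece with the replacement between the pieces.

```mumps
EXPAND(STRING,FROM,TO) ; Hypothetical helper: replace every FROM by TO
 New I,OUT
 ; The first piece is everything before the first occurrence of FROM
 Set OUT=$Piece(STRING,FROM,1)
 ; $Length(STRING,FROM) is the number of pieces (occurrences + 1)
 For I=2:1:$Length(STRING,FROM) Set OUT=OUT_TO_$Piece(STRING,FROM,I)
 Quit OUT
```

For example, $$EXPAND("Straße","ß","ss") yields "Strasse". When FROM does not occur in STRING, the loop body is never executed and the string is returned unchanged.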
^$Character("MYSET","OUTPUT","OTHERSET")=""
When no output conversion algorithm is specified in the
structured system variable, no implicit conversion takes place
when moving information from one type of environment to the
other.
^$Character("MYSET","IDENT")="$$IDENT^MYSET"
Check whether a character is valid in a name. This applies to
characters other than %, the upper case and lower case
alphabetics, and the digits 0 through 9, which are always
valid.
Internal expansion: Set check=$$IDENT^MYSET($ASCII(char))
IDENTM(ASCII) ; For M
 Quit 0
IDENTDOS(ASCII) ; For DOS
 If ASCII>127,ASCII<166 Quit 1
 Quit 0
IDENTDEC(ASCII) ; For DEC
 If ASCII>191,ASCII<222,ASCII'=208 Quit 1
 If ASCII>222,ASCII<254,ASCII'=240 Quit 1
 Quit 0
^$Character("MYSET","IDENT")=""
If no identification algorithm is specified, the characters that
are used for identifiers in the ASCII (or “M”) character
set are assumed.
^$Character("MYSET","PATCODE","P")="$$PATP^MYSET"
To verify whether a character matches the patcode
P, the function that is specified in this node of
^$CHARACTER is used.
Internal expansion: Set check=$$PATP^MYSET($ASCII(char))
This check will be executed implicitly when an expression like
X?1P is evaluated.
Valid codes for the third subscript (the patcode) are any single
‘identifier’ character other than Y and Z, including the codes
that are pre-defined in the ANSI standard.
See ^$Character(...,"IDENT") for the
specification of which characters can be used in identifiers.
Codes like ZxxxZ (xxx may be any sequence of
valid ‘identifier’ characters) are reserved for
implementation-specific extensions.
Codes like YxxxY (xxx may be any sequence of
valid ‘identifier’ characters) are reserved for
application-specific extensions.
PATU(ASCII) ; For DEC
 If ASCII>64,ASCII<91 Quit 1
 If ASCII>191,ASCII<222,ASCII'=208 Quit 1
 Quit 0
PATL(ASCII) ; For DEC
 If ASCII>96,ASCII<123 Quit 1
 If ASCII>223,ASCII<254,ASCII'=240 Quit 1
 Quit 0
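An application-specific patcode of the YxxxY form could be backed by a function of the same shape. The sketch below is purely illustrative: the patcode YVY, the label PATYVY and the node ^$Character("MYSET","PATCODE","YVY")="$$PATYVY^MYSET" are all assumptions, not taken from the standard.

```mumps
PATYVY(ASCII) ; Hypothetical application-specific patcode YVY: unaccented vowels
 ; The "contains" operator yields 1 when the character is in the list, else 0
 Quit "AEIOUaeiou"[$Char(ASCII)
```

The function receives the code of a single character, as in the internal expansion shown above, and returns 1 or 0 just like PATU and PATL.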
$Get(^$Character("MYSET","PATCODE",patcode))=""
When no pattern check algorithm is defined for a certain pattern
code, no characters in the character set will match that pattern
code.
^$Character("MYSET","COLLATE")="$$COLLATE^MYSET"
Convert a string to an internal format that is used for
establishing a collation sequence.
Internal expansion: Set intern=$$COLLATE^MYSET(string)
$Get(^$Character("MYSET","COLLATE"))=""
When no specific collating transformation is defined, the string
itself is used for collating purposes.
NOCASE(STRING) ; Case insensitive collating
 New LO,UP
 Set UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 Set LO="abcdefghijklmnopqrstuvwxyz"
 Quit $TRanslate(STRING,LO,UP)
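To see what a collating transformation does, its result can be used explicitly as a sort key. A minimal sketch follows, assuming a helper named SORT (hypothetical) that lives in the same routine as NOCASE, receives both arrays by reference, and is given non-empty words only.

```mumps
SORT(LIST,OUT) ; Hypothetical helper: LIST(n)=word in, OUT(n)=word out in key order
 New I,KEY,N,TMP,W
 Kill OUT
 ; File every word under its collating key; subscripts come back in order
 Set I=""
 For  Set I=$Order(LIST(I)) Quit:I=""  Set TMP($$NOCASE(LIST(I)),LIST(I))=""
 ; Walk the keys, then the words that share a key
 Set N=0,KEY=""
 For  Set KEY=$Order(TMP(KEY)) Quit:KEY=""  Do
 . Set W=""
 . For  Set W=$Order(TMP(KEY,W)) Quit:W=""  Set N=N+1,OUT(N)=W
 . Quit
 Quit
```

After Set WORDS(1)="cherry",WORDS(2)="Banana",WORDS(3)="apple" and Do SORT(.WORDS,.SORTED), the result is SORTED(1)="apple", SORTED(2)="Banana", SORTED(3)="cherry". Identical words are filed under the same subscripts and therefore appear only once.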
FRENCH(STRING) ; French collating
 New ACCNT,CHARI,CHARN,FIRST,LO,P1,P2,SECOND,THIRD,TMP,UP
 ; Collating according to the algorithm by
 ; Alain LaBonté
 ; As published by ISO on 12 August 1988
 Set LO="abcdefghijklmnopqrstuvwxyz"
 Set UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 Set TMP=$Length(STRING)+2
 ; Expand ligatures into two letters
 For  Quit:STRING'["Æ"  Do
 . Set P1=$Piece(STRING,"Æ",1)
 . Set P2=$Piece(STRING,"Æ",2,TMP)
 . Set STRING=P1_"AE"_P2 Quit
 For  Quit:STRING'["æ"  Do
 . Set P1=$Piece(STRING,"æ",1)
 . Set P2=$Piece(STRING,"æ",2,TMP)
 . Set STRING=P1_"ae"_P2 Quit
 For  Quit:STRING'["Œ"  Do
 . Set P1=$Piece(STRING,"Œ",1)
 . Set P2=$Piece(STRING,"Œ",2,TMP)
 . Set STRING=P1_"OE"_P2 Quit
 For  Quit:STRING'["œ"  Do
 . Set P1=$Piece(STRING,"œ",1)
 . Set P2=$Piece(STRING,"œ",2,TMP)
 . Set STRING=P1_"oe"_P2 Quit
 ; Accented letters, their base letters and their accent weights
 Set CHARI="ÂâÀàÇçÉéÊêÈèËëÎîÏïÔôÛûÙùÜüŸÿ"
 Set CHARN="AaAaCcEeEeEeEeIiIiOoUuUuUuYy"
 Set ACCNT="3322551133224433443333224444"
 ; FIRST: base letters, case ignored
 Set THIRD=$TRanslate(STRING,CHARI,CHARN)
 Set FIRST=$TRanslate(THIRD,UP,LO)
 ; SECOND: weights of the accented letters only, in reverse order
 Set TMP=$TRanslate(STRING,$TRanslate(STRING,CHARI))
 Set SECOND=$REverse($TRanslate(TMP,CHARI,ACCNT))
 ; THIRD: case of each letter (8 = lower case, 9 = upper case)
 Set TMP=$TRanslate($Justify("",26)," ",8)
 Set THIRD=$TRanslate(THIRD,LO,TMP)
 Set TMP=$TRanslate(TMP,8,9)
 Set THIRD=$TRanslate(THIRD,UP,TMP)
 Quit FIRST_SECOND_THIRD
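The nodes discussed so far can be inspected together for any one character set. A minimal sketch follows, assuming a helper named SHOW (hypothetical) and using only $Get and $Order on ^$Character, as the examples above do.

```mumps
SHOW(SET) ; Hypothetical helper: list the algorithms defined for one character set
 New P
 ; An empty value means that no algorithm is specified
 Write !,"IDENT:   ",$Get(^$Character(SET,"IDENT"))
 Write !,"COLLATE: ",$Get(^$Character(SET,"COLLATE"))
 ; Conversions from and to the other character sets
 Set P=""
 For  Set P=$Order(^$Character(SET,"INPUT",P)) Quit:P=""  Write !,"INPUT from ",P,": ",^$Character(SET,"INPUT",P)
 Set P=""
 For  Set P=$Order(^$Character(SET,"OUTPUT",P)) Quit:P=""  Write !,"OUTPUT to ",P,": ",^$Character(SET,"OUTPUT",P)
 ; One line per patcode that has a check function
 Set P=""
 For  Set P=$Order(^$Character(SET,"PATCODE",P)) Quit:P=""  Write !,"PATCODE ",P,": ",^$Character(SET,"PATCODE",P)
 Quit
```

A call such as Do SHOW("DEC") would then print one line per defined node. Which nodes actually exist depends on the implementation.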
The 1995 ANSI M[UMPS] language specification defines the character set profiles for the character sets “ASCII” (based on ANSI X3.4–1990), “M” (which is identical to “ASCII”, except for the collation order) and “JIS90” (based on JIS X0201–1990 and JIS X0208–1990).
Additions in a future M[UMPS] language specification.
A number of characters have been added to the list of valid characters in the name of a character set (but not as the first character of that name): “-” (hyphen, dash), “_” (underscore), “%”, “*”, “.”, “/”, “:”, “$”, “!” and “@”.
The collating algorithm for ISO–8859–1-USA has been defined as a three-stage process: first the “base” letter counts, then the case, and then the diacritical marks. “Æ” is collated as if it were “AE”, “æ” as if it were “ae” and “ß” as if it were “ss”. This selection of collating ligatures conforms to ISO 6937.
Pattern matches conform to ISO’s rules, i.e. “ª”, “º”, “¹”, “²”, “³”, “¼”, “½” and “¾” are defined as punctuation characters, not as numeric characters, and “µ” is defined as a punctuation character, not as a lower case alphabetic.
Two character set profiles are added to the language: “ISO–8859–1-USA” and “ISO–8859–1-USA/M”. “ISO–8859–1-USA” collates equivalently to “ASCII” and “ISO–8859–1-USA/M” collates equivalently to “M”.
The order (weight) of the diacritical marks is:
Weight | Diacritical mark |
---|---|
1 | ligature (Æ, æ, ß) |
2 | none |
3 | stroke (Ð, ð, Þ, þ) |
4 | acute (Á, á, É, é, Í, í, Ó, ó, Ú, ú, Ý, ý) |
5 | grave (À, à, È, è, Ì, ì, Ò, ò, Ù, ù) |
6 | caret or circumflex (Â, â, Ê, ê, Î, î, Ô, ô, Û, û) |
7 | diaeresis, trema or umlaut (Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü, Ÿ, ÿ) |
8 | tilde (Ã, ã, Ñ, ñ, Õ, õ) |
9 | ring (Å, å) |
10 | cedilla (Ç, ç) |
11 | slash (Ø, ø) |
Copyright © Standard Documents: 1977-2024 MUMPS Development Committee;
Copyright © Examples: 1995-2024 Ed de Moel;
Copyright © Annotations: 2003-2008 Jacquard Systems Research
Copyright © Annotations: 2008-2024 Ed de Moel.
The information in this page is NOT authoritative and is subject to
modification at any moment.
Please consult the appropriate (draft) language standard for an
authoritative definition.
Some specifications are "approved for inclusion in a future standard". Note that the MUMPS Development Committee cannot guarantee that such future standards will indeed be published.
This page most recently updated on 17-Nov-2023, 11:39:21.
For comments, contact Ed de Moel (demoel@jacquardsystems.com)