The Annotated M[UMPS] Standards

^$CHARACTER

M[UMPS] by Example

Introduced in the 1995 ANSI M[UMPS] language standard.

This structured system variable provides information about character sets.
(Note that all usage of characters and strings in M[UMPS] is defined in terms of characters, not in terms of bytes; for the M[UMPS] language, it is not relevant whether a character is stored in a single byte, or in multiple bytes.)

In most character sets, the first 128 codes correspond to the ASCII set:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
0	Null	SOH	STX	ETX	EOT	ENQ	ACK	Bell	BS	HT	LF	VT	FF	CR	SO	SI
16	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
32	Space	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
48	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
64	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
80	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
96	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
112	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL

Many character sets contain 256 characters; the ‘upper’ 128, however, differ considerably between the various sets:

ISO–8859–1-USA

DOS™

DEC™

Apple Macintosh™

EBCDIC™

 Write !,"The following character sets are available:"
 Set SET=""
 For  Set SET=$Order(^$Character(SET)) Quit:SET=""  Do
 . Write !?5,SET
 . Quit

^$Character("DEC","INPUT","DOS")="$$DOS2DEC^MYSET"
Convert a value that is found in a “DOS” encoded variable, so that it may be manipulated in a “DEC” encoded environment.
Internal expansion: Set work=$$DOS2DEC^MYSET(fetch)
This conversion will be executed implicitly when one is working in a “DEC” encoded environment, and a command like Set X=^XXX(subs) is executed, while global variable ^XXX is “DOS” encoded.

$Get(^$Character("MYSET","INPUT","OTHERSET"))=""
When no input conversion algorithm is specified in the structured system variable, no implicit conversion takes place when moving information from one type of environment to the other.
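
The two cases above (an algorithm is specified, or the node is empty) can be pictured as explicit code. This is only a sketch of what an implementation does implicitly; the label FETCH, its parameters and the local variable names are hypothetical, and the node is read through $Get so that an undefined entry behaves like an empty one.

```mumps
FETCH(VALUE,ENV,SRC) ; hypothetical helper: convert VALUE from SRC encoding to ENV encoding
 New ALG,RESULT
 Set ALG=$Get(^$Character(ENV,"INPUT",SRC))
 ; no algorithm specified: the value is passed on unchanged
 If ALG="" Quit VALUE
 ; otherwise call the named extrinsic function, e.g. $$DOS2DEC^MYSET
 Xecute "Set RESULT="_ALG_"(VALUE)"
 Quit RESULT
```

Under these assumptions, Set work=$$FETCH(^XXX(subs),"DEC","DOS") would have the same effect as the internal expansion shown for the “DEC” environment.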

^$Character("DEC","OUTPUT","DOS")="$$DEC2DOS^MYSET"
Convert a value that is being manipulated in a “DEC” encoded environment, so that it may be stored in a “DOS” encoded variable.
Internal expansion: Set store=$$DEC2DOS^MYSET(work)
This conversion will be executed implicitly when one is working in a “DEC” encoded environment, and a command like Set ^XXX(subs)=X is executed, while global variable ^XXX is “DOS” encoded.

DEC2DOS(STRING) ; Convert DEC to DOS
 New DOS,DEC
;
; Character:
;               À   Á   Â   Ã   Ä  Å   Æ   Ç   È
 Set DOS=$Char(065,065,065,065,142,143,146,128,069)
 Set DEC=$Char(192,193,194,195,196,197,198,199,200)
;
;                   É   Ê   Ë   Ì  Í   Î   Ï   Ñ
 Set DOS=DOS_$Char(144,069,069,073,073,073,073,165)
 Set DEC=DEC_$Char(201,202,203,204,205,206,207,209)
;
;                   Ò   Ó   Ô   Õ  Ö   Œ   Ø   Ù
 Set DOS=DOS_$Char(079,079,079,079,153,079,079,085)
 Set DEC=DEC_$Char(210,211,212,213,214,215,216,217)
;
;                   Ú   Û   Ü   Ÿ   ß   à   á   â
 Set DOS=DOS_$Char(085,085,154,089,225,133,160,131)
 Set DEC=DEC_$Char(218,219,220,221,223,224,225,226)
;
;                   ã   ä   å   æ  ç   è   é   ê
 Set DOS=DOS_$Char(097,132,134,145,135,138,130,136)
 Set DEC=DEC_$Char(227,228,229,230,231,232,233,234)
;
;                   ë   ì   í   î   ï   ñ   ò   ó
 Set DOS=DOS_$Char(137,141,161,140,139,164,149,162)
 Set DEC=DEC_$Char(235,236,237,238,239,241,242,243)
;
;                   ô   õ   ö   œ   ø   ù   ú   û
 Set DOS=DOS_$Char(147,111,148,111,237,151,163,150)
 Set DEC=DEC_$Char(244,245,246,247,248,249,250,251)
;
;                   ü   ÿ
 Set DOS=DOS_$Char(129,152)
 Set DEC=DEC_$Char(252,253)
;
 Quit $TRanslate(STRING,DEC,DOS)

In this example, input conversion and output conversion look very much alike. Things get more interesting when an idiom that is a single character in one set translates into multiple characters in the other set (umlauts, ligatures, characters with a special form at the end of a word, et cetera).
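
A one-to-many conversion cannot be expressed with $TRanslate alone, because that function replaces one character by at most one character. The following is a minimal sketch of the alternative, assuming that the sharp s has code 223 (as it does in the DEC table above); the label EXPAND is hypothetical.

```mumps
EXPAND(STRING) ; hypothetical: replace each sharp s by the two characters "ss"
 New POS,SZ
 Set SZ=$Char(223) ; sharp s, assuming a DEC or ISO-8859-1 encoding
 ; $Find returns the position just after the match, or 0 when there is none
 For  Set POS=$Find(STRING,SZ) Quit:'POS  Do
 . Set STRING=$Extract(STRING,1,POS-2)_"ss"_$Extract(STRING,POS,$Length(STRING))
 . Quit
 Quit STRING
```

With this sketch, $$EXPAND("Stra"_$Char(223)_"e") returns "Strasse". The loop ends because the replacement text never contains the character being replaced.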

^$Character("MYSET","OUTPUT","OTHERSET")=""
When no output conversion algorithm is specified in the structured system variable, no implicit conversion takes place when moving information from one type of environment to the other.

^$Character("MYSET","IDENT")="$$IDENT^MYSET"
Check whether a character is valid in a name. This check applies to characters other than %, the upper case and lower case alphabetics and the digits 0 through 9, which are always valid.
Internal expansion: Set check=$$IDENT^MYSET($ASCII(char))

IDENTM(ASCII) ; For M
 Quit 0

IDENTDOS(ASCII) ; For DOS
 If ASCII>127,ASCII<166 Quit 1
 Quit 0

IDENTDEC(ASCII) ; For DEC
 If ASCII>191,ASCII<222,ASCII'=208 Quit 1
 If ASCII>222,ASCII<254,ASCII'=240 Quit 1
 Quit 0

^$Character("MYSET","IDENT")=""
If no identification algorithm is specified, the characters that are used for identifiers in the ASCII (or “M”) character set are assumed.
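
The way the identification algorithm combines with the characters that are always valid can be sketched as follows. The label VALID and its parameters are hypothetical, and the sketch assumes that the first 128 codes of the character set correspond to ASCII.

```mumps
VALID(NAME,CHARSET) ; hypothetical: is NAME acceptable as a name in CHARSET?
 New A,ALG,I,OK
 Set ALG=$Get(^$Character(CHARSET,"IDENT"))
 Set OK=NAME'=""
 For I=1:1:$Length(NAME) Quit:'OK  Do
 . Set A=$ASCII(NAME,I)
 . ; % (code 37) is valid as the first character only
 . If A=37 Set OK=I=1 Quit
 . ; upper case and lower case alphabetics are always valid
 . If A>64,A<91 Quit
 . If A>96,A<123 Quit
 . ; digits are valid, except as the first character
 . If A>47,A<58 Set OK=I>1 Quit
 . ; any other character is valid only if the IDENT algorithm accepts it
 . If ALG="" Set OK=0 Quit
 . Xecute "Set OK="_ALG_"(A)"
 . Quit
 Quit OK
```

When no identification algorithm is defined for “M”, $$VALID("ABC1","M") yields 1 and $$VALID("1ABC","M") yields 0.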

^$Character("MYSET","PATCODE","P")="$$PATP^MYSET"
In order to verify whether a character matches the patcode P, the function that is specified in this node of ^$CHARACTER is used.
Internal expansion: Set check=$$PATP^MYSET($ASCII(char))
This check will be executed implicitly when an expression like X?1P is evaluated.

Valid codes for the third subscript (the patcode) are any single ‘identifier’ character, including the ones pre-defined in the ANSI standard, but excluding Y and Z. See ^$Character(..."IDENT") for the specification of which characters can be used in identifiers.
Codes like ZxxxZ (xxx may be any sequence of valid ‘identifier’ characters) are reserved for implementation-specific extensions.
Codes like YxxxY (xxx may be any sequence of valid ‘identifier’ characters) are reserved for application-specific extensions.

PATU(ASCII) ; For DEC
 If ASCII>64,ASCII<91 Quit 1
 If ASCII>191,ASCII<222,ASCII'=208 Quit 1
 Quit 0

PATL(ASCII) ; For DEC
 If ASCII>96,ASCII<123 Quit 1
 If ASCII>223,ASCII<254,ASCII'=240 Quit 1
 Quit 0

$Get(^$Character("MYSET","PATCODE",patcode))=""
When no pattern check algorithm is defined for a certain pattern code, no characters in the character set will match that pattern code.

^$Character("MYSET","COLLATE")="$$COLLATE^MYSET"
Convert a string to an internal format that is used for establishing a collation sequence.
Internal expansion: Set intern=$$COLLATE^MYSET(string)

$Get(^$Character("MYSET","COLLATE"))=""
When no specific collating transformation is defined, the string itself is used for collating purposes.

NOCASE(STRING) ; Case insensitive collating
 New LO,UP
 Set UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 Set LO="abcdefghijklmnopqrstuvwxyz"
 Quit $TRanslate(STRING,LO,UP)
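
To see the effect of such a transformation without changing ^$CHARACTER, the collating value can be used as a subscript, so that $Order returns the entries in the transformed order. This is a sketch only: the label SORT and its parameters are hypothetical, and NOCASE is assumed to be a label in the same routine.

```mumps
SORT(LIST,SORTED) ; hypothetical: order the values in LIST without regard to case
 New I,KEY
 Kill SORTED
 Set I=""
 For  Set I=$Order(LIST(I)) Quit:I=""  Do
 . Set KEY=$$NOCASE(LIST(I))
 . ; an empty string cannot be used as a subscript, so empty values are skipped
 . If KEY="" Quit
 . ; the second subscript keeps values with equal keys apart
 . Set SORTED(KEY,I)=LIST(I)
 . Quit
 Quit
```

After Set NAMES(1)="delta",NAMES(2)="Alpha",NAMES(3)="Charlie",NAMES(4)="bravo" and Do SORT(.NAMES,.OUT), a traversal of OUT with $Order visits Alpha, bravo, Charlie and delta in that order. Keys that look like numbers would still be ordered numerically by the default collation, which this sketch does not address.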

FRENCH(STRING) ; French collating
 New ACCNT,CHARI,CHARN,FIRST,LO,P1,P2,SECOND,THIRD,TMP,UP
 ; Collating according to the algorithm by
 ; Alain LaBonté
 ; As published by ISO on 12 August 1988
 Set LO="abcdefghijklmnopqrstuvwxyz"
 Set UP="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 Set TMP=$Length(STRING)+2
 ; Expand the ligatures into two letters each
 For  Quit:STRING'["Æ"  Do
 . Set P1=$Piece(STRING,"Æ",1)
 . Set P2=$Piece(STRING,"Æ",2,TMP)
 . Set STRING=P1_"AE"_P2 Quit
 For  Quit:STRING'["æ"  Do
 . Set P1=$Piece(STRING,"æ",1)
 . Set P2=$Piece(STRING,"æ",2,TMP)
 . Set STRING=P1_"ae"_P2 Quit
 For  Quit:STRING'["Œ"  Do
 . Set P1=$Piece(STRING,"Œ",1)
 . Set P2=$Piece(STRING,"Œ",2,TMP)
 . Set STRING=P1_"OE"_P2 Quit
 For  Quit:STRING'["œ"  Do
 . Set P1=$Piece(STRING,"œ",1)
 . Set P2=$Piece(STRING,"œ",2,TMP)
 . Set STRING=P1_"oe"_P2 Quit
 Set CHARI="ÂâÀàÇçÉéÊêÈèËëÎîÏïÔôÛûÙùÜüŸÿ"
 Set CHARN="AaAaCcEeEeEeEeIiIiOoUuUuUuYy"
 Set ACCNT="3322551133224433443333224444"
 ; FIRST: base letters, with case removed
 Set THIRD=$TRanslate(STRING,CHARI,CHARN)
 Set FIRST=$TRanslate(THIRD,UP,LO)
 ; SECOND: accent codes of the accented characters, in reverse order
 Set TMP=$TRanslate(STRING,$TRanslate(STRING,CHARI))
 Set SECOND=$REverse($TRanslate(TMP,CHARI,ACCNT))
 ; THIRD: case pattern, 8 for lower case and 9 for upper case
 Set TMP=$TRanslate($Justify("",26)," ",8)
 Set THIRD=$TRanslate(THIRD,LO,TMP)
 Set TMP=$TRanslate(TMP,8,9)
 Set THIRD=$TRanslate(THIRD,UP,TMP)
 Quit FIRST_SECOND_THIRD

The 1995 ANSI M[UMPS] language specification defines the character set profiles for the character sets “ASCII” (based on ANSI X3.4–1990), “M” (which is identical to “ASCII”, except for the collation order) and “JIS90” (based on JIS X0201–1990 and JIS X0208–1990).

Additions in a future M[UMPS] language specification.

A number of characters have been added to the list of valid characters in the name of a character set (but not as the first character of that name): “-” (hyphen, dash), “_” (underscore), “%”, “*”, “.”, “/”, “:”, “$”, “!” and “@”.

The collating algorithm for ISO–8859–1-USA has been defined as a three stage process: first the “base” letter counts, then the case, and then the diacritical marks. “Æ” is collated as if it were “AE”, “æ” is collated as if it were “ae” and “ß” is collated as if it were “ss”. This selection of collating ligatures conforms to ISO 6937.

Pattern matches conform to ISO’s rules, i.e. “ª”, “º”, “¹”, “²”, “³”, “¼”, “½” and “¾” are defined as punctuation characters, not as numeric characters, and “µ” is defined as a punctuation character, not as a lower case alphabetic.

Two character set profiles are added to the language: “ISO–8859–1-USA” and “ISO–8859–1-USA/M”. “ISO–8859–1-USA” collates equivalently to “ASCII” and “ISO–8859–1-USA/M” collates equivalently to “M”.

The order (weight) of the diacritical marks is:

1:	ligature (Æ, æ, ß)
2:	none
3:	stroke (Ð, ð, Þ, þ)
4:	acute (Á, á, É, é, Í, í, Ó, ó, Ú, ú, Ý, ý)
5:	grave (À, à, È, è, Ì, ì, Ò, ò, Ù, ù)
6:	caret or circumflex (Â, â, Ê, ê, Î, î, Ô, ô, Û, û)
7:	diaeresis, trema or umlaut (Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü, Ÿ, ÿ)
8:	tilde (Ã, ã, Ñ, ñ, Õ, õ)
9:	ring (Å, å)
10:	cedilla (Ç, ç)
11:	slash (Ø, ø)
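
The weights listed above can be pictured as a simple lookup. The following is a sketch only, not part of any standard: the label WEIGHT is hypothetical, the character lists are copied from the table above, and the code assumes an implementation in which each of these characters counts as a single character.

```mumps
WEIGHT(CHAR) ; hypothetical: weight of the diacritical mark carried by CHAR
 New MARK,W
 ; an argument that is not a single character is treated as unmarked
 If $Length(CHAR)'=1 Quit 2
 Set MARK(1)="Ææß"
 Set MARK(3)="ÐðÞþ"
 Set MARK(4)="ÁáÉéÍíÓóÚúÝý"
 Set MARK(5)="ÀàÈèÌìÒòÙù"
 Set MARK(6)="ÂâÊêÎîÔôÛû"
 Set MARK(7)="ÄäËëÏïÖöÜüŸÿ"
 Set MARK(8)="ÃãÑñÕõ"
 Set MARK(9)="Åå"
 Set MARK(10)="Çç"
 Set MARK(11)="Øø"
 Set W=""
 ; stop at the first list that contains the character
 For  Set W=$Order(MARK(W)) Quit:W=""  Quit:MARK(W)[CHAR
 ; weight 2 ("none") applies when no list contains the character
 Quit $Select(W="":2,1:W)
```

With these lists, $$WEIGHT("é") returns 4 and $$WEIGHT("e") returns 2. A complete collation routine would apply such weights only in the third stage, after the base letters and the case have been compared.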

Copyright © Standard Documents: 1977-2024 MUMPS Development Committee;
Copyright © Examples: 1995-2024 Ed de Moel;
Copyright © Annotations: 2003-2008 Jacquard Systems Research;
Copyright © Annotations: 2008-2024 Ed de Moel.

The information on this page is NOT authoritative and is subject to modification at any moment.
Please consult the appropriate (draft) language standard for an authoritative definition.

Some specifications are "approved for inclusion in a future standard". Note that the MUMPS Development Committee cannot guarantee that such future standards will indeed be published.

This page most recently updated on 17-Nov-2023, 11:39:21.

For comments, contact Ed de Moel (demoel@jacquardsystems.com)