Pattern Match Operator

M[UMPS] by Example

Relational operator ‘matches pattern’ (?)

Introduced in the 1977 ANSI M[UMPS] language standard.

The pattern-codes to be used in pattern-matching are:
A: the 26 upper and 26 lower-case alphabetic characters
C: the 33 control-characters
E: the 128 characters in the ASCII set
L: the 26 lower-case characters
N: the 10 digits
P: the 33 punctuation-characters
U: the 26 upper-case characters

Modified for internationalization in the 1995 ANSI M[UMPS] language standard:
A: upper and lower-case characters
C: control-characters
E: all characters in the character set
L: lower-case characters
N: digits
P: punctuation-characters
U: upper-case characters
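
As a brief illustration of how these codes are used: each pattern atom below is a repeat count followed by one or more pattern codes or a string literal, and a leading period stands for "any number of". The test values are invented for the example, and the results in the comments assume the ASCII definitions of the codes.

```mumps
 ; Illustrative values only; results assume the ASCII pattern codes.
 Write "Ab3"?1U1L1N,!         ; 1: one upper-case, one lower-case, one digit
 Write "ab3"?1U1L1N,!         ; 0: first character is not upper-case
 Write "A1b2"?.AN,!           ; 1: any number of letters or digits
 Write "A1-b2"?.AN,!          ; 0: "-" is neither a letter nor a digit
 Write "Hi, there"?.E1P.E,!   ; 1: contains at least one punctuation character
```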

1 (true) when the value of variable X matches a pattern that "looks like" 3 digits, one point, 2 digits and any number of upper-case symbols, 0 (false) otherwise:
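
A pattern that fits this description can be sketched as follows; the test values are invented for illustration.

```mumps
 Set X="123.45ABC"
 Write X?3N1"."2N.U,!   ; 1
 Set X="123.45"
 Write X?3N1"."2N.U,!   ; 1: .U also accepts zero upper-case characters
 Set X="123.45abc"
 Write X?3N1"."2N.U,!   ; 0: lower-case letters do not match U
```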

1 (true) when the value of variable X matches the pattern of a 1985 Dutch license plate, 0 (false) otherwise:
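
A possible form of such a test is sketched below. It assumes a plate layout of two upper-case letters, two digits and two upper-case letters, separated by hyphens. Both the layout and the sample plates are assumptions made for this example.

```mumps
 ; Assumed layout: 2 letters, hyphen, 2 digits, hyphen, 2 letters.
 Set X="DB-12-XY"            ; hypothetical plate
 Write X?2U1"-"2N1"-"2U,!    ; 1
 Set X="DB-123-X"
 Write X?2U1"-"2N1"-"2U,!    ; 0: digit and letter groups have the wrong lengths
```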

1 (true) when the value of variable X is a positive number, less than 1000 with at least 1 and at most 5 digits following the decimal point, 0 (false) otherwise:
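
One way to write this test, shown as a sketch with invented values:

```mumps
 Set X="12.5"
 Write X?1.3N1"."1.5N,!   ; 1
 Set X="1000.5"
 Write X?1.3N1"."1.5N,!   ; 0: four digits before the decimal point
 Set X="12.123456"
 Write X?1.3N1"."1.5N,!   ; 0: six digits after the decimal point
 Set X="12"
 Write X?1.3N1"."1.5N,!   ; 0: no decimal point
```

Because the pattern only checks the shape of the string, a value such as "0.0" also matches; a separate X>0 test would be needed to insist on a strictly positive number.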

Addition in the 1995 ANSI M[UMPS] language standard:

In order to support the Japanese character sets, two new pattern identifiers are added:

KA for half-width Katakana ($Char(161) - $Char(223))
ZEN for full-width (Zenkaku) JIS characters ($Char(8481) - $Char(32382))

Additions in the 1995 ANSI M[UMPS] language standard (and correction in a future M[UMPS] language standard):

The concept of ‘alternation’ is introduced. An ‘alternation’ is a list of alternative patterns, any one of which is a valid match at that point in the pattern.

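Matching a value against an alternation is equivalent to matching it against each of the listed alternatives separately and accepting the value if any one of them matches.

An alternation with two suitable alternatives would match both "12-345-6" and "12-3:4567"; in general, it would match any of the strings described by its alternatives.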
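
One pattern that accepts both of the strings quoted above is sketched here. The two alternatives are an assumption chosen to fit those strings, and the last line shows the same test written without alternation, using the logical OR operator (!).

```mumps
 ; Hypothetical alternation with two alternatives.
 Set X="12-345-6"
 Write X?1(2N1"-"3N1"-"1N,2N1"-"1N1":"4N),!      ; 1 (first alternative)
 Set X="12-3:4567"
 Write X?1(2N1"-"3N1"-"1N,2N1"-"1N1":"4N),!      ; 1 (second alternative)
 ; The same test written without alternation:
 Write (X?2N1"-"3N1"-"1N)!(X?2N1"-"1N1":"4N),!   ; 1
```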

Approved for addition in a future M[UMPS] language standard:

In order to support the character set ISO-8859-1/USA, a new pattern identifier is added:

I: "International" characters (any non-ASCII characters in ISO-8859-1/USA).

It is made possible to exclude certain patterns:

1 (true) when the value of variable X does not contain any control characters, 0 (false) otherwise:
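
For comparison, the same condition can be expressed without the exclusion feature by using the negated operator ('?) on a pattern that looks for a control character anywhere in the value. This is a sketch with invented values, not the proposed exclusion syntax.

```mumps
 Set X="plain text"
 Write X'?.E1C.E,!              ; 1: no control character found
 Set X="tab"_$Char(9)_"here"
 Write X'?.E1C.E,!              ; 0: $Char(9) is a control character
```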

1 (true) when the value of variable X starts and ends with the letter "Y", and no other occurrences of that letter are present in that value, 0 (false) otherwise:
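
The same condition can also be approximated without the exclusion feature. The sketch below combines an ordinary pattern match with a count of the "Y"-delimited pieces: a value with exactly two occurrences of "Y" has three pieces. The test values are invented.

```mumps
 Set X="YabcY"
 Write (X?1"Y".E1"Y")&($Length(X,"Y")=3),!   ; 1
 Set X="YabYcY"
 Write (X?1"Y".E1"Y")&($Length(X,"Y")=3),!   ; 0: a third "Y" in the middle
```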

The concept of "ranges" is introduced. It is made possible to specify that a pattern is matched when one of a set of specified characters occurs:

Reference                        Value
"word"?.["aeiouAEIOU"]           0 (false)
"ff3a"?.["a":"f"]["A":"F"]N      1 (true)

The first pattern would be matched by strings that contain only vowels; the second pattern would be matched by purely hexadecimal numbers.
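
For comparison, tests with the same outcome can be written without ranges by using $Translate to delete every permitted character and checking that nothing is left. This is a sketch of an alternative technique, not of the range syntax itself.

```mumps
 ; Two-argument $Translate removes the listed characters from the value.
 Write $Translate("word","aeiouAEIOU")="",!               ; 0: "w", "r" and "d" remain
 Write $Translate("ff3a","0123456789abcdefABCDEF")="",!   ; 1: nothing remains
```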

Another new feature makes it possible to extract the substring that matches a specific sub-pattern from the string that is being "matched". To use it, give the name of the variable that is to receive the string segment in parentheses immediately after the pattern atom that the segment is intended to match.

Assume that the value of local variable X matches the following pattern: X?4N1","1.3N, i.e. 4 numeric digits, one comma and then between 1 and 3 more digits. The code segment:
If '(X?4N(ITEM)1","1.3N(QUANT(ITEM))) Do ...
would set local variable ITEM to $Extract(X,1,4) (the part that matches 4N) and QUANT(ITEM) to $Extract(X,6,$Length(X)) (the part that matches 1.3N).

Note that the assignment occurs as the pattern is being matched (strictly left to right), so the value of local variable ITEM is already defined by the time the pattern-matching processor attempts to assign a value to QUANT(ITEM).
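
The effect described above can be reproduced in standard M with an ordinary pattern match followed by explicit $Extract calls. The sketch below uses an invented input value; the arguments of Set are processed from left to right, so ITEM is defined before it is used as a subscript.

```mumps
 Set X="1234,56"                ; hypothetical input
 If X?4N1","1.3N Set ITEM=$Extract(X,1,4),QUANT(ITEM)=$Extract(X,6,$Length(X))
 Write ITEM,!                   ; 1234
 Write QUANT(ITEM),!            ; 56
```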

Finally, a special case of indirection is pattern indirection:
>Set string="123-44-5678"
>Write string?3N1"-"2N1"-"4N
>Set pattern="3N1""-""2N1""-""4N"
>Write pattern
>Write string?@pattern
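
Pattern indirection is mainly useful when the pattern is not known until run time. The sketch below keeps two patterns in a local array and applies each in turn; the names pat, kind, p and value, and the subscripts "dashed" and "five", are hypothetical.

```mumps
 ; pat, kind, p and value are hypothetical names.
 Set pat("dashed")="3N1""-""2N1""-""4N"
 Set pat("five")="5N"
 Set value="123-44-5678"
 ; Copy each pattern into a simple variable before using it with @.
 For kind="dashed","five" Set p=pat(kind) Write kind,": ",value?@p,!
```

With these values the loop writes "dashed: 1" and then "five: 0".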

