
A: Character Set Profiles

Draft MDC Standard

Annex A: Character Set Profiles (normative)

The definition of a Character Set Profile requires the definition of four elements: the names of the characters in the character set and the internal codes used to represent them; the definition of which characters match which pattern codes; the collation scheme used; and the definition of which characters may be used in names.

Note that the patcodes A, C, E, L, N, P, and U are applicable to all character set profiles. In addition, patcode E matches any character, not just those listed in a specific charset.
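The special role of patcode E can be illustrated with a minimal Python sketch. The table PATCODES and the function matches are hypothetical names, and the ASCII-only membership sets are an assumption made for illustration; an actual Character Set Profile supplies its own definitions.

```python
import string

# Hypothetical, ASCII-only patcode table (an assumption for illustration;
# a real Character Set Profile defines these sets itself).
PATCODES = {
    "U": set(string.ascii_uppercase),
    "L": set(string.ascii_lowercase),
    "N": set(string.digits),
    "P": set(string.punctuation) | {" "},
    "C": {chr(i) for i in range(32)} | {chr(127)},
}
PATCODES["A"] = PATCODES["U"] | PATCODES["L"]


def matches(ch, patcode):
    """Return True if the single character ch matches the given patcode."""
    if patcode == "E":
        # E matches any character, whether or not the profile lists it.
        return True
    return ch in PATCODES[patcode]
```

With this table, matches("é", "E") is true even though "é" appears in none of the sets, while matches("é", "A") is false.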

Two collation schemes are provided. Both require only a properly defined table of characters for the Character Set associated with the specific Character Set Profile.

STRING COLLATION

Determining the Collation Ordering for a Character Set Profile requires that the collation value(s) for each character within the character set be accessible as a group of values presented as an n-tuple. Each column of the definition table provides one value of the tuple in the specified order. When no value is present in a column, the corresponding character ID value is used in its place. Note that certain characters may be represented by more than one entry line in the table; in these cases the entries are taken one at a time and treated as if they represented separate characters in the original string (e.g., the character Æ in ISO-Latin1 (Character ID number 198) would be treated as a form of the string "AE").
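The table lookup described above might be sketched as follows in Python. The table contents, the names TABLE and collation_tuples, and the treatment of characters missing from the table are all assumptions made for illustration; they are not taken from any actual profile.

```python
# Hypothetical two-column collation table keyed by character ID.
# Each character maps to a list of entry lines; a slot of None means
# "no value present in this column".
TABLE = {
    65: [(None, None)],           # "A": both columns fall back to the ID 65
    97: [(65, None)],             # "a": column 1 as for "A", column 2 falls back to 97
    198: [(65, 198), (69, 198)],  # "Æ": two entry lines, treated like "AE"
}


def collation_tuples(s):
    """Return the collation tuples for string s, one per table entry line."""
    result = []
    for ch in s:
        char_id = ord(ch)
        # Assumption: a character absent from the table behaves like one
        # entry line with every column empty.
        entries = TABLE.get(char_id, [(None, None)])
        for entry in entries:
            # An empty column is replaced by the character ID value.
            result.append(tuple(char_id if v is None else v for v in entry))
    return result
```

Under these assumptions, collation_tuples("Æ") yields two tuples, as if the string had contained two separate characters.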

Let s be any non-empty string. Define the numeric function CVn(s) to return the nth-order collation value for string s. Unless otherwise specified, this value is determined by taking the value in the nth column of the collation tuple for each character in the string, examined in left-to-right order, and combining those values. Note: selected collation-tuple columns may optionally be designated for right-to-left evaluation.
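The text does not say how the per-character values are combined. One possible reading, sketched below in Python, keeps them as a sequence and relies on lexicographic comparison of sequences. The function name cv, its parameters, and the choice of a tuple as the combined value are assumptions, not part of the standard.

```python
def cv(n, tuples, right_to_left=frozenset()):
    """Return a comparable nth-order collation value.

    tuples        -- the collation tuples of a string, in string order
    n             -- 1-based column index
    right_to_left -- set of column indexes designated for right-to-left
                     evaluation (hypothetical parameter)
    """
    column = [t[n - 1] for t in tuples]
    if n in right_to_left:
        column.reverse()
    # Assumption: "combining" means keeping the values in order, so that
    # Python's lexicographic tuple comparison decides which value is larger.
    return tuple(column)
```

For the tuples [(65, 97), (66, 66)], cv(1, ...) gives (65, 66), and cv(2, ..., right_to_left={2}) gives (66, 97).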

The Collation Ordering function CO determines relative ordering for a character set. The exact value of this function is not specified here; however, the values formed by any implementation must satisfy the following rules when comparing two non-equal strings:

    Let t also be any non-empty string, not equal to s. The STRING Collation Ordering function CO is defined as:
  1. CO( "" , s ) = s
  2. CO( s , t ) = t
    if, and only if, there is a j such that CVj(t) > CVj(s) and for all i, i=1 ... j–1, CVi(t) = CVi(s);
    otherwise CO(s,t) = s.
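The two rules above can be sketched in Python as a function that returns whichever string collates later. The two-column scheme inside cv (column 1 ignores ASCII case, column 2 is the raw code point) is purely hypothetical, as are the names cv, N_COLUMNS and co. The handling of an empty second argument is also an assumption, since the rules only define CO( "" , s ).

```python
N_COLUMNS = 2  # number of columns in the hypothetical collation tuples


def cv(n, s):
    """Hypothetical CVn(s): column 1 folds ASCII case, column 2 is the code point."""
    if n == 1:
        return tuple(ord(c) - 32 if "a" <= c <= "z" else ord(c) for c in s)
    return tuple(ord(c) for c in s)


def co(s, t):
    """Return whichever of s and t collates later under the STRING rules."""
    if s == "":
        return t  # rule 1: the empty string collates before any other string
    if t == "":
        return s  # assumption: rule 1 applied symmetrically
    for j in range(1, N_COLUMNS + 1):
        a, b = cv(j, s), cv(j, t)
        if b > a:
            return t  # rule 2: first differing column favors t
        if b < a:
            return s  # first differing column favors s
    return s  # no column distinguishes the strings
```

In this scheme co("a", "B") returns "B" because column 1 already differs, while co("A", "a") returns "a" because the tie in column 1 is only broken in column 2.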

M[UMPS] COLLATION

The M[UMPS] Collation Ordering function CO uses the definition of CVn(s) specified in STRING Collation and is otherwise different only with respect to numbers:

    Let s be any non-empty string, let m and n be strings satisfying the definition of numeric data values (see I.7.1.4.3), and u and v be non-empty strings which do not satisfy that definition.
  1. CO( "" , s ) = s
  2. CO( m , n ) = n
    if n > m; otherwise, CO( m , n ) = m
  3. CO( m , u ) = u
  4. CO( u , v ) = v
    if, and only if, there is a j such that CVj(v) > CVj(u) and for all i, i=1 ... j–1, CVi(v) = CVi(u);
    otherwise, CO(u,v)=u.
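A minimal Python sketch of these four rules follows. The predicate is_numeric is only a simplified stand-in for the definition of numeric data values in I.7.1.4.3, which is not reproduced here, and string_co is a one-column placeholder for the STRING collation comparison based on code points. All names, and the symmetric handling of argument order, are assumptions.

```python
import re
from decimal import Decimal

# Simplified stand-in for "numeric data value": optional minus sign, no
# leading zeros, no trailing zeros after the decimal point, or plain "0".
_NUMERIC = re.compile(r"-?(?:[1-9][0-9]*(?:\.[0-9]*[1-9])?|\.[0-9]*[1-9])|0")


def is_numeric(x):
    return _NUMERIC.fullmatch(x) is not None


def string_co(u, v):
    """Placeholder STRING collation: a single column of code points."""
    return v if [ord(c) for c in v] > [ord(c) for c in u] else u


def m_co(a, b):
    """Return whichever of a and b collates later under the M[UMPS] rules."""
    if a == "":
        return b  # rule 1
    if b == "":
        return a  # assumption: rule 1 applied symmetrically
    a_num, b_num = is_numeric(a), is_numeric(b)
    if a_num and b_num:
        return b if Decimal(b) > Decimal(a) else a  # rule 2: numeric order
    if a_num:
        return b  # rule 3: numbers collate before non-numbers
    if b_num:
        return a  # assumption: rule 3 applied symmetrically
    return string_co(a, b)  # rule 4: fall back to string collation
```

Here m_co("9", "10") returns "10" because both operands are compared as numbers, whereas a plain string comparison would place "9" after "10".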

Copyright © Standard Documents; 1977-2024 MUMPS Development Committee;
Copyright © Examples: 1995-2024 Ed de Moel;
Copyright © Annotations: 2003-2008 Jacquard Systems Research
Copyright © Annotations: 2008-2024 Ed de Moel.

Some specifications are "approved for inclusion in a future standard". Note that the MUMPS Development Committee cannot guarantee that such future standards will indeed be published.

This page most recently updated on 13-Sep-2014, 22:22:19.

For comments, contact Ed de Moel (demoel@jacquardsystems.com)