Volume 8, Number 1, March 2000

Tips 'N' Tricks: Message In - Garbage out

by Winfried Gerum

Modern information processing equipment is remarkably reliable. The chance for a single bit to flip due to cosmic radiation, natural radioactivity, etc.seems to be negligible. However, as we continue to process more bits and process them faster, the probability rises that at least one faulty bit will be encountered. A single flawed bit in a data stream can wreak havoc if the data are compressed files or directories. If you have BIG money, you can buy hardware that has built-in error correcting storage. For the rest of us there are software techniques which allow detection of erroneous bits.

The method of choice is cyclic redundancy checksums (CRCs). A CRC is a signature computed from a string of data. Small alterations of the data are expected to produce a different CRC. As a CRC has only a few bytes, it is clear that many strings produce the same CRC. A CRC of 1 byte (8 bits) has a chance of 1 in 256 that a corrupted string will have the same CRC. A 2 byte CRC reduces this chance to 1 in 65,536; and a 4 byte CRC reduces it to 1 in 4,294,967,296

Below you will find MUMPS code to compute such CRC values. As MUMPS has strings of CHARACTERS, not strings of BYTES, it is no surprise that there are no bit (or byte) manipulation facilities in the language. The MUMPS Standard carefully avoids any reference to BYTES. From the vista of the Standard a character must have at least 7 bits to accomodate the ASCII Character set, but there is no explicit upper limit! To process UNICODE there would be no need to change the MUMPS language! Therefore it is a challenge to describe something like a CRC within Standard MUMPS.

In reality most MUMPS implementations use exactly 8 bit characters. IF $ASCII($CHAR(N))=N holds true for all values N=0:1:255 for such a MUMPS implementation then the CRC functions provided in this article will work correctly. These functions are probably not exactly fast, but they are portable.

CRC8 produces an 8 bit (1 character) checksum, CRC16 a 16 bit (two character) checksum, and CRC32 produces a 32 bit (four character) checksum.

The values generated have the common property, that $$CRC(X_$$CRC(X)) produces a string of $C(0)

Some modifications retain the properties of the CRC functions:

Note: In many implementations $CHAR(N) does an implicit $CHAR(N#256). Therefore all occurences of "#256" may be purged on some systems without affecting the results (but be sure to include a comment about this!)

So even if MUMPS does not have "bit manipulation functionality" it has again been demonstrated that MUMPS is indeed a general purpose language.

ZCRC	;CYCLIC REDUNDANCY CHECKSUMs
	Q  ;-do not call directly

	;-8 bit cyclic redundancy checksum
	;-Input: X=String whose CRC will be computed.
	;-return value: CRC as 1 char
	;-generating polynomial: G(X)=(X^8)+(X^4)+1
CRC8(X) N C,I,J,R
	S R="00000000"
	F I=1:1:$L(X) S C=$A(X,I) F J=128,64,32,16,8,4,2,1 D
	.I C\J#2!!$E(R)
	.S R=$E(R,2,8)_$T E  Q
	.F J=3 S $E(R,J)='$E(R,J)
	Q $C($$BD(R))

	;-16 bit cyclic redundancy checksum
	;-Input: X=String whose CRC will be computed.
	;-return value: CRC as 2 chars
	;-generating polynomial (CCITT): G(X)=(X^16)+(X^12)+(X^5)+1
CRC16(X) N C,I,J,R
	S R="0000000000000000"
	F I=1:1:$L(X) S C=$A(X,I) F J=128,64,32,16,8,4,2,1 D
	.I C\J#2!!$E(R)
	.S R=$E(R,2,16)_$T E  Q
	.F J=4,11 S $E(R,J)='$E(R,J)
	S R=$$BD(R)
	Q $C(R\256,R#256)

	;-32 bit cyclic redundancy checksum
	;-Input: X=String whose CRC will be computed.
	;-return value: CRC as 4 chars
	;-generating polynomial:
	;-  G(X)=(X^32)+(X^26)+(X^23)+(X^22)+(X^16)+(X^12)+(X^11)+
	;-       (X^10)+(X^8) +(X^7) +(X^5) +(X^4) +(X^2) + X    +1
CRC32(X)	N C,I,J,R
	S R="00000000000000000000000000000000"
	F I=1:1:$L(X) S C=$A(X,I) F J=128,64,32,16,8,4,2,1 D
	.I C\J#2!!$E(R)
	.S R=$E(R,2,32)_$T E  Q
	.F J=6,9,10,16,20,21,22,24,25,27,28,30,31 S $E(R,J)='$E(R,J)
	S R=$$BD(R)
	Q $C(R\16777216,R\65536#256,R\256#256,R#256)

	;-conversion binary to decimal
BD(X)	N I,R S R=0 F I=1:1:$L(X) S R=R*2+$E(X,I)
	Q R

For additional reading on the need of error detection/recovery schemes, see: Wallich, Paul "TO ERR IS MECHANICAL" Scientific American; October 1999; http://www.sciam.com/1999/1099issue/1099cyber.html

Winfried Gerum's column "Tips 'N' Tricks" appears occasionally in M Computing. He can be reached at wg@winner.de.