A web-interface to GT.M

or, Less is More

Ed de Moel, Jacquard Systems Research

One of the first questions I asked when I started to consider GT.M (Greystone MUMPS) as an alternative for MSM and Caché was about the nature of the interface between this implementation of M[UMPS] and "the web". The answer surprised me a little: I was informed that there was none. When I pressed the issue, I learned that Winfried Bantel had implemented "something based on CGI", but that was all the information that was available.

As I was pretty busy at that time, I left that issue on the "to do" pile on my desk, and recently, I finally got around to looking at what is or isn't there. The result was very surprising: there was nothing, which ends up meaning that everything that is needed is there, and the possibilities are almost unlimited.

The CGI interface

The basic premise of the CGI interface is that, when a certain request is made from a browser, a program can be started on the computer that acts as the web-server for such requests. When that program is started, the low-level software on the server-computer has set up a number of "environment variables" that contain the details about the request, and it is up to the application program to interpret these variables and take appropriate action. The underlying software is called the Common Gateway Interface (CGI), and its variables are commonly referred to as "the CGI variables". The example below will describe how these variables can be accessed from a GT.M program.
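
For comparison, the same environment variables can be read from a C program with the standard function getenv. The sketch below is only an illustration: the helper names cgi_var and show_request are invented for this example and are not part of the sample programs in this article.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the value of a CGI variable, or `fallback` when the web-server
   did not define it (getenv returns NULL in that case). */
const char *cgi_var(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Write a plain-text response that echoes two request details. */
void show_request(FILE *out)
{
    fprintf(out, "Content-Type: text/plain\n\n");
    fprintf(out, "method = %s\n", cgi_var("REQUEST_METHOD", "(not set)"));
    fprintf(out, "query  = %s\n", cgi_var("QUERY_STRING", "(not set)"));
}
```

A CGI program would call show_request(stdout); the web-server passes whatever is written to standard output back to the browser.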

The surprise

The approach that Winfried Bantel had taken was to use this (bare-bones) interface, create a little program in C that sets up the parameters needed to invoke GT.M correctly, and then call GT.M to execute the M[UMPS] program that would process the web-request. So there it was: without any software to be provided by the implementor of the M[UMPS] system, the full power of CGI is available to the M[UMPS] application programmer, and the only limitations that matter are those that the application programmer chooses to put on his or her application.

The C program is basically a "one-liner". All it does is set up some parameter variables, and then call GT.M. The sample code below is what I used for my first test (I only made some slight enhancements to the sample program that was provided by Winfried Bantel):


#include <stdio.h>
#include <stdlib.h>

int main ()
{

    /* Where GT.M lives */
    char gtmdir [] = "/home/edm/gtm/v4p2cd";

    /* Where the application lives */
    char appdir []  = "/home/edm/gtm/edm";

    /* GT.M database descriptor file */
    char gtmdb []  = "/edm.gld";

    /* MUMPS program to execute */
    char mumpspgm [] = "^TestCGI";

    char routpath [256];
    char execute [256];
    char gtmgbldir [256];

    setenv("gtm_dist", gtmdir, 1);

    sprintf(gtmgbldir, "%s%s", appdir, gtmdb);
    setenv("gtmgbldir", gtmgbldir, 1);

    sprintf (routpath, "%s %s", appdir, gtmdir);
    setenv("gtmroutines", routpath, 1);

    sprintf(execute,"cd %s;%s/mumps -run %s", appdir, gtmdir, mumpspgm);

    if (system(execute)) /* Invoke GT.M */
    { /* Error occurred */
        printf("CONTENT-TYPE: TEXT/PLAIN\n\nError!!!\n");
        printf("gtmdir      = %s\n", gtmdir);
        printf("database    = %s\n", gtmdb);
        printf("gtmgbldir   = %s\n", getenv("gtmgbldir"));
        printf("gtmroutines = %s\n", getenv("gtmroutines"));
        printf("gtm_dist    = %s\n", getenv("gtm_dist"));
        printf("execute     = %s\n", execute);
    }
}

At the top of the routine, the parameters are defined; the values shown are the ones I used for my test. Next, these values are copied into the appropriate places, and then GT.M is invoked using the standard library function system. That's all that is needed.
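
The fixed-size buffers in this program are filled with sprintf, which does not check whether the result fits. As a possible refinement (not part of the original program), the command string could be built with snprintf, which reports the length it needed. The function name build_command is an assumption made for this sketch.

```c
#include <stdio.h>

/* Build the shell command that starts GT.M.
   Returns 0 on success, or -1 when the command would not fit into buf. */
int build_command(char *buf, size_t size,
                  const char *appdir, const char *gtmdir, const char *pgm)
{
    /* snprintf never writes more than `size` bytes and returns the
       length the complete string would have had. */
    int n = snprintf(buf, size, "cd %s;%s/mumps -run %s", appdir, gtmdir, pgm);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the values from the program above, build_command(execute, sizeof execute, appdir, gtmdir, mumpspgm) would produce the same string as the original sprintf call.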

If there should be an error in one of the parameters, the invocation of system(execute) will fail, and the bottom-part of the program will execute. In such a case, the text generated by the printfs in this section will appear as an error message in the browser window.
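
The value returned by system is a wait status rather than a plain exit code. The sample program only tests it for non-zero, which is sufficient there; the sketch below shows, for POSIX systems, how the actual exit code could be extracted if more detail were wanted in the error report. The name run_command is hypothetical.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command through the shell.
   Returns the command's exit code, or -1 when the shell could not be
   started or the command did not terminate normally. */
int run_command(const char *command)
{
    int status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```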

In order to make this program work, it has to be compiled (cc TestCGI.c), and the resulting executable program will have to be copied into the directory where the CGI processor expects to find its executables.

(In my case: mv a.out /home/httpd/cgi-bin/TestCGI).

The M[UMPS] part

Now, let's look at the M[UMPS] program. One difference between GT.M and many other implementations of M[UMPS] is that programs are compiled rather than interpreted. This means that the source code, as well as the executable versions of the programs, is stored in "normal" files in directories of the operating system.
In this case, the source code would be stored in TestCGI.m, and the compiled (executable) code would be stored in TestCGI.o. In the C program, the directory that contains these files was specified as /home/edm/gtm/edm.

The M[UMPS] program, of course, is supposed to do "whatever the application requires". But, before the actual application can start, the parameters passed in the web-request (often called URL, Uniform Resource Locator, or URI, Uniform Resource Identifier) need to be copied from the CGI environment into the M[UMPS] environment.

Let's look at the start of my test program:


TestCGI ; CGI-Call for GTM
 ;
 New i,data,entry
 Write "CONTENT-TYPE: TEXT/HTML",!!
 Set $ZTRAP="Do Error^"_$Text(+0)_"($ZSTATUS)"
 ;
 If $ZTRNLNM("REQUEST_METHOD")="POST" Do
 . ; Retrieve variables from <stdin>
 . Read data#$ZTRNLNM("CONTENT_LENGTH"):5
 . Quit
 Else  Do
 . ; Retrieve variables from Query-String
 . Set data=$ZTRNLNM("QUERY_STRING")
 . Quit
 ;
 For i=1:1:$Length(data,"&") Do
 . New ind,pc,val
 . Set pc=$Piece(data,"&",i)
 . Set ind=$$URLin($Piece(pc,"=",1)),val=$$URLin($Piece(pc,"=",2))
 . Set:ind'="" %Key(ind)=val
 . Quit
 ;
 Set entry=$Get(%Key("Action"))
 If entry="" Do Error("No action requested.")
 ; Error trap does not return
 ;
 If entry="xxx" Do xxx Halt
 If entry="yyy" Do yyy Halt
 . . .
 ;
 Do Error("Invalid action requested: """_entry_""".")
 ; Error trap does not return
 ;

The first thing that this program does is find out which of the possible mechanisms the requestor used to pass parameters in the web-request. There are two possible ways: GET and POST. When the request is a "GET", all parameters are specified on the URL; when the request is a "POST", the parameters can be read from standard input (which is the current device when the program is started). The examples below show when each option is used.

The program checks which method applies for the current invocation by fetching the value of an environment variable. The name of the variable is REQUEST_METHOD. The function in GT.M that fetches values of environment variables is $ZTRNLNM. Once we know which method is being used, we need to get the actual parameters. If the method is "GET", the environment variable QUERY_STRING will have the complete query, including all parameters, and otherwise (if the method is "POST"), the parameters have to be read from standard input. In that case, environment variable CONTENT_LENGTH will tell us how many characters to read to obtain all parameter values.

Once these parameter values are captured into local variables of the M[UMPS] program, they can be used to call the appropriate application subroutine(s), and within those communicate the desired specific details.
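
The M[UMPS] program stores every keyword=value pair in the array %Key(). C has no comparable built-in array, so a C version might instead look up one keyword at a time. The following sketch (the name find_param is hypothetical) returns the value still in its URL-encoded form; decoding is a separate step.

```c
#include <string.h>

/* Copy the raw (still URL-encoded) value of `key` from a string such as
   "Action=Buy&Article=Milk" into out, truncating it to fit if necessary.
   Returns 1 when the key is found, 0 otherwise. */
int find_param(const char *data, const char *key, char *out, size_t size)
{
    size_t keylen = strlen(key);
    const char *p = data;

    if (size == 0)
        return 0;
    while (*p != '\0') {
        size_t pair = strcspn(p, "&");        /* length of this pair */
        if (pair > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vlen = pair - keylen - 1;
            if (vlen >= size)
                vlen = size - 1;
            memcpy(out, p + keylen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += pair;
        if (*p == '&')
            p++;                              /* skip the separator */
    }
    return 0;
}
```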

The big surprise

So, the interesting thing is that, even though GT.M doesn't include any web-specific tools, it already did contain tools to allow the implementation to interact with the operating environment within which it does its work. And the surprise is that these tools are quite enough to get a web-interface to work. All it takes is a C program to start the M[UMPS] program, and a built-in function to get access to the CGI variables.

The downside of this situation is that there are no explicit tools that would help an application-programmer create a web-interface for an existing application, but I also noticed that a much more important benefit of this situation is that there are no built-in limitations in the web-interface that prevent the application-programmer from using any specific detail of the possible ways to interact with "the web".

More details

In order to evaluate the parameters, a function $$URLin is called, and for error-handling, a subroutine Error is called. The rest of this article describes these and other functions that would help create a functioning web-application.

Parameters

The language of "the web" has a construct known as a "web-address", sometimes called a URL or a URI. This feature is used to gain access to information. A URL typically looks like http://machine.domain/directory/filename?keyword=value&keyword=value&..., where "http" is the name of the communication-protocol to be used, machine is the name of the computer to be accessed, domain is the name of the network (organization) to which that computer belongs, directory is any number of levels of directories and subdirectories to get to the filename, and the optional keyword=value combinations specify any additional parameters to be passed to the application identified by filename.

An issue that each language addresses in its own way is how to deal with "special characters". In order to be able to parse out the keywords and values appropriately, the web-language has the rule that all characters in parameters or keywords that are not numeric or alphabetic must be encoded as a percent-sign (%) followed by two hexadecimal digits, representing the ASCII code of the character in question. Thus, a comma would become %2c (ASCII code 44), and a plus sign would become %2b. A special case is the space-character, which may also be represented as a plus-sign (the hexadecimal representation (%20) is also valid). In the sample program, the parameters are parsed out by the procedure $$URLin, which looks like:


URLin(x) New c,e,hex,i,p,r,z
 Set hex="0123456789abcdef",z=$tr(x,"ABCDEF","abcdef")
 Set r="" For i=1:1:$Length(x) Do
 . Set e=$Extract(x,i)
 . If e="+" Set r=r_" " Quit
 . If e="%" Do  Quit
 . . Set c=$Find(hex,$Extract(z,i+1))-2*16+$Find(hex,$Extract(z,i+2))-2
 . . Set r=r_$Char(c),i=i+2
 . . Quit
 . Set r=r_e
 . Quit
 Quit r
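
For readers who want to compare, the decoding rules described above (a plus-sign becomes a space, %xx becomes the character with that hexadecimal code) can be sketched in C as follows. The names hex_value and url_decode are invented for this example. Unlike the M[UMPS] version, this sketch copies a percent-sign unchanged when it is not followed by two hexadecimal digits; that is a design choice of the sketch, not a rule of the web-language.

```c
#include <ctype.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode the string in place; the result is never longer than the input. */
void url_decode(char *s)
{
    char *out = s;
    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && hex_value((unsigned char)s[1]) >= 0
                             && hex_value((unsigned char)s[2]) >= 0) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                          + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```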

The sample M[UMPS] program uses a parameter called Action as a first selector that decides which of the available subroutines the program will execute. Note the lines that look like:

    If entry="xxx" Do xxx Halt

Each of these lines calls the appropriate subroutine for one of the possible actions.

Inside these subroutines, the values in the array %Key() will give access to any further parameters.

Now, the M[UMPS] program could simply execute Do @entry. If the program were written like that, any callable M[UMPS] subroutine could be invoked using the same interface program (and the error-trap that is already set up would take care of any values for entry that could not be processed). However, in practice, there are issues of security to consider, and most systems contain maintenance programs that should not be too easily accessible to outside users, so some level of protection will typically be required.
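
The protection offered by the explicit If entry="..." lines amounts to a list of permitted actions. Expressed in C, the same idea could be a small table of names and handlers, as in the sketch below. The handlers do_buy and do_sell and their return values are placeholders made up for the example.

```c
#include <stddef.h>
#include <string.h>

typedef int (*action_fn)(void);

/* Placeholder handlers; a real application would do its work here. */
static int do_buy(void)  { return 1; }
static int do_sell(void) { return 2; }

/* Only the actions listed in this table can be reached from a web-request. */
static const struct { const char *name; action_fn handler; } actions[] = {
    { "Buy",  do_buy  },
    { "Sell", do_sell },
};

/* Run the handler for `entry`; returns -1 for any name not in the table. */
int dispatch(const char *entry)
{
    for (size_t i = 0; i < sizeof actions / sizeof actions[0]; i++)
        if (strcmp(entry, actions[i].name) == 0)
            return actions[i].handler();
    return -1;
}
```

Any value of Action that is not in the table is rejected, regardless of which other subroutines exist in the program.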

Similar to the parameter-input-conversion function, the program contains a procedure to format parameters that are to be passed into a web-address:


URLout(x) New e,i,hex,r
 Set hex="0123456789abcdef"
 Set r="" For i=1:1:$Length(x) Do
 . Set e=$Extract(x,i)
 . If e?1AN Set r=r_e Quit
 . If e=" " Set r=r_"+" Quit
 . Set e=$ASCII(e),r=r_"%"_$Extract(hex,e\16+1)_$Extract(hex,e#16+1)
 . Quit
 Quit r
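
A C counterpart of $$URLout could look like the sketch below (the name url_encode is an assumption). It follows the same rules as the M[UMPS] code: letters and digits are copied, a space becomes a plus-sign, and everything else becomes %xx. The space check is deliberately conservative: it always reserves room for a three-character escape plus the terminating zero.

```c
#include <ctype.h>
#include <stddef.h>

/* Encode src into out, which has room for `size` bytes.
   Returns 0 on success, or -1 when out is too small. */
int url_encode(const char *src, char *out, size_t size)
{
    static const char hex[] = "0123456789abcdef";
    size_t used = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (used + 4 > size)          /* conservative: %xx plus '\0' */
            return -1;
        if (isalnum(c)) {
            out[used++] = (char)c;
        } else if (c == ' ') {
            out[used++] = '+';
        } else {
            out[used++] = '%';
            out[used++] = hex[c >> 4];
            out[used++] = hex[c & 15];
        }
    }
    if (used >= size)
        return -1;
    out[used] = '\0';
    return 0;
}
```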

Next to the procedure called $$URLout, there also is a procedure called $$HTMLout. The function of the latter procedure comes into play when representing normal text within a web-page. Within the web-language (HTML, HyperText Mark-up Language), the characters ampersand (&), less than (<) and greater than (>) have a special meaning, and those characters shouldn't appear by themselves. Characters outside of the 7-bit ASCII set may be mis-interpreted when transmitted without special care. If any of those characters need to be shown, a special construct is needed. The function $$HTMLout takes care of these transformations:


HTMLout(x) New e,i,r
 Set r="" For i=1:1:$Length(x) Do
 . Set e=$Extract(x,i)
 . If e="&" Set r=r_"&amp;" Quit
 . If e="<" Set r=r_"&lt;" Quit
 . If e=">" Set r=r_"&gt;" Quit
 . If $Ascii(e)>126 Set r=r_"&#"_$Ascii(e)_";" Quit
 . Set r=r_e
 . Quit
 Quit r
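
The same transformation can be sketched in C. The name html_escape is hypothetical, and, like the M[UMPS] version, the sketch assumes a single-byte character set in which each byte above 126 can be written as a numeric character reference.

```c
#include <stdio.h>
#include <string.h>

/* Escape src for use as text inside an HTML page.
   Returns 0 on success, or -1 when out (size bytes) is too small. */
int html_escape(const char *src, char *out, size_t size)
{
    size_t used = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char piece[8];                       /* longest piece: "&#255;" */

        if (c == '&')      strcpy(piece, "&amp;");
        else if (c == '<') strcpy(piece, "&lt;");
        else if (c == '>') strcpy(piece, "&gt;");
        else if (c > 126)  snprintf(piece, sizeof piece, "&#%u;", (unsigned)c);
        else               { piece[0] = (char)c; piece[1] = '\0'; }

        size_t len = strlen(piece);
        if (used + len >= size)              /* keep room for '\0' */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    if (used >= size)
        return -1;
    out[used] = '\0';
    return 0;
}
```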

Error Handling

One unfortunate aspect of application-programs is that there are always expected and unexpected cases that cannot be handled by the software. Whenever such a situation arises, the error needs to be reported and, if possible, handled. The sample program contains a simple error-handling subroutine that just reports the error and some information about the M[UMPS] environment when the error happened. A "real" application would probably need a somewhat more sophisticated error handler, but this sample-version would be enough to get any prototype-program going:


Error(MSG) ;
 Write "<html><title>Sample Error Report</title><body>"
 Write !,"<h2><font color=#ff0000>Error: "_$$HTMLout(MSG)_"</font></h2>"
 Write !,"<h3>GT.M version is ",$ZVERSION,"</h3>"
 Write !,"<pre>"
 ZShow "*"
 Write !,"</pre>"
 Write !,"</body></html>"
 Halt

Note that this error handling procedure has a parameter, and that for expected errors a specific value is passed (e.g. "No action requested." or "Invalid action requested"), whereas for unexpected errors an ad-hoc value ($ZSTATUS) is being passed.

Get and Post

When a request is made using a URL like http://www.company.com/cgi-bin/program?Action=Buy&Article=Milk&Quantity=2, the transaction is processed as a "Get" request. This is also the case when the URL is provided in a hyper-text-link like <a href="http://...&Quantity=2">. The "Post" method comes into play when information is entered into a form and subsequently transmitted to the web-server. The code-snippet below creates a form with fields that correspond to the parameters in the above example.

 Write !,"<form action=""TestCGI"" method=""POST"">"
 Write !,"<table>"
 Write !,"<tr><td align=right>Action:</td>"
 Write !,"<td><input type=radio value=""Buy"" name=""Action""> Buy<br>"
 Write !,"<input type=radio value=""Sell"" checked name=""Action""> Sell</td></tr>"
 Write !,"<tr><td align=right>Article:</td>"
 Write !,"<td><select name=""Article"">"
 Write !,"<option value=1001>Milk</option>"
 Write !,"<option value=1002>Yoghurt</option>"
 Write !,"<option value=1003 selected>Butter</option>"
 Write !,"<option value=1004>Cheese</option>"
 Write !,"</select></td></tr>"
 Write !,"<tr><td align=right>Quantity:</td>"
 Write !,"<td><input type=text name=""Quantity"" value=5></td></tr>"
 Write !,"</table>"
 Write !,"<p><input type=submit value=""Process Order""></p>"
 Write !,"</form>"
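
When displayed in a browser, this "form" would show two radio buttons for Action (Buy and Sell, with Sell preselected), a drop-down list for Article (with Butter preselected), a text field for Quantity (initially 5), and a "Process Order" button.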

The line that starts the "form" specifies the method to be used for processing the request. Both "Get" and "Post" are allowed for forms. However, when the "Post" method is chosen, the URL that is displayed in most browsers will look a lot cleaner than when the "Get" method is chosen. Also, when the number of parameters gets large, the "Get" method is likely to run into problems with the maximum number of characters that can be in a URL, whereas there is no limit to the amount of text that can be fed into a program through standard input...

More CGI Variables

The sample M[UMPS] program uses three CGI variables to obtain the information that it needs to find its parameter values (REQUEST_METHOD, CONTENT_LENGTH and QUERY_STRING). The variable REQUEST_METHOD is always defined. QUERY_STRING is also always defined, but it only has a non-empty value when at least one parameter is specified in the URL. CONTENT_LENGTH, however, is only defined if the current method is a "POST".

A complete list of all CGI variables is hard to provide, because each web-server offers its own. The table below shows the variables that are used by Apache 1.3.12.

When defined  Name                  Sample value
post          CONTENT_LENGTH        80
post          CONTENT_TYPE          application/x-www-form-urlencoded
always        DOCUMENT_ROOT         /home/httpd/html
always        GATEWAY_INTERFACE     CGI/1.1
always        HOST                  jsrnote01
always        HOSTTYPE              i386-linux
always        HTTP_ACCEPT_CHARSET   iso-8859-1,*,utf-8
always        HTTP_ACCEPT_ENCODING  gzip
always        HTTP_ACCEPT_LANGUAGE  en,ru
always        HTTP_ACCEPT           image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
always        HTTP_CONNECTION       Keep-Alive
always        HTTP_HOST             90.0.0.5
post          HTTP_REFERER          http://90.0.0.5/Demo/TestCGI?Action=xxx
always        HTTP_USER_AGENT       Mozilla/4.7 [en] (WinNT; U)
always        MACHTYPE              i386
always        OSTYPE                linux
always        PATH                  /sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
always        PWD                   /home/edm/gtm/edm
always        QUERY_STRING          Action=xxx
always        REMOTE_ADDR           90.0.0.17
always        REMOTE_PORT           2928
always        REQUEST_METHOD        POST
always        REQUEST_URI           /Demo/TestCGI?Action=xxx
always        SCRIPT_FILENAME       /home/edm/demo/TestCGI
always        SCRIPT_NAME           /Demo/TestCGI
always        SERVER_ADDR           90.0.0.5
always        SERVER_ADMIN          root@localhost
always        SERVER_NAME           JSRNote01
always        SERVER_PORT           80
always        SERVER_PROTOCOL       HTTP/1.0
always        SERVER_SIGNATURE      <ADDRESS>Apache/1.3.12 Server at JSRNote01 Port 80</ADDRESS>
always        SERVER_SOFTWARE       Apache/1.3.12 (Unix) (Red Hat/Linux) PHP/3.0.15 mod_perl/1.21

Notes

Ed de Moel is past chairman of the MDC and works with Jacquard Systems Research.
His experience includes developing software for research in medicine and physics.
Over the past ten years, Ed has mostly focused on the production of tools for data management and analysis, and tools for the support of day-to-day operation of medical systems.
Ed has worked with the Greystone Group at Sanchez on a project to make GT.M more compliant with the 1995 ANSI standard, and currently works with the Department of Veterans Affairs on their project to add images to the medical record.
Ed can be reached by e-mail.


Winfried Bantel is a consultant from Germany. Over the years he has provided numerous contributions to the M[UMPS] communities in Germany and world-wide.