
Volume 4, number 5, December 1996, pages 32-33
Windmills
Beyond the year 2000, there be dragons...
by Ed J.P.M. de Moel
We have all seen numerous articles lately, both in the general
press and in the M[UMPS]-related literature, that deal with the
problems that may or will arise once we have celebrated the first
New Year's Eve that takes us into a year that no longer has "19"
as its first two digits.
Some problems have already happened: I recently overheard the
people at a car rental agency instructing a new clerk to enter
expiration dates on driver's licenses that are in the years 2000
or 2001 as "99", because otherwise, the computer would think that
the license in question expired over 90 years ago, and the
machine would refuse to rent a car to the owner of the fairly
recently renewed driver's license.
In a previous column, I already indicated that $HOROLOG offers
us a tool that could help us to deal with dates in a manner that
does not suffer from "turn-of-the-century" problems. But is such
a tool really the answer to all the possible problems?
In comp.lang.mumps, there have been several exposés of such
tools, both standardized and implementation-specific ones, that
sometimes do help to avoid problems, and sometimes turn against
us.
In this column, I would like to repeat some "do's and don'ts"
and emphasize a couple of areas where some additional care and
review might be in order.
How should we store dates?
There are several good answers to this question. Basically,
any format that preserves enough information to know which
century a date is in is a good one. Examples are:
- $HOROLOG (my birthday in the year 2000 will be on
58121)
- FileMan format (the same birthday will be on
3000217)
- Formats based on the ISO representation (with or
without separators between the fields), like:
2000/02/17, 2000-02-17 or 20000217
- A more free interpretation of the year-month-day idea
like 2000-Feb-17
- Formats that are closer to "human readable" form, that
preserve all information, like 17-FEB-2000, 17/02/2000
or 02/17/2000 (the latter two depending on local
preferences regarding sequence).
- Julian format (year plus number of days into that year),
like: 2000/48 or 2000-048
And many, many more.
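As a rough illustration of how some of these storage formats
relate to one another, the Python sketch below converts a calendar
date into the $HOROLOG day number, the FileMan form and the ISO
form. The function names are made up for this example. The sketch
assumes that $HOROLOG counts days with 31 December 1840 as day 0,
and that FileMan stores the year as an offset from 1700.

```python
from datetime import date

# Assumption: $HOROLOG day 1 is 1 January 1841, so day 0 is the day before.
HOROLOG_DAY_ZERO = date(1840, 12, 31)

def to_horolog_day(d):
    """Return the day-number part of $HOROLOG for a calendar date."""
    return (d - HOROLOG_DAY_ZERO).days

def to_fileman(d):
    """Return the FileMan form: (year - 1700), then month, then day."""
    return (d.year - 1700) * 10000 + d.month * 100 + d.day

birthday = date(2000, 2, 17)
print(to_horolog_day(birthday))  # 58121
print(to_fileman(birthday))      # 3000217
print(birthday.isoformat())      # 2000-02-17
```

All three results identify the century without any guessing, which
is the property that matters here.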
Some of these formats have the advantage that they collate in
the same order as the dates that they represent; others do not
preserve that order. For example, "2000-Jan-31" would
sort after "2000-Feb-1", but "2000-01-31" will
always precede "2000-02-01", and this collating sequence
will also be maintained when a different separator character is
chosen.
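The collating behavior is easy to check. The small Python sketch
below sorts the two example pairs as plain strings; Python's
default string ordering stands in for string collation in M here,
which is an assumption of this sketch.

```python
# Dates stored with month names versus all-numeric year-month-day.
readable = ["2000-Jan-31", "2000-Feb-1"]
iso = ["2000-01-31", "2000-02-01"]

print(sorted(readable))  # ['2000-Feb-1', '2000-Jan-31'] (not date order)
print(sorted(iso))       # ['2000-01-31', '2000-02-01'] (date order)

# The numeric form keeps its order with a different separator.
slashed = [s.replace("-", "/") for s in iso]
print(sorted(slashed))   # ['2000/01/31', '2000/02/01']
```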
A nice additional feature of the forms that offer some
redundancy is that they can be used to store "imprecise" dates.
"2000-02-00" and "3000200" could be used to
indicate that we know that a date is in February of the year
2000, but we don't know (or care) exactly on which day, whereas a
format like the Julian date or $HOROLOG would not allow for any
"imprecise" date between the end of one month and the start of
the next. In fact, some of the utilities that do conversions
between $HOROLOG format and "readable" form can be tricked into
converting "0-FEB-2000" into 58104, and when we convert that
number back to "readable" we end up with "31-Jan-2000"...
(Of course, the "better" conversion utilities would report an
error when the day-number is less than 1 or higher than the last
day of the month in question.)
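The difference between the two kinds of utility can be sketched in
a few lines of Python. Both functions below are hypothetical and
are not taken from any existing conversion library: the naive one
simply counts from the first day of the month, the strict one
checks the day number first. The sketch assumes that $HOROLOG day 0
is 31 December 1840.

```python
import calendar
from datetime import date, timedelta

# Assumption: $HOROLOG day 0 is 31 December 1840.
HOROLOG_DAY_ZERO = date(1840, 12, 31)

def naive_to_horolog(year, month, day):
    """Count from the first of the month without checking the day."""
    first = date(year, month, 1)
    return (first - HOROLOG_DAY_ZERO).days + (day - 1)

def strict_to_horolog(year, month, day):
    """Same conversion, but reject day numbers outside the month."""
    last_day = calendar.monthrange(year, month)[1]
    if not 1 <= day <= last_day:
        raise ValueError("day %d is not in month %d of %d" % (day, month, year))
    return naive_to_horolog(year, month, day)

def from_horolog(day_number):
    """Convert a $HOROLOG day number back to a calendar date."""
    return HOROLOG_DAY_ZERO + timedelta(days=day_number)

n = naive_to_horolog(2000, 2, 0)   # "0-FEB-2000"
print(n)                           # 58104
print(from_horolog(n))             # 2000-01-31
```

Calling strict_to_horolog(2000, 2, 0) raises an error instead of
quietly producing a date in January.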
How should we display dates?
There is no answer to this question that will always be seen
as "correct". In practice, the answer to this question will be
dictated by a number of considerations that are dependent on the
situation at hand:
- Every country has its own preference for the sequence
of showing year, month and day. The "correct" format
is "whatever is locally preferred", whether that is
"day/month/year", "month/day/year" or
"year/month/day".
- When you're dealing with an audience that has
members from multiple cultural origins, it is generally
a good idea to show the name of the month in letters
(Jan, Feb, ...) rather than in digits. In the representation
"17-Feb-2000" it is obvious which field is the
day, and which is the month. Even if you happen to be
from a culture that would prefer to see "Feb-17-2000",
you would have no problem understanding
what is intended.
- Ideally, the year should always be shown in "all its
digits", but, especially when you need to cram a lot of
information on a tiny display, space limitations often
dictate that some digits will have to be sacrificed.
- Keep in mind that cultural preferences might require
that a date be represented in a different form altogether
(see Winfried Gerum's columns in several previous
issues, where he presented conversion algorithms to
various formats that he needed for his customers).
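As a minimal sketch of the "month in letters" advice, the Python
function below formats a date as day, month name and four-digit
year. The function name and the fixed English month table are
choices made for this example only; a real application would take
the month names from the local conventions of its users.

```python
from datetime import date

# Fixed English abbreviations, used here instead of locale settings
# so that the output does not depend on the machine's configuration.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def display_date(d):
    """Format a date with the month in letters and all year digits."""
    return "%d-%s-%04d" % (d.day, MONTHS[d.month - 1], d.year)

print(display_date(date(2000, 2, 17)))  # 17-Feb-2000
```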
How does one enter a date?
Of all the questions I wanted to address in this issue, this
one is the hardest to answer: the answer cannot be dictated by a
programmer or by a standards committee; it will come from the
end user.
If we write software that neatly prevents all possible
problems, but forces the end user to enter information in a form
that he or she is not willing to use, it is quite predictable
that that end user will be looking for a different software
provider.
So... our software should be kind to the people who enter the
data, and should make reasonable attempts to figure out what they
mean.
- When someone enters a birth date, we may safely
assume that the date in question is in the past, and
when someone enters the date that a mortgage should be
paid off, it should typically be a future date.
- When the user enters "02-17-00" we may safely
assume that the "17" indicates a day number, so the
only problem is to make the right assumption whether
the "02" or the "00" indicates the year...
- Of course, today our software would probably be
correct to assume that a birth date in "00" or "02"
would be 1900 or 1902 respectively, but 10 years from
now, both 1900 and 2000 (or 1902 and 2002) could be
intended.
- Software should always allow the end user to enter
dates in a "complete" format. Ideally, the software
should allow for "just about any way in which people
typically enter dates", but it is acceptable to restrict the
end user to a number of pre-defined possibilities.
Forcing a user to enter the year number as two digits
can end up being really counter-productive, especially
when, as in the car-rental example, the assumption
about the meaning of the two-digit year number is not
really helpful.
- Of course, we can laugh about the assumption that a
driver's license with a year-stamp of "01" would have
expired 95 years ago, but not all cases are as obvious as
this one. Whenever the end user omits information, the
software will have to make some assumption about the
missing information.
- To make a long story short: software can make a
number of valid assumptions about information
entered, but there are many cases where there remains
some uncertainty:
- What is the month in "05/07/1965"?
- What is the year in "02-17-00"?
- Which year does one assume when none is
entered (this year, last year or next year)?
- In all these cases, it seems fair to make a well
documented assumption, and display the result of that
assumption in an unambiguous format, so that the end
user can confirm that the software made the correct
assumption, or make the appropriate correction.
Obtaining such a correction while the end user still has
the source material available typically increases the
odds that the intended information ends up in the
database.
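The "assume, then show" approach could look something like the
Python sketch below. Everything in it is an assumption made for
illustration: the function names, the month-day-year input order,
and the rule that a date expected to be in the past gets the most
recent matching year while a date expected to be in the future
gets the nearest coming one. Precisely because such a rule can
guess wrong, the result is echoed in an unambiguous form for the
end user to confirm.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def expand_year(two_digits, today, expect):
    """Guess the century; expect is 'past' or 'future' (hypothetical rule)."""
    year = today.year - today.year % 100 + two_digits
    if expect == "past" and year > today.year:
        year -= 100
    elif expect == "future" and year < today.year:
        year += 100
    return year

def interpret_mdy(text, today, expect):
    """Interpret 'mm-dd-yy' input; the field order is a documented assumption."""
    month, day, yy = (int(part) for part in text.split("-"))
    return date(expand_year(yy, today, expect), month, day)

def echo(d):
    """Show the interpretation unambiguously so the user can confirm it."""
    return "%d-%s-%04d" % (d.day, MONTHS[d.month - 1], d.year)

today = date(1996, 12, 1)
print(echo(interpret_mdy("02-17-00", today, "past")))    # 17-Feb-1900
print(echo(interpret_mdy("02-17-00", today, "future")))  # 17-Feb-2000
```

The same input yields two different dates depending on the
context, so the echo is what lets the end user catch a wrong
guess while the source material is still at hand.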
Now, where does "my" software make any wrong assumptions...
Ed de Moel is past chairman of the MDC
and works with Jacquard Systems Research.
His experience includes developing software for research in medicine
and physics.
Over the past ten years, Ed has mostly focused on the production
of tools for data management and analysis, and tools for the support
of day-to-day operation of medical systems.
Ed can be
reached by e-mail.