ENCLOSURE 0, PAGE 1

Enclosure 1 shows five miniforms cropped from 1990 Census Long Forms. The
training CD-ROMs will contain between 10,000 and 50,000 of these images
and the corresponding ASCII answer (reference) files. The test CD-ROM will
be distributed in late October with between 10,000 and 50,000 more images
but no reference files. The Conference is scheduled for February of 1994. 

The training and testing materials will be distributed as NIST Multiple
Image Set (MIS) files in a compressed IHEAD format on separate CD-ROMs.
Each MIS file will contain five images (as shown in Enclosure 1). Notice
that there are two different form types that are being extracted as
miniforms. The most obvious difference between them is the relative
location of the large black boxes and the right-most vertical line.
However, the location of the answer field boxes relative to the large
black boxes is also different. 

Participants will be expected to return their classification results
(hypothetical classifications) in a NIST MFS file format as illustrated by
the d00f00.hyp file shown in Enclosure 2. The hypothetical classifications
will be scored against reference classifications like those in file
d00f00.ref of Enclosure 2. The reference files will be included with the
training materials, but not with the test materials. Participants may also
return field-level rejection or confidence files as indicated by the files
named d00f00.rjx and d00f00.con in the SYSTEM_NAME subdirectory tree of
Enclosure 2. As discussed in Enclosure 3, character-level confidence and
rejection files will not be used. More detailed file creation and naming
specifications for the hypothesis, rejection, and confidence files will be
included with the training materials. Finally, the training and test
materials will have identical directory formats to make the transition
from training to testing as smooth as possible. 

Two measures of field accuracy will be used to score the hypothesis files
submitted by Participants. The first of these is the field error fraction,
that is, the fraction of the hypothetical fields that differ in any way
from the reference fields. The second is a measure of the distance between
the hypothetical and reference fields. Both will be plotted as a function
of rejection fraction if either confidence or rejection files are
submitted with the hypothesis files. Enclosure 3 provides more details
about the scoring. 
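
As an illustration only, the two measures and their dependence on
rejection fraction might be sketched as follows. This is not the NIST
scoring software. The function names are hypothetical, the Levenshtein
edit distance is an assumed stand-in for the distance measure (which is
defined in Enclosure 3, not here), and computing the error fraction over
the accepted fields only is likewise an assumption.

```python
from typing import List, Sequence, Tuple


def field_error_fraction(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Fraction of hypothesis fields that differ in any way from the reference."""
    if len(hyp) != len(ref):
        raise ValueError("hypothesis and reference must list the same fields")
    if not ref:
        return 0.0
    wrong = sum(1 for h, r in zip(hyp, ref) if h != r)
    return wrong / len(ref)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: an ASSUMED stand-in for the distance measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def error_vs_rejection(hyp: Sequence[str], ref: Sequence[str],
                       conf: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (rejection fraction, field error fraction) pairs.

    Fields are rejected in order of increasing confidence, and the error
    fraction is taken over the fields that remain (an assumption).
    """
    order = sorted(range(len(hyp)), key=lambda i: conf[i])
    n = len(order)
    points = []
    for k in range(n):  # reject the k least-confident fields
        kept = order[k:]
        err = field_error_fraction([hyp[i] for i in kept],
                                   [ref[i] for i in kept])
        points.append((k / n, err))
    return points


if __name__ == "__main__":
    # Made-up field values and confidences, for illustration only.
    ref = ["SMITH", "TEACHER", "NURSE", "FARMER"]
    hyp = ["SMITH", "TEACHEP", "NURSE", "FARNER"]
    conf = [0.9, 0.2, 0.8, 0.6]
    print(field_error_fraction(hyp, ref))
    print([edit_distance(h, r) for h, r in zip(hyp, ref)])
    print(error_vs_rejection(hyp, ref, conf))
```

A rejection file could be handled in the same sketch by giving rejected
fields the lowest confidence, which yields a single point on the curve
rather than a full sweep.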

Enclosure 3 also makes some points about the contents of the images in
Enclosure 1 that may affect participation and scoring. Beyond those
points, however, it should be mentioned that the images in Enclosure 1 are
among the best in image quality in the hand-print region that have been
produced to date. The poorest quality images will be removed from the
training and test sets with an automated procedure, but what remains will
include images of poorer quality than those shown in Enclosure 1, and
poorer quality hand print as well. To get an idea of the range of image
and print quality, you may obtain by anonymous ftp a representative sample
of the types of images that will be sent for training and for testing from 

sequoyah.ncsl.nist.gov, IP 129.6.61.25. 

More details can be found in Enclosure 2. This site will also have a
whatsnew subdirectory in which important dates and other information
will be posted as they become available. Most conference
activities will be run using the anonymous ftp site. 

Enclosure 4 gives the format for a letter of application to participate in
the 2nd Conference. Anyone who sends me a signed copy of this letter before
the training data is sent out for writing on CD-ROM will receive the
training materials and test materials when they are sent out. The training
data may be ready for writing on CD-ROM by July 15. As soon as a firm date is
set, it will be posted in the whatsnew subdirectory mentioned above. 

The Committee reserves the right to distribute the training and test
materials to anyone who returns the form letter after the date specified
above, depending upon the availability of these materials. There may also
be restrictions on the number of participants and colleagues from a single
organization that can actually attend the meeting, and the Committee may
request that a single participant from a single organization represent the
entire organization and all of its systems. 

Notice that the enclosed application format requires the applicant to sign
a statement that he or she agrees to abide by the rules of participation
stated in Enclosure 5. Finally, Enclosure 6 is a draft of a form for
describing your system that will be sent with the test materials; it is to
be returned at the same time as your test results. If an applicant fails
to provide the information requested on this form (presumably because it
is proprietary), that applicant will still be allowed to submit results,
and attend the main meeting, but may not be allowed to attend sessions
where participants who have provided this type of information describe
their systems and their participation in the conference. The decision on
this matter will be made on the basis of how many participants provide the
requested information and how many do not. In case the number of
applicants exceeds the capacity of the meeting facilities, the Committee
reserves the right to limit attendance to those participants (and a number
of colleagues to be decided) submitting results that exceed a performance
threshold chosen to fill the meeting room. This decision will be made at
the discretion of the Conference Committee, which may nevertheless poll
the participants for their views on this issue.

Comments or suggestions may be sent to me at 

geist@magi.ncsl.nist.gov

or 

Jon Geist, (301) 590-0932 (FAX).

Please do not suggest that we use any format other than MIS and IHEAD,
as changes from these formats are not practical. Also, if there is a
large volume of comments, you may not receive a personal reply to your
comments, but they will be taken into account in the final plans for the
Conference. 

Requests for technical information about the data and other information at
the FTP site should be addressed to 

urt@magi.ncsl.nist.gov

or

R. Allen Wilkinson, (301) 590-0932 (FAX).




