This directory contains a version of the MUC_scorer program that was
modified for the HUB-4 IE Spoke evaluation.  THE PARTS FOR SCORING
OTHER TASKS, E.G. CO, ST, TE, AND TR HAVE NOT BEEN TESTED AT ALL.


INSTRUCTIONS FOR INSTALLATION

1. Assume that the ieeval package is in a directory named $IEEVAL.

2. Change to the src subdirectory of this directory:

   unix% cd $IEEVAL/software/MUC_scorer/src

3. Run "make":

   unix% make

4. The ieeval pipeline assumes that the MUC_scorer program remains in
   the src subdirectory, so DON'T run "make install".

READING THE MANUAL

There is an HTML manual in 

    $IEEVAL/software/MUC_scorer/doc/mucscore.htm

It has not been brought up to date for use in the ieeval package.  See
"NEW CONFIGURATION OPTIONS" below for a list of the configuration file
options that have been added since the last version (3.3) of
MUC_scorer.

CHANGES SINCE VERSION 3.3

  NEW FILL TYPES
      There are two new fill types for this version:  the
      "extent" fill and the "lexseq" fill.  The extent fill consists
      of two integers connected by a period, e.g. "333.335".  The
      first integer is the coordinate of the start of the extent, and
      the second integer is the coordinate of the end of the extent.
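
As an illustration of the extent format described above, here is a
minimal Python sketch that splits an extent fill into its two
coordinates.  It is not part of MUC_scorer; the function name
parse_extent is hypothetical, and the sketch assumes the fill contains
exactly one period and nothing else.

```python
from typing import Tuple


def parse_extent(fill: str) -> Tuple[int, int]:
    """Split an extent fill such as "333.335" into (start, end).

    Hypothetical helper, not taken from the MUC_scorer sources.
    """
    start_text, sep, end_text = fill.strip().partition(".")
    if not sep:
        raise ValueError("extent fill has no period: %r" % fill)
    # int() raises ValueError if either side is not an integer.
    return int(start_text), int(end_text)


if __name__ == "__main__":
    print(parse_extent("333.335"))
```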

      The lexseq fill, used in the "content" metric of the HUB
      evaluation, is a sequence of lexemes, each terminated by a pair
      of integers in square brackets, e.g.

        "O[333,0]'[333,0]BRIEN[333,0]"

        "%HESITATION[452,1]"

      The first integer in the pair is the lexeme's offset, and the
      second integer is the number of lexemes to ignore if the current
      lexeme doesn't match in a comparison with the ref or hyp.  The
      second number is used to implement optional lexemes, such as
      fragments or pause fillers.
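
The lexseq format above can be read with a short Python sketch such as
the following.  It is an illustration only, not the parser used by
MUC_scorer: the names Lexeme and parse_lexseq are hypothetical, and the
sketch assumes that the surrounding quotes have already been removed
and that every lexeme is followed by exactly one "[offset,skip]" pair.

```python
import re
from typing import List, NamedTuple


class Lexeme(NamedTuple):
    text: str    # the lexeme itself, e.g. "BRIEN"
    offset: int  # first integer in the bracketed pair
    skip: int    # second integer: lexemes to ignore on a mismatch


# One lexeme: shortest possible text followed by "[int,int]".
_LEXEME_RE = re.compile(r"(.+?)\[(\d+),(\d+)\]")


def parse_lexseq(fill: str) -> List[Lexeme]:
    """Parse a lexseq fill into a list of Lexeme records."""
    lexemes = []
    pos = 0
    for match in _LEXEME_RE.finditer(fill):
        if match.start() != pos:
            raise ValueError("unexpected text at position %d" % pos)
        lexemes.append(
            Lexeme(match.group(1), int(match.group(2)), int(match.group(3)))
        )
        pos = match.end()
    if pos != len(fill):
        raise ValueError("trailing text at position %d" % pos)
    return lexemes


if __name__ == "__main__":
    for lexeme in parse_lexseq("O[333,0]'[333,0]BRIEN[333,0]"):
        print(lexeme)
```

A lexeme whose skip value is greater than 0, such as the
"%HESITATION[452,1]" example, is one that a comparison may pass over.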

  NEW CONFIGURATION OPTIONS

     :ne_allow_incorrect -
         If "yes", two tags of the same type
         will be matched, even if they have nothing correct.  Default
         is "no".
     :ne_offset_tolerance -
         The difference allowed between two correct
         offset values.  Default is 0.
     :sgml_offset_slot -
         The name of the slot containing the offset of
         an NE tag.  This is used in the pre-condition test, to see
         if two tags have any overlap at all.
     :sgml_EXTENT_slot -
         The name of the slot containing the EXTENT values.
     :report_comment_string -
         In the NE tag-by-tag output, the character printed in the
         first column of the headings, to distinguish them from
         data.  Defaults to "#".
     :report_lexeme_separator -
         In the NE tag-by-tag output, the character which delimits
         lexemes in content fills.  The default is " ", but note
         that this is equal to "", so the lexemes will all be stuck
         together.
     :sgml_CONTENT_slot -
         The slot containing the CONTENT in an NE tag.
     :extract_from_text -
         If "yes", the text between tags is extracted into the TEXT
         slot, as in the MUC string fills.  If "no", the text is
         assumed to be in the "content" attribute inside the SGML tags.
     :use_offsets_in_content_matching -
         If "yes", two lexemes in a content fill will only match if
         they also have the same offset.  If "no", their offsets are
         not compared.
     :fragment_prefix_char -
         The character which marks a lexeme as a fragment when it is
         the first character of the lexeme.
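
To make the intent of :ne_offset_tolerance and
:use_offsets_in_content_matching concrete, here is a rough Python
sketch of the comparisons they describe.  It is not taken from the
MUC_scorer sources: the function names are hypothetical, the tolerance
is assumed to be an inclusive bound on the absolute difference, and the
offset check in content matching is assumed to be an exact comparison.

```python
from typing import Tuple


def offsets_match(key_offset: int, response_offset: int,
                  tolerance: int = 0) -> bool:
    """True if two offsets differ by no more than the tolerance.

    With the default tolerance of 0 the offsets must be identical.
    """
    return abs(key_offset - response_offset) <= tolerance


def lexemes_match(key: Tuple[str, int], response: Tuple[str, int],
                  use_offsets: bool) -> bool:
    """Compare two (text, offset) lexemes.

    The text must always be equal; the offsets are compared only when
    use_offsets is True (assumed here to require exact equality).
    """
    if key[0] != response[0]:
        return False
    if use_offsets:
        return key[1] == response[1]
    return True


if __name__ == "__main__":
    print(offsets_match(333, 335, tolerance=2))
    print(lexemes_match(("BRIEN", 333), ("BRIEN", 340), use_offsets=False))
```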


QUESTIONS OR COMMENTS?

   Please email 

          douthat@gso.saic.com


