MER-12 Dry Run Data: MER-12 Dry Run Test Collection
June 29, 2012

1.0 Introduction 

    This release contains the MER Dry Run test materials
    for the 2012 evaluation cycle.  It contains the textual materials
    that accompany a MER test collection and links to the event kit
    training videos in LDC release LDC2012E01.  The links are relative
    to the top level of the archive.  To use the links, unpack the archive
    into your existing LDC2012E01 directory.

    The structure of this distribution is identical to the release
    structure of the MER-12 Test Collection data, except that the
    MER-12 Test Collection will contain the test videos.

    Please refer to the MER '12 evaluation plan (available at
    http://www.nist.gov/itl/iad/mig/mer12.cfm) for more information.

2.0 Corpus Structure and Contents 

    Video files are located in the video/ directory.  Each
    subdirectory that contains video files also contains CHECKSUMS, a
    file containing md5 checksums for the video files in that
    subdirectory.
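
    The checksum check described above can be sketched in Python as
    follows.  This is a minimal sketch, not a tool shipped with this
    release.  It ASSUMES that each line of CHECKSUMS has the form
    "<md5 hex digest> <filename>", as produced by md5sum; the actual
    layout of the file may differ.  The function names md5_of_file and
    verify_checksums are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(directory, manifest_name="CHECKSUMS"):
    """Return the names of listed files that are missing or do not match.

    ASSUMPTION: each manifest line is '<md5hex> <filename>' (md5sum style).
    """
    directory = Path(directory)
    failures = []
    for line in (directory / manifest_name).read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = directory / name
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(name)
    return failures
```

    Calling verify_checksums("video/E022") would return an empty list
    if every listed file matches its recorded checksum.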

    In the video/ directory you will find 5 subdirectories labeled
    E022/, E026/, E027/, E028/, E030/.  Each subdirectory corresponds
    to one MED event.  Inside each subdirectory are 6 .mp4 files, each
    showing a positive instance of that event.

    In the databases/ directory, you will find 3 .csv files containing
    metadata:

    MER12DRYRUN_DATE_ClipMD.csv contains general clip metadata,
    including a unique ClipID and the associated MEDIA_FILE, CODEC,
    MD5SUM and DURATION, where:
        - ClipID: unique numerical ID for this video/metadata entry
        - MEDIA_FILE: filename of the media file corresponding to
          this entry
        - MD5SUM: MD5 checksum of the media file corresponding to
          this entry
        - DURATION: duration, in seconds, of the media file
          corresponding to this entry
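
    The clip metadata file can be read with a standard CSV parser.
    The sketch below ASSUMES that the file is comma-delimited and that
    its first row is a header using the column names listed above;
    neither is confirmed here.  The function names are hypothetical.

```python
import csv


def load_clip_metadata(csv_path):
    """Map each ClipID to its metadata row (a dict keyed by column name).

    ASSUMPTION: the first row is a header containing 'ClipID'.
    """
    with open(csv_path, newline="") as handle:
        return {row["ClipID"]: row for row in csv.DictReader(handle)}


def total_duration_seconds(clips):
    """Sum the DURATION column (seconds) over all loaded clips."""
    return sum(float(row["DURATION"]) for row in clips.values())
```

    The MD5SUM value of each row can then be compared against a
    digest computed from the corresponding MEDIA_FILE.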

    MER12DRYRUN_DATE_EventDB.csv contains the names and IDs of the
    events in the data set:
        - EventID: one of the event IDs, or NULL
        - EventName: the textual name of the event

    MER12DRYRUN_DATE_Ref.csv is the ground truth file used by the DEVA
    scorer:
        - TrialID: the ID used to distinguish each entry that is to
          be scored (also present in TrialIndex)
        - Targ: a 'y' or 'n' value that specifies whether the entry
          is a "true" occurrence

    In the doc/ directory, you will find event_id.map, a file that
    shows the mapping between each numerical event ID and its textual
    event name.
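
    As a quick sanity check on the ground truth file
    MER12DRYRUN_DATE_Ref.csv described above, the 'y' and 'n' values
    in its Targ column can be tallied.  The sketch below ASSUMES a
    comma-delimited file whose first row is a header containing
    TrialID and Targ.  It is not part of the DEVA scorer, and the
    function name count_targets is hypothetical.

```python
import csv
from collections import Counter


def count_targets(ref_csv_path):
    """Count how many trials carry each Targ value ('y' or 'n').

    ASSUMPTION: the first row is a header containing 'Targ'.
    """
    with open(ref_csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        return Counter(row["Targ"].strip().lower() for row in reader)
```

    Any key other than 'y' or 'n' in the returned counter indicates a
    row that does not follow the description above.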

3.0 Data Format

    All video files are in MPEG-4 format and use H.264 video encoding
    and AAC audio encoding.  Video resolution and audio/video bitrates
    are retained as found in the original harvested files to the
    extent possible.

    All other files are .txt, .csv, or .map.

4.0 Data Use/Licensing

    All data in this release have been reviewed for license or use
    restrictions incompatible with the MED-12 program and have been
    judged as usable for MED-12.  This release is subject to the terms
    of the MED-12 development license agreement between users and LDC.

5.0 Copyright Notice

    Portions (c) 2012 Trustees of the University of Pennsylvania

6.0 Contacts

    Amanda Morris (Project Manager) 
    Chris Caruso (Collection) 
    Haejoong Lee (Annotation Infrastructure) 
    Kevin Walker (Collection)
    Denise DiPersio (IPR/Licensing)
    Alonso Indacochea (Annotation)
    Jonathan Fiscus (NIST)
