MED-11 Documentation and Metadata:
MED10, DEVT, EventKits, and MED11 Test Set

LDC2011E42

May 26, 2011


1.0 Introduction
================

LDC2011E42 is a web download corpus comprising the release of
documentation and metadata for the MED track of TRECVID 2011.
Distribution is restricted to registered MED-11 participants who have
completed an evaluation license with LDC.

This release contains documentation and metadata files accompanying
video files released as LDC catalog item numbers LDC2011E06 and
LDC2011E41.

The release contains metadata files for five video collections:

  1. MED10TRN  - the MED '10 Training Set,
  2. MED10EVAL - the MED '10 Evaluation Set,
  3. DEVT      - the MED '11 Training resources,
  4. EVENTS    - the MED '11 Event Kit resources,
  5. MED11TEST - the MED '11 Test Set

Permitted uses of the collections are documented in the MED-11
evaluation plan and summarized in Section 5 of this document.
Evaluation participants should contact NIST with any questions.


2.0 Corpus Directory Structure and Contents
===========================================

events
------

In the events/ directory you will find 15 subdirectories labeled
E001-E015, each subdirectory corresponding to a MED-11 event. Within
each subdirectory is a text file containing the event kit for that
MED-11 event.

doc
---

In the doc/ directory, you will find two files:

  1) Event_id.map - shows the mapping between each numerical event ID
     and its textual event name.

  2) Clip_location_lookup_table.csv - gives the location of every clip
     in either LDC2011E06 or LDC2011E41. It consists of four columns,
     with headers:

     a) ClipID, the unique numerical identification number for the
        clip.
     b) Banned_Status, indicating whether the clip is banned from use
        or not banned. Per the license agreement, files listed as
        "banned" must be deleted from your system.
     c) CatalogID, the LDC catalog number for the latest release
        containing the clip.
     d) Directory_Path, listing the directory path to the clip,
        relative to the root of that release.

database
--------

The databases/ directory contains tables describing five MED video
clip collections. Five tables are provided for each collection using
the following naming convention:

  <COLLECTION>_<DATE>_<TYPE>.csv

where:

  <COLLECTION> is one of the collection names
  <DATE>       is the version date of the file
  <TYPE>       is the data content type. It is one of "ClipMD",
               "JudgementMD", "Ref", "TrialIndex", or "EventDB"

Note that the "Ref" and "JudgementMD" files for the "MED11TEST"
collection are NOT in this release and will be released after the
MED '11 evaluation.

Each data content type is defined as follows:

*_ClipMD.csv contains general clip metadata, including the following
columns:

  ClipID MEDIA_FILE CODEC MD5SUM DURATION

where:

  - ClipID: unique numerical ID for this video/metadata entry
  - MEDIA_FILE: filename for the media file corresponding to this
    entry
  - MD5SUM: MD5 checksum for the media file corresponding to this
    entry
  - DURATION: duration in seconds for the media file corresponding to
    this entry

*_EventDB.csv is the Event Description Metadata file, containing the
following columns:

  EventID EventName

where:

  -EventID: unique numerical ID for the event
  -EventName: textual name for the event

*_TrialIndex.csv is the list of detection trials a system must
perform. Each trial consists of a TrialID, which is a unique
identifier, and an EventID/ClipID pair. The file contains three
columns:

  TrialID ClipID Event

where:

  - TrialID: the unique ID of the trial.

*_Ref.csv is the Ground Truth file used by the DEVA scorer. The file
contains two columns:

  TrialID Targ

where:

  - Targ: a 'y' or 'n' value that specifies whether the entry is a
    "true" occurrence or not

*_JudgementMD.csv is the ClipID-Level Event Annotation Metadata file.
This file contains the results of the annotation process. Data scouts
(the annotators) performed both clip-based annotations (synopsis,
genre, etc.)
and event-specific annotations ("Does this clip contain event X?").
All ClipIDs are present in the *_JudgementMD.csv file for that
collection, but ClipIDs may occur twice if two event-specific
judgements were made.

The *_JudgementMD.csv files for the MED10TRN and MED10EVAL clip
collections contain a different inventory of columns than the
*_JudgementMD.csv files for the other three clip collections. The
column inventories are:

MED-10 judgement files:

  ClipID EventID INSTANCE_TYPE SYNOPSIS GENRE TOPIC SCENE OBJECTS
  ACTIVITIES INSTANCE_VARIETY INSTANCE_COMPLEXITY AUDIO_EVIDENCE
  TEXT_EVIDENCE NON_ENG_SPEECH NON_ENG_TEXT

All other Judgement files:

  ClipID EventID INSTANCE_TYPE SYNOPSIS GENRE TOPIC OBJECTS ACTIVITIES
  INSTANCE_VARIETY INSTANCE_COMPLEXITY AUDIO_EVIDENCE SCENE
  TEXT_EVIDENCE NON_ENG_SPEECH NON_ENG_TEXT NARRATIVE_AUDIO
  NARRATIVE_TEXT INSTANCE_COMMENT

where:

  -EventID: is one of the events or "NULL". The "NULL" event indicates
   no event-specific judgement was made for the ClipID - this is a
   background clip.

  -INSTANCE_TYPE: This field records the results of the event-specific
   annotation with regard to the event in the "EventID" column. The
   field has five possible values from most restrictive to least:

     o positive  : the clip is a true instance of the specified event
     o near_miss : the clip may superficially appear to be a positive
                   instance of the specified event, but in fact lacks
                   sufficient evidence to constitute a positive
                   instance
     o not_sure  : annotator could not decide between
                   positive/near_miss instance.
     o related   : the clip contains one or more of the same or
                   similar types of people, objects, locations, and/or
                   actions associated with the target event, but does
                   not meet the requirements to be a positive instance
     o NULL      : "NULL" events have the NULL INSTANCE_TYPE

   Not all clips were judged against all INSTANCE_TYPEs. Where clips
   were judged against multiple INSTANCE_TYPEs, the most restrictive
   membership was used.

  -SYNOPSIS: brief freeform summary of the clip content written by
   data scouts
  -GENRE: subjective data scout judgement about the clip's genre
  -TOPIC: general topic category for this clip assigned by data scout
  -SCENE: optional freeform description by data scout of the
   scene/setting for the clip
  -OBJECTS: optional freeform description by the data scout of the
   objects/people shown in the clip
  -ACTIVITIES: optional freeform description by the data scout of the
   activities shown in the clip
  -INSTANCE_VARIETY: subjective judgement by data scout about whether
   this clip is more unusual than other positive instances of this
   event
  -INSTANCE_COMPLEXITY: subjective judgement by data scout about
   whether this clip is more difficult/complex than other positive
   instances of this event
  -AUDIO_EVIDENCE: does this clip contain audio evidence supporting
   the event
  -TEXT_EVIDENCE: does this clip contain text evidence supporting the
   event
  -NON_ENG_SPEECH: does this clip contain non-English speech
  -NON_ENG_TEXT: does this clip contain non-English text
  -NARRATIVE_AUDIO: does the clip include someone explaining or
   describing (some or all of) the event step-by-step as they perform
   it. Empty when the Event is NULL. May be empty for some non-NULL
   events.
  -NARRATIVE_TEXT: does the clip include text explaining or describing
   (some or all of) the event step-by-step as it is performed. Empty
   when the Event is NULL. May be empty for some non-NULL events.
  -INSTANCE_COMMENT: optional comment providing additional information
   about why the clip was labeled as a near_miss instance.


3.0 Data Format
===============

All files are .txt, .map, or .csv.


4.0 Data Use/Licensing
======================

All data in this release have been reviewed for license or use
restrictions incompatible with the MED-11 program and have been judged
as usable for MED-11. This release is subject to the terms of the
MED-11 license agreement between users and LDC.
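
As an illustration of the deletion requirement for banned clips (see
the Banned_Status column of doc/Clip_location_lookup_table.csv in
Section 2.0), the following minimal Python sketch lists the banned
rows. It assumes the file is a comma-separated table with a header row
using the four column names listed in Section 2.0, and that banned
rows carry the literal value "banned". Neither the exact status
strings nor the sample rows below come from the release, and the
function name banned_clips is hypothetical.

```python
import csv
import io


def banned_clips(csv_file):
    """Return (ClipID, CatalogID, Directory_Path) for rows marked banned.

    Assumption: the Banned_Status column holds the literal string
    "banned" for banned clips. An exact match is used so that a value
    such as "not banned" is not picked up by accident.
    """
    banned = []
    for row in csv.DictReader(csv_file):
        status = (row.get("Banned_Status") or "").strip().lower()
        if status == "banned":
            banned.append(
                (row["ClipID"], row["CatalogID"], row["Directory_Path"])
            )
    return banned


# Made-up sample rows for illustration only; real ClipIDs, status
# strings, and directory paths may look different.
SAMPLE = """ClipID,Banned_Status,CatalogID,Directory_Path
000001,not banned,LDC2011E06,video/a
000002,banned,LDC2011E41,video/b
"""

result = banned_clips(io.StringIO(SAMPLE))
for clip_id, catalog_id, path in result:
    print(clip_id, catalog_id, path)
```

To run this against the real table, open the file with
open(path, newline="") and pass the file object to the function. Check
the actual values in the Banned_Status column before relying on the
match.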


5.0 MED Data Use Guidance
=========================

For the purpose of describing permitted uses, the five distributed
video collections can be grouped as follows:

  -MED-10 data resources (MED10TRN and MED10EVAL)
  -Training event kits (EVENTS: E001-E005)
  -Testing event kits (EVENTS: E006-E015)
  -Development clip collection (DEVT)
  -TRECVID MED '11 Evaluation clip collection (MED11TEST)

The listing below summarizes how participants are to interact with the
data resources.

  -Training event kits + MED-10 data resources: Unrestricted use
  -Testing event kits + MED-10 data resources: Unrestricted use
  -Training event kits + DEVT collection: Unrestricted use
  -Testing event kits + DEVT collection: Unrestricted use
  -Training event kits + MED11TEST collection: NO RESEARCH PERMITTED
  -Testing event kits + MED11TEST collection: Restricted to
   preparatory use and blind evaluation use

"Unrestricted use" means participants may listen, watch, or process
any of these elements separately or together. This is covered in
Section 3.1 (Events) and Section 3.2 (DEVT) of the evaluation plan.
The soon-to-be-released content of Section 3.2 will be:

    Two multimedia collections will be provided for training which
    participants may use for research, development testing, and error
    analysis of development testing. Both collections will include
    truth data. There are no MED-11 evaluation rules or restrictions
    governing how participants are to use the training resources in
    conjunction with the training and evaluation event kits.

"Preparatory use" means the data was provided early in the evaluation
cycle so that participants can automatically build their metadata
store for the test collection over the summer rather than in a short,
specific time window in the fall. This data may NOT be manually
explored, viewed, or automatically analyzed in any way beyond its use
during the automatic metadata generation process. The full details are
covered in Section 3.3 of the evaluation plan.

"Blind evaluation use" means that the participant's event agent is
executed only after both the event agent and the metadata store have
been frozen.


6.0 Copyright Notice
====================

Portions (c) 2011 Trustees of the University of Pennsylvania


7.0 Contacts
============

Stephanie Strassel (Project Management)
Haejoong Lee (Annotation Infrastructure)
Chris Caruso (Collection)
Kevin Walker (Collection)
Yuvi Masory (Collection)
Denise DiPersio (IPR/Licensing)
Amanda Morris (Annotation)

----
README Created by Amanda Morris May 26 2011
Edited by Amanda Morris June 20 2011