Q1: Can knowledge of the dev test set be used for the eval?  E.g., can the development test lexicon and audio be used to train models?

Ans: In a real application, one would very likely do so.  However, the existing evaluation participants unanimously agreed NOT to do so because they prefer to focus on other, language-oriented techniques for improving performance rather than on adding data.

Q2: The evaluation plan says that a detection "score" is provided for each keyword hit.  What is the expected nature of the score: a probability, a likelihood score, or something else?

Ans: The evaluation plan does not specify the scale or domain of the detection score, and the evaluation code neither expects nor rejects any particular scale/domain.  However, the scale/domain you choose should be consistent across keywords; otherwise your ATWVs will be lower than they would be had you properly normalized the detection scores across keywords.  Further, a single YES/NO decision threshold must be applied to all keywords; the evaluation tool will not score the system output if this check fails.
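
As an illustration of the point above, here is a minimal Python sketch of one possible way to put scores on a common scale across keywords and then apply a single threshold.  It assumes non-negative raw scores and uses sum-to-one normalization per keyword, which is only one of several normalization schemes and is not prescribed by the evaluation plan.  The function names, the hit tuple layout, and the consistency check are hypothetical; the check only approximates the idea of a single threshold and is not the evaluation tool's actual logic.

```python
from collections import defaultdict


def sum_to_one_normalize(hits):
    """Rescale raw scores so that each keyword's scores sum to 1.

    hits: list of (keyword, file_id, start_time, raw_score) tuples,
    a hypothetical layout; raw_score is assumed to be non-negative.
    """
    totals = defaultdict(float)
    for keyword, _, _, score in hits:
        totals[keyword] += score
    normalized = []
    for keyword, file_id, start, score in hits:
        total = totals[keyword]
        normalized.append(
            (keyword, file_id, start, score / total if total > 0 else 0.0)
        )
    return normalized


def decide(hits, threshold):
    """Apply ONE global threshold to every keyword's hits."""
    return [
        (keyword, file_id, start, score, "YES" if score >= threshold else "NO")
        for keyword, file_id, start, score in hits
    ]


def uses_single_threshold(decided):
    """Sanity check: every YES score must exceed every NO score."""
    yes_scores = [hit[3] for hit in decided if hit[4] == "YES"]
    no_scores = [hit[3] for hit in decided if hit[4] == "NO"]
    if not yes_scores or not no_scores:
        return True
    return min(yes_scores) > max(no_scores)


# Hypothetical hits for two keywords whose raw scores live on different scales.
hits = [
    ("kw1", "file_a", 1.0, 8.0),
    ("kw1", "file_a", 5.0, 2.0),
    ("kw2", "file_b", 3.0, 3.0),
    ("kw2", "file_b", 9.0, 1.0),
]
decided = decide(sum_to_one_normalize(hits), threshold=0.5)
print(uses_single_threshold(decided))
```

Running a check of this kind before submission may help catch per-keyword thresholds that slipped into a system's output.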

Q3: Can we participate in only the KWS Task?

Ans: Absolutely!  It is not required to participate in the STT tasks.

Q4: Is the Babel Data available for purchase?

Ans: At the moment, only the 2013 Babel Surprise Language data is being released to OpenKWS participants.  We are working on the public release plan but have no concrete details at this time.

Q5: Does the use of existing speech/non-speech models and the like qualify for the BaseLR condition?

Ans: No. The BaseLR condition is a strict, 'flat-start' condition so that we can quantify the effects of adding additional data and models.

Q6: Will the lexicon and transcriptions be provided?

Ans: Yes, the build pack will have a lexicon with phonetic spellings.  There will be a Language Specific Peculiarities (LSP) document supplied with the build pack that provides a wealth of information about the language.  Make sure you start there.

Q7: Will we get the voice-activity-detection marks, since large portions of CTS audio are silent?

Ans: No. Speech activity detection is part of the "task", so the evaluation data will not come supplied with speech-activity detection time marks. The evaluation data will be supplied as the audio data and an Experiment Control File (ECF), which indicates the data to process.  That is not to say teams could not share technology or outputs for speech activity detection.

Q8: Will the development set include word-level time stamps for us to tune our systems for the eval data?

Ans: Yes.  In the build pack, the development test set (15 hrs of data) will have force-aligned word times.  The training data (80 hrs.) will not.  

Q9: The evaluation documentation states we will be provided with the lexicon and transcriptions, but in LDC2013E27 (Babel-structured STD 2006 data), the transcriptions and reference material folders are empty. Is NIST planning to give us this information at a later time? Or is NIST only intending to provide the audio?

Ans: First, the force-aligned word times for LDC2013E27 are in the RTTMs of the IndusDB.20130305.OpenKWS.tar.bz2.  Please take a look there.  There is no lexicon supplied with the LDC2013E27 data because a lexicon was not originally built for that data set.

As an aside, LDC2013E27 is intended as a primer to get folks using the evaluation tools, getting their Indus account established, and getting used to the evaluation concepts before the real data comes out.  We left out the original transcripts (which are in a very different form than the Babel data) so that folks won't waste time with those transcripts.  There are tools in F4DE to process the Babel transcripts.
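
For those who want to pull the force-aligned word times mentioned above out of an RTTM with their own scripts, the following Python sketch may be a starting point.  It assumes the common whitespace-separated RTTM layout in which the first six fields are record type, file, channel, begin time, duration, and orthography, and in which words are carried on LEXEME records.  Please verify this against the actual IndusDB files; the function name and the sample lines are hypothetical and are not taken from LDC2013E27.

```python
def parse_rttm_words(lines):
    """Collect word-level times from RTTM lines.

    Assumed field order (verify against the real files):
    type file channel tbeg tdur ortho ...
    Only LEXEME records are kept; comments and other types are skipped.
    """
    words = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "LEXEME":
            continue
        _, file_id, channel, tbeg, tdur, ortho = fields[:6]
        start = float(tbeg)
        words.append(
            {
                "file": file_id,
                "channel": channel,
                "start": start,
                "end": start + float(tdur),
                "word": ortho,
            }
        )
    return words


# Hypothetical sample lines, for illustration only.
sample = [
    ";; hypothetical example lines",
    "SPEAKER demo_file 1 0.00 5.00 <NA> <NA> spk1 <NA>",
    "LEXEME demo_file 1 0.50 0.25 hello lex spk1 0.9",
    "LEXEME demo_file 1 0.75 0.50 world lex spk1 0.9",
]
for w in parse_rttm_words(sample):
    print(w["word"], w["start"], w["end"])
```

With a real file, the lines could come from opening one of the RTTMs unpacked from the IndusDB archive and iterating over it.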