Q1-part1: Can information gleaned from the web be used for the BaseLR condition?
A: Yes, definitely. Leveraged knowledge does not factor into the BaseLR/OtherLR distinction.
Q1-part2: Can an online rule-based morphological analyzer be used for BaseLR?
A: No. Any other tool, including one of your own trained on other resources, pushes you over to OtherLR because you are using a non-BaseLR resource to train a component. If, however, you build a component for another language, throw away the model, and build a new model using only the BaseLR, the component can be used as a BaseLR component. In the particular case of a rule-based system, the rules are the model, so using it would make the system OtherLR.
Q: Will we ever get transcripts for the untranscribed training data?
A: No. There are no transcripts for those files.
Q: How many keywords can we expect?
A: About 4K keywords, maybe more.
Q: Will dev keywords be provided?
A: No. No keywords will be provided for the dev set this year. Developing strategies for tuning KWS is important, so teams should develop their own keywords. Two teams have agreed to provide a set of keywords for the community to share; they will be released in an IndusDB and put on the scoring server.
Q: The OpenKWS Workshop is in July. Is it possible to videoconference in?
A: It may be hard to do at a hotel, but we will check. We will certainly send out slides.
Q: How does this year differ from last year's surprise language evaluation?
A:
- 80 hrs of training audio, of which 60 hrs are transcribed (the Vietnamese training data was all transcribed).
  Note: There is a 10-hr transcribed subset of the training data for the sub-train condition. (Babel performers are using this condition for program goals.)
- 10 hrs of dev.
- 75 hrs of eval.
- There are only 3 weeks between the Build Pack release and the Eval Pack unlock dates.
Q: What can you say about OOV rate?
A: There will be OOV keywords. The rate is language-dependent; we will tell you the fraction.
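Since teams are building their own dev keyword lists, it can help to measure the OOV fraction of those lists yourself. The sketch below assumes a keyword counts as OOV if any of its whitespace-separated words is missing from the training lexicon, and that lexicon entries match keyword tokens exactly (no case folding or normalization). The function name and inputs are hypothetical and not part of any NIST tool.

```python
def oov_keyword_fraction(keywords, lexicon):
    """Return the fraction of keywords containing at least one OOV word.

    Assumption: a keyword is OOV if any whitespace-separated token is
    absent from the lexicon; tokens are compared exactly as written.
    """
    if not keywords:
        return 0.0
    vocab = set(lexicon)
    # Count keywords with at least one token missing from the vocabulary.
    oov_count = sum(
        1 for kw in keywords if any(tok not in vocab for tok in kw.split())
    )
    return oov_count / len(keywords)


print(oov_keyword_fraction(["hello world", "foo", "bar baz"], ["hello", "world", "bar"]))
# → 0.6666666666666666
```

Here "foo" and "bar baz" are OOV ("baz" is missing), so 2 of 3 keywords are OOV. Real lexicons may need script-specific normalization before matching.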
Q: What will be released on April 2? Specifically, when will an IndusDB be released and what will it contain?
A:
- On April 2: the Build Pack (LSP, training and development test audio with transcripts, and lexicon).
- On April 2: an initial IndusDB with the conversational development test reference transcripts.
- After NIST receives the contributed keyword lists: a second IndusDB with the two keyword lists.
Q: Is the LimitedLP condition only for Babel performers?
A: No, everyone can participate in the LimitedLP condition. We wanted everyone to know that it is a focus of the Babel program.
Q: Are we able to define new data conditions?
A: Yes, and you are encouraged to. See Appendix I, "Experimental Conditions for KWS Evaluations", of the evaluation plan for how to specify a new data condition. Please send NIST the definition of the data condition, and NIST will post it on the OpenKWS web site so that others are aware of it and can potentially run the same condition.