RT07 Telecon minutes
11/14/2006 11 AM EST

ICSI:  Chuck Wooters, Adam Janin
IBM:   Makis Potamianos, Etienne ?, Jing Huang
NIST:  Jon Fiscus
UKA:   John McDonough, Cedric Rochet, Matthias Wolfel, Sebastien Stuker
CMU:   Susi Burger
LIMSI: Claude Barras, Lori Lamel

Action items indicated with -->
Decisions are marked with **

AGENDA:

I. Review of old action items

   A. NIST releases an evaluation plan which includes the definition of
      Speaker Attributed STT
      - desire for as much overlapping speech as possible
      --> JF will publish this by end of November

   B. NIST polls the community to discontinue the support of SAD
      - One person objected to the discontinuation, but since they may
        not participate at all, it is their opinion
      ** All agree: this task will be discontinued

   C. NIST starts a discussion on whether or not coffee breaks are
      in-domain and therefore eligible to be in the test set.
      - some discussion of this point.
      - proposal to "table" discussion until folks can look at the data
      -- Cedric: "coffee breaks" already labeled.
         JF: CLEAR folks prob want this in test set.
         Makis: Why must RT07 group be same as CLEAR?
         JF: That way the data is multiply annotated.
         Makis: use a "don't care" condition for coffee breaks.
         JF: Or run it and break it out completely, treat as separate
             subdomain. Is it already transcribed?
         Cedric & Susi: Yes, all of it.
      --> Cedric will make available a 5-minute coffee break sample of
          medium difficulty (not too much crazy cross-talk and not a
          lack of discussion) in addition to the dev data.
          NOTE: will take around 2 weeks to download the dev data
      --> Everyone will look at some of the data so that we can make an
          informed decision. Discuss next time.

   D. NIST publishes the Forced Aligned reference files from RT-05 and
      RT-06
      --> NIST will do this
      --> JF: do for dev data, that has been released again

   E. Janin will research and report what development data from AMI has
      been released and also what data to specify as dev test.
      -- Adam: ICSI has released all meetings.
      -- new collection:
         - internal eval, slightly different: series of 4 scenario
           meetings, each with 4 participants, run with many groups.
         - goal of internal eval: test tools (developed under AMI
           consortium) for getting up to speed on a meeting using
           different browser technologies
         - participants given access to 3 previous meetings, and
           collected recording is 4th meeting w/ 4 people trying to
           finish a task begun in the 3 previous.
         - concern: social interactions will be a little different
         - recording conditions are still standard
         - raw recordings - don't know how much will be released by
           cutoff date
         - Question: Will it be released as dev data?
           - AMI will not have time to annotate
           - this data is supplementary to other data
           - will be available in a couple of months
           - will release to NIST as part of eval
           - should have several hours by the 13th (Dec)
         - Some discussion about whether or not to use new collection
           ** decision: use it!
         - JF: is it as interactive as in the real corpus? Adam thinks
           probably just as interactive, because data is task-based
      -- AMI public corpus is avail -- legal for training
         - avail from AMI website
         - 100 hours or so
         - released after eval last year
         - Data can be downloaded, or sites can send a disk to AMI for
           the data
         - transcripts only avail in XML format: can we convert to
           something like STM format?
           - Adam: AMI prob has these tools
         --> Adam will point people to quick-and-dirty scripts
         --> JF will publish these tools on website
         --> Adam will circulate this dev divide, which was developed
             internally

   F. CHIL will report on the soon-to-be-released DEV data and it will
      be documented on a web site
      - dev data: transcripts in a usable form
      - Cedric sent them already from CHIL -- transcripts and video
        labels
      - JF will look @ dev transcripts to see if they are in a form
        that is usable for STT and SASTT eval
      - Question: when CMU sent transcripts, did they build STMs for
        them?
        Susi: Yes. They were built as ELDA wanted them. Used JF's
        scripts. Didn't use normalization stuff because CHIL didn't
        need them.
        Question: Make them for distant mic condition?
        Susi: Transcription done 2X: first CTMs, then Far Field
        condition. Sometimes slight reverberation in FF condition, so
        segmentation may be a little bit longer than for CTMs. Also try
        to make comments on speaker accents, interaction, overspeaking,
        etc.
        JF: Was this data released?
        Yes. All of the CHIL consortium got the data, but it was not
        sent outside of the consortium.
        Cedric: NIST should be distribution center, after looking it
        over.
        Jon: NIST doesn't have the resources
        Cedric: So for now dev data released all over CHIL. Cedric can
        give access to CHIL server.
      --> Cedric will send an email about how to access this data.
          - sites can send HD or can download
          - NOTE: NIST did this in about a week
      -- Discussion of attributes of this data.
      ** Makis: speaker diarization: not do on lecture data?
         [check Jon's notes]
         - SASTT implies keep it for both conf and lec
         - JF: dev lec data that has been released: Susi noted there
           was more interactive data than previously. Selection process
           trend: more and more interactivity -- will this continue?
           Susi: don't know. At first, old-style; now, is more
           interactive, not like typical meeting, but also not like
           straight-up presentation w/ a few questions -- somewhere in
           the middle now.
         - ICSI: only concern here is percentage of lecture vs.
           everything else. Less than 10%? Prob not worth it to run it
           in the eval.
         - JF: Make this objective. If interactivity is too infrequent
           to make it worthwhile, then not do it.
         - Makis: can we get some measure of interactivity in the next
           month or so?
         - ICSI: how close to 10% are we?
         - JF: overall mix, then judge.
         - Adam: if under 10%, then best bet is to attribute all to 1
           speaker. So if GREATER than 10%, then seems silly to run
           task on. Best decision-maker on this will be Susi.
      --> JF and SB will decide how to make this decision (offline).

   G. NIST posts the requirements for evaluation data collection
      including instructions for synchronization.
      - no comments

   H. NIST will be sending a draft of the evaluation schedule
      - cutoff for training data is 12/31
      - most of this data is already released
      - 1 month between eval data and results
      - no word on where the conference will be
      - LIMSI: dates of GALE wkshp in CA 3rd week in March? Don't know.
      --> everyone will look at this in detail to make sure it works,
          then ratify at next telecon

II. New agenda items

   A. Close-talking mic task
      - JMcDonough: what's become of the segmentation aspect of the
        Close-Talking Mic condition? Can manual segmentation be a
        primary condition for the IHM tests?
      - Editorial comment: This refers to the discussion at the May PI
        meeting, where we discussed the effort that needs to go into
        segmentation and crosstalk elimination in order to field a good
        IHM STT system
      - Adam: likes this for baseline
      - issue: do we want to spend the effort on that kind of research?
      - JF: in the eval sched, 2 deadlines exist: automatic sys, then
        cascaded or ref input cond. We could run IHM condition with
        reference segmentation.
      - JMcDonough: Do we elevate this to primary eval condition? Or do
        you write your own segmentation?
      - Adam: ICSI will do it just because interested, but other sites
        might not be. ICSI in favor of reference segmentation as a
        primary condition.
      - Reference segmentation will be released on 22nd after 1st eval
        deadline.
      - JMcDonough: If people submit results late, would they get the
        ref segmentation prior to their submission?
        JF: No.
        -- All in agreement.
      --> JF: wants to write a proposal for email discussion.
      - JMcDonough: So what's the primary condition?
        - last year MDM was primary (table-top)
        - make it all distant mics = table-top + mic arrays?
        - JF: propose also to retain MDM as primary condition.
      --> JF will propose to keep the MDM condition the primary
          condition via email

   B. Training data:
      - NIST will be releasing a set of training data meetings by the
        training data cutoff date.
      - JF: 10-12 hours data. Transcribing now. Are people interested?
        YES.

   C. Schedule next telecon
      - December 5, 11:00 EST
      - Susi won't be there
      - Lori might not be there
      --> JF will schedule this.

==============================================================================