Hi all,

Complementing Stephen's and David's emails:

#### Items 1-3 of the agenda: which videos to examine/annotate

We decided that Qiang will process this one: *Australian Open Women's Doubles match, 2008.* This is the one with minor synchronisation issues, which Qiang agreed to rectify by finding the "magic numbers", i.e. the time lag for each of the four VOB files on that DVD. The plan is that nobody will modify the "master" VOB files; instead, we will use these magic numbers to sync our annotations when we load them (see the Python sketch at the end of this message). In the future, once we are happy with the job on the above match, both sites will start working on another match.

#### Items 4-6: levels of annotation and ontology

As Stephen reminded us, Ibrahim's ICIP paper annotates quite a large set of classes:
http://www.ee.surrey.ac.uk/CVSSP/Publications/papers/Almajai-ICIP-2010.pdf

- SFR: Serve by Far player, Right side
- SFL: Serve by Far player, Left side
- SNR: Serve by Near player, Right side
- SNL: Serve by Near player, Left side
- BIF: Bounce Inside Far player's half court
- BOF: Bounce Outside Far player's half court
- BIN: Bounce Inside Near player's half court
- BON: Bounce Outside Near player's half court
- HF: Hit by Far player
- HN: Hit by Near player
- BIFSR: Bounce Inside Far player's Serve area on the Right
- BIFSL: Bounce Inside Far player's Serve area on the Left
- BOFS: Bounce Out of Far player's Serve area
- BINSR: Bounce Inside Near player's Serve area on the Right
- BINSL: Bounce Inside Near player's Serve area on the Left
- BONS: Bounce Out of Near player's Serve area
- NET: Bounce on NET

Obviously we cannot "hear" all of the above classes from audio, and audio has a different set of classes, coming from the voices of the umpire, the line judges and the crowd. It is important, though, that we at least keep the "hit" sound/action as a common point that can be detected using both audio and visual cues. At a higher level, there are match situations and scores.

From our side, our existing system produces an annotation of all the events above, and of detected scores, in a number of XML files. All of them have time tags; some tags are instantaneous (e.g. the time a hit is detected) and some refer to a play shot (a time interval). I'm sharing some sample XML files so that you have a better idea of this:

- serveDetection.xml has "low-level" instantaneous events
- highLevel.xml has detected scores

*Action for UEA*: Can you please produce a list of audio events that you believe you will be able to detect?

#### 7. How the labelling tasks will be split between the sites

I believe it's pretty obvious that:

- Audio: UEA
- Synchronisation: UEA
- Video: Surrey

We can keep iterating on the details of the annotations.

#### 8. Propose a schedule for labelling

Qiang comes over to Surrey in early April. Hopefully by then the synchronisation issue will be solved and a sketch of what we want to detect/annotate will be under way.

#### 9. Propose a test of the labelling at an early point by running a simple experiment on a labelled file

We can start by doing annotations/experiments using only the first VOB file of the Australian Open Women's Doubles 2008 match.

#### Next face-to-face meeting: June or July.
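To make the "magic numbers" idea concrete, below is a minimal Python sketch of what I have in mind: the master VOBs stay untouched, and the per-VOB offset is applied to the annotation timestamps at load time. The file names, XML tags and offset values are placeholders I made up for illustration; our real serveDetection.xml schema differs in the details.

```python
# Sketch only: apply per-VOB "magic number" offsets to annotation
# timestamps at load time, leaving the master VOB files untouched.
# File names, tag names and offset values below are illustrative.
import xml.etree.ElementTree as ET

# Hypothetical offsets (in seconds) for the four VOB files of the match.
VOB_OFFSETS = {
    "VTS_01_1.VOB": 0.00,
    "VTS_01_2.VOB": 0.48,
    "VTS_01_3.VOB": -0.12,
    "VTS_01_4.VOB": 0.31,
}

def load_events(xml_path, vob_name):
    """Read instantaneous events (e.g. hits) from an annotation file
    and shift them onto the common, synchronised timeline."""
    offset = VOB_OFFSETS[vob_name]
    events = []
    for event in ET.parse(xml_path).getroot().iter("event"):
        events.append({
            "label": event.get("label"),               # e.g. "HN"
            "time": float(event.get("time")) + offset,  # synced time
        })
    return events

# e.g. load_events("serveDetection.xml", "VTS_01_1.VOB")
```

The point is simply that the offsets live in one small table: if Qiang revises a magic number, we re-load the annotations rather than re-annotate or re-encode anything.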
________________________________________
From: Windridge D Dr (CVSSP)
Sent: 10 March 2011 18:19
To: Cox Stephen Prof (CMP); Christmas WJ Dr (CVSSP); de Campos TE Dr (Electronic Eng); Kittler JV Prof (CVSSP)
Cc: Huang Qiang Dr (CMP)
Subject: RE: Notes on today's Skype call

Dear Stephen,

Many thanks for the synopsis - it tallies with my own minuting. As to the specific queries raised, I would be inclined to answer as follows:

1. Yes, it would be very useful if Qiang could send the offset file.

2. I'm not sure, but did the Australian Open Women's 2008 doubles feature on the list?

3. Fully agree.

I would add one other minute: the suggested plan to target, in the short term, a conference paper using the level-1 cross-modal data derived from the data sets for which we do have good-quality audio/vision plus synchronisation (i.e. we ought to at least have enough data to produce a paper indicating the potential annotation performance improvement gained by passing priors between audio and video). This could then be expanded, in the longer term, into a journal paper once the rule induction/cross-modal bootstrapping is in place and the system is likely to be more robust with respect to missing event information in either audio or video (i.e. we can draw on a wider array of common data sets for a full evaluation).

Thanks again for the very constructive discussions.

All the best,
David

________________________________________
From: Cox Stephen Prof (CMP) [S.J.Cox@uea.ac.uk]
Sent: 10 March 2011 17:29
To: Windridge D Dr (CVSSP); Christmas WJ Dr (CVSSP); de Campos TE Dr (Electronic Eng); Kittler JV Prof (CVSSP)
Cc: Huang Qiang Dr (CMP)
Subject: Notes on today's Skype call

Dear All,

Thanks for a useful Skype call today. Here is what I think we agreed.

1. Synchronisation

The audio/video synchronisation varies from clip to clip, but it should be possible to establish manually a single time offset for any given clip that can be used to make synchronisation possible. Qiang will investigate this and report on how well it works. Is there any need for him to send you the values of this offset for the clips?

2. Annotation

We agreed to annotate the following four videos, for which we have shot-by-shot descriptions, using the new annotation scheme:

1. US Open 2009, Singles, K. Clijsters vs. Na Li (70 mins)
2. Australian Open 2010, Doubles, M. + B. Bryan vs. Nestor/Zimonjic (105 mins)
3. Australian Open 2010, Singles, Na Li vs. V. Williams (150 mins)
4. Australian Open 2010, Singles, M. Cilic vs. A. Roddick (165 mins)

3. Levels of annotation

The lowest level will contain events that can be inferred from the signal, e.g. ball-hit sounds and line judges' calls (audio); ball position and players' positions (video); etc. The full ontology of these annotations will be discussed when Qiang visits Surrey in early April. These events need fairly accurate audio/video synchronisation.

The higher level, or levels, will contain "semantic" events, such as the current score, when a game has ended, who won a point, etc. The design of these needs to be done carefully, as we need to make sure that we have the right levels of description to enable inference to be done. David, Teo and Qiang will discuss this when Qiang visits. These events do not need to be as accurately specified in time as the low-level events.

We will circulate a proposed list of events to each other before Qiang's visit.

Best Wishes,
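________________________________________

P.S. regarding David's suggested short-term paper on passing priors between audio and video: for concreteness, here is a toy sketch of the kind of fusion we could prototype first, where the audio system's posterior over event classes replaces a flat prior for the video classifier. The class names and probabilities are made up for illustration and are not the output of either site's detectors.

```python
# Toy illustration of cross-modal prior passing: the audio posterior
# over event classes acts as the prior for the video likelihoods.
# All class names and numbers below are invented for the example.

EVENTS = ["HF", "HN", "NET"]  # hit by far/near player, net bounce

def normalise(scores):
    total = sum(scores.values())
    return {event: value / total for event, value in scores.items()}

def fuse(audio_posterior, video_likelihood):
    # P(event | audio, video) is proportional to
    # P(video | event) * P(event | audio)
    return normalise({e: video_likelihood[e] * audio_posterior[e]
                      for e in EVENTS})

# Audio strongly suggests a hit but cannot tell which player;
# video alone mildly favours the near player.
audio_posterior = {"HF": 0.45, "HN": 0.45, "NET": 0.10}
video_likelihood = {"HF": 0.20, "HN": 0.70, "NET": 0.10}

print(fuse(audio_posterior, video_likelihood))
# The fused posterior concentrates on HN: the modalities reinforce
# each other, which is the effect a short-term paper could measure.
```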