Hi all,

Complementing Stephen's and David's emails:

#### Items 1-3 of the agenda: which videos to examine/annotate

We decided that Qiang will process this one: *Australian Open Women's Doubles match, 2008.* This is the one with minor synchronisation issues, which Qiang agreed to rectify by finding the "magic numbers", i.e. the time lag for each of the four VOB files on that DVD. The plan is that nobody will modify the "master" VOB files; instead, we will use these magic numbers to sync our annotations when we load them (see the Python sketch at the end of this message). In the future, once we are happy with the job on the above match, both sites will start working on another match.

#### Items 4-6: levels of annotation and ontology

As Stephen reminded us, Ibrahim's ICIP paper annotates quite a large set of classes:
http://www.ee.surrey.ac.uk/CVSSP/Publications/papers/Almajai-ICIP-2010.pdf

- SFR: Serve by Far player, Right side
- SFL: Serve by Far player, Left side
- SNR: Serve by Near player, Right side
- SNL: Serve by Near player, Left side
- BIF: Bounce Inside Far player's half court
- BOF: Bounce Outside Far player's half court
- BIN: Bounce Inside Near player's half court
- BON: Bounce Outside Near player's half court
- HF: Hit by Far player
- HN: Hit by Near player
- BIFSR: Bounce Inside Far player's Serve area on the Right
- BIFSL: Bounce Inside Far player's Serve area on the Left
- BOFS: Bounce Out of Far player's Serve area
- BINSR: Bounce Inside Near player's Serve area on the Right
- BINSL: Bounce Inside Near player's Serve area on the Left
- BONS: Bounce Out of Near player's Serve area
- NET: Bounce on NET

Obviously we cannot "hear" all of the above classes from audio, and audio has a different set of classes, coming from the voices of the umpire, the line judges and the crowd. It is important, though, that we at least keep the "hit" sound/action as a common point that can be detected using both audio and visual cues. At a higher level, there are match situations and scores.

From our side, our existing system produces an annotation of all the events above, and of detected scores, in a number of XML files. All of them have time tags; some tags are instantaneous (e.g. the time a hit is detected) and some refer to a play shot (a time interval). I'm sharing some sample XML files so that you have a better idea of this:

- serveDetection.xml has "low-level" instantaneous events
- highLevel.xml has detected scores

*Action for UEA*: Can you please produce a list of audio events that you believe you will be able to detect?

#### 7. How the labelling tasks will be split between the sites

I believe it's pretty obvious that:

- Audio: UEA
- Synchronisation: UEA
- Video: Surrey

We can keep iterating on the details of the annotations.

#### 8. Propose a schedule for labelling

Qiang comes over to Surrey in early April. Hopefully by then the synchronisation issue will be solved and a sketch of what we want to detect/annotate will be under way.

#### 9. Propose a test of the labelling at an early point by running a simple experiment on a labelled file

We can start by doing annotations/experiments using only the first VOB file of the Australian Open Women's Doubles 2008 match.

#### Next face-to-face meeting: June or July.
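To make the "magic numbers" idea concrete, below is a minimal Python sketch of what I have in mind: the master VOBs stay untouched, and the per-VOB offset is applied to the annotation timestamps at load time. The file names, XML tags and offset values are placeholders I made up for illustration; our real serveDetection.xml schema differs in the details.

```python
# Sketch only: apply per-VOB "magic number" offsets to annotation
# timestamps at load time, leaving the master VOB files untouched.
# File names, tag names and offset values below are illustrative.
import xml.etree.ElementTree as ET

# Hypothetical offsets (in seconds) for the four VOB files of the match.
VOB_OFFSETS = {
    "VTS_01_1.VOB": 0.00,
    "VTS_01_2.VOB": 0.48,
    "VTS_01_3.VOB": -0.12,
    "VTS_01_4.VOB": 0.31,
}

def load_events(xml_path, vob_name):
    """Read instantaneous events (e.g. hits) from an annotation file
    and shift them onto the common, synchronised timeline."""
    offset = VOB_OFFSETS[vob_name]
    events = []
    for event in ET.parse(xml_path).getroot().iter("event"):
        events.append({
            "label": event.get("label"),               # e.g. "HN"
            "time": float(event.get("time")) + offset,  # synced time
        })
    return events

# e.g. load_events("serveDetection.xml", "VTS_01_1.VOB")
```

The point is simply that the offsets live in one small table: if Qiang revises a magic number, we re-load the annotations rather than re-annotate or re-encode anything.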
________________________________________
From: Windridge D Dr (CVSSP)
Sent: 10 March 2011 18:19
To: Cox Stephen Prof (CMP); Christmas WJ Dr (CVSSP); de Campos TE Dr (Electronic Eng); Kittler JV Prof (CVSSP)
Cc: Huang Qiang Dr (CMP)
Subject: RE: Notes on today's Skype call

Dear Stephen,

Many thanks for the synopsis - it tallies with my own minuting. As to the specific queries raised, I would be inclined to answer as follows:

1. Yes, it would be very useful if Qiang could send the offset file.

2. I'm not sure, but did the Australian Open Women's 2008 doubles feature on the list?

3. Fully agree.

I would add one other minute: the suggested plan to target, in the short term, a conference paper using the level-1 cross-modal data derived from the data sets for which we do have good-quality audio/vision plus synchronisation (i.e. we ought to at least have enough data to produce a paper indicating the potential annotation performance improvement gained by passing priors between audio and video). This could then be expanded, in the longer term, into a journal paper once the rule induction/cross-modal bootstrapping is in place and the system is likely to be more robust with respect to missing event information in either audio or video (i.e. we can draw on a wider array of common data sets for a full evaluation).

Thanks again for the very constructive discussions.

All the best,
David

________________________________________
From: Cox Stephen Prof (CMP) [S.J.Cox@uea.ac.uk]
Sent: 10 March 2011 17:29
To: Windridge D Dr (CVSSP); Christmas WJ Dr (CVSSP); de Campos TE Dr (Electronic Eng); Kittler JV Prof (CVSSP)
Cc: Huang Qiang Dr (CMP)
Subject: Notes on today's Skype call

Dear All,

Thanks for a useful Skype call today. Here is what I think we agreed.

1. Synchronisation

The audio/video synchronisation varies from clip to clip, but it should be possible to establish manually a single time offset for any given clip that can be used to make synchronisation possible. Qiang will investigate this and report on how well it works. Is there any need for him to send you the values of this offset for the clips?

2. Annotation

We agreed to annotate the following four videos, for which we have shot-by-shot descriptions, using the new annotation scheme:

1. US Open 2009, Singles, K. Clijsters vs. Na Li (70 mins)
2. Australian Open 2010, Doubles, M. + B. Bryan vs. Nestor/Zimonjic (105 mins)
3. Australian Open 2010, Singles, Na Li vs. V. Williams (150 mins)
4. Australian Open 2010, Singles, M. Cilic vs. A. Roddick (165 mins)

3. Levels of annotation

The lowest level will contain events that can be inferred from the signal, e.g. ball-hit sounds and line judges' calls (audio); ball position and players' positions (video); etc. The full ontology of these annotations will be discussed when Qiang visits Surrey in early April. These events need fairly accurate audio/video synchronisation.

The higher level, or levels, will contain "semantic" events, such as the current score, when a game has ended, who won a point, etc. The design of these needs to be done carefully, as we need to make sure that we have the right levels of description to enable inference to be done. David, Teo and Qiang will discuss this when Qiang visits. These events do not need to be as accurately specified in time as the low-level events.

We will circulate a proposed list of events to each other before Qiang's visit.

Best Wishes,
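________________________________________

P.S. regarding David's suggested short-term paper on passing priors between audio and video: for concreteness, here is a toy sketch of the kind of fusion we could prototype first, where the audio system's posterior over event classes replaces a flat prior for the video classifier. The class names and probabilities are made up for illustration and are not the output of either site's detectors.

```python
# Toy illustration of cross-modal prior passing: the audio posterior
# over event classes acts as the prior for the video likelihoods.
# All class names and numbers below are invented for the example.

EVENTS = ["HF", "HN", "NET"]  # hit by far/near player, net bounce

def normalise(scores):
    total = sum(scores.values())
    return {event: value / total for event, value in scores.items()}

def fuse(audio_posterior, video_likelihood):
    # P(event | audio, video) is proportional to
    # P(video | event) * P(event | audio)
    return normalise({e: video_likelihood[e] * audio_posterior[e]
                      for e in EVENTS})

# Audio strongly suggests a hit but cannot tell which player;
# video alone mildly favours the near player.
audio_posterior = {"HF": 0.45, "HN": 0.45, "NET": 0.10}
video_likelihood = {"HF": 0.20, "HN": 0.70, "NET": 0.10}

print(fuse(audio_posterior, video_likelihood))
# The fused posterior concentrates on HN: the modalities reinforce
# each other, which is the effect a short-term paper could measure.
```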