detailed comments on the use case and reqs 1.0 draft from David Singer on 2009-04-16 (public-media-annotation@w3.org from April 2009)

From: David Singer <singer@apple.com>
Date: Thu, 16 Apr 2009 11:21:37 +0200
To: public-media-annotation@w3.org
Message-Id: <p0624083fc60ca22daa86@[17.202.35.52]>
Hi, after a read-through of 
<http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/> I have some 
detailed comments...

1:  change 'needs' to 'need'..."In addition, video services on the 
web...need..."

3:  below the diagram, change 'both' to 'either' (unless somehow the 
media resource is annotated in both formats at once):  "that will 
return values from either the XMP or IPTC metadata"

3:  The last paragraph, the words "applications, like:" and what 
follows would be better phrased as "applications.  For example:" and 
then follow with an HTML list (bulleted, probably).

Requirement r4:  I would change "custom" to "user-defined" or 
"non-standard", I think.

5.2: towards the end, "etc." is missing its "."

5.3:  this has the feel of having been edited a few times and the 
language is now a bit odd.  Can I suggest:

People nowadays are able to enjoy a large number of programs from 
different content providers (broadcasting companies, Internet video 
websites, etc.). To achieve a better user experience, reduce the user's 
sense of being overloaded, and hence retain users, some systems 
provide recommendations based on the user's history, ratings, or 
stated preferences. However, different content providers usually have 
their own specific or proprietary metadata models, which is one of the 
key problems faced by recommendation service providers. A common 
ontology spanning different metadata sets can allow recommendation 
systems to return a better, larger, and more relevant selection than 
when the metadata systems are unrelated.

Company A is an IPTV value-added service provider. One of their 
services is to recommend programs that users might like, based on 
their watching history or explicit rating of programs. In this 
system, users are able to watch regular TV programs that carry 
electronic program guide (EPG) format metadata, videos from sites 
such as YouTube that carry website-specific metadata, etc. In order 
to perform uniform and effective recommendation in the absence of a 
common set of vocabularies, they would need to design their own 
integrated media annotation model.
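
To illustrate what a common set of vocabularies buys the recommender 
in 5.3, here is a minimal sketch. All field names (programme_title, 
name, category) and the common property names (title, genre) are 
hypothetical, made up for the example; they are not taken from the 
draft or from any real EPG or website format.

```python
# Hypothetical per-source field names mapped onto hypothetical
# common property names. None of these come from a real format.
FIELD_MAPS = {
    "epg": {"programme_title": "title", "genre": "genre"},
    "website": {"name": "title", "category": "genre"},
}


def to_common(source: str, record: dict) -> dict:
    """Translate one source-specific record into the common vocabulary.

    Fields with no mapping are dropped rather than guessed at.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}
```

With such a mapping, a recommender can compare an EPG record and a 
website record on the same "title" and "genre" keys instead of 
maintaining its own integrated model.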

5.5:  the set over which "Find all" operates is not well identified; 
I assume it's "within a database, such as that of a search engine 
indexing the internet or other web-accessible content (e.g. a 
corporate repository, library, etc.)".

5.7:  I think there is a major use case that needs mentioning: 
accessibility.  There are requirements that, for users who are unable 
to consume time-based media in general, or some formats in 
particular, the media data carry annotations and links that provide a 
summary, transcript, and so on.

6:  Requirements.  Very little is said here about the format of the 
returned result.  Most metadata systems are appallingly vague about 
the format of the stored data (often merely saying it's a string), 
even when the key suggests a restricted value set (e.g. "creation 
date").  This should probably be reflected in 6.13 requirement r13, 
"allow for undefined/unformatted return values for the same property" 
as well as the current text.
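
As a rough sketch of what "allow for undefined/unformatted return 
values" could mean in practice, the example below returns both the 
stored string and a typed value when one can be parsed. The names 
PropertyValue and read_creation_date are hypothetical, and ISO 8601 
dates are assumed only for illustration; the draft does not define 
any of this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PropertyValue:
    raw: str                 # the string exactly as stored in the metadata
    parsed: Optional[date]   # typed value, or None if the string is unformatted


def read_creation_date(raw: str) -> PropertyValue:
    """Return the raw value plus a parsed date when the string allows it."""
    try:
        parsed = date.fromisoformat(raw.strip())
    except ValueError:
        # Assumption: anything that is not an ISO date counts as unformatted.
        parsed = None
    return PropertyValue(raw=raw, parsed=parsed)
```

A caller can then use the typed value when it exists and fall back to 
the raw string otherwise, rather than assuming a format the stored 
data may not follow.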



More structural comments:

security: we need to say we understand that user-agent access to 
metadata might give rise to cross-site scripting and other security 
issues, but that we expect those issues to be handled in the same way 
as the same issue for images and other embedded data.

media ID:  we probably want to say that a major question both users 
and search engines might like answered is whether two pieces of 
media, two URLs, etc. essentially refer to the same content.  This 
can really only be done with a uniform media identifier system, and 
unfortunately there is none (despite ISRC etc.).

time varying:  some metadata is naturally time-varying (e.g. "what 
chapter or scene of this movie am I in?") and the cue-ranges feature 
of HTML5 is designed to support this (e.g. flipping slides to go with 
a video of someone speaking).  While I realize that many major 
metadata systems express time-invariant metadata ('for the whole 
file'), it might be prudent to document how we intend to handle this. 
It could be by using URLs that include fragment identifiers in the 
query (i.e. a script could find a URL on the page and explicitly add 
"#t=30s" to the end), or it could be by separate arguments to the 
API.  Some discussion of this captured in the document would be 
prudent, I think.
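
For concreteness, the fragment-identifier option could look like the 
sketch below. The function name add_time_fragment is hypothetical, 
and the "t=<n>s" syntax simply follows the "#t=30s" example above; it 
is not taken from any finished specification.

```python
from urllib.parse import urlsplit, urlunsplit


def add_time_fragment(url: str, seconds: int) -> str:
    """Return url with its fragment set to a time offset, e.g. '#t=30s'.

    Any fragment already present on the URL is replaced.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(fragment=f"t={seconds}s"))
```

The alternative, separate arguments to the API, would leave the URL 
untouched and pass the time offset alongside it in the metadata 
query.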
-- 
David Singer
Multimedia Standards, Apple Inc.
Received on Thursday, 16 April 2009 09:24:20 UTC