- From: Dave Singer <singer@apple.com>
- Date: Fri, 3 Oct 2008 11:07:31 -0700
- To: public-media-annotation@w3.org
I think we need to wonder a little about what is doing the interaction with the media resource.

Perhaps the easiest use-case is a web site that allows for upload, and wants to display selected information about the media clips it gets. That use-case asks not only for metadata but also for web/DOM-level APIs that give uniform access to selected metadata across a variety of file formats.

Search engines are, in some senses, out of scope (but read on). They crawl the web and find media files, but how they extract the metadata from those media files is a private arrangement for them ('off the web'); they don't use web APIs (I think) and they can handle whatever formats they like in whatever way they like. The sense in which this is in scope is that we'd like search/index engines, too, to be able to do uniform indexing of selected metadata across a variety of formats, so again we need some level of semantic match for those metadata elements across those formats.

The semantic (mis)match problem is easily illustrated. Consider two metadata systems:

A has tags for Title, Artist
B has tags for Title, Sub-Title, Artist, Composer

We find the same work in these two formats:

A Title="Dvorak Symphony 6, II Adagio", Artist="BBC Symphony Orchestra"
B Title="Symphony 6", Sub-Title="II Adagio", Artist="BBC Symphony Orchestra", Composer="Dvorak, Antonin"

What does the DOM API return when the script asks for "Artist" -- does the composer get included from file B, even though in A he's been put in the title (faute de mieux)? Indeed, does the first file ever get indexed under the name of Dvorak? And so on.

One simple case is that people with ownership rights in media will be very unhappy if a web page *cannot* access basic information about ownership (the copyright notice, for example). It's not that it must be present in every file, or accessed by every page, but that every file should be capable of carrying the notice, and any page should be capable of getting it if it's there.
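The A/B mismatch above can be made concrete with a minimal sketch. It assumes a shared vocabulary (`title`, `subtitle`, `artist`, `composer`) and a per-format mapping table; `FIELD_MAP`, `normalize`, and `lookup` are hypothetical names for illustration and do not correspond to any existing DOM or metadata API.

```python
# Hypothetical mapping from each format's native tags to a shared vocabulary.
# Neither the vocabulary nor the function names come from a real API.
FIELD_MAP = {
    "A": {"Title": "title", "Artist": "artist"},
    "B": {
        "Title": "title",
        "Sub-Title": "subtitle",
        "Artist": "artist",
        "Composer": "composer",
    },
}


def normalize(fmt, tags):
    """Rename a file's native tags to the shared vocabulary, dropping unknown tags."""
    mapping = FIELD_MAP[fmt]
    return {mapping[name]: value for name, value in tags.items() if name in mapping}


def lookup(fmt, tags, field):
    """Return the value of a shared-vocabulary field, or None if the file lacks it."""
    return normalize(fmt, tags).get(field)


# The same work, as carried by the two metadata systems in the example.
file_a = {
    "Title": "Dvorak Symphony 6, II Adagio",
    "Artist": "BBC Symphony Orchestra",
}
file_b = {
    "Title": "Symphony 6",
    "Sub-Title": "II Adagio",
    "Artist": "BBC Symphony Orchestra",
    "Composer": "Dvorak, Antonin",
}
```

With a purely syntactic mapping like this, `lookup("B", file_b, "composer")` finds the composer while `lookup("A", file_a, "composer")` finds nothing, and the two `title` values disagree. Renaming tags gives uniform access but not a semantic match: file A would still never be indexed under Dvorak unless something parses its title.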
Other things to think about:

* is the annotation structured or simple? So, for example, is a person a structured element with family name, given name, birthdate, and so on, or is it a string "Dvorak, Antonin"?

* are annotations temporal (possibly varying in time) or atemporal? Most metadata systems today treat them as atemporal ('what is the copyright?') but this runs into problems when e.g. media is pasted together, or for TV-like stations. I am tempted to say that all queries should be relative to a time-point and all answers return the bracketing time-range over which the answer is valid (which might be large or even of indefinite extent): what is the copyright at time 10? from 5 thru 2005 it is "(C) Acme digital 1665".

* what about the data-type of annotations? Most annotation systems today use strings, but this makes life interesting when a metadata item is the cover art of an album. There really aren't great ways to handle this kind of typed binary data in typical DOM/scripting environments, as I understand it.

well, there are many more...

--
David Singer
Apple/QuickTime
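The time-relative query suggested in the second bullet above could look roughly like the following sketch. The `TimedValue` type and `query` function are hypothetical names, and treating each range as half-open (start included, end excluded) is an assumption; the message does not say whether "thru 2005" includes the endpoint.

```python
from typing import List, NamedTuple, Optional


class TimedValue(NamedTuple):
    """An annotation value together with the time-range over which it holds."""
    start: float
    end: float  # assumed exclusive; float("inf") models an indefinite extent
    value: str


def query(track: List[TimedValue], t: float) -> Optional[TimedValue]:
    """Return the value valid at time t along with its bracketing time-range."""
    for entry in track:
        if entry.start <= t < entry.end:
            return entry
    return None  # no annotation covers this time-point


# The range 5..2005 and its notice come from the example in the message;
# the trailing open-ended entry is invented purely for illustration.
copyright_track = [
    TimedValue(5, 2005, "(C) Acme digital 1665"),
    TimedValue(2005, float("inf"), "(C) hypothetical later owner"),
]

answer = query(copyright_track, 10)
```

Here `answer` carries both the notice and the range 5 to 2005, so a caller knows how long the answer stays valid without asking again. A time-point before 5 yields `None`, which matches the idea that a file need not carry the notice everywhere.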
Received on Friday, 3 October 2008 18:09:11 UTC