Re: Systematic access to media/plugin metadata from David Singer on 2011-04-07 (public-html@w3.org from April 2011)

From: David Singer <singer@apple.com>
Date: Thu, 07 Apr 2011 14:25:18 -0700
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Danny Ayers <danny.ayers@gmail.com>, "public-html@w3.org" <public-html@w3.org>
Message-id: <E235D7E7-0E25-409A-8F12-04FBA037786F@apple.com>
On Apr 7, 2011, at 0:42 , Silvia Pfeiffer wrote:

> Let's also not forget the work of the W3C Media Annotations Working Group.
> http://www.w3.org/2008/WebVideo/Annotations/

Yes, that is the most appropriate place for the W3C to be looking!  The W3C's group has looked at harmonizing API-like access to disparate media containers and formats, and maintaining as much semantic equivalence as possible.  This is probably the most appropriate work for the HTML area.

The MWG, referred to below, has published best practices and recommendations for still-image data, but not yet for video.  XMP is part of the conversation there, for sure.

> 
> (Even though my personal opinion is that we need only name-value pairs
> for metadata.)

Yes, many things can be simply expressed this way. I was involved in MPEG-7, which is XML-based, and the ability to structure data, while it looks appealing, can result in complex expressions (and a temptation to allow everything to become complex).

I think that what the browsers do for metadata should match what they do for tracks and media data: be as format-agnostic as possible.  I don't see any reason why simple questions, such as "what is the title of this work?", "what is its copyright status, if any?", "what is the role of this track?" shouldn't be answerable without any assumption about the container or metadata format.
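As a rough illustration of what "format-agnostic" could mean here, the sketch below maps two of those questions onto the native keys of a few metadata formats. Everything in it is an assumption made for illustration: the names `Question`, `RawMetadata`, `nativeKeys` and `ask` are hypothetical and are not part of HTML or of any browser API, and the key table is a small hand-picked subset (Dublin Core properties as used in XMP, ID3v2 frame IDs, Vorbis comment field names).

```typescript
// Hypothetical sketch only: none of these names exist in HTML or in any browser.
type Question = "title" | "copyright";

// Native key that answers each question, per metadata format (illustrative subset).
const nativeKeys: Record<string, Record<Question, string>> = {
  xmp: { title: "dc:title", copyright: "dc:rights" },
  id3v2: { title: "TIT2", copyright: "TCOP" },
  vorbis: { title: "TITLE", copyright: "COPYRIGHT" },
};

// Metadata as a user agent might have parsed it: a format tag plus raw name/value fields.
interface RawMetadata {
  format: string;
  fields: Record<string, string>;
}

// Answer a question without the caller knowing the container or metadata format.
// Returns undefined when the format is unknown or the field is absent.
function ask(raw: RawMetadata, question: Question): string | undefined {
  const keys = nativeKeys[raw.format];
  if (keys === undefined) return undefined;
  return raw.fields[keys[question]];
}
```

A caller would write `ask(metadata, "title")` for an MP3 and for a JPEG alike; only the table knows that one stores the answer in `TIT2` and the other in `dc:title`.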

> 
> Cheers,
> Silvia.
> 
> 
> On Wed, Apr 6, 2011 at 11:51 PM, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>> Danny - you are correct.  Not only is there no standard way of doing it, but the few things that do exist don't take into account the actual metadata standards used today for such assets/resources.
>> 
>> The standard(s) for raster image metadata that are in use by everyone from camera vendors to software have been defined by the MWG (Metadata Working Group - <http://www.metadataworkinggroup.org/>), of which two browser vendors (Apple and Microsoft) are also members.  XMP is the primary standard here, and its recent approval as an ISO standard (16684-1, <http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57421>) and its relationship to W3C standards (RDF, RDFa, etc.) make it the clear candidate for exposure in browsers.
>> 
>> In addition, the MWG is currently working on video metadata standards.  Since most video in use today already contains XMP metadata, that is the strongest contender among the options that have been presented.  As such, again, exposure of XMP through a set of APIs would seem like the logical solution here as well.
>> 
>> Finally, as you point out, other plugins such as Adobe Reader or Flash have access to XMP-based metadata as well and IF there were an interface for plugins, they too could expose it.  I will point out, however, that because PDF is a "rich document format", there can be MULTIPLE sets of XMP that you may want access to.  For example, if I have a picture/image on a page of a PDF, there can be XMP associated with that image IN ADDITION TO the XMP for the entire document.  Not sure how that could/should be exposed - but I present it for your consideration nonetheless.
>> 
>> Granted, we're getting close to Last Call, so I'm not sure how easy it would be to introduce such changes into the standard before then.  However, I think this is clearly an area that needs to be explored and developed, and Adobe (for one) would be willing to actively participate in such a development.
>> 
>> 
>> Leonard Rosenthol  |  PDF Architect · Principal Scientist |  Adobe Systems Incorporated  |  leonardr@adobe.com
>> 
>> -----Original Message-----
>> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Danny Ayers
>> Sent: Wednesday, April 06, 2011 3:27 AM
>> To: public-html@w3.org
>> Subject: Systematic access to media/plugin metadata
>> 
>> Suppose I wish to index an HTML document for a client-side catalogue or
>> search engine, and that document includes other 'rich' resources
>> (images, audio/video, objects for plugins etc). How do I get at the
>> metadata contained inside those resources for my index without
>> stepping outside the browser?
>> 
>> I've only just started looking at this problem, so have undoubtedly
>> missed relevant info in the spec, pointers appreciated. But as far as
>> I can tell there isn't any systematic answer. I reckon there should
>> be.
>> 
>> Ok, so what information is potentially available will be dependent on
>> the user agent and the resource, and there's likely to be a broad
>> spectrum of capability there (e.g. browsers are likely to be able to
>> dig inside standard image formats as they have to render them, whereas
>> an arbitrary plugin object will often be completely out of reach). But
>> just because the ability for a specific UA to handle a specific kind
>> of resource is unknown, that doesn't preclude the provision of a
>> facility for accessing information that is available in a particular
>> case, and providing a simple framework to cover all such cases.
>> 
>> The general shape of the metadata of any such resource will be the
>> same: a series of property names (string/URI) and corresponding values
>> (string/URI) associated with the resource. So I think ideally this
>> should be available to JavaScript through a consistent API, something
>> along the lines of {resource}.meta.name and {resource}.meta.value.
>> While it wouldn't be a big stretch to add information about the
>> availability of meta (something like image availability states) I
>> don't think this would be necessary - either a name/value pair is
>> there or it isn't.
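A minimal sketch of the name/value shape described above, assuming a hypothetical `ResourceMeta` object that a user agent might hang off a resource. Neither `ResourceMeta` nor `MetaPair` exists in any specification; they only illustrate that a flat list of string pairs is enough, and that no separate availability state is needed because an absent name simply yields no values.

```typescript
// Hypothetical sketch only: ResourceMeta and MetaPair are not part of any spec.
// One metadata entry: a property name (string/URI) and its value (string/URI).
interface MetaPair {
  name: string;
  value: string;
}

class ResourceMeta {
  private readonly pairs: readonly MetaPair[];

  constructor(pairs: readonly MetaPair[]) {
    this.pairs = pairs;
  }

  // Distinct property names, in first-seen order.
  names(): string[] {
    return this.pairs
      .map((p) => p.name)
      .filter((n, i, all) => all.indexOf(n) === i);
  }

  // All values recorded for a name; an empty array means the pair is not there.
  values(name: string): string[] {
    return this.pairs.filter((p) => p.name === name).map((p) => p.value);
  }

  // True if at least one pair carries this name.
  has(name: string): boolean {
    return this.pairs.some((p) => p.name === name);
  }
}
```

An indexer could then walk `names()` and `values(name)` for every embedded resource without caring whether the pairs came from an image, a video or a plugin.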
>> 
>> There's currently at least one blocker to a systematic approach, at
>> least for plugins: "This specification does not define a mechanism for
>> interacting with plugins, as it is expected to be user-agent- and
>> platform-specific." [1]. But this seems an unnecessarily broad
>> brushstroke - how a specific plugin handles a given resource is fairly
>> irrelevant to a generic mechanism for accessing information about the
>> resource. For example, from the perspective of metadata it doesn't
>> matter how a plugin renders a PDF, as long as pointers to any embedded
>> XMP data are available. If the developer of the plugin doesn't wish to
>> make such data available, fair enough, there's no compulsion.
>> 
>> Anyhow, just putting out feelers - if something might be viable, maybe
>> a proposal will be possible after a bit of exploration.
>> 
>> Cheers,
>> Danny.
>> 
>> [1] http://dev.w3.org/html5/spec/Overview.html#plugins
>> 
>> --
>> http://danny.ayers.name
>> 
>> 
> 

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Thursday, 7 April 2011 21:25:49 UTC