RE: Systematic access to media/plugin metadata from Leonard Rosenthol on 2011-04-06 (public-html@w3.org from April 2011)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Wed, 6 Apr 2011 06:51:23 -0700
To: Danny Ayers <danny.ayers@gmail.com>, "public-html@w3.org" <public-html@w3.org>
Message-ID: <D23D6B9E57D654429A9AB6918CACEAA9805990FCF4@NAMBX02.corp.adobe.com>
Danny - you are correct.  Not only is there no standard way of doing it, but the few things that do exist don't take into account the actual metadata standards used today for such assets/resources.

The standard(s) for raster image metadata that are in use by everyone from camera vendors to software vendors have been defined by the MWG (Metadata Working Group - <http://www.metadataworkinggroup.org/>), of which two browser vendors (Apple and Microsoft) are also members.  XMP is the primary standard here, and its recent approval as an ISO standard (16684-1, <http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57421>) and its relationship to W3C standards (RDF, RDFa, etc.) make it the clear candidate for exposure in browsers.

In addition, the MWG is currently working on video metadata standards.  Since most video in use today already contains XMP metadata, that is the strongest contender among the options that have been presented.  As such, again, exposure of XMP through a set of APIs would seem like the logical solution here as well.

Finally, as you point out, other plugins such as Adobe Reader or Flash have access to XMP-based metadata as well and IF there were an interface for plugins, they too could expose it.  I will point out, however, that because PDF is a "rich document format", there can be MULTIPLE sets of XMP that you may want access to.  For example, if I have a picture/image on a page of a PDF, there can be XMP associated with that image IN ADDITION TO the XMP for the entire document.  Not sure how that could/should be exposed - but I present it for your consideration nonetheless.
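
One way to picture the multiple-XMP case, purely as a sketch: treat the document as a tree of scopes, each of which may or may not carry its own XMP packet, and let a caller walk the tree. Everything named here (MetadataScope, XmpHit, collectXmp, the labels and the placeholder packet strings) is hypothetical and invented for illustration; no browser, plugin, or PDF library is known to expose this interface.

```typescript
// Hypothetical model: every name below is an assumption, not an existing API.
interface MetadataScope {
  label: string;             // e.g. "document" or "image 1"
  xmp: string | null;        // raw XMP packet (RDF/XML), or null if the scope has none
  children: MetadataScope[]; // nested scopes, e.g. pages, then images on a page
}

interface XmpHit {
  path: string; // where in the document the packet was found
  xmp: string;  // the packet itself
}

// Depth-first walk that returns every XMP packet together with its location.
function collectXmp(scope: MetadataScope, path: string[] = []): XmpHit[] {
  const here = [...path, scope.label];
  const found: XmpHit[] = [];
  if (scope.xmp !== null) {
    found.push({ path: here.join(" > "), xmp: scope.xmp });
  }
  for (const child of scope.children) {
    found.push(...collectXmp(child, here));
  }
  return found;
}

// Made-up example: document-level XMP plus XMP on one embedded image.
const pdf: MetadataScope = {
  label: "document",
  xmp: "<x:xmpmeta>document-level packet</x:xmpmeta>",
  children: [
    {
      label: "page 1",
      xmp: null,
      children: [
        { label: "image 1", xmp: "<x:xmpmeta>image-level packet</x:xmpmeta>", children: [] },
      ],
    },
  ],
};

const hits = collectXmp(pdf);
```

Here hits holds two entries, one for the document and one for the image on page 1, so an indexer could tell them apart by path instead of receiving a single merged packet.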

Granted, we're getting close to Last Call, so I'm not sure how easy it would be to introduce such changes into the standard before then.  However, I think this is clearly an area that needs to be explored and developed, and Adobe (for one) would be willing to actively participate in such a development.


Leonard Rosenthol  |  PDF Architect · Principal Scientist |  Adobe Systems Incorporated  |  leonardr@adobe.com

-----Original Message-----
From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Danny Ayers
Sent: Wednesday, April 06, 2011 3:27 AM
To: public-html@w3.org
Subject: Systematic access to media/plugin metadata

Suppose I wish to index an HTML document for a client-side catalogue or
search engine, and that document includes other 'rich' resources
(images, audio/video, objects for plugins etc). How do I get at the
metadata contained inside those resources for my index without
stepping outside the browser?

I've only just started looking at this problem, so have undoubtedly
missed relevant info in the spec, pointers appreciated. But as far as
I can tell there isn't any systematic answer. I reckon there should
be.

Ok, so what information is potentially available will be dependent on
the user agent and the resource, and there's likely to be a broad
spectrum of capability there (e.g. browsers are likely to be able to
dig inside standard image formats as they have to render them, whereas
an arbitrary plugin object will often be completely out of reach). But
just because the ability for a specific UA to handle a specific kind
of resource is unknown, that doesn't preclude the provision of a
facility for accessing information that is available in a particular
case, and providing a simple framework to cover all such cases.

The general shape of the metadata of any such resource will be the
same: a series of property names (string/URI) and corresponding values
(string/URI) associated with the resource. So I think ideally this
should be available to JavaScript through a consistent API, something
along the lines of {resource}.meta.name and {resource}.meta.value.
While it wouldn't be a big stretch to add information about the
availability of meta (something like image availability states) I
don't think this would be necessary - either a name/value pair is
there or it isn't.
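
As a rough sketch of that shape, assuming each resource exposes an optional list of name/value pairs (the MetaEntry and ResourceWithMeta types, the indexMetadata helper, and the sample values are all hypothetical; nothing like this is defined in the spec or implemented by any user agent):

```typescript
// Hypothetical types: a stand-in for the proposed {resource}.meta idea.
interface MetaEntry {
  name: string;  // property name (string or URI)
  value: string; // property value (string or URI)
}

interface ResourceWithMeta {
  src: string;        // address of the embedded resource
  meta?: MetaEntry[]; // absent when the user agent exposes nothing
}

// Build a simple index keyed by resource address.
// Resources without metadata are skipped: a pair is either there or it isn't.
function indexMetadata(resources: ResourceWithMeta[]): Map<string, MetaEntry[]> {
  const index = new Map<string, MetaEntry[]>();
  for (const r of resources) {
    if (r.meta && r.meta.length > 0) {
      index.set(r.src, r.meta.slice()); // copy so the index owns its entries
    }
  }
  return index;
}

// Made-up sample data for illustration only.
const sample: ResourceWithMeta[] = [
  {
    src: "photo.jpg",
    meta: [{ name: "http://purl.org/dc/elements/1.1/creator", value: "example creator" }],
  },
  { src: "plugin-object.bin" }, // opaque plugin resource: no metadata exposed
];

const metaIndex = indexMetadata(sample);
```

With this sample, metaIndex has an entry for photo.jpg and none for the opaque plugin object, which mirrors the "either it's there or it isn't" behaviour without any separate availability state.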

There's currently at least one blocker to a systematic approach, at
least for plugins: "This specification does not define a mechanism for
interacting with plugins, as it is expected to be user-agent- and
platform-specific." [1]. But this seems an unnecessarily broad
brushstroke - how a specific plugin handles a given resource is fairly
irrelevant to a generic mechanism for accessing information about the
resource. For example, from the perspective of metadata it doesn't
matter how a plugin renders a PDF, as long as pointers to any embedded
XMP data are available. If the developer of the plugin doesn't wish to
make such data available, fair enough, there's no compulsion.

Anyhow, just putting out feelers - if something might be viable, maybe
a proposal will be possible after a bit of exploration.

Cheers,
Danny.

[1] http://dev.w3.org/html5/spec/Overview.html#plugins


-- 
http://danny.ayers.name
Received on Wednesday, 6 April 2011 13:51:59 UTC