- From: Leonard Rosenthol <lrosenth@adobe.com>
- Date: Wed, 6 Apr 2011 06:51:23 -0700
- To: Danny Ayers <danny.ayers@gmail.com>, "public-html@w3.org" <public-html@w3.org>
Danny - you are correct. Not only is there no standard way of doing it, but the few things that do exist don't take into account the metadata standards actually used today for such assets/resources.

The standard(s) for raster image metadata that are in use by everyone from camera vendors to software have been defined by the MWG (Metadata Working Group - <http://www.metadataworkinggroup.org/>), of which two browser vendors (Apple and Microsoft) are also members. XMP is the primary standard here, and its recent approval as an ISO standard (16684-1, <http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57421>) and its relationship to W3C standards (RDF, RDFa, etc.) make it the clear candidate for exposure in browsers.

In addition, the MWG is currently working on video metadata standards. Since most video in use today already contains XMP metadata, XMP is the strongest contender among the options that have been presented. As such, again, exposure of XMP through a set of APIs would seem like the logical solution here as well.

Finally, as you point out, plugins such as Adobe Reader or Flash have access to XMP-based metadata as well, and IF there were an interface for plugins, they too could expose it. I will point out, however, that because PDF is a "rich document format" there can be MULTIPLE sets of XMP that you may want access to. For example, if I have a picture/image on a page of a PDF, there can be XMP associated with that image IN ADDITION TO the XMP for the entire document. Not sure how that could/should be exposed - but I present it for your consideration nonetheless.

Granted, we're getting close to Last Call, so I'm not sure how easy it would be to introduce such changes into the standard before then. However, I think this is clearly an area that needs to be explored and developed, and Adobe (for one) would be willing to actively participate in such a development.
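To make the "multiple sets of XMP" point concrete, here is a minimal sketch of one way nested metadata could be modelled and flattened. Nothing here is an existing browser or plugin API: `XmpPacket`, `MetadataNode`, `LocatedPacket` and `collectPackets` are hypothetical names, and an XMP packet is reduced to flat property/value strings purely for illustration.

```typescript
// Hypothetical: an XMP packet reduced to flat property/value pairs.
type XmpPacket = Record<string, string>;

// Hypothetical: a node in a rich document (the document itself, a page,
// an embedded image). Any node may or may not carry its own XMP.
interface MetadataNode {
  label: string;
  xmp?: XmpPacket;
  children: MetadataNode[];
}

// A packet together with the path of labels leading to its owner.
interface LocatedPacket {
  path: string;
  xmp: XmpPacket;
}

// Depth-first walk that collects every packet, document-level first.
function collectPackets(node: MetadataNode, path: string[] = []): LocatedPacket[] {
  const here = path.concat(node.label);
  const found: LocatedPacket[] = [];
  if (node.xmp) {
    found.push({ path: here.join(" / "), xmp: node.xmp });
  }
  for (const child of node.children) {
    for (const packet of collectPackets(child, here)) {
      found.push(packet);
    }
  }
  return found;
}
```

With a tree like this, a caller could ask for the document-level packet alone or enumerate every embedded packet along with where it sits in the document.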
Leonard Rosenthol | PDF Architect · Principal Scientist | Adobe Systems Incorporated | leonardr@adobe.com

-----Original Message-----
From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Danny Ayers
Sent: Wednesday, April 06, 2011 3:27 AM
To: public-html@w3.org
Subject: Systematic access to media/plugin metadata

Suppose I wish to index a HTML document for a client-side catalogue or search engine, and that document includes other 'rich' resources (images, audio/video, objects for plugins etc). How do I get at the metadata contained inside those resources for my index without stepping outside the browser?

I've only just started looking at this problem, so have undoubtedly missed relevant info in the spec, pointers appreciated. But as far as I can tell there isn't any systematic answer. I reckon there should be.

Ok, so what information is potentially available will be dependent on the user agent and the resource, and there's likely to be a broad spectrum of capability there (e.g. browsers are likely to be able to dig inside standard image formats as they have to render them, whereas an arbitrary plugin object will often be completely out of reach). But just because the ability for a specific UA to handle a specific kind of resource is unknown, that doesn't preclude the provision of a facility for accessing information that is available in a particular case, and providing a simple framework to cover all such cases.

The general shape of the metadata of any such resource will be the same: a series of property names (string/URI) and corresponding values (string/URI) associated with the resource. So I think ideally this should be available to Javascript through a consistent API, something along the lines of {resource}.meta.name and {resource}.meta.value.
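As a rough illustration of the name/value idea, the sketch below shows one possible reading of it in TypeScript. It is only an assumption about what such an interface might look like: `MetaEntry`, `ResourceWithMeta` and `indexMetadata` are hypothetical names, no browser exposes a `meta` property like this, and treating `meta` as a list of pairs is just one interpretation of `{resource}.meta.name` / `{resource}.meta.value`.

```typescript
// Hypothetical: one metadata property, name and value both string/URI.
interface MetaEntry {
  name: string;
  value: string;
}

// Hypothetical: any embedded resource (image, video, plugin object).
// `meta` is simply empty when the user agent cannot read anything.
interface ResourceWithMeta {
  src: string;
  meta: MetaEntry[];
}

// Build a client-side index from resource URL to its name/value pairs,
// skipping resources for which no metadata is available.
function indexMetadata(resources: ResourceWithMeta[]): Record<string, MetaEntry[]> {
  const index: Record<string, MetaEntry[]> = {};
  for (const resource of resources) {
    if (resource.meta.length > 0) {
      index[resource.src] = resource.meta;
    }
  }
  return index;
}
```

Under this reading there is no separate availability state: a pair is either present in `meta` or it is not, and the indexer treats an empty list the same as an unreachable resource.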
While it wouldn't be a big stretch to add information about the availability of meta (something like image availability states) I don't think this would be necessary - either a name/value pair is there or it isn't.

There's currently at least one blocker to a systematic approach, at least for plugins: "This specification does not define a mechanism for interacting with plugins, as it is expected to be user-agent- and platform-specific." [1]. But this seems an unnecessarily broad brushstroke - how a specific plugin handles a given resource is fairly irrelevant to a generic mechanism for accessing information about the resource. For example, from the perspective of metadata it doesn't matter how a plugin renders a PDF, as long as pointers to any embedded XMP data are available. If the developer of the plugin doesn't wish to make such data available, fair enough, there's no compulsion.

Anyhow, just putting out feelers - if something might be viable, maybe a proposal will be possible after a bit of exploration.

Cheers,
Danny.

[1] http://dev.w3.org/html5/spec/Overview.html#plugins

--
http://danny.ayers.name
Received on Wednesday, 6 April 2011 13:51:59 UTC