- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Wed, 6 Apr 2011 12:27:14 +0200
- To: public-html@w3.org
Suppose I wish to index an HTML document for a client-side catalogue or search engine, and that document includes other 'rich' resources (images, audio/video, objects for plugins etc). How do I get at the metadata contained inside those resources for my index without stepping outside the browser?

I've only just started looking at this problem, so have undoubtedly missed relevant info in the spec, pointers appreciated. But as far as I can tell there isn't any systematic answer. I reckon there should be.

OK, so what information is potentially available will depend on the user agent and the resource, and there's likely to be a broad spectrum of capability there (e.g. browsers are likely to be able to dig inside standard image formats as they have to render them, whereas an arbitrary plugin object will often be completely out of reach). But just because the ability of a specific UA to handle a specific kind of resource is unknown, that doesn't preclude providing a facility for accessing whatever information is available in a particular case, within a simple framework that covers all such cases.

The general shape of the metadata of any such resource will be the same: a series of property names (string/URI) and corresponding values (string/URI) associated with the resource.

So I think ideally this should be available to JavaScript through a consistent API, something along the lines of {resource}.meta.name and {resource}.meta.value. While it wouldn't be a big stretch to add information about the availability of meta (something like image availability states) I don't think this would be necessary - either a name/value pair is there or it isn't.

There's currently at least one blocker to a systematic approach, at least for plugins: "This specification does not define a mechanism for interacting with plugins, as it is expected to be user-agent- and platform-specific." [1].
But this seems an unnecessarily broad brushstroke - how a specific plugin handles a given resource is fairly irrelevant to a generic mechanism for accessing information about the resource. For example, from the perspective of metadata it doesn't matter how a plugin renders a PDF, as long as pointers to any embedded XMP data are available. If the developer of the plugin doesn't wish to make such data available, fair enough, there's no compulsion.

Anyhow, just putting out feelers - if something might be viable, maybe a proposal will be possible after a bit of exploration.

Cheers,
Danny.

[1] http://dev.w3.org/html5/spec/Overview.html#plugins

--
http://danny.ayers.name
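
To make the {resource}.meta idea in this message a little more concrete, here is a rough sketch of how an indexer might consume such an interface. Everything in it is an assumption for illustration: no browser exposes a `meta` property like this, the `collectMeta` helper and the sample file names and values are made up, and the resources are plain objects standing in for elements so the snippet runs on its own.

```javascript
// Hypothetical sketch: `meta` is NOT an existing DOM property.
// Each resource is modelled as a plain object with an optional `meta`
// list of { name, value } pairs, mirroring the shape proposed above.
function collectMeta(resources) {
  const index = [];
  for (const resource of resources) {
    // No availability state is tracked: a pair is either there or it isn't.
    const pairs = Array.isArray(resource.meta) ? resource.meta : [];
    for (const { name, value } of pairs) {
      index.push({ src: resource.src, name, value });
    }
  }
  return index;
}

// Stand-ins for an image whose metadata the user agent can read and a
// plugin-rendered PDF that exposes nothing. The values are invented.
const sampleResources = [
  {
    src: "photo.jpg",
    meta: [
      { name: "http://purl.org/dc/elements/1.1/title", value: "Harbour at dusk" },
      { name: "http://purl.org/dc/elements/1.1/creator", value: "A. Photographer" },
    ],
  },
  { src: "report.pdf" },
];

const metaIndex = collectMeta(sampleResources);
console.log(metaIndex);
```

A resource that exposes nothing simply contributes no entries, which matches the "either a name/value pair is there or it isn't" position taken in the message.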
Received on Wednesday, 6 April 2011 10:27:41 UTC