Systematic access to media/plugin metadata

Suppose I wish to index an HTML document for a client-side catalogue or
search engine, and that document includes other 'rich' resources
(images, audio/video, objects for plugins, etc.). How do I get at the
metadata contained inside those resources for my index without
stepping outside the browser?

I've only just started looking at this problem, so I have undoubtedly
missed relevant info in the spec; pointers appreciated. But as far as
I can tell there isn't any systematic answer. I reckon there should
be.

OK, so what information is potentially available will depend on
the user agent and the resource, and there's likely to be a broad
spectrum of capability there (e.g. browsers are likely to be able to
dig inside standard image formats as they have to render them, whereas
an arbitrary plugin object will often be completely out of reach). But
not knowing in advance whether a specific UA can handle a specific
kind of resource doesn't preclude providing a facility for accessing
whatever information is available in a particular case, within a
simple framework that covers all such cases.

The general shape of the metadata of any such resource will be the
same: a series of property names (string/URI) and corresponding values
(string/URI) associated with the resource. So I think ideally this
should be available to JavaScript through a consistent API, something
along the lines of {resource}.meta.name and {resource}.meta.value.
While it wouldn't be a big stretch to add information about the
availability of the metadata (something like image availability
states), I don't think this would be necessary: either a name/value
pair is there or it isn't.
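
To make the shape concrete, here is a rough sketch of how an indexer
might consume such an API. Everything in it is an assumption for
illustration: no browser exposes a `meta` property like this, and
`makeResource`, `indexMetadata` and the file names are made up. Plain
objects stand in for the DOM elements so the sketch runs outside a
browser.

```javascript
// Hypothetical: stands in for an element whose UA-provided `meta`
// property is a list of name/value pairs (both strings or URIs).
function makeResource(src, pairs) {
  return { src, meta: pairs.map(([name, value]) => ({ name, value })) };
}

// Flatten the metadata of every resource into index entries.
function indexMetadata(resources) {
  const index = [];
  for (const resource of resources) {
    // A resource the UA cannot inspect just has an empty list,
    // so no separate availability state is needed.
    for (const { name, value } of resource.meta) {
      index.push({ src: resource.src, name, value });
    }
  }
  return index;
}

const resources = [
  makeResource("photo.jpg", [
    ["http://purl.org/dc/terms/creator", "Example Author"],
  ]),
  makeResource("applet.bin", []), // opaque plugin object: nothing exposed
];
const index = indexMetadata(resources);
```

The indexer never has to ask what kind of resource it is looking at;
it only iterates over whatever pairs happen to be there.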

There's currently at least one blocker to a systematic approach, for
plugins anyway: "This specification does not define a mechanism for
interacting with plugins, as it is expected to be user-agent- and
platform-specific." [1]. But this seems an unnecessarily broad
brushstroke. How a specific plugin handles a given resource is fairly
irrelevant to a generic mechanism for accessing information about the
resource. For example, from the perspective of metadata it doesn't
matter how a plugin renders a PDF, as long as pointers to any embedded
XMP data are available. If the developer of the plugin doesn't wish to
make such data available, fair enough, there's no compulsion.
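
One way to picture that separation, purely as a sketch: a plugin
optionally registers a function that returns name/value pairs, and the
generic accessor knows nothing about rendering. The registry, the
function names and the "xmp-packet" property name are all hypothetical,
not anything defined in the spec or by any plugin API.

```javascript
// Hypothetical registry: MIME type -> function supplied by a plugin.
const metadataProviders = new Map();

// A plugin developer may call this; there is no compulsion to do so.
function registerMetadataProvider(mimeType, provider) {
  metadataProviders.set(mimeType, provider);
}

// Generic accessor: independent of how the resource gets rendered.
function getMeta(resource) {
  const provider = metadataProviders.get(resource.type);
  // No provider registered means no metadata, not an error.
  return provider ? provider(resource) : [];
}

// Example opt-in: a PDF plugin exposing a pointer to embedded XMP.
// The property name and value here are placeholders.
registerMetadataProvider("application/pdf", (resource) => [
  { name: "xmp-packet", value: resource.src },
]);

const pdfMeta = getMeta({ type: "application/pdf", src: "report.pdf" });
const otherMeta = getMeta({ type: "application/x-unknown", src: "blob.bin" });
```

A plugin that registers nothing simply yields an empty list, which
matches the "either it's there or it isn't" approach.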

Anyhow, just putting out feelers - if something might be viable, maybe
a proposal will be possible after a bit of exploration.

Cheers,
Danny.

[1] http://dev.w3.org/html5/spec/Overview.html#plugins

-- 
http://danny.ayers.name

Received on Wednesday, 6 April 2011 10:27:41 UTC