RE: Uniform access to metadata: XRD use case. from Larry Masinter on 2009-03-12 (www-archive@w3.org from March 2009)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 11 Mar 2009 20:05:26 -0700
To: Jonathan Rees <jar@creativecommons.org>
CC: Eran Hammer-Lahav <eran@hueniverse.com>, "connolly@w3.org" <connolly@w3.org>, "www-archive@w3.org" <www-archive@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118C8826204@nambx04.corp.adobe.com>

>  (I missed the beginning of this; why is this on www-archive instead of
> www-talk as Eran requested, and what document or message are you
> quoting?)

I was quoting an email from Eran. I sent it to www-archive
and not www-talk partly because it was a minor point in
a long discussion. I'll move it back if it seems relevant.
I used www-archive so that we don't have to forward everything.

> The problem is that each format does it in a different way. 

Well, XMP does have a generic "packet scanning" mechanism, but
you still need to find a safe place to put metadata that doesn't
interfere with binary file formats.
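
A rough illustration of what packet scanning amounts to: an XMP packet
is wrapped in "<?xpacket begin=... id=...?>" and "<?xpacket end=...?>"
processing instructions, so a scanner can look for those markers in the
raw bytes without understanding the enclosing format. The sketch below
assumes a UTF-8 packet; the name find_xmp_packet is made up for this
example and is not part of any XMP toolkit.

```python
import re

# Header and trailer processing instructions that delimit an XMP packet.
# The id value is the fixed packet id used by the XMP packet wrapper.
_HEADER = re.compile(
    rb"<\?xpacket\s+begin=(['\"])[^'\"]*\1\s+"
    rb"id=(['\"])W5M0MpCehiHzreSzNTczkc9d\2[^>]*?\?>"
)
_TRAILER = re.compile(rb"<\?xpacket\s+end=(['\"])([rw])\1\s*\?>")


def find_xmp_packet(data: bytes):
    """Return (start, end, writable) for the first packet found, or None.

    Hypothetical helper: handles UTF-8 packets only and does not check
    that the bytes really are metadata for the enclosing file.
    """
    head = _HEADER.search(data)
    if head is None:
        return None
    tail = _TRAILER.search(data, head.end())
    if tail is None:
        return None
    # end="w" marks a packet that may be rewritten in place.
    return head.start(), tail.end(), tail.group(2) == b"w"
```

Scanning like this only locates a packet; it says nothing about where
a given binary format can safely hold one.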

> So how to get at it, and modify it,

If you can recognize the XMP and there's sufficient padding,
so that size isn't an issue, then this isn't so bad.
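
To make the padding point concrete, here is a minimal sketch of an
in-place update: the new metadata is written over the old and the
leftover room is filled with spaces, so the file length and every byte
outside the packet stay the same. The function name is hypothetical,
the packet is assumed to be UTF-8 and writable, and any further
constraints a particular container format imposes are ignored.

```python
def rewrite_packet_in_place(data: bytes, new_body: bytes) -> bytes:
    """Replace the content between the xpacket header and trailer.

    Hypothetical helper: raises ValueError if no packet is found or if
    the new body does not fit in the space the old body plus padding used.
    """
    head_start = data.find(b"<?xpacket begin=")
    if head_start < 0:
        raise ValueError("no XMP packet header found")
    head_end = data.find(b"?>", head_start)
    if head_end < 0:
        raise ValueError("unterminated XMP packet header")
    head_end += 2  # step past the closing "?>"
    tail_start = data.find(b"<?xpacket end=", head_end)
    if tail_start < 0:
        raise ValueError("no XMP packet trailer found")

    room = tail_start - head_end
    if len(new_body) > room:
        raise ValueError("new metadata does not fit in the existing padding")

    # Pad with spaces so the overall length is unchanged.
    padded = new_body + b" " * (room - len(new_body))
    return data[:head_end] + padded + data[tail_start:]
```

When the new metadata does not fit, the file has to be rewritten by
something that understands the format, which is the harder case.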

> If you're not familiar with the details of
> every particular format, which of course is impossible. You could use
> a registry of metadata extractors - for each media type, a program
> that extracts metadata in a way peculiar to that type. You would also
> need programs for altering the metadata.

In fact, this is exactly what happens. Every format has
an indexing profile which leads to a program that knows
how to extract indexable data.

That's how OS X Spotlight and XP/Vista search are able to
find arbitrary documents and images and movies by title,
author, date created, etc.

> I don't know where you'd put such a registry or how it
> would be maintained.

This is widely deployed technology. XMP happens to be one
common metadata format for media. The registry is maintained
by the OS search facilities.



> But for me, besides the problem of maintaining such a registry and the
> difficulties of managing all these different formats in different
> positions in the representation, the killer is situations where
> getting the information from the "representation" is impossible in
> principle.

Well, 'in principle' isn't really the issue here, it's 'in practice'.

> Examples include text/plain, 

Yes, impossible. Fortunately, not actually so common.

> encrypted content

Encrypted PDF uses the XMP in the clear. Likely other
formats that support encryption would do so as well.


> content that's inaccessible due to access control restrictions, 
> server policy, or server architecture, 

Often, the access control policy for metadata is the
same as the access control policy for the data. Cases
where it isn't are a case for outboard metadata.

> and exotic media types not known at the time
> an application is written or in the hypothetical registry.

The registry isn't hypothetical. And applications that
go through a generic metadata library interface get the
extensibility of dynamically loading new formats.
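
A minimal sketch of that kind of interface, assuming a simple
in-process table keyed by media type. All names here are hypothetical,
and "text/x-demo" is an invented format used only for illustration;
real OS search facilities have their own plug-in mechanisms.

```python
from typing import Callable, Dict

Extractor = Callable[[bytes], Dict[str, str]]

# Hypothetical registry: media type -> function that extracts metadata.
_REGISTRY: Dict[str, Extractor] = {}


def register(media_type: str):
    """Decorator that adds an extractor for one media type at run time."""
    def wrap(fn: Extractor) -> Extractor:
        _REGISTRY[media_type.lower()] = fn
        return fn
    return wrap


def extract_metadata(media_type: str, data: bytes) -> Dict[str, str]:
    """Generic entry point: callers never see format-specific code."""
    fn = _REGISTRY.get(media_type.lower())
    if fn is None:
        raise LookupError("no extractor registered for " + media_type)
    return fn(data)


@register("text/x-demo")
def _demo_extractor(data: bytes) -> Dict[str, str]:
    # Invented format: "Key: value" lines up to the first blank line.
    fields: Dict[str, str] = {}
    for line in data.decode("utf-8").splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields
```

An application written against extract_metadata picks up a new format
as soon as an extractor for it is registered, without being changed.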

> I agree that metadata should go in the "representation" whenever
> possible.

I'm not sure I would say "whenever possible" but certainly
"when appropriate".

>  But you seem to be wilfully missing the point of the
> exercise. 

I don't think so.

> I just don't get the resistance to agreeing on a uniform
> protocol,

I think it's important for applications to have simple metadata
APIs, but of course, everything should be as simple as possible
but no simpler.  The failure modes of separated metadata --
that the metadata gets lost, separated from the data,
modified inappropriately -- need to be traded off against
the costs and application requirements.

>  to augment in-representation metadata or to provide it in
> situations where it really is impossible or impractical to get it from
> the "representation".

I'm not at all opposed to a uniform access method; I just
want it to recognize the complexity of deployed content.


Larry
-- 
http://larry.masinter.net

Received on Thursday, 12 March 2009 03:07:03 UTC