- From: Larry Masinter <masinter@adobe.com>
- Date: Wed, 11 Mar 2009 20:05:26 -0700
- To: Jonathan Rees <jar@creativecommons.org>
- CC: Eran Hammer-Lahav <eran@hueniverse.com>, "connolly@w3.org" <connolly@w3.org>, "www-archive@w3.org" <www-archive@w3.org>
> (I missed the beginning of this; why is this on www-archive instead of
> www-talk as Eran requested, and what document or message are you
> quoting?)

I was quoting an email from Eran. I sent it to www-archive and not
www-talk partly because it was a minor point in a long discussion. I'll
move it back if it seems relevant. Using www-archive means we don't have
to forward everything.

> The problem is that each format does it in a different way.

Well, XMP does have a generic "packet scanning" mechanism, but you still
need to find a safe place to put metadata that doesn't interfere with
binary file formats.

> So how to get at it, and modify it,

If you can recognize the XMP and there's sufficient padding, so that
size isn't an issue, then this isn't so bad.

> If you're not familiar with the details of every particular format,
> which of course is impossible. You could use a registry of metadata
> extractors - for each media type, a program that extracts metadata in
> a way peculiar to that type. You would also need programs for altering
> the metadata.

In fact, this is exactly what happens. Every format has an indexing
profile which leads to a program that knows how to extract indexable
data. That's how OS X Spotlight and XP/Vista search are able to find
arbitrary documents, images, and movies by title, author, date created,
etc.

> I don't know where you'd put such a registry or how it would be
> maintained.

This is widely deployed technology. XMP happens to be one common
metadata format for media. The registry is maintained by the OS search
facilities.

> But for me, besides the problem of maintaining such a registry and the
> difficulties of managing all these different formats in different
> positions in the representation, the killer is situations where
> getting the information from the "representation" is impossible in
> principle.

Well, 'in principle' isn't really the issue here; it's 'in practice'.

> Examples include text/plain,

Yes, impossible.
Fortunately, not actually so common.

> encrypted content

Encrypted PDF keeps the XMP in the clear. Likely other formats that
support encryption would do so as well.

> content that's inaccessible due to access control restrictions,
> server policy, or server architecture,

Often, access control policy for metadata is the same as access control
for the data. But in cases where it isn't, that's a case for outboard
metadata.

> and exotic media types not known at the time an application is
> written or in the hypothetical registry.

The registry isn't hypothetical. And applications that go through a
generic metadata library interface get the extensibility of dynamically
loading new formats.

> I agree that metadata should go in the "representation" whenever
> possible.

I'm not sure I would say "whenever possible", but certainly "when
appropriate".

> But you seem to be wilfully missing the point of the exercise.

I don't think so.

> I just don't get the resistance to agreeing on a uniform protocol,

I think it's important for applications to have simple metadata APIs,
but of course, everything should be as simple as possible but no
simpler. The failure modes of separated metadata -- that the metadata
gets lost, separated from the data, or modified inappropriately -- need
to be traded off against the costs and application requirements.

> to augment in-representation metadata or to provide it in situations
> where it really is impossible or impractical to get it from the
> "representation".

I'm not at all opposed to a uniform access method, just wanting it to
recognize the complexity of deployed content.

Larry
--
http://larry.masinter.net
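[Editor's note: the "packet scanning" mechanism and the padding point
discussed above can be sketched roughly as follows. This is an
illustrative Python fragment, not Adobe's implementation; the marker
strings follow the XMP xpacket convention, and the function names are
invented for this sketch. It shows why a scanner can find XMP in an
opaque byte stream, and why sufficient padding lets a writer modify the
packet without resizing the file.]

```python
# Sketch of XMP "packet scanning": locate an XMP packet embedded in an
# arbitrary binary file by searching for the xpacket processing
# instructions, then rewrite it in place using padding to keep the file
# size unchanged. Illustrative only; real writers follow the XMP spec's
# padding rules more carefully.

XPACKET_BEGIN = b"<?xpacket begin="
XPACKET_END = b"<?xpacket end="

def find_xmp_packet(data: bytes):
    """Return (start, end) byte offsets of the first XMP packet, or None."""
    start = data.find(XPACKET_BEGIN)
    if start == -1:
        return None
    end_pi = data.find(XPACKET_END, start)
    if end_pi == -1:
        return None
    close = data.find(b"?>", end_pi)  # terminator of the end PI
    if close == -1:
        return None
    return (start, close + 2)

def replace_xmp_in_place(data: bytes, new_packet: bytes) -> bytes:
    """Overwrite the existing packet, padding with spaces so the total
    file size is unchanged; fails if the new packet is larger."""
    span = find_xmp_packet(data)
    if span is None:
        raise ValueError("no XMP packet found")
    start, end = span
    old_len = end - start
    if len(new_packet) > old_len:
        raise ValueError("new packet too large; file would need resizing")
    padded = new_packet + b" " * (old_len - len(new_packet))
    return data[:start] + padded + data[end:]
```

The point of the sketch is the trade-off Larry describes: scanning works
on any format that tolerates an embedded packet, but in-place
modification only works while the replacement fits within the existing
packet plus padding.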
Received on Thursday, 12 March 2009 03:07:03 UTC