Re: [httpRedirections-57] Resource-Decription Header: a possible proposal to consider. from Jonathan Rees on 2008-02-07 (www-tag@w3.org from February 2008)

From: Jonathan Rees <jar@creativecommons.org>
Date: Thu, 7 Feb 2008 11:13:40 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>, "www-tag@w3.org" <www-tag@w3.org>, Graham Klyne <GK@ninebynine.org>, Jonathan Borden <jonathan@openhealth.org>
Message-Id: <D8F69C5B-F92F-4F66-8259-BD9047C03068@creativecommons.org>

Richard,

I like your analysis, but disagree with your suggestion about  
embedded metadata.

If I may paraphrase:  Let's try to identify the need, then figure out  
how best to satisfy it, whether through HTTP or some other means. The  
issue is then

1. The perceived need for a generic "metadata channel" in some  
standard protocol.

I will take the risk that point 2 (the nature of a 303 target and/or  
description) can be taken up independently of point 1 (Stuart takes  
it up in the message he sent while I was composing this one), so I  
won't talk about it, yet. I think point 1 is much more important.

Certainly it's easy to create metadata channels by sacrificing  
"standard", as demonstrated by the LSID and handle protocols, or by  
sacrificing "generic", as demonstrated by 303s (which are not generic  
across IRs and non-IRs) and MIME-type-specific metadata extraction.  
So the question is how to get the conjunction.

On Feb 7, 2008, at 6:36 AM, Richard Cyganiak wrote:
> ... On the other hand it's not evident to me that HTTP *should*  
> have the ability to locate metadata about IRs.
> This problem can be addressed by layering another mechanism on top  
> of HTTP, e.g. by embedding the metadata in the IR's representation,

Doesn't work.

1. Many formats, such as text/plain or application/pgp-encrypted,  
have no place to put such metadata.

2. Supporting all MIME types is intractable - look at the painful and  
ongoing effort Creative Commons is going through in order to get its  
metadata into a zillion different file formats. Imagine replicating  
that infrastructure for all metadata publishing tools, and updating  
all tools every time a new media type comes along.

3. Having to edit a PDF or JPEG file to modify metadata, besides  
being very clumsy and error prone, disrupts widely used file  
operations such as compression, checksums, digital signatures, write  
dates, mirroring, and so on.

4. The metadata provider may not have access to the resource to  
modify it, either for technical or policy reasons.

5. The material may be protected or licensed, so that representations  
cannot be made available. The metadata is still useful in these cases  
- e.g. it may tell you the resource's subject matter, or how to  
obtain access.

6. Metadata is not always appropriately carried by the resource's  
representations, especially when it transcends what's true about that  
particular representation (stability guarantees, version lists,  
superseded-by links) or contradicts it (e.g. provides errata or a  
critique).
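
As a rough illustration of point 3, the sketch below shows why embedding metadata disturbs checksums while an external description does not. The file contents, the license comment, and the description triple are all made-up stand-ins, not real data from this thread.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes for a published file (hypothetical content).
published = b"%PDF-1.4 example document body"

# Embedding metadata means rewriting the file itself.
with_embedded_metadata = published + b"\n% license: CC-BY"

# An external description is a separate document (hypothetical triple);
# the published bytes are left untouched.
external_description = (
    b'<http://example.org/doc.pdf> '
    b'<http://purl.org/dc/terms/subject> "metadata" .'
)

before = sha256_hex(published)
after_embedding = sha256_hex(with_embedded_metadata)
after_external = sha256_hex(published)

print("embedded metadata changed the checksum:", before != after_embedding)
print("external description changed the checksum:", before != after_external)
```

The same reasoning applies to digital signatures and mirroring: anything keyed to the exact bytes of the file is invalidated by an embedded edit and unaffected by a separate description.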

> or linking to the metadata from the IR's representation, or by  
> providing the metadata in an external well-known location a la  
> sitemaps or POWDER.

This is very attractive, and we've discussed options like these  
inside the HCLS IG, but unless this approach has been developed as  
far as Resource-description:, to the point that it is something that  
the TAG or a WG can recommend as the standard way to do this, it is  
hard to compare the two alternative solutions. I could probably track  
down the POWDER reference, but would you mind providing a citation  
for this particular idea?

Another solution would be to create a new protocol (which is what  
you're doing if you say that an HTTP method with the given URI as the  
request-URI won't cut it). This is just what the handle and LSID  
systems have done. From an implementation point of view this would be  
a nightmare, I think.

Or, we could piggyback off of an existing protocol. One could bridge  
http over LSID, for example, and use LSID operations to obtain  
metadata for http-named resources. (I hope this sounds ridiculous.)

> I would like to see more evidence that shows a need for a generic  
> protocol-level mechanism. Elsewhere in this thread, Jonathan cites  
> experiences with LSIDs as one data point. Is there more?

I gave the handle protocol (RFC 3650) as another. Stuart has kindly  
provided more.

This may be difficult to answer because it may be a chicken and egg  
problem. If the TAG or a W3C WG or some other authority were to say  
"this is how you provide descriptions generically" I bet lots of  
people would start using whatever channel was described, as long as  
the implementation overhead were low enough. By writing descriptions  
in RDF, many kinds of information could be communicated in a single  
document, and we could let a thousand ontology flowers bloom. As it  
is, although there is no technical hurdle, there is insufficient  
standardization for "ignition" of such a facility.

Why isn't this the case for 303, which is a sort of standard and has  
some adoption? One reason, I fear, is that it is too subtle for most  
webmasters to "get". But more significantly I think it has limited  
adoption because it only applies to non-IRs. Many of the things that  
one might want to talk about on the (semantic) web are IRs. In  
particular IRs are the subject of what I'm told is the most widely  
deployed use of RDF, namely CC license metadata. Give a generic way  
to talk about IRs and you create a huge number of potential users of  
the semantic web - those who want to provide any kind of metadata for  
web-published things ("resource descriptions", remember?). They  
become liberated from the burden of dealing with hundreds of distinct  
MIME types, the awkwardness of having to version files in order to  
change metadata, etc.

> HTTP is concerned with the transport of representations of  
> resources. The nature and relationships of those representations  
> are a separate concern.

This is plausible. Do you consider Resource-description: itself to go  
too far in the direction of being concerned with the nature of the  
resource?  It is certainly expedient, and it isolates all information  
about "nature and relationships" to a separate document, but would  
you rather that an HTTP header not even be allowed to mention such a  
document?
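
To make the idea concrete, here is a minimal sketch of how a client might locate a description document from a response it has already received. It assumes the header is spelled Resource-Description and that its value is a URI (possibly relative) of the description; neither detail is settled in this thread. Treating a 303 Location as a description is likewise only an assumption, since that is exactly what point 2 is about. The function name and example URIs are hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

# Assumed header name, lowercased for case-insensitive matching.
DESCRIPTION_HEADER = "resource-description"


def find_description(
    request_uri: str, status: int, headers: Mapping[str, str]
) -> Optional[str]:
    """Return the URI of a description document, or None if none is offered.

    Works the same way whether or not the resource is an IR: the header is
    checked first, and a 303 Location is used only as a fallback.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Generic channel: the header may accompany any response.
    if DESCRIPTION_HEADER in lowered:
        return urljoin(request_uri, lowered[DESCRIPTION_HEADER].strip())

    # Fallback (assumption): read a 303 target as a description.
    if status == 303 and "location" in lowered:
        return urljoin(request_uri, lowered["location"].strip())

    return None


# An IR served with 200 and a header pointing at separate metadata.
ir = find_description(
    "http://example.org/doc.pdf",
    200,
    {"Content-Type": "application/pdf", "Resource-Description": "/meta/doc.rdf"},
)

# A non-IR answered with 303 and no header.
non_ir = find_description(
    "http://example.org/id/alice",
    303,
    {"Location": "http://example.org/about/alice"},
)

print(ir)
print(non_ir)
```

The point of the sketch is only that the client never has to inspect or parse the representation itself, whatever its MIME type.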

Best
Jonathan
Received on Thursday, 7 February 2008 16:14:18 UTC