- From: Jonathan Rees <jar@creativecommons.org>
- Date: Thu, 7 Feb 2008 11:13:40 -0500
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>, "www-tag@w3.org" <www-tag@w3.org>, Graham Klyne <GK@ninebynine.org>, Jonathan Borden <jonathan@openhealth.org>
Richard, I like your analysis, but disagree with your suggestion about embedded metadata.

If I may paraphrase: Let's try to identify the need, then figure out how best to satisfy it, whether through HTTP or some other means. The issue is then

1. The perceived need for a generic "metadata channel" in some standard protocol.

I will take the risk that point 2 (the nature of a 303 target and/or description) can be taken up independently of point 1 (Stuart takes it up in the message he sent while I was composing this one), so I won't talk about it, yet. I think this point is much more important.

Certainly it's easy to create metadata channels by sacrificing "standard", as demonstrated by the LSID and handle protocols, or by sacrificing "generic", as demonstrated by 303s (which are not generic across IRs and non-IRs) and MIME-type-specific metadata extraction. So the question is how to get the conjunction.

On Feb 7, 2008, at 6:36 AM, Richard Cyganiak wrote:

> ... On the other hand it's not evident to me that HTTP *should*
> have the ability to locate metadata about IRs.

> This problem can be addressed by layering another mechanism on top
> of HTTP, e.g. by embedding the metadata in the IR's representation,

Doesn't work.

1. Many formats, such as text/plain or application/pgp-encrypted, have no place to put such metadata.

2. Supporting all MIME types is intractable - look at the painful and ongoing effort Creative Commons is going through in order to get its metadata into a zillion different file formats. Imagine replicating that infrastructure for all metadata publishing tools, and updating all tools every time a new media type comes along.

3. Having to edit a PDF or JPEG file to modify metadata, besides being very clumsy and error prone, disrupts widely used file operations such as compression, checksums, digital signatures, write dates, mirroring, and so on.

4. The metadata provider may not have access to the resource to modify it, either for technical or policy reasons.

5. The material may be protected or licensed, so that representations cannot be made available. The metadata is still useful in these cases - e.g. it may tell you the resource's subject matter, or how to obtain access.

6. Metadata is not always appropriately carried by the resource's representations, especially when it transcends what's true about that particular representation (stability guarantees, version lists, superseded-by links) or contradicts it (e.g. provides errata or a critique).

> or linking to the metadata from the IR's representation, or by
> providing the metadata in an external well-known location a la
> sitemaps or POWDER.

This is very attractive, and we've discussed options like these inside the HCLS IG, but unless this approach has been developed as far as Resource-description:, to the point that it is something that the TAG or a WG can recommend as the standard way to do this, it is hard to compare the two alternative solutions. I could probably track down the POWDER reference, but would you mind providing a citation for this particular idea?

Another solution would be to create a new protocol (which is what you're doing if you say that an HTTP method with the given URI as the request-URI won't cut it). This is just what the handle and LSID systems have done. From an implementation point of view this would be a nightmare, I think.

Or, we could piggyback off of an existing protocol. One could bridge http over LSID, for example, and use LSID operations to obtain metadata for http-named resources. (I hope this sounds ridiculous.)

> I would like to see more evidence that shows a need for a generic
> protocol-level mechanism. Elsewhere in this thread, Jonathan cites
> experiences with LSIDs as one data point. Is there more?

I gave the handle protocol (RFC 3650) as another. Stuart has kindly provided more.
This may be difficult to answer because it may be a chicken-and-egg problem. If the TAG or a W3C WG or some other authority were to say "this is how you provide descriptions generically" I bet lots of people would start using whatever channel was described, as long as the implementation overhead were low enough. By writing descriptions in RDF, many kinds of information could be communicated in a single document, and we could let a thousand ontology flowers bloom. As it is, although there is no technical hurdle, there is insufficient standardization for "ignition" of such a facility.

Why isn't this the case for 303, which is a sort of standard and has some adoption? One reason, I fear, is that it is too subtle for most webmasters to "get". But more significantly I think it has limited adoption because it only applies to non-IRs. Many of the things that one might want to talk about on the (semantic) web are IRs. In particular IRs are the subject of what I'm told is the most widely deployed use of RDF, namely CC license metadata. Give a generic way to talk about IRs and you create a huge number of potential users of the semantic web - those who want to provide any kind of metadata for web-published things ("resource descriptions", remember?). They become liberated from the burden of dealing with hundreds of distinct MIME types, the awkwardness of having to version files in order to change metadata, etc.

> HTTP is concerned with the transport of representations of
> resources. The nature and relationships of those representations
> are a separate concern.

This is plausible. Do you consider Resource-description: itself to go too far in the direction of being concerned with the nature of the resource? It is certainly expedient, and it isolates all information about "nature and relationships" to a separate document, but would you rather that an HTTP header not even be allowed to mention such a document?

Best
Jonathan
Received on Thursday, 7 February 2008 16:14:18 UTC