- From: Jonathan Rees <jar@creativecommons.org>
- Date: Wed, 6 Oct 2010 11:01:21 -0400
- To: Noah Mendelsohn <nrm@arcanedomain.com>
- Cc: Larry Masinter <LMM@acm.org>, "www-tag@w3.org" <www-tag@w3.org>
On Tue, Oct 5, 2010 at 2:43 PM, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
> Larry:
>
> For some time, the TAG has had open an ACTION-282 on Jonathan:
>
> ACTION-282 : on - Jonathan Rees - Draft a finding on metadata architecture.
> - Due: 2010-10-21 - OPEN
>
> On our call of 16 Sept. 2010, it was agreed that we would discuss at the
> upcoming F2F, and you generously offered [1]:
>
> "Larry: May be good to have a reading list... I will send mail "
>
> Accordingly, I was assigned:
>
> ACTION-465 : on - Noah Mendelsohn - Schedule F2F discussion of ACTION-282,
> "which metadata mechanisms to use when". Get reading list from Larry and
> www-tag. - Due: 2010-10-05 - OPEN
>
> So, Larry, it would be very helpful if you would prepare the reading list,
> for me to include in the set of required readings for the F2F. Can you give
> me an ETA? Thank you!
>
> Noah
>
>
> [1] http://www.w3.org/2001/tag/2010/09/16-minutes.html#item05

Larry, here are some of my notes on the subject. These are off the cuff and in a full treatment would have to be combined with other material on the subject.

-Jonathan

-------

Because this is the TAG list I'll use "resource", "representation", and "identification" per AWWW in spite of my dislike of its definitions of those words. Ordinary people should substitute "thing" for "resource", "bag of bits" for "representation", and "naming" or "designation" for "identification".

There is confusion about what "metadata" is. In the wider world, and the library community specifically, it means "data about data" or "data about documents". Unfortunately there is a second sense circulating; on occasion "metadata" is applied to information pertaining to just about any kind of entity. For example, a person's date of birth is sometimes called "metadata" about the person.
To avoid confusion, and to help preserve the meaningfulness of the word "metadata", I advise restricting "metadata" to the former use, and applying a more general term such as "data" or "descriptive data" in the latter situation.

The word "document" suffers from overuse so I will say "metadata subject" for something that metadata can be about. For me these are things that you might put in a library or other document repository. Their identity is preserved through acts of reproduction. They don't change in significant ways - any significant change leads to a different metadata subject, not to a change in the original one. Whether a change is "significant" is always a matter of judgment but mainly what's meant is that reformatting (DOC to PDF, etc.) is not usually significant; if a library has to reformat its holdings to make obsolete formats accessible to current readers that's not considered a threat to the identity of a metadata subject.

In the context of web architecture we are concerned with both metadata and (other) descriptive data, because not all "resources" are metadata subjects. To understand metadata on the web you need to distinguish resources from representations, and concomitantly descriptive data for resources from metadata for metadata subjects.

For example, consider the resource <http://news.google.com/>. Properly speaking this is not a metadata subject. Descriptive data for this resource might include that it is currently provided by Google Inc., or that the information it yields is updated frequently, or that on 6 October 2010 it linked to an article entitled "Scientists Win Chemistry Nobel for Carbon Atom Link". However, any particular "representation" of this resource would be a perfectly good metadata subject, with metadata such as publication date, language, word count, and subject matter.

Metadata that properly belongs to a representation is often asserted instead on a resource that has such a representation.
There are several reasons for this:

1. sadly, representations and metadata subjects do not generally have their own URIs, so specifying the subject of metadata assertions is hard, and we just pick the nearest plausible URI (cf. duri:)

2. the metadata might be sufficiently invariant across representations (varying through conneg, session, time, etc.) to justify overloading the resource's URI to mean either the resource or "any representation of the resource"

3. because writing it is so concise, the base URI provides a tempting subject for use in assertions about the representation

Thus, one might say that Roy Fielding is an author of the resource <http://www.w3.org/TR/webarch/>, even though what's really meant is that he is an author of the (current) representation(s) of <http://www.w3.org/TR/webarch/>.

We might even take the URI as a name not for a potentially changing resource, but for a particular metadata subject (with "representations" varying only in inconsequential ways). Example: based on known site policy, we might take http://www.w3.org/TR/2004/REC-webarch-20041215/ to refer to the 15 December 2004 version of the webarch recommendation, and use this URI to name it in, say, a scholarly references list or bibliographic database.

However, any metadata assertion (author, title, etc.) stated using a URI should be approached with caution, as the metadata subject you would see now might not be the one to which that metadata originally applied. Expectations in this regard need to be set through some out-of-band mechanism such as application architecture or articulated site stability policy.

Where does one find metadata on the web?
We currently have a number of options, among which are:

- bibliographic databases and "landing pages"
  examples: openurl, OAI-ORE, pubmed
- embedded in a "representation" in various ways
  examples: XHTML+RDFa, <title>, <meta>, <link>, XMP
- HTTP entity-headers such as Content-language:
- following a link provided by a Link: header (see "new opportunities" blog post)
- .well-known/host-meta + link-template (see "new opportunities" blog post)

In principle metadata can be given directly in a <link> element, Link: header, or host-meta template, but I think we're recommending that there be a single Link: (etc) that directs you to a second document whose purpose is to describe the resource (as "resource description"). Like any metadata source, when a resource URI is available, a resource description could contain descriptive data for the resource, or invariant metadata for its representations, or both.

Related to this are linked data practices around GET/303, fragid + RDF. The RDF context is more general than metadata subjects; a set of axioms with a shared subject could be metadata but only if that shared subject is a metadata subject.

If two sources of metadata conflict, which one gets priority?

The cynical answer is that every chunk of metadata has its own provenance. You have to just know the characteristics of the metadata source, and figure out for yourself which source is more likely to give you the right answer. The question is similar to: If two web pages disagree, which one is right?

An answer given by the LRDD draft is that it's an error if Link: metadata conflicts with link-template metadata. What you get from the two sources must be the same. The motivation for this is to allow clients to stop looking for metadata as soon as it is found at one location. The requirement that the metadata be identical frees the client from any need to (at considerable cost in network bandwidth) examine the other source.
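
To make the "stop as soon as it is found" idea concrete, here is a rough Python sketch of a client that looks for a resource description first in a Link: header and then in a host-meta link-template. It is only an illustration under stated assumptions, not anything taken from the LRDD draft: the relation name "describedby", the "{uri}" template variable, the function names, and the example.org URIs are all my own choices, the Link: parsing is deliberately simplified, and no network access is attempted.

```python
import re
from urllib.parse import quote

# Assumed relation name for "resource description" links in this sketch.
DESCRIBEDBY = "describedby"


def links_with_rel(link_header, rel):
    """Return target URIs in a Link: header value that carry the given rel.

    Simplified parser: assumes target URIs contain no commas and that rel
    is written as rel="..." with a quoted, space-separated list.
    """
    targets = []
    for part in link_header.split(","):
        m = re.match(r'\s*<([^>]*)>(.*)', part)
        if not m:
            continue
        target, params = m.groups()
        rel_match = re.search(r';\s*rel="([^"]*)"', params)
        if rel_match and rel in rel_match.group(1).split():
            targets.append(target)
    return targets


def expand_template(template, resource_uri):
    """Fill the (assumed) {uri} variable with the percent-encoded URI."""
    return template.replace("{uri}", quote(resource_uri, safe=""))


def find_description(resource_uri, link_header=None, template=None):
    """Return the first description URI found, without consulting later sources."""
    if link_header:
        found = links_with_rel(link_header, DESCRIBEDBY)
        if found:
            return found[0]
    if template:
        return expand_template(template, resource_uri)
    return None


def sources_agree(from_link, from_template):
    """The rule described above: metadata from both sources must be the same."""
    return from_link == from_template
```

A client that trusts the "must be the same" rule only ever calls find_description; sources_agree is what a validator would run, at the cost of fetching from both places.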
(Link: and link-template are not yet deployed for metadata discovery as far as I know. They may be in use for other purposes such as OpenID.)

Larry has suggested that any particular metadata source could communicate - perhaps through choice between two different Link: relations - whether it intends the provided metadata to override some other source (such as embedded metadata) or not. For example, if a "representation" has embedded metadata asserting that the author is Roy Fielding, but the server (via Link:) asserts that the author is Larry Masinter, there could be two cases: either the server would say that the embedded metadata is more likely to be accurate than what it is providing (i.e. Link: is giving a sort of default), or it might believe that the information it's providing is more likely to be right than embedded metadata (maybe the server's metadata was subject to better QC than the embedded metadata).
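
The two cases could be sketched as follows. This is only a guess at how a client might act on such a signal: the two relation names are invented for illustration (neither is a registered link relation), and metadata is reduced to flat Python dictionaries.

```python
# Hypothetical relation names, invented for this sketch.
REL_OVERRIDE = "metadata-override"  # server's metadata should win
REL_DEFAULT = "metadata-default"    # server's metadata only fills gaps


def combine(embedded, linked, rel):
    """Merge embedded and Link:-provided metadata per the server's chosen relation."""
    if rel == REL_OVERRIDE:
        # Later entries win in a dict merge, so linked values replace embedded ones.
        return {**embedded, **linked}
    if rel == REL_DEFAULT:
        # Embedded values win; linked values appear only where embedded has none.
        return {**linked, **embedded}
    raise ValueError(f"unknown relation: {rel}")
```

With embedded metadata naming Roy Fielding as author and linked metadata naming Larry Masinter, the override relation yields Larry Masinter and the default relation yields Roy Fielding; any field present in only one source survives either way.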
Received on Wednesday, 6 October 2010 15:01:49 UTC