- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 2 Mar 2009 11:58:27 +0100
- To: <www-tag@w3.org>
- CC: <julian.reschke@gmx.de>, <jar@creativecommons.org>, <connolly@w3.org>, <wangxiao@musc.edu>, <eran@hueniverse.com>
(I'm collecting and posting some comments made in another thread which I feel are important to the discussions concerning uniform access to metadata, and which some may have missed if they are not following that particular thread.)

When talking about uniform access to metadata descriptions, we must have a clear definition of what is meant by "description" and what the primary purpose of such descriptions is. It may also be useful if we used a qualified term such as "semantic web description" to indicate that we are talking about a very specific, narrowly constrained meaning.

Taking "resource" and "representation" per AWWW...

I propose that the "semantic web description" of a resource be defined as a particular subtype of "representation" from which may be derived one or more RDF graphs such that, when those graphs are merged, the merged graph contains one or more triples in which the URI of the resource in question occurs as the subject, and zero or more triples in which the URI of the resource in question does not occur as the subject.

Ideally, semantic web descriptions would be served as RDF/XML, but alternative forms of expression should be allowed, especially in support of embedded microformats or alternative graph serialization encodings.

Note that a particular representation may provide a description of a resource, such as one expressed in English prose, but if no RDF graph can be derived from the representation using reasonable industry standard methods, such that the statements of fact made about the resource are inaccessible to a semantic web agent, that representation is not a semantic web description (even if it may still be a description of the resource in the broader sense).

This is a practical, functional distinction, based on the specific needs of semantic web agents.

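
The proposed definition can be read as a simple test over triples. The following is only a minimal sketch under stated assumptions, not part of URIQA or any RDF library: triples are modelled as plain string tuples, graph merging is approximated by set union (blank nodes are not renamed apart), and the function names and example.org URIs are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject, predicate, object); plain strings are an assumption
# made for illustration, not a real RDF term model.
Triple = Tuple[str, str, str]


def merge_graphs(graphs: Iterable[Set[Triple]]) -> Set[Triple]:
    """Merge graphs by set union (simplification: no blank node handling)."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def is_semantic_web_description(graphs: Iterable[Set[Triple]],
                                resource_uri: str) -> bool:
    """True if the merged graph has at least one triple whose subject is
    resource_uri. Triples about other subjects are allowed and ignored."""
    merged = merge_graphs(graphs)
    return any(subject == resource_uri for subject, _, _ in merged)


# Hypothetical graphs derived from one representation.
about_doc = {
    ("http://example.org/doc", "http://purl.org/dc/elements/1.1/title", "A title"),
}
about_other = {
    ("http://example.org/other", "http://purl.org/dc/elements/1.1/creator", "Someone"),
}

print(is_semantic_web_description([about_doc, about_other], "http://example.org/doc"))
print(is_semantic_web_description([about_other], "http://example.org/doc"))
```

A representation from which no graphs can be derived at all, such as English prose, corresponds to an empty list of graphs here, and the test fails for every resource URI.
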
And just as it is immediately clear from a server's response to a web agent's request that there are no representations available via a particular URI, it should be just as efficiently and immediately clear from a server's response to a semantic web agent's request that there are no semantic web descriptions available via a particular URI.

And just as there may be no representation available for a resource, likewise there may be no semantic web description available for a resource.

And just as there may be multiple representations of a given resource, so too may there be multiple semantic web descriptions of a given resource.

A set of alternative semantic web descriptions may differ in their

(a) level of detail (how much is said about the resource in question)
(b) degree of focus (how much is said about other related resources, either in a particular graph or in all graphs serialized in the representation)
(c) encoding (how the RDF graph(s) are serialized in the representation)
(d) noise level (the ratio of bytes which correspond to graph serialization versus other markup and/or content of some kind)

The above four facets come into play across the entire spectrum of metadata creation, management, publication, and discovery.

Folks building semantic web agents, and servers which cater to them, are seeking an optimal, standardized way for semantic web agents to have clear, efficient, and uniform access to the particular semantic web descriptions they need, such that both the means of access and the above facets (a) through (d) are optimal for their needs, and to do so with minimal disruption to existing web based solutions and with minimal burden to either implementors or content producers.

To take the hammer analogy. A representation is a nail. A semantic web description is a specialized kind of nail which ideally needs a particular kind of hammer.
Not any old hammer will work well for that nail; some hammers may work better than others, and many kinds of hammers won't work at all.

If one needs to use a particular kind of nail, one looks for the best kind of hammer available for that kind of nail, and if none of the hammers in one's toolbox are sufficiently good for the job (even if a few might be made to work with a certain level of success), one adds a new hammer to one's toolbox, a hammer that works optimally with that particular kind of nail. And if one works almost exclusively with a particular kind of nail, one will want a hammer that is as optimal as possible for that kind of nail.

HTTP GET is a very good and long proven hammer for working with representation nails. You might consider it your quintessential hammer, and representations the quintessential nail: the kind that everyone has around the house or shop, and that is used far more than any other kind of hammer and nail.

Semantic web descriptions are a very special kind of nail.

HTTP GET plus some form of linking is a hammer that *can* be used for semantic web description nails, but not optimally (and I've explained why elsewhere).

HTTP GET plus content negotiation is another hammer that *can* be used for semantic web description nails, but not optimally (and I've explained why elsewhere).

URIQA is a hammer that is specifically designed to be optimal for working with semantic web description nails.

Semantic web agents who deal (in most cases) exclusively with semantic web description nails deserve a hammer that is optimally suited for their work, not just any old hammer that kind of gets the job done, but not terribly well. The standardized semantic web machinery should not be a hack or a kludge.

A clear distinction must be made between the methodologies and tools used to create, manage, and publish/expose metadata and the methodologies and tools used to discover and access metadata.
What is optimal for the creation, management, and publication processes is not necessarily optimal for the discovery and access processes (and in my experience, seldom is).

And the solution chosen for uniform access to metadata should not unduly limit or show bias toward any of the various alternative methodologies and tools used for creating, managing, and publishing such metadata, and must work equally well for existing (legacy) systems and resources and for new systems and resources.

Linking tied to representations (either in the HTTP header or embedded in markup) makes sense when those links are presumed to be interpreted in the context of consuming the particular representation (e.g. links to stylesheets, etc.). But such linking places a needless processing burden on semantic web agents, which are not interested in just any representation, but only in semantic web descriptions.

All of the alternative proposals to URIQA for efficient *uniform* access to metadata which I have seen thus far are actually better suited as methods of exposing metadata to harvesting agents, to be syndicated into a knowledgebase which then provides the truly uniform access to that metadata to semantic web agents.

URIQA in no way precludes using any of the proposed linking or embedding approaches on the authoring/management/publication side to associate descriptions with resources, but given the sheer variety of options, such details should not be relevant to external semantic web agents.

URIQA enables whatever metadata management and publication techniques a site might employ (possibly many, and probably changing over time) to remain properly hidden under the hood and irrelevant to semantic web agents requesting descriptions.

Expecting every semantic web agent to deal with representations containing embedded metadata (or maybe nothing more than embedded links to metadata) and to have to sleuth around to figure out which of a number of possible discovery strategies is being used on a particular site is *ludicrous*.

Arguments along the lines of "well, if the agents use HEAD and the links are provided in the HTTP response header, then the agents don't have to retrieve and parse the representations" seem totally disconnected from practical reality.

Linking is great and useful. Microformats are great and useful. HTML <meta> tags are great and useful. There are many, many great and useful methods for expressing/exposing descriptive metadata about resources, and different environments and tools and processes (and user skills) will prefer some options to others, and in many cases, multiple combinations will be used.

But when it comes time for semantic web agents to ask particular servers for authoritative descriptions of resources denoted by a URI grounded in that server root, in a format that they can efficiently consume, we really should expect a single, simple, efficient, optimal, and *uniform* method of access to such descriptions.

To that end, I see no other valid proposal on the table aside from URIQA.

And URIQA complements all of the other techniques put forward, as they are used in various contexts by content owners to associate and expose descriptive metadata to a harvesting component of a URIQA based solution.

Regards,

Patrick

--
Patrick Stickler
Chief Architect
Forum Nokia Developer Infrastructure & Operations
+358 50 4823 878
patrick.stickler@nokia.com
Received on Monday, 2 March 2009 10:56:54 UTC