The right kind of hammer for the particular kind of nail (uniform access to metadata descriptions) from Patrick.Stickler@nokia.com on 2009-03-02 (www-tag@w3.org from March 2009)

From: <Patrick.Stickler@nokia.com>
Date: Mon, 2 Mar 2009 11:58:27 +0100
To: <www-tag@w3.org>
CC: <julian.reschke@gmx.de>, <jar@creativecommons.org>, <connolly@w3.org>, <wangxiao@musc.edu>, <eran@hueniverse.com>
Message-ID: <C5D18873.E213%patrick.stickler@nokia.com>
(I'm collecting and posting some comments made in another thread which I
feel are important to the discussions concerning uniform access to metadata
and which some may have missed if they are not following that particular
thread)



When talking about uniform access to metadata descriptions, we clearly need
a precise definition of what is meant by "description" and of what the
primary purpose of such descriptions is. It may also be useful if we
used a qualified term such as "semantic web description" to indicate that we
are talking about a very specific, narrowly constrained meaning.

Taking "resource" and "representation" per AWWW...

I propose that the "semantic web description" of a resource be defined as a
particular subtype of "representation" from which one or more RDF graphs may
be derived such that the merged graph contains one or more triples in which
the URI of the resource in question occurs as the subject. The merged graph
may also contain zero or more triples in which that URI does not occur as
the subject.
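
To make the proposed definition concrete, here is a minimal sketch of the
test it implies. It assumes the RDF graphs have already been extracted from
the representation and are modelled as plain (subject, predicate, object)
tuples of strings. The function names are hypothetical, and blank nodes are
assumed to be absent or already renamed apart, so the merge is a simple set
union.

```python
from typing import Iterable, Set, Tuple

# A triple is modelled as (subject, predicate, object); this is an
# illustrative simplification, not a full RDF data model.
Triple = Tuple[str, str, str]


def merge_graphs(graphs: Iterable[Iterable[Triple]]) -> Set[Triple]:
    """Merge graphs by set union (assumes no shared blank nodes)."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def is_semantic_web_description(
    graphs: Iterable[Iterable[Triple]], resource_uri: str
) -> bool:
    """True if the merged graph has at least one triple whose subject
    is the resource URI. Triples about other subjects are allowed."""
    merged = merge_graphs(graphs)
    return any(subject == resource_uri for subject, _, _ in merged)
```

A representation from which no graphs can be derived (an empty list here)
fails the test, which matches the functional distinction drawn in this
message.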

Ideally, semantic web descriptions would be served as RDF/XML, but
alternative forms of expression should be allowed, especially in support of
embedded microformats or alternative graph serialization encodings.

Note that a particular representation may provide a description of a
resource, such as one expressed in English prose, but if no RDF graph can be
derived from the representation using reasonable industry standard methods,
such that the statements of fact made about the resource are inaccessible to
a semantic web agent, that representation is not a semantic web description
(even if it may still be a description of the resource in the broader
sense). This is a practical, functional distinction, based on the specific
needs of semantic web agents.

And just as it is immediately clear from a server's response to a web agent's
request that there are no representations available via a particular URI,
likewise it should be just as efficiently and immediately clear from a
server's response to a semantic web agent request that there are no semantic
web descriptions available via a particular URI.

And just as there may be no representation available for a resource,
likewise there may be no semantic web description available for a resource.

And just as there may be multiple representations of a given resource, so
too may there be multiple semantic web descriptions of a given resource.

A set of alternative semantic web descriptions may differ in their

(a) level of detail (how much is said about the resource in question)

(b) degree of focus (how much is said about other related resources, either
in a particular graph or in all graphs serialized in the representation)

(c) encoding (how the RDF graph(s) are serialized in the representation)

(d) noise level (the ratio of bytes which correspond to graph serialization
versus other markup and/or content of some kind)
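
Three of these facets lend themselves to simple measurements. The sketch
below is only one possible reading: the function names are hypothetical,
triples are again modelled as (subject, predicate, object) string tuples,
and "noise level" is interpreted as the share of bytes in the representation
that do not belong to the graph serialization.

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]


def level_of_detail(triples: Sequence[Triple], uri: str) -> int:
    """Facet (a): number of triples whose subject is the resource."""
    return sum(1 for subject, _, _ in triples if subject == uri)


def degree_of_focus(triples: Sequence[Triple], uri: str) -> float:
    """Facet (b): fraction of triples that are about the resource itself
    rather than about other related resources."""
    if not triples:
        return 0.0
    return level_of_detail(triples, uri) / len(triples)


def noise_level(graph_bytes: int, total_bytes: int) -> float:
    """Facet (d): share of the representation's bytes that are NOT
    graph serialization (assumed interpretation of the ratio)."""
    if total_bytes <= 0 or not 0 <= graph_bytes <= total_bytes:
        raise ValueError("byte counts must satisfy 0 <= graph <= total, total > 0")
    return (total_bytes - graph_bytes) / total_bytes
```

For example, an HTML page of 1000 bytes carrying 250 bytes of embedded
graph serialization would have a noise level of 0.75 under this reading,
while a pure RDF/XML response would be close to 0.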

The above four facets come into play across the entire spectrum of metadata
creation, management, publication, and discovery.

Folks building semantic web agents, and servers which cater to them, are
seeking an optimal, standardized way for semantic web agents to have clear,
efficient, and uniform access to the particular semantic web descriptions
they need, such that both the means of access and facets (a) through (d)
above are optimal for their needs, and to do so with minimal disruption to
existing web based solutions and minimal burden on either implementors or
content producers.

To take the hammer analogy: a representation is a nail. A semantic web
description is a specialized kind of nail which ideally needs a particular
kind of hammer. Not any old hammer will work well for that nail, some of the
hammers may work better than others, and many kinds of hammers won't work at
all. If one needs to use a particular kind of nail, one looks for the most
optimal kind of hammer available for that kind of nail, and if none of the
hammers one has in one's toolbox are sufficiently good for the job (even if
a few might be made to work with a certain level of success) one adds a new
hammer to one's toolbox, a hammer that works optimally with that particular
kind of nail. And if one works almost exclusively with a particular kind of
nail, one will want a hammer that is as optimal as possible for that kind of
nail.

HTTP GET is a very good and long proven hammer for working with
representation nails. You might consider it your quintessential hammer, and
representations the quintessential nail: the kind that everyone has around
the house or shop, and that is used far more than any other kind of hammer
and nail.

Semantic web descriptions are a very special kind of nail.

HTTP GET plus some form of linking is a hammer that *can* be used for
semantic web description nails, but not optimally (and I've explained why
elsewhere). 

HTTP GET plus content negotiation is another hammer that *can* be used for
semantic web description nails, but not optimally (and I've explained why
elsewhere). 

URIQA is a hammer that is specifically designed to be maximally optimal for
working with semantic web description nails. Semantic web agents who deal
(in most cases) exclusively with semantic web description nails deserve a
hammer that is optimally suited for their work, not just any old hammer that
kind of gets the job done, but not terribly well.
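
To illustrate how the hammers differ on the wire, here is a rough sketch
that only builds the request text and sends nothing. The helper
build_request and the example.org URI are hypothetical. The MGET method
name is taken from the URIQA specification as I understand it and is not
spelled out in this message, so treat it as an assumption.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def build_request(
    method: str, uri: str, headers: Optional[Dict[str, str]] = None
) -> str:
    """Return the text of a bare HTTP/1.1 request for the given URI."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"{method} {path} HTTP/1.1", f"Host: {parts.netloc}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


uri = "http://example.org/doc"  # hypothetical resource

# GET plus content negotiation: the agent asks for a representation and
# hopes the server honours the RDF media type.
conneg = build_request("GET", uri, {"Accept": "application/rdf+xml"})

# URIQA-style request (assumed method name): the agent asks directly
# for a description of the resource denoted by the same URI.
uriqa = build_request("MGET", uri)
```

With GET plus linking, the agent would additionally have to inspect the
response (header or body) for a link and issue a second request, which is
the extra processing burden objected to in this message.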

The standardized semantic web machinery should not be a hack or a kludge.

A clear distinction must be made between the methodologies and tools used to
create, manage, and publish/expose metadata and the methodologies and tools
used to discover and access metadata. What is optimal for the creation,
management, and publication processes is not necessarily optimal for the
discovery and access processes (and in my experience, seldom is).

And the solution chosen for uniform access to metadata should not unduly
limit, or show bias toward, any of the various alternative methodologies and
tools used for creating, managing, and publishing such metadata, and must
work equally well for both existing (legacy) systems and resources and new
systems and resources.

Linking tied to representations (either in the HTTP header or embedded in
markup) makes sense when those links are presumed to be interpreted in the
context of consuming the particular representation (e.g. links to
stylesheets, etc.). But such linking places too much needless processing
burden on semantic web agents who really are not interested in just any
representation, but only semantic web descriptions.

All of the alternative proposals to URIQA for efficient *uniform* access to
metadata which I have seen thus far are actually better suited as
alternative methods of exposing metadata to harvesting agents, to be
syndicated into a knowledgebase which in turn provides the truly uniform
access to that metadata for semantic web agents.

URIQA in no way precludes using any of the proposed linking or embedding
approaches on the authoring/management/publication side to associate
descriptions with resources, but given the sheer variety of options, such
details should not be relevant to external semantic web agents. URIQA
enables whatever metadata management and publication techniques a site might
employ (possibly many, and probably changing over time) to remain properly
hidden under the hood and irrelevant to semantic web agents requesting
descriptions.

Expecting every semantic web agent to deal with representations containing
embedded metadata (or maybe nothing more than embedded links to metadata)
and to have to sleuth around to figure out which of a number of possible
discovery strategies is being used on a particular site is *ludicrous*.

Arguments along the lines of "well, if the agents use HEAD and the links are
provided in the HTTP response header, then the agents don't have to retrieve
and parse the representations" seem totally disconnected from practical
reality.

Linking is great and useful. Microformats are great and useful. HTML <meta>
tags are great and useful. There are many, many great and useful methods for
expressing/exposing descriptive metadata about resources, and different
environments and tools and processes (and user skills) will prefer some
options to others, and in many cases, multiple combinations will be used.

But when it comes time for semantic web agents to ask particular servers for
authoritative descriptions of resources denoted by a URI grounded in that
server root, which are in a format that they can efficiently consume, we
really should expect a single, simple, efficient, optimal, and *uniform*
method of access to such descriptions.

To that end, I see no other valid proposal on the table aside from URIQA.

And URIQA complements all of the other techniques put forward, as they are
used in various contexts by content owners to associate and expose
descriptive metadata to a harvesting component of a URIQA based solution.

Regards,

Patrick

-- 

Patrick Stickler 
Chief Architect
Forum Nokia Developer Infrastructure & Operations
+358 50 4823 878
patrick.stickler@nokia.com
Received on Monday, 2 March 2009 10:56:54 UTC