Re: Semantic Web User Agent Conformance

Danny Ayers wrote:
> On 22/11/2007, Sean B. Palmer <sean@miscoranda.com> wrote:
>
>   
>> I'm proposing some kind of work on conformance levels for Semantic Web
>> User Agents, such that when someone says "how many triples are in
>> $uri", we can answer confidently "a Class 2 Semantic Web User Agent
>> will return 53 triples"; or perhaps not *that* abstract, but along
>> those lines

This problem is not unique to RDF. Even a successful and rather aged
specification such as XML has large gaps here. For example, does the
infoset include XInclude processing or not? As Henry Thompson pointed
out earlier today: "Exactly what /the/ infoset of an XML document is is
already somewhat under-determined, in that a well-formed XML document as
processed by a conformant processor may yield two distinct infosets,
depending on whether that processor processes all the external parameter
entities in the document's DTD." [1]. Indeed, if you can't even pin down
what conformance means for XML, is there hope for RDF?

However, I do think it would be useful to look at types of conformance
levels. I can think of a few off the top of my head (limiting myself to
W3C Recs and Rec-track things):

1) RDF from RDF/XML without any entailment
2) RDF from RDF/XML + GRDDL + RDFa
3) RDF from RDF/XML + GRDDL + RDFa + RDF(S) reasoning
4) RDF from RDF/XML + GRDDL + RDFa + RDF(S) reasoning + OWL entailment

Note that the space is combinatoric; the hierarchy above is just one
off-the-cuff but sensible ordering. One could easily have RDF from
RDF/XML + OWL entailment, and one could specify more precisely the
conditions under which the OWL entailment is done... anyway, it would be
a useful exercise if someone wanted to write up draft levels.
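
To make the idea concrete, here's a rough Python sketch of treating a
conformance level as an explicit set of processing steps and counting
the triples a Level 1 agent would return. It assumes rdflib for RDF/XML
parsing; the level numbers and step names just mirror my off-the-cuff
list above and aren't a proposal.

# Minimal sketch: conformance levels as explicit sets of processing steps.
# Assumes rdflib; level numbers/step names mirror the list above.
from rdflib import Graph

LEVELS = {
    1: ("rdfxml",),
    2: ("rdfxml", "grddl", "rdfa"),
    3: ("rdfxml", "grddl", "rdfa", "rdfs"),
    4: ("rdfxml", "grddl", "rdfa", "rdfs", "owl"),
}

def triples_at_level_1(uri):
    # A Level 1 agent just parses the RDF/XML and does no entailment.
    g = Graph()
    g.parse(uri, format="xml")
    return len(g)

# e.g. triples_at_level_1("http://example.org/doc.rdf")  # placeholder URI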

In the XML world, the approach so far to their version of this question
(namely, does the information in an XML document include post-DTD
attribute defaulting? What about the PSVI from XML Schema validation?)
has been to produce a whole WG and a mini-language that lets users talk
about it in terms of pipelines of processing [2]. Something similar for
RDF could be useful.
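
As a rough illustration of what a "pipeline of processing" could mean
for RDF (a sketch only; the step functions below are hypothetical
placeholders, not any proposed vocabulary): two agents that declare the
same ordered list of steps should be able to agree on what graph a URI
yields.

# Sketch: an RDF pipeline as an ordered list of graph -> graph steps,
# so the processing applied is explicit data rather than implicit behaviour.
# The step functions are hypothetical placeholders.
from rdflib import Graph

def merge_grddl_results(g):   # placeholder: would merge GRDDL-extracted triples
    return g

def apply_rdfs_closure(g):    # placeholder: would add RDF(S) entailments
    return g

PIPELINE = [merge_grddl_results, apply_rdfs_closure]

def run_pipeline(uri, pipeline=PIPELINE):
    g = Graph()
    g.parse(uri, format="xml")   # start from the RDF/XML representation
    for step in pipeline:
        g = step(g)
    return g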

Lastly, earlier I saw some comments about GRDDL and conformance. Note
that we in the GRDDL WG purposely avoided making overly strenuous
conformance requirements on GRDDL, except for security, since they would
have added unneeded complications and prevented the evolution of GRDDL.
However, in my opinion, if you have a function that takes a URI and
purports to return a graph, and the representation is a GRDDL-enabled
XHTML or XML document, then that function should return the GRDDL
results: the RDF as the author *intended* it to be read from the
document. Ditto RDFa.
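
A sketch of what I mean by such a function, dispatching on the media
type of the representation (illustrative only; extract_grddl and
extract_rdfa are stand-ins for a real GRDDL processor and RDFa
distiller):

# Sketch of a URI -> graph function that honours GRDDL and RDFa when the
# representation is XHTML/XML. The extract_* helpers are hypothetical stubs.
from rdflib import Graph

def extract_grddl(content, base):   # stand-in for a real GRDDL processor
    return Graph()

def extract_rdfa(content, base):    # stand-in for a real RDFa distiller
    return Graph()

def graph_from(uri, media_type, content):
    if media_type == "application/rdf+xml":
        g = Graph()
        g.parse(data=content, format="xml", publicID=uri)
        return g
    if media_type in ("application/xhtml+xml", "text/html"):
        # Return what the author intended: GRDDL results plus RDFa.
        return extract_grddl(content, uri) + extract_rdfa(content, uri)
    raise ValueError("no RDF-bearing representation for %s" % uri)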

As for 404s, well - I think we need to assume "normal operating
conditions of the Web" - i.e. no 404s (more ideal than normal, really) -
when doing processing that explicitly accesses URIs, and if there is a
404 then there needs to be an error message.
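
Concretely, something like the following (a sketch only, using Python's
standard urllib): fail loudly on a 404 rather than silently returning an
empty graph.

# Sketch: dereference a URI and report a 404 as an error instead of
# quietly pretending the resource yielded zero triples.
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch_or_fail(uri):
    try:
        with urlopen(uri) as resp:
            return resp.headers.get("Content-Type", ""), resp.read()
    except HTTPError as e:
        # Under "normal operating conditions of the Web" this shouldn't
        # happen; when it does, the agent should say so, not guess.
        raise RuntimeError("could not dereference %s: HTTP %d" % (uri, e.code))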

[1]http://www.w3.org/2001/tag/doc/elabInfoset/
[2]http://www.w3.org/XML/Processing/

> While I generally like the idea of checks like this, it seems there
> might be problems both in practice & in principle.
>
> In practice...ok, for example let's say I say my doc uses hTurtle, but
> due to circumstances beyond anyone's control the profile doc 404s. A
> lot fewer triples.
>
> In principle, well firstly I feel a little uncomfortable with the
> implication that an agent needs to provide a given level of
> conformance. A big benefit of the kind of data we deal with is that
> the producer can publish what it likes, the consumer can pick & choose
> what it likes.
>
> But being marginally more concrete, how might one go about pinning
> down the association between a resource and its representation as a
> single (named?) graph to the extent necessary to inspire confidence?
> Take a case like an RSS 1.0 blog feed. Yesterday it contained 100
> triples, today it contains 100 triples. Different triples each day,
> yet both presumably constitute a legit representation of the resource
> in question. (Along with whatever triples are expressed in any
> different representations - GRDDL, RDFa etc, which may or may not
> coincide with those in the feed).
>
> It seems to me that formal conformance levels are too strong in this
> context, way beyond the kind of thing e.g. the RDF Validator and
> Vapour offer. There's obvious benefit in testing tools like those
> mentioned recently in the validation thread, but I'm not sure how
> deterministic a given chunk of web clients/servers can be (and it will
> be a chunk if we consider GRDDL's profile chaining).
>
> Consider a Semantic Web cache, which for practical reasons doesn't
> accumulate every triple it encounters. The view to the agent may
> sometimes differ significantly from the current data available at the
> other side of the cache. Is this a legitimate component on the
> Semantic Web? How does it /really/ differ from say an RDF/XML file
> served on the Web? Will that file as seen by a consumer always exactly
> reflect the producer's intended truth?
>
> Dunno, although I like the sound of conformance levels within a very
> local context (and Draconian format checking etc), more generally my
> gut feeling is that a better test of a SWUA is how resilient/useful it
> is in circumstances of limited (c.f. danbri's "missing isn't broken")
> and even unreliable information.
>
> Cheers,
> Danny.
>
>   


-- 
  -harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426

Received on Tuesday, 27 November 2007 21:46:15 UTC