- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Tue, 27 Nov 2007 16:46:00 -0500
- To: Danny Ayers <danny.ayers@gmail.com>
- Cc: "Sean B. Palmer" <sean@miscoranda.com>, semantic-web@w3.org
Danny Ayers wrote:
> On 22/11/2007, Sean B. Palmer <sean@miscoranda.com> wrote:
>
>> I'm proposing some kind of work on conformance levels for Semantic Web
>> User Agents, such that when someone says "how many triples are in
>> $uri", we can answer confidently "a Class 2 Semantic Web User Agent
>> will return 53 triples"; or perhaps not *that* abstract, but along
>> those lines.

This problem is not unique to RDF. Even a successful and rather aged
specification such as XML has large gaps here. For example, does the
infoset include XInclude processing or not? As Henry Thompson pointed out
earlier today:

"Exactly what /the/ infoset of an XML document is is already somewhat
under-determined, in that a well-formed XML document as processed by a
conformant processor may yield two distinct infosets, depending on
whether that processor processes all the external parameter entities in
the document's DTD." [1]

Indeed, if you can't even determine what conformance means for XML, is
there hope for RDF?

However, I do think it would be useful to look at types of conformance
levels. I can think of a few off the top of my head (limiting myself to
W3C Recs and Rec-track work):

1) RDF from RDF/XML without any entailment
2) RDF from RDF/XML + GRDDL + RDFa
3) RDF from RDF/XML + GRDDL + RDFa + RDF(S) reasoning
4) RDF from RDF/XML + GRDDL + RDFa + RDF(S) reasoning + OWL entailment

Note that the possibilities are combinatoric; this hierarchy is just a
single off-the-cuff but sensible one. One could easily have RDF from
RDF/XML + OWL entailment, and one could specify more precisely the
conditions under which the OWL entailment is done. Anyway, it would be
useful if someone wanted to write up draft levels (a rough sketch of the
idea is below).

In the XML world, what they have done so far with their version of this
question (namely: does an XML document include information after DTD
attribute defaulting? How about the PSVI from XML Schema validation?) is
to charter a whole WG and a mini-language that lets users talk about it
in terms of pipelines of processing [2]. Something similar for RDF could
be useful.

Lastly, earlier I saw some comments about GRDDL and conformance. Note
that we in the GRDDL WG purposely avoided making overly strenuous
conformance requirements on GRDDL, except on security, since they would
have added unneeded complications and prevented the evolution of GRDDL.
However, in my opinion, if you have a function that takes a URI and
purports to return a graph, and the representation is a GRDDL-enabled
XHTML or XML document, then it should return the GRDDL-derived RDF as
the author *intended* the RDF to be read from the document. Ditto RDFa.

As for 404s, well, I think we need to assume "normal operating
conditions of the Web" - i.e. no 404s (more ideal than normal, really) -
when doing processing that explicitly accesses URIs; if there is a 404,
then there needs to be an error message.
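For what it's worth, here is a minimal sketch of the kind of function I
have in mind, in Python, assuming the requests and rdflib libraries;
grddl_transform() is purely a hypothetical stand-in for locating and
applying the transformations named by the document's profile or
namespace:

import requests
from rdflib import Graph

def graph_for(uri):
    """Dereference uri and return the RDF graph its author intended."""
    resp = requests.get(uri)
    if resp.status_code == 404:
        # Under "normal operating conditions" a 404 is an error,
        # not silently an empty graph.
        raise IOError("%s is not retrievable (404)" % uri)

    media_type = resp.headers.get("Content-Type", "").split(";")[0].strip()
    g = Graph()
    if media_type == "application/rdf+xml":
        g.parse(data=resp.text, format="xml")
    elif media_type in ("application/xhtml+xml", "application/xml", "text/html"):
        # GRDDL-enabled XHTML/XML: return the RDF the author *intended*.
        # grddl_transform() is hypothetical; a real agent would run the
        # transformations linked from the document's profile/namespace.
        g += grddl_transform(resp.text, base=uri)
    return g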
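And to make the levels above slightly more concrete, a rough sketch of
levels 1, 3 and 4, again assuming Python with rdflib plus the owlrl
package for the deductive closures (the GRDDL/RDFa step of level 2 is
left out for brevity; none of this is normative, it is just an
illustration of the idea):

from rdflib import Graph
import owlrl

def triples_at_level(uri, level):
    """Triple count an agent at the given conformance level would report."""
    g = Graph()
    g.parse(uri, format="xml")  # level 1: plain RDF/XML, no entailment
    if level >= 4:
        # level 4: RDF(S) reasoning plus OWL entailment (OWL 2 RL rules here)
        owlrl.DeductiveClosure(owlrl.RDFS_OWLRL_Semantics).expand(g)
    elif level == 3:
        # level 3: RDF(S) deductive closure only
        owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
    return len(g)

# Two agents at different levels then give different but *predictable*
# answers: triples_at_level("http://example.org/doc.rdf", 1) vs. level 4.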
[1] http://www.w3.org/2001/tag/doc/elabInfoset/
[2] http://www.w3.org/XML/Processing/

> While I generally like the idea of checks like this, it seems there
> might be problems both in practice & in principle.
>
> In practice... ok, for example let's say I say my doc uses hTurtle,
> but due to circumstances beyond anyone's control the profile doc 404s.
> A lot fewer triples.
>
> In principle, well, firstly I feel a little uncomfortable with the
> implication that an agent needs to provide a given level of
> conformance. A big benefit of the kind of data we deal with is that
> the producer can publish what it likes, and the consumer can pick &
> choose what it likes.
>
> But being marginally more concrete, how might one go about pinning
> down the association between a resource and its representation as a
> single (named?) graph to the extent necessary to inspire confidence?
> Take a case like an RSS 1.0 blog feed. Yesterday it contained 100
> triples, today it contains 100 triples. Different triples each day,
> yet both presumably constitute a legit representation of the resource
> in question. (Along with whatever triples are expressed in any
> different representations - GRDDL, RDFa etc. - which may or may not
> coincide with those in the feed.)
>
> It seems to me that formal conformance levels are too strong in this
> context, way beyond the kind of thing e.g. the RDF Validator and
> Vapour offer. There's obvious benefit in testing tools like those
> mentioned recently in the validation thread, but I'm not sure how
> deterministic a given chunk of web clients/servers can be (and it
> will be a chunk if we consider GRDDL's profile chaining).
>
> Consider a Semantic Web cache, which for practical reasons doesn't
> accumulate every triple it encounters. The view to the agent may
> sometimes differ significantly from the current data available at the
> other side of the cache. Is this a legitimate component on the
> Semantic Web? How does it /really/ differ from, say, an RDF/XML file
> served on the Web? Will that file as seen by a consumer always
> exactly reflect the producer's intended truth?
>
> Dunno, although I like the sound of conformance levels within a very
> local context (and Draconian format checking etc.), more generally my
> gut feeling is that a better test of a SWUA is how resilient/useful
> it is in circumstances of limited (c.f. danbri's "missing isn't
> broken") and even unreliable information.
>
> Cheers,
> Danny.
>

--

-harry

Harry Halpin, University of Edinburgh
http://www.ibiblio.org/hhalpin 6B522426
Received on Tuesday, 27 November 2007 21:46:15 UTC