Semantic Web User Agent Conformance from Sean B. Palmer on 2007-11-22 (semantic-web@w3.org from November 2007)

From: Sean B. Palmer <sean@miscoranda.com>
Date: Thu, 22 Nov 2007 16:46:45 +0000
To: semantic-web@w3.org
Message-ID: <b6bb4d890711220846m6964d41na65cd32f44f6bbdc@mail.gmail.com>

One of the biggest Semantic Web questions people are asking right now
is: when a Semantic Web User Agent gets a document, how many normative
ways of getting triples from it are there? Or, from the other
direction: how many triples is the author asserting in some document?
The answer is, generally, "how long is a piece of string?", but in
fact there are lots of cases in which we need to construct more
specific answers.

I'm proposing some kind of work on conformance levels for Semantic Web
User Agents, such that when someone says "how many triples are in
$uri", we can answer confidently "a Class 2 Semantic Web User Agent
will return 53 triples"; or perhaps not *that* abstract, but along
those lines. It would be nice for example if we could specify things
very granularly too, so a vocabulary for specifying user agent
conformances on levels from the granular (single test cases) to the
abstract ("I support RDF/XML") would also be good.
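
To make the "Class 2 agent returns N triples" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class names, the mechanism labels ("rdfxml", "rdfa", "grddl-xslt1", "erdf") and the example triples are hypothetical, and no such classes are defined anywhere yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformanceClass:
    """A named set of triple-extraction mechanisms (hypothetical)."""
    name: str
    mechanisms: frozenset


# Hypothetical class definitions; nothing like this is standardised.
CLASS_1 = ConformanceClass("Class 1", frozenset({"rdfxml"}))
CLASS_2 = ConformanceClass("Class 2", frozenset({"rdfxml", "rdfa", "grddl-xslt1"}))


def triples_for(agent_class, triples_by_mechanism):
    """Union of the triples from every mechanism the class supports."""
    result = set()
    for mechanism, triples in triples_by_mechanism.items():
        if mechanism in agent_class.mechanisms:
            result |= set(triples)
    return result


# Made-up document: which triples each mechanism would yield for one URI.
EXAMPLE_DOC = {
    "rdfxml": {("ex:a", "ex:p", "ex:b")},
    "rdfa": {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")},
    "erdf": {("ex:d", "ex:p", "ex:e")},
}
```

With this made-up data, len(triples_for(CLASS_1, EXAMPLE_DOC)) is 1 and len(triples_for(CLASS_2, EXAMPLE_DOC)) is 2: the duplicate triple is counted once, and the eRDF triple is ignored because neither class lists that mechanism.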

The aim is for document producers to know how many UAs out there
support the format that they're using, and to give some kind of
regularity to what is currently a bundle of ad hoc solutions.

As a bit of context, here are some people who've been thinking about
this specifically from an engineering point of view:

http://chatlogs.planetrdf.com/swig/2007-11-22.html#T15-55-12
- Chimezie Ogbuji

http://www.w3.org/DesignIssues/diagrams/arch/follow
- Tim Berners-Lee

http://swhack.com/logs/2007-11-22#T13-09-31
- Keith Alexander

http://swig.xmlhack.com/2007/11/22/2007-11-22.html#1195725171.081142
- me

There's also been a lot of discussion about Xiaoshu Wang's paper on
kinda the same issues, but I think that's a distraction so I won't
bother to link to it.

Now the case that got me thinking about this today is RDFa. The
current RDFa specification states as follows:

"A conforming RDFa Processor MUST make available to a consuming
application a single RDF [graph] containing all possible triples
generated by using the rules in the Processing Model section."
- http://www.w3.org/TR/rdfa-syntax/#uaconf

What a huge tax on Semantic Web User Agents that also have to be
conforming GRDDL agents and conforming RDF/XML agents and so on! GRDDL
in particular is a tricky case: its conformance is left open, yet it
recommends that you implement *all* of the underlying processing
mechanisms, with XSLT 1.0 as the main transformation language. That
conformance class doesn't even have a name; it's just represented in
the GRDDL Test Cases REC.

I've already filed a comment about RDFa, asking for
@profile=".../rdfa" to be a SHOULD in RDFa documents, and for
conforming user agents to be absolved of the responsibility to parse
anything other than an RDFa document that uses @profile. I made some
important comments in that thread, e.g.:

[[[
The burden is where you have a URI, and you know it gives triples, but
you can't be sure that the author isn't going to use eRDF one day,
change their mind and use some GRDDL hDialect the next, and then RDFa
the next. So you have to try *all* of the available mechanisms to be
safe.

Now, if there were some well defined heuristics for telling which
possible transformations might apply, that would greatly reduce the
burden on the suite of parsers required to handle all this.
]]] - Re: RDFa RFE: No Mandated DOCTYPE
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2007Nov/0065
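
One possible heuristic of that kind, sketched in Python: look at the profile attribute on the head element, and only fall back to trying every mechanism when nothing recognised is declared. This is an illustration rather than a proposal for the exact rule. The profile URIs are example.org placeholders supplied by the caller, and the mechanism labels are hypothetical.

```python
from html.parser import HTMLParser


class ProfileSniffer(HTMLParser):
    """Collect the mechanisms declared via <head profile="...">."""

    def __init__(self, profile_map):
        super().__init__()
        self.profile_map = profile_map  # profile URI -> mechanism label
        self.declared = set()

    def handle_starttag(self, tag, attrs):
        if tag != "head":
            return
        # The profile attribute is a whitespace-separated list of URIs.
        profile = dict(attrs).get("profile") or ""
        for uri in profile.split():
            if uri in self.profile_map:
                self.declared.add(self.profile_map[uri])


def candidate_mechanisms(html_text, profile_map, all_mechanisms):
    """Declared mechanisms if any are recognised, else every mechanism."""
    sniffer = ProfileSniffer(profile_map)
    sniffer.feed(html_text)
    return sniffer.declared or set(all_mechanisms)


# Placeholder URIs and labels, not real profile identifiers.
PROFILES = {
    "http://example.org/profile/rdfa": "rdfa",
    "http://example.org/profile/erdf": "erdf",
}
ALL_MECHANISMS = {"rdfa", "erdf", "grddl"}
```

A document whose head declares the placeholder RDFa profile yields just {"rdfa"}; a document with no recognised profile yields the whole of ALL_MECHANISMS, which is exactly the "try everything to be safe" burden described above.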

After that, I considered (and still consider) that this work is
probably in the remit of the SW Deployment WG, and wrote to them about
it:

[[[
If you really want to take this to its logical conclusion, it would be
nice to have a vocabulary for describing the capabilities of Semantic
Web user agents to consume various documents, a writeup of the
heuristics that they ought to use, and a kind of extra layer of
conformance levels for Semantic Web user agent authors to try to meet.
"Don't wanna support all of GRDDL? Here are a few common subsets that
are well deployed."

This should be based on some level of description, looking to see what
kinds of documents people are actually using, and prescription, what
kinds would be good to produce especially in future when things like
RDFa go to rec.
]]] - Best Practices Issue: RDF Format Discovery
http://lists.w3.org/Archives/Public/public-swd-wg/2007Nov/0056
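
As a rough sketch of what such an extra layer could look like, a conformance level can be modelled as a named set of test cases, so that an agent meets a level when it passes every test in it. The level names and test-case identifiers below are hypothetical; they are not taken from the GRDDL Test Cases or any other suite.

```python
# Hypothetical levels: each maps a name to the test cases it requires.
LEVELS = {
    "grddl-basic": {"tc-inline-transform", "tc-profile-transform"},
    "grddl-full": {
        "tc-inline-transform",
        "tc-profile-transform",
        "tc-namespace-transform",
    },
}


def levels_met(passed_tests, levels=LEVELS):
    """Names of every level whose required test cases were all passed."""
    passed = set(passed_tests)
    return sorted(
        name for name, required in levels.items() if required <= passed
    )
```

An agent that passes only tc-inline-transform and tc-profile-transform meets "grddl-basic" but not "grddl-full", which is the kind of "common subset" answer an agent author could then advertise.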

I think that this would be valuable work to carry out. The SWD WG
haven't replied yet, but I'd like to know what the Semantic Web
Interest Group thinks, and at the very least I ought to keep people
over here informed.

Thanks,

-- 
Sean B. Palmer, http://inamidst.com/sbp/
Received on Thursday, 22 November 2007 16:46:59 UTC