RDF Stylesheets from Sean B. Palmer on 2007-11-13 (semantic-web@w3.org from November 2007)

From: Sean B. Palmer <sean@miscoranda.com>
Date: Tue, 13 Nov 2007 12:23:04 +0000
To: semantic-web@w3.org
Message-ID: <b6bb4d890711130423u2279096ao73a0e4a25cd16a9d@mail.gmail.com>
This email sketches a class of documents I call RDF Stylesheets, of
which one immediate application is making GRDDL easier to deploy via a
small extension. This is a lightweight proposal with a big impact.

There are five methods for including a stylesheet in XHTML 1.0.

1) The <style> element.
2) <link rel="stylesheet" ... />
3) The Link: HTTP header, with rel=stylesheet.
4) The @style attribute (limited to text/css).
5) <?xml-stylesheet ...?>

Methods 1-4 are defined by the HTML 4.01 specification [1] (yes,
including the Link HTTP header; see Section 14.6 [2]). Method 5 is
defined by the Associating Style Sheets with XML documents W3C
recommendation [3].
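Here is a minimal Python sketch (standard library only) of how a user agent might find Method 2 and Method 5 references in a well-formed XHTML document. The function name find_stylesheets is hypothetical. The pseudo-attribute parsing is simplified, with no entity or escaping rules, and Methods 1, 3 and 4 are not handled.

```python
import re
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Simplified pseudo-attribute syntax for <?xml-stylesheet ...?>:
# name="value" or name='value', with no entity handling.
PSEUDO_ATTR = re.compile(r'([\w:-]+)\s*=\s*(["\'])(.*?)\2')


def find_stylesheets(xhtml_source):
    """Return (method, href, type) tuples for Method 5 and Method 2 references."""
    doc = minidom.parseString(xhtml_source)
    found = []
    # Method 5: <?xml-stylesheet?> processing instructions in the prolog.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {m.group(1): m.group(3) for m in PSEUDO_ATTR.finditer(node.data)}
            found.append(("xml-stylesheet", attrs.get("href"), attrs.get("type")))
    # Method 2: XHTML <link> elements whose rel tokens include "stylesheet".
    for link in doc.getElementsByTagNameNS(XHTML_NS, "link"):
        if "stylesheet" in link.getAttribute("rel").split():
            found.append(("link", link.getAttribute("href"),
                          link.getAttribute("type") or None))
    return found


sample = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="print.css"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><link rel="stylesheet" type="text/css" href="screen.css" /></head>
  <body/>
</html>"""
print(find_stylesheets(sample))
# [('xml-stylesheet', 'print.css', 'text/css'), ('link', 'screen.css', 'text/css')]
```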

Neither of these specifications defines what it means by a style
sheet. Methods 1-3 and 5 are language-agnostic, both in allowing an
unconstrained media type and in how the HTML specification
explicitly leaves the door open for style sheet languages:

"This specification doesn't tie HTML to any particular style sheet
language. This allows for a range of such languages to be used, for
instance simple ones for the majority of users and much more complex
ones for the minority of users with highly specialized needs."
- http://www.w3.org/TR/html401/present/styles.html#h-14.1

The most commonly used style sheet languages at the moment are CSS and
XSLT. CSS allows content generation now (when I said that this was
beyond the remit of CSS, on www-style many years ago, I was told
effectively that the remit had changed). XSLT is Turing-complete: a
programming language.

GRDDL uses XSLT stylesheets (amongst other transformation languages)
to select information in documents so that it can be serialised, or
presented, as RDF/XML. The information is there in the original, but
needs massaging into a machine-readable format. GRDDL uses the verb
rel="transformation" for this, which is not defined by the HTML
specification, so it needs to make use of the extensibility mechanism
provided therein, the @profile attribute:

  <head profile="http://www.w3.org/2003/g/data-view"> [...]
    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
 - http://www.w3.org/TR/grddl/#grddl-xhtml

In fact, it should have leveraged the existing stylesheet verb.
How would that look? Well, it *wouldn't* look like the following, even
though the following is valid and will give you triples, because it
doesn't use the full GRDDL mechanism:

    <link rel="stylesheet" type="text/xsl"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />

(On the use of "text/xsl", the position here isn't all that clear. See
http://www.dpawson.co.uk/xsl/sect2/mimetypes.html and especially its
"clear as mud" exasperation.)

The reason why you can't do this is probably the reason why
@rel="stylesheet" wasn't considered for GRDDL. The problem is that
the GRDDL mechanism is recursive: you process transformations of the input
to RDF, and then the RDF output may contain further transformations
which you process as new input, recursively. The XSLT mechanism is
applied once and that's it.
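Here is a rough Python sketch of that recursive step, under heavy simplifying assumptions: triples are plain tuples, and each transformation is modelled as a hypothetical Python function from the input document's URI to a set of triples. An XSLT user agent would stop after the first pass. The GRDDL loop keeps going until the output names no transformation it hasn't already applied. Profile and namespace transformations are left out.

```python
GRDDL_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#transformation"
DC = "http://purl.org/dc/elements/1.1/"


def grddl_result(doc_uri, declared, transforms):
    """Merge the output of every transformation reachable from `declared`.

    declared:   transformation URIs found directly in the input document
    transforms: hypothetical mapping from a transformation URI to a function
                that takes the input document's URI and returns a set of triples
    """
    graph, applied, pending = set(), set(), list(declared)
    while pending:
        t = pending.pop()
        if t in applied:
            continue
        applied.add(t)
        output = transforms[t](doc_uri)
        graph |= output
        # The recursive step that one-shot XSLT lacks: the RDF output may
        # itself name further transformations of the same input.
        for s, p, o in output:
            if s == doc_uri and p == GRDDL_TRANSFORMATION and o not in applied:
                pending.append(o)
    return graph


doc = "http://example.org/doc"
transforms = {
    "http://example.org/t1": lambda d: {
        (d, GRDDL_TRANSFORMATION, "http://example.org/t2"),
        (d, DC + "title", "Example"),
    },
    "http://example.org/t2": lambda d: {(d, DC + "creator", "Example Author")},
}
print(len(grddl_result(doc, ["http://example.org/t1"], transforms)))  # 3
```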

Hence the idea of RDF Stylesheets. The GRDDL mechanism is actually
formally specified in RDF already (or rather, an extension of it which
is currently being called N3Logic by the cool kids; see [4], [5], and
[6] for details), and the formal description is in the specification
space though not linked from the specification itself:

http://www.w3.org/TR/grddl/grddl-rules.n3

See [7] if you want to get distracted about this.

Recall that RDF is a meta-language, like XML: we can specify
applications, new languages, inside RDF just as we can specify new
applications and languages (like XSLT) in XML. RDF/XML is an
application of XML. XML can be expressed as an application of RDF
(search for XML Infoset in RDF work). It's important only to go as far
with the meta-language thing as you have to.

That being said, now imagine the following:

    <link rel="stylesheet" type="application/rdf+xml"
       href="http://example.org/dc-extract" />

What kind of application or language should the dc-extract RDF
document contain? It has to be a stylesheet, in some loose sense of
the word that HTML will allow, and it has to be application/rdf+xml.
The intent is to make it a GRDDL Stylesheet, a subclass document type
of RDF Stylesheet, using the formalisation we have in grddl-rules.n3.
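Suppose a user agent dispatches on the type of a rel="stylesheet" link. Then the new case is just one more entry in its table. The handler descriptions below are placeholders, not behaviour defined by any specification. The only point being illustrated is that the choice is keyed on the media type.

```python
# Hypothetical dispatch table; the descriptions stand in for real processors.
STYLESHEET_HANDLERS = {
    "text/css": "render the document with CSS",
    "text/xsl": "apply the XSLT once",
    "application/rdf+xml": "load as an RDF Stylesheet and follow s:input",
}


def handler_for(type_attr):
    """Pick a handler from a link's type attribute, ignoring parameters and case."""
    media_type = type_attr.split(";")[0].strip().lower()
    return STYLESHEET_HANDLERS.get(media_type, "ignore: unknown style sheet language")


print(handler_for("application/rdf+xml; charset=utf-8"))
```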

I'm not going to complete the formalisation here, because email
messages to lists.w3.org are where formalisation starts, not
completes. But here's a goodly chunk of hints:

Consider the way GRDDL works, which is identical to the way stylesheets
work: both take an input document and transform it. The
way that we specify the input document using rel="transformation" is
that it's the document in which the transformation link occurs. That's
very obvious and intuitive, though the formal rule for it (as you'll
see in a moment) is actually quite complicated. With RDF Stylesheets,
at any rate, we're doing something different: only the user agent
requesting the stylesheet knows what the input document is, but it
gets that via a traversal in the same way that it traverses to
grddl:transformations. XSLT as a language has the document(...)
function for traversing, of course, but the input to an XSLT
stylesheet is implicit in XSLT as a language. We're defining a
stylesheet language for GRDDL here.

In other words, the pattern is that a stylesheet is a kind of document
which applies to implicit input. The *power* of this is its
reusability. Anything can link to the dc-extract GRDDL Stylesheet, in
our current case.

But the GRDDL formalisation requires an input document to use in its
rules. We need to model the implicit input mechanism in the GRDDL
Stylesheet language somehow, and thankfully it's very easy to do so.
If you're starting to not understand, hopefully this is
straightforward enough to shed light on it:

[[[
<> a s:Stylesheet;
   s:input [ grddl:transformation <.../something.xsl>; etc. ] .
]]] - would go in dc-extract, the GRDDL Stylesheet instance
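Modelled as plain Python tuples, that snippet is three triples. Reading the transformations back out takes two hops: from the stylesheet to its input node via s:input, then from the input node to its transformations via grddl:transformation. The s: namespace URI, the blank node label and the .xsl URI are hypothetical placeholders. Only the grddl: and rdf: namespaces are real.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GRDDL = "http://www.w3.org/2003/g/data-view#"
S = "http://example.org/ns/stylesheet#"  # hypothetical namespace for the s: prefix

STYLESHEET = "http://example.org/dc-extract"
# <> a s:Stylesheet; s:input [ grddl:transformation <...> ] .
dc_extract = {
    (STYLESHEET, RDF_TYPE, S + "Stylesheet"),
    (STYLESHEET, S + "input", "_:in"),  # "_:in" labels the blank node
    ("_:in", GRDDL + "transformation", "http://example.org/something.xsl"),  # placeholder
}


def transformations(stylesheet_uri, triples):
    """Two hops: stylesheet -s:input-> input node -grddl:transformation-> T."""
    inputs = {o for s, p, o in triples if s == stylesheet_uri and p == S + "input"}
    return {o for s, p, o in triples if s in inputs and p == GRDDL + "transformation"}


print(transformations(STYLESHEET, dc_extract))
```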

Nifty, no? And it only requires (you'll see why if you read the
formalisation stuff below) a single property and an optional class.
One cool thing that it enables is for you to bind together existing
transformations under a single URI. You don't have to specify one
transform for each XSLT stylesheet; all the work takes place in the
GRDDL Stylesheet, and the HTML merely references it. This lightens the
burden on the HTML, which is favourable.
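To sketch the bundling idea, the hypothetical helper below writes out a GRDDL Stylesheet in N3 that unions several transformations. An HTML page then needs only a single <link rel="stylesheet"> pointing at wherever that document is published. The s: namespace URI is a placeholder, and the second .xsl URI is made up for illustration.

```python
def grddl_stylesheet_n3(transformation_uris):
    """Serialise a GRDDL Stylesheet (N3) whose s:input carries every given transformation."""
    objects = ",\n        ".join("<%s>" % uri for uri in transformation_uris)
    return (
        "@prefix grddl: <http://www.w3.org/2003/g/data-view#> .\n"
        "@prefix s: <http://example.org/ns/stylesheet#> .  # hypothetical namespace\n"
        "\n"
        "<> a s:Stylesheet;\n"
        "   s:input [ grddl:transformation\n"
        "        %s ] .\n" % objects
    )


print(grddl_stylesheet_n3([
    "http://www.w3.org/2000/06/dc-extract/dc-extract.xsl",
    "http://example.org/extract-geo.xsl",  # hypothetical second transformation
]))
```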

So, if you want the formal view of how this works, first you probably
should look at how GRDDL formalises the current rel="transformation"
way of doing it for comparison:

{
?N gspec:profileName "http://www.w3.org/2003/g/data-view".
(?N
""".//*[namespace-uri()="http://www.w3.org/1999/xhtml" and
        (local-name() = "a"
         or local-name() = "link")"""
) gspec:xpath ?E.
(?E "@rel") gspec:xpath [ fn:string [
   fn:normalize-space ?E_REL ]].
(?E_REL "[ \t\r\n]+") fn:tokenize [
 list:member "transformation" ].
(?E "@href") gspec:xpath [ fn:string ?T_REF ].
?E gspec:htmlBase ?BASE.
(?T_REF ?BASE) fn:resolve-uri ?TURI.
?T log:uri ?TURI. }
=> { ?N grddl:transformation ?T. } .
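For readers who'd rather read Python than N3, here is a loose standard-library transliteration of the rule above. gspec:profileName and gspec:htmlBase are approximated: the profile is read directly off the head element, and the base is either <base href> or the document URI. The function names are hypothetical. This is an illustration, not the normative rule from grddl-rules.n3.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
DATA_VIEW = "http://www.w3.org/2003/g/data-view"


def html_base(root, doc_uri):
    """Rough stand-in for gspec:htmlBase: <base href> if present, else the document URI."""
    base = root.find(f"{XHTML}head/{XHTML}base")
    if base is not None and base.get("href"):
        return urljoin(doc_uri, base.get("href"))
    return doc_uri


def transformations_via_rel(xhtml_source, doc_uri):
    """Return the resolved href of every a/link whose rel includes "transformation"."""
    root = ET.fromstring(xhtml_source)
    head = root.find(f"{XHTML}head")
    # ?N gspec:profileName "http://www.w3.org/2003/g/data-view"
    profiles = head.get("profile", "").split() if head is not None else []
    if DATA_VIEW not in profiles:
        return []
    base = html_base(root, doc_uri)
    found = []
    for el in root.iter():
        # .//*[namespace-uri()=XHTML and (local-name()="a" or local-name()="link")]
        if el.tag not in (XHTML + "a", XHTML + "link"):
            continue
        # fn:normalize-space then fn:tokenize on whitespace is what str.split() does
        if "transformation" in el.get("rel", "").split() and el.get("href") is not None:
            found.append(urljoin(base, el.get("href")))
    return found
```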

And here's how we do it using a GRDDL Stylesheet:

{ (?N
""".//*[namespace-uri()="http://www.w3.org/1999/xhtml" and
        (local-name() = "a"
         or local-name() = "link")"""
) gspec:xpath ?E.
(?E "@rel") gspec:xpath [ fn:string [
   fn:normalize-space ?E_REL ]].
(?E_REL "[ \t\r\n]+") fn:tokenize [
 list:member "stylesheet" ].
(?E "@href") gspec:xpath [ fn:string ?S_REF ].
?E gspec:htmlBase ?BASE.
(?S_REF ?BASE) fn:resolve-uri ?SURI.
?S log:uri ?SURI .
?S log:semantics [
   log:includes { ?S s:input [ grddl:transformation ?T ] }
 ] . }
=> { ?N grddl:transformation ?T. } .

The quantification levels aren't right (consider all variables
quantified over the root document scope), but you get the drift if
you're one of the three people in the world who understands N3Logic.
The main point of the s:input arc, in case you were wondering, is to
provide something as a subject for the grddl:transformation
statement(s), and any other associated GRDDL stuff you want to put in
there.
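The GRDDL Stylesheet rule can be transliterated into Python in the same loose way. Two things disappear: there is no profile check, and the transformations come from dereferencing the linked document rather than from the HTML itself. In this sketch `semantics` is a hypothetical stand-in for log:semantics, a mapping from URI to already-parsed triples. The s:input URI uses a placeholder namespace, and <base href> handling is omitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_T = "http://www.w3.org/2003/g/data-view#transformation"
S_INPUT = "http://example.org/ns/stylesheet#input"  # hypothetical s:input URI


def transformations_via_stylesheet(xhtml_source, doc_uri, semantics):
    """Collect ?T from every linked stylesheet ?S with ?S s:input [ grddl:transformation ?T ]."""
    root = ET.fromstring(xhtml_source)
    found = []
    for el in root.iter():
        if el.tag not in (XHTML + "a", XHTML + "link"):
            continue
        if "stylesheet" not in el.get("rel", "").split() or el.get("href") is None:
            continue
        s_uri = urljoin(doc_uri, el.get("href"))
        # Non-RDF stylesheets (e.g. CSS) have no entry and contribute nothing.
        triples = semantics.get(s_uri, set())
        inputs = {o for s, p, o in triples if s == s_uri and p == S_INPUT}
        # Sorted only so that the output order is deterministic.
        found += sorted(o for s, p, o in triples if s in inputs and p == GRDDL_T)
    return found
```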

In fact, s:Stylesheet and s:input are quite powerful in general. You
could build lots of different RDF Stylesheet languages if you wanted
to, not just the GRDDL Stylesheet Language that I've started to
sketch. I already have another kind of RDF Stylesheet language in mind
(hint: it involves non-XML input).

To summarise, the main benefit of RDF Stylesheets to GRDDL is that
they let you omit the @profile attribute, which I am given to
understand has been raised as a criticism of GRDDL; and
that they let you make a new union of transformations and give it a
URI, reducing the amount of HTML that you have to write to use GRDDL
per instance. It makes GRDDL more easily reusable: it makes it into a
stylesheet language!

Whilst I'm fixing GRDDL, I'll note that GRDDL could really work on any
information resource, not just XML ones, and I don't really understand
why that apparently arbitrary constraint was added when the
conformance definition for GRDDL user-agents is then so *loose*. If
anyone can explain the former with regard (and only with regard) to
the latter, then I'd very much appreciate it.

In summary, then, I've just sketched a class of documents, RDF
Stylesheets, which you can use with Methods 1-3 and 5 in HTML,
XHTML, XML, and anything you can bung a Link header on; and using
whatever mechanism is available elsewhere. The use of RDF Stylesheets
is primarily intended at the moment for specifying ways to get RDF out
of various formats, especially with a GRDDL Stylesheet Language as low
hanging fruit, but is an open class inasmuch as the range of the
rel="stylesheet" verb is an open class.
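The Link header is the route for resources that can't carry markup, so here is a rough Python sketch of picking rel=stylesheet entries out of a Link header value. It is deliberately simplified: it does not cope with commas or semicolons inside quoted parameter values. The function name is hypothetical, and a real agent would want a proper header parser.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by its parameters, up to the next comma.
LINK = re.compile(r'<([^>]*)>([^,]*)')
# One parameter: ; name=value or ; name="value"
PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*("([^"]*)"|[^;\s]+)')


def stylesheet_links(header_value, base_uri):
    """Return (resolved URI, type) for each rel=stylesheet entry in a Link header value."""
    out = []
    for target, params in LINK.findall(header_value):
        attrs = {}
        for name, raw, quoted in PARAM.findall(params):
            # Parameter names are treated case-insensitively (REL == rel).
            attrs[name.lower()] = quoted if raw.startswith('"') else raw
        if "stylesheet" in attrs.get("rel", "").lower().split():
            out.append((urljoin(base_uri, target), attrs.get("type")))
    return out


print(stylesheet_links('</dc-extract>; rel="stylesheet"; type="application/rdf+xml", '
                       '<next.html>; rel=next', "http://example.org/data.csv"))
```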

Thanks,

[1] http://www.w3.org/TR/html401/present/styles.html
[2] http://www.w3.org/TR/html401/present/styles.html#h-14.6
[3] http://www.w3.org/TR/xml-stylesheet/
[4] http://www.w3.org/DesignIssues/N3Logic
[5] http://arxiv.org/abs/0711.1533
[6] http://lists.w3.org/Archives/Public/public-cwm-bugs/2007Nov/0014
[7] http://chatlogs.planetrdf.com/swig/2007-11-12.html#T19-16-34

P.S. When I say that N3Logic is a formalisation of the GRDDL
mechanism, you can read that as it being a reference implementation,
because really it's a declarative programming language rather than (as
well as?) a logic. You could have an imperative language expressing
the same kind of thing with test cases, I'll bet; and some people
would find it more intuitive to do that.

In fact, one of the big Semantic Web questions for me at the moment is
the interplay of declarative and imperative. I think that the Semantic
Web could benefit hugely from a skillful combination of the two, which
I've already started to take steps towards with Plan3 and so on. The
other RDF Stylesheet language (the non-XML, hint hint, one) I
mentioned, for example, might require some declarative-imperative mix
to tune it up.

-- 
Sean B. Palmer, http://inamidst.com/sbp/
Received on Tuesday, 13 November 2007 12:23:13 UTC