- From: Sean B. Palmer <sean@miscoranda.com>
- Date: Tue, 13 Nov 2007 12:25:49 +0000
- To: public-grddl-wg@w3.org
(This is a forward of a message that I just sent to semantic-web. It proposes further work to take place on GRDDL. When I discussed it with Danny Ayers I think he suggested sending it to the still active GRDDL WG, so that's the intent behind this message being sent to this list.)

This email sketches a class of documents I call RDF Stylesheets, of which one immediate application is making GRDDL easier to deploy via a small extension. This is a lightweight proposal with a big impact.

There are five methods for including a stylesheet in XHTML 1.0.

1) The <style> element.
2) <link rel="stylesheet" ... />
3) The Link: HTTP header, with rel=stylesheet.
4) The @style attribute (limited to text/css).
5) <?xml-stylesheet ...?>

Methods 1-4 are defined by the HTML 4.01 specification [1] (yes, including the Link HTTP header; see Section 14.6 [2]). Method 5 is defined by the Associating Style Sheets with XML documents W3C recommendation [3]. Neither of these specifications defines what it means by a style sheet.

Methods 1-3 and 5 are language agnostic, both in allowing an unconstrained media type and in how the HTML specification explicitly leaves the door open for style sheet languages:

"This specification doesn't tie HTML to any particular style sheet language. This allows for a range of such languages to be used, for instance simple ones for the majority of users and much more complex ones for the minority of users with highly specialized needs."
- http://www.w3.org/TR/html401/present/styles.html#h-14.1

The most commonly used style sheet languages at the moment are CSS and XSLT. CSS allows content generation now (when I said that this was beyond the remit of CSS, on www-style many years ago, I was told effectively that the remit had changed). XSLT is Turing complete; a programming language.

GRDDL uses XSLT stylesheets (amongst other transformation languages) to select information in documents which can be serialised, presented, as RDF/XML.
The information is there in the original, but needs massaging into a machine readable format.

GRDDL used the verb rel="transformation" for this, which is not defined by the HTML specification, so it needs to make use of the extensibility mechanism provided therein, the @profile attribute:

<head profile="http://www.w3.org/2003/g/data-view">
[...]
<link rel="transformation"
   href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
- http://www.w3.org/TR/grddl/#grddl-xhtml

Whereas, in fact, it should've leveraged the existing stylesheet verb. How would that look? Well, it *wouldn't* look like the following, even though the following is valid and will give you triples, because it doesn't use the full GRDDL mechanism:

<link rel="stylesheet" type="text/xsl"
   href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />

(On the use of "text/xsl", the position here isn't all that clear. See http://www.dpawson.co.uk/xsl/sect2/mimetypes.html and especially its "clear as mud" exasperation.)

The reason why you can't do this is probably the reason why @rel="stylesheet" wasn't considered for GRDDL. The problem is that the GRDDL mechanism is recursive: you process transformations of the input to RDF, and then the RDF output may contain further transformations which you process as new input, recursively. The XSLT mechanism is applied once and that's it.

Hence the idea of RDF Stylesheets.

The GRDDL mechanism is actually formally specified in RDF already (or rather, an extension of it which is currently being called N3Logic by the cool kids; see [4], [5], and [6] for details), and the formal description is in the specification space though not linked from the specification itself:

http://www.w3.org/TR/grddl/grddl-rules.n3

See [7] if you want to get distracted about this.

Recall that RDF is a meta-language, like XML: we can specify applications, new languages, inside RDF just as we can specify new applications and languages (like XSLT) in XML. RDF/XML is an application of XML.
XML can be expressed as an application of RDF (search for XML Infoset in RDF work). It's important only to go as far with the meta-language thing as you have to.

That being said, now imagine the following:

<link rel="stylesheet" type="application/rdf+xml"
   href="http://example.org/dc-extract" />

What kind of application or language should the dc-extract RDF document contain? It has to be a stylesheet, in some loose sense of the word that HTML will allow, and it has to be application/rdf+xml. The intent is to make it a GRDDL Stylesheet, a subclass document type of RDF Stylesheet, using the formalisation we have in grddl-rules.n3.

I'm not going to complete the formalisation here, because email messages to lists.w3.org are where formalisation starts, not completes. But here's a goodly chunk of hints:

Consider the respect in which GRDDL's mechanism is identical with that of stylesheets: they both take an input document and transform it. The way that we specify the input document using rel="transformation" is that it's the document in which the transformation link occurs. That's very obvious and intuitive, though the formal rule for it (as you'll see in a moment) is actually quite complicated.

With RDF Stylesheets, at any rate, we're doing something different: only the user agent requesting the stylesheet knows what the input document is, but it gets that via a traversal in the same way that it traverses to grddl:transformations. XSLT as a language has the document(...) function for traversing, of course, but the input to an XSLT stylesheet is implicit in XSLT as a language. We're defining a stylesheet language for GRDDL here.

In other words, the pattern is that a stylesheet is a kind of document which applies to implicit input. The *power* of this is its reusability. Anything can link to the dc-extract GRDDL Stylesheet, in our current case. But the GRDDL formalisation requires an input document to use in its rules.
We need to model the implicit input mechanism in the GRDDL Stylesheet language somehow, and thankfully it's very easy to do so.

If you're starting to not understand, hopefully this is straightforward enough to shed light on it:

[[[
<> a s:Stylesheet;
   s:input [ grddl:transformation <.../something.xsl>; etc. ] .
]]]
- would go in dc-extract, the GRDDL Stylesheet instance

Nifty, no? And it only requires (you'll see why if you read the formalisation stuff below) a single property and an optional class.

One cool thing that it enables is for you to bind together existing transformations under a single URI. You don't have to specify one transform for each XSLT stylesheet; all the work takes place in the GRDDL Stylesheet, and the HTML merely references it. This drops the burden on the HTML, which is favourable.

So, if you want the formal view of how this works, first you probably should look at how GRDDL formalises the current rel="transformation" way of doing it for comparison:

{
  ?N gspec:profileName "http://www.w3.org/2003/g/data-view".
  (?N """.//*[namespace-uri()="http://www.w3.org/1999/xhtml" and
       (local-name() = "a" or local-name() = "link")]""" ) gspec:xpath ?E.
  (?E "@rel") gspec:xpath [ fn:string [ fn:normalize-space ?E_REL ]].
  (?E_REL "[ \t\r\n]+") fn:tokenize [ list:member "transformation" ].
  (?E "@href") gspec:xpath [ fn:string ?T_REF ].
  ?E gspec:htmlBase ?BASE.
  (?T_REF ?BASE) fn:resolve-uri ?TURI.
  ?T log:uri ?TURI.
} => { ?N grddl:transformation ?T. } .

And here's how we do it using a GRDDL Stylesheet:

{
  (?N """.//*[namespace-uri()="http://www.w3.org/1999/xhtml" and
       (local-name() = "a" or local-name() = "link")]""" ) gspec:xpath ?E.
  (?E "@rel") gspec:xpath [ fn:string [ fn:normalize-space ?E_REL ]].
  (?E_REL "[ \t\r\n]+") fn:tokenize [ list:member "stylesheet" ].
  (?E "@href") gspec:xpath [ fn:string ?S_REF ].
  ?E gspec:htmlBase ?BASE.
  (?S_REF ?BASE) fn:resolve-uri ?SURI.
  ?S log:uri ?SURI .
  ?S log:semantics [ log:includes
     { ?S s:input [ grddl:transformation ?T ] } ] .
} => { ?N grddl:transformation ?T. } .

The quantification levels aren't right (consider all variables quantified over the root document scope), but you get the drift if you're one of the three people in the world who understands N3Logic.

The main point of the s:input arc, in case you were wondering, is to provide something as a subject for the grddl:transformation statement(s), and any other associated GRDDL stuff you want to put in there.

In fact, s:Stylesheet and s:input are quite powerful in general. You could build lots of different RDF Stylesheet languages if you wanted to, not just the GRDDL Stylesheet Language that I've started to sketch. I already have another kind of RDF Stylesheet language in mind (hint: it involves non-XML input).

To summarise, the main benefit of RDF Stylesheets to GRDDL is that they let you omit the @profile attribute, which I am given to understand has been raised as a detraction point against GRDDL; and that they let you make a new union of transformations and give it a URI, reducing the amount of HTML that you have to write to use GRDDL per instance. It makes GRDDL more easily reusable: it makes it into a stylesheet language!

Whilst I'm fixing GRDDL, I'll note that GRDDL could really work on any information resource, not just XML ones, and I don't really understand why that apparently arbitrary constraint was added when the conformance definition for GRDDL user-agents is then so *loose*. If anyone can explain the former with regard (and only with regard) to the latter, then I'd very much appreciate it.

In summary, then, I've just sketched a class of documents, RDF Stylesheets, which you can use with Methods 1-3, and 5, in HTML, XHTML, XML, and anything you can bung a Link header on; and using whatever mechanism is available elsewhere.
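
For anyone who'd rather read the GRDDL Stylesheet rule imperatively than in N3Logic, here's a rough Python sketch of the same steps; it's an illustration under assumptions, not a reference implementation. The function names (stylesheet_links, transformations_for) and the load_triples callback are hypothetical. The s: namespace URI is a placeholder, since it isn't defined anywhere in this message. The base URI is passed in by the caller rather than worked out from the document as gspec:htmlBase would, and RDF is represented as plain (subject, predicate, object) tuples of strings so that no RDF library is needed.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#transformation"
# Placeholder namespace: the s: prefix is not bound to a URI in the message.
S_INPUT = "http://example.org/rdf-stylesheet#input"


def stylesheet_links(xhtml_source, base_uri):
    """Return absolute URIs of XHTML a/link elements whose @rel has 'stylesheet'."""
    root = ET.fromstring(xhtml_source)
    wanted_tags = {"{%s}a" % XHTML_NS, "{%s}link" % XHTML_NS}
    found = []
    for element in root.iter():
        if element.tag not in wanted_tags:
            continue
        # Normalise and tokenise @rel on whitespace, as the rule does.
        rel_tokens = re.split(r"[ \t\r\n]+", element.get("rel", "").strip())
        href = element.get("href")
        if "stylesheet" in rel_tokens and href is not None:
            # Simplification: base_uri stands in for gspec:htmlBase.
            found.append(urljoin(base_uri, href))
    return found


def transformations_for(xhtml_source, base_uri, load_triples):
    """Collect ?T wherever a linked sheet says: <sheet> s:input [ grddl:transformation ?T ]."""
    transformations = []
    for sheet_uri in stylesheet_links(xhtml_source, base_uri):
        # load_triples is supplied by the caller; it should return an empty
        # collection for anything that is not an RDF Stylesheet (e.g. CSS).
        triples = list(load_triples(sheet_uri))
        input_nodes = {o for (s, p, o) in triples
                       if s == sheet_uri and p == S_INPUT}
        for (s, p, o) in triples:
            if s in input_nodes and p == GRDDL_TRANSFORMATION:
                transformations.append(o)
    return transformations
```

The sketch mirrors the rule in not checking the @type of the link, so an ordinary CSS link is only weeded out when its triples come back empty. It also applies the rule once; the recursive part of GRDDL, where the RDF output may point at further transformations, is left out.
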
The use of RDF Stylesheets is primarily intended at the moment for specifying ways to get RDF out of various formats, especially with a GRDDL Stylesheet Language as low hanging fruit, but is an open class inasmuch as the range of the rel="stylesheet" verb is an open class.

Thanks,

[1] http://www.w3.org/TR/html401/present/styles.html
[2] http://www.w3.org/TR/html401/present/styles.html#h-14.6
[3] http://www.w3.org/TR/xml-stylesheet/
[4] http://www.w3.org/DesignIssues/N3Logic
[5] http://arxiv.org/abs/0711.1533
[6] http://lists.w3.org/Archives/Public/public-cwm-bugs/2007Nov/0014
[7] http://chatlogs.planetrdf.com/swig/2007-11-12.html#T19-16-34

P.S. When I say that N3Logic is a formalisation of the GRDDL mechanism, you can read that as it being a reference implementation, because really it's a declarative programming language rather than (as well as?) a logic. You could have an imperative language expressing the same kind of thing with test cases, I'll bet; and some people would find it more intuitive to do that.

In fact, one of the big Semantic Web questions for me at the moment is the interplay of declarative and imperative. I think that the Semantic Web could benefit hugely from a skillful combination of the two, which I've already started to take steps towards with Plan3 and so on. The other RDF Stylesheet language (the non-XML, hint hint, one) I mentioned, for example, might require some declarative-imperative mix to tune it up.

--
Sean B. Palmer, http://inamidst.com/sbp/
Received on Tuesday, 13 November 2007 12:32:20 UTC