Re: GRDDL extraction *to* RDFa

Hi Chimezie,

Chimezie Ogbuji wrote:
> I have a concern that such a transform isn't really a GRDDL transform
> but a generic one, which *results*
> in a GRDDL Source Document (which could refer to a standard [1]
> XHTML/RDFa -> RDF GRDDL transform)

I'm glad you're bringing up these points, as I know that the statement
(hGRDDL instanceof GRDDL) is not necessarily clear or agreeable to all.

> I've always thought of the GRDDL process as a black box where an XML
> dialect goes in and an RDF syntax comes out (RDF/XML currently, though I
> think there is an outstanding issue in the author's version to consider
> other serializations of RDF).  The mechanisms by which the transforms are
> registered and the transformation itself constitute the black box.

I agree with that, though I would say that XHTML+RDFa is just another
syntax for RDF (with extra bells and whistles, such as how to display it
so a human can read it).

> The problem with your scenario as I see it is that it requires an
> expansion of the current definition to include (as output) not only
> 'non-standard' RDF serialization syntaxes (Turtle, N3, TriX, etc..) but
> embedded syntaxes which I'd argue are better served as the *input* to
> GRDDL which results in a 'stand-alone' RDF syntax.

XHTML+RDFa can definitely be the input to some GRDDL transform if you
want raw triples, but that shouldn't be the only way that RDF triples
are gleaned from it. RDFa can and should be a first-class serialization
of RDF.

The way I look at it is: there is no "pure RDF" syntax. N3 and RDF/XML
are both methods of serializing RDF. It's a bit like Unicode encodings,
if you will: the Unicode character (respectively, the RDF triple) is
always abstract, while its encoding (respectively, its RDF/XML
serialization) is an actual string of bytes.
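
To make the analogy concrete, here is a minimal Python sketch. The
example.org URI, the helper names, and the two toy readers are my own
assumptions for illustration; the readers handle only this one shape of
input and are nowhere near real N-Triples or RDF/XML parsers.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# One abstract character, two different byte strings.
CHAR = "\u00e9"
UTF8 = CHAR.encode("utf-8")
UTF16 = CHAR.encode("utf-16-be")

# One abstract triple (hypothetical example data), two serializations.
TRIPLE = ("http://example.org/post/1", DC_NS + "creator", "Ben")

NTRIPLES = ('<http://example.org/post/1> '
            '<http://purl.org/dc/elements/1.1/creator> "Ben" .')

RDFXML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/post/1">
    <dc:creator>Ben</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def read_ntriples_line(line):
    """Toy reader: one line with URI subject, URI predicate, plain literal."""
    match = re.match(r'<([^>]*)> <([^>]*)> "([^"]*)" \.$', line)
    return match.groups()


def read_rdfxml(doc):
    """Toy reader: a single rdf:Description with one literal property."""
    root = ET.fromstring(doc)
    desc = root.find("{%s}Description" % RDF_NS)
    prop = desc[0]
    # ElementTree spells a qualified name as "{namespace}local".
    predicate = prop.tag.replace("{", "").replace("}", "")
    return (desc.get("{%s}about" % RDF_NS), predicate, prop.text)


# Different bytes, same character; different syntax, same triple.
print(UTF8 != UTF16, UTF8.decode("utf-8") == UTF16.decode("utf-16-be"))
print(read_ntriples_line(NTRIPLES) == read_rdfxml(RDFXML) == TRIPLE)
```

Neither string is "the" triple, in the same way that neither byte
sequence is "the" character.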

Here's another twist on it: some people think N3 is a better way of
expressing RDF than RDF/XML. In that world, it would make sense to
provide a GRDDL transform from RDF/XML to N3, in which case RDF/XML is
now a GRDDL input. That would be awfully recursive, but it all depends
on what we think the "pure" expression of RDF is. I wonder if we really
want to be setting in stone such an opinion, or if we should instead say
"GRDDL outputs some standard carrier of RDF triples, e.g. RDF/XML."

> It seems to me that
> your scenario is really a two phase process (where the first phase is a
> layout transformation and the second is a GRDDL transformation)
> especially since the primary motivation is "preserving the style and
> layout of the page.":
> 
> 1) Transformation from homegrown HTML to XHTML+RDFa (I'd argue by the
> current scope this is not a GRDDL process and could easily be served
> with a <?xml-stylesheet?> instruction *within* the original HTML)

Yes, though this approach makes the purpose of the transformation very
opaque.

> 2) Transformation from the XHTML+RDFa to 'raw' RDF (used by the RDFa
> browser which itself could be a GRDDL Processor - using an existing [1]
> transform for this process)

In my mind, this second step is not always necessary. In fact, the
transformation to 'raw' RDF discards important contextual information,
which means you wouldn't want this to be the only way to "read" RDFa.
Consider the way the RDFa bookmarklet works: it doesn't GRDDL the
XHTML+RDFa to triples, it actually navigates the DOM tree, looking for
attached triples along the way.

I see this "DOM ornaments" approach as a major way of "reading the
triples" from an RDFa document.
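
As a rough illustration of what I mean, here is a Python sketch of such a
walk. It is not the bookmarklet's actual code (that is JavaScript running
in the browser), and it makes simplifying assumptions: only the "about"
and "property" attributes are handled, and the sample markup and the
walk() helper are hypothetical.

```python
from xml.dom import minidom

# Hypothetical XHTML+RDFa fragment, invented for this example.
XHTML = """<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/post/1">
  <h1 property="dc:title">Hello</h1>
  <p>By <span property="dc:creator">Ben</span></p>
</div>"""


def walk(node, subject=None, found=None):
    """Depth-first DOM walk collecting (element, subject, property, text)."""
    if found is None:
        found = []
    if node.nodeType == node.ELEMENT_NODE:
        # An "about" attribute sets the subject for this whole subtree.
        if node.hasAttribute("about"):
            subject = node.getAttribute("about")
        if node.hasAttribute("property"):
            text = "".join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE)
            # Keep the element itself, so the triple stays attached to
            # its place in the page.
            found.append((node, subject, node.getAttribute("property"), text))
    for child in node.childNodes:
        walk(child, subject, found)
    return found


ornaments = walk(minidom.parseString(XHTML).documentElement)
for element, subject, prop, text in ornaments:
    print(element.tagName, subject, prop, text)
```

Because each result keeps a reference to its element, a tool can still
ask where on the page the triple lives (element.parentNode and so on),
which is exactly the context a dump to raw triples throws away.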

> The only problem with the first step of course is that <?xml-stylesheet
> ?> is more of a suggestion than a 'standard' (even though it is
> supported by most major browsers).

Yes, there is that, plus the fact that it doesn't indicate anything
about *what* this transformation is meant for.

> I've actually had a real-world need to do something like your use case
> suggests.  I've been working on an Atom-driven Python Weblog tool [2]
> which uses XSLT for its templating in conjunction with Python WSGI for
> the web server stack.  I added a few presentation templates and modified
> the XSLT (which takes an Atom feed as source ) to output XHTML with RDFa
> markup for Atom metadata (author, label, date of creation, etc..).

Very cool. Would it be okay if we linked this work from rdfa.info?

What you're describing here fits very well within the use case I
described, but I think you and I may have slightly different
interpretations of why.

In my mind, whether your munging together of Atom and templating
produces RDF/XML or XHTML+RDFa shouldn't make a difference. In the first
case, the output is only machine-readable; in the second, it is both
machine- and human-readable. If you had, instead, produced some homegrown
HTML with no RDFa or anything else, then you could transform that
homegrown HTML to a carrier of RDF, which could be either RDF/XML or
XHTML+RDFa.

In other words, I see the GRDDL-able gap as lying between homegrown HTML
and XHTML+RDFa. If I'm understanding you correctly, you see it as lying
between XHTML+RDFa and RDF/XML.
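
Mechanically, both readings rely on the same hook: the GRDDL profile on
<head> plus a link with rel="transformation". Here is a small Python
sketch of how a processor might discover that hook. The sample page, the
stylesheet URL, and the transformation_links() helper are hypothetical;
a real GRDDL processor does considerably more than this.

```python
import xml.etree.ElementTree as ET

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical homegrown page that declares a transformation.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Homegrown page</title>
    <link rel="transformation"
          href="http://example.org/homegrown-to-rdfa.xsl"/>
  </head>
  <body><p>Some homegrown markup.</p></body>
</html>"""


def transformation_links(xhtml):
    """Return the transformation hrefs of a GRDDL source document."""
    root = ET.fromstring(xhtml)
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        return []
    # The profile attribute is a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [link.get("href")
            for link in head.findall("{%s}link" % XHTML_NS)
            if "transformation" in link.get("rel", "").split()]


print(transformation_links(PAGE))
```

Nothing in this markup says what the stylesheet emits. Whether its
output may be XHTML+RDFa, or must be RDF/XML, is precisely the point we
are discussing.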

> In order to test the template visually, I installed your RDFa Highlight
> bookmarklet [3].  I also modified the XSLT such that the output XHTML
> document was also a GRDDL Source Document (by adding the appropriate
> profile and link[@rel=transformation] element).  This way the RDFa
> bookmarklet could understand the RDFa directly and a generic GRDDL
> Processor could as well.
> 
> I think the categories of possible output for GRDDL are:
> 
> - RDF/XML alone
> - Stand-alone RDF syntaxes (NTriples, N3, TriX, etc..)
> - Embedded RDF syntaxes for XML (eRDF, RDFa, etc..)

So the central question is whether you conceive of RDFa as a
serialization of RDF or as something that can be transformed into a
valid serialization of RDF. I think it is a first-class serialization,
but I know not everyone agrees.

The second question is whether we should be deciding, in this working
group, what constitutes "pure RDF," or whether we should say that a
GRDDL output should be "some accepted, MIME-typed serialization of RDF,
e.g. RDF/XML." I am clearly in favor of the latter approach.
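
In code terms, the looser wording amounts to checking the media type of
the output against a list that can grow, rather than hard-wiring one
syntax. A sketch in Python follows. The function name and both lists
are assumptions of mine, and using application/xhtml+xml as a stand-in
for XHTML+RDFa is only for illustration, since that media type covers
XHTML in general.

```python
# How the spec reads today: RDF/XML only.
STRICT = {"application/rdf+xml"}

# Hypothetical looser policy; the extra entry is illustrative, not a
# proposal for the exact list.
FLEXIBLE = STRICT | {"application/xhtml+xml"}


def is_acceptable_output(content_type, accepted):
    """True if the Content-Type names an accepted carrier of RDF."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in accepted


print(is_acceptable_output("application/rdf+xml; charset=utf-8", STRICT))
print(is_acceptable_output("application/xhtml+xml", STRICT))
print(is_acceptable_output("application/xhtml+xml", FLEXIBLE))
```

The point is that the list becomes a matter of policy, not part of the
definition of GRDDL itself.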

> The current spec only seems to support the first

Yes, I agree that the spec currently reads this way, and I'm proposing
that this be made more flexible (though I don't think I'm the first to
suggest this).

Thanks for helping to clarify these issues; I think you've pushed me to
express more clearly what I'm actually proposing. At least, I *hope*
it's clearer :)

-Ben

Received on Sunday, 10 September 2006 00:23:28 UTC