Review of GRDDL Documents and Issues from Murray Maloney on 2006-09-26 (public-grddl-wg@w3.org from September 2006)

From: Murray Maloney <murray@muzmo.com>
Date: Tue, 26 Sep 2006 17:12:48 -0400
To: GRDDL Working Group <public-grddl-wg@w3.org>
Message-Id: <5.1.1.6.2.20060926110844.00b0d038@mail.muzmo.com>

To complete my actions, I examined:

[1] Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
editor's draft $Date: 2006/09/20 22:49:35 $
which I retrieved from http://www.w3.org/2004/01/rdxh/spec

[2] GRDDL Use Cases
Editor's Draft 10 Sep 2006
which I retrieved from http://www.w3.org/2001/sw/grddl-wg/doc43/scenario-gallery.htm

[3] GRDDL Primer
Editor's Draft 20 September 2006
which I retrieved from http://research.talis.com/2006/grddl-wg/primer

I am neither willing nor able to make an assertion about the suitability for
publication of any of these documents. My goal was to report on anomalies
in the use of vocabulary among the documents. In the end, I did not note
many issues with consistency of terminology, but I did discover some
places where an edit would be helpful. The terms that I had difficulty
with are:

         GRDDL Processor
                 I think of it as a GRDDL-aware processor. That is, I don't
                 think that I will be using a dedicated GRDDL processor
                 very often. But I do expect that some of my user agents and
                 network services may incorporate awareness of GRDDL
                 conventions. Such agents may or may not be responsible
                 for traversing the link and routing the result accordingly.

         GRDDL Source Document
                 I think of it as a source document that is a candidate for
                 transformation. That is, just because an XHTML document is
                 decorated with GRDDL ornaments doesn't mean that any
                 transform will ever occur.

         GRDDL Result Document
                 I think of it as simply a result document, or a gleaned RDF
                 document, because that is what GRDDL promises me in name.

         TRANSFORMATION
                 I think that this REL token should have been TRANSFORMER,
                 or something that expresses that the target is in fact a
                 processor. I think that TRANSFORMATION would have been, or
                 would be, suitable to express that the target is already
                 transformed and is cached.

I should note some of my own prejudices as I began reading. I referred to
Wikipedia for a definition of gleaning, to wit:

         Gleaning is the collection of leftover crops from farmers' fields
         after they have been mechanically harvested or on fields where
         it is not economically profitable to harvest.

So, I start with the supposition that the subject refers to the act of
economically harvesting resource descriptions from disparate sources.

Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
================================================
As I read the abstract:

         This document presents GRDDL, a mechanism for Gleaning Resource
         Descriptions from Dialects of Languages; that is, for getting RDF
         data out of XML and XHTML documents using explicitly associated
         transformation algorithms, typically represented in XSLT.

I want to rewrite it thus:

         This document presents GRDDL, a mechanism for Gleaning Resource
         Descriptions from Dialects of Languages; that is, for harvesting
         RDF data from the field of XML documents by identifying
         transformation algorithms, typically represented in XSLT.
         A corresponding GRDDL Use Cases Working Draft provides motivating
         examples. A GRDDL Primer demonstrates the mechanism on XHTML
         documents which include widely-deployed dialects, more recently
         known as microformats.

1. Introduction: Data and Documents

I think that I know what the opening paragraphs are trying to say, but they
are not written in a way that rings true. I think that what needs to be said
here is:

         There are many ways to look at the content of XML documents that
         exist on the web. There are XML document formats representing
         everything from poetry to prose, from spreadsheets to databases,
         from linked lists to ontologies.

         While this breadth of expression is quite liberating, inspiring 
new dialects to
         codify both common and customized meanings, it can prove to be a 
barrier
         to understanding across different domains or fields.

         How, for example, does software discover the author of a poem, a
         spreadsheet and an ontology? And how can software determine whether
         the authors of each are in fact the same person?

         The Resource Description Framework [RDFC04] provides a standard
         for making statements about resources in the form of
         subject-predicate-object expressions. One way to represent the
         fact "This book's author is Stephen King" in RDF would be as a
         triple whose subject is "this book", whose predicate is "has the
         author", and whose object is "Stephen King". The predicate, "has
         the author", expresses a relationship between the subject (book)
         and the object (person). Using URIs to uniquely identify the book,
         the author, and even the relationship would facilitate software
         design, because not everyone knows Stephen King or even spells his
         name consistently.
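A minimal sketch of why URIs help here, using plain Python tuples rather than any RDF library. The Dublin Core "creator" URI is real; every example.org URI is hypothetical. Because the author is identified by one URI, software can find all of that person's works by simple equality, without guessing at name spellings.

```python
# Dublin Core "creator" element URI; the example.org URIs are hypothetical.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
KING = "http://example.org/people/stephen-king"

# Each statement is a (subject, predicate, object) triple.
triples = {
    ("http://example.org/books/the-stand", DC_CREATOR, KING),
    ("http://example.org/poems/some-poem", DC_CREATOR, KING),
    ("http://example.org/sheets/budget", DC_CREATOR,
     "http://example.org/people/someone-else"),
}


def works_by(author_uri, graph):
    """Return, sorted, the subjects whose creator is exactly author_uri."""
    return sorted(s for (s, p, o) in graph
                  if p == DC_CREATOR and o == author_uri)


print(works_by(KING, triples))
# ['http://example.org/books/the-stand', 'http://example.org/poems/some-poem']
```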

         RDF includes an XML concrete syntax and an abstract syntax.
         Software tools that use the Resource Description Framework
         naturally prefer to work with documents whose data is encoded
         using RDF/XML.

         GRDDL is a mechanism for Gleaning Resource Descriptions from
         Dialects of Languages; that is, for harvesting RDF data from the
         field of XML documents by identifying transformation algorithms,
         typically represented in XSLT.

         There are essentially three parts to using GRDDL. Firstly, an XML
         document must identify itself as a candidate for use by a
         GRDDL-aware processor. Secondly, the candidate document must
         provide a link to one or more decoding algorithms. Thirdly, the
         GRDDL-aware processor must traverse the link and execute the
         target in order to yield the resulting RDF.
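The three parts can be sketched as a small driver in Python. This is only an illustration under stated assumptions: the three hooks (`is_candidate`, `find_transforms`, `run_transform`) are hypothetical names for whatever a real GRDDL-aware processor does at each step, and the toy dictionary document and example.org URIs are made up.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def glean(doc: object,
          is_candidate: Callable[[object], bool],
          find_transforms: Callable[[object], Iterable[str]],
          run_transform: Callable[[str, object], List[Triple]]) -> List[Triple]:
    """Sketch of the three-part flow; the callables are hypothetical hooks."""
    # Part 1: the document must identify itself as a candidate.
    if not is_candidate(doc):
        return []
    triples: List[Triple] = []
    # Part 2: the document links to one or more transformation algorithms.
    for transform_uri in find_transforms(doc):
        # Part 3: the processor follows each link and runs the target.
        triples.extend(run_transform(transform_uri, doc))
    return triples


# Toy usage: a dict stands in for a parsed document, and the "transform"
# just records which algorithm was applied (example.org URIs are made up).
toy_doc = {"candidate": True, "transforms": ["http://example.org/dc-extract.xsl"]}
result = glean(
    toy_doc,
    is_candidate=lambda d: d["candidate"],
    find_transforms=lambda d: d["transforms"],
    run_transform=lambda uri, d: [("http://example.org/doc",
                                   "http://example.org/gleanedBy", uri)],
)
print(result)
```

Keeping the hooks separate mirrors the point that a document can be a candidate without any transform ever being run.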

         For example, Dublin Core metadata can be written in an HTML
         dialect [RFC2731] that has a clear correspondence to an encoding
         in RDF/XML [DCRDF]. The correspondence can be expressed in an XSLT
         transformation, dc-extract.xsl:

         [Please include the candidate and the result documents.]

         [...]
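Since dc-extract.xsl itself is not reproduced here, the following is a rough Python stand-in for the kind of correspondence it expresses, not the stylesheet's actual logic. It assumes the RFC 2731 convention of `<meta name="DC.element" content="...">` and simply maps each such element onto the Dublin Core element URI, ignoring any schema link, qualifiers, or language attributes. The page content and example.org URI are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
DC = "http://purl.org/dc/elements/1.1/"


def dc_triples(xhtml_text, doc_uri):
    """Map <meta name="DC.x" content="..."> to (doc_uri, DC + x, content)."""
    root = ET.fromstring(xhtml_text)
    triples = []
    for meta in root.iter(XHTML + "meta"):
        name = meta.get("name", "")
        if name.lower().startswith("dc."):
            element = name[3:].lower()  # e.g. "DC.Creator" -> "creator"
            triples.append((doc_uri, DC + element, meta.get("content", "")))
    return triples


page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
    <meta name="DC.Creator" content="Stephen King"/>
    <meta name="DC.Title" content="The Stand"/>
  </head>
  <body/>
</html>"""

for triple in dc_triples(page, "http://example.org/page"):
    print(triple)
```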

2. The GRDDL profile for XHTML

This section should more accurately be entitled:

2. Using GRDDL with valid XHTML

It should go on to explain that valid XHTML is constrained by a DTD and
what the implications of that are for GRDDL. Then it should explain how
to use the profile attribute and the <link> element to identify candidacy
and link to a transformer.
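A minimal sketch of what a GRDDL-aware processor might check in valid XHTML, using only Python's standard library. It assumes the profile attribute and rel values are space-separated token lists, and it does not resolve relative href values against a base URI; the transform URL is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text):
    """Return transformation hrefs if <head> declares the GRDDL profile."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    if head is None:
        return []
    # Candidacy: the GRDDL profile URI is one of the head's profile URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    links = []
    for link in head.findall(XHTML + "link"):
        # rel may carry several space-separated tokens.
        if "transformation" in link.get("rel", "").split():
            links.append(link.get("href"))
    return links


xhtml_doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Sample</title>
    <link rel="transformation" href="http://example.org/dc-extract.xsl"/>
  </head>
  <body/>
</html>"""

print(xhtml_transformations(xhtml_doc))  # ['http://example.org/dc-extract.xsl']
```

Without the profile declaration, the same link is ignored, which is exactly the candidacy check described above.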

The second example would be much more satisfying if it showed the full source
and the eventual result documents, or linked to such in the Primer or Use Cases.

Please note that I think that the REL value TRANSFORMATION is a misnomer.
It should have been TRANSFORMER. I think that TRANSFORMATION should
have been used to link to a document that was previously cached, presumably
following an earlier use of the transformer. But I suspect that that ship has
already sailed.

3. The GRDDL transformation attribute in XML

This section should more accurately be entitled:

3. Using GRDDL with well-formed XML

I am confused. Why is the <head profile="http://www.w3.org/2003/g/data-view">
needed at all in this example?

The bottom line is that I only need foo:transformation="...", where foo is a
prefix bound to the namespace http://www.w3.org/2003/g/data-view#, and I
succeed in identifying the document as a candidate and providing its links.
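A minimal sketch supporting this point: with well-formed XML, a processor can find the transformations from the namespaced attribute alone, with no head or profile involved. It assumes the attribute sits on the document (root) element and holds a space-separated list of URIs; the prefix is deliberately `foo` to show that only the namespace URI matters. The poem document and its URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def xml_transformations(xml_text):
    """Return the URIs listed in the root's GRDDL transformation attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace-uri}local-name",
    # so the prefix used in the document is irrelevant.
    value = root.get("{" + GRDDL_NS + "}transformation", "")
    return value.split()


poem_doc = """<poem xmlns:foo="http://www.w3.org/2003/g/data-view#"
      foo:transformation="http://example.org/poem2rdf.xsl http://example.org/extra.xsl">
  <title>Example</title>
</poem>"""

print(xml_transformations(poem_doc))
# ['http://example.org/poem2rdf.xsl', 'http://example.org/extra.xsl']
```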

4. GRDDL for XML Namespace and HTML Profile Documents

This section should more accurately be entitled:

4. Using GRDDL with XML Namespace and HTML Profile Documents

A GRDDL-aware processor can become aware of candidate documents
through a parallel awareness of XML namespaces and HTML profiles.
That is, transformations can be associated not only with individual documents
but also with whole dialects that share an XML namespace or XHTML profile.
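To make the dialect-level idea concrete, here is a minimal sketch in which transformations are looked up by the namespace of the document element rather than declared in the document itself. The lookup table is a hypothetical stand-in for what a processor might learn by dereferencing a namespace document; it is not how the draft specifies that lookup, and all example.org URIs are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for information obtained from namespace documents.
NAMESPACE_TRANSFORMS = {
    "http://example.org/ns/poetry": ["http://example.org/poetry2rdf.xsl"],
}


def root_namespace(xml_text):
    """Return the namespace URI of the document element, or None."""
    tag = ET.fromstring(xml_text).tag
    # Namespaced tags look like "{uri}local"; un-namespaced ones have no braces.
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def dialect_transformations(xml_text):
    """Transformations associated with the whole dialect, not this document."""
    return list(NAMESPACE_TRANSFORMS.get(root_namespace(xml_text), []))


poem = '<poem xmlns="http://example.org/ns/poetry"><line>...</line></poem>'
print(dialect_transformations(poem))  # ['http://example.org/poetry2rdf.xsl']
```

Note that the poem carries no GRDDL markup at all; its membership in the namespace is what makes it a candidate.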

Then I move into terra incognita. I have heard discussions over the years about
a putative namespace document, but I did not know that there was now such
a thing as a normative namespace document. I also got lost in the diagrams
and the text. I am asked to consider a lot of things, but I never quite know
how I am supposed to make this work. HELP!

5. GRDDL Transformations

I think that this should be entitled:

5. GRDDL Transformers

It would read more clearly to me as:

         The resources that are retrievable by traversing a
         GRDDL:TRANSFORMATION link should be transformation algorithms that
         have available representations in widely-supported formats. We
         expect most GRDDL-aware processors to support XSLT version 1
         [XSLT1] for the foreseeable future, though XSLT2 [XSLT2] deployment
         is increasing. While JavaScript, C, or any other programming
         language technically expresses the relevant information, XSLT is
         specifically designed to express XML to XML transformations and
         has some good safety characteristics.

Again, I am confused. Why is it an error to use document() in your transform?
Might I not want to include some boiler-plate RDF in response to a well-known
chunk of XML?

6. Security Considerations

Now I see why document() is deprecated. But an error?

Finally, I did not see any issues over which I thought that XML Processing WG
needed to assert any kind of precedence or authority. I'll keep my eyes open.



GRDDL Use Cases
==============
Abstract

A number of documents contain data that could be valuable if they were 
automatically accessible. In particular, it would be extremely interesting 
if such documents could be transformed in RDF as a pivot language for other 
systems which don't use that specific document format themselves.

I would say:

There is a plethora of XML documents on the web whose data content could
be economically harvested for use by RDF-aware processing tools, making that
data available to systems which may not support such a wide variety of dialects
but which do support RDF.

[...]

Use case #1 - Scheduling : Jane is trying to coordinate a meeting.
and
Glossary

I have difficulty reconciling the use of the term "GRDDL Source Document"
with the supposed act of gleaning resource descriptions. The use of GRDDL
as a descriptor or qualifier prefix belies the fact that most of these
documents have a much broader function and applicability than just GRDDL.
Surely my home page is not a GRDDL document per se. It may identify itself
as a candidate for transformation, or it may be identified as such by
virtue of its document element's membership in a namespace or HTML profile,
but, in my opinion as a reader, that doesn't make it a GRDDL source
document. Certainly the term "source document" applies, as does the term
"GRDDL fodder."

Use Cases 2-6

I read through them all. I found them interesting and as well-written as
one could hope. I read them and thought that I understood the motivation.

Show me how this works. It looks too much like hand-waving. HELP!

GRDDL Primer
===========

Introduction

I found that I was already in deep water when I tried to wade through
the introduction. If you look at my suggested introduction for the GRDDL WD,
I think that you will see that it takes a more gradual approach.

Also, I found some RDF bias in this introduction. I find it entirely unhelpful
to position RDF as a preferred method for managing and manipulating data.
For each dialect for which a transformation is likely to be developed someday,
I am fairly certain that the inventor of that dialect considers their dialect
to be the best way to express and convey that information. The point is not so
much that RDF is a superior data-encoding format. Rather, I think, the point is
that there are so many tools, extant and yet to be developed, that can be
leveraged if and when it is practical to harvest greater volumes of data from
the web.

[...]

Linking to a GRDDL Transform

I found this title to be out of place, appearing above the content of the
undecorated example. I would have found it helpful to see what the
undecorated document might look like in a typical browser -- yes, I did follow
the link and did see it there, but I still think that it would not be onerous
to show the text in the Primer. I suggest that the title be used at the
point in the example where the LINK element is added.

It would seem that another title is also called for:

Adding GRDDL to the Profile

This section would explain setting profile="http://www.w3.org/2003/g/data-view".
I also think that this is where the namespace should be added, so as not to
confuse readers.
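A minimal sketch of the decoration step such a section would describe: taking an undecorated XHTML page, adding the GRDDL profile to its head, and adding a transformation link. It uses only Python's standard library; the page and the transform URL are hypothetical, and a real author would of course just edit the markup by hand.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def add_grddl(xhtml_text, transform_href):
    """Add the GRDDL profile and a transformation <link> to an XHTML page."""
    ET.register_namespace("", XHTML_NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        raise ValueError("no XHTML <head> element found")
    # Keep any existing profile URIs and append the GRDDL one if missing.
    profiles = head.get("profile", "").split()
    if GRDDL_PROFILE not in profiles:
        profiles.append(GRDDL_PROFILE)
    head.set("profile", " ".join(profiles))
    ET.SubElement(head, "{%s}link" % XHTML_NS,
                  {"rel": "transformation", "href": transform_href})
    return ET.tostring(root, encoding="unicode")


plain = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Jane's page</title></head>
  <body><p>Hello</p></body>
</html>"""

decorated = add_grddl(plain, "http://example.org/hcard2rdf.xsl")
print(decorated)
```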

Referencing via Profile Documents

I feel like I am over my head. I had never heard of Profile Documents.
I did not encounter this in either of the previous WDs. Why here? Why now?
If this is a Primer, it should build up my knowledge slowly and deliberately.
This feels like more advanced subject matter.

Buying a Guitar Example

It seems as though this should be divided into two distinct examples, where
the second example utilizes what we learned in the first. The first example
would explain gathering Friends information into a useful collection,
and discuss some of the ways that one might imagine using such a collection.
The second example would be much as it is now. As it stands, it is a bit
daunting for a Primer.

======================================================
That's all for now folks. I look forward to your comments.

Regards,

Murray
Received on Tuesday, 26 September 2006 21:19:21 UTC