- From: Murray Maloney <murray@muzmo.com>
- Date: Tue, 26 Sep 2006 17:12:48 -0400
- To: GRDDL Working Group <public-grddl-wg@w3.org>
- Message-Id: <5.1.1.6.2.20060926110844.00b0d038@mail.muzmo.com>
To complete my actions, I examined

[1] Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
    editor's draft $Date: 2006/09/20 22:49:35 $
    which I retrieved from http://www.w3.org/2004/01/rdxh/spec

[2] GRDDL Use Cases Editor's Draft 10 Sep 2006
    which I retrieved from http://www.w3.org/2001/sw/grddl-wg/doc43/scenario-gallery.htm

[3] GRDDL Primer Editor's Draft 20 September 2006
    which I retrieved from http://research.talis.com/2006/grddl-wg/primer

I am neither willing nor able to make an assertion about the suitability for publication of any of these documents. My goal was to report on anomalies in the use of vocabulary among the documents. In the end, I did not note so many issues with consistency of terminology, but I did discover some places where an edit would be helpful.

I suppose that the terms that I had difficulty with are:

GRDDL Processor:
I think of it as a GRDDL-aware processor. That is, I don't think that I will be using a dedicated GRDDL processor so often. But I do expect that some of my user agents and network services may incorporate awareness of GRDDL conventions. Such agents may or may not be responsible for traversing the link and routing the result accordingly.

GRDDL Source Document:
I think of it as a source document that is a candidate for transformation. That is, just because an XHTML document is decorated with GRDDL ornaments doesn't mean that any transform will ever occur.

GRDDL Result Document:
I think of it as simply a result document, or a gleaned RDF document. That is simply because that is what GRDDL promises me in name.

TRANSFORMATION:
I think that this REL token should have been TRANSFORMER, or something that expresses that the target is in fact a processor. I think that TRANSFORMATION would have been or would be suitable to express that the target is already transformed and is cached.

I should note some of my own prejudices as I began reading.
I referred to the Wikipedia for a definition of gleaning, to wit:

  Gleaning is the collection of leftover crops from farmers' fields after they have been mechanically harvested or on fields where it is not economically profitable to harvest.

So, I start with the supposition that the subject refers to the act of economically harvesting resource descriptions from disparate sources.

Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
================================================

As I read the abstract:

  This document presents GRDDL, a mechanism for Gleaning Resource Descriptions from Dialects of Languages; that is, for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT.

I want to rewrite it thus:

  This document presents GRDDL, a mechanism for Gleaning Resource Descriptions from Dialects of Languages; that is, for harvesting RDF data from the field of XML documents by identifying transformation algorithms, typically represented in XSLT. A corresponding GRDDL Use Case Working Draft provides motivating examples. A GRDDL Primer demonstrates the mechanism on XHTML documents which include widely-deployed dialects, more recently known as microformats.

1. Introduction: Data and Documents

I think that I know what the opening paragraphs are trying to say, but they are not written in a way that rings true. I think that what needs to be said here is:

  There are many ways to look at the content of XML documents that exist on the web. There are XML document formats representing everything from poetry to prose, from spreadsheets to databases, from linked lists to ontologies. While this breadth of expression is quite liberating, inspiring new dialects to codify both common and customized meanings, it can prove to be a barrier to understanding across different domains or fields. How, for example, does software discover the author of a poem, a spreadsheet and an ontology?
  And how can software determine whether the authors of each are in fact the same person?

  The Resource Description Framework[RDFC04] provides a standard for making statements about resources in the form of a subject-predicate-object expression. One way to represent the fact "This book's author is Stephen King" in RDF would be as a triple whose subject is "this book," whose predicate is "has the author", and whose object is "Stephen King." The predicate, "has the author", expresses a relationship between the subject (book) and the object (person). Using URIs to uniquely identify the book, the author and even the relationship would facilitate software design because not everyone knows Stephen King or even spells his name consistently.

  The RDF framework includes an XML concrete syntax and an abstract syntax. Software tools that use the Resource Description Framework naturally prefer to work with documents whose data is encoded using RDF/XML.

  GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages; that is, for harvesting RDF data from the field of XML documents by identifying transformation algorithms, typically represented in XSLT. There are essentially three parts to using GRDDL. Firstly, an XML document must identify itself as a candidate for use by a GRDDL-aware processor. Secondly, the candidate document must provide a link to one or more decoding algorithms. Thirdly, the GRDDL-aware processor must traverse the link and execute the target in order to yield the resulting RDF.

  For example, Dublin Core metadata can be written in an HTML dialect[RFC2731] that has a clear correspondence to an encoding in RDF/XML[DCRDF]. The correspondence can be expressed in an XSLT transformation, dc-extract.xsl:

[Please include the candidate and the result documents.]

[...]

2. The GRDDL profile for XHTML

This section should more accurately be entitled:

  2.
  Using GRDDL with valid XHTML

It should go on to explain that valid XHTML is constrained by a DTD and what the implications of that are for GRDDL. Then it should continue to explain how to use the profile attribute and the <LINK> element to identify candidacy and link to a transformer.

The second example would be much more satisfying if it showed the full source and the eventual result documents, or linked to such in the Primer or Use Cases.

Please note that I think that the REL value TRANSFORMATION is a misnomer. It should have been TRANSFORMER. I think that TRANSFORMATION should have been used to link to a document that was previously cached, presumably following an earlier use of the transformer. But I suspect that that ship has already sailed.

3. The GRDDL transformation attribute in XML

This section should more accurately be entitled:

  3. Using GRDDL with well-formed XML

I am confused. Why is the <head profile="http://www.w3.org/2003/g/data-view"> needed at all in this example? The bottom line is that I only need foo:transformation="...", where foo is the prefix associated with the namespace http://www.w3.org/2003/g/data-view#, and I succeed in identifying the document as a candidate and providing its links.

4. GRDDL for XML Namespace and HTML Profile Documents

This section should more accurately be entitled:

  4. Using GRDDL with XML Namespace and HTML Profile Documents

  A GRDDL-aware processor can become aware of candidate documents through a parallel awareness of XML namespaces and HTML profiles. That is, transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace or XHTML profile.

Then I move into terra incognita. I have heard discussions over the years about a putative namespace document, but I did not know that there was now such a thing as a normative namespace document. I also got lost in the diagrams and the text.
I am asked to consider a lot of things, but I never quite know how I am supposed to make this work. HELP!

5. GRDDL Transformations

I think that this should be entitled:

  5. GRDDL Transformers

It would read more clearly to me as:

  The resources that are retrievable by traversing a GRDDL:TRANSFORMATION link should be transformation algorithms that have available representations in widely-supported formats. We expect most GRDDL-aware processors to support XSLT version 1[XSLT1] for the foreseeable future, though XSLT2[XSLT2] deployment is increasing. While JavaScript, C, or any other programming language technically expresses the relevant information, XSLT is specifically designed to express XML to XML transformations and has some good safety characteristics.

Again, I am confused. Why is it an error to use document() in your transform? Might I not want to include some boiler-plate RDF in response to a well-known chunk of XML?

6. Security Considerations

Now I see why document() is deprecated. But an error?

Finally, I did not see any issues over which I thought that the XML Processing WG needed to assert any kind of precedence or authority. I'll keep my eyes open.

GRDDL Use Cases
==============

Abstract

  A number of documents contain data that could be valuable if they were automatically accessible. In particular, it would be extremely interesting if such documents could be transformed in RDF as a pivot language for other systems which don't use that specific document format themselves.

I would say:

  There exists a plethora of XML documents on the web whose data content could be economically harvested for use by RDF-aware processing tools to make that data available to systems which may not support such a wide variety of dialects but which do support RDF.

[...]

Use case #1 - Scheduling: Jane is trying to coordinate a meeting.
and Glossary

I have difficulty reconciling the use of the term "GRDDL Source Document" with the supposed act of gleaning resource descriptions.
The use of GRDDL as a descriptor or qualifier prefix belies the fact that most of these documents have a much broader function and applicability than just GRDDL. Surely my home page is not a GRDDL document per se. It may identify itself as a candidate for transformation, or it may be identified as such by virtue of its document element's membership in a namespace or HTML profile, but that doesn't make it a GRDDL source document, in my opinion as a reader. Certainly the term "source document" applies, as does the term "GRDDL fodder."

Use Cases 2-6

I read through them all. I found them interesting and as well-written as one could hope. I read them and thought that I understood the motivation. Show me how this works. It looks too much like hand-waving. HELP!

GRDDL Primer
===========

Introduction

I found that I was already into deep water when I tried to wade through the introduction. If you look at my suggested introduction for the GRDDL WD, I think that you will see that it takes a more gradual approach.

Also, I found some RDF-bias in this introduction. I find it entirely unhelpful to position RDF as a preferred method for managing and manipulating data. For each dialect for which a transformation is likely to be developed someday, I am fairly certain that the inventor of that dialect considers their dialect to be the best way to express and convey that information. The point is not so much that RDF is a superior data-encoding format. Rather, I think, the point is that there are so many tools, extant and yet to be developed, that can be leveraged if and when it is practical to harvest greater volumes of data from the web.

[...]

Linking to a GRDDL Transform

I found this title to be out of place, appearing above the content of the undecorated example.
I would have found it helpful to see what the undecorated example might look like in a typical browser -- yes, I did follow the link and did see it there, but I still think that it would not be onerous to show the text in the Primer.

I suggest that the title be used at the point in the example where the LINK element is added. It would seem that another title is also called for:

  Adding GRDDL to the Profile

which would explain about making profile="http://www.w3.org/2003/g/data-view". I also think that this is where the namespace should be added too, so as not to confuse readers.

Referencing via Profile Documents

I feel like I am in over my head. I had never heard of Profile Documents. I did not encounter this in either of the previous WDs. Why here? Why now? If this is a Primer, it should build up my knowledge slowly and deliberately. This feels like more advanced subject matter.

Buying a Guitar Example

Seems as though this should be divided into two distinct examples, where the second example utilizes what we learned in the first. The first example would explain about gathering Friends information into a useful collection, and discuss some of the ways that one might imagine using such a collection. The second example would be much as it is now. As it stands, it is a bit daunting for a Primer.

======================================================

That's all for now folks. I look forward to your comments.

Regards,

Murray
Received on Tuesday, 26 September 2006 21:19:21 UTC