issue-dbooth-9a: GRDDL should be usable in a messaging pipeline

This is a personal comment -- not on behalf of HP.  

[Incidentally, I will be happy to help supply proposed wording changes to
the GRDDL spec that would address this issue, though this message does
not specify them.]

Suppose I wish to use GRDDL in a Unix pipeline so that App1 produces RDF
for consumption by App2, where App1 wraps App0 and transforms App0's XML
output:

                 App1
  +------------------------------------+
  |                                    |
  |  +------+               +-------+  |               +------+
  |  |      |               |       |  |               |      |
  |  | App0 |--> XML msg -->| GRDDL |--|--> RDF msg -->| App2 |
  |  |      |               | xform |  |               |      |
  |  +------+               +-------+  |               +------+
  |                                    |
  +------------------------------------+


Some observations:

 - App0 is not "on the web" and does not have a URI.  However, I suppose
one could consider it an "information resource" in the TAG WebArch
sense.
 - Each XML msg is a particular XML instance document -- a concrete
sequence of bytes, or "representation" in the TAG WebArch sense. There
is nothing vague or abstract about it. No content negotiation is
involved.
 - Each XML msg is a separate message whose entire semantics are to be
exposed by GRDDL transforming it into RDF.
 - It does not make sense to talk about the GRDDL results of App0 in
general, as though App0 were a static "information resource". It only
makes sense to talk about the GRDDL results of a particular XML message.
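
To make the observations above concrete, here is a minimal sketch of the
GRDDL step as a pipeline filter.  It is only an illustration under stated
assumptions: the function names, the urn:example and example.org URIs,
and the <reading> message shape are all hypothetical, and the
hand-written mapping merely stands in for whatever transformation a real
GRDDL-aware agent would select.  The point is that the input is one
concrete XML message (a sequence of bytes) and the result is defined for
that message alone.

```python
import xml.etree.ElementTree as ET


def xml_msg_to_ntriples(xml_msg: bytes) -> str:
    """Hypothetical stand-in for the GRDDL transformation.

    Maps ONE XML message to N-Triples. Assumes the root element carries
    an "id" attribute and that each child element is a simple property.
    """
    root = ET.fromstring(xml_msg)
    subject = "<urn:example:msg:%s>" % root.get("id")
    lines = []
    for child in root:
        predicate = "<http://example.org/vocab#%s>" % child.tag
        # Minimal literal escaping (backslash and double quote only).
        value = (child.text or "").replace("\\", "\\\\").replace('"', '\\"')
        lines.append('%s %s "%s" .' % (subject, predicate, value))
    return "\n".join(lines) + "\n"


def pipe(in_stream, out_stream) -> None:
    """Read one XML message (bytes) and write its RDF (text)."""
    out_stream.write(xml_msg_to_ntriples(in_stream.read()))
```

In a shell pipeline this might be wired up as
"app0 | python grddl_filter.py | app2" (hypothetical names), with the
script calling pipe(sys.stdin.buffer, sys.stdout).  No URI for App0 is
needed anywhere in the sketch.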


The main reason the spec at present does not adequately address this use
case is that in multiple places the spec defines GRDDL results in terms
of "information resources" instead of "representations".  It does not
always make sense to talk about the GRDDL results of an information
resource, because that information resource may produce different
information content at different times or for different consumers.  But
it *always* makes sense to talk about the GRDDL results of a specific
representation.

For example, suppose an information resource, ir, produces a different
representation each time it is queried -- the current weather in Oaxaca,
for example -- and I have two XSLT scripts that I use to glean RDF from
them: one extracts the temperature (getTemperature.xsl) and the other
extracts the humidity (getHumidity.xsl).  The final RDF should be the
combined result of applying getTemperature.xsl and getHumidity.xsl to
the *same* representation.  But the spec does not define merged GRDDL
results for a particular representation; it only defines them for an
information resource as a whole, which could yield a jumble of
temperatures and humidities from different days.
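
The difference can be sketched in a few lines of Python.  This is an
illustration only, not GRDDL itself: retrieve_representation() is a
hypothetical resource whose content changes on every retrieval, the two
Python functions stand in for getTemperature.xsl and getHumidity.xsl, and
the subject URI and ex: property names are made up for the example.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical information resource: every retrieval yields a new
# representation (temperature and humidity both advance by one).
_retrievals = count(0)


def retrieve_representation() -> bytes:
    n = next(_retrievals)
    return (
        "<weather><temperature>%d</temperature>"
        "<humidity>%d</humidity></weather>" % (20 + n, 50 + n)
    ).encode("utf-8")


SUBJECT = "http://example.org/weather/oaxaca"  # assumed URI


def get_temperature(rep: bytes) -> set:
    """Stand-in for getTemperature.xsl."""
    root = ET.fromstring(rep)
    return {(SUBJECT, "ex:temperature", root.findtext("temperature"))}


def get_humidity(rep: bytes) -> set:
    """Stand-in for getHumidity.xsl."""
    root = ET.fromstring(rep)
    return {(SUBJECT, "ex:humidity", root.findtext("humidity"))}


def results_for_representation(rep: bytes) -> set:
    """Merge both transformation results for ONE representation."""
    return get_temperature(rep) | get_humidity(rep)


# Per-representation: both transformations see the same bytes.
rep = retrieve_representation()
consistent = results_for_representation(rep)

# Per-resource: each transformation triggers its own retrieval, so the
# merged graph mixes readings taken at different times.
jumbled = (get_temperature(retrieve_representation())
           | get_humidity(retrieve_representation()))
```

Here "consistent" pairs a temperature and humidity taken from the same
message, whereas "jumbled" pairs a temperature from one retrieval with a
humidity from a later one.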

Actually, to be more specific, the problem is not that the spec *does*
define results in terms of information resources.  I don't see much harm
in *also* doing that (except that it introduces unnecessary ambiguity,
which I'll discuss separately), and for namespace and profile documents
there is a need to go from information resource to GRDDL results.  The
problem is that the spec *fails* to define results in terms of
representations.

Here are some places in the spec where this problem shows up:
http://www.w3.org/TR/grddl/#rule_result
http://www.w3.org/TR/grddl/#rule_merge
http://www.w3.org/TR/grddl/#rule_rdfxbase
http://www.w3.org/TR/grddl/#rule_profiletrans
http://www.w3.org/TR/grddl/#rule_txprop
http://www.w3.org/TR/grddl/#GRDDL_aware_agent
http://www.w3.org/TR/grddl/#agt_obl

These should be relatively easy to fix.  In fact, quite tellingly, some
of the normative rules define a variable for a posited "information
resource" but never reference that variable.

Lest anyone assert that this pipeline use case is outside the WG's scope
because it isn't reflected in the GRDDL use cases document, I will note
that:

 - The GRDDL use cases document contains *many* aspects of problem
context that nobody intends to become a part of the spec.

 - The use of information resources in the use cases document seemed
quite natural to me, as part of the *context*, and hence I did not see
any problem with the use cases document when I reviewed it earlier.

 - It seemed obvious to me that GRDDL would be used to define the
semantics of individual *representations*, since one can only really be
sure of the semantics of a particular representation.

 - It *always* makes sense to talk about the GRDDL results of a
particular representation; it does not always make sense to talk about
the GRDDL results of an information resource.

 - The GRDDL use cases document does in fact have a wiki example, and
since wiki content often changes, that is a good example of the need for
GRDDL results to reflect the semantics of a particular *representation*
rather than an information resource in general.

BTW, I would be happy to join the teleconference (if invited) to further
explain and answer questions if you think that would be helpful.

Again, thanks for all your work on this, and please let me know how I
can be most useful in helping to resolve this issue.

Thanks

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Received on Friday, 25 May 2007 07:35:29 UTC