
RE: out-of-band transformation information

From: McBride, Brian <brian.mcbride@hp.com>
Date: Thu, 26 Oct 2006 16:20:34 +0100
Message-ID: <86FE9B2B91ADD04095335314BE6906E884DA4E@sdcexc04.emea.cpqcorp.net>
To: "Dan Connolly" <connolly@w3.org>, "GRDDL Working Group" <public-grddl-wg@w3.org>

> specifying GRDDL transformation for document with no 
> transformation attribute? Bob DuCharme (Wednesday, 25 
> October) 
> http://lists.w3.org/Archives/Public/public-grddl-comments/2006OctDec/0010.html
> 
> Brian McBride made a similar comment back in January...
> 
> "I think there are at least two things missing:
> 
> 2) a way to describe a transformation on a (set of) pages 
> without access to the pages themselves or their schema. "
>  -- Brian McBride, 27 Jan 2006
>  http://lists.w3.org/Archives/Public/www-archive/2006Jan/0049

I still think there is such a requirement and I considered making such a
comment on the use cases document, but I have not yet thought through
what the implications for the GRDDL spec could be.  It may be that we
can do rather a lot with GRDDL as currently conceived.

[...]

> Chime added a point that is closer to my position...
> 
> "Well, a transformation nominated / defined by the producer 
> (in this case) would be more authorative than one nominated 
> by consumer (especially if the content is in a specific 
> vocabulary), wouldn't you say? "
> http://lists.w3.org/Archives/Public/public-grddl-comments/2006OctDec/0013.html

I think the answer to this is "Sometimes.  So?".

I can conceive of cases where folks publish information.  They have
publication processes that they don't want to mess with, but they
would be willing to put up an 'out of the main stream' process that
provided an RDF translation in the form of a transform on the original.
In this case the source of the transformation has the same authority as
the publisher of the original information.

I can also conceive of cases where we in HPLabs would put up services
which provided RDF versions of information published by others in HP.
Yes, our authority is different to that of the original publishers and
folks can make up their minds whether to trust us or not.  I see this as
a very useful facility for bootstrapping the use of semantic web
technology inside HP.  We can just do it, and if folks find it useful,
then this creates pressure for the original publishers to take on the
job.

[...]

> 
> On the technical substance, out-of-band transformation is 
> scraping, and please let's keep that separate from GRDDL. 
> GRDDL is about data that the publisher says, authoritatively, 
> is RDF data. i.e.
> you can follow your nose from the document to the 
> transformation to RDF.

I'm open to the idea that what Dan suggests is the best answer.  I
think GRDDL is very useful without 'out of band transformations'.  But
what Dan has written is a plea rather than an argument, and I think we
need an argument to explain why GRDDL has been scoped to exclude this
requirement.  As you can see, I'm not yet convinced by the 'authority
argument'.

Part of the problem may lie in the name GRDDL confusing folks about the
intent.  "Gleaning" suggests GRDDL is about something the client does
rather than, as Dan seeks to scope it, something the publisher does.
PRDDL?  But I think we are too late for a name change.

There is a general idea that RDF can be represented in the form of XML
plus a transform from that XML to a representation of RDF.  An
appropriate agent can identify the applicable transforms and run them to
produce RDF.  GRDDL defines a means for agents to determine the
appropriate transforms by examining the published XML.  Agents could
have other means of determining what transforms their users want them to
apply - and some means for determining what transforms are available.
Such means could be the subject of future specifications, should there
be demand for such.
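
To make the distinction above concrete, here is a minimal Python sketch of
an agent that collects transforms from two sources: those nominated by the
publisher inside the XML, and those nominated by a third party outside it.
It is an illustration under assumptions, not part of any specification.
The in-band lookup assumes the publisher lists transforms in a
grddl:transformation attribute on the root element, using the GRDDL
data-view namespace. The function names, the prefix-keyed registry, and
all example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# GRDDL data-view namespace (assumed here as the in-band marker).
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def in_band_transforms(xml_text):
    """Return transform URIs the publisher nominated on the root element."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # The attribute value is treated as a whitespace-separated list of URIs.
    return value.split()


def third_party_transforms(doc_url, registry):
    """Return transform URIs a third party nominated for this document.

    `registry` is a hypothetical mapping from URL prefix to a list of
    transform URIs; it stands in for whatever out-of-band mechanism an
    agent might use.
    """
    found = []
    for prefix, transforms in registry.items():
        if doc_url.startswith(prefix):
            found.extend(transforms)
    return found


def applicable_transforms(doc_url, xml_text, registry):
    """Collect transforms, keeping their source (and so authority) apart."""
    return {
        "publisher": in_band_transforms(xml_text),
        "third_party": third_party_transforms(doc_url, registry),
    }


if __name__ == "__main__":
    document = (
        '<report xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
        'grddl:transformation="http://example.org/report2rdf.xsl"/>'
    )
    registry = {
        "http://example.org/reports/": ["http://example.org/labs/extra.xsl"],
    }
    print(applicable_transforms(
        "http://example.org/reports/q3.xml", document, registry))
```

Keeping the two lists separate lets the user of the agent decide how far to
trust each source, rather than merging publisher and third-party transforms
into one set.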

> 
> There is clearly an issue here, but I am not inclined to add 
> it to the issue list in the GRDDL specification.

It needs to be tracked somewhere.

> I'd like someone to give this out-of-band transformation 
> stuff a separate name, since it's clearly not useful to 
> pretend the issue doesn't exist.

I used to call it "3rd party transformations", I think.

Brian
Received on Thursday, 26 October 2006 15:21:01 GMT
