Re: Multiple GRDDL results in a single transform??? GRDDL and Named Graphs from Phil Archer on 2008-01-22 (public-grddl-wg@w3.org from January 2008)

From: Phil Archer <parcher@icra.org>
Date: Tue, 22 Jan 2008 09:50:52 +0000
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
CC: Jeremy Carroll <jjc@hpl.hp.com>, "public-grddl-wg@w3.org" <public-grddl-wg@w3.org>, "patrick.stickler@nokia.com" <patrick.stickler@nokia.com>, "chris@bizer.de" <chris@bizer.de>
Message-ID: <4795BC7C.9060602@icra.org>
Thanks for this, David.

I spent some time thinking through Jeremy's latest input and this
thread. Two related e-mails came out of that [1, 2]. In the first I
created some examples of simple Description Resources in an operational
RDF/XML format - i.e. with loose semantics that could perhaps then be
transformed into a structure with more formal encoding. Sample RDF/XML
files and graphs are all posted, although I did notice some errors in
the mail that I've since corrected.

But... then, turning to this thread, I rewrote those examples as plain
XML for [2]. Given your comments about the difficulty of transforming
RDF/XML (which Dan C also mentioned on a recent SW Coordination Group
call), this seemed a sensible use of time. I tried to look for reasons
why RDF/XML might be preferable to plain XML. Perhaps the former
offers more flexibility? Well, it does in some regards, especially if
the person who defines the description is different from the person who
says "all the resources on example.org fit that description". But I
don't think this is a big deal, especially now that the model seems
to be:

For each POWDER document:
   Check the attribution and validity information
   If the source is trusted and the validity conditions are met
      Apply the transform to obtain semantic DRs
      Merge the RDF with your triple store
   Else
      Ignore

I worry a little about the fact that there's no "and remember to demerge 
those triples when the validity conditions no longer apply" - but that 
seems to be unavoidable.
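The processing model above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a POWDER implementation: the PowderDoc record, its field names, the trusted-issuer set and the toy transform are all hypothetical stand-ins, and "merge" is modelled as a plain set union of (subject, predicate, object) strings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Set, Tuple

# A triple is kept as three plain strings (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass
class PowderDoc:
    """Hypothetical, minimal view of a POWDER document: who issued it,
    when it is valid, and the raw content the transform consumes."""
    uri: str
    issuer: str
    valid_from: date
    valid_until: date
    content: object = None


def process(docs: Iterable[PowderDoc],
            trusted_issuers: Set[str],
            transform: Callable[[PowderDoc], Set[Triple]],
            today: date,
            store: Set[Triple]) -> Set[Triple]:
    """For each document, check attribution and validity; if both pass,
    apply the transform and merge its triples into the store; else ignore."""
    for doc in docs:
        trusted = doc.issuer in trusted_issuers
        valid = doc.valid_from <= today <= doc.valid_until
        if trusted and valid:
            store |= transform(doc)  # merge = set union of triples
        # otherwise the document is simply ignored
    return store


if __name__ == "__main__":
    def toy_transform(doc: PowderDoc) -> Set[Triple]:
        # Stand-in for the real POWDER transform (hypothetical output).
        return {(doc.uri, "ex:describedBy", "ex:someDR")}

    docs = [
        PowderDoc("http://example.org/a.xml", "trusted.example",
                  date(2008, 1, 1), date(2008, 12, 31)),
        PowderDoc("http://example.org/b.xml", "trusted.example",
                  date(2007, 1, 1), date(2007, 7, 7)),
        PowderDoc("http://example.org/c.xml", "unknown.example",
                  date(2008, 1, 1), date(2008, 12, 31)),
    ]
    store = process(docs, {"trusted.example"}, toy_transform,
                    date(2008, 1, 22), set())
    print(sorted(store))  # only a.xml survives: b.xml has expired, c.xml is untrusted
```

Because the merge is a bare set union, the store keeps no record of which document a triple came from, which is exactly the "demerge" problem: keeping each document's result separate and re-running the merge when a validity period ends would be one way round it.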

One case where RDF/XML might be better than plain XML is assigning a
classification to a bunch of movies, something like:

<wdr:hasScope rdf:parseType="Resource">
   <wdr:includeHosts>movies.example.com</wdr:includeHosts>
   <wdr:hasDescriptors rdf:parseType="Resource">
     <ex:classification rdf:resource="http://classify.example.org#PG-13" />
   </wdr:hasDescriptors>
</wdr:hasScope>

Here the classification organisation is not the same as the content
provider (or DR author).

But I'm not sure that this is any more flexible than:

<URIset>
   <includeHosts>movies.example.com</includeHosts>
   <Descriptors>
     <ex:classification>http://classify.example.org#PG-13</ex:classification>
   </Descriptors>
</URIset>

And if the latter can be GRDDL'd and the former can't, well, that kind 
of seals it I'd say!
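To illustrate how little an XML-only processor needs in order to read the plain form, here is a sketch. A GRDDL transformation would normally be an XSLT stylesheet; this uses Python's standard xml.etree.ElementTree instead, purely to show that the URIset above maps mechanically onto the triples inside the RDF/XML wdr:hasScope fragment. The ex: namespace URI, the blank-node labels and the function names are assumptions for illustration, not part of POWDER.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed namespace URI for the ex: prefix (the fragment in the mail leaves it undeclared).
PREFIXES = {"http://example.org/ns#": "ex"}

SAMPLE = """<URIset xmlns:ex="http://example.org/ns#">
  <includeHosts>movies.example.com</includeHosts>
  <Descriptors>
    <ex:classification>http://classify.example.org#PG-13</ex:classification>
  </Descriptors>
</URIset>"""


def prefixed(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag form back into 'prefix:local'."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return f"{PREFIXES.get(ns, ns)}:{local}"
    return tag


def uriset_to_triples(xml_text: str) -> List[Triple]:
    """Map the plain-XML URIset onto the triples the RDF/XML
    wdr:hasScope fragment expresses, using fixed blank-node labels."""
    root = ET.fromstring(xml_text)
    scope, descriptors = "_:scope", "_:descriptors"
    triples: List[Triple] = []
    for host in root.findall("includeHosts"):
        # Hosts are plain literals, as in the RDF/XML form.
        triples.append((scope, "wdr:includeHosts", host.text.strip()))
    desc = root.find("Descriptors")
    if desc is not None:
        triples.append((scope, "wdr:hasDescriptors", descriptors))
        for d in desc:
            # Descriptor values are URIs, as with rdf:resource in the RDF/XML form.
            triples.append((descriptors, prefixed(d.tag), f"<{d.text.strip()}>"))
    return triples


if __name__ == "__main__":
    for t in uriset_to_triples(SAMPLE):
        print(t)
```

The element names are fixed, so the mapping is a handful of lookups. The same content in RDF/XML can be serialised in several equivalent ways (parseType="Resource", nested rdf:Description, and so on), which is what makes transforming it with plain XML tools awkward.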

We've been trying all along to retain the expressivity and extensibility
of the semantic web. Since we began, and especially since TPAC, it's
been clear that something has to give... but in reality we don't have a
use case that demands such flexibility. What we do have is a lot of
constraints on what content providers and potential users of DRs can
and can't realistically be expected to do.

Right... off to teach myself XSLT...

Phil.

[1] http://lists.w3.org/Archives/Public/public-powderwg/2008Jan/0018.html
[2] http://lists.w3.org/Archives/Public/public-powderwg/2008Jan/0019.html

Booth, David (HP Software - Boston) wrote:
> Hi Phil,
> 
> Yes, I think the two-stage approach we discussed in Boston is a good one: using an XML notation whose RDF semantics are given by a GRDDL transformation.  My alarm was in seeing that the initial XML might specifically be RDF/XML.  I think that would be problematic.  But I also think there should be ways to work around that issue.  In particular, if the XML format were *not* RDF/XML -- even superficially different -- it would avoid the problem, and a GRDDL transformation could be specified that would map it to the real RDF.
> 
> I was actually assuming that POWDER would want to use a much more custom XML notation in order to be easier for XML-only processors to handle, rather than looking like RDF/XML, which I would think would be harder for XML-only processors to handle.  Given that a transformation will be able to yield the resulting RDF, is there a reason why you are thinking of having the original XML be RDF/XML?
> 
> 
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office  |  dbooth@hp.com
> http://www.hp.com/go/software
> 
> Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.
> 
> 
>> -----Original Message-----
>> From: Phil Archer [mailto:parcher@icra.org]
>> Sent: Friday, January 18, 2008 11:14 AM
>> To: Booth, David (HP Software - Boston)
>> Cc: Jeremy Carroll; public-grddl-wg@w3.org;
>> patrick.stickler@nokia.com; chris@bizer.de
>> Subject: Re: Multiple GRDDL results in a single transform???
>> GRDDL and Named Graphs
>>
>> It is beyond my skill to get involved with the detailed discussion here
>> but, as you know, David, we ended up theorising about a two-stage
>> encoding of POWDER when we met in Boston last year. In a fine example of
>> convergent evolution, Jeremy has arrived at a similar notion - an
>> operational version of a Description Resource, mapping via a prescribed
>> transformation into a semantically more exact version (we are toying
>> with names like DR-O/DR-S or POWDER Lite/POWDER Full for these).
>>
>> Now... since GRDDL is about extracting RDF where it may not be apparent,
>> its use for this transformation feels right (and it's always nice to use
>> new Recs), but, if the detail doesn't allow this, OK, RDF/XML is XML so
>> we should be able to use XSLT - I think. And if we can't use that
>> either, then I think we may well contemplate writing our own algorithm.
>>
>> We're exploring all possibilities here :-)
>>
>> Phil.
>>
>> Booth, David (HP Software - Boston) wrote:
>>> In http://www.w3.org/2007/OWL/wiki/POWDER
>>> I am taken aback by this statement:
>>> "By the operation of GRDDL, then every POWDER document has two GRDDL
>>> results: itself (being an RDF/XML document), and the result of the
>>> POWDER transform applied to that document."
>>> In the GRDDL WG I remember pursuing the question of whether an RDF/XML
>>> document could have a GRDDL transformation (by virtue of being XML) in
>>> addition to the identity transformation defined by the GRDDL spec:
>>> http://www.w3.org/2004/01/rdxh/spec#rule_rdfxbase
>>> "If an information resource IR is represented by a conforming RDF/XML
>>> document[RDFX], then the RDF graph represented by that document is a
>>> GRDDL result of IR."
>>> I remember being told that it is not possible: the RDF/XML syntax does
>>> not allow the grddl:transformation attribute to be specified on the
>>> root element.  Indeed, the RDF validator at
>>> http://www.w3.org/RDF/Validator/
>>> confirms this.  When I feed this supposedly RDF/XML document into the
>>> validator:
>>> [[
>>> <?xml version="1.0"?>
>>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>   xmlns:dc="http://purl.org/dc/elements/1.1/"
>>>   xmlns:grddl='http://www.w3.org/2003/g/data-view#'
>>>       grddl:transformation="glean_title.xsl">
>>>   <rdf:Description rdf:about="http://www.w3.org/">
>>>     <dc:title>World Wide Web Consortium</dc:title>
>>>   </rdf:Description>
>>> </rdf:RDF>
>>> ]]
>>> The validator reports: "Error: {E201} Illegal attributes on
>>> rdf:RDF[Line = 6, Column = 2]"
>>> How exactly is POWDER proposing to gain this additional GRDDL
>>> transformation?
>>>
>>>> -----Original Message-----
>>>> From: public-grddl-wg-request@w3.org
>>>> [mailto:public-grddl-wg-request@w3.org] On Behalf Of Jeremy Carroll
>>>> Sent: Thursday, January 17, 2008 1:43 PM
>>>> To: public-grddl-wg@w3.org
>>>> Cc: patrick.stickler@nokia.com; chris@bizer.de; Phil Archer
>>>> Subject: Multiple GRDDL results in a single transform???
>>>> GRDDL and Named Graphs
>>>>
>>>>
>>>>
>>>> Summary:
>>>> - XSLT2 supports multiple output documents; is each a GRDDL result?
>>>> - With a document with multiple GRDDL results, can we regard each as a
>>>> graph in a named graph approach (particularly if each GRDDL result is
>>>> given a different base URI somehow, e.g. in an XSLT2 result-document
>>>> instruction)?
>>>> - Can different GRDDL results for the same document be treated with
>>>> different pragmatic force (e.g. the end-user acts on some of the GRDDL
>>>> results while ignoring others, perhaps in a systematic way)?
>>>> - Note it is possible to do this with XSLT1, and some trickery
>>>>
>>>> ===========
>>>>
>>>> I am looking at POWDER, and thinking about using GRDDL to convert a
>>>> simpler form into a more complicated form.
>>>>
>>>> The idea is that the simpler form would be more suited to operational
>>>> processing, but the more complex form would have a fuller statement of
>>>> the formal semantics that underwrites the operational semantics.
>>>>
>>>> The page on which I am working is:
>>>>
>>>> http://www.w3.org/2007/OWL/wiki/POWDER
>>>>
>>>>
>>>> One issue is that a typical POWDER document consists of one DR
>>>> (description of resources or something). Some POWDER documents consist
>>>> of more than one DR.
>>>>
>>>> A DR typically specifies the following:
>>>>    - validity dates, during which it is claimed
>>>>    - a set of resources defined by matching various properties of URIs
>>>>    - properties that each of those resources is claimed to have, while
>>>>      the DR is valid (e.g. being pornographic)
>>>>
>>>> Thus a DR can be seen as claiming an rdfs:subClassOf relationship
>>>> during its validity dates.
>>>>
>>>> One way of handling this, in the single DR case, is to include the
>>>> subClassOf in the GRDDL result and make the validity dates refer to the
>>>> document itself (the information resource), so that outside the validity
>>>> period the GRDDL result says that it is invalid, and hence shouldn't be
>>>> believed; whereas during the validity period, the subClassOf triple is
>>>> asserted.
>>>>
>>>> /// aside
>>>> Another way of handling this is to move all the complexity of validity
>>>> and subClassOf etc. into the text of the definition of DR, and use a
>>>> 'semantic extension' as the formal implementation ....
>>>> /// I don't really like that, since it's pushing the maths past its
>>>> design limitations.
>>>>
>>>> ====
>>>>
>>>> Here is an XSLT1 implementation sketch, for multiple DRs in a single
>>>> file.
>>>>
>>>> The namespace is used to encode (an upper bound for) the number of DRs,
>>>> e.g.
>>>>
>>>> http://example.org/powder?10
>>>>
>>>> can have no more than 10 DRs in it, whereas
>>>> http://example.org/powder?1000
>>>>
>>>> can have up to 1000 DRs.
>>>>
>>>> The GRDDL result for
>>>>
>>>> http://example.org/powder?N
>>>>
>>>> provides N different GRDDL transforms for the namespace, the i-th
>>>> transform selecting the i-th DR in the document and transforming it.
>>>> The result of the i-th transform includes the validity triples for the
>>>> i-th DR and the subClassOf triple, which should only be believed if the
>>>> DR is valid.
>>>>
>>>> The intended reading is that the GRDDL results including invalid DRs
>>>> are filtered out, and only the GRDDL results with valid DRs are believed.
>>>>
>>>> One way of achieving this is to attach the validity to the information
>>>> resource itself, e.g. a GRDDL result of
>>>>
>>>>    <rdf:Description rdf:about="">
>>>>       <wdr:validFrom>2007-01-01</wdr:validFrom>
>>>>       <wdr:validUntil>2007-07-07</wdr:validUntil>
>>>>    </rdf:Description>
>>>>
>>>> would describe a currently invalid information resource, and hence be
>>>> pragmatically not useful.
>>>>
>>>> In this way, an application would have many different GRDDL results,
>>>> some describing a valid information resource, some not, and it is
>>>> expected to act on the merge of the GRDDL results describing a valid
>>>> information resource.
>>>>
>>>> Jeremy
>>>>
>> --
>> Phil Archer
>> Chief Technical Officer,
>> Family Online Safety Institute
>> w. http://www.fosi.org/people/philarcher/
Received on Tuesday, 22 January 2008 09:51:14 UTC