GRDDL.py modifications and EARL report

Attached is the latest EARL report for GRDDL.py after an attempt to
cover more of the new tests. The tests it still fails are listed below
(with a justification in most cases):


* #grddlProfileBase4 failed
* #xmlbase1 failed

It looks to me like the statement in the output graph should be made about
http://www.w3.org/2001/sw/grddl-wg/td/xmlWithBase, not
http://www.w3.org/2001/sw/grddl-wg/td/base/xhtmlWithBaseElement, if GRDDL
is following XML Base with respect to the source document.

* #xmlbase3 failed

The base of the GRDDL result (the subject of the statement) should be
http://www.w3.org/2001/sw/grddl-wg/td/xmlWithBase, as determined by the
embedded xml:base.

* #xmlbase4 failed

The base of the GRDDL result should be the one specified by the protocol:

http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithoutBase

* #primer-hotel-data failed
* #xhtmlWithMoreThanOneProfile failed
* #httpHeaders failed
* #htmlbase1 failed
* #embedded-rdf7 failed

XSLT 1.0 failure

* #embedded-rdf4 failed

Failure due to RDFLib; see below.

* #htmlbase4 failed
* #embedded-rdf7-alt failed

XSLT 2.0 transform

* #xmlbase2 failed

The base of the GRDDL result should be the absolute URL of the input
document (there is no redirect and no notion of a base URI in the body
of the input document):

http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithoutBase
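As I read them, the xmlbase justifications above reduce to one rule: an
xml:base on the document's root element, resolved against the URI the
document was retrieved from, is the base of the GRDDL result; without one,
the retrieval URI itself is. A minimal Python sketch of that rule follows.
It assumes the retrieval URL is already the final URL after any redirects
and looks only at the root element (not at an XHTML base element). The
function name root_base_uri and the URL http://example.org/retrieved are
hypothetical; this is not code taken from GRDDL.py.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree's name for the xml:base attribute (the xml prefix is predefined).
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def root_base_uri(document, retrieval_url):
    """Return the base URI of the document's root element.

    retrieval_url is assumed to be the final URL after redirects. An
    xml:base on the root (absolute or relative) is resolved against it;
    otherwise the retrieval URL itself is the base.
    """
    root = ET.fromstring(document)
    xml_base = root.get(XML_BASE)
    if xml_base is None:
        return retrieval_url
    return urljoin(retrieval_url, xml_base)


# xmlbase2 / xmlbase4 style: no xml:base, so the retrieval URL is the base.
print(root_base_uri("<doc/>",
                    "http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithoutBase"))
# -> http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithoutBase

# xmlbase3 style: an absolute embedded xml:base wins over the (hypothetical)
# retrieval URL.
print(root_base_uri(
    '<doc xml:base="http://www.w3.org/2001/sw/grddl-wg/td/xmlWithBase"/>',
    "http://example.org/retrieved"))
# -> http://www.w3.org/2001/sw/grddl-wg/td/xmlWithBase
```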

Modifications were made to GRDDL.py.  In particular:

1. The use of urllib2 was modified to retrieve the *final* location
(even in the face of redirects). This is passed to the 4Suite
non-validating XML parser as the 'default base', and from then on the
base of the root node (determined with respect to what the protocol
indicates) is used consistently as the XML Base for all GRDDL mechanisms.
2. It will try to parse the source as RDF/XML only once (using the
returned media type and the root element as criteria).
3. While parsing a GRDDL result, it will pass RDFLib the BaseURI of the
root node as the baseUri to use if none is specified within. This value
is calculated by 4Suite's expanded support for XML Base on top of XPath
1.0, which inherently does not support XML Base. (Note, however, that
RDFLib 2.4.0 does not seem to resolve relative xml:base values against a
given base URI properly; GRDDL.py attempts to hold its hand in this
regard.)
4. When matching profileTransformation and namespaceTransformation, the
relevant URIs are resolved against this base (XHTML Base is not
currently taken into account).
5. GRDDL mechanisms fail gracefully (so a failure to glean via XMLNS
will not prevent an attempt at XHTML profile-based gleaning, for
instance).
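Items 1 and 2 can be sketched roughly as follows in Python 3, where
urllib.request takes the place of the urllib2 module GRDDL.py used. This is
a minimal illustration, not the GRDDL.py code: fetch and is_rdf_xml are
hypothetical names, and the RDF/XML test (a media type of
application/rdf+xml, or an rdf:RDF root element) is an assumed reading of
"the returned media type and the root element as criteria".

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def fetch(url):
    """Retrieve url and return (final_url, media_type, body).

    urlopen follows HTTP redirects by default, and response.url holds the
    location the content was finally retrieved from, i.e. the value to use
    as the default base for the XML parser.
    """
    with urlopen(url) as response:
        media_type = response.headers.get_content_type()
        return response.url, media_type, response.read()


def is_rdf_xml(media_type, body):
    """Decide, once, whether the source should be parsed as RDF/XML.

    Assumed criteria: the returned media type is application/rdf+xml,
    or the document's root element is rdf:RDF.
    """
    if media_type == "application/rdf+xml":
        return True
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not well-formed XML, so certainly not RDF/XML
    return root.tag == "{%s}RDF" % RDF_NS
```

Deciding once, up front, avoids attempting an RDF/XML parse on every
GRDDL mechanism's input.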
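Items 4 and 5 can likewise be sketched: transformation URIs are resolved
against the root node's base with urljoin, and each gleaning mechanism runs
inside its own try/except so that one failure does not stop the others.
resolve_transformations, glean_all and the two toy mechanisms in the usage
lines are hypothetical stand-ins for the XMLNS, XHTML profile and other
GRDDL gleaning steps, not functions from GRDDL.py.

```python
from urllib.parse import urljoin


def resolve_transformations(hrefs, base_uri):
    """Resolve (possibly relative) transformation URIs against base_uri."""
    return [urljoin(base_uri, href) for href in hrefs]


def glean_all(mechanisms, source):
    """Run every mechanism; a failure in one does not prevent the others.

    mechanisms: list of (name, callable) pairs; each callable takes the
    source and returns a list of triples.
    Returns (triples, errors), where errors maps mechanism name -> exception.
    """
    triples, errors = [], {}
    for name, mechanism in mechanisms:
        try:
            triples.extend(mechanism(source))
        except Exception as exc:  # fail gracefully: record it, keep going
            errors[name] = exc
    return triples, errors


# Usage with two toy mechanisms: XMLNS gleaning fails,
# but profile-based gleaning still runs.
def xmlns_mechanism(source):
    raise IOError("namespace document could not be retrieved")


def profile_mechanism(source):
    return [("http://example.org/doc", "http://example.org/p", "o")]


triples, errors = glean_all(
    [("xmlns", xmlns_mechanism), ("profile", profile_mechanism)], "<html/>"
)
```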

So I believe it conforms to XML Base (and the URI RFC it depends on) as
well as it can, given the possible limitations in how relative xml:base
values within a GRDDL result are parsed.
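One way to "hold RDFLib's hand" (item 3) is to make a relative xml:base on
the root of the GRDDL result absolute before parsing, so the RDF parser
never has to resolve it. The sketch below assumes the result is RDF/XML
and handles only the root element's xml:base; absolutize_root_base is a
hypothetical name, and the approach is not necessarily the one GRDDL.py
takes. Note that ElementTree may rename unregistered namespace prefixes
when re-serializing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def absolutize_root_base(grddl_result, base_uri):
    """Return the GRDDL result with an absolute xml:base on its root.

    A relative xml:base is resolved against base_uri; if the root has no
    xml:base, base_uri itself is written there. The result can then be
    handed to the RDF/XML parser without relying on it to resolve
    relative xml:base values.
    """
    root = ET.fromstring(grddl_result)
    # urljoin(base, "") returns base unchanged, covering the no-xml:base case.
    root.set(XML_BASE, urljoin(base_uri, root.get(XML_BASE, "")))
    return ET.tostring(root, encoding="unicode")
```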

-- 
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org

Received on Wednesday, 25 April 2007 14:19:30 UTC