Notes on implementing GRDDL from scratch

As an exercise, I wrote a GRDDL implementation from scratch for RDFLib
and 4Suite, and ported testHarness.py to work with it.  The ported harness
uses RDFLib to process the test manifest and a graph isomorphism
mechanism to properly check non-lean graphs for equivalence.  Both are
attached and pass the test suites (including the RDFa test DanC added
recently).  I plan to modify the ported testHarness to output test
results using the EARL vocabulary [1].
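The isomorphism check matters because blank node labels are arbitrary: two
results can describe the same non-lean graph and differ only in their blank
node identifiers, so a naive comparison of triple sets fails.  Below is a
minimal brute-force sketch, not the harness's actual code.  It assumes a
hypothetical encoding where triples are string tuples and blank nodes are
strings starting with `_:`.  It tries every bijection between blank nodes,
which is fine for test-suite-sized graphs but exponential in general (current
RDFLib releases provide a proper implementation in `rdflib.compare`).

```python
from itertools import permutations
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def blank_nodes(graph: Set[Triple]) -> List[str]:
    """Collect blank node labels (hypothetical encoding: terms starting with '_:')."""
    return sorted({term for triple in graph for term in triple if term.startswith("_:")})


def isomorphic(g1: Set[Triple], g2: Set[Triple]) -> bool:
    """Brute-force check: is there a blank node bijection mapping g1 onto g2?"""
    if len(g1) != len(g2):
        return False
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(b1) != len(b2):
        return False
    # Try every relabelling of g1's blank nodes with g2's blank nodes.
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        relabelled = {tuple(mapping.get(term, term) for term in triple) for triple in g1}
        if relabelled == g2:
            return True
    return False


# A non-lean graph: the blank node triple is entailed by the ex:alice triple.
g1 = {("_:a", "foaf:knows", "ex:bob"), ("ex:alice", "foaf:knows", "ex:bob")}
g2 = {("_:x", "foaf:knows", "ex:bob"), ("ex:alice", "foaf:knows", "ex:bob")}
print(g1 == g2)            # False: the blank node labels differ
print(isomorphic(g1, g2))  # True: same graph up to blank node renaming
```

Note that the lean graph containing only the ex:alice triple is equivalent
to g1 under simple entailment but is not isomorphic to it, so isomorphism
is the stricter test.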

Below are some notes I made along the way that seem relevant:

## Using the GRDDL source URI as the Base URI ##

The GRDDL source URI is used as the base URI both when parsing the source
document and when parsing the resulting RDF syntax.  The APIs for both
scenarios allow an explicit base URI to be passed as a parameter.  This
properly accommodated the use of empty relative URI references within the
result of one of the test cases (I forget which).  The base was also used
to resolve references to transformation URIs (some of which were relative
URI references).
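A minimal sketch of the resolution step, using only the standard library's
`urljoin` (which follows RFC 3986 reference resolution).  The function names
are hypothetical, and it assumes the transformation references arrive as a
whitespace-separated attribute value.  On the parsing side, RDFLib's
`Graph.parse` accepts a `publicID` argument that serves as the document base;
it is mentioned in a comment only and not exercised here.

```python
from typing import List
from urllib.parse import urljoin


def resolve_against_source(source_uri: str, reference: str) -> str:
    """Resolve a (possibly relative or empty) URI reference against the GRDDL source URI."""
    # An empty reference resolves to the source URI itself.
    return urljoin(source_uri, reference)


def transformation_uris(source_uri: str, attribute_value: str) -> List[str]:
    """Resolve a whitespace-separated list of transformation references (hypothetical helper)."""
    return [resolve_against_source(source_uri, ref) for ref in attribute_value.split()]


source = "http://example.org/docs/page.html"
print(resolve_against_source(source, ""))  # http://example.org/docs/page.html
print(transformation_uris(source, "xf.xsl ../lib/glean.xsl"))
# ['http://example.org/docs/xf.xsl', 'http://example.org/lib/glean.xsl']

# With RDFLib, the same base would be supplied when parsing the result, e.g.
# graph.parse(data=result_text, format="xml", publicID=source)
```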

## NS Dispatch Termination ##

I set up a list of namespace URIs that are known not to be GRDDL-able (to
avoid any unnecessary attempt to glean from them).  Currently the XHTML
namespace is the only item in this list.

In addition, to help avoid circular namespace dispatch, the implementation
maintains a list of applied transforms.  Perhaps it should also maintain a
list of visited namespace URIs to avoid that kind of redundancy as well?
Should the spec offer guidance for such a scenario?
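One way to combine the stop list, the applied-transforms list and the
proposed visited-namespaces list is sketched below.  It is a simplified
model, not the implementation described here.  `lookup` is a hypothetical
stand-in for dereferencing a namespace document: it returns that document's
own root namespace (which may itself need dispatch) and the transformations
it declares.  The visited set is what guarantees termination when two
namespace documents point at each other.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Namespaces known not to be GRDDL-able: dispatch stops here.
NON_GRDDLABLE: Set[str] = {XHTML_NS}

# Hypothetical lookup: namespace URI -> (root namespace of its namespace
# document, transformation URIs that the namespace document declares).
Lookup = Callable[[str], Tuple[Optional[str], List[str]]]


def dispatch(ns: str, lookup: Lookup, applied: Set[str], visited: Set[str]) -> List[str]:
    """Return, in order, the new transformations the dispatch chain from ns would apply."""
    if ns in NON_GRDDLABLE or ns in visited:
        return []
    visited.add(ns)  # remember the namespace so a cycle cannot revisit it
    doc_root_ns, transforms = lookup(ns)
    found: List[str] = []
    # Gleaning the namespace document may itself require namespace dispatch.
    if doc_root_ns is not None:
        found.extend(dispatch(doc_root_ns, lookup, applied, visited))
    for xform in transforms:
        if xform not in applied:  # never apply the same transform twice
            applied.add(xform)
            found.append(xform)
    return found


# Two namespace documents whose root namespaces point at each other.
NS_DOCS: Dict[str, Tuple[Optional[str], List[str]]] = {
    "http://example.org/a#": ("http://example.org/b#", ["http://example.org/a2rdf.xsl"]),
    "http://example.org/b#": ("http://example.org/a#", ["http://example.org/b2rdf.xsl"]),
}


def lookup(ns: str) -> Tuple[Optional[str], List[str]]:
    return NS_DOCS.get(ns, (None, []))


result = dispatch("http://example.org/a#", lookup, set(), set())
print(result)  # ['http://example.org/b2rdf.xsl', 'http://example.org/a2rdf.xsl']
```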

## Guidance in parsing a GRDDL result (@method or @media-type?) ##

Currently, the implementation keys off xsl:output/@method to determine how
it parses the resulting RDF syntax.  This seems to provide sufficient
coverage but, of course, doesn't accommodate specific MIME types (which
can also be specified via xsl:output/@media-type).
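Both hints live on the stylesheet's top-level xsl:output element, so they
can be read before the transform runs.  A minimal sketch with the standard
library's ElementTree; the function name is hypothetical, and it ignores
xsl:output elements contributed by imported or included stylesheets.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def output_hints(stylesheet_xml: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (method, media-type) from a stylesheet's top-level xsl:output, if any."""
    root = ET.fromstring(stylesheet_xml)
    output = root.find(f"{{{XSL_NS}}}output")  # direct child only
    if output is None:
        return None, None
    return output.get("method"), output.get("media-type")


STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" media-type="text/n3"/>
</xsl:stylesheet>"""

print(output_hints(STYLESHEET))  # ('text', 'text/n3')
```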

For instance, RDFLib has a built-in RDFa parser, but the client needs some
*specific* indication of when to try parsing the resulting RDF syntax as
RDFa.  If there were a specific media type for RDFa
(application/xhtml+xml+rdfa or some such), the only way to guide the
parser appropriately would be xsl:output/@media-type; otherwise the parser
would only know that the result was XML, not whether it was RDFa or
RDF/XML.

Currently it will only try to parse a GRDDL result identified as XML (via
@method) as RDF/XML.  I guess a more comprehensive approach would be to
check the media type as a secondary indication after @method, but what if
they clash, i.e., @method is xml but the media type is text/n3?
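The "media type as a secondary indication" idea can be sketched as
follows: @method fixes the broad family (xml or text), a recognised
@media-type refines it to a concrete parser, and a clash is reported
instead of guessed at.  The media-type table and function name are
hypothetical; the format strings are RDFLib parser names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: media type -> (xsl:output method family, RDFLib format name).
MEDIA_TYPES: Dict[str, Tuple[str, str]] = {
    "application/rdf+xml": ("xml", "xml"),
    "text/n3": ("text", "n3"),
    "text/turtle": ("text", "turtle"),
}


def choose_format(method: Optional[str], media_type: Optional[str]) -> Optional[str]:
    """Pick a parser format using @method first and @media-type as a refinement."""
    family = method or "xml"  # XSLT 1.0's default output method for non-HTML results
    hint = MEDIA_TYPES.get(media_type or "")
    if hint is None:
        # No recognised media type: behave like the @method-only logic (XML -> RDF/XML).
        return "xml" if family == "xml" else None
    hint_family, fmt = hint
    if hint_family != family:
        # One possible clash policy: refuse rather than guess.
        raise ValueError(f"@method={family!r} clashes with @media-type={media_type!r}")
    return fmt


print(choose_format("text", "text/n3"))  # n3
```

Raising on a clash is only one option.  Preferring @media-type, as the more
specific of the two hints, would be an equally defensible policy.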

Of course, if the resulting XHTML/RDFa had GRDDL hooks pointing to an
RDFa2RDFXML transform, this would be a non-issue, as the glean process
would pick it up (as long as the @method for the RDFa/XHTML result was
'xml').

## MIME types of GRDDL source URIs ##

The implementation has a (disabled) mechanism for attempting to parse a
GRDDL source document as XML only if the Content-Type header of the HTTP
response matches:

   (?:text|application)/.*\+?xml

Should a glean be skipped if a GRDDL source document is served as
text/plain?  The documents in the test suite, for instance, are served as
text/plain.
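A minimal sketch of such a content-type gate, using the pattern above on
the bare media type (header parameters such as charset are stripped, and
`fullmatch` is used so the whole media type must match).  The flag and
function names are hypothetical.  With the check disabled, as it currently
is, every source is gleaned; with it enabled, the text/plain test-suite
documents would be skipped, which is exactly the policy question above.

```python
import re
from typing import Optional

# The pattern from the notes above, applied to the bare media type.
XML_CONTENT_TYPE = re.compile(r"(?:text|application)/.*\+?xml")

# Mirrors the current "disabled" state; set to True to enforce the check.
CHECK_CONTENT_TYPE = False


def should_glean(content_type: Optional[str], check: bool = CHECK_CONTENT_TYPE) -> bool:
    """Decide whether to parse a GRDDL source as XML, given its Content-Type header."""
    if not check:
        return True
    if not content_type:
        return False  # hypothetical policy: no header, no glean
    media_type = content_type.split(";", 1)[0].strip().lower()
    return XML_CONTENT_TYPE.fullmatch(media_type) is not None


print(should_glean("text/plain"))              # True (check disabled)
print(should_glean("text/plain", check=True))  # False
```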

[1] http://www.w3.org/TR/EARL10-Schema/

Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org

Received on Friday, 10 November 2006 17:57:59 UTC