Re: Please declare in GRDDL spec that XML validation is not required from Harry Halpin on 2007-02-27 (public-grddl-comments@w3.org from January to March 2007)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Tue, 27 Feb 2007 12:28:30 -0500
To: Dave Beckett <dave@dajobe.org>
Cc: public-grddl-comments@w3.org
Message-ID: <45E46A3E.5080405@ibiblio.org>

Actually, as the author of the offending web-page (sorry, it was hacked
together by hand in my attempt to learn Embedded RDF - I'll fix it up
and package it up with the VCard/RDF note after we get GRDDL to Last
Call...), I think the answer is that Raptor is right and GRDDL.py is off.

The reason is that while DanC correctly notes we underspecified lots of
things, we did not underspecify that GRDDL transforms XPath nodes to
graphs: "If an information resource ([WEBARCH]
<http://www.w3.org/2004/01/rdxh/spec#WEBARCH>, section 2.2) IR is
represented by an XML document with an XPath root node R, and R has a
GRDDL transformation with a transformation property TP, and TP applied
to R gives an RDF Graph G, then G is a GRDDL result of IR." I believe in
order to get XPath nodes, one must get an XPath data model: "XPath
operates on the abstract, logical structure of an XML document, rather
than its surface syntax. This logical structure, known as the *data
model*, is defined in [XQuery/XPath Data Model (XDM)]
<http://www.w3.org/TR/xpath20/#datamodel>." [1]

Therefore, if something is not a valid XML document, and Raptor claims
that VCard Table is not, then it should not produce any GRDDL results.
However, we do have a use case [2] that shows how tidy can be used to
get well-formed XML out of tagsoup, and therefore get the Infoset.
The paragraph DanC mentions notes, though, that this should be a feature
of the transform itself, although clients may try to do this at their
own risk.
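
To make the well-formed versus valid distinction concrete, here is a
minimal Python sketch. It assumes only the standard library's
non-validating parser, and both sample strings are hypothetical (they
are not copied from the VCard table page). A duplicate ID, the kind of
error xmllint reports below, breaks DTD validity but not
well-formedness, so a WF-only parser still accepts it, while tag soup
is rejected outright:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed, but two elements share one id value,
# which a validating parser would reject against a DTD declaring id as ID.
DUPLICATE_ID = '<table><tr id="v.role"><td/></tr><tr id="v.role"><td/></tr></table>'

# Hypothetical sample: not well-formed, because <td> is never closed.
TAG_SOUP = '<table><tr><td></tr></table>'


def is_well_formed(text):
    """Return True if a non-validating XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(DUPLICATE_ID))
print(is_well_formed(TAG_SOUP))
```

A client running in WF-only mode would get a tree, and hence XPath
nodes, from the first sample but not from the second without some
tidy-like repair step first.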

[1] http://www.w3.org/TR/xpath20/#id-introduction
[2] http://www.w3.org/2001/sw/grddl-wg/doc43/scenario-gallery.htm#html_tidy_use_case

Dave Beckett wrote:
> http://chatlogs.planetrdf.com/swig/2007-02-10#T03-28-23
> onwards:
>
> <chimezie> .grddl
> http://www.ibiblio.org/hhalpin/homepage/notes/vcardtable.html "SELECT
> ?homeSyn WHERE { ?homeSyn owl:equivalentProperty foaf:homePage }"
>
> <Emeka> Querying against 98 triples
> ...
>
> Raptor failed on this document.
>
> Checking I found:
>
> $ xmllint --valid --noout
> http://www.ibiblio.org/hhalpin/homepage/notes/vcardtable.html
> http://www.ibiblio.org/hhalpin/homepage/notes/vcardtable.html:29: element
> div: validity error : ID v.Address already defined
> w3.org/2006/vcard/ns#Address">v:Address</a></td><td></td><td><div id="v.Address"
>                                                                                ^
> http://www.ibiblio.org/hhalpin/homepage/notes/vcardtable.html:48: element
> tr: validity error : Element tr content does not follow the DTD, expecting
> (th | td)+, got (td td a td td )
> </tr><tr id="v.url">
>      ^
> http://www.ibiblio.org/hhalpin/homepage/notes/vcardtable.html:134: element
> tr: validity error : ID v.role already defined
> </tr><tr id="v.role">
>
>
> However GRDDL.py was generating triples.   It was not obvious to me
> that you are assuming the GRDDL process runs in WF-only XML mode.
>
> I shall change Raptor's use of libxml accordingly, if this is
> the case.
>
> Is XML validation of the profile/namespace URIs, XSLT documents
> also ignored?  I would assume not, since they are somebody else's
> mime type, spec.  RDF/XML aka application/rdf+xml does use validation.
>
> Dave
>
>   

-- 
		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426

Received on Tuesday, 27 February 2007 17:28:49 UTC