Re: Are there valid RDF/XML documents that encode invalid RDF?

There have been other cases where we’ve added tests to pre-existing RDF test suites. The RDF Test Suite Community Group manages all the existing test suites for RDF and SPARQL [1]. You might consider a pull request to add a test for RDF/XML. This would be an rdft:TestXMLNegativeSyntax test. It seems to me that a negative syntax test alone wouldn’t exercise character reference expansion in RDF/XML, so it might also require a positive test case to make sure that expansion is done correctly. We can discuss how appropriate the tests might be on the pull request.

Gregg Kellogg
gregg@greggkellogg.net

[1] https://github.com/w3c/rdf-tests
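
To make the behaviour under discussion concrete, here is a minimal sketch that is not taken from any of the parsers mentioned in this thread. It assumes Python's standard-library xml.etree.ElementTree as the underlying XML parser, and the helper names subject_iris and forbidden_chars are hypothetical. The forbidden character set is borrowed from the IRIREF production in the Turtle grammar as a rough proxy; it is not a full RFC 3987 validation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The example document from the original question, as bytes so that the
# XML declaration's encoding is honoured by the parser.
DOC = b"""<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns0="b:">
 <rdf:Description rdf:about="a:&#xA;">
   <ns0:b rdf:resource="c:c"/>
 </rdf:Description>
</rdf:RDF>
"""


def subject_iris(doc):
    """Return the rdf:about values of all rdf:Description elements.

    The XML parser has already expanded character references by the time
    the attribute value is returned.
    """
    root = ET.fromstring(doc)
    about = "{%s}about" % RDF_NS
    description = "{%s}Description" % RDF_NS
    return [el.get(about) for el in root.iter(description)]


def forbidden_chars(iri):
    """List characters that the Turtle IRIREF production excludes.

    Assumption: this is only a proxy for "not allowed in an IRI".
    """
    return [c for c in iri if ord(c) <= 0x20 or c in '<>"{}|^`\\']


if __name__ == "__main__":
    for iri in subject_iris(DOC):
        # repr() makes control characters such as the newline visible.
        print(repr(iri), "forbidden:", [hex(ord(c)) for c in forbidden_chars(iri)])
```

Under these assumptions, the XML layer hands the RDF/XML layer an attribute value that already contains the newline, so a check along these lines would be one place for a parser to reject the subject rather than silently drop or keep the character.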

> On Jul 6, 2020, at 8:41 AM, Wouter Beek <wouter@triply.cc> wrote:
> 
> Dear Richard,
> 
> > I can't find any rationale for ignoring the &#xA; character reference. And the referenced character is not allowed in an IRI. This would make the document not valid RDF/XML.
> 
> In that case most (if not all) RDF/XML parsers do this the wrong way :-(
> 
> ---
> Best,
> Wouter.
> 
> Email: wouter@triply.cc
> WWW: https://triply.cc
> Tel: +31647674624
> 
> 
> On Fri, Jul 3, 2020 at 7:04 PM Richard Cyganiak <richard@cyganiak.de> wrote:
> I can't find any rationale for ignoring the &#xA; character reference. And the referenced character is not allowed in an IRI. This would make the document not valid RDF/XML.
> 
> Richard
> 
> 
> 
> > On 2 Jul 2020, at 20:27, Wouter Beek <wouter@triply.cc> wrote:
> > 
> > Dear list,
> > 
> > We encounter RDF/XML documents in the wild that contain `&# HEX HEX`
> > escaped characters.  Here is an MWE (notice the subject term):
> > 
> > ```
> > <?xml version="1.0" encoding="utf-8" ?>
> > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns0="b:">
> >  <rdf:Description rdf:about="a:&#xA;">
> >    <ns0:b rdf:resource="c:c"/>
> >  </rdf:Description>
> > </rdf:RDF>
> > ```
> > 
> > Some RDF/XML parsers remove these escape sequences altogether (without
> > replacing them with anything), e.g., Rapper, W3C RDF/XML validator.
> > 
> > Some RDF/XML parsers replace these escape sequences with the
> > corresponding characters, thereby introducing syntax errors in RDF
> > terms (in the above example: introducing an unescaped newline
> > character inside an IRI).  An example of such a parser is
> > <https://github.com/rdfjs/rdfxml-streaming-parser.js/issues/39>.
> > 
> > My questions are as follows:
> >  1. Is the above example snippet a valid RDF/XML document?
> >  2. If so, is it intended that some valid RDF/XML documents encode
> > invalid RDF, or is there a standard procedure for handling such
> > documents so that they result in valid RDF somehow?
> > 
> > ---
> > Best,
> > Wouter.
> > 
> > Email: wouter@triply.cc
> > WWW: https://triply.cc
> > Tel: +31647674624
> > 
> 

Received on Monday, 6 July 2020 15:58:26 UTC