Re: Escaping control characters in SPARQL/XML or, why doesn't SPARQL/XML use XML 1.1?

On 20/07/15 20:29, Gary King wrote:
> Suppose I have a triple store containing a triple like
>
>     <http://a> <http://b> "Hi \u001A is control-Z" .
>
> What should the SPARQL/XML output be for this query:
>
>      SELECT ?o { ?s ?p ?o }
>
> If I use Apache Jena 2.13.0 and ask for JSON, I get:
>
> {
>    "head": {
>      "vars": [ "o" ]
>    } ,
>    "results": {
>      "bindings": [
>        {
>          "o": { "type": "literal" , "value": "Hi \u001A is control-Z" }
>        }
>      ]
>    }
> }
>
> Asking for XML, however, gives me:
>
> <?xml version="1.0"?>
> <sparql xmlns="http://www.w3.org/2005/sparql-results#">
>    <head>
>      <variable name="o"/>
>    </head>
>    <results>
>      <result>
>        <binding name="o">
>          <literal>Hi  is control-Z</literal>

There is a real, raw control-Z in that line (which is illegal in XML
1.0). It just displays as a space character in some fonts. If I
cut and paste the line into Emacs, it displays as ^Z.
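To catch this before serializing, a writer can test each character against the XML 1.0 Char production. In XML 1.0 even a character reference such as &#x1A; is not well-formed, so there is no escape to fall back on. A minimal Python sketch; the function names are made up for illustration and are not from Jena or any library:

```python
# Sketch: check text against the XML 1.0 "Char" production
# (XML 1.0 Fifth Edition, section 2.2). A character outside these ranges,
# such as U+001A (control-Z), cannot appear in an XML 1.0 document at all,
# not even as a character reference like &#x1A;.

def is_xml10_char(c: str) -> bool:
    """Return True if the single character c is allowed by XML 1.0."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_xml10_chars(text: str) -> list[str]:
    """Return the code points (as U+XXXX strings) that XML 1.0 cannot carry."""
    return [f"U+{ord(c):04X}" for c in text if not is_xml10_char(c)]


if __name__ == "__main__":
    print(illegal_xml10_chars("Hi \u001a is control-Z"))  # ['U+001A']
```

A serializer could run such a check and fail loudly (or switch format) instead of emitting a document that is not well-formed.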

Unfortunately, as far as I know, you can't use content negotiation
(conneg) to choose between XML 1.0 and XML 1.1, which makes the whole
thing a bit of a "no win" situation. The SPARQL Query Results XML
Format spec happens to say "XML 1.0".

Historically, an app couldn't (spec-wise) get the character into the
data in the first place, because RDF/XML is XML 1.0. Nowadays, Turtle
and N-Triples can carry it as a \u escape, as in your example.
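For comparison, a Turtle or N-Triples writer has an easy way out. A minimal Python sketch, assuming a hypothetical helper turtle_string_literal (not Jena's API): it escapes the quote, backslash and line breaks that the string grammar requires, and, as a robustness choice rather than a grammar requirement, writes other control characters as \uXXXX escapes. SPARQL 1.1 TSV results write RDF terms in Turtle syntax, so the same escaping would apply there.

```python
# Sketch: serialize a Python string as a Turtle / N-Triples string literal.
# Quote, backslash and common whitespace controls use the short ECHAR
# escapes; any other C0 control or DEL becomes a \uXXXX (UCHAR) escape,
# so a control-Z is written as the six characters \u001A.

_ECHAR = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}


def turtle_string_literal(value: str) -> str:
    out = []
    for c in value:
        if c in _ECHAR:
            out.append(_ECHAR[c])
        elif ord(c) < 0x20 or ord(c) == 0x7F:
            out.append(f"\\u{ord(c):04X}")
        else:
            out.append(c)
    return '"' + "".join(out) + '"'


if __name__ == "__main__":
    print(turtle_string_literal("Hi \u001a is control-Z"))
    # "Hi \u001A is control-Z"
```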

BTW: The state of XML 1.1 support in Java is iffy:
https://bugs.openjdk.java.net/browse/JDK-8029437

>        </binding>
>      </result>
>    </results>
> </sparql>
>
> Where the control-Z character has disappeared.
>
> AFAIK, XML 1.0 cannot encode these control characters, whereas an XML 1.1 output could use &#x1a;. I also see that the RDF validator (http://www.w3.org/RDF/Validator/) is perfectly happy with the results, though it seems as if it should not be.
>
> thoughts?

Use JSON (and it parses faster), or SPARQL results in TSV.
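A quick way to see why JSON is the safe choice: JSON string syntax must escape every character below U+0020, so the control-Z goes out as \u001a and comes back intact. A small check using only Python's standard json module; the binding mirrors the shape of the JSON output quoted above and is an illustration, not what Jena runs internally:

```python
import json

# One binding shaped like the SPARQL 1.1 JSON results quoted above.
binding = {"o": {"type": "literal", "value": "Hi \u001a is control-Z"}}

# json.dumps escapes control characters below U+0020 as \u00XX
# (Python writes the hex digits in lowercase).
encoded = json.dumps(binding)
print(encoded)
# {"o": {"type": "literal", "value": "Hi \u001a is control-Z"}}

# Parsing gives back the original control character.
assert json.loads(encoded)["o"]["value"] == "Hi \x1a is control-Z"
```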

Emit XML 1.1 if you need to.
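If you do emit XML 1.1, its "restricted" control characters must be written as character references, as in the &#x1a; suggestion in the question. A minimal Python sketch, assuming a hypothetical escape_xml11_text helper that handles element text content only (attribute values would also need quote escaping). The consumer still needs an XML 1.1 parser, which, given the JDK bug above, is not a given in Java.

```python
# Sketch: escape element text for an XML 1.1 document.
# XML 1.1 admits U+0001..U+001F, but the RestrictedChar ranges must be
# written as character references; U+0000, surrogates, U+FFFE and U+FFFF
# are never allowed.

def _is_restricted_11(cp: int) -> bool:
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def escape_xml11_text(text: str) -> str:
    out = []
    for c in text:
        cp = ord(c)
        if c == "&":
            out.append("&amp;")
        elif c == "<":
            out.append("&lt;")
        elif c == ">":
            out.append("&gt;")
        elif cp == 0 or 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):
            raise ValueError(f"U+{cp:04X} cannot appear in XML 1.1")
        elif cp in (0xD, 0x85, 0x2028):
            # Line-end characters an XML 1.1 parser would normalize to LF;
            # a character reference preserves them.
            out.append(f"&#x{cp:X};")
        elif _is_restricted_11(cp):
            out.append(f"&#x{cp:X};")
        else:
            out.append(c)
    return "".join(out)


if __name__ == "__main__":
    body = escape_xml11_text("Hi \u001a is control-Z")
    print('<?xml version="1.1"?>')
    print(f"<literal>{body}</literal>")
    # <literal>Hi &#x1A; is control-Z</literal>
```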

 Andy

>
> thanks,
> --
> Gary Warren King, metabang.com
>
>

Received on Tuesday, 21 July 2015 07:32:29 UTC