Re: literals with \0

On 01/05/13 12:48, Eric Prud'hommeaux wrote:
> * Andy Seaborne <andy.seaborne@epimorphics.com> [2013-05-01 10:19+0100]
>> gedit complains about (but displays) the attachment.
>>
>> On 01/05/13 05:52, Eric Prud'hommeaux wrote:
>>> I've noticed 6 vectors for creating literals with C0 codes
>>> (including \0):
>>>    old turtle
>>>    APIs
>>>    SPARQL CONSTRUCT
>>>    SPARQL Update
>>>    RDBs via Direct Mapping
>>>    RDBs via R2RML
>>> (RDB example reproducible with
>>>    create table test(s text);
>>>    insert into test (s) values ('a\0b');
>>>    select s, length(s) from test;
>>>    +------+-----------+
>>>    | s    | length(s) |
>>>    +------+-----------+
>>>    |      |         1 |
>>>    | a b  |         3 |
>>>    +------+-----------+
>>
>> ? where did the first row come from?
>
> MySQL's D-entailment. ˚͜˚
> My first insert was '\0', but I figured that 'a\0b' would be more
> illustrative.
>
>
>>> ).
>>>
>>> These can't be serialized in RDF/XML. Nor can the results of a query
>>> including this data be serialized in application/sparql-results, e.g.
>>
>> application/sparql-results+xml
>
> quite right -- tx for the correction.
>
>
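For anyone wanting to check the XML side, here is a rough sketch, assuming Python's standard xml.etree.ElementTree (expat underneath) is a fair stand-in for an XML 1.0 parser. The element name "literal" and the helper xml_accepts are made up for illustration; they are not from any RDF/XML or SPARQL results schema. It only shows that a parser rejects U+0000 both as a raw character and as a character reference, which is why such a literal has no XML serialization.

```python
import xml.etree.ElementTree as ET


def xml_accepts(doc: str) -> bool:
    """Return True if the parser accepts the document, False if it rejects it.

    Hypothetical helper for illustration only.
    """
    try:
        # Parse from UTF-8 bytes so the NUL reaches the parser unchanged.
        ET.fromstring(doc.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# "literal" is an invented element name, not part of any RDF vocabulary.
plain = "<literal>a b</literal>"
raw_nul = "<literal>a\u0000b</literal>"     # U+0000 as a raw character
char_ref_nul = "<literal>a&#0;b</literal>"  # U+0000 as a character reference

for name, doc in [("plain", plain), ("raw NUL", raw_nul), ("&#0;", char_ref_nul)]:
    print(name, "accepted" if xml_accepts(doc) else "rejected")
```

Note this tests parsing, not writing: a serializer that does not check its input may well emit the NUL, leaving the failure to whoever reads the document.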
>> There is also
>>
>> application/sparql-results+json
>> text/tab-separated-values

TSV says
http://www.iana.org/assignments/media-types/text/tab-separated-values

"""
Required Parameters: Character Set, Encoding Type
"""


I avoided CSV as it is not a true representation of the data but ...

> Does text/csv permit *anything* outside of
> %x20-21 / %x23-2B / %x2D-7E / COMMA / CR / LF / 2DQUOTE ?
> — http://tools.ietf.org/html/rfc4180#page-4

RFC 4180 says:
"""
Common usage of CSV is US-ASCII, but other character sets defined
       by IANA for the "text" tree may be used in conjunction with the
       "charset" parameter.
"""
so UTF-8 is possible.

>
>
>> JSON allows \u0000 - RFC 4627 refers to Unicode 4.0
>>
>>
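A small sketch of the JSON case, assuming Python's standard json module behaves like a typical JSON implementation: the literal 'a\0b' from the RDB example survives a round trip because the NUL is written as the escape \u0000. The variable names are only illustrative.

```python
import json

# The same three-character value as the RDB example: 'a', U+0000, 'b'.
value = "a\u0000b"

# Serializing escapes the control character instead of emitting it raw.
encoded = json.dumps(value)

# Parsing the escaped form gives back the original string.
decoded = json.loads(encoded)

print(encoded)
print(decoded == value, len(decoded))
```

Whether a given SPARQL results consumer copes with the decoded string afterwards is a separate question; this only covers the JSON syntax.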
>>>    SELECT ?icon { ?who <p> ?icon FILTER (regex(?icon, "PNG")) }
>>> They can, however, be queried in SPARQL:
>>>    SELECT ?who { ?who <p> ?icon FILTER (regex(?icon, "PNG")) }
>>> (Technically, useful functions like fn:regex are based on strings, but
>>> I don't know of implementations which enforce this.)
>>>
>>> In theory, existing turtle files like the attached are rendered
>>> illegal by the post-facto declaration that they are xs:strings.
>>> In practice, people don't enforce this (noting that these tests
>>> existed for a while in Turtle with no one failing or crying foul.)
>>>
>>
>

Received on Wednesday, 1 May 2013 12:06:12 UTC