- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 1 May 2013 08:32:12 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-05-01 13:05+0100]
>
>
> On 01/05/13 12:48, Eric Prud'hommeaux wrote:
> >* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-05-01 10:19+0100]
> >>gedit complains about (but displays) the attachment.
> >>
> >>On 01/05/13 05:52, Eric Prud'hommeaux wrote:
> >>>I've noticed 6 vectors for creating literals with C0 codes
> >>>(including \0):
> >>> old Turtle
> >>> APIs
> >>> SPARQL CONSTRUCT
> >>> SPARQL Update
> >>> RDBs via Direct Mapping
> >>> RDBs via R2RML
> >>>(RDB example reproducible with
> >>> create table test(s text);
> >>> insert into test (s) values ('a\0b');
> >>> select s, length(s) from test;
> >>> +------+-----------+
> >>> | s    | length(s) |
> >>> +------+-----------+
> >>> |      |         1 |
> >>> | a b  |         3 |
> >>> +------+-----------+
> >>
> >>? where did the first row come from?
> >
> >MySQL's D-entailment. ˚͜˚
> >My first insert was '\0', but I figured that 'a\0b' would be more
> >illustrative.
> >
> >
> >>>).
> >>>
> >>>These can't be serialized in RDF/XML. Nor can the results of a query
> >>>including this data be serialized in application/sparql-results, e.g.
> >>
> >>application/sparql-results+xml
> >
> >quite right -- tx for the correction.
> >
> >
> >>There is also
> >>
> >>application/sparql-results+json
> >>text/tab-separated-values
>
> TSV says
> http://www.iana.org/assignments/media-types/text/tab-separated-values
>
> """
> Required Parameters: Character Set, Encoding Type
> """
>
>
> I avoided CSV as it is not a true representation of the data but ...
>
> >Does text/csv permit *anything* outside of
> >%x20-21 / %x23-2B / %x2D-7E / COMMA / CR / LF / 2DQUOTE ?
> >— http://tools.ietf.org/html/rfc4180#page-4
>
> RFC 4180 says:
> """
> Common usage of CSV is US-ASCII, but other character sets defined
> by IANA for the "text" tree may be used in conjunction with the
> "charset" parameter.
> """
> so UTF-8 is possible.
Given that the grammar permits only a subset of ASCII, it seems that
any ASCII-compatible encoding (JIS, UTF-8) would only express the
ASCII subset. For non-ASCII-compatible encodings (UTF-16, EBCDIC),
there'd be a point to the charset parameter, but it still wouldn't
permit any characters outside ASCII.
Or maybe the interpretation is supposed to be "if you're using a non-
ASCII encoding, make up a new production for TEXTDATA." At any rate,
the path to character range compatibility isn't clear to me.
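To make the "only a subset of ASCII" reading concrete, here is a minimal sketch that takes the RFC 4180 ABNF literally. An escaped field is DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE, with TEXTDATA = %x20-21 / %x23-2B / %x2D-7E, so the only characters it can hold are U+0020..U+007E plus CR and LF. The helper names `rfc4180_representable` and `rfc4180_escape` are hypothetical, and the sketch assumes the grammar (not the charset note) is normative.

```python
# Sketch: which values fit an RFC 4180 escaped field, read literally?
# TEXTDATA (%x20-21 / %x23-2B / %x2D-7E) + COMMA (%x2C) + 2DQUOTE (%x22)
# together cover U+0020..U+007E; CR and LF are also allowed inside quotes.


def rfc4180_representable(value: str) -> bool:
    """True if every character fits the escaped-field grammar."""
    return all(0x20 <= ord(ch) <= 0x7E or ch in "\r\n" for ch in value)


def rfc4180_escape(value: str) -> str:
    """Quote a value as an escaped field, or raise if the grammar forbids it."""
    if not rfc4180_representable(value):
        raise ValueError(f"not representable in RFC 4180 CSV: {value!r}")
    # Embedded double quotes are doubled (the 2DQUOTE production).
    return '"' + value.replace('"', '""') + '"'


if __name__ == "__main__":
    print(rfc4180_escape('say "hi", then\r\nleave'))
    # A C0 NUL and a non-ASCII letter are both outside the grammar.
    for s in ["a\x00b", "caf\u00e9"]:
        print(repr(s), rfc4180_representable(s))
```

Under this reading, UTF-8 or any other charset label changes nothing: a value such as "café" is rejected on the character level before encoding is even considered.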
> >>JSON allows \u0000 - RFC 4627 refers to Unicode 4.0
> >>
> >>
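RFC 4627 requires the control characters U+0000 through U+001F to be escaped inside JSON strings, which is why JSON results can carry a NUL at all. A small sketch using Python's `json` module as an example serializer; the `{"type": "literal", "value": ...}` shape follows the SPARQL 1.1 JSON results form for a plain literal, and the 'a\0b' value is borrowed from the RDB example above.

```python
import json

# A literal whose lexical form contains U+0000, as in the 'a\0b' example.
value = "a\x00b"

# json.dumps escapes C0 controls as \u00XX on output...
encoded = json.dumps({"type": "literal", "value": value})
print(encoded)

# ...and json.loads restores the original string losslessly.
assert json.loads(encoded)["value"] == value
```

The printed binding is `{"type": "literal", "value": "a\u0000b"}`, so the NUL survives the round trip as an escape rather than a raw byte.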
> >>> SELECT ?icon { ?who <p> ?icon FILTER (regex(?icon, "PNG")) }
> >>>They can, however, be queried in SPARQL:
> >>> SELECT ?who { ?who <p> ?icon FILTER (regex(?icon, "PNG")) }
> >>>(Technically, useful functions like fn:regex are based on strings, but
> >>>I don't know of implementations which enforce this.)
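The asymmetry above (queryable, but not serializable as XML) can be sketched in a few lines. The XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) governs both RDF/XML and application/sparql-results+xml. Python's `re` stands in for `fn:regex` here (the two are not identical), and the ?icon value, a PNG signature decoded as Latin-1, is a hypothetical stand-in for the data in question.

```python
import re


def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def xml10_serializable(value: str) -> bool:
    """True if every character of value may appear in an XML 1.0 document."""
    return all(is_xml10_char(ch) for ch in value)


# Hypothetical ?icon value: the 8-byte PNG signature read as Latin-1 code points.
icon = b"\x89PNG\r\n\x1a\n".decode("latin-1")

print(bool(re.search("PNG", icon)))  # the FILTER-style match succeeds
print(xml10_serializable(icon))      # but the value cannot go into XML 1.0
print([hex(ord(c)) for c in icon if not is_xml10_char(c)])
```

The only offending character is U+001A; CR, LF, and U+0089 are all permitted by XML 1.0, so a single C0 code is enough to make the result unserializable.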
> >>>
> >>>In theory, existing Turtle files like the attached are rendered
> >>>illegal by the post-facto declaration that their literals are xs:strings.
> >>>In practice, people don't enforce this (noting that these tests
> >>>existed for a while in Turtle with no one failing or crying foul.)
> >>>
> >>
> >
--
-ericP
Received on Wednesday, 1 May 2013 12:32:40 UTC