- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 1 May 2013 08:32:12 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-05-01 13:05+0100]
>
>
> On 01/05/13 12:48, Eric Prud'hommeaux wrote:
> >* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-05-01 10:19+0100]
> >>gedit complains about (but displays) the attachment.
> >>
> >>On 01/05/13 05:52, Eric Prud'hommeaux wrote:
> >>>I've noticed 6 vectors for creating literals with C0 codes
> >>>(including \0):
> >>>  old turtle
> >>>  APIs
> >>>  SPARQL CONSTRUCT
> >>>  SPARQL Update
> >>>  RDBs via Direct Mapping
> >>>  RDBs via R2RML
> >>>(RDB example reproducible with
> >>>  create table test(s text);
> >>>  insert into test (s) values ('a\0b');
> >>>  select s, length(s) from test;
> >>>  +------+-----------+
> >>>  | s    | length(s) |
> >>>  +------+-----------+
> >>>  |      |         1 |
> >>>  | a b  |         3 |
> >>>  +------+-----------+
> >>
> >>? where did the first row come from?
> >
> >MySQL's D-entailment. ˚͜˚
> >My first insert was '\0', but i figured that 'a\0b' would be more
> >illustrative.
> >
> >
> >>>).
> >>>
> >>>These can't be serialized in RDF/XML. Nor can the results of a query
> >>>including this data be serialized in application/sparql-results, e.g.
> >>
> >>application/sparql-results+xml
> >
> >quite right -- tx for the correction.
> >
> >
> >>There is also
> >>
> >>application/sparql-results+json
> >>text/tab-separated-values
>
> TSV says
> http://www.iana.org/assignments/media-types/text/tab-separated-values
>
> """
> Required Parameters: Character Set, Encoding Type
> """
>
>
> I avoided CSV as it is not a true representation of the data but ...
>
> >Does text/csv permit *anything* outside of
> >%x20-21 / %x23-2B / %x2D-7E / COMMA / CR / LF / 2DQUOTE ?
> >— http://tools.ietf.org/html/rfc4180#page-4
>
> RFC 4180 says:
> """
> Common usage of CSV is US-ASCII, but other character sets defined
> by IANA for the "text" tree may be used in conjunction with the
> "charset" parameter.
> """
> so UTF-8 is possible.

Given that the grammar permits only a subset of ASCII, it seems that
any ASCII-compatible encoding (JIS, UTF-8) would only express the
ASCII subset. For non-ASCII-compatible encodings (UTF-16, EBCDIC),
there'd be a point to the charset parameter, but it still wouldn't
permit any characters outside ASCII. Or maybe the interpretation is
supposed to be "if you're using a non-ASCII encoding, make up a new
production for TEXTDATA." At any rate, the path to character range
compatibility isn't clear to me.

> >>JSON allows \u0000 - RFC 4627 refers to Unicode 4.0
> >>
> >>
> >>>  SELECT ?icon { ?who <p> ?icon FILTER (regex(?icon, "PNG")) }
> >>>They can, however, be queried in SPARQL:
> >>>  SELECT ?who { ?who <p> ?icon FILTER (regex(?icon, "PNG")) }
> >>>(Technically, useful functions like fn:regex are based on strings, but
> >>>I don't know of implementations which enforce this.)
> >>>
> >>>In theory, existing turtle files like the attached are rendered
> >>>illegal by the post-facto declaration that they are xs:strings.
> >>>In practice, people don't enforce this (noting that these tests
> >>>existed for a while in Turtle with no one failing or crying foul.)
> >>>
> >>
> >

-- 
-ericP
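
The character-range question in this thread can be sketched in a few lines of Python. This is a rough illustration using only the standard library, not code from any RDF or SPARQL implementation. The helper names (is_textdata, outside_textdata, xml_accepts) are hypothetical. The TEXTDATA ranges are the ones quoted above from RFC 4180; the sample literal 'a\0b' is the one from the MySQL example. The XML check relies on the expat-based parser in xml.etree rejecting NUL, which is an assumption about that parser's behaviour rather than a full XML 1.0 conformance test.

```python
import json
import xml.etree.ElementTree as ET


def is_textdata(ch):
    """True if ch is in RFC 4180's TEXTDATA: %x20-21 / %x23-2B / %x2D-7E."""
    cp = ord(ch)
    return 0x20 <= cp <= 0x21 or 0x23 <= cp <= 0x2B or 0x2D <= cp <= 0x7E


def outside_textdata(value):
    """Return the characters of value that TEXTDATA does not cover."""
    return [ch for ch in value if not is_textdata(ch)]


def xml_accepts(value):
    """Hypothetical probe: does the stdlib XML parser accept value as text?"""
    doc = ("<literal>" + value + "</literal>").encode("utf-8")
    try:
        ET.fromstring(doc)
        return True
    except (ET.ParseError, ValueError):
        # NUL is not a legal XML 1.0 character, so parsing fails.
        return False


if __name__ == "__main__":
    literal = "a\x00b"  # the literal from the MySQL example

    # CSV per the RFC 4180 grammar: NUL is outside TEXTDATA.
    print(outside_textdata(literal))

    # JSON: the NUL survives as a \u0000 escape and round-trips.
    encoded = json.dumps(literal)
    print(encoded, json.loads(encoded) == literal)

    # XML: the same literal cannot be carried as element text.
    print(xml_accepts(literal))
```

Note that the check is on code points, not bytes, so it says nothing about which charset parameter is in use; that matches the reading above in which the grammar, not the encoding, is what limits the range.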
Received on Wednesday, 1 May 2013 12:32:40 UTC