- From: Gavin Carothers <gavin@topquadrant.com>
- Date: Wed, 20 Jul 2011 10:08:25 -0700
- To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
On Wed, Jul 20, 2011 at 9:26 AM, Pierre-Antoine Champin
<pierre-antoine.champin@liris.cnrs.fr> wrote:
> Hi all,
>
> it seems that I could not make myself clear during today's telecon, on
> my concern about making N-Triples utf-8 compliant.
>
> Don't get me wrong: I would *love* to see N-Triples support utf-8 (and
> get the universe rid of ASCII, for that matter ;)
>
> But my concern is that:
> * N-Triples will still support \uXXXX escaping

+1, all current N-Triples files MUST continue to be valid Turtle and
N-Triples files.

> * so there would be several ways to serialize a literal in N-Triples

Ah, yeah, that is more of an issue.

>
> I could perfectly live with that, but I think that one use-case of
> N-Triples is to be processed by RDF-unaware tools, such as grep, sed or
> sort.
>
> I know those tools have perfect utf-8 support; but they don't know that
> "\u00e9" is the same as "é". So if I'm grep'ing an N-Triples file for
> the string "trouvé", I may miss it if it is spelled "trouv\u00e9".

Already true today. But yeah, I guess you could get SOME hits in UTF-8
while other literals are escaped.

> And
> if I'm sort'ing the triples, the escaped characteres will not be
> interpreted, and so get wrongly sorted.

This is already sort of true with new lines. You have to remember that
new lines always have to be escaped.

>
> This is my concern in making N-Triples utf-8 compliant: we loose the
> good property it had to have exactly one way of serializing a given graph.

While I agree we need this, N-Triples does NOT define one. Blank node
labelling is annoying.

>
> Would that be possible to specify that \uXXXX escaping can only be used
> in ASCII files, while UTF-8 files *must* use the UTF-8 encoding?

The "what if humans want to write a code point that is annoying to
type?" issue doesn't really seem to apply to N-Triples? I'm sort of okay
with this on the serialization side, but totally not okay with not
allowing mixed mode files to be parsed.

Perhaps just a best practice note to implementers?

--Gavin
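
To make the grep concern above concrete, here is a minimal sketch of a pre-processing step that rewrites \uXXXX and \UXXXXXXXX escapes as literal characters, so that a mixed-mode file can be searched or sorted consistently. It is an illustration only, not part of any N-Triples specification: the function name `unescape_unicode` is hypothetical, and the choice to leave escapes for ASCII code points untouched (so that escaped new lines, quotes and backslashes stay escaped) is an assumption made for this sketch.

```python
import re

# One alternation handles every backslash sequence left to right, so an
# escaped backslash ("\\\\") is consumed as a pair and is never mistaken
# for the start of a \u escape.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))')


def unescape_unicode(line: str) -> str:
    """Replace non-ASCII \\uXXXX / \\UXXXXXXXX escapes with the character itself.

    Hypothetical helper: other escapes (\\n, \\", \\\\ ...) and escapes that
    denote ASCII code points are returned unchanged, which keeps the
    one-triple-per-line layout intact.
    """
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            return match.group(0)      # not a Unicode escape: keep as-is
        code_point = int(hex_digits, 16)
        if code_point < 0x80 or code_point > 0x10FFFF:
            return match.group(0)      # ASCII or out of range: keep escaped
        return chr(code_point)

    return _ESCAPE.sub(repl, line)


if __name__ == "__main__":
    escaped = '<http://example.org/s> <http://example.org/p> "trouv\\u00e9" .'
    print(unescape_unicode(escaped))
    # prints: <http://example.org/s> <http://example.org/p> "trouvé" .
```

Running every line through such a step before grep or sort would give the same result whichever spelling the file used, which is roughly what a best practice note could recommend without forbidding mixed-mode input.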
Received on Wednesday, 20 July 2011 17:08:56 UTC