- From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Date: Wed, 20 Jul 2011 18:26:30 +0200
- To: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Hi all,

it seems that I could not make myself clear during today's telecon about my concern with making N-Triples UTF-8 compliant.

Don't get me wrong: I would *love* to see N-Triples support UTF-8 (and rid the universe of ASCII, for that matter ;)

But my concern is that:

* N-Triples will still support \uXXXX escaping
* so there would be several ways to serialize a literal in N-Triples

I could perfectly live with that, but I think that one use case of N-Triples is to be processed by RDF-unaware tools, such as grep, sed or sort. I know those tools have perfect UTF-8 support; but they don't know that "\u00e9" is the same as "é".

So if I grep an N-Triples file for the string "trouvé", I may miss it if it is spelled "trouv\u00e9". And if I sort the triples, the escaped characters will not be interpreted, and so will be sorted wrongly.

This is my concern with making N-Triples UTF-8 compliant: we lose the good property it had of offering exactly one way to serialize a given graph.

Would it be possible to specify that \uXXXX escaping can only be used in ASCII files, while UTF-8 files *must* use the UTF-8 encoding?

pa
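The grep mismatch described above can be illustrated with a minimal Python sketch. The helper name `unescape_unicode` and the example.org IRIs are hypothetical, chosen only for illustration. The sketch assumes that escapes of non-ASCII characters are the only ones worth rewriting; escaped ASCII characters (such as `\u0022`) and escaped backslashes are left untouched so the line stays syntactically valid.

```python
import re

# Matches an escaped backslash, a \uXXXX escape, or a \UXXXXXXXX escape.
# The escaped-backslash alternative comes first so that "\\u00e9"
# (a literal backslash followed by "u00e9") is not mistaken for an escape.
_ESCAPE = re.compile(r'\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')


def _decode(match):
    hex_digits = match.group(1) or match.group(2)
    if hex_digits is None:
        return match.group(0)  # escaped backslash: keep as-is
    code_point = int(hex_digits, 16)
    if code_point < 0x80:
        return match.group(0)  # ASCII (e.g. an escaped quote): keep escaped
    return chr(code_point)


def unescape_unicode(line):
    """Rewrite non-ASCII \\uXXXX / \\UXXXXXXXX escapes as the characters they denote."""
    return _ESCAPE.sub(_decode, line)


escaped = '<http://example.org/s> <http://example.org/p> "trouv\\u00e9" .'
literal = '<http://example.org/s> <http://example.org/p> "trouvé" .'

# A plain substring search, like grep, misses the escaped spelling...
print("trouvé" in escaped)
# ...but finds it once the escape has been rewritten as UTF-8.
print("trouvé" in unescape_unicode(escaped))
print(unescape_unicode(escaped) == literal)
```

Running such a pass before grep or sort would give those tools a single spelling to work with, which is the property the proposed restriction would guarantee at the format level instead.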
Received on Wednesday, 20 July 2011 16:27:14 UTC