Re: Oracle's stand regarding N-TRIPLES from Richard Cyganiak on 2011-08-21 (public-rdf-wg@w3.org from August 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 22 Aug 2011 00:11:39 +0100
To: Zhe Wu <alan.wu@oracle.com>
Cc: Steve Harris <steve.harris@garlik.com>, public-rdf-wg@w3.org
Message-Id: <EF438140-9558-4AF3-AF48-4D9C983E4F85@cyganiak.de>

Hi Zhe,

On 21 Aug 2011, at 18:28, Zhe Wu wrote:
> Well, seeing "Großräschen" is only better when the engineer understands that particular language, isn't it?

There *are* plenty of engineers who understand that particular language.

> To me personally, seeing that encoded string is better than a string containing characters that I don't read.

Well, I'll have to take your word for that, but let me tell you this: I live in a part of the world where many languages use a mix of US-ASCII characters and other characters, and I often travel to places and meet people whose names include a few characters outside the US-ASCII range, even if I don't understand the particular language. To me, the suggestion that seeing “\u00E9” is somehow better than seeing “é” is rather absurd.

> On the Linux terminal I am using, I can't even cut & paste that string. It only gets the "Gro" portion right.

It works fine on the Linux terminal I am using. You should submit a bug report to your Linux vendor.

> I understand the perspective of developer usability and a possible escaping cost for implementation on
> some platforms. However, none of them seems to be significant enough to justify all the potential interoperability,
> and backward compatibility issues.

Someone needs to make a little utility that converts non-ASCII characters inside a UTF-8 N-Triples file into \u escapes. That conversion is not difficult at all -- in fact, it's already implemented in every N-Triples serializer. Such a utility is the only tool anyone will need to deal with the backward compatibility issue.
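
To illustrate how small such a utility could be, here is a rough sketch in Python. It is not taken from any existing serializer; the function names `escape_non_ascii` and `escape_lines` are made up for this example, and it assumes the input has already been decoded from UTF-8. It rewrites every character outside US-ASCII as `\uXXXX` (or `\UXXXXXXXX` for code points beyond U+FFFF), which is the escape syntax N-Triples uses.

```python
from typing import Iterable, Iterator


def escape_non_ascii(text: str) -> str:
    """Replace every character outside US-ASCII with an N-Triples escape.

    Hypothetical helper for illustration; assumes `text` is already
    decoded (e.g. read from a file opened with encoding="utf-8").
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)               # plain US-ASCII: keep as-is
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")   # 4 hex digits: \uXXXX
        else:
            parts.append(f"\\U{cp:08X}")   # 8 hex digits: \UXXXXXXXX
    return "".join(parts)


def escape_lines(lines: Iterable[str]) -> Iterator[str]:
    """Apply escape_non_ascii to each line of an N-Triples document."""
    for line in lines:
        yield escape_non_ascii(line)
```

With this sketch, a literal such as `"Großräschen"@de` comes out as `"Gro\u00DFr\u00E4schen"@de`. It escapes non-ASCII characters wherever they occur on a line, comments included, which should be harmless for consumers that only accept US-ASCII input.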

Formats that can't display Unicode characters are slowly disappearing. Eventually you'll have to support some format that natively supports Unicode anyway. Why not start tackling that problem now?

Best,
Richard

Received on Sunday, 21 August 2011 23:12:19 UTC