Re: Oracle's stand regarding N-TRIPLES

Well, all our input documents come in as UTF-8, so using N-Triples is much more work: every non-ASCII character has to be escaped.

In practice we tend to mark documents of triples as Turtle to avoid the re-encoding, but parsing Turtle is much, much less efficient.
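
To make the re-encoding concrete, here is a minimal sketch in Python of the escaping step that the 2004 N-Triples string rules require when the source text is UTF-8. The function name escape_ntriples_2004 is hypothetical and is not taken from any particular store; it only assumes the ASCII-only rules from the RDF Test Cases document cited at the bottom of the thread.

```python
def escape_ntriples_2004(text: str) -> str:
    """Escape a literal's lexical form under the 2004 N-Triples string rules.

    Hypothetical helper for illustration only. Printable ASCII passes
    through unchanged; everything else becomes a backslash escape.
    """
    # Characters that have a dedicated short escape.
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII is written as-is.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Control characters, DEL and the rest of the BMP use \uXXXX.
            out.append(f"\\u{cp:04X}")
        else:
            # Code points above the BMP use \UXXXXXXXX.
            out.append(f"\\U{cp:08X}")
    return "".join(out)


if __name__ == "__main__":
    print(escape_ntriples_2004("café"))
```

With this sketch, the 5-byte UTF-8 string "café" turns into the 9-character ASCII string caf\u00E9, and every character of every literal has to be inspected on the way out and decoded again on the way back in.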

- Steve

On 2011-08-20, at 02:34, Zhe Wu wrote:

> Hi Steve,
> 
> Thanks for the clarification! Now one thing I'd like to understand is that if we keep the current N-TRIPLE syntax, then presumably,
> - existing tools/platforms dealing with N-TRIPLES don't have to change (less work for engineers :)),
> - there is no risk of breaking existing client applications that accept the current N-TRIPLES syntax,
> 
> On top of everything, the existing N-TRIPLES can support all international characters. It's not like we are missing anything.
> Why fix something that is not broken?
> 
> I don't see how adding UTF-8 encoding can make N-TRIPLES much more useful. I do see
> a lot of potential interoperability and backward compatibility issues associated with a new encoding.
> 
> Thanks,
> 
> Zhe
> 
> 
> On 8/19/2011 3:40 PM, Steve Harris wrote:
>> Yes, we support N-Triples, but it's much less useful than it could be, as it doesn't support a common Unicode encoding.
>> 
>> - Steve
>> 
>> On 2011-08-19, at 16:56, Zhe Wu wrote:
>> 
>>> Hi Steve,
>>> 
>>> I was under the impression that your product supported N-TRIPLES. Guess I was wrong.
>>> Adding a new format can be more efficient for one system and less efficient for another
>>> system.
>>> 
>>> Thanks,
>>> 
>>> Zhe
>>> 
>>> On 8/19/2011 2:17 AM, Steve Harris wrote:
>>>> I agree with Jeremy.
>>>> 
>>>> For us, the lack of UTF-8 support is a serious impediment to using N-Triples as a bulk dump/restore format.
>>>> 
>>>> We use UTF-8 internally to hold RDF literals, as every other format is natively UTF-8, so the export to N-Triples requires a lot of unnecessary and inefficient escaping.
>>>> 
>>>> - Steve
>>>> 
>>>> On 2011-08-18, at 23:26, Jeremy Carroll wrote:
>>>> 
>>>>> Hi Zhe
>>>>> 
>>>>> I find this a surprisingly strong position.
>>>>> When ingesting N-Triples the code path to read UTF-8 and the code path to read \uXXXX escape sequences are probably equally horrible. The UTF-8 code path is the more conventional one to be following on the Web.
>>>>> 
>>>>> It seems like a fairly small amount of extra code for a vendor to support, with negligible impact on performance. The only downside, that I can see, would be that new data will not be readable by old software, which is the normal downside with new versions of a format.
>>>>> 
>>>>> We may differ in our judgment about how important that downside is, or I may have missed some other disadvantage that motivates Oracle's strong reaction.
>>>>> 
>>>>> My understanding is that 2004 N-Triples docs will be valid Turtle docs ...
>>>>> 
>>>>> Jeremy
>>>>> 
>>>>> 
>>>>> 
>>>>> On 8/18/2011 9:05 AM, Zhe Wu wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> After discussing with the whole Oracle Database Semantic Technologies team, we
>>>>>> have the following consensus within Oracle.
>>>>>> 
>>>>>> 1) The existing N-TRIPLES format [1] is key to Oracle's product;
>>>>>> 2) Oracle hasn't received from Oracle's customers any change request/suggestions regarding the current N-TRIPLES syntax;
>>>>>> 3) As a platform vendor, Oracle does not see any significant justifications to change/mend the existing syntax;
>>>>>> 
>>>>>> Hence Oracle will not support any major changes to the existing N-TRIPLES format, including
>>>>>> support for UTF-8.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Zhe & Souri
>>>>>> 
>>>>>> [1] http://www.w3.org/TR/rdf-testcases/#ntriples (in "RDF Test Cases: W3C Recommendation 10 February 2004")
>>>>>> 
>>>>>> 
>>> 
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Saturday, 20 August 2011 10:46:44 UTC