Re: one more comment on STRING_LITERAL2 Re: review comments of N-Triples in the Turtle document from Gavin Carothers on 2012-03-21 (public-rdf-wg@w3.org from March 2012)

From: Gavin Carothers <gavin@carothers.name>
Date: Tue, 20 Mar 2012 17:38:28 -0700
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <CAPqY83wQNK3gZAX8CFACjfVJEC7buFeRaAMsxGN3sH2qi+fUsQ@mail.gmail.com>
On Tue, Mar 20, 2012 at 12:47 PM, Andy Seaborne
<andy.seaborne@epimorphics.com> wrote:
>
>
> On 20/03/12 18:53, Zhe Wu wrote:
>>
>> Hi Gavin,
>>
>> Thanks very much for your quick response!
>>
>> One small comment. The STRING_LITERAL2 is defined as follows.
>>
>> STRING_LITERAL2   ::= '"' ( ( [^\"\\\n\r] ) | ECHAR | UCHAR )* '"'
>>
>> If I read it correctly, this allows a single quote, among many other
>> things, to be used (as is) inside a pair of double quotes.
>> A user can also put a character of ASCII code 0x12 inside a pair of
>> double quotes.
>
>
> Did you mean 0x22, a double quote?  0x12 is a control character.
>
>
>>
>> Maybe we want to restrict it a little bit?
>
>
> I think the confusion is editorial, not technical.  It does not mean
> to exclude \" (2 characters).
>
> [^\"\\\n\r] should be [^"\#xA#xD]
>
> BNF does not have its own escape character rules.  You need to write the
> hex for NL and CR.
>
> It actually says you can't put the letters 'n' and 'r' in directly and it
> excludes \ 5 separate times.  \n is not NL.  It's '\' and an 'n'.
>
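A minimal sketch of the point above, using Python's `re` module as a stand-in for the grammar. The intended character class excludes `"`, `\`, LF (#xA) and CR (#xD); reading `[^\"\\\n\r]` literally, with no escape rules, would instead exclude `"`, `\` and the letters `n` and `r`. Because Python regex syntax, unlike W3C EBNF, does give `\n` a meaning, the sketch spells LF and CR as `\x0A` and `\x0D` to mirror `#xA` and `#xD`. The ECHAR and UCHAR alternatives are assumed to follow the Turtle escape set (`\t \b \n \r \f \" \' \\`, `\uHHHH`, `\UHHHHHHHH`); the names `PLAIN`, `LITERAL_READING` and `is_string_literal2` are illustrative only.

```python
import re

# Sketch of the *intended* STRING_LITERAL2 production as a Python regex:
#   '"' ( [^"\#xA#xD] | ECHAR | UCHAR )* '"'
# ECHAR and UCHAR are assumed to follow the Turtle escape rules.
PLAIN = r'[^"\\\x0A\x0D]'      # any char except ", \, LF (#xA), CR (#xD)
ECHAR = r"\\[tbnrf\"'\\]"      # backslash + one of t b n r f " ' \
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"

STRING_LITERAL2 = re.compile(r'"(?:' + PLAIN + "|" + ECHAR + "|" + UCHAR + r')*"')

# The original character class read literally, with no escape rules:
# it excludes ", \ and the *letters* n and r, which is not what was meant.
LITERAL_READING = re.compile(r'[^"\\nr]')


def is_string_literal2(text: str) -> bool:
    """Return True if the whole of `text` is a double-quoted literal."""
    return STRING_LITERAL2.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_string_literal2('"it\'s"'))        # raw single quote is allowed
    print(is_string_literal2(r'"say \"hi\""'))  # escaped double quotes
    print(is_string_literal2(r'"caf\u00E9"'))   # UCHAR escape
    print(is_string_literal2('"two\nlines"'))   # raw newline is rejected
    print(LITERAL_READING.match("n") is None)   # literal reading bans 'n'
```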

I'm afraid the current Turtle grammar is written using the "What
Yacker parses" version of EBNF. As mentioned years ago, it would be
nice to have an EBNF spec other than the little bits of it in the XML spec.
Will clean up.

--Gavin

>> I am wondering what you think of using a table, similar to the table
>> in section 3.2 of the old test spec? We can add a new column for the new
>> N-Triples encoding. That way, users can see the differences *side by
>> side*.
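A side-by-side comparison like the one suggested could start from a listing such as this sketch. It assumes the "old" column is the escape set of the 2004 N-Triples strings section (`\t`, `\n`, `\r`, `\"`, `\\`, `\uHHHH`, `\UHHHHHHHH`) and the "new" column is the Turtle ECHAR/UCHAR set; the table layout and the `new_only` helper are hypothetical.

```python
# Hypothetical escape comparison (old vs new N-Triples), assuming the old set
# is the 2004 N-Triples escapes and the new set is Turtle's ECHAR/UCHAR.
ESCAPES = [
    # (escape sequence, meaning, allowed in old N-Triples, allowed in new)
    (r"\t", "tab (U+0009)", True, True),
    (r"\n", "line feed (U+000A)", True, True),
    (r"\r", "carriage return (U+000D)", True, True),
    (r"\"", "double quote (U+0022)", True, True),
    (r"\\", "backslash (U+005C)", True, True),
    (r"\uHHHH", "code point, 4 hex digits", True, True),
    (r"\UHHHHHHHH", "code point, 8 hex digits", True, True),
    (r"\b", "backspace (U+0008)", False, True),
    (r"\f", "form feed (U+000C)", False, True),
    (r"\'", "single quote (U+0027)", False, True),
]


def new_only() -> list[str]:
    """Escape sequences that a pre-Turtle N-Triples parser would not expect."""
    return [esc for esc, _, old, new in ESCAPES if new and not old]


def print_table() -> None:
    """Print the comparison as a plain-text table."""
    print(f"{'escape':<12} {'meaning':<26} {'old':<4} {'new':<4}")
    for esc, meaning, old, new in ESCAPES:
        old_s = "yes" if old else "no"
        new_s = "yes" if new else "no"
        print(f"{esc:<12} {meaning:<26} {old_s:<4} {new_s:<4}")


if __name__ == "__main__":
    print_table()
```

Raw UTF-8 characters outside US-ASCII would need a further row of their own, since they are a change in encoding rather than an escape.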
>>
>>
>> I am not convinced that having text/turtle and application/ntriples on
>> top of the existing text/plain for the old-style encoding is a good
>> thing. What *new* features do we gain?
>>
>>
>> Thanks,
>>
>> Zhe
>>
>>
>> On 3/20/2012 11:23 AM, Gavin Carothers wrote:
>>>
>>> On Tue, Mar 20, 2012 at 11:05 AM, Zhe Wu <alan.wu@oracle.com> wrote:
>>>>
>>>> Hi Gavin,
>>>>
>>>> Please see my comments inline.
>>>>
>>>>
>>>>>> - Replace
>>>>>>          "N-Triples may also be provided as text/plain. When used in
>>>>>>          this way N-Triples must use the escaped form of any character
>>>>>>          outside US-ASCII"
>>>>>>   with
>>>>>>          "When encoded using US-ASCII as specified in section 3
>>>>>>          [REF1], N-Triples should be provided as text/plain."
>>>>>
>>>>> This isn't exactly true. There is nothing wrong with encoding an
>>>>> N-Triples file using US-ASCII and serving it as application/ntriples. The
>>>>> relationship goes the other direction. If you want to provide
>>>>> text/plain N-Triples you MUST use US-ASCII. If you want to provide
>>>>> US-ASCII you can use text/plain, text/turtle, or
>>>>> application/ntriples.
>>>>>
>>>> I guess my question really is: what do we gain from encoding in
>>>> US-ASCII and serving it as application/ntriples?
>>>
>>> The same bytes can be served as application/ntriples, text/turtle, and
>>> text/plain and have exactly the same meaning, because US-ASCII is a
>>> subset of UTF-8. This is a good thing; UTF-8 is awesome like that.
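The relationship described above can be sketched as a check on the bytes themselves. This sketch assumes the media type names as they appear in this thread (`application/ntriples`) and that N-Triples content is also acceptable as text/turtle; `acceptable_media_types` is a hypothetical helper, not part of any library.

```python
TEXT_PLAIN = "text/plain"
TEXT_TURTLE = "text/turtle"
APP_NTRIPLES = "application/ntriples"  # name as used in this thread


def acceptable_media_types(doc: bytes) -> list[str]:
    """Sketch: media types under which these N-Triples bytes could be served."""
    try:
        doc.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so neither text/turtle nor application/ntriples
        # fits; invalid UTF-8 always contains non-ASCII bytes, so not
        # text/plain either.
        return []
    if doc.isascii():
        # US-ASCII is a subset of UTF-8: these bytes mean the same thing
        # under all three media types.
        return [TEXT_PLAIN, TEXT_TURTLE, APP_NTRIPLES]
    # Raw UTF-8 outside US-ASCII: text/plain would need \u escapes instead.
    return [TEXT_TURTLE, APP_NTRIPLES]


if __name__ == "__main__":
    escaped = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .\n'
    raw = '<http://example.org/s> <http://example.org/p> "café" .\n'.encode("utf-8")
    print(acceptable_media_types(escaped))
    print(acceptable_media_types(raw))
    # The escaped line decodes identically as US-ASCII and as UTF-8.
    print(escaped.decode("ascii") == escaped.decode("utf-8"))
```

The check only looks at the encoding; it does not validate the N-Triples syntax itself.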
>>>
>>>>
>>>>
>>>>>> - Add the following to the end of "See N-Triples Media Type for the
>>>>>> media type registration form."
>>>>>>
>>>>>>   For maximum backward compatibility, users or applications may want
>>>>>>   to choose US-ASCII encoding to serialize N-Triples.
>>>>>
>>>>> I don't think we should recommend providing any format in US-ASCII over
>>>>> UTF-8.
>>>>>
>>>> I don't think that sentence truly recommends US-ASCII over UTF-8. It is
>>>> important, in my opinion, for us to point out non-trivial consequences
>>>> caused by the changes we propose.
>>>>
>>>> Assume a user serializes using UTF-8 encoding for non-ASCII characters
>>>> and the new \ escapes \', \b, and \f. Such a serialization will not work
>>>> with some of the existing tools, rapper 2-1.9.0 for example.
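One way to work around the consequence described above is to downgrade a new-style literal before handing it to an older parser. This sketch assumes the input is the well-formed lexical body of a string literal (the text between the double quotes) and that older parsers accept only the 2004 escapes. It maps `\b` and `\f` to `\u0008` and `\u000C`, turns `\'` into a plain `'`, and escapes non-ASCII characters with uppercase hex; escapes already present are copied unchanged. The function name `to_legacy_literal` is hypothetical.

```python
# Hypothetical downgrade of a new-style N-Triples string literal body to the
# older US-ASCII-only form, for tools that predate the new escapes.
NEW_ESCAPES = {
    "b": "\\u0008",  # backspace -> numeric escape
    "f": "\\u000C",  # form feed -> numeric escape
    "'": "'",        # \' -> a plain single quote, which needs no escape
}


def to_legacy_literal(body: str) -> str:
    """Rewrite the text between the quotes of a well-formed literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            esc = body[i + 1]
            # Escapes both styles share (\t, \n, \", \uHHHH, ...) are copied
            # as-is; any hex digits that follow are copied as plain ASCII.
            out.append(NEW_ESCAPES.get(esc, "\\" + esc))
            i += 2
            continue
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_legacy_literal("caf\u00e9"))  # raw UTF-8 e-acute
    print(to_legacy_literal(r"it\'s"))     # new \' escape
    print(to_legacy_literal(r"a\bb\fc"))   # new \b and \f escapes
```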
>>>>
>>>> The proposed new sentence simply makes clear one important consequence.
>>>
>>> Okay, I think I agree. I'm not sure about the exact phrasing, but
>>> expanding the differences section seems like a good idea.
>>>
>>> Thanks very much for the feedback, I'll see if I can get some or all
>>> of it into the document before the next meeting.
>>>
>>> --Gavin
>>>
>>
>
Received on Wednesday, 21 March 2012 00:38:57 UTC