N-triples (1.5)

At 04:38 PM 7/19/01 +0100, Dave Beckett wrote:

>I've managed to update the document
>   http://www.w3.org/2001/sw/RDFCore/ntriples/
>with the notation changes you previously described, and with the
>issues you have brought up, along with some solutions.

A nit:  you still have:

    eoln ::= cr? lf

Given that (a) email converts line endings to CRLF in transit, and back to
local conventions on receipt, (b) HTTP does not touch EOL sequences, and
(c) systems exist that use CR/LF (PCs), bare LF (Un*x), and bare CR (Macs,
I believe), I think any of these may appear, however the document is created.

Thus (contrary to my earlier comments) I'd suggest:

    eoln ::= cr lf | cr | lf

(Aaron:  am I right about the Mac?)
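
To illustrate the suggested production, here is a minimal Python sketch of a line splitter that accepts all three terminators. The names EOLN and split_lines are made up for this example and do not come from the N-triples document; CRLF is tried first so that it counts as one line break rather than two.

```python
import re

# Mirrors the suggested production: eoln ::= cr lf | cr | lf
# CRLF comes first so that it is consumed as a single terminator.
EOLN = re.compile(r"\r\n|\r|\n")

# The earlier production, eoln ::= cr? lf, for comparison.
OLD_EOLN = re.compile(r"\r?\n")


def split_lines(text, eoln=EOLN):
    """Split text into lines on the given terminator pattern."""
    lines = eoln.split(text)
    # A terminator at the very end leaves an empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

With the suggested pattern, split_lines("a\r\nb\rc\nd\n") gives four lines, whereas the earlier pattern leaves "b\rc" joined, because a bare CR is not recognised as a line break.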

> >>>Graham Klyne said:
><snip/>
> > If you want to stick with just US-ASCII in an N-triples file then I won't
> > fight it, but my own feeling is that it would be easier to just
> > say:  always use UTF-8 encoding.  That seems fairly future-proof.
>
>I don't mind saying N-Triples is UTF-8 since I've got code around to
>do that and it comes for free with Java and Python for example.
>However it just moves the escaping to a different level and makes it
>impossible for anyone to generate unicode characters with plain text
>(ASCII that is) editors.

If you want to have a single encoded representation for a string, that's true.

I guess I don't mind either!

>Dave said:
> > >How about just one escape \UXXXXXXXX for all chars not made available
> > >by \-escapes or used in-situ - that seems more appealing for this
> > >little syntax.
>Graham said:
> > Well, that could work too.
>
>Yes, but is it better than my other suggestions?
>I've listed all the suggestions in the updated doc.

OK, my vote is for your option 1.

My reasons, FWIW, are:

(a) to achieve a common representation for any string, and to make it
possible to create N-triples with non-UTF-8 tools.  (The common encoded
representation isn't so important to me, but the requirement has been
expressed...)

(b) that the code points needing the full 32 bits seem very rare, and 4
leading zeros are a lot of overhead for a feature that is required only
very occasionally (if ever).
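
The overhead in (b) can be seen with a small Python sketch that writes every non-ASCII character in the single eight-digit form, \UXXXXXXXX, quoted above. The function name escape_long is hypothetical, and the sketch is deliberately incomplete: it ignores the other \-escapes (backslash, quotes, control characters) and says nothing about what option 1 in the document actually specifies.

```python
def escape_long(text):
    """Write each non-ASCII character as a backslash, 'U', and 8 hex digits."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain US-ASCII is passed through unchanged (simplification:
            # a real writer would also escape backslashes and quotes).
            out.append(ch)
        else:
            # %08X pads the code point to eight uppercase hex digits.
            out.append("\\U%08X" % cp)
    return "".join(out)
```

Any character up to U+FFFF is padded with at least four zero digits: the euro sign U+20AC comes out as \U000020AC, ten characters in place of one.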

> > > > 5. eoln format
><snip/>
>Graham said:
> > I suppose, then, we must go back to allowing CRLF, LF or CR as a line
> > break, to be compatible with anything that can be served via HTTP.
>
>which was actually where I started in the first version, Doh!

Yes, I acknowledge that.  That was before I'd taken on board the fact that 
HTTP breaks the MIME text handling model that I'm most familiar 
with.  Sorry for the roundabout (but I learned something, so it wasn't a 
total waste ;-).

#g


------------
Graham Klyne
(GK@ACM.ORG)

Received on Thursday, 19 July 2001 15:52:37 UTC