- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 18 May 2012 15:31:49 +0100
- To: RDF-WG <public-rdf-wg@w3.org>
On 18/05/12 14:45, Sandro Hawke wrote:
> On Fri, 2012-05-18 at 14:08 +0100, Andy Seaborne wrote:
>>
>> On 18/05/12 13:12, Eric Prud'hommeaux wrote:
>>> * Richard Cyganiak<richard@cyganiak.de> [2012-05-18 12:35+0100]
>>>>
>>>> On 18 May 2012, at 11:34, Eric Prud'hommeaux wrote:
>>>>> Does the existing body of N-Triples permit a grammar with no default whitespace rules?
>>>>>
>>>>> triples: triple (LF triple)* LF?
>>>>> triple: subject HWS predicate HWS object '.'
>>>>>
>>>>> I.e., do all the N-Triples out there look like "<s> <p> <o>."?
>>>>
>>>> This is what N-Triples as currently defined requires. Isn't that sufficient?
>>>>
>>>> ntripleDoc ::= line*
>>>> line ::= ws* ( comment | triple )? eoln
>>>> triple ::= subject ws+ predicate ws+ object ws* '.' ws*
>>>> ws ::= space | tab
>>>> eoln ::= cr | lf | cr lf
>>>
>>> I was just interested to see how much your SHOULD:
>>> [[
>>> * Richard Cyganiak<richard@cyganiak.de> [2012-05-18 11:06+0100]
>>>> I would even go one step further and add some SHOULD-level guidance on where to put what whitespace. Perhaps something like: exactly one space between s and p; exactly one space between p and o; no WS before or after the period; no WS at the start of a line; CR+LF as EOL.
>>> ]]
>>> could be turned into a MUST.
>>
>> I don't think a MUST is a good idea, partially because it's too late,
>> but also despite being a dump format, it's not pure binary. Blank lines
>> and comments do have a role here and the CR+LF is a mild inconvenience
>> in some text tools.
>>
>> There is variance in IRIs so from that point alone, NT has variations
>> enough to stop blindly processing with line-based tools. I've seen the
>> :80 thing in messy data.
>>
>> Processing based on appearance needs an extra step to be safe at scale
>> (i.e. not need checking afterwards).
>>
>> What a canonical form is good for is as a target for simple tools to
>> process and output. Hopefully, then tool makers will provide it by user
>> demand.
>
> So, no one would be writing a parser for n-triples that ONLY did
> canonical n-triples. (At the point where you're writing something that
> can keep a table of b-node labels, scanning over multiple spaces
> between the subject and the predicate is pretty easy.) But we'd say
> people SHOULD output "canonical" n-triples so that plain-text
> RDF-unaware tools like sort and grep would work. Is that the
> proposal?

I wrote the original message for this thread as feedback on the nearly-LC-publishable Turtle document. I included a way to resolve the confusion that has been pointed out to me elsewhere.

When Richard mentioned SHOULD-guidance on where to put what whitespace, I took that and tried to list the areas that I thought needed covering IF line-based tools were to process N-Triples in an RDF-unaware fashion - that is, manipulating bytes. It is a usage we have discussed here before.

I am suggesting we could describe a canonical form that people can use if they want. An extra step to get NT to that canonical form may be needed when working at scale because it's a nuisance to find the billion and first triple is formatted differently.

That is not going as far as SHOULD (= "there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.")

. I'm happy with not saying anything.

. My tools do output that form as far as I know (I've not checked today) except it's ' .' because I like that.

. When I write such line-based tools on a case-by-case basis

	Andy

>
> -- Sandro
>
>
>> Andy
>>
>>>
>>>
>>>> Richard
>>>>
>>>>
>>>>
>>>>> I note that Oracle has been vigilant about preserving backwards-compatibility. Souri, do you have a sense of what Oracle has been using?
>>>>>
>>>>>> I also note that RDF 2004 N-Triples allows comments (only at the start of a line).
>>>>>> This makes sense for the use as a test case format, but not much sense for the use as a dump format.
>>>>>>
>>>>>> Best,
>>>>>> Richard
>>>>>>
>>>>>>
>>>>>> [1] http://www.w3.org/TR/rdf-testcases/#ntriples
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 18 May 2012, at 10:04, Andy Seaborne wrote:
>>>>>>
>>>>>>> Gavin, Eric,
>>>>>>>
>>>>>>> rdf-turtle says:
>>>>>>>
>>>>>>> [1] ntriplesDoc ::= (triple)? (EOL triple)* (EOL)?
>>>>>>> [2] triple ::= subject predicate object '.'
>>>>>>> [8] EOL ::= ([#xD#xA])+
>>>>>>>
>>>>>>> What are the white space rules?
>>>>>>>
>>>>>>> Does it inherit white space processing from the rest of Turtle? Comments seem to come from Turtle.
>>>>>>>
>>>>>>> If it does not inherit white space rules,
>>>>>>> what about horizontal white space inside triples?
>>>>>>>
>>>>>>> If it does inherit white space rules,
>>>>>>> that includes newlines within triples between S/P or P/O.
>>>>>>>
>>>>>>> The simplest solution is to add text in section 12.3 to say that horizontal white space outside tokens is discarded (which is different to Turtle).
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> -ericP
>>>>>
>>>>
>>>
>>
>>
>
>
>
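
For illustration only, the "extra step" of rewriting N-Triples into the whitespace conventions listed in this thread (no leading whitespace, one space between s and p, one space between p and o, nothing after the period) might be sketched in Python as below. The names canonical_line and TRIPLE are hypothetical, the token patterns are deliberately simplified (for example, blank node labels containing '.' are not handled), and the `dot` parameter is an assumption added so that either "." or " ." can be emitted. It is a sketch under those assumptions, not a conforming parser.

```python
import re

# Simplified token patterns; these are assumptions for the sketch and do not
# cover the full N-Triples grammar.
IRI = r'<[^<>\s]*>'
BNODE = r'_:[A-Za-z0-9_\-]+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?'

# One triple per line, with optional horizontal whitespace (space or tab)
# around the terms and around the final period.
TRIPLE = re.compile(
    rf'[ \t]*({IRI}|{BNODE})[ \t]+({IRI})[ \t]+({IRI}|{BNODE}|{LITERAL})[ \t]*\.[ \t]*'
)


def canonical_line(line, dot="."):
    """Return the triple with normalised spacing, or None for blank/comment lines."""
    text = line.rstrip("\r\n")          # drop the end-of-line marker only
    stripped = text.strip(" \t")
    if not stripped or stripped.startswith("#"):
        return None                     # blank lines and comments are dropped
    match = TRIPLE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised N-Triples line: {line!r}")
    s, p, o = match.groups()
    # Exactly one space between terms; whitespace inside literals is untouched.
    return f"{s} {p} {o}{dot}"


if __name__ == "__main__":
    messy = '  <http://example.org/s>\t<http://example.org/p>   "x  y"@en .  \r\n'
    print(canonical_line(messy))            # period directly after the object
    print(canonical_line(messy, dot=" ."))  # the ' .' variant
```

Output in this shape is what lets byte-oriented tools such as sort and grep treat identically written triples as identical lines; it does nothing about variation inside IRIs (the ":80" case mentioned above).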
Received on Friday, 18 May 2012 14:33:45 UTC