- From: Stan Devitt <stan.devitt@agfa.com>
- Date: Fri, 17 Jun 2005 09:17:47 -0400
- To: public-rdf-dawg@w3.org
>> I can also live with no "." in [84] NCCHAR (as is actually the case for
>> cwm and jena)
>> but not as it is in 1.397
>> http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR
>> Hm.. actually still prefer "'.' in but not ending a QName"
>>
>>
>
> Jos,
>
> This matter came up again when the new syntax grammar was first drafted after
> F2F5. There is a tradeoff of complex grammar (making a special case of the last
> character, which also be the first) and being more general and more aligned with
> XML NCNAMES. In the end it came down to being more like XML.
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0169

that escaped my attention and I'm now very confused ...

Jos and Andy,

I understand the desire to be "more like XML" in terms of tokenization, but I think it is a big mistake to ignore the current heavy use of "." as a statement separator. It is okay to be more XML-like, provided you deal sensibly with this reality.

I am looking at this largely from the point of view of SPARQL being as consistent as possible with the n3 family of grammars, so I am considering the consequences across the whole family of languages. Specifically, I am looking at:

1) avoiding use patterns that are error prone or easily misinterpreted by authors or readers;
2) readability in general;
3) robustness under editing.

Tokens of the form "a.b" are NOT a big problem: in general, authors don't run statements together without intervening space, so they are in no danger of making accidental mistakes. These tokens should be allowed.

The problem is specifically the "." at the end of a token. A small search of many collections of authored triples shows a large number of triples that follow traditional sentence structure, ending with a "." placed immediately after the last token with no space in between. Given the analogy with sentence structure, it seems very natural for authors to do so.

Consider the following fragments (quoted here using >> ... << to clarify where the fragments start and stop):

1. Authors are not used to making a distinction between

   >>a b c.<<   and   >>a b c .<<

   At the very least, this means that a large number of existing source documents need to be changed.

2. The above example becomes especially problematic in places where the terminating "." is optional, such as in formulas:

   >>{a b c.}<<   and   >>{a b c}<<   and   >>{a b c .}<<

   All three are now syntactically valid, but they may have different meanings (the first reads the object as the name "c.", the other two as "c"), all depending on the author noticing the presence or absence of a space. Consider the large number of errors found in C programs because of the two different meanings of "a = b" and "a == b". This is similar, and you actually have a chance here to avoid a similar pitfall.

3. Consider formulas spread across multiple lines:

   >>{
   a b c .
   d e f .
   }<<

   As written, you can permute the rows without changing the meaning or the parsability. Now change the spacing:

   >>{
   a b c .
   d e f.
   }<<

   It still parses, albeit to a different token, but I can no longer permute the rows and still parse it. Also, the simple typographical error of forgetting a space after the "f" changes the meaning entirely, and the author might not even notice. In this example, the author might even have actually intended "f.". Either way, it also makes cut and paste more error prone.

Conclusion

Given that "." has a very sensitive and specific punctuation role in the "n3 family" of grammars, we should strive for "XML-like" rather than 100% compliance. I suggest that the reasonable choice here is to allow "." inside tokens, but not at the end.

ps. I must confess that I have never seen a name of the form "a." used in practice in XML documents.
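To make the suggested rule concrete, here is a minimal Python sketch of a toy tokenizer. It assumes a simplified, hypothetical name-character set (ASCII letters, digits, "_" and "-", with no prefixes or colons) rather than the real [84] NCCHAR production, and compares an XML-like rule, where "." may appear anywhere after the first character, with the suggested rule, where "." may appear inside a name but not at its end. The pattern names and the tokenize helper are illustrative only, not taken from any SPARQL or n3 grammar.

```python
import re

# Hypothetical, simplified name characters; the real NCCHAR production
# in the draft grammar covers many more characters (and QName prefixes).
NAME_START = r"[A-Za-z_]"
NAME_CHAR = r"[A-Za-z0-9_\-]"

# XML-like rule: '.' may appear anywhere after the first character,
# including as the last character of the name.
XML_LIKE_NAME = rf"{NAME_START}(?:{NAME_CHAR}|\.)*"

# Suggested rule: '.' may appear inside a name, but the name must end
# with a non-dot character, so a trailing '.' is left as a separator.
NO_TRAILING_DOT_NAME = rf"{NAME_START}(?:(?:{NAME_CHAR}|\.)*{NAME_CHAR})?"


def tokenize(text: str, name_pattern: str) -> list[str]:
    """Split text into names and the punctuation tokens '.', '{', '}'.

    Raises ValueError on any other character.
    """
    # Skip leading whitespace, then try a name before single punctuation.
    token_re = re.compile(rf"\s*({name_pattern}|[.{{}}])")
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = token_re.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens
```

With the XML-like pattern, tokenize("a b c.", XML_LIKE_NAME) yields ['a', 'b', 'c.'], so the statement terminator is silently absorbed into the name; with the suggested pattern, tokenize("a b c.", NO_TRAILING_DOT_NAME) yields ['a', 'b', 'c', '.'], while "a.b" still comes out as a single name. The cost, as noted in the quoted reply, is a regular expression that must treat the last character of a name specially.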
Received on Friday, 17 June 2005 13:17:55 UTC