Re: '.' in (but not ending) a QName

>> I can also live with no "." in [84] NCCHAR (as is actually the case for
>> cwm and jena)
>> but not as it is in 1.397
>> http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR
>> Hm.. actually still prefer "'.' in but not ending a QName"
>>
>
> Jos,
>
> This matter came up again when the new syntax grammar was first drafted
> after F2F5.  There is a tradeoff of complex grammar (making a special case
> of the last character, which can also be the first) and being more general
> and more aligned with XML NCNames.  In the end it came down to being more
> like XML.
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0169

that escaped my attention and I'm now very confused ...

Jos and Andy,

I understand the desire to be "more like XML" in terms of tokenization,
but I think it is a big mistake to ignore the current heavy use of "." as
a statement separator.  It is okay to be more XML-like provided you deal
sensibly with this reality.

I am looking at this largely from the point of view of SPARQL being as
consistent as possible with the n3 family of grammars, and so I am
considering the consequences for the whole family of languages.
Specifically I am looking at

1) avoiding usage patterns that are error-prone or easily misinterpreted by
authors or readers.
2) readability in general.
3) robustness under editing.

Tokens of the form "a.b" are NOT a big problem: authors do not, in general,
run statements together without intervening space, so there is little
danger of accidental mistakes.  These tokens should be allowed.

The problem is specifically the "." at the end of the token.  A small
search of many collections of authored triples shows a large number of
triples that follow traditional sentence structure, ending with "."
immediately after the last token with no intervening space.  It seems to be
very natural for authors to do so given the analogy with sentence structure.

Consider the following fragments (quoted here using >> ... << to
clarify where the fragments start and stop).

1.     Authors are not used to making a distinction between

        >>a  b  c.<<      and   >>a b c  .<<

At the very least, this means that a large number of existing source
documents need to be changed.
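
To get a rough idea of how many existing documents would be affected, one
could scan them for a "." glued directly to the preceding name.  The Python
below is only a sketch of that idea: the function name and the regular
expression are my own assumptions, it is not taken from cwm, jena or any
other tool, and it ignores string literals, URIs and numbers, so it will
over-report on real data.

```python
import re

# Hypothetical pattern: a name character immediately followed by '.',
# which is in turn followed by whitespace, '}' or the end of the line.
TRAILING_DOT = re.compile(r"[A-Za-z0-9_-]\.(?=\s|\}|$)")

def find_glued_terminators(text):
    """Return (line_number, line) pairs where '.' directly follows a name."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if TRAILING_DOT.search(line):
            hits.append((number, line.strip()))
    return hits
```

Run over the four lines "a b c.", "a b c .", "x a.b y ." and "{a b c.}",
this flags only the first and the last; the inner "." of "a.b" is left alone.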

2.   The above example becomes especially problematic in places where the
use of a terminating "." is optional, such as in formulas.

               >>{a b c.}<<     and   >>{a b c}<<   and >>{a b c .}<<

Now all three are syntactically valid, but they may have different meanings,
all dependent on the author noticing the presence or absence of a space.

Consider the large number of errors found in "C" programs because of the
two different meanings of "a = b" and "a == b".  This is similar, and you
actually have a chance here to avoid a similar pitfall.
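
The difference can be made concrete with a toy tokenizer.  This is a
minimal sketch in Python, assuming two deliberately simplified name rules
of my own (ASCII only, no prefixes); they are stand-ins for the two
candidate behaviours and are NOT the actual SPARQL or n3 productions.

```python
import re

# Hypothetical, simplified name rules (not the real grammar productions).
# XML-like: '.' may appear anywhere after the first character.
NAME_XML_LIKE = r"[A-Za-z_][A-Za-z0-9_.-]*"
# Alternative: '.' may appear inside a name but never as its last character.
NAME_NO_TRAILING_DOT = r"[A-Za-z_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?"

def tokenize(text, name_rule):
    """Split text into names and the punctuation tokens '{', '}' and '.'."""
    pattern = re.compile(r"\s*(" + name_rule + r"|[{}.])")
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens
```

With the XML-like rule, tokenize("{a b c.}", NAME_XML_LIKE) yields the
object "c." and no terminator, while tokenize("{a b c .}", NAME_XML_LIKE)
yields the object "c" followed by ".".  With the second rule both spellings
give the same tokens, and "a.b" still comes out as a single name.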


3.   Consider formulas spread across multiple lines.

>>{
      a   b   c  .
     d   e    f   .
    }
<<

As written, you can permute the rows and not change the meaning or the
parsability.   Now change the spacing.

>>{
      a   b   c  .
     d   e    f.
    }
<<

It still parses, albeit with a different token, but I can't permute the rows
and still parse it.  Also, the simple typographical error of forgetting a
space after the "f" changes the meaning entirely, and the author might not
even notice.  In this example the author might even have actually intended
"f.".  Either way, it also makes cut and paste more error-prone.
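
The permutation problem can be reproduced with a very small parser.  The
Python below is a sketch under my own assumptions: tokens are simply
whitespace-separated (which models a rule where a trailing "." stays part
of the name, so "f." is one token), a statement is exactly three names, and
the "." after the last statement of a formula is optional.  The function
name parse_formula is hypothetical.

```python
def parse_formula(text):
    """Parse '{ s p o . s p o . }' where the final '.' is optional.

    Returns a list of (subject, predicate, object) tuples and raises
    ValueError when the token sequence does not fit that shape.
    """
    tokens = text.split()
    if len(tokens) < 2 or tokens[0] != "{" or tokens[-1] != "}":
        raise ValueError("formula must be wrapped in { }")
    body = tokens[1:-1]
    triples, i = [], 0
    while i < len(body):
        triple = body[i:i + 3]
        # A bare '.' token cannot stand in for a subject, predicate or object.
        if len(triple) < 3 or "." in triple:
            raise ValueError(f"incomplete statement at token {i}")
        triples.append(tuple(triple))
        i += 3
        # Between statements a '.' is required; after the last it is optional.
        if i < len(body):
            if body[i] != ".":
                raise ValueError(f"expected '.' but found {body[i]!r}")
            i += 1
    return triples
```

Under these assumptions "{ a b c . d e f. }" parses, with "f." as the object
of the second statement, but the permuted "{ d e f. a b c . }" is rejected
because nothing separates "f." from "a".  With a space before every ".",
both orders parse to the same two statements.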


Conclusion

Given that "." has a very sensitive and specific punctuation role in the
"n3 family" of grammars, we should strive for "XML-like" rather than 100%
compliance.  I suggest that the reasonable approach here is to allow "."
inside tokens, but not at the end.

PS.  I must confess that I have never seen a name of the form "a." used
in practice in XML documents.



 

Received on Friday, 17 June 2005 13:17:55 UTC