- From: Stan Devitt <stan.devitt@agfa.com>
- Date: Fri, 17 Jun 2005 09:17:47 -0400
- To: public-rdf-dawg@w3.org
>> I can also live with no "." in [84] NCCHAR (as is actually the case for
>> cwm and jena)
>> but not as it is in 1.397
>> http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR
>> Hm.. actually still prefer "'.' in but not ending a QName"
>>
>>
>
> Jos,
>
> This matter came up again when the new syntax grammar was first drafted after
> F2F5. There is a tradeoff of complex grammar (making a special case of the last
> character, which also be the first) and being more general and more aligned with
> XML NCNAMES. In the end it came down to being more like XML.
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0169
that escaped my attention and I'm now very confused ...
Jos and Andy,
I understand the desire to be "more like XML" in terms of tokenization,
but I think it is a big mistake to ignore the current heavy use of "."
as a statement separator. It is okay to be more XML-like, provided you
deal sensibly with this reality.
I am looking at this largely from the point of view of SPARQL being as
consistent as possible with the n3 family of grammars
and so I am considering the consequences in the whole family of languages.
Specifically I am looking at
1) avoiding use patterns that are error prone or easily misinterpreted by
authors or those reading.
2) readability in general.
3) robustness under editing.
The tokens of the form "a.b" are NOT a big problem as in general authors
don't run statements together
without intervening space and are in no danger of making accidental
mistakes. These tokens should be allowed.
The problem is specifically the "." at the end of the token. A small
search of several collections of authored triples shows
a large number of triples that follow traditional sentence structure,
ending with a "." that occurs with no space immediately
after the last token. It seems to be very natural for authors to do so,
given the analogy with sentence structure.
Consider the following fragments (quoted here using >> ... << to
clarify where the fragments start and stop).
1. Authors are not used to making a distinction between
>>a b c.<< and >>a b c .<<
At the very least, this means that a large number of existing source documents
need to be changed.
2. The above example becomes especially problematic in places where the
use of a terminating "." is optional such as
in formulas.
>>{a b c.}<< and >>{a b c}<< and >>{a b c .}<<
Now all three are syntactically valid but we may have different meanings
all dependent on the author noticing the presence
or absence of a space.
Consider the large number of errors found in "C" programs because of the
two different meanings of "a = b" and "a == b".
This is similar, and you actually have a chance here to avoid a similar
pitfall.
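To make the difference concrete, here is a minimal sketch of how the three formula fragments would tokenize if a name were allowed to end in ".". The regular expression and the tokenize helper are illustrative assumptions, not the actual SPARQL or n3 grammar; the name pattern is restricted to ASCII for brevity.

```python
import re

# Hypothetical name rule: "." may appear anywhere after the first
# character, including at the very end (the XML-NCName-like behaviour).
NAME_WITH_TRAILING_DOT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def tokenize(text, name_re=NAME_WITH_TRAILING_DOT):
    """Split text into name tokens and single-character punctuation."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = name_re.match(text, pos)
        if match:
            tokens.append(match.group())
            pos = match.end()
        else:
            # "{", "}", or a free-standing "." separator
            tokens.append(text[pos])
            pos += 1
    return tokens


for fragment in ("{a b c.}", "{a b c}", "{a b c .}"):
    print(fragment, "->", tokenize(fragment))
```

Under this assumed rule the first fragment ends in the name "c.", the second in the name "c", and only the third contains "." as a separate token, so a single space decides which name the author has written.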
3. Consider formulas spread across multiple lines.
>>{
a b c .
d e f .
}
<<
As written, you can permute the rows and not change the meaning or the
parsability. Now change the spacing.
>>{
a b c .
d e f.
}
<<
It still parses, albeit to a different token, but I can't permute the rows
and still parse it. Also, the simple typographical
error of forgetting a space after the "f" changes the meaning entirely, and
the author might not even notice.
In this example the author might even have actually intended "f.".
Either way, it also makes cut and paste more error prone.
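The row-permutation argument can be checked mechanically. The sketch below assumes a toy formula body in which statements are triples separated by a free-standing ".", the final "." is optional, and names may end in "."; the TOKEN pattern and parse_body are hypothetical helpers written for this illustration, and the changed-spacing rows are taken to be "a b c ." followed by "d e f." with no space before the last ".".

```python
import re

# Assumed token rule: a name (which may end in ".") or a free-standing ".".
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*|\.")


def parse_body(lines):
    """Parse rows of a formula body into a list of (s, p, o) triples."""
    statements, current = [], []
    for token in TOKEN.findall(" ".join(lines)):
        if token == ".":
            if len(current) != 3:
                raise ValueError(f"expected 3 terms before '.', got {current}")
            statements.append(tuple(current))
            current = []
        else:
            current.append(token)
    if current:  # the last statement may omit its terminating "."
        if len(current) != 3:
            raise ValueError(f"expected 3 terms at end, got {current}")
        statements.append(tuple(current))
    return statements


spaced = ["a b c .", "d e f ."]
unspaced = ["a b c .", "d e f."]

print(parse_body(spaced))          # two triples, objects "c" and "f"
print(parse_body(spaced[::-1]))    # rows swapped: still two triples
print(parse_body(unspaced))        # parses, but the object is now "f."
try:
    parse_body(unspaced[::-1])     # rows swapped: "d e f. a b c ."
except ValueError as err:
    print("permuted rows fail:", err)
```

With the spaced rows, either order parses to the same two triples. With the unspaced second row, the original order parses only because the trailing separator is optional, and swapping the rows produces six terms before the first separator.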
Conclusion
Given that "." has a very sensitive and specific punctuation role in the
"n3 family" of grammars, we should
strive for "XML like" rather than 100% compliance. I suggest that
a reasonable compromise here is to allow "." inside tokens,
but not at the end.
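As a rough sketch of what this suggestion could look like as a token rule, the pattern below accepts "." inside a name but never as its last character. It is an assumption for illustration only: the character classes are ASCII-only and the names PROPOSED_NAME and leading_name are hypothetical, not taken from the rq23 grammar.

```python
import re

# Hypothetical rule: first character is a letter or "_"; any later "."
# must be followed by at least one more name character.
PROPOSED_NAME = re.compile(r"[A-Za-z_](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?")


def leading_name(text):
    """Return the name token at the start of text, or None if there is none."""
    match = PROPOSED_NAME.match(text)
    return match.group() if match else None


for sample in ("a.b", "c.", "a.b.", "."):
    print(repr(sample), "->", repr(leading_name(sample)))
```

With this rule "a.b" is a single name, while "c." yields the name "c" and leaves the "." to be read as a statement separator, so >>a b c.<< and >>a b c .<< tokenize the same way.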
ps. I must confess that I have never seen a name of the form "a." used
in practice in XML documents.
Received on Friday, 17 June 2005 13:17:55 UTC