- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 27 Jun 2005 08:23:48 -0400
- To: Stan Devitt <stan.devitt@agfa.com>
- Cc: public-rdf-dawg-comments@w3.org
- Message-ID: <20050627122348.GB27833@w3.org>
Moving thread to the comments list...

This is a personal reply seeking minor clarification, not a DAWG response.

On Fri, Jun 17, 2005 at 09:17:47AM -0400, Stan Devitt wrote:
> > >> I can also live with no "." in [84] NCCHAR (as is actually the case for
> > >> cwm and jena)
> > >> but not as it is in 1.397
> > >> http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR
> > >> Hm.. actually still prefer "'.' in but not ending a QName"
> > >
> > > Jos,
> > >
> > > This matter came up again when the new syntax grammar was first drafted
> > > after F2F5. There is a tradeoff of complex grammar (making a special case
> > > of the last character, which can also be the first) and being more general
> > > and more aligned with XML NCNAMES. In the end it came down to being more
> > > like XML.
> > >
> > > http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0169
> >
> > that escaped my attention and I'm now very confused ...
>
> Jos and Andy,
>
> I understand the desire to be "more like XML" in terms of tokenization,
> but I think it is a big mistake to ignore the current heavy use of "."
> as a statement separator. It is okay to be more XML-like provided you
> deal sensibly with this reality.

As you probably observe, we are continuing to use '.' as a statement
separator. The constraint is that SPARQL triple patterns must be
separated by " ." instead of "." .

> I am looking at this largely from the point of view of SPARQL being as
> consistent as possible with the n3 family of grammars, and so I am
> considering the consequences in the whole family of languages.
> Specifically I am looking at
>
> 1) avoiding use patterns that are error prone or easily misinterpreted by
>    authors or those reading.
> 2) readability in general.
> 3) robustness under editing.
>
> The tokens of the form "a.b" are NOT a big problem as in general authors
> don't run statements together without intervening space and are in no
> danger of making accidental mistakes. These tokens should be allowed.
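
To make the two tokenization rules under discussion concrete, here is a minimal sketch in Python. It assumes simplified, ASCII-only name patterns; the names XML_LIKE, NO_TRAILING_DOT and tokenize are hypothetical and are not taken from the SPARQL grammar.

```python
import re

# Assumption: ASCII-only stand-ins for the grammar's name productions.
# XML NCName-like rule: '.' may appear anywhere after the first character,
# including at the very end of the name.
XML_LIKE = r"[A-Za-z_][A-Za-z0-9_\-.]*"

# Alternative rule: '.' may appear inside a name but never as its last character.
NO_TRAILING_DOT = r"[A-Za-z_][A-Za-z0-9_\-]*(?:\.+[A-Za-z0-9_\-]+)*"


def tokenize(text, name_pattern):
    """Split text into names and standalone '.' separators.

    Whitespace (and anything else unrecognized) is skipped silently;
    this is an illustration, not a validating lexer.
    """
    return re.findall(rf"(?:{name_pattern})|\.", text)


if __name__ == "__main__":
    for fragment in ("a b c.", "a b c .", "a.b c d."):
        print(repr(fragment))
        print("  XML-like:       ", tokenize(fragment, XML_LIKE))
        print("  no trailing '.':", tokenize(fragment, NO_TRAILING_DOT))
```

Under the XML-like rule a '.' written directly after the last name is absorbed into that name, so the separator is only seen when preceded by a space. Under the alternative rule both spellings yield the same tokens, but a name that really ends in '.' can no longer be written.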

I read this as an endorsement of '.'s in (but not at the end of) triple
patterns.

> The problem is specifically the "." at the end of the token. A small
> search of many collections of authored triples shows a large number of
> triples that follow traditional sentence structure of ending with "."
> occurring with no space immediately after the last token, and it seems
> to be very natural for authors to do so given the analogy with sentence
> structure.

While it is true that we may influence n3 or turtle to tokenize '.' at
the end of an NCNAME, we are only *directly* affecting the SPARQL
grammar, and not many SPARQL queries exist in the wild yet (nor are many
turtle documents machine-transformed into SPARQL queries at present).

> Consider the following fragments (quoted here using >> ... << to
> clarify where the fragments start and stop)
>
> 1. Authors are not used to making a distinction between
>
>    >>a b c.<< and >>a b c .<<
>
> At very least, this means that a large number of existing source documents
> need to be changed.
>
> 2. The above example becomes especially problematic in places where the
> use of a terminating "." is optional such as in formulas.
>
>    >>{a b c.}<< and >>{a b c}<< and >>{a b c .}<<
>
> Now all three are syntactically valid but we may have different meanings
> all dependent on the author noticing the presence or absence of a space.
>
> Consider the large number of errors found in "C" programs because of the
> two different meanings of "a = b" and "a == b". This is similar and you
> actually have a chance here to avoid a similar pitfall.
>
> 3. Consider formulas spread across multiple lines.
>
> >>{
> a b c .
> d e f .
> }
> <<
>
> As written, you can permute the rows and not change the meaning or the
> parsability. Now change the spacing.
>
> >>{
> a b c .
> d e f .
> }
> <<
>
> It still parses albeit to a different token, but I can't permute the rows
> and still parse it.
> Also, the simple typographical error of forgetting a space after the "f"
> changes the meaning entirely and the author might not even notice.
> In this example the author might even have actually intended "f.".
> Either way, it also makes cut and paste more error prone.
>
> Conclusion
>
> Given that "." has a very sensitive and specific punctuation role in the
> "n3 family" of grammars we should strive for "XML like" rather than 100%
> compliance. I suggest that what is reasonable here is to allow "." inside
> tokens, but not at the end.
>
> ps. I must confess that I have never seen a name of the form "a." used
> in practice in XML documents.

The downside of _not_ allowing "a." is that it will be impossible to
query RDF data in this form:

<rdf:Description>
  <foo:bar.>value</foo:bar.>
</rdf:Description>

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Monday, 27 June 2005 12:23:54 UTC