- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 27 Jun 2005 08:23:48 -0400
- To: Stan Devitt <stan.devitt@agfa.com>
- Cc: public-rdf-dawg-comments@w3.org
- Message-ID: <20050627122348.GB27833@w3.org>
Moving thread to the comments list...
This is a personal reply seeking minor clarification, not a DAWG response.
On Fri, Jun 17, 2005 at 09:17:47AM -0400, Stan Devitt wrote:
>
> >> I can also live with no "." in [84] NCCHAR (as is actually the case for
> >> cwm and jena)
> >> but not as it is in 1.397
> >> http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR
> >> Hm.. actually still prefer "'.' in but not ending a QName"
> >>
> >>
> >
> > Jos,
> >
> > This matter came up again when the new syntax grammar was first drafted after
> > F2F5. There is a tradeoff between a complex grammar (making a special case of
> > the last character, which may also be the first) and being more general and
> > more aligned with XML NCNAMES. In the end it came down to being more like XML.
> >
> > http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0169
>
> that escaped my attention and I'm now very confused ...
>
> Jos and Andy,
>
> I understand the desire to be "more like XML" in terms of tokenization,
> but I think it is a big mistake to ignore the
> current heavy use of "." as a statement separator. It is okay to be more
> XML like providing you deal sensibly
> with this reality.
As you probably observe, we are continuing to use '.' as a statement
separator. The constraint is that SPARQL triple patterns must be
separated by " ." instead of "." .
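
To make the difference concrete, here is a minimal sketch of the two
candidate tokenization rules. The regular expressions are simplified,
hypothetical stand-ins for the NCCHAR-based productions (ASCII only, no
prefixes), not the actual SPARQL grammar, and `tokenize` is an
illustrative helper rather than part of any existing parser.

```python
import re

# Hypothetical, simplified name rules (not the real SPARQL productions).
# Rule A: '.' may appear anywhere after the first character, including last.
NAME_DOT_ANYWHERE = re.compile(r"[A-Za-z_][A-Za-z0-9_\-.]*")
# Rule B: '.' may appear inside a name but not as its last character.
NAME_NO_TRAILING_DOT = re.compile(
    r"[A-Za-z_](?:[A-Za-z0-9_\-.]*[A-Za-z0-9_\-])?"
)


def tokenize(text, name_re):
    """Split text into name tokens and standalone '.' separators."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = name_re.match(text, pos)
        if match:
            tokens.append(match.group())
            pos = match.end()
        elif text[pos] == ".":
            tokens.append(".")
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


if __name__ == "__main__":
    for source in ("a b c.", "a b c .", "a.b c d ."):
        print(repr(source))
        print("  rule A:", tokenize(source, NAME_DOT_ANYWHERE))
        print("  rule B:", tokenize(source, NAME_NO_TRAILING_DOT))
```

Under rule A the trailing '.' in "a b c." is absorbed into the last
name, which is why the separator has to be written " ."; under rule B
both spellings produce the same tokens, and "a.b" stays a single name
either way.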
> I am looking at this largely from the point of view of SPARQL being as
> consistent as possible with the n3 family of grammars
> and so I am considering the consequences in the whole family of languages.
> Specifically I am looking at
>
> 1) avoiding use patterns that are error prone or easily misinterpreted by
> authors or those reading.
> 2) readability in general.
> 3) robustness under editing.
>
> The tokens of the form "a.b" are NOT a big problem as in general authors
> don't run statements together
> without intervening space and are in no danger of making accidental
> mistakes. These tokens should be allowed.
I read this as an endorsement of '.'s in (but not at the end of)
triple patterns.
> The problem is specifically the "." at the end of the token. A small
> search of many collections of authored triples shows
> a large number of triples that follow traditional sentence structure of
> ending with "." occurring with no space immediately
> after the last token, and it seems to be very natural for authors to do so
> given the analogy with sentence structure.
While it is true that we may influence n3 or turtle to tokenize '.' at
the end of an NCNAME, we are only *directly* affecting the SPARQL
grammar, and not many SPARQL queries exist in the wild yet (nor are
many turtle documents machine-transformed into SPARQL queries at
present).
> Consider the following fragments (quoted here using >> ... << to
> clarify where the fragments start and stop )
>
> 1. Authors are not used to making a distinction between
>
> >>a b c.<< and >>a b c .<<
>
> At very least, this means that a large number of existing source documents
> need to be changed.
>
> 2. The above example becomes especially problematic in places where the
> use of a terminating "." is optional such as
> in formulas.
>
> >>{a b c.}<< and >>{a b c}<< and >>{a b c .}<<
>
> Now all three are syntactically valid but we may have different meanings
> all dependent on the author noticing the presence
> or absence of a space.
>
> Consider the large number of errors found in "C" programs because of the
> two different meanings of "a = b" and "a == b",
> This is similar and you actually have a chance here to avoid a similar
> pitfall.
>
>
> 3. Consider formulas spread across multiple lines.
>
> >>{
> a b c .
> d e f .
> }
> <<
>
> As written, you can permute the rows and not change the meaning or the
> parsability. Now change the spacing.
>
> >>{
> a b c .
> d e f.
> }
> <<
>
> It still parses albeit to a different token, but I can't permute the rows
> and still parse it. Also, the simple typographical
> error of forgetting a space after the "f" changes the meaning entirely and
> the author might not even notice.
> In this example the author might even have actually intended "f." .
> Either way, it also makes cut and paste more error prone.
>
>
> Conclusion
>
> Given that "." has a very sensitive and specific punctuation role in the
> "n3 family" of grammars we should
> strive for "XML like" rather than 100% compliance. I suggest that
> reasonable here is to allow "." inside tokens,
> but not at the end.
>
> ps. I must confess that I have never seen a name of the form "a." used
> in practice in XML documents.
The downside of _not_ allowing "a." is that it will be impossible to
query RDF data in this form:
<rdf:Description>
<foo:bar.>value</foo:bar.>
</rdf:Description>
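
As a rough check that such a name is legal XML, the following sketch
parses a fragment of that shape with Python's standard library. The
namespace bound to `foo` is a hypothetical placeholder (the fragment
above does not declare one), and the closing tag is written out in full
so that the input is well-formed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOO_NS = "http://example.org/foo#"  # hypothetical namespace for 'foo'

doc = f"""<rdf:Description xmlns:rdf="{RDF_NS}" xmlns:foo="{FOO_NS}">
  <foo:bar.>value</foo:bar.>
</rdf:Description>"""

# ElementTree reports namespaced tags as "{namespace}localname".
root = ET.fromstring(doc)
child = root[0]
local_name = child.tag.split("}", 1)[1]

# The local name keeps its trailing '.', so a query language that cannot
# spell such a name has no way to refer to this property by QName.
print(local_name, child.text)
```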
--
-eric
office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
Shonan Fujisawa Campus, Keio University,
5322 Endo, Fujisawa, Kanagawa 252-8520
JAPAN
+1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell: +81.90.6533.3882
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 27 June 2005 12:23:54 UTC