Re: '.' in (but not ending) a QName from Eric Prud'hommeaux on 2005-06-27 (public-rdf-dawg-comments@w3.org from June 2005)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 27 Jun 2005 08:23:48 -0400
To: Stan Devitt <stan.devitt@agfa.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <20050627122348.GB27833@w3.org>
Moving thread to the comments list...

This is a personal reply seeking minor clarification, not a DAWG response.

On Fri, Jun 17, 2005 at 09:17:47AM -0400, Stan Devitt wrote:
> 
> >> I can also live with no "." in [84] NCCHAR (as is actually the case for
> >> cwm and jena)
> >> but not as it is in 1.397 
> >> http://www.w3.org/2001/sw/DataAccess/rq23/#rNCCHAR
> >> Hm.. actually still prefer "'.' in but not ending a QName"
> >> 
> >> 
> >
> > Jos,
> >
> > This matter came up again when the new syntax grammar was first drafted
> > after F2F5.  There is a tradeoff of complex grammar (making a special case
> > of the last character, which may also be the first) and being more general
> > and more aligned with XML NCNAMES.  In the end it came down to being more
> > like XML.
> >
> > http://lists.w3.org/Archives/Public/public-rdf-dawg/2005AprJun/0169
> 
> that escaped my attention and I'm now very confused ...
> 
> Jos and Andy,
> 
> I understand the desire to be "more like XML" in terms of tokenization, 
> but  I think it is a big mistake to ignore the 
> current heavy use of "." as a statement separator.   It is okay to be more 
> XML like providing you deal sensibly 
> with this reality.

As you probably observe, we are continuing to use '.' as a statement
separator. The constraint is that SPARQL triple patterns must be
separated by " ." instead of "." .
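
To make the " ." versus "." distinction concrete, here is a minimal Python
sketch. The token grammar is a deliberately simplified assumption (ASCII names
only, no prefixes, literals or variables), not the SPARQL grammar itself; the
names NAME, TOKEN and tokenize are hypothetical. It only illustrates what
happens when '.' is a legal name character, as in XML NCNames.

```python
import re

# Assumed, simplified grammar: a name may contain '.' anywhere after its
# first character (NCName-like); a lone '.' is a statement separator.
NAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
TOKEN = re.compile(rf"{NAME}|\.")


def tokenize(text):
    """Return the name and '.' tokens found in text, skipping whitespace."""
    return TOKEN.findall(text)


print(tokenize("a b c ."))  # the separator is its own token
print(tokenize("a b c."))   # the '.' is absorbed into the last name
```

Under this assumed rule, the first input yields four tokens and the second
yields three, with "c." as a single name.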

> I am looking at this largely from the point of view of SPARQL being as 
> consistent as possible with the n3 family of grammars
> and so I am considering the consequences in the whole family of languages. 
>   Specifically I am looking at
> 
> 1) avoiding use patterns  that are error prone or easily misinterpreted by 
> authors or those reading.
> 2) readability in general.
> 3) robustness under editing.
> 
> The tokens of the form "a.b"  are NOT a big problem as in general authors 
> don't run statements together 
> without intervening space and are in no danger of making accidental 
> mistakes.  These tokens should be allowed.

I read this as an endorsement of '.'s in (but not at the end of)
triple patterns.
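
For comparison, a sketch of the alternative rule ('.' allowed inside a name
but not at its end), again in Python and again over a simplified, assumed
grammar. The names SEGMENT, NAME, TOKEN and tokenize_no_trailing_dot are
hypothetical and exist only for this illustration.

```python
import re

# Assumed rule: a name is a run of segments joined by single dots, so it can
# contain '.' but can never end with one.
SEGMENT = r"[A-Za-z0-9_\-]+"
NAME = rf"[A-Za-z_][A-Za-z0-9_\-]*(?:\.{SEGMENT})*"
TOKEN = re.compile(rf"{NAME}|\.")


def tokenize_no_trailing_dot(text):
    """Tokenize so that a trailing '.' is always a separate token."""
    return TOKEN.findall(text)


print(tokenize_no_trailing_dot("a.b c d."))
print(tokenize_no_trailing_dot("a.b c d ."))
```

With this rule both inputs produce the same token sequence, so the presence or
absence of a space before the final '.' no longer matters, while "a.b" still
survives as one name.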

> The problem  is specifically the "." at the end of the token.  A small 
> search of many collections of authored triples shows 
> a large number of triples that follow traditional sentence structure of 
> ending with  "."  occurring with no space immediately
> after the last token, and it seems to be very natural for authors to do so 
> given the analogy with sentence structure.

While it is true that we may influence n3 or turtle to tokenize '.' at
the end of an NCNAME, we are only *directly* affecting the SPARQL
grammar, and not many SPARQL queries exist in the wild yet (nor are
many turtle documents machine-transformed into SPARQL queries at
present).

> Consider the  following fragments  (quoted here using >> ... <<  to 
> clarify where the fragments start and stop )
> 
> 1.     Authors are not used to making a distinction between 
> 
>         >>a  b  c.<<      and   >>a b c  .<<
> 
> At the very least, this means that a large number of existing source
> documents need to be changed.
> 
> 2.   The above example becomes especially problematic in places where the 
> use of  a terminating "." is optional such as
> in formulas. 
> 
>                >>{a b c.}<<     and   >>{a b c}<<   and >>{a b c .}<<
> 
> Now all three are syntactically valid but we may have different meanings 
> all dependent on the author noticing the presence
> or absence of a  space. 
> 
> Consider the large number of errors found in  "C" programs because  of the 
> two different meanings of   "a = b" and  "a == b".
> This is similar and you actually have a chance here to avoid a similar 
> pitfall.
> 
> 
> 3.   Consider formulas spread across multiple lines.
> 
> >>{
>       a   b   c  .
>      d   e    f   .
>     }
> <<
> 
> As written, you can permute the rows and not change the meaning or the 
> parsability.   Now change the spacing.
> 
> >>{
>       a   b   c  .
>      d   e    f.
>     }
> <<
> 
> It still parses albeit to a different token, but I can't permute the rows 
> and still parse it.  Also,  the simple typographical
> error of forgetting a space after the "f" changes the meaning entirely and 
> the author might not even notice.
> In this example the author might even have actually intended "f." .
> Either way, it also makes cut and paste more error prone.
> 
> 
> Conclusion
> 
> Given that "."  has a very sensitive and specific punctuation role  in the 
> "n3  family" of grammars  we should 
> strive for "XML like" rather than 100% compliance.    I suggest that the
> reasonable approach here is to allow "." inside tokens, 
> but not at the end.
> 
> ps.   I must  confess that I have never seen a name of the form  "a." used 
> in practice in XML documents.

The downside of _not_ allowing "a." is that it will be impossible to
query RDF data in this form:

<rdf:Description>
  <foo:bar.>value</foo:bar.>
</rdf:Description>
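
As a quick check that an element name ending in '.' is legal XML, the
following Python sketch parses a well-formed variant of that fragment with the
standard library's ElementTree. The namespace declarations, and in particular
the http://example.org/foo# URI bound to foo, are assumptions added so that
the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the fragment; the foo namespace URI is hypothetical.
doc = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foo="http://example.org/foo#">
  <foo:bar.>value</foo:bar.>
</rdf:Description>"""

root = ET.fromstring(doc)
prop = root[0]  # the single property element

print(prop.tag)   # expanded name, local part ends with '.'
print(prop.text)
```

The parser accepts the document, and the property's local name keeps its
trailing '.', which is the name a query would have to be able to spell as a
QName.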

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 27 June 2005 12:23:54 UTC