RE: N3 paths : use of . from Seaborne, Andy on 2003-02-13 (www-rdf-interest@w3.org from February 2003)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Thu, 13 Feb 2003 15:08:20 -0000
To: "'Tim Berners-Lee'" <timbl@w3.org>
Cc: "'www-rdf-interest@w3.org'" <www-rdf-interest@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F061D64CC@0-mail-1.hpl.hp.com>
   Comments inline


Tim Berners-Lee wrote:
> First of all, thank you for adding N3 support to Jena!
> I owe you a beer.
> 
> 
> On Tuesday, Feb 11, 2003, at 10:16 US/Pacific, Seaborne, Andy wrote:
> 
> > Tim, Dan,
> >
> > I came across this while updating Jena's N3 parser:
> >
> > http://www.w3.org/2000/10/swap/doc/Shortcuts.html
> >
> > I noticed that N3 is using '.' (dot) for a path separator.  I agree 
> > that '.' is more object-like syntax; I also note that N3 isn't a 
> > standard as such.
> >
> > I don't wish to be picky, but N3 is a very convenient way to write
> > RDF. This feature seems to require a choice between reading QNames
> > and the path syntax, and it also reduces the usability of parser
> > generators.
> >
> >
> > 1/ Isn't '.' legal in the LocalPart of QNames? XML 1.1 allows '.' in
> > NameChar, and XML Namespaces allows it in both the prefix and the
> > LocalPart, as they are derived from NameChar through NCName.
> >
> 
> Yes, it is valid in XML. No, it isn't valid in N3.
> If you use "-" or ".", cwm falls back to <name.x> syntax. You can
> use "_", as in C, Python and similar languages.
> 
> Motivation: Keep operators for use as operators.

OK - you can always write any URI using the <fullURI> syntax.  
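A minimal Python sketch of the writer-side rule Tim describes: emit a qname only when the local part is a plain identifier (letters, digits, "_"), and otherwise fall back to the bracketed URI form. The function name `write_term`, the prefix table and the exact "safe" character class are assumptions for illustration, not cwm's or Jena's actual code; the sketch writes the full URI where cwm may write a relative one.

```python
import re

# Hypothetical rule: a local name is "safe" for a qname if it is a plain
# identifier. '.' and '-' are excluded, so such names fall back to <...>.
SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def write_term(uri, prefixes):
    """Return an N3 term for `uri`: a qname if possible, else <uri>."""
    for prefix, ns in prefixes.items():
        local = uri[len(ns):]
        if uri.startswith(ns) and SAFE_LOCAL.match(local):
            return f"{prefix}:{local}"
    # Any URI can always be written out in full.
    return f"<{uri}>"


prefixes = {"ex": "http://host/ns#"}
print(write_term("http://host/ns#name_x", prefixes))  # ex:name_x
print(write_term("http://host/ns#name.x", prefixes))  # <http://host/ns#name.x>
```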

The issues seem to be:

1/ Whether to align with XML Qnames 

I prefer that the user can write terms as they would in XML. If the
vocabulary of the dataset they are working with comes from XML, then the
prefixes and local parts may contain '.'. It would be nice to allow the user
to write terms as they would in XML (i.e. as qnames).

This isn't essential. It does, of course, require special handling for the
statement terminator ".". A trailing dot on an object URI is a nasty case and
requires looking beyond the dot.
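A rough Python sketch of the XML-aligned option, where '.' may appear inside a prefix or local part: scan the name greedily, then give back any trailing dots so that a statement-ending '.' is not swallowed. The regular expression is a simplified stand-in for XML NameChar, and `lex_qname` is a hypothetical helper, not part of Jena.

```python
import re

# Simplified approximation of an XML-style QName: '.' and '-' are allowed
# after the first character of the prefix and of the local part.
QNAME = re.compile(r"([A-Za-z_][\w.\-]*)?:([A-Za-z_][\w.\-]*)?")


def lex_qname(text, pos):
    """Return (qname, end) for the QName starting at text[pos].

    Dots are scanned greedily; trailing dots are then given back so that
    'ex:name.x.' lexes as 'ex:name.x' followed by the statement terminator.
    """
    m = QNAME.match(text, pos)
    if m is None:
        raise ValueError(f"no qname at offset {pos}")
    end = m.end()
    # "Looking beyond the dot": a dot not followed by a name character
    # is handed back to the caller as the terminator.
    while end > pos and text[end - 1] == ".":
        end -= 1
    return text[pos:end], end


print(lex_qname("ex:name.x .", 0))  # ('ex:name.x', 9)
print(lex_qname("ex:name.x.", 0))   # ('ex:name.x', 9)
print(lex_qname("a:x1.a:y1", 0))    # ('a:x1.a', 6)
```

The last line shows the conflict: once dots are name characters, the path `a:x1.a:y1` no longer splits at the dot, which is exactly the choice between QNames and path syntax.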


2/ Whether the double use of . (for path separator and statement
separator/terminator) is confusing.  I think it is.  But I don't see that
using "!" is ugly.


> 
> > Example: 'ex:name.x' => http://host/ns#name.x
> >
> > I might have got this wrong. [Fortunately, while "name." is legal as
> > well, white space is allowed before the statement separator, so here
> > the standard technique of greedy tokenizing works OK.] You can do smart
> > stuff by lexing to just break on whitespace, looking for multiple
> > occurrences of ':' or <> URIs in a long string, but it is a bad fit to
> > standard tools and the quality of error messages usually suffers.
> >
> 
> When you look ahead, you have to check the character after the dot for
> being a name character (alphanumeric, _) or an opener from "({[".
> If it is whitespace or EOF or a closer from "})]", then it is a statement
> terminating period.
> 
> Sorry!

That's doable even with simple recursive descent. An expert (i.e. not me)
would have to give an opinion on how naturally this fits LALR(1) systems in
practice, but presumably it does.

It gets visually messy with 

  a b c.d.e . f.g.h x y.
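A small Python sketch of the one-character lookahead Tim describes, applied to the line above. Names are simplified to runs of alphanumerics, "_" and ":"; `tokenize`, `classify_dot` and the token kinds are hypothetical, and any character after a dot that the rule does not mention is treated as an error.

```python
import string

NAME_CHARS = set(string.ascii_letters + string.digits + "_")
OPENERS = set("({[")
CLOSERS = set("})]")


def classify_dot(text, i):
    """Decide whether the '.' at text[i] is a path separator or a terminator."""
    nxt = text[i + 1] if i + 1 < len(text) else None
    if nxt is not None and (nxt in NAME_CHARS or nxt in OPENERS):
        return "PATH"
    if nxt is None or nxt.isspace() or nxt in CLOSERS:
        return "END"
    raise SyntaxError(f"cannot classify '.' at offset {i}")


def tokenize(text):
    """Return (kind, value) pairs; kind is NAME, PATH or END."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c == ".":
            tokens.append((classify_dot(text, i), "."))
            i += 1
        else:
            # Simplified name: a run of name characters and ':'.
            j = i
            while j < len(text) and (text[j] in NAME_CHARS or text[j] == ":"):
                j += 1
            if j == i:
                raise SyntaxError(f"unexpected {c!r} at offset {i}")
            tokens.append(("NAME", text[i:j]))
            i = j
    return tokens


print([v if k == "NAME" else k for k, v in tokenize("a b c.d.e . f.g.h x y.")])
# ['a', 'b', 'c', 'PATH', 'd', 'PATH', 'e', 'END', 'f', 'PATH', 'g', 'PATH', 'h', 'x', 'y', 'END']
```

One character of lookahead after each dot is enough to decide its role.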

> 
> I did use "!", but there was an outcry from a couple of users. I
> checked around and found happiness with a little lookahead from parser
> people. I gather lex can handle it.
> 
> >
> > 2/ Writing parsers: I do not claim too much expertise with parser
> > generators, but having a syntax that is whitespace sensitive makes it
> > hard to use standard parser tools (antlr, javacc, yacc/flex etc.). (I
> > don't know yapps well enough.) Similarly, context-sensitive
> > tokenization is not so simple in these tools.
> >
> 
> Indeed.  But possible.  Would you (as user *and* implementer!) be 
> happier with reverting to "!" ?
> 

User mode:
I don't find "!" ugly. Given that "." already has a meaning, I prefer "!",
aside from the XML qnames issue.

Implementer mode:
As an implementer of an N3 parser, I will follow "the standard", i.e.
whatever cwm does. Alignment with cwm for N3 data interchange is more
important than alignment with XML QNames. (Aside: we don't support
formulae, but the parser does cope with everything and can be used
separately; it's only at the RDF generation step that errors are raised.)
Currently, Jena neither writes nor parses qnames containing dots.

> 
> 
> >
> > To date, I have tuned the Jena parser so that it is able to parse
> > everything in 2000/10/swap and to handle all RDF URIs, but on this one
> > I don't think I can do both with a common parser generator (I use
> > antlr) due to the ambiguities of '.' in the middle of path
> > expressions. I am not saying that it is technically impossible (it
> > might be, might not; I haven't checked closely enough), but it is
> > going to require a very complex grammar stage, using a near-trivial
> > lexer with whitespace processing in the parser.
> >
> >
> Yuk.
> 
> > N3 is sufficiently small that a hand-coded parser is only as complex
> > as a generated one, but requiring hand-crafted parsers (i.e. making it
> > tricky to use standard tools) means that the grammar can't be written
> > down succinctly (i.e. in the natural style) for other developers.
> 
> Agreed.
> 
> > From my experience, I reckon the parser generator approach was
> > slightly quicker than hand-crafting would have been. Maintenance is
> > less costly with a generated parser.
> >
> 
> Agreed.
> 
> >
> > 3/ The original style, '!', is good for BCPL programmers :-)
> >
> 
> Really? I was thinking of email paths!     
> timbl@vxcrna!cernvax!mcvax!fnal
> 
> There is very little use of paths out there, so I could switch it
> back. In fact the cwm parser still groks both.
> 
> I found "!" looked kinda ugly with variables
> 
> {  ?my!tt:line34 ?x } => { ?my!tf:line56 ?x }.

Bluff :-)  Does not parse. (cwm.py 1.123)  But I see what you are getting
at.

Seriously, as '.' already has a meaning in N3, as the statement terminator,
my preference is to use one item of punctuation for one thing, independent
of the XML issue.

----

It seems from cwm that paths are allowed in subject and object slots, so a
path is more than a string of properties. Is the termination a blank node?

----

By the way:
    :X1 a:x1.a:y1 :Y1 .
generates
    this     <http://www.w3.org/2000/10/swap/log#forSome> :_g0 .
    :X1     :_g0 :Y1 .
    a:x1     a:y1 :_g0 .
which isn't a string of properties a:x1 a:y1
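A minimal Python sketch that reproduces the expansion in the cwm output above (ignoring the `log:forSome` declaration): a path `x.p` stands for a fresh blank node `b`, with the extra triple `x p b`. The class and method names are hypothetical; this is not cwm's code.

```python
import itertools


class PathExpander:
    """Expand '.'-paths the way the cwm output above suggests."""

    def __init__(self):
        self._ids = itertools.count()
        self.triples = []

    def expand(self, steps):
        """Expand a path like ['a:x1', 'a:y1'] and return the node it denotes."""
        node = steps[0]
        for prop in steps[1:]:
            blank = f"_:g{next(self._ids)}"
            self.triples.append((node, prop, blank))  # node prop blank .
            node = blank
        return node

    def statement(self, subj, pred, obj):
        """Add a statement; each slot is a path (a one-element list is a plain term)."""
        s, p, o = self.expand(subj), self.expand(pred), self.expand(obj)
        self.triples.append((s, p, o))


e = PathExpander()
e.statement([":X1"], ["a:x1", "a:y1"], [":Y1"])
for t in e.triples:
    print(*t)
# a:x1 a:y1 _:g0
# :X1 _:g0 :Y1
```

Under this reading the path in the predicate slot becomes a blank node used as a property, rather than the chain `:X1 a:x1 _:b . _:b a:y1 :Y1`.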


This
   :X2 a:x2 _:a . _:a a:y2 :Y2 .
generates:
    this     <http://www.w3.org/2000/10/swap/log#forSome> :_ga .
    :X2     a:x2 :_ga .
    :_ga     a:y2 :Y2 .
and
   :X3 a:x3 [ a:y3 :Y3 ].
generates
    
    :X3     a:x3  [
             a:y3 :Y3 ] .
(same but comes out differently).
Should they not all be the same structure?  

(I confess: Jena generates nearly the same as case one, and is wrong, not
RDF. The triple emission messes up and it tries for a blank property.
Oops.)

> 
> If you don't mind your message being public, please reply cc
> www-rdf-interest@w3.org in case anyone else has strong views.
> 
> > 	Andy
> 

See <>


[Aside:

To everyone:

PS Jena's N3 grammar is:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jena/jena2/src/com/hp/hpl/jena/n3/n3.g?rev=1.3&content-type=text/vnd.viewcvs-markup

Any helpful suggestions on grammar design gratefully received.  I'm a novice
at this game. ]
Received on Thursday, 13 February 2003 10:08:30 UTC