Re: pls split turtle lexical details out of the grammar

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Tue, 4 May 2004 13:50:56 +0100
To: Dan Connolly <connolly@w3.org>
Cc: public-cwm-talk@w3.org
Message-Id: <20040504135056.361bb7e3@hoth.ilrt.bris.ac.uk>

On Tue, 13 Apr 2004 11:17:20 -0500, Dan Connolly <connolly@w3.org> wrote:

> The turtle grammar starts out all formal...
> "This EBNF is the notation used in XML 1.0 second edition over an
> alphabet of [UNICODE] characters."
>  -- http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/
> 
> but then it's not, really:
> 
> [[
> relativeURI ::= character* with escapes as defined in the N-Triples
> section 3.3 URI References. This is then used as a relative URI and
> resolved against the current base URI to give an absolute URI reference.
> 
> string ::= character* with escapes as defined in N-Triples section 3.2
> Strings
> ]]
> 
> I'd much rather have the lexical details specified separately
> and have the grammar be a real formal grammar, suitable for
> use with yacc or the equivalent.

Well, that's exactly the same form as used in the RDF Test Cases REC
for N-Triples.

Anyway.  I was just trying to avoid making the document a lot bigger.
EBNF is one way to present grammars but always needs translation
into lex and yacc forms since there is hardly ever a clear indication
of tokens (and that is sometimes an implementation choice).

Raptor's Turtle parser does use lex and yacc, but I chose to handle
some of the tokens separately, such as whitespace processing, rather
than follow the EBNF precisely.
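
To illustrate the kind of split I mean, here is a rough Python sketch
of a lexer that swallows whitespace and comments itself, so the grammar
never has to mention them. The token names and patterns are made up for
the example; this is not Raptor's lexer and not the real Turtle token
definitions.

```python
import re

# Hypothetical token set for illustration only; the patterns are
# deliberately simplified and do not match the Turtle draft exactly.
TOKEN_SPEC = [
    ("WS",      r"[ \t\r\n]+"),
    ("COMMENT", r"#[^\n]*"),
    ("URIREF",  r"<[^>\s]*>"),
    ("STRING",  r'"(?:[^"\\\n]|\\.)*"'),
    ("QNAME",   r"[A-Za-z_][A-Za-z0-9_-]*:[A-Za-z0-9_-]*|:[A-Za-z0-9_-]+"),
    ("DOT",     r"\."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (token_name, lexeme) pairs, dropping whitespace and comments."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        # Whitespace and comments are consumed here, in the lexer,
        # so the parser only ever sees significant tokens.
        if m.lastgroup not in ("WS", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

A yacc-style grammar sitting on top of this would then be written
purely in terms of URIREF, QNAME, STRING and DOT.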

The lexical details alone are not enough, and some things are not
useful to express in the EBNF at all, such as the escaping rules for
Unicode characters, or saying that language is not significant for
most RDF typed literals.
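
As an example of something that is easier to state as a procedure than
as EBNF, here is a minimal Python sketch of decoding the N-Triples
string escapes (\t, \n, \r, \", \\, \uXXXX and \UXXXXXXXX). The
function name is made up, and accepting lowercase hex digits is a
leniency I have assumed for the sketch, not something the N-Triples
text requires.

```python
import re

# Single-character escapes allowed in N-Triples strings.
SIMPLE = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}

# A backslash followed by u + 4 hex digits, U + 8 hex digits,
# or any other single character (possibly none, at end of input).
ESCAPE_RE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.?))", re.DOTALL
)


def unescape_ntriples_string(s):
    """Decode the escapes in the body of a string (without the quotes)."""
    def repl(m):
        u4, u8, ch = m.groups()
        if u4 or u8:
            # \uXXXX or \UXXXXXXXX: a Unicode code point in hex.
            return chr(int(u4 or u8, 16))
        if ch in SIMPLE:
            return SIMPLE[ch]
        # Unknown escape, malformed \u or \U, or a trailing backslash.
        raise ValueError(f"bad escape sequence: \\{ch}")
    return ESCAPE_RE.sub(repl, s)
```

For instance, unescape_ntriples_string(r"caf\u00E9") gives "café".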
 
> By way of comparison/contrast, see
> 
>   http://www.w3.org/2000/10/swap/rdfn3-gram.html
>   http://www.w3.org/2000/10/swap/rdfn3.g
>   $Id: rdfn3.g,v 1.18 2002/08/15 23:20:36 connolly Exp $

That's sort of useful but omits lots of detail such as definitions
of URIREF and QNAME - see recent bugs I posted to public-cwm-bug
on the problems with leaving that lax.

> p.s. I see
> "Turtle is a work in progress, and I am looking for feedback"
> 
> but I wasn't sure where you wanted the feedback sent. I was
> going to copy www-archive, but then grammar engineering is
> reasonably high on the cwm development agenda too, so I
> copied public-cwm-talk.

Fine with me.  I was wondering if you'd be happy with using this list
for Turtle discussions, even if they digress from cwm and N3?

Dave
Received on Tuesday, 4 May 2004 08:52:01 GMT
