Re: CWM Tokenization Error from Sean B. Palmer on 2005-01-17 (public-cwm-bugs@w3.org from January 2005)

From: Sean B. Palmer <sean+cwm@infomesh.net>
Date: Mon, 17 Jan 2005 16:43:59 +0000
To: Yosi Scharf <syosi@mit.edu>
CC: public-cwm-bugs@w3.org
Message-ID: <41EBEB4F.9050700@infomesh.net>

Yosi Scharf wrote:

> Is that actually expected?

I'm fairly sure that it is, based on the documentation in n3.n3, but of
course one can easily make the case that the expected result is whatever
CWM does. Here's the relevant piece:

[[[
# tokenizing:
# Absorb anything until end of regexp, then stil white space
# [...]
#  WS MUST be inserted between tokens where ambiguity would arise.
#  (possible ending characters of one and beginning characters overlap)
]]] - http://www.w3.org/2000/10/swap/grammar/n3.n3

Since underscore isn't allowed in declarations and language codes, I
think that "@a_m" is unambiguously equivalent to "@a _m". Again,
according to the RDF BNF, underscore can start a prefixless QName.
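
To make that concrete, here's a rough Python sketch of the reading I have in mind. The two regexps are my own simplified stand-ins, not the ones in n3.n3, and the names (AT_WORD, BARE_NAME, tokenize) are made up for illustration. The only point is that a pattern which excludes underscore has to stop before "_m", so no whitespace is needed to separate the two tokens:

```python
import re

# Hypothetical, simplified patterns; NOT the actual regexps from n3.n3.
# Assumption: an @-word (keyword or language code) never contains "_".
AT_WORD = re.compile(r"@[a-z]+(?:-[a-z0-9]+)*")
# Assumption: a prefixless name may start with "_".
BARE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Absorb one token at a time, skipping any white space between them."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        for pattern in (AT_WORD, BARE_NAME):
            m = pattern.match(text, pos)
            if m:
                tokens.append(m.group())
                pos = m.end()
                break
        else:
            raise ValueError("no token matches at offset %d" % pos)
    return tokens


# Both spellings come out as the same two tokens under these assumptions.
print(tokenize("@a_m"))
print(tokenize("@a _m"))
```

Under those assumptions the end characters of the first token and the start characters of the second don't overlap, which is the case where n3.n3 doesn't require WS.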

This is also broken in n3p.py (and predictiveParser.py, from which it's
derived) as you rightly point out; 'tis next on my n3p todo list. I'd
race you to fix it, CWM vs. n3p, but I'm sure you'd win :-)

Cheers,

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Monday, 17 January 2005 16:44:33 UTC