Minus signs in N3 (again)

I remember being bothered by this when I first saw N3, but only now that
I'm starting to use it for an actual (open source!) product am I motivated
enough to complain. Happily, it seems that the natural evolution of the
language has left an opening for a simple resolution of this issue.

To refresh everyone's memory, the de facto spec. for N3
(http://www.w3.org/DesignIssues/Notation3) contains the following section:

> Identifier munging
> 
> This syntax does not allow minus signs in identifiers, whereas the XML
> encoding for RDF does.
> 
> The current solution is mapping sequences of "-" and "_" into sequences of "_"
> by taking a contiguous sequence of - and _, replacing _ with 0 and - with 1,
> then adding a leading "1", taking what you have as a binary number,
> subtracting 1 from the result, and then writing that many _ signs. The mapping
> is 1:1, and maps the simple case of - onto __ and _ onto _. The only
> disadvantage is that those who go crazy with n consecutive occurrences of -
> and/or _ in XML will pay for it in an even crazier 2**n long sequence in the
> N3. (2001/12/4)
> 
> A messy thing from the N3 point of view is that for content negotiation to
> work between RDF/XML and N3, the fragment identifier syntax must be the same,
> and this would suggest that both use the "-" (XML) form.
> It would be nice to be able to outlaw - in IDs.
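
For concreteness, here is a minimal Python sketch of the mapping as I read the quoted paragraph. The function names (munge_run, unmunge_run, munge) are mine, not cwm's, and I have not checked this against any actual implementation.

```python
import re

def munge_run(run: str) -> str:
    """Map one contiguous run of '-' and '_' to a run of '_' (quoted rule)."""
    # '_' -> 0, '-' -> 1, then prefix a leading 1 and read as binary.
    bits = "1" + run.replace("_", "0").replace("-", "1")
    # Subtract 1 and write that many underscores.
    return "_" * (int(bits, 2) - 1)

def unmunge_run(underscores: str) -> str:
    """Inverse of munge_run: recover the original run from its length."""
    # bin(n) looks like '0b1...'; drop the '0b' and the leading 1 that was added.
    bits = bin(len(underscores) + 1)[3:]
    return bits.replace("0", "_").replace("1", "-")

def munge(identifier: str) -> str:
    """Apply munge_run to every maximal run of '-' and '_' in an identifier."""
    return re.sub(r"[-_]+", lambda m: munge_run(m.group(0)), identifier)

print(munge("a-b"))      # a__b
print(munge("a_b"))      # a_b
print(len(munge("--")))  # 6
```

A run of n characters comes out between 2**n - 1 and 2**(n+1) - 2 underscores long, which is where the "2**n long sequence" in the quote comes from.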

The professed reason for this is that Tim wanted to reserve '-' for [future]
operators.

First of all, proper accommodation for '-' as a non-initial character in
identifiers does not preclude the use of '-' in operator names, nor require
a look-ahead lexer to parse (as erroneously asserted in an earlier post). It
only means that some whitespace will be required between identifiers and any
operators which begin with '-'. Such whitespace would seem advisable anyway,
to keep unwary readers from misreading the string.
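
To illustrate, here is a rough sketch of a greedy scanner in Python. The identifier character class and the operator list are assumptions made up for the example, not the real N3 grammar; '->' stands in for any operator that begins with '-'.

```python
import re

# Hypothetical token grammar, for illustration only:
#   identifier: a letter or '_' followed by letters, digits, '_' or '-'
#   operators:  '->' and '<-', else any single non-space character
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*|->|<-|\S")

def tokenize(text: str) -> list[str]:
    """Split text into tokens, letting identifiers greedily absorb '-'."""
    return TOKEN.findall(text)

print(tokenize("a-b"))     # ['a-b']
print(tokenize("a - b"))   # ['a', '-', 'b']
print(tokenize("a -> b"))  # ['a', '->', 'b']
print(tokenize("a->b"))    # ['a-', '>', 'b']
```

The last case is the only one that changes: without whitespace the '-' is absorbed into the identifier, so the operator has to be set off by a space.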

Besides which, N3 appears to have been pretty stable for quite some time
now, and it hasn't included an operator which begins with - since '->' (and
'<-') were dropped, apparently over a year ago.

But most damningly, the escape mechanism devised to address this issue is
not only unwieldy and unfriendly, it is not even complete: how do you escape
"a-to-_"? My guess is cwm would currently escape it as a__to___, which
collides with the encoding of "a-to__". Not to mention the fact that many N3
parsers don't seem to implement "identifier munging" at all (see
http://infomesh.net/2002/n3qname.html).

So here's hoping Tim is now ready to excise this wart from N3. If he were
serious about banishing '-' from RDF IDs, he should have fought that battle
a long time ago, back when RDF was being finalized. It would seem much too
late for that now, so the sensible thing to do is just accept it, properly
accommodate it, and move on.

And while we're at it, the same thing goes for '.'. I expect some will
reflexively protest that the use of '.' as the statement terminator makes
this problematic. They might have a leg to stand on if it weren't for the
fact that '.' is already also permitted within numeric (rational) literals.

So what do you say? Can we finally add proper support for '-' and '.' as
non-initial chars in identifiers to N3?
-- 
Kevin D. Keck
http://vimss.lbl.gov/~kdkeck/
510-486-4856

Received on Sunday, 13 July 2003 01:55:34 UTC