Re: Several minor problems in the grammar for the functional-style syntax from Bijan Parsia on 2009-03-22 (public-owl-wg@w3.org from March 2009)

From: Bijan Parsia <bparsia@cs.manchester.ac.uk>
Date: Sun, 22 Mar 2009 12:42:14 +0000
To: "Boris Motik" <boris.motik@comlab.ox.ac.uk>
Cc: "'Ivan Herman'" <ivan@w3.org>, "'W3C OWL Working Group'" <public-owl-wg@w3.org>
Message-Id: <661128B4-DCEA-48F8-8BAC-227BC81376E8@cs.manchester.ac.uk>

On 22 Mar 2009, at 09:38, Boris Motik wrote:

> Hello,
>
> A small addendum to my previous e-mail.
>
> So our biggest problem is how to treat @()^"=<> in CURIEs, and in my previous
> e-mail, I suggested that we strengthen the irelative-ref production and prohibit
> these characters.
>
> Another way to go would be to introduce some quoting mechanism that would allow
> us to distinguish, say, ) as a part of a CURIE from ) as a delimiter of the
> grammar. In fact, the IRI spec already provides for such a mechanism: we could
> encode these characters as %XX. Clearly, this would make CURIEs of the form
>    a:xpointer(id('a'))
> slightly more difficult to read as they would be represented as
>    a:xpointer%28id%28'a'%29%29

I was going to suggest this. Or you could use safe CURIEs and make
sure [] aren't used elsewhere.

> In the context of OWL, however, I don't believe this to be a problem: (1) I do
> not expect that ontologies will contain many CURIEs that will contain such
> characters; furthermore, (2) such things can be processed by the tools, so no
> user needs to see the %28 and %29 escape sequences.

So, basically, the options are:
	1) Forbid ambiguous characters in CURIEs; URIs whose local part contains
those characters have to be written as full URIs
	2) Require percent encoding of problem characters
	3) Require safe CURIEs (per the spec)

The problem with 2 is that URI comparison becomes a bit trickier. What
do we say now? We'd have to make sure that there is a URI
normalization phase (or only a CURIE normalization phase?).
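To make the comparison issue with option 2 concrete, here is a minimal Python sketch. It assumes that only the characters @()^"=<> from Boris's message are escaped, and that a hypothetical CURIE-normalization step decodes just those escapes when a CURIE is expanded; the helper names, the prefix `a`, and its namespace are made up for illustration.

```python
import re

# Characters listed in the thread as ambiguous inside a CURIE.
PROBLEM_CHARS = '@()^"=<>'
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def encode_reference(reference: str) -> str:
    """Percent-encode only the problem characters in a CURIE's local part."""
    return "".join(f"%{ord(c):02X}" if c in PROBLEM_CHARS else c for c in reference)


def decode_reference(reference: str) -> str:
    """Hypothetical CURIE normalization: undo only the escapes that
    encode_reference introduces, leaving every other %XX untouched."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in PROBLEM_CHARS else match.group(0)
    return _ESCAPE.sub(repl, reference)


def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand prefix:reference to a full IRI, normalizing the reference first."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + decode_reference(reference)


# Hypothetical prefix mapping, for illustration only.
prefixes = {"a": "http://example.org/doc#"}
written = "a:" + encode_reference("xpointer(id('a'))")
print(written)                          # a:xpointer%28id%28'a'%29%29
print(expand_curie(written, prefixes))  # http://example.org/doc#xpointer(id('a'))
```

Without the decoding step, the escaped and unescaped forms would compare unequal as strings, since RFC 3986 does not treat a percent-encoded reserved character such as ( as equivalent to the literal character. The sketch also shows the catch: under this scheme an IRI that genuinely contains %28 can no longer be told apart from one containing (.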

The problem with 3 is that it adds a bit of logic to the CURIE parsing
phase (i.e., check for a leading [ and make sure there's a trailing ]).
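A minimal Python sketch of the extra parsing logic option 3 would need. The `read_curie` helper is hypothetical: at a given offset in a functional-syntax string it either reads a bracketed safe CURIE up to the closing ], or reads a bare CURIE up to whitespace or a parenthesis. It assumes, as suggested above, that ] never occurs inside a CURIE.

```python
def read_curie(text: str, pos: int) -> tuple[str, int]:
    """Hypothetical tokenizer step: read a CURIE starting at text[pos].

    Returns the CURIE and the offset just past it. A safe CURIE is wrapped
    in [ ... ], so its content may contain '(' or ')'; a bare CURIE ends at
    whitespace or a parenthesis, the functional syntax's delimiters.
    """
    if text.startswith("[", pos):
        end = text.find("]", pos + 1)  # assumes ']' never occurs inside a CURIE
        if end == -1:
            raise ValueError(f"unterminated safe CURIE at offset {pos}")
        return text[pos + 1:end], end + 1
    end = pos
    while end < len(text) and text[end] not in " \t\r\n()":
        end += 1
    return text[pos:end], end


# A bare CURIE is cut short at the first '(' ...
print(read_curie("Class(a:xpointer(id('a')))", 6))    # ('a:xpointer', 16)
# ... while the bracketed (safe) form is read whole.
print(read_curie("Class([a:xpointer(id('a'))])", 6))  # ("a:xpointer(id('a'))", 27)
```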

The problem with 1 is that it places a burden on serializers.
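For comparison, option 1 mostly moves the work into the serializer. A minimal Python sketch, with a hypothetical `serialize_iri` helper and prefix map: abbreviate an IRI to a CURIE only when the part after a known namespace contains none of the ambiguous characters, and otherwise fall back to the angle-bracketed full-IRI form.

```python
# Characters listed in the thread as ambiguous inside a CURIE.
PROBLEM_CHARS = set('@()^"=<>')


def serialize_iri(iri: str, prefixes: dict[str, str]) -> str:
    """Hypothetical serializer step for option 1: use a CURIE only when safe.

    Tries the longest matching namespace first and returns the full <IRI>
    form when no prefix yields a local part free of problem characters.
    """
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            reference = iri[len(namespace):]
            if PROBLEM_CHARS.isdisjoint(reference):
                return f"{prefix}:{reference}"
    return f"<{iri}>"


# Hypothetical prefix mapping, for illustration only.
prefixes = {"a": "http://example.org/doc#"}
print(serialize_iri("http://example.org/doc#Person", prefixes))  # a:Person
print(serialize_iri("http://example.org/doc#xpointer(id('a'))", prefixes))
# <http://example.org/doc#xpointer(id('a'))>
```

The check is a single scan over the local part, which is why this option looks like the smallest change: parsers stay as they are, and only the serializer has to decide when not to abbreviate.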

I prefer 1. I think it's the smallest change from the status quo.

Cheers,
Bijan.

Received on Sunday, 22 March 2009 12:42:57 UTC