RE: Several minor problems in the grammar for the functional-style syntax from Boris Motik on 2009-03-22 (public-owl-wg@w3.org from March 2009)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Sun, 22 Mar 2009 09:38:27 -0000
To: "'Ivan Herman'" <ivan@w3.org>
Cc: "'W3C OWL Working Group'" <public-owl-wg@w3.org>
Message-ID: <F9F911A5E33F4D82B69E905A2B5ED1C1@wolf>
Hello,

A small addendum to my previous e-mail.

So our biggest problem is how to treat @()^"=<> in CURIEs, and in my previous
e-mail, I suggested that we strengthen the irelative-ref production and prohibit
these characters.
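
As a rough illustration of that first option, the check below rejects any
reference that contains one of the listed characters. It is only a sketch:
the function name and the choice to test the reference part alone are
assumptions, and it does not implement the full irelative-ref production from
the IRI spec.

```python
# Characters listed in this thread as clashing with the functional-style
# syntax delimiters when they appear inside a CURIE.
FS_DELIMITERS = frozenset('@()^"=<>')


def is_allowed_reference(reference: str) -> bool:
    """Return True if the reference contains none of the delimiter characters.

    Hypothetical helper: it only performs the extra character check and does
    not validate the string against the irelative-ref grammar.
    """
    return not any(ch in FS_DELIMITERS for ch in reference)
```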

Another way to go would be to introduce some quoting mechanism that would allow
us to distinguish, say, ) as a part of a CURIE from ) as a delimiter of the
grammar. In fact, the IRI spec already provides for such a mechanism: we could
encode these characters as %XX. Clearly, this would make CURIEs of the form
    a:xpointer(id('a'))
slightly more difficult to read as they would be represented as
    a:xpointer%28id%28'a'%29%29

In the context of OWL, however, I don't believe this to be a problem: (1) I do
not expect that ontologies will contain many CURIEs that will contain such
characters; furthermore, (2) such things can be processed by the tools, so no
user needs to see the %28 and %29 escape sequences.
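
A minimal sketch of this quoting approach follows. The function names are
hypothetical, and the set of escaped characters is simply the one listed at
the top of this message; nothing here is taken from the CURIE or IRI specs
beyond the %XX form itself.

```python
import re

# Characters listed in this thread as clashing with the functional-style
# syntax delimiters (assumed set, copied from the message above).
FS_DELIMITERS = '@()^"=<>'

# Matches one %XX escape sequence.
_ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")


def escape_reference(reference: str) -> str:
    """Replace each delimiter character with its %XX escape (hypothetical helper)."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in FS_DELIMITERS else ch
        for ch in reference
    )


def unescape_reference(reference: str) -> str:
    """Decode only those %XX escapes that stand for a delimiter character."""

    def decode(match):
        ch = chr(int(match.group(1), 16))
        # Leave every other escape (for example %20) exactly as written.
        return ch if ch in FS_DELIMITERS else match.group(0)

    return _ESCAPED.sub(decode, reference)
```

With these helpers, escape_reference("xpointer(id('a'))") gives the
xpointer%28id%28'a'%29%29 form shown above. One limitation of the sketch: a
reference that already contained %28 before escaping is also decoded to "(",
so the round trip preserves the characters only up to %XX equivalence.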

Regards,

	Boris

> -----Original Message-----
> From: public-owl-wg-request@w3.org [mailto:public-owl-wg-request@w3.org] On
> Behalf Of Ivan Herman
> Sent: 22 March 2009 08:59
> To: Boris Motik
> Cc: 'W3C OWL Working Group'
> Subject: Re: Several minor problems in the grammar for the functional-style
> syntax
> 
> 
> 
> Boris Motik wrote:
> > Hello,
> >
> > Yevgeny Kazakov is currently trying to implement the functional-style
> > syntax at our lab, and he has found a number of minor problems in our
> > definitions.
> 
> As a general rule, I would propose to get into contact with the editors
> of the CURIE syntax, ie, Shane (shane@aptest.com) or Mark
> (mark.birbeck@webBackplane.com). We should try to avoid deviation from
> the CURIE CR.
> 
> http://www.w3.org/TR/curie
> 
> Your feedback is from implementers, so this is exactly the type of
> feedback they are looking for.
> 
> >                                                                           I
> > present below the problems, as well as the possible solutions. Most of the
> > problems are caused by the syntax of CURIE, which is defined like this:
> >
> > curie := [[prefix] ":"] irelative-ref
> > prefix := NCName
> > NCName := defined by XML
> > irelative-ref: defined by the IRI spec
> >
> >
> > 1. The CURIE spec is not clear regarding whether the prefix, :, and the
> > irelative-ref in a CURIE can be separated by whitespace. This makes
> > parsing CURIEs such as a:b:c ambiguous, as it is not clear whether one
> > means
> >     a:b :c
> > or
> >     a :b:c.
> >
> > This problem could be solved if we made the 'curie' production a terminal
> > and explicitly stated that there should be no spaces in it.
> >
> 
> Isn't it correct that NCName cannot contain whitespace? Then my reading
> of the grammar above is that it is _not_ allowed to have whitespace there...
> 
> >
> > 2. We use @()^"=<>: as special characters in the spec -- that is, we use
> > them as stand-alone terminals. Ideally, we'd want the other terminals not
> > to contain these. This, however, is not the case: while NCName cannot
> > contain any of these, irelative-ref can contain the characters "@=():".
> > The latter is quite unfortunate: if you write
> >    abc)
> > it is not clear whether the closing parenthesis is part of the
> > irelative-ref or not. This prevents the functional-style syntax from being
> > tokenized correctly.
> >
> > Another problem is that, because irelative-ref can contain :, we cannot
> > unambiguously parse the simple CURIE "a:b". One way of parsing it is as
> > "a", ":", and "b", but another way is to parse it as a simple
> > irelative-ref with the value "a:b".
> >
> > We could fix these problems by changing the spec such that, in contrast to
> > the CURIE spec, we allow irelative-ref to be only NCName. In this way, no
> > CURIE can contain the dangerous characters, so we are fine. Furthermore,
> > the grammar for CURIE becomes NCName ":" NCName, and, since NCName cannot
> > contain ":", we can parse CURIEs correctly.
> >
> >
> 
> Ouch. I see the issue. This means that some valid URI-s like
> 
> http://www.w.w/#xpointer(id('a'))
> 
> (from http://www.w3.org/TR/xptr-framework/)
> 
> cannot be expressed as CURIEs in the FS. It is not a huge deal, of
> course (we can always use explicit URI-s) but it is still a bit of a pain.
> 
> Just exploring an alternative: what if the way we modify the syntax is
> to disallow reference without a prefix? Ie, we could say:
> 
> curie := [prefix] ':' reference
> 
> What this means is that
> 
> Namespace(bla=http://www.w.w/#)
> bla:xpointer(id('a'))
> 
> is not a terminal because the prefix is there, so is
> 
> :xpointer(id('afasd'))
> 
> because the leading ':' is there (and the default namespace is used)
> and, finally,
> 
> xpointer(id('a'))
> 
> is a terminal because there is no prefix mechanism at all, ie, it is not
> a curie.
> 
> I believe that the CURIE spec should allow a host language to do that
> and, I believe, it does not at the moment. Maybe something to report back...
> 
> >
> > 3. There is an ambiguity between CURIE and nodeID: the string
> >     _:abc
> > can be parsed either as a single terminal matching the nodeID production,
> > or as three terminals "_" ":" "abc" matching the CURIE production. (Note
> > that _ is a valid NCName.)
> >
> > To fix this, in our version of the 'curie' production we should prevent a
> > CURIE from starting with "_:". This is OK: the actual CURIE spec says that
> > this type of usage can be disallowed in a host language and they
> > explicitly mention RDF.
> >
> 
> I am not sure I understand. In RDFa, for example, the curie production
> '_:X' is used for BNodes which is in line with our definition of nodeID.
> CURIE allows the definition of '_:' in a specific host language as we
> want. So what is the problem exactly?
> 
> >
> > 4. There is a general problem with the fact that our reserved words match
> > the 'curie' production; for example, "ObjectUnionOf" is a perfectly valid
> > CURIE (even with the fixes outlined above). This is clearly a problem, as
> > it makes our grammar not be LL(1); for example, to parse
> >     ObjectUnionOf( abc )
> > we need to look two tokens down the line (i.e., only after you see "(" we
> > know that we must have been in the production for "ObjectUnionOf").
> > Perhaps our grammar is such that, by increasing the lookahead, we can
> > circumvent this problem; however, I am not sure of that, and this is a
> > really sketchy solution that is very likely to cause problems in practice.
> >
> > We can avoid this problem by saying that the 'curie' production MUST NOT
> > match one of the terminal symbols; that is, instead of using a CURIE that
> > matches one of the terminals, one MUST spell out such a CURIE as a full
> > IRI (which is enclosed in <> and is therefore fine).
> >
> 
> Doesn't the approach on disallowing the reference alone solve this
> problem, too?
> 
> 
> >
> > 5. It is currently unclear whether "quotedString" can contain CRLF. The
> > current definition seems to allow this, but Yevgeny was confused. We could
> > perhaps just add a clarification that says "yes, it is allowed".
> >
> >
> 
> Sure. Again, I would send this feedback to Shane and Mark.
> 
> Cheers
> 
> Ivan
> 
> > Please let me know how you feel about my proposals.
> >
> > Regards,
> >
> > 	Boris
> >
> >
> >
> 
> --
> 
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Sunday, 22 March 2009 09:39:37 UTC