- From: Boris Motik <boris.motik@comlab.ox.ac.uk>
- Date: Sun, 22 Mar 2009 09:38:27 -0000
- To: "'Ivan Herman'" <ivan@w3.org>
- Cc: "'W3C OWL Working Group'" <public-owl-wg@w3.org>
Hello,

A small addendum to my previous e-mail.

So our biggest problem is how to treat @()^"=<> in CURIEs, and in my previous
e-mail, I suggested that we strengthen the irelative-ref production and
prohibit these characters. Another way to go would be to introduce some quoting
mechanism that would allow us to distinguish, say, ) as a part of a CURIE from
) as a delimiter of the grammar. In fact, the IRI spec already provides for
such a mechanism: we could encode these characters as %XX. Clearly, this would
make CURIEs of the form

    a:xpointer(id('a'))

slightly more difficult to read as they would be represented as

    a:xpointer%28id%28'a'%29%29

In the context of OWL, however, I don't believe this to be a problem: (1) I do
not expect that ontologies will contain many CURIEs that will contain such
characters; furthermore, (2) such things can be processed by the tools, so no
user needs to see the %28 and %29 escape sequences.

Regards,

Boris

> -----Original Message-----
> From: public-owl-wg-request@w3.org [mailto:public-owl-wg-request@w3.org] On
> Behalf Of Ivan Herman
> Sent: 22 March 2009 08:59
> To: Boris Motik
> Cc: 'W3C OWL Working Group'
> Subject: Re: Several minor problems in the grammar for the functional-style
> syntax
>
> Boris Motik wrote:
> > Hello,
> >
> > Yevgeny Kazakov is currently trying to implement the functional-style
> > syntax at our lab, and he has found a number of minor problems in our
> > definitions.
>
> As a general rule, I would propose to get into contact with the editors
> of the CURIE syntax, ie, Shane (shane@aptest.com) or Mark
> (mark.birbeck@webBackplane.com). We should try to avoid deviation from
> the CURIE CR.
>
> http://www.w3.org/TR/curie
>
> Your feedback is from implementers, so this is exactly the type of
> feedback they are looking for.
>
> > I present below the problems, as well as the possible solutions.
> > Most of the problems are caused by the syntax of CURIE, which is defined
> > like this:
> >
> > curie := [[prefix] ":"] irelative-ref
> > prefix := NCName
> > NCName := defined by XML
> > irelative-ref: defined by the IRI spec
> >
> >
> > 1. The CURIE spec is not clear regarding whether the prefix, :, and the
> > irelative-ref in a CURIE can be separated by a whitespace. This makes
> > parsing CURIEs such as a:b:c ambiguous, as it is not clear whether one
> > means
> > a:b :c
> > or
> > a :b:c.
> >
> > This problem could be solved if we made the 'curie' production a terminal
> > and explicitly state that there should be no spaces in it.
> >
>
> Isn't it correct that NCName cannot contain whitespace? Then my reading
> of the grammar above is that it is _not_ allowed to have a whitespace
> there...
>
> >
> > 2. We use @()^"=<>: as special characters in the spec -- that is, we use
> > them as stand-alone terminals. Ideally, we'd want the other terminals not
> > to contain these. This, however, is not the case: while NCName cannot
> > contain any of these, irelative-ref can contain the characters "@=():".
> > The latter is quite unfortunate: if you write
> > abc)
> > it is not clear whether the closing parenthesis is part of the
> > irelative-ref or not. This prevents the functional-style syntax from being
> > tokenized correctly.
> >
> > Another problem is that, because irelative-ref can contain :, we cannot
> > unambiguously parse the simple CURIE "a:b". One way of parsing it is as
> > "a", ":", and "b", but another way is to parse it as a simple
> > irelative-ref with the value "a:b".
> >
> > We could fix these problems by changing the spec such that, in contrast to
> > the CURIE spec, we allow irelative-ref to be only NCName. In this way, no
> > CURIE can contain the dangerous characters, so we are fine. Furthermore,
> > the grammar for CURIE becomes NCName ":" NCName, and, since NCName cannot
> > contain ":", we can parse CURIEs correctly.
> >
> >
>
> Ouch. I see the issue. This means that some valid URI-s like
>
> http://www.w.w/#xpointer(id('a'))
>
> (from http://www.w3.org/TR/xptr-framework/)
>
> cannot be expressed as CURIES in the FS. It is not a huge deal, of
> course (we can always use explicit URI-s) but it is still a bit of a pain.
>
> Just exploring an alternative: what if the way we modify the syntax is
> to disallow reference without a prefix? Ie, we could say:
>
> curie := [prefix] ':' reference
>
> What this means is that
>
> Namespace(bla=http://www.w.w/#)
> bla:xpointer(id('a'))
>
> is not a terminal because the prefix is there, so is
>
> :xpointer(id('afasd'))
>
> because the leading ':' is there (and the default namespace is used)
> and, finally,
>
> xpointer(id('a'))
>
> is a terminal because there is no prefix mechanism at all, ie, it is not
> a curie.
>
> I believe that the CURIE spec should allow a host language to do that
> and, I believe, it does not at the moment. Maybe something to report back...
>
> >
> > 3. There is an ambiguity between CURIE and nodeID: the string
> > _:abc
> > can be parsed either as a single terminal matching the nodeID production,
> > or as three terminals "_" ":" "abc" matching the CURIE production. (Note
> > that _ is a valid NCName.)
> >
> > To fix this, in our version of the 'curie' production we should prevent a
> > CURIE from starting with "_:". This is OK: the actual CURIE spec says that
> > this type of usage can be disallowed in a host language and they
> > explicitly mention RDF.
> >
>
> I am not sure I understand. In RDFa, for example, the curie production
> '_:X' is used for BNodes which is in line with our definition of nodeID.
> CURIE allows the definition of '_:' in a specific host language as we
> want. So what is the problem exactly?
>
> >
> > 4. There is a general problem with the fact that our reserved words match
> > the 'curie' production; for example, "ObjectUnionOf" is a perfectly valid
> > CURIE (even with the fixes outlined above). This is clearly a problem, as
> > it makes our grammar not be LL(1); for example, to parse
> > ObjectUnionOf( abc )
> > we need to look two tokens down the line (i.e., only after you see "(" we
> > know that we must have been in the production for "ObjectUnionOf").
> > Perhaps our grammar is such that, by increasing the lookahead, we can
> > circumvent this problem; however, I am not sure of that, and this is a
> > really sketchy solution that is very likely to cause problems in practice.
> >
> > We can avoid this problem by saying that the 'curie' production MUST NOT
> > match one of the terminal symbols; that is, instead of using a CURIE that
> > matches to one of the terminals, one MUST spell out such CURIE as a full
> > IRI (which is enclosed in <> and is therefore fine).
> >
>
> Doesn't the approach on disallowing the reference alone solve this
> problem, too?
>
> >
> >
> > 5. It is currently unclear whether "quotedString" can contain CRLF. The
> > current definition seems to allow this, but Yevgeny was confused. We could
> > perhaps just add a clarification that says "yes, it is allowed".
> >
> >
>
> Sure. Again, I would send this feedback to Shane and Mark.
>
> Cheers
>
> Ivan
>
> > Please let me know how you feel about my proposals.
> >
> > Regards,
> >
> > Boris
> >
> >
> >
>
> --
>
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
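
The %XX quoting suggested at the top of this message can be sketched in a few
lines of Python. This is only an illustration under stated assumptions, not
anything taken from the OWL or CURIE specs: the names DELIMITERS, escape_curie
and unescape_curie are hypothetical, the escaped set is exactly the characters
@()^"=<> listed above, and pre-existing %XX sequences in the input are not
treated specially.

```python
import re

# Assumption: the characters to escape are exactly those named in the
# message as grammar delimiters; the colon is left alone because it
# separates the prefix from the reference.
DELIMITERS = '@()^"=<>'

# Hypothetical lookup tables: '(' -> '%28', ')' -> '%29', and so on.
_ESCAPES = {ch: f"%{ord(ch):02X}" for ch in DELIMITERS}
_UNESCAPES = {esc: ch for ch, esc in _ESCAPES.items()}
_ESCAPED_RE = re.compile(
    "|".join(re.escape(esc) for esc in _UNESCAPES), re.IGNORECASE
)


def escape_curie(curie: str) -> str:
    """Replace each delimiter character in the CURIE with its %XX form."""
    return "".join(_ESCAPES.get(ch, ch) for ch in curie)


def unescape_curie(curie: str) -> str:
    """Undo escape_curie; only the delimiter escapes are decoded."""
    return _ESCAPED_RE.sub(lambda m: _UNESCAPES[m.group(0).upper()], curie)


if __name__ == "__main__":
    original = "a:xpointer(id('a'))"
    escaped = escape_curie(original)
    print(escaped)
    # After escaping, no grammar delimiter remains inside the CURIE, so a
    # tokenizer can treat any ')' it meets as a delimiter of the grammar.
    print(any(ch in DELIMITERS for ch in escaped))
    print(unescape_curie(escaped) == original)
```

A tool reading or writing the functional-style syntax could apply such a pair
of functions at its boundary, which is the sense in which no user needs to see
the %28 and %29 sequences.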
Received on Sunday, 22 March 2009 09:39:37 UTC