- From: Ivan Herman <ivan@w3.org>
- Date: Sun, 22 Mar 2009 09:59:20 +0100
- To: Boris Motik <boris.motik@comlab.ox.ac.uk>
- CC: 'W3C OWL Working Group' <public-owl-wg@w3.org>
- Message-ID: <49C5FDE8.1070901@w3.org>
Boris Motik wrote:
> Hello,
>
> Yevgeny Kazakov is currently trying to implement the functional-style syntax at
> our lab, and he has found a number of minor problems in our definitions.

As a general rule, I would propose getting in contact with the editors of the CURIE syntax, i.e., Shane (shane@aptest.com) or Mark (mark.birbeck@webBackplane.com). We should try to avoid deviating from the CURIE CR:

http://www.w3.org/TR/curie

Your feedback comes from implementers, so this is exactly the type of feedback they are looking for.

> I
> present below the problems, as well as the possible solutions. Most of the
> problems are caused by the syntax of CURIE, which is defined like this:
>
> curie := [[prefix] ":"] irelative-ref
> prefix := NCName
> NCName := defined by XML
> irelative-ref: defined by the IRI spec
>
>
> 1. The CURIE spec is not clear regarding whether the prefix, :, and the
> irelative-ref in a CURIE can be separated by a whitespace. This makes parsing
> CURIEs such as a:b:c ambiguous, as it is not clear whether one means
> a:b :c
> or
> a :b:c.
>
> This problem could be solved if we made the 'curie' production a terminal and
> explicitly state that there should be no spaces in it.
>

Isn't it correct that NCName cannot contain whitespace? Then my reading of the grammar above is that it is _not_ allowed to have whitespace there...

>
> 2. We use @()^"=<>: as special characters in the spec -- that is, we use them as
> stand-alone terminals. Ideally, we'd want the other terminals not to contain
> these. This, however, is not the case: while NCName cannot contain any of these,
> irelative-ref can contain the characters "@=():". The latter is quite
> unfortunate: if you write
> abc)
> it is not clear whether the closing parenthesis is part of the irelative-ref or
> not. This prevents the functional-style syntax from being tokenized correctly.
>
> Another problem is that, because irelative-ref can contain :, we cannot
> unambiguously parse the simple CURIE "a:b". One way of parsing it is as "a", ":",
> and "b", but another way is to parse it as a simple irelative-ref with the value
> "a:b".
>
> We could fix these problems by changing the spec such that, in contrast to the
> CURIE spec, we allow irelative-ref to be only NCName. In this way, no CURIE can
> contain the dangerous characters, so we are fine. Furthermore, the grammar for
> CURIE becomes NCName ":" NCName, and, since NCName cannot contain ":", we can
> parse CURIEs correctly.
>
>

Ouch. I see the issue. This means that some valid URIs like

http://www.w.w/#xpointer(id('a'))

(from http://www.w3.org/TR/xptr-framework/) cannot be expressed as CURIEs in the FS. It is not a huge deal, of course (we can always use explicit URIs), but it is still a bit of a pain.

Just exploring an alternative: what if we modified the syntax to disallow a reference without a prefix? I.e., we could say:

curie := [prefix] ':' reference

What this means is that, given

Namespace(bla=http://www.w.w/#)

the string

bla:xpointer(id('a'))

is not a terminal because the prefix is there; neither is

:xpointer(id('afasd'))

because the leading ':' is there (and the default namespace is used); and, finally,

xpointer(id('a'))

is a terminal because there is no prefix mechanism at all, i.e., it is not a CURIE.

I believe that the CURIE spec should allow a host language to do that and, I believe, it does not at the moment. Maybe something to report back...

>
> 3. There is an ambiguity between CURIE and nodeID: the string
> _:abc
> can be parsed either as a single terminal matching the nodeID production, or as
> three terminals "_" ":" "abc" matching the CURIE production. (Note that _ is a
> valid NCName.)
>
> To fix this, in our version of the 'curie' production we should prevent a CURIE
> to start with "_:". This is OK: the actual CURIE spec says that this type of
> usage can be disallowed in a host language and they explicitly mention RDF.
>

I am not sure I understand. In RDFa, for example, the CURIE production '_:X' is used for BNodes, which is in line with our definition of nodeID. CURIE allows a specific host language to define '_:' as we want. So what is the problem exactly?

>
> 4. There is a general problem with the fact that our reserved words match the
> 'curie' production; for example, "ObjectUnionOf" is a perfectly valid CURIE
> (even with the fixes outlined above). This is clearly a problem, as it makes our
> grammar not be LL(1); for example, to parse
> ObjectUnionOf( abc )
> we need to look two tokens down the line (i.e., only after you see "(" we know
> that we must have been in the production for "ObjectUnionOf"). Perhaps our
> grammar is such that, by increasing the lookahead, we can circumvent this
> problem; however, I am not sure of that, and this is a really sketchy solution
> that is very likely to cause problems in practice.
>
> We can avoid this problem by saying that the 'curie' production MUST NOT match
> one of the terminal symbols; that is, instead of using a CURIE that matches to
> one of the terminals, one MUST spell out such CURIE as a full IRI (which is
> enclosed in <> and is therefore fine).
>

Doesn't the approach of disallowing a reference on its own (i.e., without a prefix or a leading ':') solve this problem, too?

>
> 5. It is currently unclear whether "quotedString" can contain CRLF. The current
> definition seems to allow this, but Yevgeny was confused. We could perhaps just
> add a clarification that says "yes, it is allowed".
>
>

Sure.

Again, I would send this feedback to Shane and Mark.

Cheers

Ivan

> Please let me know how you feel about my proposals.
>
> Regards,
>
> Boris
>
>
>

-- 
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
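
For illustration only, the fixes proposed in points 2 to 4 of the quoted message can be sketched as a small token classifier. This is a minimal sketch under stated assumptions, not the working group's grammar: NCNAME is a simplified ASCII-only stand-in for the XML NCName production, RESERVED is a hypothetical subset of the functional-style syntax keywords, and the function name classify is made up for this example. The prefix is kept optional, as in the CURIE production quoted above, which is why a bare keyword would otherwise match.

```python
import re

# Simplified, ASCII-only stand-in for the XML NCName production (assumption).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# Point 2: the reference part is restricted to an NCName, so a CURIE can no
# longer contain any of the special characters @()^"=<>: inside its parts.
# The prefix (and the ":") stay optional, as in the quoted 'curie' production.
RESTRICTED_CURIE = re.compile(rf"(?:(?:{NCNAME})?:)?{NCNAME}")
NODE_ID = re.compile(rf"_:{NCNAME}")

# Hypothetical subset of reserved words, for illustration only.
RESERVED = {"ObjectUnionOf", "ObjectIntersectionOf", "Namespace"}


def classify(token: str) -> str:
    """Classify one whitespace-free token under the restricted grammar."""
    if token.startswith("_:"):
        # Point 3: anything starting with "_:" is a nodeID, never a CURIE.
        return "nodeID" if NODE_ID.fullmatch(token) else "invalid"
    if token in RESERVED:
        # Point 4: a reserved word is never read as a CURIE.
        return "keyword"
    if RESTRICTED_CURIE.fullmatch(token):
        return "curie"
    return "invalid"


if __name__ == "__main__":
    for tok in ["a:b", ":b", "_:abc", "ObjectUnionOf", "bla:xpointer(id('a'))"]:
        print(tok, "->", classify(tok))
```

Under these assumptions, bla:xpointer(id('a')) is rejected as a CURIE and would have to be written as a full IRI in <>, which is the cost of the restriction noted in the reply to point 2.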
Received on Sunday, 22 March 2009 08:59:56 UTC