- From: Boris Motik <boris.motik@comlab.ox.ac.uk>
- Date: Sat, 21 Mar 2009 22:02:02 -0000
- To: "'W3C OWL Working Group'" <public-owl-wg@w3.org>
Hello,

Yevgeny Kazakov is currently trying to implement the functional-style syntax at our lab, and he has found a number of minor problems in our definitions. I present below the problems, as well as possible solutions.

Most of the problems are caused by the syntax of CURIE, which is defined like this:

  curie := [[prefix] ":"] irelative-ref
  prefix := NCName
  NCName := defined by XML
  irelative-ref := defined by the IRI spec

1. The CURIE spec is not clear regarding whether the prefix, the ":", and the irelative-ref in a CURIE can be separated by whitespace. This makes parsing CURIEs such as a:b:c ambiguous, as it is not clear whether one means "a:b :c" or "a :b:c". This problem could be solved if we made the 'curie' production a terminal and explicitly stated that there should be no spaces in it.

2. We use @()^"=<>: as special characters in the spec; that is, we use them as stand-alone terminals. Ideally, we'd want the other terminals not to contain these. This, however, is not the case: while NCName cannot contain any of these, irelative-ref can contain the characters "@=():". The latter is quite unfortunate: if you write abc) it is not clear whether the closing parenthesis is part of the irelative-ref or not. This prevents the functional-style syntax from being tokenized correctly.

Another problem is that, because irelative-ref can contain ":", we cannot unambiguously parse the simple CURIE "a:b". One way of parsing it is as "a", ":", and "b", but another way is to parse it as a single irelative-ref with the value "a:b".

We could fix these problems by changing the spec such that, in contrast to the CURIE spec, we allow irelative-ref to be only an NCName. In this way, no CURIE can contain the dangerous characters, so we are fine. Furthermore, the grammar for CURIE becomes NCName ":" NCName, and, since NCName cannot contain ":", we can parse CURIEs correctly.

3. There is an ambiguity between CURIE and nodeID: the string _:abc can be parsed either as a single terminal matching the nodeID production, or as three terminals "_" ":" "abc" matching the CURIE production. (Note that _ is a valid NCName.) To fix this, in our version of the 'curie' production we should prevent a CURIE from starting with "_:". This is OK: the actual CURIE spec says that this type of usage can be disallowed in a host language, and it explicitly mentions RDF.

4. There is a general problem with the fact that our reserved words match the 'curie' production; for example, "ObjectUnionOf" is a perfectly valid CURIE (even with the fixes outlined above). This is clearly a problem, as it makes our grammar not LL(1); for example, to parse ObjectUnionOf( abc ) we need to look two tokens down the line (i.e., only after we see "(" do we know that we must have been in the production for "ObjectUnionOf"). Perhaps our grammar is such that, by increasing the lookahead, we can circumvent this problem; however, I am not sure of that, and this is a really sketchy solution that is very likely to cause problems in practice.

We can avoid this problem by saying that the 'curie' production MUST NOT match one of the terminal symbols; that is, instead of using a CURIE that matches one of the terminals, one MUST spell out such a CURIE as a full IRI (which is enclosed in <> and is therefore fine).

5. It is currently unclear whether "quotedString" can contain CRLF. The current definition seems to allow this, but Yevgeny was confused. We could perhaps just add a clarification that says "yes, it is allowed".

Please let me know how you feel about my proposals.

Regards,

Boris
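
For illustration only, here is a minimal Python sketch of how a tokenizer might look if proposals 1 to 4 were adopted. It is not taken from any implementation. The names NCNAME, RESERVED, TOKEN_RE and tokenize are hypothetical, the NCName pattern is an ASCII-only simplification of the XML production, the reserved-word set is a small made-up subset, and quoted strings (point 5) are left out entirely.

```python
import re

# ASSUMPTION: ASCII-only stand-in for the XML NCName production.
# The real production also admits many non-ASCII characters.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# ASSUMPTION: a small, hypothetical subset of the reserved words.
RESERVED = {"ObjectUnionOf", "ObjectIntersectionOf", "Declaration", "Class"}

# Each alternative is a single terminal, so no whitespace can occur inside
# a CURIE (proposal 1). node_id is tried before curie, so "_:abc" is never
# read as a CURIE (proposal 3). The local part is an NCName only, so it
# cannot swallow "(", ")", "=", "@" or ":" (proposal 2).
TOKEN_RE = re.compile(
    rf"""
    (?P<ws>\s+)
  | (?P<full_iri><[^<>\s]*>)
  | (?P<node_id>_:{NCNAME})
  | (?P<curie>(?:(?:{NCNAME})?:)?{NCNAME})
  | (?P<punct>\^\^|[@()=])
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on bad input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "ws":
            continue  # whitespace only separates tokens
        # Proposal 4: a lexeme equal to a reserved word is never a CURIE;
        # the same name must be written as a full IRI in <...> instead.
        if kind == "curie" and lexeme in RESERVED:
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens
```

Under these assumptions, tokenize("abc)") yields a CURIE token followed by a separate ")" token, and "_:abc" comes back as a single nodeID token. Classifying reserved words in the tokenizer means the parser can pick a production from one token of lookahead, which is the point of proposal 4.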
Received on Saturday, 21 March 2009 22:03:09 UTC