- From: Ivan Herman <ivan@w3.org>
- Date: Sun, 22 Mar 2009 09:59:20 +0100
- To: Boris Motik <boris.motik@comlab.ox.ac.uk>
- CC: 'W3C OWL Working Group' <public-owl-wg@w3.org>
- Message-ID: <49C5FDE8.1070901@w3.org>
Boris Motik wrote:
> Hello,
>
> Yevgeny Kazakov is currently trying to implement the functional-style syntax at
> our lab, and he has found a number of minor problems in our definitions.
As a general rule, I would propose that we get in contact with the editors
of the CURIE syntax, i.e., Shane (shane@aptest.com) or Mark
(mark.birbeck@webBackplane.com). We should try to avoid deviating from
the CURIE CR.
http://www.w3.org/TR/curie
Your feedback comes from implementers, so this is exactly the type of
feedback they are looking for.
> I present below the problems, as well as the possible solutions. Most of the
> problems are caused by the syntax of CURIE, which is defined like this:
>
> curie := [[prefix] ":"] irelative-ref
> prefix := NCName
> NCName := defined by XML
> irelative-ref: defined by the IRI spec
>
>
> 1. The CURIE spec is not clear about whether the prefix, the ':', and the
> irelative-ref in a CURIE can be separated by whitespace. This makes parsing
> CURIEs such as a:b:c ambiguous, as it is not clear whether one means
> a:b :c
> or
> a :b:c.
>
> This problem could be solved if we made the 'curie' production a terminal and
> explicitly stated that there must be no spaces in it.
>
Isn't it the case that an NCName cannot contain whitespace? Then my reading
of the grammar above is that whitespace is _not_ allowed there...
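To illustrate this reading, here is a minimal Python sketch that treats the whole CURIE as a single whitespace-free terminal. It assumes an ASCII-only approximation of NCName and approximates the reference as any run of non-whitespace characters; neither is the exact definition from the CURIE or IRI specs, and the helper names are hypothetical. Under this reading, `a:b:c` is one CURIE with prefix `a` and reference `b:c`, whereas `a :b:c` is two separate tokens.

```python
import re

# ASCII-only approximation of an XML NCName (assumption; the real
# production also admits many non-ASCII characters).
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"

# The whole CURIE is one terminal: [[prefix] ":"] reference, with no
# whitespace anywhere. The reference is approximated as non-whitespace.
CURIE = re.compile(rf"(?:(?P<prefix>{NCNAME})?:)?(?P<reference>\S+)")


def split_curie(token):
    """Return (prefix, reference) for a whitespace-free CURIE token."""
    m = CURIE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a CURIE: {token!r}")
    return m.group("prefix"), m.group("reference")


def tokenize(text):
    """Whitespace separates tokens, so it can never occur inside a CURIE."""
    return [split_curie(token) for token in text.split()]


print(tokenize("a:b:c"))   # [('a', 'b:c')]
print(tokenize("a :b:c"))  # [(None, 'a'), (None, 'b:c')]
```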
>
> 2. We use @()^"=<>: as special characters in the spec -- that is, we use them as
> stand-alone terminals. Ideally, we'd want the other terminals not to contain
> these. This, however, is not the case: while NCName cannot contain any of these,
> irelative-ref can contain the characters "@=():". The latter is quite
> unfortunate: if you write
> abc)
> it is not clear whether the closing parenthesis is part of the irelative-ref or
> not. This prevents the functional-style syntax from being tokenized correctly.
>
> Another problem is that, because irelative-ref can contain ':', we cannot
> unambiguously parse the simple CURIE "a:b". One way of parsing it is as "a", ":",
> and "b", but another way is to parse it as a simple irelative-ref with the value
> "a:b".
>
> We could fix these problems by changing the spec such that, in contrast to the
> CURIE spec, we allow irelative-ref to be only an NCName. In this way, no CURIE can
> contain the dangerous characters, so we are fine. Furthermore, the grammar for
> CURIE becomes NCName ":" NCName, and, since NCName cannot contain ":", we can
> parse CURIEs correctly.
>
>
Ouch. I see the issue. This means that some valid URIs, like
http://www.w.w/#xpointer(id('a'))
(from http://www.w3.org/TR/xptr-framework/),
cannot be expressed as CURIEs in the FS. It is not a huge deal, of
course (we can always use explicit URIs), but it is still a bit of a pain.
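A minimal Python sketch of the tokenization problem in point 2, assuming a rough regex stand-in for irelative-ref (any characters except whitespace, `<`, `>` and `"`) and an ASCII-only NCName; the pattern and helper names are hypothetical. The loose pattern swallows the `)` in `abc)`; the proposed `[[NCName] ":"] NCName` pattern stops before it, but it also truncates `bla:xpointer(id('a'))`, which is exactly the cost noted above.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"  # ASCII-only approximation (assumption)

# Rough stand-in for a CURIE whose reference is an irelative-ref: any run of
# characters other than whitespace, '<', '>' and '"', so '(', ')', '@', '='
# and ':' are all allowed inside it.
LOOSE_CURIE = re.compile(r'[^\s<>"]+')

# Proposed restriction: [[NCName] ":"] NCName. NCName cannot contain ':', '(' or ')'.
STRICT_CURIE = re.compile(rf"(?:(?:{NCNAME})?:)?{NCNAME}")


def first_token(pattern, text):
    """Return the text the pattern matches at the start of the input."""
    m = pattern.match(text)
    return m.group() if m else None


print(first_token(LOOSE_CURIE, "abc)"))   # abc)  (the ')' is swallowed)
print(first_token(STRICT_CURIE, "abc)"))  # abc
print(first_token(STRICT_CURIE, "a:b"))   # a:b
print(first_token(STRICT_CURIE, "bla:xpointer(id('a'))"))  # bla:xpointer
```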
Just exploring an alternative: what if we modify the syntax to
disallow a reference without a prefix (i.e., without at least the ':')?
I.e., we could say:
curie := [prefix] ':' reference
What this means is that, given
Namespace(bla=http://www.w.w/#)
the string
bla:xpointer(id('a'))
is not a terminal because the prefix is there; neither is
:xpointer(id('afasd'))
because the leading ':' is there (and the default namespace is used);
and, finally,
xpointer(id('a'))
is a terminal because there is no prefix mechanism at all, i.e., it is not
a CURIE.
I believe that the CURIE spec should allow a host language to do that,
and I believe it does not at the moment. Maybe something to report back...
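The effect of this alternative can be sketched in Python as follows, assuming the same ASCII-only NCName approximation and treating the reference as any run of non-whitespace characters. The helper `classify` is hypothetical. Because the ':' is mandatory, a word without a colon after an optional prefix is never read as a CURIE.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"  # ASCII-only approximation (assumption)

# Alternative grammar: curie := [prefix] ':' reference; the ':' is mandatory.
# The reference is approximated as any run of non-whitespace characters.
CURIE = re.compile(rf"(?P<prefix>{NCNAME})?:(?P<reference>\S*)")


def classify(token):
    """Hypothetical helper: decide whether a whitespace-free token is a CURIE."""
    m = CURIE.fullmatch(token)
    if m:
        return ("curie", m.group("prefix"), m.group("reference"))
    # No ':' after an optional prefix, so this is not a CURIE at all.
    return ("bare", None, token)


for tok in ["bla:xpointer(id('a'))", ":xpointer(id('afasd'))",
            "xpointer(id('a'))", "ObjectUnionOf"]:
    print(classify(tok))
```

Note that, under this sketch, a reserved word such as `ObjectUnionOf` also comes out as a bare word rather than a CURIE.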
>
> 3. There is an ambiguity between CURIE and nodeID: the string
> _:abc
> can be parsed either as a single terminal matching the nodeID production, or as
> three terminals "_" ":" "abc" matching the CURIE production. (Note that _ is a
> valid NCName.)
>
> To fix this, in our version of the 'curie' production we should prevent a CURIE
> from starting with "_:". This is OK: the actual CURIE spec says that this type of
> usage can be disallowed in a host language and they explicitly mention RDF.
>
I am not sure I understand. In RDFa, for example, CURIEs of the form
'_:X' are used for blank nodes, which is in line with our definition of nodeID.
CURIE allows a specific host language to define the meaning of '_:', as we
want. So what exactly is the problem?
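One way a tokenizer could resolve the `_:abc` ambiguity from point 3 is to try nodeID before CURIE. This Python sketch assumes nodeID is `_:` followed by an NCName (a simplification), uses an ASCII-only NCName, and rejects any CURIE whose prefix is exactly `_`, as in the proposal above; the helper names are hypothetical.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"  # ASCII-only approximation (assumption)

# Simplified nodeID: "_:" followed by an NCName (assumption).
NODE_ID = re.compile(rf"_:{NCNAME}")
CURIE = re.compile(rf"(?:(?P<prefix>{NCNAME})?:)?(?P<reference>\S+)")


def classify(token):
    # Try nodeID first, so "_:abc" is never split into "_" ":" "abc".
    if NODE_ID.fullmatch(token):
        return "nodeID"
    m = CURIE.fullmatch(token)
    # Per the proposal, a CURIE must not start with "_:".
    if m is not None and m.group("prefix") != "_":
        return "curie"
    return "invalid"


print(classify("_:abc"))   # nodeID
print(classify("_x:abc"))  # curie
print(classify("_:a)b"))   # invalid
```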
>
> 4. There is a general problem with the fact that our reserved words match the
> 'curie' production; for example, "ObjectUnionOf" is a perfectly valid CURIE
> (even with the fixes outlined above). This is clearly a problem, as it means our
> grammar is not LL(1); for example, to parse
> ObjectUnionOf( abc )
> we need to look two tokens ahead (i.e., only after we see "(" do we know
> that we must have been in the production for "ObjectUnionOf"). Perhaps our
> grammar is such that, by increasing the lookahead, we can circumvent this
> problem; however, I am not sure of that, and this is a really sketchy solution
> that is very likely to cause problems in practice.
>
> We can avoid this problem by saying that the 'curie' production MUST NOT match
> one of the terminal symbols; that is, instead of using a CURIE that matches
> one of the terminals, one MUST spell out such a CURIE as a full IRI (which is
> enclosed in <> and is therefore fine).
>
Doesn't the approach of disallowing a reference without a prefix solve this
problem, too?
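A minimal Python sketch of the lexing side of point 4, assuming the lexer checks each bare word against the set of reserved words before treating it as a CURIE, and that a clashing CURIE is written as a full IRI in `<>`. The keyword set is an illustrative subset, the token shapes are simplifications, and the function `lex` is hypothetical. With reserved words as their own token kind, a parser can choose the production from the first token alone.

```python
import re

# Illustrative subset of functional-style reserved words (assumption).
KEYWORDS = {"Declaration", "Class", "ObjectUnionOf", "ObjectIntersectionOf"}

# One token: a parenthesis, a full IRI in <>, or a bare word.
TOKEN = re.compile(r'\s*(?:(?P<punct>[()])|(?P<iri><[^<>"\s]*>)|(?P<word>[^\s()<>]+))')


def lex(text):
    """Return a list of (kind, value) pairs; reserved words get their own kind."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        kind = m.lastgroup          # name of the alternative that matched
        value = m.group(kind)
        if kind == "punct":
            tokens.append((value, value))
        elif kind == "iri":
            tokens.append(("IRI", value))
        elif value in KEYWORDS:
            tokens.append(("KEYWORD", value))
        else:
            tokens.append(("CURIE", value))
        pos = m.end()
    return tokens


print(lex("ObjectUnionOf( abc <http://example.org/ObjectUnionOf> )"))
```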
>
> 5. It is currently unclear whether "quotedString" can contain CRLF. The current
> definition seems to allow this, but Yevgeny was confused. We could perhaps just
> add a clarification that says "yes, it is allowed".
>
>
Sure. Again, I would send this feedback to Shane and Mark.
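For point 5, a Python sketch assuming that quotedString is any sequence of characters enclosed in `"` in which `"` and `\` occur only in the escape pairs `\"` and `\\`. Under that assumed definition, CR and LF are ordinary characters and may appear inside the string. The helper name is hypothetical.

```python
import re

# Assumed definition: characters enclosed in '"', where '"' and '\' occur only
# as the pairs \" and \\. The class [^"\\] also matches CR and LF.
QUOTED_STRING = re.compile(r'"(?:[^"\\]|\\["\\])*"')


def parse_quoted_string(text):
    """Return the unescaped contents of a quotedString literal."""
    if QUOTED_STRING.fullmatch(text) is None:
        raise ValueError(f"not a quotedString: {text!r}")
    # Undo the two escape pairs: \" -> " and \\ -> \
    return re.sub(r'\\(["\\])', r"\1", text[1:-1])


print(repr(parse_quoted_string('"line one\r\nline two"')))  # 'line one\r\nline two'
```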
Cheers
Ivan
> Please let me know how you feel about my proposals.
>
> Regards,
>
> Boris
>
>
>
--
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Sunday, 22 March 2009 08:59:56 UTC