Re: Several minor problems in the grammar for the functional-style syntax

Boris Motik wrote:
> Hello,
> 
> Yevgeny Kazakov is currently trying to implement the functional-style syntax at
> our lab, and he has found a number of minor problems in our definitions. 

As a general rule, I would propose getting in contact with the editors
of the CURIE syntax, i.e., Shane (shane@aptest.com) or Mark
(mark.birbeck@webBackplane.com). We should try to avoid deviating from
the CURIE CR.

http://www.w3.org/TR/curie

Your feedback comes from implementers, so this is exactly the type of
feedback they are looking for.

> I present below the problems, as well as the possible solutions. Most of the
> problems are caused by the syntax of CURIE, which is defined like this:
> 
> curie := [[prefix] ":"] irelative-ref
> prefix := NCName
> NCName := defined by XML
> irelative-ref: defined by the IRI spec
>  
> 
> 1. The CURIE spec is not clear regarding whether the prefix, :, and the
> irelative-ref in a CURIE can be separated by a whitespace. This makes parsing
> CURIEs such as a:b:c ambiguous, as it is not clear whether one means
>     a:b :c
> or
>     a :b:c.
> 
> This problem could be solved if we made the 'curie' production a terminal and
> explicitly state that there should be no spaces in it.
> 

Isn't it correct that NCName cannot contain whitespace? Then my reading
of the grammar above is that whitespace is _not_ allowed there...
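
To make the "single terminal, no whitespace" reading concrete, here is a
minimal Python sketch. It rests on assumptions that are not in the spec:
NCName is approximated by an ASCII-only pattern, irelative-ref is crudely
replaced by "any run of non-whitespace characters", and the split is made
at the first colon. The names NCNAME, CURIE, split_curie and tokenize are
hypothetical.

```python
import re

# Assumption: ASCII-only approximation of NCName; the XML production is wider.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Assumption: stand-in for irelative-ref, any run of non-whitespace characters.
# curie := [[prefix] ":"] reference, treated as ONE terminal without spaces.
CURIE = re.compile(rf"(?:(?P<prefix>{NCNAME})?:)?(?P<reference>\S+)")


def split_curie(token):
    """Return (prefix, reference) for a whitespace-free token, or None."""
    match = CURIE.fullmatch(token)
    if match is None:
        return None
    return match.group("prefix"), match.group("reference")


def tokenize(text):
    """Whitespace separates terminals, so it can never occur inside a CURIE."""
    return [split_curie(token) for token in text.split()]


print(tokenize("a:b:c"))   # one terminal
print(tokenize("a:b :c"))  # two terminals
print(tokenize("a :b:c"))  # two terminals, split differently
```

Under these assumptions the three inputs are three different token
sequences, so whitespace decides the reading. Note that the sketch prefers
the prefixed reading of "a:b:c"; that preference is a choice made here, not
something the grammar above states.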

> 
> 2. We use @()^"=<>: as special characters in the spec -- that is, we use them as
> stand-alone terminals. Ideally, we'd want the other terminals not to contain
> these. This, however, is not the case: while NCName cannot contain any of these,
> irelative-ref can contain the characters "@=():". The latter is quite
> unfortunate: if you write 
>    abc)
> it is not clear whether the closing parenthesis is part of the irelative-ref or
> not. This prevents the functional-style syntax from being tokenized correctly.
> 
> Another problem is that, because irelative-ref can contain :, we cannot
> unambiguously parse the simple CURIE "a:b". One way of parsing it is as "a", ":",
> and "b", but another way is to parse it as a simple irelative-ref with the value
> "a:b".
> 
> We could fix these problems by changing the spec such that, in contrast to the
> CURIE spec, we allow irelative-ref to be only NCName. In this way, no CURIE can
> contain the dangerous characters, so we are fine. Furthermore, the grammar for
> CURIE becomes NCName ":" NCName, and, since NCName cannot contain ":", we can
> parse CURIEs correctly.
> 
> 

Ouch. I see the issue. This means that some valid URI-s like

http://www.w.w/#xpointer(id('a'))

(from http://www.w3.org/TR/xptr-framework/)

cannot be expressed as CURIEs in the FS. It is not a huge deal, of
course (we can always use explicit URI-s) but it is still a bit of a pain.

Just exploring an alternative: what if we modify the syntax to disallow
a bare reference, i.e., one with neither a prefix nor a leading ':'? We
could say:

curie := [prefix] ':' reference

What this means is that

Namespace(bla=http://www.w.w/#)
bla:xpointer(id('a'))

is not a terminal because the prefix is there; neither is

:xpointer(id('afasd'))

because the leading ':' is there (and the default namespace is used)
and, finally,

xpointer(id('a'))

is a terminal because there is no prefix mechanism at all, i.e., it is not
a CURIE.

I believe that the CURIE spec should allow a host language to do that
and, as far as I can tell, it does not at the moment. Maybe something to
report back...
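
The three cases can be checked with a small Python sketch of the
"colon is mandatory" variant. Assumptions that are mine and not from any
spec: tokens are already separated by whitespace, NCName is approximated
by an ASCII-only pattern, and the reference is any run of non-whitespace
characters. The function name parse_curie and the default namespace
http://example.org/# are hypothetical.

```python
import re

# Assumption: ASCII-only approximation of NCName; the XML production is wider.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# curie := [prefix] ':' reference   (prefix optional, colon mandatory)
CURIE = re.compile(rf"(?P<prefix>{NCNAME})?:(?P<reference>\S+)")


def parse_curie(token, namespaces, default_ns):
    """Expand a CURIE to a full URI; return None if the token is not a CURIE.

    An undeclared prefix raises KeyError.
    """
    match = CURIE.fullmatch(token)
    if match is None:
        return None
    prefix = match.group("prefix")
    base = default_ns if prefix is None else namespaces[prefix]
    return base + match.group("reference")


namespaces = {"bla": "http://www.w.w/#"}   # Namespace(bla=http://www.w.w/#)
default_ns = "http://example.org/#"        # hypothetical default namespace

print(parse_curie("bla:xpointer(id('a'))", namespaces, default_ns))
print(parse_curie(":xpointer(id('afasd'))", namespaces, default_ns))
print(parse_curie("xpointer(id('a'))", namespaces, default_ns))
```

The first two tokens expand to URIs and the third is rejected as a CURIE.
The sketch does not address where a reference ends when it is followed
directly by ")"; it only shows how the mandatory colon separates CURIEs
from everything else.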

> 
> 3. There is an ambiguity between CURIE and nodeID: the string
>     _:abc
> can be parsed either as a single terminal matching the nodeID production, or as
> three terminals "_" ":" "abc" matching the CURIE production. (Note that _ is a
> valid NCName.)
> 
> To fix this, in our version of the 'curie' production we should prevent a CURIE
> from starting with "_:". This is OK: the actual CURIE spec says that this type of
> usage can be disallowed in a host language and they explicitly mention RDF.
> 

I am not sure I understand. In RDFa, for example, the curie production
'_:X' is used for BNodes which is in line with our definition of nodeID.
CURIE allows the definition of '_:' in a specific host language as we
want. So what is the problem exactly?
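
For what it is worth, here is how I picture the host language reserving
'_:' in a tokenizer, as a minimal Python sketch. The assumptions are mine:
nodeID is taken to be "_:" followed by an NCName, NCName is approximated
by an ASCII-only pattern, and the CURIE reference is restricted to NCName
as proposed under point 2. The names NODE_ID, CURIE and classify are
hypothetical.

```python
import re

# Assumption: ASCII-only approximation of NCName; the XML production is wider.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Assumption: nodeID is "_:" followed by an NCName.
NODE_ID = re.compile(rf"_:{NCNAME}")

# Restricted CURIE from point 2: optional prefix and colon, NCName reference.
CURIE = re.compile(rf"(?:(?:{NCNAME})?:)?{NCNAME}")


def classify(token):
    """Test nodeID first, so the prefix "_" is never read as a CURIE prefix."""
    if NODE_ID.fullmatch(token):
        return "nodeID"
    if CURIE.fullmatch(token):
        return "curie"
    return "other"


print(classify("_:abc"))   # reserved form
print(classify("a:abc"))   # ordinary CURIE
print(classify("_x:abc"))  # prefix merely starts with "_"
```

Both patterns match "_:abc", which is the overlap described in the quoted
text; the sketch resolves it purely by the order of the two tests.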

> 
> 4. There is a general problem with the fact that our reserved words match the
> 'curie' production; for example, "ObjectUnionOf" is a perfectly valid CURIE
> (even with the fixes outlined above). This is clearly a problem, as it makes our
> grammar not be LL(1); for example, to parse
>     ObjectUnionOf( abc )
> we need to look two tokens down the line (i.e., only after you see "(" we know
> that we must have been in the production for "ObjectUnionOf"). Perhaps our
> grammar is such that, by increasing the lookahead, we can circumvent this
> problem; however, I am not sure of that, and this is a really sketchy solution
> that is very likely to cause problems in practice.
> 
> We can avoid this problem by saying that the 'curie' production MUST NOT match
> one of the terminal symbols; that is, instead of using a CURIE that matches to
> one of the terminals, one MUST spell out such CURIE as a full IRI (which is
> enclosed in <> and is therefore fine).
> 

Doesn't the approach of disallowing a bare reference (one without the
':') solve this problem, too?
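
A minimal Python sketch of why I think so, under assumptions of my own:
NCName is approximated by an ASCII-only pattern and the reference is
restricted to NCName as in point 2. The names OPTIONAL_COLON,
MANDATORY_COLON and clashes are hypothetical.

```python
import re

# Assumption: ASCII-only approximation of NCName; the XML production is wider.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# curie := [[prefix] ':'] reference   (a bare reference is a valid CURIE)
OPTIONAL_COLON = re.compile(rf"(?:(?:{NCNAME})?:)?{NCNAME}")

# curie := [prefix] ':' reference     (the colon is mandatory)
MANDATORY_COLON = re.compile(rf"(?:{NCNAME})?:{NCNAME}")


def clashes(keyword, curie_pattern):
    """True if the reserved word is also a valid CURIE under the grammar."""
    return curie_pattern.fullmatch(keyword) is not None


print(clashes("ObjectUnionOf", OPTIONAL_COLON))
print(clashes("ObjectUnionOf", MANDATORY_COLON))
print(clashes(":ObjectUnionOf", MANDATORY_COLON))
```

With the mandatory colon the reserved word itself is never a CURIE, so one
token is enough to tell them apart, while a name that happens to coincide
with a reserved word can still be written as ":ObjectUnionOf".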


> 
> 5. It is currently unclear whether "quotedString" can contain CRLF. The current
> definition seems to allow this, but Yevgeny was confused. We could perhaps just
> add a clarification that says "yes, it is allowed".
> 
> 

Sure. Again, I would send this feedback to Shane and Mark.

Cheers

Ivan

> Please let me know how you feel about my proposals.
>  
> Regards,
> 
>  Boris
> 
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Sunday, 22 March 2009 08:59:56 UTC