Fwd: Comment on Path / PName clash and Turtle impact

This comment to the SPARQL-WG is also about wanting unescaped / in 
Turtle prefix names.

 Andy

On 18/02/12 02:43, David Robillard wrote:
> Hello,
>
> Apologies for sending this past the Last Call, but I have a comment
> about the decision to combine PNames and Property Paths in SPARQL and
> escaping PNames to resolve the problems this causes.
>
> My perspective is mainly that of a Turtle user/implementer.  I
> discovered this issue updating my Turtle implementation[1] for the
> latest spec.  I discovered that an odd new rule has been added to the
> grammar:
>
> [163s] PN_LOCAL_ESC ::= '\\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' |
> "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' | '?' | '#' |
> '@' | '%' )
>
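To make the cost of the rule quoted above concrete, here is a minimal Python sketch of what a serializer would have to do to a local name. The names `PN_LOCAL_ESC_CHARS` and `escape_pn_local` are hypothetical, and the sketch assumes the simplest policy of escaping every character the rule lists; a real serializer would probably leave alone those characters that the rest of the PName grammar already allows unescaped.

```python
# Characters listed in the PN_LOCAL_ESC rule quoted above (21 in total).
PN_LOCAL_ESC_CHARS = set("_~.-!$&'()*+,;=:/?#@%")


def escape_pn_local(local: str) -> str:
    """Backslash-escape every character of `local` that the rule lists.

    Assumption: escape everything in the list, even where the grammar
    might accept the bare character.
    """
    return "".join(
        "\\" + ch if ch in PN_LOCAL_ESC_CHARS else ch
        for ch in local
    )


if __name__ == "__main__":
    print(escape_pn_local("foo/bar/baz"))
    print(escape_pn_local("terms/a+b"))
```
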
> Unhappy with how ugly this is, and puzzled why such a specific, seemingly
> arbitrary set of characters has been introduced as escapes in PNames, I
> investigated.  It turns out this is from SPARQL, and the escapes are to
> avoid clashing with Property Paths (hereafter just "paths").
>
> This seems like a problem to me: the Turtle specification now has a
> strange and unpleasant grammar rule from a different specification, to
> mesh with a concept that is meaningless in the context of a Turtle
> document.  I do agree, though, that copy/paste compatibility between
> statements in both languages is highly desirable.
>
> My main point is about the method: I think escaping is a very poor way
> of achieving this, and quotation is more appropriate.  Either Paths, or
> PNames, should be quoted, or have a special leading character, to remove
> this ambiguity.
>
> Some cons of the current escaping scheme:
>
> * Escaping is ugly, and difficult to work with.  Paths that include
> pnames with special characters are difficult to read.
>
> * Copying from other data sources that use these characters is
> difficult, so much so that expecting a user to do this manually (i.e.
> escape every character in the above list) is unrealistic and
> error-prone.
>
> * This effectively prevents future revisions of SPARQL from adding
> anything to the path syntax.  If both of these specs become
> recommendations, then Turtle (and the corresponding rules in SPARQL
> itself) will have baked-in escapes specifically to work around path
> syntax.  None can be added, because this will break the rules for
> PNames, in both SPARQL and Turtle.
>
> * The very existence of escaping implies there is a need to express
> these characters in PNames.  However, this has been made tedious and
> ugly to accommodate paths.  In my opinion, this is somewhat backwards.
> Both languages should have a clean PName syntax.  Paths are a different
> thing, and should be clearly designated as such.  Put another way,
> property paths are not pnames, and crippling the pname syntax for paths
> is a poor design when there are very simple alternative ways of
> differentiating the two.
>
> Some pros of quoting, rather than escaping:
>
> * Much easier to read.  Even in a purely SPARQL context, ignoring
> Turtle, having a path be very clearly delineated is much simpler to read
> than navigating a mess of escapes and trying to mentally parse what is
> going on.
>
> * Turtle is not 'infected' by this SPARQL-specific grammar
> consideration, and both can use a simpler, more expressive, and more
> friendly PName grammar.  SPARQL is not 'locked in' forevermore and is
> free to update the path syntax in the future.
>
> * Copy/paste compatibility with other data sources is much simpler,
> since quoting is easy, unlike escaping.  It is also less error-prone,
> since only the quote character needs special consideration.
>
> * The grammars become cleaner, since Path rules and PName rules are
> clearly distinct (though the former would refer to the latter).  The
> PName rules do not need to take into consideration every character used
> in the Path syntax, which is crucial since the PName rules must be in
> Turtle as well.  The current PName rule is a symptom that different
> types of tokens have not been properly distinguished.
>
> * The PName rules would be far more (possibly entirely) compatible with
> CURIEs, rather than extremely SPARQL-specific.
>
> I am not sure exactly what to suggest in terms of syntax.  It seems most
> in line with existing practice to not quote 'top-level' PNames, but
> rather quote paths somehow.  This resolves the Turtle problems, but does
> not resolve issues with PNames inside paths.  Here, it seems quoting is
> best.  One proposal: paths always have a leading '/', and PNames within
> paths are quoted with '[' and ']' (as in the CURIE spec).  Thus, the
> example:
>
> ?x foaf:knows/foaf:name ?name .
>
> Would become:
>
> ?x /[foaf:knows]/[foaf:name] ?name .
>
> The quoting means the PNames are free to contain extended characters,
> e.g. rather than the unwieldy:
>
> ?x eg:foo\/bar\/baz/eg:terms\/a\+b ?b .
>
> You would have:
>
> ?x /[eg:foo/bar/baz]/[eg:terms/a+b] ?b .
>
> Importantly, no quoting of PNames in any other context is necessary, and
> no escaping of PNames is necessary at all, which is a significant win
> for "copy-paste compatibility" (quoting could also be optional in
> paths).
>
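The bracket-quoted form proposed above is easy to take apart mechanically. The following Python sketch is only an illustration under stated assumptions: it handles sequence paths of the shape /[p1]/[p2]... and nothing else (no alternatives, inverses, or repetition operators), it assumes the quoting is mandatory rather than optional, and the names `STEP` and `split_bracketed_path` are hypothetical.

```python
import re
from typing import List

# Assumed shape of one path step in the proposal: "/" followed by "[pname]".
# The PName itself may contain anything except the bracket characters.
STEP = re.compile(r"/\[([^\[\]]+)\]")


def split_bracketed_path(path: str) -> List[str]:
    """Return the PNames of a path written as /[p1]/[p2]...

    Raises ValueError if the input is empty or contains anything
    other than consecutive bracketed steps.
    """
    steps: List[str] = []
    pos = 0
    while pos < len(path):
        match = STEP.match(path, pos)
        if match is None:
            raise ValueError(f"not a bracketed step at offset {pos}: {path!r}")
        steps.append(match.group(1))
        pos = match.end()
    if not steps:
        raise ValueError("empty path")
    return steps


if __name__ == "__main__":
    print(split_bracketed_path("/[eg:foo/bar/baz]/[eg:terms/a+b]"))
```

Note that the '/' and '+' inside the brackets need no escaping, because only the bracket characters are special within a step.
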
> The prefix character is analogous to the '?' used for variables.  This
> works well, and is very simple, since a token that starts with a '?' is
> clearly a variable, and there is no clashing.  Paths (indeed, any new
> kind of token) should be similarly simple to distinguish.  A token that
> starts with a '?' is a variable.  A token that starts with a '/' is a
> property path.  Simple, consistent, extensible.
>
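The leading-character rule described above amounts to a one-character lookahead. A minimal Python sketch, assuming only the three token kinds discussed in this message (a real SPARQL tokenizer also has to recognise IRIs in angle brackets, literals, blank nodes, keywords and so on); the function name `classify_token` is hypothetical:

```python
def classify_token(token: str) -> str:
    """Classify a token by its first character, as in the proposal above.

    Assumption: the only kinds are variables ('?'), paths ('/'),
    and PNames (everything else).
    """
    if token.startswith("?"):
        return "variable"
    if token.startswith("/"):
        return "path"
    return "pname"


if __name__ == "__main__":
    for tok in ["?x", "/[foaf:knows]/[foaf:name]", "eg:foo/bar/baz"]:
        print(tok, "->", classify_token(tok))
```

Under this scheme a '/' in the middle of a token, as in eg:foo/bar/baz, never changes the token's kind; only the first character matters.
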
> Note these are just off-the-cuff examples; I have not thought much about
> the best syntax.  Leading slash for paths and [] quoting as above may
> not be the best choices for whatever reason; I am more interested in
> highlighting the problem first.  If quoting in paths is not popular, I
> wouldn't mind escaping *only in paths* - at least that doesn't wreck
> Turtle.
>
> In my opinion, this is a very serious issue.  I have a strong aversion
> to implementing these PName escapes in Turtle, and consider it an
> outright error.  Again, apologies for being late, but a more palatable
> resolution to this problem would be a significant improvement, and
> prevent future problems.
>
> Thanks,
>
> -dr
>
> [1] http://drobilla.net/software/serd/
>
>

Received on Tuesday, 21 February 2012 19:03:17 UTC