- From: David Robillard <d@drobilla.net>
- Date: Fri, 17 Feb 2012 21:43:38 -0500
- To: public-rdf-dawg-comments@w3.org
Hello,

Apologies for sending this past the Last Call, but I have a comment about the decision to combine PNames and Property Paths in SPARQL and escaping PNames to resolve the problems this causes. My perspective is mainly that of a Turtle user/implementer. I discovered this issue updating my Turtle implementation[1] for the latest spec.

I discovered that an odd new rule has been added to the grammar:

    [163s] PN_LOCAL_ESC ::= '\\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' | '?' | '#' | '@' | '%' )

Unhappy with how ugly this is, and puzzled why such a specific seemingly arbitrary set of characters has been introduced as escapes in PNames, I investigated. It turns out this is from SPARQL, and the escapes are to avoid clashing with Property Paths (hereafter just "paths").

This seems like a problem to me: the Turtle specification now has a strange and unpleasant grammar rule from a different specification, to mesh with a concept that is meaningless in the context of a Turtle document.

I do agree, though, that copy/paste compatibility between statements in both languages is highly desirable. My main point is about the method: I think escaping is a very poor way of achieving this, and quotation is more appropriate. Either Paths, or PNames, should be quoted, or have a special leading character, to remove this ambiguity.

Some cons of the current escaping scheme:

* Escaping is ugly, and difficult to work with. Paths that include pnames with special characters are difficult to read.

* Copying from other data sources that use these characters is difficult, so much so that expecting a user to manually do this (i.e. escape every character in the above list) is not realistic, and error-prone.

* This effectively prevents future revisions of SPARQL from adding anything to the path syntax.
If both of these specs become recommendations, then Turtle (and the corresponding rules in SPARQL itself) will have baked-in escapes specifically to work around path syntax. None can be added, because this will break the rules for PNames, in both SPARQL and Turtle.

* The very existence of escaping implies there is a need to express these characters in PNames. However, this has been made tedious and ugly to accommodate paths. In my opinion, this is somewhat backwards. Both languages should have a clean PName syntax. Paths are a different thing, and should be clearly designated as such.

Put another way, property paths are not pnames, and crippling the pname syntax for paths is a poor design when there are very simple alternative ways of differentiating the two.

Some pros of quoting, rather than escaping:

* Much easier to read. Even in a purely SPARQL context, ignoring Turtle, having a path be very clearly delineated is much simpler to read than navigating a mess of escapes and trying to mentally parse what is going on.

* Turtle is not 'infected' by this SPARQL-specific grammar consideration, and both can use a simpler, more expressive, and more friendly PName grammar. SPARQL is not 'locked in' forevermore and is free to update the path syntax in the future.

* Copy/paste compatibility with other data sources is much simpler, since quoting is easy, unlike escaping. It is also less error-prone, since only the quote character needs special consideration.

* The grammars become cleaner, since Path rules and PName rules are clearly distinct (though the former would refer to the latter). The PName rules do not need to take into consideration every character used in the Path syntax, which is crucial since the PName rules must be in Turtle as well. The current PName rule is a symptom that different types of tokens have not been properly distinguished.

* The PName rules would be far more (possibly entirely) compatible with CURIEs, rather than extremely SPARQL-specific.
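To make the manual escaping step concrete, here is a minimal Python sketch of what a user or tool would have to do to a local name under the PN_LOCAL_ESC rule quoted earlier. The names `PN_LOCAL_ESC_CHARS` and `escape_local` are hypothetical, and the sketch assumes the simplest policy of escaping every listed character; the full grammar may allow some of them unescaped in certain positions, which this ignores.

```python
# Characters that the PN_LOCAL_ESC rule ([163s], quoted earlier) allows after a backslash.
PN_LOCAL_ESC_CHARS = set("_~.-!$&'()*+,;=:/?#@%")


def escape_local(local: str) -> str:
    """Backslash-escape every character of `local` listed in PN_LOCAL_ESC.

    Assumption: escape everything in the list, even where the grammar
    might accept the character unescaped.
    """
    return "".join("\\" + ch if ch in PN_LOCAL_ESC_CHARS else ch for ch in local)


if __name__ == "__main__":
    print(escape_local("foo/bar/baz"))  # prints foo\/bar\/baz
    print(escape_local("terms/a+b"))    # prints terms\/a\+b
```

Every character in the set needs this treatment each time data is copied in, which is the burden the cons above describe.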
I am not sure exactly what to suggest in terms of syntax. It seems most in-line with existing practice to not quote 'top-level' PNames, but rather quote paths somehow. This resolves the Turtle problems, but does not resolve issues with PNames inside paths. Here, it seems quoting is best.

One proposal: paths always have a leading '/', and PNames within paths are quoted with '[' and ']' (as in the CURIE spec). Thus, the example:

    ?x foaf:knows/foaf:name ?name .

Would become:

    ?x /[foaf:knows]/[foaf:name] ?name .

The quoting means the PNames are free to contain extended characters, e.g. rather than the unwieldy:

    ?x eg:foo\/bar\/baz/eg:terms\/a\+b ?b .

You would have:

    ?x /[eg:foo/bar/baz]/[eg:terms/a+b] ?b .

Importantly, no quoting of PNames in any other context is necessary, and no escaping of PNames is necessary at all, which is a significant win for "copy-paste compatibility" (quoting could also be optional in paths).

The prefix character is analogous to the '?' used for variables. This works well, and is very simple, since a token that starts with a '?' is clearly a variable, and there is no clashing. Paths (indeed, any new kind of token) should be similarly simple to distinguish. A token that starts with a '?' is a variable. A token that starts with a '/' is a property path. Simple, consistent, extensible.

Note these are just off-the-cuff examples, I have not thought much about the best syntax. Leading slash for paths and [] quoting as above may not be the best choices for whatever reason; I am more interested in highlighting the problem first. If quoting in paths is not popular, I wouldn't mind escaping *only in paths* - at least that doesn't wreck Turtle.

In my opinion, this is a very serious issue. I have a strong aversion to implementing these PName escapes in Turtle, and consider it an outright error. Again, apologies for being late, but a more palatable resolution to this problem would be a significant improvement, and prevent future problems.
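For illustration only, the leading '/' and '[' ']' proposal above could be recognized along the following lines. This is a minimal Python sketch, not a worked-out grammar: the function name `split_bracketed_path` is hypothetical, and it handles only '/'-separated sequences of bracketed PNames, with no other path operators and no ']' inside a PName.

```python
def split_bracketed_path(token: str) -> list:
    """Split a path such as '/[foaf:knows]/[foaf:name]' into its PNames.

    Assumptions: every step has the form '/[' PName ']', and a PName
    never contains ']'. Anything else raises ValueError.
    """
    if not token.startswith("/"):
        raise ValueError("not a path: missing leading '/'")
    steps = []
    i = 0
    while i < len(token):
        # Each step must open with '/['.
        if token[i:i + 2] != "/[":
            raise ValueError(f"expected '/[' at offset {i}")
        end = token.find("]", i + 2)
        if end == -1:
            raise ValueError("unterminated '[' in path")
        # Everything between the brackets is taken verbatim, no unescaping.
        steps.append(token[i + 2:end])
        i = end + 1
    return steps


if __name__ == "__main__":
    print(split_bracketed_path("/[foaf:knows]/[foaf:name]"))
    print(split_bracketed_path("/[eg:foo/bar/baz]/[eg:terms/a+b]"))
```

Because the brackets delimit each PName, the '/' and '+' inside `eg:foo/bar/baz` and `eg:terms/a+b` need no escaping in this sketch.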
Thanks,

-dr

[1] http://drobilla.net/software/serd/
Received on Saturday, 18 February 2012 02:44:06 UTC