- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 18 May 2012 15:34:10 +0100
- To: Alex Hall <alexhall@revelytix.com>
- CC: public-rdf-wg@w3.org
> The document does sort of say that IRIs must be valid IRIs, as does
> rdf-concepts so it is a matter of how prominently to say it.
>
> Turtle:
> [[
> 6.2 RDF Term Constructors
>
> production   type
> IRIREF       IRI
>
> The characters between "<" and ">" are unescaped¹ to form the
> unicode string of the IRI.
> ]]
>
> so it says IRIREF produces an IRI and hence conformance checking is
> done. It's prominent though.
>
>
> Did you mean to say it's NOT prominent? At any rate, I think it's
> debatable. That passage could also be read to mean that the only thing
> required to produce an IRI is unescaping the stuff between '<' and '>'.
>
> -Alex

Sorry - yes, I did mean "not prominent", and yes, the exact meaning is not
immediately clear. I was just finding a place where it seems to rule out
non-IRIs.

> That makes sense. So what's the point of the IRIREF pattern being
> something more complex than /<[^ \t\n\r>]*>/ ? (Or even /<[^>]*>/, or
> -- if you have the nongreedy operator -- just /<.*?>/)
>
>
> I think the grammar as written is a happy compromise of rejecting input
> that is obviously not an IRI since it contains illegal characters,
> without introducing the full-blown complexity of RFC3987. Keeping in
> mind that not all environments will have access to an IRI library, I
> don't think it's appropriate to allow absolutely everything within the
> <> brackets.

+1

We had this debate in SPARQL 1.0, and exactly that point, of at least
rejecting impossible characters in the grammar token rule, was the
decision.

And now it enforces the \u rules in the tokenizer, which is good at such
things.

    Andy
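
As a rough illustration of the compromise discussed above, the sketch below
contrasts a token rule that rejects impossible characters with the fully
permissive /<[^>]*>/ pattern. It is only a sketch under assumptions: the
character class is taken from the IRIREF production as it appears in the
later Turtle Recommendation (the draft under discussion may have differed),
and the names IRIREF, PERMISSIVE and iriref_to_string are made up for this
example. It does not attempt RFC 3987 validation.

```python
import re

# \uXXXX and \UXXXXXXXX numeric escapes (UCHAR in the Turtle grammar).
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"

# Assumed token rule: '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
# It rejects characters that can never appear in an IRI, nothing more.
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*>')

# The permissive alternative raised in the thread.
PERMISSIVE = re.compile(r"<[^>]*>")


def iriref_to_string(token: str) -> str:
    """Check the token rule, then unescape the text between '<' and '>'.

    Hypothetical helper: a real parser would go on to resolve the result
    against a base IRI and, where an IRI library is available, validate it.
    """
    if IRIREF.fullmatch(token) is None:
        raise ValueError(f"not an IRIREF token: {token!r}")
    body = token[1:-1]
    # Replace each numeric escape with the code point it names.
    return re.sub(UCHAR, lambda m: chr(int(m.group(0)[2:], 16)), body)


if __name__ == "__main__":
    print(iriref_to_string("<http://example.org/a>"))
    # A space is accepted by the permissive pattern but not by the token rule.
    print(PERMISSIVE.fullmatch("<http://example.org/a b>") is not None)
    print(IRIREF.fullmatch("<http://example.org/a b>") is not None)
    # Passes the token rule even though a full RFC 3987 check would be
    # expected to reject it: the token rule is deliberately not a validator.
    print(IRIREF.fullmatch("<http://[bad>") is not None)
```

The token rule stays a single regular expression, so it can live in the
tokenizer alongside the \u handling, while anything stricter is left to a
later stage for environments that have an IRI library.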
Received on Friday, 18 May 2012 14:35:12 UTC