Re: turtle conformance clause / strict-vs-loose parsing

 >     The document does sort of say that IRIs must be valid IRIs, as does
 >     rdf-concepts so it is a matter of how prominently to say it.
 >
 >     Turtle:
 >     [[
 >     6.2 RDF Term Constructors
 >
 >     production      type
 >     IRIREF          IRI
 >
 >     The characters between "<" and ">" are unescaped¹ to form the
 >     unicode string of the IRI.
 >     ]]
 >
 >     so it says IRIREF produces an IRI and hence conformance checking is
 >     done.  It's prominent though.
 >
 >
 > Did you mean to say it's NOT prominent? At any rate, I think it's
 > debatable. That passage could also be read to mean that the only thing
 > required to produce an IRI is unescaping the stuff between '<' and '>'.
 >
 > -Alex

Sorry, yes, I did mean "not prominent", and yes, the exact meaning is not
immediately clear.  I was just finding a place where it seems to rule
out non-IRIs.
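
To make Alex's reading concrete, here is a minimal sketch of only the step the quoted 6.2 text describes: strip the angle brackets and unescape \u / \U sequences to get the Unicode string. The function name iriref_to_string is hypothetical, and nothing in it checks the result against RFC 3987. That gap is exactly why the passage alone does not obviously rule out non-IRIs.

```python
import re

# UCHAR escapes as written in Turtle: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def iriref_to_string(lexeme: str) -> str:
    """Hypothetical helper: strip '<' '>' and unescape UCHARs.

    This performs only the unescaping step; it does NOT validate the
    result as an IRI.
    """
    if not (lexeme.startswith("<") and lexeme.endswith(">")):
        raise ValueError("not an IRIREF lexeme: %r" % lexeme)
    body = lexeme[1:-1]
    # Replace each escape with the code point it names.
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), body)
```

For example, iriref_to_string(r"<http://example.org/caf\u00E9>") yields "http://example.org/café". However, iriref_to_string("<not an iri>") just as happily yields "not an iri", because unescaping by itself accepts anything.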

>     That makes sense.   So what's the point of the IRIREF pattern being
>     something more complex than /<[^ \t\n\r>]*>/ ?    (Or even /<[^>]*>/, or
>     -- if you have the nongreedy operator --  just /<.*?>/)
>
>
> I think the grammar as written is a happy compromise of rejecting input that is obviously not an IRI since it contains illegal characters, without introducing the full-blown complexity of RFC3987. Keeping in mind that not all environments will have access to an IRI library, I don't think it's appropriate to allow absolutely everything within the <> brackets.

+1

We had this debate in SPARQL 1.0, and the decision was exactly that
point: at least reject impossible characters in the grammar token rule.
And now, it enforces the \u rules in the tokenizer, which is good
at such things.
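
A rough sketch of the difference between the loose /<[^>]*>/ pattern and a token rule that rejects impossible characters and enforces the \u rules. The strict pattern below is transcribed from the IRIREF production in the published Turtle Recommendation, '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'. That text postdates this thread, so the 2012 draft under discussion may have differed in detail. The names STRICT, LOOSE, strict_ok and loose_ok are illustrative only.

```python
import re

# Loose rule: anything up to the first '>'.
LOOSE = re.compile(r"<[^>]*>")

# Token rule modelled on Turtle's IRIREF: no controls/space, none of <>"{}|^`\,
# and a backslash is only allowed as a well-formed \uXXXX or \UXXXXXXXX escape.
STRICT = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)


def loose_ok(lexeme: str) -> bool:
    return LOOSE.fullmatch(lexeme) is not None


def strict_ok(lexeme: str) -> bool:
    return STRICT.fullmatch(lexeme) is not None
```

Both accept <http://example.org/a>. Only the loose rule accepts <http://example.org/a b> or a malformed escape such as \u00G9. The strict rule is still far short of RFC 3987: it accepts <::::>, which is not a valid IRI reference. That is the compromise described above, where the tokenizer rejects obviously impossible input without needing an IRI library.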


 Andy

Received on Friday, 18 May 2012 14:35:12 UTC