- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 18 May 2012 14:46:48 +0100
- To: public-rdf-wg@w3.org
On 18/05/12 14:30, Alex Hall wrote:
> On Fri, May 18, 2012 at 6:20 AM, Richard Cyganiak <richard@cyganiak.de>
> wrote:
>
> > Sandro,
> >
> > -1 to a “loose Turtle”.
> >
> > If a conforming Turtle parser were allowed to accept a document
> > containing <http://example.org/a|b>, then what next? This is not a
> > valid IRI. So it is not allowed in an RDF graph. A Turtle parser is
> > rarely a stand-alone system; it's a component in a larger system.
> > Once the Turtle parser tries passing on the pseudo-IRI to the next
> > component, then a number of things can happen:
> >
> > The next component might reject it outright.
> >
> > Or the next component accepts it and stores the pseudo-IRI. Then the
> > user can do their thing. Then when the user tries to save their
> > work, the serializer checks IRIs and rejects it, taking down the app
> > with an error message. (This is Jena's default behaviour, or at
> > least was the last time I checked.)
> >
> > Or maybe the entire system works, except that now we have a
> > situation where certain RDF “graphs” can be loaded and saved in
> > Turtle but not in other syntaxes. This will cause major headaches
> > for users, who will end up messing around with format converters in
> > order to get broken data into a format that doesn't complain about
> > the data being broken.
> >
> > Or maybe the system accepts the IRI and puts it into its store, but
> > then you can't delete it from the store any more because the SPARQL
> > Update part of the system is stricter and rejects DELETE DATA
> > commands containing broken IRIs.
> >
> > Given the complexity of RDF-based systems, and the many interacting
> > components and specifications involved, this kind of error handling
> > cannot be introduced for a single syntax. It has to be done
> > centrally so that all involved components and specifications can
> > behave in a consistent way.
> > Defining algorithms for error recovery for broken RDF data may well
> > be a good idea, but I don't think this should be part of a 1.1
> > update to RDF, and I don't think we are chartered to do it.
>
> +1 to all these sentiments - in practice, letting an invalid IRI into an
> RDF system will likely screw things up later down the line when
> validation is eventually applied.
>
> However, the Turtle grammar already allows the creation of invalid IRIs.
> The main purpose of the IRIREF rule is to disallow characters that are
> illegal everywhere in an IRI, but you can still construct an invalid IRI
> by either:
>
> 1. Using legal characters in an illegal order, e.g. <a#b#c>
> 2. Using Unicode escapes for illegal characters, e.g. <a\u007Cb> (which
>    is the escaped form of <a|b>)
>
> This was illustrated in a message to public-rdf-comments
> (http://lists.w3.org/Archives/Public/public-rdf-comments/2012Mar/0000.html),
> which pointed out a positive parser test in the test suite that
> contained an IRI with escaped control characters. An implementation that
> parsed the resulting, unescaped IRI using an IRI library was reporting
> an error for this case. The consensus on the list was that parsing with
> an IRI library is a perfectly appropriate thing to do, and that the test
> case should be changed or removed.
>
> In light of this, I think the Turtle document should give guidance that
> the Turtle grammar alone is not sufficient to reject invalid IRIs, and
> that conforming parsers MAY (or even SHOULD) do additional validation
> against the grammar from RFC3987.
>
> -Alex

Agreed.

The document does sort of say that IRIs must be valid IRIs, as does
rdf-concepts, so it is a matter of how prominently to say it.

Turtle:

[[
6.2 RDF Term Constructors

production   type
IRIREF       IRI    The characters between "<" and ">" are unescaped¹ to
                    form the unicode string of the IRI.
]]

so it says IRIREF produces an IRI and hence conformance checking is done.
It's prominent though.
Concepts ==>

[[
An IRI (Internationalized Resource Identifier) within an RDF graph is a
Unicode string [UNICODE] that conforms to the syntax defined in RFC 3987
[IRI].
]]

Andy
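To make Alex's point concrete: the IRIREF production only excludes certain raw characters, so a \uXXXX escape, or legal characters in an illegal order, can still yield a string that is not an RFC 3987 IRI. The following is a minimal Python sketch, not taken from any Turtle implementation. `unescape_iriref`, `check_iri` and `parse_iriref` are hypothetical helpers; the excluded-character set assumes the Turtle IRIREF exclusions (#x00-#x20 and <>"{}|^`\, with backslash only allowed to start a UCHAR escape); and `check_iri` covers only a few RFC 3987 rules, not the full grammar. A real parser would hand the unescaped string to a proper IRI library, as suggested in the thread.

```python
import re

# Raw characters excluded by the Turtle IRIREF production (assumed here:
# U+0000-U+0020 and <>"{}|^`\). A backslash may only start a UCHAR escape.
IRIREF_EXCLUDED = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}

# UCHAR escapes: \uXXXX or \UXXXXXXXX
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Characters that may not appear anywhere in an IRI (simplified subset of
# RFC 3987): controls, space, DEL, C1 controls, and <>"{}|^`\
IRI_EXCLUDED = (set('<>"{}|^`\\')
                | {chr(c) for c in range(0x21)}
                | {chr(c) for c in range(0x7F, 0xA0)})


def unescape_iriref(body: str) -> str:
    """Hypothetical helper: lexical IRIREF check, then UCHAR unescaping.

    `body` is the text between "<" and ">".
    """
    # Drop well-formed escapes, then look for excluded raw characters
    # (including any stray backslash that did not start a valid escape).
    raw = UCHAR.sub("", body)
    if any(ch in IRIREF_EXCLUDED for ch in raw):
        raise ValueError(f"not lexically valid as IRIREF: {body!r}")
    # Replace each escape with the code point it denotes
    # (chr() itself raises ValueError for values above U+10FFFF).
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), body)


def check_iri(iri: str) -> None:
    """Hypothetical helper: a few RFC 3987 checks, NOT a full validator."""
    if any(ch in IRI_EXCLUDED for ch in iri):
        raise ValueError(f"character not allowed in an IRI: {iri!r}")
    # '#' introduces the fragment and may not appear inside it.
    if iri.count("#") > 1:
        raise ValueError(f"more than one '#': {iri!r}")
    # '%' must start a percent-encoding of two hex digits.
    if re.search(r"%(?![0-9A-Fa-f]{2})", iri):
        raise ValueError(f"malformed percent-encoding: {iri!r}")


def parse_iriref(body: str, validate: bool = True) -> str:
    """Unescape an IRIREF body and, optionally, validate the resulting IRI."""
    iri = unescape_iriref(body)
    if validate:
        check_iri(iri)
    return iri


if __name__ == "__main__":
    for body in ["http://example.org/a", r"a\u007Cb", "a#b#c"]:
        try:
            print(body, "->", parse_iriref(body))
        except ValueError as err:
            print(body, "-> rejected:", err)
```

With this split, <a\u007Cb> and <a#b#c> both pass the lexical IRIREF step and are only caught by `check_iri`, which is the gap Alex describes; making `validate` a parameter mirrors the MAY versus SHOULD question the thread leaves open.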
Received on Friday, 18 May 2012 13:47:26 UTC