- From: Alex Hall <alexhall@revelytix.com>
- Date: Fri, 18 May 2012 09:52:50 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
- Message-ID: <CAFq2biyM7nb9_bMjufdC93-stbiktcsQ_ovevbU4oFQgc1DMDQ@mail.gmail.com>
On Fri, May 18, 2012 at 9:46 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

>
>
> On 18/05/12 14:30, Alex Hall wrote:
>
>> On Fri, May 18, 2012 at 6:20 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
>>
>>     Sandro,
>>
>>     -1 to a “loose Turtle”.
>>
>>     If a conforming Turtle parser were allowed to accept a document
>>     containing <http://example.org/a|b>, then what next? This is not a
>>     valid IRI. So it is not allowed in an RDF graph. A Turtle parser is
>>     rarely a stand-alone system; it's a component in a larger system.
>>     Once the Turtle parser tries passing on the pseudo-IRI to the next
>>     component, then a number of things can happen:
>>
>>     The next component might reject it outright.
>>
>>     Or the next component accepts it and stores the pseudo-IRI. Then the
>>     user can do their thing. Then when the user tries to save their
>>     work, the serializer checks IRIs and rejects it, taking down the app
>>     with an error message. (This is Jena's default behaviour, or at
>>     least was the last time I checked.)
>>
>>     Or maybe the entire system works, except that now we have a
>>     situation where certain RDF “graphs” can be loaded and saved in
>>     Turtle but not in other syntaxes. This will cause major headaches
>>     for users, who will end up messing around with format converters in
>>     order to get broken data into a format that doesn't complain about
>>     the data being broken.
>>
>>     Or maybe the system accepts the IRI and puts it into its store, but
>>     then you can't delete it from the store any more because the SPARQL
>>     Update part of the system is stricter and rejects DELETE DATA
>>     commands containing broken IRIs.
>>
>>     Given the complexity of RDF-based systems, and the many interacting
>>     components and specifications involved, this kind of error handling
>>     cannot be introduced for a single syntax. It has to be done
>>     centrally so that all involved components and specifications can
>>     behave in a consistent way.
>>     Defining algorithms for error recovery
>>     for broken RDF data may well be a good idea, but I don't think this
>>     should be part of a 1.1 update to RDF, and I don't think we are
>>     chartered to do it.
>>
>>
>> +1 to all these sentiments - in practice, letting an invalid IRI into an
>> RDF system will likely screw things up later down the line when
>> validation is eventually applied.
>>
>> However, the Turtle grammar already allows the creation of invalid IRIs.
>> The main purpose of the IRIREF rule is to disallow characters that are
>> illegal everywhere in an IRI, but you can still construct an invalid IRI
>> by either:
>> 1. Using legal characters in an illegal order, e.g. <a#b#c>
>> 2. Using Unicode escapes for illegal characters, e.g. <a\u007Cb> (which
>> is the escaped form of <a|b>)
>>
>> This was illustrated in a message to public-rdf-comments
>> (http://lists.w3.org/Archives/Public/public-rdf-comments/2012Mar/0000.html),
>> which pointed out a positive parser test in the test suite which
>> contained an IRI with escaped control characters. An implementation that
>> parsed the resulting, unescaped IRI using an IRI library was reporting
>> an error for this case. The consensus on the list was that parsing with
>> an IRI library is a perfectly appropriate thing to do, and that the test
>> case should be changed or removed.
>>
>> In light of this, I think the Turtle document should give guidance that
>> the Turtle grammar alone is not sufficient to reject invalid IRIs, and
>> that conforming parsers MAY (or even SHOULD) do additional validation
>> against the grammar from RFC3987.
>>
>> -Alex
>>
>>
> Agreed.
>
> The document does sort of say that IRIs must be valid IRIs, as does
> rdf-concepts so it is a matter of how prominently to say it.
>
> Turtle:
> [[
> 6.2 RDF Term Constructors
>
> production    type
> IRIREF        IRI
>
> The characters between "<" and ">" are unescaped¹ to form the unicode
> string of the IRI.
> ]]
>
> so it says IRIREF produces an IRI and hence conformance checking is done.
> It's prominent though.
>

Did you mean to say it's NOT prominent? At any rate, I think it's debatable.
That passage could also be read to mean that the only thing required to
produce an IRI is unescaping the stuff between '<' and '>'.

-Alex

> Concepts ==>
> [[
> An IRI (Internationalized Resource Identifier) within an RDF graph is a
> Unicode string [UNICODE] that conforms to the syntax defined in RFC 3987
> [IRI].
> ]]
>
> Andy
>
>
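
To make the gap between the grammar check and IRI validity concrete, here is a minimal Python sketch. It is an illustration only, not taken from any Turtle implementation: the IRIREF pattern is modelled on the production discussed above, the function names are hypothetical, and looks_like_valid_iri is a deliberately incomplete stand-in for real RFC 3987 validation (it only re-checks the excluded characters after unescaping and allows at most one '#').

```python
import re

# Token-level check modelled on the Turtle IRIREF production (an assumption
# about its shape): literal characters that are never legal in an IRI are
# excluded, but UCHAR escapes are accepted without decoding them.
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)
UCHAR = re.compile(r'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')
FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def unescape(token: str) -> str:
    """Strip the angle brackets and decode UCHAR escapes."""
    return UCHAR.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), token[1:-1]
    )


def looks_like_valid_iri(iri: str) -> bool:
    """Simplified placeholder for RFC 3987 validation (NOT a full check)."""
    return FORBIDDEN.search(iri) is None and iri.count('#') <= 1


def check(token: str) -> tuple[bool, bool]:
    """Return (accepted by the grammar, unescaped result passes the IRI check)."""
    grammar_ok = IRIREF.fullmatch(token) is not None
    iri_ok = grammar_ok and looks_like_valid_iri(unescape(token))
    return grammar_ok, iri_ok


for token in ['<http://example.org/a|b>', r'<a\u007Cb>', '<a#b#c>']:
    print(token, check(token))
```

Under these assumptions the literal '|' is rejected by the grammar itself, while <a\u007Cb> and <a#b#c> both pass the grammar and fail only once the unescaped string is validated. A real parser would hand the unescaped string to a proper IRI library rather than to a check like this one.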
Received on Friday, 18 May 2012 13:53:46 UTC