- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 12 Mar 2018 11:00:37 -0600
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-xformsusers@w3.org
> On Mar 11, 2018, at 1:51 PM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
> Thanks for this. A solid and thorough piece of work.
>
> However, I think our problem is a little simpler; we don't need to parse the URI, only recognise if it is correct or not. This means we can greatly simplify the syntax.

The only thing the types defined in that document do is recognize whether the input value is correct or not.

By ‘correct’ I mean (and I assume you also mean) ‘recognized by the grammar in the spec’. There are plenty of simplifications of the syntax around, but they don’t recognize the set of strings generated by the grammar. The regular expression in Appendix B of RFC 3986, for example, can be used to recognize the gross structure of a string known to be a correct URI (or perhaps IRI). On examination, however, it turns out to accept any string of characters, so it does not distinguish correct from incorrect.

> As far as I can see, the basis of the regex needed is:
>
>   IRI-reference = (scheme ":" [hier] | [hier-nc]) [ "?" ipch-q* ] [ "#" ipch-f* ]
>
> where the only difference between 'hier' and 'hier-nc' is that hier-nc may have no colons before the first (if any) "/" character.
>
> (The only difference between the characters represented by 'ipch-q' and 'ipch-f' is that ipch-q can contain characters from the private use areas.)
>
> As I see it, XForms needs two IRI types. For the case where a user is required to type in a full web-address:
>
>   IRI = scheme ":" ihier-part [ "?" iquery ] [ "#" ifragment ]
>
> and where data could hold either a full IRI or a relative IRI:
>
>   IRI-reference = (scheme ":" [hier] | [hier-nc]) [ "?" ipch-q* ] [ "#" ipch-f* ]

It’s not clear to me whether you are proposing to simplify the grammar (1) by relaxing some of its constraints, or (2) by omitting non-terminals (as in the treatment of iquery and ifragment in your definition of IRI-reference) and replacing complex expressions with simpler expressions that recognize exactly the same languages.

In the first case, the result will not in fact be checking URIs or IRIs for correctness, so on reflection I assume that that cannot be what you have in mind.

In the second case, you have the burden of proving the equivalence between the grammars in the RFCs and the expressions you are constructing, but you may be able to produce final regular expressions that are simpler than those in the unpublished WG Note. The Note performs a few simplifications here and there but does not attempt any broad restructuring of the grammar, since one of its purposes is to make it easy to confirm that the type defined is correct and accepts the same strings as the grammars in the RFCs.

It might be possible to simplify things a great deal by restructuring the grammar, though I believe the largest contribution to the complexity of the grammar is currently made by the definition of ihost, which I don’t see a particularly good way to simplify.

Bear in mind that the entity names used to construct the regex disappear without a trace; simplifying the grammar by eliminating non-terminals like ifragment and iquery will thus have no effect on the complexity of the final expression.

Michael

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
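The claim that the RFC 3986 Appendix B regular expression accepts any string can be checked directly. The following Python sketch is not part of the original message: the pattern is copied from the RFC, while the choice of `re.fullmatch` and the added `re.DOTALL` flag (so that `.` in the fragment group also matches newlines) are assumptions made here so that the whole input is tested. The sample strings are illustrative and not taken from the thread.

```python
import re

# Regular expression from RFC 3986, Appendix B, for splitting a URI reference
# into scheme, authority, path, query and fragment. re.DOTALL is an addition
# made here (not in the RFC) so that "." also matches newline characters.
APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)

samples = [
    "http://example.com/a?b#c",  # a correct URI
    "a b c",                     # spaces are not allowed in a URI
    "::://%%%",                  # stray colons and malformed percent-escapes
    "http://[::1",               # unterminated IP literal
    "#frag#frag",                # '#' may not appear inside a fragment
]

for s in samples:
    # fullmatch requires the pattern to consume the entire string.
    print(repr(s), APPENDIX_B.fullmatch(s) is not None)
```

Every sample is reported as matching. Each component group is optional; the path group `[^?#]*` absorbs everything up to the first `?` or `#`, the query group absorbs everything up to the next `#`, and the fragment group `.*` absorbs whatever remains. The expression therefore decomposes a string known to be a URI, but it cannot reject one that is not.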
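The remark that entity names disappear once the expression is assembled can be illustrated with a short Python sketch (not from the original thread). It builds the query-and-fragment tail of an IRI reference twice. The first version goes through named pieces for iquery and ifragment. The second eliminates those names in the style of the ipch-q*/ipch-f* proposal. The character classes follow my reading of the RFC 3987 ABNF for ipchar, ucschar and iprivate. All variable names are hypothetical, and only the tail is covered, not ihier-part or ihost.

```python
import re

# ucschar and iprivate from RFC 3987, written as regex character-class contents.
UCSCHAR = (r"\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"
           # planes 1..13: %xN0000-NFFFD
           + "".join(rf"\U000{p:X}0000-\U000{p:X}FFFD" for p in range(1, 14))
           + r"\U000E1000-\U000EFFFD")
IPRIVATE = r"\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD"

IUNRESERVED = r"A-Za-z0-9\-._~" + UCSCHAR
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# ipchar = iunreserved / pct-encoded / sub-delims / ":" / "@"
IPCHAR = rf"(?:[{IUNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# Version 1: keep the non-terminals iquery and ifragment as named pieces.
IQUERY = rf"(?:{IPCHAR}|[{IPRIVATE}/?])*"     # *( ipchar / iprivate / "/" / "?" )
IFRAGMENT = rf"(?:{IPCHAR}|[/?])*"            # *( ipchar / "/" / "?" )
TAIL_NAMED = rf"(?:\?{IQUERY})?(?:#{IFRAGMENT})?"

# Version 2: drop the non-terminals and write ipch-q* / ipch-f* directly.
IPCH_Q = rf"(?:{IPCHAR}|[{IPRIVATE}/?])"
IPCH_F = rf"(?:{IPCHAR}|[/?])"
TAIL_INLINED = rf"(?:\?{IPCH_Q}*)?(?:#{IPCH_F}*)?"

# The names vanish during assembly: both routes yield the identical pattern.
print(TAIL_NAMED == TAIL_INLINED)

TAIL = re.compile(TAIL_NAMED)
print(TAIL.fullmatch("?q=1&r=a%20b#sec-2") is not None)  # well-formed tail
print(TAIL.fullmatch("?a%2") is not None)                # truncated escape
print(TAIL.fullmatch("?\ue000") is not None)             # private use in query
print(TAIL.fullmatch("#\ue000") is not None)             # private use in fragment
```

The two strings compare equal, so eliminating iquery and ifragment changes how the grammar reads but not the size of the resulting expression. The last two checks also reflect the distinction noted in the quoted text: a private-use character is accepted in the query but rejected in the fragment.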
Received on Monday, 12 March 2018 17:01:05 UTC