For Issue #1, I like the formulation. However, I'd like to see one more
piece of information (logically) returned: if the parse could not continue
to the end, what was the last character successfully parsed?
That is, in "http://google.com<space>", it would return the offset between
the "m" and the space.
So why do this? It is because a very common problem is to find an IRI in
plain text, where the end is not known. This needs to be done in email, word
processors, HTML editors, and a host of other products. By having an
explicit specification that lets us know what the last character is, one can
then (logically) call the function again to determine whether the segment up
to the error point is a valid IRI.
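
To make the two-call idea concrete, here is a rough Python sketch. It is not the algorithm from Issue #1: the regular expression is a deliberately simplified, hypothetical stand-in for the real IRI grammar, and the names parse_prefix and is_valid are invented for illustration only.

```python
import re

# ASSUMPTION: a toy approximation of an IRI (scheme, "://", then any run of
# characters that are not whitespace or a few common delimiters). The real
# grammar in the IRI specification is considerably more detailed.
_SIMPLE_IRI = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*://[^\s<>"{}|\\^`]+')


def parse_prefix(text, start=0):
    """Parse as far as possible from `start`.

    Returns (complete, end), where `end` is the offset just past the last
    character successfully parsed and `complete` says whether the parse
    reached the end of `text`.
    """
    m = _SIMPLE_IRI.match(text, start)
    end = m.end() if m else start
    return (m is not None and end == len(text)), end


def is_valid(segment):
    """Second (logical) call: is the segment, taken alone, a valid IRI?"""
    return _SIMPLE_IRI.fullmatch(segment) is not None


text = "http://google.com "
complete, end = parse_prefix(text)
# `end` is the offset between the "m" and the space; the segment up to that
# point can then be checked on its own.
candidate = text[:end]
print(complete, end, is_valid(candidate))
```

The point of returning the offset rather than only a pass/fail result is that the caller does not need to know in advance where the IRI ends in the surrounding text.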
Once we have the spec all sorted out, then on that basis someone can write a
fast parser that returns all and only those substrings that can be complete
IRIs -- and others can write more lenient ones that allow some information
(such as the scheme) to be defaulted.
Mark
— Il meglio è l’inimico del bene —
On Thu, Apr 8, 2010 at 18:40, Ian Hickson <ian@hixie.ch> wrote:
> Issue 1:
> ========================================================================
> Update the IRI specification to define an algorithm with the following
> characteristics:
>