
Re: Change definition of URL to normatively reference IRI specification using a well-defined interface

From: Mark Davis ☕ <mark@macchiato.com>
Date: Fri, 9 Apr 2010 09:54:54 -0700
Message-ID: <n2w30b660a21004090954l134b46a8u86f2222c9627d61@mail.gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: Ted Hardie <ted.ietf@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Maciej Stachowiak <mjs@apple.com>, Larry Masinter <LMM@acm.org>, Julian Reschke <julian.reschke@gmx.de>, Marc Blanchet <Marc.Blanchet@viagenie.ca>, Sam Ruby <rubys@intertwingly.net>, Paul Cotton <Paul.Cotton@microsoft.com>, Martin Duerst <duerst@w3.org>, Michel SUIGNARD <Michel@suignard.com>, public-html <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
For Issue #1, I like the formulation. However, I'd like to see one more
piece of information (logically) returned: if the parse could not continue
to the end, then what was the last character successfully parsed.

That is, in "http://google.com<space>", it would return the offset between
the "m" and the space.

So why do this? It is because a very common problem is to find an IRI in
plain text, where the end is not known. This needs to be done in email, word
processors, HTML editors, and a host of other products. By having an
explicit specification that lets us know what the last character is, one can
then (logically) call the function again to determine whether the segment up
to the error point is a valid IRI.
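To make the two-step idea concrete, here is a minimal Python sketch. It is not the algorithm proposed for the IRI specification: the names `parse_iri`, `ParseResult`, and `longest_iri_prefix` are hypothetical, and the grammar is a deliberately simplified stand-in (a scheme followed by any run of characters other than whitespace and a few ASCII delimiters). It only illustrates how returning the stop offset lets a caller re-run the parse on the segment up to the error point.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a toy grammar, NOT the real IRI syntax.
# A scheme ("http:") followed by characters that are not whitespace
# and not one of a few ASCII delimiters.
_SCHEME = re.compile(r'[A-Za-z][A-Za-z0-9+.\-]*:')
_BODY = re.compile(r'[^\s<>"{}|\\^`]*')


class ParseResult(NamedTuple):
    ok: bool   # True if the whole input was consumed
    end: int   # offset just past the last character successfully parsed


def parse_iri(text: str) -> ParseResult:
    """Hypothetical parse function that also reports where parsing stopped."""
    scheme = _SCHEME.match(text)
    if scheme is None:
        return ParseResult(False, 0)
    end = _BODY.match(text, scheme.end()).end()
    return ParseResult(end == len(text), end)


def longest_iri_prefix(text: str) -> Optional[str]:
    """Parse once; on failure, re-check the segment up to the error point."""
    first = parse_iri(text)
    if first.ok:
        return text
    candidate = text[:first.end]
    return candidate if parse_iri(candidate).ok else None


print(parse_iri("http://google.com "))
print(longest_iri_prefix("http://google.com "))
```

For the input "http://google.com<space>", the first call stops at the offset between the "m" and the space, and the second call confirms that the text before that offset parses on its own.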

Once we have the spec all sorted out, then on that basis someone can write a
fast parser that returns all and only those instances that can be complete
IRIs -- and more lenient ones that allow some information (such as the
scheme) to be defaulted.

Mark

— Il meglio è l’inimico del bene —


On Thu, Apr 8, 2010 at 18:40, Ian Hickson <ian@hixie.ch> wrote:

> Issue 1:
> ========================================================================
> Update the IRI specification to define an algorithm with the following
> characteristics:
>
Received on Friday, 9 April 2010 16:55:30 GMT
