
Re: Change definition of URL to normatively reference IRI specification using a well-defined interface

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 09 Apr 2010 19:12:49 +0200
Message-ID: <4BBF6011.6000607@gmx.de>
To: Mark Davis ☕ <mark@macchiato.com>
CC: Ian Hickson <ian@hixie.ch>, Ted Hardie <ted.ietf@gmail.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Maciej Stachowiak <mjs@apple.com>, Larry Masinter <LMM@acm.org>, Marc Blanchet <Marc.Blanchet@viagenie.ca>, Sam Ruby <rubys@intertwingly.net>, Paul Cotton <Paul.Cotton@microsoft.com>, Martin Duerst <duerst@w3.org>, Michel SUIGNARD <Michel@suignard.com>, public-html <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
On 09.04.2010 18:54, Mark Davis ☕ wrote:
>   For Issue #1, I like the formulation. However, I'd like to see one
> more piece of information (logically) returned: if the parse could not
> continue to the end, then what was the last character successfully parsed.
>
> That is, in "http://google.com/<space>/", it would return the offset
> between the "m" and the space.
>
> So why do this? It is because a very common problem is to find an IRI in
> plain text, where the end is not known. This needs to be done in email,
> word processors, HTML editors, and a host of other products. By having
> an explicit specification that lets us know what the last character is,
> one can then (logically) call the function again to determine whether
> the segment up to the error point is a valid IRI.

Hmm. Not convinced.

1) If you want to parse IRIs out of content, wouldn't you also need to 
consider *leading* non-IRI characters?

2) What's wrong with just adding up the individual segments (plus 
delimiters)?
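
As a rough illustration of point 2, here is a minimal sketch. It is not taken from any specification: the function names `parse_prefix` and `offset_from_segments` and the set of "stop" characters are assumptions made up for this example, and the component split uses the generic regular expression from RFC 3986, Appendix B rather than the full IRI grammar. It shows that the end offset can be recovered by adding up the segment lengths plus their delimiters.

```python
import re

# Assumption: characters that end an IRI candidate in plain text.
# This is a simplification for the sketch, not the RFC 3987 grammar.
_STOP = re.compile(r'[\s<>"{}|\\^`]')

# Generic component split, adapted from RFC 3986, Appendix B.
_COMPONENTS = re.compile(
    r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?'
)


def parse_prefix(text):
    """Split the longest prefix without a stop character into components.

    Returns (components, end_offset), where end_offset is the index of the
    first character that was not consumed.
    """
    stop = _STOP.search(text)
    end = stop.start() if stop else len(text)
    scheme, authority, path, query, fragment = _COMPONENTS.match(text[:end]).groups()
    components = {
        "scheme": scheme,
        "authority": authority,
        "path": path,
        "query": query,
        "fragment": fragment,
    }
    return components, end


def offset_from_segments(components):
    """Add up the individual segments plus their delimiters."""
    total = 0
    if components["scheme"] is not None:
        total += len(components["scheme"]) + 1      # trailing ":"
    if components["authority"] is not None:
        total += 2 + len(components["authority"])   # leading "//"
    total += len(components["path"])
    if components["query"] is not None:
        total += 1 + len(components["query"])       # leading "?"
    if components["fragment"] is not None:
        total += 1 + len(components["fragment"])    # leading "#"
    return total


parts, end = parse_prefix("http://example.org/a?x=1 and more")
print(parts)
print(end, offset_from_segments(parts))
```

With this sketch the explicitly returned offset and the sum of the segments agree, so the extra return value would be a convenience rather than new information. It does nothing about leading non-IRI characters, which is point 1.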

> ...

Best regards, Julian
Received on Friday, 9 April 2010 17:13:36 UTC
