Re: Change definition of URL to normatively reference IRI specification using a well-defined interface

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 10 Apr 2010 13:25:52 +0200
Message-ID: <4BC06040.5030907@gmx.de>
To: Mark Davis ☕ <mark@macchiato.com>
CC: Ian Hickson <ian@hixie.ch>, Ted Hardie <ted.ietf@gmail.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Maciej Stachowiak <mjs@apple.com>, Larry Masinter <LMM@acm.org>, Marc Blanchet <Marc.Blanchet@viagenie.ca>, Sam Ruby <rubys@intertwingly.net>, Paul Cotton <Paul.Cotton@microsoft.com>, Martin Duerst <duerst@w3.org>, Michel SUIGNARD <Michel@suignard.com>, public-html <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
On 09.04.2010 19:41, Mark Davis ☕ wrote:
>   When you would actually implement it, there are a few different kinds
> of APIs that you would use, such as:
> end = lookingAt(string, startPosition);
> if there is an IRI starting at startPosition, return the end of it -
> otherwise return an error.
> <start, end> = scan(string, startPosition);
> find the first instance of an IRI in a string at or after startPosition,
> returning where it starts and ends.
> The key is that if the Issue#1 specification can return the first error
> point (as I outlined in the message), then one can design and implement
> fast code to implement the above (or other kinds of APIs). The reference
> code for /testing/ lookingAt would implement the algorithm in Issue#1
> (as amended). The reference code for /testing/ scan would just call
> lookingAt in a loop, starting at position 0, returning if something is
> found, and otherwise going to the next character. This would just be
> reference code; the reference code can be much faster.
> Mark
> ...
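
For concreteness, the two quoted operations could be sketched in Python roughly as below. This is only an illustration under loud assumptions: `_TOY_IRI` is a hypothetical toy pattern standing in for the real IRI grammar (it is not the Issue#1 algorithm), the names `looking_at` and `scan` simply mirror the quoted pseudocode, and `None` stands in for "return an error".

```python
import re
from typing import Optional, Tuple

# Toy stand-in for the real IRI grammar (an assumption for illustration only):
# a scheme, a colon, then one or more characters that are not whitespace,
# angle brackets, or double quotes.
_TOY_IRI = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*:[^\s<>"]+')


def looking_at(text: str, start: int) -> Optional[int]:
    """Return the end offset of an address beginning exactly at `start`, else None."""
    match = _TOY_IRI.match(text, start)
    return match.end() if match else None


def scan(text: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """Reference-style scan: try looking_at at each position from `start` onward."""
    for pos in range(start, len(text)):
        end = looking_at(text, pos)
        if end is not None:
            return pos, end
    return None
```

As in the quoted description, `scan` here is the slow but obvious version: it just calls `looking_at` at successive positions and returns the first hit.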


I'd agree with you if we were talking about an Application Programming 
Interface. But this is just about the specification interface between 
HTML (& friends) and IRI.

That being said, defining a sane JavaScript API for handling web 
addresses would be great. Maybe it could be developed and deployed 
similarly to the way the JSON thingy was.

Finally: parsing addresses out of content is highly context-dependent. 
Do you consider angle brackets as URI delimiters in plain text? Can 
whitespace appear in angle-bracket quoted addresses? In unquoted 
addresses? Is whitespace a delimiter between addresses (such as in a few 
set-of-URI-typed HTML attributes), or part of the address?
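
To make the context dependence concrete, here is a small Python sketch of two policies that treat the same whitespace in opposite ways. Both function names are hypothetical, and each encodes just one possible answer to the questions above: whitespace as a separator in a space-separated attribute value, and whitespace dropped inside angle brackets in plain text (the convention suggested in RFC 3986, Appendix C).

```python
import re
from typing import List


def split_attribute_value(value: str) -> List[str]:
    """Policy A (assumed): whitespace separates addresses, as in a
    space-separated list-of-addresses attribute value."""
    return value.split()


def extract_angle_bracketed(text: str) -> List[str]:
    """Policy B (assumed): <...> delimits an address in plain text, and any
    whitespace inside the brackets is removed rather than treated as a break."""
    candidates = re.findall(r"<([^<>]*)>", text)
    return [re.sub(r"\s+", "", candidate) for candidate in candidates]
```

Given the same input, the two policies disagree about where an address ends, so neither can be picked without knowing the surrounding format.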

I'd rather not have to think about this in the IRI spec. Maybe in a 
BCP-like companion spec, though.

Best regards, Julian
Received on Saturday, 10 April 2010 11:26:44 UTC