Re: Change definition of URL to normatively reference IRI specification using a well-defined interface

In an actual implementation, there are a few different kinds of APIs that
you would use, such as:

end = lookingAt(string, startPosition);
if there is an IRI starting at startPosition, return the end of it -
otherwise return an error.

<start, end> = scan(string, startPosition);
find the first instance of an IRI in a string at or after startPosition,
returning where it starts and ends.


The key is that if the Issue#1 specification can return the first error
point (as I outlined in my earlier message), then one can design and
implement fast code for the above (or other kinds of APIs). The reference
code for *testing* lookingAt would implement the algorithm in Issue#1 (as
amended). The reference code for *testing* scan would just call lookingAt in
a loop, starting at startPosition, returning if something is found, and
otherwise moving on to the next character. This would just be reference
code; production code can be much faster.
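
As a rough illustration only, the two reference routines could be sketched
as below. The names looking_at and scan mirror the APIs above, and _TOY_IRI
is a hypothetical stand-in pattern, not the Issue#1 algorithm or the real
IRI grammar; a real reference implementation would replace it with the
algorithm from the specification.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: a toy stand-in for the Issue#1 algorithm. It accepts
# "scheme://" followed by characters other than whitespace, <, > and ".
# This is NOT the real IRI grammar.
_TOY_IRI = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*://[^\s<>"]+')


def looking_at(text: str, start: int) -> Optional[int]:
    """Return the end offset of the IRI beginning exactly at `start`,
    or None if no IRI begins there."""
    match = _TOY_IRI.match(text, start)
    return match.end() if match else None


def scan(text: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first IRI at or after `start`, or None.

    Reference behaviour only: try looking_at at every position in turn.
    """
    for pos in range(start, len(text)):
        end = looking_at(text, pos)
        if end is not None:
            return pos, end
    return None


if __name__ == "__main__":
    sample = "see http://example.org/a now"
    found = scan(sample)
    if found is not None:
        begin, end = found
        print(sample[begin:end])
```

The loop in scan is deliberately naive so that it is easy to check against
the specification; a faster implementation only has to return the same
offsets.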

Mark

— Il meglio è l’inimico del bene —


On Fri, Apr 9, 2010 at 10:12, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 09.04.2010 18:54, Mark Davis ☕ wrote:
>
>>  For Issue #1, I like the formulation. However, I'd like to see one
>> more piece of information (logically) returned: if the parse could not
>> continue to the end, then what was the last character successfully parsed.
>>
>> That is, in "http://google.com/<space>/", it would return the offset
>> between the "m" and the space.
>>
>> So why do this? It is because a very common problem is to find an IRI in
>> plain text, where the end is not known. This needs to be done in email,
>> word processors, HTML editors, and a host of other products. By having
>> an explicit specification that lets us know what the last character is,
>> one can then (logically) call the function again to determine whether
>> the segment up to the error point is a valid IRI.
>>
>
> Hmm. Not convinced.
>
> 1) If you want to parse IRIs out of content, wouldn't you also need to
> consider *leading* non IRI characters?
>
> 2) What's wrong with just adding up the individual segments (plus
> delimiters)?
>
>  ...
>>
>
> Best regards, Julian
>
>

Received on Friday, 9 April 2010 17:42:16 UTC