Re: URI Syntax

     Hello,

On Mon, Nov 21, 2011 at 6:40 PM, Toby Inkster <tai@g5n.co.uk> wrote:
> On Thu, 17 Nov 2011 11:19:41 -0500
> Oliver Ruebenacker <curoli@gmail.com> wrote:
>
>>   I have a silly little technical question: when parsing an XML/RDF
>> document, what is the easiest way to find out whether a string
>> representing a URI is a complete absolute URI, a relative URI or an
>> abbreviation?
>
> As Tim said, there is no place in RDF/XML where you need to distinguish
> between URIs and QNames. Everywhere a URI is allowed a QName is
> disallowed, and vice versa.
>
> To distinguish between a relative URI and an absolute one, use the
> following regular expression:
>
>        ^([A-Za-z][A-Za-z0-9.+-]*):
>
> If it matches that regular expression, it's absolute; otherwise it's
> relative. Easy peasy. (That regular expression is derived from Section
> 3.1 of RFC 3986, which defines the syntax for URI schemes.)

  Thanks a lot, this is indeed easy.

  I should have realized that, whatever the rule for delimiting path
segments is, the delimiter is something that cannot appear in a scheme
name.
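
  For the record, here is a minimal sketch of that check in Python,
assuming the standard `re` module. The names `SCHEME_RE` and
`is_absolute` are made up for illustration; only the pattern itself
comes from the quoted message.

```python
import re

# Pattern quoted above, derived from RFC 3986, Section 3.1:
# a letter, then any number of letters, digits, "+", "-" or ".",
# terminated by ":".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9.+-]*):")


def is_absolute(uri_ref: str) -> bool:
    """Return True if uri_ref starts with a scheme followed by ':'."""
    return SCHEME_RE.match(uri_ref) is not None


if __name__ == "__main__":
    for candidate in ("http://example.org/a", "mailto:x@example.org",
                      "../a/b", "#frag", "a/b:c"):
        kind = "absolute" if is_absolute(candidate) else "relative"
        print(f"{candidate!r}: {kind}")
```

  Note that "a/b:c" comes out as relative: the "/" cannot be part of a
scheme name, so the pattern fails before it ever reaches the colon.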

> For performance reasons, you may wish to limit the length of a URI
> scheme that can be matched:
>
>        ^([A-Za-z][A-Za-z0-9.+-]{0,127}):
>
> ... it seems unlikely that any URI scheme longer than 128 characters will
> ever be used. This will prevent massive strings of alphanumeric
> characters from slowing down your regular expression.

  Although I don't recall any scheme name longer than five characters,
I would be cautious about what fancy ideas people may come up with in
the future, such as scheme names generated by encoding other information.
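
  To make the trade-off concrete, here is a small sketch in Python
(standard `re` module; the constant names and the 129-character scheme
are invented for illustration) of how the length-limited pattern treats
an over-long scheme name:

```python
import re

# Unbounded and length-limited patterns, as quoted above.
UNBOUNDED_RE = re.compile(r"^([A-Za-z][A-Za-z0-9.+-]*):")
BOUNDED_RE = re.compile(r"^([A-Za-z][A-Za-z0-9.+-]{0,127}):")

# Hypothetical generated scheme names of 128 and 129 characters.
fits = "x" * 128 + ":payload"
too_long = "x" * 129 + ":payload"

# The bounded pattern accepts at most 1 + 127 = 128 scheme characters.
fits_bounded = BOUNDED_RE.match(fits) is not None
too_long_bounded = BOUNDED_RE.match(too_long) is not None
too_long_unbounded = UNBOUNDED_RE.match(too_long) is not None

print(fits_bounded, too_long_bounded, too_long_unbounded)
```

  With the limit in place, the 129-character case no longer matches, so
a parser relying on it would treat that string as a relative URI.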

     Take care
     Oliver

-- 
Oliver Ruebenacker, Computational Cell Biologist
Virtual Cell (http://vcell.org)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
http://www.oliver.curiousworld.org

Received on Tuesday, 22 November 2011 14:01:25 UTC