Re: URI Syntax

On Thu, 17 Nov 2011 11:19:41 -0500
Oliver Ruebenacker <curoli@gmail.com> wrote:

>   I have a silly little technical question: when parsing an XML/RDF
> document, what is the easiest way to find out whether a string
> representing a URI is a complete absolute URI, a relative URI or an
> abbreviation?

As Tim said, there is no place in RDF/XML where you need to distinguish
between URIs and QNames. Everywhere a URI is allowed, a QName is
disallowed, and vice versa.

To distinguish between a relative URI and an absolute one, use the
following regular expression:

	^([A-Za-z][A-Za-z0-9.+-]*):

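If it matches that regular expression, it's absolute; otherwise it's
relative. Easy peasy. (That regular expression is derived from Section
3.1 of RFC 3986, which defines the syntax for URI schemes. It can't
misfire on a relative reference, because Section 4.2 of the same RFC
forbids a colon in the first segment of a relative-path reference:
"this:that" has to be written as "./this:that".)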
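For illustration, here is a minimal sketch of that test in Python using the standard `re` module. The names `SCHEME_RE` and `is_absolute` and the sample strings are made up for this example; they aren't part of any library.

```python
import re

# Scheme syntax from RFC 3986, Section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# followed by the ":" that terminates the scheme.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9.+-]*):")

def is_absolute(uri: str) -> bool:
    """True if the string starts with a URI scheme (hypothetical helper)."""
    return SCHEME_RE.match(uri) is not None

for s in ["http://example.org/foo",
          "urn:isbn:0451450523",
          "foo/bar",
          "./this:that",   # colon only after a leading dot-segment
          "#frag"]:
    print(s, "absolute" if is_absolute(s) else "relative")
```

The first two print "absolute" and the other three "relative". The captured group also gives you the scheme itself (e.g. "svn+ssh" for "svn+ssh://host/repo") if you need it.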

For performance reasons, you may wish to limit the length of a URI
scheme that can be matched:

	^([A-Za-z][A-Za-z0-9.+-]{0,127}):

... it seems unlikely that any URI scheme longer than 128 characters will
ever be used. The bound stops the regular expression from scanning to
the end of a massive string of alphanumeric characters only to find
there's no colon.
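A sketch of the bounded variant, again in Python with illustrative names. Note that the 128-character cap is the heuristic above, not a limit from RFC 3986, so a string whose would-be scheme is longer than that is simply classified as relative.

```python
import re

# Same scheme syntax, but at most 1 + 127 = 128 characters before the ":".
BOUNDED_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9.+-]{0,127}):")

def is_absolute_bounded(uri: str) -> bool:
    """Hypothetical helper: scheme test with a capped scheme length."""
    return BOUNDED_SCHEME_RE.match(uri) is not None

ok = "x" * 128 + ":rest"        # 128-character scheme: accepted
too_long = "x" * 129 + ":rest"  # 129 characters: rejected, so "relative"
print(is_absolute_bounded("mailto:someone@example.org"))  # True
print(is_absolute_bounded(ok))                            # True
print(is_absolute_bounded(too_long))                      # False
```

With the cap, a match attempt examines at most about 129 characters, however long the input is.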

In RDFa there are places where you need to distinguish between absolute
URIs, relative URIs and CURIEs (which are abbreviated URIs, a bit like
QNames). In that case, you distinguish between absolute and relative as
above, but when you get something that looks like an absolute URI,
check the part before the colon: if an expansion (prefix mapping) has
been defined for it, then it's not an absolute URI but a CURIE.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

Received on Monday, 21 November 2011 23:39:37 UTC