Re: Colon symbol in URI?

I think your analysis might be better than mine; thanks for being
gentle with me.

RFC 3986:

   URI producing applications should percent-encode data octets that
   correspond to characters in the reserved set [which includes :]
   unless these characters are specifically allowed by the URI scheme
   to represent data in that component.

So what 3986 itself says about path syntax is not of much relevance;
the scheme decides. RFC 2616 is the authority for the http: URI scheme
syntax, and it gives the syntax by reference to RFC 2396. I missed the
RFC 2396 production for 'pchar' that you cite. The colon is described
as reserved, so I made the mistake of thinking it required escaping.
But 'pchar' explicitly allows ":" in a path segment, so I believe
you're right.
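
For anyone who wants to check this mechanically, here is a minimal
sketch in Python that transcribes the RFC 3986 'pchar' and
'path-absolute' productions Dave quotes into regular expressions. The
function names are my own invention, and this covers only those two
productions, not the whole URI grammar.

```python
import re

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"

# segment       = *pchar
SEGMENT_RE = re.compile(rf"{PCHAR}*")
# path-absolute = "/" [ segment-nz *( "/" segment ) ]
PATH_ABSOLUTE_RE = re.compile(rf"/(?:{PCHAR}+(?:/{PCHAR}*)*)?")


def is_legal_segment(segment: str) -> bool:
    """True if the string matches the RFC 3986 'segment' production."""
    return SEGMENT_RE.fullmatch(segment) is not None


def is_legal_path_absolute(path: str) -> bool:
    """True if the string matches the RFC 3986 'path-absolute' production."""
    return PATH_ABSOLUTE_RE.fullmatch(path) is not None


if __name__ == "__main__":
    print(is_legal_segment("SMTH:0000353"))
    print(is_legal_path_absolute("/obo/SMTH:0000353"))
```

Under this reading, both the unescaped colon and its percent-encoded
form are accepted in a segment, which matches the grammar as quoted.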

Regardless, the Foundry has made its choice (motivated in part by the
desire to keep the option of using QName syntax), and I think that for
stability and interoperability it would be best to stick with its id
policy, by whatever means necessary. At the same time Protege
shouldn't convert : to _ or %3a; it should leave : alone (as you say).
Maybe Protege could make a special case for OBO Foundry to detect and
fix violations of the id policy. (Maybe there could even be a general
error detection and correction facility, so that other URI origins
could enjoy the same benefit.)
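
To illustrate the kind of check I have in mind, here is a rough sketch
in Python. It is not Protege code. It assumes, for illustration only,
that the id policy wants PREFIX_NUMBER after /obo/ and that a violation
looks like PREFIX:NUMBER; the pattern and the function name are
hypothetical. It reports a suggested fix and leaves the original URI
untouched.

```python
import re

# Hypothetical pattern for an id written with ':' where the assumed
# policy expects '_', e.g. .../obo/SMTH:0000353
COLON_ID_RE = re.compile(
    r"^(?P<base>.*/obo/)(?P<prefix>[A-Za-z]+):(?P<local>\d+)$"
)


def check_obo_id(uri: str):
    """Return (uri, None) if no violation is detected, otherwise
    (uri, suggested_uri). The input URI is never modified."""
    m = COLON_ID_RE.match(uri)
    if m is None:
        return uri, None
    suggestion = f"{m['base']}{m['prefix']}_{m['local']}"
    return uri, suggestion


if __name__ == "__main__":
    original, fix = check_obo_id("http://purl.obofoundry.org/obo/SMTH:0000353")
    if fix is not None:
        print(f"possible id policy violation: {original}")
        print(f"suggested replacement:        {fix}")
```

Keeping detection separate from rewriting means the tool can warn
without silently changing a URI that is, syntactically, legal.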

And I'm still of the opinion that putting a : in a URI path, even when
it's correct, is asking for trouble...

Best
Jonathan

On Sat, Aug 13, 2011 at 12:34 PM, Dave Reynolds
<dave.e.reynolds@gmail.com> wrote:
> On Fri, 2011-08-12 at 13:50 -0400, Jonathan Rees wrote:
>> But URI syntax, in particular whether you can put two colons in a URI
>> (i.e. RDF URI Reference and/or IRI), is not up to any of these
>> specifications. That would be up to RFC 3986, which delegates to RFC
>> 2616, which delegates to RFC 2396, which says that : is reserved and
>> has to be %-escaped. In practice I suspect this is not always done,
>> and perhaps the new IRI spec (in progress) will say something about
>> that.
>
> Not sure that's right. The relevant grammar rules from RFC 3986 are:
>
>      path          =   path-absolute   / ... other options omitted
>
>      path-absolute = "/" [ segment-nz *( "/" segment ) ]
>
>      segment       = *pchar
>      segment-nz    = 1*pchar
>      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
>
> So "SMTH:0000353" is a legal segment and "/obo/SMTH:0000353" a legal
> path and "http://purl.obofoundry.org/obo/SMTH:0000353" a legal URI.
> Isn't it?
>
> The same was true in RFC2396:
>
>      path          = [ abs_path | opaque_part ]
>
>      path_segments = segment *( "/" segment )
>      segment       = *pchar *( ";" param )
>      param         = *pchar
>
>      pchar         = unreserved | escaped |
>                      ":" | "@" | "&" | "=" | "+" | "$" | ","
>
> As Markus says, it can't be abbreviated to QName form in RDF/XML and so
> can't be used as a property but you *can* use it as a class and still
> serialize in RDF/XML because there is no requirement to use abbreviated
> forms for classes. For example, Jena will produce legal RDF/XML such as:
>
> <rdf:RDF
>    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>    xmlns:j.0="http://purl.obofoundry.org/obo/SMTH:0000353"
>    xmlns:owl="http://www.w3.org/2002/07/owl#">
>  <owl:Class rdf:about="http://purl.obofoundry.org/obo/SMTH:0000353"/>
>  <rdf:Description rdf:about="http://www.openjena.com/test#i">
>    <rdf:type rdf:resource="http://purl.obofoundry.org/obo/SMTH:0000353"/>
>  </rdf:Description>
> </rdf:RDF>
>
>
> Dave
>
>
>

Received on Saturday, 13 August 2011 17:09:08 UTC