- From: Jonathan Rees <jar@creativecommons.org>
- Date: Fri, 12 Aug 2011 13:50:53 -0400
- To: Melanie Courtot <mcourtot@gmail.com>
- Cc: Markus Krötzsch <markus.kroetzsch@cs.ox.ac.uk>, zhutchok@ebi.ac.uk, OWL Development Mailing List <public-owl-dev@w3.org>
On Fri, Aug 12, 2011 at 12:07 PM, Melanie Courtot <mcourtot@gmail.com> wrote:
> Hi,
>
> The part after the : is a QName, and the relevant spec is at [1]. It does forbid to use a digit as first character after the colon.
>
> While we were working on the OBO ID policy [2], Jonathan Rees (cc) mentioned there was a proposal to relax those constraints by using CURIEs [3] instead of QNames, but I didn't check it and don't know its status; he may be able to add more information.
>
> Cheers,
> Melanie
>
> [1] http://www.w3.org/TR/REC-xml-names/#NT-QName
> [2] http://www.obofoundry.org/id-policy.shtml
> [3] http://www.w3.org/2001/sw/BestPractices/HTML/2005-10-27-CURIE

Two questions here: whether a digit can immediately follow the colon after the namespace prefix, and whether a second : can appear after it. I'll take them in order.

I think the theory of abbreviated URIs has played out as follows:

- The prefix:suffix pattern is generically called a compact URI or 'CURIE'; see http://www.w3.org/TR/curie/
- There are three instantiations of the CURIE pattern in current specs:
  - XML QNames are the CURIEs of RDF/XML. They require the suffix to be an 'NCName', which has to start with a letter or _.
  - SPARQL and the newly revised Turtle draft specification http://www.w3.org/TR/turtle/ have more liberal CURIEs. Their CURIEs allow an empty suffix, a digit after the colon, and possibly other goodies.
  - RDFa also has CURIEs, which I believe to be a superset of SPARQL/Turtle CURIEs. I don't know the details, but you should be able to find them in http://www.w3.org/TR/xhtml-rdfa/ (I just spent 2 minutes and failed, though).

When CURIEs occur in RDF/XML, they do so as element names, not inside strings. XML is not going to change, and RDF/XML, even if it does get reissued, will still be tied to XML, so it's not going to change either. If I said anything about syntax liberalization, it was probably in reference to Turtle, which has indeed changed in the way I expected.
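To make the difference between the strict and liberal suffix rules concrete, here is a minimal Python sketch. The two patterns are ASCII-only simplifications written for illustration, not the actual grammar productions from the XML Namespaces, SPARQL, or Turtle specs, and the function names are hypothetical. Rejecting a colon in the suffix is also a choice of this sketch.

```python
import re

# ASSUMPTION: ASCII-only simplification of an XML NCName.
# It must start with a letter or underscore and cannot be empty.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# ASSUMPTION: simplified "liberal" suffix in the SPARQL/Turtle spirit.
# It may be empty and may start with a digit.
LIBERAL_SUFFIX_ASCII = re.compile(r"(?:[A-Za-z0-9_][A-Za-z0-9_.\-]*)?")


def split_curie(curie):
    """Split on the FIRST colon only; any later colon stays in the suffix."""
    prefix, sep, suffix = curie.partition(":")
    if not sep:
        raise ValueError("no colon in %r" % curie)
    return prefix, suffix


def qname_like(curie):
    """True if both halves look like (simplified) NCNames."""
    prefix, suffix = split_curie(curie)
    return bool(NCNAME_ASCII.fullmatch(prefix)) and bool(
        NCNAME_ASCII.fullmatch(suffix)
    )


def liberal_like(curie):
    """True if the prefix is NCName-like and the suffix fits the liberal rule."""
    prefix, suffix = split_curie(curie)
    return bool(NCNAME_ASCII.fullmatch(prefix)) and bool(
        LIBERAL_SUFFIX_ASCII.fullmatch(suffix)
    )


if __name__ == "__main__":
    for candidate in ["obo:GO_0008150", "GO:0008150", "ex:", "ex:a:b"]:
        print(candidate, qname_like(candidate), liberal_like(candidate))
```

Under these simplified rules, GO:0008150 fails the QName-style check because of the leading digit but passes the liberal one, and an identifier with a second colon fails both.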
But URI syntax, in particular whether you can put two colons in a URI (i.e. RDF URI Reference and/or IRI), is not up to any of these specifications. That would be up to RFC 3986 (the successor to RFC 2396), which says that : is reserved and, where it could be mistaken for a delimiter, has to be %-escaped. In practice I suspect this is not always done, and perhaps the new IRI spec (in progress) will say something about that. One could imagine Turtle or RDFa allowing : in their CURIEs, which would then be %-escaped when the CURIE is converted to a URI, but I doubt this will be done.

The subject of URI syntax is so complicated, the quoting rules so impenetrable, and specification compliance so poor, that I recommend using URIs in the most syntactically conservative way possible, so as to stay out of trouble. I'd say eliminate the : one way or another in the process of converting any internal name into a URI. This appears to be what the Foundry approach does, but conversion of : to _ is specific to Foundry identifiers; there's no reason to think :-containing identifiers coming from another identifier space should convert to _ as opposed to - or + or / or %3A. The RFCs suggest that %-escaping would be the right way to put a second : in a URI, but there is no reason to be uniform or slavish about this.

If a second colon occurs in a would-be URI, and the introduction of _ is being left up to Protege, then that's too late in the process. How would Protege know to use the Foundry rule for converting :, as opposed to (say) %-escaping, which might be correct for some different identifier source? Get rid of the : before putting the id into Protege in the first place.

Best
Jonathan
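
As an illustration of removing the colon before an identifier reaches any tool, here is a minimal Python sketch of the two conversions discussed above. The base URI http://purl.obolibrary.org/obo/ is an assumption about the Foundry's URI space, http://example.org/id/ is a made-up base for some other identifier source, and both function names are hypothetical.

```python
from urllib.parse import quote

# ASSUMPTION: base URI for Foundry-style identifiers.
OBO_BASE = "http://purl.obolibrary.org/obo/"
# HYPOTHETICAL base URI for an identifier source with a different rule.
EXAMPLE_BASE = "http://example.org/id/"


def foundry_style_uri(local_id, base=OBO_BASE):
    """Apply the Foundry-specific rule: the single colon becomes '_'."""
    prefix, sep, number = local_id.partition(":")
    if not sep or ":" in number:
        # The rule is only defined here for exactly one colon.
        raise ValueError("expected exactly one colon in %r" % local_id)
    return base + prefix + "_" + number


def percent_escaped_uri(local_id, base=EXAMPLE_BASE):
    """Apply generic %-escaping: every colon becomes '%3A'."""
    # safe="" makes quote() escape ':' and '/' as well.
    return base + quote(local_id, safe="")


if __name__ == "__main__":
    print(foundry_style_uri("GO:0008150"))
    print(percent_escaped_uri("GO:0008150"))
```

The point of keeping two separate functions is that the choice of rule belongs to whoever owns the identifier space, so the conversion should happen at import time, where the source of each id is still known.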
Received on Friday, 12 August 2011 17:51:30 UTC