- From: Dan Connolly <connolly@w3.org>
- Date: Wed, 19 Sep 2001 22:26:51 -0500
- To: w3c-rdfcore-wg@w3.org
Pat writes: | I am unclear why N-triples refers to URIs | by a different name, but it does, and I'm just following along here. -- Wed, 19 Sep 2001 19:35:49 -0500 It's sorta embarrasing to explain what a mess the URI specs are, but it's nothing deep or subtle, so let's see if I can explain... Q: What's a URI? A: Everybody knows what a URI is; most folks call it a URL. They look like this: http://www.w3.org/ ftp://gatekeeper.dec.com/ tel:+1-617-253-2613 http://www.w3.org/2001/02pd/ Q: Gee; those are sorta long and tedious... can I abbreviate them, kinda like filenames in Unix and DOS? A: sure... in an HTML (or RDF) document available at http://www.w3.org/2001/foo, you can shorten http://www.w3.org/2001/02pd/gv to just 02pd/gv as in <a href="02pd/gv">...</a> or <rdf:Description rdf:about="02pd/gv">...</rdf:Description> The value of the href/resource attribute is called a _URI reference_*; all URIs are URI references, but not all URI references are URIs. Whenever you use a relative URI reference (i.e. one that isn't a full, absolute URI), you have to be clear about what URI you mean for it to be relative to so that the consumer can turn it back into absolute form; this is called the _base URI_. Q: Say... speaking of HTML href, I've seen these hash-mark deelies (#foo). What are those all about? A: In a URI reference, the hash mark (aka octothorpe; a fifty-cent word you can have for free ;-) separates a _fragment identifier_ from the rest of the URI reference. For example: <a href="02pd/gv#color">...</a> Q: So the absolute form of that would be http://www.w3.org/2001/02pd/gv#color right? A: right. Q: and that's an (absolute) URI, right? A: you would think so; and as of RFC1630, the term URI included things with fragment identifiers. But RFC1630 is just informational; somewhere along the road to RFC2396, the Internet Draft Standard, the term "URI" was narrowed to exclude things with fragment identifiers. Q: so what do we call it? A: you can call it a URI reference; it is one of those. Q: but URI references include relative deelies like "foo/bar"; I want a term that excludes those but includes things with fragment identifiers. That's what we're using in ntriples after all, no? A: Yes, that's what we're using in ntriples, but there is no ratified term for URI-plus-optional-fragment-identifier. Sorry. If it's any consolation, I'm on record wanting such a term for quite some time now: cf URI terminology, esp. in XML specs Dan Connolly (Mon, Jan 10 2000) http://lists.w3.org/Archives/Public/uri/2000Jan/0002.html Q: sigh... ok... but what about ampersands in href/resource attributes? <a href="lookup?name=Doe&age=27">...</a> A: that's just an XML quoting mechanism. It really has nothing to do with URIs. The value of that href attribute is lookup?name=Doe&age=27 just like the value of the dc:title attribute in <rdf:Description dc:title="AT&T Stuff"/> is AT&T Stuff It's unfortunate that the CGI designers couldn't be convinced to switch from & to ; for this purpose; it would have allowed folks to copy-and-paste without dealing with these quoting issues. Again, my I-told-you-so is on record: NOTE - The URI from a query form submission can be used in a normal anchor style hyperlink. Unfortunately, the use of the `&' character to separate form fields interacts with its use in SGML attribute values as an entity reference delimiter. For example, the URI `http://host/?x=1&y=2' must be written `<a href="http://host/?x=1&y=2"' or `<a href="http://host/?x=1&y=2">'. HTTP server implementors, and in particular, CGI implementors are encouraged to support the use of `;' in place of `&' to save users the trouble of escaping `&' characters this way. -- http://www.ics.uci.edu/pub/ietf/html/rfc1866.txt Q: OK... I think I'm catching on... but what about internationalization? RFC2396 says that URIs are built from US-ASCII (7 bit) characters; but XML attribute values are full unicode strings. What happens if somebody writes <a href="dürst">...</a> A: Well, remember the asterisk (*) when I said the value of an href/resource attribute is a URI reference? I lied. In general, it's something that can be converted to a URI reference as follows: - start with the Unicode string that is the attribute value (i.e. convert & to & and ü to whatever unicode character that is per XML quoting rules) - encode it as UTF-8; this yields a sequence of octets - for each octet that corresponds to a legal URI character (see the uric production in RFC2396) use that character (all URI characters are US-ASCII, recall). - for other octets, write them as %hh where hh are the two (US-ASCII) characters that make a hexadeciaml numeral for that octet. So the resulting URI reference from your question is d%fc%ferst [or something; at this point, I'd have to look up the codes to be sure.] see the W3C charmod spec (and the HTML 4.01 spec, and the XLink spec, and a recent IURI internet draft) for official specification of this unicode-to-URI stuff. Q: What a mess! Are you serious? For a technology so architecturally core to RDF and the Web, that's quite a kludge-tower! A: er... what can I say? That's the state of the art as I understand it. -- Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Wednesday, 19 September 2001 22:26:56 UTC