- From: Dirk-Willem van Gulik <dirkx@asemantics.com>
- Date: Wed, 3 Mar 2004 11:53:30 +0100
- To: www-rdf-interest@w3.org
- Message-Id: <044E1238-6D01-11D8-B8F7-000A95CDA38A@asemantics.com>
Just to supplement the lightning talk of yesterday with some technical beef.

Below we walk through DDDS (rfc3401-3405) with the URL:

    http://www.asemantics.com//n/index.html

A-priori rule valid for all URIs (and URLs are a subset of URIs) (chicken-egg solution - [4]):

    s/^([^:]+):.*$/\1/i; [1]

which is hardcoded in the application's URI parser. Then:

    http://www.asemantics.com//n/index.html =~ s/^([^:]+):.*$/\1/i;

gives

    http

Then look up 'http' in the well known domain (another chicken-egg hardcoded thing - [6]), i.e. for the HTTP URI scheme:

    $ dig -t NAPTR http.uri.arpa.
    ...
    http.uri.arpa. 21600 IN NAPTR 0 0 "" "" "!^http://([^:/?#]*).*$!\\1!i" .
    ...

So what we get back is an NAPTR [5] record with 6 values:

    #  value  meaning
    1  0      Order - if there are multiple NAPTR records returned, this is the order of them.
    2  0      Preference - preference within an order block.
    3  ""     Flags - several are possible and generally they denote a terminal rule (see below).
    4  ""     Services - list of protocols and services supported by the end point (i.e. when it is a terminal rule - see below).
    5  ...    Regular expression
    6  ...    Replacement [2]

So we have a new regex:

    "!^http://([^:/?#]*).*$!\\1!i"

Now apply this again to our URL:

    http://www.asemantics.com//n/index.html =~ "!^http://([^:/?#]*).*$!\\1!i"

which results in

    www.asemantics.com

Note that this was the last central/standards defined step; everything from here on is totally FQDN manager specific (i.e. up to whoever manages asemantics.com).

Now continue our DDDS loop (which is NOT recursive):

    dig -t NAPTR www.asemantics.com

And we get

    www.asemantics.com. 1800 IN NAPTR 100 20 "" "" "!^http://([^:/?#]*).*$!bali.asemantics.com!i" .
                                NAPTR 100 10 "" "" "!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i" .

Again we apply the regexes to the URL, in the right order (sorted by order first and by pref, the second field, second).

order 100, pref 10:

    http://www.asemantics.com//n/index.html =~ "!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"

No match. OK, next one.

order 100, pref 20:

    http://www.asemantics.com//n/index.html =~ "!^http://([^:/?#]*).*$!bali.asemantics.com!i"

and we get

    bali.asemantics.com

So what has happened here is that we are routing the request to the right place: some URIs on our FOAF server are special cases, whereas most of them go to the server Bali.

Then, you've guessed it, we do another lookup

    dig -t NAPTR bali.asemantics.com

and get back:

    bali.asemantics.com 1800 IN NAPTR 100 10 "u" "http+I2L" "!^http://([^:/?#]*)(.*)$!http://\\1/url.pl/\\2!i" .
                                NAPTR 100 10 "a" "z3950+I2C" "!^http://([^:/?#]*)(.*)$!209.132.96.45!i" .
                                NAPTR 100 10 "u" "http+I2C" "!^http://([^:/?#]*)(.*)$!http://\\1/rdf.pl/\\2!i" .
                                NAPTR 100 10 "u" "http+I2R" "!^(.*)$!\\1!i" .

Note that this time there is a value in the 'flags' field: a 'U'. This signals that a match of the corresponding regex means:

 -> Terminal; do not evaluate any further.
 -> And the result of the regex (if it matched) MUST be a URI.

Several other flags are defined.

Secondly you'll notice that the 'service' field contains something. The syntax is

    [ protocol ] [ '+' service ]

where protocol is any valid IANA service (see your /etc/services file); http and ftp are well known examples. 'service' can take several values; shown above are

    I2R  Identifier to Resource        -> give me the thing
    I2L  Identifier to Location        -> give me the location
    I2C  Identifier to Characteristic  -> give me metadata about the resource

So let's now assume that we started this procedure out with the desire to learn ABOUT the URL, and that we speak http; then apply the above rules:

    NAPTR 100 10 "u" "http+I2R" "!^(.*)$!\\1!i" .

would match fine; we can do http, but we're not interested in I2R, so next we try

    NAPTR 100 10 "a" "z3950+I2C" "!^http://([^:/?#]*)(.*)$!209.132.96.45!i" .

and this matches, we want I2C - but we're no dinosaurs, so we do not speak z3950. So next we try:

    NAPTR 100 10 "u" "http+I2C" "!^http://([^:/?#]*)(.*)$!http://\\1/rdf.pl/\\2!i" .

this matches, and we can do http and we want I2C; so the final result is

    http://www.asemantics.com/rdf.pl///n/index.html

and the terminal type is 'U' - so I should interpret the above result as a URI. [3]

Apologies for any typos/cut-and-paste errors in the above - I'll spend some cycles in the next week to simplify the rules in our demo domain above to make it a bit easier to follow.

On http://foaf-demo.asemantics.com/ex.html you can find some very rough and ready code in perl/java which does the above; OR (better) you can cut and paste the algorithm from RFC 3402 and 3404, which is probably much quicker. (Though I'd love to hear if you open source your python/perl/php/ruby/assembler version of it :-).

C'est Ça

Dw.

Notes

1: Using abbreviated/simplified and not quite correct regexps in perl style to make it easier to follow; the real ones are more complex, as they have to deal with escaping and match the URI definition in the RFC exactly.
2: See rfc3401-3405 for exactly how and when this is used; in general, use the regex if there is a value, or do outright substitution with the replacement value if the regex is empty.
3: Other options are an SRV record or simply an IP address.
4: RFC 2396, Uniform Resource Identifiers
5: RFC 2915, NAPTR record
6: RFC 3405, http://uri.net/ddds.html

Dw
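
For those who would rather read code than dig output, the loop walked through above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions; it is not the code from the foaf-demo page and not the RFC 3402 algorithm verbatim. ZONE is a hard-coded dict standing in for the dig lookups, the names Naptr, apply_regexp and resolve are made up for illustration, Python's re module is assumed to be close enough to the regex dialect used in the records, and the service check is a plain string compare applied only to terminal records.

```python
import re
from dataclasses import dataclass


@dataclass
class Naptr:
    order: int
    pref: int
    flags: str    # "" means non-terminal; "u" means the result is a URI
    service: str  # e.g. "http+I2C"
    regexp: str   # "<delim>pattern<delim>substitution<delim>options"


def apply_regexp(field, uri):
    """Apply one NAPTR regexp field to uri; return the rewritten string or None."""
    delim = field[0]
    pattern, subst, opts = field[1:].split(delim)
    match = re.search(pattern, uri, re.IGNORECASE if "i" in opts else 0)
    if match is None:
        return None
    # Expand the backreferences \1..\9 in the substitution by hand.
    return re.sub(r"\\(\d)", lambda m: match.group(int(m.group(1))) or "", subst)


def resolve(uri, zone, want_service, max_steps=10):
    """Follow NAPTR rules until a terminal rule for want_service matches.

    Returns (flags, result) for the terminal rule, or None if nothing matched.
    """
    # First well-known rule: the URI scheme selects the starting domain.
    domain = uri.split(":", 1)[0].lower() + ".uri.arpa"
    for _ in range(max_steps):  # a loop, not recursion
        records = sorted(zone.get(domain, []), key=lambda r: (r.order, r.pref))
        for rec in records:
            terminal = rec.flags != ""
            # Assumption: only terminal rules are filtered on service.
            if terminal and rec.service.lower() != want_service.lower():
                continue
            result = apply_regexp(rec.regexp, uri)
            if result is None:
                continue
            if terminal:
                return rec.flags.lower(), result
            # Non-terminal: the result is the next domain to look up.
            domain = result.rstrip(".").lower()
            break
        else:
            return None  # no rule in this domain matched
    return None


# Stand-in for DNS, filled with the records quoted in the walk-through.
ZONE = {
    "http.uri.arpa": [
        Naptr(0, 0, "", "", r"!^http://([^:/?#]*).*$!\1!i"),
    ],
    "www.asemantics.com": [
        Naptr(100, 20, "", "", r"!^http://([^:/?#]*).*$!bali.asemantics.com!i"),
        Naptr(100, 10, "", "", r"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"),
    ],
    "bali.asemantics.com": [
        Naptr(100, 10, "u", "http+I2L", r"!^http://([^:/?#]*)(.*)$!http://\1/url.pl/\2!i"),
        Naptr(100, 10, "a", "z3950+I2C", r"!^http://([^:/?#]*)(.*)$!209.132.96.45!i"),
        Naptr(100, 10, "u", "http+I2C", r"!^http://([^:/?#]*)(.*)$!http://\1/rdf.pl/\2!i"),
        Naptr(100, 10, "u", "http+I2R", r"!^(.*)$!\1!i"),
    ],
}

URL = "http://www.asemantics.com//n/index.html"

# The path of the example URL starts with "//", so \2 carries both
# slashes into the rewritten URI, right after the "/" that ends "rdf.pl/".
print(resolve(URL, ZONE, "http+I2C"))
```

Checking the service before running the regex (rather than after, as in the prose) gives the same answer here and saves a regex match per skipped record; a real client would do the lookups with a resolver library instead of a dict.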
Received on Wednesday, 3 March 2004 05:53:49 UTC