Re: DDDS / can you do sample full path ? (Fwd) from Dirk-Willem van Gulik on 2004-03-03 (www-rdf-interest@w3.org from March 2004)

From: Dirk-Willem van Gulik <dirkx@asemantics.com>
Date: Wed, 3 Mar 2004 11:53:30 +0100
To: www-rdf-interest@w3.org
Message-Id: <044E1238-6D01-11D8-B8F7-000A95CDA38A@asemantics.com>
Just to supplement the lightning talk of yesterday with some technical
beef.

Below we walk through DDDS (rfc3401-3405) with the URL:

	http://www.asemantics.com//n/index.html

A-priori rule, valid for all URIs (URLs are a subset of URIs); this is
the chicken-and-egg solution [4]:

	s/^([^:]+):.*$/\1/i;			[1]

which is hardcoded in the application's URI parser. Then:

	http://www.asemantics.com//n/index.html =~ s/^([^:]+):.*$/\1/i;

gives

	http
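For anyone who wants to try this, here is a minimal Python sketch of the same first well-known rule. The function name `first_well_known_rule` is made up for illustration, and like the simplified Perl above it does not implement the full RFC 2396 grammar.

```python
import re


def first_well_known_rule(uri: str) -> str:
    """Return the URI scheme: everything before the first ':'.

    Simplified sketch; a real parser must follow the RFC 2396 grammar.
    """
    match = re.match(r"^([^:]+):", uri)
    if match is None:
        raise ValueError(f"no scheme in {uri!r}")
    # Schemes are case-insensitive, so normalise before building the key.
    return match.group(1).lower()


print(first_well_known_rule("http://www.asemantics.com//n/index.html"))  # http
```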

Then look up 'http' in the well-known uri.arpa domain (another hardcoded
chicken-and-egg thing [6]); for the HTTP URI scheme:

	$ 	dig -t NAPTR http.uri.arpa.
	...
	http.uri.arpa.          21600   IN
		NAPTR   0 0 "" "" "!^http://([^:/?#]*).*$!\\1!i" .
	...

So what we get back is a NAPTR record [5] with six values:

	#  field        value  meaning
	1  Order        0      If multiple NAPTR records are returned,
	                       this is the order in which to process them.
	2  Preference   0      Preference within an order block.
	3  Flags        ""     Several are possible; generally they denote
	                       a terminal rule (see below).
	4  Services     ""     List of protocols and services supported by
	                       the end point (i.e. when it is a terminal
	                       rule - see below).
	5  Regexp       ...    Regular expression.
	6  Replacement  ...    Replacement [2].
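A rough Python sketch of splitting the RDATA part of such a record (the part dig prints after `NAPTR`) into those six fields. The `Naptr` tuple and `parse_naptr` are hypothetical names, and relying on `shlex` to undo the zone-file quoting (so that `\\1` becomes `\1`) is an assumption that holds for simple records like these.

```python
import shlex
from typing import NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    services: str
    regexp: str
    replacement: str


def parse_naptr(rdata: str) -> Naptr:
    """Split NAPTR RDATA in presentation form into its six fields."""
    # POSIX-mode shlex strips the double quotes, keeps "" as an empty
    # field and turns the escaped backslash "\\" into a single "\".
    order, pref, flags, services, regexp, replacement = shlex.split(rdata)
    return Naptr(int(order), int(pref), flags, services, regexp, replacement)


rec = parse_naptr(r'0 0 "" "" "!^http://([^:/?#]*).*$!\\1!i" .')
print(rec.order, rec.preference, repr(rec.flags), rec.regexp, rec.replacement)
```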

So we have a new regex:

		"!^http://([^:/?#]*).*$!\\1!i"

Now apply this again, to our URL:

	http://www.asemantics.com//n/index.html =~ "!^http://([^:/?#]*).*$!\\1!i"

results in

		www.asemantics.com
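The rewrite step itself can be sketched in Python as below. This is an illustration only: `apply_naptr_regexp` is a hypothetical helper, Python's `re` dialect stands in for the POSIX extended regexes the RFCs specify, the rule is written with a single backslash (i.e. after the zone-file quoting is removed), and it assumes the delimiter character (here '!') never appears escaped inside the pattern or replacement.

```python
import re


def apply_naptr_regexp(rule: str, aus: str):
    """Apply a NAPTR substitution such as '!regex!replacement!i' to aus.

    Returns the rewritten string, or None if the regex does not match.
    Assumes the delimiter (the rule's first character) is never escaped
    inside the pattern or the replacement.
    """
    delim = rule[0]
    _, pattern, replacement, flags = rule.split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    # Like Perl's s///: replace the first match; these patterns are
    # anchored with ^...$, so the whole string gets rewritten.
    result, count = re.subn(pattern, replacement, aus, count=1, flags=re_flags)
    return result if count else None


url = "http://www.asemantics.com//n/index.html"
print(apply_naptr_regexp(r"!^http://([^:/?#]*).*$!\1!i", url))
# www.asemantics.com
```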

Note that this was the last centrally (standards) defined step;
everything from here on is entirely up to the manager of the FQDN
(i.e. whoever manages asemantics.com).

Now continue our DDDS loop (which is NOT recursive):

	dig -t NAPTR www.asemantics.com

And we get

	www.asemantics.com.     1800    IN
		NAPTR   100 20 "" "" "!^http://([^:/?#]*).*$!bali.asemantics.com!i" .
		NAPTR   100 10 "" "" "!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i" .

Again we apply the regexes to the URL, in the right order: sorted by
order (first field) first and by preference (second field) second.
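The ordering rule amounts to a two-part sort key. A hypothetical Python sketch (`Naptr` and `evaluation_order` are illustrative names), using the two www.asemantics.com records above trimmed to the fields needed:

```python
from typing import NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    regexp: str


def evaluation_order(records):
    """Sort NAPTR records by order first, then by preference (lowest first)."""
    return sorted(records, key=lambda r: (r.order, r.preference))


www_records = [
    Naptr(100, 20, r"!^http://([^:/?#]*).*$!bali.asemantics.com!i"),
    Naptr(100, 10, r"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"),
]
for rec in evaluation_order(www_records):
    print(rec.order, rec.preference, rec.regexp)
```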

order 100, pref 10
	http://www.asemantics.com//n/index.html =~ "!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"

no match. OK, next one.

order 100, pref 20
	http://www.asemantics.com//n/index.html =~ "!^http://([^:/?#]*).*$!bali.asemantics.com!i"

and we get

	bali.asemantics.com

So what has happened here is that we are routing the request to the
right place: some URIs (those for our FOAF server) are special cases,
whereas most of them go to the server Bali.
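Putting the ordering and the rewrite together, the selection at www.asemantics.com can be sketched as follows. `apply_rule` and `first_match` are hypothetical helpers; the sketch assumes Python's `re` can stand in for POSIX extended regexes and that no delimiter is ever escaped inside a rule.

```python
import re

# (order, preference, regexp) for the www.asemantics.com records above
WWW_RECORDS = [
    (100, 20, r"!^http://([^:/?#]*).*$!bali.asemantics.com!i"),
    (100, 10, r"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"),
]


def apply_rule(rule, aus):
    """Apply '!regex!replacement!flags' to aus; None if it does not match."""
    delim = rule[0]
    _, pattern, replacement, flags = rule.split(delim)
    result, n = re.subn(pattern, replacement, aus, count=1,
                        flags=re.IGNORECASE if "i" in flags else 0)
    return result if n else None


def first_match(records, aus):
    """Try the rules by (order, preference); return the first rewrite, else None."""
    for _order, _pref, rule in sorted(records, key=lambda r: (r[0], r[1])):
        result = apply_rule(rule, aus)
        if result is not None:
            return result
    return None


print(first_match(WWW_RECORDS, "http://www.asemantics.com//n/index.html"))
# bali.asemantics.com
```

A URL such as `http://foaf.example.org/x` would instead hit the preference-10 rule and be routed to foaf.asemantics.com.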

Then, you've guessed it, we do another lookup:

	dig -t NAPTR bali.asemantics.com

and get back:

	bali.asemantics.com 1800 IN
		NAPTR   100 10 "u" "http+I2L" "!^http://([^:/?#]*)(.*)$!http://\\1/url.pl/\\2!i" .
		NAPTR   100 10 "a" "z3950+I2C" "!^http://([^:/?#]*)(.*)$!209.132.96.45!i" .
		NAPTR   100 10 "u" "http+I2C" "!^http://([^:/?#]*)(.*)$!http://\\1/rdf.pl/\\2!i" .
		NAPTR   100 10 "u" "http+I2R" "!^(.*)$!\\1!i" .

Note that this time there is a value in the 'flags' field: a 'U' (flags
are case-insensitive, hence the 'u' above). This signals that a match of
the corresponding regex means:

->	Terminal; do not evaluate any further.

->	And the result of the regex (if it matched) MUST be a URI.

Several other flags are defined; RFC 3404 also specifies 'S', 'A' and 'P'.
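What to do with the output of a matching rule can be summarised as a small lookup table. This is a hypothetical Python sketch: `ACTIONS` and `action_for` are illustrative names, it assumes the flags field holds at most one character, and the wording of each action is a paraphrase of RFC 3404 rather than text from it.

```python
# Paraphrased meaning of each flag; flags are case-insensitive.
ACTIONS = {
    "": "non-terminal: use the output as the key for the next NAPTR lookup",
    "U": "terminal: the output is a URI",
    "S": "terminal: look up SRV records for the output",
    "A": "terminal: look up address records for the output",
    "P": "stop: continue resolution in a protocol-specific way",
}


def action_for(flags: str) -> str:
    """Return the action for a (single-character or empty) flags field."""
    try:
        return ACTIONS[flags.upper()]
    except KeyError:
        raise ValueError(f"unsupported flags {flags!r}") from None


print(action_for("u"))  # terminal: the output is a URI
```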

Secondly, you'll notice that the 'services' field contains something.
The syntax is

	[ protocol ] [ '+' service ]

Where protocol is any valid IANA service (see your /etc/services file);
http or ftp are well-known examples. 'service' can take several values;
shown above are:

	I2R		Identifier to Resource -> give me the thing
	I2L		Identifier to Location -> give me the location
	I2C		Identifier to Characteristic -> give me metadata about the resource
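Splitting the services field is straightforward. A hypothetical Python sketch (`parse_services` is an illustrative name, and no case normalisation is attempted):

```python
def parse_services(field: str):
    """Split a services field such as 'http+I2C' into (protocol, [services]).

    An empty field (as in the non-terminal records above) gives ('', []).
    """
    if not field:
        return "", []
    protocol, *services = field.split("+")
    return protocol, services


print(parse_services("http+I2C"))   # ('http', ['I2C'])
print(parse_services("z3950+I2C"))  # ('z3950', ['I2C'])
```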

So let's now assume that we started this procedure with the desire to
learn ABOUT the URL, and that we speak http; then we apply the above
rules:

		NAPTR   100 10 "u" "http+I2R" "!^(.*)$!\\1!i" .

would match fine, and we can do http, but we're not interested in I2R,
so next we try

		NAPTR   100 10 "a" "z3950+I2C" "!^http://([^:/?#]*)(.*)$!209.132.96.45!i" .

and this matches, and we want I2C; but we're no dinosaurs, so we do not
speak z3950. So next we try:

		NAPTR   100 10 "u" "http+I2C" "!^http://([^:/?#]*)(.*)$!http://\\1/rdf.pl/\\2!i" .

this matches, we can do http and we want I2C; so the final result is

		http://www.asemantics.com/rdf.pl///n/index.html

and the terminal flag is 'U', so I should interpret the above result
as a URI. [3]
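The selection at bali.asemantics.com can be sketched in Python as follows. The tuples copy the flags, services and regexp fields of the records above; `select` and `apply_rule` are hypothetical helpers, Python's `re` stands in for POSIX extended regexes, and since all four records share order 100 and preference 10, the sketch simply tries them in listing order rather than in the order used in the walk-through.

```python
import re

URL = "http://www.asemantics.com//n/index.html"

# (flags, services, regexp) copied from the bali.asemantics.com records;
# all four share order 100 and preference 10.
BALI_RECORDS = [
    ("u", "http+I2L", r"!^http://([^:/?#]*)(.*)$!http://\1/url.pl/\2!i"),
    ("a", "z3950+I2C", r"!^http://([^:/?#]*)(.*)$!209.132.96.45!i"),
    ("u", "http+I2C", r"!^http://([^:/?#]*)(.*)$!http://\1/rdf.pl/\2!i"),
    ("u", "http+I2R", r"!^(.*)$!\1!i"),
]


def apply_rule(rule, aus):
    """Apply '!regex!replacement!flags' to aus; None if it does not match."""
    delim = rule[0]
    _, pattern, replacement, flags = rule.split(delim)
    result, n = re.subn(pattern, replacement, aus, count=1,
                        flags=re.IGNORECASE if "i" in flags else 0)
    return result if n else None


def select(records, aus, protocols, wanted_service):
    """Return (flag, output) of the first usable, matching rule, else None."""
    for flags, services, rule in records:
        protocol, *offered = services.split("+")
        if protocol not in protocols or wanted_service not in offered:
            continue  # we don't speak it, or it offers the wrong service
        result = apply_rule(rule, aus)
        if result is not None:
            return flags.upper(), result
    return None


# We speak http and want metadata (I2C):
print(select(BALI_RECORDS, URL, {"http"}, "I2C"))
# ('U', 'http://www.asemantics.com/rdf.pl///n/index.html')
```

Note that the second capture group already begins with '//', so the http+I2C rule as written yields three slashes after rdf.pl.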

Apologies for any typos/cut-and-paste errors in the above; I'll spend
some cycles in the next week simplifying the rules in our demo domain
to make it a bit easier to follow.

On http://foaf-demo.asemantics.com/ex.html you can find some very rough
and ready code in perl/java which does the above; OR (better) you can
cut and paste the algorithm from RFC 3402 and 3404, which is probably
much quicker. (Though I'd love to hear if you open source your
python/perl/php/ruby/assembler version of it :-).
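For those tempted to write their own, here is a compact Python sketch of the whole iterative loop, with a dictionary standing in for DNS. Everything in it is an assumption for illustration: `ZONE`, `resolve` and `apply_rule` are hypothetical names, the records are copied from the dig output above, and it skips much of RFC 3402/3404 (escaped delimiters, the follow-up SRV/address lookups for 'S'/'A', 'P' handling, multi-character flags).

```python
import re

# Hypothetical stand-in for DNS: owner name -> NAPTR records as
# (order, preference, flags, services, regexp, replacement) tuples,
# copied from the dig output above ("." means an empty replacement).
ZONE = {
    "http.uri.arpa": [
        (0, 0, "", "", r"!^http://([^:/?#]*).*$!\1!i", "."),
    ],
    "www.asemantics.com": [
        (100, 20, "", "", r"!^http://([^:/?#]*).*$!bali.asemantics.com!i", "."),
        (100, 10, "", "", r"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i", "."),
    ],
    "bali.asemantics.com": [
        (100, 10, "u", "http+I2L", r"!^http://([^:/?#]*)(.*)$!http://\1/url.pl/\2!i", "."),
        (100, 10, "a", "z3950+I2C", r"!^http://([^:/?#]*)(.*)$!209.132.96.45!i", "."),
        (100, 10, "u", "http+I2C", r"!^http://([^:/?#]*)(.*)$!http://\1/rdf.pl/\2!i", "."),
        (100, 10, "u", "http+I2R", r"!^(.*)$!\1!i", "."),
    ],
}


def apply_rule(rule, aus):
    """Apply '!regex!replacement!flags' to aus; None if it does not match."""
    delim = rule[0]
    _, pattern, replacement, flags = rule.split(delim)
    result, n = re.subn(pattern, replacement, aus, count=1,
                        flags=re.IGNORECASE if "i" in flags else 0)
    return result if n else None


def resolve(uri, protocols, wanted_service, max_steps=10):
    """Iterative (not recursive) DDDS loop; returns (flag, output) or None."""
    # First well-known rule: the scheme, looked up under uri.arpa.
    key = uri.split(":", 1)[0].lower() + ".uri.arpa"
    for _ in range(max_steps):
        records = sorted(ZONE.get(key, []), key=lambda r: (r[0], r[1]))
        next_key = None
        for _order, _pref, flags, services, regexp, replacement in records:
            if flags:
                # Terminal rule: only usable if we speak the protocol
                # and it offers the service we want.
                protocol, *offered = services.split("+")
                if protocol not in protocols or wanted_service not in offered:
                    continue
            # Regexp and replacement are mutually exclusive (note [2]).
            if regexp:
                out = apply_rule(regexp, uri)
            else:
                out = None if replacement == "." else replacement
            if out is None:
                continue
            if flags:
                return flags.upper(), out
            next_key = out  # non-terminal: the output is the next key
            break
        if next_key is None:
            return None  # nothing matched: resolution fails
        key = next_key
    raise RuntimeError("too many DDDS steps")


print(resolve("http://www.asemantics.com//n/index.html", {"http"}, "I2C"))
```

The application-unique string (the original URI) is what every regexp is applied to; only the lookup key changes from step to step.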

That's it.

Dw.

Notes

1:	Using abbreviated/simplified and not quite correct
	Perl-style regexes to make it easier to follow; the
	real ones are more complex, to deal with escaping
	and to match the URI definition in the RFC exactly.

2:	See RFC 3401-3405 for exactly when this is used; in
	general, use the regex if there is one, or, if it is
	empty, substitute the replacement value outright.

3:	Other options are an SRV record or simply an IP
	address.

4:	RFC 2396	Uniform Resource Identifiers

5:	RFC 2915	NAPTR record

6:	RFC 3405	http://uri.net/ddds.html
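Note [2]'s choice between the regexp and replacement fields can be sketched as below. `rule_output` is a hypothetical name, treating '.' as an empty replacement follows the zone-file convention seen in the records above, and Python's `re` stands in for POSIX extended regexes.

```python
import re


def rule_output(regexp: str, replacement: str, aus: str):
    """Output of one NAPTR rule for the application-unique string aus.

    If the regexp field is non-empty, apply it (sed-style, first match);
    otherwise fall back to the replacement field, where '.' means empty.
    Returns None when the regexp does not match or both fields are empty.
    """
    if regexp:
        delim = regexp[0]  # assumes the delimiter is never escaped inside
        _, pattern, repl, flags = regexp.split(delim)
        result, n = re.subn(pattern, repl, aus, count=1,
                            flags=re.IGNORECASE if "i" in flags else 0)
        return result if n else None
    return None if replacement in ("", ".") else replacement


# Regexp present: the http.uri.arpa rule.
print(rule_output(r"!^http://([^:/?#]*).*$!\1!i", ".",
                  "http://www.asemantics.com//n/index.html"))  # www.asemantics.com
# Regexp empty: outright substitution with a hypothetical replacement value.
print(rule_output("", "bali.asemantics.com", "anything"))  # bali.asemantics.com
```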

Dw
Received on Wednesday, 3 March 2004 05:53:49 UTC