RE: What to do about namespace derived URI refs... (long) from Patrick.Stickler@nokia.com on 2001-06-08 (www-rdf-interest@w3.org from June 2001)

From: <Patrick.Stickler@nokia.com>
Date: Fri, 8 Jun 2001 10:52:45 +0300
To: sandro@w3.org, www-rdf-interest@w3.org
Cc: Ora.Lassila@nokia.com
Message-ID: <6D1A8E7871B9D211B3B00008C7490AA507958752@treis03nok>
[Sandro, I replied to this via the group as I considered it a continuation
of the discussion thread... hope you don't mind]

> >> > Sandro, the tag FAQ and Tutorial are not accessible from the
> >> > URL you specified. It simply says "coming soon". Can you 
> >> > update the web page or provide copies as attachments?
> >> 
> >> That's odd.  I don't know where in the infrastructure an old copy
> >> could be.  You've tried shift-reload and that kind of client-side
> >> stuff?
> >
> >Yes. It's not there. Maybe an older copy overwrote the newer?
> 
> I've tried it from four different multi-user systems on which I
> happen to have shell accounts (all run by totally different
> organizations), and it works fine on all of them.   (I typed "lynx
> taguri.org" and each one displayed a page with the tutorial.)

My apologies. Don't know what is up with my browser (IE5) but it
works now. I had tried both reload and setting prefs to 'check every time'
but it still gave me the old page. I've rebooted since then and now
it came up straight away with the new version. More Microsoft mysteries...

> >I don't see how binding or not binding a tag to a data stream
> >disqualifies tags as a URN, since the intended use for tags is as
> >names, not locations.
> 
> Dan Connolly tells me that the URI Working Group came to the
> conclusion that the notion of URLs was obsolete, since the web
> infrastructure now uses them as URIs anyway (finding the closest
> server, etc).  

Arggh! My God! What are they thinking! Or have I totally lost touch ;-)

A URL is a URI that directly represents a web resource (a MIME data stream)
and a URN is a URI that indirectly represents a web resource *or* an 
abstract concept, and which *might* in a given context be mapped to a URL
for the actual resource, if it is not abstract.

A URL is a location, an address of content. A URN is a name. The address
does not identify the content, it only locates it. The name does not locate
the content, it only identifies it. This is a fundamental distinction that
should *never* be lost. Either to say that a URN "functions" as a URL or
that a URL "functions" as a URN is to totally abandon this distinction,
and if that is what the URI Working Group is thinking, then God help the
internet! Just because a URN has a mapping in some context to a URL or
can be resolved by some agent to a MIME stream, does *not* make it a URL!
And just because someone might use a URL as a URN, does not mean that that
is a valid URL. It is not if it does not define the *address* of some
content.

I wonder if Dan et al. are trying to rationalize away the common abuse of
URLs as universal names for abstract concepts by blurring the original clear
distinctions between URL and URN. Sad. Very sad. What to do....??? Sigh.

> If you're saying that the binding between a URN and its denotation
> (such as a web page) is not constrained to be constant across time and
> space, then it sounds like the notion of URN is obsolete, too, since
> URNs are operationally identical to all other URIs.  Perhaps the
> difference is simply that with URNs one is expected to do some
> rewriting and redirecting closer to the client?

Not at all obsolete. A URN is meant to provide a location independent
identity to some content. That content may in fact be stored redundantly
in numerous places, and the resolution agent within a given context
might choose the "closest" (in internet distance) *address* from
which that uniquely defined information might be retrieved. This is
e.g. exactly what was intended by URN schemes such as 'isbn'. Taking
the more "modern" case of eBooks, various book retailers, libraries, 
etc. may all store their own local copy of a given eBook, each with a
different location (URL) yet all use the same URN, based e.g. on the
ISBN of the eBook to identify the publication. Persons requesting a
copy of that publication (for loan, purchase, whatever) would simply
be able to request the book by its URN, and each environment (for each
retailer, library, etc.) would map that URN to the URL for that content
within its own environment.
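
A minimal sketch of that per-context mapping, in Python. The URN, the
URLs, and the two lookup tables are all made up for illustration; a real
resolver would be whatever each retailer or library chooses to run.

```python
# Hypothetical per-context resolver tables: each environment binds the same
# URN to its own local URL. The ISBN below is a placeholder, not a real one.
ISBN_URN = "urn:isbn:0-00-000000-0"

LIBRARY_TABLE = {ISBN_URN: "http://library.example.org/ebooks/42.pdf"}
RETAILER_TABLE = {ISBN_URN: "http://shop.example.com/catalog/items/9913"}


def resolve(urn, table):
    """Return the local URL bound to a URN in one context, or None."""
    return table.get(urn)


library_url = resolve(ISBN_URN, LIBRARY_TABLE)
retailer_url = resolve(ISBN_URN, RETAILER_TABLE)
print(library_url)
print(retailer_url)
```

The name stays the same in both contexts; only the address differs, and
neither context needs to know the other's address.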

For inter-library loans, the "interchange" is simply a URN and two URLs,
one for the source library and one for the target library. This distinction
of name versus address is crucial if we are to avoid an exponential explosion
in complexity when trying to define equivalencies between URLs in the
absence of universal names, especially as locations change yet names 
should not! Having universal names enables the proper scalability of 
the SW, leaving each context to manage its own mappings from names of
non-abstract web resources to locations. If we do away with the common
names, then every context must define a mapping between its own locations
for resources and every other location of the same resource in order
to achieve any semblance of interoperability! It will be a scalability
nightmare!
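
To put rough numbers on that, here is a simple counting model in Python
for a single resource. It assumes every context holds exactly one copy,
and the function names are purely illustrative.

```python
def url_only_mappings(contexts):
    """Each context maps its own location to every other context's location."""
    return contexts * (contexts - 1)


def shared_name_mappings(contexts):
    """Each context keeps one mapping from the shared name to its own location."""
    return contexts


for n in (3, 10, 1000):
    print(n, url_only_mappings(n), shared_name_mappings(n))
```

In this model the URL-only total grows with the square of the number of
contexts, while the shared-name total grows only linearly, and that is
before any location changes.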

Single case in point: using a URL as a name for ISO 639 language "Finnish".
There is no single official URL for the definition of the ISO 639 standard.
But there are many, many URLs where it is re-iterated in some web accessible
format. What if everyone chooses a different URL for their "authority"? Some
choose the OASIS site. Some an IETF RFC. Others a W3C note. Others some
publisher's site. Etc., etc. How then does any agent on the SW *know* that
another agent *means* ISO 639 language "Finnish" if it does not have the
mapping from that agent's preferred "URL name" (an oxymoron!) to that used
by itself?!!! It can't!

Even if all agents agreed on a single URL to use as a name, which is
essential for the SW to work, then (a) the name inherits all of the
fragility of a URL, being a location/address, and (b) because it is a URL,
there is the likelihood that some explicit schema will be located at the
common "namespace" portion of the URL and fragment identifiers will be used
to define the sub-components of the "namespace". Those fragments will be
MIME content type specific, and thus are both unreasonably tied to a given
MIME content type (since they are names, not references) and not guaranteed
to remain valid over time if e.g. the schema encoding for the ontology
changes. Finally, there is no guarantee of compatibility between the various
serialization/schema interpretations of "namespace" + "name" and that used
for the actual names!

In short, using URLs or URL refs as names brings chaos not order to the SW.

I'm really wondering what the W3C and IETF are thinking by abandoning
this critical distinction between URL vs. URN as a strict partitioning
of URI schemes. The intersected diagram of URI types and the view that
URIs can sometimes be URLs and sometimes be URNs is very very disturbing.

> >>   Step 3. Encode the date. In theory, tags could use dates of the form
> >>   "2001-06-05", but we decided to save people a lot of typing by using
> >>   a shorter notation for dates. Instead, we write the date I picked as
> >>   "1-6-5". We also say that the first day of a month and the first day
> >>   of a year have a further modification: you drop the "1" fields, so
> >>   January 1, 2007 is written as "7".
> >
> >Firstly, by not using ISO standard date formats, you require software
> >that might wish to "understand" the date to implement your new
> >proprietary encoding for dates. Bad.
> 
> There is no reason to ever read the dates.    Tags are opaque strings
> to all software.

But in your argument for the existence of dates, you said the
purpose was specifically to *differentiate* between tags generated
by you and e.g. your grandson. If you don't read the date, then 
you cannot create an ordering of the tags, nor can you compare
the date of the tag with e.g. the period of your life versus that
of your grandson, etc. The dates must be read if inferences are to
be drawn about the temporal relationships between tags as a basis
for determining the minter. Eh?
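
To make concrete what "reading the date" would involve, here is a rough
Python sketch that expands the compressed dates and orders tags by them.
It assumes, based only on the tutorial text quoted above, that the first
field counts years from 2000 and that an omitted month or day means 1.
The helper names and the first two sample tags are hypothetical.

```python
from datetime import date


def expand_tag_date(short):
    """Expand a compressed tag date such as "1-6-5" into a datetime.date.

    Assumption: the first field is the year counted from 2000, and any
    omitted month or day field stands for 1.
    """
    fields = [int(part) for part in short.split("-")]
    fields += [1] * (3 - len(fields))  # pad a missing month and/or day with 1
    year, month, day = fields
    return date(2000 + year, month, day)


def minting_date(tag):
    """Pull the date out of a tag of the form tag:<authority>,<date>:<name>."""
    head = tag[len("tag:"):]
    authority_and_date = head.split(":", 1)[0]
    return expand_tag_date(authority_and_date.split(",", 1)[1])


tags = [
    "tag:heinz.com,7:baked-beans",
    "tag:heinz.com,1-6-5:baked-beans",
    "tag:heinz.com,1:baked-beans",
]
ordered = sorted(tags, key=minting_date)
print(ordered)
```

Any software that wants to order or compare tags needs this extra
decoding step, whereas ISO dates already sort correctly as plain strings.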

> >Secondly, it is ambiguous as to whether it is
> >year-month-day or year-day-month. Even though the tag spec says which
> >is which, folks in Europe will hate you ;-) 
> 
> We just used the ISO ordering.

But one must enter the dates, and since the average person will
look for examples to guide him/her (who reads the manual ;-) there
is great potential for confusion. Even though there is the ISO
ordering, the possible encodings of dates are potentially ambiguous, and
therefore their utility as examples for imitation is greatly
diminished.

> >It's not *that* hard to write, and anyway, you could make a cute little
> >utility to autogenerate your tags for you.
> 
> It's not a question of generation.  I use ISO dates in filenames all
> the time, etc.  It's a question of the world being plastered with
> tags, 98% of which could either use "2001-01-01" or "1".  I see little
> reason to waste 9 character positions on billions of written
> instances.   For just RDF, which obviously doesn't care about such
> things, I wouldn't care about that, it's true.

A greater problem than the ambiguity of ordering between day and month, etc.
is the fact that the notation "logic" for compressing the dates is
too difficult for any average user to be willing to learn and apply.
It's clever, but requires too much thought for "the average Joe".

Folks know the ISO format. It requires no thought at all. And the
consistent format reinforces the proper syntax of the identifier.
The range of possible variation in date encodings will put average
folks off. If they simply see e.g.
'name:<myEmailAddress>,<####-##-##>:<name>'
again and again, then they will use it. If they see all kinds of
strange and (to the average person) arcane variations in the date
encoding, they will say "too difficult" and go on abusing URLs...
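
The "cute little utility" for the fixed-format case is only a few lines.
This Python sketch assumes the pattern name:<authority>,<YYYY-MM-DD>:<name>;
the function name mint_name is made up, and the 'name' scheme itself is
only a proposal in this thread.

```python
from datetime import date


def mint_name(authority, minted_on, local_name):
    """Build 'name:<authority>,<YYYY-MM-DD>:<name>' using an ISO 8601 date."""
    return "name:{},{}:{}".format(authority, minted_on.isoformat(), local_name)


example = mint_name("patrick.stickler@nokia.com", date(2001, 5, 22), "myMotorcycle")
print(example)
```

Since date.isoformat() always yields the zero-padded year-month-day form,
there is nothing for the user to learn or get wrong.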

> One of the problems with using ISO dates is that people will assume
> they mean something related to the tag, as opposed to simply naming a
> time the authority name is valid.  If you see
>    tag:heinz.com,2001-04-30:baked-beans
> you might well think that date had SOMETHING to do with the beans, or
> at least the time Heinz introduced that kind of beans.  But of course
> it doesn't.   So I think
>   tag:heinz.com,1:baked-beans
> is more appropriate.

I can see your motivations for compressing the date encoding, but
I just don't see the average web user adopting such a methodology.
If these are supposed to be human-minted names, based on someone
having to think about what the name should be, then the date compression
is just too complex for broad adoption (IMHO of course). 

The potential for folks to mis-interpret the date as anything other
than the date of minting should simply be addressed (probably pedantically)
in documentation, tutorials, etc.

> ...
> >You'll probably want to constrain what this name substring portion
> >can include a little, possibly excluding whitespace and special 
> >characters, etc. Otherwise, it could choke various tools and applications
> >and lead to technically valid yet unintended abuse of the URI scheme.
> 
> In the actual spec we limit it to URIChar*, of course.   

Right. Missed that.
 
> >and perhaps a semantically "pre-loaded" scheme 
> >identifier 'id' were used,
> 
> I like the name id, too.   I'm also fond of token.   But tag seems
> okay, and it's getting known by that.   I first called it "tann" for
> Time/Authority-Name/Name.

The name of the URI is really an issue of "marketing". Actually, I
am thinking that 'name' (as used above) would be the easiest to "sell"
to the general public.

Still, going back to my argument about whether tags are URNs, I'd say
they definitely are, and that what may actually be needed is the proper
support for the 'urn:' URI scheme, and that tags could be one valid
urn: sub-scheme and names could be another, with different purposes
(the former being arbitrary, temporally bound identifiers and the 
latter being universal names for abstract concepts).

Thus:

      urn:tag:heinz.com,2001-04-30:baked-beans
and
      urn:name:metia.nokia.com:MARS/2.1/status/approved

though if we simply make the date optional (for when needed), we could
probably sell a single URN/URI more easily, e.g.:

      name:heinz.com,2001-04-30:baked-beans
      name:patrick.stickler@nokia.com:myDataTypes/gazonka-big
      name:patrick.stickler@nokia.com,2001-05-22:myMotorcycle
      name:metia.nokia.com:MARS/2.1/status/approved
      name:dublincore.org/elements/1.1/Title
      name:dublincore.org/elements/1.1/Creator
      name:prismstandard.org/1.0/creationTime
      name:iso.ch/3166-1/fi                     (the country "Finland")
      name:iso.ch/639/fi                        (the language "Finnish")

Eh?

(all of the above are merely examples, apologies to the various 
 authorities mentioned)
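
For what it's worth, the example names listed above can all be taken
apart with one pattern. The Python sketch below assumes a grammar inferred
only from those examples: an authority, an optional ",YYYY-MM-DD" date,
then ":" or "/" and the rest of the name. It is not a proposed
specification, and parse_name is a hypothetical helper.

```python
import re

# Grammar guessed from the examples only: authority, optional ISO date,
# then a ":" or "/" separator followed by the local name.
NAME_URI = re.compile(
    r"^name:(?P<authority>[^,:/]+)"
    r"(?:,(?P<date>\d{4}-\d{2}-\d{2}))?"
    r"[:/](?P<name>.+)$"
)


def parse_name(uri):
    """Split a 'name:' URI into (authority, date or None, local name)."""
    match = NAME_URI.match(uri)
    if match is None:
        raise ValueError("not a recognizable name: URI: " + uri)
    return match.group("authority"), match.group("date"), match.group("name")


print(parse_name("name:heinz.com,2001-04-30:baked-beans"))
print(parse_name("name:iso.ch/639/fi"))
```

The date comes back as None when it is omitted, so the dated and undated
forms can share a single scheme.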

> >"id:metia.nokia.com:MARS/2.1/status/approved", then we could have a 
> 
> But what happens when Nokia loses a trademark battle with M&M MARS Co,
> which legally gets nokia.com, etc, etc. ?  Without the date you
> constrain future domain holders in a way which may be neither legal
> nor practical -- what if Nokia loses the records of what names it has
> minted?  With tags, they would just start using some later date,
> probably the most recent year start or "2" (assuming we're into 2002
> by now).

This is simply part of the larger issue of trademarks, product names,
and copyrights -- and there are lots of guidelines and precedents to
apply to such cases.

The same argument applies to URLs used as URNs, to tags, and any other
public data used by a business, person or other entity.

The presence or absence of a date in the URN will likely have no
significance in whether or not Nokia could continue to use it, as
the presumed trademark infringement is not because of the date.

> As for the Semantic Web....   well,....   yeah.  Something like this
> could be nice.  :-)

Agreed.

BTW, I'll be in Boston (Burlington) next week, and would have the evening 
of the 14th (Thursday) free. Would you be interested in getting together
somewhere for a chat and e.g. a few beers? Maybe Ora and some of the
other local RDF folks might want to join us?

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Friday, 8 June 2001 04:24:38 UTC