Primary & Equivalent Urls/Resources, Case-Sensitive Urls

From: Mike Schinkel <mikeschinkel@gmail.com>
Date: Wed, 1 Nov 2006 06:08:01 -0500
To: <Vincent.Quint@inrialpes.fr>, <www-tag@w3.org>
Message-ID: <004001c6fda5$ff1debe0$2102fea9@Guides.local>

To all:

Sorry if I am making my comments after the call, but I just got this email
(and I'm relatively new to the list)...

In reading through http://www.w3.org/2001/tag/2006/10/31-agenda.html I
found links to http://www.w3.org/DesignIssues/Generic and
http://www.w3.org/2001/tag/doc/alternatives-discovery.html and it occurred
to me that this may be the time and place to discuss an issue I'm very keen
to see addressed.

The issue is twofold:

1.) How to understand which URL is "primary" for "equivalent" URLs
(w/essentially the same content), and 
2.) Servers that support case-insensitive URLs. 

For (#1) I see the situation on the web where there are a lot of documents
with multiple URLs pointing to either the same resource, or just as
important, an "equivalent" resource.  One simple example is that two URLs
might land on the same content but they are displaying different
advertisements.

But a more interesting scenario would be when using multiple paths to the
same information.  Consider these three URLs:

	http://www.foo.com/toyota/4runner/1999/
	http://www.foo.com/toyota/1999/4runner/
	http://www.foo.com/1999/toyota/4runner/
	
Assuming they point to the same basic content but have different breadcrumbs
for user navigation:

	Home >> Toyota >> 4Runner >> 1999
	Home >> Toyota >> 1999 >> 4Runner 
	Home >> 1999 >> Toyota >> 4Runner 

Although they really are the same page, it's difficult to tell them apart. It
would be very nice to have some method for the URI authority to say which
URL is "primary" (or "authoritative"). I think OWL's "sameAs" touches on the
problem but doesn't allow designation of a primary (as far as I can tell).

One use case where "primary" is needed is using a URL as an identity key[1]
when you have multiple URLs pointing to the same content.  Another, more
generic use case is given by assuming the following designation from the URI
authority:

	1.) Primary: http://www.foo.com/toyota/4runner/1999/
	2.) Alternate: http://www.foo.com/toyota/1999/4runner/
	3.) Alternate: http://www.foo.com/1999/toyota/4runner/

Then within the <head> of each page, wouldn't it make sense to be able to do
something like this?

	1.) Primary: http://www.foo.com/toyota/4runner/1999/
	<link rel="alternate" href="http://www.foo.com/toyota/1999/4runner/" />
	<link rel="alternate" href="http://www.foo.com/1999/toyota/4runner/" />

	2.) Alternate: http://www.foo.com/toyota/1999/4runner/
	<link rel="primary" href="http://www.foo.com/toyota/4runner/1999/" />
	<link rel="alternate" href="http://www.foo.com/1999/toyota/4runner/" />

	3.) Alternate: http://www.foo.com/1999/toyota/4runner/
	<link rel="primary" href="http://www.foo.com/toyota/4runner/1999/" />
	<link rel="alternate" href="http://www.foo.com/toyota/1999/4runner/" />

This way a "primary" document can be identified among a group of
"equivalent" documents.
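
To make the idea concrete, here is a rough sketch of how an agent might read
those <link> elements and pick the primary URL. It is only an illustration:
rel="primary" is the relation proposed in this message, not an existing
standard, and the names LinkRelParser and primary_url are made up for the
example. It assumes a page with no rel="primary" link is itself the primary.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects href values of <link> elements, grouped by their rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> tags by HTMLParser.
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            self.links.setdefault(rel.lower(), []).append(href)


def primary_url(page_url, html):
    """Return the URL the page declares as primary (hypothetical rel value).

    Assumption: a page without a rel="primary" link is itself the primary.
    """
    parser = LinkRelParser()
    parser.feed(html)
    primaries = parser.links.get("primary", [])
    return primaries[0] if primaries else page_url
```

Fed the <head> of alternate (2.) above, primary_url would hand back the URL
of (1.), so an agent could file all three pages under a single key.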

--------------------------------------------

For (#2), people will often see a URL like this:

	http://www.foo.com/toyota/4runner/1999/

And, knowing the structure, type it in like this:

	http://www.foo.com/Toyota/4Runner/1999/

And by "people" I mean both end users and web developers.

Of course on some web servers that would give a 404 error because the case
is incorrect, but other web servers will not. (I think URL case sensitivity
on web documents is a really bad idea in general because users and web
developers will often type using the wrong case and not even realize that
they did even when presented with an error, but I digress...)

So for a case-insensitive web server, either URL will work:

	http://www.foo.com/toyota/4runner/1999/
	http://www.foo.com/Toyota/4Runner/1999/

The problem, of course, is that little gem about URI Opacity; agents are not
allowed to realize they are the same.  So I would like to propose that a
standard HTTP header be adopted that allows a web server to say whether it is
case insensitive. Maybe it would look like this (forgive me if something like
this exists, but in 10+ years of web development I've never heard about it):

	URI-Case-Sensitive: Yes
	URI-Case-Sensitive: No
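
As a rough sketch of what an agent could do with such a header, the function
below builds a comparison key for a URL. The URI-Case-Sensitive header is
only the proposal made here and does not exist in HTTP; comparison_key is a
made-up name. The sketch assumes the header applies to the path only (the
query string is left alone) and that the URL carries no userinfo part.

```python
from urllib.parse import urlsplit, urlunsplit


def comparison_key(url, response_headers):
    """Return a key that is equal for URLs the server treats as the same.

    response_headers is a plain dict of header name -> value. The
    "URI-Case-Sensitive" header is hypothetical (proposed in this message).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value.strip().lower()
               for name, value in response_headers.items()}
    parts = urlsplit(url)
    # Scheme and host are always case-insensitive.
    # Assumption: no userinfo in the netloc, so lowercasing it is safe.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    path = parts.path
    # Fold the path only when the server has said case does not matter.
    if headers.get("uri-case-sensitive") == "no":
        path = path.lower()
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With "URI-Case-Sensitive: No" the two URLs above produce the same key; with
"Yes", or with no header at all, they stay distinct, which matches how
agents have to behave today.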

Thanks for listening and I await your feedback.

-Mike Schinkel
http://www.mikeschinkel.com/blog
http://www.welldesignedurls.org/

[1] http://dig.csail.mit.edu/breadcrumbs/node/71
Received on Thursday, 2 November 2006 02:36:58 GMT
