Re: resolving the URL mess

--On Monday, October 06, 2014 20:54 -0700 Austin William Wright
<aaa@bzfx.net> wrote:

> On Mon, Oct 6, 2014 at 3:48 PM, Larry Masinter
> <masinter@adobe.com> wrote:
> 
>> > Many software applications utilize universal identifiers so
>> > that systems can refer to resources residing in other
>> > systems entirely.
>> 
>> This kind of introduction is just confusing. We're not
>> talking about identifiers in general, just URI/URL.

> Mostly, it's laying the foundation for why URIs/URLs exist at
> all. If this foundation is being eroded by incompatible
> URI/URL implementations, then that is defeating the point of
> the URI, and that is a problem.

Yes.  It is a problem.  I also note that a large part of the
library/museum/information sciences community and some of the
traditional publisher community won't accept the definition of
"identifier" you proposed.  That is another problem for several
reasons, including:

	* They believe they have many centuries of experience
	with these issues, or at least with issues that overlap
	those that we are discussing.   When something is said
	that they hear as "with the introduction of X, the world
	has changed and history has become irrelevant" or "I've
	written a bunch of blog articles" (or other
	self-published pieces) "..and/or a few papers in
	journals" (which that community doesn't recognize as
	peer-reviewed and competent) "..and am therefore an
	expert", their usual public response is tolerant
	amusement.  What is sometimes said in private is a lot
	less attractive.
	
	* If "our" identifier systems don't meet their perceived
	needs, they can, and will, either adopt others (e.g.,
	DOIs) or invent their own (e.g., NBNs).  That should be
	fine from "our" point of view as long as those systems
	can be embedded in appropriate URIs without doing
	violence to either.  If they cannot, we end up with
	rough edges or with a source of standards forks beyond
	the ones Austin has been identifying.
	
	* Unlike WHATWG, W3C, IETF, (in no particular order) or
	other voluntary bodies, in some countries, those
	communities are able to mandate the use of their
	identifiers and identifier preferences or even prohibit
	others.  It is most likely in our interest not to
	provoke that behavior, which, FWIW, becomes more likely
	if our creation of conflicting standards and definitions
	causes confusion in areas that they care about.
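
To make the embedding point concrete, here is a minimal Python sketch
of what "without doing violence to either" might look like in
practice.  It is an illustration under assumptions, not anything the
DOI or NBN communities specify: the helper names embed_in_uri and
extract_from_uri are hypothetical, the DOI-like string is invented,
and the https://doi.org/ prefix is used only as a familiar example.
The test is simply that the identifier survives a round trip
unchanged.

```python
from urllib.parse import quote, unquote

# Hypothetical helpers for illustration; not part of any DOI or NBN spec.
def embed_in_uri(prefix: str, identifier: str) -> str:
    """Percent-encode everything except unreserved characters and "/"."""
    return prefix + quote(identifier, safe="/")

def extract_from_uri(prefix: str, uri: str) -> str:
    """Recover the original identifier from a URI built by embed_in_uri."""
    if not uri.startswith(prefix):
        raise ValueError("URI does not start with the expected prefix")
    return unquote(uri[len(prefix):])

PREFIX = "https://doi.org/"        # assumed resolver prefix, example only
doi_like = "10.1234/abc<1>:x"      # invented identifier, not a real DOI

uri = embed_in_uri(PREFIX, doi_like)
print(uri)                                        # https://doi.org/10.1234/abc%3C1%3E%3Ax
print(extract_from_uri(PREFIX, uri) == doi_like)  # True
```

An identifier system whose strings cannot survive that kind of round
trip, or whose own equivalence rules conflict with those of the URI
it is embedded in, would be one place where the rough edges could
appear.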

> I can't think of a better way to phrase this to make this
> clear, would you mind suggesting an improvement?

Try saying what you mean rather than using terms like "universal
identifiers".   FWIW even in the URL/URI context, we dropped
"universal" in favor of "uniform" (with, IIRC, Tim B-L's
agreement) because the former was just too expansive and, in the
eyes of some, arrogant and overreaching.   In general, one
should avoid ways of working on things that require settling
questions about universality unless they are really necessary
because it is so easy to get bogged down in them.  If we are now
returning to assertions of universality, well, enjoy the rat
hole tour.

> Also, technically I don't even see a difference, as any
> identifier can be translated to a URI or IRI.

Depending on what you mean by "translated", that statement may
be manifestly false.

> And this is
> necessarily true of universal identifiers in general. A
> notable explanation is found in the "URIs and the Test of
> Independent Invention" section in <
> http://www.w3.org/DesignIssues/Axioms.html>.

Current URI specs, including 3986, appear to fail the assertions
about "sameness" made in that document, probably for good
reason.  1996 was a very long time ago.

>> And frankly I think we should include the
>> political/organizational power struggle which seems to fuel
>> much of the angst that gets in the way of a technical
>> solution.
> 
> Any power struggles I've seen so far first came about due to
> interoperability problems, by way of vendors choosing mutually
> incompatible workarounds.

While I agree that the power struggles should not be ignored, I
think we also run the risk of dismissing real and fundamental
differences between legitimate perspectives as mere power or
control struggles.  Dismissing the views of the library, etc.,
community about the proper meaning of "identifier" and its
implications as a power struggle will ultimately not move us
forward.  Conversely, dismissing the semantic web as the
delusion of a bunch of professionally- and historically-immature
individuals who haven't bothered to try to understand centuries
of experience and literature is just not helpful.  Questions of
who is right and who is wrong in either of those cases are
probably unresolvable except in more centuries of retrospect and
we set ourselves up for failures much broader than the question
of what to do about RFC 3986 (or various other specs) if we put
answering those unresolvable questions into the critical path.

These are not interoperability problems; they approach
fundamental linguistic/epistemological ones.   Perhaps
substitution of "Gavagai" for "URI" or "universal identifier"
and pointing would help clarify the issue.
 
> Do you have specific examples?

See above. 

> Power struggles in general are only going away when the
> relevant parties can check their hubris, and we can't fix
> that. In many cases though, we have the ability to fix other
> causes.

As above, there is more than hubris here, although there are
certainly large quantities of it as well.   In addition,
behaviors with symptoms that are probably indistinguishable from
power struggle can also occur when people are concerned about
conflicting standards for what appear to be the same thing and
start seeking a single authoritative definition.  That is, I
think, the situation we are in.  Describing it as a power
struggle is probably less helpful than trying to find common
ground and, at least, good descriptions of scope and areas of
applicability.  While cooperation, consultation, and consensus
would probably be better, I note that clear statements about
scope (including efforts to avoid scope conflicts) and the like
can be developed and implemented unilaterally. 

>...
>> Are we trying to solve an implementation compatibility
>> problem, or just a specification compatibility problem? Or a
>> situation where implementations don't agree, but that for the
>> most part the differences are inconsequential?
> 
> I don't believe there's a problem with the RFCs themselves
> causing the problems in question. They provide ABNF and
> pseudocode, I'm not sure how you can get less confusing than
> that. Though I know David Sheets has ideas, do you too?

I do, or at least I disagree about "less confusing".  I note that
there are bits of ABNF in 3986 that are inconsistent with the
way ABNF works: the intent is probably clear, but one wonders.
More important, the ABNF and pseudocode don't answer questions
about what is an example, a recommendation, or a requirement
very clearly.

>...
> Correct, but RFC3986 defines behavior like what it means to
> dereference a resource, and how to resolve a Reference to
> absolute form, complete with pseudocode. "This behavior"
> refers to the behavior of a compliant implementation.

And those definitions in RFC 3986 have been argued to be
problematic for URI types/schemes that are very different from
HTTP URLs.   It is possible to claim that everyone who sees
those problems is a fool, or possibly a damn fool (and that
position has essentially been taken) but I don't think it is
very helpful.
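
For readers who want to see the kind of mismatch at issue, a small
Python sketch follows.  It is only an illustration and does not
represent the specific arguments that have been made against 3986:
it uses the standard library's urllib.parse, whose behavior is its
own and not a normative reading of the RFC, and the URN shown is
just an example string.  The point is that "resolve a reference
against a base" has an obvious meaning for a hierarchical HTTP URL
and no obvious one for an identifier that has no hierarchy.

```python
from urllib.parse import urljoin, urlsplit

# Hierarchical case: the base and expected result are taken from the
# reference-resolution examples in RFC 3986, Section 5.4.
base = "http://a/b/c/d;p?q"
resolved = urljoin(base, "../g")
print(resolved)  # http://a/b/g

# Non-hierarchical case: a URN has no authority and no "/"-separated
# path segments, so there is no parent for "../" to climb to.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme, repr(urn.netloc), urn.path)  # urn '' isbn:0451450523
```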

>...
>> I'm afraid we have no control over how terms are used in
>> the world, where everyone knows what a URL is. So "absolutely
>> crystal-clear" is way beyond us. I'm just hoping for improved
>> clarity in WHATWG and W3C documents.
>> 
> 
> Not in the world, just for our use. I'm not concerned how the
> layperson uses the terms, in most cases URL and URI can be
> used interchangeably, and context frequently makes it clear
> that they really mean something else altogether (e.g.
> "Fragmentless HTTP URL").

Right.   And while "everyone knows what a URL is", what different
people know may be mutually inconsistent.  In addition, a lot of
people seem to believe that "URI" is just a fancy term for a URL,
with any differences being very fuzzy indeed.

>...
>> > * URI: Authoritatively defined in RFC3986, ...
>> > * IRI: Defined in RFC3987 as a....
>> > * URL: The URI was created as a generalization of the
>> > URL.... 
>> > * URN: Likewise defined in terms of the URI, ...

Interestingly not completely true.  RFC 2141 was not defined in
terms of contemporary (i.e., RFC 3986) URIs and the distinction
is important.

>...
>> Getting consensus on the history and characterizations of
>> these protocol elements is very hard. I'm not sure it's
>> possible, or necessary. I *am* sure trying to put one history
>> and overview in the charter is a non-starter.
> 
> I'm presenting definitions that seem to be held in common. I
> can't find specific numbers, but I'd be willing to bet, and
> Google Scholar seems to back this hypothesis up, that RFC3986
> is the most cited standards-track RFC ever.

Compared to, e.g., 822?  It would probably be an interesting
contest, especially if one excluded citations of 3986 to
criticize, question, or deride it.

> Unless someone has a specific objection, I'm not sure how this
> could be a non-starter. This is technical literature, we're
> entitled to a bunch of different terms that the layperson need
> not care about.

As long as your/our definitions and scope are crystal-clear to
all of the relevant audience.  Some otherwise-sensible people
believe that 3986 does not meet that criterion, so the
foundations of what you propose are problematic.

>...

best,
   john

Received on Tuesday, 7 October 2014 11:53:11 UTC