Re: Canonicalizing relative URLs seen in URL type properties?

From: Philip Jägenstedt <philipj@opera.com>
Date: Thu, 20 Oct 2011 11:31:38 +0200
To: public-vocabs@w3.org
Message-ID: <op.v3m8q0b9sr6mfa@kirk>
Note of caution: the following is about URLs in itemtype, while the  
original thread is about URLs in itemprop. They are not the same.

On Thu, 20 Oct 2011 11:02:36 +0200, Martin Hepp  
<martin.hepp@ebusiness-unibw.org> wrote:

> Hi Philipp:
>
> (I think we started to discuss this in the old schema.org forum, but  
> never managed to finish on this).
>
> What is the Microdata take on the canonicalization of property and  
> href/object identifiers?
>
> So for example
>
>     http://schema.org/InStock
>
> could be also expressed using
>
>     http://SCHEMA.ORG/InStock
>     http://Schema.org/InStock
>     http://schema.org:80/InStock
>
> and other variations, supported by RFC 2616 [1, section 3.2.3].
>
> From a HTTP protocol perspective, they are all equal, but even if  
> desirable, it will be difficult for data consumers to spot the  
> equivalence in queries if those are used as identifiers.
>
> We once had a lengthy discussion on this in
>
>    http://lists.w3.org/Archives/Public/public-lod/2011Jan/0134.html
>
> and the general conclusion seems to have been as follows:
>
> 1. When used as locators (i.e. to retrieve a representation), all  
> variants will deliver the same representations.
> 2. When used as identifiers (i.e. to reference to an entity), only the  
> canonical URI is guaranteed to work.
> 3. RDF implementations would work better if they did implicit  
> canonicalization, at least for the basic HTTP URI variations from RFC  
> 2616 section 3.2.3.
> 4. TBL had a strong opinion that RDF environments should do the  
> canonicalization, while others stressed the enormous technical  
> difficulties given the broad range of URI schemes and their different  
> canonicalization rules.
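
For illustration only, here is a minimal Python sketch of the "basic HTTP URI variations" that point 3 above refers to: lowercasing the scheme and host, dropping an explicit default port 80, and treating an empty path as "/". The function name normalize_http_url is made up for this example, and the sketch ignores userinfo, IPv6 literals and percent-encoding, so it is not a complete canonicalizer.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_http_url(url: str) -> str:
    """Hypothetical helper: normalize only the simple http:// variations."""
    parts = urlsplit(url)
    # urlsplit already lowercases the scheme; other schemes are left untouched.
    if parts.scheme != "http":
        return url
    # .hostname is returned lowercased; the path keeps its original case.
    host = parts.hostname or ""
    port = parts.port
    # An explicit :80 is equivalent to no port for http.
    netloc = host if port in (None, 80) else f"{host}:{port}"
    # An empty path is equivalent to "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


variants = [
    "http://SCHEMA.ORG/InStock",
    "http://Schema.org/InStock",
    "http://schema.org:80/InStock",
]
# All three variants collapse to the same string after normalization.
normalized = {normalize_http_url(v) for v in variants}
```

Note that the path stays case-sensitive, so "/instock" and "/InStock" would still be different identifiers after this step.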

The spec [1] is quite explicit about this: "Item types are opaque  
identifiers, and user agents must not dereference unknown item types, or  
otherwise deconstruct them, in order to determine how to process items  
that use them."

While the item type is defined to be an absolute URL, it's never really  
treated as a URL and the exact string "http://schema.org/InStock" is the  
only string that can be used.

This trait is evident in the DOM API, where itemType is reflected as a  
string (not resolved like URL properties are) and document.getItems does a  
case-sensitive string match, not checking any kind of URL equivalence.
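
A rough Python sketch of what that opaque matching amounts to, assuming items are modelled as plain dicts with a "type" key. The names get_items and items are invented for this example and are not the DOM API; the point is only that the comparison is an exact, case-sensitive string comparison with no URL parsing.

```python
def get_items(items, wanted_type):
    """Hypothetical stand-in for a type lookup: exact string equality only."""
    return [item for item in items if item["type"] == wanted_type]


items = [
    {"type": "http://schema.org/InStock"},
    {"type": "http://SCHEMA.ORG/InStock"},
    {"type": "http://schema.org:80/InStock"},
]

# Only the item whose type is character-for-character identical is found.
found = get_items(items, "http://schema.org/InStock")
```

Under this model the other two items are simply items of a different, unknown type, even though the three strings are equivalent as HTTP URLs.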

(One could argue that having two kinds of URLs is confusing and that
itemtype should be resolved, but I won't, since it would make the DOM API
more complicated.)

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#items

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Thursday, 20 October 2011 09:32:08 GMT
