Re: Canonicalizing relative URLs seen in URL type properties?

Okay, so we agree that for both itemtypes and href values that identify enumerated entities, the canonical URI must be used, without any of the variations that RFC 2616 allows at the protocol level.

I think that is a good consensus; I seem to have misunderstood your earlier comments about URLs for itemtypes.

Quite clearly, advanced consumers of Microdata will try to be tolerant and catch common mistakes / variations, but that is outside the scope of the specification.
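
To make the distinction concrete, here is a minimal Python sketch. It is only an illustration, not anything defined by the Microdata spec: the function names spec_level_match and tolerant_key are hypothetical, and the normalization covers only the basic http variations discussed in this thread (case of scheme and host, explicit default port 80, empty path). Percent-encoding equivalence and other URI schemes are deliberately left out.

```python
from urllib.parse import urlsplit

# The canonical identifier used as the example in this thread.
CANONICAL = "http://schema.org/InStock"


def spec_level_match(value: str) -> bool:
    """Spec-level behavior: the identifier is an opaque string,
    so only an exact, case-sensitive match counts."""
    return value == CANONICAL


def tolerant_key(url: str) -> str:
    """Hypothetical consumer-side normalization (outside the spec).

    Lowercases scheme and host, drops an explicit port 80, and turns an
    empty path into "/". The path itself stays case-sensitive.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Assumption: only http is handled, so 80 is the only default port.
    netloc = host if port in (None, 80) else f"{host}:{port}"
    path = parts.path or "/"
    key = f"{scheme}://{netloc}{path}"
    if parts.query:
        key += "?" + parts.query
    if parts.fragment:
        key += "#" + parts.fragment
    return key


if __name__ == "__main__":
    variants = [
        "http://schema.org/InStock",
        "http://SCHEMA.ORG/InStock",
        "http://Schema.org/InStock",
        "http://schema.org:80/InStock",
    ]
    for v in variants:
        print(v, spec_level_match(v), tolerant_key(v) == CANONICAL)
```

Only the first variant passes the spec-level comparison, while a tolerant consumer using something like tolerant_key would treat all four as the same identifier. A variant that changes the path, such as http://schema.org/instock, is still a different identifier under both comparisons.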

Martin

On Oct 20, 2011, at 11:31 AM, Philip Jägenstedt wrote:

> Note of caution: the following is about URLs in itemtype, while the original thread is about URLs in itemprop. They are not the same.
> 
> On Thu, 20 Oct 2011 11:02:36 +0200, Martin Hepp <martin.hepp@ebusiness-unibw.org> wrote:
> 
>> Hi Philip:
>> 
>> (I think we started to discuss this in the old schema.org forum, but never managed to finish the discussion).
>> 
>> What is the Microdata take on the canonicalization of property and href/object identifiers?
>> 
>> So for example
>> 
>>    http://schema.org/InStock
>> 
>> could also be expressed using
>> 
>>    http://SCHEMA.ORG/InStock
>>    http://Schema.org/InStock
>>    http://schema.org:80/InStock
>> 
>> and other variations, supported by RFC 2616 [1, section 3.2.3].
>> 
>> From an HTTP protocol perspective, they are all equal, but even if that equivalence is desirable, it will be difficult for data consumers to spot it in queries if the variants are used as identifiers.
>> 
>> We once had a lengthy discussion on this in
>> 
>>   http://lists.w3.org/Archives/Public/public-lod/2011Jan/0134.html
>> 
>> and the general conclusion seems to have been as follows:
>> 
>> 1. When used as locators (i.e. to retrieve a representation), all variants will deliver the same representations.
>> 2. When used as identifiers (i.e. to refer to an entity), only the canonical URI is guaranteed to work.
>> 3. RDF implementations would work better if they did implicit canonicalization, at least for the basic HTTP URI variations from RFC 2616 section 3.2.3.
>> 4. TBL had a strong opinion that RDF environments should do the canonicalization, while others stressed the enormous technical difficulties given the broad range of URI schemes and their different canonicalization rules.
> 
> The spec [1] is quite explicit about this: "Item types are opaque identifiers, and user agents must not dereference unknown item types, or otherwise deconstruct them, in order to determine how to process items that use them."
> 
> While the item type is defined to be an absolute URL, it's never really treated as a URL and the exact string "http://schema.org/InStock" is the only string that can be used.
> 
> This trait is evident in the DOM API, where itemType is reflected as a string (not resolved like URL properties are) and document.getItems does a case-sensitive string match, not checking any kind of URL equivalence.
> 
> (One could argue that having two kinds of URLs is confusing and that itemtype should be resolved, but I won't, since it would make the DOM API more complicated.)
> 
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#items
> 
> -- 
> Philip Jägenstedt
> Core Developer
> Opera Software
> 

Received on Thursday, 20 October 2011 09:44:13 UTC