Re: Canonicalizing relative URLs seen in URL type properties?

Hi Philip:

(I think we started to discuss this in the old schema.org forum but never finished the discussion.)

What is the Microdata position on the canonicalization of property identifiers and href/object identifiers?

For example,

    http://schema.org/InStock

could also be expressed as

    http://SCHEMA.ORG/InStock
    http://Schema.org/InStock
    http://schema.org:80/InStock

and in other variants that RFC 2616 [1, section 3.2.3] treats as equivalent.

From an HTTP perspective they are all equivalent. But when they are used as identifiers, it will be difficult for data consumers to detect that equivalence in queries, however desirable that would be.

We once had a lengthy discussion on this in

   http://lists.w3.org/Archives/Public/public-lod/2011Jan/0134.html

and the general conclusion seems to have been as follows:

1. When used as locators (i.e. to retrieve a representation), all variants will deliver the same representations.
2. When used as identifiers (i.e. to refer to an entity), only the canonical URI is guaranteed to work.
3. RDF implementations would work better if they did implicit canonicalization, at least for the basic HTTP URI variations from RFC 2616 section 3.2.3.
4. Tim Berners-Lee (TBL) was strongly of the opinion that RDF environments should do the canonicalization, while others stressed the enormous technical difficulties, given the broad range of URI schemes and their differing canonicalization rules.
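
To make point 3 concrete, here is a minimal Python sketch of the kind of implicit canonicalization a consumer might apply to every identifier at ingestion time and to every query term. It covers only the basic HTTP equivalences of RFC 2616 section 3.2.3 (case-insensitive scheme and host, default port, empty path equals "/"), not percent-encoding, and the function name canonicalize_http_uri is hypothetical. URIs with any other scheme are passed through untouched, which sidesteps rather than solves the per-scheme problem raised in point 4.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the only schemes this sketch canonicalizes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_http_uri(uri: str) -> str:
    """Apply the basic URI comparison rules of RFC 2616, section 3.2.3.

    - scheme and host compare case-insensitively -> lower-case them
    - an absent or empty port equals the default port -> drop an explicit default port
    - an empty abs_path equals "/" -> use "/"
    Percent-encoding equivalence is NOT handled. URIs with any other scheme
    are returned unchanged, because their canonicalization rules differ.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return uri
    # Keep any userinfo ("user:pass@") exactly as written.
    userinfo, at, _ = parts.netloc.rpartition("@")
    host = parts.hostname or ""  # urllib already lower-cases .hostname
    if ":" in host:              # IPv6 literal: restore the brackets
        host = f"[{host}]"
    netloc = userinfo + at + host
    if parts.port is not None and parts.port != DEFAULT_PORTS[scheme]:
        netloc += f":{parts.port}"
    path = parts.path or "/"     # the path itself stays case-sensitive
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


# Usage: the variants listed above collapse to a single identifier.
variants = [
    "http://schema.org/InStock",
    "http://SCHEMA.ORG/InStock",
    "http://Schema.org/InStock",
    "http://schema.org:80/InStock",
]
print(len(set(variants)))                            # 4 distinct raw strings
print({canonicalize_http_uri(v) for v in variants})  # one canonical URI
```

A store that runs this on both stored triples and query terms would match all four spellings, at the cost of silently rewriting identifiers that publishers wrote differently.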


Best

Martin

[1] http://www.ietf.org/rfc/rfc2616.txt

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

On Oct 20, 2011, at 10:26 AM, Philip Jägenstedt wrote:

> On Thu, 20 Oct 2011 02:10:12 +0200, John Panzer <jpanzer@google.com> wrote:
> 
>> What are the rules for canonicalizing URLs seen in URL-valued properties?  I
>> would guess that whether or not a value appears in an HTML URL attribute or
>> is encoded some other way with microdata, the rules should be the same.
>> Which implies paying attention to <base> tags etc.  Is this correct?
> 
> This is defined in <http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#concept-property-value>:
> 
> "The value is the absolute URL that results from resolving the value of the element's src/href attribute relative to the element at the time the attribute is set, or the empty string if there is no such attribute or if resolving it results in an error."
> 
> Following the references, you'll find that you need to resolve the URL by (basically) parsing it and then putting it back together while taking <base> into account.
> 
> -- 
> Philip Jägenstedt
> Core Developer
> Opera Software
> 
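
The resolution step Philip describes can be approximated in Python with only the standard library. This is an illustrative sketch under several assumptions: the element-to-attribute table is my reading of the Microdata spec (href for a/area/link, src for media and embedded elements, data for object), only the first <base href> is honoured, values are resolved once after parsing rather than "at the time the attribute is set", and urllib.parse.urljoin follows RFC 3986 rather than the WHATWG URL parsing algorithm, so results can differ in edge cases. The names ItempropURLCollector and microdata_url_values are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Element -> attribute holding its URL property value (my reading of the
# Microdata spec; verify against the current text before relying on it).
URL_ATTR = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}


class ItempropURLCollector(HTMLParser):
    """Collects raw URL attribute values of itemprop elements plus the base URL."""

    def __init__(self, document_url: str):
        super().__init__()
        self.base_url = document_url
        self._base_seen = False
        self.raw = []  # list of (itemprop, raw attribute value or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only the first <base> with an href sets the base URL; that href is
        # itself resolved against the document's own URL.
        if tag == "base" and not self._base_seen and attrs.get("href") is not None:
            self.base_url = urljoin(self.base_url, attrs["href"])
            self._base_seen = True
        attr = URL_ATTR.get(tag)
        if attr is not None and "itemprop" in attrs:
            self.raw.append((attrs["itemprop"], attrs.get(attr)))


def microdata_url_values(html: str, document_url: str):
    """Return (itemprop, absolute URL) pairs; "" when the attribute is missing."""
    parser = ItempropURLCollector(document_url)
    parser.feed(html)
    parser.close()
    # Resolve after parsing, so the recorded base URL applies to every value.
    return [(prop, urljoin(parser.base_url, raw) if raw is not None else "")
            for prop, raw in parser.raw]


page = ('<head><base href="/catalog/"></head>'
        '<div itemscope>'
        '<link itemprop="availability" href="http://schema.org/InStock">'
        '<a itemprop="url" href="item42.html">Item</a>'
        '<img itemprop="image" src="../img/p.png">'
        '</div>')
for prop, url in microdata_url_values(page, "http://example.com/shop/index.html"):
    print(prop, url)
```

Here item42.html resolves to http://example.com/catalog/item42.html because of the <base> element. urljoin itself leaves a host such as SCHEMA.ORG as written, so in this sketch the canonicalization question raised above remains a separate step.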

Received on Thursday, 20 October 2011 09:03:30 UTC