Re: Matching same resources but with varying URL schemes (http / https)

On 7/4/13 2:12 PM, Aidan Hogan wrote:
> On 04/07/2013 17:45, Kingsley Idehen wrote:
>> On 7/4/13 11:49 AM, Olivier Berger wrote:
> <snip to Olivier's question:>
>>> For instance, I'd like to match as "identical" two doap:Projects
>>> resources which have "same" doap:homepage if I can match
>>> http://project1/example.com/home/ and
>>> https://project1/example.com/home/
>>>
> <snip to Kingsley's response:>
>> Conclusion: just leverage RDF semantics, forget about regexing anything,
>> and you have a first-class demonstration of what RDF actually adds to
>> Linked Data :-)
>
> Kingsley, I think you misread the question.
>
>   doap:homepage a owl:InverseFunctionalProperty .
>   :SomeThing doap:homepage <http://project1/example.com/home/> .
>   :AnotherThing doap:homepage <https://project1/example.com/home/> .
>
> will not infer:
>
>   :SomeThing owl:sameAs :AnotherThing .
>
> The two values for doap:homepage are different. 

Ah!

I assumed the objects of the relation were the same, and overlooked the 
http vs. https difference.
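
To make the point concrete, here is a minimal sketch of the IFP rule in 
plain Python (no reasoner involved). The function name ifp_same_as and the 
tuple representation of triples are assumptions made for illustration only. 
Subjects are grouped by the exact string value of the property, so two 
homepage URIs that differ only in scheme never land in the same group:

```python
from collections import defaultdict
from itertools import combinations


def ifp_same_as(triples, ifp):
    """Return the owl:sameAs pairs implied by treating `ifp` as an
    inverse functional property. Values are compared as exact strings,
    which mirrors how URIs are compared in RDF."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == ifp:
            subjects_by_value[o].add(s)

    pairs = set()
    for subjects in subjects_by_value.values():
        # Every pair of subjects sharing one value is inferred identical.
        for a, b in combinations(sorted(subjects), 2):
            pairs.add((a, b))
    return pairs


triples = [
    (":SomeThing", "doap:homepage", "http://project1/example.com/home/"),
    (":AnotherThing", "doap:homepage", "https://project1/example.com/home/"),
]

# The two homepage values differ in scheme, so nothing is inferred.
inferred = ifp_same_as(triples, "doap:homepage")
```

With identical object URIs the same function would return the single pair 
(":AnotherThing", ":SomeThing"), which is the inference I had in mind.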

> The question is how to normalise/canonicalise/sameAs the two homepage 
> URIs (which are identical up to http/https/trailing slash, etc.), as a 
> lead up to making that inference.

In this case, he can take Steve's suggestion re: a :canonicalUri relation 
(which can also be designated as an IFP and used in an inference rule).  
Cutting and pasting Steve's suggestion re: materialized triples:

One approach would be to have some sort of property like "canonical URI", which you can use for your matching; then you can lean on the triplestore's built-in URI indexing.

{
   <foo> doap:homepage <https://Foo.example> .
   <foo> doap:name "Foo" .
   <https://Foo.example> :canonicalUri <http://foo.example/> .
}
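
As a rough sketch of how such :canonicalUri triples could be materialized 
before loading, the Python below uses only the standard library's 
urllib.parse. The function names and the normalisation choices (force the 
http scheme, lowercase the authority, use "/" for an empty path) are 
assumptions for illustration, not something Steve specified. Note that it 
deliberately leaves non-empty paths alone, so /home and /home/ stay 
distinct:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_uri(uri: str) -> str:
    """Map an http/https URI to one assumed canonical form."""
    parts = urlsplit(uri)
    if parts.scheme.lower() not in ("http", "https"):
        return uri  # leave other schemes (mailto:, urn:, ...) untouched

    # Assumption: the authority is case-insensitive for our purposes.
    # This also lowercases any userinfo, which may not always be wanted.
    authority = parts.netloc.lower()
    path = parts.path or "/"  # only an empty path gains a slash
    return urlunsplit(("http", authority, path, parts.query, parts.fragment))


def canonical_triple(uri: str) -> str:
    """Render one :canonicalUri statement for the given URI."""
    return f"<{uri}> :canonicalUri <{canonical_uri(uri)}> ."


print(canonical_triple("https://Foo.example"))
```

Both homepage URIs from Olivier's example map to the same canonical value 
under these rules, so an IFP declared on :canonicalUri (rather than on 
doap:homepage) has something to match on.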


Kingsley



>
> @Olivier, I'd propose to put a script between your sources and your 
> store that applies a manual normalisation of the URIs. Caveat emptor: 
> the URIs *are* different so although normalisation might seem like a 
> great idea in some cases, it could also easily conflate things that 
> are actually different. Be particularly wary of the trailing slash case.
>
> Cheers,
> Aidan


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Thursday, 4 July 2013 21:05:30 UTC