Re: Matching same resources but with varying URL schemes (http / https)

On 7/4/13 11:49 AM, Olivier Berger wrote:
> Hi.
>
> I hope such "design pattern" questions on consuming Linked Open Data are
> on topic here... otherwise, please suggest an appropriate venue for questions ;)
>
>
> I'm trying to figure out potential patterns for designing an application
> /consuming/ Linked Data, typically using SPARQL over a local Virtuoso
> triple store which was fed with harvested Linked Open Data.
>
> I happen to find resources sometimes identified with http, sometimes
> with https, which otherwise reference the same URL. Other issues may be
> the use or not of a trailing slash for dir-like URLs.
>
> For instance, I'd like to match as "identical" two doap:Project resources
> which have the "same" doap:homepage if I can match
> http://project1/example.com/home/ and https://project1/example.com/home/
>
>
> It may happen that a document is rendered the same by the publishing
> service, whichever way it is accessed, so I'd like to consider that
> referencing it via URIs which contain http:// or https:// is equivalent.
>
> Or a service may have chosen to adopt https:// as a canonical URI for
> instance, but it may happen that users reference it via http somewhere
> else...
>
> Obviously, direct matching of the same ?h URIRef won't work
> in basic SPARQL queries like :
> PREFIX doap:  <http://usefulinc.com/ns/doap#>
>
> SELECT *
> {
>    GRAPH <http://myapp.example.com/graphs?source=http://publisher1.example.com/> {
>     ?dp doap:homepage ?h.
>     ?dp doap:name ?dn
>    }
>    GRAPH <http://myapp.example.com/graphs?source=https://publisher2.example.com/> {
>     ?ap doap:homepage ?h.
>     ?ap doap:name ?an
>    }
> }
>
> I can think of some sort of regexp matching on the string after '://' but I
> doubt I'd get good performance ;-)
>
> Is there a way to create indexes over some URIs, or owl:sameAs relations to
> manage such URI matching in queries ? Or am I left to "normalizing" my
> URLs in the harvested data before storing them in the triple store ?
>
> Would you think there's a reasonably standard approach... or one that
> would work with Virtuoso 6.1.3 ? ;)
>
> I imagine that this is kind of a FAQ for consuming Linked (Open)
> Data... but it seems many more people are concerned with publishing than
> with consuming in public discussions ;-)
>
>
> Thanks in advance.
>
> P.S.: already posted a similar question on
> http://answers.semanticweb.com/questions/23584/matching-ressources-with-variying-url-scheme-http-https

This is an example of what I mean by *explicit* entity relationship 
semantics that RDF uniquely brings to the table re. enhancements to the 
basic EAV/CR model and Linked Data. At this juncture, you are dealing 
with basic structured data and (at best) *implicit* rather than 
*explicit* machine- and human-comprehensible entity relationship 
semantics.

Situation:

You have the relation doap:homepage, but its semantics aren't clear to 
you or your user agent. Now, let's leverage some basic RDF and Linked 
Data to look up the semantics of the doap:homepage relation. We find:

1. 
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fusefulinc.com%2Fns%2Fdoap%23homepage&graph=http%3A%2F%2Fschemapedia.com%2Fschemas%2F 
-- it's an owl:InverseFunctionalProperty (IFP)

2. 
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23InverseFunctionalProperty&graph=http%3A%2F%2Fschemapedia.com%2Fschemas%2F 
-- owl:InverseFunctionalProperty description (a little sparse).

Using relationship semantics, i.e. "reasoning" and "inference", an RDF 
processor can determine that two subjects (irrespective of how they are 
denoted/named) with the same doap:homepage value share a common referent. I 
also posted an IFP exploitation example using SPARQL a while back [1].
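
As a rough illustration of the IFP reasoning above (this is not the 
example from [1]), here is a minimal Python sketch using only the standard 
library. It treats doap:homepage as inverse-functional and groups subjects 
that share the same homepage value. The function name ifp_equivalences, 
the triple list, and all example.com IRIs are hypothetical and exist only 
for this sketch; a real RDF processor or a store's inference rules would 
do this for you.

```python
from collections import defaultdict

DOAP_HOMEPAGE = "http://usefulinc.com/ns/doap#homepage"

# Assumption for this sketch: the set of properties treated as
# inverse-functional is supplied by hand rather than read from the ontology.
IFPS = {DOAP_HOMEPAGE}


def ifp_equivalences(triples, ifps=IFPS):
    """Return sets of subjects that share the same (IFP, object) pair.

    Two subjects with an identical value for an inverse-functional
    property denote the same thing, whatever IRIs name them.
    """
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            by_value[(p, o)].add(s)
    # Keep only values that are shared by more than one subject.
    return [subjects for subjects in by_value.values() if len(subjects) > 1]


# Hypothetical harvested data: two publishers describe the same project.
triples = [
    ("http://publisher1.example.com/projects/a", DOAP_HOMEPAGE,
     "http://project1.example.com/home/"),
    ("https://publisher2.example.com/p/1", DOAP_HOMEPAGE,
     "http://project1.example.com/home/"),
    ("https://publisher2.example.com/p/2", DOAP_HOMEPAGE,
     "http://other.example.com/"),
]

groups = ifp_equivalences(triples)
```

Each set in groups holds subject IRIs that an IFP-aware processor could 
treat as naming one entity, for instance by asserting owl:sameAs between 
them. Note that the sketch compares homepage values as exact IRIs.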

Conclusion: just leverage RDF semantics, forget about regexing anything, 
and you have a first-class demonstration of what RDF actually adds to 
Linked Data :-)

Links:

[1] http://bit.ly/Y6TIfs -- Using SPARQL to Integrate Disparate Data via 
InverseFunctionalProperty (IFP) relations.

-- 

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Thursday, 4 July 2013 16:46:09 UTC