Matching the same resources with varying URL schemes (http / https)


I hope such "design pattern" questions on consuming Linked Open Data are not
OT here... otherwise, please suggest an appropriate venue for them ;)

I'm trying to figure out potential patterns for designing an application
/consuming/ Linked Data, typically using SPARQL over a local Virtuoso
triple store which was fed with harvested Linked Open Data.

I sometimes find resources identified with http and sometimes with https,
even though they otherwise reference the same URL. Another issue may be
whether or not a trailing slash is used for directory-like URLs.

For instance, I'd like to match two doap:Project resources as "identical"
when they have the "same" doap:homepage, i.e. when
http://project1/ and https://project1/ can be matched.

A publishing service may render a document identically whichever way it is
accessed, so I'd like to treat references via URIs containing http:// or
https:// as equivalent.

Or a service may, for instance, have adopted https:// for its canonical
URIs, while users still reference it via http:// elsewhere.
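When the canonical scheme of a service is known, the harvested URIs could instead be rewritten to that scheme before loading. This is a sketch under the assumption that such a per-host list is maintained by hand; `CANONICAL_HTTPS_HOSTS` and `canonicalize` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical, hand-maintained list of hosts whose canonical scheme is https.
CANONICAL_HTTPS_HOSTS = {"project1", "example.org"}


def canonicalize(uri: str, https_hosts=CANONICAL_HTTPS_HOSTS) -> str:
    """Rewrite http:// to https:// for hosts known to prefer https."""
    parts = urlsplit(uri)
    if parts.scheme == "http" and parts.netloc.lower() in https_hosts:
        parts = parts._replace(scheme="https")  # SplitResult is a namedtuple
    return urlunsplit(parts)


if __name__ == "__main__":
    print(canonicalize("http://project1/"))    # rewritten to https
    print(canonicalize("http://other.net/x"))  # unknown host: left as-is
```

Unlike a scheme-insensitive key, this keeps real, dereferenceable URIs in the store, at the cost of having to know each service's preferred scheme.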

Obviously, directly matching the same ?h URI reference won't work in a
basic SPARQL query like:
PREFIX doap:  <>

SELECT ?dp ?dn ?ap ?an
WHERE {
  GRAPH <http://> {
    ?dp doap:homepage ?h .
    ?dp doap:name ?dn
  }
  GRAPH <http://> {
    ?ap doap:homepage ?h .
    ?ap doap:name ?an
  }
}
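Outside the triple store, the same join can be done on a normalized key rather than on the raw ?h value. The sketch below assumes the rows of each graph have already been exported (e.g. one SELECT per graph) as (project, name, homepage) tuples; `join_on_homepage` is a hypothetical helper, and the key function is the same scheme/trailing-slash heuristic as above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def match_key(uri):
    """Scheme- and trailing-slash-insensitive key (heuristic)."""
    p = urlsplit(uri.strip())
    if p.scheme.lower() not in ("http", "https"):
        return uri
    key = p.netloc.lower() + p.path.rstrip("/")
    key += ("?" + p.query) if p.query else ""
    return key + (("#" + p.fragment) if p.fragment else "")


def join_on_homepage(graph_a, graph_b):
    """Pair projects of two graphs whose homepages match after normalization."""
    index = defaultdict(list)  # normalized homepage -> [(project, name)]
    for project, name, homepage in graph_b:
        index[match_key(homepage)].append((project, name))
    pairs = []
    for project, name, homepage in graph_a:
        # .get() avoids inserting empty entries into the defaultdict
        for other, other_name in index.get(match_key(homepage), []):
            pairs.append((project, name, other, other_name))
    return pairs
```

Each homepage is normalized exactly once and the join is a hash lookup, so the cost grows roughly linearly with the number of rows instead of comparing every pair of URIs with a regexp.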

I can think of some sort of regexp matching on the string after '://', but
I doubt it would perform well ;-)

Is there a way to create indexes over some URIs, or to use owl:sameAs
relations, to handle such URI matching in queries? Or am I left with
"normalizing" the URLs in the harvested data before storing them in the
triple store?
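If the owl:sameAs route were taken, the links could be generated during harvesting by grouping URIs that share a normalized key and emitting N-Triples for each pair. This is only a sketch: `same_as_triples` is a hypothetical helper, and whether the store actually uses these links at query time depends on its reasoning configuration, which is not shown here.

```python
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def match_key(uri):
    """Scheme- and trailing-slash-insensitive key (heuristic)."""
    p = urlsplit(uri.strip())
    if p.scheme.lower() not in ("http", "https"):
        return uri
    key = p.netloc.lower() + p.path.rstrip("/")
    key += ("?" + p.query) if p.query else ""
    return key + (("#" + p.fragment) if p.fragment else "")


def same_as_triples(uris):
    """N-Triples lines linking every pair of URIs sharing a key."""
    groups = defaultdict(set)
    for uri in uris:
        groups[match_key(uri)].add(uri)
    lines = []
    for variants in groups.values():
        # sorted() makes the output order deterministic
        for a, b in combinations(sorted(variants), 2):
            lines.append(f"<{a}> <{OWL_SAME_AS}> <{b}> .")
    return lines


if __name__ == "__main__":
    for line in same_as_triples(["http://project1/", "https://project1/"]):
        print(line)
```

Note that owl:sameAs is a strong claim (everything said about one URI then holds for the other), so it is only appropriate when the variants truly identify the same resource.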

Do you think there's a reasonably standard approach... or at least one that
would work with Virtuoso 6.1.3? ;)

I imagine this is something of a FAQ for consuming Linked (Open) Data...
but in public discussions, many more people seem concerned with publishing
than with consuming ;-)

Thanks in advance.

P.S.: already posted a similar question on
Olivier BERGER - OpenPGP-Id: 2048R/5819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)

Received on Thursday, 4 July 2013 15:50:24 UTC