- From: Olivier Berger <olivier.berger@telecom-sudparis.eu>
- Date: Thu, 04 Jul 2013 17:49:54 +0200
- To: public-lod@w3.org
Hi.

I hope such "design pattern" questions on consuming Linked Open Data are on-topic here... otherwise, please suggest an appropriate venue for questions ;)

I'm trying to figure out potential patterns for designing an application /consuming/ Linked Data, typically using SPARQL over a local Virtuoso triple store which was fed with harvested Linked Open Data.

I sometimes find resources identified with http and sometimes with https, whose URLs are otherwise the same. Another issue is whether or not a trailing slash is used for directory-like URLs.

For instance, I'd like to match as "identical" two doap:Project resources which have the "same" doap:homepage, if I can match http://project1/example.com/home/ and https://project1/example.com/home/

It may happen that a document is rendered the same by the publishing service, whichever way it is accessed, so I'd like to treat URIs that reference it via http:// or https:// as equivalent. Or a service may have chosen https:// for its canonical URIs, for instance, while users still reference it via http somewhere else...

Obviously, direct matching on the same ?h URIRef won't work in basic SPARQL queries like:

    PREFIX doap: <http://usefulinc.com/ns/doap#>
    SELECT * {
      GRAPH <http://myapp.example.com/graphs?source=http://publisher1.example.com/> {
        ?dp doap:homepage ?h .
        ?dp doap:name ?dn
      }
      GRAPH <http://myapp.example.com/graphs?source=https://publisher2.example.com/> {
        ?ap doap:homepage ?h .
        ?ap doap:name ?an
      }
    }

I can think of some sort of regexp matching on the part of the string after '://', but I doubt it would perform well ;-)

Is there a way to create indexes over some URIs, or to use owl:sameAs relations to manage such URI matching in queries? Or am I left with "normalizing" my URLs in the harvested data before storing them in the triple store?

Do you think there's a reasonably standard approach... or one that would work with Virtuoso 6.1.3? ;)

I imagine that this is kind of a FAQ for consuming Linked (Open) Data... but it seems many more people are concerned with publishing than with consuming in public discussions ;-)

Thanks in advance.

P.S.: I already posted a similar question on http://answers.semanticweb.com/questions/23584/matching-ressources-with-variying-url-scheme-http-https

-- 
Olivier BERGER
http://www-public.telecom-sudparis.eu/~berger_o/ - OpenPGP-Id: 2048R/5819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)
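
For illustration only, the "normalize before storing" option raised in the question could look roughly like the Python sketch below. It is not a standard or Virtuoso-specific approach: the function name homepage_key is hypothetical, and the rules it applies (treat http and https as the same scheme, lowercase the host, drop a trailing slash on the path) are assumptions that are only safe when the publisher really serves the same document under all of those variants.

```python
from urllib.parse import urlsplit, urlunsplit


def homepage_key(url: str) -> str:
    """Return a comparison key for a homepage URL (hypothetical helper).

    Assumptions, not general truths:
    - http and https variants identify the same document;
    - a trailing slash on the path is not significant.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Collapse https onto http so both variants yield the same key.
    if scheme == "https":
        scheme = "http"
    # Host names are case-insensitive; this also lowercases any
    # userinfo part, which is assumed to be absent in homepage URLs.
    netloc = parts.netloc.lower()
    # Drop trailing slashes, but keep "/" for an empty path.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    a = homepage_key("http://project1/example.com/home/")
    b = homepage_key("https://project1/example.com/home")
    print(a == b)
```

The resulting key could be stored next to the original doap:homepage value at harvest time (for example as an extra literal or URI), so that queries join on the key while the published URI is kept untouched.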
Received on Thursday, 4 July 2013 15:50:24 UTC