- From: Dan Brickley <danbri@danbri.org>
- Date: Sat, 01 Jul 2006 12:03:36 +0100
- To: public-xg-wcl@w3.org
- Cc: timbl@w3.org, connolly@w3.org
(Am sending this to the XG's public list, bcc:'d to the member one. We're all on both, right? It's a good discussion to have in public...)

OK, some progress, based on the regex from Jo's doc. Rough notes from the SW Interest Group IRC channel, where I got some help putting this together.

I've got a quick Perl script that generates an RDF description of each entry in a list of URIs, and a SPARQL query plus various filters which match against some/all of these URIs. It uses a fictional namespace in http://www.w3.org/2004/12/q/ which reminds me to investigate whether I still have write-access there, and if we can use it for the XG.

Am Cc:'ing TimBL and DanC who may be interested. Tim, Dan --- this work is motivated by a desire to attach RDF descriptive labels to collections of documents picked out either by enumeration or by patterns expressed against URIs/IRIs. Jo Rabin's doc at http://www.w3.org/2005/Incubator/wcl/matching.html has more background. There's some related work from OpenSearch folks at http://www.snellspace.com/wp/?p=369 that we're loosely connected to via Elias Torres in #swig.

For today's hack, see http://swig.xmlhack.com/2006/07/01/2006-07-01.html#1151749799.081592

Perl script: http://spypixel.com/2006/wcl/uri/uri-pl-source.txt
List of URIs: http://spypixel.com/2006/wcl/uri/sites.txt
Generated RDF: http://spypixel.com/2006/wcl/uri/_data.rdf

Example:

  <ID xmlns='http://www.w3.org/2004/12/q/idsyntax#'>
    <full>http://nobody:nothing@127.0.0.1:8080/dot/slash/dot?foo=bar;x=y</full>
    <nameFor rdf:resource='http://nobody:nothing@127.0.0.1:8080/dot/slash/dot?foo=bar;x=y'/>
    <scheme>http</scheme>
    <authority>nobody:nothing@127.0.0.1:8080</authority>
    <userinfo>nobody:nothing</userinfo>
    <host>127.0.0.1</host>
    <port>8080</port>
    <path>/dot/slash/dot</path>
    <query>foo=bar;x=y</query>
  </ID>

Example SPARQL: http://spypixel.com/2006/wcl/uri/filter-test2.rq (this runs OK in Jena/ARQ, e.g. through the Twinkle GUI)

Here's the SPARQL example in full.
Basically we match the URI descriptions, and then filter against the various strings using the query language's FILTER functionality: in particular regexes, boolean and/or combinations, and exact matching with "=". The lines starting with a # are commented out. Note that there are some cases here we'll want for testing, e.g. the case of the URI scheme (hTtp: etc.) could easily trip us up.

  PREFIX u: <http://www.w3.org/2004/12/q/idsyntax#>
  SELECT DISTINCT *
  WHERE {
    ?id a u:ID .
    ?id u:full ?full .
    ?id u:nameFor ?res .
    ?id u:scheme ?scheme .
    ?id u:authority ?authority .
    OPTIONAL { ?id u:userinfo ?userinfo } .
    OPTIONAL { ?id u:host ?host } .
    OPTIONAL { ?id u:port ?port } .
    OPTIONAL { ?id u:path ?path } .
    OPTIONAL { ?id u:query ?query } .
    OPTIONAL { ?id u:fragment ?fragment } .
    # FILTER regex ( ?scheme, "http" ) .  # schemes matching "http" ie. includes https:
    # FILTER regex ( ?scheme, "^http$" ) .  # http: scheme
    # FILTER regex ( ?scheme, "^HTTP$" ) .  # HTTP: scheme (do we normalise in the regex or the rdf?)
    # FILTER regex ( ?scheme, "^http$", "i" ) .  # http: scheme, case insensitive (more robust)
    # FILTER regex(?scheme,"^http$","i") && ( (?port = "8080") || (?port = "1234") ).
    #FILTER regex(?userinfo, ":")  # password is given in the URI
    FILTER regex(?host, "^pics|www\.pics") .
  }

Easiest way to play with this is to download and run Twinkle from http://www.ldodds.com/projects/twinkle/ and use http://spypixel.com/2006/wcl/uri/_data.rdf as the data URI. I've not got it running against the online Redland SPARQL query installation yet; will ask Dave Beckett where the problem is.

There are a few more comprehensive collections of 'tricky' URIs around. I'm not sure of the exact status of any URI test suite, but have collected up some links in the bottom of the Perl script, reproduced here.
  http://www.w3.org/Addressing/url_test/url_grammar.tests
  http://www.ninebynine.org/Software/HaskellUtils/Network/URITestDescriptions.html
  http://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html

I've not investigated the IRI side yet, nor taken any care with charset issues (either in the data, or the perl/regex).

Next steps in the XG? It would be great if someone could try re-expressing the contents of www.w3.org/2005/Incubator/wcl/matching.html or Phil's recent msg http://lists.w3.org/Archives/Member/member-xg-wcl/2006Jun/0079.html (member-only link) using SPARQL filters plus this vocab.

For those of us who prefer to do things with XML, I wonder whether the XML resultset format that SPARQL returns would be an acceptable compromise. If we run the above SPARQL query without any filters, it returns the following XML structure --- http://spypixel.com/2006/wcl/uri/_eg_results.txt ie. markup like this:

  <result>
    <id bnodeid="b0"/>
    <full>HTTP://example.caps.example.org/</full>
    <res uri="HTTP://example.caps.example.org/"/>
    <scheme>HTTP</scheme>
    <authority>example.caps.example.org</authority>
    <userinfo bound="false"/>
    <host>example.caps.example.org</host>
    <port bound="false"/>
    <path>/</path>
    <query bound="false"/>
    <fragment bound="false"/>
  </result>

...for each result.

Am thinking out loud here, not yet quite sure how all these ingredients fit together. And that's without even considering OWL, RIF etc. :)

cheers,

Dan
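
For anyone who wants to experiment with the decomposition step without the Perl script, here is a minimal Python sketch. It is not a port of that script: it assumes the generic regex from RFC 3986 Appendix B for the top-level split, adds a hypothetical AUTHORITY_RE pattern for userinfo/host/port, and assumes that absent components (and an empty path) are simply omitted, as the OPTIONAL clauses in the query suggest. The describe() and matches() names are illustrative only; matches() mirrors one of the commented-out FILTER lines in plain Python rather than running SPARQL.

```python
import re

# Generic URI-splitting regex from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Hypothetical pattern (an assumption, not taken from the Perl script):
# splits an authority into optional userinfo, host, and optional port.
AUTHORITY_RE = re.compile(
    r"^(?:(?P<userinfo>[^@]*)@)?(?P<host>\[[^\]]*\]|[^:]*)(?::(?P<port>\d*))?$"
)


def describe(uri):
    """Return a dict of URI components keyed by the property names used above."""
    m = URI_RE.match(uri)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    fields = {
        "full": uri,
        "scheme": scheme,
        "authority": authority,
        "path": path or None,  # assumption: an empty path is left out
        "query": query,
        "fragment": fragment,
    }
    if authority is not None:
        a = AUTHORITY_RE.match(authority)
        if a:
            fields.update(a.groupdict())
    # Drop components that are absent, like unbound OPTIONAL variables.
    return {k: v for k, v in fields.items() if v is not None}


def matches(desc):
    """Plain-Python analogue of:
    FILTER regex(?scheme,"^http$","i") && ((?port = "8080") || (?port = "1234"))
    """
    scheme_ok = re.search(r"^http$", desc.get("scheme", ""), re.I) is not None
    return scheme_ok and desc.get("port") in ("8080", "1234")


if __name__ == "__main__":
    for u in (
        "http://nobody:nothing@127.0.0.1:8080/dot/slash/dot?foo=bar;x=y",
        "HTTP://example.caps.example.org/",
    ):
        d = describe(u)
        print(d, matches(d))
```

Doing the scheme comparison case-insensitively in the matcher, rather than normalising in describe(), keeps the description faithful to the URI as written; whether to normalise in the data or in the pattern is the same open question raised in the query comments.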
Received on Saturday, 1 July 2006 11:03:59 UTC