Re: TaxonConcept Interlinking SPARQL Query, Results and questions about interpretation

Peter,

On 17 Sep 2010, at 20:48, Peter DeVries wrote:
> I created the SPARQL query below for the TaxonConcept Knowledge Base.
>
> It is based on the earlier one posted by Richard Cyganiak.
>
> I looked through my RDF for predicates that have in and out links to
> other data sets.
>
> It is not clear to me how to count basic web pages that are not
> really RDF resources.

I don't think SPARQL has any easy way of distinguishing whether the
target of a link is “just” a web page or a full-blown RDF resource.

> Also where in the CKAN description do you differentiate between in
> links and out links?

An outlink in our parlance is any triple that's hosted on your site
where one resource is in your namespace and the other is in
someone else's namespace. It doesn't matter which of the two is in the
subject position and which is in the object position.

An inlink is a triple that uses one of your URIs in the subject or  
object position, but is hosted by someone else, in another dataset.
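
To make the two definitions concrete, here is a minimal Python sketch.
It assumes that a dataset's namespace can be identified by the domain
of its URIs, which is a simplification; the function names and the
"internal" label for triples that stay inside one dataset are made up
for illustration and are not part of CKAN or any library.

```python
from urllib.parse import urlparse


def domain(uri):
    """Return the host part of a URI, e.g. 'dbpedia.org'."""
    return urlparse(uri).netloc


def classify_link(subject, obj, hosted_by, my_domain):
    """Classify one triple from the point of view of the dataset at my_domain.

    hosted_by is the domain of the dataset that publishes the triple.
    Assumption: a dataset's namespace is identified by its domain.
    """
    s_mine = domain(subject) == my_domain
    o_mine = domain(obj) == my_domain
    if hosted_by == my_domain:
        if s_mine and o_mine:
            return "internal"
        # Exactly one end is ours; which position it is in does not matter.
        return "outlink" if (s_mine or o_mine) else None
    # Hosted elsewhere: any use of one of our URIs counts as an inlink.
    return "inlink" if (s_mine or o_mine) else None
```

The same triple is therefore an outlink for the dataset that hosts it
and an inlink for the dataset whose URI it mentions.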

The CKAN record for your dataset only records the outlinks of your  
dataset.

We find the inlinks by looking at all other CKAN records and checking
whether any of them reference your dataset.
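
Roughly, that lookup works like the Python sketch below. The layout of
the records (a mapping from dataset name to its recorded outlinks, each
with a triple count) is an assumption for illustration, not the actual
CKAN schema, and the dataset names and counts are made up.

```python
def find_inlinks(records, dataset):
    """Return {source: link_count} for every other record that links to dataset.

    records maps a dataset name to its recorded outlinks, themselves a
    mapping of target dataset name to triple count (hypothetical layout).
    """
    return {
        source: outlinks[dataset]
        for source, outlinks in records.items()
        if source != dataset and dataset in outlinks
    }


# Made-up records, for illustration only.
records = {
    "dataset-a": {"dataset-b": 120, "dataset-c": 40},
    "dataset-b": {"dataset-a": 75},
    "dataset-c": {},
}

print(find_inlinks(records, "dataset-a"))
```

Because each record only stores its own outlinks, the inlinks of a
dataset never need to be maintained by hand; they fall out of everyone
else's records.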

Best,
Richard


> I am posting the query and results here so others might benefit from
> them or inform me of something I may be doing incorrectly.
>
> Below is the query; the results follow as text, and I have also
> attached a .png of the Virtuoso iSPARQL results.
>
> - Pete
>
> PREFIX owl:   <http://www.w3.org/2002/07/owl#>
> PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
> PREFIX txn:   <http://lod.taxonconcept.org/ontology/txn.owl#>
> PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
> PREFIX umbel: <http://umbel.org/umbel#>
>
> SELECT ?domain_s ?domain_o (COUNT(*) AS ?count)
> WHERE {
>   {
>       SELECT (bif:regexp_substr("http://([^/]*)", STR(?s), 1) AS ?domain_s)
>              (bif:regexp_substr("http://([^/]*)", STR(?o), 1) AS ?domain_o)
>       WHERE {
>           { ?s owl:sameAs ?o }
>           UNION
>           { ?s skos:exactMatch ?o }
>           UNION
>           { ?s skos:broadMatch ?o }
>           UNION
>           { ?s skos:narrowMatch ?o }
>           UNION
>           { ?s skos:relatedMatch ?o }
>           UNION
>           { ?s skos:closeMatch ?o }
>           UNION
>           { ?s txn:speciesConceptHasSpeciesNameString ?o }
>           UNION
>           { ?s txn:speciesNameStringHasSpeciesTaxonConcept ?o }
>           UNION
>           { ?s txn:speciesConceptHasBasionymNameString  ?o }
>           UNION
>           { ?s txn:basionymNameStringHasSpeciesTaxonConcept  ?o }
>           UNION
>           { ?s txn:hasPDFVersion  ?o }
>           UNION
>           { ?s txn:hasAuthorURI  ?o }
>           UNION
>           { ?s foaf:page  ?o }
>           UNION
>           { ?s foaf:topic  ?o }
>           UNION
>           { ?s txn:inDBpediaClade  ?o }
>           UNION
>           { ?s txn:occurrenceInContinent  ?o }
>           UNION
>           { ?s txn:occurrenceInStateProvince  ?o }
>           UNION
>           { ?s txn:occurrenceInCounty  ?o }
>           UNION
>           { ?s txn:isExpectedIn  ?o }
>           UNION
>           { ?s txn:hasExpectationOf  ?o }
>           UNION
>           { ?s txn:isUnknownAboutIn  ?o }
>           UNION
>           { ?s txn:hasUnknownExpectationOf  ?o }
>           UNION
>           { ?s txn:isUnexpectedIn  ?o }
>           UNION
>           { ?s txn:hasUnknownExpectationOf  ?o }
>       }
>   }
> }
> GROUP BY ?domain_s ?domain_o
>
> ==============================
>
> domain_s domain_o count
> lod.geospecies.org lod.taxonconcept.org 71757
> www.uniprot.org lod.taxonconcept.org 23427
> bio2rdf.org lod.taxonconcept.org 23427
> dbpedia.org lod.taxonconcept.org 18849
> eunis.eea.europa.eu lod.taxonconcept.org 2987
> www.bbc.co.uk lod.taxonconcept.org 318
> lod.taxonconcept.org lod.geospecies.org 71756
> lod.taxonconcept.org www.uniprot.org  23427
> lod.taxonconcept.org bio2rdf.org      23656
> lod.taxonconcept.org dbpedia.org      95208
> lod.taxonconcept.org eunis.eea.europa.eu 5974
> lod.taxonconcept.org www.bbc.co.uk 636
> rdf.freebase.com lod.taxonconcept.org 119
> lod.taxonconcept.org 72
> lod.taxonconcept.org rdf.freebase.com 119
> lod.taxonconcept.org 24900
> sw.opencyc.org   lod.taxonconcept.org    24
> lod.taxonconcept.org sw.opencyc.org   24
> lod.taxonconcept.org gni.globalnames.org 73329
> gni.globalnames.org lod.taxonconcept.org 73330
> lod.taxonconcept.org www.americanarachnology.org 1
> lod.taxonconcept.org assets.geospecies.org 3
> lod.taxonconcept.org www.itis.gov     42100
> lod.taxonconcept.org data.gbif.org    1154
> lod.taxonconcept.org en.wikipedia.org 18849
> lod.taxonconcept.org species.wikimedia.org 9328
> lod.taxonconcept.org www.eol.org      579
> lod.taxonconcept.org www.boldsystems.org 122
> lod.taxonconcept.org www.catalogueoflife.org 53
> lod.taxonconcept.org bugguide.net     3297
> lod.taxonconcept.org lod.taxonconcept.org 287048
> assets.geospecies.org media.geospecies.org 5
> lod.taxonconcept.org mushroomobserver.org 5
> assets.geospecies.org lod.geospecies.org 10
> assets.geospecies.org lod.taxonconcept.org 1
> static.flickr.com www.flickr.com   33
> bugguide.net lod.taxonconcept.org 3297
> media.geospecies.org lod.taxonconcept.org 19
> ocs.geospecies.org lod.taxonconcept.org 26
> media.geospecies.org dbpedia.org      14
> assets.geospecies.org dbpedia.org 1
> media.geospecies.org lod.geospecies.org 37
> mushroomobserver.org lod.taxonconcept.org 5
> media.geospecies.org media.geospecies.org 29
> ocs.geospecies.org ocs.geospecies.org 53
> media.geospecies.org assets.geospecies.org 15
> media.geospecies.org static.flickr.com 2
> mushroomobserver.org mushroomobserver.org 3
> mushroomobserver.org dbpedia.org 1
> ocs.geospecies.org sws.geonames.org 39
> lod.taxonconcept.org sws.geonames.org 234792
> sws.geonames.org lod.taxonconcept.org 128529
> -- 
>
>
>
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> /
> GeoSpecies Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------
> <interlinking_capture.png>

Received on Saturday, 18 September 2010 18:42:39 UTC