
Re: minimal set of URIs for individuals (in context of owl:sameAs)

From: James Howison <james@howison.name>
Date: Mon, 14 Apr 2008 11:32:04 -0400
Message-Id: <C9208C23-A26B-42BC-B9CB-22A41BCE4983@howison.name>
To: public-sparql-dev@w3.org

Well here's what I've done about this for the moment:

1. Get DISTINCT uris for all eg:User into distinctUsers array

2. Create empty array to hold Representatives (ie one uri per synonym group)

3. Recursively filter the distinctUsers array, adding representatives  
to reps, removing synonym groups until distinctUsers array is empty
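  // Signature reconstructed from the recursive call below;
  // urls is the distinctUsers array from step 1.
  static ArrayList<String> whittle(ArrayList<String> urls,
      ArrayList<String> reps, Model model) {
    String currUrl = urls.remove(0); // urls.pop
    reps.add(currUrl); // This is the representative
    ArrayList<String> synonyms = getSynonyms(currUrl, model);
    urls.removeAll(synonyms); // remove the rest of this synonym group
    if (urls.isEmpty()) {
      return reps;
    } else {
      // recurse and continue
      return whittle(urls, reps, model);
    }
  }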

getSynonyms() is just a SPARQL query (with results put into an ArrayList):

"WHERE { <" + currUrl +"> owl:sameAs ?synonym .}"

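One caveat with the WHERE clause above: it only matches triples where currUrl is the subject. If the model is queried without OWL inference (an assumption here), a statement written the other way round would be missed. A rough sketch of a query string that checks both directions follows; the class name, the method name buildSynonymQuery and the example URI are made up for illustration, and this only builds the string, it does not run it.

```java
class SynonymQuery {
    static final String OWL_PREFIX =
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n";

    // Hypothetical helper: returns a SPARQL query that finds URIs linked to
    // the given URI by owl:sameAs in either direction.
    static String buildSynonymQuery(String uri) {
        String node = "<" + uri + ">";
        return OWL_PREFIX
            + "SELECT DISTINCT ?synonym\n"
            + "WHERE {\n"
            + "  { " + node + " owl:sameAs ?synonym . }\n"
            + "  UNION\n"
            + "  { ?synonym owl:sameAs " + node + " . }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Example URI is an assumption, not taken from any real data set.
        System.out.println(buildSynonymQuery("http://example.org/userA"));
    }
}
```

This still only finds directly asserted synonyms, so chains (a sameAs b, b sameAs c) need either inference or repeated lookups.
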
That gives me the minimal set of uris for synonym groups, and it isn't  
_too_ computationally expensive.  Still wish I could do this via a  
single SPARQL query (perhaps with FILTERs).
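For anyone wanting to try the whittling idea without a Jena model, here is a minimal, self-contained sketch. It is not the code I am running: it assumes the owl:sameAs statements have already been pulled out as plain pairs of URI strings, and it swaps the recursion for a small union-find so that chains of synonyms end up in one group. The names Representatives, find and representatives are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Representatives {
    // Follows parent links to the root URI of a synonym group.
    static String find(Map<String, String> parent, String uri) {
        String root = uri;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        return root;
    }

    // uris: distinct URIs in result order.
    // sameAsPairs: asserted owl:sameAs statements as {subject, object} pairs.
    // Returns the first URI encountered for each synonym group.
    static List<String> representatives(List<String> uris,
                                        List<String[]> sameAsPairs) {
        Map<String, String> parent = new HashMap<>();
        for (String uri : uris) {
            parent.put(uri, uri);
        }
        for (String[] pair : sameAsPairs) {
            // Pairs may mention URIs that are not in the input list.
            parent.putIfAbsent(pair[0], pair[0]);
            parent.putIfAbsent(pair[1], pair[1]);
            String a = find(parent, pair[0]);
            String b = find(parent, pair[1]);
            if (!a.equals(b)) {
                parent.put(b, a); // merge the two groups
            }
        }
        Set<String> seenRoots = new HashSet<>();
        List<String> reps = new ArrayList<>();
        for (String uri : uris) {
            if (seenRoots.add(find(parent, uri))) {
                reps.add(uri);
            }
        }
        return reps;
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("eg:userA", "eg:userB", "eg:userC");
        List<String[]> sameAs = new ArrayList<>();
        sameAs.add(new String[] {"eg:userA", "eg:userB"});
        System.out.println(representatives(users, sameAs)); // [eg:userA, eg:userC]
    }
}
```

With the minimal example from my earlier mail this gives two representatives, so counting the result yields 2 rather than 3.
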


On Apr 12, 2008, at 11:30 PM, James Howison wrote:
> I'm trying to understand how to get a minimal set of URIs to refer  
> to a set of individuals[1], where multiple URIs might have been  
> declared owl:sameAs each other.  This would be useful for counting  
> individuals of a particular owl:Class, while respecting owl:sameAs,  
> but also for UI where you don't want to show the individual multiple  
> times (once for each synonym URI). The set would be such that all  
> the URIs would be owl:differentFrom each other, and there would be  
> one (and only one) for each set of URIs declared owl:sameAs each  
> other.
> I note that the COUNT extensions I've looked at, such as ARQ, count  
> URIs rather than attempting to count semantic entities.
> Minimal example:
> eg:User rdf:type owl:Class .
> eg:userA rdf:type eg:User .
> eg:userB rdf:type eg:User .
> eg:userC rdf:type eg:User .
> # now add new knowledge that eg:userA and eg:userB
> # are actually synonyms for the same person, but
> # that eg:userC refers to a separate person
> eg:userA owl:sameAs        eg:userB ;
>         owl:differentFrom eg:userC .
> So there are actually two people, where one has two synonyms  
> (eg:userA and eg:userB)
> Now if I use OWL inference and SPARQL I could find the first URI for  
> any eg:User:
> WHERE { ?uri rdf:type eg:User } LIMIT 1
> getting, for example, the result eg:userC, and then run a second  
> query like:
> WHERE { ?user owl:differentFrom eg:userC }
> but that would give me both eg:userA and eg:userB. If I then use  
> that list to count I get 3, rather than the desired 2.  If I use it  
> to draw a UI I get repetition of an individual.
> I'm hoping to end up with a set of URIs, such that all the member  
> URIs are owl:differentFrom each other, and there is one URI for each  
> individual in the set.
> Any SPARQL methods to do this, or do I need to post-process the  
> results of the second query to 'whittle' down the results  
> recursively removing elements that are owl:sameAs each other?  Seems  
> like a problem others would have faced.  Perhaps owl:AllDifferent is  
> relevant here, can that be used in SPARQL queries in some way?
> Apologies if people saw a similar query a few days ago on jena-dev,  
> I didn't get any answers so I tried to clean it up, cut it down a  
> bit and find the right venue.
> Thanks,
> James
> ps.  I realize that the idea of counting individuals this way  
> violates the open world assumption (there may, of course, be many  
> more 'out there') but for many purposes (like UIs) this is still a  
> valid desire, I think.
> [1] Individual as distinct from URI.  ie if eg:a owl:sameAs eg:b  
> there are two URIs but only a single individual (with two  
> synonyms).  I hope that's the right nomenclature.  Happy to be  
> corrected.
Received on Monday, 14 April 2008 15:32:41 UTC
