minimal set of URIs for individuals (in context of owl:sameAs)

I'm trying to understand how to get a minimal set of URIs to refer to  
a set of individuals[1], where multiple URIs might have been declared  
owl:sameAs each other.  This would be useful for counting individuals  
of a particular owl:Class, while respecting owl:sameAs, but also for  
UI where you don't want to show the individual multiple times (once  
for each synonym URI). The set would be such that all the URIs would  
be owl:differentFrom each other, and there would be one (and only one)  
for each set of URIs declared owl:sameAs each other.

I note that the COUNT extensions I've looked at, such as the one in ARQ,
count URIs rather than attempting to count semantic entities.

Minimal example:

eg:User rdf:type owl:Class .

eg:userA rdf:type eg:User .

eg:userB rdf:type eg:User .

eg:userC rdf:type eg:User .

# now add new knowledge that eg:userA and eg:userB
# are actually synonyms for the same person, but
# that eg:userC refers to a separate person

eg:userA owl:sameAs        eg:userB ;
          owl:differentFrom eg:userC .

So there are actually two people, one of whom has two synonyms (eg:userA
and eg:userB).

Now if I use OWL inference and SPARQL I could find the first URI for  
any eg:User:

SELECT ?uri WHERE { ?uri rdf:type eg:User } LIMIT 1

getting, for example, the result eg:userC, and then run a second query  
like:

SELECT ?user WHERE { ?user owl:differentFrom eg:userC }

but that would give me both eg:userA and eg:userB. If I then use that  
list to count I get 3, rather than the desired 2.  If I use it to draw  
a UI I get repetition of an individual.

I'm hoping to end up with a set of URIs, such that all the member URIs
are owl:differentFrom each other, and there is one URI for each  
individual in the set.

Are there any SPARQL methods to do this, or do I need to post-process the
results of the second query to 'whittle' down the results, recursively
removing elements that are owl:sameAs each other?  Seems like a
problem others would have faced.  Perhaps owl:AllDifferent is relevant
here; can that be used in SPARQL queries in some way?
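
To make the post-processing option concrete, here is a rough sketch of the
kind of thing I have in mind, in Python for brevity. It is only an
illustration under assumptions: the function name canonical_uris is made up,
the owl:sameAs pairs are assumed to have been fetched separately (for example
with a query like SELECT ?a ?b WHERE { ?a owl:sameAs ?b }), and picking the
lexicographically smallest URI as the representative is an arbitrary choice.

```python
def canonical_uris(uris, same_as_pairs):
    """Return one representative URI per owl:sameAs equivalence class.

    uris          -- candidate URIs (e.g. everything typed eg:User)
    same_as_pairs -- (a, b) tuples, one per owl:sameAs statement
    """
    parent = {}  # union-find forest: URI -> parent URI

    def find(u):
        # Register unseen URIs as their own root, then walk up to the root,
        # halving the path along the way.
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Merge the classes of every pair declared owl:sameAs. Links through
    # URIs outside the candidate list still connect candidates.
    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group the candidates by class and keep the smallest URI of each group
    # (an arbitrary but deterministic choice of representative).
    groups = {}
    for u in uris:
        groups.setdefault(find(u), []).append(u)
    return sorted(min(members) for members in groups.values())


# The example data from above: three URIs, two individuals.
users = ["eg:userA", "eg:userB", "eg:userC"]
same_as = [("eg:userA", "eg:userB")]

representatives = canonical_uris(users, same_as)
print(representatives)       # ['eg:userA', 'eg:userC']
print(len(representatives))  # 2
```

This treats any two URIs not linked by owl:sameAs as distinct, which is
itself a closed-world shortcut; it does not check owl:differentFrom at all.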

Apologies if people saw a similar query a few days ago on jena-dev, I  
didn't get any answers so I tried to clean it up, cut it down a bit  
and find the right venue.

Thanks,
James

ps.  I realize that the idea of counting individuals this way violates  
the open world assumption (there may, of course, be many more 'out  
there') but for many purposes (like UIs) this is still a valid desire,  
I think.

[1] Individual as distinct from URI, i.e. if eg:a owl:sameAs eg:b there
are two URIs but only a single individual (with two synonyms).  I hope  
that's the right nomenclature.  Happy to be corrected.

Received on Sunday, 13 April 2008 17:06:07 UTC