Re: Using boolean value vs class from Antoine Zimmermann on 2019-09-26 (semantic-web@w3.org from September 2019)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 26 Sep 2019 13:06:13 +0200
To: Hugh Glaser <hugh@glasers.org>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <0f86fc73-e86e-ebe8-0401-39a258fef360@emse.fr>
On 25/09/2019 13:59, Hugh Glaser wrote:
> Thanks Antoine,
> (Apologies if I am avoiding some of your blog issues, Mike.)
> 
>> On 25 Sep 2019, at 09:59, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
>>
>>
>>
>> On 24/09/2019 at 15:57, Hugh Glaser wrote:
>>> Very interesting question, thanks - it helps me explore my understanding.
>>> Sorry - as I have said, I'm not really very good on this stuff, but I do like to try to understand.
>>> Antoine, some of what you say puzzles me.
>>> Looking at class :Deprecated
>>>> The second model with a class :Deprecated ensures that an entity is either of type :Deprecated, or not.
>>> Is it not more properly the case that an Entity is either of type :Deprecated or we don't know? (Open world)
>>
>> RDF(S), OWL etc. are not 3-valued logics, where something may be true, false, or something else. Something is either deprecated or it is not. It may not be known which case an entity is in, but it definitely falls into one of the two cases.
> True.
> But my point was that you can never infer that something is not in the :Deprecated class (I think).

If you are defining the class :Deprecated in your vocabulary / ontology, 
then I can use it too, and we can interoperate. In my application, 
things that need to be identified as not deprecated are typed like this:

  ex:thing  a  [ owl:complementOf :Deprecated ] .

With an OWL 2 RL or an OWL 2 QL reasoner, this works well.
Someone else does it differently: they introduce the class 
:NotDeprecated and assert:

  :NotDeprecated  owl:disjointWith  :Deprecated .

Then they say:

  ex:thing  a  :NotDeprecated .

With an OWL 2 EL reasoner (or pD*, or OWL LD...), this works well.
Other constructions can be used to infer being in the complement of a class.
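
As a rough illustration of the disjointness case, here is a minimal
sketch in Python. It assumes triples are held as plain tuples; the
helper names (disjoint_classes, membership) and the extra entities
ex:old and ex:other are hypothetical, and this is in no way a real OWL
reasoner (it does no consistency checking, for instance):

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

graph = {
    (":NotDeprecated", DISJOINT, ":Deprecated"),
    ("ex:thing", TYPE, ":NotDeprecated"),
    ("ex:old", TYPE, ":Deprecated"),
    ("ex:other", TYPE, "foaf:Document"),
}


def disjoint_classes(graph, cls):
    """Classes declared disjoint with cls (owl:disjointWith is symmetric)."""
    found = set()
    for s, p, o in graph:
        if p == DISJOINT:
            if s == cls:
                found.add(o)
            elif o == cls:
                found.add(s)
    return found


def membership(graph, entity, cls):
    """Return 'member', 'non-member' or 'unknown' for entity with respect to cls."""
    types = {o for s, p, o in graph if s == entity and p == TYPE}
    if cls in types:
        return "member"
    if types & disjoint_classes(graph, cls):
        # Typed with a class that is disjoint with cls, so it cannot be in cls.
        return "non-member"
    # Open world: nothing can be concluded either way.
    return "unknown"


for entity in ("ex:thing", "ex:old", "ex:other"):
    print(entity, membership(graph, entity, ":Deprecated"))
```

The third outcome, "unknown", is about what can be inferred, not about
the semantics: the entity is still either deprecated or not.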

But in many cases, you just need to care about things that can be 
inferred to be in the class, and disregard everything else. In a court 
of justice, if you cannot prove that someone is guilty, you assume they 
are innocent, regardless of whether you actually proved innocence.

>>
>>> So the boolean version seems to perhaps give me a richer way of recording knowledge.
>>
>> In a sense, yes, because you can say for instance:
>>
>> ex:thing  :isDeprecated  "true"^^xsd:boolean, "false"^^xsd:boolean .
>>
>> And this is still consistent. But in my experience, every boolean property that I've seen was intended to be used for segregating the trues and the falses. But this segregation does not come for free in the RDF(S)/OWL semantics.
> So in one method, you can explicitly assert that an entity is not deprecated, by saying
> ex:thing :isDeprecated "false"^^xsd:boolean .
> and so querying can act accordingly.
> In the other, you can never record or query whether something is not deprecated.

It depends. See the options above. If it's sufficient for you to find 
things that are not known to be deprecated, you can do:

  NOT EXISTS { ?x  a  :Deprecated }

But yes, :isDeprecated "false"^^xsd:boolean gives a more direct way of 
doing this.
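
To make the difference between the two kinds of query concrete, here is
a small Python sketch. It assumes triples are plain tuples and the data
(ex:a, ex:b, ex:c) is invented for illustration; the two functions are
hypothetical stand-ins for the NOT EXISTS filter and for a lookup of the
explicit "false" value, not an actual SPARQL engine:

```python
TYPE = "rdf:type"

# Illustrative data: ex:a is deprecated, ex:b is explicitly flagged false,
# ex:c says nothing at all about deprecation.
graph = {
    ("ex:a", TYPE, "foaf:Document"),
    ("ex:a", TYPE, ":Deprecated"),
    ("ex:b", TYPE, "foaf:Document"),
    ("ex:b", ":isDeprecated", False),
    ("ex:c", TYPE, "foaf:Document"),
}


def documents(graph):
    """All subjects typed as foaf:Document."""
    return {s for s, p, o in graph if p == TYPE and o == "foaf:Document"}


def not_known_deprecated(graph):
    """Documents with no ':Deprecated' type triple (the NOT EXISTS reading)."""
    deprecated = {s for s, p, o in graph if p == TYPE and o == ":Deprecated"}
    return documents(graph) - deprecated


def explicitly_not_deprecated(graph):
    """Documents carrying an explicit :isDeprecated false statement."""
    flagged = {s for s, p, o in graph if p == ":isDeprecated" and o is False}
    return documents(graph) & flagged


print(sorted(not_known_deprecated(graph)))
print(sorted(explicitly_not_deprecated(graph)))
```

The first function returns everything that is merely not known to be
deprecated, including ex:c; the second only returns what someone
bothered to assert.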

>>
>>> To model the boolean equivalent, you could also have a :notDeprecated class.
>>> And then you would have the same four categories for the class version as you have for the boolean version.
>>> (Not saying this is good!)
>>
>> Yes, but again, the "notDeprecated" comes "for free" by simply being the set of things that are not in the Deprecated class.
> True.
> But that isn't much use for finding those not deprecated entities, in an Open world, I thought?

With nothing more than a :Deprecated class, sure. With a little extra 
like what I showed before, there are many ways. And, inside your 
application, you are free to make whatever assumptions you like, such 
as the closed-world assumption (CWA).

> Of course, if Mikael is not interested in doing that (equivalent to asserting a false), as I mentioned in my later paragraph, then it doesn't really matter, and the methods are equivalent.

If Mikael wants to know the good practices for modelling a domain of 
knowledge in a shared ontology, then it really matters, IMHO.

> In fact, in that world, the absence of
> ex:thing :isDeprecated "true"^^xsd:boolean .
> implies that something is not deprecated, just as much as if it was explicitly "false".

Again, this relies on an assumption that you can only make if you 
control all the environments and contexts in which the vocabulary is and 
will be used. If you believe that everything must be in one of the 
two cases, 1) deprecated or 2) not deprecated, then the use of a class 
makes this belief (or knowledge) formally explicit, while the boolean 
property only allows one to achieve this with a little extra assumption.


> And I think this may go to the nub of your rule of thumb - that using booleans can be an indicator of flawed thinking, where classes might have been better.
> 
> In a world in which you only ever want to make a statement that an entity has a property, that is, only ever use "true", then you have very clearly created a subset/category/class, or whatever.
> And using a class seems to me to be very much equivalently expressive, and may well be a more "natural" way to do the modelling.
> And provides the appropriate restriction for others' use on the values modelled.
> If you, on the other hand, intend to also use "false", then you are trying to do some sort of implied negation; at the very least to be equivalent, you will need another class, and this is going to make things look pretty horrible.

Not so horrible, in my humble opinion.


> So, I think that the bottom line is that Mikael would need to tell us more about what he is modelling, and how, before a proper recommendation could be made.

I find that a general discussion of the topic, independent of Mikael's 
use case, is quite useful for a larger group of people.

> He says:
>> let's say we have documents and we want to say whether they are valid or deprecated. There are two ways to do this:
> I guess the question is then, how does he intend to model that a document is "valid"?
> It is starting to look like perhaps he should have two classes, DeprecatedDocument and ValidDocument; which brings up a maintenance question - what happens in different modelling styles if he or someone later decides that there are other categories of documents, such as WithdrawnDocument, PendingDocument?

And maybe there is a category of documents, the bizarros, that are 
always deprecated, and another, the freshers, that are never 
deprecated. With classes, it's pretty easy and natural:

  :Bizarro   rdfs:subClassOf   :Deprecated .
  :Freshers  owl:disjointWith  :Deprecated .
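
What these two axioms buy you can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: triples are plain
tuples, ex:doc1 and ex:doc2 are invented instances, and the helpers
inferred_types and excluded_from are hypothetical names, not part of any
reasoner's API:

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DISJOINT = "owl:disjointWith"

graph = {
    (":Bizarro", SUBCLASS, ":Deprecated"),
    (":Freshers", DISJOINT, ":Deprecated"),
    ("ex:doc1", TYPE, ":Bizarro"),   # hypothetical instance
    ("ex:doc2", TYPE, ":Freshers"),  # hypothetical instance
}


def inferred_types(graph, entity):
    """All classes of entity after following rdfs:subClassOf transitively."""
    pending = [o for s, p, o in graph if s == entity and p == TYPE]
    seen = set()
    while pending:
        cls = pending.pop()
        if cls in seen:
            continue
        seen.add(cls)
        pending.extend(o for s, p, o in graph if s == cls and p == SUBCLASS)
    return seen


def excluded_from(graph, entity, cls):
    """True if some inferred type of entity is declared disjoint with cls."""
    types = inferred_types(graph, entity)
    return any(
        p == DISJOINT and ((s in types and o == cls) or (o in types and s == cls))
        for s, p, o in graph
    )


print(sorted(inferred_types(graph, "ex:doc1")))
print(excluded_from(graph, "ex:doc2", ":Deprecated"))
```

A bizarro document is thus found to be deprecated without anyone saying
so directly, and a fresher is found to be outside :Deprecated.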


> Of course, there is nothing special about how restrictive the possible "deprecation"s are - there may be a whole bunch of other states that Mikael hasn't told us about - it looks quite stark without knowing that.
> It could just as easily have been class :BornInGermany v. boolean property :wasBornInGermany for a group of people, if that is what I was trying to model (along with perhaps other stuff).
> I'm not sure the best way would be to create a new class for every country.
> Which brings us to Mike's Third Way.
> 
> I agree with the naming of the class being wrong, by the way; if I saw a name like that, I would wonder whether the user should instead have used booleans!
> (It's like seeing certain variable names (such as "count" and their ilk) in a pure functional language program - it would tell me that the student wasn't thinking right. :-) )
> 
> By the way, I think we are possibly pretty much agreed, Antoine - I'm just exploring the issue.

I guess so, yes.


--AZ

> 
> Thanks
> Hugh
> 
>>
>>> [Hang on - I have just realised that Mikael makes no suggestion that he will ever assert "false" - so your introducing the "false" categories (3 & 4) is like me introducing the :notDeprecated class.]
>>
>> If the property is only intended to be used in Mikael's application or system, then yes, but the question of what's the best model for a closed system is not so relevant. Just use what is most efficient for your tool chain. However, if the intent is to have a shared vocabulary that will traverse application boundaries, then you must think about how *others* may use it. If you define a property :isDeprecated with range xsd:boolean, then the value "false" may show up. Of course, you can document your vocabulary with instructions saying that it must only be used with value "true", or you can add an axiom forcing the value to be "true" or not present, but in the former case, you rely on out-of-band information, and in the latter, you rely on more expressive constructs. You do not need this at all with a class :Deprecated.
>>
>>
>>> Although I worry about your argument here, I think that the general principle may well be very good.
>>> If you see booleans, especially where they always seem to be "true", it is a flag that maybe a class should be used.
>>> (This is very similar to seeing "= true" in an expression in a programming language, someone isn't thinking right :-) )
>>> I usually view an rdf:type triple as nothing special compared with any other.
>>> You assert them and match them just the same.
>>> It just so happens that "we" have chosen that we can do sub-classing, and so if we do that, we get some special magic that can happen, which doesn't happen with everything else.
>>> And that is sometimes very useful, although it can make things quite hard to get the hang of.
>>> Then, as you say, there are a whole bunch of practical questions about efficiency of stores and reasoners when you do things in different ways.
>>> But, as with programming, most efficiency things should be left to the system implementation, and the source should be modelled in the most understandable and maintainable way.
>>
>> We definitely agree on those last paragraphs.
>>
>>
>> Best,
>> --AZ
>>
>>
>>> Best
>>> Hugh
>>>> On 24 Sep 2019, at 13:48, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
>>>>
>>>> Mikael,
>>>>
>>>>
>>>> These two options definitely affect reasoning.
>>>>
>>>> If you have a property :isDeprecated, then any entity can fall into 4 disjoint categories:
>>>>
>>>> 1. The entities that have no value for :isDeprecated.
>>>> 2. The entities that have value "true" only.
>>>> 3. The entities that have value "false" only.
>>>> 4. The entities that have both values "true" and "false".
>>>>
>>>> Moreover, if the range of the property is unrestricted, it can have all sorts of literal values, in any combination.
>>>>
>>>> If you want to make sure that all entities have exactly one of "true" or "false" as value for :isDeprecated, you need to introduce a cardinality axiom, which increases the complexity of reasoning (and you need to find a reasoner that supports cardinality restrictions on datatype properties).
>>>>
>>>> The second model with a class :Deprecated ensures that an entity is either of type :Deprecated, or not. This comes for free with any reasoner that supports a logic as simple as RDFS, without extra axioms. Many more reasoners support axioms made on classes than axioms made on literals and datatype properties. It's easier to define subclasses of deprecated documents, for instance.
>>>>
>>>> In general, when I review an ontology document, I mark every use of boolean properties as a mistake. Usually, boolean properties come from adopting a programming approach to ontology engineering rather than a knowledge representation approach (that is, it uses the ontology as a data structure for computation rather than as an information model about the world, for knowledge interchange).
>>>>
>>>> However, when you have to go back and forth between an existing data model such as tabular data etc. and RDF, it can be convenient to translate booleans to booleans, so there can be exceptions to my rule of thumb of excluding all boolean properties.
>>>>
>>>>
>>>> Best,
>>>> --AZ
>>>>
>>>> On 24/09/2019 at 13:57, Mikael Pesonen wrote:
>>>>> Hi,
>>>>> let's say we have documents and we want to say whether they are valid or deprecated. There are two ways to do this:
>>>>> :doc1 a foaf:Document ;
>>>>>      :isDeprecated "true"^^xsd:boolean .
>>>>> or
>>>>> :doc1 a foaf:Document ;
>>>>>      a :Deprecated .
>>>>> Are there some different implications on the use? Does it affect OWL reasoning, for example?
>>>>> Mikael
>>>>
>>>> -- 
>>>> Antoine Zimmermann
>>>> Institut Henri Fayol
>>>> École des Mines de Saint-Étienne
>>>> 158 cours Fauriel
>>>> CS 62362
>>>> 42023 Saint-Étienne Cedex 2
>>>> France
>>>> Tél:+33(0)4 77 42 66 03
>>>> Fax:+33(0)4 77 42 66 66
>>>> http://www.emse.fr/~zimmermann/
>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>>>>
>>
>
Received on Thursday, 26 September 2019 11:06:40 UTC