Re: Using boolean value vs class from Michael F Uschold on 2019-09-25 (semantic-web@w3.org from September 2019)

From: Michael F Uschold <uschold@gmail.com>
Date: Wed, 25 Sep 2019 16:38:26 -0700
To: Patrick J Hayes <phayes@ihmc.us>
Cc: Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <CADfiEMNQiBa76u1h=7Yk6n2aYS7P9LXHrVcMAm_c-g2HfkXZbQ@mail.gmail.com>

Pat, thanks for raising those questions.

> Seems to me that you are imposing an alien, and unnecessary, intuition
> onto OWL which is not there in either the design philosophy or the
> specifications of the actual language.

You are correct that one can use an owl:Class for anything that is a set.  I
(and my colleagues) have made choices that work well for us for developing
and populating enterprise ontologies deployed as RDF graphs for our
clients.  So it is one style of OWL modeling.

Our clients typically have many facets and taxonomies, often with thousands
of nodes.  These things often represent sets, so as you say, one could use
OWL classes to represent them. This results in a gigantic class hierarchy
which is difficult to view in Protégé.  Also, when the taxonomy nodes are
only used as tags, there is nothing interesting to say about them other
than what things they categorize.  So we choose to use owl:Class for things
that we reasonably expect to say things about in a triple store.  That way,
the interesting things that we are carefully modeling out with restrictions
are more easily managed.

> [regarding using fewer vs. more properties] That seems to me to be a
> problem rather than an advantage.

It's a double-edged sword.  The advantage is that the person trying to
understand and populate the ontology has far fewer things to remember. A
disadvantage is that it makes SPARQL queries a bit longer, since they must
also find the type of the thing that is the object.
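
To make that trade-off concrete, here is a minimal sketch in plain Python
(not SPARQL), with triples held as tuples. The names hasColor,
isCategorizedBy, Color and DeprecationIndicator come from this thread; the
ex: prefix, the sample individuals and the two helper functions are
hypothetical and only for illustration.

```python
# Triples as (subject, predicate, object) tuples; all ex: data is made up.
RDF_TYPE = "rdf:type"

# Style A: one property per facet.
per_facet = {
    ("ex:car1", "ex:hasColor", "ex:rose"),
    ("ex:car1", "ex:hasDeprecationIndicator", "ex:isDeprecated"),
}

# Style B: a single generic property, plus the type of each category value.
generic = {
    ("ex:car1", "ex:isCategorizedBy", "ex:rose"),
    ("ex:car1", "ex:isCategorizedBy", "ex:isDeprecated"),
    ("ex:rose", RDF_TYPE, "ex:Color"),
    ("ex:isDeprecated", RDF_TYPE, "ex:DeprecationIndicator"),
}


def colors_per_facet(triples, subject):
    """One triple pattern: ?s ex:hasColor ?c."""
    return {o for s, p, o in triples if s == subject and p == "ex:hasColor"}


def colors_generic(triples, subject):
    """Two patterns joined on ?c: ?s ex:isCategorizedBy ?c . ?c a ex:Color."""
    values = {o for s, p, o in triples
              if s == subject and p == "ex:isCategorizedBy"}
    colors = {s for s, p, o in triples if p == RDF_TYPE and o == "ex:Color"}
    return values & colors


print(colors_per_facet(per_facet, "ex:car1"))
print(colors_generic(generic, "ex:car1"))
```

Both calls find the same color. The generic style needs the extra type
check on the object, which is the "a bit longer" part, but the person
populating the data only has to remember one property.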

Would love to see you sometime and dive into some of the interesting
details you raise.
Michael

On Tue, Sep 24, 2019 at 11:24 PM Patrick J Hayes <phayes@ihmc.us> wrote:

> Michael, I have some questions/comments, in-line below.
>
> On Sep 25, 2019, at 12:11 AM, Michael F Uschold <uschold@gmail.com> wrote:
>
> This is an important discussion, as this modeling question arises all the
> time.
>
> I agree that Boolean data properties are not a great option. This is
> explained in this blog:  Why Not to Use Boolean Datatypes in Taxonomies
> <https://www.semanticarts.com/why-not-to-use-boolean-datatypes-in-taxonomies/>
> by Dave McComb.
>
> OWL inference may be a red herring here.  You may not be running OWL
> inference over a large ABox of documents?  More likely, you are just
> going to run inference on the TBox and then load triples into a triple
> store and use whatever reasoning is provided by that vendor (highly
> variable, and certainly not OWL2-DL).
>
> Creating a class called Deprecated will work, but may not be the best
> solution. First, it goes against common practice for naming a class. Common
> names for classes include “Person” and “Document”. An instance of the first
> class is a person. An instance of the second class is a document. However
> an instance of your proposed class is not a ‘deprecated’. Rather it is a
> deprecated thing.  If you named the class DeprecatedThing, the naming
> convention would be respected. However, that’s not a very satisfying class.
> The reason points to a more fundamental issue, aside from naming.
>
> The main purpose of an OWL class is to represent a set of things that are
> all the same kind (person, document).
>
> Really? Where do you get this from? I don’t see anything in any OWL spec
> which implies this. OWL classes are just sets of things, and you can make a
> set for any purpose.
>
> Nobody thinks of being deprecated as signifying a different kind of thing.
>  It’s more analogous to a tag for a photo.  If you tag a photo with
> “winter”, this gives rise to a set of things tagged with “winter”.   One
> could represent that set as an OWL class, just like one could represent
> the class of all deprecated things with the class DeprecatedThing.  But
> these sets do not represent a kind of thing one would want to represent as
> an OWL class.  Rather, being deprecated or not is a characteristic or
> facet of a thing.
>
> What is the difference between a characteristic of a thing and a class (of
> things with that characteristic) that the thing might be in? Can you
> characterize that distinction?
>
> Documents and products and a lot of things can have many facets.
>
> And they can be in, or not in, many classes.
>
> Seems to me that you are imposing an alien, and unnecessary, intuition
> onto OWL which is not there in either the design philosophy or the
> specifications of the actual language.
>
> There is a third alternative that we use in our enterprise ontologies.  I
> would create a class called say DeprecationIndicator with two instances:
> isDeprecated and isNotDeprecated.   These are really categories:  something
> is deprecated or not.
>
> But by encoding what are really Booleans as values you are introducing the
> problems that Antoine describes. What is gained by this rather than simply
> having the classes? As you say, they are really categories, and surely a
> category is best thought of as a class.
>
> There are typically many such facets
>
> This whole language of facets seems like simply a re-statement of the OWL
> description-logic intuition in a different metalanguage. What is a ‘facet’
> other than a property?
>
> and each has a set of values.   There might be a facet called Color for
> cars or iPhones.  An individual car or phone would have a color and there
> could be several instances of the class Color (rose, midnight green, etc).
>
>
> Cars and iphones are in the domain of the hasColor property, and its
> values are things like rose, etc., in the class of Colors. Pure OWL; but I
> would use this basic structure in a language as expressive as Common Logic.
>
> An advantage of this approach is that you avoid unnecessary proliferation
> of properties, one for each facet. You do not need two properties one for
> hasColor and one for hasDeprecationIndicator. Rather you can just use a
> single property, say isCategorizedBy.
>
> That seems to me to be a problem rather than an advantage. Why does a
> color ‘categorize’ something? That violates my intuition rather sharply.
> But in any case, your isCategorizedBy is just a very high superproperty of
> all the more precise ‘characterizing’ properties, which could be defined as
> restrictions to particular range classes if one really wanted to do things
> in such an opaque way, so (A hasColor rose) just means (A isCategorizedBy
> rose) & (Color rose). (In CL you could use the same name for the class and
> the property, just to keep things notationally simpler. I believe one can
> do the same kind of punning in OWL2.)
>
> This is further explained in this blog: Buckets, Buckets Everywhere, Who
> Knows What to Think?
> <https://www.semanticarts.com/gist-buckets-buckets-everywhere-who-knows-what-to-think/>
> by yours truly.
>
> Well, that says that you like to make these distinctions, but it doesn’t
> explain why, or what advantages might accrue from adopting this unintuitive
> discipline.
>
> Best wishes
>
> Pat
>
>
>
>
> On Tue, Sep 24, 2019 at 7:03 AM Hugh Glaser <hugh@glasers.org> wrote:
>
>> Very interesting question, thanks - it helps me explore my understanding.
>> Sorry - as I have said, I'm not really very good on this stuff, but I do
>> like to try to understand.
>>
>> Antoine, some of what you say puzzles me.
>> Looking at class :Deprecated
>> > The second model with a class :Deprecated ensures that an entity is
>> > either of type :Deprecated, or not.
>> Is it not more properly the case that an Entity is either of type
>> :Deprecated or we don't know? (Open world)
>>
>> So the boolean version seems to perhaps give me a richer way of recording
>> knowledge.
>>
>> To model the boolean equivalent, you could also have a :notDeprecated
>> class.
>> And then you would have the same four categories for the class version as
>> you have for the boolean version.
>> (Not saying this is good!)
>>
>> [Hang on - I have just realised that Mikael makes no suggestion that he
>> will ever assert "false" - so your introducing the "false" categories (3 &
>> 4) is like me introducing the :notDeprecated class.]
>>
>> Although I worry about your argument here, I think that the general
>> principle may well be very good.
>> If you see booleans, especially where they always seem to be "true", it
>> is a flag that maybe a class should be used.
>> (This is very similar to seeing "= true" in an expression in a
>> programming language, someone isn't thinking right :-) )
>>
>> I usually view an rdf:type triple as nothing special compared with any
>> other.
>> You assert them and match them just the same.
>> It just so happens that "we" have chosen that we can do sub-classing, and
>> so if we do that, we get some special magic that can happen, which doesn't
>> happen with everything else.
>> And that is sometimes very useful, although it can make things quite hard
>> to get the hang of.
>>
>> Then, as you say, there are a whole bunch of practical questions about
>> efficiency of stores and reasoners when you do things in different ways.
>> But, as with programming, most efficiency things should be left to the
>> system implementation, and the source should be modelled in the most
>> understandable and maintainable way.
>>
>> Best
>> Hugh
>>
>> > On 24 Sep 2019, at 13:48, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
>> >
>> > Mikael,
>> >
>> >
>> > These two options definitely affect reasoning.
>> >
>> > If you have a property :isDeprecated, then any entity can fall into 4
>> > disjoint categories:
>> >
>> > 1. The entities that have no value for :isDeprecated.
>> > 2. The entities that have value "true" only.
>> > 3. The entities that have value "false" only.
>> > 4. The entities that have both values "true" and "false".
>> >
>> > Moreover, if the range of the property is unrestricted, it can have all
>> > sorts of literal values, in any combination.
>> >
>> > If you want to make sure that all entities have exactly one of "true"
>> > or "false" as value for :isDeprecated, you need to introduce a cardinality
>> > axiom, which increases the complexity of reasoning (and you need to find a
>> > reasoner that supports cardinality restrictions on datatype properties).
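
The four categories listed above can be sketched in plain Python over
asserted triples. This is a closed-world check on the data, not OWL
reasoning, and it is only an illustration: the ex: IRIs, the sample
documents and both function names are hypothetical, and literals are
simplified to the strings "true" and "false".

```python
from collections import defaultdict

PROP = "ex:isDeprecated"  # hypothetical IRI for the :isDeprecated property


def categorize(triples, entities, prop=PROP):
    """Map each entity to category 1-4 from the list above, or None when it
    carries some other literal (possible if the range is unrestricted)."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)

    by_values = {
        frozenset(): 1,                   # no value at all
        frozenset({"true"}): 2,           # "true" only
        frozenset({"false"}): 3,          # "false" only
        frozenset({"true", "false"}): 4,  # both values
    }
    return {e: by_values.get(frozenset(values.get(e, ()))) for e in entities}


def lacks_exactly_one_boolean(triples, entities, prop=PROP):
    """Entities that do not have exactly one of "true" or "false"."""
    cats = categorize(triples, entities, prop)
    return {e for e, c in cats.items() if c not in (2, 3)}


# Made-up sample data: doc4 has no value, doc5 has a non-boolean literal.
triples = {
    ("ex:doc1", PROP, "true"),
    ("ex:doc2", PROP, "false"),
    ("ex:doc3", PROP, "true"),
    ("ex:doc3", PROP, "false"),
    ("ex:doc5", PROP, "maybe"),
}
entities = ["ex:doc1", "ex:doc2", "ex:doc3", "ex:doc4", "ex:doc5"]

categories = categorize(triples, entities)
problems = lacks_exactly_one_boolean(triples, entities)
print(categories)
print(sorted(problems))
```

With the class-based model there is nothing comparable to check: an entity
either has the rdf:type triple asserted or it does not.
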
>> >
>> > The second model with a class :Deprecated ensures that an entity is
>> > either of type :Deprecated, or not. This comes for free with any reasoner
>> > that supports a logic as simple as RDFS, without extra axioms. Many more
>> > reasoners support axioms made on classes than axioms made on literals and
>> > datatype properties. It's easier to define subclasses of deprecated
>> > documents, for instance.
>> >
>> > In general, when I review an ontology document, I mark all uses of
>> > boolean properties as a mistake. Usually, boolean properties come from
>> > adopting a programming approach to ontology engineering rather than a
>> > knowledge representation approach (that is, it uses the ontology as a data
>> > structure for computation rather than as an information model about the
>> > world, for knowledge interchange).
>> >
>> > However, when you have to go back and forth between an existing data
>> > model such as tabular data etc. and RDF, it can be convenient to translate
>> > booleans to booleans, so there can be exceptions to my rule of thumb of
>> > excluding all boolean properties.
>> >
>> >
>> > Best,
>> > --AZ
>> >
>> > Le 24/09/2019 à 13:57, Mikael Pesonen a écrit :
>> >> Hi,
>> >> let's say we have documents and we want to say whether they are valid
>> >> or deprecated. There are two ways to do this:
>> >> :doc1 a foaf:Document ;
>> >>     :isDeprecated "true"^^xsd:boolean .
>> >> or
>> >> :doc1 a foaf:Document ;
>> >>     a :Deprecated .
>> >> Are there some different implications on the use? Does it affect OWL
>> >> reasoning, for example?
>> >> Mikael
>> >
>> > --
>> > Antoine Zimmermann
>> > Institut Henri Fayol
>> > École des Mines de Saint-Étienne
>> > 158 cours Fauriel
>> > CS 62362
>> > 42023 Saint-Étienne Cedex 2
>> > France
>> > Tél:+33(0)4 77 42 66 03
>> > Fax:+33(0)4 77 42 66 66
>> > http://www.emse.fr/~zimmermann/
>> > Member of team Connected Intelligence, Laboratoire Hubert Curien
>> >
>>
>> --
>> Hugh
>> 023 8061 5652
>>
>>
>>
>

-- 

Michael Uschold
   Senior Ontology Consultant, Semantic Arts
   http://www.semanticarts.com
   LinkedIn: www.linkedin.com/in/michaeluschold
   Skype, Twitter: UscholdM
Received on Wednesday, 25 September 2019 23:39:27 UTC