comments on section 7.4 from João Paulo Almeida on 2015-01-22 (public-dwbp-wg@w3.org from January 2015)

From: João Paulo Almeida <jpalmeida@ieee.org>
Date: Thu, 22 Jan 2015 14:42:26 -0200
To: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-ID: <D0E6BCD2.99ED0%jpalmeida@ieee.org>
Dear All,

I understand Carlos's concerns that we do not have time for a full discussion
of the concepts underlying the BP document, but I would not like section 7.4
to be sent "out there" in its present form.

I would like the first paragraph to be simplified; it would come back in a
later version when we have settled the discussion in that other thread (how
to get from data representation to vocabularies).

It currently reads:
"Datasets often resort to a range of vocabularies in the data they contain:
data is entered or captured in a controlled way, i.e., positions in a data
graph (or column in a relationship table) are explicitly defined, the name
of a person, the subject of a book, a relationship "knows" between two
persons. Additionally, for certain positions, the values used should come
from a limited set of pre-existing resources: for example object types,
roles of a person, countries in a geographic area, or possible subjects for
books. Such vocabularies ensure a level of control, standardization and
interoperability in the data. They can also provide a way to easily create
richer data. Say, a dataset contains a reference to a concept described in
several languages. This reference allows applications to localize their
display of their search depending on the language of the user."

In my opinion there are some imprecisions (what are positions in a graph?
What is richer data?), so I would prefer the following simplification:
"Data is often represented in a structured way making reference to a range
of vocabularies: data is represented in a controlled way, e.g. by defining
types of nodes and links in a data graph or types of values for columns in a
table. Additionally, the values used may come from a limited set of
pre-existing values or resources: for example object types, roles of a
person, countries in a geographic area, or possible subjects for books. Such
vocabularies ensure a level of control, standardization and interoperability
in the data."

I would also not like the terms "light-weight" and "heavy-weight" ontologies
to be used in the way they are being used. The text currently says that:

"The first means offered by W3C for creating ("light-weight") ontologies is
the RDF Schema <http://www.w3.org/standards/techs/rdf#w3c_all> language. It
is possible to define more complex ("heavy-weight") ontologies with advanced
axioms using languages such as The Web Ontology Language OWL
<http://www.w3.org/standards/techs/owl#w3c_all>."

There is a lot of literature on ontologies that calls ontologies in OWL
"light-weight ontologies", given the low expressiveness of description
logics when compared to other approaches for ontology specification (e.g.,
first-order logic). Heavyweight ontologies would be formal ontologies
written with expressive languages for off-line use (also called "reference
ontologies"). See Guizzardi's thesis for a very good discussion on this:
http://www.inf.ufes.br/~gguizzardi/OFSCM.pdf

My suggestion is to replace this text with:
"The first means offered by W3C for creating ontologies is the RDF Schema
<http://www.w3.org/standards/techs/rdf#w3c_all> language. It is possible to
define more expressive ontologies with additional axioms using languages
such as those in The Web Ontology Language OWL
<http://www.w3.org/standards/techs/owl#w3c_all> family."

BP12, possible approach to implementation:
Add that diagrams may also serve the purpose of documenting vocabularies. An
example is the use of a subset of UML to represent the W3C Org Ontology. (By
the way, we had certain conventions established in GLD to define the UML
diagram which could be part of a detailed BP for this.)

I would seriously hope that Best Practice 16 is removed altogether. It has a
number of statements with which I strongly disagree, and is too biased
against formalization.

It is biased because it says things such as "Unnecessarily complex
vocabularies cost more efforts to produce and are less likely to be re-used
in other datasets." but there is no reference to the other side of the
coin, which would be that "overly simplistic vocabularies may fail to
establish shared meaning to enable semantic interoperability". It is
because of the lack of expressiveness of schema languages like XML Schema
that we now have RDF(S) and OWL(S)...

It also says that "Resources that are equiped with a strong, formal
semantics are less clear (harder to understand) for any data re-user." I
can't really understand this. It is too strong a generalization. Why would
formal semantics be directly opposed to clarity? Formal semantics may help
one to establish more precise specifications, which would support
establishing the intended meaning of the vocabulary. So the whole point is
obviously identifying the right level of formalization for particular tasks
(and possibly having a number of related formalisms when one size does not
fit all)! And of course presenting the ontology in a way that users can
understand (for example, with diagrams that do not require the user to
read through all axioms; again, see the W3C ORG Ontology for an example).

Best regards,
João Paulo
Received on Thursday, 22 January 2015 16:42:56 UTC