Re: comments on section 7.4 from Antoine Isaac on 2015-01-23 (public-dwbp-wg@w3.org from January 2015)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Fri, 23 Jan 2015 01:48:00 +0100
To: <public-dwbp-wg@w3.org>
Message-ID: <54C19A40.6000301@few.vu.nl>
Hi João Paulo, Ig,

Thanks for the comments!
I have committed a new version that tries to address some of them. See my reactions below.




>     I would like the first paragraph to be simplified; it would come back in a later version when we have settled the discussion in that other thread (how to get from data representation to vocabularies).


The wording of that paragraph can be improved and your suggestions have helped a lot.
That said, I'm uncomfortable with 'simplification' if it means 'removing the examples'. As I've hinted in the other thread, it's very likely that whatever term we choose for 'vocabulary', there will be people for whom it won't be intuitive, and examples will help.



>
>     It currently reads:
>     “Datasets often resort to a range of vocabularies in the data they contain: data is entered or captured in a controlled way, i.e., positions in a data graph (or column in a relationship table) are explicitly defined, the name of a person, the subject of a book, a relationship “knows” between two persons. Additionally, for certain positions, the values used should come from a limited set of pre-existing resources: for example object types, roles of a person, countries in a geographic area, or possible subjects for books. Such vocabularies ensure a level of control, standardization and interoperability in the data. They can also provide a way to easily create richer data. Say, a dataset contains a reference to a concept described in several languages. This reference allows applications to localize their display of their search depending on the language of the user."
>
>     In my opinion there are some imprecisions (what are positions in a graph? What is richer data?), so I would prefer the following simplification:
>     “Data is often represented in a structured way making reference to a range of vocabularies: data is represented in a controlled way, e.g. by defining types of nodes and links in a data graph or types of values for columns in a table. Additionally, the values used may come from a limited set of pre-existing values or resources: for example object types, roles of a person, countries in a geographic area, or possible subjects for books. Such vocabularies ensure a level of control, standardization and interoperability in the data."
>
>
> I think the way you summed up is OK.


I have taken the new wording for positions in the graphs, columns, etc. Much clearer!


> However, I do not see any problem to keep the second part related to "Richer Data", because an example was given for explaining that.


I agree, I've kept it.

>
>     I would also not like the terms “light-weight” and “heavy-weight” ontologies to be used in the way they are being used. The text currently says that:
>
>     "The first means offered by W3C for creating (“light-weight”) ontologies is the RDF Schema <http://www.w3.org/standards/techs/rdf#w3c_all> language. It is possible to define more complex (“heavy-weight”) ontologies with advanced axioms using languages such as The Web Ontology Language OWL <http://www.w3.org/standards/techs/owl#w3c_all>.”
>
>     There is a lot of literature on ontologies that calls ontologies in OWL "light-weight ontologies", given the low expressiveness of description logics when compared to other approaches for ontology specification (e.g., first-order logics). Heavyweight ontologies would be formal ontologies written with expressive languages for off-line use (also called “reference ontologies”). See Guizzardi’s thesis for a very good discussion on this: http://www.inf.ufes.br/~gguizzardi/OFSCM.pdf
>
>     My suggestion is to replace this text by:
>     "The first means offered by W3C for creating ontologies is the RDF Schema <http://www.w3.org/standards/techs/rdf#w3c_all> language. It is possible to define more expressive ontologies with additional axioms using languages such as those in The Web Ontology Language OWL <http://www.w3.org/standards/techs/owl#w3c_all> family.”
>
> I perfectly understand what you are talking about. And, as you know, there isn't a consensus in the ontology community about the right definition for light-weight and heavy-weight ontologies. For example, you can see Mizoguchi's tutorial [1] about the type of ontologies.
> [1] http://www.unipamplona.edu.co/unipamplona/portalIG/home_23/recursos/general/06032011/onto_parte1.pdf
> For this reason, and due to the deadline, I think we could skip this conceptual discussion, and the way you proposed is quite nice for me.


I am glad to remove lightweight and heavyweight, really, even though I too have seen them applied in the cases that were described in the text.


>
>
>     BP12, possible approach to implementation:
>     Add that diagrams may also serve the purpose of documenting vocabularies. An example is the use of a subset of UML to represent the W3C Org Ontology. (By the way, we had certain conventions established in GLD to define the UML diagram which could be part of a detailed BP for this.)
>
>
> +1, but I think you are talking about BP11, right?


Diagrams are an excellent suggestion! More details on GLD conventions (or just a pointer) could be helpful indeed, but I don't have time to dig them up.

>
>
>     *I would seriously hope that Best Practice 16 is removed altogether.* It has a number of statements with which I strongly disagree, and is too biased against formalization.
>
>     It is biased because it says things such as "Unnecessarily complex vocabularies cost more efforts to produce and are less likely to be re-used in other datasets. “ but there is no reference to the other side of the coin, which would be that “overly simplistic vocabularies may fail to establish shared meaning to enable semantic interoperability”.  It is because of the lack of expressiveness of schema languages like XML Schema that we now have RDF(S) and OWL(S)…
>
>     It also says that "Resources that are equiped with a strong, formal semantics are less clear (harder to understand) for any data re-user.” I can’t really understand this. It is too strong a generalization. Why would formal semantics be directly opposed to clarity? Formal semantics may help one to establish more precise specifications… which would support establishing the intended meaning of the vocabulary. So the whole point is obviously identifying the right level of formalization for particular tasks (and possibly having a number of related formalisms when one size does not fit all)! And of course presenting the ontology in a way that users can understand it (for example, with diagrams that do not require the user to read through all axioms – again see W3C ORG Ontology for an example).
>
>
> +1. I totally agree with João Paulo about this issue. The level of formalization depends on several aspects, such as the intended audience, domain, kind of use, and so on... We can see different scenarios with different levels of formalization...
> As we are proposing Best Practices, I think it is very strong to make such a recommendation. For this reason, I agree with João Paulo to remove the BP16.


This BP is 'do not overformalize vocabularies', not 'never formalize vocabularies'! I agree with you that formalization is useful in general. It's just that it shouldn't be overused.
The point is indeed to find the right level, and I think that matches Ig's point on audience, domain, kind of use, etc. pretty well.

I have tried to do some re-wording along the lines you suggest, because I believe our perspectives are not fully incompatible. Some of the sentences were indeed confusing, and Carlo's suggestions helped me a lot to clarify them. I've even changed the title.

If you still fully disagree with having the BP included, then we should remove it. If you agree with the general idea but still dislike the way it is expressed, it would seem fair to keep it while raising a formal issue in the document, calling for readers to support or reject the best practice, or to contribute enhancements.

Right now I have put in an issue on whether the BP should be rewritten in a more technology-neutral way. I really don't have the time to do more today, sorry...

Best,

Antoine
Received on Friday, 23 January 2015 00:48:31 UTC