Re: Schema.org and OWL from Michael Andrews on 2018-06-15 (public-schemaorg@w3.org from June 2018)

From: Michael Andrews <nextcontent01@gmail.com>
Date: Fri, 15 Jun 2018 21:51:17 +0530
To: Richard Wallis <richard.wallis@dataliberate.com>
Cc: Martin Hepp <mfhepp@gmail.com>, Simon.Cox@csiro.au, Anthony Moretti <anthony.moretti@gmail.com>, Dan Brickley <danbri@google.com>, elf-pavlik@hackers4peace.net, "schema.org Mailing List" <public-schemaorg@w3.org>, Thad Guidry <thadguidry@gmail.com>
Message-ID: <CAF9ZrJ30CF4AknY-SoGeOFtZMCohvM2PyxdROM_K3ec7=cg5Tw@mail.gmail.com>
I know the original query has been answered, and closed.  But I welcome
that the discussion has broadened beyond the original topic.  I’ve learned
much from the discussion.  Some of the broader points raised suggest issues
that the schema.org community should be mindful of moving forward.


Two themes strike me.  One is that the current vocabulary *seems*
idiosyncratic.  I’m not concerned about elegance for the sake of elegance
or academic correctness. I seek coherence simply to understand and learn
the whole package so I can use it for my own needs.  I’ve spent many hours
trying to understand how the vocabulary works, and find myself confused by
it regularly.  It’s not for a lack of effort: I have a shelf of books about
RDF and related topics.  I appreciate that schema.org is more approachable
than most high-level ontologies.  But I have trouble understanding the
patterns of different pieces, which often feel like they were made
piece-meal (even if they weren’t).  Schema.org is confusing for those who
want to understand it as a coherent vocabulary, but who haven’t been
involved with it from its inception, and I fear that chasm isn’t well
understood by core insiders.  To cite one small example: entity types
sometimes are labeled with “Type” in the name (in enumerations), generally
have no “Type” in the name, and sometimes are called a “Specification.”
I’m sure every decision has a good rationale behind it, but it is hard to
understand those decisions without witnessing each, or tracking down
comments made in list discussions or Github.  In short, schema.org is far
from being “self-describing”, and the existing documentation doesn’t
explain how to use many features of the vocabulary.  There is nothing wrong
with being idiosyncratic, but it does place a bigger burden on
documentation.  Every “good enough” solution for a specific situation
requires that much more explanatory documentation.  People try to read
things literally to understand them.


I would like my fellow community members to appreciate there are people
like me who aren’t complete novices, but aren’t insiders who know every
backstory.  I feel there can be an assumption that only a handful of people
need to know the rationale, and everyone else will be fine with the
examples that provide recipes for what to do, even if they don’t
understand why they are crafted the way they are.  Right now, if you want
to understand how to use the “category” property,  you are on your own,
because there’s no recipe, and little explanation.  There are lots of
places like that.


The second theme that this discussion highlights to me is that development
of the vocabulary has been driven by domain-specific tasks.  That’s a very
logical way to prioritize coverage, but the vocabulary may be encountering
some limitations of this approach.  Properties get too closely tied to the
original entity that they were associated with.   Curiously, there is no
central list of properties on the schema.org website (unlike on the GS1
website), so it can be hard to know where a property is being used for a
different entity that might be relevant to one’s own entity.
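
As a rough illustration of what a central property list would offer, here
is a minimal Python sketch that inverts a type-to-properties listing into a
property-to-types index.  The listing is a small hand-written excerpt, not
the real vocabulary: the "height" and "faxNumber" entries follow what is
quoted further down this thread, and the names PROPERTIES_BY_TYPE and
types_by_property are hypothetical.

```python
from collections import defaultdict

# Partial, hand-written excerpt for illustration only (an assumption,
# not a dump of the actual schema.org definitions).
PROPERTIES_BY_TYPE = {
    "MediaObject": ["height"],
    "Person": ["height"],
    "Product": ["height"],
    "VisualArtwork": ["height"],
    "Place": ["faxNumber"],
}


def types_by_property(properties_by_type):
    """Invert {type: [properties]} into {property: [types]}."""
    index = defaultdict(list)
    for type_name, props in properties_by_type.items():
        for prop in props:
            index[prop].append(type_name)
    # Sort the type names so the lookup result is stable.
    return {prop: sorted(names) for prop, names in index.items()}


index = types_by_property(PROPERTIES_BY_TYPE)
print(index["height"])
```

With an index like this, someone modelling a new entity could look up a
property first and see every type it is already associated with.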


From a use case perspective, sometimes the property is the point of
comparison.  Imagine we want to ask, which is faster, a cheetah or an
electric scooter?  We can’t get an answer to that question from the
schema.org vocabulary.  There’s no animal entity, so we can’t specify the
speed of an animal.


I choose this example for two reasons.  First, schema.org needs to allow
for the comparison of common properties of different entity types.  This is
hugely interesting information, which is currently not well-supported.
Schema.org should be a comprehensive vocabulary supporting general
information, not just a collection of dissimilar terms relating to a range
of different domains.


Second, not all use cases will be driven by a single domain, which seems to
be the current assumption.  Some may argue that there’s no animal type
because that’s not commercially significant.  People can get that
information from Wikipedia.  But some publishers do cover a broad range of
general interest information, even animals — and Wikipedia should never be
the sole source of truth. The needs of general interest publishers aren’t
always well served currently.  What use case would require information
about the speed of a cheetah?  Perhaps a learning module for elementary
school students, or even a trivia game for adults.   Applications of the
vocabulary are no longer limited to ecommerce.
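
To make the cheetah-versus-scooter comparison concrete, here is a minimal
Python sketch of the kind of cross-type query I have in mind.  It is built
on assumptions: both records are invented, the numbers are illustrative
only, the cheetah is typed as a plain Thing because there is no animal
type, and attaching a "speed" value to a Thing is my own stretch rather
than anything the documentation suggests.  The function name faster is
hypothetical too.

```python
# Invented JSON-LD-style records; the values are illustrative, not measured.
cheetah = {
    "@context": "https://schema.org",
    "@type": "Thing",  # assumption: no animal type is available
    "name": "Cheetah",
    "speed": {"@type": "QuantitativeValue", "value": 100, "unitCode": "KMH"},
}
scooter = {
    "@context": "https://schema.org",
    "@type": "Vehicle",
    "name": "Electric scooter",
    "speed": {"@type": "QuantitativeValue", "value": 25, "unitCode": "KMH"},
}


def faster(a, b, prop="speed"):
    """Return the name of whichever entity has the larger value for prop.

    The comparison looks only at the shared property and ignores @type.
    """
    qa, qb = a[prop], b[prop]
    if qa["unitCode"] != qb["unitCode"]:
        raise ValueError("units differ; convert before comparing")
    return a["name"] if qa["value"] >= qb["value"] else b["name"]


print(faster(cheetah, scooter))
```

The point of the sketch is that the comparison never needs to know the
entity types; it only needs both entities to carry the same property.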


I make these points not to detract from the success of the schema.org
project, or those involved with it.  But I’d caution against the view that
everything is working fine as it is.  I think the vocabulary needs to make
it easier to support cross-domain applications.


On Fri, Jun 15, 2018 at 5:20 PM, Richard Wallis <
richard.wallis@dataliberate.com> wrote:

> As the author of the originating email in this trail, asking a question
> that [scanning the subsequent 50 plus replies] now feels wildly off topic,
> I think I should add my couple of cents.
>
> Firstly, on the practical needs behind my original appeal for help.
>
> Many thanks to those that were of great help inside and outside of this
> thread.
>
> The latest version of Schema.org (V3.4) was released yesterday.  It
> contains an experimental updated version of the schemaorg.owl file which
> addresses issues such as being understood by tools such as Protégé; the
> representation of domainIncludes & rangeIncludes values in a way those
> tools can understand them and; the implicit inclusion of Text, URL, and
> Role in the rangeIncludes of most properties.  More information can be
> found on the Developers Page <https://schema.org/docs/developers.html#owl>.
>
>
> I feel it is important to emphasise part of the comments on that page:
>
>  “The mapping into OWL is an approximation, and should not be considered
> an authoritative definition for Schema.org’s terms; see datamodel page
> <https://schema.org/docs/datamodel.html> for details. As an experimental
> feature, there are no expectations as to its interpretation by any third
> party tools.”
>
> This experimental file was updated to help view Schema.org in various
> ways, not as a backdoor way of controlling its structure.
>
> Secondly, my brief thoughts on the discussion that has since ensued.
>
> In my work consulting with and helping organisations and individuals
> wishing to introduce Schema.org markup into their web presence and;
> chairing, and participating in, W3C community groups focussed on improving
> Schema.org for specific domains such as tourism, libraries, archives,
> educational courses, qualifications etc., I come across two main barriers
> to understanding for those new to the vocabulary.
>
>    - Those that are new to structured data who have had no exposure to
>    anything like Schema.org.  The concepts of Types, Properties, inheritance,
>    etc., are foreign to them as are the needs to describe Things (entities)
>    and their relationships.  This group (to reference the previous discussion)
>    just want to know which type to use to describe a Car, what properties that
>    Type makes available, and how to use them in a way that is acceptable to a
>    test tool such as Google’s SDTT
>    <https://search.google.com/structured-data/testing-tool>.  If a Car is
>    a type of Product, or not, has no particular relevance to their use of the
>    Type.
>
>    - Those that come from an ontological background that over-concern
>    themselves with the potential implicit semantics inherent in the Type
>    hierarchy, and chosen type names — Is an ExercisePlan really a CreativeWork
>    or not?
>
> As an aside, running a training session containing several members from
> both of those groups can be a very interesting experience!
>
> It has been, and will continue to be, very important to continue to shape the
> vocabulary (by broad pragmatic consensus from those that benefit from it)
> to address the practical needs of those groups whilst not making it
> difficult for the majority.
>
> As pointed out earlier the Schema.org vocabulary, since its inception in
> 2011, has been implemented on 10s of millions of websites on gazillions of
> pages, and is at the heart of the crawling activities of the major search
> engines and their testing tools.  As such its continuing development can be
> analogised to an oil tanker navigating the needs of structured data on the
> web, accepting course corrections to steer a few degrees to port or
> starboard.  In that analogy some of the suggestions in this thread would be
> considered to be a command to steer 90º starboard of our current course.
>
> Note I keep calling Schema.org a *vocabulary*, not an *ontology*, a
> subtle but important distinction I find when talking to others.  It is a
> set of useful Types and properties mainly for describing things on the web
> to aid their discovery and their place in global knowledge graphs. The term
> definition pages on the Schema.org site provide guidance and examples of
> how they might be used. The type hierarchy is a way of grouping loosely
> similar concepts together and inheriting sets of useful properties to aid
> markup.
>
> If we were to start again would Schema.org be very different? In detail
> very probably, but in overall design and approach (considering the overall
> needs it is satisfying) I think not.
>
> These discussions and suggestions are important and valuable to help nudge
> our course in helpful directions.  But we do need to be aware of the
> practical and pragmatic needs of those who don’t know, need or want to know
> what the heck we are talking about here.
>
> So I encourage discussions such as this one, which I inadvertently triggered.
> We just need to take into account that they are at one edge of a very broad
> spectrum of needs and interests of those who will, and have already, gained
> significant benefits from its existence, adoption, and continued evolution.
>
> ~Richard.
>
>
>
>
>
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
>
> On 15 June 2018 at 08:18, Martin Hepp <mfhepp@gmail.com> wrote:
>
>> Hi Anthony:
>>
>> The main thing to see in here is that types in schema.org are mostly
>> used for grouping entities for which the same type of processing by major
>> consumers of such data is appropriate. We are not trying to develop a fully
>> application-agnostic system of types.
>>
>> Many of the contributors of schema.org have been in ontology engineering
>> since the beginning of that discipline, and over time, we have learned that
>> the pure ideal of fully detaching conceptual data models, and namely
>> relationship types and entity types, from any notion of the processing task
>> expected on the data, will not work.
>>
>> I think there is a nice quote by R.V. Guha on this topic somewhere in the
>> list archive, but I don't find it right now.
>>
>> Historically, data structures and algorithms have always been considered
>> a duality in Computer Science. The community that reused the term
>> "ontology" from Philosophy to CS in the 1990s and redefined it as a word
>> for shared conceptual data models that try to represent the "real
>> structures of the world" wanted to decouple data structures from
>> algorithms. While this aim was well intended, it turned out to be a dead
>> end, because you can endlessly debate about what these "real structures of
>> the world" are, as long as you do not have a metric for measuring your
>> achievement.
>>
>> All models, and data models are no exception, are purpose-bound
>> simplifications of a domain of interest. You can only assess the quality of
>> a model with regard to a purpose. It is invalid to criticize a model for
>> being too granular, too coarse, or otherwise deficient, unless this
>> deficiency is observable in the area of application for which the model is
>> intended.
>>
>> Best wishes
>> Martin
>> -----------------------------------
>> martin hepp  http://www.heppnetz.de
>> mhepp@computer.org          @mfhepp
>>
>>
>>
>>
>> > On 15 Jun 2018, at 08:59, <Simon.Cox@csiro.au> <Simon.Cox@csiro.au>
>> wrote:
>> >
>> > 1.       domainIncludes and rangeIncludes are not exhaustive. Multiple
>> values are linked by an open OR, not exclusive AND (major difference to
>> RDFS)
>> > 2.       It’s OK to be a member of more than one class. It’s OK for
>> something to be both a Product and CreativeWork.
>> >
>> > From: Anthony Moretti [mailto:anthony.moretti@gmail.com]
>> > Sent: Friday, 15 June, 2018 16:30
>> > To: Dan Brickley <danbri@google.com>
>> > Cc: Martin Hepp <mfhepp@gmail.com>; elf Pavlik <
>> elf-pavlik@hackers4peace.net>; public-schemaorg@w3.org; Thad Guidry <
>> thadguidry@gmail.com>
>> > Subject: Re: Schema.org and OWL
>> >
>> > Thanks for the links guys.
>> >
>> > I'm definitely not trying to make Schema into "one true logical model
>> of the world", I do always think it's worthy to strive for simplicity and
>> consistency though, something maybe similar in intention to code
>> refactoring.
>> >
>> > Here is a problem that exists now though because of overly specific
>> domains - if I want to describe the height of the Eiffel Tower, a Place,
>> I'd want to use the "height" property, but the only types "height" can be
>> used on are MediaObject, Person, Product, and VisualArtwork. I completely
>> get the volcano-with-fax-number approach, and I'm actually a big fan of it,
>> that's why I propose moving properties such as "height" to Thing. A
>> guideline that Schema might be able to apply here could take inspiration
>> from the rule of three - whenever a property is used on more than two types
>> move it to the parent type. Using this guideline "height" would be on
>> Thing, and could then be used to describe the Eiffel Tower.
>> >
>> > I'll end now with one final suggestion, I realize it probably has no
>> chance of going anywhere, but I'll put it out there for consideration
>> anyway. After moving those properties to Thing I realized that because
>> CreativeWork, Product, and Intangible don't have clear definitions all they
>> do is add complexity (how many times is it asked whether products are also
>> creative works and vice versa). It would arguably be simpler to have all
>> their properties on Thing and ThingType. This is in line with the
>> volcano-with-fax-number approach, and would give great flexibility.
>> >
>> > Thanks for all the discussion!
>> >
>> > Anthony
>> >
>> > On Thu, Jun 14, 2018 at 4:09 PM Dan Brickley <danbri@google.com> wrote:
>> > On Thu, 14 Jun 2018 at 15:19, Anthony Moretti <
>> anthony.moretti@gmail.com> wrote:
>> > I think Martin's point about passing information from product types to
>> product instances can be addressed higher in the hierarchy than Product
>> actually. I sense people are opposed to shifting properties from more
>> specific types to Thing though (maybe I don't understand something, can
>> someone please explain that to me?) My view is that using overly specific
>> domains for properties causes strange entailment, e.g. in its current form
>> the "height" property entails the subject is either a MediaObject, Person,
>> Product, or VisualArtwork, which doesn't seem right.
>> >
>> > On this point - "e.g. in its current form the "height" property entails
>> the subject is either a MediaObject, Person, Product, or VisualArtwork,
>> which doesn't seem right." -- we don't really say that anywhere, and in
>> fact we created looser variants of rdfs domain/range for documentation, to
>> avoid saying more than we wanted to. On the contrary, in
>> http://schema.org/docs/datamodel.html  -
>> >
>> > "When we list the expected types associated with a property (or
>> vice-versa) we aim to indicate the main ways in which these terms will be
>> combined in practice. This aspect of schema.org is naturally imperfect.
>> For example the schemas for Volcano suggest that since volcanoes are
>> places, they may have fax numbers. Similarly, we list the unlikely (but not
>> infeasible) possibility of a Country having "opening hours". We do not
>> attempt to perfect this aspect of schema.org's structure, and instead
>> rely heavily on an extensive collection of illustrative examples that
>> capture common and useful combinations of schema.org terms. The
>> type/properties associations of schema.org are closer to "guidelines"
>> than to formal rules, and improvements to the guidelines are always
>> welcome."
>> >
>> > In this regard, you might view this aspect of Schema.org as being
>> closer to the "The Code is more what you call guidelines, than actual
>> rules" tradition of the Pirates of the Caribbean than the expectations you
>> might bring from the OWL world, even if we target much the same underlying
>> data model.
>> >
>> > If this might seem less than helpful, I'd suggest a possible
>> middle-ground would be to explore the RDF validation languages - SHACL and
>> ShEx - which suggest ways of layering certain kinds of discipline over
>> messy RDF data. It doesn't address all the modeling concerns raised here,
>> but does offer another layer of expressivity which needn't happen in the
>> core project.  You could look at
>> https://www.topquadrant.com/technology/shacl/tutorial/ or
>> http://book.validatingrdf.com/ -- e.g.
>> http://datashapes.org/schema attempts to capture some of schema.org
>> itself in SHACL, whereas https://github.com/SEMICeu/dcat-ap_shacl/ (in
>> SHACL) and https://github.com/SEMICeu/dcat-ap_shacl/issues/32 (in ShEx)
>> try to capture specific useful community-specific patterns for describing
>> datasets. These languages let people say things about Schema.org data
>> structures, beyond what the project itself chooses to say. For example by
>> constructing and documenting more tidy-minded subsets/profiles, or mixing
>> it with longer tail vocabularies (like Wikidata's e.g. see Thad and
>> friends' mappings) or richer domain models e.g. from the sciences, and
>> explaining sensible patterns for these combinations. You could look at what
>> the Blue Brain project are doing there, for example -
>> https://github.com/BlueBrain/nexus-kg/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+shacl
>> or the ShEx efforts around HL7/FHIR,
>> https://www.hl7.org/fhir/medication.shex.html
>> >
>> > That kind of perspective I think makes two points. One is that
>> Schema.org's modeling style and hierarchical structure is not the only
>> place where discipline can be exercised usefully; and the second is that
>> more "knowledge graphy" usecases (beyond simple Web markup) are likely to
>> engage with other vocabularies and systems (e.g. scientific domains or
>> general like Wikidata), in which case we're unlikely to see a unified
>> modeling style across it all, and will likely end up focussing - again - on
>> documenting usefully re-usable patterns that address particular situations.
>> >
>> > Dan
>>
>>
>>
>
Received on Friday, 15 June 2018 16:21:44 UTC