Re: rdfs:domain and refs:range in schema.org from Holger Knublauch on 2016-11-26 (public-schemaorg@w3.org from November 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Sun, 27 Nov 2016 08:30:07 +1000
To: public-schemaorg@w3.org
Message-ID: <49531b8b-ae66-ef06-531e-c4d3847023b5@topquadrant.com>
IMHO neither rdfs:domain nor schema:domainIncludes is ideal for 
schema.org. The whole notion of "global" property axioms is 
questionable. schema.org is class-centric and supposed to grow. To 
support its growth, properties should be attached locally to classes, in 
OO style.

rdfs:domain is a global axiom that can be used to infer types in cases 
where no type can be derived from the given instance. In that case, 
looking up the URL of the property itself is a suitable strategy, and 
the URL would deliver the rdfs:domain statement. schema:domainIncludes 
seems to follow this pattern, but without providing particularly useful 
information.
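
A minimal sketch of the inference described above, written in Python over plain tuples rather than a real RDF library. The ex: names are hypothetical and only serve as illustration; the rule itself is the standard RDFS domain entailment: if a property has rdfs:domain C, any subject that uses the property is inferred to be an instance of C.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_domain_types(triples):
    """Apply the RDFS domain rule once.

    For every (p, rdfs:domain, C) and every (s, p, o) in the input,
    produce (s, rdf:type, C) unless it is already asserted.
    """
    triples = list(triples)
    asserted = set(triples)

    # Collect the declared domain classes per property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    # Every use of a property types its subject with each domain class.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            candidate = (s, RDF_TYPE, cls)
            if candidate not in asserted:
                inferred.add(candidate)
    return inferred


# Hypothetical data: ex:alice has no asserted type at all.
graph = {
    ("ex:worksFor", RDFS_DOMAIN, "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}
inferred = infer_domain_types(graph)
```

Note that nothing here rejects data: the rule only ever adds type statements, which is why rdfs:domain behaves as an inference axiom rather than as a constraint.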

Even the global property labels and comments are rather unhelpful, 
because they need to cover all use cases of the property. However, in 
many cases where properties are shared between classes, they in fact 
should have different labels and comments. Picking a random example:

     http://schema.org/creator

which states "The creator/author of this CreativeWork." but its domain 
includes CreativeWork and UserComments. Quite likely this property 
started as a property of CreativeWork and then somebody decided they 
also need some kind of "creator" and the English term produced an 
overlap. The first use of the property should not limit future uses.

IMHO a better way of associating properties with classes would be using 
something like SHACL. In the case of schema:creator this could look like

schema:CreativeWork
     a rdfs:Class ;
     sh:property [
         sh:predicate schema:creator ;
         sh:description "The creator/author of this CreativeWork." ;
         ... other constraints in the context of CreativeWork
     ] ;
     ...

schema:UserComments
     a rdfs:Class ;
     sh:property [
         sh:predicate schema:creator ;
         sh:description "The creator of this comment." ;
         ... other constraints in the context of UserComments
     ] ;
     ...

This allows any future class to reuse the property without having to 
update a global definition that other applications may have gotten to 
rely on. Furthermore, it allows for class-specific constraints and 
definitions, e.g. properties may get different datatypes or cardinalities.
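
To illustrate the point about local attachment, here is a rough Python sketch that models the two property declarations above as a nested dictionary. This is not SHACL and not how schema.org is implemented; the registry layout, the attach and describe helpers, the ex:Dataset class and the max_count key are all assumptions made up for the example.

```python
import copy

# Hypothetical registry: class IRI -> property IRI -> class-local metadata.
shapes = {
    "schema:CreativeWork": {
        "schema:creator": {
            "description": "The creator/author of this CreativeWork.",
        },
    },
    "schema:UserComments": {
        "schema:creator": {
            "description": "The creator of this comment.",
        },
    },
}


def attach(registry, cls, prop, **metadata):
    """Return a copy of the registry with prop attached locally to cls."""
    updated = copy.deepcopy(registry)
    updated.setdefault(cls, {})[prop] = dict(metadata)
    return updated


def describe(registry, cls, prop):
    """Look up the class-local description, or None if prop is not attached."""
    return registry.get(cls, {}).get(prop, {}).get("description")


# A new (hypothetical) class reuses schema:creator with its own wording
# and its own cardinality, without editing any existing entry.
extended = attach(
    shapes,
    "ex:Dataset",
    "schema:creator",
    description="The organisation that compiled this dataset.",
    max_count=1,
)
```

The entries for schema:CreativeWork and schema:UserComments are identical before and after the call, which is the property that matters here: adding a class never changes a definition that other applications may already depend on.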

I would also recommend against going back to rdfs:domain for schema.org. 
Almost nobody in practice understands its semantics.

Regards,
Holger



On 25/11/2016 21:28, Dan Brickley wrote:
> On 24 November 2016 at 04:05, Phil Archer <phila@w3.org> wrote:
>> This presents either a problem or an opportunity (and I'd like to know which
>> is true).
>>
>> The opportunity presented by "domainIncludes" is that you can, I think, use
>> a property on a class that is not listed as a domain. In something I'm doing
>> right now for the European Commission, I want to use schema:openingHours on
>> a schema:ContactPoint. Since the domain of schema:openingHours 'includes'
>> CivicStructure and LocalBusiness, perhaps that's OK? After all, 'includes'
>> suggests it's not an exhaustive list. schema:ContactPoint's suggested
>> schema:hoursAvailable property leads to a more complex
>> schema:OpeningHoursSpecification that is useful for declaring exceptions -
>> and we want to use that too - but it seems overly complex for a simple
>> "usually open Monday to Friday 9 - 5" statement.
>>
>> So here, domainIncludes, as explained by Dan, wins.
>>
>> But... Martin's example shows that's *not* how it's being used. Rather, it's
>> being used as a constraint language, which I regard as a separate thing
>> altogether.
> Martin wrote "instead of throwing a constraint violation error."; I'd
> suggest this should just be a warning that the use is potentially an
> obscure, new or niche usage and that consequently it might not be
> widely understood.
>
>> If I put a schema:openingHours property on a schema:ContactPoint, the
>> structured data tester will say it doesn't understand my data.
> *the* ? There are several, e.g. Gregg's, Google's
> (http://developers.google.com/structured-data/testing-tool)
>
> I would say that Google's structured data testing tool (SDTT) is
> somewhat too strict in its tone, and too needy in its requirements,
> for my taste.
>
> Compare to the most recent language on validation in
> http://schema.org/docs/datamodel.html under "Conformance". This
> elaborates on schema.org's longstanding and pretty tolerant approach
> to conformance.
>
>> Does that
>> mean my data is invalid for all potential data consumers or just the search
>> engines?
> No, neither.
>
>> If the data is actually invalid then I'd say that rangeIncludes and
>> domainIncludes seem to be mis-named. "domainRestrictedToOnly" seems more
>> honest? Or am I missing something?
> Schema.org's domainIncludes and rangeIncludes are pretty weak by
> design. It might be that in many cases we could comfortably enough
> assert rdfs:domain and rdfs:range too. I'm not sure that would add a
> great deal of value, and when you look at the kinds of mistakes and
> errors commonly made in real world data they're often invisible at
> this level of data analysis anyway...
>
> Dan
>
>> Phil
>>
>>
>> ==Dan's reply copied from archive for reference ==
>>
>> We wanted to leave the flexibility to evolve the schemas incrementally
>> without breaking "promises" expressed with RDFS's range/domain, and without
>> adding lots of artificial supertypes to group different types within a
>> common type.
>>
>>
>>
>> == Martin's reply Copied from archive for reference ==
>>
>> Hi Alex:
>>
>> This is because the semantics of RDFS domain and range constructs *imply*
>> additional type membership instead of *constraining* the applicability of a
>> property to a class or value.
>>
>> With RDFS semantics, a domain spec like so
>>
>>      foo:schoolAttended rdfs:domain foo:Human.
>>
>> in combination with the statement
>>
>>      foo:myDog a foo:Dog ;
>>                foo:schoolAttended "ACME High School".
>>
>> implies that
>>
>>      foo:myDog a foo:Human
>>
>> instead of throwing a constraint violation error.
>>
>>
>> Also, if a property has multiple classes as its range or domain, you have to
>> create many useless complex classes in order to avoid unintended type
>> membership inferences:
>>
>> In RDFS, a domain spec like so
>>
>>      foo:yearOfBirth rdfs:domain foo:Human, foo:Dog.
>>
>> in combination with the statement
>>
>>      foo:myDog a foo:Dog ;
>>                foo:yearOfBirth 1971.
>>
>> implies that your dog is a dog and a human:
>>
>>      foo:myDog a foo:Human, foo:Dog.
>>
>> i.e. the intersection of being a dog and human, whatever that is.
>>
>> The only way to avoid this is to use complex class definitions, like so:
>>
>>       foo:yearOfBirth rdfs:domain [ a owl:Class;
>>                                       owl:unionOf (foo:Human foo:Dog) ].
>>
>> which will create many, many of those useless classes in the ontology
>> because of combinatorial effects.
>>
>> Martin
>>
>> -----------------------------------
>> martin hepp  http://www.heppnetz.de
>> mhepp@computer.org          @mfhepp
>>
>>
>>
>>
>>> On 21 Nov 2016, at 16:39, Alex Prut <mail@alexprut.com> wrote:
>>>
>>> Hello all,
>>> I'm looking at the schema.org raw ontology implementation and
>>> documentation, but I can’t find a reason why the ontology was implemented
>>> using the schema:domainIncludes and schema:rangeIncludes properties, instead
>>> of the standard RDFS rdfs:domain and rdfs:range?
>>> Thanks,
>>> Alexandru Pruteanu (M.Sc. in Computer Science at University of Udine)
>>> mail@alexprut.com
>>>
>> --
>>
>>
>> Phil Archer
>> Data Strategist, W3C
>> http://www.w3.org/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
Received on Saturday, 26 November 2016 22:30:48 UTC