
Re: schema.org as it could be

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 07 Jan 2014 15:13:40 -0800
Message-ID: <52CC8A24.9060001@gmail.com>
To: Gregg Kellogg <gregg@greggkellogg.net>
CC: "public-vocabs@w3.org" <public-vocabs@w3.org>

On 01/06/2014 02:57 PM, Gregg Kellogg wrote:
> On Jan 6, 2014, at 1:55 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>> [I've snipped out the parts of my original message that are not relevant to the reply.]
>> On 01/06/2014 01:08 PM, Gregg Kellogg wrote:
>>> On Jan 6, 2014, at 12:33 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>> schema.org As It Should Be
>>>> This is a pre-formal account of schema.org and schema.org content as I think
>>>> it should be.
>> [...]
>>>> The types with strict generalization ancestor http://schema.org/Literal are
>>>> datatypes.  All the data values with the datatype as a direct type are
>>>> described in the datatypes defining document *(fragment)*. The datatypes
>>>> are http://schema.org/Boolean, http://schema.org/FloatingPointNumber,
>>>> http://schema.org/Integer, http://schema.org/Text, http://schema.org/URL,
>>>> http://schema.org/Date, http://schema.org/DateTime, and
>>>> http://schema.org/Time.
>>> Note that http://schema.org/Duration is really an exception here. Nominally, it is roughly equivalent to xsd:duration, although in examples, the leading "P" is omitted. Also, it is a subclass of http://schema.org/Quantity < http://schema.org/Intangible < http://schema.org/Thing, not http://schema.org/Literal. Really, I think this is a mis-categorization, and http://schema.org/Duration should be a subclass of http://schema.org/Literal.
>> .
>> Hmm.  I don't see any indication in schema.org that .../Duration is a datatype.  If it is supposed to be then there should be some fixup done on schema.org for it and probably the other quantities as well.
> If you dereference http://schema.org/Duration, it says that it is in ISO 8601 duration format, which is equivalent to xsd:duration.

Yeah, OK, this could be read to indicate that .../Duration is indeed a 
datatype.  However, that contradicts two facts: .../Duration is not a 
subclass of .../Literal, and it is a subclass of .../Thing.
>>> In my linter, I mostly equate the schema literals with their XSD equivalents, which I think is the intention. In practice, I've never seen any markup which creates a schema:Text datatype for a plain literal. It would certainly be useful if there was the equivalent of owl:sameAs relating the schema.org datatypes to their XSD equivalents, where they exist.
>> It might be useful to have some indication in schema.org of just what the datatypes are.  I didn't delve too deeply into data values, but I agree that using the XSD types is a good idea.
>>>> [...]
>>>> Bare text can be used as if it was the value for any property.  If the
>>>> property does not have http://schema.org/Text or http://schema.org/Literal
>>>> as one of its ranges, but does have one or more datatypes as a range that
>>>> have a data value that can be written as the bare text then the actual value
>>>> for the property is one of these data values.  If the property does not have
>>>> http://schema.org/Text or http://schema.org/Literal as one of its ranges,
>>>> and does not have any suitable datatypes as a range, but does have one or
>>>> more non-datatypes as a range, then the actual value for the property is
>>>> some item that has a type that is one of these ranges and this item has the
>>>> text as a value of its http://schema.org/description property.  Otherwise
>>>> the actual value for the property is the bare text itself.
>>> I think this is an error; there are other cases where a property has schema:rangeIncludes of both object and literal types (see http://schema.org/citation having both CreativeWork and Text in the range). I think that, to be valid, if schema:Text is allowed in the range, it should be in rangeIncludes.
>> (What is the error that you are claiming here - that I've made some mistake in interpreting the conformance section in http://schema.org/docs/datamodel.html as requiring something like this or that there is something incoherent in the above account or that there is some overreach in the above account or that the whole thing is a bad idea?)
> From the datamodel page you cite:
> [[[
> ... We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string. In the spirit of "some data is better than none", we will accept this markup and do the best we can.
> ]]]
> To me, this indicates that schema.org partners are prepared to deal with this, but that it is not "correct" as described in the datamodel. What you describe models what the datamodel says it can expect, but from a perspective of wanting to call out exceptions to the datamodel, I think it's correct to flag such usage. Where a Text or some other Literal is explicitly allowed, it should be identified using rangeIncludes.

My intent was to describe what should go on in cases like this. Sure, you 
might want to label these as "exceptional" in some way, but what is the use of 
having a data model if it is expected that there are going to be many 
violations of it?
>>> That said, I don't believe any formal semantics for domainIncludes/rangeIncludes have ever been defined. I take it to be largely equivalent to the range being an anonymous subclass which is the union of the specified types and does not formally permit anything not explicitly stated. If it is intended to permit Text values where only an object datatype is specified, then this should be called out, and the vocabulary made consistent.
>>> From a processing perspective, I agree that if the value is a literal then the actual value is a literal, or object as the case may be, but that this represents a range violation, IMO.
>> Well, I would be happy with disallowing bare text except where strings are a range, but my reading is that bare text is allowed anywhere, and that this is the way of taking the under-specified intent and turning it into something that can actually be used without having some background mechanism for mapping the text into the referent that the author intended.  (Inferring author intent here would require the resources of Bing, Yahoo!, Google, or the NSA.)  I think that this method provides something slightly useful, and that can be extended if something more is known about the referent.  (For example Google could use its "I'm feeling (knowledge-graph) lucky" mechanism, and the NSA could just look into the author's mind to determine the referent and then put that on the source page.)
> In my mind, this is the same as an rdfs:range violation (or rather, an inconsistency created as a result of inferring that the rdf:type of a literal is some kind of rdfs:Class, I believe). Many RDF documents can make this error, and it's fine from a datamodel perspective, but it does create an inconsistent RDFS graph. Similarly, if a literal is used where, say, a schema:Person is expected, it is fine from a datamodel perspective, and schema.org have said that they will accept this, but it creates some kind of inconsistency if the property's range does not include such a literal.
> For example, I've seen markup such as
>    :gregg foaf:knows "Peter" .
> even though the range of foaf:knows is foaf:Person. This could be used to infer the following (under some regime):
>    :gregg foaf:knows [a foaf:Person; foaf:name "Peter" ] .
> I suspect that, when applied to e.g. schema:knows, this is the equivalent of what the schema.org datamodel suggests when it says that it will accept such usage and "do the best we can." That doesn't make it correct markup, but usable.
> Gregg
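
The foaf:knows inference quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: triples are plain Python tuples, the `Literal` class and the `expand_literal` function are hypothetical names, the blank node label is fixed, and the choice of foaf:name as the property that carries the text is taken from the example rather than from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper that marks a value as a literal, not an IRI."""
    value: str


def expand_literal(triple, object_ranges, bnode_id="_:b0"):
    """Rewrite (s, p, "text") into a blank node typed with p's expected class.

    object_ranges maps a property to the single class it expects (an
    assumption made for this sketch). Triples that do not carry a literal,
    or whose property has no entry, are returned unchanged.
    """
    subject, prop, obj = triple
    expected = object_ranges.get(prop)
    if not isinstance(obj, Literal) or expected is None:
        return [triple]
    return [
        (subject, prop, bnode_id),
        (bnode_id, "rdf:type", expected),
        (bnode_id, "foaf:name", obj),  # the original text survives as the name
    ]


ranges = {"foaf:knows": "foaf:Person"}
expanded = expand_literal((":gregg", "foaf:knows", Literal("Peter")), ranges)
```

With the input `:gregg foaf:knows "Peter"`, `expanded` holds the three triples that correspond to `:gregg foaf:knows [a foaf:Person; foaf:name "Peter"]`.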

This is definitely a place where schema.org differs from RDF.   My intent was 
to devise a way of putting this difference on a firm foundation, so that this 
expected usage would be reasonably handled without requiring immense resources.
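
As a rough illustration, the bare-text rule from my original message (quoted earlier in this reply) could be sketched like this. The names `resolve_bare_text`, `Item`, and `PARSERS` are hypothetical, the parsers cover only three of the datatypes and are much looser than real lexical-form checking, and a datatype without a parser is simply treated as unsuitable.

```python
from dataclasses import dataclass

TEXT = "http://schema.org/Text"
LITERAL = "http://schema.org/Literal"
DATATYPES = {
    "http://schema.org/" + name
    for name in ("Boolean", "FloatingPointNumber", "Integer", "Text",
                 "URL", "Date", "DateTime", "Time")
}


def _parse_boolean(text):
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(text)


# Assumption: only these datatypes have a parser in this sketch.
PARSERS = {
    "http://schema.org/Boolean": _parse_boolean,
    "http://schema.org/Integer": int,
    "http://schema.org/FloatingPointNumber": float,
}


@dataclass
class Item:
    """Stand-in for 'some item' whose description is the bare text."""
    candidate_types: list
    description: str


def resolve_bare_text(text, ranges):
    # Text or Literal in the range: the value is the bare text itself.
    if TEXT in ranges or LITERAL in ranges:
        return text
    # Otherwise try each datatype range that can read the text.
    for r in ranges:
        parser = PARSERS.get(r)
        if parser is not None:
            try:
                return parser(text)
            except ValueError:
                pass
    # No suitable datatype: fall back to an item of a non-datatype range.
    non_datatypes = [r for r in ranges if r not in DATATYPES]
    if non_datatypes:
        return Item(candidate_types=non_datatypes, description=text)
    # Nothing applies: the value is the bare text.
    return text
```

For example, `resolve_bare_text("Peter", ["http://schema.org/Person"])` yields an `Item` whose description is "Peter", which is the cheap alternative to guessing the intended referent.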

Received on Tuesday, 7 January 2014 23:14:09 UTC
