Re: schema.org as it could be

[I've snipped out the parts of my original message that are not relevant to 
the reply.]

On 01/06/2014 01:08 PM, Gregg Kellogg wrote:
> On Jan 6, 2014, at 12:33 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>
>> schema.org As It Should Be
>>
>>
>> This is a pre-formal account of schema.org and schema.org content as I think
>> it should be.
[...]
>>    
>>
>> The types with strict generalization ancestor http://schema.org/Literal are
>> datatypes.  All the data values with the datatype as a direct type are
>> described in the datatypes defining document *(fragment)*. The datatypes
>> are http://schema.org/Boolean, http://schema.org/FloatingPointNumber,
>> http://schema.org/Integer, http://schema.org/Text, http://schema.org/URL,
>> http://schema.org/Date, http://schema.org/DateTime, and
>> http://schema.org/Time.
> Note that http://schema.org/Duration is really an exception here. Nominally, it is roughly equivalent to xsd:duration, although in examples, the leading "P" is omitted. Also, it is a subclass of http://schema.org/Quantity < http://schema.org/Intangible < http://schema.org/Thing, not http://schema.org/Literal. Really, I think this is a mis-categorization, and http://schema.org/Duration should be a subclass of http://schema.org/Literal.
Hmm.  I don't see any indication in schema.org that .../Duration is a 
datatype.  If it is supposed to be then there should be some fixup done on 
schema.org for it and probably the other quantities as well.
>
> Also, I don't find http://schema.org/FloatingPointNumber, but rather http://schema.org/Float.

Yeah, I forgot that I changed the name to be something useful. Using .../Float 
is a bad idea.  (Of course, there are many other bad names under schema.org; 
this is just the one that ends up in my account.)
>
> In my linter, I mostly equate the schema literals with their XSD equivalents, which I think is the intention. In practice, I've never seen any markup which creates a schema:Text datatype for a plain literal. It would certainly be useful if there was the equivalent of owl:sameAs relating the schema.org datatypes to their XSD equivalents, where they exist.
It might be useful to have some indication in schema.org of just what the 
datatypes are.  I didn't delve too deeply into data values, but I agree that 
using the XSD types is a good idea.
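
As a rough illustration of the correspondence being discussed, the schema.org datatypes could be tied to XSD datatypes with a simple lookup table. The sketch below is only an assumption: schema.org does not publish such a mapping, the choice of xsd:double for .../Float is a guess, and the helper name xsd_equivalent is hypothetical. .../Duration is left out because its status as a datatype is exactly what is in question above.

```python
# Assumed correspondence between schema.org datatypes and XSD datatypes.
# schema.org does not state this mapping; every entry here is an assumption.
XSD = "http://www.w3.org/2001/XMLSchema#"
SCHEMA = "http://schema.org/"

SCHEMA_TO_XSD = {
    SCHEMA + "Boolean": XSD + "boolean",
    SCHEMA + "Float": XSD + "double",  # assumption; xsd:float is also plausible
    SCHEMA + "Integer": XSD + "integer",
    SCHEMA + "Text": XSD + "string",
    SCHEMA + "URL": XSD + "anyURI",
    SCHEMA + "Date": XSD + "date",
    SCHEMA + "DateTime": XSD + "dateTime",
    SCHEMA + "Time": XSD + "time",
}


def xsd_equivalent(schema_type):
    """Return the assumed XSD datatype IRI, or None if no entry exists."""
    return SCHEMA_TO_XSD.get(schema_type)
```

A type with no entry, such as .../Duration, simply yields None, which a linter could report as "no known XSD equivalent".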
>> [...]
>>
>> Bare text can be used as if it was the value for any property.  If the
>> property does not have http://schema.org/Text or http://schema.org/Literal
>> as one of its ranges, but does have one or more datatypes as a range that
>> have a data value that can be written as the bare text then the actual value
>> for the property is one of these data values.  If the property does not have
>> http://schema.org/Text or http://schema.org/Literal as one of its ranges,
>> and does not have any suitable datatypes as a range, but does have one or
>> more non-datatypes as a range, then the actual value for the property is
>> some item that has a type that is one of these ranges and this item has the
>> text as a value of its http://schema.org/description property.  Otherwise
>> the actual value for the property is the bare text itself.
> I think this is an error; there are other cases where a property has schema:rangeIncludes of both object and literal types (see http://schema.org/citation having both CreativeWork and Text in the range). I think that, to be valid, if schema:Text is allowed in the range, it should be in rangeIncludes.
(What is the error that you are claiming here - that I've made some mistake in 
interpreting the conformance section in http://schema.org/docs/datamodel.html 
as requiring something like this or that there is something incoherent in the 
above account or that there is some overreach in the above account or that the 
whole thing is a bad idea?)
>
> That said, I don't believe any formal semantics for domainIncludes/rangeIncludes have ever been defined. I take it to be largely equivalent to the range being an anonymous subclass which is the union of the specified types and does not formally permit anything not explicitly stated. If it is intended to permit Text values where only an object datatype is specified, then this should be called out, and the vocabulary made consistent.
>
>  From a processing perspective, I agree that if the value is a literal then the actual value is a literal, or object as the case may be, but that this represents a range violation, IMO.

Well, I would be happy with disallowing bare text except where strings are a 
range, but my reading is that bare text is allowed anywhere, and that this is 
the way of taking the under-specified intent and turning it into something 
that can actually be used without having some background mechanism for mapping 
the text into the referent that the author intended.  (Inferring author intent 
here would require the resources of Bing, Yahoo!, Google, or the NSA.)  I 
think that this method provides something slightly useful, and that can be 
extended if something more is known about the referent.  (For example Google 
could use its "I'm feeling (knowledge-graph) lucky" mechanism, and the NSA 
could just look into the author's mind to determine the referent and then put 
that on the source page.)
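
The bare-text rule quoted earlier in this message can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive reading of schema.org: the function name resolve_bare_text, the tagged-tuple results, the "type_one_of" key, and the small set of parsers are all hypothetical, and .../Float is used for the floating-point datatype because that is the name actually found on schema.org.

```python
from datetime import date

SCHEMA = "http://schema.org/"
TEXT, LITERAL = SCHEMA + "Text", SCHEMA + "Literal"


def _parse_boolean(text):
    # Assumed lexical forms for Boolean; anything else is rejected.
    if text not in ("true", "false"):
        raise ValueError(text)
    return text == "true"


# Hypothetical parsers for a few datatypes; each raises ValueError on bad input.
PARSERS = {
    SCHEMA + "Integer": int,
    SCHEMA + "Boolean": _parse_boolean,
    SCHEMA + "Date": date.fromisoformat,
}

# Ranges treated as datatypes (the list from the account, with Float's real name).
DATATYPES = {TEXT, LITERAL} | {
    SCHEMA + n
    for n in ("Boolean", "Float", "Integer", "URL", "Date", "DateTime", "Time")
}


def resolve_bare_text(text, ranges):
    """Decide what value bare text stands for, given a property's ranges."""
    # Text or Literal in the range: the value is the bare text itself.
    if TEXT in ranges or LITERAL in ranges:
        return ("text", text)
    # Otherwise try each datatype range that can read the text as a data value.
    for r in ranges:
        parser = PARSERS.get(r)
        if parser is None:
            continue
        try:
            return ("data", r, parser(text))
        except ValueError:
            continue
    # Otherwise, with non-datatype ranges, the value is some item typed by one
    # of those ranges whose description is the text.
    item_types = [r for r in ranges if r not in DATATYPES]
    if item_types:
        return ("item", {"type_one_of": item_types, SCHEMA + "description": text})
    # Nothing else applies: fall back to the bare text.
    return ("text", text)
```

For instance, resolve_bare_text("Some Book", [SCHEMA + "CreativeWork"]) yields an item described by the text, whereas adding .../Text to the ranges (as with .../citation) makes the same call return the bare text unchanged.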


>
> Gregg

peter

Received on Monday, 6 January 2014 21:56:09 UTC