Re: schema.org as it could be

On Jan 6, 2014, at 12:33 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

> schema.org As It Should Be
> 
> 
> This is a pre-formal account of schema.org and schema.org content as I think
> it should be.
> 
> This account is definitely not targeted towards end users. Instead this
> account is designed to serve as a description of how schema.org could work
> in a way that can be easily turned into a formal account for schema.org.  I
> don't actually think that all the choices here are ideal, but changing some
> of them for the better would make radical changes to how schema.org works.
> 
> This account started out as an attempt to fill in the holes in the available
> descriptions of how schema.org actually works, even well after these holes
> have been pointed out.  I then realized that this attempt necessarily
> included the bulk of a vision of what schema.org should be as a useful
> formalism for representing and reasoning with information, so I made a few
> minor additions to result in better support for this vision. I have a full
> syntax and formal semantics that supports this vision of schema.org.
> 
> I'm sending this account out so that others can see both how the holes in
> the description of schema.org could be filled in and also see my vision of
> what schema.org should be.  Perhaps this account will help push schema.org
> towards a useful formalism for representing information in a way that can be
> effectively used and reasoned with.
> 
> 
> General Aspects
> 
> There are some parts of this account that can be considered as optional or are
> somewhat independent of the other parts of this account. These parts are
> enclosed in * below.  The parts can be roughly described as 1/ disjointness
> of types, properties, data values, and items; 2/ no fragment parts in types
> and properties; 3/ super-properties; and 4/ a kind of local unique name
> assumption.  There is support for each of these parts in documents or pages
> under schema.org.
> 
> Throughout this account, a URL is a uniform resource locator, optionally
> including a fragment part.  The document (fragment) at that URL is (the
> appropriate fragment of) the document obtained by the usual web mechanisms
> for retrieving a document given a URL.
> 
> The entities in schema.org are divided into types, properties, data values,
> and items.  *The sets of types, properties, data values, and items are
> pairwise disjoint.*
> 
> 
> Types
> 
> There is a collection of types, in a multi-parent generalization taxonomy,
> with two roots, http://schema.org/Thing and http://schema.org/Literal.  Each
> type is identified by a unique URL *without any fragment part*.  The
> document *(fragment)* at that URL defines the type, listing: 1/ some types
> that are more general than it (its parents), and 2/ for non-datatypes, its
> properties (see below).  Parents and properties, and instances where
> appropriate, are the only information about a type obtainable from its
> defining document *(fragment)*.
> 
> Each type has as a (non-strict) generalization ancestor either
> http://schema.org/Thing or http://schema.org/Literal, but not both.
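
A minimal sketch of the two-root rule, assuming a hypothetical in-memory parent table (PARENTS and the function names are mine, not anything schema.org publishes). It walks the multi-parent taxonomy and checks that a type sits under exactly one of the two roots:

```python
THING = "http://schema.org/Thing"
LITERAL = "http://schema.org/Literal"

# Hypothetical parent table: type URL -> list of direct parents.
PARENTS = {
    THING: [],
    LITERAL: [],
    "http://schema.org/Text": [LITERAL],
    "http://schema.org/Enumeration": [THING],
    "http://schema.org/CreativeWork": [THING],
}


def ancestors(type_url, parents=PARENTS):
    """Return the non-strict ancestors of type_url (the type itself included)."""
    seen = set()
    stack = [type_url]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def root_of(type_url, parents=PARENTS):
    """Return the single root above type_url; raise if there is not exactly one."""
    roots = ancestors(type_url, parents) & {THING, LITERAL}
    if len(roots) != 1:
        raise ValueError(f"{type_url} has roots {sorted(roots)}, expected exactly one")
    return next(iter(roots))


def is_datatype(type_url, parents=PARENTS):
    """A datatype has Literal as a strict generalization ancestor."""
    return type_url != LITERAL and LITERAL in ancestors(type_url, parents)
```

A type whose parents reach both roots, or neither, makes root_of raise, which is the "but not both" condition.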
> 
> The types with strict generalization ancestor http://schema.org/Literal are
> datatypes.  All the data values with the datatype as a direct type are
> described in the datatype's defining document *(fragment)*. The datatypes
> are http://schema.org/Boolean, http://schema.org/FloatingPointNumber,
> http://schema.org/Integer, http://schema.org/Text, http://schema.org/URL,
> http://schema.org/Date, http://schema.org/DateTime, and
> http://schema.org/Time.

Note that http://schema.org/Duration is really an exception here. Nominally, it is roughly equivalent to xsd:duration, although in examples the leading "P" is omitted. Also, it is a subclass of http://schema.org/Quantity < http://schema.org/Intangible < http://schema.org/Thing, not of http://schema.org/Literal. I think this is a mis-categorization, and http://schema.org/Duration should be a subclass of http://schema.org/Literal.

Also, I don't find http://schema.org/FloatingPointNumber, but rather http://schema.org/Float.

In my linter, I mostly equate the schema literals with their XSD equivalents, which I think is the intention. In practice, I've never seen any markup that explicitly types a plain literal as schema:Text. It would certainly be useful if there were an equivalent of owl:sameAs relating the schema.org datatypes to their XSD equivalents, where they exist.
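
As a rough illustration of what "equating" could look like, here is a sketch of such a lookup table. The specific pairings are my assumption (schema.org does not publish this mapping), and in particular mapping Float to xsd:double rather than xsd:float is a judgment call:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
SCHEMA = "http://schema.org/"

# Assumed correspondence between schema.org datatypes and XSD datatypes.
SCHEMA_TO_XSD = {
    SCHEMA + "Boolean": XSD + "boolean",
    SCHEMA + "Float": XSD + "double",
    SCHEMA + "Integer": XSD + "integer",
    SCHEMA + "Text": XSD + "string",
    SCHEMA + "URL": XSD + "anyURI",
    SCHEMA + "Date": XSD + "date",
    SCHEMA + "DateTime": XSD + "dateTime",
    SCHEMA + "Time": XSD + "time",
}


def xsd_equivalent(datatype_url):
    """Return the assumed XSD datatype for a schema.org datatype, or None."""
    return SCHEMA_TO_XSD.get(datatype_url)
```

Duration is deliberately left out of the table, since it is not currently modelled as a literal type.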

> The type http://schema.org/Enumeration has http://schema.org/Thing as a
> parent.  Those types with strict generalization ancestor
> http://schema.org/Enumeration are enumeration types.  All those items with
> the enumeration type as a direct type are listed in the type's defining
> document *(fragment)*.
> 
> The type http://schema.org/Thing has property http://schema.org/description.
> Other properties of http://schema.org/Thing are irrelevant to this account.
> 
> 
> Properties
> 
> There is a collection of properties, *disjoint from the types*, *in a
> multiple-parent generalization taxonomy with multiple roots*. Each property
> is identified by a unique URL *without any fragment part*. The document
> *(fragment)* at that URL defines the property, providing: 1/ one or more
> types that its values belong to (its ranges), *and 2/ some properties that
> are more general than it (its parents)*.  Ranges *and parents* are the only
> information about a property obtainable from its defining document
> *(fragment)*.
> 
> *For each range of a property there must be a range of each parent that is
> the same as or a generalization of the first range.*
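
A minimal sketch of this range constraint, assuming plain dictionaries for the type and property taxonomies. The example.org properties and all function names are hypothetical, chosen only for illustration:

```python
S = "http://schema.org/"
EX = "http://example.org/"  # hypothetical namespace for illustration

TYPE_PARENTS = {S + "Person": [S + "Thing"], S + "Text": [S + "Literal"]}
PROP_PARENTS = {EX + "author": [EX + "contributor"], EX + "nickname": [EX + "contributor"]}
PROP_RANGES = {
    EX + "contributor": [S + "Thing"],
    EX + "author": [S + "Person"],
    EX + "nickname": [S + "Text"],
}


def ancestors(type_url, type_parents):
    """Non-strict generalization ancestors of a type."""
    seen, stack = set(), [type_url]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(type_parents.get(current, []))
    return seen


def ranges_respect_parents(prop, prop_parents, prop_ranges, type_parents):
    """Check that every range of prop is covered by some range of each parent property."""
    for range_type in prop_ranges[prop]:
        general = ancestors(range_type, type_parents)
        for parent in prop_parents.get(prop, []):
            if not any(r in general for r in prop_ranges[parent]):
                return False
    return True
```

Here the hypothetical author property passes (Person generalizes to Thing, a range of its parent), while nickname fails because Text has no generalization among the parent's ranges.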
> 
> The property http://schema.org/description has range http://schema.org/Text.
> 
> 
> Data Values
> 
> Data values belong to one or more datatypes, and are disjoint from types and
> properties.  There is more that needs to be said about data values, but it
> is all standard.
> 
> 
> Items
> 
> Items are things in the world, including information things, *and are
> disjoint from types, properties, and data values.* Items belong to (one or
> more) non-datatype types.  Items have zero or more URLs identifying them,
> and a URL identifies at most one item.  Items are associated with items
> and data values via properties.  Every item belongs to
> http://schema.org/Thing.  If an item belongs to a type then it belongs to
> the parents of the type.
> 
> *If an item or data value is associated with an item via a property then the
> item or data value is also associated with the item via each parent of the
> property.* For each item or data value associated with an item via a
> property,
> 1/ there is a (non-strict) ancestor of one of the item's types that has
>   the property as one of its properties, and
> 2/ the item or data value belongs to one of the ranges of the property.
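
The two conditions above can be sketched as a single check. This assumes dictionaries for the taxonomy, the per-type property lists, and the property ranges; the names and the tiny sample data are hypothetical:

```python
S = "http://schema.org/"

TYPE_PARENTS = {
    S + "Book": [S + "CreativeWork"],
    S + "CreativeWork": [S + "Thing"],
    S + "Person": [S + "Thing"],
    S + "Text": [S + "Literal"],
}
TYPE_PROPERTIES = {S + "Thing": {S + "description"}}
PROP_RANGES = {S + "description": [S + "Text"]}


def ancestors(type_url, type_parents):
    """Non-strict generalization ancestors of a type."""
    seen, stack = set(), [type_url]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(type_parents.get(current, []))
    return seen


def association_allowed(item_types, prop, value_types,
                        type_parents, type_properties, prop_ranges):
    """Condition 1: some ancestor of an item type lists prop.
    Condition 2: the value belongs to one of prop's ranges."""
    domain_ok = any(
        prop in type_properties.get(ancestor, set())
        for item_type in item_types
        for ancestor in ancestors(item_type, type_parents)
    )
    range_ok = any(
        range_type in ancestors(value_type, type_parents)
        for value_type in value_types
        for range_type in prop_ranges.get(prop, [])
    )
    return domain_ok and range_ok
```

With this data, a Book may carry a description whose value is a Text data value, but not one whose value is a Person item.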
> 
> The document (fragments) at the URLs identifying an item provide information
> about the item, including types for the item as well as items and data
> values associated with the item via properties.  *An item cannot have two
> URLs that are the same except for their fragments, if they both have
> fragments, or the last segment of their hierarchical part, if they both do
> not have fragments.*
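
A sketch of this local unique-name check, using urllib.parse from the Python standard library. Treating the query string as part of what must match is my assumption, since the text does not mention queries; the function names are hypothetical:

```python
from itertools import combinations
from urllib.parse import urlsplit


def _without_last_segment(parts):
    """Drop the last path segment of a split URL."""
    head, _, _ = parts.path.rpartition("/")
    return parts._replace(path=head)


def urls_conflict(url_a, url_b):
    """True if the two URLs cannot both identify the same item under the rule."""
    if url_a == url_b:
        return False
    a, b = urlsplit(url_a), urlsplit(url_b)
    if a.fragment and b.fragment:
        # Both have fragments: conflict when they differ only in the fragment.
        return a._replace(fragment="") == b._replace(fragment="")
    if not a.fragment and not b.fragment:
        # Neither has a fragment: conflict when they differ only in the last segment.
        return _without_last_segment(a) == _without_last_segment(b)
    return False


def item_urls_ok(urls):
    """Check that no pair of identifying URLs for one item conflicts."""
    return not any(urls_conflict(a, b) for a, b in combinations(urls, 2))
```

The effect is that URLs minted side by side in the same document or directory are assumed to name different items.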
> 
> Bare text can be used as if it was the value for any property.  If the
> property does not have http://schema.org/Text or http://schema.org/Literal
> as one of its ranges, but does have one or more datatypes as a range that
> have a data value that can be written as the bare text then the actual value
> for the property is one of these data values.  If the property does not have
> http://schema.org/Text or http://schema.org/Literal as one of its ranges,
> and does not have any suitable datatypes as a range, but does have one or
> more non-datatypes as a range, then the actual value for the property is
> some item that has a type that is one of these ranges and this item has the
> text as a value of its http://schema.org/description property.  Otherwise
> the actual value for the property is the bare text itself.

I think this is an error; there are other cases where a property has schema:rangeIncludes of both object and literal types (see http://schema.org/citation having both CreativeWork and Text in the range). I think that, to be valid, if schema:Text is allowed in the range, it should be in rangeIncludes.

That said, I don't believe any formal semantics for domainIncludes/rangeIncludes have ever been defined. I take it to be largely equivalent to the range being an anonymous subclass which is the union of the specified types and does not formally permit anything not explicitly stated. If it is intended to permit Text values where only an object datatype is specified, then this should be called out, and the vocabulary made consistent.

From a processing perspective, I agree that if the value is a literal then the actual value is a literal (or an object, as the case may be), but IMO this represents a range violation.
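
For reference, the bare-text rule in the quoted paragraph above can be sketched as follows. This follows the quoted wording, not my stricter reading; the PARSERS table, the tagged-tuple results, and the function name are all hypothetical:

```python
from datetime import date

SCHEMA = "http://schema.org/"
TEXT, LITERAL = SCHEMA + "Text", SCHEMA + "Literal"

# Hypothetical lexical parsers for a few datatypes; each raises ValueError on bad input.
PARSERS = {
    SCHEMA + "Integer": int,
    SCHEMA + "Float": float,
    SCHEMA + "Date": date.fromisoformat,
}


def resolve_bare_text(text, ranges, datatypes):
    """Resolve bare text against a property's ranges.

    ranges: the property's range types; datatypes: the set of all datatype URLs.
    """
    if TEXT in ranges or LITERAL in ranges:
        return ("text", text)
    # Try each datatype range whose lexical form the text might match.
    for range_type in ranges:
        parser = PARSERS.get(range_type)
        if parser is not None:
            try:
                return ("data", range_type, parser(text))
            except ValueError:
                continue
    # No suitable datatype: fall back to an item described by the text.
    item_ranges = [r for r in ranges if r not in datatypes]
    if item_ranges:
        return ("item", item_ranges, {SCHEMA + "description": text})
    return ("text", text)
```

Under the union reading of rangeIncludes, the "item" result and the final fallback are exactly the cases I would flag as range violations rather than silently accept.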

Gregg

> Surface syntaxes
> 
> Any surface syntax must provide ways to write all possible data values (as
> long as they are not too big).
> 
> Any surface syntax must have ways to provide items with any number of types,
> including none, and values for any property of any of the provided types or
> their generalizations or http://schema.org/Thing, including allowing
> multiple values for a property.  Any surface syntax must provide ways for
> writing items with no identifying URLs.
> 
> Any surface syntax must specially process syntax that would otherwise
> produce values for http://schema.org/additionalType, turning the values into
> types; and http://schema.org/url and http://schema.org/sameAs, turning the
> values into identifying URLs.
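
A minimal sketch of this special processing, assuming a hypothetical parser that has already produced an item's types, identifying URLs, and a list of (property, value) pairs:

```python
SCHEMA = "http://schema.org/"
TYPE_PROPS = {SCHEMA + "additionalType"}
URL_PROPS = {SCHEMA + "url", SCHEMA + "sameAs"}


def normalize_item(types, urls, pairs):
    """Fold additionalType values into types, and url/sameAs values into URLs.

    pairs: list of (property URL, value) tuples from a hypothetical parser.
    Returns (types, urls, remaining_pairs) without mutating the inputs.
    """
    types, urls, remaining = list(types), list(urls), []
    for prop, value in pairs:
        if prop in TYPE_PROPS:
            types.append(value)
        elif prop in URL_PROPS:
            urls.append(value)
        else:
            remaining.append((prop, value))
    return types, urls, remaining
```

After this step, the three special URLs never appear as ordinary properties of an item.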
> 
> Any surface syntax must allow bare text to be written as if it was the value
> for any property.
> 
> 
> Unused types and properties
> 
> The following URLs are not used to identify types or properties and if used
> in a surface syntax to provide information about an item they and their
> values must be ignored: http://schema.org/Class, http://schema.org/Property,
> http://schema.org/domainIncludes, http://schema.org/rangeIncludes,
> rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdf:type,
> rdfs:Class, owl:Class, and rdf:Property.  The following URLs are not used to
> identify properties: http://schema.org/additionalType,
> http://schema.org/url, and http://schema.org/sameAs.
> 
> 

Received on Monday, 6 January 2014 21:08:48 UTC