schema.org as it could be

schema.org As It Should Be


This is a pre-formal account of schema.org and schema.org content as I think
it should be.

This account is definitely not targeted at end users. Instead, this
account is designed to serve as a description of how schema.org could work
in a way that can be easily turned into a formal account for schema.org.  I
don't actually think that all the choices here are ideal, but changing some
of them for the better would make radical changes to how schema.org works.

This account started out as an attempt to fill in the holes in the available
descriptions of how schema.org actually works, holes that remain well after
they were pointed out.  I then realized that this attempt necessarily
included the bulk of a vision of what schema.org should be as a useful
formalism for representing and reasoning with information, so I made a few
minor additions to better support this vision. I have a full syntax and
formal semantics that support this vision of schema.org.

I'm sending this account out so that others can see both how the holes in
the description of schema.org could be filled in and my vision of what
schema.org should be.  Perhaps this account will help push schema.org
towards a useful formalism for representing information in a way that can be
effectively used and reasoned with.


General Aspects

There are some parts of this account that can be considered optional or are
somewhat independent of the other parts of this account. These parts are
enclosed in * below.  The parts can be roughly described as 1/ disjointness
of types, properties, data values, and items; 2/ no fragment parts in types
and properties; 3/ super-properties; and 4/ a kind of local unique name
assumption.  There is support for each of these parts in documents or pages
under schema.org.

Throughout this account, a URL is a uniform resource locator, optionally
including a fragment part.  The document (fragment) at that URL is (the
appropriate fragment of) the document obtained by the usual web mechanisms
for retrieving a document given a URL.

The entities in schema.org are divided into types, properties, data values,
and items.  *The sets of types, properties, data values, and items are
pairwise disjoint.*
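
As a rough illustration of the optional disjointness condition, the check
below treats each kind of entity as a plain set.  The entities shown are
hypothetical placeholders, and representing a data value as a (datatype,
value) pair is an assumption of this sketch, not part of the account.

```python
from itertools import combinations


def pairwise_disjoint(*collections):
    """Return True when no entity appears in more than one collection."""
    return all(set(a).isdisjoint(b) for a, b in combinations(collections, 2))


# Hypothetical entities, one per kind.
types = {"http://schema.org/Person"}
properties = {"http://schema.org/description"}
data_values = {("http://schema.org/Integer", 42)}
items = {"http://example.org/people/alice"}

print(pairwise_disjoint(types, properties, data_values, items))
```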


Types

There is a collection of types, in a multi-parent generalization taxonomy,
with two roots, http://schema.org/Thing and http://schema.org/Literal.  Each
type is identified by a unique URL *without any fragment part*.  The
document *(fragment)* at that URL defines the type, listing: 1/ some types
that are more general than it (its parents), and 2/ for non-datatypes, its
properties (see below).  Parents and properties, and instances where
appropriate, are the only information about a type obtainable from its
defining document *(fragment)*.

Each type has as a (non-strict) generalization ancestor either
http://schema.org/Thing or http://schema.org/Literal, but not both.
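
The two-root condition can be sketched as a walk up the parent links.  The
taxonomy fragment below is illustrative only: the parent links and the names
`ancestors` and `root_of` are assumptions made for this sketch.

```python
THING = "http://schema.org/Thing"
LITERAL = "http://schema.org/Literal"

# Illustrative taxonomy fragment: type URL -> list of parent URLs.
PARENTS = {
    THING: [],
    LITERAL: [],
    "http://schema.org/Text": [LITERAL],
    "http://schema.org/Integer": [LITERAL],
    "http://schema.org/Place": [THING],
    "http://schema.org/Organization": [THING],
    "http://schema.org/LocalBusiness": [
        "http://schema.org/Place",
        "http://schema.org/Organization",
    ],
}


def ancestors(type_url, parents=PARENTS):
    """Return the non-strict ancestors of a type (the type itself included)."""
    seen = set()
    stack = [type_url]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def root_of(type_url, parents=PARENTS):
    """Return the single root above a type, or raise if there is not exactly one."""
    roots = ancestors(type_url, parents) & {THING, LITERAL}
    if len(roots) != 1:
        raise ValueError(f"{type_url} must sit under exactly one root")
    return roots.pop()
```

A type with two parents, such as the LocalBusiness entry here, is fine so
long as both parents lead to the same root.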

The types with strict generalization ancestor http://schema.org/Literal are
datatypes.  All the data values that have a datatype as a direct type are
described in that datatype's defining document *(fragment)*.  The datatypes
are http://schema.org/Boolean, http://schema.org/FloatingPointNumber,
http://schema.org/Integer, http://schema.org/Text, http://schema.org/URL,
http://schema.org/Date, http://schema.org/DateTime, and
http://schema.org/Time.

The type http://schema.org/Enumeration has http://schema.org/Thing as a
parent.  Those types with strict generalization ancestor
http://schema.org/Enumeration are enumeration types.  All the items that
have an enumeration type as a direct type are listed in that type's defining
document *(fragment)*.
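
Both datatypes and enumeration types are picked out by a strict ancestor, so
the root itself does not count.  A minimal sketch, assuming the small
taxonomy fragment below (the parent links and function names are
illustrative, not taken from any schema.org document):

```python
THING = "http://schema.org/Thing"
LITERAL = "http://schema.org/Literal"
ENUMERATION = "http://schema.org/Enumeration"

# Illustrative taxonomy fragment: type URL -> list of parent URLs.
PARENTS = {
    THING: [],
    LITERAL: [],
    ENUMERATION: [THING],
    "http://schema.org/Integer": [LITERAL],
    "http://schema.org/BookFormatType": [ENUMERATION],
    "http://schema.org/Person": [THING],
}


def strict_ancestors(type_url, parents=PARENTS):
    """Return every ancestor of a type, excluding the type itself."""
    seen = set()
    stack = list(parents.get(type_url, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_datatype(type_url):
    return LITERAL in strict_ancestors(type_url)


def is_enumeration_type(type_url):
    return ENUMERATION in strict_ancestors(type_url)
```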

The type http://schema.org/Thing has property http://schema.org/description.
Other properties of http://schema.org/Thing are irrelevant to this account.


Properties

There is a collection of properties, *disjoint from the types*, *in a
multiple-parent generalization taxonomy with multiple roots*. Each property
is identified by a unique URL *without any fragment part*. The document
*(fragment)* at that URL defines the property, providing: 1/ one or more
types that its values belong to (its ranges), *and 2/ some properties that
are more general than it (its parents)*.  Ranges *and parents* are the only
information about a property obtainable from its defining document
*(fragment)*.

*For each range of a property there must be a range of each parent that is
the same as or a generalization of the first range.*
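
The range condition on super-properties can be checked mechanically.  The
sketch below is one way to do so; the properties in the example.org
namespace are hypothetical and exist only to show a passing and a failing
case.

```python
SCHEMA = "http://schema.org/"
EX = "http://example.org/"  # hypothetical namespace for the sample properties

TYPE_PARENTS = {
    SCHEMA + "Thing": [],
    SCHEMA + "Person": [SCHEMA + "Thing"],
    SCHEMA + "Place": [SCHEMA + "Thing"],
    SCHEMA + "Organization": [SCHEMA + "Thing"],
}

PROPERTY_RANGES = {
    EX + "agent": [SCHEMA + "Person", SCHEMA + "Organization"],
    EX + "author": [SCHEMA + "Person"],
    EX + "venueAuthor": [SCHEMA + "Place"],
}

PROPERTY_PARENTS = {
    EX + "author": [EX + "agent"],
    EX + "venueAuthor": [EX + "agent"],
}


def type_ancestors(type_url):
    """Non-strict ancestors of a type in TYPE_PARENTS."""
    seen, stack = set(), [type_url]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(TYPE_PARENTS.get(current, []))
    return seen


def ranges_respect_parents(prop):
    """Each range of prop must be covered by some range of every parent."""
    for parent in PROPERTY_PARENTS.get(prop, []):
        parent_ranges = set(PROPERTY_RANGES.get(parent, []))
        for own_range in PROPERTY_RANGES.get(prop, []):
            if not (type_ancestors(own_range) & parent_ranges):
                return False
    return True
```

Here the hypothetical `author` passes because its range Person is also a
range of `agent`, while `venueAuthor` fails because Place is neither a range
of `agent` nor a specialization of one.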

The property http://schema.org/description has range http://schema.org/Text.


Data Values

Data values belong to one or more datatypes, and are disjoint from types and
properties.  There is more that needs to be said about data values, but it
is all standard.


Items

Items are things in the world, including information things, *and are
disjoint from types, properties, and data values.* Items belong to (one or
more) non-datatype types.  Items have zero or more URLs identifying them,
and a URL identifies at most one item.  Items are associated with items
and data values via properties.  Every item belongs to
http://schema.org/Thing.  If an item belongs to a type then it belongs to
the parents of the type.
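
The last two sentences amount to closing an item's stated types under the
parent relation, starting from http://schema.org/Thing.  A minimal sketch,
with an illustrative taxonomy fragment and a function name chosen for this
example:

```python
THING = "http://schema.org/Thing"

# Illustrative taxonomy fragment: type URL -> list of parent URLs.
PARENTS = {
    THING: [],
    "http://schema.org/Place": [THING],
    "http://schema.org/Organization": [THING],
    "http://schema.org/LocalBusiness": [
        "http://schema.org/Place",
        "http://schema.org/Organization",
    ],
}


def all_types(stated_types, parents=PARENTS):
    """Return every type an item belongs to, given the types stated for it."""
    result = {THING}  # every item belongs to Thing, even with no stated type
    stack = list(stated_types)
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(parents.get(current, []))
    return result
```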

*If an item or data value is associated with an item via a property then the
item or data value is also associated with the item via each parent of the
property.* For each item or data value associated with an item via a
property,
1/ there is a (non-strict) ancestor of one of the item's types that has
    the property as one of its properties, and
2/ the item or data value belongs to one of the ranges of the property.
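
The two conditions above, together with the optional propagation to parent
properties, can be sketched as follows.  Only http://schema.org/description
comes from this account; the example.org properties and the function names
are hypothetical.

```python
SCHEMA = "http://schema.org/"
EX = "http://example.org/"  # hypothetical namespace for the sample properties

TYPE_PARENTS = {
    SCHEMA + "Thing": [],
    SCHEMA + "Literal": [],
    SCHEMA + "Text": [SCHEMA + "Literal"],
    SCHEMA + "Person": [SCHEMA + "Thing"],
    SCHEMA + "Place": [SCHEMA + "Thing"],
}
TYPE_PROPERTIES = {
    SCHEMA + "Thing": {SCHEMA + "description"},
    SCHEMA + "Person": {EX + "birthPlace", EX + "relatedPlace"},
}
PROPERTY_RANGES = {
    SCHEMA + "description": [SCHEMA + "Text"],
    EX + "birthPlace": [SCHEMA + "Place"],
    EX + "relatedPlace": [SCHEMA + "Place"],
}
PROPERTY_PARENTS = {EX + "birthPlace": [EX + "relatedPlace"]}


def closure(start, parents):
    """Everything reachable from start via parent links, start included."""
    seen, stack = set(), list(start)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def association_allowed(item_types, prop, value_types):
    """Check conditions 1/ and 2/ for one (item, property, value) triple."""
    item_side = closure(item_types, TYPE_PARENTS)
    value_side = closure(value_types, TYPE_PARENTS)
    declared = any(prop in TYPE_PROPERTIES.get(t, ()) for t in item_side)
    in_range = any(r in value_side for r in PROPERTY_RANGES.get(prop, ()))
    return declared and in_range


def implied_properties(prop):
    """The property plus every property the same value is also asserted via."""
    return closure([prop], PROPERTY_PARENTS)
```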

The documents (or fragments) at the URLs identifying an item provide
information about the item, including types for the item as well as items
and data values associated with the item via properties.  *An item cannot
have two URLs that differ only in their fragments, if they both have
fragments, or only in the last segment of their hierarchical part, if
neither has a fragment.*
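
This local unique name assumption can be sketched with the standard URL
parser.  Requiring the query strings to match as well is an assumption of
this sketch, since the account does not mention queries; the function names
are likewise chosen for illustration.

```python
from itertools import combinations
from urllib.parse import urlsplit


def _sibling_key(url):
    """Key shared by URLs that must name different items if they differ."""
    parts = urlsplit(url)
    if parts.fragment:
        # Same document, possibly different fragment.
        return ("fragment", parts.scheme, parts.netloc, parts.path, parts.query)
    # Same location up to the last path segment.
    head, _, _last = parts.path.rpartition("/")
    return ("segment", parts.scheme, parts.netloc, head, parts.query)


def conflicting_urls(first, second):
    """True when the two URLs cannot both identify the same item."""
    return first != second and _sibling_key(first) == _sibling_key(second)


def urls_allowed_for_one_item(urls):
    """True when no pair of the given URLs conflicts."""
    return not any(conflicting_urls(a, b) for a, b in combinations(set(urls), 2))
```

Under this reading, http://example.org/doc#a and http://example.org/doc#b
must identify different items, as must http://example.org/people/alice and
http://example.org/people/bob, while a URL with a fragment never conflicts
with one without.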

Bare text can be used as if it were the value for any property.  If the
property does not have http://schema.org/Text or http://schema.org/Literal
as one of its ranges, but does have as a range one or more datatypes with a
data value that can be written as the bare text, then the actual value for
the property is one of these data values.  If the property does not have
http://schema.org/Text or http://schema.org/Literal as one of its ranges,
and does not have any suitable datatypes as a range, but does have one or
more non-datatypes as a range, then the actual value for the property is
some item that has a type that is one of these ranges and that has the text
as a value of its http://schema.org/description property.  Otherwise, the
actual value for the property is the bare text itself.
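
The cases for bare text can be laid out as a small decision procedure.  This
is only a sketch: the lexical forms accepted for each datatype, the choice
of the first suitable range, and the representation of the resulting item
as a dictionary are all assumptions, since the account leaves them open.

```python
from datetime import date

SCHEMA = "http://schema.org/"
TEXT, LITERAL = SCHEMA + "Text", SCHEMA + "Literal"

# The datatypes named in this account.
DATATYPES = {SCHEMA + name for name in (
    "Boolean", "FloatingPointNumber", "Integer", "Text",
    "URL", "Date", "DateTime", "Time")}


def _parse_boolean(text):
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(text)


# Assumed lexical forms for a few datatypes; each parser raises ValueError
# when the text cannot be read as a value of that datatype.
PARSERS = {
    SCHEMA + "Integer": lambda text: int(text.strip()),
    SCHEMA + "Boolean": _parse_boolean,
    SCHEMA + "Date": lambda text: date.fromisoformat(text.strip()),
}


def resolve_bare_text(text, ranges):
    """Return the actual value for bare text given the property's ranges."""
    if TEXT in ranges or LITERAL in ranges:
        return ("text", text)
    for candidate in ranges:
        parser = PARSERS.get(candidate)
        if parser is not None:
            try:
                return ("data", candidate, parser(text))
            except ValueError:
                continue
    item_ranges = [r for r in ranges if r not in DATATYPES]
    if item_ranges:
        # Stand-in for "some item": typed by one suitable range, described by the text.
        return ("item", {"types": [item_ranges[0]], SCHEMA + "description": text})
    return ("text", text)
```

For example, "42" against a range of http://schema.org/Integer yields a data
value, while "forty-two" against ranges of Integer and a non-datatype falls
through to an item described by the text.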


Surface syntaxes

Any surface syntax must provide ways to write all possible data values (as
long as they are not too big).

Any surface syntax must have ways to provide items with any number of types,
including none, and values for any property of any of the provided types or
their generalizations or http://schema.org/Thing, including allowing
multiple values for a property.  Any surface syntax must provide ways for
writing items with no identifying URLs.

Any surface syntax must specially process syntax that would otherwise
produce values for http://schema.org/additionalType, turning the values into
types; and http://schema.org/url and http://schema.org/sameAs, turning the
values into identifying URLs.
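
The special processing can be sketched as a pass over the property/value
pairs that a surface syntax produces for one item.  The dictionary layout of
the result and the name `build_item` are assumptions of this sketch.

```python
SCHEMA = "http://schema.org/"
TYPE_KEYS = {SCHEMA + "additionalType"}
URL_KEYS = {SCHEMA + "url", SCHEMA + "sameAs"}


def build_item(stated_types, pairs):
    """Turn raw (property, value) pairs into types, identifying URLs, and properties."""
    item = {"types": list(stated_types), "urls": [], "properties": {}}
    for prop, value in pairs:
        if prop in TYPE_KEYS:
            item["types"].append(value)  # the value becomes a type of the item
        elif prop in URL_KEYS:
            item["urls"].append(value)  # the value becomes an identifying URL
        else:
            # Ordinary property; a property may have several values.
            item["properties"].setdefault(prop, []).append(value)
    return item
```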

Any surface syntax must allow bare text to be written as if it were the
value for any property.


Unused types and properties

The following URLs are not used to identify types or properties and, if used
in a surface syntax to provide information about an item, they and their
values must be ignored: http://schema.org/Class, http://schema.org/Property,
http://schema.org/domainIncludes, http://schema.org/rangeIncludes,
rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdf:type,
rdfs:Class, owl:Class, and rdf:Property.  The following URLs are not used to
identify properties: http://schema.org/additionalType,
http://schema.org/url, and http://schema.org/sameAs.
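
Ignoring these URLs can be sketched as a filter applied before any other
processing.  The sketch assumes the input uses the same compact rdf:, rdfs:,
and owl: names as the list above; a real surface syntax would first have to
normalize full and prefixed forms.

```python
SCHEMA = "http://schema.org/"

IGNORED = frozenset({
    SCHEMA + "Class",
    SCHEMA + "Property",
    SCHEMA + "domainIncludes",
    SCHEMA + "rangeIncludes",
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "rdf:type",  # the typing property of RDF lives in the rdf: namespace
    "rdfs:Class",
    "owl:Class",
    "rdf:Property",
})


def drop_ignored(stated_types, pairs):
    """Remove ignored URLs used as types, and ignored properties with their values."""
    kept_types = [t for t in stated_types if t not in IGNORED]
    kept_pairs = [(prop, value) for prop, value in pairs if prop not in IGNORED]
    return kept_types, kept_pairs
```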

Received on Monday, 6 January 2014 20:33:53 UTC