- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Mon, 06 Jan 2014 12:33:23 -0800
- To: "public-vocabs@w3.org" <public-vocabs@w3.org>
schema.org As It Should Be

This is a pre-formal account of schema.org and schema.org content as I think it should be. This account is definitely not targeted towards end users. Instead this account is designed to serve as a description of how schema.org could work in a way that can be easily turned into a formal account for schema.org. I don't actually think that all the choices here are ideal, but changing some of them for the better would make radical changes to how schema.org works.

This account started out as an attempt to fill in the holes in the available descriptions of how schema.org actually works, holes that remain even well after they have been pointed out. I then realized that this attempt necessarily included the bulk of a vision of what schema.org should be as a useful formalism for representing and reasoning with information, so I made a few minor additions to result in better support for this vision. I have a full syntax and formal semantics that supports this vision of schema.org.

I'm sending this account out so that others can see both how the holes in the description of schema.org could be filled in and also see my vision of what schema.org should be. Perhaps this account will help push schema.org towards a useful formalism for representing information in a way that can be effectively used and reasoned with.

General Aspects

There are some parts of this account that can be considered as optional or are somewhat independent of the other parts of this account. These parts are enclosed in * below. The parts can be roughly described as 1/ disjointness of types, properties, data values, and items; 2/ no fragment parts in types and properties; 3/ super-properties; and 4/ a kind of local unique name assumption. There is support for each of these parts in documents or pages under schema.org.

Throughout this account, a URL is a uniform resource locator, optionally including a fragment part.
The document (fragment) at that URL is (the appropriate fragment of) the document obtained by the usual web mechanisms for retrieving a document given a URL.

The entities in schema.org are divided into types, properties, data values, and items. *The sets of types, properties, data values, and items are pairwise disjoint.*

Types

There is a collection of types, in a multi-parent generalization taxonomy, with two roots, http://schema.org/Thing and http://schema.org/Literal. Each type is identified by a unique URL *without any fragment part*. The document *(fragment)* at that URL defines the type, listing: 1/ some types that are more general than it (its parents), and 2/ for non-datatypes, its properties (see below). Parents and properties, and instances where appropriate, are the only information about a type obtainable from its defining document *(fragment)*.

Each type has as a (non-strict) generalization ancestor either http://schema.org/Thing or http://schema.org/Literal, but not both.

The types with strict generalization ancestor http://schema.org/Literal are datatypes. All the data values with the datatype as a direct type are described in the datatype's defining document *(fragment)*. The datatypes are http://schema.org/Boolean, http://schema.org/FloatingPointNumber, http://schema.org/Integer, http://schema.org/Text, http://schema.org/URL, http://schema.org/Date, http://schema.org/DateTime, and http://schema.org/Time.

The type http://schema.org/Enumeration has http://schema.org/Thing as a parent. Those types with strict generalization ancestor http://schema.org/Enumeration are enumeration types. All those items with the enumeration type as a direct type are listed in the type's defining document *(fragment)*.

The type http://schema.org/Thing has property http://schema.org/description. Other properties of http://schema.org/Thing are irrelevant to this account.
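
To make the taxonomy rules above concrete, here is a minimal Python sketch of a type hierarchy with the two roots. It is only an illustration under stated assumptions: the parent table is hand-written rather than read from any defining document, the direct parents chosen for Text and Integer are assumed (the account only requires Literal as a strict ancestor), and BookFormatType and Person are included purely as example types.

```python
SCHEMA = "http://schema.org/"
THING = SCHEMA + "Thing"
LITERAL = SCHEMA + "Literal"
ENUMERATION = SCHEMA + "Enumeration"
TEXT = SCHEMA + "Text"
INTEGER = SCHEMA + "Integer"
BOOK_FORMAT = SCHEMA + "BookFormatType"  # illustrative enumeration type
PERSON = SCHEMA + "Person"               # illustrative ordinary type

# Type URL -> set of direct parents (assumed for this sketch).
PARENTS = {
    THING: set(),
    LITERAL: set(),
    TEXT: {LITERAL},
    INTEGER: {LITERAL},
    ENUMERATION: {THING},
    BOOK_FORMAT: {ENUMERATION},
    PERSON: {THING},
}


def strict_ancestors(type_url):
    """All types reachable by following parent links at least once."""
    seen = set()
    stack = list(PARENTS[type_url])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def ancestors(type_url):
    """Non-strict ancestors: the type itself plus its strict ancestors."""
    return strict_ancestors(type_url) | {type_url}


def is_datatype(type_url):
    """Datatypes have Literal as a strict ancestor."""
    return LITERAL in strict_ancestors(type_url)


def is_enumeration_type(type_url):
    """Enumeration types have Enumeration as a strict ancestor."""
    return ENUMERATION in strict_ancestors(type_url)


def has_exactly_one_root(type_url):
    """Each type sits under Thing or under Literal, but not both."""
    reachable = ancestors(type_url)
    return (THING in reachable) != (LITERAL in reachable)
```

With this table, `is_datatype(TEXT)` holds while `is_datatype(LITERAL)` does not, because Literal is only a non-strict ancestor of itself.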

Properties

There is a collection of properties, *disjoint from the types*, *in a multiple-parent generalization taxonomy with multiple roots*. Each property is identified by a unique URL *without any fragment part*. The document *(fragment)* at that URL defines the property, providing: 1/ one or more types that its values belong to (its ranges), *and 2/ some properties that are more general than it (its parents)*. Ranges *and parents* are the only information about a property obtainable from its defining document *(fragment)*. *For each range of a property there must be a range of each parent that is the same as or a generalization of the first range.*

The property http://schema.org/description has range http://schema.org/Text.

Data Values

Data values belong to one or more datatypes, and are disjoint from types and properties. There is more that needs to be said about data values, but it is all standard.

Items

Items are things in the world, including information things, *and are disjoint from types, properties, and data values.* Items belong to (one or more) non-datatype types. Items have zero or more URLs identifying them, and a URL identifies at most one item. Items are associated with items and data values via properties.

Every item belongs to http://schema.org/Thing. If an item belongs to a type then it belongs to the parents of the type. *If an item or data value is associated with an item via a property then the item or data value is also associated with the item via each parent of the property.*

For each item or data value associated with an item via a property, 1/ there is a (non-strict) ancestor of one of the item's types that has the property as one of its properties, and 2/ the item or data value belongs to one of the ranges of the property.

The documents (fragments) at the URLs identifying an item provide information about the item, including types for the item as well as items and data values associated with the item via properties.
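
The two conditions on associations stated above (the property must be available on one of the item's types or their ancestors, and the value must belong to one of the property's ranges) can be sketched in Python as follows. This is an illustration only: the tables are hand-written, and Person and http://schema.org/knows are example names that do not appear in this account.

```python
S = "http://schema.org/"
THING, LITERAL, TEXT = S + "Thing", S + "Literal", S + "Text"
PERSON = S + "Person"            # illustrative type
DESCRIPTION = S + "description"
KNOWS = S + "knows"              # illustrative property

# Hand-written tables standing in for the defining documents (assumed).
TYPE_PARENTS = {THING: set(), LITERAL: set(), TEXT: {LITERAL}, PERSON: {THING}}
TYPE_PROPERTIES = {THING: {DESCRIPTION}, PERSON: {KNOWS}}
PROPERTY_RANGES = {DESCRIPTION: {TEXT}, KNOWS: {PERSON}}


def ancestors(type_url):
    """Non-strict generalization ancestors of a type."""
    seen = {type_url}
    stack = [type_url]
    while stack:
        for parent in TYPE_PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def all_types(direct_types):
    """Every type something belongs to, given its direct types."""
    result = set()
    for type_url in direct_types:
        result |= ancestors(type_url)
    return result


def association_allowed(subject_types, prop, value_types):
    """Check conditions 1/ and 2/ for one (item, property, value) triple."""
    # Every item belongs to Thing, even if no type was given for it.
    subject_closure = all_types(subject_types) | {THING}
    has_property = any(
        prop in TYPE_PROPERTIES.get(t, set()) for t in subject_closure
    )
    in_range = bool(all_types(value_types) & PROPERTY_RANGES.get(prop, set()))
    return has_property and in_range
```

For example, `association_allowed({PERSON}, DESCRIPTION, {TEXT})` holds because Person inherits description from Thing, while `association_allowed(set(), KNOWS, {PERSON})` fails because an item known only to be a Thing does not have the knows property.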
*An item cannot have two URLs that are the same except for their fragments, if they both have fragments, or the last segment of their hierarchical part, if they both do not have fragments.*

Bare text can be used as if it was the value for any property. If the property does not have http://schema.org/Text or http://schema.org/Literal as one of its ranges, but does have one or more datatypes as a range that have a data value that can be written as the bare text then the actual value for the property is one of these data values. If the property does not have http://schema.org/Text or http://schema.org/Literal as one of its ranges, and does not have any suitable datatypes as a range, but does have one or more non-datatypes as a range, then the actual value for the property is some item that has a type that is one of these ranges and this item has the text as a value of its http://schema.org/description property. Otherwise the actual value for the property is the bare text itself.

Surface syntaxes

Any surface syntax must provide ways to write all possible data values (as long as they are not too big).

Any surface syntax must have ways to provide items with any number of types, including none, and values for any property of any of the provided types or their generalizations or http://schema.org/Thing, including allowing multiple values for a property.

Any surface syntax must provide ways for writing items with no identifying URLs.

Any surface syntax must specially process syntax that would otherwise produce values for http://schema.org/additionalType, turning the values into types; and http://schema.org/url and http://schema.org/sameAs, turning the values into identifying URLs.

Any surface syntax must allow bare text to be written as if it was the value for any property.
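
The three-way rule for bare text given at the end of the Items section can be sketched in Python as below. This is a hedged illustration, not a definitive procedure: the lexical forms accepted for Integer and Boolean are assumptions (the account only says data values are standard), parsers for the other datatypes are omitted, Person is an example type, and the tuple-shaped results are invented for the sketch.

```python
S = "http://schema.org/"
TEXT, LITERAL = S + "Text", S + "Literal"
INTEGER, BOOLEAN = S + "Integer", S + "Boolean"
PERSON = S + "Person"  # illustrative non-datatype range

# The datatypes named in this account.
DATATYPES = {
    S + name
    for name in ("Boolean", "FloatingPointNumber", "Integer", "Text",
                 "URL", "Date", "DateTime", "Time")
}


def parse_boolean(text):
    """Assumed lexical form: exactly 'true' or 'false'."""
    if text not in ("true", "false"):
        raise ValueError(text)
    return text == "true"


# Only two parsers are sketched; other datatypes never match here.
PARSERS = {INTEGER: int, BOOLEAN: parse_boolean}


def resolve_bare_text(text, ranges):
    """Decide what bare text stands for, given a property's set of ranges."""
    # Text or Literal in the ranges: the bare text is the value.
    if TEXT in ranges or LITERAL in ranges:
        return ("text", text)
    # A datatype range that can read the text: use that data value.
    for datatype in sorted(ranges & set(PARSERS)):
        try:
            return ("data", datatype, PARSERS[datatype](text))
        except ValueError:
            continue
    # Non-datatype ranges: some item of one of those types, with the
    # text as a value of its description property.
    item_types = sorted(ranges - DATATYPES)
    if item_types:
        return ("item", item_types, text)
    # Otherwise: the bare text itself.
    return ("text", text)
```

For instance, `resolve_bare_text("42", {INTEGER})` yields `("data", INTEGER, 42)`, while `resolve_bare_text("Bob", {INTEGER, PERSON})` yields `("item", [PERSON], "Bob")`, standing for some Person item described by the text.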

Unused types and properties

The following URLs are not used to identify types or properties and if used in a surface syntax to provide information about an item they and their values must be ignored: http://schema.org/Class, http://schema.org/Property, http://schema.org/domainIncludes, http://schema.org/rangeIncludes, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdf:type, rdfs:Class, owl:Class, and rdf:Property.

The following URLs are not used to identify properties: http://schema.org/additionalType, http://schema.org/url, and http://schema.org/sameAs.
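
As a rough sketch of how a surface-syntax processor might apply the special handling described above, the Python below drops ignored URLs, turns additionalType values into types, and turns url and sameAs values into identifying URLs. The function name, the dictionary layout, and the example URLs are all hypothetical. Only the schema.org URLs from the ignore list are included; the prefixed RDF, RDFS, and OWL names would be treated the same way once expanded to full URLs.

```python
S = "http://schema.org/"
PERSON = S + "Person"  # illustrative type

# schema.org URLs whose values are ignored (prefixed names omitted here).
IGNORED = {S + "Class", S + "Property",
           S + "domainIncludes", S + "rangeIncludes"}
TYPE_SOURCES = {S + "additionalType"}   # values become types
URL_SOURCES = {S + "url", S + "sameAs"}  # values become identifying URLs


def normalize_item(types, pairs, urls=()):
    """Build an item record from raw (property URL, value) pairs."""
    item = {"types": set(types), "urls": set(urls), "properties": {}}
    for prop, value in pairs:
        if prop in IGNORED:
            continue  # the URL and its value are dropped
        if prop in TYPE_SOURCES:
            item["types"].add(value)
        elif prop in URL_SOURCES:
            item["urls"].add(value)
        else:
            # Ordinary property; multiple values are allowed.
            item["properties"].setdefault(prop, []).append(value)
    return item
```

A call such as `normalize_item({PERSON}, [(S + "url", "http://example.org/bob")])` produces an item with one identifying URL and no ordinary properties.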
Received on Monday, 6 January 2014 20:33:53 UTC