Re: Some interesting things that show up when using a reasoner to classify schema.org

On Feb 4, 2015 9:13 AM, "Guha" <guha@google.com> wrote:

> I think we have reached a stage where we should look beyond just search
> engines. We need to include any/more large consumers of structured data on
> the internet, such as Cortana, Pinterest, Gmail, Google Now and others.
> However, I do believe that we need to be firmly anchored to the reality of
> the information needs of applications consuming the data.

'dat! (for suitable definitions of firmly, reality, and needs)

Use cases should drive the ontologies, but generalizations should bubble
up, and the information needs should not be construed narrowly, so that the
data and the schemata can be reused for the next epic.

It would possibly be a good idea to use a (subset of a?) more expressive
and less bulky language for specifying the schemata, even if this is
rendered down into a simpler form for presentation: CycL, KE, IKL,
OWL, a controlled natural language, etc. The current approach of using a
hand-edited HTML RDFa file is not scalable and is a source of error.

It should be possible to specify schemata that use sdo without having to
become part of sdo.

It would definitely be a good idea to be able to mark properties as highly
salient for a given class, and to be able to indicate where the range of a
property depends on the domain of the entity to which it is applied. It
would also be good to be able to indicate that a property is not
applicable.
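
A rough sketch of what such per-class annotations might look like, written
in Python rather than in any schema language. Everything here (PropertyUse,
ClassProfile, the field names, and the Volcano example values) is made up
for illustration and is not existing sdo syntax.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyUse:
    """How one property behaves on one class (hypothetical model)."""
    prop: str
    salient: bool = False      # highly salient for this class
    applicable: bool = True    # False: the property does not apply here
    ranges: tuple = ()         # range that holds when used on this class


@dataclass
class ClassProfile:
    name: str
    uses: dict = field(default_factory=dict)

    def declare(self, use):
        self.uses[use.prop] = use

    def range_of(self, prop, default=()):
        """Class-specific range, falling back to the property's general range."""
        use = self.uses.get(prop)
        if use is None:
            return default
        if not use.applicable:
            return ()
        return use.ranges or default

    def salient_properties(self):
        return sorted(p for p, u in self.uses.items()
                      if u.salient and u.applicable)


# Illustrative values only.
volcano = ClassProfile("Volcano")
volcano.declare(PropertyUse("faxNumber", applicable=False))
volcano.declare(PropertyUse("geo", salient=True, ranges=("GeoCoordinates",)))
```

With this, volcano.range_of("faxNumber") comes back empty, while a property
with no class-specific declaration keeps whatever general range is passed
in as the default.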

The ability to express disjunction and to make negative assertions might
also be useful.

There is a clear need to make the base schema more modular; if this requires
creative re-interpretation of microdata, then microdata should change to
fit reality.

There are changes to the existing schema definitions that would help with
reuse. Many domains and ranges are over- or under-specific.

For example, if the range of a property is conceptually an entity of a
certain type, but values may be given as text that describes or names an
entity, this ought to be represented conceptually, rather than simply
adding Text to a range.

Properties that have a large number of domainIncludes are a sign of missing
abstractions (and often contain errors).
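
Such properties are easy to find mechanically. A minimal sketch, assuming
the schema has already been loaded as (subject, predicate, object) string
triples; the function name, the threshold, and the toy data are all
assumptions for illustration, not real schema.org counts.

```python
from collections import defaultdict


def broad_properties(triples, threshold=5):
    """Return (property, domain count) pairs with at least `threshold`
    distinct domainIncludes values, largest count first."""
    domains = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "domainIncludes":
            domains[subject].add(obj)
    flagged = [(prop, len(d)) for prop, d in domains.items()
               if len(d) >= threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Toy data, not taken from the real schema.
TOY_TRIPLES = [
    ("name", "domainIncludes", "Thing"),
    ("address", "domainIncludes", "Person"),
    ("address", "domainIncludes", "Organization"),
    ("address", "domainIncludes", "Place"),
    ("address", "rangeIncludes", "PostalAddress"),
]

flagged = broad_properties(TOY_TRIPLES, threshold=3)
```

Each flagged property is then a candidate for a new shared superclass, or
for a closer look at whether some of the listed domains are mistakes.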

Properties with unions of seemingly unrelated types are
Properties that have ranges of (Text or URL) are often identifiers for
things of a certain type.
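
A minimal sketch of how those candidates could be listed, again assuming
the schema is available as (subject, predicate, object) string triples. The
function name and the example property names are hypothetical.

```python
from collections import defaultdict


def text_or_url_properties(triples):
    """Return properties whose rangeIncludes is exactly {Text, URL}."""
    ranges = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rangeIncludes":
            ranges[subject].add(obj)
    return sorted(prop for prop, r in ranges.items()
                  if r == {"Text", "URL"})


# Hypothetical property names, for illustration only.
TOY_RANGES = [
    ("exampleCode", "rangeIncludes", "Text"),
    ("exampleCode", "rangeIncludes", "URL"),
    ("exampleName", "rangeIncludes", "Text"),
    ("exampleOwner", "rangeIncludes", "Person"),
    ("exampleOwner", "rangeIncludes", "Text"),
]

candidates = text_or_url_properties(TOY_RANGES)
```

Whether a given candidate really is an identifier for some type of thing
still needs a human decision; the listing only narrows down where to look.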

Simon

Received on Wednesday, 4 February 2015 15:15:42 UTC