Re: schema.org as reconstructed from the human-readable information at schema.org from Dan Brickley on 2013-10-25 (public-vocabs@w3.org from October 2013)

From: Dan Brickley <danbri@google.com>
Date: Fri, 25 Oct 2013 16:13:18 +1100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: Guha <guha@google.com>, W3C Vocabularies <public-vocabs@w3.org>
Message-ID: <CAK-qy=40HVq7iG3QT5CdsgM3fW+TOYnSyjSciQLV94dMFnGTYg@mail.gmail.com>
On 25 October 2013 15:37, Peter F. Patel-Schneider
<pfpschneider@gmail.com> wrote:
> Strangenesses in schema.org, an incomplete list:
>
> Types as URLs.  Properties as strings.  Prescriptive property introductions.
> Closed set of types, particularly with open set of properties.  Union
> ranges, particularly with sub and super properties.  Single typing with a
> multiple-parent type hierarchy.  URLs as a subset of text. URl vs sameAs
> property.  additionalTypes property.

So we talked about this at some length here at ISWC, Peter. As I
mentioned f2f, I think you're (understandably given our docs) pushing
together some quite different kinds of issues. A lot of your comments
are specific to the Microdata syntax. Microdata can be seen as a fork
of RDFa as it was in 2009, i.e. RDFa 1.0. The doc in
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html
which introduced it talks through the various ways in which the
RDF-ness of RDFa was largely thrown out to form Microdata, even while
the basic/surface user-facing markup patterns remained pretty close.
Microdata became a lot simpler for publishers, but "threw out the baby
with the bath water" in terms of having an RDF interpretation of data
content. Two later pieces of work are relevant: RDFa 1.1, especially
RDFa Lite, which is a publisher-friendly and RDF-oriented view of RDFa
1.1, and Gregg's note on microdata/rdf mappings,
http://www.w3.org/TR/microdata-rdf/ . RDFa Lite is very close to
Microdata as far as publishers are concerned but more explicitly
parses into real RDF. It is heavier for parser writers, but there are
many more publishers than parser writers so that tradeoff seems
reasonable. At this stage in history, schema.org is pluralist w.r.t.
syntaxes; there are 5+ million domains publishing Microdata, ... that
format is not going away any time soon. But there are also important
advantages to RDFa - e.g. use of multiple types from independent
vocabularies, as well as more explicit mapping to RDF graphs than
given by the Microdata spec. We have published posts (on blog.schema.org)
that are positive about RDFa Lite, about JSON-LD (considered as an RDF
notation), as well as Microdata. The common (RDF-based) data model
gives some unity to this. As you've noticed, the main schema.org site
is still very Microdata-centric. I expect us to add more examples in
other notations; already you can see some JSON-LD examples e.g.
http://schema.org/WatchAction and nearby.

So some of your questions (e.g. properties 'as literals') relate to
inadequacies of Microdata considered as a representational language.
Although it might be possible to improve the Microdata spec to some
extent, it is also appropriate to look to other representations like
RDFa and JSON-LD, rather than trying to gradually mutate Microdata
back into RDFa. Microdata is what it is, and it is not terribly hard
to extract a plausible RDF graph from it even if that transformation
is currently under-specified.
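
To make the "plausible RDF graph" point concrete, here is a rough sketch
in Python of one way such an extraction could go. It is not the algorithm
from the microdata-rdf note; the class name MicrodataSketch, the helper
extract_triples, the blank node labels and the rule "property URI =
itemtype up to its last slash + itemprop" are all assumptions made for
illustration. It ignores itemref, itemid and void elements such as meta
and link, and the sample markup is invented.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}  # skipped in this sketch


class MicrodataSketch(HTMLParser):
    """Collects (subject, predicate, object) tuples from simple Microdata."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.open_tags = []  # one flag per open element: did it start an item?
        self.items = []      # stack of (blank node label, vocabulary base)
        self.prop = None     # (property URI, element depth) while collecting text
        self.text = []
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        name = a.get("itemprop")
        is_item = "itemscope" in a
        self.open_tags.append(is_item)
        if is_item:
            node = "_:b%d" % self.counter
            self.counter += 1
            if name and self.items:
                # Nested item: link it to the enclosing item.
                parent, base = self.items[-1]
                self.triples.append((parent, base + name, node))
            itemtype = a.get("itemtype") or ""
            if itemtype:
                self.triples.append((node, RDF_TYPE, itemtype))
            # Assumption: the vocabulary base is the type URI up to its last "/".
            self.items.append((node, itemtype[: itemtype.rfind("/") + 1]))
        elif name and self.items:
            # Plain property: its text content becomes a literal value.
            self.prop = (self.items[-1][1] + name, len(self.open_tags))
            self.text = []

    def handle_data(self, data):
        if self.prop:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        if self.prop and self.prop[1] == len(self.open_tags):
            value = "".join(self.text).strip()
            self.triples.append((self.items[-1][0], self.prop[0], value))
            self.prop = None
        if self.open_tags.pop():
            self.items.pop()


def extract_triples(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <span itemprop="name">Avatar</span>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">James Cameron</span>
  </div>
</div>
"""

for triple in extract_triples(SAMPLE_HTML):
    print(triple)
```

Literals and URIs are both plain Python strings here, which a real
mapping would need to distinguish; that is part of what remains
under-specified.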

Beyond graph notation, there is another cluster of issues around
search engine pragmatism regarding pre-processing of messy data, the
trailing-slash extension model, strings-where-we-expect-things, etc.
Personal view here: a) the '/'-based extension mechanism proposed back
in 2011 has not been a success and should be de-emphasised. It is not
so useful to encourage people to write
'http://schema.org/Person/Minister'; better to migrate towards RDFa
and using real independently declared and RDFS-documentable subtypes
e.g. http://w3.org/ns/minister-vocab#Minister. b) where we say we'll
handle the appearance of a string in places where a thing is
expected, we probably should say that we will treat that as a shortcut
for saying 'the string is the value of the http://schema.org/name
property of the thing'. c) Mixing of URIs for things vs URIs for
documents that describe those things - at this stage, messiness 'comes
with the territory'. Everyone would like cleaner distinction between
abstract entities and the docs that describe them, but we won't get
there in the mainstream Web by simply demanding them - the Linked Data
community has learned that even amongst enthusiastic experts,
publishing such data is hard.
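
The shortcut suggested in (b) could be sketched roughly as follows in
Python. The function name coerce_to_thing and the dict-based
representation of a "thing" are hypothetical, chosen only for
illustration; this is not how any search engine is known to implement it.

```python
SCHEMA_NAME = "http://schema.org/name"


def coerce_to_thing(value):
    """Treat a bare string as shorthand for a thing named by that string.

    Assumption: a "thing" is modelled as a dict from property URI to value.
    Anything that is not a string is returned unchanged.
    """
    if isinstance(value, str):
        return {SCHEMA_NAME: value}
    return value


# A director given as plain text versus as a structured thing.
as_text = coerce_to_thing("James Cameron")
as_thing = coerce_to_thing({SCHEMA_NAME: "James Cameron"})
print(as_text == as_thing)
```

After this step a consumer only has to deal with things, whichever form
the publisher used.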

As far as properties as properties go, you've noticed a few cases
where we express in prose some notion of superproperty. There are also
'this property is replaced by that property' situations, e.g.
http://schema.org/actors vs http://schema.org/actor. Third, if you
look at the source files in W3C mercurial from which the schema.org
site is generated, you'll see documentation of equivalentClass /
equivalentProperty relationships in a few cases, e.g.
https://dvcs.w3.org/hg/webschema/file/2d9d90bce7a0/schema.org/ext/dataset.html
. In all these cases it is reasonable to expect more from the schema.org
site implementation (displaying this data, exposing it in RDF/RDFS
etc.), and better documentation in some updated account of the data model. But
right now as Guha says, we only have a very informal, skeletal notion
of properties of properties. A step towards this was creating
per-property pages, that could carry such information in a simple
user-facing way.
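
As a small illustration of the 'this property is replaced by that
property' case, a consumer could rewrite the older property before doing
anything else with the data. This is only a sketch in Python: the table
SUPERSEDED_BY and both function names are hypothetical, and the single
entry is the actors/actor pair mentioned above, not an official list.

```python
# Hypothetical lookup table: older property URI -> its replacement.
SUPERSEDED_BY = {
    "http://schema.org/actors": "http://schema.org/actor",
}


def normalize_property(uri):
    """Return the replacement for a superseded property, else the URI itself."""
    return SUPERSEDED_BY.get(uri, uri)


def normalize_triples(triples):
    """Rewrite the predicate of each (subject, predicate, object) tuple."""
    return [(s, normalize_property(p), o) for s, p, o in triples]


data = [
    ("_:movie", "http://schema.org/actors", "_:person1"),
    ("_:movie", "http://schema.org/name", "Avatar"),
]
print(normalize_triples(data))
```

If the site exposed these relationships in machine-readable form, a
table like this could be generated rather than written by hand.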

Dan
Received on Friday, 25 October 2013 05:13:46 UTC