Re: page collecting schema.org properties from Dan Brickley on 2015-07-13 (public-schemaorg@w3.org from July 2015)

From: Dan Brickley <danbri@google.com>
Date: Mon, 13 Jul 2015 20:11:18 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Nicolas Torzec <torzecn@yahoo-inc.com>
Cc: Stéphane Corlosquet <scorlosquet@gmail.com>, John Walker <john.walker@semaku.com>, "public-schemaorg@w3.org" <public-schemaorg@w3.org>
Message-ID: <CAK-qy=4YU=E32FpX5ZLwjSxiuAwE6p9UuixYmHiGoB5-XwVXnQ@mail.gmail.com>

On 13 July 2015 at 19:08, Peter F. Patel-Schneider
<pfpschneider@gmail.com> wrote:
> What I wanted to do is to browse the properties.  Parsing the RDFa page
> doesn't end up with nice browsing.
>
> As far as the status of the RDFa goes, is this file all that is known about
> schema.org classes and properties?  Was it ever the case that, for example,
> there was information that the Property range for supersededBy was associated
> with the Property domain for it?

At this stage if you want this kind of history you'll need to dig
around in the Github history. I'd like to get
http://schema.org/version/ backfilled with earlier snapshots
eventually (going back before the Github repo too).  While I can see a
case for keeping a kind of historical event log regarding schema
changes within the main schema.org file it could soon grow large and
start to replicate the native functionality of Git. Over at Dublin
Core Tom Baker has done something simple in this direction for changes
to term definitions - see http://dublincore.org/usage/terms/history/

Regarding http://dydra.com/danbri/schema-org I have just cleared that
repository and re-imported from the v2.0 release (using the new
RDFa-loading feature, so no need for a Turtle intermediate representation
this time :). I see a possible bug in their UI: the dashboard shows
"Data: 0 statements", although it does also show a transaction log as
follows: "You cleared danbri/schema-org 5 minutes ago"; "You imported
9,023 statements into danbri/schema-org 5 minutes ago". I've also
updated a few queries which still assumed we were using explicit named
graphs. While I could populate this from the sdo-ganymede draft site,
we may as well wait a week and do it from the real site.

To Nicolas's comment, the RDFa/RDFS document at
http://schema.org/docs/schema_org_rdfa.html (or the datestamped
variants at /version/) does provide a single machine-processable file
with all (current or datestamped release) information about schema.org
classes and properties. While RDFa might not be everyone's favourite
format, it is reasonably well defined, and it is unfair to call RDFa
parsing "scraping" just because some onto viz tools prefer RDF/XML or Turtle.

I believe it is important for the project to keep a clear history
accessible of all changes to the schemas (and there's more work to do
there making things accessible). I'm not so sure that this history
needs to clutter up the main machine-readable files that are used.

BTW for D3 visualization there is also a current-state-of-the-schemas
D3-compatible RDFS/JSON-LD file (yes, a weird hybrid) at
http://schema.org/docs/tree.jsonld. For an example of using it (or
something similar) see here and nearby:
http://danbri.org/2013/SchemaD3/examples/4063550/hack3d.html ... idea
was that it could be used as a skeleton for different kinds of
visualization, including stats, provenance etc. (the earlier demo
exposed 'source' links for Wikidoc, rNews, LRMI etc. contributed
terms).
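
For anyone wanting to consume a tree file like this outside D3, a
minimal Python sketch of walking the hierarchy might look like the
following. It assumes each node carries a "name" and an optional
"children" list (the shape D3 tree layouts conventionally expect); the
inline sample is hypothetical and is not copied from the real file,
whose keys may differ.

```python
import json

# Hypothetical excerpt standing in for the real tree file, which is
# much larger; the "name"/"children" keys are an assumption.
SAMPLE = """
{
  "name": "Thing",
  "children": [
    {"name": "CreativeWork", "children": [{"name": "Article"}]},
    {"name": "Person"}
  ]
}
"""


def walk(node, parent=None, depth=0):
    """Yield (name, parent_name, depth) for a node and its descendants."""
    name = node.get("name")
    yield name, parent, depth
    for child in node.get("children", []):
        yield from walk(child, name, depth + 1)


rows = list(walk(json.loads(SAMPLE)))

# Print an indented outline of the type hierarchy.
for name, parent, depth in rows:
    print("  " * depth + name)
```

The (name, parent, depth) rows could equally be fed to a stats or
provenance overlay, or written out as a flat table.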

Maybe of interest - Richard Cyganiak implemented CSV dumps -
https://github.com/schemaorg/schemaorg/issues/390 (the pull request
needs an update or manual merging but the basics are there). I'm sure
a lot of developers would prefer CSV summaries of the term hierarchies
to Microdata/RDFa, JSON-LD, Turtle etc...

Dan

Received on Monday, 13 July 2015 19:11:50 UTC