Schema.org master files (was: Re: Mappings between schema.org and other vocabs (especially from W3C groups))

On Nov 3, 2014 8:32 AM, "Dan Brickley" <danbri@google.com> wrote:
>
> On 3 November 2014 05:17, Stéphane Corlosquet <scorlosquet@gmail.com> wrote:
> > No need for scrappers anymore. [...] the whole schema is available as
> > RDFa http://schema.org/docs/schema_org_rdfa.html.

🎸 Scraper Scrapper 🎸 would be a good band name :-)

> http://schema.org/docs/schema_org_rdfa.html should be reasonably good
> (since it is the master file that the entire site is built from). The
> per-term RDFa is I think not yet perfected, but it is good to know people
> find value in it.

> Also http://schema.org/docs/schema_org_rdfa.html contains a few
> equivalentProperty and equivalentClass mappings to other vocabularies where
> a simple obvious mapping exists. These are currently reflected into the
> (not shown to humans) RDFa per-term markup.

> If there are more such mappings available, do please feel free to file a
> bug in github with details, or a pull request, and we'll try to get more
> into the site.

Some thoughts:

1. Generate the RDFa file from smaller files?

I wonder if it would be better to have the RDFa HTML file also be a
generated artifact; the file is getting a bit big for hand editing, and the
code already supports multiple files for development and testing.

2. If RDFa is generated, maybe use a different RDF format for the master?

RDFa source is not as easy to read as dedicated formats like Turtle.
Information conveyed by the relative ordering of text in the RDFa file
could partially be captured by grouping related items explicitly, using
faceting classes, set membership, or punning and using SKOS.

This might also allow for generating more overview pages.

3. Would making the master format higher level be useful?

Another possible idea might be to use a format that can be edited using
tools like Protégé, TopBraid Composer, etc.

A subset of OWL 2 might work if domainIncludes and rangeIncludes for a
given schema.org version are treated as corresponding to union classes
(and possibly if the schema.org literal types are treated as boxed).
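
To make the union-class reading concrete, here is a rough Python sketch that
renders one property as Turtle. The function names are made up for this
email, the author example is only illustrative, and declaring everything as
an owl:ObjectProperty is an assumption that only holds if literal types are
boxed:

```python
def union_class(terms):
    """Render schema.org term names as a Turtle class expression."""
    names = sorted(terms)
    if len(names) == 1:
        # A single class needs no union wrapper.
        return f"schema:{names[0]}"
    members = " ".join(f"schema:{n}" for n in names)
    return f"[ a owl:Class ; owl:unionOf ( {members} ) ]"


def property_to_owl(name, domain_includes, range_includes):
    """Treat domainIncludes/rangeIncludes as union-class domain and range."""
    return (
        f"schema:{name} a owl:ObjectProperty ;\n"
        f"    rdfs:domain {union_class(domain_includes)} ;\n"
        f"    rdfs:range {union_class(range_includes)} ."
    )


if __name__ == "__main__":
    # Illustrative input only, not a dump of the real schema.
    print(property_to_owl("author", ["CreativeWork"], ["Person", "Organization"]))
```

The output still needs the usual prefix declarations for schema:, owl: and
rdfs: before a Turtle parser will accept it.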

This would also open the way to adding the equivalent of "interArgIsa"
axioms (if the subject of a property is an instance of C, the value must be
an instance of D).

That could reduce the number of situations where the ranges listed for a
property on a class's page include classes that don't make sense for that
class.
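
As a sketch of how the site generation code might use such axioms, assuming
they are available as a simple lookup table (the table names and the
Blog/author restriction below are invented for illustration, not taken from
schema.org):

```python
# Hypothetical global rangeIncludes, keyed by property name.
RANGE_INCLUDES = {"author": {"Person", "Organization"}}

# Hypothetical interArgIsa-style axioms:
# (subject class, property) -> classes the value must belong to.
INTER_ARG_ISA = {("Blog", "author"): {"Person"}}


def ranges_for_class(cls, prop):
    """Return the ranges worth listing for prop on the page for cls."""
    global_ranges = RANGE_INCLUDES[prop]
    restricted = INTER_ARG_ISA.get((cls, prop))
    if restricted is None:
        # No axiom for this class: fall back to the global list.
        return sorted(global_ranges)
    # Keep only the global ranges that the axiom allows.
    return sorted(global_ranges & restricted)


if __name__ == "__main__":
    print(ranges_for_class("Blog", "author"))
    print(ranges_for_class("Book", "author"))
```

This ignores subclass reasoning (an axiom on a superclass is not inherited
here), which a real implementation would probably want.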

It also gives a place to attach annotations with more specific comments
or descriptions than is now the case (the current property comments have to
cover all possible applications, some of which don't make sense when used
on a specific class).

Additionally, it would be possible to use cardinality constraints to state
that a certain property should not be used on certain types (mountain has
at most 0 fax numbers).
This would allow the per-class page generation code to avoid listing such
properties.
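
A minimal sketch of that filtering step, assuming the constraints have
already been extracted into a table (the tables below are a cut-down,
hand-written stand-in for the real schema, and the names are hypothetical):

```python
# Simplified stand-in data: properties declared directly on each class.
PROPERTIES_BY_CLASS = {"Place": ["faxNumber", "geo", "telephone"]}

# Ancestors of each class, nearest first.
SUPERCLASSES = {"Mountain": ["Landform", "Place", "Thing"]}

# Hypothetical max-cardinality constraints: (class, property) -> maximum.
MAX_CARDINALITY = {("Mountain", "faxNumber"): 0}


def listed_properties(cls):
    """Properties to show on the page for cls, own and inherited."""
    lineage = [cls] + SUPERCLASSES.get(cls, [])
    props = []
    for owner in lineage:
        for prop in PROPERTIES_BY_CLASS.get(owner, []):
            # Skip anything capped at zero on cls or one of its ancestors.
            excluded = any(MAX_CARDINALITY.get((c, prop)) == 0 for c in lineage)
            if not excluded and prop not in props:
                props.append(prop)
    return props


if __name__ == "__main__":
    print(listed_properties("Mountain"))
    print(listed_properties("Place"))
```

With this data the Mountain page drops faxNumber while the Place page keeps
it.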

All of these changes need only affect the site generation code.

Simon

Received on Monday, 3 November 2014 16:29:03 UTC