Re: `generator` in Schema.org?

On 29 September 2015 at 16:48, Martynas Jusevičius
<martynas@graphity.org> wrote:
> I think PROV covers more than schema.org can ever aim to. We cannot have a
> single monolithic super-ontology for all domains. Vocabulary reuse has been
> one of the pillars of the Semantic Web. I don't understand why people would
> want to push everything and anything into schema.org.

(Let's please not have another big thread about the philosophy of the
Semantic Web here; they are rarely useful. That said, a few comments:)

Schema.org doesn't want to push everything and anything into
schema.org - you can be sure of that!

It's a lot of work, for a start. And there is a *lot* you can say
about any imaginable topic. Nobody can cover everything.

15+ years ago in the RDFS spec, some of us wrote ...
http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#intro

"In RDF, all vocabularies are expressed within a single well defined
model. This allows for a finer grained mixing of machine-processable
vocabularies, and addresses the need to create metadata in which
statements can draw upon multiple vocabularies that are managed in a
decentralized fashion by independent communities of expertise."

I still stand by that. It is important to build upon notations like
JSON-LD, RDFa and (details aside) Microdata, so that different
vocabularies can share a common approach, and that diverse terms can
be combined within a common framework.
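
To make the "mixing vocabularies" point concrete, here is a minimal
sketch in Python of one JSON-LD description that draws on schema.org,
FOAF and Dublin Core terms at once. The person, the example.org URLs
and the vocabularies_used() helper are made up for illustration; only
the three namespace URIs and the term names are real.

```python
import json

# Hypothetical data: the person and the example.org URLs are invented.
doc = {
    "@context": {
        "schema": "http://schema.org/",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/people/alice",
    "@type": ["schema:Person", "foaf:Person"],
    "schema:name": "Alice Example",
    "foaf:homepage": {"@id": "http://example.org/"},
    "dcterms:description": "A person described with three vocabularies.",
}


def vocabularies_used(document):
    """Return the sorted prefixes (declared in @context) that the
    document's property names and @type values actually use."""
    context = document.get("@context", {})
    terms = [key for key in document if not key.startswith("@")]
    types = document.get("@type", [])
    if isinstance(types, str):
        types = [types]
    terms.extend(types)

    prefixes = set()
    for term in terms:
        prefix, sep, _ = term.partition(":")
        if sep and prefix in context:
            prefixes.add(prefix)
    return sorted(prefixes)


print(json.dumps(doc, indent=2))
print(vocabularies_used(doc))
```

The single @context is what lets independently managed vocabularies
sit side by side in one description; nothing in the format requires
them to have been designed together.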

Where I think we went a bit wrong in the RDF / Semantic Web community
was really a matter of granularity. Somehow we concluded that a
vocabulary of under 50 terms (e.g. Dublin Core, FOAF, SKOS) was roughly "big
enough", and that pieces of schema at that scale would be plugged
together by publishers. The mistake was to jump from "the technology
doesn't force us to carefully integrate our schemas" to "let's not
carefully integrate our schemas". The resulting chaotic diversity was
engaging for us as technologists, but pretty baffling for publishers,
webmasters and non-specialists. In contrast to earlier smallish RDF
schemas, Schema.org currently has around 639 types, 983 properties,
and 219 enumerated values.

Compared to FOAF + Dublin Core this might seem huge; but compared to
the enormity of information on the public Web (and beyond) this is
still a pretty small descriptive system, with many limitations in
expressivity.

As we said in http://blog.schema.org/2012/05/schemaorg-markup-for-external-lists.html
"The world is too rich, complex and interesting for a single schema to
describe fully on its own. With schema.org we aim to find a balance,
by providing a core schema that covers lots of situations, alongside
extension mechanisms for extra detail."

The recently refreshed extension mechanism (hosted + external, see
http://schema.org/docs/extension.html ) is part of our approach here.
There are also more recent developments like Wikimedia's Wikidata
effort, which may help the 'external enumerations' idea gain
traction.

As far as PROV goes, I understand that it can be pretty useful in
scientific and scholarly contexts where fine-grained precision is
needed, but it goes beyond the typical level of detail for a
schema.org vocabulary.

Let's take the technical discussion over into
https://github.com/schemaorg/schemaorg/issues/809 - one use case that
might be interesting is saying which tool generated a machine-readable
data feed (see the http://sdo-phobos.appspot.com/DataFeed type which
is queued for our next release).

cheers,

Dan

Received on Wednesday, 30 September 2015 11:45:52 UTC