Re: Extension Mechanism - Implementation Details from Dan Brickley on 2015-05-28 (public-schemaorg@w3.org from May 2015)

From: Dan Brickley <danbri@google.com>
Date: Thu, 28 May 2015 17:58:24 +0100
To: "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
Cc: "schema.org Mailing List" <public-schemaorg@w3.org>, Guha Guha <guha@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-ID: <CAK-qy=4tFv60WvbaUsWZJHt7kn6K_NhVPZXyMMdcGeMEzr_UYA@mail.gmail.com>
On 21 May 2015 at 10:08, martin.hepp@ebusiness-unibw.org
<martin.hepp@ebusiness-unibw.org> wrote:

> I have a few questions and recommendations regarding the schema.org extension mechanism.

> 1. Top-level notion of extensions
> =================================
> It is not yet fully clear to me whether the mechanism aims at being
>
> a) an umbrella for a largely decentralized set of vocabularies, or
> b) just a mechanism for partitioning the vocabulary in order to simplify the management of the codebase.

Closer to (b), with aspirations towards decentralization. (talking
about hosted/reviewed extensions here)

Currently the namespace remains flat, so a proposal of "Bank" from a
finance: extension, or "Bank" from a rivers: extension would be
competing to define the same term.

That gets us into familiar dynamics and tradeoffs; either broadening
the definitions where feasible (not in the case sketched here), or
else using more specific terms, even at the cost of adding verbosity.
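
To make the flat-namespace problem concrete, here is a minimal sketch of
how clashes like the "Bank" one could be spotted automatically. The
function name, the dictionary layout and the extension contents are all
hypothetical and only for illustration; this is not how the schema.org
codebase stores its terms.

```python
from collections import defaultdict


def find_name_clashes(extensions):
    """Return {term: [extension names]} for terms defined by more than one extension.

    `extensions` maps an extension name to the set of term names it proposes.
    """
    defined_by = defaultdict(set)
    for ext_name, terms in extensions.items():
        for term in terms:
            defined_by[term].add(ext_name)
    # Keep only the terms that two or more extensions try to define.
    return {
        term: sorted(exts)
        for term, exts in defined_by.items()
        if len(exts) > 1
    }


# Hypothetical extension contents (assumption, for illustration only).
extensions = {
    "finance": {"Bank", "Loan"},
    "rivers": {"Bank", "Tributary"},
}

print(find_name_clashes(extensions))  # {'Bank': ['finance', 'rivers']}
```

A report like this only flags the clash; deciding whether to broaden a
definition or pick a more specific name is still a human call.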

> I think that b) is more desirable, at least for reviewed extensions. In that case, users of schema.org in mark-up would not have to know whether a property comes from an extension or from core. Yet still, we could always trace which extension an element originates from, and we can automatically spot name clashes from different extensions.

Yes. Guha made a good analogy with HTML a while back: HTML doesn't
force publishers/webmasters to remember who proposed <table> or
<img> or <div> or <legend>. And since our vocabulary is now quite
large, that lesson is a good reminder of the value of doing a lot
within a single flat namespace, even though it can be both frustrating
and inelegant sometimes. We suffer so that publishers have a slightly
easier time - this seems fair!



> In that scenario, the main benefit of the extension mechanism will be to keep contributions in individual files in the codebase, which frees us from the problem of removing/adding individual lines scattered across the RDFa file of the core vocabulary. In particular, it becomes easier to try out contributions and remove them later on. In the traditional approach, it was very cumbersome to remove contributions at a later stage because they may be scattered across the entire RDFa file (in particular domain/range statements for existing elements). We will also have fewer merge conflicts.

I wouldn't worry so much about the codebase. We have in fact had the
capability to load data/*.rdfa files from several documents for many
months. The partitioning is more at a social level - we can have
sub-groups, collaborations, taskforces or however we term it. Having
these structures mirrored in the filesystem hierarchy is useful too,
but the main point is that it gives different groups of experts a
focal point for their collaboration. And it will also give some themed
entry points into the vocabulary documentation - publishers interested
in autos, bibliography, perhaps sports, medical/health, etc., will
have a natural starting point for site navigation.

> 2. Identifiers of elements from extensions in MARKUP
> ====================================================
> I think that, at least for reviewed extensions, we should have one flat namespace http://schema.org/<element_name> for types, properties, and individuals from both core AND extensions, in all syntaxes (Microdata, RDFa, and JSON-LD). Otherwise, we will make markup as complicated as it was in the days of RDFa 1.0. You would have to choose one vocabulary per entity / itemscope and switch between the simple version of a type (e.g. http://schema.org/Car) and the enhanced type (e.g. http://auto.schema.org/Car) depending on whether you need additional properties or you don't.

Yes, it is just http://schema.org/Car regardless, at some level.
Consumers should expect some confusion in the data, but we ought not
to force publishers to remember which extension is behind each term.
This may change as things move between extensions and the core, too.

JSON-LD does bring some additional expressivity, ... I think we can
make it so that e.g. auto: -centric markup can use an @context that
indicates auto.schema.org, even if the resulting triples do not. But
let's come back to that.

> This would add cognitive complexity and thus lots of errors in markup, in particular as we plan to extend types from schema.org with additional properties in extensions, i.e. there is likely overlap between the core and one or more extensions.


Yup

> 3. Redirects
> ============
> a) If a type or property or individual exists ONLY in one or more extensions, there should not be simply a 404 error when trying to dereference its URL from markup (i.e. the flat namespace).
>
> So if there was a type "Foo" in the extension http://foo.schema.org, i.e. http://foo.schema.org/Foo, an HTTP GET or HEAD request to http://schema.org/Foo should not simply return a 404 status code, but either
>
> - a 301 or 302 redirect to http://foo.schema.org/Foo (if only one extension defines it) or
> - a short overview page like
>
> "The element you are referencing is not part of schema.org core, but is defined in the following extensions:
>
> - http://foo.schema.org/Foo
> - http://acme.schema.org/Foo"


Yes, I made a first cut at something like this in the existing
codebase that shipped as v2. Let's follow up on the details in
Github...
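
As a starting point for that discussion, here is a minimal sketch of the
lookup Martin describes. It is not the code that shipped in v2; the
function name, the data layout and the choice of a 302 rather than a 301
are assumptions made for illustration.

```python
def resolve(term, core_terms, extension_terms):
    """Decide how a request for http://schema.org/<term> could be answered.

    core_terms:      set of term names defined in core.
    extension_terms: dict mapping an extension name to its set of term names.
    Returns a (kind, payload) pair.
    """
    if term in core_terms:
        # Core term: serve the normal term page.
        return ("core", f"http://schema.org/{term}")

    hosts = sorted(ext for ext, terms in extension_terms.items() if term in terms)
    urls = [f"http://{ext}.schema.org/{term}" for ext in hosts]

    if len(urls) == 1:
        # Exactly one extension defines it: redirect there (status is an assumption).
        return ("redirect-302", urls[0])
    if urls:
        # Several extensions define it: serve a short overview page listing them.
        return ("overview", urls)
    # Defined nowhere: a plain 404 is appropriate.
    return ("not-found-404", None)


# Hypothetical data matching the "Foo" example above.
core = {"Car"}
hosted = {"foo": {"Foo"}, "acme": {"Foo"}}

print(resolve("Foo", core, hosted))
# ('overview', ['http://acme.schema.org/Foo', 'http://foo.schema.org/Foo'])
```

The same lookup could also feed the "augmented in the following
extensions" note for terms that exist in core.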

cheers,

Dan

> We also have to check whether we understand the implications of overlapping definitions in more than one extension. In theory, two or more extensions could add conflicting statements to the same element from schema.org, e.g.
> - cycles of subClassOf or subPropertyOf statements or
> - clashing textual definitions.
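
The subClassOf case at least can be checked mechanically. Below is a
minimal sketch, assuming the statements from core and from every
extension have already been merged into (subtype, supertype) pairs; the
function name and the extension statements in the example are
hypothetical.

```python
def find_subclass_cycle(edges):
    """Return a list of types forming a subClassOf cycle, or None if there is none.

    `edges` is an iterable of (subtype, supertype) pairs.
    """
    graph = {}
    for sub, sup in edges:
        graph.setdefault(sub, []).append(sup)
        graph.setdefault(sup, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for parent in graph[node]:
            if colour[parent] == GREY:
                # Reached a type already on the current path: that is a cycle.
                return path[path.index(parent):] + [parent]
            if colour[parent] == WHITE:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


core = [("Car", "Vehicle")]
ext_a = [("Vehicle", "Product")]   # hypothetical extension statement
ext_b = [("Product", "Car")]       # hypothetical conflicting statement

print(find_subclass_cycle(core + ext_a))          # None
print(find_subclass_cycle(core + ext_a + ext_b))  # ['Car', 'Vehicle', 'Product', 'Car']
```

The same function works for subPropertyOf pairs. Clashing textual
definitions are a different matter and need human review.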
>
> b) If a type or property or individual is defined in schema.org core but EXTENDED in one or more extensions, there must be reasonable hints to this.
> I have doubts that the current mechanism is sufficient.
>
> A simple fix would be to return an overview page, like so
>
> "Note: The element you are referencing is augmented in the following extensions:
>
> - http://foo.schema.org/Foo
> - http://acme.schema.org/Foo"
>
> Better would be to try to list additional properties and subtypes (for types) and values (for properties) directly in the page in schema.org core (http://schema.org/Foo) and indicate that they are from extensions by a different color.
>
> 4. Textual definitions
> ======================
> What happens with the description of an element if it is updated by one or more extensions?
>
>
> Best wishes
>
> Martin
>
> -----------------------------------
> martin hepp  http://www.heppnetz.de
> mhepp@computer.org          @mfhepp
Received on Thursday, 28 May 2015 16:58:53 UTC