Re: Extension Mechanism - Implementation Details from martin.hepp@ebusiness-unibw.org on 2015-06-11 (public-schemaorg@w3.org from June 2015)

From: <martin.hepp@ebusiness-unibw.org>
Date: Thu, 11 Jun 2015 14:58:40 +0200
To: Masahide Kanzaki <mkanzaki@gmail.com>
Cc: Dan Brickley <danbri@google.com>, "schema.org Mailing List" <public-schemaorg@w3.org>, Guha Guha <guha@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-Id: <6E679763-EF2A-4BC4-AF99-ACEEE1287E1F@ebusiness-unibw.org>
Hi,
I think we need both alternatives, configured on a per-vocabulary basis:

There are extensions (like auto) where it would be burdensome to use different type URIs depending on whether you want to use properties from the extension or just from core.

But there will also be extensions where we cannot easily guarantee that they are internally aligned with the entire schema.org core. I fear that GS1 will be such a case. For those, I think that using http://<prefix>.schema.org/<type> or full URIs for properties from the extension seems like a better fit.
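
As a rough illustration of the per-vocabulary choice described above, here is a minimal Python sketch. The `Extension` class, its `flat` flag, and the `gs1` prefix are hypothetical names made up for this example; nothing here reflects how the schema.org codebase actually works.

```python
from dataclasses import dataclass

CORE = "http://schema.org/"


@dataclass(frozen=True)
class Extension:
    prefix: str  # hypothetical: "auto" stands for auto.schema.org
    flat: bool   # hypothetical switch: True means terms live in the core namespace


def term_uri(ext: Extension, name: str) -> str:
    """Return the URI a publisher would use in markup for a term of this extension."""
    if ext.flat:
        # Aligned extension: same identifier whether or not extension properties are used.
        return CORE + name
    # Extension not guaranteed to be aligned with core: keep its own namespace.
    return f"http://{ext.prefix}.schema.org/{name}"


auto = Extension("auto", flat=True)
gs1 = Extension("gs1", flat=False)  # assumed prefix, for illustration only

print(term_uri(auto, "Car"))
print(term_uri(gs1, "Product"))
```

The point of the sketch is only that the decision can be a property of the extension, so publishers never have to make it per term.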

Martin



> On 11 Jun 2015, at 04:22, KANZAKI Masahide <mkanzaki@gmail.com> wrote:
> 
> Hello, sorry for the late response.
> 
>>> 2. Identifiers of elements from extensions in MARKUP
>>> ====================================================
>>> I think that, at least for reviewed extensions, we should have one flat namespace http://schema.org/<element_name> for types, properties, and individuals from both core AND extensions, in all syntaxes (Microdata, RDFa, and JSON-LD). Otherwise, we will make markup as complicated as in RDFa 1.0 times. You would have to choose one vocabulary per entity / itemscope and switch between the simple version of a type (e.g. http://schema.org/Car) and the enhanced type (e.g. http://auto.schema.org/Car) depending on whether you need additional properties or you don't.
>> 
>> Yes, it is just http://schema.org/Car regardless, at some level.
>> Consumers should expect some confusion in the data, but we ought not
>> to force publishers to remember which extension is behind each term.
>> This may change as things move between extensions and the core, too.
> 
> hmm, Extension Mechanism page says
> [[
> Typically, a webpage / email will use only a single extension (e.g.,
> legal), in which case, instead of ‘schema.org’ they say
> ‘legal.schema.org’ and use all of the vocabulary in legal.schema.org
> and schema.org.
> ]]
> which means, if I understand correctly, we shall use
> http://auto.schema.org/ for both Car and other core terms, not
> http://schema.org/. Am I missing anything? (Though we can avoid the
> two-namespaces problem either way.)
> 
> And, so
>> Currently the namespace remains flat, so a proposal of "Bank" from a
>> finance: extension, or "Bank" from a rivers: extension would be
>> competing to define the same term.
> 
> sounds confusing, as per Extension Mechanism page
> [[
> Each reviewed extension (say, e1), gets its own chunk of schema.org
> namespace: e1.schema.org. .... Reviewed extensions are very different
> from proposals. A proposal, if accepted, with modifications could
> either go into the core or become a reviewed extension.
> ]]
> 
> Does the flat namespace above refer to reviewed extensions, or to proposals?
> 
> cheers,
> 
> 2015-05-29 1:58 GMT+09:00 Dan Brickley <danbri@google.com>:
>> On 21 May 2015 at 10:08, martin.hepp@ebusiness-unibw.org
>> <martin.hepp@ebusiness-unibw.org> wrote:
>> 
>>> I have a few questions and recommendations regarding the schema.org extension mechanism.
>> 
>>> 1. Top-level notion of extensions
>>> =================================
>>> It is not yet fully clear to me whether the mechanism aims at being
>>> 
>>> a) an umbrella for a largely decentralized set of vocabularies, or
>>> b) just a mechanism for partitioning the vocabulary in order to simplify the management of the codebase.
>> 
>> Closer to (b), with aspirations towards decentralization. (talking
>> about hosted/reviewed extensions here)
>> 
>> Currently the namespace remains flat, so a proposal of "Bank" from a
>> finance: extension, or "Bank" from a rivers: extension would be
>> competing to define the same term.
>> 
>> That gets us into familiar dynamics and tradeoffs; either broadening
>> the definitions where feasible (not in the case sketched here), or
>> else using more specific terms, even at the cost of adding verbosity.
>> 
>>> I think that b) is more desirable, at least for reviewed extensions. In that case, users of schema.org in mark-up would not have to know whether a property comes from an extension or from core. Yet still, we could always trace which extension an element originates from, and we can automatically spot name clashes from different extensions.
>> 
>> Yes. Guha made a good analogy with HTML a while back: HTML doesn't
>> force publishers/webmasters to remember who proposed <table> or
>> <image> or <div> or <legend>. And since our vocabulary is now quite
>> large, that lesson is a good reminder of the value of doing a lot
>> within a single flat namespace, even though it can be both frustrating
>> and inelegant sometimes. We suffer so that publishers have a slightly
>> easier time - this seems fair!
>> 
>> 
>> 
>>> In that scenario, the main benefit of the extension mechanism will be to keep contributions in individual files in the codebase, which frees us from the problem of removing/adding individual lines scattered across the RDFa file of the core vocabulary. In particular, it becomes easier to try and later on remove contributions. In the traditional approach, it was very cumbersome to remove contributions at a later stage because they may be scattered across the entire RDFa file (in particular domain/range statements for existing elements). We will also have fewer merge conflicts.
>> 
>> I wouldn't worry so much about the codebase. We have in fact had the
>> capability to load data/*.rdfa files from several documents for many
>> months. The partitioning is more at a social level - we can have
>> sub-groups, collaborations, taskforces or however we term it. Having
>> these structures mirrored in the filesystem hierarchy is useful too,
>> but the main point is that it gives different groups of experts a
>> focal point for their collaboration. And it will also give some themed
>> entry points into the vocabulary documentation - publishers interested
>> in autos, bibliography, perhaps sports, medical/health, etc., will
>> have a natural starting point for site navigation.
>> 
>>> 2. Identifiers of elements from extensions in MARKUP
>>> ====================================================
>>> I think that, at least for reviewed extensions, we should have one flat namespace http://schema.org/<element_name> for types, properties, and individuals from both core AND extensions, in all syntaxes (Microdata, RDFa, and JSON-LD). Otherwise, we will make markup as complicated as in RDFa 1.0 times. You would have to choose one vocabulary per entity / itemscope and switch between the simple version of a type (e.g. http://schema.org/Car) and the enhanced type (e.g. http://auto.schema.org/Car) depending on whether you need additional properties or you don't.
>> 
>> Yes, it is just http://schema.org/Car regardless, at some level.
>> Consumers should expect some confusion in the data, but we ought not
>> to force publishers to remember which extension is behind each term.
>> This may change as things move between extensions and the core, too.
>> 
>> JSON-LD does bring some additional expressivity, ... I think we can
>> make it so that e.g. auto: -centric markup can use an @context that
>> indicates auto.schema.org, even if the resulting triples do not. But
>> let's come back to that.
>> 
>>> This would add cognitive complexity and thus lots of errors in markup, in particular as we plan to extend types from schema.org with additional properties in extensions, i.e. there is likely overlap between the core and one or more extensions.
>> 
>> 
>> Yup
>> 
>>> 3. Redirects
>>> ============
>>> a) If a type or property or individual exists ONLY in one or more extensions, there should not be simply a 404 error when trying to dereference its URL from markup (i.e. the flat namespace).
>>> 
>>> So if there was a type "Foo" in the extension http://foo.schema.org, i.e. http://foo.schema.org/Foo, an HTTP GET or HEAD request to http://schema.org/Foo should not simply return a 404 status code, but either
>>> 
>>> - a 301 or 302 redirect to http://foo.schema.org/Foo (if only one extension defines it) or
>>> - a short overview page like
>>> 
>>> "The element you are referencing is not part of schema.org core, but is defined in the following extensions:
>>> 
>>> - http://foo.schema.org/Foo
>>> - http://acme.schema.org/Foo"
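
The lookup behaviour proposed in 3a can be sketched in Python as follows. This is only an illustration under stated assumptions: the term tables, the `resolve` function, and the choice of a 200 status for the overview page are all hypothetical, and the sketch does not describe what the shipped codebase does.

```python
# Hypothetical term tables; real data would come from the vocabulary files.
CORE_TERMS = {"Car", "Person"}
EXTENSION_TERMS = {
    "foo": {"Foo", "Bar"},
    "acme": {"Foo"},
}


def resolve(name):
    """Decide how a GET/HEAD on http://schema.org/<name> could be answered.

    Returns (status, payload): payload is a URI string, a list of URIs for
    an overview page, or None.
    """
    if name in CORE_TERMS:
        return 200, f"http://schema.org/{name}"
    homes = sorted(
        f"http://{ext}.schema.org/{name}"
        for ext, terms in EXTENSION_TERMS.items()
        if name in terms
    )
    if not homes:
        return 404, None       # unknown everywhere
    if len(homes) == 1:
        return 302, homes[0]   # exactly one extension defines it: redirect
    return 200, homes          # several extensions: overview page (status is an assumption)


print(resolve("Bar"))
print(resolve("Foo"))
print(resolve("Nope"))
```

Whether the single-extension case should be a 301 or a 302 is left open in the proposal; the sketch picks 302 arbitrarily.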
>> 
>> 
>> Yes, I made a first cut at something like this in the existing
>> codebase that shipped as v2. Let's follow up on the details in
>> Github...
>> 
>> cheers,
>> 
>> Dan
>> 
>>> We also have to check whether we understand the implications of overlapping definitions in more than one extension. In theory, two or more extensions could add conflicting statements to the same element from schema.org, e.g.
>>> - cycles of subClassOf or subPropertyOf statements or
>>> - clashing textual definitions.
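
One way to check for the first kind of conflict is to merge the subClassOf statements from core and all extensions and search the result for a cycle. The Python sketch below assumes the statements are available as (subtype, supertype) pairs; the `find_cycle` helper and the type names in the sample data are hypothetical.

```python
def find_cycle(edges):
    """Return a list of types forming a subClassOf cycle, or None if there is none.

    edges: iterable of (subtype, supertype) pairs merged from all sources.
    """
    graph = {}
    for sub, sup in edges:
        graph.setdefault(sub, []).append(sup)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {}

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge to a node on the current path: cycle found.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Made-up statements: each source is acyclic on its own, the union is not.
core = [("Car", "Vehicle")]
foo = [("Vehicle", "Conveyance")]
acme = [("Conveyance", "Car")]

print(find_cycle(core + foo))
print(find_cycle(core + foo + acme))
```

The same depth-first search would apply to subPropertyOf statements. Clashing textual definitions are a different problem and are not covered by this sketch.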
>>> 
>>> b) If a type or property or individual is defined in schema.org core but EXTENDED in one or more extensions, there must be reasonable hints to this.
>>> I have doubts that the current mechanism is sufficient.
>>> 
>>> A simple fix would be to return an overview page, like so
>>> 
>>> "Note: The element you are referencing is augmented in the following extensions:
>>> 
>>> - http://foo.schema.org/Foo
>>> - http://acme.schema.org/Foo"
>>> 
>>> Better would be to try to list additional properties and subtypes (for types) and values (for properties) directly in the page in schema.org core (http://schema.org/Foo) and indicate that they are from extensions by a different color.
>>> 
>>> 4. Textual definitions
>>> ======================
>>> What happens with the description of an element if it is updated by one or more extensions?
> 
> 
> -- 
> @prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
> "KANZAKI Masahide"; :nick "masaka"; :email "mkanzaki@gmail.com"].
Received on Thursday, 11 June 2015 12:59:15 UTC