Re: Proposal for Schema.org extension mechanism from Martin Hepp on 2015-03-24 (public-vocabs@w3.org from March 2015)

From: Martin Hepp <martin.hepp@unibw.de>
Date: Tue, 24 Mar 2015 09:57:43 +0100
To: Guha Guha <guha@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
Cc: John Walker <john.walker@semaku.com>, Sandro Hawke <sandro@w3.org>, Ralph Swick <swick@w3.org>, Tim Berners-Lee <timbl@w3.org>
Message-Id: <B8B0675B-A514-47F4-95C9-7742612EDFEC@unibw.de>
I would also like to support the proposal by looking at the resources
involved: Most people assume that developing Web ontologies / shared
schemas is relatively little work; after all, how much time does it take
to write the definitions for a couple of types and properties? But the
reality is that building Web ontologies requires

1. a very well-chosen conceptual model that represents "sweet spots" of
data structures that
- are non-trivial to reconstruct by a client from unstructured data,
- are easy to grasp by human developers from different cultural and
professional backgrounds from the label and short description alone,
- are "markup friendly", i.e. as simple as possible in RDFa and
Microdata, and
- can be reliably populated from existing data structures (e.g. match
typical distinctions / structures in back-end databases), and then

2. aligning that model well with the existing elements in other
vocabularies, namely schema.org (e.g. avoiding the growth of redundant
branches of functionality that already exist and avoiding name clashes),
and

3. writing documentation and examples.

Just an estimate: An extension proposal for schema.org of ca. 10 - 15
types plus 15 - 25 properties takes me roughly a year to design.
Maybe not full-time, but due to the many discussions and stakeholders, it
is really some effort over that period of time.

Now this is only the initial proposal. On the side of the sponsors of
schema.org and other consuming clients, you have to review #1 - #3. In
particular, checking that an extension proposal is technically sound with
regard to all subtleties of the schema.org ecosystem, and that it
masters #1 well, is really time-consuming. And it is difficult for
domain experts who do it for the first time to get #1 and #2 right.

Extensions beyond ca. 10-20 new types require a really substantial
amount of resources from both the external proposers and the sponsors of
schema.org.

This is why I think we urgently need such a mechanism to tap the
potential of the many, many interesting schemas and standards out there
for publishing more structured data on the Web without the need to
channel those through the social and technical process of getting into
schema.org core.


Martin
--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  martin.hepp@unibw.de
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp





On 20 Mar 2015, at 16:40, Guha <guha@google.com> wrote:

> This is a very important point you bring up, one that goes to the
> heart of a lot of schema.org decisions.
>
> I do agree that it is much 'cleaner' for a term to have a single
> namespace. The cost of this is that the webmaster needs to keep track of
> namespaces. We estimate on the order of 100s or 1000s of vocabulary
> creators. We already have millions of webmasters using schema.org. Most
> applications will use only a single extension, which means that under
> the proposed scheme, they don't have to worry about namespaces.
>
> Mixing and matching is of course always open (and welcome). More
> technical webmasters will do that. We just don't want it to be a
> requirement to start participating ...
>
> Thanks for the comments.
>
> guha
>
> On Fri, Mar 20, 2015 at 1:21 AM, John Walker <john.walker@semaku.com> wrote:
> Hi Guha,
>
> I have a few questions/thoughts around the proposal that every item
> in the core would also be in every extension.
>
> Would this apply to the reviewed extensions only, or also to
> external extensions?
>
> I can understand that only using terms from a single prefix lowers the
> bar for getting started, but I don't think it's too tricky to get your
> head round using multiple prefixes in any of the syntaxes (although some
> are easier than others).
> IMHO it would be simpler and more understandable to have a single
> identifier (URL/URI/IRI) for each term/item rather than multiple
> aliases.
> (This however would not preclude that two different extensions might
> have a different term/item for a very similar concept and hence each has
> its own identifier.)
>
> Also I expect many practical use cases where users need to mix'n'match
> terms from different extensions.
> For example the GS1 extension would have many terms for general use
> and hopefully enough to cover some specific domains like food and
> beverage, but may not fully cover other domains like consumer
> electronics.
> Admittedly this will not be needed in all cases, but I think there are
> enough to warrant giving it some deep thought (i.e. it is far from a
> corner case).
>
> Regards,
>
> John Walker
> Principal Consultant & co-founder
> Semaku B.V.
> SFJ 4.009, Torenallee 20, 5617 BC Eindhoven
> Mobile: +31 6 475 22030
> Email: john.walker@semaku.com
> Skype: jaw111
>
> KvK: 58031405
> BTW: NL852842156B01
> IBAN: NL94 INGB 0008 3219 95
>
>
>> On March 20, 2015 at 12:36 AM Guha <guha@google.com> wrote:
>>
>> The various discussions around this extension proposal seem to have
>> reached quiescence. I am hoping this is more because the questions were
>> answered than because of boredom.
>>
>> We would like to proceed with the implementation of this proposal. If
>> there are strong objections, now would be the right time to raise them.
>>
>> guha
>>
>> On Fri, Feb 13, 2015 at 1:34 PM, Guha <guha@google.com> wrote:
>>
>> Schema.org extension mechanism
>>
>>
>> Motivation
>> As schema.org adoption has grown, a number of groups with more
>> specialized vocabularies have expressed interest in extending schema.org
>> with their terms. The most prominent example of this is GS1 with product
>> vocabularies. Other examples include real estate, medical and
>> bibliographic information. Even in something as common as human names,
>> there are groups interested in creating the vocabulary for representing
>> all the intricacies of names.
>>
>> Outline of solution
>>
>> There are two kinds of extensions: reviewed extensions and external
>> extensions. Both kinds of extensions typically add subclasses and
>> properties to the core. Properties may be added to existing and/or new
>> classes. More generally, they are an overlay on top of the core, and so
>> they may add domains/ranges, superclasses, etc. as well. Extensions have
>> to be consistent with the core schema.org. Every item in the core (i.e.,
>> www.schema.org) is also in every extension. Extensions might overlap
>> with each other in concepts (e.g., two extensions defining terms for
>> financial institutions, one calling it FinancialBank and the other
>> calling it FinancialInstitution), but we should not have the same term
>> being reused to mean something completely different (e.g., we should not
>> have two extensions, one using Bank to mean river bank and the other
>> using Bank to mean financial institution).
>>
>> Reviewed Extensions
>> Each reviewed extension (say, e1) gets its own chunk of schema.org
>> namespace: e1.schema.org. The items in that extension are created and
>> maintained by the creators of that extension. Reviewed extensions are
>> very different from proposals. A proposal, if accepted, could (with
>> modifications) either go into the core or become a reviewed extension.
>>
>> A reviewed extension is something that has been looked at and
>> discussed by the community, albeit not as much as something in the core.
>> We also expect a reviewed extension to have strong community support,
>> preferably in the form of a few deployments.
>>
>> External Extensions
>> Sometimes there might be a need for a third party (such as an app
>> developer) to create extensions specific to their application. For
>> example, Pinterest might want to extend the schema.org concept of
>> ‘Sharing’ with ‘Pinning’. In such a case, they can create
>> schema.pinterest.com and put up their extensions, specifying how it
>> links with core schema.org. We will refer to these as external
>> extensions.
>>
>> How it works for webmasters
>> All of Schema.org core and all of the reviewed extensions will be
>> available from the schema.org website. Each extension will be linked to
>> from each of the touch points it has with the core. So, if an extension
>> (say, having to do with Legal stuff) creates
>> legal.schema.org/LegalPerson which is a subclass of schema.org/Person,
>> the Person will link to LegalPerson. Typically, a webpage / email will
>> use only a single extension (e.g., legal), in which case, instead of
>> ‘schema.org’ they say ‘legal.schema.org’ and use all of the
>> vocabulary in legal.schema.org and schema.org.
>>
>> As appropriate, the main schema.org site will also link to relevant
>> external extensions. With external extensions, the use of multiple
>> namespaces is unavoidable.
>>
>> What does someone creating an extension need to do
>> We would like extension creators to not have to worry about running
>> a website for their extension. Once the extension is approved, they
>> simply upload a file with their extension into a certain directory on
>> github. Changes are made through the same mechanism.
>>
>> Since the source code for schema.org is publicly available, we
>> encourage creators of external extensions to use the same application.
>>
>> Examples
>>
>> Archives example in RDFa
>>
>> This example uses a type that makes sense for archival and
>> bibliographic applications but which is not currently in the schema.org
>> core: Microform, defined as "Any form, either film or paper, containing
>> microreproductions of documents for transmission, storage, reading, and
>> printing. (Microfilm, microfiche, microcards, etc.)"
>>
>> The extension type is taken from http://bibliograph.net/Microform
>> (which on this proposed model would move to bib.schema.org).
>> bibliograph.net is a version of the open-source schema.org codebase that
>> overlays bibliographic extras onto the core schema.org types. The example
>> is adapted from http://schema.org/workExample.
>>
>>
>> <div vocab="http://bib.schema.org/">
>>    <p typeof="Book" resource="http://www.freebase.com/m/0h35m">
>>        <em property="name">The Fellowship of the Rings</em> was written by
>>        <span property="author">J.R.R Tolkien</span> and was originally published
>>        in the <span property="publisher" typeof="Organization">
>>            <span property="location">United Kingdom</span> by
>>            <span property="name">George Allen & Unwin</span>
>>        </span> in <time property="datePublished">1954</time>.
>>        The book has been republished many times, including editions by
>>        <span property="workExample" typeof="Book">
>>            <span property="publisher" typeof="Organization">
>>                <span property="name">HarperCollins</span>
>>            </span> in <time property="datePublished">1974</time>
>>            (ISBN: <span property="isbn">0007149212</span>)
>>        </span> and by
>>        <span property="workExample" typeof="Book Microform">
>>            <span property="publisher" typeof="Organization">
>>                <span property="name">Microfiche Press</span>
>>            </span> in <time property="datePublished">2016</time>
>>            (ISBN: <span property="isbn">12341234</span>).
>>        </span>
>>    </p>
>> </div>
>>
>> Alternative RDFa:
>>
>> The example above puts all data into the extension namespace.
>> Although this can be mapped back into normal schema.org, it puts more
>> work onto consumers. Here is how it would look using multiple
>> vocabularies:
>>
>> <div vocab="http://schema.org/" prefix="bib: http://bib.schema.org/">
>>    <p typeof="Book" resource="http://www.freebase.com/m/0h35m">
>>        <em property="name">The Fellowship of the Rings</em> was written by
>>        <span property="author">J.R.R Tolkien</span> and was originally published
>>        in the <span property="publisher" typeof="Organization">
>>            <span property="location">United Kingdom</span> by
>>            <span property="name">George Allen & Unwin</span>
>>        </span> in <time property="datePublished">1954</time>.
>>        The book has been republished many times, including editions by
>>        <span property="workExample" typeof="Book">
>>            <span property="publisher" typeof="Organization">
>>                <span property="name">HarperCollins</span>
>>            </span> in <time property="datePublished">1974</time>
>>            (ISBN: <span property="isbn">0007149212</span>)
>>        </span> and by
>>        <span property="workExample" typeof="Book bib:Microform">
>>            <span property="publisher" typeof="Organization">
>>                <span property="name">Microfiche Press</span>
>>            </span> in <time property="datePublished">2016</time>
>>            (ISBN: <span property="isbn">12341234</span>).
>>        </span>
>>    </p>
>> </div>
>>
>> Here is that last approach written in JSON-LD (it works today, but
>> would be even more concise if the schema.org JSON-LD context file was
>> updated to declare the 'bib' extension):
>>
>> <script type="application/ld+json">
>> {
>>  "@context": [ "http://schema.org/",
>>       { "bib": "http://bib.schema.org/" } ],
>>  "@id": "http://www.freebase.com/m/0h35m",
>>  "@type": "Book",
>>  "name": "The Fellowship of the Rings",
>>  "author": "J.R.R Tolkien",
>>  "publisher": {
>>     "@type": "Organization",
>>     "location": "United Kingdom",
>>     "name": "George Allen & Unwin"
>>  },
>>  "datePublished": "1954",
>>  "workExample": [
>>    {
>>      "@type": "Book",
>>      "publisher": { "@type": "Organization", "name": "HarperCollins" },
>>      "datePublished": "1974",
>>      "isbn": "0007149212"
>>    },
>>    {
>>      "@type": ["Book", "bib:Microform"],
>>      "publisher": { "@type": "Organization", "name": "Microfiche Press" },
>>      "datePublished": "2016",
>>      "isbn": "12341234"
>>    }
>>  ]
>> }
>> </script>
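A minimal sketch, in Python, of the mapping back into core schema.org that the proposal mentions for data published entirely in an extension namespace. It assumes the proposal's rule that every item in the core is also in every extension, so http://bib.schema.org/Book and http://schema.org/Book would name the same thing. CORE_TERMS and normalize_term are hypothetical names, and the term set is a hand-picked subset for illustration only; a real consumer would need the full core vocabulary.

```python
from urllib.parse import urlsplit

# Assumption: a tiny, hand-picked subset of core terms, for illustration only.
CORE_TERMS = {
    "Book", "Organization", "name", "author", "publisher",
    "location", "datePublished", "workExample", "isbn",
}

CORE_NS = "http://schema.org/"


def normalize_term(iri, core_terms=CORE_TERMS):
    """Rewrite <ext>.schema.org/<Term> to schema.org/<Term> for core terms.

    Terms that exist only in the extension (e.g. Microform) are left alone.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    term = parts.path.lstrip("/")
    if host.endswith(".schema.org") and term in core_terms:
        return CORE_NS + term
    return iri


if __name__ == "__main__":
    for iri in ("http://bib.schema.org/Book", "http://bib.schema.org/Microform"):
        print(iri, "->", normalize_term(iri))
```

The lookup against a list of core terms is the extra work for consumers that the proposal alludes to: without it, a client cannot tell whether a term in the extension namespace is an alias for a core term or a genuinely new one.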
>>
>>
>> GS1 Example
>>
>> <script type="application/ld+json">
>> {
>>    "@context": "http://schema.org/",
>>    "@vocab": "http://gs1.schema.org/",
>>    "@id": "http://id.manufacturer.com/gtin/05011476100885",
>>    "gtin13": "5011476100885",
>>    "@type": "TradeItem",
>>    "tradeItemDescription": "Deliciously crunchy Os, packed with 4 whole grains. Say Yes to Cheerios",
>>    "healthClaimDescription": "8 Vitamins & Iron, Source of Calcium & High in Fibre",
>>    "hasAllergenRelatedInformation": {
>>        "@type": "gs1:AllergenRelatedInformation",
>>        "allergenStatement": "May contain nut traces"
>>    },
>>    "hasIngredients": {
>>        "@type": "gs1:FoodAndBeverageIngredient",
>>        "hasIngredientDetail": [
>>            {
>>                "@type": "Ingredient",
>>                "ingredientseq": "1",
>>                "ingredientname": "Cereal Grains",
>>                "ingredientpercentage": "77.5"
>>            },
>>            {
>>                "@type": "Ingredient",
>>                "ingredientseq": "2",
>>                "ingredientname": "Whole Grain OATS",
>>                "ingredientpercentage": "38.0"
>>            }
>>      ]
>>    },
>>    "nutrientBasisQuantity": {
>>        "@type": "Measurement",
>>        "value": "100",
>>        "unit": "GRM"
>>    },
>>    "energyPerNutrientBasis": [
>>        {
>>            "@type": "Measurement",
>>            "value": "1615",
>>            "unit": "KJO"
>>        },
>>        {
>>            "@type": "Measurement",
>>            "value": "382",
>>            "unit": "E14"
>>        }
>>    ],
>>    "proteinPerNutrientBasis": {
>>        "@type": "Measurement",
>>        "value": "8.6",
>>        "unit": "GRM"
>>    }
>> }
>>
>> </script>
>>
>> This example shows a possible encoding of the GS1 schemas overlaid
>> onto schema.org. It uses JSON-LD syntax, which would support several
>> variations on this approach. It is based on examples from GS1's proposal
>> circulated to the schema.org community recently
>> (https://lists.w3.org/Archives/Public/public-vocabs/2015Jan/0069.html).
>> Instead of writing
>>    "@context": "http://schema.org/",   "@vocab": "http://gs1.schema.org/",
>> it would be possible to simply write "@context": "http://gs1.schema.org/".
>>
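How a consumer might resolve the short names used in the examples (such as Book, bib:Microform or TradeItem) to full identifiers, given a default vocabulary and a prefix map, can be sketched roughly as follows. This is a deliberate simplification and not the full RDFa or JSON-LD term-expansion algorithm; expand_term is a hypothetical helper name, and the vocabulary URLs are the ones proposed in this thread.

```python
def expand_term(term, vocab, prefixes):
    """Expand a compact name to a full IRI (simplified sketch).

    - Absolute IRIs are returned unchanged.
    - "prefix:Local" is expanded when the prefix is declared.
    - Anything else is resolved against the default vocabulary.
    """
    if "://" in term:
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return vocab + term


if __name__ == "__main__":
    bib = {"bib": "http://bib.schema.org/"}
    # Multiple-vocabulary style: core as default, extension via prefix.
    print(expand_term("Book", "http://schema.org/", bib))
    print(expand_term("bib:Microform", "http://schema.org/", bib))
    # Single-namespace style: the extension is the default vocabulary.
    print(expand_term("TradeItem", "http://gs1.schema.org/", {}))
```

In the single-namespace style the webmaster never writes a prefix, which is the simplification the proposal aims for; in the multiple-vocabulary style each term keeps one identifier, at the cost of declaring prefixes.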
>
Received on Tuesday, 24 March 2015 08:58:15 UTC