RE: Proposal for Schema.org extension mechanism from McBennett, Pat on 2015-03-27 (public-vocabs@w3.org from March 2015)

From: McBennett, Pat <McBennettP@DNB.com>
Date: Fri, 27 Mar 2015 06:55:08 -0500
To: "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
CC: W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-ID: <52EE3F4A5E7F194A963FE14B2DDBDBFE3C0A8D866F@DNBEXCH01.dnbint.net>
@Martin - basically, in a word - wow!

I think your response may have saved me months (maybe years!?) of futile attempts to find, or help define, authoritative identifiers for reference data using Linked Data - starting with one of the most basic ones, identifiers for countries.

The description of the frustrations you've encountered is disheartening, and it's certainly familiar to me from the corporate world too.

But it just seems to strengthen my initial suggestion that possibly Schema.org would be the ideal mechanism to bridge the gaps you describe. Isn't Schema.org already a de facto 'authority' for many things on the web? If the 'real' authorities for things like country identifiers (e.g. the ISO or the United Nations or the CIA World Factbook even [1]) simply won't (or can't) provide LOD-style identifiers for things they claim authority over, then why can't Schema.org simply step in and do it instead? 
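
To make the two options concrete, here is a minimal sketch in Python (an illustration only, not anything Schema.org publishes today). It builds the same postal address twice: once with the country as a plain ISO alpha-3 string, and once with a country URI. The base `http://example.org/country/alpha-3/` and both function names are hypothetical, chosen purely for this example.

```python
import json

# Hypothetical base URI: no authority actually publishes country URIs here.
HYPOTHETICAL_BASE = "http://example.org/country/alpha-3/"


def address_with_literal(code: str) -> dict:
    """Identify the country by its ISO alpha-3 code as a plain string."""
    return {
        "@context": "http://schema.org",
        "@type": "PostalAddress",
        "addressCountry": code,
    }


def address_with_uri(code: str) -> dict:
    """Identify the country by a (hypothetical) dereferenceable URI."""
    return {
        "@context": "http://schema.org",
        "@type": "PostalAddress",
        "addressCountry": {
            "@type": "Country",
            "@id": HYPOTHETICAL_BASE + code,
        },
    }


literal = address_with_literal("IRL")
linked = address_with_uri("IRL")

print(json.dumps(literal, indent=2))
print(json.dumps(linked, indent=2))
```

The literal form depends only on the published code list; the URI form additionally depends on whoever owns the base address keeping it alive.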

I don't think Wikipedia/DBPedia can do this, since the lack of curation dissolves any claim of 'authority'. I don't think CEDS can do this (no offense Jim :) ), since I'd never heard of it before yesterday, and its focus is education, so it'll never be recognised as an authority for country identifiers. In fact, I see Schema.org as being the *only* viable alternative today to the 'real' authorities for stepping into this space for defining LOD identifiers for common reference data. They already have the 'authority' (by virtue of 25% of the entire web already being marked up with Schema.org [2]), they are curated, and they are genuinely in the public domain licensing and copyright-wise.

So yes, Schema.org would have the burden of keeping their LOD identifiers in-sync with the ISO or UN or whoever the starting point is as changes occur, but possibly only until the ISO and UN finally provide LOD identifiers themselves, or until Schema.org becomes even more 'authoritative' than them in the modern Internet-driven, URI-based world (since everyone on the Web (i.e. *everyone*) will be using the Schema.org identifiers in all their systems anyway)!
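
The sync burden itself could be checked mechanically. A rough sketch, assuming the authoritative codes are available as a plain list (for example, parsed from a published CSV) and that URIs were minted under a hypothetical base address; `sync_report`, the base URI, and the sample data (including the made-up code `ZZZ`) are all illustrative assumptions:

```python
def sync_report(authoritative_codes, minted_uris, base):
    """Compare an authoritative code list with URIs that were already minted.

    Returns the URIs that still need to be minted ("missing") and the
    minted URIs whose code no longer appears upstream ("stale").
    """
    expected = {base + code for code in authoritative_codes}
    minted = set(minted_uris)
    return {
        "missing": sorted(expected - minted),
        "stale": sorted(minted - expected),
    }


# Hypothetical base URI and sample data, for illustration only.
BASE = "http://example.org/country/alpha-3/"
upstream = ["IRL", "FRA", "USA"]
minted = [BASE + "IRL", BASE + "FRA", BASE + "ZZZ"]  # ZZZ is a made-up code

report = sync_report(upstream, minted, BASE)
print(report)
```

A stale entry is the harder case: the URI may already be embedded in other people's data, so it can be deprecated but not simply deleted.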

And Martin - if you already have '...a thoroughly constructed RDF conversion of the UNSPSC', then why not contribute it to Schema.org as an extension to get things started? Or if they don't want it, I'll take it :)

It just seems so blindingly obvious to me that I can only assume I'm missing something very big here...?

Pat.

[1] - https://www.cia.gov/library/publications/the-world-factbook/
[2] - I can't find a reference to back this up now, but Markus Lanthaler told me it was an officially released Google statistic from last year.


> -----Original Message-----
> From: martin.hepp@ebusiness-unibw.org [mailto:martin.hepp@ebusiness-unibw.org]
> Sent: 26 March 2015 17:35
> To: McBennett, Pat
> Cc: W3C Web Schemas Task Force
> Subject: Re: Proposal for Schema.org extension mechanism
> 
> Dear Pat:
> 
> I have tried for almost a decade to get permission from the United Nations (*) to
> publish a thoroughly constructed RDF conversion of the UNSPSC, so I can feel
> your pain.
> The problem you describe is at the core of many attempts at re-using existing
> standards for the Web of Data: Most standards are subject to copyright
> protection in one way or the other, so to be on the safe side, you need their
> creators' permission. Also, the standards are evolving, thus you need to keep
> your variant in sync with the official standard.
> 
> The permission is hard to get, because the relevant bodies typically do not sign
> off liberal copyright licenses easily, and they do not have the budget or do not
> see the benefit in paying a lawyer to evaluate the feasibility (note that they must
> also check whether they have sufficient rights themselves, so they cannot easily
> grant a CC license).
> 
> Often branding and trademark protection, and existing business models, are a
> problem, too.
> 
> In a nutshell, this is why I suggest using string literals in lieu of URLs for existing
> standards. Referring to a string precisely defined in an external standard is as
> reliable as using a URI, and while it is not "Linked Open Data"-style and you
> cannot easily get a description by HTTP, you eliminate all the legal and technical
> hassles of republishing a standard as Linked Open Data.
> 
> Also, I think that a badly implemented Linked Open Data variant of a standard is
> worse than the authoritative string from the original standard.
> 
> By 'badly implemented' I mean e.g. that the LOD version is not in sync with the
> latest version of the standard, or that the owner of the domain loses interest
> or goes bankrupt, with the consequence that the shiny URIs start rotting in the
> sunlight and are difficult to eliminate from data and applications.
> 
> Martin
> 
> (*) Actually I gave up after five years ;-)
> (**) Instead we publish a tool to regenerate the RDF transcripts locally in a
> canonical form, see http://wiki.goodrelations-vocabulary.org/Tools/PCS2OWL
> 
> -------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
> 
> e-mail:  martin.hepp@unibw.de
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>          http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
> 
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/
> 
> 
> 
> 
> On 26 Mar 2015, at 18:04, McBennett, Pat <McBennettP@DNB.com> wrote:
> 
> > I totally agree with Martin Hepp's comments. I've recently begun exactly the
> > process Martin describes (i.e. defining 'Web ontologies / shared schemas'), and
> > already I'm finding all 3 of his points are spot-on.
> >
> > But I'd like to ask Martin - what form of mechanism does he think could work
> > for '...tapping into the potential of the many, many interesting schemas and
> > standards out there [...] without the need to channel those through the social
> > and technical process of getting into schema.org core'?
> >
> > As a very simple example - I'm currently trying to find an existing RDF schema
> > or standard for International Country Codes, but one which is 'authoritative'.
> > ISO was an obvious place to start, so I asked them if they could provide these
> > codes as RDF (I can see that they currently provide them as CSV, XML or XLS [1]).
> > Their response:
> >
> > Dear Pat,
> >
> > We do not produce any RDF formats, I am sorry.
> >
> > Regards
> >
> > So that means although there are ISO country codes in the public domain (e.g.
> > IRL, or FRA, or USA), and of course I can use those codes freely, there are no
> > 'official' URIs out there for those codes (that I'm aware of) - i.e. there is no
> > 'http://www.iso.org/country/alpha-3/IRL' for Ireland. So unless I can persuade
> > the ISO to mint these URIs for 'their' country codes (which I would see as ideal,
> > since they are a recognised authority, but it seems unlikely in the short term),
> > what mechanism do I have to use standardised, authoritative (i.e. as opposed to
> > crowdsourced Wikipedia (or DBPedia)) URI identifiers for countries in my internal
> > datasets? I could mint my own URIs for these country codes under my
> > company's domain name, but that's hardly appropriate as we've no interest in
> > being an authority on country code identifiers (and we'd have the maintenance
> > overhead of trying to keep them in-sync with the 'real' ISO codes).
> >
> > Which is why I would have thought an extension to Schema.org might offer a
> > good opportunity for this (since Schema.org has already become the de facto
> > authority for lots of things!). But am I just being naïve somehow...?
> >
> > Regards,
> >
> > Pat.
> >
> >
> > [1] - http://www.iso.org/iso/country_codes.htm
> >
> >
> >
> >
> > Pat McBennett
> > Architect
> > The Chase Building, 5th Floor
> > Carmanhall Road, Sandyford,
> > Dublin 18, Ireland
> > Direct +353 1
> > Mobile +353 8
> >
> > http://www.dnb.co.uk/
> >
Received on Friday, 27 March 2015 11:55:38 UTC