Re: Proposal for Schema.org extension mechanism from Kingsley Idehen on 2015-02-15 (public-vocabs@w3.org from February 2015)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 15 Feb 2015 14:48:32 -0500
To: public-vocabs@w3.org
Message-ID: <54E0F810.5010403@openlinksw.com>
On 2/15/15 12:19 PM, Dan Brickley wrote:
> On 15 February 2015 at 14:47, Bo Ferri <zazi@smiy.org> wrote:
>
>> (sorry, I can't resist ;) )
>>
>> interesting and neat idea. Nevertheless, nothing new at all (you know this
>> better than me ;) ). So how does this relate to the already existing, open
>> approach of "simply" publishing an ontology/vocabulary with a PURL and
>> making use of it? Do we really need everything under the (corporate (?))
>> schema.org umbrella? You know* that the "one vocabulary to rule them all"
>> approach (even with an extension mechanism) doesn't scale and can't make
>> every domain (and every webmaster who has to apply it) happy (this is the
>> world out there).
> It would be a great thing if there were 100s or 1000s of RDF-based
> vocabularies out there, with lots of publishers and consumers making
> use of them all.

There are, as exemplified by the likes of LOV [1] and lots of "deep web" 
content constructed using RDF (which includes all the documents 
constructed using terms from schema.org plus those from the massive 
Linked Open Data cloud).

Didn't you really mean to say: vocabularies used *specifically* by Web 
Masters and HTML+Javascript developers?

>   Schema.org is a means to an end rather than an end in
> itself - a practical project to help bootstrap this whole thing out of
> the slow motion progress we've been making these last ~17+ years.

It's an addition to the mix. A very good one at that, with very good 
results.

As you know, we have to put things into perspective, i.e., the Web 
isn't a "zero sum" affair. Over the last 17 years a lot has been 
contributed, and the visibility of these contributions depends very much 
on the context-lenses through which they are viewed. For instance, from 
my vantage point, we have:

1. Massive Linked Open Data cloud -- oriented towards those that publish 
and consume so-called 5-Star Linked Open Data.
2. Massive Linked Data cloud -- which adds content from Web Masters and 
HTML+Javascript developers to the Linked Open Data Cloud.

Net effect, we have a massive Web of Linked Data comprised of relations 
that have varying degrees of semantic fidelity i.e., the degree to which 
different kinds of user agents (human and/or machines) are able to 
comprehend the nature of relations (associations, attributes, 
properties) connecting two things -- where each relationship 
participant, including the relationship type (relation) itself, is 
identified by an HTTP URI.
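To make the relation model above concrete, here is a minimal Python sketch (an editorial illustration, not OpenLink code) of one such sentence: a subject, a relation type, and an object, each identified by an HTTP URI and written out in N-Triples notation. The example.org identifiers and the names Relation, is_http_uri and to_ntriples are hypothetical; only http://schema.org/knows is a real schema.org term.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class Relation(NamedTuple):
    """One sentence: subject, predicate (relation type), object, each an HTTP URI."""
    subject: str
    predicate: str
    obj: str


def is_http_uri(value: str) -> bool:
    # A participant qualifies only if it is an absolute http(s) URI with a host.
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_ntriples(relation: Relation) -> str:
    # Every participant, including the relation type itself, must be an HTTP URI.
    for term in relation:
        if not is_http_uri(term):
            raise ValueError(f"not an HTTP URI: {term!r}")
    s, p, o = relation
    return f"<{s}> <{p}> <{o}> ."


# Hypothetical identifiers for illustration only.
alice = "http://example.org/people/alice#this"
bob = "http://example.org/people/bob#this"
print(to_ntriples(Relation(alice, "http://schema.org/knows", bob)))
# <http://example.org/people/alice#this> <http://schema.org/knows> <http://example.org/people/bob#this> .
```

A plain string such as "alice" is rejected, which mirrors the point that string identifiers only make sense inside a silo.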

>
> What we in the RDF community have seen since work began on
> http://www.w3.org/TR/rdf-schema/ in 1997, is that while it is great to
> have the option of an entirely decentralized composition mechanism,
> there are also very practical costs for vocabularies being so weakly
> coordinated.

Yes!

>
>
> Triples/graphs are not the easiest things to work with at the best of
> times.

Only when they aren't understood. "Graphs" and "triples" are 
colloquialisms that should be at the back door of this narrative. We have 
sentences (content) and documents being enhanced via the use of 
hyperlinks. The fact that one can use a graph to represent the nature of 
a sentence and/or a set of sentences that share a common predicate is 
completely lost in this colloquial use of "graph" and the implicit 
triangulation to "graph theory". This is just as bad as the "strings for 
things" slogan, which sounds nice but really makes little or no sense, 
since the real issue is all about moving from identifying entities using 
string identifiers (which can only be interpreted in some kind of silo) 
to reference identifiers (which, by way of HTTP-based hyperlinks, can be 
interpreted globally via the ubiquitous World Wide Web).

The real minus of the last 17 years of RDF is the fact that the first 
5-10 years were built around very poor narratives. On a good day, to 
most, you had draconian gobbledygook (sorry, but I have no other word 
for it) thanks to RDF/XML. This was exacerbated by the time it took 
to move from RDF/XML alone to the notion of varied notations for 
creating RDF document content (approximately 13 years of self-inflicted 
wounds on the marketing and messaging fronts).

>   The data model is so permissively flexible that creating
> applications against it is difficult.

"Data Model" is part of the problem. RDF is better understood as a 
Language [1]. The "Data Model" notion comes from a realm (i.e., SQL 
RDBMS) that has its own problems (conceptually and technically) which 
are now bubbling to the surface.

You can't do anything with something you don't understand. RDF was well 
designed but atrociously described and promoted by the W3C.


> These difficulties were in some
> situations (e.g. web search, schema.org's origins) made worse by the
> chaotic state of the vocabulary environment.

The chaos comes from the confusion that swirls around the so-called "data 
model" and "syntaxes".

If one speaks about RDF as a Language i.e., a system of signs, syntax, 
and role semantics, for encoding and decoding information [data in 
context], the artificial confusion dissipates [2][3].

>
> The schema.org extensions discussion I think makes clear that there is
> a spectrum here. At one extreme are vocabularies that are developed without
> any communication or coordination whatsoever. Less extreme is some
> weak coordination and linking between vocabularies, e.g. foaf:focus is
> defined in terms of skos:Concept
> (http://xmlns.com/foaf/spec/#term_focus) or linked data vocabularies
> that relate their terms to others with sub/supertype, equivalence etc.
> relationships, even if the designs are essentially independent. At the
> other extreme would be a single vocabulary that attempted to model
> everything in a monolithic way. It is important to understand that
> schema.org is not so rigid.

Correct, it isn't rigid, but once again there's a meme related problem, 
just as there was in the early days of RDF.

Bo's concerns (I believe) have more to do with the interpretation of the 
extensions narrative. I think it needs subtle tweaks along the lines of 
making its goals clearer, bearing in mind that Guha (and you) do 
actually support the notion of a generic and loosely coupled vocabulary 
with broad appeal. One that's of practical use to Web Masters and 
HTML+Javascript developers.

Bearing in mind my comments above, I think Bo's concerns can be 
addressed by way of intention clarification i.e., schema.org can be 
extended in a variety of ways rather than one way:

1. Using the approach in Guha's post -- oriented towards Web Masters and 
HTML+Javascript developers
2. Using relations -- an approach natural to the Web but predominantly 
practiced by developers and publishers of Linked Open Data, at the 
current time.
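Approach #2 can be sketched as follows, under stated assumptions: a hypothetical, independently managed vocabulary (http://example.org/vocab#) relates its own classes to schema.org using rdfs:subClassOf and owl:equivalentClass, and a consumer that only knows schema.org still learns that the instance is a schema:Review. The inference is a deliberately simplified subset (subclass transitivity plus type propagation, with equivalence treated as mutual subclassing), not a full RDFS or OWL reasoner, and the function names are illustrative.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EQUIV = "http://www.w3.org/2002/07/owl#equivalentClass"
SCHEMA = "http://schema.org/"
EX = "http://example.org/vocab#"  # hypothetical independently managed vocabulary

triples = {
    # The extension: local classes related to schema.org, no permission needed.
    (EX + "WineReview", SUBCLASS, SCHEMA + "Review"),
    (EX + "TastingNote", EQUIV, EX + "WineReview"),
    # Part of schema.org's own class hierarchy.
    (SCHEMA + "Review", SUBCLASS, SCHEMA + "CreativeWork"),
    # Instance data typed only with a local class.
    (EX + "review1", RDF_TYPE, EX + "TastingNote"),
}


def superclass_edges(graph):
    """Map each class to its direct superclasses; equivalence counts both ways."""
    edges = {}
    for s, p, o in graph:
        if p == SUBCLASS:
            edges.setdefault(s, set()).add(o)
        elif p == EQUIV:
            edges.setdefault(s, set()).add(o)
            edges.setdefault(o, set()).add(s)
    return edges


def inferred_types(graph, resource):
    """Asserted rdf:type values plus every class reachable through the edges."""
    edges = superclass_edges(graph)
    pending = [o for s, p, o in graph if s == resource and p == RDF_TYPE]
    seen = set()
    while pending:
        cls = pending.pop()
        if cls not in seen:
            seen.add(cls)
            pending.extend(edges.get(cls, ()))
    return seen


for t in sorted(inferred_types(triples, EX + "review1")):
    print(t)
# http://example.org/vocab#TastingNote
# http://example.org/vocab#WineReview
# http://schema.org/CreativeWork
# http://schema.org/Review
```

The design choice here is that the extension lives entirely in the publisher's own namespace; schema.org itself is never modified.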

At OpenLink we practice #2; it doesn't require permission from anyone or 
consensus with everyone; we just get on with it [4][5][6].

> Schema.org is by practical necessity very
> pragmatic, and, for example, supports both library-oriented and
> bookshop-like ways of describing books.

And so are other endeavors. There is nothing uniquely pragmatic 
about Schema.org relative to RDF based Linked Open Data in general. What 
you have is the combined market might of Google, Microsoft, Yahoo!, and 
Yandex and a captive SEO community (comprised of Web Masters and 
HTML+Javascript developers). A good thing!

Seriously, I wouldn't claim "pragmatic" (in a generic sense) as the 
distinguishing characteristic (attribute, property, etc.) here, relative 
to other RDF related endeavors. As stated above, I see an endeavor that 
targets a captive audience, which of course is a form of pragmatism, but 
not one that implies other efforts are purely theoretical, etc.


> The pieces of schema.org that
> came from GoodRelations have some ways of talking about mass-produced
> objects via prototypes (http://schema.org/ProductModel) which could
> also be applied to books, since books are mass produced; but we also
> have adopted a use of isPartOf as an alternate model for addressing
> FRBR-like use cases, without the rigidity of FRBR. The details don't
> matter here - my point is that even within a single large vocabulary
> you naturally have a kind of pluralism.

Yes, and that's what needs a little more emphasis in communications 
about schema.org usage and strategic goals.

>   Any vocabulary at schema.org's
> scale will have situations where there are several ways of saying the
> same thing, depending on perspective and context.

That's even the case with small vocabularies. In short, it's always the 
case with Language.

>
> The gap that the extension model fills is between the relative chaos
> of very loosely coupled linked data vocabularies (independent designs,
> documentation, versioning, modeling styles) -vs- the relatively highly
> integrated approach of core schema.org. We want a bit more chaos than
> core schema.org but a lot less chaos than the total free-for-all of
> the classic Semantic Web.

Why not the classic World Wide Web? That broadly used public HTTP 
network isn't devoid of relations, or of semantics that aid in 
understanding the nature of the relations that make up its tapestry.

I see the gap being addressed as one that goes beyond basic search 
engine discoverability, enabling Web Masters and HTML+Javascript 
developers to gradually encode and consume schema.org content associated 
with more specialist domains.

Schema.org addresses the needs of a community that wasn't optimally 
served by the generic Semantic Web meme. A lot of that (as already 
stated) has everything to do with the incentives that arise naturally 
from the visible support of Google, Yandex, Yahoo!, and Microsoft (via 
Bing!). That's massive, and it negates the prescriptive specification 
problem that's dogged RDF from the outset. Ironically, if RDF had been 
correctly pitched as a formalization of what was already in use, we 
would have reduced 17 years to something like 5, no kidding!

For instance, imagine if <link/> and "Link:" had been incorporated into 
the RDF narrative as existing notations for representing entity 
relations. Basically, Web Masters, HTML+Javascript developers, and the 
Microformats (now IndieWeb) folks would have been far less confused by, 
and resistant to, RDF, especially as it would have prevented the massive 
RDF/XML blob of confusion that ultimately obscured everything.
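The point about <link/> and "Link:" can be illustrated with a small Python sketch that reads both notations as relations whose subject is the document itself. The names LinkElementParser and relations are hypothetical, the "Link:" handling is a simplified regular expression rather than a complete RFC 8288 parser, and rel values are kept as plain tokens rather than mapped to URIs.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkElementParser(HTMLParser):
    """Collect (rel, href) pairs from <link> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            if a.get("rel") and a.get("href"):
                # rel may hold several space-separated relation types.
                for rel in a["rel"].split():
                    self.pairs.append((rel, a["href"]))


# Simplified: handles <target>; rel="a b" and <target>; rel=a, not every RFC 8288 form.
LINK_VALUE = re.compile(r'<([^>]*)>\s*;[^,]*?\brel="?([^";,]+)"?')


def relations(base_url, html_text="", link_header=""):
    """Return (subject, relation, target) triples; the document is the subject."""
    parser = LinkElementParser()
    parser.feed(html_text)
    parser.close()
    pairs = list(parser.pairs)
    for target, rels in LINK_VALUE.findall(link_header):
        pairs.extend((rel, target) for rel in rels.split())
    # Resolve relative targets against the document URL.
    return [(base_url, rel, urljoin(base_url, href)) for rel, href in pairs]


# Hypothetical document, markup and header values.
html = '<html><head><link rel="author" href="/people/alice#this"/></head></html>'
header = '<https://example.org/license>; rel="license"'
for triple in relations("https://example.org/doc", html, header):
    print(triple)
# ('https://example.org/doc', 'author', 'https://example.org/people/alice#this')
# ('https://example.org/doc', 'license', 'https://example.org/license')
```

Both notations already carry a subject (the document), a relation type (rel) and an object (the target), which is the sense in which they are existing notations for entity relations.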

> For those who would rather pick and choose
> from the entire range of diverse vocabularies, the LOV project (see
> lov.okfn.org) offers a very useful directory.

Exactly!

> For those who want a bit
> more consistency in terms of documentation / navigation / usage
> examples, a common underlying core, and richer domain-oriented
> extensions, we have schema.org and its approaches to extension as
> discussed here.

I don't really agree with that characterization. It's unnecessarily 
pejorative about alternatives to schema.org prescriptions. Again, all of 
these initiatives are pieces of a massive puzzle, so we have to be able 
to communicate about these pieces without knocking other complementary 
parts.

>
> Schema.org is an exploration of the idea that we'll get further,
> faster by sharing a substantially sized common vocabulary as well as
> underlying graph data model.

To me it's a demonstration of what happens when a specification is 
backed by key industry players. Basically, rather than saying "Hey! You 
over there, you MUST work this way, just because we say so .." we have 
an approach that targets a massive audience (Web Masters and 
HTML+Javascript Developers) with built-in incentives, i.e., they all want 
to optimize their content for the search engine technologies from 
Google, Microsoft, Yahoo!, and Yandex.

>   But it remains a part of that larger
> RDF-based framework and can be freely mixed with independently managed
> vocabularies.

Amen!!

Links:

[1] http://lov.okfn.org/dataset/lov/vocabs/schema -- Looking at schema.org via LOV's context-lenses
[2] http://www.slideshare.net/kidehen/understanding-29894555/55 -- Natural Language & Data
[3] http://www.jfsowa.com/pubs/fflogic.htm -- Fads and Fallacies about Logic
[4] http://www.openlinksw.com/data/turtle/ -- OpenLink Ontologies collection
[5] http://kidehen.blogspot.com/2015/01/social-networking-profiles-for-everyone.html -- Social Network Profile Publishing for Everyone
[6] http://kidehen.blogspot.com/2015/01/review-publishing-for-everyone.html -- Review Publishing for Everyone
[7] http://kidehen.blogspot.com/2014/02/class-equivalence-based-reasoning.html -- Class Equivalence Inference & Reasoning that leverages Schema.org


-- 
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Received on Sunday, 15 February 2015 19:49:01 UTC