Re: WebSchemas, Schema.org and W3C from Bernard Vatant on 2013-01-22 (public-vocabs@w3.org from January 2013)

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Tue, 22 Jan 2013 17:45:22 +0100
To: Dan Brickley <danbri@danbri.org>
Cc: "public-vocabs@w3.org Org" <public-vocabs@w3.org>, Thomas Baker <tom@tombaker.org>
Message-ID: <CAK4ZFVHt2rY6U8YV1bAzJEyhfP8dJhH1s4ujH3LLmyJju4zWZw@mail.gmail.com>
Hi Dan

I can't believe that such a rich and thoughtful message did not get any
answer (at least any public one) in ten days.
Thanks for putting it down anyway. I wanted to answer right away when you
posted it but did not have the bandwidth to do so properly until today.

So here goes, answers below. For those who don't care to drill down into
such a long discussion, the main point is a call to action that will
certainly be duplicated on other channels, but I extract it here:

ACTION

Make a list of "globally adopted schemas" (vocabularies) and put a
*responsible* agent name/email/URI, or whatever Web identifier, in front of
each one:
https://docs.google.com/spreadsheet/ccc?key=0AiYc9tLJbL4SdHByWkRYUkYxZU5qS1lQOE5FV0hiNlE#gid=0
Free to edit by anyone. If you are *currently responsible* for a
vocabulary, put your name and contact email address.
Let's take a month to see what we can gather. A month from now I will mail
everyone declared responsible to get confirmation, lock the document, and
add this information to the LOV vocabulary descriptions.

If you want to make sure what I mean by "responsible", read details below.

Best

Bernard

2013/1/13 Dan Brickley <danbri@danbri.org>

...
> As a member of the RDF community since 1997, I'm painfully aware of
> some of our failings. It is (as has been expressed already in this
> thread) important to avoid over-burdening schema.org with every hope
> and aspiration that attaches to the RDF, '[sS]emantic [wW]eb', 'Linked
> [open] Data' etc labels. Or put another way; schema.org has no
> intention of being overburdened with such things.
>
> Two particular failings of our community come to mind. One is that we
> have an endearing and frustrating architecture of politeness based on
> the use of namespaces that has led to a situation in which we have a
> fragmented suite of independent vocabularies that are hard for new
> parties to adopt.


I'm not sure that fragmentation and independence are the main obstacle to
adoption. Or else the same can be said of any linked data and dataset.
Vocabularies are a particular kind of linked data, but they are linked
data. Linked data are also fragmented and managed by independent sources.
Choosing a reference vocabulary should be no more and no less of an issue
than choosing reference entities in an authority list, a thesaurus or any
kind of linked database.
The main issues for adoption of reference URIs are the quality and
sustainability of the resources and the responsibility of the publisher.
Discovering vocabularies might be tough, although we have more and more
tools for that (not to mention LOV again here), but assessing those three
key parameters (quality, sustainability, responsibility) is a headache,
mainly because many vocabulary publishers do not take them seriously, as
attested by the glaring lack of documentation and metadata for many of them.
We now have serious people and organizations eager to enter the linked data
game, and more and more often I am asked: "can we trust X or Y to
still be available in 5-10 years?"

> The culture around RDF is that you only publish
> schemas for the 'diffs', the missing vocabulary that wasn't covered by
> a jumbled mix of existing terminology. So anyone doing document-like
> markup would be frowned at - "Did you consider using Dublin Core?";
> anyone publishing an RDF vocabulary describing people "Why didn't you
> use FOAF?", and so on. And the very architecture that supported this -
> namespaces - allowed us to continue to design these parallel
> descriptive systems without being forced to sit down together and work
> out how they can be combined to solve real world problems.
>

Indeed. But previous architectures provided parallel vocabularies in
parallel formats that were not interoperable at all, so we have made real
progress. We had, and still have around, people convinced that the linked
data technical infrastructure would work without social agreement. But
"Publish and let the Web do the REST" just does not work. Not only for
vocabularies, but again for linked data at large. It seems to me that more
and more people are now convinced that the technical interoperability
ensured by the common linked data infrastructure is not enough if there is
no social coordination. So let's sit down together, indeed.
See e.g. the theme of the next DC conference:
http://dcevents.dublincore.org/index.php/IntConf/dc-2013.

But I also agree with others that this forum is not necessarily the one to
solve all problems, and certainly not by bringing every other vocabulary
under the schema.org umbrella. Opening several focused tables of
conversation is certainly more profitable.


>  A couple of years ago, I did sit down and look at the words we'd
> chosen in various deployed and popular-ish RDF vocabularies; I called
> it "Zoo"; https://github.com/danbri/Zoo/blob/master/zoo.foaf.tv/index.html
> https://github.com/danbri/Zoo/blob/master/zoo.foaf.tv/zoo/raw_manifest.txt
> ... this showed that 'Collection' was used in bibo:, swan:, 'Work' in
> skos:; cc: vcard:; 'description' in dcterms: doap: gr: ical: sioc:,
> 'category' in 'doap: gr: po: vcard:', 'subject' in dcterms: po: rdf:
> sioc:, title in 'dcterms: foaf: sioc: vcard:' and so on.


You could easily enrich this zoo now with the LOV search
lov.okfn.org/dataset/lov/search/#s=description
lov.okfn.org/dataset/lov/search/#s=title
...


> Part of my hope for this forum is that  -yes, heavily nudged by the
> creation of
> schema.org - RDF vocabulary managers and editors could finally take
> the time to stay in touch.


Indeed!


> That parties working on vocabularies
> designed to be deployed alongside each other, could do the world a
> favour and talk to each other a bit more.


YES !


> It is good that we have the
> namespaces technical mechanism; but it has for too long allowed us to
> sidestep the need to talk about how different vocabularies fit
> together as more than mere triples.
>

Having pursued the same objective inside the LOV project for about three
years now, I would say that the main obstacle we've met is the pervasive
lack of *responsibility* of vocabulary
owners/authors/creators/publishers/curators. We have gathered more than 300
vocabularies, but for many of them it is not possible to identify who is
the current responsible entity (person or organisation), under any
definition of the word at http://en.wiktionary.org/wiki/responsible. In a
nutshell, people don't do things seriously, and/or they don't answer when
called. I don't say it's a general rule, but for potential adopters it's
very difficult to tell whether there is someone responsible behind a given
vocabulary, in particular in those frequent cases where the project is
closed, or the original editor has moved on or does not answer mail, etc.

It seems to me a simple basic action should be taken to start with, either
here or in any relevant forum, which in a nutshell would be: responsible
people, step forward. Who wants to play nicely in this game, how do you
make it public, and how would others know about it? We can define a simple
markup on vocabularies, similar in spirit to Creative Commons, showing the
level of engagement or responsibility involved in publishing the
vocabulary. Lists of vocabularies along with their *current* curators, who
endorse a certain number of social rules, such as taking part in processes
where their vocabularies are put on the table with other relevant ones,
etc., could easily be published and updated on a regular basis. We have
already exchanged with Tom Baker on this; DCMI has thought seriously about
those issues for a while, as you (Dan) are well aware. 2013 should be a
year of serious action on this.

The point is that in this community too many people have come to know each
other too well, so that they don't see why those implicit connections and
involvements should be made explicit anywhere. But for people from outside,
all this is currently totally opaque.


> So WebSchemas was designed to be something a bit more than 'the
> schema.org mailing list at W3C', and I still believe that. We (the
> larger 'we') need a forum in which all schemas intended for
> planet-wide use are equally 'on topic'. The existence of schema.org
> should not have a chilling effect on the design, use and deployment of
> other RDF vocabularies. Even if the schema.org partner companies are
> not in a position right now to collectively promise to
> support/understand/use/endorse non-schema.org vocabulary, it is still
> healthy to have multiple efforts, initiatives and perspectives. (The
> move towards RDFa Lite is a very positive thing here, btw.)
>

Very glad to read that. Diversity is good, but my suggestions above might
help to clarify who 'we' are to begin with.


> The second failing of the community around RDF is that we have - as
> the years have drifted by - acquired a reputation for enjoying talk
> over action, and this isn't entirely undeserved.


But basically unfair. We've talked a lot, but also achieved a lot.
There are an amazing number of people around able to talk and code at the
same time :)


> Yesterday I was
> re-reading some old mail threads with the late and lamented Aaron
> Swartz -
> http://lists.foaf-project.org/pipermail/foaf-dev/2000-August/004215.html
> http://lists.w3.org/Archives/Public/www-rdf-interest/2000Jul/0034.html
> - that frustration was already present in 2000. In the charter for
> this WebSchemas group i.e.
> http://www.w3.org/2001/sw/interest/webschema.html we list some semweb
> permathread themes explicitly as out-of-scope.
>
> "Out of scope topics include:
>
> * Advocacy of data models or syntaxes without attention to real-world use
> cases
> * The use of inference
> * debate over foundational ontologies"
>
> This does not mean that inference and foundational ontologies are
> uninteresting or unimportant, just that every successful forum needs
> to have some core scope, and that we have plenty of other places
> around W3C to debate those topics. What makes the WebSchemas group
> special? Just that here, finally, we have somewhere where parties
> responsible for globally adopted RDF schemas can do the responsible
> thing and stay more carefully in touch with each other.
>

You wrote the word: responsible. Now let's make a list of "globally
adopted schemas" and put the responsible agent name/email/URI, or whatever
Web identifier, in front of each one. Simple action, I've started here:
https://docs.google.com/spreadsheet/ccc?key=0AiYc9tLJbL4SdHByWkRYUkYxZU5qS1lQOE5FV0hiNlE#gid=0
Free to edit by anyone. If you are *currently responsible* for a
vocabulary, put your name and contact email address.
Let's take a month to see what we can gather. A month from now I will mail
everyone declared responsible to get confirmation, lock the document, and
add this information to the LOV vocabulary descriptions.

> As Martin points out in a mail that arrived while typing this, ... one
> list is not going to be enough for everything. And in terms of work
> style for getting (sub-)schemas created and integrated, one size
> doesn't fit all. What we've found with schema.org is that different
> collaboration styles make sense for different domains. I suggested a
> W3C Community Group to Richard Wallis and I'm pleased to see that it
> has independent existence and activity. A few months ago I helped set
> up a 'sports schemas' group (just a Google Group mailing list), but
> that initiative is yet to thrive. We have a very active and largely
> independent community around the LRMI vocabulary managed quite
> separately, but linked to this one by mail, wiki and occasional audio
> catchups. There is of course Good Relations, which also enjoys
> independent existence.
>

And there is an ongoing effort to move the Time Ontology forward
beyond its current "draft status".


> In general I think W3C community groups are a fine mechanism for more
> focussed and intense vocabulary collaboration, and this forum serves
> more for integration issues and high level overview on how all the
> pieces of the jigsaw fit together. It could be great, for example, to
> see a community group around modeling fiction (and Comics?), but we
> also need a place where all such efforts can report back to the wider
> community. The creation of schema.org has made all this more urgent
> and timely, but it is something we've needed for a while. In the
> Dublin Core world we talk about this as 'application profiles';
> templates and examples explaining how independently designed pieces of
> vocabulary can be mixed together to address real world descriptive
> needs. It should happen at W3C, schema.org should engage with it, but
> the need is broader. I think WebSchemas is the right place for it.
>
> I should also mention that there are a few areas now where groups
> elsewhere around W3C have come up with vocabulary (e.g. Organization +
> Registered Organization vocabs; DCAT/ADMS; Geo and post addresses)
> that will likely inform improvements to schema.org. There is a need
> for somewhere public to work out details around stability/versions,
> appropriate acknowledgement, etc.
>

Exactly. That is what I call "sustainable vocabulary management".
I would like to mention that in France, in the framework of the Datalift
project (datalift.org), our partners include the national institutions
INSEE (statistics) and IGN (geography), which are working together to
publish linked data and harmonize their vocabularies and data with each
other and with the general vocabulary ecosystem. Those are "serious",
"normal" data publishers playing the game nicely.


> The fundamental problem of schema design is that the world is not
> tidily partitioned; that all use cases interact and overlap -
> 'Intertwingularity'.  We can make focussed sub-fora for figuring out
> how to describe sports, or fiction, or journals and books, but the
> combinations and scope overlaps can be overwhelming. While good design
> can help, perhaps even more important is communication.
>

Again, triple YES !


> And for that we need somewhere to talk. I don't think it ultimately
> matters hugely whether there is a schema.org-specific mailing list at
> W3C alongside a more general 'all vocabularies' one, versus a single
> list as we have now. My preference is for a unified forum, and we will
> likely spin off various schema.org-specific lists for specific
> detailed schema.org topics. But given schema.org's cross-domain
> nature, it seems important for the project to remain highly visible in
> a cross-domain, multi-schema forum.
>
> Dan
>
> > //Ed
> >
> > [1] http://www.w3.org/2001/sw/interest/webschema.html
> >
>
>


-- 
Bernard Vatant
Vocabularies & Data Engineering
Tel :  + 33 (0)9 71 48 84 59
 Skype : bernard.vatant
Blog : the wheel and the hub <http://blog.hubjects.com/>

--------------------------------------------------------
Mondeca
3 cité Nollez 75018 Paris, France
www.mondeca.com
Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
Received on Tuesday, 22 January 2013 16:46:16 UTC