Re: Data "on" the Web vs Data "in" the Web from Steven Adler on 2014-03-26 (public-dwbp-wg@w3.org from March 2014)

From: Steven Adler <adler1@us.ibm.com>
Date: Tue, 25 Mar 2014 17:05:21 -0700
To: Timothy Lebo <lebot@rpi.edu>
Cc: Phil Archer <phila@w3.org>, public-dwbp-wg <public-dwbp-wg@w3.org>
Message-ID: <OF0C0A3E3B.DDF2A973-ON88257CA6.00837587-88257CA7.00007DBD@us.ibm.com>
Hi Tim,

Great that you chimed in.  We are after all completely Open and your views 
are welcome.

Early in this process, we WG members identified that none of us are
actually Open Data practitioners.  We are all "theorists."  We therefore
agreed to base our WG work on customer use cases: to collect problems and
solutions from real practitioners so that our recommendations solve
real problems.

So far we have heard only one example, DBpedia, that uses RDF to publish 
Open Data.  None of the other examples we have heard from, and none of the 
Cities I have ever worked with, are using, plan to use, or think it is 
necessary to use, RDF databases for Open Data. 

Now IBM is a big W3C customer.  We believe in its value and mission and 
are proud to support it.  We believe in RDF and Linked Data and I 
personally am doing everything I can to promote it internally.  I do not 
think it inconsistent to promote RDF and recognize that customers have 
other needs and issues and provide Best Practices Recommendations to solve 
their problems and not MINE.

And doing this makes us a better team, and makes W3C more relevant, and 
that is my goal.


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"



From:
Timothy Lebo <lebot@rpi.edu>
To:
Steven Adler/Somers/IBM@IBMUS
Cc:
Phil Archer <phila@w3.org>, public-dwbp-wg <public-dwbp-wg@w3.org>
Date:
03/25/2014 11:19 AM
Subject:
Re: Data "on" the Web vs Data "in" the Web



Hi, Steve,

I’m not an official member of this WG, but am watching on the sidelines 
licking my wounds from previous WG standardization activities :-)

I’d like to respond to your #3 below.

On Mar 25, 2014, at 2:06 PM, Steven Adler <adler1@us.ibm.com> wrote:

> I understand your point of view and you've written a passionate defense
> of RDF and Linked Data. It's not my intent to challenge your beliefs, but I
> would like to point out some facts and my interpretation.
>
> 1. According to IDC the RDF market was $10M in 2012 and is estimated to
> grow to $170M in 2017. That might sound like a lot but in comparison to
> the overall database market of about $30B it's peanuts.
>
> 2.  There might be government implementations of RDF but there are no
> Open Data implementations.
>
> 3.  Using this WG to sell RDF and Linked Data to Open Data customers who
> are not yet using it is not an Open Standards activity. It's a commercial
> activity.


I would think that it *is* the purpose of the World Wide Web Consortium to 
“sell” the approaches that it deems to be most worthwhile.
The W3C has spent significant effort developing a particular flavor of 
information technologies that are well suited to the Web.
RDF and Linked Data are the W3C’s version of open data standards.
W3C doesn’t sell for money, it sells for concept. If its “customers” 
aren’t buying, then they’re free to go adopt non-Web standards.
But it certainly isn’t the W3C’s job to provide any and all information 
technology to any and all — only those centered on the Web and to those 
who wish to be on the Web.

There are plenty of other standards bodies. They can sell their 
perspective, too.

:: scurries back to the bleachers ::

Regards,
Tim Lebo



> 
> 4.  There are many other issues Open Data creates around data quality and
> data comparability that have use cases and documented needs, and that need
> standards.
>
> All of the Above to me means including the RDF and Linked Data
> approaches with other approaches that solve problems for current
> implementations too.
> 
> And. Also.
> 
> Not or.
> 
> Not only.
> 
> Regards,
> 
> Steve
> 
> 
> ----- Original Message -----
> From: Phil Archer [phila@w3.org]
> Sent: 03/25/2014 03:54 PM GMT
> To: Steven Adler
> Cc: public-dwbp-wg <public-dwbp-wg@w3.org>
> Subject: Re: Data "on" the Web vs Data "in" the Web
> 
> 
> 
> Steve, I cannot let this e-mail of yours go unchallenged.
> 
> On 25/03/2014 14:57, Steven Adler wrote:
>> Those are good comments.  The graph data market is pretty small today,
>> with interest pretty evenly split between RDF and Property Graph.
> 
> Really? I'm very interested to know about uses of Property Graphs as
> there has been discussion about possibly standardising that at W3C.
> We're having difficulty getting a sufficiently large community of
> members together to do this.
> 
>  There
>> are some things Graph databases can do, especially in social networking
>> examples, that perform much better than traditional databases.
>> 
>> But no one today is using RDF or Graph Data in any Open Data
>> implementation and no one has any plans to do so.
> 
> 
> This statement is grossly untrue. Off the top of my head examples of
> public sector use of Linked Data (open or not):
> 
> UN FAO
> European Environment Agency
> The BBC, New York Times etc.
> The European Commission
> UK Government
> Deutsche Nationalbibliothek
> Ordnance Survey
> British Geological Survey
> The Italian Government
> The US EPA
> The US DoH
> 
> Away from the public sector it's used extensively in finance, and health
> care and life sciences are making extensive and growing use of it, etc.
> Check out http://semanticweb.com/ for more news of the substantial
> and growing commercial use of the technology.
> 
>  Cities and State
>> governments have limited budgets and resources and very limited skills.
> 
> True of course. The point of this WG is to show those people how to make
> the best of what limited resources they have in this regard. They should
> be able to benefit from the power of LD even if they don't necessarily
> use it as a core tech themselves or even know they're using it.
> 
>> 
>> And RDF and Graph are not the only ways to skin this cat...
> 
> No, but it is the best available method that can operate at Web scale
> and that maximises the benefits of the network effect.
> 
>> 
>> Does Data Quality need linked data vocabularies to offer value?
> 
> URIs as identifiers and LD vocabularies are how you share meaning across
> different datasets. Datasets that use external vocabularies are more
> valuable than ones that just use internal codes that are meaningless out
> of that specific context.
> 
> Wouldn't
>> standardized lineage and certification suffice?
> 
> Not sure what you mean by these.
> 
>  I can see the value of
>> graph search for Data Comparability, but well defined metadata would
>> also take Open Data to the next level.
> 
> That's what the CSV on the Web WG is about, for example - well defined
> metadata for tabular data that can, among other things, be used to
> generate LD from the table.
> 
> 
>  If we only recommend RDF and Linked
>> Data as Best Practices for Data Publishing and only a small fraction of
>> the market can use them, what good have we done?
> 
> See previous. It's about making the benefits of LD available, i.e. using
> the Web as a data platform, not just as a means for shifting PDFs and CSVs
> from one place to another. We want people to get the most from their
> efforts to bring efficiencies and transparency, as well as supporting the
> commercial knowledge economy.
> 
>> 
>> I want to make sure that the work we do has maximum impact and so far
>> use case evidence does not convince me that RDF and Linked Data alone
>> will get us there.
> 
> No one is suggesting that it alone will get us there. But everything we
> do should support the goal of making use of the power of the Web. That
> means making sure people use URIs as identifiers, or that they publish
> metadata such that their non-URI identifiers can be rendered as URIs;
> that relationships are defined etc.
> 
> The Editor's Draft that Augusto began this thread by pointing to is
> interesting [1]. That talks about how to put links into non-RDF and
> non-HTML formats. +1 to that.
> 
> Phil.
> 
> [1] http://tools.ietf.org/html/draft-kelly-json-hal
> 
> 
>> 
>> 
>> From:
>> Christophe Guéret <christophe.gueret@dans.knaw.nl>
>> To:
>> Steven Adler/Somers/IBM@IBMUS
>> Cc:
>> Christophe Gueret <christophe.gueret@dans.knaw.nl>, Augusto Herrmann
>> <augusto.herrmann@gmail.com>, "hellmatic@gmail.com" <hellmatic@gmail.com>,
>> public-dwbp-wg <public-dwbp-wg@w3.org>
>> Date:
>> 03/25/2014 07:31 AM
>> Subject:
>> Re: Data "on" the Web vs Data "in" the Web
>> 
>> 
>> 
>> Hoi Steve,
>> Last year Facebook announced its graph search function, choosing the
>> power of semantic search without RDF.  What I have learned from this WG
>> experience so far is that W3C doesn't really create open standards. It
>> creates and enhances and promotes W3C standards.
>> I've some difficulties to follow you on that one, aren't W3C standards
>> open ?
>> The rest of the world often thanks W3C for its ideas and then
>> implements those ideas in different ways.
>> I thought this was rather common in industry. People copy each other
>> and spend time re-branding the same ideas, also probably to go around
>> patents while re-using things that are indeed good ideas. E.g. "retina
>> display" versus "hd screen", "facetime" VS "hangout" VS
>> "videoconference", "like" VS "+1", google's graph VS facebook's graph,
>> etc ...
>> Can we, this WG, imagine creating or recommending standards that are
>> objective - that describe things to do that anyone can do with or
>> without RDF?
>> We'll see... :)
>> 
>> Regards,
>> Christophe
>> 
>> Regards,
>> 
>> Steve
>> 
>>   From: Christophe Guéret [christophe.gueret@dans.knaw.nl]
>>   Sent: 03/24/2014 04:01 PM CET
>>   To: Steven Adler
>>   Cc: Christophe Gueret <christophe.gueret@dans.knaw.nl>; Augusto Herrmann
>> <augusto.herrmann@gmail.com>; "hellmatic@gmail.com" <hellmatic@gmail.com>;
>> public-dwbp-wg <public-dwbp-wg@w3.org>
>> 
>>   Subject: Re: Data "on" the Web vs Data "in" the Web
>> 
>> 
>> So now we are creating W3C standards for publishing data as
>> unstructured text on websites?
>> The message of this presentation is actually quite the opposite ;-)
>> Instead the idea is to use the Web as a platform to host the data. That
>> is, instead of publishing datasets as resources, use URIs and HTTP to gain
>> access to specific (structured !) elements of data sets which can be
>> linked and re-used. There has to be a structure and there has to be the
>> possibility of links, but this does not mean that RDF is the only model
>> that will work out and that RDF/XML is the only way to serialise data.
>> 
>> Is that what's in the charter?  Honestly I have always found the
>> charter to be confusing. Maybe it was intended to be machine readable. ;-
>> 
>> :-)
>> 
>> Cheers,
>> Christophe
>> 
>> 
>> 
>> Regards,
>> 
>> Steve
>> 
>>   From: Christophe Guéret [christophe.gueret@dans.knaw.nl]
>>   Sent: 03/24/2014 03:42 PM CET
>>   To: Steven Adler
>>   Cc: Augusto Herrmann <augusto.herrmann@gmail.com>;
>> "hellmatic@gmail.com" <hellmatic@gmail.com>; DWBP Public List
>> <public-dwbp-wg@w3.org>
>> 
>>   Subject: Re: Data "on" the Web vs Data "in" the Web
>> 
>> Hoi,
>> 
>> I think this (semantic !) discussion around data "on" and "in" can be a
>> good way to let people see the difference between putting a link to a
>> resource which is a data set dump ("on") and providing some kind of API
>> ("in") - whatever the technologies of the API are. Lately, I've been
>> using that argument to point people to the fact that downloading dumps
>> of data in various forms is like doing document sharing prior to the
>> Web. That leads to the conclusion that we should publish our data as
>> Web sites. There is a bit of a focus set on SemWeb technologies for that
>> but, really, we could think of many other ways to reach the same result.
>> Here are the slides, comments are most welcome ;-) :
>> http://www.slideshare.net/cgueret/linking-knowledge-spaces
>> 
>> Cheers,
>> Christophe
>> 
>> 
>> 
>> On 20 March 2014 15:17, Steven Adler <adler1@us.ibm.com> wrote:
>> Augusto,
>> 
>> I am interested in learning about HAL and look forward to this
>> discussion.  But I am a bit concerned with the way you phrase these
>> sentences:
>>
>> "There should be a way to at first publish open data resources that are
>> linked, but without rdf, such as in xml and json. Then, at a later
>> date, improve with a descriptive rdf vocabulary and expressed in rdf to
>> become linked open data (preferably, if possible, keeping compatibility
>> with clients that implemented reading the previous non-semantic
>> version)."
>>
>> To me this reads that non-rdf methods like xml and json are
>> accommodations to constituents who "have not yet seen the light of RDF"
>> and I want to make sure we are providing best practices standards
>> recommendations to the world that exists rather than the "perfect world"
>> we would like someday to exist.
>> 
>> At IBM, we make software that runs on many operating systems.  Of
>> course we employ people with preferences for OSX, Linux, System z, AIX,
>> Unix, and even Windows.  Heck, many ATMs around the world still run on
>> OS/2...
>>
>> But because our customers run all of the above we supply them with all
>> of the above solutions.
>>
>> Can we agree on an "all of the above" approach to DWBP (without
>> suggesting that everything someday becomes RDF) too?
>> 
>> Best Regards,
>> 
>> Steve
>> 
>> Motto: "Do First, Think, Do it Again"
>> 
>> 
>> From:
>> Augusto Herrmann <augusto.herrmann@gmail.com>
>> To:
>> DWBP Public List <public-dwbp-wg@w3.org>
>> Date:
>> 03/19/2014 01:16 PM
>> Subject:
>> Re: Data "on" the Web vs Data "in" the Web
>> 
>> 
>> 
>> 
>> 
>> Hi,
>> 
>> this is a very important point, Ig. My thoughts exactly when I
>> suggested we look at the Hypertext Application Language (HAL) proposal
>> [1] in the first meeting. It was in fact an invitation for us to think
>> about data "in" the web, as in "part of the web itself". We don't
>> necessarily have to follow HAL, but should look at it as a source of
>> inspiration. The way links are represented in resources in Subbu
>> Allamaraju's RESTful Web Services Cookbook [2] is another source of
>> inspiration.
>>
>> We should think of standard ways to insert links to other data into
>> many common open data formats, such as xml, json and maybe even csv. Of
>> course this linking requirement is satisfied by linked open data and
>> rdf, but sometimes organizations have some data and are willing to
>> publish, but initially do not have the necessary resources (i.e. people,
>> knowledge) to develop vocabularies to describe the data. However,
>> interlinking among resources of a dataset, or even linking to resources
>> in other datasets is somewhat easier to do. There should be a way to at
>> first publish open data resources that are linked, but without rdf, such
>> as in xml and json. Then, at a later date, improve with a descriptive
>> rdf vocabulary and expressed in rdf to become linked open data
>> (preferably, if possible, keeping compatibility with clients that
>> implemented reading the previous non-semantic version).
>> 
>> Perhaps this could become a use case for the Best Practices document.
>> 
>> [1] http://tools.ietf.org/html/draft-kelly-json-hal
>> [2] http://books.google.com.br/books?id=LDuzpQlVuG4C
>> 
>> All the best,
>> Augusto Herrmann
>> Open Data Team - Ministry of Planning - Brazil
>> 
>> 
>> On Mon, Mar 17, 2014 at 9:28 AM, Ig Ibert Bittencourt
>> <ig.ibert@gmail.com> wrote:
>> Hello DWBP,
>> 
>> I was reading again about the 5 Stars for Open Data and I saw this
>> affirmation below about 3-star Web Data [1] that I think would be
>> interesting to share with this WG.
>>
>> Excellent! The data is not only available via the Web but now everyone
>> can use the data easily. On the other hand, it's still data on the Web
>> and not data in the Web.
>>
>>
>> With regard to this affirmation, you can see more details in [2] and [3],
>> but not that much.
>> 
>> 
>> [1] http://5stardata.info/
>> [2] http://webofdata.wordpress.com/2010/03/01/data-and-the-web-choices/
>> [3] http://lists.xml.org/archives/xml-dev/200211/msg01290.html
>> 
>> 
>> Best,
>> 
>> Ig Ibert Bittencourt
>> Professor Adjunto III - Universidade Federal de Alagoas (UFAL)
>> Vice-Coordinator of the Special Committee on Computers in Education
>> Leader of the Center of Excellence in Social Technologies
>> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.
>> 
>> 
>> 
>> 
>> 
> 
> --
> 
> 
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
> 
> 
>
Received on Wednesday, 26 March 2014 00:05:55 UTC