Re: Semantics and Data Consumption from Steven Adler on 2014-03-28 (public-dwbp-wg@w3.org from March 2014)

From: Steven Adler <adler1@us.ibm.com>
Date: Fri, 28 Mar 2014 10:52:13 -0700
To: Laufer <laufer@globo.com>
Cc: Ig Ibert Bittencourt <ig.ibert@gmail.com>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <OF1868FA8A.774600EA-ON88257CA9.00612444-88257CA9.006229F4@us.ibm.com>
Good idea.  I will ask my friends at Socrata to join us.  I think they 
will be happy to participate.  I am not sure about DKAN, which is a Drupal 
implementation of CKAN.

Phil, don't you know some people at CKAN?


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"



From:
Laufer <laufer@globo.com>
To:
Ig Ibert Bittencourt <ig.ibert@gmail.com>
Cc:
Public DWBP WG <public-dwbp-wg@w3.org>
Date:
03/27/2014 02:32 PM
Subject:
Re: Semantics and Data Consumption



Ig,

I think it is important to inspect tools like Socrata and DKAN that are 
used to Publish Data. I like the term, exposing Data.

When you see a Use Case such as dados.gov.br, what you have is a portal 
using DKAN as a tool to expose Data. The same goes for NYC OpenData or 
ControlPanel LA, but using Socrata. They have a way to manage catalogs, but 
the real users are different agencies, which constitute another kind of Use 
Case.

DKAN and Socrata have their own metamodels. These metamodels define the 
things. They define the way Data is exposed when you use the tool.

I think it would be interesting to have people from these organizations as 
participants of the WG. They could tell us how they see this market and 
talk about their metamodels. They could also be one of the targets of the 
Best Practices, by including in their metamodels features that help to 
implement the recommendations of the WG.

Best Regards,
Laufer


2014-03-27 18:11 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:
Hi Laufer,
On Mar 27, 2014 11:27 PM, "Laufer" <laufer@globo.com> wrote:
>
> Ig,
>
> What I am trying to expose is that we should differentiate the ideas of 
the RDF Model and Linked Data from the way Data is stored.
+1
>
> Besides that, I think we should take into account the tools that are 
being used to expose Data on the Web.
>
What do you mean?
> Best,
> Laufer
>
>
> 2014-03-27 5:38 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:
>
>> Hi Laufer,
>>
>> Thank you for your didactic e-mail. :)
>>
>> I agree that Data semantics is very important and we should definitely 
try to connect our data as much as possible to some kind of schema of 
other people's data.
>>
>> As far as I understand, your proposal goes in the same direction as the 
fifth star of Tim's 5-star open data plan [1] and also the third 
principle of the Linked Data Principles [2]. Is that right?
>>
>> Even so, IMHO it could be a good idea to reinforce the LD 
principles as best practices. 
>>
>> [1] http://5stardata.info/

>> [2] http://www.w3.org/DesignIssues/LinkedData.html

>>
>> All the Best,
>> Ig
>>
>>
>> 2014-03-25 16:25 GMT-03:00 Laufer <laufer@globo.com>:
>>
>>> Hello, All,
>>>
>>> 
>>>
>>> I apologize for the long message.
>>>
>>> 
>>>
>>> I would like to talk about some concepts that are being discussed by 
the WG and are related to Data Formats and Semantics.
>>>
>>> 
>>>
>>> Bernadette published a page in the wiki where she defines phases for 
the Data on the Web Lifecycle.
>>>
>>> 
>>>
>>> When we inspect some of the Use Cases and the Stories listed in the 
wiki, including the webinar presentations, we can see that there is more 
than one player, a chain of players, responsible for allowing the 
consumption of Data.
>>>
>>> 
>>>
>>> The Data Generation and the Data Distribution phases are carried out by 
people who access the raw data to be published but use platforms for 
distribution that have their own metamodels, such as CKAN and 
Socrata.
>>>
>>> 
>>>
>>> The issue “what is the Data format that is consumed” is mixed up with 
the idea that the format of the stored Data is the same as the format of 
the consumed Data. In some Use Cases we can see that the Publishers store 
several different formats to be downloaded by the Consumers.
>>>
>>> 
>>>
>>> At first sight, it is not important which Data format is 
stored in the repository. When someone requests Data, the transformation 
(serialization) of the stored Data could (should?) be done by the Data 
provider.
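>>>
>>> A minimal sketch of that idea, assuming a provider that keeps rows in
>>> some internal form (plain dicts here) and serializes them only when a
>>> Consumer asks. STORED_ROWS and serialize are hypothetical names used for
>>> illustration; this is not how any particular platform is implemented.

```python
import csv
import io
import json

# Hypothetical stored representation: the repository keeps rows in whatever
# internal form it likes (here, plain dicts).
STORED_ROWS = [
    {"latitude": "40.71159894212768", "longitude": "-73.76983198736392"},
]


def serialize(rows, fmt):
    """Serialize stored rows into the format the Consumer asked for."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buffer = io.StringIO()
        fieldnames = list(rows[0]) if rows else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

>>> The stored form never leaks to the Consumer: the same rows can be
>>> requested as serialize(STORED_ROWS, "json") or serialize(STORED_ROWS, "csv").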
>>>
>>> 
>>>
>>> Let’s take Socrata as an example. A Dataset in Socrata could be 
uploaded from an Excel file, but once it is stored in the Socrata cloud, we 
don’t know in which Data format the content of the original Excel file is 
stored as a Dataset. A Data consumer has a standard interface where she 
can browse the Dataset and ask the platform to export Data in 
different formats, including PDF, JSON, XML, RDF and XLS.
>>>
>>> Socrata also provides an individual Endpoint with an API for each 
Dataset. It considers the Endpoint as a way of exporting Data, a slice of 
the whole Dataset.
>>>
>>> 
>>>
>>> When we think about Data semantics, these semantics should be described 
as metadata. The metadata can be stored, for example, in a PDF file 
describing the data model, in a technical style or in a free style. What is 
important is that the Consumer can understand what is being said about the 
Data that she is consuming.
>>>
>>> 
>>>
>>> What could be a Best Practice would be to use a wider, common 
understanding of this metadata. This is one of the contributions of the RDF 
model when it defines the use of common vocabularies as a way to describe 
the properties of resources. Besides that, it also introduces the idea of 
universal identifiers as a way of linking Data from different Datasets.
>>>
>>> 
>>>
>>> There is a huge amount of Data to be loaded on the web that has its 
own semantics. People can publish these Data from their own point of view, 
leaving the developers to understand each one of these semantics and build 
the mashups. That’s OK. But if the Publishers used common vocabularies, 
this would make it easier for the Developers to integrate Data.
>>>
>>> 
>>>
>>> Let’s take an example. In the NYC Open Data Dataset “311 Service Requests 
from 2010 to Present” there are two columns labeled “Latitude” and 
“Longitude”. The type of these two columns is Number. Well, we can guess 
that they are related to the latitude and longitude of the address where a 
service was requested.
>>>
>>> 
>>>
>>> There is a human interface where it is possible to browse the Dataset:
>>>
>>> 
https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/stnw-hdrd


>>>
>>> 
>>>
>>> To get the information about a service request we can use the Endpoint 
to export Data in JSON or RDF formats. The columns are identified 
by property names derived from the column labels: “Latitude” is 
identified as “latitude”; “Longitude” as “longitude”.
>>>
>>> 
>>>
>>> Using the endpoint created for the Dataset we can obtain the json 
output of the first row:
>>>
>>> 
>>>
>>> http://data.cityofnewyork.us/resource/stnw-hdrd.json?$limit=1
>>>
>>> [ { 
>>>
>>> 
>>>
>>> "longitude" : "-73.76983198736392", 
>>>
>>> "latitude" : "40.71159894212768", 
>>>
>>> 
>>>
>>>  }  ]
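>>>
>>> A minimal sketch of consuming this output, assuming only the URL pattern
>>> and the two fields quoted above. endpoint_url and coordinates are
>>> hypothetical helper names, and SAMPLE leaves out the fields not shown in
>>> this message. Note that in the output quoted above the values arrive as
>>> JSON strings, so the Consumer has to convert them to numbers.

```python
import json

# Only the two fields quoted above; other fields of the real row are omitted.
SAMPLE = '[{"longitude": "-73.76983198736392", "latitude": "40.71159894212768"}]'


def endpoint_url(dataset_id, fmt, limit):
    """Build an endpoint URL following the pattern quoted in this message."""
    return f"http://data.cityofnewyork.us/resource/{dataset_id}.{fmt}?$limit={limit}"


def coordinates(payload):
    """Return (latitude, longitude) as floats for each row that has both."""
    rows = json.loads(payload)
    return [
        (float(row["latitude"]), float(row["longitude"]))
        for row in rows
        if "latitude" in row and "longitude" in row
    ]
```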
>>>
>>> 
>>>
>>> Using the endpoint created for the Dataset we can obtain the rdf 
output of the first row:
>>>
>>> http://data.cityofnewyork.us/resource/stnw-hdrd.rdf?$limit=1

>>>
>>> 
>>>
>>> <rdf:RDF
>>>
>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>
>>> xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>>
>>> xmlns:socrata="http://www.socrata.com/rdf/terms#"
>>>
>>> ...
>>>
>>> xmlns:dsbase="http://data.cityofnewyork.us/resource/"
>>>
>>> xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/"
>>>
>>> xmlns:usps="http://www.w3.org/2000/10/swap/pim/usps#">
>>>
>>> 
>>>
>>> <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159"> 
>>>
>>> <socrata:rowID>7055868</socrata:rowID> 
>>>
>>> <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/> 
>>>
>>> 
>>>
>>> <ds:latitude>40.71159894212768</ds:latitude> 
>>>
>>> <ds:longitude>-73.76983198736392</ds:longitude> 
>>>
>>> 
>>>
>>> </dsbase:stnw-hdrd>
>>>
>>> </rdf:RDF>
>>>
>>> 
>>>
>>> 
>>>
>>> Well, the RDF does not introduce any kind of semantics in this case. 
It is only a different serialization of the Data returned in JSON. The 
property http://data.cityofnewyork.us/resource/stnw-hdrd/latitude doesn’t 
have more semantics than the label “Latitude”.
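>>>
>>> A minimal sketch, using only the Python standard library, of how a
>>> Consumer could list the property URIs in the output above and see that
>>> the coordinate properties are local to the Dataset. RDF_XML repeats only
>>> the namespaces and elements quoted above (the elided ones are left out),
>>> and property_uris is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:socrata="http://www.socrata.com/rdf/terms#"
 xmlns:dsbase="http://data.cityofnewyork.us/resource/"
 xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/">
 <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
  <socrata:rowID>7055868</socrata:rowID>
  <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/>
  <ds:latitude>40.71159894212768</ds:latitude>
  <ds:longitude>-73.76983198736392</ds:longitude>
 </dsbase:stnw-hdrd>
</rdf:RDF>"""


def property_uris(rdf_xml):
    """Return the full URI of every property element, in document order."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for resource in root:
        for prop in resource:
            # ElementTree renders qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            uris.append(namespace + local)
    return uris


for uri in property_uris(RDF_XML):
    print(uri)
```

>>> Both coordinate URIs start with the Dataset's own address, so a Consumer
>>> who has never seen this Dataset learns nothing beyond the column label.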
>>>
>>> 
>>>
>>> But Socrata allows the owner of the Dataset to associate an RDF 
property with a column. The user can associate any URL as metadata for the 
column and, besides that, Socrata lists some properties that it 
understands from some vocabularies: DCAT; FOAF; Dublin Core; geo.
>>>
>>> 
>>>
>>> I associated the column “Latitude” with the URL: 
http://www.w3.org/2003/01/geo/wgs84_pos#lat

>>>
>>> 
>>>
>>> I associated the column “Longitude” with the URL: 
http://www.w3.org/2003/01/geo/wgs84_pos#long

>>>
>>> 
>>>
>>> I made the endpoint call again:
>>>
>>> 
>>>
>>> http://data.cityofnewyork.us/resource/stnw-hdrd.rdf?$limit=1

>>>
>>> <rdf:RDF
>>>
>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>
>>> xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>>
>>> xmlns:socrata="http://www.socrata.com/rdf/terms#"
>>>
>>> ...
>>>
>>> xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>>>
>>> ...
>>>
>>> xmlns:dsbase="http://data.cityofnewyork.us/resource/"
>>>
>>> xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/"
>>>
>>> xmlns:usps="http://www.w3.org/2000/10/swap/pim/usps#">
>>>
>>> 
>>>
>>> <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159"> 
>>>
>>> <socrata:rowID>7055868</socrata:rowID> 
>>>
>>> <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/> 
>>>
>>> 
>>>
>>> <geo:lat>40.71159894212768</geo:lat> 
>>>
>>> <geo:long>-73.76983198736392</geo:long>
>>>
>>> 
>>>
>>> </dsbase:stnw-hdrd>
>>>
>>> </rdf:RDF>
>>>
>>> 
>>>
>>> 
>>>
>>> Well, the RDF returned geo:lat and geo:long as the properties of the two 
numbers, and these properties have well-known semantics.
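>>>
>>> The same idea can be sketched outside any platform, assuming a simple
>>> column-to-property table kept by the Publisher. COLUMN_TO_PROPERTY and
>>> annotate are hypothetical names for illustration and do not describe
>>> Socrata's internals; columns without a mapping fall back to a
>>> Dataset-local URI, as in the first RDF output.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical mapping maintained by the Dataset owner.
COLUMN_TO_PROPERTY = {
    "latitude": GEO + "lat",
    "longitude": GEO + "long",
}


def annotate(row, dataset_base):
    """Re-key a row by property URI.

    Mapped columns get the shared vocabulary URI; unmapped columns fall
    back to a URI under the Dataset's own namespace.
    """
    return {
        COLUMN_TO_PROPERTY.get(column, dataset_base + column): value
        for column, value in row.items()
    }


row = {"latitude": "40.71159894212768", "longitude": "-73.76983198736392"}
annotated = annotate(row, "http://data.cityofnewyork.us/resource/stnw-hdrd/")
```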
>>>
>>> 
>>>
>>> For me, this is a Best Practice.
>>>
>>> 
>>>
>>> What do you think about this?
>>>
>>> 
>>>
>>> I apologize, again, for the long message.
>>>
>>> 
>>>
>>> Kind Regards,
>>>
>>> Laufer
>>>
>>>
>>>
>>> -- 
>>> .  .  .  .. .  . 
>>> .        .   . ..
>>> .     ..       .
>>
>>
>>
>>
>> -- 
>>
>> Ig Ibert Bittencourt
>> Professor Adjunto III - Universidade Federal de Alagoas (UFAL)
>> Vice-Coordenador da Comissão Especial de Informática na Educação
>> Líder do Centro de Excelência em Tecnologias Sociais
>> Co-fundador da Startup MeuTutor Soluções Educacionais LTDA.
>
>
>
>
> -- 
> .  .  .  .. .  . 
> .        .   . ..
> .     ..       .



-- 
.  .  .  .. .  . 
.        .   . ..
.     ..       .
Received on Friday, 28 March 2014 17:52:45 UTC