Re: Semantics and Data Consumption from Laufer on 2014-03-27 (public-dwbp-wg@w3.org from March 2014)

From: Laufer <laufer@globo.com>
Date: Thu, 27 Mar 2014 12:27:22 -0300
To: Ig Ibert Bittencourt <ig.ibert@gmail.com>
Cc: DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CA+pXJiiM06JfJ3y8o4ooNjgcyTtRHgYVgX3R7W+-kAqYn09ZYQ@mail.gmail.com>
Ig,

What I am trying to expose is that we should differentiate the ideas of the
RDF Model and Linked Data from the way Data is stored.
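
To make the distinction concrete, here is a minimal sketch in Python (standard library only). The stored record is a plain dictionary and the serialization is chosen only when someone asks for the Data. The names STORED_ROW, BASE, to_json, to_ntriples and export are my own assumptions for illustration; they are not part of Socrata, CKAN or any other platform.

```python
import json

# Hypothetical stored record; how it is stored is invisible to the consumer.
STORED_ROW = {
    "id": "27702159",
    "latitude": "40.71159894212768",
    "longitude": "-73.76983198736392",
}

# Assumed namespace used to mint row and property URIs.
BASE = "http://data.cityofnewyork.us/resource/stnw-hdrd/"


def to_json(row):
    """Serialize the row as a JSON object (the id is left out)."""
    return json.dumps({k: v for k, v in row.items() if k != "id"}, sort_keys=True)


def to_ntriples(row):
    """Serialize the same row as N-Triples, one triple per property.

    Literal escaping is omitted because the sample values are plain numbers.
    """
    subject = f"<{BASE}{row['id']}>"
    lines = []
    for key in sorted(row):
        if key == "id":
            continue
        lines.append(f'{subject} <{BASE}{key}> "{row[key]}" .')
    return "\n".join(lines)


SERIALIZERS = {"json": to_json, "nt": to_ntriples}


def export(row, fmt):
    """Pick the serialization at request time, independent of storage."""
    return SERIALIZERS[fmt](row)
```

Calling export(STORED_ROW, "json") and export(STORED_ROW, "nt") gives two views of the same stored Data; neither says anything about how the row is kept in the repository.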

Besides that, I think we should take into account the tools that are being
used to expose Data on the Web.

Best,
Laufer


2014-03-27 5:38 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:

> Hi Laufer,
>
> Thank you for your didactic e-mail. :)
>
> I agree that Data semantics is very important and we should definitely try
> to connect our data as much as possible to the schemas used for other
> people's data.
>
> As far as I understand, your proposal goes in the same direction as the
> fifth star of Tim's 5-star open data plan [1] and the third
> principle of the Linked Data Principles [2]. Is that right?
>
> Even so, IMHO it could be a good idea to reinforce the LD
> principles as best practices.
>
> [1] http://5stardata.info/
> [2] http://www.w3.org/DesignIssues/LinkedData.html
>
> All the Best,
> Ig
>
>
> 2014-03-25 16:25 GMT-03:00 Laufer <laufer@globo.com>:
>
>  Hello, All,
>>
>>
>>
>> I apologize for the long message.
>>
>>
>>
>> I would like to talk about some concepts that are being discussed by the
>> WG and are related to Data Formats and Semantics.
>>
>>
>>
>> Bernardette published a page in the wiki where she defines phases for the
>> Data on the Web Lifecycle.
>>
>>
>>
>> When we inspect some of the Use Cases and the Stories listed in the wiki,
>> including the webinar presentations, we can see that there is more than
>> one player, a chain of players, responsible for allowing the
>> consumption of Data.
>>
>>
>>
>> The Data Generation and the Data Distribution phases are carried out by
>> people who access the raw data to be published but use distribution
>> platforms that have their own metamodels, such as CKAN and Socrata.
>>
>>
>>
>> The issue "what is the Data format that is consumed" gets mixed up with
>> the idea that the format of the stored Data is the same as the format of
>> the consumed Data. In some Use Cases we can see that the
>> Publishers store several formats to be downloaded by the Consumers.
>>
>>
>>
>> At first sight, the Data format that is stored in the repository is not
>> important. When someone requests Data, the transformation
>> (serialization) of the stored Data could (should?) be done by the Data
>> provider.
>>
>>
>>
>> Let's take Socrata as an example. A Dataset in Socrata could be uploaded
>> from an Excel file, but once it is stored in the Socrata cloud, we don't
>> know in what Data format the content of the original Excel file is stored as a
>> Dataset. A Data consumer has a standard interface where she can browse the
>> Dataset and she can ask the platform to export Data in different formats,
>> including pdf, json, xml, rdf and xls.
>>
>> Socrata also provides an individual Endpoint with an API for each
>> Dataset. It considers the Endpoint as a way of exporting Data, a slice of
>> the whole Dataset.
>>
>>
>>
>> When we think about Data semantics, these semantics should be described as
>> metadata. They can be stored, for example, in a pdf file describing the data
>> model, in a technical style or in a free style. What is important is that
>> the Consumer can understand what is being said about the Data that she is
>> consuming.
>>
>>
>>
>> What could be a Best Practice would be to rely on a more widely shared
>> understanding of this metadata. This is one of the contributions of the rdf
>> model when it defines the use of common vocabularies as a way to describe
>> the properties of resources. Besides that, it also introduces the idea of
>> universal identifiers as a way of linking Data from different Datasets.
>>
>>
>>
>> There is a huge amount of Data to be loaded on the web that has its own
>> semantics. People can publish these Data in their own way, leaving it to
>> the developers to understand each of these semantics and to build the
>> mashups. That's ok. But if the Publishers used common vocabularies, this
>> would make it easier for the Developers to integrate Data.
>>
>>
>>
>> Let's take an example. In NYC Open Data Dataset "311 Service Requests
>> from 2010 to Present" there are two columns labeled "Latitude" and
>> "Longitude". The type of these two columns is Number. Well, we can guess
>> that they are related to the latitude and longitude of the address where a
>> service was requested.
>>
>>
>>
>> There is a human interface where it is possible to browse the Dataset:
>>
>>
>> https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/stnw-hdrd
>>
>>
>>
>> To get the information about a service request we can use the Endpoint to
>> export Data in json or rdf formats. The columns are identified by
>> property names derived from the column labels: "Latitude" is identified as
>> "latitude"; "Longitude" as "longitude".
>>
>>
>>
>> Using the endpoint created for the Dataset we can obtain the json output
>> of the first row:
>>
>>
>>
>> http://data.cityofnewyork.us/resource/stnw-hdrd.json?$limit=1
>>
>> [ {
>>
>>
>>
>> "longitude" : "-73.76983198736392",
>>
>> "latitude" : "40.71159894212768",
>>
>>
>>
>>  }  ]
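
A minimal sketch of what a consumer has to do with this output, in Python with the standard library only. SAMPLE holds just the two fields shown above (the other fields of the row are left out), and the function name coordinates is an assumption of mine. Note that the values arrive as strings and that the consumer has to know, from the labels alone, which keys to read.

```python
import json

# Only the two fields shown in the message; the rest of the row is omitted.
SAMPLE = '[{"longitude": "-73.76983198736392", "latitude": "40.71159894212768"}]'


def coordinates(payload):
    """Return (latitude, longitude) pairs for rows that carry both keys.

    The key names are dataset-specific guesses based on the column labels.
    """
    rows = json.loads(payload)
    return [
        (float(row["latitude"]), float(row["longitude"]))
        for row in rows
        if "latitude" in row and "longitude" in row
    ]
```

coordinates(SAMPLE) yields one pair of floats for the row above; a Dataset that labels its columns "lat" and "lng" would silently yield nothing.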
>>
>>
>>
>> Using the endpoint created for the Dataset we can obtain the rdf output
>> of the first row:
>>
>> http://data.cityofnewyork.us/resource/stnw-hdrd.rdf?$limit=1
>>
>>
>>
>> <rdf:RDF
>>
>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>
>> xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>
>> xmlns:socrata="http://www.socrata.com/rdf/terms#"
>>
>> ...
>>
>> xmlns:dsbase="http://data.cityofnewyork.us/resource/"
>>
>> xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/"
>>
>> xmlns:usps="http://www.w3.org/2000/10/swap/pim/usps#">
>>
>>
>>
>> <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
>>
>> <socrata:rowID>7055868</socrata:rowID>
>>
>> <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/>
>>
>>
>>
>> <ds:latitude>40.71159894212768</ds:latitude>
>>
>> <ds:longitude>-73.76983198736392</ds:longitude>
>>
>>
>>
>> </dsbase:stnw-hdrd>
>>
>> </rdf:RDF>
>>
>>
>>
>>
>>
>> Well, the rdf does not introduce any kind of semantics in this case. It
>> is only a different serialized format of the Data returned in json. The
>> property http://data.cityofnewyork.us/resource/stnw-hdrd/latitude
>> doesn't have more semantics than the label "Latitude".
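
The point can be checked by expanding the qualified names into full URIs. The sketch below uses Python's xml.etree.ElementTree on a trimmed copy of the output above (the elided namespace declarations and the rdfs:member element are left out); the function name property_uris is an assumption of mine.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the RDF/XML shown in the message.
RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:socrata="http://www.socrata.com/rdf/terms#"
  xmlns:dsbase="http://data.cityofnewyork.us/resource/"
  xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/">
  <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
    <socrata:rowID>7055868</socrata:rowID>
    <ds:latitude>40.71159894212768</ds:latitude>
    <ds:longitude>-73.76983198736392</ds:longitude>
  </dsbase:stnw-hdrd>
</rdf:RDF>"""


def property_uris(rdf_xml):
    """List the full property URIs used on each described resource."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for resource in root:
        for prop in resource:
            # ElementTree stores tags as "{namespace}localname".
            namespace, local = prop.tag[1:].split("}")
            uris.append(namespace + local)
    return uris
```

Every coordinate property comes back under the dataset's own namespace, so nothing outside this Dataset can recognize it as a latitude or a longitude.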
>>
>>
>>
>> But Socrata allows the owner of the Dataset to associate an rdf property
>> with a column. The user can associate any URL as metadata of the column
>> and, besides that, Socrata lists some properties that it understands from
>> some vocabularies: dcat; foaf; Dublin Core; geo.
>>
>>
>>
>> I associated the column "Latitude" with the URL:
>> http://www.w3.org/2003/01/geo/wgs84_pos#lat
>>
>>
>>
>> I associated the column "Longitude" with the URL:
>> http://www.w3.org/2003/01/geo/wgs84_pos#long
>>
>>
>>
>> I made the endpoint call again:
>>
>>
>>
>> http://data.cityofnewyork.us/resource/stnw-hdrd.rdf?$limit=1
>>
>> <rdf:RDF
>>
>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>
>> xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>
>> xmlns:socrata="http://www.socrata.com/rdf/terms#"
>>
>> ...
>>
>> xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>>
>> ...
>>
>> xmlns:dsbase="http://data.cityofnewyork.us/resource/"
>>
>> xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/"
>>
>> xmlns:usps="http://www.w3.org/2000/10/swap/pim/usps#">
>>
>>
>>
>> <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
>>
>> <socrata:rowID>7055868</socrata:rowID>
>>
>> <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/>
>>
>>
>>
>> <geo:lat>40.71159894212768</geo:lat>
>>
>> <geo:long>-73.76983198736392</geo:long>
>>
>>
>>
>> </dsbase:stnw-hdrd>
>>
>> </rdf:RDF>
>>
>>
>>
>>
>>
>> Well, the rdf now returns geo:lat and geo:long as the properties of the
>> two numbers, and these properties have well-known semantics.
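
A sketch of why this helps a Developer, again in Python with xml.etree.ElementTree. One function finds the coordinates in any Dataset that uses the geo vocabulary, without knowing its column labels. NYC is a trimmed copy of the output above; OTHER is a hypothetical second Dataset under example.org that I invented for illustration, and find_points is my own name.

```python
import xml.etree.ElementTree as ET

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Row from the message, trimmed to the namespaces it needs.
NYC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  xmlns:dsbase="http://data.cityofnewyork.us/resource/">
  <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
    <geo:lat>40.71159894212768</geo:lat>
    <geo:long>-73.76983198736392</geo:long>
  </dsbase:stnw-hdrd>
</rdf:RDF>"""

# Hypothetical Dataset from another publisher (invented for this example).
OTHER = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  xmlns:ex="http://example.org/schools/">
  <ex:School rdf:about="http://example.org/schools/1">
    <ex:name>Example School</ex:name>
    <geo:lat>-9.5</geo:lat>
    <geo:long>-35.75</geo:long>
  </ex:School>
</rdf:RDF>"""


def find_points(rdf_xml):
    """Return (lat, long) for every resource that has geo:lat and geo:long."""
    root = ET.fromstring(rdf_xml)
    points = []
    for resource in root:
        lat = resource.find(f"{{{GEO}}}lat")
        lon = resource.find(f"{{{GEO}}}long")
        if lat is not None and lon is not None:
            points.append((float(lat.text), float(lon.text)))
    return points
```

The same find_points call works on NYC and on OTHER, because both Publishers chose the same property URIs.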
>>
>>
>>
>> For me, this is a Best Practice.
>>
>>
>>
>> What do you think about this?
>>
>>
>>
>> I apologize, again, for the long message.
>>
>>
>>
>> Kind Regards,
>>
>> Laufer
>>
>>
>> --
>> .  .  .  .. .  .
>> .        .   . ..
>> .     ..       .
>>
>
>
>
> --
>
> Ig Ibert Bittencourt
> Professor Adjunto III - Universidade Federal de Alagoas (UFAL)
> Vice-Coordenador da Comissão Especial de Informática na Educação
> Líder do Centro de Excelência em Tecnologias Sociais
> Co-fundador da Startup MeuTutor Soluções Educacionais LTDA.
>



-- 
.  .  .  .. .  .
.        .   . ..
.     ..       .
Received on Thursday, 27 March 2014 15:27:52 UTC