- From: Laufer <laufer@globo.com>
- Date: Thu, 27 Mar 2014 12:27:22 -0300
- To: Ig Ibert Bittencourt <ig.ibert@gmail.com>
- Cc: DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CA+pXJiiM06JfJ3y8o4ooNjgcyTtRHgYVgX3R7W+-kAqYn09ZYQ@mail.gmail.com>
Ig,

What I am trying to point out is that we should distinguish the ideas of the
RDF Model and Linked Data from the way Data is stored. Besides that, I think
we should take into account the tools that are being used to expose Data on
the Web.

Best,
Laufer

2014-03-27 5:38 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:

> Hi Laufer,
>
> Thank you for your didactic e-mail. :)
>
> I agree that Data semantics is very important and we should definitely try
> to connect our data as much as possible to some kind of schema of other
> people's data.
>
> As far as I understand, your proposal goes in the same direction as the
> fifth star of Tim's 5-star open data plan [1] and also the third
> principle of the Linked Data Principles [2]. Is that right?
>
> Even so, IMHO it could perhaps be a good idea to reinforce the LD
> principles as best practices.
>
> [1] http://5stardata.info/
> [2] http://www.w3.org/DesignIssues/LinkedData.html
>
> All the Best,
> Ig
>
> 2014-03-25 16:25 GMT-03:00 Laufer <laufer@globo.com>:
>
>> Hello, All,
>>
>> I apologize for the long message.
>>
>> I would like to talk about some concepts that are being discussed by the
>> WG and are related to Data Formats and Semantics.
>>
>> Bernardette published a page in the wiki where she defines phases for the
>> Data on the Web Lifecycle.
>>
>> When we inspect some of the Use Cases and the Stories listed in the wiki,
>> including the webinar presentations, we can see that there is more than
>> one player, a chain of players, responsible for allowing the
>> consumption of Data.
>>
>> The Data Generation and the Data Distribution phases are done by persons
>> that access the raw data to be published but use platforms for distribution
>> that have their own metamodels, for example CKAN and Socrata.
>>
>> The issue "what is the Data format that is consumed" is mixed up with the
>> idea that the format of the stored Data is the same as the format of the
>> consumed Data. In some Use Cases we can see instances where the
>> Publishers store different formats to be downloaded by the Consumers.
>>
>> At first sight, the Data format that is stored in the repository is not
>> important. When someone requests Data, the transformation
>> (serialization) of the stored Data could (should?) be done by the Data
>> provider.
>>
>> Let's take Socrata as an example. A Dataset in Socrata could be uploaded
>> from an Excel file, but once it is stored in the Socrata cloud, we don't
>> know in what Data format the original Excel file is stored as a
>> Dataset. A Data consumer has a standard interface where she can browse the
>> Dataset, and she can ask the platform to export Data in different formats,
>> including pdf, json, xml, rdf and xls.
>>
>> Socrata also provides an individual Endpoint with an API for each
>> Dataset. It considers the Endpoint as a way of exporting Data, a slice of
>> the whole Dataset.
>>
>> When we think about Data semantics, these semantics should be described as
>> metadata. They can be stored, for example, in a pdf file describing the
>> data model, in a technical style or in a free style. What is important is
>> that the Consumer can understand what is being said about the Data that
>> she is consuming.
>>
>> A Best Practice would be to use a more widely shared understanding of
>> this metadata. This is one of the contributions of the rdf
>> model when it defines the use of common vocabularies as a way to describe
>> the properties of resources. Besides that, it also introduces the idea of
>> universal identifiers as a way of linking Data from different Datasets.
>>
>> There is a huge amount of Data to be loaded on the web that has its own
>> semantics.
>> People can publish these Data in their own view, leaving the
>> developers to understand each one of these semantics and make the
>> mashups. That's OK. But if the Publishers used common vocabularies, it
>> would make it easier for the Developers to integrate Data.
>>
>> Let's take an example. In the NYC Open Data Dataset "311 Service Requests
>> from 2010 to Present" there are two columns labeled "Latitude" and
>> "Longitude". The type of these two columns is Number. Well, we can guess
>> that they are related to the latitude and longitude of the address where a
>> service was requested.
>>
>> There is a human interface where it is possible to browse the Dataset:
>>
>> https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/stnw-hdrd
>>
>> To get the information about a service request we can use the Endpoint to
>> export Data in json or rdf formats. The columns are identified by
>> property names derived from the column labels: "Latitude" is identified as
>> "latitude"; "Longitude" as "longitude".
>>
>> Using the endpoint created for the Dataset we can obtain the json output
>> of the first row:
>>
>> http://data.cityofnewyork.us/resource/stnw-hdrd.json?$limit=1
>>
>> [ {
>>
>>   "longitude" : "-73.76983198736392",
>>   "latitude" : "40.71159894212768",
>>
>> } ]
>>
>> Using the endpoint created for the Dataset we can obtain the rdf output
>> of the first row:
>>
>> http://data.cityofnewyork.us/resource/stnw-hdrd.rdf?$limit=1
>>
>> <rdf:RDF
>>   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>   xmlns:socrata="http://www.socrata.com/rdf/terms#"
>>   ...
>>   xmlns:dsbase="http://data.cityofnewyork.us/resource/"
>>   xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/"
>>   xmlns:usps="http://www.w3.org/2000/10/swap/pim/usps#">
>>
>>   <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
>>     <socrata:rowID>7055868</socrata:rowID>
>>     <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/>
>>
>>     <ds:latitude>40.71159894212768</ds:latitude>
>>     <ds:longitude>-73.76983198736392</ds:longitude>
>>
>>   </dsbase:stnw-hdrd>
>> </rdf:RDF>
>>
>> Well, the rdf does not introduce any kind of semantics in this case. It
>> is only a different serialized format of the Data returned in json. The
>> property http://data.cityofnewyork.us/resource/stnw-hdrd/latitude
>> doesn't have more semantics than the label "Latitude".
>>
>> But Socrata allows the owner of the Dataset to associate an rdf property
>> with a column. The user can associate any URL as metadata of the column
>> and, besides that, Socrata lists some properties that it understands from
>> some vocabularies: dcat; foaf; Dublin Core; geo.
>>
>> I associated the column "Latitude" with the URL:
>> http://www.w3.org/2003/01/geo/wgs84_pos#lat
>>
>> I associated the column "Longitude" with the URL:
>> http://www.w3.org/2003/01/geo/wgs84_pos#long
>>
>> I made the endpoint call again:
>>
>> http://data.cityofnewyork.us/resource/stnw-hdrd.rdf?$limit=1
>>
>> <rdf:RDF
>>   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>   xmlns:socrata="http://www.socrata.com/rdf/terms#"
>>   ...
>>   xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>>   ...
>>   xmlns:dsbase="http://data.cityofnewyork.us/resource/"
>>   xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/"
>>   xmlns:usps="http://www.w3.org/2000/10/swap/pim/usps#">
>>
>>   <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
>>     <socrata:rowID>7055868</socrata:rowID>
>>     <rdfs:member rdf:resource="http://data.cityofnewyork.us/resource/stnw-hdrd"/>
>>
>>     <geo:lat>40.71159894212768</geo:lat>
>>     <geo:long>-73.76983198736392</geo:long>
>>
>>   </dsbase:stnw-hdrd>
>> </rdf:RDF>
>>
>> Well, the rdf returned geo:lat and geo:long, which have well-known
>> semantics, as the properties of the two numbers.
>>
>> For me, this is a Best Practice.
>>
>> What do you think about this?
>>
>> I apologize, again, for the long message.
>>
>> Kind Regards,
>> Laufer
>>
>> --
>> . . . .. . .
>> . . . ..
>> . .. .
>
> --
>
> Ig Ibert Bittencourt
> Adjunct Professor III - Universidade Federal de Alagoas (UFAL)
> Vice-Coordinator of the Comissão Especial de Informática na Educação
> Leader of the Centro de Excelência em Tecnologias Sociais
> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.

--
. . . .. . .
. . . ..
. .. .
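The comparison made in the quoted message can be sketched in Python. This is only an illustration, not something from the thread: it embeds a trimmed copy of the RDF/XML row quoted above (the namespace declarations elided with "..." and the rdfs:member element are left out), expands each property element to its full URI, and checks which properties come from the shared WGS84 geo vocabulary. The names TEMPLATE, property_uris and shared_vocabulary_terms are hypothetical helpers introduced here.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C WGS84 geo vocabulary cited in the message.
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Trimmed copy of the quoted RDF/XML row. {lat} and {long} are placeholders
# for the property element names, so the same row can be rendered both ways.
TEMPLATE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:socrata="http://www.socrata.com/rdf/terms#"
  xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  xmlns:dsbase="http://data.cityofnewyork.us/resource/"
  xmlns:ds="http://data.cityofnewyork.us/resource/stnw-hdrd/">
  <dsbase:stnw-hdrd rdf:about="http://data.cityofnewyork.us/resource/stnw-hdrd/27702159">
    <socrata:rowID>7055868</socrata:rowID>
    <{lat}>40.71159894212768</{lat}>
    <{long}>-73.76983198736392</{long}>
  </dsbase:stnw-hdrd>
</rdf:RDF>"""

# Row as exported before and after the columns were mapped to geo properties.
BEFORE = TEMPLATE.format(lat="ds:latitude", long="ds:longitude")
AFTER = TEMPLATE.format(lat="geo:lat", long="geo:long")


def property_uris(rdf_xml):
    """Map each property URI (namespace + local name) to its literal value."""
    root = ET.fromstring(rdf_xml)
    uris = {}
    for resource in root:          # each described row
        for prop in resource:      # each property element of that row
            # ElementTree stores qualified tags as "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            uris[namespace + local] = prop.text
    return uris


def shared_vocabulary_terms(rdf_xml):
    """Return the property URIs that belong to the WGS84 geo vocabulary."""
    return sorted(u for u in property_uris(rdf_xml) if u.startswith(WGS84))


if __name__ == "__main__":
    for label, doc in (("before", BEFORE), ("after", AFTER)):
        print(label, shared_vocabulary_terms(doc))
```

In the first rendering every property URI is local to the dataset, so a consumer has nothing but the label to go on; in the second, the two coordinates resolve to URIs that any tool aware of the geo vocabulary could recognize.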
Received on Thursday, 27 March 2014 15:27:52 UTC