Re: Proposal: Looking inside tables from Martin Hepp on 2013-08-20 (public-vocabs@w3.org from August 2013)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Tue, 20 Aug 2013 15:27:54 +0200
To: Dan Brickley <danbri@google.com>
Cc: Cosmin Paun <cpaun88@gmail.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-Id: <F2800B4C-B35E-4DDC-A21A-E86A35C8DF76@ebusiness-unibw.org>

> 
> Do you mean the whole proposal, or the top-of-your head thoughts you
> shared today?
> 

Initially, I meant the basic idea of adding specific conceptual elements to schema.org for marking up HTML tables via a "meta-data" short-cut, i.e. describing the data structure of the HTML table. The longer I think about it, the more I mean the proposal as a whole.


> One thing to emphasise about the original proposal is that it is
> intended also to work with non-HTML tabular structures, such as CSV.
> This btw makes it close in scope to
> http://www.w3.org/TR/2012/REC-r2rml-20120927/
> 

I think that this is a pretty different use-case, and I would separate the two issues. I understand that there might be a need for meta-data mechanisms in schema.org, i.e. elements that allow exposing data structures at the schema level instead of the instance level. When you want to publish meta-data that explains the data structures of a CSV file, you work at the schema level. And we may want to be able to say that

- http://example.com/pricelist.csv is a table,
- that each line represents an entity of a certain type, and
- how the columns map to schema.org properties for that type.
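The three statements above can be sketched as a small piece of schema-level meta-data plus the expansion step a consumer would have to run. This is only an illustration under stated assumptions: the column names ("sku", "title", "price"), the choice of http://schema.org/Offer as the row type, and the dictionary layout are hypothetical and are not taken from the proposal.

```python
import csv
import io

# Hypothetical meta-data for http://example.com/pricelist.csv.
# Column names and the row type are assumptions made for this sketch.
TABLE_METADATA = {
    "source": "http://example.com/pricelist.csv",  # the file is a table
    "row_type": "http://schema.org/Offer",         # each line is one entity
    "columns": {                                   # column -> property
        "sku": "http://schema.org/sku",
        "title": "http://schema.org/name",
        "price": "http://schema.org/price",
    },
}


def rows_to_entities(csv_text, metadata):
    """Expand schema-level meta-data into one instance-level entity per row."""
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = {"@type": metadata["row_type"]}
        for column, prop in metadata["columns"].items():
            value = row.get(column)
            if value:  # skip empty or missing cells
                entity[prop] = value
        entities.append(entity)
    return entities


# Inline sample standing in for the contents of the CSV file.
SAMPLE = "sku,title,price\nA1,Widget,9.99\nB2,Gadget,\n"
ENTITIES = rows_to_entities(SAMPLE, TABLE_METADATA)
```

Note that every cell ends up as a direct property value of the row entity, which is the 1:1 pattern such a flat column map can express.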

However, I see several issues here:

1. If this mechanism is also advocated for use with HTML tables, we will confuse developers and make it more difficult for data consumers to operate on the data. For instance, any RDF-based client would have to apply proprietary inference rules to translate the meta-data mixed in with the instance data into the normal form (e.g. a SPARQL CONSTRUCT rule of the form "for each row in that table, create a blank node of a certain type ..."). And developers would have to remember that while they are marking up a table, they are "meta-marking up" the entities in the table. That is as painful as when XML coders tried to model RDF in RDF/XML syntax...

2. The proposal will likely be too simplistic for the more advanced patterns of translating a CSV table into schema.org data, because it assumes a 1:1 mapping from table cells to schema.org property values (except for the concatenation mechanism). For instance, if you have a table with geo-position data (long/lat) for shops, you will have to create an intermediary entity of type http://schema.org/GeoCoordinates with latitude and longitude attached. I fail to see how this can be done with the proposed mechanism.

This can be done easily with Google Refine / OpenRefine (https://github.com/OpenRefine) in combination with the DERI RDF extension (http://refine.deri.ie/), as far as I remember, but the proposed mechanism does not cater for such advanced patterns.

3. The proposal is pretty complicated, despite its limitations. I have doubts that broad audiences will be able to apply it correctly for non-trivial scenarios.

4. Putting this mechanism into schema.org means breaking the separation of concerns between vocabulary and syntax. The translation of CSV files and other tabular data into structured data that follows a Web vocabulary like schema.org is a generic challenge that should not be solved inside a single vocabulary.
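To make the geo-position case from point 2 concrete, here is a minimal sketch of the target structure, assuming a hypothetical shop table with the columns "name", "lat" and "long" and using http://schema.org/Store as the row type. Two cells of the same row have to land on an intermediary GeoCoordinates node rather than on the shop itself, which is what a flat column-to-property map cannot express.

```python
import csv
import io

SCHEMA = "http://schema.org/"


def shop_rows_to_entities(csv_text):
    """Turn each row into a Store with a nested GeoCoordinates entity.

    The column names "name", "lat" and "long" are assumptions for this sketch.
    """
    shops = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        shops.append({
            "@type": SCHEMA + "Store",
            SCHEMA + "name": row["name"],
            # Intermediary entity: two cells map to one nested node.
            SCHEMA + "geo": {
                "@type": SCHEMA + "GeoCoordinates",
                SCHEMA + "latitude": float(row["lat"]),
                SCHEMA + "longitude": float(row["long"]),
            },
        })
    return shops


# Inline sample data; the shop name and coordinates are made up.
SHOPS_CSV = "name,lat,long\nMain Street Shop,48.0803,11.6382\n"
SHOPS = shop_rows_to_entities(SHOPS_CSV)
```

Here the nesting is hard-coded in the conversion script; a declarative table description would need some way to say "these two columns belong to a new node attached via geo".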


Martin

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

Received on Tuesday, 20 August 2013 13:28:23 UTC