Re: CSV2RDF from Andy Seaborne on 2014-03-20 (public-csv-wg@w3.org from March 2014)

From: Andy Seaborne <andy@apache.org>
Date: Thu, 20 Mar 2014 15:35:19 +0000
To: Juan Sequeda <juanfederico@gmail.com>
CC: CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <532B0AB7.2030103@apache.org>

On 20/03/14 15:01, Juan Sequeda wrote:
> Andy,
>
> I'm glad you brought up the property about information preservation. I
> now understand where you are coming from.

Specifically on column ordering, the information preserved is that 
defined by the Core Data Model.

My suggestion (and I'm guessing this will be common across formats) is 
that there is a metadata section to the translation, which may be 
optional if clutter matters in that format.

> The W3C Direct Mapping is not information preserving when there are
> NULLs in the database. The reason is that the RDB schema was
> ignored in the direct mapping. With my scientific hat on, I studied the
> direct mapping with respect to fundamental and desirable properties and
> made some extensions to it:
> http://www2012.org/proceedings/proceedings/p649.pdf

(/me adds to reading list)

>
> This extension is in our Ultrawrap tool.
>
> If ordering is important, then it should be part of the mapping.
>
> If the mapping is correct (I'm using the term correct a bit loosely),
> then it should naturally be information preserving. This means that
> CSV_1 -> RDF -> CSV_2 where CSV_1 = CSV_2.
>
> Note that this does not hold when RDF is the starting point.

Perfect round-trip isn't, to me, a necessary design goal, just a useful 
test of information preservation.
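
To make the test concrete, here is a minimal sketch in Python of a
CSV_1 -> triples -> CSV_2 check in which column order survives only
because the translation carries a small metadata section. Everything
named here is an assumption for illustration and not taken from the
wiki draft: the BASE namespace, the "columnIndex" and "rowNumber"
predicates, and the use of plain tuples in place of real RDF triples.
It also assumes a header row, unique column names, and no missing cells.

```python
import csv
import io

BASE = "http://example.org/csv/"  # hypothetical namespace, illustration only
COL = BASE + "col/"


def csv_to_triples(text):
    """Turn CSV text into (subject, predicate, object) tuples."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    triples = []
    # Metadata section: record the position of each column.
    for index, name in enumerate(header, start=1):
        triples.append((COL + name, BASE + "columnIndex", index))
    for row_number, row in enumerate(data, start=1):
        subject = BASE + "row/" + str(row_number)
        # Row position is recorded too (an extra assumption of this sketch).
        triples.append((subject, BASE + "rowNumber", row_number))
        for name, value in zip(header, row):
            triples.append((subject, COL + name, value))
    return triples


def triples_to_csv(triples):
    """Rebuild CSV text using only the triples, never their list order."""
    col_index = {s: o for s, p, o in triples if p == BASE + "columnIndex"}
    row_index = {s: o for s, p, o in triples if p == BASE + "rowNumber"}
    cells = {(s, p): o for s, p, o in triples if p in col_index}
    columns = sorted(col_index, key=col_index.get)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([c[len(COL):] for c in columns])
    for subject in sorted(row_index, key=row_index.get):
        writer.writerow([cells[(subject, c)] for c in columns])
    return out.getvalue()


csv_1 = "name,age,city\nAlice,30,Paris\nBob,25,Berlin\n"
triples = csv_to_triples(csv_1)
# Reverse the list to mimic a graph, which has no inherent triple order.
csv_2 = triples_to_csv(list(reversed(triples)))
assert csv_1 == csv_2
```

If the columnIndex triples are dropped, the cell values are all still
there but the column order cannot be recovered, which is the sense in
which the round trip works as a test rather than as a goal.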

 Andy

>
>
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com <http://www.juansequeda.com>
>
>
> On Thu, Mar 20, 2014 at 3:49 AM, Andy Seaborne <andy@apache.org> wrote:
>
>     Jeni, Juan,
>
>     This is not about round-tripping per se - that is only a test to see
>     if information has been preserved in the mapping.
>
>     In converting to RDF, I was aiming for something that did not lose
>     information.
>
>     The ATDM says:
>     [[
>     The order of the columns is significant and must be preserved by
>     applications.
>     ]]
>
>     As we have identified that column order can be significant (in some
>     situations), the RDF translation should be able to preserve that.
>
>     Given that starting point, I think the question to ask should be the
>     other way round.
>
>     Why would we drop information in the translation?
>
>     Otherwise, if we don't require it for RDF, then surely that
>     indicates it is not required in the ATDM.
>
>     Do you have a use case for ATDM that does not apply to RDF?
>
>     There are some uses I can think of:
>
>     1/ text/csv fragment identifiers use column numbering so you need it
>     to get from fragment to column and cell.
>
>     2/ Support for data display: simply outputting displayable
>     information as part of an RDF-based application in a form visually
>     similar to the original data.
>
>              Andy
>
>     PS wiki or github?
>
>
>
>     On 19/03/14 23:39, Juan Sequeda wrote:
>
>         In the RDB2RDF work, we focused only on RDB to RDF (no roundtrip).
>         Therefore I agree with Jeni.
>
>         Btw, I'm catching up on the emails and minutes on CSV2RDF and Direct
>         Mapping. I hope that I will be able to contribute.
>
>         Best,
>
>         Juan Sequeda
>         +1-575-SEQ-UEDA
>         www.juansequeda.com <http://www.juansequeda.com>
>
>
>
>         On Wed, Mar 19, 2014 at 6:18 PM, Jeni Tennison
>         <jeni@jenitennison.com> wrote:
>
>              Thanks Andy,
>
>              One point that I think is an interesting question: you
>              addressed the preservation of column order at:
>
>              https://www.w3.org/2013/csvw/wiki/CSV2RDF#Column_Order
>
>              I think we should concentrate on conversion *to* the other
>              formats (RDF etc) rather than caring about round-tripping.
>              So I’m not convinced of the value of retaining this
>              information in an RDF mapping. How do you see it being used
>              by an RDF-based application, aside from in reconstructing
>              the original CSV?
>
>              Cheers,
>
>              Jeni
>
>              ------------------------------------------------------
>              From: Andy Seaborne andy@apache.org
>              Reply: Andy Seaborne andy@apache.org
>
>              Date: 19 March 2014 at 21:11:49
>              To: CSV on the Web Working Group public-csv-wg@w3.org
>
>              Subject:  CSV2RDF
>
>               > Some notes on CSV+ to RDF.
>               >
>               > https://www.w3.org/2013/csvw/wiki/CSV2RDF
>               >
>               > In no way is this completed work - consider it
>               > "in-progress".
>               >
>               > There seem to be three distinct levels to consider:
>               >
>               > 1. Tabular CSV to RDF
>               > 2. Annotated Data Model to RDF
>               > 3. Mapped Annotated Data Model to RDF
>               >
>               > and the content so far only considers levels 1 and 2
>               > with no domain specific target data schema.
>               >
>               > Andy
>               >
>               >
>               >
>
>              --
>              Jeni Tennison
>              http://www.jenitennison.com/
>
>
>
>
Received on Thursday, 20 March 2014 15:35:51 UTC