- From: David Booth <david@dbooth.org>
- Date: Thu, 05 Mar 2015 18:41:32 -0500
- To: Paul Klink <paul@klink.id.au>, public-csv-wg@w3.org
Hi Paul,

I'm not in the "CSV on the Web" working group -- I'm just an interested bystander :) -- so I think the working group would have to comment on your use case. Here is the use case document that the group has produced already:
http://www.w3.org/TR/csvw-ucr/

David Booth

On 03/05/2015 05:57 PM, Paul Klink wrote:
> Hi David,
>
> Let me take one step back and describe the Use Case I am trying to address.
>
> As a developer, I have often written software to import CSV files
> supplied by other organisations. Typically these CSV files contain
> extracts from the other organisation's internal databases. At our end
> we would read the data from these files, process it and then normally
> update our own databases with it.
>
> In my experience, most of these CSV files contain a single table.
> However sometimes the other organisation wants to provide us with
> multiple tables of data. In this case the multiple tables are placed in
> one file with one column being a key identifying which table each
> line belongs to. By having all the tables in one file, they can safely
> reference each other.
>
> The other organisation will provide a document which describes the
> content of the data and how it is formatted in the file. The document
> may be sent to us specifically or published via the web. The CSV file
> itself is made available via the web.
>
> The data in these CSV files is normally very application specific. I
> don't think there would be much value in having a schema to cover the
> content. Any patterns found between these files would be very generic.
> The work involved in providing support for any such schema is unlikely
> to be worth the effort for either the producer of the data or the consumer.
>
> However, obviously, there is a pattern in the formatting of these
> files. By having a schema which identifies the formatting, it makes it
> far easier to produce and consume the data (at a reading and writing
> level).
> Ideally the schema allows the data to be accessed in the same
> way we access a database. Pseudo code for reading data is
> shown below:
>
> Open file (and associated meta)
> while <not at end of file>
> begin
>   Read record
>   Read values from fields
>   Do stuff with values
>   Go to next record
> end
> Close File
>
> or, in the case of a text file containing multiple tables
>
> Open file (and associated meta)
> while <not at end of file>
> begin
>   Read record
>   if <start of new table>
>   begin
>     Initialise processing for table
>   end;
>   Read values from fields
>   Do stuff with values
>   Go to next record
> end
> Close File
>
> With the above code, no knowledge is required of how the text file
> containing the data (CSV or other text variant) is formatted. This will
> make it far easier for programmers to import files. It also provides
> more built-in checking to confirm that the data is being correctly
> interpreted. For example, columns are correctly chosen, data types are
> correctly interpreted. Another benefit is that the producer of the data no
> longer has to document the format. The net effect of this is
> significant productivity improvements in working with these files.
>
> To make this work, the organisation producing the files would need to
> generate the Meta. They will probably only do this if:
> 1) The schema is specified by a well known standard and widely adopted
> 2) It is very easy to implement (say less than 30 minutes for a simple file)
>
> I would hazard a guess, that this Use Case is the most common use of CSV
> files on the Web. It probably will remain so (at least in terms of
> number of organisations) for the foreseeable future.
>
> It would be great if “CSV on the Web” could cover this Use Case. It
> seems to me that it is almost there. It would only need to be slightly
> extended to cover a larger variety of formatting of text files.
> While “CSV on the Web”'s charter talks a lot about meta data describing the
> content of CSV files, it states that the primary focus is to associate
> Meta data with CSV files. I would like to think that providing
> sufficient Meta data so that existing text files can be read (and
> written) in a format independent way, would provide the foundation of a
> schema and fall within the scope of the charter.
>
> Another long post from me. Hopefully you find it constructive.
>
> Regards
> Paul
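
For illustration only, the multi-table reading loop from the quoted pseudo code might look roughly like this in Python. This is a minimal sketch under stated assumptions: the `META` dictionary, its keys (`delimiter`, `table_key_column`, `tables`), the `read_records` helper and the sample data are all hypothetical and are not the metadata vocabulary defined by the CSV on the Web working group. The point is only that the consuming loop needs no knowledge of the file's formatting once the metadata is available.

```python
import csv
import io

# Hypothetical metadata (NOT the CSVW vocabulary): describes formatting only.
META = {
    "delimiter": ",",
    "table_key_column": 0,  # column whose value names the table a line belongs to
    "tables": {
        "customer": [("id", int), ("name", str)],
        "order": [("id", int), ("customer_id", int), ("total", float)],
    },
}


def read_records(text_file, meta):
    """Yield (table_name, is_new_table, record) for each non-empty line."""
    reader = csv.reader(text_file, delimiter=meta["delimiter"])
    key_index = meta["table_key_column"]
    current_table = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        table = row[key_index]
        columns = meta["tables"][table]  # KeyError flags an undeclared table
        values = row[:key_index] + row[key_index + 1:]
        if len(values) != len(columns):
            raise ValueError(
                f"{table}: expected {len(columns)} fields, got {len(values)}"
            )
        # Convert each field with the type declared in the metadata.
        record = {
            name: convert(value)
            for (name, convert), value in zip(columns, values)
        }
        is_new_table = table != current_table
        current_table = table
        yield table, is_new_table, record


# Made-up sample file with two tables sharing one file.
SAMPLE = "customer,1,Alice\ncustomer,2,Bob\norder,10,1,19.95\n"

records = list(read_records(io.StringIO(SAMPLE), META))
for table, is_new_table, record in records:
    if is_new_table:
        print(f"-- start of table {table}")  # "Initialise processing for table"
    print(record)  # "Do stuff with values"
```

The field-count check and the per-column type conversion correspond to the "built-in checking" mentioned in the quoted message: a mis-chosen column or a wrongly typed value raises an error instead of being silently imported.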
Received on Thursday, 5 March 2015 23:42:01 UTC