Re: Report on CSV files on the Web

From: Yakov Shafranovich <yakov-ietf@shaftek.org>
Date: Thu, 20 Nov 2014 07:35:27 -0500
Message-ID: <CAPQd5oT7s5Z_=DoLxDYzhy_rcn=ofBnk2j-k6jPVtmAaDHQE4Q@mail.gmail.com>
To: Juergen Umbrich <juergen.umbrich@wu.ac.at>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Sebastian Neumaier <sebastian.neumaier@wu.ac.at>
Thanks

On Thu, Nov 20, 2014 at 7:31 AM, Juergen Umbrich <juergen.umbrich@wu.ac.at>
wrote:

> Hi Yakov,
>
> >
> > I am wondering if there is a correlation between the correct MIME type
> > being used and the software being used as identified by the "Server"
> > header. Is there any chance you may have that data?
> Sure, this data is available, and we can hopefully get the numbers at the
> beginning of next week, since I won't be able to compile them this
> week.
>
> Best
>   Jürgen
> >
> > Thanks,
> > Yakov
> >
> > On Wed, Nov 19, 2014 at 6:30 AM, Juergen Umbrich <juergen.umbrich@wu.ac.at> wrote:
> > Hi all,
> >
> > as "announced" last week, here is our first early report on our
> > findings from looking into 65k CSV files published as Open Data on the Web.
> >
> > "This study reports on our findings about 74395 CSV files published on
> > the Web as Open Data. The documents are extracted from 91 Open Data CKAN
> > portals for which the metadata indicate a comma/character-separated-values
> > file. Our analysis includes the inspection of the HTTP response headers,
> > encoding detection and guessing of used delimiters. We also determine the
> > deviation of data tables compared to a canonical form [1].
> >
> > Our findings show that the majority of the CSV files adhere to the
> > RFC4180 specification, meaning the use of csv as file extension, text/csv
> > as the HTTP response header content-type, and ',' as delimiter. We also
> > show that there exists nearly no information about the content encoding in
> > the HTTP headers. The major observed deviations are that data tables
> > contain rows in which one or several data cells occupy multiple columns and
> > that one or several data cells are empty."
> >
> >
> >
> > Best
> >   Jürgen
> >
> > --
> > Dr. Jürgen Umbrich
> > WU Vienna, Institute for Information Business
>
> --
> Dr. Jürgen Umbrich
> WU Vienna, Institute for Information Business
Received on Thursday, 20 November 2014 12:36:25 UTC
