Re: Report on CSV files on the Web

Thanks

On Thu, Nov 20, 2014 at 7:31 AM, Juergen Umbrich <juergen.umbrich@wu.ac.at> wrote:

> Hi Yakov,
>
> >
> > I am wondering whether there is a correlation between the correct MIME
> > type being used and the server software, as identified by the "Server"
> > header. Is there any chance you have that data?
> Sure, this data is available, and we can hopefully get the numbers at the
> beginning of next week, since I won't be able to compile them this week.
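Yakov's question about the "Server" header comes down to grouping the crawled HTTP responses by server software and counting how many of them declare the expected media type. A minimal Python sketch, assuming the response headers are available as plain dictionaries; the helper names `parse_content_type` and `summarize` are hypothetical and this is not the tooling used in the report:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into a lowercased media type and its parameters."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params: Dict[str, str] = {}
    for part in parts[1:]:
        if "=" in part:
            key, val = part.split("=", 1)
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type, params


def summarize(responses: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Per "Server" value, count all responses, text/csv responses and declared charsets."""
    stats: Dict[str, Counter] = defaultdict(Counter)
    for headers in responses:
        # HTTP header names are case-insensitive, so normalize them first.
        h = {k.lower(): v for k, v in headers.items()}
        server = h.get("server", "(none)")
        media_type, params = parse_content_type(h.get("content-type", ""))
        stats[server]["total"] += 1
        if media_type == "text/csv":
            stats[server]["text/csv"] += 1
        if "charset" in params:
            stats[server]["charset"] += 1
    return dict(stats)
```

Comparing the `text/csv` share per server would give a first indication of such a correlation; the `charset` count relates to the finding that the headers carry almost no encoding information.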
>
> Best
>   Jürgen
> >
> > Thanks,
> > Yakov
> >
> > On Wed, Nov 19, 2014 at 6:30 AM, Juergen Umbrich <juergen.umbrich@wu.ac.at> wrote:
> > Hi all,
> >
> > As "announced" last week, here is our first, early report on our
> > findings from looking into 65k CSV files published as Open Data on the Web.
> >
> > "This study reports on our findings about 74,395 CSV files published on
> > the Web as Open Data. The documents are extracted from 91 CKAN Open Data
> > portals whose metadata indicate a comma/character-separated-values file.
> > Our analysis includes inspecting the HTTP response headers, detecting the
> > encoding and guessing the delimiters used. We also determine how far the
> > data tables deviate from a canonical form [1].
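Encoding detection and delimiter guessing can be approximated with the Python standard library alone. The sketch below is an assumption, not the method used in the study: it tries strict UTF-8 and falls back to Latin-1 (a real survey would more likely use a statistical detector), then asks `csv.Sniffer` to pick from a hypothetical candidate set of delimiters.

```python
import csv
from typing import Optional, Tuple

# Candidate delimiters to consider; this set is an assumption for illustration.
CANDIDATE_DELIMITERS = ",;\t|"


def guess_encoding(raw: bytes) -> str:
    """Rough guess: strict UTF-8 if the bytes decode cleanly, else Latin-1.

    Latin-1 maps every byte to a character, so it acts as a catch-all here.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def guess_delimiter(text: str) -> Optional[str]:
    """Let the standard library's csv.Sniffer choose among the candidates."""
    try:
        dialect = csv.Sniffer().sniff(text, delimiters=CANDIDATE_DELIMITERS)
        return dialect.delimiter
    except csv.Error:
        # The sniffer could not settle on a single delimiter.
        return None


def inspect_csv_bytes(raw: bytes) -> Tuple[str, Optional[str]]:
    """Return (guessed encoding, guessed delimiter) for a downloaded file."""
    encoding = guess_encoding(raw)
    return encoding, guess_delimiter(raw.decode(encoding))
```

For example, `inspect_csv_bytes("name;city\nJürgen;Wien\n".encode("utf-8"))` reports UTF-8 and `;`.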
> >
> > Our findings show that the majority of the CSV files adhere to the
> > RFC 4180 specification, meaning they use csv as the file extension,
> > text/csv as the Content-Type in the HTTP response header, and ',' as the
> > delimiter. We also show that there is almost no information about the
> > content encoding in the HTTP headers. The major observed deviations are
> > that data tables contain rows in which one or several data cells occupy
> > multiple columns, and that one or several data cells are empty."
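The two deviations named above can be counted roughly once the delimiter is known. This is a simplified proxy of my own, not the canonical-form comparison from [1]: it assumes the first row is a single header, treats any row whose width differs from the header as a sign of cells spanning columns, and counts blank cells. The function name `table_deviations` is hypothetical.

```python
import csv
import io
from typing import Dict


def table_deviations(text: str, delimiter: str = ",") -> Dict[str, int]:
    """Count rows that are not as wide as the header, and empty cells."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {"rows": 0, "ragged_rows": 0, "empty_cells": 0}
    width = len(rows[0])  # assumption: the first row is the only header row
    # A blank line is parsed as an empty row and is therefore also counted here.
    ragged = sum(1 for row in rows[1:] if len(row) != width)
    empty = sum(1 for row in rows for cell in row if cell.strip() == "")
    return {"rows": len(rows), "ragged_rows": ragged, "empty_cells": empty}
```

For instance, a file with the header `id,name,city` followed by the rows `2,,Graz` and `3,Bob` has one empty cell and one ragged row.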
> >
> > Best
> >   Jürgen
> >
> > --
> > Dr. Jürgen Umbrich
> > WU Vienna, Institute for Information Business
> >
>

Received on Thursday, 20 November 2014 12:36:25 UTC