Re: Convenience suggestion: Allow metadata in a CSV file

Hi David,

Sorry, I looked at the example metadata that you provided and it includes
data, so that misled me (I'll admit I didn't manage to watch the video as I
was on a train but I now have and understand the goal there).

We did discuss CSV syntax for the metadata early on as well. I was
initially keen on this idea myself.

However, the problem is that CSV is best suited to tabular information and
metadata isn't tabular. You've encountered this in your example where you
have rows that are mostly empty in which the column headers are completely
irrelevant to the contents of the cells. If you try to support the entirety
of the metadata vocabulary I think there will be a number of instances
where the constraints of the tabular syntax start to really bite (multiple
tables, derived datatypes and foreign keys are the ones that spring to
mind). It certainly isn't impossible to support those things, but I think
it is difficult and I don't think the result will be particularly user
friendly.
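
To make the mismatch concrete, here is a minimal sketch (not taken from the
working group's drafts) of what happens when a small nested metadata
document is flattened into CSV rows. The file names, the `flatten` helper
and the exact property names are assumptions chosen for illustration, only
loosely modelled on the metadata vocabulary. The point is that derived
datatypes and foreign keys turn into long property paths rather than
natural columns.

```python
import csv
import io

# Hypothetical metadata document; property names are illustrative only.
metadata = {
    "url": "countries.csv",
    "tableSchema": {
        "columns": [
            {"name": "code", "datatype": "string"},
            # A derived datatype is itself a nested object.
            {"name": "population", "datatype": {"base": "integer", "minimum": 0}},
        ],
        "foreignKeys": [
            {
                "columnReference": "code",
                "reference": {"resource": "regions.csv", "columnReference": "country"},
            }
        ],
    },
}


def flatten(node, path=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{path}[{index}]")
    else:
        yield path, node


rows = list(flatten(metadata))

# Write the flattened form as a two-column CSV to show the result.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["property", "value"])
writer.writerows(rows)
print(buffer.getvalue())
```

Every row ends up as a path/value pair, so the CSV header carries no real
meaning for the cells beneath it, which is the problem described above.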

I think it would be interesting to investigate alternative syntaxes for
parts of tabular metadata (e.g. just schemas) and/or a specialist text-based
format for the metadata (à la the compact syntax for RELAX NG or the
Manchester syntax for ontologies). But I think these are substantial pieces
of work and not things that we can take on right now.

I suspect that if supplying metadata for CSV files takes off we will find
tools start to develop more user-friendly syntaxes to save people writing
JSON by hand, as you have done, and these could be used to inform
standardisation of such a syntax.

So as you suspected, this isn't something that I think we can take on at
this stage.

Does that make sense? Is there anything that you'd like to see in the specs
that leaves this possible future work in play?

Jeni

On Thu, 30 Apr 2015 12:26 David Booth <david@dbooth.org> wrote:

> Hi Jeni,
>
> This approach would *not* require the publisher to amend existing CSV
> files.  The metadata is provided in a *separate* CSV file requiring no
> changes whatsoever to existing CSV formats.  Was the video unclear about
> that?  (Apologies if so.)
>
> Thanks,
> David Booth
>
> On 04/30/2015 03:37 AM, Jeni Tennison wrote:
> > Hi David,
> >
> > Yes, we did discuss this earlier on and you might be aware of similar
> > approaches in HXL [1] and Linked CSV [2].
> >
> > We decided to rule this out of scope for now, mostly because adoption
> > would require publisher effort to amend existing CSV files and we only
> > had time to address the 80% case.
> >
> > However, we have tried to ensure that the specifications support the
> > scenario where someone (maybe a future incarnation of the group) defines
> > a CSV-based syntax that includes embedded metadata. You'll see an
> > example of how that could work in [3].
> >
> > Can you confirm that you're content with this response?
> >
> > Thanks,
> >
> > Jeni
> >
> > [1] http://hxlstandard.org/
> > [2] http://jenit.github.io/linked-csv/
> > [3] http://w3c.github.io/csvw/syntax/index.html#recognising-tabular-data-formats
> >
> > On 30 Apr 2015 03:43, "David Booth" <david@dbooth.org
> > <mailto:david@dbooth.org>> wrote:
> >
> >     I don't know if the working group has already considered this, but
> >     I'd like to suggest allowing CSV metadata to be specified
> >     in another CSV file, as an alternative to JSON.  I have found this
> >     approach to be quite convenient in a tool that I've been developing,
> >     and I think it could increase uptake of a CSV metadata standard.
> >
> >     Here is a very short mockup video (2 minutes 59 seconds) that
> >     illustrates this approach:
> >     https://www.youtube.com/watch?v=LmQWHdaN8_w
> >
> >     I realize that some CSV metadata authors may prefer JSON syntax.
> >     But as simple as JSON is, spreadsheet competence is far more
> >     widespread.  Also I would not blame anyone for being disinclined to
> >     consider this approach given the late date.  But this approach only
> >     involves different syntax -- not semantics -- and if it does indeed
> >     lower the adoption barrier then it seems to me that it would be
> >     worth considering.
> >
> >     What do others think?
> >
> >     Thanks,
> >     David Booth
> >
> >
>
>

Received on Thursday, 30 April 2015 16:33:06 UTC