RE: Questions on the url property in Table annotation and on dialect being a core property from Svensson, Lars on 2015-12-10 (public-csv-wg@w3.org from December 2015)

From: Svensson, Lars <L.Svensson@dnb.de>
Date: Thu, 10 Dec 2015 10:01:22 +0000
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>, Jeni Tennison <jeni@jenitennison.com>, Ivan Herman <ivan@w3.org>
Message-ID: <24637769D123E644A105A0AF0E1F92EF010D241717@dnbf-ex1.AD.DDB.DE>
Gregg,

On Wednesday, December 09, 2015 6:52 PM, Gregg Kellogg wrote:

> > On Dec 9, 2015, at 12:52 AM, Ivan Herman <ivan@w3.org> wrote:
> >
> > (Cc-ing to Gregg & Jeni, as an additional ping to get their attention…)
> >
> > Hey Lars,
> >
> >
> >> On 4 Dec 2015, at 12:26, Svensson, Lars <L.Svensson@dnb.de> wrote:
> >>
> >> Dear all,
> >>
> >> While reviewing the WG documents (excellent work, large kudoi to the WG!)
> >
> > Thank you!
> >
> >> and thinking of how we could produce compatible data at our place, I
> >> stumbled over the url annotation on tables as defined in the metadata
> >> vocabulary, §5.4 [1]. The specification says that in the table metadata the url
> >> (URI?) is mandatory and should point to the table the table description
> >> describes, referring to the definition of url in the tabular data model [2] that
> >> says that the value of the url might be null.
> 
> Yes, the property must be present, but can be null, in which case it is treated as
> an empty URL, which is resolved relative to the metadata location. Note that the
> “url” property is a Link property, which means that if the value supplied is not a
> string, it is treated as an empty string, and so resolved against the metadata
> base.
> 
> >> At first sight my reading would be that for each table I describe with a table
> >> annotation in the metadata document, I MUST have a url property pointing
> >> from the metadata document to the described table. If so, that would be a
> >> major implementation obstacle at our place.
> 
> Yes, the “url” property is required, and its value must reference the CSV being
> used to be considered compatible. If it doesn’t, it is incompatible, which
> basically means that using it will issue a warning, unless you’re validating, in
> which case it’s an error. The reason for this is to take steps to be sure the
> metadata is compatible with the tables being interpreted. IIRC, Jeni was the
> chief proponent of this, and may have something more to say.
> 
> >> Our main use case for producing tabular data is that customers can go to
> >> the catalogue, select a number of object descriptions and export those as CSV.
> >> When the customer downloads the data, we would provide a Link-Header
> >> pointing to the metadata document describing the CSV format. It would,
> >> however, be almost impossible to point back from that metadata document to
> >> _all_ instances of CSV files ever created (particularly since that would also have
> >> privacy concerns, since it would be possible to see what other customers have
> >> downloaded).
> 
> If your intention is to have one metadata file work, with arbitrary columns
> selected, you’ll run into other problems, as the expectation is that there is a 1:1
> relationship between the columns in the CSV and those in the metadata.

Well, it's one object description per row, but the _columns_ are always the same.

> However, you might create a metadata document, referenced from the Link
> header, that is compatible with the CSV that is downloaded. It won’t validate as being
> compatible, but it should be useable for generating RDF or JSON from the
> result, as long as the column descriptions match those in the CSV file.

Hmm. Why shouldn't it validate? If I read §6 Processing Tables [1] correctly, I can start by downloading a data file and then rely on the application finding the proper metadata. And as long as the metadata matches the table, it should validate. Or have I misunderstood something?

> >> This boils down to the following question(s):
> >>
> >> 1) Is my understanding of the use of the url property in the table metadata
> >> correct?
> >> 2) If so, can I solve it by simply setting it to null?
> >
> > That is my reading and, I think, that was our intention. If set to null, that
> > means that the implementation makes the 'pairing' between the metadata and
> > the data itself which, as far as I can see, is exactly what you do.
> 
> As I said, I don’t think so. If it’s set to null, it is interpreted as an empty string,
> which is a relative URL. However, this should just issue a warning. Note,
> however, that if the CSV and the metadata are both available at the same URL
> subject to content-negotiation, this would be valid.

I'm afraid you're losing me here: in the specification of the tabular data model, particularly §5 (locating metadata) [2], content negotiation is not listed as a method to find metadata for a CSV file.
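
To check that I follow the argument, here is a minimal sketch of the resolution rule as you describe it above: a non-string url (such as null) is treated as an empty string and resolved against the metadata location, and the result is then compared with the location of the CSV. The function names and the example.org URLs are my own assumptions for illustration, not taken from the specifications.

```python
from urllib.parse import urljoin


def resolve_table_url(metadata_url, url_value):
    """Resolve the table's "url" property against the metadata location.

    Assumption (from the description in this thread): a value that is
    not a string, e.g. null/None, is treated as an empty string.
    """
    if not isinstance(url_value, str):
        url_value = ""
    # An empty relative reference resolves to the base URL itself.
    return urljoin(metadata_url, url_value)


def is_compatible(metadata_url, url_value, csv_url):
    """Hypothetical check: does the resolved url equal the CSV location?"""
    return resolve_table_url(metadata_url, url_value) == csv_url


# A null url only matches when CSV and metadata share one URL
# (e.g. via content negotiation).
same = is_compatible("http://example.org/export", None,
                     "http://example.org/export")
different = is_compatible("http://example.org/meta.json", None,
                          "http://example.org/export/1.csv")
```

If that reading is right, a null url can only ever be compatible when the CSV and the metadata are served from one and the same URL, which is why the content-negotiation case matters here.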

> But, if you’re downloading
> the CSV and it has no location, there would be no way for the metadata to
> locate it anyway.

One option for the user could be to copy the download URL and paste it into an application that downloads the file, locates the metadata through the methods listed in §5 and processes it.
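
To make that concrete, here is a rough sketch of how such an application might collect candidate metadata locations for a downloaded CSV: first any Link header with rel="describedby", then the default locations. The two default path templates are given as I recall them from §5 [2], and the Link header parsing is deliberately naive (it ignores commas inside quoted parameters and multi-valued rel), so please read this as an illustration under those assumptions rather than a conforming implementation. Site-wide configuration is left out.

```python
import re
from urllib.parse import urljoin

# Default locations as I recall them from the locating-metadata section;
# treat these as an assumption.
DEFAULT_TEMPLATES = ("{+url}-metadata.json", "csv-metadata.json")


def candidate_metadata_urls(csv_url, link_header=None):
    """Return candidate metadata URLs for csv_url, in lookup order."""
    candidates = []
    if link_header:
        # Naive split into <target> plus its parameters.
        for target, params in re.findall(r"<([^>]*)>([^,]*)", link_header):
            if re.search(r'rel\s*=\s*"?describedby"?', params, re.I):
                candidates.append(urljoin(csv_url, target))
    for template in DEFAULT_TEMPLATES:
        expanded = template.replace("{+url}", csv_url)
        candidates.append(urljoin(csv_url, expanded))
    return candidates


urls = candidate_metadata_urls(
    "http://example.org/export/data.csv",
    '<meta.json>; rel="describedby"; type="application/csvm+json"',
)
```

In our case only the Link header entry would actually exist; the application would try the candidates in order and use the first metadata document that describes the downloaded file.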
 
> One thing which would be good within the spec, if not within an existing
> implementation, is to set the Location or Content-Location header to be the
> same as the metadata. A client which is aware of this would see that the
> location of the CSV was the same as the metadata referenced using the Link
> header and consider it compatible.

From my understanding this would break the HTTP contract for Location and Content-Location. In RFC 7231, §3.1.4.2 Content-Location [3] specifies:
[[
The "Content-Location" header field references a URI that can be used
as an identifier for a specific resource corresponding to the
representation in this message's payload.
]]
which to me means that it references the location of the resource I just downloaded, not of its metadata.

For the Location header, RFC 7231, §7.1.2 [4] only mentions its use in the context of 201 (Created) and 3XX (Redirection) response codes.

I'm not saying it won't work, but it would at least help me if you could elaborate a bit on how this would work and also point me to the appropriate text of the tabular data specifications.

> In any case, it’s an issue of compatibility for the purpose of generating
> warnings or doing validation, which should not affect an actual transformation,
> but it would be nice if there were no warnings.
> 
> >> And one further question regarding dialect:
> >>
> >> The dialect is an optional property in the table description. From my
> >> understanding, however, the dialect has major impact on the processing of the
> >> table. In the tabular format definition, core annotations are those that have
> >> impact on processor behaviour [3]. Does that mean, that dialect should be a
> >> core annotation or is that solved by defining default values for the dialect?
> >
> > First of all, the dialect is optional. Furthermore, the dialect only provides
> > hints; the parsing algorithm in the model document[1] is non normative. In
> > other words, if your processor produces an annotated data model, that is fine;
> > how the processor gets there, so to say, is not something these
> > recommendations control…
> 
> Yes, I don’t believe dialect is considered a core annotation, as that describes
> the annotated table, rather than the mechanisms used to generate the
> annotated table, which the dialect is used for. As Ivan says, it is really just a
> processing hint.

OK, that I understand.

[1] http://www.w3.org/TR/tabular-data-model/#processing-tables

[2] http://www.w3.org/TR/tabular-data-model/#locating-metadata

[3] http://tools.ietf.org/html/rfc7231#section-3.1.4.2

[4] http://tools.ietf.org/html/rfc7231#section-7.1.2


Thanks,

Lars
Received on Thursday, 10 December 2015 10:01:54 UTC