Re: Questions on the url property in Table annotation and on dialect being a core property from Gregg Kellogg on 2015-12-10 (public-csv-wg@w3.org from December 2015)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 10 Dec 2015 10:16:42 -0800
To: "Svensson, Lars" <L.Svensson@dnb.de>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>, Jeni Tennison <jeni@jenitennison.com>, Ivan Herman <ivan@w3.org>
Message-Id: <236845F5-B061-4149-B861-47D2EA76FBB0@greggkellogg.net>
> On Dec 10, 2015, at 2:01 AM, Svensson, Lars <L.Svensson@dnb.de> wrote:
> 
> Gregg,
> 
> On Wednesday, December 09, 2015 6:52 PM, Gregg Kellogg wrote:
> 
>>> On Dec 9, 2015, at 12:52 AM, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>> (Cc-ing to Gregg & Jeni, as an additional ping to get their attention…)
>>> 
>>> Hey Lars,
>>> 
>>> 
>>>> On 4 Dec 2015, at 12:26, Svensson, Lars <L.Svensson@dnb.de> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> While reviewing the WG documents (excellent work, large kudoi to the WG!)
>>> 
>>> Thank you!
>>> 
>>>> and thinking of how we could produce compatible data at our place, I
>> stumbled over the url annotation on tables as defined in the metadata
>> vocabulary, §5.4 [1]. The specification says that in the table metadata the url
>> (URI?) is mandatory and should point to the table the table description
>> describes, referring to the definition of url in the tabular data model [2] that
>> says that the value of the url might be null.
>> 
>> Yes, the property must be present, but can be null, in which case it is treated as
>> an empty URL, which is resolved relative to the metadata location. Note that the
>> “url” property is a Link property, which means that if the value supplied is not a
>> string, it is treated as an empty string, and so resolved against the metadata
>> base.
>> 
>>>> At first sight my reading would be that for each table I describe with a table
>> annotation in the metadata document, I MUST have a url property pointing
>> from the metadata document to the described table. If so, that would be a
>> major implementation obstacle at our place.
>> 
>> Yes, the “url” property is required, and its value must reference the CSV being
>> used to be considered compatible. If it doesn’t, it is incompatible, which
>> basically means that using it will issue a warning, unless you’re validating, in
>> which case it’s an error. The reason for this is to take steps to be sure the
>> metadata is compatible with the tables being interpreted. IIRC, Jeni was the
>> chief proponent of this, and may have something more to say.
>> 
>>>> Our main use case for producing tabular data is that customers can go to
>> the catalogue, select a number of object descriptions and export those as CSV.
>> When the customer downloads the data, we would provide a Link-Header
>> pointing to the metadata document describing the CSV format. It would,
>> however, be almost impossible to point back from that metadata document to
>> _all_ instances of CSV files ever created (particularly since that would also have
>> privacy concerns, since it would be possible to see what other customers have
>> downloaded).
>> 
>> If your intention is to have one metadata file work, with arbitrary columns
>> selected, you’ll run into other problems, as the expectation is that there is a 1:1
>> relationship between the columns in the CSV and those in the metadata.
> 
> Well, it's one object description per row, but the _columns_ are always the same.
> 
>> However, you might create a metadata document, referenced from the link
>> header, that is compatible with the CSV that is downloaded. It won’t validate as being
>> compatible, but it should be useable for generating RDF or JSON from the
>> result, as long as the column descriptions match those in the CSV file.
> 
> Hmm. Why shouldn't it validate? If I read §6 Processing Tables [1] correctly, I can start by downloading a data file and then rely on the application finding the proper metadata. And as long as the metadata matches the table, it should validate. Or have I misunderstood something?

The data model section 6.2 says that processors MUST ensure that the metadata and the tabular data file are compatible, as defined in 5.4.3 of the Metadata document. The first statement from there is that they have equivalent normalized url properties. So, if the url properties are not the same, they are not considered compatible, and so are not valid. However, this does not necessarily mean that processing stops.
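
As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes that a non-string "url" value is treated as an empty string and resolved against the metadata location, as discussed earlier in this thread. The function names and the example.org URLs are hypothetical, and the normalization shown is deliberately simplified; it is not the normalization procedure the spec references.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve_table_url(url_value, metadata_location):
    """Resolve a table description's "url" against the metadata location.

    Assumption taken from the thread: a value that is not a string
    (for example null/None) is treated as an empty string.
    """
    if not isinstance(url_value, str):
        url_value = ""
    # An empty relative reference resolves to the base URL itself.
    return urljoin(metadata_location, url_value)


def normalize(url):
    """Simplified normalization (an assumption, not the spec's algorithm):
    lower-case the authority, use "/" for an empty path, drop the fragment.
    urlsplit already lower-cases the scheme."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def urls_compatible(url_value, metadata_location, csv_url):
    """True when the resolved "url" property and the CSV location match."""
    resolved = resolve_table_url(url_value, metadata_location)
    return normalize(resolved) == normalize(csv_url)


if __name__ == "__main__":
    metadata = "http://example.org/export/metadata.json"
    # A null url resolves to the metadata location, so a CSV at a
    # different URL is reported as not compatible.
    print(urls_compatible(None, metadata, "http://example.org/export/data.csv"))
    # CSV and metadata sharing one URL would compare as equal.
    print(urls_compatible(None, "http://example.org/catalogue/export",
                          "http://example.org/catalogue/export"))
```

A mismatch here would correspond to a warning in normal processing, or an error when validating.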

>>>> This boils down to the following question(s):
>>>> 
>>>> 1) Is my understanding of the use of the url property in the table metadata
>> correct?
>>>> 2) If so, can I solve it by simply setting it to null?
>>> 
>>> That is my reading and, I think, that was our intention. If set to null, that
>> means that the implementation makes the 'pairing' between the metadata and
>> the data itself which, as far as I can see, is exactly what you do.
>> 
>> As I said, I don’t think so. If it’s set to null, it is interpreted as an empty string,
>> which is a relative URL. However, this should just issue a warning. Note,
>> however, that if the CSV and the metadata are both available at the same URL
>> subject to content-negotiation, this would be valid.
> 
> I'm afraid you're losing me here: In the specification of the tabular data model, particularly §5 (locating metadata) [2], content negotiation is not listed as a method to find metadata for a csv file.

You’re right that we don’t say anything explicit about this. However, the Link is typed application/csvm+json, which would be reasonable for a client to use when retrieving the resource referenced by the link, but I can find no spec which recommends this.

>> But, if you’re downloading
>> the CSV and it has no location, there would be no way for the metadata to
>> locate it anyway.
> 
> One option for the user could be to copy the download URL and paste it into an application that downloads the file, locates the metadata through the methods listed in §5 and processes it.

Yes, if you use the URL of the original downloaded file as the URL when comparing with the metadata. You might also imagine an unspecified provision that if the CSV file has no URL, then comparing it with the “url” property of the metadata makes no sense, but the spec does not say anything about this.

>> One thing which would be good within the spec, if not within an existing
>> implementation, is to set the Location or Content-Location header to be the
>> same as the metadata. A client which is aware of this would see that the
>> location of the CSV was the same as the metadata referenced using the Link
>> header and consider it compatible.
> 
> From my understanding this would break the http contract for Location and Content-Location. In RFC 7231, §3.1.4.2 Content-Location [3] specifies:
> [[
> The "Content-Location" header field references a URI that can be used
> as an identifier for a specific resource corresponding to the
> representation in this message's payload.
> ]]
> which to me means that it references the location of the resource I just downloaded, not of its metadata.

The proper header would be Content-Location to describe the URL of the resource downloaded; the Link header references the metadata using its own URL. Based on the questionable reasoning above, they might share the same URL, but if one were retrieved using Accept: text/csv, and the other using Accept: application/csvm+json (or application/ld+json), they could result in different representations. I’m certainly reaching here, but I don’t think it’s inconsistent with the spec.

> For the Location header, RFC 7231, §7.1.2 [4] only mentions its use in the context of 201 (Created) and 3XX (Redirection) response codes.
> 
> I'm not saying it won't work, but it would at least help me if you could elaborate a bit on how this would work and also point me to the appropriate text of the tabular data specifications.

As I said, if you download the URL and get a CSV with a link header with type application/csvm+json at the same URL, you might reasonably use that content-type when downloading the metadata and so get a different representation. This is not specified, but is also not inconsistent with the spec. If we had considered this option, we might have added a suggestion that the metadata be retrieved using application/csvm+json and application/ld+json.
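
To make the idea concrete, a client might build two requests against the same URL that differ only in their Accept header. This is a sketch of unspecified behaviour, not something the spec requires; the function name and the example.org URL are hypothetical, and no request is actually sent.

```python
from urllib.request import Request


def build_negotiated_requests(url):
    """Build two requests for one URL that differ only in Accept.

    Returns (csv_request, metadata_request). Whether a server answers
    them with different representations is entirely up to the server.
    """
    csv_request = Request(url, headers={"Accept": "text/csv"})
    metadata_request = Request(
        url,
        headers={"Accept": "application/csvm+json, application/ld+json"},
    )
    return csv_request, metadata_request


if __name__ == "__main__":
    csv_req, meta_req = build_negotiated_requests(
        "http://example.org/catalogue/export"
    )
    print(csv_req.full_url, csv_req.get_header("Accept"))
    print(meta_req.full_url, meta_req.get_header("Accept"))
```

If the server honours both media types at that URL, the CSV and its metadata would share a location, which is the case where the url comparison succeeds.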

I might also point you to a draft Note on the use of HTML for containing both metadata and tabular data: http://w3c.github.io/csvw/html-note/. If data were encoded in HTML tables, then this would provide you a mechanism for including both metadata and tabular data in the same resource and be another way of avoiding your issue. When published as a WG Note, it will not have the force of a Recommendation, but it is expected to be quite popular. I’d certainly like to know if this might satisfy some of your issues.

Gregg

>> In any case, it’s an issue of compatibility for the purpose of generating
>> warnings or doing validation, which should not affect an actual transformation,
>> but it would be nice if there were no warnings.
>> 
>>>> And one further question regarding dialect:
>>>> 
>>>> The dialect is an optional property in the table description. From my
>> understanding, however, the dialect has major impact on the processing of the
>> table. In the tabular format definition, core annotations are those that have
>> impact on processor behaviour [3]. Does that mean, that dialect should be a
>> core annotation or is that solved by defining default values for the dialect?
>>> 
>>> First of all, the dialect is optional. Furthermore, the dialect only provides
>> hints; the parsing algorithm in the model document[1] is non normative. In
>> other words, if your processor produces an annotated data model, that is fine;
>> how the processor gets there, so to say, is not something these
>> recommendations control…
>> 
>> Yes, I don’t believe dialect is considered a core annotation, as that describes
>> the annotated table, rather than the mechanisms used to generate the
>> annotated table, which the dialect is used for. As Ivan says, it is really just a
>> processing hint.
> 
> OK, that I understand.
> 
> [1] http://www.w3.org/TR/tabular-data-model/#processing-tables
> [2] http://www.w3.org/TR/tabular-data-model/#locating-metadata
> [3] http://tools.ietf.org/html/rfc7231#section-3.1.4.2
> [4] http://tools.ietf.org/html/rfc7231#section-7.1.2
> 
> Thanks,
> 
> Lars
Received on Thursday, 10 December 2015 18:17:18 UTC