- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Wed, 11 Mar 2015 14:11:10 -0700
- To: Ed Summers <ehs@pobox.com>
- Cc: Jeni Tennison <jeni@jenitennison.com>, Ivan Herman <ivan@w3.org>, Dan Brickley <danbri@danbri.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>, "Ingram, William A" <wingram2@illinois.edu>
Hi Ed,

> On Mar 11, 2015, at 1:12 PM, Ed Summers <ehs@pobox.com> wrote:
>
> Thanks for your email Jeni. I appreciate you all talking about this at the f2f.
>
>> On Mar 10, 2015, at 4:23 PM, Jeni Tennison <jeni@jenitennison.com> wrote:
>>
>> We’ve tried to simplify the merge algorithm quite a bit since the F2F. I think the only other option wrt merging would require us to limit the types of tabular data that could be processed (ie assume that the only embedded metadata is column titles) and leave unspecified the ability to override processing by the browser, both of which would be possible to do but are (from my perspective) a pretty high price.
>
> Is the merging metadata section of the Metadata Vocabularies for Tabular Data [1] the best place to look for the latest algorithm?

Yes, that algorithm is pretty stable at this point. However, we just resolved issue #314 [3], which reduces the search for metadata to merge. It is now basically the embedded metadata along with the first metadata file found, or the one processing started with. For the embedded metadata we've described, this amounts to merging columns and validating against any titles defined for those columns within metadata. Other standards may specify more complicated embedded metadata which could still invoke the recursive nature of the merge algorithm.

> Does CSVW define any types of embedded metadata for a CSV other than headings/column titles? It seems reasonable to require this level of merging, but the merging metadata files seems to add a lot of complexity to me, which could impair adoption.

It's been important for us to separate the way in which metadata is found from the process in which it is merged to allow for other standards such as XHL, which embeds additional semantics inside of CSV headers (sorry, no reference, Jeni mentioned it and may have one).

> If someone wants to redefine the semantics of a CSV file is it too much to ask them to publish their own CSVW metadata file that points at the original CSV? I suspect I’m not fully understanding this use case. Was there a specific use case [2] that will help me understand better?

Agreed that this should be spelled out in the use cases. At this point, for the embedded metadata we've described, merging is important to validate the metadata against the actual data so that a transformation does not emit garbage when the columns don't match up. The UCR does describe validation and other relevant requirements:

R-CsvValidation
R-CanonicalMappingInLieuOfAnnotation
R-CommentLines

Gregg

> //Ed
>
> [1] https://w3c.github.io/csvw/metadata/
> [2] http://www.w3.org/TR/2014/WD-csvw-ucr-20140327/
>
>>
>> Cheers,
>>
>> Jeni
>> --
>> Jeni Tennison
>> http://www.jenitennison.com/
>>
>> On 10 March 2015 at 18:27:21, Ed Summers (ehs@pobox.com) wrote:
>>> Just out of curiosity, has there ever been any hushed talk about removing metadata merging, or not making it a MUST?
>>>
>>>> On Mar 10, 2015, at 11:59 AM, Jeni Tennison wrote:
>>>>
>>>> Hi Bill,
>>>>
>>>> That sounds great!
>>>>
>>>> Our goal is to get new drafts out by the end of the month, and we’d be hopeful that there wouldn’t be many changes between then and Recommendation. The “deadline” for implementations will be July I think, so there’s time to get everything working.
>>>>
>>>> The implementation needs to be conformant to the specs, which means they need to do everything that’s listed as a MUST, and that includes merging metadata…
>>>>
>>>> Have you got any tests that you can contribute into the test suite?
>>>>
>>>> Cheers,
>>>>
>>>> Jeni
>>>> --
>>>> Jeni Tennison
>>>> http://www.jenitennison.com/

Received on Wednesday, 11 March 2015 21:11:39 UTC