Re: Column merging is not clear... from Gregg Kellogg on 2015-01-30 (public-csv-wg@w3.org from January 2015)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Fri, 30 Jan 2015 11:35:12 -0800
To: Ivan Herman <ivan@w3.org>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <55E484D0-74AF-4795-BDE7-D21CF8D7F574@greggkellogg.net>
> On Jan 30, 2015, at 12:42 AM, Ivan Herman <ivan@w3.org> wrote:
> 
> Gregg,
> 
> I do not want to add this as an issue, because it may just be my bad understanding. Here is what the (new) document says on merging columns:
> 
> [[[
> When an array of column descriptions B is merged into an original array of column descriptions A, each column description within B is combined into the original array A by:
> 
>  • if there is a column description at the same index within A and that column description has the same name, the column description from B is merged into the matching column description in A
>  • otherwise, if there is a column description at the same index within A and that column description has a title, is also in A, and the column default language is the same in both A and B, the column description from B is imported into the matching column description in A
>  • otherwise, if there is no column description at the same index within A, then the column description is taken from that index of B
>  • otherwise, the column description is ignored. A validator must issue a warning if such a column description is encountered.
> ]]]
> 
> I do not really understand the second entry, and I wonder whether there is a misspelling. What does 'is also in A' mean? Or should that be 'is also in B', meaning that the same title should appear on both sides? What happens if the title is an array (which can happen)? Does it mean that there should be at least one agreement in a title? Also, if A says:

It might be better worded as follows:

[[[
* otherwise, if there is a column description at the same index within A and that column description has a title in _property value_ which is also in B, considering the language of each title, the column description from B is imported into the matching column description in A.
]]]

Basically, A and B match if they share a title, considering the language of each title in A and B.
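
A minimal sketch of that matching rule, in Python. The function names, and the representation of a title as a set of (text, language) pairs, are assumptions made for illustration and are not taken from the spec. Language tags are compared as plain strings here, which is a simplification.

```python
def normalize_titles(title, default_lang=None):
    """Return a set of (text, language) pairs for a column's title.

    `title` may be a string, a list of strings, or a dict mapping
    language tags to a string or a list of strings. A language of
    None stands for "undefined". (Hypothetical helper.)
    """
    if isinstance(title, dict):
        items = title.items()
    else:
        items = [(default_lang, title)]
    pairs = set()
    for lang, values in items:
        if isinstance(values, str):
            values = [values]
        for text in values:
            pairs.add((text, lang))
    return pairs


def titles_match_strict(a_titles, b_titles):
    """True if A and B share at least one title with the same language."""
    return not a_titles.isdisjoint(b_titles)


a = normalize_titles(["Name", "Title"], default_lang="en")
b = normalize_titles({"en": "Title", "de": "Titel"})
print(titles_match_strict(a, b))  # True: both contain "Title" in "en"
```

Treating an array of titles as a set means one shared (text, language) pair is enough for the two column descriptions to match.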

> {
>  "@context" : { "@language" : "en" },
>  "tableSchema" : {
>     "columns" : [
>        {
>           "title" : "my Title"
>        }
>     ]
>  }
> }
> 
> and B says
> 
> {
>  "tableSchema" : {
>     "columns" : [
>        {
>           "title" : "my Title",
>           "name"  : "my-title"
>        }
>     ]
>  }
> }
> 
> according to these rules you cannot merge the two, because one of the two has a language tag, the other does not. Is it what we want?

No, they don’t match as currently defined, although we might make an exception if the language is undefined.

This page has some interesting word comparisons: http://edl.ecml.at/LanguageFun/Sameworddifferentmeaning/tabid/3103/language/en-GB/Default.aspx.

For example, “bad” means different things in English and in several other Germanic languages, so “bad”@de != “bad”@en. But would we say that “bad”@en == “bad”^^xsd:string (SPARQL says no)? If we did, then we could simplify the creation of embedded metadata by not needing to use @language and `lang` in the extracted metadata. In this case, the wording might be the following:

[[[
* otherwise, if there is a column description at the same index within A and that column description has a title in _property value_ which is also in B, considering the language of each title where an undefined language value matches a value in any other language, the column description from B is imported into the matching column description in A.
]]]
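
A rough Python sketch of this relaxed comparison, applied to the A and B examples quoted above. The function name and the (text, language) pair representation are hypothetical, and None stands for an undefined language. Language tags are compared as plain strings, which is a simplification.

```python
def titles_match_lenient(a_titles, b_titles):
    """True if A and B share a title text whose languages are compatible.

    Each argument is an iterable of (text, lang) pairs. An undefined
    language (None) on either side matches any language on the other.
    """
    for a_text, a_lang in a_titles:
        for b_text, b_lang in b_titles:
            if a_text != b_text:
                continue
            if a_lang is None or b_lang is None or a_lang == b_lang:
                return True
    return False


# A: "my Title" with default language "en"; B: "my Title", no language.
a = [("my Title", "en")]
b = [("my Title", None)]
print(titles_match_lenient(a, b))  # True under the relaxed rule

# Titles in two different defined languages still do not match.
print(titles_match_lenient([("bad", "de")], [("bad", "en")]))  # False
```

Under the current definition the first pair would not match, because ("my Title", "en") and ("my Title", undefined) are treated as different titles.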

As always, suggestions on improving the description to make it less cryptic or more accurate are welcome.

Gregg

> I think some clarifications may be necessary...
> 
> Ivan
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
> 
> 
> 
>
Received on Friday, 30 January 2015 19:35:41 UTC