Re: Column merging is not clear... from Ivan Herman on 2015-01-31 (public-csv-wg@w3.org from January 2015)

From: Ivan Herman <ivan@w3.org>
Date: Sat, 31 Jan 2015 11:51:40 +0100
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <3F576783-6C87-4B4A-AA8D-1EE541E25177@w3.org>
Just a first reaction (maybe we do have to turn this into an issue to get it properly tracked...): I do prefer the second version, which makes an exception for the language.

However... I have spotted another issue. The _property value_ for a natural language property says:

[[[
 • if the metadata value is a string, that string
 • if the metadata value is an array, the strings in that array
 • if the metadata value is an object, the string or strings that are the value of the property of that object whose name is value of the lang inherited property on that description, or und if no lang is defined.
]]]

(We know that 'lang' has to be replaced by the value of '@language', but that is another matter.) So let us consider:

metadata A

{
  "@context" : { "@language": "en"}
  ...
  "tableSchema" :
     "columns" : [
        {
           "title" : { "en" : "My title" }
...
}

metadata B:

{
  "@context" : { "@language": "fr"}
  ...
  "tableSchema" :
     "columns" : [
        {
           "title" : { "en" : "My title", "fr" : "Mon titre" }
...
}


I think we can agree that we want these two to match. However, does the property value for 'title' in B include "My title"? It is not really clear from the definition of the property value. I presume the intention is that the property value for B is

["My title", "Mon titre"]

maybe it should be

[{ "en" : "My title" }, { "fr" : "Mon titre" }]

ie, some sort of a canonical value...
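
To make the ambiguity concrete, here is a minimal Python sketch of one literal reading of the current definition. The function name and the `lang` parameter are illustrative assumptions, not taken from the spec; the inherited language is assumed to be passed in by the caller (taken from '@language' in the two examples above).

```python
def property_value_literal(value, lang=None):
    """One literal reading of the current definition; returns a list of strings."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return list(value)
    if isinstance(value, dict):
        # Only the entry keyed by the inherited language (or "und") is selected.
        selected = value.get(lang or "und", [])
        return [selected] if isinstance(selected, str) else list(selected)
    raise TypeError("unsupported metadata value")


# 'title' values from metadata A and metadata B above
title_a = {"en": "My title"}
title_b = {"en": "My title", "fr": "Mon titre"}

value_a = property_value_literal(title_a, lang="en")
value_b = property_value_literal(title_b, lang="fr")
```

Read this way, the value for A is ["My title"] and the value for B is ["Mon titre"], so the two would have no title in common.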

Maybe the definition of property value should be something like:

[[[
The _property value_ for a natural language property is an array of objects, each object having a single key/value pair of the form

   { [language code] : [String] }

The objects are created from the original definition as follows:

  * if the metadata value is a string, the language code in the resulting object is either the value of "@language", if it exists, or "und" otherwise
  * if the metadata value is an array, the language code for each resulting object is either the value of "@language", if it exists, or "und" otherwise
  * if the metadata value is an object, the array consists of its constituent key/value pairs, after flattening any value that is itself an array into one object per string, each with the common key
]]]
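
As a rough illustration of the proposed definition, the three cases could be sketched in Python as follows. The function name and the `default_lang` parameter are hypothetical; `default_lang` stands for the value of "@language" if there is one.

```python
def canonical_property_value(value, default_lang=None):
    """Normalise a natural language value to a list of {language code: string} objects."""
    lang = default_lang or "und"
    if isinstance(value, str):
        return [{lang: value}]
    if isinstance(value, list):
        return [{lang: s} for s in value]
    if isinstance(value, dict):
        result = []
        for code, strings in value.items():
            # Flatten array values: one object per string, keeping the common key.
            if isinstance(strings, str):
                strings = [strings]
            result.extend({code: s} for s in strings)
        return result
    raise TypeError("unsupported metadata value")


canonical_b = canonical_property_value(
    {"en": "My title", "fr": "Mon titre"}, default_lang="fr"
)
```

For the 'title' of metadata B this gives [{"en": "My title"}, {"fr": "Mon titre"}], regardless of the default language.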


(this obviously needs refinement). With this definition, the second alternative below seems to work well.
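
A small Python sketch of how matching could look on top of the canonical form proposed above, under the second alternative (an undefined language matches any language). The function name is hypothetical, and exact, case-sensitive comparison of both strings and language codes is an assumption of the sketch, not something the draft states.

```python
def titles_match(canonical_a, canonical_b):
    """True if A and B share a title; "und" is treated as matching any language."""
    # Each object holds exactly one (language code, string) pair.
    pairs_a = [next(iter(obj.items())) for obj in canonical_a]
    pairs_b = [next(iter(obj.items())) for obj in canonical_b]
    for lang_a, text_a in pairs_a:
        for lang_b, text_b in pairs_b:
            same_lang = lang_a == lang_b or "und" in (lang_a, lang_b)
            if same_lang and text_a == text_b:
                return True
    return False


# Metadata A and B from this mail, in canonical form
shared_title = titles_match(
    [{"en": "My title"}],
    [{"en": "My title"}, {"fr": "Mon titre"}],
)
# The example from the quoted thread: B has no language defined
undefined_lang = titles_match([{"en": "my Title"}], [{"und": "my Title"}])
# Same string, different defined languages
different_lang = titles_match([{"de": "Bad"}], [{"en": "Bad"}])
```

In this sketch the first two cases match and the third does not.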

WDYT?

Ivan


> On 30 Jan 2015, at 20:35 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
> 
>> On Jan 30, 2015, at 12:42 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>> Gregg,
>> 
>> I do not want to add this as an issue, because it may just be my bad understanding. Here is what the (new) document says on merging columns:
>> 
>> [[[
>> When an array of column descriptions B is merged into an original array of column descriptions A, each column description within B is combined into the original array A by:
>> 
>>  • if there is a column description at the same index within A and that column description has the same name, the column description from B is merged into the matching column description in A
>>  • otherwise, if there is a column description at the same index within A and that column description has a title, is also in A, and the column default language is the same in both A and B, the column description from B is imported into the matching column description in A
>>  • otherwise, if there is no column description at the same index within A, then the column description is taken from that index of B
>>  • otherwise, the column description is ignored. A validator must issue a warning if such a column description is encountered.
>> ]]]
>> 
>> I do not really understand the second entry, and I wonder whether there is a misspelling. What does 'is also in A' mean? Or should that be 'is also in B', meaning that the same title should appear on both sides? What happens if the title is an array (which can happen)? Does it mean that at least one title has to agree? Also, if A says:
> 
> Might be better worded as the following:
> 
> [[[
> * otherwise, if there is a column description at the same index within A and that column description has a title in _property value_ which is also in B, considering the language of each title, the column description from B is imported into the matching column description in A.
> ]]]
> 
> Basically, A and B match if they share a title, considering the language of each title in A and B.
> 
>> {
>> "@context" : { "@language" : "en" },
>> "tableSchema" :
>>    "columns" : [
>>       {
>>          "title" : "my Title"
>>       }
>> 
>> and B says
>> 
>> {
>> "tableSchema" :
>>    "columns" : [
>>       {
>>          "title" : "my Title",
>>          "name"  : "my-title"
>>       }
>> 
>> according to these rules you cannot merge the two, because one of the two has a language tag and the other does not. Is that what we want?
> 
> No, they don’t match as currently defined, although we might make an exception if the language is undefined.
> 
> This page has some interesting word comparisons: http://edl.ecml.at/LanguageFun/Sameworddifferentmeaning/tabid/3103/language/en-GB/Default.aspx.
> 
> For example, “Bad” means different things in many Germanic languages and English, so “bad”@de != “bad”@en. But would we say that “bad”@en == “bad”^^xsd:string (SPARQL says no)? If we did, then we could simplify the creation of embedded metadata by not needing to use @language and `lang` in the extracted metadata. In this case, the wording might be the following:
> 
> [[[
> * otherwise, if there is a column description at the same index within A and that column description has a title in _property value_ which is also in B, considering the language of each title where an undefined language value matches a value in any other language, the column description from B is imported into the matching column description in A.
> ]]]
> 
> As always, suggestions on improving the description to make it less cryptic or more accurate are welcome.
> 
> Gregg
> 
>> I think some clarifications may be necessary...
>> 
>> Ivan
>> 
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704


Received on Saturday, 31 January 2015 10:51:49 UTC