Re: Simplifying assumptions and suggested issue resolutions from Ivan Herman on 2015-01-23 (public-csv-wg@w3.org from January 2015)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 23 Jan 2015 10:37:27 +0100
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <2081BBFD-79ED-486E-A906-DB6F07C096A3@w3.org>
> On 22 Jan 2015, at 21:42 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
> 
>> On Jan 22, 2015, at 3:29 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>> Wow:-)
> 
> It would be useful for other members to +1/-1 or abstain on these issues so they can be resolved or prioritized. Perhaps the chairs can provide some guidance?
> 

[snip]

>> 
>>> 
>>> #93 "metadata and mapped data conflation" – Resolve to continue to use the #table fragment identifier on the table metadata @id to make these distinct. (Also, if it exists, the #tablegroup fragment on the TableGroup @id.)
>> 
>> Not sure about that. I believe there is still some confusion around the exact usage of @id. Indeed, what happens if @base is not explicitly defined?
> 
> If @base is not defined, it defaults to the location of the metadata document. That is based on JSON-LD semantics, but is consistent with the use of @base in Turtle and <base> in HTML+RDFa. There is, however, the question of what the base of the merged metadata is; I would say that it is either the @base of the merged metadata, or that of the CSV file (from which the embedded metadata is extracted). I'll clarify this in a commit to PR #169.
> 

I think that should be a new (or existing) issue. Maybe...


>> What, then, is the base for relative URIs (e.g., when generating RDF)? These are all related...
> 
> Ditto.
> 
>> That being said, this is issue #91 but is left open...

... that is the one!
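A minimal sketch of the defaulting rule described above, assuming relative @id values are resolved with standard RFC 3986 reference resolution (Python's urljoin) and that a relative @base is itself resolved against the metadata document location. The helper names and URLs are hypothetical. It also shows why the base matters for the #table fragment from #93: the same "#table" lands on a different URL depending on whether the base is the metadata document or the CSV file.

```python
from urllib.parse import urljoin

def effective_base(base, metadata_url):
    """Return the base used for relative @id values (hypothetical helper).

    With no @base, fall back to the location of the metadata document
    (JSON-LD-style default); a relative @base is resolved against it.
    """
    if base is None:
        return metadata_url
    return urljoin(metadata_url, base)

def resolve_id(relative_id, metadata_url, base=None):
    """Resolve a (possibly relative) @id such as "#table" to an absolute URL."""
    return urljoin(effective_base(base, metadata_url), relative_id)

# Hypothetical location, for illustration only.
META = "http://example.org/data/tree-ops.csv-metadata.json"

# Base = metadata document:
# http://example.org/data/tree-ops.csv-metadata.json#table
print(resolve_id("#table", META))
# Base = the CSV file:
# http://example.org/data/tree-ops.csv#table
print(resolve_id("#table", META, base="tree-ops.csv"))
```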

[skip]

>> 
>>> 
>>> #128 "language term ambiguous" – Remove alias of xsd:language
>> 
>> I must admit I do not understand the issue. What is the problem with having "datatype": "language" alongside the term "language"?
> 
> The issue is that we define a "language" property, but we also say that all terms from xsd are imported; for example, "anyURI" is a term equivalent to "xsd:anyURI". Similarly, this would imply that "language" is a term that expands to "xsd:language", which would be inconsistent. I was proposing that we make an exception for "xsd:language" and stick with the "csvw:language" term association.
> 

I will answer on the issue itself so that it is properly recorded.
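To make the #128 conflict concrete, here is a sketch of importing the XSD datatype terms while exempting "language", so the bare term keeps its csvw meaning and the datatype stays reachable as the prefixed name "xsd:language". It assumes a JSON-LD-style context modelled as a Python dict; the helper names and the short datatype list are illustrative, not taken from the spec.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
CSVW = "http://www.w3.org/ns/csvw#"

# Illustrative subset only; the full list of imported XSD datatypes is longer.
XSD_TERMS = ["anyURI", "boolean", "date", "decimal", "integer", "language", "string"]

def build_context(exceptions=("language",)):
    """Hypothetical context: XSD terms imported, minus the exceptions."""
    context = {t: XSD + t for t in XSD_TERMS if t not in exceptions}
    context["xsd"] = XSD                    # prefix, so "xsd:language" still works
    context["language"] = CSVW + "language" # bare term keeps the csvw property
    return context

def expand(term, context):
    """Expand a bare term or a prefixed name using the context."""
    if term in context:
        return context[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term

ctx = build_context()
print(expand("anyURI", ctx))        # the XSD datatype
print(expand("language", ctx))      # the csvw property, not the datatype
print(expand("xsd:language", ctx))  # the datatype, via the prefix
```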


>>> #136 "confusion on arrays and atomic values" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
>> 
>> To make it clearer to others, we are talking about
>> 
>> http://htmlpreview.github.io/?https://raw.githubusercontent.com/w3c/csvw/merge-semantics/metadata/index.html#merging-metadata
>> 
>> I think your language does not handle this yet. In the second block of bullet items, the second and third items still talk about arrays and atomic values as separate things, whereas the document defines atomic values as numbers, booleans, strings, or arrays.
> 
> Indeed, we may consider the difference between an atomic property and an atomic value, where an atomic property takes an atomic value or an array of atomic values. However, the only atomic property that accepts an array of values is "null", and I don't see a good reason for that, so we may just restrict "null" to taking a single atomic value, and redefine an atomic value to be a number, boolean, or string.

I leave this to the editors, to be honest.
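Under the restriction proposed for #136, the checks become simple. A sketch, assuming JSON values have been parsed into Python types; the function names are hypothetical.

```python
def is_atomic_value(value):
    """Proposed definition: an atomic value is a number, boolean, or string.

    Note: in Python, bool is a subclass of int, so True/False pass this check.
    """
    return isinstance(value, (int, float, str))

def check_null(value):
    """Under the proposal, "null" takes a single atomic value, never an array."""
    if not is_atomic_value(value):
        raise ValueError(f'"null" must be a single atomic value, got {value!r}')
    return value

check_null("NA")           # accepted
# check_null(["NA", "-"])  # would raise ValueError under the proposal
```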


> 
>>> #142 "value space for common properties" – Resolve to allow just the six forms of literals and URIs described in that issue, and not more involved JSON-LD content.
>>> 
>> 
>> I agree, although I still have to think about, and answer, your comment on the JSON-LD output.
> 
> That's an issue for csv2json, but I think the simplest approach is to copy those properties from the metadata into the JSON document.

I will move back to the issue list to comment on that one. Yes, it is a separate csv2json issue; maybe it is worth closing #142 and opening a new issue for this.
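The "copy the properties" approach for csv2json could look roughly like this. It is a sketch under a loud assumption: that common properties can be recognised as keys containing ":" (prefixed names such as "dc:title", or absolute IRIs), while JSON-LD keywords and csvw terms are skipped. The function name is hypothetical.

```python
import copy

def copy_common_properties(table_metadata, output):
    """Copy common properties verbatim from table metadata into a JSON output object.

    Assumption (hypothetical): common properties are the keys containing ":",
    e.g. "dc:title"; "@id" and csvw terms such as "url" or "schema" are skipped.
    """
    for key, value in table_metadata.items():
        if ":" in key and not key.startswith("@"):
            # deep copy so later edits to the output don't touch the metadata
            output[key] = copy.deepcopy(value)
    return output

meta = {
    "@id": "#table",
    "url": "tree-ops.csv",
    "dc:title": "Tree operations",
    "schema": {"columns": []},
}
print(copy_common_properties(meta, {"rows": []}))
```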

> 
>>> #144 "Metadata merge order" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
>> 
>> I agree in general, with one editorial issue. The procedure you describe may end with a set of column descriptions without a 'name'. Maybe we can say that, at the end of the overall merge, if the 'name' is not set for a column, its value is set to the value of 'title' (or should it be normalized due to the template requirements?). That would make the rest of the specs, notably the transformations, simpler...
> 
> Yes, I've made comments elsewhere, but didn't include it in this PR; I'll do so now. My interpretation is that "name" is an optional property. When accessing the value of "name", if it is not set explicitly, it is taken from the first value of "title" (in the appropriate language), if it exists, and "_col=N" otherwise. "predicateUrl" then defaults to "#" + URI.encode(name). (Interesting aside: ":_col=1" is not a valid PNAME and can't be encoded as such in Turtle; we may think of an alternative representation, if that matters.)

Let's have this in the document somewhere, and we can then look at it separately.

There may be a requirement to set the defaults to be compatible with the R2RML Direct Mapping, for example. We should look at that.
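The "name" defaulting Gregg describes for #144 can be sketched as follows. Assumptions, all hypothetical: "title" is stored as a language map of lists (per the "@container": "@language" term definition under #150), a missing language falls back to "und", column indexes are 1-based, and URI.encode is approximated by Python's quote(..., safe="").

```python
from urllib.parse import quote

def column_name(column, index, lang="und"):
    """Explicit "name", else first "title" in the language, else "_col=N"."""
    if "name" in column:
        return column["name"]
    titles = column.get("title", {})
    # Assumption: fall back to "und" if the requested language is missing.
    candidates = titles.get(lang) or titles.get("und") or []
    if isinstance(candidates, str):  # a language map value may be a single string
        candidates = [candidates]
    if candidates:
        return candidates[0]
    return f"_col={index}"

def predicate_url(column, index, lang="und"):
    """Default predicateUrl: "#" + percent-encoded name (approximating URI.encode)."""
    return "#" + quote(column_name(column, index, lang), safe="")

print(predicate_url({"title": {"en": ["Tree ID"]}}, 2, "en"))  # #Tree%20ID
print(predicate_url({}, 3))                                    # #_col%3D3
```

Under this approximation the "=" in "_col=N" is percent-encoded, which is one place where the representation question from the aside would surface.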


> 
>>> #145 "overal metadata" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
>> 
>> Agree
>> 
>>> 
>>> #150 "Merge and common dc properties" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
>> 
>> I am not sure. My original issue was that the 'title' is not defined on the table level, only 'dc:title' is used. How does your PR solve this?
> 
> "dc:" properties, or any other common properties, would be handled as suggested in #142. "title" is different, as there is a specific term definition: {"title": {"@id": "csvw:title", "@container": "@language"}}, which is how the language variations are handled (with the small exception for "und").
> 


Yes, but my (only) point is that

{
   "@id" : "bla bla",
   "title" : "my title",
   "schema" : {
      ...
   }
}

is not valid, because "title" cannot appear on the table level. That is my issue.


> The merge section defines the handling of common properties, such as "dc:title":
> 
> [[[
> • If the property is a common property, the result is an array containing values from A followed by values from B not already in A. An array result having a single value uses that value as its result.
> ]]]
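The quoted merge rule for common properties reads directly as a small function. A sketch, assuming values are plain JSON values compared by equality and that a single non-array value is treated as a one-element array; merge_common is a hypothetical name.

```python
def merge_common(a, b):
    """Merge two common-property values: A's values, then B's values not in A."""
    a_vals = a if isinstance(a, list) else [a]
    b_vals = b if isinstance(b, list) else [b]
    result = list(a_vals)
    for v in b_vals:
        if v not in result:
            result.append(v)
    # An array result having a single value uses that value as its result.
    return result[0] if len(result) == 1 else result

print(merge_common("Tree operations", "Tree operations"))
print(merge_common("Tree operations", ["Arbres", "Tree operations"]))
```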
> 
> But, there is a corner case: if @language is defined differently in different merged metadata documents, this could result in a string value of "dc:title", for example, being interpreted differently; this is actually true for all string properties, not just common properties. I would say that Merge should be updated to say that common property values which include strings are expanded to the explicit version. For example, "dc:title": "Tree operations" in a metadata document with @language: en, would be changed to "dc:title": {"@value": "Tree operations", "@language": "en"} before merging. This could potentially be undone in the final step, but it's probably simplest to leave it in merged form, and deal with the JSON representation in that document. (IMO, this should be consistent with how JSON-LD Compact would deal with it).

I would hate to see anything like this. It would mean that the combined metadata looks much more complicated than needed, especially for those who do not want to care about the JSON-LD syntax level...

> 
> An alternative would be to say that it's invalid to merge together metadata documents that don't have the same @language definition, which would keep the representation of all properties in merged metadata simple strings, rather than expanded JSON-LD type literals.
> 
> Best keep this issue open and discuss on the next telecon.
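The two options for the @language corner case can be sketched side by side. This is an illustration only: it assumes each document's default language is known as a single string (or None when @language is not set), and both helper names are hypothetical.

```python
def expand_strings(value, language):
    """Option 1: make the document's default @language explicit on every string."""
    if isinstance(value, list):
        return [expand_strings(v, language) for v in value]
    if isinstance(value, str) and language is not None:
        return {"@value": value, "@language": language}
    return value

def require_same_language(languages):
    """Option 2: refuse to merge documents whose default @language differs."""
    distinct = set(languages)
    if len(distinct) > 1:
        raise ValueError(
            f"cannot merge metadata with different @language values: {sorted(distinct, key=str)}"
        )

# Option 1 turns a plain string into the expanded form described above.
print(expand_strings("Tree operations", "en"))
# Option 2 keeps plain strings but rejects mixed defaults.
require_same_language(["en", "en"])       # fine
# require_same_language(["en", "fr"])     # would raise ValueError
```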
> 
>>> #166 "datatype handling and template expansion" – Resolve to use post-conversion values without performing value separation.
>> 
>> Agree.
>> 
>>> 
>>> #169 (PR) "Update merge semantics" – Accept PR but may tweak creation of embedded metadata to be independent of metadata @language.
>> 
>> The issues covered by #169 are fine with me; acceptance of the PR is also an editorial issue that I leave up to the editors...
>> 
>>> 
>>> #170 "Need for Core Tabular Data" – Resolve to only use annotated model for conversion as being essentially equivalent given merge model.
>> 
>> I agree in principle, but...
>> 
>> - I think having a clear separation in the syntax document, even if informal, is useful. But all other documents can/should ignore the concept.
> 
> I think a non-normative section would be useful in describing this.
>> 
>> - The syntax document must have a section defining the 'default' metadata for core tabular data.
> 
> I've taken the default metadata document to be {"@type": "TableGroup", "resources": []}, which is then used for extracting embedded metadata. Getting the effect of the current "Core Tabular Data" mapping means using "header": false in the user-supplied metadata, either through an option or with the header=absent parameter of the Content-Type header.

See my comment above. The whole issue for the default metadata should be taken care of separately.
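A sketch of the starting point Gregg describes for #170, assuming the header=absent parameter is read from the Content-Type value with Python's email.message parameter parsing. The helpers, and the "und" title key used by embedded_columns, are hypothetical illustrations of how a missing header row leaves columns without titles.

```python
import copy
from email.message import Message

# Default metadata document, before embedded metadata is extracted.
DEFAULT_METADATA = {"@type": "TableGroup", "resources": []}

def starting_metadata():
    # Deep copy so callers can fill in "resources" without mutating the default.
    return copy.deepcopy(DEFAULT_METADATA)

def header_absent(content_type):
    """True if a Content-Type value carries the header=absent parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("header") == "absent"

def embedded_columns(first_row, has_header):
    """Hypothetical: column descriptions extracted from the first row.

    Without a header row, columns get no titles, so names fall back to defaults.
    """
    if not has_header:
        return [{} for _ in first_row]
    return [{"title": {"und": [cell]}} for cell in first_row]

ct = "text/csv; charset=utf-8; header=absent"
print(embedded_columns(["GID", "On Street"], has_header=not header_absent(ct)))
```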

Ivan

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Friday, 23 January 2015 09:37:40 UTC