Re: Simplifying assumptions and suggested issue resolutions from Gregg Kellogg on 2015-01-22 (public-csv-wg@w3.org from January 2015)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 22 Jan 2015 12:42:58 -0800
To: Ivan Herman <ivan@w3.org>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <1E594BB6-4839-45F2-BBED-2429FED7FE41@greggkellogg.net>
> On Jan 22, 2015, at 3:29 AM, Ivan Herman <ivan@w3.org> wrote:
> 
> Wow:-)

It would be useful for other members to +1/-1 or abstain on these issues so they can be resolved or prioritized. Perhaps the chairs can provide some guidance?

>> On 21 Jan 2015, at 19:24 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
>> 
>> We seem to be having a problem closing out issues on the issue tracker, and the meetings are too sparsely attended (IMO) to make real progress there.
> 
> Yes, I think we do have a problem here.
> 
>> I think the time has come to make some simplifying assumptions that can allow us to move forward on some metadata issues:
> 
> I add my votes below. At some point we should also go through the issue tracker to add a note to each of these and close them.
> 
>> 
>> #43 "ignore invalid metadata files located through link header" - resolve in favor, and treat like any other metadata files
> 
> Agree
> 
>> 
>> #76 "Metadata Merge I" – resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
>> 
>> #77 "Metadata Merge II" – resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
>> 
>> #78 "Metadata Merge III" – resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
> 
> Yes, can be closed; the new merge algorithm has taken over.
> 
>> 
>> #80 "JSON-LD context" – resolve as a string array of a string followed by an object, where the string MUST be "http://www.w3.org/ns/csvw". The object MAY contain @language and @base and MUST NOT contain any other values.
> 
> "as a string array" -> "as a string or an array" :-)
> 
> Agree
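To make the proposed shape concrete, here is a minimal Python sketch of a validator for the #80 resolution. It assumes the corrected reading above: the context is either the string "http://www.w3.org/ns/csvw" on its own, or an array of that string followed by an object containing at most @language and @base. The function and constant names are hypothetical, not part of any spec.

```python
CSVW_NS = "http://www.w3.org/ns/csvw"
# Keys the proposal allows in the local context object.
ALLOWED_LOCAL_KEYS = {"@language", "@base"}


def is_valid_context(ctx):
    """Check a metadata @context value against the #80 proposal (sketch)."""
    # Form 1: just the CSVW namespace string.
    if isinstance(ctx, str):
        return ctx == CSVW_NS
    # Form 2: the namespace string followed by an object with only @language/@base.
    if isinstance(ctx, list) and len(ctx) == 2:
        namespace, local = ctx
        return (
            namespace == CSVW_NS
            and isinstance(local, dict)
            and set(local) <= ALLOWED_LOCAL_KEYS
        )
    # Anything else (other strings, longer arrays, bare objects) is rejected.
    return False
```

For example, ["http://www.w3.org/ns/csvw", {"@language": "en"}] passes, while an object that adds "@vocab" does not.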
> 
>> 
>> #81 "Should JSON-LD keywords be aliased" – "Resolve not to alias keywords"
>> 
> 
> Agree
> 
>> #86 "schema term ambiguous" – resolve to rename the "schema" property to "tableSchema" to avoid confusion with schema.org
> 
> Agree (but will need editor action)
> 
>> 
>> #93 "metadata and mapped data conflation" – Resolve to continue to use the #table fragment identifier on the table metadata @id to make these distinct. (Also, if it exists, the #tablegroup fragment on the TableGroup @id)
> 
> Not sure about that. I believe there is still some confusion around the exact usage of @id. Indeed, what happens if @base is not explicitly defined?

If @base is not defined, it defaults to the location of the metadata document. That follows from JSON-LD semantics, and it is consistent with the use of @base in Turtle and <base> in HTML+RDFa. It is, however, an open question what the base of the merged metadata is; I would say that it is either the @base of the merged metadata, or the location of the CSV file (from which the embedded metadata is extracted). I'll clarify this in a commit to PR #169.
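A minimal Python sketch of the single-document case described above, assuming standard RFC 3986 reference resolution via urllib.parse.urljoin. The base of merged metadata is still open, so it is not modelled here. The helper names are hypothetical, and stripping any existing fragment from the table @id before adding #table is an assumption of this sketch.

```python
from urllib.parse import urldefrag, urljoin


def effective_base(metadata_url, explicit_base=None):
    """Base used for relative URLs in a single metadata document."""
    # No @base: fall back to the location of the metadata document.
    if explicit_base is None:
        return metadata_url
    # A relative @base is itself resolved against the document location.
    return urljoin(metadata_url, explicit_base)


def table_iri(table_id, base):
    """IRI for the table as mapped data, distinct from its metadata description."""
    # Resolve the table's @id, drop any fragment (assumption), then add #table.
    resolved, _fragment = urldefrag(urljoin(base, table_id))
    return resolved + "#table"
```

For a metadata file at http://example.org/data/meta.json with no @base, a table @id of "tree-ops.csv" becomes http://example.org/data/tree-ops.csv#table.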

> What is then the base for relative URI-s (eg when generating RDF?). These are all related...

Ditto.

> That being said, this is issue #91 but is left open...
> 
>> 
>> #96 "cell-value URI template" – Resolve as suggested in my last comment on that issue.
> 
> Agree
> 
>> 
>> #98 "row reference" – resolve to always emit a csvw:rownum triple referencing the source row. We may later consider making this optional.
> 
> Yes, this is what we discussed yesterday. Let us go for this now.
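As an illustration of the #98 resolution, this Python sketch emits, for each row, a csvw:row triple from the table and a csvw:rownum triple giving the source row number, written as N-Triples. The 1-based numbering, the blank-node labels, the csvw# namespace IRI and the xsd:integer datatype are assumptions made for the sketch.

```python
CSVW = "http://www.w3.org/ns/csvw#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def row_triples(table_iri, row_count):
    """Yield N-Triples lines tying each row to its table and its source row number."""
    for n in range(1, row_count + 1):  # assumed 1-based row numbers
        row_node = f"_:row{n}"  # hypothetical blank-node labels
        yield f"<{table_iri}> <{CSVW}row> {row_node} ."
        yield f'{row_node} <{CSVW}rownum> "{n}"^^<{XSD_INTEGER}> .'
```

Making the rownum triple optional later would only mean skipping the second yield.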
> 
>> 
>> #101 "predicateUrl, urlTemplate and default" – See #96
> 
> yes
> 
>> 
>> #128 "language term ambiguous" – Remove alias of xsd:language
> 
> I must admit I do not understand the issue. What is the problem in having "datatype":"language" alongside the term "language"?

The issue is that we define a "language" property, but we also say that all terms from xsd are imported; for example, "anyURI" is a term which is equivalent to "xsd:anyURI". Similarly, this would imply that "language" is a term that expands to "xsd:language", which would be inconsistent. I was proposing that we make an exception for "xsd:language" and stick with the "csvw:language" term association.
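A minimal sketch of the proposed exception, assuming term lookup checks the terms CSVW defines itself before falling back to the imported XSD datatype names. Both term sets below are small illustrative subsets, not the real vocabulary, and the function name is hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
CSVW = "http://www.w3.org/ns/csvw#"

# Illustrative subset of XSD datatype names imported as bare terms.
XSD_TERMS = {"anyURI", "boolean", "date", "decimal", "integer", "language", "string"}
# Terms CSVW defines itself; these take precedence over imported XSD names.
CSVW_TERMS = {"language", "title"}


def expand_term(term):
    """Expand a bare term to an IRI, letting CSVW terms shadow XSD ones."""
    if term in CSVW_TERMS:
        return CSVW + term
    if term in XSD_TERMS:
        return XSD + term
    return None  # unknown term
```

With this ordering "language" expands to csvw:language, while "anyURI" still expands to xsd:anyURI.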

>> #136 "confusion on arrays and atomic values" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
> 
> To make it clearer to others, we are talking about
> 
> http://htmlpreview.github.io/?https://raw.githubusercontent.com/w3c/csvw/merge-semantics/metadata/index.html#merging-metadata
> 
> I think your language does not handle this yet. In the second block of bullet items the second and third items still talk about arrays and atomic values as separate things, whereas the document defines atomic values as numbers, booleans, strings, or arrays.

Indeed, we could distinguish between an atomic property and an atomic value, where an atomic property takes an atomic value or an array of atomic values. However, the only atomic property that accepts an array of values is "null", and I don't see a good reason for that, so we could just restrict "null" to taking a single atomic value, and redefine an atomic value to be a number, boolean or string.
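Under the proposed redefinition, the atomic-value check and the restriction on "null" could look like this Python sketch. The function names are hypothetical, and JSON values are assumed to come from the standard json module, so JSON null arrives as None and is not atomic.

```python
def is_atomic_value(value):
    """Proposed definition: only numbers, booleans and strings are atomic."""
    # bool is a subclass of int in Python; listing it just makes the intent explicit.
    return isinstance(value, (bool, int, float, str))


def validate_null(value):
    """Return the "null" value if it is a single atomic value; reject arrays."""
    if not is_atomic_value(value):
        raise ValueError(f'"null" must be a single atomic value, got {value!r}')
    return value
```

A value such as ["NA", "-"] would then be an error for "null" rather than a list of alternatives.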

>> #142 "value space for common properties" – Resolve to allow just the six forms of literals and URIs described in that issue, and not more involved JSON-LD content.
>> 
> 
> I agree, although I still have to think about, and answer, your comment on the JSON-LD output.

That's an issue for csv2json, but I think the simplest approach is to copy those properties from the metadata into the JSON document.

>> #144 "Metadata merge order" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
> 
> I agree in general, with one editorial issue. The procedure you describe may end with a set of column descriptions without a 'name'. Maybe we can say that, at the end of the overall merge, if the 'name' is not set for a column, its value is set to the value of 'title' (or should it be normalized due to the template requirements?). That would make the rest of the specs, notably the transformations, simpler...

Yes, I've made comments elsewhere, but didn't include this in the PR; I'll do so now. My interpretation is that "name" is an optional property. When accessing the value of "name", if it is not set explicitly, it is taken from the first value of "title" (in the appropriate language) if one exists, and is "_col=N" otherwise. "predicateUrl" then defaults to "#" + URI.encode(name). (An interesting aside: ":_col=1" is not a valid PNAME and can't be encoded as such in Turtle; we may think of an alternative representation, if that matters.)
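The defaulting rules above can be sketched in Python as follows. Assumptions: "title" has already been expanded to a language map (e.g. {"en": ["Tree Name"]}), as implied by its @container: @language term definition; N is the 1-based column position; and urllib.parse.quote stands in for Ruby's URI.encode, whose escaping of characters such as "=" differs. The function names are hypothetical.

```python
from urllib.parse import quote


def column_name(column, index, lang="und"):
    """Effective "name" of a column at 1-based position index."""
    if "name" in column:
        return column["name"]
    titles = column.get("title", {})
    # Prefer the requested language, then "und" (assumed fallback order).
    values = titles.get(lang) or titles.get("und") or []
    if isinstance(values, str):
        values = [values]
    if values:
        return values[0]  # first title value
    return f"_col={index}"


def default_predicate_url(name):
    """Default predicateUrl: "#" followed by the percent-encoded name."""
    return "#" + quote(name, safe="")
```

Note that quote() escapes "=", so "_col=3" yields "#_col%3D3" here.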

>> #145 "overal metadata" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
> 
> Agree
> 
>> 
>> #150 "Merge and common dc properties" – Resolve by agreeing to my merge language in PR #169 to merge all metadata sources.
> 
> I am not sure. My original issue was that the 'title' is not defined on the table level, only 'dc:title' is used. How does your PR solve this?

"dc:" properties, or any other common properties, would be handled as suggested in #142. "title" is different, as there is a specific term definition: {"title": {"@id": "csvw:title", "@container": "@language"}}, which is how the language variations are handled (with the small exception for "und").

The merge section defines handling of common properties, such as "dc:title":

[[[
• If the property is a common property, the result is an array containing values from A followed by values from B not already in A. An array result having a single value uses that value as its result.
]]]
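Read literally, the quoted rule could be implemented like this Python sketch. It assumes the property is present in both A and B and compares values by plain equality, so explicit value objects compare by content. The function name is hypothetical.

```python
def merge_common_property(a, b):
    """Merge a common property's values from metadata A and B."""
    a_values = a if isinstance(a, list) else [a]
    b_values = b if isinstance(b, list) else [b]
    # Values from A, followed by values from B not already in A.
    result = list(a_values)
    for value in b_values:
        if value not in a_values:
            result.append(value)
    # An array result with a single value collapses to that value.
    return result[0] if len(result) == 1 else result
```

Merging "Tree operations" with "Tree operations" therefore gives the plain string back, not a one-element array.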

But, there is a corner case: if @language is defined differently in different merged metadata documents, this could result in a string value of "dc:title", for example, being interpreted differently; this is actually true for all string properties, not just common properties. I would say that Merge should be updated to say that common property values which include strings are expanded to the explicit version. For example, "dc:title": "Tree operations" in a metadata document with @language: en, would be changed to "dc:title": {"@value": "Tree operations", "@language": "en"} before merging. This could potentially be undone in the final step, but it's probably simplest to leave it in merged form, and deal with the JSON representation in that document. (IMO, this should be consistent with how JSON-LD Compact would deal with it).
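A sketch of the suggested pre-merge step in Python: plain strings are rewritten as explicit value objects carrying the @language in effect for the document they came from. The function name is hypothetical, and producing {"@value": ...} with no language when the document declares no @language is an assumption.

```python
def expand_string_values(value, language=None):
    """Rewrite plain strings as explicit value objects with the document's @language."""
    if isinstance(value, list):
        return [expand_string_values(v, language) for v in value]
    if isinstance(value, str):
        if language is None:
            return {"@value": value}
        return {"@value": value, "@language": language}
    # Numbers, booleans and already-explicit objects are left untouched.
    return value
```

Once expanded, an English and a French "dc:title" with the same text no longer compare equal, so merging keeps both.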

An alternative would be to say that it's invalid to merge together metadata documents that don't have the same @language definition, which would keep the representation of all properties in merged metadata simple strings, rather than expanded JSON-LD type literals.
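The alternative rule is simpler to check. This Python sketch assumes the @context shape from #80 (a string, or a string followed by an object) and treats a document with no @language as differing from one that declares it; the helper names are hypothetical.

```python
def context_language(metadata):
    """Return the @language from a #80-style @context, or None if absent."""
    ctx = metadata.get("@context")
    if isinstance(ctx, list) and len(ctx) == 2 and isinstance(ctx[1], dict):
        return ctx[1].get("@language")
    return None


def can_merge(documents):
    """Alternative rule: only merge documents whose @language definitions agree."""
    return len({context_language(doc) for doc in documents}) <= 1
```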

It is best to keep this issue open and discuss it on the next telecon.

>> #166 "datatype handling and template expansion" – Resolve to use post-conversion values without performing value separation.
> 
> Agree.
> 
>> 
>> #169 (PR) "Update merge semantics" – Accept PR but may tweak creation of embedded metadata to be independent of metadata @language.
> 
> The issues covered by #169 are fine with me, acceptance of PR is also an editorial issue that I leave up to the editors...
> 
>> 
>> #170 "Need for Core Tabular Data" – Resolve to only use annotated model for conversion as being essentially equivalent given merge model.
> 
> I agree in principle, but...
> 
> - I think having a clear separation in the syntax document, even if informal, is useful. But all other documents can/should ignore the concept.

I think a non-normative section would be useful in describing this.
> 
> - The syntax document must have a section defining the 'default' metadata for core tabular data.

I've taken the default metadata document to be {"@type": "TableGroup", "resources": []}, which is then used for extracting embedded metadata. Getting the effect of the current "Core Tabular Data" mapping means using "header": false in the user-supplied metadata (through an option), or the header=absent parameter of the Content-Type header.
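A Python sketch of this starting point, with hypothetical helper names: the default metadata is copied, user-supplied properties are layered on top, and a header=absent Content-Type parameter (or an equivalent option) sets "header": false. Putting "header" under a "dialect" object is an assumption, since the message does not say where it lives.

```python
import copy

DEFAULT_METADATA = {"@type": "TableGroup", "resources": []}


def header_absent(content_type):
    """True if a Content-Type value carries header=absent, e.g. "text/csv; header=absent"."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "header" and value.strip().strip('"').lower() == "absent":
            return True
    return False


def starting_metadata(user_metadata=None, no_header=False):
    """Default metadata overlaid with user-supplied properties."""
    # Deep copy so neither the shared default nor the caller's dict is mutated.
    metadata = copy.deepcopy({**DEFAULT_METADATA, **(user_metadata or {})})
    if no_header:
        # Location of "header": false is an assumption for this sketch.
        metadata.setdefault("dialect", {})["header"] = False
    return metadata
```

For example, starting_metadata(no_header=header_absent("text/csv;header=absent")) yields the default document plus {"dialect": {"header": False}}.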
>> 
>> #171 (PR) "Specified formats for dates & times" – Accept PR with possible improvements later.
> 
> For the sake of discussion, this is:
> 
> http://htmlpreview.github.io/?https://raw.githubusercontent.com/w3c/csvw/date-time-format/metadata/index.html#parsing-cells
> 
> I am fine with it.
> 
> 
>> 
>> Furthermore, the following issues can simply be closed:
>> 
>> #22 "type vs datatype" - no change
> 
> Agree
> 
>> 
>> #87 "title term ambiguous" – Close with no change.
> 
> Agree
> 
>> 
>> #97 "property name for row relation to table" – Close, as we have settled on csvw:row.
> 
> Agree
> 
>> 
>> #113 "csvw:Table in output conflation"
> 
> Agree, it was a F2F resolution
> 
>> 
>> #116 "Blank node shorthand" – Resolve to close with no change.
> 
> Agree
> 
>> 
>> #149 "@context explicit" – Duplicate of #80, and close the same way.
> 
> Indeed.
> 
>> 
>> 
>> That's more than enough for one go. Based on discussion, we can either resolve, close, or prioritize for next F2F meeting or telecon discussion.
>> 
>> If we can agree to some of these now, it will take a lot off of our plate, and give us the chance to make some headway between now and the F2F, which I think is important.
>> 
> 
> Thanks for initiating this, Gregg!

Just trying to keep the ball rolling!

Gregg

> Ivan
> 
> 
>> Gregg Kellogg
>> gregg@greggkellogg.net
>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Thursday, 22 January 2015 20:43:28 UTC