Re: Using schema:Enumeration instances to define valid values from Gregg Kellogg on 2015-10-12 (public-csv-wg@w3.org from October 2015)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Mon, 12 Oct 2015 16:22:43 -0700
To: Colin Maudry <colin@maudry.com>
Cc: public-csv-wg@w3.org, Axel Haustant <axel.haustant@data.gouv.fr>
Message-Id: <CF2E6301-5518-4F1D-8233-F3F6ED88E8F3@greggkellogg.net>
> On Oct 12, 2015, at 2:02 PM, Colin Maudry <colin@maudry.com> wrote:
> 
> Hi Gregg, 
> 
> Thanks again for the quick reply.
> 
> I'm not sure I fully understand what you suggest. You tell me to check out Example 27 [1], but there I don't see a list of controlled values, only the schema of the table that contains this list.

I pointed you at an example that uses foreign key references, which define a controlled vocabulary. In this case, the countries.csv file provides a controlled vocabulary for country codes, which are used in country_slice.csv. The FK relationship ensures that values of “countryRef” in country_slice.csv are validated against the countryCode entries defined in countries.csv.
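The foreign-key check described above can be sketched in Python. This is a minimal illustration, not how a CSVW processor is implemented: the sample rows are invented for the example, and the helper names (`load_column`, `check_foreign_key`) are hypothetical. It shows the core idea: every `countryRef` value must appear among the `countryCode` values of the referenced table.

```python
import csv
import io

# Hypothetical sample data: illustrative rows, not copied from Example 27.
COUNTRIES_CSV = """countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan
"""

COUNTRY_SLICE_CSV = """countryRef,year,population
AF,1960,9616353
AF,1961,9799379
ZZ,1962,9989846
"""

def load_column(text, column):
    """Return the set of values found in one column of a CSV string."""
    return {row[column] for row in csv.DictReader(io.StringIO(text))}

def check_foreign_key(child_text, child_column, parent_text, parent_column):
    """Return (row_number, value) pairs whose value is missing from the parent column.

    Row numbers count data rows from 1, excluding the header row.
    """
    allowed = load_column(parent_text, parent_column)
    violations = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(child_text)), start=1):
        if row[child_column] not in allowed:
            violations.append((row_number, row[child_column]))
    return violations

print(check_foreign_key(COUNTRY_SLICE_CSV, "countryRef", COUNTRIES_CSV, "countryCode"))
# prints [(3, 'ZZ')]
```

Here the controlled list lives in a CSV column, so adding or documenting a value means editing countries.csv rather than a regex.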

> To circumvent this issue, you add a property schema:valueReference (I guess it could be any property of our choice) to the Table, with a list of URIs. Finally, you create a schema for an imaginary table and its sole column "col", and a URI pattern that matches the URIs defined above.

I'm not circumventing an issue, just adding the other information, using an arbitrary schema.org property, to define the controlled vocabulary for RDF purposes. The annotation isn’t needed for the mechanism to work; it only adds your vocabulary entries.

> I have the feeling that there are gaps in the formal relationship between the table that I want to map from (some data with foreign keys) and the actual controlled values. Or did you mean that these values were in a CSV?

Yes, the controlled values are in countries.csv.

Gregg

> In the end, we need somewhere a list of controlled values (in the form of strings) to look at. They could be in a CSV, with a schema telling us in which column to look for controlled values, which is a solution we will look at, but we hoped we could store them in JSON-LD.
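If the controlled values are kept in a JSON-LD file instead, extracting them as plain strings can be sketched as below. This is a minimal sketch under loud assumptions: the document shape mirrors the annotation discussed in this thread, it uses only absolute IRIs so that no `@context` processing is needed (a real application should run the file through a JSON-LD processor), and the helper names (`enumeration_members`, `controlled_strings`) are hypothetical.

```python
import json

# Hypothetical JSON-LD file holding the controlled values. Absolute IRIs are
# used throughout so this sketch can skip @context processing entirely.
VOCAB_JSONLD = """
{
  "@graph": [
    {"@id": "http://example.org/Genders",
     "http://www.w3.org/2000/01/rdf-schema#subClassOf": {"@id": "http://schema.org/Enumeration"}},
    {"@id": "http://example.org/Female", "@type": "http://example.org/Genders"},
    {"@id": "http://example.org/Male", "@type": "http://example.org/Genders"},
    {"@id": "http://example.org/Other", "@type": "http://example.org/Genders"}
  ]
}
"""

def enumeration_members(doc_text, enum_class):
    """Return the @id of every node in @graph whose @type includes enum_class."""
    members = set()
    for node in json.loads(doc_text)["@graph"]:
        types = node.get("@type", [])
        if isinstance(types, str):  # @type may be a single string or a list
            types = [types]
        if enum_class in types:
            members.add(node["@id"])
    return members

def controlled_strings(members, base):
    """Strip a shared IRI prefix to recover the plain string values."""
    return {iri[len(base):] for iri in members if iri.startswith(base)}

genders = enumeration_members(VOCAB_JSONLD, "http://example.org/Genders")
print(sorted(controlled_strings(genders, "http://example.org/")))
# prints ['Female', 'Male', 'Other']
```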
> 
> Thanks for your help,
> Colin Maudry
> 
> 
> [1] http://www.w3.org/TR/2015/CR-tabular-metadata-20150716/#foreign-key-reference-between-tables
> 
> On 12/10/2015 18:38, Gregg Kellogg wrote:
>>> On Oct 12, 2015, at 6:57 AM, Colin Maudry <colin@maudry.com> wrote:
>>> 
>>> Hello,
>>> 
>>> For a given column, we would like to be able to define a fixed list of values that this column is supposed to contain. In SQL terms that's an ENUM type.
>>> 
>>> One of the solutions offered here [1] is to use the "format" property, followed by a regex:
>>> 
>>> "format" : "value1|value2|value3"
>>> 
>>> However, we see several problems with this solution:
>>> 
>>>  - it's not easy to reuse elsewhere as a standalone object
>>>  - the value of "format" requires parsing
>>>  - we can't document the values with comments
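The regex approach and its parsing drawback can be sketched in Python. This assumes, for illustration, that the whole cell value must match the `format` pattern, so `re.fullmatch` is used; an unanchored search would wrongly accept "value10". The helper names (`matches_format`, `allowed_values`) are hypothetical.

```python
import re

# The "format" value quoted above, treated as a regular expression over the
# cell's string value. Assumption: the entire value must match.
FORMAT = "value1|value2|value3"

def matches_format(value, pattern=FORMAT):
    """True if the whole value matches one of the alternatives."""
    return re.fullmatch(pattern, value) is not None

def allowed_values(pattern=FORMAT):
    # Recovering the list of values means parsing the regex. Splitting on "|"
    # only works for this flat shape; escaped or grouped bars would break it.
    return pattern.split("|")

print(matches_format("value2"))   # True
print(matches_format("value10"))  # False
print(allowed_values())           # ['value1', 'value2', 'value3']
```

Note that nothing in the pattern can carry a label or comment for each value, which is the third drawback listed above.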
>>> 
>>> A potential and cleaner solution would be to create schema:Enumeration objects. Problem: I don't know how to connect the csvw:Column object with the schema:Enumeration [2] object. Any idea?
>> You might consider the use of a foreign key constraint against another table, for which output may be suppressed. Check out example 27 in csv-metadata [3].
>> 
>> Following your suggestion, if the column value can easily be turned into a URI using valueUrl, then you could create something like `ex:Male` from the column value “Male” by setting valueUrl to something like “http://example.org/{col}” where “col” is the name of the column containing the values you want to map. You could add an annotation to define `ex:Genders` using a common property. This might look something like the following:
>> 
>> {
>>   "@type": "Table",
>>   "schema:valueReference": [{
>>     "@id": "http://example.org/Genders",
>>     "rdfs:subClassOf": {"@id": "schema:Enumeration"}
>>   }, {
>>     "@id": "http://example.org/Female",
>>     "@type": "http://example.org/Genders"
>>   }, {
>>     "@id": "http://example.org/Male",
>>     "@type": "http://example.org/Genders"
>>   }, {
>>     "@id": "http://example.org/Other",
>>     "@type": "http://example.org/Genders"
>>   }],
>>   "tableSchema": {
>>     "columns": [{"name": "col", "valueUrl": "http://example.org/{col}"}]
>>   }
>> }
>> 
>> You could even combine the two, so that a validator would ensure that only URIs in the target vocabulary were used.
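Combining the two can be sketched in Python: expand the valueUrl template for each cell, then check that the resulting URI is one of the enumeration members listed in the annotation. This is a minimal sketch, not the CSVW algorithm. It handles only the single `{col}` variable from the example, approximates RFC 6570 simple string expansion with `urllib.parse.quote(..., safe="")`, and the helper names (`expand_value_url`, `invalid_cells`) are hypothetical.

```python
from urllib.parse import quote

# valueUrl template and enumeration members taken from the metadata above.
VALUE_URL = "http://example.org/{col}"

GENDERS = {
    "http://example.org/Female",
    "http://example.org/Male",
    "http://example.org/Other",
}

def expand_value_url(template, column_name, cell_value):
    # Simple string expansion percent-encodes everything outside the
    # RFC 3986 unreserved set; quote(..., safe="") approximates that here.
    return template.replace("{" + column_name + "}", quote(cell_value, safe=""))

def invalid_cells(cells, template=VALUE_URL, vocabulary=GENDERS):
    """Return the cells whose generated URI is not in the vocabulary."""
    return [c for c in cells if expand_value_url(template, "col", c) not in vocabulary]

print(invalid_cells(["Male", "Female", "male", "Unknown"]))  # ['male', 'Unknown']
```

The comparison is on full URIs, so it is case-sensitive: "male" does not map to `ex:Male`.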
>> 
>>> The thread mentioned above [1] seems to end up recommending XSD schema files to declare enumerations, but I'm not comfortable doing so outside of the JSON-LD/RDF realm. I think it would make the validation script more complex, because it is less web-friendly. I'd rather declare them in a separate JSON-LD file.
>>> 
>>> FYI, values of an enumeration and their schema:Enumeration object are expressed this way:
>>> 
>>> ex:Genders rdfs:subClassOf schema:Enumeration .
>>> ex:Female rdf:type ex:Genders .
>>> ex:Male rdf:type ex:Genders .
>>> ex:Other rdf:type ex:Genders .
>> Gregg Kellogg
>> 
>> [3] http://w3c.github.io/csvw/metadata/#foreign-key-reference-between-tables
>> 
>>> [1] https://github.com/w3c/csvw/issues/223
>>> [2] http://schema.org/Enumeration
>>> 
>>> Thanks!
>>> Colin
>
Received on Monday, 12 October 2015 23:21:57 UTC