Re: What to do when "primary key" cell values are blank

On 06/06/14 11:47, Tandy, Jeremy wrote:
> Hi Andy
>
> ... so it looks like I've got confused in my terminology:
>
> """\"The\" primary key is different in each pass.  The note in R-PrimaryKey does not meet our experiences."""
>
> ... and ...
>
> """\"Primary\" is being overloaded between uniquely identifying a row (structural to CSV files), and uniquely identifying an entity (modelling).  In denormalised data, entities might get repeated on different rows."""
>
> I've clearly been thinking about the "modelling" case not the "structural" case. Can you help me clarify with some suggested alternative text?
>

R-PrimaryKey seems to take a design position and I think there are 
alternatives depending on the data and intent.

Maybe drop these 2 items that seem to me to be one specific choice that 
is not always the right one for all conversions:

----
Where a row contains a primary key cell that is blank or empty, that row 
shall be ignored.
----

because an alternative approach is to generate a primary key anyway 
(e.g. UUID based or based on row number).  This may be patched up later 
or not.  Skipping loses the information.
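
A minimal sketch of that alternative, in Python.  The function name 
row_key, the BASE URL, the "#row=" form and the "code" column are 
illustrative assumptions, not anything taken from R-PrimaryKey:

```python
import csv
import io
import uuid

BASE = "http://example.org/data.csv"  # assumed location of the CSV file

def row_key(row, key_column, row_number, use_uuid=False):
    """Return the key cell if it has a value, otherwise a generated key."""
    value = (row.get(key_column) or "").strip()
    if value:
        return value
    row_ref = f"{BASE}#row={row_number}"
    if use_uuid:
        # Name-based UUID: the same row always yields the same key.
        return f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, row_ref)}"
    return row_ref

data = "code,label\nA1,First\n,Second\n"
rows = list(csv.DictReader(io.StringIO(data)))
keys = [row_key(row, "code", n) for n, row in enumerate(rows, start=1)]
print(keys)
```

The second row has a blank "code" cell but is kept, with a key derived 
from its row number, rather than being skipped.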



I don't think data is as clean as this seems to see it:
----
Note

Assumption that a row within a CSV file describes a single entity for 
which a primary key can be assigned.
----

In the hierarchy extraction example, there is a deduced identifier for 
the "11-1011.03" row that could induce another triple subject:

soc:11-1011.00 skos:narrower soc:11-1011.03 .

(using :narrower, not :broader)
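
As a rough sketch of how that second subject might be deduced: the 
rule below (replace the part after the "." with "00" to get the parent 
code) is an assumption made for illustration, and narrower_triple is a 
hypothetical helper, not part of any template language:

```python
def narrower_triple(code, prefix="soc"):
    """Return a skos:narrower triple for a detailed code, or None."""
    stem, sep, suffix = code.partition(".")
    if not sep or suffix == "00":
        # No detail suffix, or already the ".00" code: no parent is deduced.
        return None
    parent = f"{stem}.00"  # assumed parent-code rule
    return f"{prefix}:{parent} skos:narrower {prefix}:{code} ."

print(narrower_triple("11-1011.03"))
```

The subject of the triple is the deduced parent, not the entity the 
row itself describes.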

In the Land registry example, a transaction row has the address on it 
but the address can be used in multiple places.  There are two entities in 
the row (imagine a conversion that just extracted the addresses).

In order to share the address, the subject URI for an address is a hash 
of its parts and in the RDF is a separate entity to the transaction 
record.  That's its "primary key" - not the transaction's "primary key".
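
A small sketch of the idea, assuming SHA-1 over a normalised join of 
the parts; the base URI, the normalisation rule and the sample address 
are made up for illustration and are not the actual conversion:

```python
import hashlib

def address_uri(parts, base="http://example.org/address/"):
    """Build a subject URI for an address from a hash of its parts."""
    # Assumed normalisation: trim and upper-case each part so that
    # trivially different spellings of one address share a URI.
    normalised = "|".join(p.strip().upper() for p in parts)
    digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
    return base + digest

a = address_uri(["1 High Street", "Bristol", "BS1 1AA"])
b = address_uri(["1 high street ", "bristol", "bs1 1aa"])
c = address_uri(["2 High Street", "Bristol", "BS1 1AA"])
print(a == b, a == c)
```

Two transaction rows with the same address then point at the same 
address subject, whatever key identifies each transaction.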

	Andy

> Thanks in anticipation.
>
> Jeremy
>
>
>
>> -----Original Message-----
>> From: Andy Seaborne [mailto:andy@apache.org]
>> Sent: 06 June 2014 10:23
>> To: public-csv-wg@w3.org
>> Subject: Re: What to do when "primary key" cell values are blank
>>
>> On 06/06/14 09:53, Tandy, Jeremy wrote:
>>> Hi - when putting together Use Case #24 - Expressing a hierarchy
>>> within occupational listings [1] I was considering how primary key
>>> behaviour might work. In this use case, there are four different types
>>> of entity described in a single CSV file. I inferred that we might
>>> apply four different templates to pull out the relevant contents and
>>> transform into RDF. A given row describes _one of_ the types of entity,
>>> meaning that the primary key column asserted, say, for extracting "SOC
>>> Major Group" concepts will often be blank.
>>>
>>> I have stated in the use case that:
>>>> Where the value in the designated primary key column is blank, the
>>>> row is ignored.
>>>
>>> I have also added this constraint to the primary key requirement [2].
>>>
>>> Please advise if this is inappropriate!
>>
>> We use template conversion - we often run multiple templates on the
>> same CSV, essentially extracting different kinds of entity on each
>> pass.
>> "The" primary key is different in each pass.  The note in R-PrimaryKey
>> does not meet our experiences.
>>
>> JeniT's condition extract is an example where it might be done as a
>> pass to generate the skos:broader separately from the "code rdfs:label
>> ....".
>>
>> "Primary" is being overloaded between uniquely identifying a row
>> (structural to CSV files), and uniquely identifying an entity
>> (modelling).  In denormalised data, entities might get repeated on
>> different rows.
>>
>> 	Andy
>>
>>>
>>> Regards, Jeremy
>>>
>>>
>>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>>> [2] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey
>>>
>>
>

Received on Friday, 6 June 2014 11:15:40 UTC