- From: Andy Seaborne <andy@apache.org>
- Date: Fri, 06 Jun 2014 12:15:09 +0100
- To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
On 06/06/14 11:47, Tandy, Jeremy wrote:
> Hi Andy
>
> ... so it looks like I've got confused in my terminology:
>
> """\"The\" primary key is different in each pass. The note in R-PrimaryKey does not meet our experiences."""
>
> ... and ...
>
> """\"Primary\" is being overloaded between uniquely identifying a row (structural to CSV files), and uniquely identifying an entity (modelling). In denormalised data, entities might get repeated on different rows."""
>
> I've clearly been thinking about the "modelling" case not the "structural" case. Can you help me clarify with some suggested alternative text?
>

R-PrimaryKey seems to take a design position and I think there are alternatives depending on the data and intent.

Maybe drop these 2 items that seem to me to be one specific choice that is not always the right one for all conversions:

----
Where a row contains a primary key cell that is blank or empty, that row shall be ignored.
----

because an alternative approach is to generate a primary key anyway (e.g. UUID based or based on row number). This may be patched up later or not. Skipping loses the information.

I don't think data is as clean as this seems to see it:

----
Note
Assumption that a row within a CSV file describes a single entity for which a primary key can be assigned.
----

In the hierarchy extraction example, a deduced identifier for the "11-1011.03" row could induce another triple subject:

   soc:11-1011.00 skos:narrower soc:11-1011.03 .

(using :narrower, not :broader)

In the Land Registry example, a transaction row has the address on it but the address can be used in multiple places. There are two entities in the row (imagine a conversion that just extracted the addresses). In order to share the address, the subject URI for an address is a hash of its parts and in the RDF is a separate entity from the transaction record. That is its "primary key" - not the transaction's "primary key".

	Andy

> Thanks in anticipation.
>
> Jeremy
>
>
>
>> -----Original Message-----
>> From: Andy Seaborne [mailto:andy@apache.org]
>> Sent: 06 June 2014 10:23
>> To: public-csv-wg@w3.org
>> Subject: Re: What to do when "primary key" cell values are blank
>>
>> On 06/06/14 09:53, Tandy, Jeremy wrote:
>>> Hi - when putting together Use Case #24 - Expressing a hierarchy within occupational listings [1] I was considering how primary key behaviour might work. In this use case, there are four different types of entity described in a single CSV file. I inferred that we might apply four different templates to pull out the relevant contents and transform into RDF. A given row describes _one of_ the types of entity, meaning that the primary key column asserted, say, for extracting "SOC Major Group" concepts will often be blank.
>>>
>>> I have stated in the use case that:
>>>> Where the value in the designated primary key column is blank, the row is ignored.
>>>
>>> I have also added this constraint to the primary key requirement [2].
>>>
>>> Please advise if this is inappropriate!
>>
>> We use template conversion - we often run multiple templates on the same CSV, essentially extracting different kinds of entity on each pass.
>> "The" primary key is different in each pass. The note in R-PrimaryKey does not meet our experiences.
>>
>> JeniT's condition extract is an example where it might be done as a pass to generate the skos:broader separately from the "code rdfs:label ....".
>>
>> "Primary" is being overloaded between uniquely identifying a row (structural to CSV files), and uniquely identifying an entity (modelling). In denormalised data, entities might get repeated on different rows.
>>
>> 	Andy
>>
>>>
>>> Regards, Jeremy
>>>
>>>
>>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>>> [2] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey
>>>
>>
>
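The two alternatives raised in the reply above (generating a key for a row whose key cell is blank rather than skipping it, and giving a shared address its own hash-based subject URI) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the conversion tooling discussed in the thread: the `BASE` namespace, the function names `row_key` and `address_uri`, the column names, and the choice of SHA-1 over normalised address parts are all hypothetical.

```python
import hashlib

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/"


def row_key(row, key_column, row_number):
    """Return the key cell if present, else a key derived from the row number.

    A UUID (uuid.uuid4()) would be another way to fill the gap; the row
    number is used here so that repeated runs give the same result.
    """
    value = (row.get(key_column) or "").strip()
    return value if value else f"row-{row_number}"


def address_uri(parts):
    """Build a subject URI for an address from a hash of its parts.

    Identical addresses on different rows map to the same URI, so the
    address is a separate entity from the transaction that mentions it.
    """
    normalised = "|".join(p.strip().lower() for p in parts)
    digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
    return f"{BASE}address/{digest}"


# Hypothetical rows: two transactions at the same address, one with a blank key.
rows = [
    {"txn": "T1", "street": "1 High St", "town": "Exeter"},
    {"txn": "", "street": "1 High St", "town": "Exeter"},
]

for number, row in enumerate(rows, start=1):
    txn_subject = f"{BASE}transaction/{row_key(row, 'txn', number)}"
    addr_subject = address_uri([row["street"], row["town"]])
    print(txn_subject, addr_subject)
```

Both rows keep their information: the second transaction gets a generated key instead of being dropped, and both point at the same address subject.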
Received on Friday, 6 June 2014 11:15:40 UTC