- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Fri, 6 Jun 2014 14:45:42 +0000
- To: Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Hi Andy - Thanks for the feedback.

I have now re-written the R-PrimaryKey requirement [1] to focus _only_ on unique identification of _rows_ within a dataset as opposed to the identification of entities described by a given row.

I have amended all the use cases where this distinction was muddled such that where I was talking about unique identifiers for the _entity_ I now refer to R-URIMapping to convert the local identifier in the dataset to a globally unique URI.

I've dropped all suggestions about skipping rows where the primary key is blank. In Use Case #24 - Expressing a hierarchy within occupational listings [2], I now use the Conditional Processing requirement [3] to skip rows where the "unique identifier" field is blank.

I think that this has fixed the confusion in the document.

Jeremy

[1] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey
[2] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
[3] http://w3c.github.io/csvw/use-cases-and-requirements/#R-ConditionalProcessingBasedOnCellValues

> -----Original Message-----
> From: Andy Seaborne [mailto:andy@apache.org]
> Sent: 06 June 2014 12:15
> To: Tandy, Jeremy; public-csv-wg@w3.org
> Subject: Re: What to do when "primary key" cell values are blank
>
> On 06/06/14 11:47, Tandy, Jeremy wrote:
> > Hi Andy
> >
> > ... so it looks like I've got confused in my terminology:
> >
> > """\"The\" primary key is different in each pass. The note in
> > R-PrimaryKey does not meet our experiences."""
> >
> > ... and ...
> >
> > """\"Primary\" is being overloaded between uniquely identifying a row
> > (structural to CSV files), and uniquely identifying an entity
> > (modelling). In denormalised data, entities might get repeated on
> > different rows."""
> >
> > I've clearly been thinking about the "modelling" case not the
> > "structural" case. Can you help me clarify with some suggested
> > alternative text?
> >
>
> R-PrimaryKey seems to take a design position and I think there are
> alternatives depending on the data and intent.
>
> Maybe drop these 2 items that seem to me to be one specific choice that
> is not always the right one for all conversions:
>
> ----
> Where a row contains a primary key cell that is blank or empty, that
> row shall be ignored.
> ----
>
> because an alternative approach is to generate a primary key anyway
> (e.g. UUID based or based on row number). This may be patched up later
> or not. Skipping loses the information.
>
> I think data is as clean as this seems to see it:
> ----
> Note
>
> Assumption that a row within a CSV file describes a single entity for
> which a primary key can be assigned.
> ----
>
> In the hierarchy extraction example, there is a deduced identifier for
> the "11-1011.03" row that could induce another triple subject.
>
> soc:11-1011.00 skos:narrower soc:11-1011.03 .
>
> (using :narrower, not :broader)
>
> In the Land registry example, a transaction row has the address on it
> but the address can be used in multiple places. There are two entities
> in the row (imagine a conversion that just extracted the addresses).
>
> In order to share the address, the subject URI for an address is a hash
> of its parts and in the RDF is a separate entity to the transaction
> record. That's its "primary key" - not the transaction's "primary
> key".
>
> Andy
>
> > Thanks in anticipation.
> >
> > Jeremy
> >
> >> -----Original Message-----
> >> From: Andy Seaborne [mailto:andy@apache.org]
> >> Sent: 06 June 2014 10:23
> >> To: public-csv-wg@w3.org
> >> Subject: Re: What to do when "primary key" cell values are blank
> >>
> >> On 06/06/14 09:53, Tandy, Jeremy wrote:
> >>> Hi - when putting together Use Case #24 - Expressing a hierarchy
> >>> within occupational listings [1] I was considering how primary key
> >>> behaviour might work. In this use case, there are four different
> >>> types of entity described in a single CSV file.
> >>> I inferred that we might apply four different templates to pull out
> >>> the relevant contents and transform into RDF. A given row describes
> >>> _one of_ the types of entity, meaning that the primary key column
> >>> asserted, say, for extracting "SOC Major Group" concepts will often
> >>> be blank.
> >>>
> >>> I have stated in the use case that:
> >>>> Where the value in the designated primary key column is blank, the
> >>>> row is ignored.
> >>>
> >>> I have also added this constraint to the primary key requirement [2].
> >>>
> >>> Please advise if this is inappropriate!
> >>
> >> We use template conversion - we often run multiple templates on the
> >> same CSV, essentially extracting different kinds of entity on each
> >> pass.
> >> "The" primary key is different in each pass. The note in
> >> R-PrimaryKey does not meet our experiences.
> >>
> >> JeniT's condition extract is an example where it might be done as a
> >> pass to generate the skos:broader separately from the "code
> >> rdfs:label ....".
> >>
> >> "Primary" is being overloaded between uniquely identifying a row
> >> (structural to CSV files), and uniquely identifying an entity
> >> (modelling). In denormalised data, entities might get repeated on
> >> different rows.
> >>
> >> Andy
> >>
> >>> Regards, Jeremy
> >>>
> >>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >>> [2] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey
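
Two ideas raised by Andy in this thread can be sketched roughly in code: generating a key for a row whose key cell is blank instead of skipping the row, and deriving the subject URI for an address from a hash of its parts so that repeated addresses share one entity. The Python below is only an illustrative sketch under stated assumptions, not anything proposed by the working group: the column names (`id`, `street`, `town`), the `http://example.org/` base URI, the `row-N` fallback scheme, and the helper names `row_key`, `address_uri`, and `convert` are all hypothetical.

```python
import csv
import hashlib
import io

BASE = "http://example.org/"  # hypothetical base URI, for illustration only


def row_key(row_number, row, key_column):
    """Return the key cell's value, or a row-number-based fallback when it is blank."""
    value = (row.get(key_column) or "").strip()
    return value if value else f"row-{row_number}"


def address_uri(row, address_columns):
    """Mint a URI from a hash of the address parts, so equal addresses share a subject."""
    parts = [(row.get(c) or "").strip().lower() for c in address_columns]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]
    return f"{BASE}address/{digest}"


# Hypothetical denormalised data: the second row has a blank key cell,
# and the first two rows repeat the same address.
SAMPLE = """id,street,town
T1,1 High Street,Exeter
,1 High Street,Exeter
T3,2 Low Road,Exeter
"""


def convert(text):
    """Return (row key, address URI) pairs for every data row; no row is skipped."""
    results = []
    reader = csv.DictReader(io.StringIO(text))
    for n, row in enumerate(reader, start=1):
        results.append((row_key(n, row, "id"), address_uri(row, ["street", "town"])))
    return results


if __name__ == "__main__":
    for key, uri in convert(SAMPLE):
        print(key, uri)
```

In this sketch the row key identifies the row (the structural sense of "primary") while the hashed URI identifies the address entity (the modelling sense), so the two rows that repeat an address keep distinct row keys but share one address URI.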
Received on Friday, 6 June 2014 14:46:15 UTC