- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Wed, 21 May 2014 16:36:20 -0700
- To: Andy Seaborne <andy@apache.org>
- Cc: public-csv-wg@w3.org
On May 19, 2014, at 6:24 AM, Andy Seaborne <andy@apache.org> wrote:

> On 18/05/14 17:59, Ivan Herman wrote:
>>
>> Andy, Gregg & all
>>
>> On the call on Wednesday I suggested that, by putting the general
>> description of the conversion into (for now) the metadata document, it
>> may be necessary to restructure the current CSV2RDF document. I tried to
>> draft what a structure & algorithm would look like; I give you here what
>> I jotted down.
>
> Thank you.
>
>> Note that I rely on the fact that the template part would migrate as a
>> general mechanism somewhere; there seems to be an agreement on this on
>> the mailing list. I refer to it as a 'template' attribute in the
>> metadata.
>>
>> I think the changes should be in section 3 (see below). Following the
>> flow in the metadata document I put the 'table level metadata' into a
>> subsection of section 3, meaning section 4 in the current document can
>> disappear. I would remove section 5, because that should become a more
>> general topic, not bound to RDF; the minimal mapping is also part of
>> section 3. I am not sure about current section 7 (I think that should
>> move elsewhere). Note also that, I believe, this skeleton may be similar
>> for XML and JSON, but I did not check that.
>>
>> I believe that the three (or more, eventually, with JSON and XML)
>> relevant documents (syntax, metadata, and conversions) should be in
>> synchrony, and this before the next publication round...
>>
>> With that, this is what I had in mind:
>>
>> [[[
>> 3. Processing Model
>>
>> 3.1 Conversion of core tabular data, or data annotated with embedded
>> metadata only
>>
>> The file's URI is also used as a 'namespace' for URIs in the generated
>> triples, by concatenating the URI with '#' and with the string for a
>> column name (denoted by ':name' in what follows)
>>
>> - this case either yields a header name for each column; if not, :col1,
>>   :col2, :col3, ... are defined
>> - the generation is done by
>>   - each row has the same subject, a new bnode (Bi)
>>   - each cell generates the triple (Bi :headerj "content of cell j")
>
> with number-like fields being numbers? (to follow what spreadsheets do)
>
>>
>> Where :xxx means URIOFCSVFILE#xxx
>>
>> 3.2 Conversion of annotated tabular data
>>
>> 3.2.1 Table level metadata
>>
>> The conversion uses the entries defined in section 3.1 of the metadata
>> document to generate table level metadata triples as follows:
>>
>> - @id is used as a subject for all table level metadata
>> - @type generates a (@id rdf:type @type) triple
>> - the fields defined by DC-TERMS are used directly, with @id as subject
>>
>> The @id is also used as a 'namespace' for URIs in the generated triples,
>> by concatenating the URI with '#' and with the string for a column name
>> (denoted by ':name' in what follows)
>>
>> 3.2.2 Field level metadata
>>
>> The conversion uses the entries defined in section 3.2 of the metadata
>> document to generate field level metadata triples using the steps below.
>
> For me, a template (in RDF conversion) is the template for one complete
> row's worth of conversion.

Yes, for me too; in my last email I suggested that we automatically
construct such a template if none is provided, which I think simplifies
subsequent processing.

> I was thinking that if no template were explicitly given, the metadata
> would be used to define a template and the template be applied.

We could have descriptive text about what happens when there is no
user-defined template.
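To make that concrete: the no-template case in 3.1 is small enough to
sketch. The following is rough, untested Python rather than proposed spec
text; it uses rdflib terms purely for illustration, assumes the first row
is the header, and leaves Andy's question about number-like cells open.

    import csv
    from rdflib import BNode, Literal, URIRef

    def minimal_mapping(csv_uri, csv_path):
        """3.1 sketch: a new bnode per row, predicates minted as
        URIOFCSVFILE#name from the header (or col1, col2, ... when there is
        no header), cell values as plain literals."""
        with open(csv_path, newline='') as f:
            rows = list(csv.reader(f))
        if not rows:
            return
        header = rows[0]   # no header row: use ['col1', 'col2', ...] instead
        for row in rows[1:]:
            subject = BNode()                  # Bi: one fresh bnode per row
            for name, value in zip(header, row):
                predicate = URIRef(csv_uri + '#' + name)
                # open question (Andy): make number-like cells numbers?
                yield (subject, predicate, Literal(value))

Called with the file's URI and path, that yields exactly the
(Bi :headerj "content of cell j") triples of the outline, and I think it is
also pretty much what a generated default template would have to encode.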
> Your outline seems to define the templateless process separately from the
> template-driven one.

+1

> Generating a template, if none is provided, would keep the user-template
> driven mechanism and the metadata-generated template mechanism in step.
> It would be clear that they aren't alternatives, with (potentially)
> capabilities in the direct route that are not in the template route. You
> could get the generated template and tweak it, for example.

+1

> The part in common is escaping syntax. Building up URIs from fields in a
> row may involve URI query strings, URI path segments etc., and these have
> slightly different rules for conversion from a character string to the
> URI form (e.g. spaces, use of ?, & and /).

We need to be sure we can access the RFC 6570 escape conventions; I
suggested how we might do this in my processing instructions.

>> The processing is based on general metadata attributes as defined in
>> that section; this specification adds one field level attribute:
>> 'rdf_predicate_type', which can take the value of 'object' or 'literal'.
>>
>> - each row generates a number of triples with a common subject. This
>>   subject is
>>   - a new blank node for each row if no primaryKey attribute is defined, or
>>   - :field1-field2-...-fieldn, where fieldi are the (column) names
>>     appearing in the value of the primaryKey attribute if that attribute
>>     contains a list of names, or
>>   - :field, where field is the column name appearing as the value of the
>>     primaryKey attribute
>> - for each cell in a field _that is not a primary key_, the following
>>   triple is generated
>>   - subject is the subject defined for the row
>>   - predicate is :name, where 'name' is the value of the 'name' attribute
>>     in the field descriptor (3.2.2 in the metadata spec)
>>   - object:
>>     - if the column is defined as a foreign key through the 'foreignKeys'
>>       attribute, the object is an RDF URI Resource as defined by the
>>       foreign key reference (3.2.7 in the metadata spec)
>
> I think the term 'foreign key' brings a lot of baggage with it, such as
> foreign key constraints and guarantees, especially any assumption about
> whether the link target exists or not.
>
> I'd rather just talk about generating URIs as one "type", and reserve
> 'foreign key' for the case of a link within a group of tables converted
> together or otherwise associated, where a foreign key is highly likely to
> mean that the target of the link exists.

Agreed, I don't see any real value here; we need to be able to designate
that the type in column metadata is an IRI (@id?).

>>     - otherwise, if a 'type' attribute is defined (3.2.4), then the cell
>>       is converted into a literal of that type (in the case of dates,
>>       this may also use the 'format' attribute)

This is a metadata processing rule, as opposed to a template processing
rule.

>>     - otherwise, if a 'template' attribute is defined, then the template
>>       is used to generate a value; if the value of rdf_predicate_type is
>>       missing or is set to 'literal', the object is an RDF literal of
>>       type xsd:string; otherwise, the object is an RDF URI Resource.
>
> So templates for you are templating individual RDF objects? Jeremy's
> conversion example has a specific shape of RDF per row.
>
>>     - otherwise, the value of the cell is used as an RDF Literal of type
>>       xsd:string
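And to check that I'm reading the field-level rules above the same way as
you both, here is an equally rough, untested Python sketch of the per-row
step. expand_template and the metadata keys are placeholders only (a real
processor would use an RFC 6570 implementation with the escaping rules Andy
mentions, and proper foreignKeys resolution per 3.2.7); none of this is
proposed spec text.

    from urllib.parse import quote
    from rdflib import BNode, Literal, URIRef
    from rdflib.namespace import XSD

    def expand_template(template, row):
        # placeholder: real processing would use RFC 6570, with the
        # different escaping for path segments vs. query strings
        return template.format(**row)

    def convert_row(base, row, columns, primary_key=None):
        """Yield (subject, predicate, object) for one row.

        base: the table's @id (or the file URI); row: column name -> cell
        string; columns: column name -> field description from the metadata
        document; primary_key: None, a column name, or a list of names."""
        keys = ([] if primary_key is None
                else primary_key if isinstance(primary_key, list)
                else [primary_key])

        # subject: a fresh bnode unless a primaryKey is declared
        if keys:
            subject = URIRef(base + '#' + '-'.join(row[k] for k in keys))
        else:
            subject = BNode()

        for name, value in row.items():
            if name in keys:
                continue                 # primary-key cells yield no triple
            meta = columns.get(name, {})
            predicate = URIRef(base + '#' + quote(name, safe=''))

            if 'foreignKeys' in meta:
                # placeholder: really built from the 3.2.7 reference
                obj = URIRef(value)
            elif 'type' in meta:
                # assumes 'type' is already a datatype URI; 'format'
                # handling for dates is omitted here
                obj = Literal(value, datatype=URIRef(meta['type']))
            elif 'template' in meta:
                expanded = expand_template(meta['template'], row)
                if meta.get('rdf_predicate_type') == 'object':
                    obj = URIRef(expanded)
                else:
                    obj = Literal(expanded, datatype=XSD.string)
            else:
                obj = Literal(value, datatype=XSD.string)

            yield (subject, predicate, obj)

If that matches what you both have in mind, then the metadata-generated
default template Andy describes would essentially be a serialization of
these same rules, which I think argues for keeping the two mechanisms in
step.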
>> Some open issues:
>> - do we need to add a (rowid csv:row "rownumber") kind of triple for
>>   each row? (probably yes)
>> - do we need to add a series of triples of the sort (@id csv:rows rowid)
>>   for each row, to make a "bridge" between the graph overall and its
>>   constituents? (It may not be all that important for RDF, but it may be
>>   necessary for JSON.)
>
> Agreed - we probably want to define triples generated that tie the RDF
> back to the CSV input. Probably an "optional extra", as for a large CSV
> file there would be a significant increase in the size of the output.

There's some description in Ivan's version of the document that suggests
this, but it isn't carried out in the examples.

(BTW, I suggest we merge Ivan's changes in (making him an Editor) and use
that as the basis going forward.)

Gregg

>> ]]]
>>
>> Does this make sense?
>>
>> I may try to find some time editing the document, but it would be good
>> to have a minimal agreement from the group.
>>
>> Ivan
>
> Andy
>
>> ----
>> Ivan Herman, W3C
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me
Received on Wednesday, 21 May 2014 23:36:51 UTC