- From: Ivan Herman <ivan@w3.org>
- Date: Mon, 19 May 2014 16:00:11 +0200
- To: Andy Seaborne <andy@apache.org>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <836B654D-C17A-44FE-9542-4E85CAAED9D5@w3.org>
On 19 May 2014, at 15:24, Andy Seaborne <andy@apache.org> wrote:

> On 18/05/14 17:59, Ivan Herman wrote:
>>
>> Andy, Gregg & all
>>
>> on the call on Wednesday I suggested that, by putting the general
>> description of the conversion into (for now) the metadata document, it
>> may be necessary to restructure the current CSV2RDF document. I tried to
>> draft what a structure & algorithm would look like; I give you here what
>> I jotted down.
>
> Thank you.
>
>> Note that I rely on the fact that the template part would
>> migrate as a general mechanism somewhere; there seems to be an agreement
>> on this on the mailing list. I refer to it as a 'template' attribute in
>> the metadata.
>>
>> I think the changes should be in section 3 (see below). Following the
>> flow in the metadata document I put the 'table level metadata' into a
>> subsection of section 3, meaning section 4 in the current document can
>> disappear. I would remove section 5, because that should become a more
>> general topic, not bound to RDF; the minimal mapping is also part of
>> section 3. I am not sure about current section 7 (I think that should
>> move elsewhere). Note also that, I believe, this skeleton may be similar
>> for XML and JSON, but I did not check that.
>>
>> I believe that the three (or more, eventually, with JSON and XML)
>> relevant documents (syntax, metadata, and conversions) should be in
>> synchrony, and this before the next publication round...
>>
>> With that, this is what I had in mind:
>>
>> [[[
>> 3. Processing Model
>> 3.1 Conversion of core tabular data, or of data annotated with embedded
>> metadata only
>>
>> The file's URI is also used as a 'namespace' for URI-s in the generated
>> triples, by concatenating the URI with '#' and with the string for a
>> column name (denoted by ':name' in what follows)
>>
>> - this case either yields a header for each column; if not, :col1,
>> :col2, :col3, ... are defined
>> - the generation is done by
>>   - each row has the same subject, a new bnode (Bi)
>>   - each cell generates the triple (Bi :headerj "content of cell j")
>
> with number-like fields being numbers? (to follow what spreadsheets do)

Yep, in the current model (and in the document) I have not put any automatic datatype conversion. I guess this could be done for some of the very usual ones (numbers, anything else? maybe dates?). I am a little bit neutral on this, to be honest.

>
>>
>> Where :xxx means URIOFCSVFILE#xxx
>>
>> 3.2 Conversion of annotated tabular data
>>
>> 3.2.1 Table level metadata
>> The conversion uses the entries defined in section 3.1 of the metadata
>> document to generate table level metadata triples as follows:
>>
>> - @id is used as a subject for all table level metadata
>> - @type generates a (@id rdf:type @type) triple
>> - the fields defined by DC-TERMS are used directly, with @id as subject
>>
>> The @id is also used as a 'namespace' for URI-s in the generated
>> triples, by concatenating the URI with '#' and with the string for a
>> column name (denoted by ':name' in what follows)
>>
>> 3.2.2 Field level metadata
>> The conversion uses the entries defined in section 3.2 of the metadata
>> document to generate table level metadata triples using the steps below.
>
> For me, a template (in RDF conversion) is the template for one complete row's worth of conversion.
>
> I was thinking that if no template were explicitly given, the metadata would be used to define a template and the template be applied. We could have descriptive text about what happens when there is no user-defined template. Your outline seems to define the process when templateless separately from templates.
>
> Generating a template, if none provided, would keep the user-template driven mechanism and metadata-generated template mechanism in-step.
> It would be clear that they aren't alternatives with (potentially) capabilities in the direct route not in the template route. You could get the generated template and tweak it, for example.
>

I would need an example to understand what you mean...

> The part in common is escaping syntax. Building up URIs from fields in a row may involve URI query strings, URI path segments etc and these have slightly different rules for conversion from a character string to the URI form (e.g. spaces, use of ?, & and /).
>
>> The processing is based on general metadata attributes as defined in
>> that section; this specification adds one field level attribute:
>> 'rdf_predicate_type', which can take the value of 'object' or 'literal'
>> define
>> - each row generates a number of triples with a common subject. This
>> subject is
>>   - a new blank node for each row if no primaryKey attribute is defined, or
>>   - :field1-field2-...-fieldn, where fieldi are the (column) names
>> appearing in the value of the primaryKey attribute if that attribute
>> contains a list of names, or
>>   - :field, where field is the column name appearing as the value of the
>> primaryKey attribute
>> - for each cell in the field _that is not a primary key_, the following
>> triple is generated
>>   - subject is the subject defined for the row
>>   - predicate is :name, where 'name' is the value of the 'name'
>> attribute in the field descriptor (3.2.2 in the metadata spec)
>>   - object:
>>     - if the column is defined as a foreign key through the
>> 'foreignKeys' attribute, the object is an RDF URI Resource as defined by
>> the foreign key reference (3.2.7 in the metadata spec)
>
> I think the term 'foreign key' brings a lot of baggage with it such as foreign key constraints, and guarantees, especially any assumption about whether the link target exists or not.

Yes, I agree. I am not even sure we need those; after all, the metadata can generate URIs and can tell that the value should be taken to be a URI.
At the moment I just tried to align with the metadata document. It is a more general issue than RDF.

>
> I'd rather just talk about generating URIs as one "type", and reserve 'foreign key' for the case of a link within a group of tables converted together or associated in some way where a foreign key is highly likely to mean the target of the link exists.

Right. And I am not sure whether the case for several tables in one file is in scope...

>
>>     - otherwise, if a 'type' attribute is defined (3.2.4) then the
>> cell is converted into that type of literal (in case of date, this may
>> also use the 'format' attribute)
>>     - otherwise, if a 'template' attribute is defined, then the
>> template is used to generate a value; if the value of rdf_predicate_type
>> is missing or is set to 'literal', the object is an RDF literal of type
>> xsd:string; otherwise, the object is an RDF URI Resource.
>
> So templates for you are templating individual RDF objects? Jeremy's conversion example has a specific shape of RDF per row.

Yes. Of course, a specific template for a field can use the names of other fields, too. I modeled my thoughts on the R2RML approach which is also granular on fields.

>
>>     - otherwise, the value of the cell is used as an RDF literal of type
>> xsd:string
>>
>>
>> Some open issues:
>> - do we need to add a (rowid csv:row "rownumber") kind of triple for
>> each row; (probably yes)
>> - do we need to add a series of triples of the sort (@id csv:rows rowid)
>> for each row, to make a "bridge" between the graph overall and its
>> constituents? (It may not be all that important for RDF, but it may be
>> necessary for JSON.)
>
> Agreed - we probably want to define triples generated that tie the RDF back to the CSV input. Probably an "optional extra", since for a large CSV file there would be a significant increase in the size of the output.
>
>> ]]]
>>
>> Does this make sense?
>>
>> I may try to find some time editing the document, but it would be good to
>> have a minimal agreement from the group.
>>
>> Ivan
>
> Andy
>
>>
>>
>> ----
>> Ivan Herman, W3C
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153 <tel:+31-641044153>
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me
>>
>

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Monday, 19 May 2014 14:00:44 UTC
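
The minimal conversion outlined in draft section 3.1 of the message (one new blank node per row, one triple per cell, predicates formed as URIOFCSVFILE#name, and :col1, :col2, ... when there is no header) might be sketched roughly as below. This is only an illustration under stated assumptions, not the working group's algorithm: the function name `minimal_csv_to_triples`, the `has_header` flag, the `_:bN` blank node labels and the example URI are all hypothetical, triples are returned as plain Python tuples rather than through an RDF library, and, as in the draft, cell values stay plain strings with no datatype conversion and no URI escaping of column names.

```python
import csv
import io


def minimal_csv_to_triples(csv_text, file_uri, has_header=True):
    """Sketch of the draft 3.1 mapping: one blank node per row, one triple per cell.

    Hypothetical helper; returns (subject, predicate, object) string tuples.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []

    if has_header:
        names, data = rows[0], rows[1:]
    else:
        # No header row: fall back to col1, col2, ... as the draft suggests.
        names = [f"col{j}" for j in range(1, len(rows[0]) + 1)]
        data = rows

    triples = []
    for i, row in enumerate(data, start=1):
        subject = f"_:b{i}"  # assumed blank node label, one per row (Bi)
        for name, cell in zip(names, row):
            # Assumption: plain concatenation of file URI, '#', and column name.
            predicate = f"{file_uri}#{name}"
            # No datatype conversion: the cell content is kept as a string.
            triples.append((subject, predicate, cell))
    return triples


if __name__ == "__main__":
    sample = "name,age\nAnn,30\nBob,41\n"
    for triple in minimal_csv_to_triples(sample, "http://example.org/data.csv"):
        print(triple)
```

Keeping the cell values as strings mirrors the reply above that no automatic datatype conversion is in the current model; treating number-like fields as numbers would be a separate, later step.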