- From: Ivan Herman <ivan@w3.org>
- Date: Sun, 27 Apr 2014 14:26:52 +0200
- To: Andy Seaborne <andy@apache.org>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <E13084BB-8E94-4F31-BE96-143FAA21E583@w3.org>
On 27 Apr 2014, at 12:56 , Andy Seaborne <andy@apache.org> wrote:

> On 23/04/14 16:13, Ivan Herman wrote:
>> (To avoid any misunderstandings, I looked at
>> http://w3c.github.io/csvw/csv2rdf/)
>
>> I am o.k. with the general approach, and with the level of
>> simplicity/complexity of the templates. I would probably want each
>> feature in the templates to be backed up with a reasonable use case
>> (ideally, a use case in real use), but the 'melody', as is documented
>> now, is fine to me. My litmus test is whether the mapping is
>> implementable in simple and small JS library running on client side
>> (not exclusively there, but also there). I think this is essential if
>> we want any acceptance of this by client side web apps, ie, if we
>> want to maintain a minimal level of hope that client side
>> applications would use this:-).
>
> FWIW My litmus test is bulk conversion of large CSV files, (e.g. inside a DB loading pipeline).
>

Fair enough...

>>
>> For the syntax question: I think my litmus test also means that a
>> JSON syntax is almost a must:
>
> The doc is "CSV2RDF" :-)
>
> Did you have in mind that your small JS library is working in the RDF data model or JSON? So while I agree JSON is a "must" for the WG, for your case, the CSV->JSON is the need. This doc you reviewed may not be the one you want.
>
> Maybe we end up with a lot of sharing (good) but we don't know yet.

By 'syntax' I meant the syntax used for the templates themselves. Ie, still a CSV->RDF (whether producing turtle or JSON-LD is a secondary issue at this point).

>
>> I do not expect anybody to start
>> writing a turtle parser in JS for the purpose of an RDF mapping. The
>> template seems to be fairly simple and probably has a straightforward
>> description in JSON, ie, I do not believe that to be an issue...
>
> When I read the template description, I thought of it as text processing, not parsing as Turtle.

Well, if the templates are themselves described in Turtle, then I would need a Turtle parser to be able to use them. That was my problem... Ie, I think using JSON for the definition of the templates would be better.

Ivan

>
> The process to produce and output file/stream by text processing, not data structure manipulation. Hence sharing with a JSON conversion is potentially there.
>
>>
>> ---
>>
>> The templates are on rows on columns, which presupposes a homogeneity
>> of the table; again, I would want to check that against use cases. In
>> particular, I wonder whether the templates that sets the language tag
>> for a whole column is o.k. (e.g., if the column is something like
>> 'native name' for cities, then each cell may have a different
>> language tag; I am not sure how we would handle that.)
>>
>> ---
>>
>> From a more general point of view, an obvious issue on which we will
>> have to give an answer to is the relationship of the template
>> language to R2RML. As far as I could see, the features in the current
>> template language are an almost strict subset of R2RML (I am not sure
>> about the datatype mappings; R2RML makes use of SQL datatypes which
>> we do not want to refer to).
>>
>> That being said, if we just referred to R2RML in our spec we would
>> scare away a lot of people; meaning that we should probably not do
>> it. However, a precise mapping to R2RML may still be necessary to be
>> written down in the document, in case somebody want to use an
>> existing R2RML engine. We should also check that the simple
>> (template-less) mapping is similarly a subset to Direct Mapping, and
>> document that
>>
>> ---
>>
>> I was also wondering on the call, whether the template is RDF
>> specific, or whether at least the general direction could be reused
>> for a JSON mapping or, if needed, XML. I guess this is certainly true
>> for JSON: the templates to use the right predicate names can be
>> reused to generate the keys, for example. But I have not done a
>> detailed analysis on this, and there are, almost surely, RDF specific
>> features. But we should probably try to factor out the common parts.
>>
>> (Of course, there is a question whether we need a separate JSON, or
>> whether the current mapping would simply produce JSON-LD, ie, JSON. I
>> am a little bit afraid of the RDF features, like blank nodes or
>> @type, transpire into generic JSON which people may not want...
>>
>> ---
>>
>> Minor issue: the automatic numbering/naming of predicates should take
>> into account RTL writing direction, see Yakov's examples for CSV
>> files in Arabic or Hebrew...
>
> One possibility is to define a "canonicalization" step, which is CSV to Tabular Data Model that puts the CSV into some sort of expected form.
>
> This step would include data cleaning and generally fixing things, dealing with alternative separator, dealing with new lines, and could be the place to deal with RTL.
>
> Andy
>
>>
>> Ivan
>>
>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
>> FOAF: http://www.ivan-herman.net/foaf

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Sunday, 27 April 2014 12:27:23 UTC