- From: Alfredo Serafini <seralf@gmail.com>
- Date: Thu, 24 Apr 2014 13:50:23 +0200
- To: Innovimax W3C <innovimax+w3c@gmail.com>
- Cc: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@jenitennison.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-ID: <CADawF4MvexAV=F7ecRvfRNis4Q7eZeRFFh5DgUgF=x7ZUP-kfw@mail.gmail.com>
Hi,

given a default mapping, I would use the combination of options 3 and 4
to plug in specific components designed for small changes
(aggregating/disaggregating fields, character normalization, and so on),
or even for a completely new mapping. This way the standard workflow
will not break too much, and it stays open to very different
technologies.

For the XML part, I strongly suggest avoiding too many specific
attributes and, on the other hand, fixing the ID as an attribute: this
could make it easy to obtain the back mapping with query languages like
XPath, for example.

Alfredo

2014-04-24 13:42 GMT+02:00 Innovimax W3C <innovimax+w3c@gmail.com>:

> Then sorry
>
> I thought the question was about architecture
>
> Regards,
>
> Mohamed
>
> On Thu, Apr 24, 2014 at 1:40 PM, Ivan Herman <ivan@w3.org> wrote:
> > I still do not get it.
> >
> > GRDDL is a way to tell an XML (including XHTML) processor: "here is
> > an XSLT file that you can use to transform this XML file into RDF".
> >
> > What we may provide is a reference to an XSLT file that may say "if
> > the CSV file is transformed into XML, here is an XSLT file that you
> > can use to massage the result to produce another XML file". There is
> > no mention of RDF in there. So, while there is a vague resemblance
> > to GRDDL, I think referring to GRDDL might only muddy the waters :-(
> >
> > Ivan
> >
> > On 24 Apr 2014, at 13:33 , Innovimax W3C <innovimax+w3c@gmail.com> wrote:
> >
> >> Sure!
> >>
> >> But the tool we will end up providing will be in the family of
> >> "** -> RDF", which includes GRDDL.
> >> The same will apply if we do CSV -> XML: we will have to deal with
> >> the XSLT and XQuery Serialization spec, for example
> >>
> >> Regards,
> >>
> >> Mohamed
> >>
> >> On Thu, Apr 24, 2014 at 1:14 PM, Ivan Herman <ivan@w3.org> wrote:
> >>> I am not absolutely sure whether it is indeed relevant. GRDDL is a
> >>> way to associate an XSLT style sheet to an XML file to transform
> >>> it into RDF. I.e., it is a tool (alas! almost not in use in
> >>> practice) for XML->RDF, which is not part of this charter...
> >>>
> >>> Ivan
> >>>
> >>> On 24 Apr 2014, at 12:52 , Innovimax W3C <innovimax+w3c@gmail.com> wrote:
> >>>
> >>>> Dear all,
> >>>>
> >>>> Just a side note perhaps, but we already have some existing
> >>>> material, which is GRDDL [1]
> >>>>
> >>>> I was surprised that it was not mentioned in the charter
> >>>>
> >>>> It would be good to keep GRDDL in mind with respect to answering
> >>>> that question in order to keep the link with existing W3C
> >>>> Specifications
> >>>>
> >>>> Thanks
> >>>>
> >>>> Mohamed
> >>>>
> >>>> [1] http://www.w3.org/TR/grddl/
> >>>>
> >>>> On Wed, Apr 23, 2014 at 9:13 PM, Jeni Tennison <jeni@jenitennison.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On the call today we discussed briefly the general architecture
> >>>>> of mapping from CSV to other formats (eg RDF, JSON, XML, SQL),
> >>>>> specifically where to draw the lines between what we specify and
> >>>>> what is specified elsewhere.
> >>>>>
> >>>>> To make this clear with an XML-based example, suppose that we
> >>>>> have a CSV file like:
> >>>>>
> >>>>> GID,On Street,Species,Trim Cycle,Inventory Date
> >>>>> 1,ADDISON AV,Celtis australis,Large Tree Routine Prune,10/18/2010
> >>>>> 2,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
> >>>>> 3,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
> >>>>>
> >>>>> This will have a basic mapping into XML which might look like:
> >>>>>
> >>>>> <data>
> >>>>>   <row>
> >>>>>     <GID>1</GID>
> >>>>>     <On_Street>ADDISON AV</On_Street>
> >>>>>     <Species>Celtis australis</Species>
> >>>>>     <Trim_Cycle>Large Tree Routine Prune</Trim_Cycle>
> >>>>>     <Inventory_Date>10/18/2010</Inventory_Date>
> >>>>>   </row>
> >>>>>   ...
> >>>>> </data>
> >>>>>
> >>>>> But the XML that someone actually wants the CSV to map into
> >>>>> might be different:
> >>>>>
> >>>>> <trees>
> >>>>>   <tree id="1" date="2010-10-18">
> >>>>>     <street>ADDISON AV</street>
> >>>>>     <species>Celtis australis</species>
> >>>>>     <trim>Large Tree Routine Prune</trim>
> >>>>>   </tree>
> >>>>>   ...
> >>>>> </trees>
> >>>>>
> >>>>> There are (at least) four different ways of architecting this:
> >>>>>
> >>>>> 1. We just specify the default mapping; people who want a more
> >>>>> complex mapping can plug that into their own toolchains. The
> >>>>> disadvantage of this is that it makes it harder for the original
> >>>>> publisher to specify canonical mappings from CSV into other
> >>>>> formats. It also requires people to know how to use a larger
> >>>>> toolchain (but I think they probably have that anyway).
> >>>>>
> >>>>> 2. We enable people to point from the metadata about the CSV
> >>>>> file to an ‘executable’ file that defines the mapping (eg to an
> >>>>> XSLT stylesheet or a SPARQL CONSTRUCT query or a Turtle template
> >>>>> or a Javascript module) and define how that gets used to perform
> >>>>> the mapping. This gives great flexibility but means that
> >>>>> everyone needs to hand craft common patterns of mapping, such as
> >>>>> of numeric or date formats into numbers or dates. It also means
> >>>>> that processors have to support whatever executable syntax is
> >>>>> defined for the different mappings.
> >>>>>
> >>>>> 3. We provide specific declarative metadata vocabulary fields
> >>>>> that enable configuration of the mapping. For example, each
> >>>>> column might have an associated ‘xml-name’ and ‘xml-type’
> >>>>> (element or attribute), as well as (more usefully across all
> >>>>> mappings) ‘datatype’ and ‘date-format’. This gives a fair amount
> >>>>> of control within a single file.
> >>>>>
> >>>>> 4. We have some combination of #2 & #3 whereby some things are
> >>>>> configurable declaratively in the metadata file, but there’s an
> >>>>> “escape hatch” of referencing out to an executable file that can
> >>>>> override. The question is then about where the lines should be
> >>>>> drawn: how much should be in the metadata vocabulary (3) and how
> >>>>> much left to specific configuration (2).
> >>>>>
> >>>>> My inclination is to aim for #4. I also think we should try to
> >>>>> reuse existing mechanisms for the mapping as much as possible,
> >>>>> and try to focus initially on metadata vocabulary fields that
> >>>>> are useful across use cases (ie not just mapping to different
> >>>>> formats but also in validation and documentation of CSVs).
> >>>>>
> >>>>> What do other people think?
> >>>>>
> >>>>> Jeni
> >>>>> --
> >>>>> Jeni Tennison
> >>>>> http://www.jenitennison.com/
> >>>>>
> >>>>
> >>>> --
> >>>> Innovimax SARL
> >>>> Consulting, Training & XML Development
> >>>> 9, impasse des Orteaux
> >>>> 75020 Paris
> >>>> Tel : +33 9 52 475787
> >>>> Fax : +33 1 4356 1746
> >>>> http://www.innovimax.fr
> >>>> RCS Paris 488.018.631
> >>>> SARL au capital de 10.000 €
> >>>>
> >>>
> >>> ----
> >>> Ivan Herman, W3C
> >>> Digital Publishing Activity Lead
> >>> Home: http://www.w3.org/People/Ivan/
> >>> mobile: +31-641044153
> >>> GPG: 0x343F1A3D
> >>> FOAF: http://www.ivan-herman.net/foaf
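
To make the "default mapping plus options 3/4" idea from this thread concrete, here is a minimal Python sketch. It is only an illustration, not anything the group has specified: the column keys 'xml-name', 'xml-type' and 'date-format' come from Jeni's message, but the dictionary layout, the 'root' and 'row' keys, the use of strptime patterns for 'date-format', and the names csv_to_xml, override and normalise_street are all assumptions made for this example. The last lines show the kind of XPath-style back lookup that a fixed ID attribute allows.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime

CSV_TEXT = """GID,On Street,Species,Trim Cycle,Inventory Date
1,ADDISON AV,Celtis australis,Large Tree Routine Prune,10/18/2010
2,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
3,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
"""

# Hypothetical declarative metadata (option 3). Only the per-column key
# names are taken from the thread; the overall layout is an assumption.
METADATA = {
    "root": "trees",
    "row": "tree",
    "columns": {
        "GID": {"xml-name": "id", "xml-type": "attribute"},
        "On Street": {"xml-name": "street", "xml-type": "element"},
        "Species": {"xml-name": "species", "xml-type": "element"},
        "Trim Cycle": {"xml-name": "trim", "xml-type": "element"},
        "Inventory Date": {"xml-name": "date", "xml-type": "attribute",
                           "date-format": "%m/%d/%Y"},
    },
}


def csv_to_xml(text, metadata=None, override=None):
    """Map CSV text to an XML tree.

    With no metadata this is the default mapping (<data>/<row>, one child
    element per column). Metadata configures names, element vs. attribute,
    and date parsing. `override` is the hypothetical escape hatch
    (option 4): a callable that may rewrite each generated row.
    """
    metadata = metadata or {}
    columns = metadata.get("columns", {})
    root = ET.Element(metadata.get("root", "data"))
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, metadata.get("row", "row"))
        for header, value in record.items():
            spec = columns.get(header, {})
            # Default element name: header with spaces turned into "_".
            name = spec.get("xml-name", header.replace(" ", "_"))
            if "date-format" in spec:
                parsed = datetime.strptime(value, spec["date-format"])
                value = parsed.date().isoformat()
            if spec.get("xml-type") == "attribute":
                row.set(name, value)
            else:
                ET.SubElement(row, name).text = value
        if override is not None:
            override(row, record)
    return root


def normalise_street(row, record):
    """Example escape hatch: a small character normalisation step."""
    street = row.find("street")
    if street is not None:
        street.text = street.text.title()


default_tree = csv_to_xml(CSV_TEXT)
custom_tree = csv_to_xml(CSV_TEXT, METADATA, override=normalise_street)
print(ET.tostring(custom_tree, encoding="unicode"))

# Back mapping: with the ID fixed as an attribute, one path expression
# is enough to get from a CSV row ID to the corresponding XML node.
print(custom_tree.find("tree[@id='2']/species").text)
```

The design choice illustrated here is that the declarative part covers the common cases (renaming, element versus attribute, date formats), while the callable only touches what the vocabulary cannot express, so the standard workflow stays the same whichever technology supplies the override.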
Received on Thursday, 24 April 2014 11:50:51 UTC