- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Mon, 3 Feb 2014 11:45:20 -0800
- To: Ivan Herman <ivan@w3.org>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
On Feb 3, 2014, at 6:50 AM, Ivan Herman <ivan@w3.org> wrote:

> Hey Gregg,
>
> - A clarification please... In the section on Table Join representation[1] you say 'Data such as this does not readily transform to JSON-LD'. I want to understand this better.

What I meant was that such data does not readily transform to JSON-LD using a single node definition with one node per row, as it contains data from multiple entities. Also, JSON-LD keyword aliases allow for multiple aliases to represent the same keyword (e.g. doap_id and foaf_id both are aliases for @id), but when transforming back from JSON-LD only one of these will be selected (the shortest and lexicographically first). This is the motivation for the entity mapping section.

> It is correct, isn't it, that you can transform that into a set of JSON-LD objects, one row per object (in RDF terms, a row into a set of properties having a common bnode subject, each row being a different one). I guess what you mean is that, in ideal terms, you want a mapping resulting in what you describe in the Entity mapping section[2], ie, making use of the fact that these have similar subjects.

Yes; certainly some transformation to an object with a bnode subject would be possible, just not very useful IMO. This is why I suggest entity mapping as a way of recovering the entities described in a single row. There are certainly pathological use cases, but this may constrain the form of the CSV transformations that can be performed. For example, the table shown did not contain an equivalent doap:developer column, which is necessary when relating the DOAP properties to the FOAF properties in the same row; however, this could be inferred in the entity mapping.
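As a rough illustration of the entity mapping idea described above, the following Python sketch splits one joined row into two nodes, one per identifier column, and infers the doap:developer link between them. The sample data, the example.org URIs, and the helper names row_to_entities and csv_to_graph are assumptions made for illustration; they are not taken from the CSV-LD wiki page, and this is not the frame-based mechanism the proposal itself describes.

```python
import csv
import io

# Hypothetical joined table; column names and values are illustrative only.
CSV_TEXT = """doap_id,doap_name,foaf_id,foaf_name
http://example.org/project/a,Project A,http://example.org/people/alice,Alice
"""

# Assumed convention: each column prefix names one entity in the row.
ENTITY_PREFIXES = ("doap", "foaf")


def row_to_entities(row):
    """Split one joined row into one node per identifier column."""
    entities = {}
    for prefix in ENTITY_PREFIXES:
        id_column = prefix + "_id"
        node = {"@id": row[id_column]}
        for column, value in row.items():
            if column.startswith(prefix + "_") and column != id_column:
                # e.g. doap_name becomes the property doap:name
                node[prefix + ":" + column[len(prefix) + 1:]] = value
        entities[prefix] = node
    # The table has no doap:developer column, so the relation between
    # the two entities is inferred by the mapping itself.
    entities["doap"]["doap:developer"] = {"@id": entities["foaf"]["@id"]}
    return [entities["doap"], entities["foaf"]]


def csv_to_graph(text):
    """Collect the nodes from every row into a JSON-LD style @graph."""
    nodes = []
    for row in csv.DictReader(io.StringIO(text)):
        nodes.extend(row_to_entities(row))
    return {"@graph": nodes}
```

The result would still need an @context that defines the doap and foaf prefixes before a JSON-LD processor could turn it into RDF.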
> The similar issue in the RDB Direct Mapping spec[3] is taken care of by the fact that, in a relational database, one may have a primary key; in the direct mapping, if there is a primary key in a table, that (well, a URI representation thereof) will be chosen as the common subject (instead of a blank node). Isn't this what you are looking for?

Well, in a CSV context, it might be hard to distinguish this from data in a single row. From an RDF perspective, if two different tables (or rows) had the same primary identifier, then they would denote the same entity. The use case I was noting is when a row has multiple columns which are identifiers for a subset of the columns within the row; for example, doap_id and foaf_id are each identifiers, with the doap_* and foaf_* columns being apportioned to one or the other. The JSON-LD frame in the entity mapping example defines this mapping. Certainly, one of the identifier columns may be more _primary_ than the other, that likely being the left-hand side of a join.

> Putting this into this context, I believe the issue is what will the metadata of the CSV file contain; that metadata (whose definition is one of the goals of this WG!) may do exactly that: designate one column as the 'primary key'. Once that is done, mapping into JSON (but, probably, XML and of course RDF) becomes way more obvious.

Yes, but that still leaves determining that one entity "contains" the other.

> Is this what you call an 'entity mapping' to JSON(-LD)?

Yes.

> - As for the general approach: I think there are similarities to the mapping of JSON, XML, and RDF that we have to exploit. I would probably look at [3] for a general line of thoughts, which may be moderated by some metadata (nothing as complex as R2RML[4], though) like the primary key above.
> I would leave XML aside for a moment; I guess what would be very important for our users is indeed, as you propose, to map the CSV file on JSON but following as much as we can the JSON-LD structures, so that the result can be turned into RDF if necessary by a suitable @context (and that @context may also be generated through the metadata). Ie, if somebody just wants JSON and does not even want to utter the term 'RDF', then that is fine, he/she can use JSON; if somebody wants RDF for whatever reasons, then, say, the @context+JSON -> Turtle mapping is already provided by current specifications.

Yes, exactly. The lesson of JSON-LD is that you can create a format (or transformation, in this case) which appeals to developers as is, without requiring them to buy into the whole RDF ecosystem. This is why I thought the term CSV-LD useful in invoking the same developer-friendly view of turning CSV into structured data. Of course, JSON could also be turned into XML without going through RDF.

Although I don't expect to attend telecons directly, if people would like to discuss this further in a telecon, I will of course make myself available.

Gregg

> Thx
>
> Ivan
>
> [1] https://www.w3.org/2013/csvw/wiki/CSV-LD#Table_Join_representation
> [2] https://www.w3.org/2013/csvw/wiki/CSV-LD#Entity_Mapping
> [3] http://www.w3.org/TR/rdb-direct-mapping/
> [4] http://www.w3.org/TR/r2rml/
>
> On 01 Feb 2014, at 02:52 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
>
>> I added a proposal for something I call CSV-LD to the wiki [1]. As the name might suggest, this is strongly tied to JSON-LD, and uses JSON-LD context and frame definitions both to provide meaning to CSV, allowing it to be losslessly transformed to JSON-LD, and to create CSV from JSON-LD (with or without embedding).
>>
>> Consider this a straw-man proposal.
>> It does lay out some use cases that are generally useful (and perhaps should be copied to other pages on the wiki), but there may be more use cases to consider. IMO, creating a specification for this, and extending an existing JSON-LD implementation to support this would not be too difficult.
>>
>> Gregg Kellogg
>> gregg@greggkellogg.net
>>
>> [1] https://www.w3.org/2013/csvw/wiki/CSV-LD
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> FOAF: http://www.ivan-herman.net/foaf
Received on Monday, 3 February 2014 19:45:50 UTC