- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Thu, 20 Feb 2014 12:30:11 -0600
- To: Gregg Kellogg <gregg@greggkellogg.net>
- Cc: Ivan Herman <ivan@w3.org>, Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
- Message-ID: <CAMVTWDwUgEvfFsW6JYm_x1bOtBLCDtrotS5x1us0=QewBBP_xg@mail.gmail.com>
Quick reaction: why wasn't JSON-LD around when we defined R2RML? :)

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Thu, Feb 20, 2014 at 12:18 PM, Gregg Kellogg <gregg@greggkellogg.net> wrote:

> On Feb 20, 2014, at 8:17 AM, Ivan Herman <ivan@w3.org> wrote:
>
> >
> > Andy Seaborne wrote:
> > <snip/>
> >>>>
> >>>> Here's a contribution:
> >>>>
> >>>> ----------------------------
> >>>> "Sales Region","Quarter","Sales"
> >>>> "North","Q1",10
> >>>> "North","Q2",15
> >>>> "North","Q3",7
> >>>> "North","Q4",25
> >>>> "South","Q1",9
> >>>> "South","Q2",15
> >>>> "South","Q3",16
> >>>> "South","Q4",31
> >>>> ----------------------------
> >>>>
> >>>> There are two sales regions, each with 4 sales results.
> >>>>
> >>>> This needs some kind of term resolution to turn e.g. "North" into a URI
> >>>> for the northern sales region. It could be done by an external lookup or
> >>>> by a URI template as in R2RML. External lookup gives better linking.
> >>>>
> >>>> Defining "views" may help in replacing the SQL with something.
> >>>>
> >>>> In this example, what would be the subject?
> >>
> >> While we could use the row number as the basis of the primary key, I think
> >> that *may* lead to low-value data.
> >
> > Yes, I obviously agree.
> >
> >> Just because you can convert a data table to some RDF, if the URIs are all
> >> locally generated, I'm not sure there is strong value in a standard here.
> >
> > Well, yes, that is true. *If* I was using the R2RML approach, that is
> > (also) where it would come in and assign some URIs.
> >
> > But I am afraid R2RML would be too complicated for many of our expected
> > audience here. That is why I followed in Gregg's footsteps and pushed
> > JSON-LD to the forefront (which is not necessarily the "kosher" thing to do
> > in terms of RDF, i.e., mixing syntax with model...). And the reason is that
> > JSON-LD has, in fact, a mini-R2RML built into the system: the @context.
> > (That is what makes it unique among serializations. I wish we had kept the
> > idea for RDFa, but that is water under the bridge now.)
> >
> > I.e., if the data publisher can also provide a @context in some metadata
> > format, then the two together may map the local names to global URIs
> > easily.
>
> That's exactly the point, although some amount of metadata beyond what
> JSON-LD provides is likely necessary to handle more real-world use cases
> (such as composite primary keys).
>
> >> In this example we would ideally use "North" to resolve to a URI in the
> >> corporate data dictionary, because the "Sales Region" column is known to
> >> be a key (inverse functional property).
> >>
> >> "North" need not appear in the output.
> >>
> >> Given:
> >>
> >>   prefix corp: <http://mycorp/globalDataDictionary/>
> >>
> >>   corp:region1 :Name "North" .
> >>   corp:region2 :Name "South" .
> >>
> >> We might get from row one:
> >>
> >>   corp:region1 :sales [ :period "Q1" ; :value 10 ] .
> >>
> >> (including a blank node - a separate discussion! - let's use generated ids
> >> for now:)
> >>
> >>   corp:region1 :sales gen:57 .
> >>   gen:57 :period "Q1" ;
> >>          :value 10 .
> >>
> >> or a different style:
> >>
> >>   <http://corp/file/row1>
> >>       :region corp:region1 ;
> >>       :period "Q1" ;
> >>       :sales  10 .
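
(For concreteness, the "URI template as in R2RML" route Andy mentions might look something like the sketch below. It is only a sketch: it assumes the CSV has been loaded as a SQL table named SALES with columns SalesRegion, Quarter and Sales, and the ex: properties are illustrative placeholders rather than anything defined in this thread. Unlike the external lookup, a template simply mints region IRIs from the cell value, e.g. .../region/North.)

  @prefix rr: <http://www.w3.org/ns/r2rml#> .
  @prefix ex: <http://example/> .

  # Sketch only: the table name "SALES", the column names and the ex: terms
  # are assumptions made for illustration.
  <#SalesMapping>
      rr:logicalTable  [ rr:tableName "SALES" ] ;
      rr:subjectMap    [ rr:template
          "http://mycorp/globalDataDictionary/region/{SalesRegion}" ] ;
      rr:predicateObjectMap
          [ rr:predicate ex:period  ; rr:objectMap [ rr:column "Quarter" ] ] ,
          [ rr:predicate ex:revenue ; rr:objectMap [ rr:column "Sales" ] ] .
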
> >>
> >
> > I think that if I follow a simple JSON mapping plus declaring the
> > "Sales Region" as, sort of, primary, I can get to something in JSON-LD like
> >
> > {
> >   "North" :
> >     [ {
> >         "quarter" : "Q1",
> >         "sales" : 10
> >       },
> >       {
> >         "quarter" : "Q2",
> >         "sales" : 15
> >       } ],
> >   "South" :
> >     [ .... ]
> > }
> >
> > (I am just making up a simple 'CSV direct mapping') which, with a suitable
> > @context, could then be transformed into RDF like:
> >
> > [
> >   <http://corp/region/north>
> >     [ <http://corp/quarter> : "Q1", <http://corp/sales> : 10 ],
> >     [ <http://corp/quarter> : "Q2", <http://corp/sales> : 15 ].
> >   <http://corp/region/south>
> >   ...
> > ]
> >
> > Yep, a bunch of blank nodes; let us put that aside for a moment. (I hope I
> > got it right, Gregg can correct me if I am wrong.)
> >
> > It is probably not exactly the Direct Mapping but, well, so be it. We have
> > to do things that the community can really use easily. (I think the direct
> > mapping would mean having separate objects identified by row numbers,
> > right Juan?)
>
> If the "Sales Region" is used to create an identifier, then you could get
> something like that. In this case, though, you might want to map Sales
> Region to something like dc:title and assert that it is unique, in some
> way, so that a BNode is allocated for it. This might be done implicitly
> given a chained representation such as the following:
>
> {
>   "@context": {
>     "dc": "http://purl.org/dc/terms/",
>     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>     "ex": "http://example/",
>     "Sales Region": "dc:title",
>     "Quarter": "dc:title",
>     "Sales": "ex:revenue"
>   },
>   "@type": "ex:SalesRegion",
>   "Sales Region": null,
>   "ex:period": {
>     "@type": "ex:SalesPeriod",
>     "Quarter": null,
>     "Sales": null
>   }
> }
>
> This would result in something like the following:
>
> [] a ex:SalesRegion;
>    dc:title "North";
>    ex:period
>      [ a ex:SalesPeriod; dc:title "Q1"; ex:revenue 10],
>      [ a ex:SalesPeriod; dc:title "Q2"; ex:revenue 15],
>      [ a ex:SalesPeriod; dc:title "Q3"; ex:revenue 7],
>      [ a ex:SalesPeriod; dc:title "Q4"; ex:revenue 25] .
>
> [] a ex:SalesRegion;
>    dc:title "South";
>    ex:period
>      [ a ex:SalesPeriod; dc:title "Q1"; ex:revenue 9],
>      [ a ex:SalesPeriod; dc:title "Q2"; ex:revenue 15],
>      [ a ex:SalesPeriod; dc:title "Q3"; ex:revenue 16],
>      [ a ex:SalesPeriod; dc:title "Q4"; ex:revenue 31] .
>
> It may be that in some cases we want to map one column to two properties,
> for example to create both a relative IRI subject and a title based on
> Sales Region and Quarter.
>
> Gregg
>
> >> In my limited exposure to R2RML usage, the majority has been direct
> >> mapping, with the app (SPARQL queries) directly and crudely pulling values
> >> out of the data. There is no RDF-to-RDF uplifting. It seems to be caused
> >> by the need for upfront investment and the mixing of responsibilities
> >> between access and modelling.
> >>
> >> The better, full mapping language of R2RML does not get the investment
> >> (quality of tools seems to be an issue - too much expectation of free open
> >> source, maybe?).
> >
> > Yes, I think the scenario described by Juan is realistic, and I actually
> > visited a company called Antidot (in France) a while ago who did that big
> > time. They used the Direct Mapping to get a clear image of the RDB
> > structure...
> >
> > The 'uplifting' issue is a real thorn in my side. The Direct Mapping really
> > works only if one can rely on a good RDF rule engine. And we do not have
> > that, which is a real shame...
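
(The 'uplifting' Ivan and Andy refer to is essentially an RDF-to-RDF rewrite of direct-mapping output into the corporate vocabulary. A minimal sketch as a SPARQL CONSTRUCT, with the caveat that everything below is assumed for illustration: the per-row predicates file:salesRegion, file:quarter and file:sales stand in for whatever a CSV direct mapping would actually generate, corp:Name is taken as the lookup property of Andy's data dictionary, and ex: is again a placeholder namespace.)

  PREFIX file: <http://corp/file/>
  PREFIX corp: <http://mycorp/globalDataDictionary/>
  PREFIX ex:   <http://example/>

  # Re-attach each row's figures to the corporate region resource found by
  # looking the region label up in the data dictionary; the label itself
  # ("North", "South") then no longer needs to appear in the output.
  CONSTRUCT {
    ?region ex:period [ ex:quarter ?quarter ; ex:revenue ?sales ] .
  }
  WHERE {
    ?row file:salesRegion ?name ;
         file:quarter     ?quarter ;
         file:sales       ?sales .
    ?region corp:Name ?name .
  }
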
> >
> >> Being "devil's advocate" here ...
> >> I do wonder if the WG really does need to produce a *standardised* CSV to
> >> RDF mapping, or whether the most important part is to add the best
> >> metadata to the CSV file and let different approaches flourish.
> >
> > There is no doubt in my mind that the most important part of the job of
> > this WG is to define the right metadata (and a way to find that metadata).
> > I think we can define a simple mapping to JSON/RDF/XML and yes, you are
> > right, it will not be a universal solution that will make everybody happy.
> > I.e., in some cases, people will have to do different things using that
> > metadata. But I think it is possible to cover, hm, at least a 60/40 if not
> > 80/20 range...
> >
> > Ivan
> >
> >> This is based on looking at the roles and responsibilities in the
> >> publishing chain: the publisher provides CSV files and the metadata - do
> >> they provide the RDF processing algorithm as well? Or does that involve
> >> consideration by the data consumer on how they intend to use the tabular
> >> data?
> >>
> >>   Andy
> >>
> > <snip/>
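
(Looking back at Ivan's made-up 'CSV direct mapping' JSON, the "suitable @context" he mentions might look something like the minimal sketch below; the http://corp/... URIs are simply the ones he invented. Read with this context, each region key becomes a property linking an outer blank node to blank nodes carrying the quarter and sales values, roughly the "bunch of blank nodes" he describes. Note that every distinct region value needs its own term definition, which is exactly the kind of extra metadata, beyond what plain JSON-LD offers, that Gregg says is likely necessary.)

  {
    "@context": {
      "North":   "http://corp/region/north",
      "South":   "http://corp/region/south",
      "quarter": "http://corp/quarter",
      "sales":   "http://corp/sales"
    }
  }
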
Received on Thursday, 20 February 2014 18:30:59 UTC