Re: moving the use case document to FPWD

Hi Jeremy, Eric, all,

I have one last possible UC addition, which resulted (almost accidentally) from some work I completed over the past few days.
I haven't added it to the doc yet, because from the last emails it looked like the doc is in stable form, so I preferred to share it here first.
However, if you agree that it could be useful (for instance, it touches on the "size problem" discussed during the last telco), I'll polish it and add it now.
Below you can find a draft version.
Cheers,

Davide

-------------

In the <a href="http://challenges.2014.eswc-conferences.org/index.php/RecSys">ESWC-14 Challenge: Linked Open Data-enabled Recommender Systems</a>, participants are provided with two book datasets in TSV format.
The first dataset contains a set of users, each with their ratings for a number of books:

DBbook_userID DBbook_itemID rate
{snip}
6873 5950 1
6873 8010 1
6873 5232 1
{snip}

A second file provides a mapping between book IDs and DBpedia URIs:

DBbook_ItemID name DBpedia_uri
{snip}
1 Dragonfly in Amber http://dbpedia.org/resource/Dragonfly_in_Amber
10 Unicorn Variations http://dbpedia.org/resource/Unicorn_Variations
100 A Stranger in the Mirror http://dbpedia.org/resource/A_Stranger_in_the_Mirror
1000 At All Costs http://dbpedia.org/resource/At_All_Costs
{snip}

The challenge requires participants to estimate the relevance or the score (depending on the task) that users would attribute to a set of books listed in an evaluation dataset:

DBbook_userID DBbook_itemID
{snip}
6873 5946
6873 5229
6873 3151
{snip}

The challenge requires the use of Linked Open Data resources in the recommendations.
Ontology patterns are an effective tool for this task.
An ontology pattern represents a sequence of types and properties that links two items, for instance:

{Book1 property1 Object1 property2 Book2}

This sequence results from considering the triples (subject-predicate-object) in the resource (e.g. DBpedia), independently of their direction.
So the sequence above may result from the following triples:

Book1 property1 Object1
Book2 property2 Object1
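
The direction-independence can be illustrated with a small sketch in Python (the triples, names, and helper function are illustrative, not part of the challenge):

```python
# Directed triples from the source (illustrative data, matching the
# example above).
triples = [
    ("Book1", "property1", "Object1"),
    ("Book2", "property2", "Object1"),
]

def undirected_patterns(triples, start, end):
    """Find length-2 patterns linking start and end, treating each
    triple as an undirected edge between its subject and object."""
    # Build an adjacency map that ignores triple direction.
    edges = {}
    for s, p, o in triples:
        edges.setdefault(s, []).append((p, o))
        edges.setdefault(o, []).append((p, s))
    patterns = []
    for p1, mid in edges.get(start, []):
        for p2, other in edges.get(mid, []):
            if other == end:
                patterns.append((p1, mid, p2))
    return patterns

print(undirected_patterns(triples, "Book1", "Book2"))
# → [('property1', 'Object1', 'property2')]
```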

To find all the semantic patterns that link two items, it is necessary to run a set of SPARQL queries over the resource, in order to cover all the possibilities (the source contains directed triples, while the pattern is undirected).
Because of the combinatorial complexity of the queries and the latency of their execution (due to the size of the source), it may be useful to cache the results.
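
To make the combinatorics concrete: for a pattern of length two, each step may be traversed in either direction, so four query variants are needed per pair of books. A minimal sketch in Python (the query shapes and function name are my own illustration; in practice one would also retrieve the type of the intermediate node):

```python
from itertools import product

def pattern_queries(book1, book2):
    """Generate the four SPARQL query variants needed to find all
    length-2 undirected patterns linking book1 and book2: each of
    the two steps can point 'forward' or 'backward', so we
    enumerate every combination of directions."""
    queries = []
    for d1, d2 in product(("fwd", "bwd"), repeat=2):
        step1 = f"<{book1}> ?p1 ?o ." if d1 == "fwd" else f"?o ?p1 <{book1}> ."
        step2 = f"?o ?p2 <{book2}> ." if d2 == "fwd" else f"<{book2}> ?p2 ?o ."
        queries.append(f"SELECT ?p1 ?o ?p2 WHERE {{ {step1} {step2} }}")
    return queries

qs = pattern_queries("http://dbpedia.org/resource/Dragonfly_in_Amber",
                     "http://dbpedia.org/resource/At_All_Costs")
print(len(qs))  # 4 variants: 2 directions per step
```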

CSV is a good candidate for caching ontology patterns because of its ease of use. However, there are some open issues.
First, since patterns may have a variable number of components, one might want to represent each pattern in a single cell, while still being able to separate the pattern elements when necessary.

Book1,Book2,Pattern
1,7680,"http://dbpedia.org/ontology/language,http://dbpedia.org/ontology/Language,http://dbpedia.org/ontology/language"
1,7680,"http://dbpedia.org/ontology/language,http://schema.org/Language,http://dbpedia.org/ontology/language"
1,7680,"http://dbpedia.org/ontology/language,http://www.w3.org/2002/07/owl#Thing,http://dbpedia.org/ontology/language"
1,7680,"http://dbpedia.org/ontology/language,http://www.w3.org/2000/01/rdf-schema#Resource,http://dbpedia.org/ontology/language"
1,2,"http://dbpedia.org/ontology/author,http://dbpedia.org/ontology/Writer,http://dbpedia.org/ontology/author"
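
Standard CSV quoting already gets part of the way there: the whole pattern goes into one quoted cell, with its elements comma-separated inside, and can be split out after parsing. A sketch using Python's csv module (the inlined row is taken from the sample above):

```python
import csv
import io

# One row of the cache file: the quoted third cell holds a whole
# pattern whose elements are themselves comma-separated.
data = ('Book1,Book2,Pattern\n'
        '1,7680,"http://dbpedia.org/ontology/language,'
        'http://dbpedia.org/ontology/Language,'
        'http://dbpedia.org/ontology/language"\n')

reader = csv.DictReader(io.StringIO(data))
for row in reader:
    elements = row["Pattern"].split(",")  # the inner separator
    print(row["Book1"], row["Book2"], len(elements))  # → 1 7680 3
```

The open issue is that nothing in the CSV itself declares that the cell carries a comma-separated "microsyntax"; the consumer has to know it out of band.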


Second, these cache files may be remarkably large. For example, the size of this file (link) is ~2GB, which may imply prohibitive loading times, especially when making only a limited number of recommendations.
However, since rows are sorted according to the starting and ending book of the pattern, all the patterns that link two books occupy a region of consecutive rows in the table.
Given an annotation of such regions indicating which books they describe, one could select only the "slice" of the file needed to make a recommendation, without having to load it all.
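
A sketch of the idea in Python, using a byte-offset index as a stand-in for the region annotation (the index format, data, and file layout are hypothetical; how such an annotation would actually be expressed is exactly the open question):

```python
import os
import tempfile

# Toy cache file: rows are sorted by (Book1, Book2), so all patterns
# for a pair occupy consecutive lines. While writing, record each
# pair's byte offset and length -- this index plays the role of the
# region annotation described above.
rows = [
    ("1", "2", "http://dbpedia.org/ontology/author,"
               "http://dbpedia.org/ontology/Writer,"
               "http://dbpedia.org/ontology/author"),
    ("1", "7680", "http://dbpedia.org/ontology/language,"
                  "http://dbpedia.org/ontology/Language,"
                  "http://dbpedia.org/ontology/language"),
    ("1", "7680", "http://dbpedia.org/ontology/language,"
                  "http://schema.org/Language,"
                  "http://dbpedia.org/ontology/language"),
]

index = {}  # (book1, book2) -> (byte offset, byte length)
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    for b1, b2, pattern in rows:
        line = f'{b1},{b2},"{pattern}"\n'.encode("utf-8")
        start = f.tell()
        f.write(line)
        off, length = index.get((b1, b2), (start, 0))
        index[(b1, b2)] = (off, length + len(line))

def read_region(path, index, book1, book2):
    """Read only the consecutive rows describing (book1, book2),
    without loading the rest of the file."""
    offset, length = index[(book1, book2)]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length).decode("utf-8").splitlines()

region = read_region(path, index, "1", "7680")
print(len(region))  # 2 pattern rows for this pair
os.remove(path)
```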

On 16 Mar 2014, at 17:08, Eric wrote:

Jeremy, Davide, and all,

I plan on attending our working group telecon this week. However, I will be very busy through Wednesday afternoon in preparation for a review from project sponsors.

I should be available after Wednesday for editor activities.

Thanks,

Eric

Sent from my iPhone

On Mar 16, 2014, at 8:50 AM, "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk<mailto:jeremy.tandy@metoffice.gov.uk>> wrote:

Great. I have now:


-          Added an additional use case in response to the “Direct Mapping” requirement from David Booth

-          Renumbered the use cases

-          Clustered the requirements

As far as I am concerned, we have ticked off everything we said we would do ahead of the cutoff on Monday.

It turns out that I have to travel @ 9am on Monday (to get flights etc.) so I will not now be able to edit the document tomorrow.

Also, as I will be chairing a meeting in the US all of next week:

-          I won’t be able to dial into the teleconference on Wednesday

-          I won’t be able to edit the document or update the git repository … so should changes need to be made (e.g. editorial, typo fixing etc.), I hope that Eric and Davide can do them

I think the Use Case document<http://w3c.github.io/csvw/use-cases-and-requirements/> is fit for publication as FPWD. I don’t anticipate the need for material changes ahead of FPWD publication.

BR, Jeremy

From: Ceolin, D. [mailto:d.ceolin@vu.nl]
Sent: 15 March 2014 14:27
To: Eric Stephan; Tandy, Jeremy
Cc: W3C CSV on the Web Working Group
Subject: Re: moving the use case document to FPWD

Jeremy, Eric, all,

I have added use case #21, "Displaying Locations of Care Homes on a Map",  http://w3c.github.io/csvw/use-cases-and-requirements/#UC-DisplayingLocationsOfCareHomesOnAMap
If I understood it correctly, it requires CSV-to-JSON transformation, so I have added that as a requirement.
I've also checked and fixed a couple of typos in the other use cases mentioned yesterday, which now *should* be ok.
Cheers,

Davide

On 14 Mar 2014, at 22:57, Davide Ceolin wrote:


Jeremy and Eric,

I think I have incorporated all the comments in UC#20 (Representing Entities Extracted from Text) and UC#22 (Intelligently Previewing CSV Files), and that should complete the amendments (comments welcome!).
Tomorrow morning I'll give a last check and add UC#21 (Displaying locations of care homes on a map).
Cheers,

Davide


On 14 Mar 2014, at 20:10, Eric Stephan wrote:


Jeremy and Davide,

I completed and checked in the Palo Alto data use case; I *think* that
completes everything I promised.    I'll be online over the
weekend if anything comes up.  I might attempt adding the table we
discussed yesterday.

Cheers,

Eric

On Wed, Mar 12, 2014 at 2:40 PM, Eric Stephan <ericphb@gmail.com<mailto:ericphb@gmail.com>> wrote:

Jeremy and Davide,

My responses are below, marked with >> and capital letters.  I'm not
shouting with the capital letters, just making my answers more
visible.  :-)

Thanks,

Eric


1)      [Eric] add use case #16 City of Palo Alto tree data

YES

2)      [Eric] add use case #21 Displaying locations of care homes on a map

POSSIBLY ADDED TO DAVIDE'S LIST?


4)      [Eric] tease out and make explicit the requirements in use
cases #7, #12 and #17 ... noting the implied "microsyntax" requirement
in #7

YES


Also, Eric: we previously talked about adding a use case about ncdump
(netcdf dump). Is this still necessary / feasible?

YES

Davide: it seems Eric has a lot still to do ... are you able to take
action #2 (adding use case #21 Displaying locations of care homes on a
map). Please refer to my earlier email for my thoughts on that use
case (not much to say - but could be helpful).



On Wed, Mar 12, 2014 at 8:53 AM, Tandy, Jeremy
<jeremy.tandy@metoffice.gov.uk<mailto:jeremy.tandy@metoffice.gov.uk>> wrote:
Hi - in today's teleconf we agreed a number of things to complete for
Monday.



1)      [Eric] add use case #16 City of Palo Alto tree data

2)      [Eric] add use case #21 Displaying locations of care homes on a map

3)      [Davide] complete amendments to use cases about "intelligent
preview" and "representing entities and facts" - see emails here and here.

4)      [Eric] tease out and make explicit the requirements in use cases #7,
#12 and #17 ... noting the implied "microsyntax" requirement in #7

5)      [Jeremy] renumber use cases into sequential order

6)      [Jeremy] cluster requirements as proposed by JeniT



We agreed that there were not, at this time, latent requirements from CSV-LD
or CSV2RDF discussion threads.



Also, Eric: we previously talked about adding a use case about ncdump
(netcdf dump). Is this still necessary / feasible?



Davide: it seems Eric has a lot still to do ... are you able to take action #2
(adding use case #21 Displaying locations of care homes on a map). Please
refer to my earlier email for my thoughts on that use case (not much to say
- but could be helpful).



I'll do actions #5 and #6 on Monday morning based on the structure of the
document I see at that point in time - so expect the use case numbers to
have changed by Monday lunch time!



Please let me know if I've got any of this wrong :-)



BR, Jeremy


---
Davide Ceolin MSc.
Postdoctoral Researcher
The Network Institute
VU University Amsterdam
d.ceolin@vu.nl<mailto:d.ceolin@vu.nl>
http://www.few.vu.nl/~dceolin/

Received on Sunday, 16 March 2014 17:01:08 UTC