Re: Proposal for ISSUE-12, string literals from Eric Prud'hommeaux on 2011-05-15 (public-rdf-wg@w3.org from May 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sun, 15 May 2011 21:04:23 +0200
To: Sandro Hawke <sandro@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, public-rdf-wg@w3.org
Message-ID: <20110515190420.GC6923@w3.org>

* Sandro Hawke <sandro@w3.org> [2011-05-15 10:14+0200]
> On Sat, 2011-05-14 at 20:33 +0200, Eric Prud'hommeaux wrote:
> > (I'm
> > assuming here that users will continue to work on graphs without e.g.
> > D-entailment; which I think is pretty realistic.)
> 
> FWIW, and while I don't disagree, I think this is a bit like saying
> users will continue to write their own parsers, or that users will
> continue to read/write their own sectors on block devices (instead of
> using filesystems).  They might, but I'd rather focus on making a world
> where they don't have to.

Sure, some geeks will attack their disk platters with magnetized
needles, but I'm trying to establish a popularly accepted entry level to
the SemWeb so that folks can write specs and get real work done. The
overwhelming choice so far has been that folks work with the graph,
in part because no other closure claimed to be the minimum, and in
part because every inference breaks cardinality. If you abandon UNA
and infer subtle variations on existing triples, you have to add a lot
of ground facts and inference in order to answer practical questions.

I appreciate that we don't want to over-simplify and I'm happy to work
on a complete picture, but I think that we need to have a coherent
picture of how this will work before we ask the world to adopt
incremental, possibly painful steps. As a test, I propose a few use
cases which we'd like to work:

  (SPARQL) How many FTEs do we spend on R&D?

  Exemplar of data which we probably want to avoid:
    <Bob> <timeAllocation> [ <project> "R&D" ; <hoursPerWeek> 20 ] .
    <Sue> <timeAllocation> [ <project> "R&D"^^xsd:string ; <hoursPerWeek> 24 ] .
  SPARQL Query:
    SELECT (SUM(?hours)/40 AS ?FTEs)
      { ?who <timeAllocation> [ <project> "R&D" ; <hoursPerWeek> ?hours ] }
  Current answer:
      ?FTEs => .5
  Desired answer:
      ?FTEs => 1.1
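
  To make the mismatch concrete, here is a minimal Python sketch of the
  same arithmetic. It is not SPARQL and uses no RDF library; the tuple
  model of a literal, the ALLOCATIONS list, and the ftes() helper with
  its normalize flag are all assumptions made up for illustration.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical model: a literal is (lexical form, datatype IRI),
# with None as the datatype of a plain literal.
ALLOCATIONS = [
    ("Bob", ("R&D", None), 20),
    ("Sue", ("R&D", XSD_STRING), 24),
]


def ftes(allocations, project, normalize=False):
    """Sum weekly hours for `project` and divide by a 40-hour week.

    With normalize=False, literals are compared term by term, so a plain
    literal and an xsd:string-typed literal never match. With
    normalize=True, xsd:string literals are folded into the plain form
    before comparing (the "only one form" position).
    """
    def canon(literal):
        lexical, datatype = literal
        if normalize and datatype == XSD_STRING:
            return (lexical, None)
        return literal

    target = canon(project)
    hours = sum(h for _, p, h in allocations if canon(p) == target)
    return hours / 40


print(ftes(ALLOCATIONS, ("R&D", None)))                  # 0.5
print(ftes(ALLOCATIONS, ("R&D", None), normalize=True))  # 1.1
```

  Matching on the raw terms only finds Bob's 20 hours; folding the two
  forms together picks up Sue's 24 as well.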

  (N3) Infants must not be given aspirin without testing for Reye's Syndrome.

  { ?patient :ageInYears ?age FILTER (?age <= 2)
    ?prescription :prescribedTo ?patient ; ?medName "Buffren" }
  => { ?prescription :requires <ReyesProtocol> }

We can tell implementers that specs like these need to be more clever,
but simply saying that they MIGHT or MUST work on X-entailed graphs
doesn't finish the job. The low-bar approach of saying that 
"R&D" ≡ "R&D"^^xsd:string solves this, but that still effectively
tells the world that inference hurts. Perhaps we can come up with some
scheme wherein working with predictable closures like

    <Bob> <timeAllocation> [ <project> "R&D" ; <hoursPerWeek> 20 ;
                             <project> "R&D"^^xsd:string ; <hoursPerWeek> 20 ] .
    <Sue> <timeAllocation> [ <project> "R&D" ; <hoursPerWeek> 24 ;
                             <project> "R&D"^^xsd:string ; <hoursPerWeek> 24 ] .

will still allow for practical processing. Barring that, my
conservative position is that we should have only one form.
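
A rough sketch of what working on such a closure might look like, again
in plain Python rather than any real RDF toolkit. The triple-set model,
the close_strings() rule, and the helper names are hypothetical, and
language-tagged literals are ignored. It shows the FTE sum coming out
right under either literal form, while the number of <project> values per
allocation doubles, which is the cardinality cost mentioned earlier.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical model: a graph is a set of (subject, predicate, object)
# triples; string literals are (lexical form, datatype or None).
GRAPH = {
    ("_:bobAlloc", "project", ("R&D", None)),
    ("_:bobAlloc", "hoursPerWeek", 20),
    ("_:sueAlloc", "project", ("R&D", XSD_STRING)),
    ("_:sueAlloc", "hoursPerWeek", 24),
}


def close_strings(triples):
    """Add the other literal form for every plain or xsd:string object."""
    closed = set(triples)
    for s, p, o in triples:
        if isinstance(o, tuple):
            lexical, datatype = o
            if datatype is None:
                closed.add((s, p, (lexical, XSD_STRING)))
            elif datatype == XSD_STRING:
                closed.add((s, p, (lexical, None)))
    return closed


def ftes(triples, project):
    """Sum hoursPerWeek over allocations whose project equals `project`."""
    nodes = {s for s, p, o in triples if p == "project" and o == project}
    hours = sum(o for s, p, o in triples
                if p == "hoursPerWeek" and s in nodes)
    return hours / 40


def project_values(triples, node):
    """Count the <project> triples attached to one allocation node."""
    return sum(1 for s, p, o in triples if s == node and p == "project")


closed = close_strings(GRAPH)
print(ftes(closed, ("R&D", None)))           # 1.1
print(project_values(GRAPH, "_:bobAlloc"))   # 1
print(project_values(closed, "_:bobAlloc"))  # 2
```

The sum stays correct only because the query pins one literal form; a
query that binds the project to a variable and counts or groups on it
would see every allocation twice.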

>    -- Sandro
> 
> 

-- 
-ericP

Received on Sunday, 15 May 2011 19:04:55 UTC