Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
> But what happens with 
> 
> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi. 
> 
> Its diameter is more than 100000 kilometers.
> 
> 
> It appears to me that your query is an unsafe shortcut itself.

Adding that triple doesn't change my results 'cause the query checks `datatype(?d) = ucum:km`. If I add `wipe:HAT-P-67 ex:diameter "395200"^^ucum:km`, I get:
┌──────────────────────────────────────────┐
│ ?planet                                  │
│   <https://en.wikipedia.org/wiki/saturn> │
│  <https://en.wikipedia.org/wiki/jupiter> │
│ <https://en.wikipedia.org/wiki/HAT-P-67> │
└──────────────────────────────────────────┘
That's the safety I was talking about. Am I missing a query vulnerability?
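A minimal Python sketch of that FILTER follows. It models each `ex:diameter` value as a (lexical form, datatype IRI) pair and assumes the illustrative `ucum:` namespace from the example data quoted below. It reproduces the table above. The miles-only HAT-P-67 triple is never matched, so it is never misread as kilometers. It appears only once a `ucum:km` value is present. Python's `float` stands in for `xsd:float`; it is a double, which makes no difference for these values.

```python
from typing import NamedTuple

# Namespaces taken from the example data in this thread; the ucum: one is illustrative.
UCUM = "http://ucum.nlm.nih.gov/#"
WIPE = "https://en.wikipedia.org/wiki/"


class Lit(NamedTuple):
    lexical: str   # the literal's lexical form, i.e. str(?d)
    datatype: str  # the literal's datatype IRI, i.e. datatype(?d)


def km_filter(triples, threshold):
    """Mimic FILTER(datatype(?d) = ucum:km && xsd:float(str(?d)) > threshold).

    Python's float is a double rather than xsd:float; that does not matter here.
    Like a plain SELECT (no DISTINCT), one row is returned per matching triple.
    """
    return [
        planet
        for planet, d in triples
        if d.datatype == UCUM + "km" and float(d.lexical) > threshold
    ]


data = [
    (WIPE + "mercury", Lit("4879.4", UCUM + "km")),
    (WIPE + "mercury", Lit("3031.9", UCUM + "mi")),
    (WIPE + "saturn", Lit("1.1e5", UCUM + "km")),
    (WIPE + "saturn", Lit("72367", UCUM + "mi")),
    (WIPE + "jupiter", Lit("139822", UCUM + "km")),
    (WIPE + "jupiter", Lit("86881", UCUM + "mi")),
    # Peter's miles-only triple: skipped by the datatype test, never read as km.
    (WIPE + "HAT-P-67", Lit("190000", UCUM + "mi")),
]

print(km_filter(data, 1e5))   # saturn and jupiter only

data.append((WIPE + "HAT-P-67", Lit("395200", UCUM + "km")))
print(km_filter(data, 1e5))   # saturn, jupiter and HAT-P-67
```

The trade-off shows up directly. The datatype check keeps miles out of a kilometer comparison, at the cost of not seeing values stored only in other units.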


> peter
> 
> 
> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> > You'll need a microparser regardless, and, as Simon points out, it's
> > not that onerous. My point was just that having to microparse union
> > types out of the same literal as a UCUM type is more complicated than
> > parsing the UCUM type out of the literal's datatype. Having numeric
> > types separated from their units would allow SPARQL 1.1 queries to
> > avoid cracking the literal form, e.g.
> > 
> > Data:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> > PREFIX wipe: <https://en.wikipedia.org/wiki/>
> > 
> > wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> > wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> > wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> > ]]
> > 
> > Query:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> > 
> > SELECT ?planet WHERE {
> >  ?planet ex:diameter ?d .
> >  FILTER(datatype(?d) = ucum:km
> >      && xsd:float(str(?d)) > 1E5)
> > }
> > ]]
> > 
> > Results:
> > ┌─────────────────────────────────────────┐
> > │ ?planet                                 │
> > │  <https://en.wikipedia.org/wiki/saturn> │
> > │ <https://en.wikipedia.org/wiki/jupiter> │
> > └─────────────────────────────────────────┘
> > 
> > Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> > "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> > easy to compose. The same query is substantially more tedious with
> > data like:
> >   wipe:saturn  ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> > which is likely to lead to unsafe shortcuts.
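The Python sketch below parses both literal styles to make the comparison concrete.

- **`cdt:ucum` style.** This follows the `<QUANTITY>` grammar Antoine quotes further down: a number, spaces, then a UCUM code. Two choices here are assumptions: `<SPACES>` is treated as one or more U+0020 characters, and any non-space run is accepted as the unit code. UCUM syntax itself is not validated.
- **Per-unit style.** This uses the illustrative `ucum:` datatype namespace from the example data. The unit is simply the tail of the datatype IRI.

```python
import re
from decimal import Decimal

UCUM_NS = "http://ucum.nlm.nih.gov/#"  # illustrative per-unit datatype namespace from the example

# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
# <NUMBER>   ::= xsd:decimal (('e'|'E') xsd:integer)?
QUANTITY = re.compile(
    r"(?P<num>[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"  # xsd:decimal plus optional exponent
    r" +"                                                     # <SPACES>: assumed one or more U+0020
    r"(?P<unit>\S+)"                                          # <UCUMCODE>: not validated here
)


def parse_cdt_ucum(lexical):
    """Split a "1.1e5 km"^^cdt:ucum lexical form into (value, UCUM code)."""
    m = QUANTITY.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a cdt:ucum lexical form: {lexical!r}")
    return Decimal(m.group("num")), m.group("unit")


def parse_ucum_datatype(lexical, datatype_iri):
    """Read a "1.1e5"^^ucum:km literal: the unit comes from the datatype IRI.

    The lexical form is handed to Decimal without checking it against xsd:decimal.
    """
    if not datatype_iri.startswith(UCUM_NS):
        raise ValueError(f"not a ucum: datatype: {datatype_iri!r}")
    return Decimal(lexical), datatype_iri[len(UCUM_NS):]


print(parse_cdt_ucum("1.1e5 km"))                    # (Decimal('1.1E+5'), 'km')
print(parse_ucum_datatype("1.1e5", UCUM_NS + "km"))  # the same pair
```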
> > 
> > So, Maxime and Antoine, how hard would it be to transplant your
> > semantics to apply directly to ucum (presuming their cooperation)?
> > 
> > 
> > On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> > > Yes, you would need a UCUM parser.
> > > 
> > > Note however that UCUM is not a "large vocabulary". 
> > > There is a relatively small set of terminals here 
> > > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> > > these into a countably infinite set. 
> > > The rule is described here: 
> > > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules 
> > > 
> > > There are a number of implementations listed here 
> > > https://unitsofmeasure.org/trac at 'Implementation Support'. 
> > > This documentation has not been updated for about 3 years, so some
> > > of the links might be stale, and there may be others.   
> > > 
> > > A units-of-measure library, with UCUM support, that was available
> > > to be integrated into RDF applications would be a significant
> > > contribution to the community. 
> > > 
> > > Simon 
> > > 
> > > > -----Original Message-----
> > > > From: Hugh Glaser <hugh@glasers.org>
> > > > Sent: Friday, 24 July, 2020 08:58
> > > > To: Eric Prud'hommeaux <eric@w3.org>
> > > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > > > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
> > > > 
> > > > If I understand correctly,
> > > > I will need to add a UCUM parser to my system to be able to process these datatypes if people send them to me in their RDF?
> > > > In fact, I will need a UCUM-to-RDF converter to be able to "understand" properly what they "mean"?
> > > > Does such an animal exist?
> > > > 
> > > > UCUM looks to me like quite a large vocabulary of units, for a start. What would the URI for the "liter" unit of measurement be, for example?
> > > > 
> > > > I'm very happy to have widely adopted standards like this; I just want to keep my Semantic Web processing in the Semantic Web (RDF), and as simple as possible.
> > > > Or at least be helped to do that.
> > > > 
> > > > Cheers
> > > > 
> > > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > > > > 
> > > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > > > > > Regarding physical quantities, such as "5 inches", my colleague Maxime Lefrançois and I coauthored a specification for a datatype for physical quantities [1]. It is quite simple: we reuse the Unified Code for Units of Measurement (UCUM), a standard that is used in many scientific applications, and combine it with a number:
> > > > > > 
> > > > > > <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > > > > <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
> > > > > > 
> > > > > > Since UCUM has well-defined semantics, so does our datatype.
> > > > > > Better still, since UCUM is implemented in many programming languages, my colleague Maxime could easily integrate it into Jena and its SPARQL engine [2].
> > > > > > So, with our Jena fork, one can write:
> > > > > > 
> > > > > > SELECT ?planet WHERE {
> > > > > >  ?planet a ex:Planet;
> > > > > >    ex:diameter ?s .
> > > > > >  FILTER(?s > "2e11 mm"^^cdt:ucum)
> > > > > > }
> > > > > 
> > > > > I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? e.g.
> > > > > SELECT ?planet WHERE {
> > > > >  ?planet a ex:Planet;
> > > > >    ex:diameter ?s .
> > > > >  FILTER(?s > "2e11"^^ucum:mm)
> > > > > }
> > > > > 
> > > > > It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
> > > > > In either case we probably have a union type in the lexical value, so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is broken out into the end of the datatype URL.
> > > > > There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
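Plain percent-encoding (RFC 3986) is one way to map a UCUM code to a datatype IRI without inventing a new escaping convention. It is sketched below with Python's `urllib.parse`. This is a swapped-in alternative to the "m_s.s" idea, not something proposed in the thread.

- **Reversibility.** The mapping round-trips, whereas a "_" substitution would need extra care, because "_" already occurs inside UCUM codes such as "[lk_us]".
- **Namespace.** The namespace is the illustrative `ucum:` one from the example data.
- **Syntax.** Turtle and SPARQL allow `%XX` sequences in prefixed-name local parts, so a name like `ucum:m%2Fs.s` can be written as-is.

```python
from urllib.parse import quote, unquote

UCUM_NS = "http://ucum.nlm.nih.gov/#"  # illustrative namespace from the example data


def ucum_datatype_iri(code):
    """Map a UCUM code to a datatype IRI by percent-encoding reserved characters.

    quote(..., safe="") leaves only letters, digits and "_.-~" unescaped,
    so "/" becomes %2F, "[" and "]" become %5B and %5D, and "%" becomes %25.
    """
    return UCUM_NS + quote(code, safe="")


def ucum_code(datatype_iri):
    """Inverse of ucum_datatype_iri."""
    if not datatype_iri.startswith(UCUM_NS):
        raise ValueError(f"not in the ucum: namespace: {datatype_iri!r}")
    return unquote(datatype_iri[len(UCUM_NS):])


for code in ["km", "m/s.s", "[lk_us]", "m3/[acr_us]", "%"]:
    iri = ucum_datatype_iri(code)
    print(code, "->", iri)
    assert ucum_code(iri) == code  # the mapping round-trips
```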
> > > > > 
> > > > > > This works if the size of the planet is encoded as a cdt:ucum, no matter what unit one is using. One can even use "link for Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which are both units of length.
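The claim that both units are lengths can be checked by hand, since m3 divided by m2 leaves m. The Python sketch below does the arithmetic with exact fractions. It assumes UCUM's definitions of the US survey units, which should be checked against ucum-essence.xml before relying on them:

- [ft_us] = 1200/3937 m
- [rd_us] = 16.5 [ft_us]
- [ch_us] = 4 [rd_us]
- [lk_us] = [ch_us]/100
- [acr_us] = 160 [rd_us]2

The computation mirrors what the BIND in query [3] asks the engine to do for "2.7e3 m3/[acr_us]".

```python
from fractions import Fraction

# US survey units expressed in metres (assumed to match UCUM's ucum-essence.xml).
FT_US = Fraction(1200, 3937)      # [ft_us]
RD_US = Fraction(33, 2) * FT_US   # [rd_us] = 16.5 [ft_us]
CH_US = 4 * RD_US                 # [ch_us] = 4 [rd_us]
LK_US = CH_US / 100               # [lk_us] = 1/100 [ch_us]
ACR_US = 160 * RD_US ** 2         # [acr_us] = 160 [rd_us]2, in square metres

# "2.7e3 m3/[acr_us]": cubic metres divided by square metres leaves metres.
normalized = Fraction(2700) / ACR_US

print(float(LK_US))        # about 0.2012 m per link
print(float(normalized))   # about 0.667 m
```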
> > > > > > 
> > > > > > With some of our industrial partners, we are using this for energy data, and they seem to be very pleased with this approach, compared to an ontology-based approach.
> > > > > > 
> > > > > > 
> > > > > > [1] https://w3id.org/lindt/custom_datatypes#ucum
> > > > > > [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
> > > > > > [3] Try this query in the playground:
> > > > > > 
> > > > > > """
> > > > > > PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > > > > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > > > > PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > > > > PREFIX ex: <http://example.org/>
> > > > > > 
> > > > > > SELECT ?position ?normalized
> > > > > > 
> > > > > > WHERE{
> > > > > > 
> > > > > >  VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }  # convert to meters
> > > > > >  BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > > > > > 
> > > > > > }
> > > > > > """
> > > > > > 
> > > > > > --AZ
> > > > > > 
> > > > > > On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> > > > > > > Yeah, the atomicity of the chunk is the point. This even applies to quantities. 25.4 mm is *identical* to 1 inch: they are the same thing. Any engine that operates with quantities needs to understand that. ‘25.4’ and ‘mm’ cannot be separated.
> > > > > > > Coordinates are slightly more complex, but it comes down to the same thing. A single element within a set of coordinates that describes a position in space is not independent of the other numbers in the tuple, or of the coordinate reference system within which they are expressed. One value should *never* be used independently of the others. Exactly the same position on the earth will be denoted by three different numbers if it is expressed in a different coordinate reference system. You can only ‘reason’ over them as a group, not individually.
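The Python illustration below shows that "25.4 mm" and "1 inch" denote the same quantity. Comparisons only make sense after the number and unit together are turned into a single value in a common unit. The four-entry conversion table is hypothetical and deliberately tiny; a real engine would derive the factors from UCUM. The same normalization is what a filter like `?s > "2e11 mm"^^cdt:ucum` relies on.

```python
from fractions import Fraction

# Hypothetical, hand-written factors to metres for a few UCUM length codes.
TO_METRE = {
    "m": Fraction(1),
    "mm": Fraction(1, 1000),
    "km": Fraction(1000),
    "[in_i]": Fraction(254, 10000),  # international inch: exactly 2.54 cm
}


def to_metres(value, code):
    """Normalise a (lexical number, UCUM code) pair to an exact length in metres."""
    return Fraction(value) * TO_METRE[code]


inch_as_mm = to_metres("25.4", "mm")
one_inch = to_metres("1", "[in_i]")

print(inch_as_mm == one_inch)               # True: the same length
print(Fraction("25.4") == Fraction("1"))    # False: the bare numbers differ
```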
> > > > > > > *From:*Dan Brickley <danbri@danbri.org>
> > > > > > > *Sent:* Thursday, 16 July, 2020 23:58
> > > > > > > *To:* Jeen Broekstra <jeen@fastmail.com>
> > > > > > > *Cc:* Semantic Web <semantic-web@w3.org>
> > > > > > > *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
> > > > > > > 
> > > > > > > …
> > > > > > > 
> > > > > > > I believe the big appeal of putting it all into the zone we call "literals" is that you get a kind of atomicity; that chunk of data is either there, or not there; it is asserted, or not asserted. With a triples-based (description of a) data structure you have to be constantly on your guard that every subset of the full graph pattern is at least sensible and harmless, even when subsetting these chunks is often confusing or misleading for data consumers. I can't help wondering whether notions of graph shapes from SHACL, ShEx (and SPARQL) could be exploited to create an RDF-based data format which had atomicity at the level of entire shapes.
> > > > > > > 
> > > > > > > Dan
> > > > > > > 
> > > > > > >    Jeen
> > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Antoine Zimmermann
> > > > > > Institut Henri Fayol
> > > > > > École des Mines de Saint-Étienne
> > > > > > 158 cours Fauriel
> > > > > > CS 62362
> > > > > > 42023 Saint-Étienne Cedex 2
> > > > > > France
> > > > > > Tel: +33(0)4 77 42 66 03
> > > > > > Fax: +33(0)4 77 42 66 66
> > > > > > http://www.emse.fr/~zimmermann/
> > > > > > Member of team Connected Intelligence, Laboratoire Hubert Curien
> > > > 
> > > > --
> > > > Hugh
> > > > 023 8061 5652
> > > > 
> 

Received on Friday, 24 July 2020 18:22:08 UTC