Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

I think you're arguing that SPARQL (and OWL, and DL-Query, ...) should
support measurement types.  I don't think this argument favors either
of `"1.1e5 km"^^cdt:ucum` or `"1.1e5"^^ucum:km` over the other.
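
For what it's worth, here is a rough Python sketch of what a unit-aware
comparison could look like for either encoding. It is only an
illustration under stated assumptions: the namespace IRIs are the
prefixes used in this thread, the unit codes ("km", "mi", ...) are the
ones we have been writing rather than checked UCUM codes, and the
four-entry conversion table is a toy stand-in for a real UCUM library.

```python
import re
from decimal import Decimal

# Assumed IRIs, copied from the prefixes used in this thread.
UCUM_NS = "http://ucum.nlm.nih.gov/#"
CDT_UCUM = "http://w3id.org/lindt/custom_datatypes#ucum"

# Toy conversion table (unit code -> kilometres). A real system would call
# a UCUM library; these entries only keep the sketch runnable.
KM_PER_UNIT = {
    "km": Decimal("1"),
    "m": Decimal("0.001"),
    "mm": Decimal("0.000001"),
    "mi": Decimal("1.609344"),  # international mile
}

# <NUMBER> <SPACES> <UCUMCODE>, loosely following the cdt:ucum grammar
# quoted further down in this thread.
CDT_LEXICAL = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s+(\S+)\s*$"
)


def split_quantity(lexical, datatype):
    """Return (number, unit code) for either encoding of a measurement."""
    if datatype == CDT_UCUM:
        # Unit lives in the lexical form: "1.1e5 km"^^cdt:ucum
        match = CDT_LEXICAL.match(lexical)
        if match is None:
            raise ValueError(f"not a cdt:ucum lexical form: {lexical!r}")
        return Decimal(match.group(1)), match.group(2)
    if datatype.startswith(UCUM_NS):
        # Unit lives in the datatype IRI: "1.1e5"^^ucum:km
        return Decimal(lexical), datatype[len(UCUM_NS):]
    raise ValueError(f"not a measurement datatype: {datatype}")


def to_km(lexical, datatype):
    """Normalize a length literal to kilometres using the toy table."""
    number, unit = split_quantity(lexical, datatype)
    return number * KM_PER_UNIT[unit]


# Diameters from this thread, deliberately mixing both encodings and units.
diameters = [
    ("mercury", "4879.4", UCUM_NS + "km"),
    ("saturn", "1.1e5 km", CDT_UCUM),
    ("jupiter", "139822", UCUM_NS + "km"),
    ("HAT-P-67", "190000", UCUM_NS + "mi"),
]
big = [name for name, lex, dt in diameters if to_km(lex, dt) > Decimal("1e5")]
print(big)  # ['saturn', 'jupiter', 'HAT-P-67']
```

Once the comparison is done on normalized values, the mile-only entry is
found, and the only difference between the two encodings is the few
lines in split_quantity.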

Long live blank nodes.


On Fri, Jul 24, 2020 at 05:41:19PM -0400, Peter F. Patel-Schneider wrote:
> Yes, adding this "planet" does not change the results.  But it should!
> 
> The flaw in your query is its dependence on entering the data in km.  Entering
> the data in any other measure only (mile, meters, furlongs, centimeters)
> results in the query missing answers that it should be returning.
> 
> 
> peter
> 
> 
> 
> On 7/24/20 2:21 PM, Eric Prud'hommeaux wrote:
> > On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
> >> But what happens with 
> >>
> >> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi. 
> >>
> >> Its diameter is more than 100000 kilometers.
> >>
> >>
> >> It appears to me that your query is an unsafe shortcut itself.
> > Adding that triple doesn't change my results 'cause it's checking for a `datatype(?d) = ucum:km`. I can add `wipe:HAT-P-67 ex:diameter "395200"^^ucum:km` and get:
> > ┌──────────────────────────────────────────┐
> > │ ?planet                                  │
> > │   <https://en.wikipedia.org/wiki/saturn> │
> > │  <https://en.wikipedia.org/wiki/jupiter> │
> > │ <https://en.wikipedia.org/wiki/HAT-P-67> │
> > └──────────────────────────────────────────┘
> > That's the safety I was talking about. Am I missing a query vulnerability?
> >
> >
> >> peter
> >>
> >>
> >> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> >>> You'll need a microparsing regardless, and, as Simon points out, it's
> >>> not that onerous. My point was just that having to microparse union
> >>> types out of the same literal as a UCUM type is more complicated than
> >>> parsing the UCUM type out of the literal's datatype. Having numeric
> >>> types separated from their units would allow SPARQL 1.1 queries to
> >>> avoid cracking the literal form, e.g.
> >>>
> >>> Data:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>> PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >>>
> >>> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> >>> wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> >>> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> >>> ]]
> >>>
> >>> Query:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>>
> >>> SELECT ?planet WHERE {
> >>>  ?planet ex:diameter ?d .
> >>>  FILTER(datatype(?d) = ucum:km
> >>>      && xsd:float(str(?d)) > 1E5)
> >>> }
> >>> ]]
> >>>
> >>> Results:
> >>> ┌─────────────────────────────────────────┐
> >>> │ ?planet                                 │
> >>> │  <https://en.wikipedia.org/wiki/saturn> │
> >>> │ <https://en.wikipedia.org/wiki/jupiter> │
> >>> └─────────────────────────────────────────┘
> >>>
> >>> Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> >>> "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> >>> easy to compose. The same query is substantially more tedious with
> >>> data like:
> >>>   wipe:saturn  ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> >>> which is likely to lead to unsafe shortcuts.
> >>>
> >>> So Maxime and Antoine, how hard would it be to transplant your
> >>> semantics to apply directly to ucum (presuming their cooperation)?
> >>>
> >>>
> >>> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton)
> >>> wrote:
> >>>> Yes you would need a UCUM parser. 
> >>>>
> >>>> Note however that UCUM is not a "large vocabulary". 
> >>>> There is a relatively small set of terminals here 
> >>>> http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> >>>> these into a countably infinite set. 
> >>>> The rule is described here: 
> >>>> http://unitsofmeasure.org/ucum.html#section-Syntax-Rules 
> >>>>
> >>>> There are a number of implementations listed here 
> >>>> https://unitsofmeasure.org/trac at 'Implementation Support'. 
> >>>> This documentation has not been updated for about 3 years, so some
> >>>> of the links might be stale, and there may be others.   
> >>>>
> >>>> A units-of-measure library, with UCUM support, that was available
> >>>> to be integrated into RDF applications would be a significant
> >>>> contribution to the community. 
> >>>>
> >>>> Simon 
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Hugh Glaser <hugh@glasers.org>
> >>>>> Sent: Friday, 24 July, 2020 08:58
> >>>>> To: Eric Prud'hommeaux <eric@w3.org>
> >>>>> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> >>>>> <semantic-web@w3.org>; Maxime Lefrançois
> >>>>> <maxime.lefrancois@emse.fr>
> >>>>> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes
> >>>>> semantics -
> >>>>> existential variables?]
> >>>>>
> >>>>> If I understand correctly.
> >>>>> I will need to add a UCUM parser to my system to be able to
> >>>>> process these
> >>>>> datatypes, if people send them to me in their RDF?
> >>>>> In fact, I will need a UCUM to RDF converter to be able to
> >>>>> "understand"
> >>>>> properly what they "mean"?
> >>>>> Does such an animal exist?
> >>>>>
> >>>>> It looks to me that UCUM is quite a large vocabulary of units,
> >>>>> for a start -
> >>>>> what would the URI for the "liter" unit of measurement be, for
> >>>>> example?
> >>>>>
> >>>>> I'm very happy to have widely adopted standards like this - I
> >>>>> just want to
> >>>>> keep my Semantic Web processing in the Semantic Web (RDF), and as
> >>>>> simple
> >>>>> as possible.
> >>>>> Or at least be helped to do that.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann
> >>>>>> wrote:
> >>>>>>> Regarding physical quantities, such as "5 inches", etc., my
> >>>>>>> colleague
> >>>>>>> Maxime Lefrançois and myself coauthored a specification for a
> >>>>>>> datatype for physical quantities [1]. It is quite simple: we
> >>>>>>> reuse
> >>>>>>> the Unified Code for Units of Measurement (UCUM), a standard
> >>>>>>> that is
> >>>>>>> used in many scientific applications, and combine it with a
> >>>>>>> number:
> >>>>>>>
> >>>>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >>>>>>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> >>>>>>>
> >>>>>>> Since UCUM has a well defined semantics, so does our
> >>>>>>> datatype.
> >>>>>>> Better, since UCUM is implemented in many programming
> >>>>>>> languages, my
> >>>>>>> colleague Maxime could easily integrate it into Jena and its
> >>>>>>> SPARQL engine [2].
> >>>>>>> So, with our Jena fork, one can write:
> >>>>>>>
> >>>>>>> SELECT ?planet WHERE {
> >>>>>>>  ?planet a ex:Planet;
> >>>>>>>    ex:diameter ?s .
> >>>>>>>  FILTER(?s > "2e11 mm"^^cdt:ucum)
> >>>>>>> }
> >>>>>> I applaud the work to extend XSD's numeric types so that RDF
> >>>>>> can have standard measurement types. But why not leverage your
> >>>>>> work by adding SPARQL support for UCUM types? e.g.
> >>>>>> SELECT ?planet WHERE {
> >>>>>>  ?planet a ex:Planet;
> >>>>>>    ex:diameter ?s .
> >>>>>>  FILTER(?s > "2e11"^^ucum:mm)
> >>>>>> }
> >>>>>>
> >>>>>> It feels cleaner to me to embed the entire type of the data in
> >>>>>> the literal's datatype rather than spreading it across an
> >>>>>> aggregator type (cdt:ucum) and the lexical value (" mm").
> >>>>>>
> >>>>>> In either case we probably have a union type in the lexical
> >>>>>> value so we'd have to micro-parse doubles, decimals and
> >>>>>> integers, but the parsing is easier if the measurement unit is
> >>>>>> broken out into the end of the datatype URL.
> >>>>>>
> >>>>>> There are a few UCUM units that aren't viable localnames (e.g.
> >>>>>> "m/s.s"), but I think we can encode around that (e.g. "m_s.s")
> >>>>>> in a way that still makes ucum: a practical namespace for
> >>>>>> datatypes.
> >>>>>>
> >>>>>>> This works if the size of the planet is encoded as a
> >>>>>>> cdt:ucum, no
> >>>>>>> matter what unit one is using. One can even use "link for
> >>>>>>> Gunter's
> >>>>>>> chain" (unit "[lk_us]"), or "cubic meters per acre" (unit
> >>>>>>> "m3/[acr_us]") [3], which are both units of length.
> >>>>>>>
> >>>>>>> With some of our industrial partners, we are using this for
> >>>>>>> energy
> >>>>>>> data, and they seem to be very pleased with this approach,
> >>>>>>> compared
> >>>>>>> to an ontology-based approach.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >>>>>>> [2] You can try it at
> >>>>>>> https://ci.mines-stetienne.fr/lindt/playground.html
> >>>>>>> [3] Try this query in the playground:
> >>>>>>>
> >>>>>>> """
> >>>>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>>>>>> PREFIX ex: <http://example.org/>
> >>>>>>>
> >>>>>>> SELECT ?length ?normalized
> >>>>>>>
> >>>>>>> WHERE{
> >>>>>>>
> >>>>>>>  VALUES ?length { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> >>>>>>>  # convert to meters
> >>>>>>>  BIND("0 m"^^cdt:ucum + ?length AS ?normalized )
> >>>>>>>
> >>>>>>> }
> >>>>>>> """
> >>>>>>>
> >>>>>>> --AZ
> >>>>>>>
> >>>>>>> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
> >>>>>>>> Yeah, the atomicity of the chunk is the point. This even
> >>>>>>>> applies to quantities. 25.4mm is *identical* to 1” – they are
> >>>>>>>> the same thing. Any engine that operates with quantities needs
> >>>>>>>> to understand that. ‘25.4’ and ‘mm’ cannot be separated.
> >>>>>>>> Coordinates are slightly more complex but it comes down to the
> >>>>>>>> same thing. A single element within a set of coordinates that
> >>>>>>>> describes a position in space is not independent of the other
> >>>>>>>> numbers in the tuple, or of the coordinate reference system
> >>>>>>>> within which they are expressed. One value should *never* be
> >>>>>>>> used independent of the others. Exactly the same position on
> >>>>>>>> the earth will be denoted by three different numbers if
> >>>>>>>> embedded in a different coordinate reference system. You can
> >>>>>>>> only ‘reason’ over them as a group, not individually.
> >>>>>>>> *From:*Dan Brickley <danbri@danbri.org>
> >>>>>>>> *Sent:* Thursday, 16 July, 2020 23:58
> >>>>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>>>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>>>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> >>>>>>>> semantics
> >>>>>>>> - existential variables?]
> >>>>>>>>
> >>>>>>>> …
> >>>>>>>>
> >>>>>>>> I believe the big appeal of putting it all into the zone we
> >>>>>>>> call
> >>>>>>>> "literals" is that you get a kind of atomicity; that chunk
> >>>>>>>> of data
> >>>>>>>> is either there, or not there; it is asserted, or not
> >>>>>>>> asserted. With
> >>>>>>>> a triples-based (description of a) data structure you have
> >>>>>>>> to be
> >>>>>>>> constantly on your guard that every subset of the full
> >>>>>>>> graph pattern
> >>>>>>>> is at least sensible and harmless, even when subsetting
> >>>>>>>> these chunks
> >>>>>>>> is often confusing or misleading for data consumers. I
> >>>>>>>> can't help
> >>>>>>>> wondering whether notions of graph shapes from shacl, shex
> >>>>>>>> (and
> >>>>>>>> sparql) could be exploited to create an RDF-based data
> >>>>>>>> format which
> >>>>>>>> had atomicity at the level of entire shapes.
> >>>>>>>>
> >>>>>>>> Dan
> >>>>>>>>
> >>>>>>>>    Jeen
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Antoine Zimmermann
> >>>>>>> Institut Henri Fayol
> >>>>>>> École des Mines de Saint-Étienne
> >>>>>>> 158 cours Fauriel
> >>>>>>> CS 62362
> >>>>>>> 42023 Saint-Étienne Cedex 2
> >>>>>>> France
> >>>>>>> Tél:+33(0)4 77 42 66 03
> >>>>>>> Fax:+33(0)4 77 42 66 66
> >>>>>>> http://www.emse.fr/~zimmermann/
> >>>>>>> Member of team Connected Intelligence, Laboratoire Hubert
> >>>>>>> Curien
> >>>>> --
> >>>>> Hugh
> >>>>> 023 8061 5652
> >>>>>

Received on Friday, 24 July 2020 22:09:49 UTC