RE: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

Indeed, quantities must have the same dimension to be comparable across different units (here L1, i.e. length, or A0E0L1I0M0H0T0D0 in full form).
Down the rabbit hole ... 
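A dimension vector like the one above can be checked mechanically. The Python sketch below is a minimal illustration. It assumes the letter order and meanings used in QUDT-style dimension vectors (A amount of substance, E electric current, L length, I luminous intensity, M mass, H temperature, T time, D dimensionless flag), and it handles integer exponents only. The function names are hypothetical.

```python
import re

# Assumed QUDT-style letters: A amount of substance, E electric current,
# L length, I luminous intensity, M mass, H temperature, T time,
# D dimensionless flag. Only integer exponents are handled here.
_PART = re.compile(r"([AELIMHTD])(-?\d+)")


def parse_dimension(vector: str) -> dict[str, int]:
    """Parse e.g. 'A0E0L1I0M0H0T0D0' into {'A': 0, 'E': 0, 'L': 1, ...}."""
    parts = _PART.findall(vector)
    # Reject strings the pattern did not consume completely.
    if "".join(letter + exp for letter, exp in parts) != vector:
        raise ValueError(f"not a dimension vector: {vector!r}")
    return {letter: int(exp) for letter, exp in parts}


def comparable(v1: str, v2: str) -> bool:
    """Quantities can only be compared if their dimension vectors match."""
    return parse_dimension(v1) == parse_dimension(v2)


LENGTH = "A0E0L1I0M0H0T0D0"
AREA = "A0E0L2I0M0H0T0D0"
VELOCITY = "A0E0L1I0M0H0T-1D0"

print(comparable(LENGTH, "A0E0L1I0M0H0T0D0"))  # True
print(comparable(LENGTH, AREA))                # False
print(comparable(LENGTH, VELOCITY))            # False
```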

> -----Original Message-----
> From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
> Sent: Saturday, 25 July, 2020 07:41
> To: Eric Prud'hommeaux <eric@w3.org>
> Cc: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>; Hugh Glaser
> <hugh@glasers.org>; Antoine Zimmermann
> <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>;
> Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> existential variables?]
> 
> Yes, adding this "planet" does not change the results.  But it should!
> 
> The flaw in your query is its dependence on entering the data in
> km.  Entering the data in any other measure only (mile, meters, furlongs,
> centimeters) results in the query missing answers that it should be returning.
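Peter's objection can be reproduced outside SPARQL. The Python sketch below uses the diameters quoted in this thread. It contrasts a filter that only looks at values tagged km with one that first normalises every value to km. The conversion table and function names are illustrative assumptions and are not part of any library.

```python
from fractions import Fraction

# Kilometres per unit. "mi" follows the thread's ucum:mi and is assumed to be
# the international mile (UCUM code [mi_i]), exactly 1.609344 km.
KM_PER_UNIT = {"km": Fraction(1), "m": Fraction(1, 1000), "mi": Fraction("1.609344")}

# (planet, lexical value, unit) rows taken from the examples in this thread.
DIAMETERS = [
    ("saturn", "1.1e5", "km"),
    ("jupiter", "139822", "km"),
    ("mercury", "4879.4", "km"),
    ("HAT-P-67", "190000", "mi"),
]


def km_only(rows, threshold_km):
    """Like FILTER(datatype(?d) = ucum:km && ...): other units are skipped."""
    return [p for p, lex, unit in rows if unit == "km" and Fraction(lex) > threshold_km]


def normalised(rows, threshold_km):
    """Convert every value to km first; an unknown unit raises KeyError."""
    return [p for p, lex, unit in rows if Fraction(lex) * KM_PER_UNIT[unit] > threshold_km]


print(km_only(DIAMETERS, 100_000))     # ['saturn', 'jupiter']
print(normalised(DIAMETERS, 100_000))  # ['saturn', 'jupiter', 'HAT-P-67']
```

The normalising version raises an error on an unknown unit instead of skipping it. Silently dropping rows is the failure described above.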
> 
> 
> peter
> 
> 
> 
> On 7/24/20 2:21 PM, Eric Prud'hommeaux wrote:
> > On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
> >> But what happens with
> >>
> >> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi.
> >>
> >> Its diameter is more than 100000 kilometers.
> >>
> >>
> >> It appears to me that your query is an unsafe shortcut itself.
> > Adding that triple doesn't change my results 'cause it's checking for a
> > `datatype(?d) = ucum:km`. I can add `wipe:HAT-P-67 ex:diameter
> > "395200"^^ucum:km` and get:
> > ┌──────────────────────────────────────────┐
> > │ ?planet                                  │
> > │   <https://en.wikipedia.org/wiki/saturn> │
> > │  <https://en.wikipedia.org/wiki/jupiter> │
> > │ <https://en.wikipedia.org/wiki/HAT-P-67> │
> > └──────────────────────────────────────────┘
> > That's the safety I was talking about. Am I missing a query vulnerability?
> >
> >
> >> peter
> >>
> >>
> >> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> >>> You'll need a microparsing regardless, and, as Simon points out,
> >>> it's not that onerous. My point was just that having to microparse
> >>> union types out of the same literal as a UCUM type is more
> >>> complicated than parsing the UCUM type out of the literal's
> >>> datatype. Having numeric types separated from their units would
> >>> allow SPARQL 1.1 queries to avoid cracking the literal form, e.g.
> >>>
> >>> Data:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>> PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >>>
> >>> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> >>> wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> >>> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> >>> ]]
> >>>
> >>> Query:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>>
> >>> SELECT ?planet WHERE {
> >>>  ?planet ex:diameter ?d .
> >>>  FILTER(datatype(?d) = ucum:km
> >>>      && xsd:float(str(?d)) > 1E5)
> >>> }
> >>> ]]
> >>>
> >>> Results:
> >>> ┌─────────────────────────────────────────┐
> >>> │ ?planet                                 │
> >>> │  <https://en.wikipedia.org/wiki/saturn> │
> >>> │ <https://en.wikipedia.org/wiki/jupiter> │
> >>> └─────────────────────────────────────────┘
> >>>
> >>> Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> >>> "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> >>> easy to compose. The same query is substantially more tedious with
> >>> data like:
> >>>   wipe:saturn  ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> >>> which is likely to lead to unsafe shortcuts.
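To make the comparison concrete, here is a minimal Python sketch of what each encoding asks of a consumer. It assumes the IRIs bound to the ucum: and cdt: prefixes in this thread. The function names are hypothetical, and neither function checks that the unit is a valid UCUM code.

```python
UCUM_NS = "http://ucum.nlm.nih.gov/#"                     # ucum: prefix above
CDT_UCUM = "http://w3id.org/lindt/custom_datatypes#ucum"  # cdt:ucum


def from_unit_datatype(lexical: str, datatype: str) -> tuple[float, str]:
    """For literals like "1.1e5"^^ucum:km: the unit is the datatype IRI's tail."""
    if not datatype.startswith(UCUM_NS):
        raise ValueError(f"not a ucum: datatype: {datatype}")
    return float(lexical), datatype[len(UCUM_NS):]


def from_cdt_ucum(lexical: str, datatype: str) -> tuple[float, str]:
    """For literals like "1.1e5 km"^^cdt:ucum: number and unit share the lexical form."""
    if datatype != CDT_UCUM:
        raise ValueError(f"not cdt:ucum: {datatype}")
    number, unit = lexical.split(maxsplit=1)  # split at the first run of whitespace
    return float(number), unit.strip()


# Both encodings of Saturn's diameter yield the same (value, unit) pair.
print(from_unit_datatype("1.1e5", UCUM_NS + "km"))  # (110000.0, 'km')
print(from_cdt_ucum("1.1e5 km", CDT_UCUM))          # (110000.0, 'km')
```

In both cases the number still has to be parsed. The difference is only where the unit string comes from, and that is the trade-off under discussion here.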
> >>>
> >>> So Maxime and Antoine, how hard would it be to transplant your
> >>> semantics to apply directly to UCUM (presuming their cooperation)?
> >>>
> >>>
> >>> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton)
> >>> wrote:
> >>>> Yes you would need a UCUM parser.
> >>>>
> >>>> Note however that UCUM is not a "large vocabulary".
> >>>> There is a relatively small set of terminals here
> >>>> http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> >>>> these into a countably infinite set.
> >>>> The rule is described here:
> >>>> http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
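As a rough illustration of how a small table of atoms plus a combination rule covers many units, the Python sketch below evaluates simple unit terms to a scale factor and a dimension. It is a toy with several limits:

- The atom table is a hypothetical subset, with prefixes folded into entries such as km.
- Only the (L, M, T) dimensions are tracked.
- Parentheses, annotations and numeric factors are not supported.
- '.' and '/' are applied strictly left to right. This is how the syntax rules linked above are read here, and it should be checked against them.

```python
import re
from fractions import Fraction

# Hypothetical toy atom table: code -> (factor to SI base unit, (L, M, T)).
# Prefixes are folded into entries such as "km" instead of being parsed.
ATOMS = {
    "m": (Fraction(1), (1, 0, 0)),
    "km": (Fraction(1000), (1, 0, 0)),
    "g": (Fraction(1, 1000), (0, 1, 0)),
    "s": (Fraction(1), (0, 0, 1)),
    "[in_i]": (Fraction(254, 10000), (1, 0, 0)),
}
# One component: an atom (bracketed or plain letters) with an optional exponent.
_COMPONENT = re.compile(r"(\[[^\]]+\]|[A-Za-z]+)(-?\d+)?")


def evaluate(code: str) -> tuple[Fraction, tuple[int, ...]]:
    """Evaluate a term such as 'km.s-1' or 'm/s2', reading '.' and '/' left to right."""
    tokens = re.split(r"([./])", code)
    components, ops = tokens[0::2], ["."] + tokens[1::2]
    if len(components) > 1 and components[0] == "" and ops[1] == "/":
        # A leading '/' inverts what follows, e.g. '/s'.
        components, ops = components[1:], ops[1:]
    factor, dims = Fraction(1), (0, 0, 0)
    for op, comp in zip(ops, components):
        match = _COMPONENT.fullmatch(comp)
        if match is None or match.group(1) not in ATOMS:
            raise ValueError(f"unsupported component {comp!r} in {code!r}")
        atom_factor, atom_dims = ATOMS[match.group(1)]
        exp = int(match.group(2) or 1)
        if op == "/":
            exp = -exp
        factor *= atom_factor ** exp
        dims = tuple(d + exp * a for d, a in zip(dims, atom_dims))
    return factor, dims


print(evaluate("m/s2"))    # (Fraction(1, 1), (1, 0, -2))
print(evaluate("km.s-1"))  # (Fraction(1000, 1), (1, 0, -1))
print(evaluate("/s"))      # (Fraction(1, 1), (0, 0, -1))
```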
> >>>>
> >>>> There are a number of implementations listed here
> >>>> https://unitsofmeasure.org/trac at 'Implementation Support'.
> >>>> This documentation has not been updated for about 3 years, so some
> >>>> of the links might be stale, and there may be others.
> >>>>
> >>>> A units-of-measure library, with UCUM support, that was available
> >>>> to be integrated into RDF applications would be a significant
> >>>> contribution to the community.
> >>>>
> >>>> Simon
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Hugh Glaser <hugh@glasers.org>
> >>>>> Sent: Friday, 24 July, 2020 08:58
> >>>>> To: Eric Prud'hommeaux <eric@w3.org>
> >>>>> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> >>>>> <semantic-web@w3.org>; Maxime Lefrançois
> >>>>> <maxime.lefrancois@emse.fr>
> >>>>> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
> >>>>> - existential variables?]
> >>>>>
> >>>>> If I understand correctly, I will need to add a UCUM parser to my
> >>>>> system to be able to process these datatypes if people send them to
> >>>>> me in their RDF?
> >>>>> In fact, I will need a UCUM to RDF converter to be able to
> >>>>> "understand"
> >>>>> properly what they "mean"?
> >>>>> Does such an animal exist?
> >>>>>
> >>>>> It looks to me that UCUM is quite a large vocabulary of units, for
> >>>>> a start - what would the URI for the "liter" unit of measurement
> >>>>> be, for example?
> >>>>>
> >>>>> I'm very happy to have widely adopted standards like this - I just
> >>>>> want to keep my Semantic Web processing in the Semantic Web (RDF),
> >>>>> and as simple as possible.
> >>>>> Or at least be helped to do that.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann
> >>>>>> wrote:
> >>>>>>> Regarding physical quantities, such as "5 inches", etc., my
> >>>>>>> colleague Maxime Lefrançois and myself coauthored a
> >>>>>>> specification for a datatype for physical quantities [1]. It is
> >>>>>>> quite simple: we reuse the Unified Code for Units of Measurement
> >>>>>>> (UCUM), a standard that is used in many scientific applications,
> >>>>>>> and combine it with a
> >>>>>>> number:
> >>>>>>>
> >>>>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >>>>>>> <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
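The two production rules above translate almost directly into a regular expression. The Python sketch below assumes the xsd:decimal lexical form from XML Schema and reads <SPACES> as one or more space characters. It captures the unit as an opaque token and leaves validating it as a UCUM code to a UCUM library. The names are hypothetical.

```python
import re

# <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
DECIMAL = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)"  # xsd:decimal lexical form
INTEGER = r"[+-]?\d+"                      # xsd:integer lexical form
NUMBER = rf"{DECIMAL}(?:[eE]{INTEGER})?"
# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>; <SPACES> is assumed to mean
# one or more space characters, and <UCUMCODE> any run of non-space characters.
QUANTITY = re.compile(rf"(?P<number>{NUMBER}) +(?P<unit>\S+)")


def parse_quantity(lexical: str) -> tuple[float, str]:
    """Split a cdt:ucum lexical form into (number, unit code)."""
    match = QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a valid quantity literal: {lexical!r}")
    # The unit is returned unchecked; validating it needs a UCUM library.
    return float(match.group("number")), match.group("unit")


print(parse_quantity("2e11 mm"))            # (200000000000.0, 'mm')
print(parse_quantity("2.7e3 m3/[acr_us]"))  # (2700.0, 'm3/[acr_us]')
```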
> >>>>>>>
> >>>>>>> Since UCUM has a well-defined semantics, so does our datatype.
> >>>>>>> Better, since UCUM is implemented in many programming languages,
> >>>>>>> my colleague Maxime could easily integrate it into Jena and its
> >>>>>>> SPARQL engine [2].
> >>>>>>> So, with our Jena fork, one can write:
> >>>>>>>
> >>>>>>> SELECT ?planet WHERE {
> >>>>>>>  ?planet a ex:Planet;
> >>>>>>>    ex:diameter ?s .
> >>>>>>>  FILTER(?s > "2e11 mm"^^cdt:ucum) }
> >>>>>> I applaud the work to extend XSD's numeric types so that RDF can
> >>>>>> have standard measurement types. But why not leverage your work by
> >>>>>> adding SPARQL support for UCUM types? e.g.
> >>>>>> SELECT ?planet WHERE {
> >>>>>>  ?planet a ex:Planet;
> >>>>>>    ex:diameter ?s .
> >>>>>>  FILTER(?s > "2e11"^^ucum:mm)
> >>>>>> }
> >>>>>>
> >>>>>> It feels cleaner to me to embed the entire type of the data in
> >>>>>> the literal's datatype rather than spreading it across an aggregator
> >>>>>> type (cdt:ucum) and the lexical value (" mm").
> >>>>>> In either case we probably have a union type in the lexical value,
> >>>>>> so we'd have to micro-parse doubles, decimals and integers, but the
> >>>>>> parsing is easier if the measurement unit is broken out into the
> >>>>>> end of the datatype URL.
> >>>>>> There are a few UCUM units that aren't viable local names (e.g.
> >>>>>> "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a
> >>>>>> way that still makes ucum: a practical namespace for datatypes.
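One caveat with a substitution such as "m_s.s" is that '_' already occurs in UCUM codes (e.g. [acr_us], [lk_us]), so the mapping cannot always be reversed. A hedged alternative is percent-encoding, which Turtle permits inside prefixed local names and does not decode; it is sketched in Python below. The namespace is the ucum: IRI used earlier in the thread, and the function names are hypothetical.

```python
from urllib.parse import quote, unquote

UCUM_NS = "http://ucum.nlm.nih.gov/#"  # the ucum: namespace used in this thread


def unit_to_datatype(code: str) -> str:
    """Percent-encode a UCUM code so it can serve as a local name.

    quote(..., safe="") keeps only letters, digits and '_.-~'; characters
    such as '/', '[', ']' and "'" become %XX escapes."""
    return UCUM_NS + quote(code, safe="")


def datatype_to_unit(iri: str) -> str:
    """Invert unit_to_datatype."""
    if not iri.startswith(UCUM_NS):
        raise ValueError(f"not in the ucum: namespace: {iri}")
    return unquote(iri[len(UCUM_NS):])


for code in ["m/s.s", "m3/[acr_us]", "[lk_us]"]:
    iri = unit_to_datatype(code)
    assert datatype_to_unit(iri) == code  # the mapping round-trips
    print(iri)
# http://ucum.nlm.nih.gov/#m%2Fs.s
# http://ucum.nlm.nih.gov/#m3%2F%5Bacr_us%5D
# http://ucum.nlm.nih.gov/#%5Blk_us%5D
```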
> >>>>>>> This works if the size of the planet is encoded as a cdt:ucum,
> >>>>>>> no matter what unit one is using. One can even use "link for
> >>>>>>> Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> >>>>>>> (unit
> >>>>>>> "m3/[acr_us]") [3], which are both units of length.
> >>>>>>>
> >>>>>>> With some of our industrial partners, we are using this for
> >>>>>>> energy data, and they seem to be very pleased with this
> >>>>>>> approach, compared to an ontology-based approach.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >>>>>>> [2] You can try it at
> >>>>>>> https://ci.mines-stetienne.fr/lindt/playground.html
> >>>>>>> [3] Try this query in the playground:
> >>>>>>>
> >>>>>>> """
> >>>>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>>>>>> PREFIX ex: <http://example.org/>
> >>>>>>>
> >>>>>>> SELECT ?position ?normalized
> >>>>>>>
> >>>>>>> WHERE{
> >>>>>>>
> >>>>>>>  VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }  # convert to meters
> >>>>>>>  BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> >>>>>>>
> >>>>>>> }
> >>>>>>> """
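The arithmetic behind footnote [3] can be reproduced with exact fractions. The Python sketch below builds the US survey units from the survey foot. It assumes the following definitions from UCUM's unit table, which are worth checking against ucum-essence.xml:

- [ft_us] = 1200/3937 m
- [rd_us] = 16.5 [ft_us]
- [ch_us] = 4 [rd_us]
- [lk_us] = 1/100 [ch_us]
- [acr_us] = 160 [rd_us]2

It does not call the playground. It only shows why m3/[acr_us] is a length and roughly what the normalised value should come to under these assumptions.

```python
from fractions import Fraction

# US survey units in metres, assumed from UCUM's unit table.
FT_US = Fraction(1200, 3937)     # [ft_us]
RD_US = Fraction(33, 2) * FT_US  # [rd_us] = 16.5 [ft_us]
CH_US = 4 * RD_US                # [ch_us] = 4 [rd_us], Gunter's chain
LK_US = CH_US / 100              # [lk_us] = 1/100 [ch_us], a length
ACR_US = 160 * RD_US ** 2        # [acr_us] = 160 [rd_us]2, an area in m2

# m3/[acr_us] is a volume divided by an area, so what is left is a length.
position_m = Fraction(2700) / ACR_US  # "2.7e3 m3/[acr_us]" in metres

print(float(LK_US))       # about 0.2012
print(float(position_m))  # about 0.6672
```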
> >>>>>>>
> >>>>>>> --AZ
> >>>>>>>
> >>>>>>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> >>>>>>>> Yeah, the atomicity of the chunk is the point. This even
> >>>>>>>> applies to quantities. 25.4mm is *identical* to 1” – they are
> >>>>>>>> the same thing.
> >>>>>>>> Any engine that operates with quantities needs to understand
> >>>>>>>> that: ‘25.4’ and ‘mm’ cannot be separated. Coordinates are slightly
> >>>>>>>> more complex but it comes down to the same thing. A single element
> >>>>>>>> within a set of coordinates that describes a position in space
> >>>>>>>> is not independent of the other numbers in the tuple, or of the
> >>>>>>>> coordinate reference system within which they are expressed.
> >>>>>>>> One value should *never* be used independently of the others.
> >>>>>>>> Exactly the same position on the earth will be denoted by three
> >>>>>>>> different numbers if embedded in a different coordinate reference
> >>>>>>>> system. You can only ‘reason’ over them as a group, not individually.
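The point that 25.4 mm and 1 inch are the same value can be mirrored in code by never separating the number from its unit. In the Python sketch below, equality and hashing are defined on the normalised length, so the two spellings collapse to one value. The class name and unit table are illustrative assumptions. Exact fractions are used so that binary floating-point rounding cannot break the comparison.

```python
from dataclasses import dataclass
from fractions import Fraction

# Metres per unit for a few UCUM length codes; [in_i] is exactly 2.54 cm.
METRES_PER_UNIT = {
    "m": Fraction(1),
    "mm": Fraction(1, 1000),
    "[in_i]": Fraction(254, 10000),
}


@dataclass(frozen=True, eq=False)
class Length:
    """A number and its unit kept together as one value."""
    value: Fraction
    unit: str

    def in_metres(self) -> Fraction:
        return self.value * METRES_PER_UNIT[self.unit]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Length):
            return NotImplemented
        return self.in_metres() == other.in_metres()

    def __hash__(self) -> int:
        return hash(self.in_metres())


inch = Length(Fraction(1), "[in_i]")
mm = Length(Fraction("25.4"), "mm")
print(inch == mm)       # True
print(len({inch, mm}))  # 1
```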
> >>>>>>>> *From:*Dan Brickley <danbri@danbri.org>
> >>>>>>>> *Sent:* Thursday, 16 July, 2020 23:58
> >>>>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>>>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>>>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> >>>>>>>> semantics
> >>>>>>>> - existential variables?]
> >>>>>>>>
> >>>>>>>> …
> >>>>>>>>
> >>>>>>>> I believe the big appeal of putting it all into the zone we
> >>>>>>>> call "literals" is that you get a kind of atomicity; that chunk
> >>>>>>>> of data is either there, or not there; it is asserted, or not
> >>>>>>>> asserted. With a triples-based (description of a) data
> >>>>>>>> structure you have to be constantly on your guard that every
> >>>>>>>> subset of the full graph pattern is at least sensible and
> >>>>>>>> harmless, even when subsetting these chunks is often confusing
> >>>>>>>> or misleading for data consumers. I can't help wondering
> >>>>>>>> whether notions of graph shapes from SHACL, ShEx (and
> >>>>>>>> SPARQL) could be exploited to create an RDF-based data format
> >>>>>>>> which had atomicity at the level of entire shapes.
> >>>>>>>>
> >>>>>>>> Dan
> >>>>>>>>
> >>>>>>>>    Jeen
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Antoine Zimmermann
> >>>>>>> Institut Henri Fayol
> >>>>>>> École des Mines de Saint-Étienne
> >>>>>>> 158 cours Fauriel
> >>>>>>> CS 62362
> >>>>>>> 42023 Saint-Étienne Cedex 2
> >>>>>>> France
> >>>>>>> Tél:+33(0)4 77 42 66 03
> >>>>>>> Fax:+33(0)4 77 42 66 66
> >>>>>>> http://www.emse.fr/~zimmermann/
> >>>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
> >>>>> --
> >>>>> Hugh
> >>>>> 023 8061 5652
> >>>>>

Received on Sunday, 26 July 2020 11:04:34 UTC