- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Fri, 24 Jul 2020 17:41:19 -0400
- To: Eric Prud'hommeaux <eric@w3.org>
- Cc: "Cox, Simon (L&W, Clayton)" <Simon.Cox@csiro.au>, Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Yes, adding this "planet" does not change the results. But it should!

The flaw in your query is its dependence on the data being entered in km. Entering the data only in some other unit (miles, meters, furlongs, centimeters) results in the query missing answers that it should be returning.

peter


On 7/24/20 2:21 PM, Eric Prud'hommeaux wrote:
> On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
>> But what happens with
>>
>> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi.
>>
>> Its diameter is more than 100000 kilometers.
>>
>>
>> It appears to me that your query is an unsafe shortcut itself.
> Adding that triple doesn't change my results 'cause it's checking for a `datatype(?d) = ucum:km`. I can add `wipe:HAT-P-67 ex:diameter "395200"^^ucum:km` and get:
> ┌──────────────────────────────────────────┐
> │ ?planet                                  │
> │ <https://en.wikipedia.org/wiki/saturn>   │
> │ <https://en.wikipedia.org/wiki/jupiter>  │
> │ <https://en.wikipedia.org/wiki/HAT-P-67> │
> └──────────────────────────────────────────┘
> That's the safety I was talking about. Am I missing a query vulnerability?
>
>
>> peter
>>
>>
>> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
>>> You'll need a microparsing regardless, and, as Simon points out, it's
>>> not that onerous. My point was just that having to microparse union
>>> types out of the same literal as a UCUM type is more complicated than
>>> parsing the UCUM type out of the literal's datatype. Having numeric
>>> types separated from their units would allow SPARQL 1.1 queries to
>>> avoid cracking the literal form, e.g.
>>>
>>> Data:
>>> [[
>>> PREFIX ex: <http://a.example/astro#>
>>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
>>> PREFIX wipe: <https://en.wikipedia.org/wiki/>
>>>
>>> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
>>> wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
>>> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
>>> ]]
>>>
>>> Query:
>>> [[
>>> PREFIX ex: <http://a.example/astro#>
>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
>>>
>>> SELECT ?planet WHERE {
>>>   ?planet ex:diameter ?d .
>>>   FILTER(datatype(?d) = ucum:km
>>>       && xsd:float(str(?d)) > 1E5)
>>> }
>>> ]]
>>>
>>> Results:
>>> ┌─────────────────────────────────────────┐
>>> │ ?planet                                 │
>>> │ <https://en.wikipedia.org/wiki/saturn>  │
>>> │ <https://en.wikipedia.org/wiki/jupiter> │
>>> └─────────────────────────────────────────┘
>>>
>>> Here, SPARQL takes care of parsing value types for us ("1.1e5" or
>>> "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
>>> easy to compose. The same query is substantially more tedious with
>>> data like:
>>>   wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
>>> which is likely to lead to unsafe shortcuts.
>>>
>>> So Maxime and Antoine, how hard would it be to transplant your
>>> semantics to apply directly to ucum (presuming their cooperation)?
>>>
>>>
>>> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton)
>>> wrote:
>>>> Yes you would need a UCUM parser.
>>>>
>>>> Note however that UCUM is not a "large vocabulary".
>>>> There is a relatively small set of terminals here
>>>> http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
>>>> these into a countably infinite set.
>>>> The rule is described here:
>>>> http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
>>>>
>>>> There are a number of implementations listed here
>>>> https://unitsofmeasure.org/trac at 'Implementation Support'.
>>>> This documentation has not been updated for about 3 years, so some
>>>> of the links might be stale, and there may be others.
>>>>
>>>> A units-of-measure library, with UCUM support, that was available
>>>> to be integrated into RDF applications would be a significant
>>>> contribution to the community.
>>>>
>>>> Simon
>>>>
>>>>> -----Original Message-----
>>>>> From: Hugh Glaser <hugh@glasers.org>
>>>>> Sent: Friday, 24 July, 2020 08:58
>>>>> To: Eric Prud'hommeaux <eric@w3.org>
>>>>> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
>>>>> <semantic-web@w3.org>; Maxime Lefrançois
>>>>> <maxime.lefrancois@emse.fr>
>>>>> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes
>>>>> semantics - existential variables?]
>>>>>
>>>>> If I understand correctly.
>>>>> I will need to add a UCUM parser to my system to be able to
>>>>> process these datatypes, if people send them to me in their RDF?
>>>>> In fact, I will need a UCUM to RDF converter to be able to
>>>>> "understand" properly what they "mean"?
>>>>> Does such an animal exist?
>>>>>
>>>>> It looks to me that UCUM is quite a large vocabulary of units,
>>>>> for a start - what would the URI for the "liter" unit of
>>>>> measurement be, for example?
>>>>>
>>>>> I'm very happy to have widely adopted standards like this - I
>>>>> just want to keep my Semantic Web processing in the Semantic Web
>>>>> (RDF), and as simple as possible.
>>>>> Or at least be helped to do that.
>>>>>
>>>>> Cheers
>>>>>
>>>>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org>
>>>>>> wrote:
>>>>>>
>>>>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann
>>>>>> wrote:
>>>>>>> Regarding physical quantities, such as "5 inches", etc., my
>>>>>>> colleague Maxime Lefrançois and myself coauthored a
>>>>>>> specification for a datatype for physical quantities [1]. It
>>>>>>> is quite simple: we reuse the Unified Code for Units of
>>>>>>> Measurement (UCUM), a standard that is used in many scientific
>>>>>>> applications, and combine it with a number:
>>>>>>>
>>>>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
>>>>>>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
>>>>>>>
>>>>>>> Since UCUM has a well defined semantics, so does our
>>>>>>> datatype.
>>>>>>> Better, since UCUM is implemented in many programming
>>>>>>> languages, my colleague Maxime could easily integrate it into
>>>>>>> Jena and its SPARQL engine [2].
>>>>>>> So, with our Jena fork, one can write:
>>>>>>>
>>>>>>> SELECT ?planet WHERE {
>>>>>>>   ?planet a ex:Planet;
>>>>>>>     ex:diameter ?s .
>>>>>>>   FILTER(?s > "2e11 mm"^^cdt:ucum)
>>>>>>> }
>>>>>> I applaud the work to extend XSD's numeric types so that RDF
>>>>>> can have standard measurement types. But why not leverage your
>>>>>> work by adding SPARQL support for UCUM types? e.g.
>>>>>> SELECT ?planet WHERE {
>>>>>>   ?planet a ex:Planet;
>>>>>>     ex:diameter ?s .
>>>>>>   FILTER(?s > "2e11"^^ucum:mm)
>>>>>> }
>>>>>>
>>>>>> It feels cleaner to me to embed the entire type of the data in
>>>>>> the literal's datatype rather than spreading it across an
>>>>>> aggregator type (cdt:ucum) and the lexical value (" mm").
>>>>>> In either case we probably have a union type in the lexical
>>>>>> value so we'd have to micro-parse doubles, decimals and
>>>>>> integers, but the parsing is easier if the measurement unit is
>>>>>> broken out into the end of the datatype URL.
>>>>>> There are a few UCUM units that aren't viable localnames (e.g.
>>>>>> "m/s.s"), but I think we can encode around that (e.g. "m_s.s")
>>>>>> in a way that still makes ucum: a practical namespace for
>>>>>> datatypes.
>>>>>>> This works if the size of the planet is encoded as a
>>>>>>> cdt:ucum, no matter what unit one is using. One can even use
>>>>>>> "link for Gunter's chain" (unit "[lk_us]"), or "cubic meters
>>>>>>> per acre" (unit "m3/[acr_us]") [3], which are both units of
>>>>>>> length.
>>>>>>>
>>>>>>> With some of our industrial partners, we are using this for
>>>>>>> energy data, and they seem to be very pleased with this
>>>>>>> approach, compared to an ontology-based approach.
>>>>>>>
>>>>>>>
>>>>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
>>>>>>> [2] You can try it at
>>>>>>> https://ci.mines-stetienne.fr/lindt/playground.html
>>>>>>> [3] Try this query in the playground:
>>>>>>>
>>>>>>> """
>>>>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
>>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>>>>>>> PREFIX ex: <http://example.org/>
>>>>>>>
>>>>>>> SELECT ?position ?normalized
>>>>>>>
>>>>>>> WHERE{
>>>>>>>
>>>>>>> VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
>>>>>>> # convert to meters
>>>>>>> BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
>>>>>>>
>>>>>>> }
>>>>>>> """
>>>>>>>
>>>>>>> --AZ
>>>>>>>
>>>>>>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
>>>>>>>> Yeah, the atomicity of the chunk is the point. This even
>>>>>>>> applies to quantities. 25.4mm is *identical* to 1” – they
>>>>>>>> are the same thing. Any engine that operates with
>>>>>>>> quantities needs to understand that. ‘25.4’ and ‘mm’ cannot
>>>>>>>> be separated. Coordinates are slightly more complex but it
>>>>>>>> comes down to the same thing. A single element within a set
>>>>>>>> of coordinates that describes a position in space is not
>>>>>>>> independent of the other numbers in the tuple, or of the
>>>>>>>> coordinate reference system within which they are
>>>>>>>> expressed. One value should *never* be used independent of
>>>>>>>> the others. Exactly the same position on the earth will be
>>>>>>>> denoted by three different numbers if embedded in a
>>>>>>>> different coordinate reference system. You can only
>>>>>>>> ‘reason’ over them as a group, not individually.
>>>>>>>>
>>>>>>>> *From:* Dan Brickley <danbri@danbri.org>
>>>>>>>> *Sent:* Thursday, 16 July, 2020 23:58
>>>>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
>>>>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
>>>>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
>>>>>>>> semantics - existential variables?]
>>>>>>>>
>>>>>>>> …
>>>>>>>>
>>>>>>>> I believe the big appeal of putting it all into the zone we
>>>>>>>> call "literals" is that you get a kind of atomicity; that
>>>>>>>> chunk of data is either there, or not there; it is asserted,
>>>>>>>> or not asserted. With a triples-based (description of a)
>>>>>>>> data structure you have to be constantly on your guard that
>>>>>>>> every subset of the full graph pattern is at least sensible
>>>>>>>> and harmless, even when subsetting these chunks is often
>>>>>>>> confusing or misleading for data consumers. I can't help
>>>>>>>> wondering whether notions of graph shapes from shacl, shex
>>>>>>>> (and sparql) could be exploited to create an RDF-based data
>>>>>>>> format which had atomicity at the level of entire shapes.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> Jeen
>>>>>>>>
>>>>>>> --
>>>>>>> Antoine Zimmermann
>>>>>>> Institut Henri Fayol
>>>>>>> École des Mines de Saint-Étienne
>>>>>>> 158 cours Fauriel
>>>>>>> CS 62362
>>>>>>> 42023 Saint-Étienne Cedex 2
>>>>>>> France
>>>>>>> Tél:+33(0)4 77 42 66 03
>>>>>>> Fax:+33(0)4 77 42 66 66
>>>>>>> http://www.emse.fr/~zimmermann/
>>>>>>> Member of team Connected Intelligence, Laboratoire Hubert
>>>>>>> Curien
>>>>> --
>>>>> Hugh
>>>>> 023 8061 5652
>>>>>
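
The objection at the top of this message (a filter on `datatype(?d) = ucum:km` silently skips values entered only in another unit) can be made concrete with a small sketch. It is written in Python rather than SPARQL, and everything in it is an illustrative assumption rather than part of anyone's proposal in the thread: the tuple layout, the function names `km_only` and `unit_aware`, and the `KM_PER_UNIT` table, which covers only `km` and `mi` and takes `mi` to be the international mile (1.609344 km). The data are the triples quoted in the thread plus the HAT-P-67 triple, whose diameter is given in miles only.

```python
# Illustrative sketch only: plain tuples stand in for RDF literals.
# Each entry is (planet, lexical form, unit from the datatype's local name).
DATA = [
    ("mercury", "4879.4", "km"), ("mercury", "3031.9", "mi"),
    ("saturn", "1.1e5", "km"), ("saturn", "72367", "mi"),
    ("jupiter", "139822", "km"), ("jupiter", "86881", "mi"),
    ("HAT-P-67", "190000", "mi"),  # diameter entered in miles only
]

# Assumed conversion table (hypothetical, not a UCUM implementation).
KM_PER_UNIT = {"km": 1.0, "mi": 1.609344}

THRESHOLD_KM = 1e5


def km_only(data):
    """Mimic FILTER(datatype(?d) = ucum:km && xsd:float(str(?d)) > 1E5)."""
    return {planet for planet, lexical, unit in data
            if unit == "km" and float(lexical) > THRESHOLD_KM}


def unit_aware(data):
    """Convert each value to km before comparing; skip unknown units."""
    return {planet for planet, lexical, unit in data
            if unit in KM_PER_UNIT
            and float(lexical) * KM_PER_UNIT[unit] > THRESHOLD_KM}


if __name__ == "__main__":
    print(sorted(km_only(DATA)))
    print(sorted(unit_aware(DATA)))
```

Under these assumptions `km_only` returns saturn and jupiter, while `unit_aware` also returns HAT-P-67, because 190000 mi is roughly 305775 km. A real system would need the full UCUM conversion rules rather than a two-entry table.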
Received on Friday, 24 July 2020 21:41:53 UTC