- From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
- Date: Sun, 26 Jul 2020 11:04:00 +0000
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Eric Prud'hommeaux <eric@w3.org>
- CC: Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Indeed, quantities must have the same 'dimension' to be comparable across different units (here: L1 i.e. length, or A0E0L1I0M0H0T0D0 in complete form).

Down the rabbit hole ...

> -----Original Message-----
> From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
> Sent: Saturday, 25 July, 2020 07:41
> To: Eric Prud'hommeaux <eric@w3.org>
> Cc: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>; Hugh Glaser
> <hugh@glasers.org>; Antoine Zimmermann
> <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>;
> Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> existential variables?]
>
> Yes, adding this "planet" does not change the results. But it should!
>
> The flaw in your query is its dependence on entering the data in
> km. Entering the data in any other measure only (mile, meters, furlongs,
> centimeters) results in the query missing answers that it should be returning.
>
>
> peter
>
>
>
> On 7/24/20 2:21 PM, Eric Prud'hommeaux wrote:
> > On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
> >> But what happens with
> >>
> >> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi.
> >>
> >> Its diameter is more than 100000 kilometers.
> >>
> >>
> >> It appears to me that your query is an unsafe shortcut itself.
> > Adding that triple doesn't change my results 'cause it's checking for a
> > `datatype(?d) = ucum:km`. I can add `wipe:HAT-P-67 ex:diameter
> > "395200"^^ucum:km` and get:
> > ┌──────────────────────────────────────────┐
> > │ ?planet                                  │
> > │ <https://en.wikipedia.org/wiki/saturn>   │
> > │ <https://en.wikipedia.org/wiki/jupiter>  │
> > │ <https://en.wikipedia.org/wiki/HAT-P-67> │
> > └──────────────────────────────────────────┘
> > That's the safety I was talking about. Am I missing a query vulnerability?
> >
> >
> >> peter
> >>
> >>
> >> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> >>> You'll need a microparsing regardless, and, as Simon points out,
> >>> it's not that onerous. My point was just that having to microparse
> >>> union types out of the same literal as a UCUM type is more
> >>> complicated than parsing the UCUM type out of the literal's
> >>> datatype. Having numeric types separated from their units would
> >>> allow SPARQL 1.1 queries to avoid cracking the literal form, e.g.
> >>>
> >>> Data:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>> PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >>>
> >>> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> >>> wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> >>> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> >>> ]]
> >>>
> >>> Query:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>>
> >>> SELECT ?planet WHERE {
> >>>   ?planet ex:diameter ?d .
> >>>   FILTER(datatype(?d) = ucum:km
> >>>          && xsd:float(str(?d)) > 1E5)
> >>> }
> >>> ]]
> >>>
> >>> Results:
> >>> ┌─────────────────────────────────────────┐
> >>> │ ?planet                                 │
> >>> │ <https://en.wikipedia.org/wiki/saturn>  │
> >>> │ <https://en.wikipedia.org/wiki/jupiter> │
> >>> └─────────────────────────────────────────┘
> >>>
> >>> Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> >>> "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> >>> easy to compose. The same query is substantially more tedious with
> >>> data like:
> >>> wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> >>> which is likely to lead to unsafe shortcuts.
> >>>
> >>> So Maxime and Antoine, how hard would it be to transplant your
> >>> semantics to apply directly to ucum (presuming their cooperation)?
> >>>
> >>>
> >>> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> >>>> Yes you would need a UCUM parser.
> >>>>
> >>>> Note however that UCUM is not a "large vocabulary".
> >>>> There is a relatively small set of terminals here
> >>>> http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> >>>> these into a countably infinite set.
> >>>> The rule is described here:
> >>>> http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> >>>>
> >>>> There are a number of implementations listed here
> >>>> https://unitsofmeasure.org/trac at 'Implementation Support'.
> >>>> This documentation has not been updated for about 3 years, so some
> >>>> of the links might be stale, and there may be others.
> >>>>
> >>>> A units-of-measure library, with UCUM support, that was available
> >>>> to be integrated into RDF applications would be a significant
> >>>> contribution to the community.
> >>>>
> >>>> Simon
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Hugh Glaser <hugh@glasers.org>
> >>>>> Sent: Friday, 24 July, 2020 08:58
> >>>>> To: Eric Prud'hommeaux <eric@w3.org>
> >>>>> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> >>>>> <semantic-web@w3.org>; Maxime Lefrançois
> >>>>> <maxime.lefrancois@emse.fr>
> >>>>> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
> >>>>> - existential variables?]
> >>>>>
> >>>>> If I understand correctly.
> >>>>> I will need to add a UCUM parser to my system to be able to
> >>>>> process these datatypes, if people send them to me in their RDF?
> >>>>> In fact, I will need a UCUM to RDF converter to be able to
> >>>>> "understand" properly what they "mean"?
> >>>>> Does such an animal exist?
> >>>>>
> >>>>> It looks to me that UCUM is quite a large vocabulary of units, for
> >>>>> a start - what would the URI for the "liter" unit of measurement
> >>>>> be, for example?
> >>>>>
> >>>>> I'm very happy to have widely adopted standards like this - I just
> >>>>> want to keep my Semantic Web processing in the Semantic Web (RDF),
> >>>>> and as simple as possible.
> >>>>> Or at least be helped to do that.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> >>>>>>> Regarding physical quantities, such as "5 inches", etc., my
> >>>>>>> colleague Maxime Lefrançois and myself coauthored a
> >>>>>>> specification for a datatype for physical quantities [1]. It is
> >>>>>>> quite simple: we reuse the Unified Code for Units of Measurement
> >>>>>>> (UCUM), a standard that is used in many scientific applications,
> >>>>>>> and combine it with a number:
> >>>>>>>
> >>>>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >>>>>>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> >>>>>>>
> >>>>>>> Since UCUM has a well defined semantics, so does our datatype.
> >>>>>>> Better, since UCUM is implemented in many programming languages,
> >>>>>>> my colleague Maxime could easily integrate it into Jena and its
> >>>>>>> SPARQL engine [2].
> >>>>>>> So, with our Jena fork, one can write:
> >>>>>>>
> >>>>>>> SELECT ?planet WHERE {
> >>>>>>>   ?planet a ex:Planet;
> >>>>>>>     ex:diameter ?s .
> >>>>>>>   FILTER(?s > "2e11 mm"^^cdt:ucum) }
> >>>>>> I applaud the work to extend XSD's numeric types so that RDF can
> >>>>>> have standard measurement types. But why not leverage your work by
> >>>>>> adding SPARQL support for UCUM types? e.g.
> >>>>>> SELECT ?planet WHERE {
> >>>>>>   ?planet a ex:Planet;
> >>>>>>     ex:diameter ?s .
> >>>>>>   FILTER(?s > "2e11"^^ucum:mm)
> >>>>>> }
> >>>>>>
> >>>>>> It feels cleaner to me to embed the entire type of the data in
> >>>>>> the literal's datatype rather than spreading it across an
> >>>>>> aggregator type (cdt:ucum) and the lexical value (" mm").
> >>>>>> In either case we probably have a union type in the lexical value
> >>>>>> so we'd have to micro-parse doubles, decimals and integers, but the
> >>>>>> parsing is easier if the measurement unit is broken out into the
> >>>>>> end of the datatype URL.
> >>>>>> There are a few UCUM units that aren't viable localnames (e.g.
> >>>>>> "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in
> >>>>>> a way that still makes ucum: a practical namespace for datatypes.
> >>>>>>> This works if the size of the planet is encoded as a cdt:ucum,
> >>>>>>> no matter what unit one is using. One can even use "link for
> >>>>>>> Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> >>>>>>> (unit "m3/[acr_us]") [3], which are both units of length.
> >>>>>>>
> >>>>>>> With some of our industrial partners, we are using this for
> >>>>>>> energy data, and they seem to be very pleased with this
> >>>>>>> approach, compared to an ontology-based approach.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >>>>>>> [2] You can try it at
> >>>>>>> https://ci.mines-stetienne.fr/lindt/playground.html
> >>>>>>> [3] Try this query in the playground:
> >>>>>>>
> >>>>>>> """
> >>>>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>>>>>> PREFIX ex: <http://example.org/>
> >>>>>>>
> >>>>>>> SELECT ?length ?normalized
> >>>>>>>
> >>>>>>> WHERE{
> >>>>>>>
> >>>>>>> VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> >>>>>>> # convert to meters
> >>>>>>> BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> >>>>>>>
> >>>>>>> }
> >>>>>>> """
> >>>>>>>
> >>>>>>> --AZ
> >>>>>>>
> >>>>>>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> >>>>>>>> Yeah, the atomicity of the chunk is the point. This even
> >>>>>>>> applies to quantities. 25.4mm is *identical* to 1” – they are
> >>>>>>>> the same thing.
> >>>>>>>> Any engine that operates with quantities needs to understand
> >>>>>>>> that. ’25.4’ and ‘mm’ cannot be separated. Coordinates are
> >>>>>>>> slightly more complex but it comes down to the same thing. A
> >>>>>>>> single element within a set of coordinates that describes a
> >>>>>>>> position in space is not independent of the other numbers in
> >>>>>>>> the tuple, or of the coordinate reference system within which
> >>>>>>>> they are expressed. One value should *never* be used
> >>>>>>>> independent of the others. Exactly the same position on the
> >>>>>>>> earth will be denoted by three different numbers if embedded
> >>>>>>>> in a different coordinate reference system. You can only
> >>>>>>>> ‘reason’ over them as a group, not individually.
> >>>>>>>> *From:* Dan Brickley <danbri@danbri.org>
> >>>>>>>> *Sent:* Thursday, 16 July, 2020 23:58
> >>>>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>>>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>>>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> >>>>>>>> semantics - existential variables?]
> >>>>>>>>
> >>>>>>>> …
> >>>>>>>>
> >>>>>>>> I believe the big appeal of putting it all into the zone we
> >>>>>>>> call "literals" is that you get a kind of atomicity; that chunk
> >>>>>>>> of data is either there, or not there; it is asserted, or not
> >>>>>>>> asserted. With a triples-based (description of a) data
> >>>>>>>> structure you have to be constantly on your guard that every
> >>>>>>>> subset of the full graph pattern is at least sensible and
> >>>>>>>> harmless, even when subsetting these chunks is often confusing
> >>>>>>>> or misleading for data consumers. I can't help wondering
> >>>>>>>> whether notions of graph shapes from shacl, shex (and
> >>>>>>>> sparql) could be exploited to create an RDF-based data format
> >>>>>>>> which had atomicity at the level of entire shapes.
> >>>>>>>>
> >>>>>>>> Dan
> >>>>>>>>
> >>>>>>>> Jeen
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Antoine Zimmermann
> >>>>>>> Institut Henri Fayol
> >>>>>>> École des Mines de Saint-Étienne
> >>>>>>> 158 cours Fauriel
> >>>>>>> CS 62362
> >>>>>>> 42023 Saint-Étienne Cedex 2
> >>>>>>> France
> >>>>>>> Tel:+33(0)4 77 42 66 03
> >>>>>>> Fax:+33(0)4 77 42 66 66
> >>>>>>> http://www.emse.fr/~zimmermann/
> >>>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
> >>>>> --
> >>>>> Hugh
> >>>>> 023 8061 5652
> >>>>>
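
For what it's worth, the dimension check and the unit-independent comparison argued about in this thread can be sketched in a few lines. This is Python rather than SPARQL so that it runs standalone, and it is only an illustration: the `UNITS` table, the `Quantity` class and the `parse_quantity` and `greater_than` helpers are hypothetical names, not part of UCUM, Jena or the cdt:ucum datatype. The table covers a handful of units, "mi" is assumed to mean the international mile (1609.344 m), and the dimension labels "L1" and "T1" are shorthand. A real system would delegate to a proper UCUM library.

```python
from dataclasses import dataclass

# Hypothetical mini unit table: symbol -> (dimension label, factor to the base unit).
# "L1" is length (base unit: metre), "T1" is time (base unit: second).
# Assumption: "mi" is the international mile.
UNITS = {
    "mm": ("L1", 0.001),
    "m": ("L1", 1.0),
    "km": ("L1", 1000.0),
    "mi": ("L1", 1609.344),
    "s": ("T1", 1.0),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    @property
    def dimension(self) -> str:
        return UNITS[self.unit][0]

    def to_base(self) -> float:
        """Value expressed in the base unit of this quantity's dimension."""
        return self.value * UNITS[self.unit][1]


def parse_quantity(lexical: str) -> Quantity:
    """Micro-parse a '<number> <unit>' lexical form such as '2e11 mm'."""
    number, unit = lexical.split()
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit}")
    return Quantity(float(number), unit)


def greater_than(a: Quantity, b: Quantity) -> bool:
    """Compare two quantities, refusing to compare across dimensions."""
    if a.dimension != b.dimension:
        raise ValueError(f"cannot compare {a.dimension} with {b.dimension}")
    return a.to_base() > b.to_base()


# Diameters entered in whatever unit was at hand, as in the thread.
diameters = {
    "mercury": parse_quantity("4879.4 km"),
    "saturn": parse_quantity("72367 mi"),
    "HAT-P-67": parse_quantity("190000 mi"),
}
threshold = parse_quantity("1e5 km")

big = sorted(name for name, d in diameters.items() if greater_than(d, threshold))
print(big)  # ['HAT-P-67', 'saturn']
```

With this, the 190000 mi diameter is found even though the threshold is given in km, and comparing a length with a time fails loudly instead of silently returning nothing.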
Received on Sunday, 26 July 2020 11:04:34 UTC