- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sat, 25 Jul 2020 00:09:36 +0200
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
- Cc: "Cox, Simon (L&W, Clayton)" <Simon.Cox@csiro.au>, Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
I think you're arguing that SPARQL (and OWL, and DL-Query, ...) should support measurement types. I don't think this argument favors either of `"1.1e5 km"^^cdt:ucum` or `"1.1e5"^^ucum:km` over the other. Long live blank nodes.

On Fri, Jul 24, 2020 at 05:41:19PM -0400, Peter F. Patel-Schneider wrote:
> Yes, adding this "planet" does not change the results. But it should!
>
> The flaw in your query is its dependence on entering the data in km. Entering
> the data in any other measure only (mile, meters, furlongs, centimeters)
> results in the query missing answers that it should be returning.
>
> peter
>
> On 7/24/20 2:21 PM, Eric Prud'hommeaux wrote:
> > On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
> >> But what happens with
> >>
> >> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi.
> >>
> >> Its diameter is more than 100000 kilometers.
> >>
> >> It appears to me that your query is an unsafe shortcut itself.
> > Adding that triple doesn't change my results 'cause it's checking for a `datatype(?d) = ucum:km`. I can add `wipe:HAT-P-67 ex:diameter "395200"^^ucum:km` and get:
> > ┌──────────────────────────────────────────┐
> > │ ?planet                                  │
> > │ <https://en.wikipedia.org/wiki/saturn>   │
> > │ <https://en.wikipedia.org/wiki/jupiter>  │
> > │ <https://en.wikipedia.org/wiki/HAT-P-67> │
> > └──────────────────────────────────────────┘
> > That's the safety I was talking about. Am I missing a query vulnerability?
> >
> >> peter
> >>
> >> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> >>> You'll need a microparsing regardless, and, as Simon points out, it's
> >>> not that onerous. My point was just that having to microparse union
> >>> types out of the same literal as a UCUM type is more complicated than
> >>> parsing the UCUM type out of the literal's datatype. Having numeric
> >>> types separated from their units would allow SPARQL 1.1 queries to
> >>> avoid cracking the literal form, e.g.
> >>>
> >>> Data:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>> PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >>>
> >>> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> >>> wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> >>> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> >>> ]]
> >>>
> >>> Query:
> >>> [[
> >>> PREFIX ex: <http://a.example/astro#>
> >>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >>>
> >>> SELECT ?planet WHERE {
> >>>   ?planet ex:diameter ?d .
> >>>   FILTER(datatype(?d) = ucum:km
> >>>       && xsd:float(str(?d)) > 1E5)
> >>> }
> >>> ]]
> >>>
> >>> Results:
> >>> ┌─────────────────────────────────────────┐
> >>> │ ?planet                                 │
> >>> │ <https://en.wikipedia.org/wiki/saturn>  │
> >>> │ <https://en.wikipedia.org/wiki/jupiter> │
> >>> └─────────────────────────────────────────┘
> >>>
> >>> Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> >>> "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> >>> easy to compose. The same query is substantially more tedious with
> >>> data like:
> >>>   wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> >>> which is likely to lead to unsafe shortcuts.
> >>>
> >>> So Maxime and Antoine, how hard would it be to transplant your
> >>> semantics to apply directly to ucum (presuming their cooperation)?
> >>>
> >>> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> >>>> Yes you would need a UCUM parser.
> >>>>
> >>>> Note however that UCUM is not a "large vocabulary".
> >>>> There is a relatively small set of terminals here
> >>>> http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> >>>> these into a countably infinite set.
> >>>> The rule is described here:
> >>>> http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> >>>>
> >>>> There are a number of implementations listed here
> >>>> https://unitsofmeasure.org/trac at 'Implementation Support'.
> >>>> This documentation has not been updated for about 3 years, so some
> >>>> of the links might be stale, and there may be others.
> >>>>
> >>>> A units-of-measure library, with UCUM support, that was available
> >>>> to be integrated into RDF applications would be a significant
> >>>> contribution to the community.
> >>>>
> >>>> Simon
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Hugh Glaser <hugh@glasers.org>
> >>>>> Sent: Friday, 24 July, 2020 08:58
> >>>>> To: Eric Prud'hommeaux <eric@w3.org>
> >>>>> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> >>>>> <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> >>>>> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> >>>>> existential variables?]
> >>>>>
> >>>>> If I understand correctly.
> >>>>> I will need to add a UCUM parser to my system to be able to process
> >>>>> these datatypes, if people send them to me in their RDF?
> >>>>> In fact, I will need a UCUM to RDF converter to be able to
> >>>>> "understand" properly what they "mean"?
> >>>>> Does such an animal exist?
> >>>>>
> >>>>> It looks to me that UCUM is quite a large vocabulary of units, for a
> >>>>> start - what would the URI for the "liter" unit of measurement be,
> >>>>> for example?
> >>>>>
> >>>>> I'm very happy to have widely adopted standards like this - I just
> >>>>> want to keep my Semantic Web processing in the Semantic Web (RDF),
> >>>>> and as simple as possible.
> >>>>> Or at least be helped to do that.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> >>>>>>> Regarding physical quantities, such as "5 inches", etc., my
> >>>>>>> colleague Maxime Lefrançois and myself coauthored a specification
> >>>>>>> for a datatype for physical quantities [1]. It is quite simple: we
> >>>>>>> reuse the Unified Code for Units of Measurement (UCUM), a standard
> >>>>>>> that is used in many scientific applications, and combine it with
> >>>>>>> a number:
> >>>>>>>
> >>>>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >>>>>>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> >>>>>>>
> >>>>>>> Since UCUM has a well defined semantics, so does our datatype.
> >>>>>>> Better, since UCUM is implemented in many programming languages,
> >>>>>>> my colleague Maxime could easily integrate it into Jena and its
> >>>>>>> SPARQL engine [2].
> >>>>>>> So, with our Jena fork, one can write:
> >>>>>>>
> >>>>>>> SELECT ?planet WHERE {
> >>>>>>>   ?planet a ex:Planet;
> >>>>>>>     ex:diameter ?s .
> >>>>>>>   FILTER(?s > "2e11 mm"^^cdt:ucum)
> >>>>>>> }
> >>>>>> I applaud the work to extend XSD's numeric types so that RDF can
> >>>>>> have standard measurement types. But why not leverage your work by
> >>>>>> adding SPARQL support for UCUM types? e.g.
> >>>>>> SELECT ?planet WHERE {
> >>>>>>   ?planet a ex:Planet;
> >>>>>>     ex:diameter ?s .
> >>>>>>   FILTER(?s > "2e11"^^ucum:mm)
> >>>>>> }
> >>>>>>
> >>>>>> It feels cleaner to me to embed the entire type of the data in the
> >>>>>> literal's datatype rather than spreading it across an aggregator
> >>>>>> type (cdt:ucum) and the lexical value (" mm").
> >>>>>> In either case we probably have a union type in the lexical value
> >>>>>> so we'd have to micro-parse doubles, decimals and integers, but the
> >>>>>> parsing is easier if the measurement unit is broken out into the
> >>>>>> end of the datatype URL.
> >>>>>> There are a few UCUM units that aren't viable localnames (e.g.
> >>>>>> "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a
> >>>>>> way that still makes ucum: a practical namespace for datatypes.
> >>>>>>> This works if the size of the planet is encoded as a cdt:ucum, no
> >>>>>>> matter what unit one is using. One can even use "link for Gunter's
> >>>>>>> chain" (unit "[lk_us]"), or "cubic meters per acre" (unit
> >>>>>>> "m3/[acr_us]") [3], which are both units of length.
> >>>>>>>
> >>>>>>> With some of our industrial partners, we are using this for energy
> >>>>>>> data, and they seem to be very pleased with this approach,
> >>>>>>> compared to an ontology-based approach.
> >>>>>>>
> >>>>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >>>>>>> [2] You can try it at
> >>>>>>> https://ci.mines-stetienne.fr/lindt/playground.html
> >>>>>>> [3] Try this query in the playground:
> >>>>>>>
> >>>>>>> """
> >>>>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>>>>>> PREFIX ex: <http://example.org/>
> >>>>>>>
> >>>>>>> SELECT ?length ?normalized
> >>>>>>>
> >>>>>>> WHERE{
> >>>>>>>
> >>>>>>>   VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum } # convert to meters
> >>>>>>>   BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> >>>>>>>
> >>>>>>> }
> >>>>>>> """
> >>>>>>>
> >>>>>>> --AZ
> >>>>>>>
> >>>>>>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> >>>>>>>> Yeah, the atomicity of the chunk is the point.
> >>>>>>>> This even applies to quantities. 25.4mm is *identical* to 1” –
> >>>>>>>> they are the same thing. Any engine that operates with quantities
> >>>>>>>> needs to understand that. ‘25.4’ and ‘mm’ cannot be separated.
> >>>>>>>> Coordinates are slightly more complex but it comes down to the
> >>>>>>>> same thing. A single element within a set of coordinates that
> >>>>>>>> describes a position in space is not independent of the other
> >>>>>>>> numbers in the tuple, or of the coordinate reference system
> >>>>>>>> within which they are expressed. One value should *never* be used
> >>>>>>>> independent of the others. Exactly the same position on the earth
> >>>>>>>> will be denoted by three different numbers if embedded in a
> >>>>>>>> different coordinate reference system. You can only ‘reason’ over
> >>>>>>>> them as a group, not individually.
> >>>>>>>>
> >>>>>>>> *From:* Dan Brickley <danbri@danbri.org>
> >>>>>>>> *Sent:* Thursday, 16 July, 2020 23:58
> >>>>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>>>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>>>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> >>>>>>>> semantics - existential variables?]
> >>>>>>>>
> >>>>>>>> …
> >>>>>>>>
> >>>>>>>> I believe the big appeal of putting it all into the zone we call
> >>>>>>>> "literals" is that you get a kind of atomicity; that chunk of
> >>>>>>>> data is either there, or not there; it is asserted, or not
> >>>>>>>> asserted. With a triples-based (description of a) data structure
> >>>>>>>> you have to be constantly on your guard that every subset of the
> >>>>>>>> full graph pattern is at least sensible and harmless, even when
> >>>>>>>> subsetting these chunks is often confusing or misleading for data
> >>>>>>>> consumers.
> >>>>>>>> I can't help wondering whether notions of graph shapes from
> >>>>>>>> shacl, shex (and sparql) could be exploited to create an
> >>>>>>>> RDF-based data format which had atomicity at the level of entire
> >>>>>>>> shapes.
> >>>>>>>>
> >>>>>>>> Dan
> >>>>>>>>
> >>>>>>>> Jeen
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Antoine Zimmermann
> >>>>>>> Institut Henri Fayol
> >>>>>>> École des Mines de Saint-Étienne
> >>>>>>> 158 cours Fauriel
> >>>>>>> CS 62362
> >>>>>>> 42023 Saint-Étienne Cedex 2
> >>>>>>> France
> >>>>>>> Tél:+33(0)4 77 42 66 03
> >>>>>>> Fax:+33(0)4 77 42 66 66
> >>>>>>> http://www.emse.fr/~zimmermann/
> >>>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
> >>>>> --
> >>>>> Hugh
> >>>>> 023 8061 5652
> >>>>>
Received on Friday, 24 July 2020 22:09:49 UTC
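
To make the disagreement in this thread concrete, here is a minimal Python sketch (not SPARQL, and not anyone's actual implementation) of what a unit-aware comparison has to do under either encoding: split the literal into a number and a unit, normalize to one unit, then compare. The conversion table, the function names and the sample rows are assumptions made for illustration. The unit names "km" and "mi" follow the shorthand used in the thread rather than verified UCUM codes, and a real system would delegate to a UCUM library instead of a hand-written table.

```python
import re
from decimal import Decimal

# Hypothetical conversion table: kilometres per unit. Illustrative only;
# a real system would ask a UCUM library for these factors.
KM_PER_UNIT = {
    "km": Decimal("1"),
    "m": Decimal("0.001"),
    "mm": Decimal("0.000001"),
    "mi": Decimal("1.609344"),  # international mile, thread shorthand
}

# Rough approximation of <NUMBER> <SPACES> <UCUMCODE> from the quoted grammar.
CDT_UCUM = re.compile(r"^([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)\s+(\S+)$")


def split_cdt(lexical):
    """Microparse a cdt:ucum-style lexical form such as '1.1e5 km'."""
    match = CDT_UCUM.match(lexical)
    if match is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> form: {lexical!r}")
    return Decimal(match.group(1)), match.group(2)


def split_typed(lexical, unit_local_name):
    """Unit-in-datatype style such as "1.1e5"^^ucum:km: only the number is parsed."""
    return Decimal(lexical), unit_local_name


def to_km(value, unit):
    """Normalize a (value, unit) pair to kilometres using the table above."""
    return value * KM_PER_UNIT[unit]


# Sample rows: (planet, lexical form, datatype local name), taken from the thread.
DIAMETERS = [
    ("mercury", "4879.4", "km"),
    ("saturn", "1.1e5", "km"),
    ("jupiter", "139822", "km"),
    ("HAT-P-67", "190000", "mi"),
]


def larger_than(threshold_km):
    """Return planets whose diameter exceeds the threshold, whatever unit was entered."""
    return [
        planet
        for planet, lexical, unit in DIAMETERS
        if to_km(*split_typed(lexical, unit)) > threshold_km
    ]


if __name__ == "__main__":
    print(larger_than(Decimal("1e5")))
    print(to_km(*split_cdt("72367 mi")))
```

With normalization in place, the diameter entered only in miles is no longer missed, which is the behaviour Peter asks for. The two `split_*` helpers differ only in where the unit comes from, which is Eric's point that neither encoding is favored by that argument.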