- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 24 Jul 2020 20:21:46 +0200
- To: Peter Patel-Schneider <pfpschneider@gmail.com>
- Cc: "Cox, Simon (L&W, Clayton)" <Simon.Cox@csiro.au>, Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
On Fri, Jul 24, 2020 at 01:13:45PM -0400, Peter Patel-Schneider wrote:
> But what happens with
>
> wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi.
>
> Its diameter is more than 100000 kilometers.
>
>
> It appears to me that your query is an unsafe shortcut itself.

Adding that triple doesn't change my results 'cause it's checking for
a `datatype(?d) = ucum:km`. I can add
`wipe:HAT-P-67 ex:diameter "395200"^^ucum:km` and get:
┌──────────────────────────────────────────┐
│ ?planet                                  │
│ <https://en.wikipedia.org/wiki/saturn>   │
│ <https://en.wikipedia.org/wiki/jupiter>  │
│ <https://en.wikipedia.org/wiki/HAT-P-67> │
└──────────────────────────────────────────┘

That's the safety I was talking about. Am I missing a query
vulnerability?


> peter
>
>
> On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> > You'll need a microparsing regardless, and, as Simon points out, it's
> > not that onerous. My point was just that having to microparse union
> > types out of the same literal as a UCUM type is more complicated than
> > parsing the UCUM type out of the literal's datatype. Having numeric
> > types separated from their units would allow SPARQL 1.1 queries to
> > avoid cracking the literal form, e.g.
> >
> > Data:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> > PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >
> > wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> > wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> > wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> > ]]
> >
> > Query:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >
> > SELECT ?planet WHERE {
> >   ?planet ex:diameter ?d .
> >   FILTER(datatype(?d) = ucum:km
> >          && xsd:float(str(?d)) > 1E5)
> > }
> > ]]
> >
> > Results:
> > ┌─────────────────────────────────────────┐
> > │ ?planet                                 │
> > │ <https://en.wikipedia.org/wiki/saturn>  │
> > │ <https://en.wikipedia.org/wiki/jupiter> │
> > └─────────────────────────────────────────┘
> >
> > Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> > "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> > easy to compose. The same query is substantially more tedious with
> > data like:
> >   wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> > which is likely to lead to unsafe shortcuts.
> >
> > So Maxime and Antoine, how hard would it be to transplant your
> > semantics to apply directly to ucum (presuming their cooperation)?
> >
> >
> > On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> > > Yes you would need a UCUM parser.
> > >
> > > Note however that UCUM is not a "large vocabulary".
> > > There is a relatively small set of terminals here
> > > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> > > these into a countably infinite set.
> > > The rule is described here:
> > > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> > >
> > > There are a number of implementations listed here
> > > https://unitsofmeasure.org/trac at 'Implementation Support'.
> > > This documentation has not been updated for about 3 years, so some
> > > of the links might be stale, and there may be others.
> > >
> > > A units-of-measure library, with UCUM support, that was available
> > > to be integrated into RDF applications would be a significant
> > > contribution to the community.
> > >
> > > Simon
> > >
> > > > -----Original Message-----
> > > > From: Hugh Glaser <hugh@glasers.org>
> > > > Sent: Friday, 24 July, 2020 08:58
> > > > To: Eric Prud'hommeaux <eric@w3.org>
> > > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> > > > <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > > > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> > > > existential variables?]
> > > >
> > > > If I understand correctly.
> > > > I will need to add a UCUM parser to my system to be able to process
> > > > these datatypes, if people send them to me in their RDF?
> > > > In fact, I will need a UCUM to RDF converter to be able to
> > > > "understand" properly what they "mean"?
> > > > Does such an animal exist?
> > > >
> > > > It looks to me that UCUM is quite a large vocabulary of units, for a
> > > > start - what would the URI for the "liter" unit of measurement be,
> > > > for example?
> > > >
> > > > I'm very happy to have widely adopted standards like this - I just
> > > > want to keep my Semantic Web processing in the Semantic Web (RDF),
> > > > and as simple as possible.
> > > > Or at least be helped to do that.
> > > >
> > > > Cheers
> > > >
> > > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > > > >
> > > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > > > > > Regarding physical quantities, such as "5 inches", etc., my
> > > > > > colleague Maxime Lefrançois and myself coauthored a
> > > > > > specification for a datatype for physical quantities [1]. It is
> > > > > > quite simple: we reuse the Unified Code for Units of Measurement
> > > > > > (UCUM), a standard that is used in many scientific applications,
> > > > > > and combine it with a number:
> > > > > >
> > > > > > <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > > > > <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> > > > > >
> > > > > > Since UCUM has a well defined semantics, so does our datatype.
> > > > > > Better, since UCUM is implemented in many programming languages,
> > > > > > my colleague Maxime could easily integrate it into Jena and its
> > > > > > SPARQL engine [2].
> > > > > > So, with our Jena fork, one can write:
> > > > > >
> > > > > > SELECT ?planet WHERE {
> > > > > >   ?planet a ex:Planet;
> > > > > >     ex:diameter ?s .
> > > > > >   FILTER(?s > "2e11 mm"^^cdt:ucum)
> > > > > > }
> > > > >
> > > > > I applaud the work to extend XSD's numeric types so that RDF can
> > > > > have standard measurement types. But why not leverage your work by
> > > > > adding SPARQL support for UCUM types? e.g.
> > > > > SELECT ?planet WHERE {
> > > > >   ?planet a ex:Planet;
> > > > >     ex:diameter ?s .
> > > > >   FILTER(?s > "2e11"^^ucum:mm)
> > > > > }
> > > > >
> > > > > It feels cleaner to me to embed the entire type of the data in the
> > > > > literal's datatype rather than spreading it across an aggregator
> > > > > type (cdt:ucum) and the lexical value (" mm").
> > > > > In either case we probably have a union type in the lexical value
> > > > > so we'd have to micro-parse doubles, decimals and integers, but
> > > > > the parsing is easier if the measurement unit is broken out into
> > > > > the end of the datatype URL.
> > > > > There are a few UCUM units that aren't viable localnames (e.g.
> > > > > "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in
> > > > > a way that still makes ucum: a practical namespace for datatypes.
> > > > >
> > > > > > This works if the size of the planet is encoded as a cdt:ucum,
> > > > > > no matter what unit one is using. One can even use "link for
> > > > > > Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> > > > > > (unit "m3/[acr_us]") [3], which are both units of length.
> > > > > >
> > > > > > With some of our industrial partners, we are using this for
> > > > > > energy data, and they seem to be very pleased with this
> > > > > > approach, compared to an ontology-based approach.
> > > > > >
> > > > > >
> > > > > > [1] https://w3id.org/lindt/custom_datatypes#ucum
> > > > > > [2] You can try it at
> > > > > > https://ci.mines-stetienne.fr/lindt/playground.html
> > > > > > [3] Try this query in the playground:
> > > > > >
> > > > > > """
> > > > > > PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > > > > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > > > > PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > > > > PREFIX ex: <http://example.org/>
> > > > > >
> > > > > > SELECT ?length ?normalized
> > > > > >
> > > > > > WHERE{
> > > > > >
> > > > > >   VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum } # convert to meters
> > > > > >   BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > > > > >
> > > > > > }
> > > > > > """
> > > > > >
> > > > > > --AZ
> > > > > >
> > > > > > On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> > > > > > > Yeah, the atomicity of the chunk is the point. This even
> > > > > > > applies to quantities. 25.4mm is *identical* to 1” – they are
> > > > > > > the same thing. Any engine that operates with quantities needs
> > > > > > > to understand that. ’25.4’ and ‘mm’ cannot be separated.
> > > > > > > Coordinates are slightly more complex but it comes down to the
> > > > > > > same thing. A single element within a set of coordinates that
> > > > > > > describes a position in space is not independent of the other
> > > > > > > numbers in the tuple, or of the coordinate reference system
> > > > > > > within which they are expressed. One value should *never* be
> > > > > > > used independent of the others. Exactly the same position on
> > > > > > > the earth will be denoted by three different numbers if
> > > > > > > embedded in a different coordinate reference system. You can
> > > > > > > only ‘reason’ over them as a group, not individually.
> > > > > > > *From:*Dan Brickley <danbri@danbri.org>
> > > > > > > *Sent:* Thursday, 16 July, 2020 23:58
> > > > > > > *To:* Jeen Broekstra <jeen@fastmail.com>
> > > > > > > *Cc:* Semantic Web <semantic-web@w3.org>
> > > > > > > *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> > > > > > > semantics - existential variables?]
> > > > > > >
> > > > > > > …
> > > > > > >
> > > > > > > I believe the big appeal of putting it all into the zone we
> > > > > > > call "literals" is that you get a kind of atomicity; that
> > > > > > > chunk of data is either there, or not there; it is asserted,
> > > > > > > or not asserted. With a triples-based (description of a) data
> > > > > > > structure you have to be constantly on your guard that every
> > > > > > > subset of the full graph pattern is at least sensible and
> > > > > > > harmless, even when subsetting these chunks is often confusing
> > > > > > > or misleading for data consumers. I can't help wondering
> > > > > > > whether notions of graph shapes from shacl, shex (and sparql)
> > > > > > > could be exploited to create an RDF-based data format which
> > > > > > > had atomicity at the level of entire shapes.
> > > > > > >
> > > > > > > Dan
> > > > > > >
> > > > > > > Jeen
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Antoine Zimmermann
> > > > > > Institut Henri Fayol
> > > > > > École des Mines de Saint-Étienne
> > > > > > 158 cours Fauriel
> > > > > > CS 62362
> > > > > > 42023 Saint-Étienne Cedex 2
> > > > > > France
> > > > > > Tél:+33(0)4 77 42 66 03
> > > > > > Fax:+33(0)4 77 42 66 66
> > > > > > http://www.emse.fr/~zimmermann/
> > > > > > Member of team Connected Intelligence, Laboratoire Hubert Curien
> > > >
> > > > --
> > > > Hugh
> > > > 023 8061 5652
> > > >
>
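
The two literal encodings compared in this thread can be sketched outside SPARQL as follows. This is an illustration only, in standard-library Python, and not code from any participant: the tuple layout, the function names (big_by_datatype, big_by_microparse, to_cdt) and the QUANTITY regular expression are assumptions made for the example, and the regular expression only approximates the <QUANTITY> grammar quoted above. The diameters are the ones given in the thread.

```python
import re

UCUM = "http://ucum.nlm.nih.gov/#"
CDT_UCUM = "http://w3id.org/lindt/custom_datatypes#ucum"
WIPE = "https://en.wikipedia.org/wiki/"

# Each literal is (subject, lexical form, datatype IRI).
# Encoding A: the unit lives in the datatype IRI, e.g. "1.1e5"^^ucum:km.
DATA_A = [
    (WIPE + "mercury", "4879.4", UCUM + "km"),
    (WIPE + "mercury", "3031.9", UCUM + "mi"),
    (WIPE + "saturn", "1.1e5", UCUM + "km"),
    (WIPE + "saturn", "72367", UCUM + "mi"),
    (WIPE + "jupiter", "139822", UCUM + "km"),
    (WIPE + "jupiter", "86881", UCUM + "mi"),
]


def big_by_datatype(literals, unit="km", threshold=1e5):
    """Mimic FILTER(datatype(?d) = ucum:km && xsd:float(str(?d)) > 1E5)."""
    return [s for s, lex, dt in literals
            if dt == UCUM + unit and float(lex) > threshold]


def to_cdt(literals):
    """Re-express encoding A as encoding B: "1.1e5 km"^^cdt:ucum."""
    return [(s, f"{lex} {dt[len(UCUM):]}", CDT_UCUM) for s, lex, dt in literals]


# Hypothetical approximation of <NUMBER> <SPACES> <UCUMCODE>.
QUANTITY = re.compile(
    r"([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)[ \t]+(\S+)")


def big_by_microparse(literals, unit="km", threshold=1e5):
    """Same test on encoding B: the unit must be cracked out of the string."""
    hits = []
    for s, lex, dt in literals:
        if dt != CDT_UCUM:
            continue
        m = QUANTITY.fullmatch(lex)
        # Compare the unit textually; no unit conversion is attempted here.
        if m and m.group(2) == unit and float(m.group(1)) > threshold:
            hits.append(s)
    return hits


# The literal in miles is skipped, not misread; the one in km is matched.
with_mi = DATA_A + [(WIPE + "HAT-P-67", "190000", UCUM + "mi")]
with_km = with_mi + [(WIPE + "HAT-P-67", "395200", UCUM + "km")]
print(big_by_datatype(with_mi))
print(big_by_datatype(with_km))
print(big_by_microparse(to_cdt(with_km)))
```

Neither function converts between units, so a diameter stated only in miles is never returned, which is the case raised at the top of this message. The difference is only in where the unit check happens: a plain IRI comparison in the first function, a parse of the lexical form in the second.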
Received on Friday, 24 July 2020 18:22:08 UTC