- From: Peter Patel-Schneider <pfpschneider@gmail.com>
- Date: Fri, 24 Jul 2020 13:13:45 -0400
- To: Eric Prud'hommeaux <eric@w3.org>, "Cox, Simon (L&W, Clayton)" <Simon.Cox@csiro.au>
- Cc: Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
But what happens with

  wipe:HAT-P-67 ex:diameter "190000"^^ucum:mi .

Its diameter is more than 100000 kilometers.

It appears to me that your query is an unsafe shortcut itself.

peter


On Fri, 2020-07-24 at 17:38 +0200, Eric Prud'hommeaux wrote:
> You'll need a microparsing regardless, and, as Simon points out, it's
> not that onerous. My point was just that having to microparse union
> types out of the same literal as a UCUM type is more complicated than
> parsing the UCUM type out of the literal's datatype. Having numeric
> types separated from their units would allow SPARQL 1.1 queries to
> avoid cracking the literal form, e.g.
>
> Data:
> [[
> PREFIX ex: <http://a.example/astro#>
> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> PREFIX wipe: <https://en.wikipedia.org/wiki/>
>
> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> ]]
>
> Query:
> [[
> PREFIX ex: <http://a.example/astro#>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
>
> SELECT ?planet WHERE {
>   ?planet ex:diameter ?d .
>   FILTER(datatype(?d) = ucum:km
>     && xsd:float(str(?d)) > 1E5)
> }
> ]]
>
> Results:
> ┌─────────────────────────────────────────┐
> │ ?planet                                 │
> │ <https://en.wikipedia.org/wiki/saturn>  │
> │ <https://en.wikipedia.org/wiki/jupiter> │
> └─────────────────────────────────────────┘
>
> Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> "139822") so the query is safe (datatype(?d) = ucum:km) and pretty
> easy to compose. The same query is substantially more tedious with
> data like:
>   wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> which is likely to lead to unsafe shortcuts.
>
> So Maxime and Antoine, how hard would it be to transplant your
> semantics to apply directly to ucum (presuming their cooperation)?
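
Peter's objection at the top of this message can be made concrete. What follows is a minimal Python sketch, not code from the thread: it models each literal as a (lexical form, unit) pair, adds the HAT-P-67 triple to Eric's data, and compares a filter that only inspects ucum:km values with one that first converts every value to kilometres. The names DATA, TO_KM, km_only and normalized are invented for illustration, and the mile factor (1.609344 km, the international mile) is an assumption, since the thread never says which mile "mi" denotes.

```python
# Diameters as (lexical form, unit) pairs, mirroring "4879.4"^^ucum:km etc.
DATA = {
    "mercury": [("4879.4", "km"), ("3031.9", "mi")],
    "saturn": [("1.1e5", "km"), ("72367", "mi")],
    "jupiter": [("139822", "km"), ("86881", "mi")],
    "HAT-P-67": [("190000", "mi")],  # Peter's example: only a value in miles
}

# Assumed conversion factors to kilometres (international mile).
TO_KM = {"km": 1.0, "mi": 1.609344}


def km_only(data, threshold_km):
    """Mimic FILTER(datatype(?d) = ucum:km && xsd:float(str(?d)) > threshold)."""
    return sorted(
        planet
        for planet, values in data.items()
        if any(unit == "km" and float(lex) > threshold_km for lex, unit in values)
    )


def normalized(data, threshold_km):
    """Convert every value to km before comparing; unknown units are skipped."""
    return sorted(
        planet
        for planet, values in data.items()
        if any(
            unit in TO_KM and float(lex) * TO_KM[unit] > threshold_km
            for lex, unit in values
        )
    )


print(km_only(DATA, 1e5))
print(normalized(DATA, 1e5))
```

The km-only filter reports jupiter and saturn and silently drops HAT-P-67, because that subject has no ucum:km value at all. The normalizing filter also reports HAT-P-67, at the cost of needing a conversion table for every unit that may appear in the data.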
>
>
> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton)
> wrote:
> > Yes you would need a UCUM parser.
> >
> > Note however that UCUM is not a "large vocabulary".
> > There is a relatively small set of terminals here
> > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> > these into a countably infinite set.
> > The rule is described here:
> > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> >
> > There are a number of implementations listed here
> > https://unitsofmeasure.org/trac at 'Implementation Support'.
> > This documentation has not been updated for about 3 years, so some
> > of the links might be stale, and there may be others.
> >
> > A units-of-measure library, with UCUM support, that was available
> > to be integrated into RDF applications would be a significant
> > contribution to the community.
> >
> > Simon
> >
> > > -----Original Message-----
> > > From: Hugh Glaser <hugh@glasers.org>
> > > Sent: Friday, 24 July, 2020 08:58
> > > To: Eric Prud'hommeaux <eric@w3.org>
> > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> > > <semantic-web@w3.org>; Maxime Lefrançois
> > > <maxime.lefrancois@emse.fr>
> > > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes
> > > semantics - existential variables?]
> > >
> > > If I understand correctly.
> > > I will need to add a UCUM parser to my system to be able to
> > > process these datatypes, if people send them to me in their RDF?
> > > In fact, I will need a UCUM to RDF converter to be able to
> > > "understand" properly what they "mean"?
> > > Does such an animal exist?
> > >
> > > It looks to me that UCUM is quite a large vocabulary of units,
> > > for a start - what would the URI for the "liter" unit of
> > > measurement be, for example?
> > >
> > > I'm very happy to have widely adopted standards like this - I
> > > just want to keep my Semantic Web processing in the Semantic Web
> > > (RDF), and as simple as possible.
> > > Or at least be helped to do that.
> > >
> > > Cheers
> > >
> > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org>
> > > > wrote:
> > > >
> > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann
> > > > wrote:
> > > > > Regarding physical quantities, such as "5 inches", etc., my
> > > > > colleague Maxime Lefrançois and myself coauthored a
> > > > > specification for a datatype for physical quantities [1]. It
> > > > > is quite simple: we reuse the Unified Code for Units of
> > > > > Measurement (UCUM), a standard that is used in many
> > > > > scientific applications, and combine it with a number:
> > > > >
> > > > > <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > > > <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> > > > >
> > > > > Since UCUM has a well defined semantics, so does our
> > > > > datatype.
> > > > > Better, since UCUM is implemented in many programming
> > > > > languages, my colleague Maxime could easily integrate it into
> > > > > Jena and its SPARQL engine [2].
> > > > > So, with our Jena fork, one can write:
> > > > >
> > > > > SELECT ?planet WHERE {
> > > > >   ?planet a ex:Planet;
> > > > >     ex:diameter ?s .
> > > > >   FILTER(?s > "2e11 mm"^^cdt:ucum)
> > > > > }
> > > >
> > > > I applaud the work to extend XSD's numeric types so that RDF
> > > > can have standard measurement types. But why not leverage your
> > > > work by adding SPARQL support for UCUM types? e.g.
> > > > SELECT ?planet WHERE {
> > > >   ?planet a ex:Planet;
> > > >     ex:diameter ?s .
> > > >   FILTER(?s > "2e11"^^ucum:mm)
> > > > }
> > > >
> > > > It feels cleaner to me to embed the entire type of the data in
> > > > the literal's datatype rather than spreading it across an
> > > > aggregator type (cdt:ucum) and the lexical value (" mm").
> > > > In either case we probably have a union type in the lexical
> > > > value so we'd have to micro-parse doubles, decimals and
> > > > integers, but the parsing is easier if the measurement unit is
> > > > broken out into the end of the datatype URL.
> > > > There are a few UCUM units that aren't viable localnames (e.g.
> > > > "m/s.s"), but I think we can encode around that (e.g. "m_s.s")
> > > > in a way that still makes ucum: a practical namespace for
> > > > datatypes.
> > > >
> > > > > This works if the size of the planet is encoded as a
> > > > > cdt:ucum, no matter what unit one is using. One can even use
> > > > > "link for Gunter's chain" (unit "[lk_us]"), or "cubic meters
> > > > > per acre" (unit "m3/[acr_us]") [3], which are both units of
> > > > > length.
> > > > >
> > > > > With some of our industrial partners, we are using this for
> > > > > energy data, and they seem to be very pleased with this
> > > > > approach, compared to an ontology-based approach.
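
The micro-parsing discussed above can be sketched in a few lines. This is a hedged Python illustration of splitting a cdt:ucum-style lexical form according to the quoted grammar (<NUMBER> <SPACES> <UCUMCODE>); it is not the Jena implementation mentioned in the thread. The names QUANTITY and parse_quantity are hypothetical, <SPACES> is assumed to mean one or more space characters, and the unit is captured as an opaque string without any UCUM validation.

```python
import re
from decimal import Decimal

# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
# <NUMBER>   ::= xsd:decimal (('e'|'E') xsd:integer)?
# Assumptions: <SPACES> is one or more space characters, and the UCUM code is
# any run of non-whitespace characters (it is NOT validated against UCUM).
QUANTITY = re.compile(
    r"(?P<number>[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"
    r" +"
    r"(?P<unit>\S+)"
)


def parse_quantity(lexical):
    """Split a cdt:ucum-style lexical form into (Decimal value, unit string)."""
    match = QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> form: {lexical!r}")
    return Decimal(match.group("number")), match.group("unit")


print(parse_quantity("1.1e5 km"))
print(parse_quantity("2.7e3 m3/[acr_us]"))
```

With the unit carried in the datatype IRI instead, as Eric suggests, only the number part of this pattern would be needed on the lexical form; interpreting the unit string still requires a UCUM parser in either design.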
> > > > >
> > > > >
> > > > > [1] https://w3id.org/lindt/custom_datatypes#ucum
> > > > > [2] You can try it at
> > > > > https://ci.mines-stetienne.fr/lindt/playground.html
> > > > > [3] Try this query in the playground:
> > > > >
> > > > > """
> > > > > PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > > > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > > > PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > > > PREFIX ex: <http://example.org/>
> > > > >
> > > > > SELECT ?length ?normalized
> > > > >
> > > > > WHERE{
> > > > >
> > > > > VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum } # convert to meters
> > > > > BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > > > >
> > > > > }
> > > > > """
> > > > >
> > > > > --AZ
> > > > >
> > > > > On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> > > > > > Yeah, the atomicity of the chunk is the point. This even
> > > > > > applies to quantities. 25.4mm is *identical* to 1” – they
> > > > > > are the same thing. Any engine that operates with
> > > > > > quantities needs to understand that. ‘25.4’ and ‘mm’ cannot
> > > > > > be separated. Coordinates are slightly more complex but it
> > > > > > comes down to the same thing. A single element within a set
> > > > > > of coordinates that describes a position in space is not
> > > > > > independent of the other numbers in the tuple, or of the
> > > > > > coordinate reference system within which they are
> > > > > > expressed. One value should *never* be used independent of
> > > > > > the others. Exactly the same position on the earth will be
> > > > > > denoted by three different numbers if embedded in a
> > > > > > different coordinate reference system. You can only
> > > > > > ‘reason’ over them as a group, not individually.
> > > > > >
> > > > > > *From:* Dan Brickley <danbri@danbri.org>
> > > > > > *Sent:* Thursday, 16 July, 2020 23:58
> > > > > > *To:* Jeen Broekstra <jeen@fastmail.com>
> > > > > > *Cc:* Semantic Web <semantic-web@w3.org>
> > > > > > *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> > > > > > semantics - existential variables?]
> > > > > >
> > > > > > …
> > > > > >
> > > > > > I believe the big appeal of putting it all into the zone we
> > > > > > call "literals" is that you get a kind of atomicity; that
> > > > > > chunk of data is either there, or not there; it is
> > > > > > asserted, or not asserted. With a triples-based
> > > > > > (description of a) data structure you have to be constantly
> > > > > > on your guard that every subset of the full graph pattern
> > > > > > is at least sensible and harmless, even when subsetting
> > > > > > these chunks is often confusing or misleading for data
> > > > > > consumers. I can't help wondering whether notions of graph
> > > > > > shapes from shacl, shex (and sparql) could be exploited to
> > > > > > create an RDF-based data format which had atomicity at the
> > > > > > level of entire shapes.
> > > > > >
> > > > > > Dan
> > > > > >
> > > > > > Jeen
> > > > > >
> > > > >
> > > > > --
> > > > > Antoine Zimmermann
> > > > > Institut Henri Fayol
> > > > > École des Mines de Saint-Étienne
> > > > > 158 cours Fauriel
> > > > > CS 62362
> > > > > 42023 Saint-Étienne Cedex 2
> > > > > France
> > > > > Tél:+33(0)4 77 42 66 03
> > > > > Fax:+33(0)4 77 42 66 66
> > > > > http://www.emse.fr/~zimmermann/
> > > > > Member of team Connected Intelligence, Laboratoire Hubert
> > > > > Curien
> > >
> > > --
> > > Hugh
> > > 023 8061 5652
> > >
Received on Friday, 24 July 2020 17:14:02 UTC