- From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
- Date: Sun, 26 Jul 2020 11:06:17 +0000
- To: "Cox, Simon (L&W, Clayton)" <Simon.Cox@csiro.au>, Eric Prud'hommeaux <eric@w3.org>
- CC: Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Whoops, of course the last part should have read: "2.5"^^ucum:mi is very different to "2.5"^^ucum:km. But there we are.

> -----Original Message-----
> From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
> Sent: Sunday, 26 July, 2020 20:52
> To: Eric Prud'hommeaux <eric@w3.org>
> Cc: Hugh Glaser <hugh@glasers.org>; Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: [ExternalEmail] RE: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
>
> > wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> > wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> > wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
>
> This style is a bit like language tags on string-literals, but here it is "unit tags"
> on number-literals.
> It looks like a nice parallel: "air"@en denotes a very different thing to
> "air"@ms, and "2.5"^^ucum:mi is very different to "2.5"@ucum:km.
>
> But do we need to remember and avoid the processing issues that came
> from language tags?
>
> Simon
>
> > -----Original Message-----
> > From: Eric Prud'hommeaux <eric@w3.org>
> > Sent: Saturday, 25 July, 2020 01:38
> > To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
> > Cc: Hugh Glaser <hugh@glasers.org>; Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
> >
> > You'll need microparsing regardless, and, as Simon points out, it's
> > not that onerous. My point was just that having to microparse union
> > types out of the same literal as a UCUM type is more complicated than
> > parsing the UCUM type out of the literal's datatype. Having numeric
> > types separated from their units would allow SPARQL 1.1 queries to avoid
> > cracking the literal form, e.g.:
> >
> > Data:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> > PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >
> > wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> > wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> > wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> > ]]
> >
> > Query:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >
> > SELECT ?planet WHERE {
> >   ?planet ex:diameter ?d .
> >   FILTER(datatype(?d) = ucum:km
> >          && xsd:float(str(?d)) > 1E5)
> > }
> > ]]
> >
> > Results:
> > ┌─────────────────────────────────────────┐
> > │ ?planet                                 │
> > │ <https://en.wikipedia.org/wiki/saturn>  │
> > │ <https://en.wikipedia.org/wiki/jupiter> │
> > └─────────────────────────────────────────┘
> >
> > Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> > "139822"), so the query is safe (datatype(?d) = ucum:km) and pretty easy to compose.
> > The same query is substantially more tedious with data like:
> >   wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> > which is likely to lead to unsafe shortcuts.
> >
> > So Maxime and Antoine, how hard would it be to transplant your
> > semantics to apply directly to ucum (presuming their cooperation)?
> >
> >
> > On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> > > Yes, you would need a UCUM parser.
> > >
> > > Note, however, that UCUM is not a "large vocabulary".
> > > There is a relatively small set of terminals here
> > > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> > > these into a countably infinite set.
> > > The rule is described here:
> > > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> > >
> > > There are a number of implementations listed here
> > > https://unitsofmeasure.org/trac at 'Implementation Support'.
> > > This documentation has not been updated for about 3 years, so some of
> > > the links might be stale, and there may be others.
> > >
> > > A units-of-measure library with UCUM support, available for
> > > integration into RDF applications, would be a significant contribution
> > > to the community.
> > >
> > > Simon
> > >
> > > > -----Original Message-----
> > > > From: Hugh Glaser <hugh@glasers.org>
> > > > Sent: Friday, 24 July, 2020 08:58
> > > > To: Eric Prud'hommeaux <eric@w3.org>
> > > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > > > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
> > > >
> > > > If I understand correctly, I will need to add a UCUM parser to my
> > > > system to be able to process these datatypes if people send them to me in their RDF?
> > > > In fact, I will need a UCUM-to-RDF converter to be able to "understand"
> > > > properly what they "mean"?
> > > > Does such an animal exist?
> > > >
> > > > It looks to me that UCUM is quite a large vocabulary of units, for
> > > > a start: what would the URI for the "liter" unit of measurement
> > > > be, for example?
> > > >
> > > > I'm very happy to have widely adopted standards like this; I just
> > > > want to keep my Semantic Web processing in the Semantic Web (RDF),
> > > > and as simple as possible.
> > > > Or at least be helped to do that.
> > > >
> > > > Cheers
> > > >
> > > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > > > >
> > > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > > > >> Regarding physical quantities, such as "5 inches", etc., my
> > > > >> colleague Maxime Lefrançois and I coauthored a
> > > > >> specification for a datatype for physical quantities [1].
> > > > >> It is quite simple:
> > > > >> we reuse the Unified Code for Units of Measurement (UCUM), a
> > > > >> standard that is used in many scientific applications, and
> > > > >> combine it with a number:
> > > > >>
> > > > >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > > >> <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
> > > > >>
> > > > >> Since UCUM has well-defined semantics, so does our datatype.
> > > > >> Better, since UCUM is implemented in many programming
> > > > >> languages, my colleague Maxime could easily integrate it into
> > > > >> Jena and its SPARQL engine [2].
> > > > >>
> > > > >> So, with our Jena fork, one can write:
> > > > >>
> > > > >> SELECT ?planet WHERE {
> > > > >>   ?planet a ex:Planet;
> > > > >>           ex:diameter ?s .
> > > > >>   FILTER(?s > "2e11 mm"^^cdt:ucum) }
> > > > >
> > > > > I applaud the work to extend XSD's numeric types so that RDF can have
> > > > > standard measurement types. But why not leverage your work by
> > > > > adding SPARQL support for UCUM types? e.g.
> > > > >
> > > > > SELECT ?planet WHERE {
> > > > >   ?planet a ex:Planet;
> > > > >           ex:diameter ?s .
> > > > >   FILTER(?s > "2e11"^^ucum:mm)
> > > > > }
> > > > >
> > > > > It feels cleaner to me to embed the entire type of the data in the literal's
> > > > > datatype rather than spreading it across an aggregator type
> > > > > (cdt:ucum) and the lexical value (" mm").
> > > > >
> > > > > In either case we probably have a union type in the lexical value, so we'd
> > > > > have to micro-parse doubles, decimals and integers, but the
> > > > > parsing is easier if the measurement unit is broken out into the
> > > > > end of the datatype URL.
> > > > >
> > > > > There are a few UCUM units that aren't viable localnames (e.g.
> > > > > "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that
> > > > > still makes ucum: a practical namespace for datatypes.
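Eric's contrast between the two encodings can be illustrated outside SPARQL. The Python sketch below is illustrative only and uses no RDF library. Each literal is modelled as a (lexical form, datatype IRI) pair, using the namespaces from the thread's examples. Antoine's <QUANTITY> production is approximated with a regular expression that reads <SPACES> as one or more spaces and <UCUMCODE> as any run of non-space characters. The helper names (from_datatype, from_cdt, planets_over) are hypothetical. The sketch applies the same "diameter in km above 1E5" test as Eric's query to both encodings.

```python
import re

# Namespaces as written in the thread's examples; a literal is modelled
# here as a (lexical form, datatype IRI) pair, with no RDF library involved.
UCUM = "http://ucum.nlm.nih.gov/#"
CDT_UCUM = "http://w3id.org/lindt/custom_datatypes#ucum"

# Approximation of <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>, where
# <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
QUANTITY_RE = re.compile(r"^([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?) +(\S+)$")

# Encoding 1 (Eric's proposal): the unit is carried by the datatype IRI.
unit_in_datatype = {
    "mercury": [("4879.4", UCUM + "km"), ("3031.9", UCUM + "mi")],
    "saturn": [("1.1e5", UCUM + "km"), ("72367", UCUM + "mi")],
    "jupiter": [("139822", UCUM + "km"), ("86881", UCUM + "mi")],
}

# Encoding 2 (cdt:ucum): the unit is carried inside the lexical form.
unit_in_lexical = {
    "mercury": [("4879.4 km", CDT_UCUM), ("3031.9 mi", CDT_UCUM)],
    "saturn": [("1.1e5 km", CDT_UCUM), ("72367 mi", CDT_UCUM)],
    "jupiter": [("139822 km", CDT_UCUM), ("86881 mi", CDT_UCUM)],
}


def from_datatype(lexical, datatype):
    """Unit read from the datatype IRI; the lexical form is a plain number."""
    if not datatype.startswith(UCUM):
        return None
    return float(lexical), datatype[len(UCUM):]


def from_cdt(lexical, datatype):
    """Unit micro-parsed out of the lexical form of a cdt:ucum literal."""
    if datatype != CDT_UCUM:
        return None
    match = QUANTITY_RE.match(lexical)
    if match is None:
        raise ValueError(f"ill-formed cdt:ucum literal: {lexical!r}")
    return float(match.group(1)), match.group(2)


def planets_over(data, parse, unit, limit):
    """Analogue of FILTER(datatype(?d) = ucum:km && xsd:float(str(?d)) > 1E5)."""
    hits = set()
    for planet, literals in data.items():
        for lexical, datatype in literals:
            quantity = parse(lexical, datatype)
            if quantity is not None and quantity[1] == unit and quantity[0] > limit:
                hits.add(planet)
    return sorted(hits)
```

Both `planets_over(unit_in_datatype, from_datatype, "km", 1e5)` and `planets_over(unit_in_lexical, from_cdt, "km", 1e5)` return `['jupiter', 'saturn']`, the same planets as in Eric's results table. The encodings differ only in where the unit comes from. With ucum:km the unit check is a string comparison on the datatype IRI. With cdt:ucum, every consumer needs something like `QUANTITY_RE` before it can compare anything.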
> > > > >
> > > > >
> > > > >> This works if the size of the planet is encoded as a cdt:ucum,
> > > > >> no matter what unit one is using. One can even use "link for
> > > > >> Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> > > > >> (unit "m3/[acr_us]") [3], which are both units of length.
> > > > >>
> > > > >> With some of our industrial partners, we are using this for
> > > > >> energy data, and they seem to be very pleased with this
> > > > >> approach, compared to an ontology-based approach.
> > > > >>
> > > > >>
> > > > >> [1] https://w3id.org/lindt/custom_datatypes#ucum
> > > > >> [2] You can try it at
> > > > >>     https://ci.mines-stetienne.fr/lindt/playground.html
> > > > >> [3] Try this query in the playground:
> > > > >>
> > > > >> """
> > > > >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > > >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > > >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > > >> PREFIX ex: <http://example.org/>
> > > > >>
> > > > >> SELECT ?length ?normalized
> > > > >>
> > > > >> WHERE {
> > > > >>
> > > > >> VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum } # convert to meters
> > > > >> BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > > > >>
> > > > >> }
> > > > >> """
> > > > >>
> > > > >> --AZ
> > > > >>
> > > > >> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> > > > >>> Yeah, the atomicity of the chunk is the point. This even
> > > > >>> applies to quantities. 25.4mm is *identical* to 1” – they are the same thing.
> > > > >>> Any engine that operates with quantities needs to understand that.
> > > > >>> ‘25.4’ and ‘mm’ cannot be separated. Coordinates are slightly more
> > > > >>> complex but it comes down to the same thing.
> > > > >>> A single element within a set of coordinates that describes a position in space
> > > > >>> is not independent of the other numbers in the tuple, or of
> > > > >>> the coordinate reference system within which they are
> > > > >>> expressed. One value should *never* be used independent of the
> > > > >>> others. Exactly the same position on the earth will be denoted
> > > > >>> by three different numbers if embedded in a different
> > > > >>> coordinate reference system. You can only ‘reason’ over them
> > > > >>> as a group, not individually.
> > > > >>>
> > > > >>> *From:* Dan Brickley <danbri@danbri.org>
> > > > >>> *Sent:* Thursday, 16 July, 2020 23:58
> > > > >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> > > > >>> *Cc:* Semantic Web <semantic-web@w3.org>
> > > > >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
> > > > >>>
> > > > >>> …
> > > > >>>
> > > > >>> I believe the big appeal of putting it all into the zone we
> > > > >>> call "literals" is that you get a kind of atomicity; that
> > > > >>> chunk of data is either there, or not there; it is asserted,
> > > > >>> or not asserted. With a triples-based (description of a) data
> > > > >>> structure you have to be constantly on your guard that every
> > > > >>> subset of the full graph pattern is at least sensible and
> > > > >>> harmless, even when subsetting these chunks is often confusing
> > > > >>> or misleading for data consumers. I can't help wondering
> > > > >>> whether notions of graph shapes from shacl, shex (and
> > > > >>> sparql) could be exploited to create an RDF-based data format
> > > > >>> which had atomicity at the level of entire shapes.
> > > > >>>
> > > > >>> Dan
> > > > >>>
> > > > >>> Jeen
> > > > >>>
> > > > >>
> > > > >> --
> > > > >> Antoine Zimmermann
> > > > >> Institut Henri Fayol
> > > > >> École des Mines de Saint-Étienne
> > > > >> 158 cours Fauriel
> > > > >> CS 62362
> > > > >> 42023 Saint-Étienne Cedex 2
> > > > >> France
> > > > >> Tel: +33(0)4 77 42 66 03
> > > > >> Fax: +33(0)4 77 42 66 66
> > > > >> http://www.emse.fr/~zimmermann/
> > > > >> Member of team Connected Intelligence, Laboratoire Hubert Curien
> > > >
> > > > --
> > > > Hugh
> > > > 023 8061 5652
> > >
> >
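Antoine's playground query [3] normalizes "2.7e3 m3/[acr_us]" to metres by adding it to "0 m"^^cdt:ucum, and Simon's point is that 25.4 mm and 1 inch are the same quantity. The Python sketch below shows a minimal version of that normalization step. It is not a UCUM implementation, and it is not how the LinDT Jena fork works. Its limits are deliberate:

- The atom table is a small hypothetical subset: the metre with k, c and m prefixes, plus [in_i], [mi_i], [ft_us], [lk_us] and [acr_us].
- Only powers of length are tracked.
- A unit term is limited to a "."-product with at most one plain "/" divisor, which sidesteps operator-precedence questions.

The conversion factors are assumed from UCUM's definitions and should be checked against ucum-essence.xml before use: [in_i] = 2.54 cm, [ft_us] = 1200/3937 m, [rd_us] = 16.5 [ft_us], [lk_us] = 1/100 of a 4-rod chain, and [acr_us] = 160 square rods.

```python
import re
from fractions import Fraction

# Hypothetical subset of UCUM atoms: name -> (factor in metres^dim, length exponent dim).
FT_US = Fraction(1200, 3937)          # [ft_us] = 1200/3937 m
RD_US = Fraction(33, 2) * FT_US       # [rd_us] = 16.5 [ft_us]
ATOMS = {
    "m": (Fraction(1), 1),
    "[in_i]": (Fraction(254, 10000), 1),      # 2.54 cm
    "[mi_i]": (Fraction(1609344, 1000), 1),   # 5280 international feet
    "[ft_us]": (FT_US, 1),
    "[lk_us]": (4 * RD_US / 100, 1),          # 1/100 of a Gunter's chain (4 rods)
    "[acr_us]": (160 * RD_US ** 2, 2),        # 160 square rods, an area
}
PREFIXES = {"k": Fraction(1000), "c": Fraction(1, 100), "m": Fraction(1, 1000)}
METRIC = {"m"}  # atoms that accept a prefix in this sketch

COMPONENT_RE = re.compile(r"^(\[[^\]]+\]|[A-Za-z]+)(-?\d+)?$")
QUANTITY_RE = re.compile(r"^([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?) +(\S+)$")


def component(text):
    """Return (factor, length exponent) for one component such as 'm3' or 'km'."""
    match = COMPONENT_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported component: {text!r}")
    atom, exponent = match.group(1), int(match.group(2) or 1)
    if atom in ATOMS:
        factor, dim = ATOMS[atom]
    elif atom[0] in PREFIXES and atom[1:] in METRIC:
        base, dim = ATOMS[atom[1:]]
        factor = PREFIXES[atom[0]] * base
    else:
        raise ValueError(f"unknown atom: {atom!r}")
    # The exponent applies to the prefixed atom, e.g. 'cm3' = (cm)^3.
    return factor ** exponent, dim * exponent


def unit(term):
    """Evaluate 'A.B' or 'A.B/C' to (factor, length exponent)."""
    numerator, _, denominator = term.partition("/")
    if "/" in denominator or "." in denominator:
        raise ValueError("only a single trailing divisor is supported here")
    factor, dim = Fraction(1), 0
    for part in numerator.split("."):
        f, d = component(part)
        factor, dim = factor * f, dim + d
    if denominator:
        f, d = component(denominator)
        factor, dim = factor / f, dim - d
    return factor, dim


def to_metres(quantity):
    """Normalize a cdt:ucum-style lexical form such as '25.4 mm' to metres."""
    match = QUANTITY_RE.match(quantity)
    if match is None:
        raise ValueError(f"not a quantity: {quantity!r}")
    factor, dim = unit(match.group(2))
    if dim != 1:
        raise ValueError(f"{match.group(2)!r} is not a unit of length")
    return Fraction(match.group(1)) * factor
```

With this sketch, `to_metres("25.4 mm") == to_metres("1 [in_i]")` holds exactly because the factors are kept as `fractions.Fraction` values rather than floats. `to_metres("2.7e3 m3/[acr_us]")` comes out at roughly 0.667 m. `to_metres("3 [acr_us]")` raises `ValueError` because an area is not a length. Keeping exact rationals is a choice made for this sketch; a real UCUM library may represent factors differently.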
Received on Sunday, 26 July 2020 11:06:38 UTC