Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

You'll need microparsing regardless, and, as Simon points out, it's not that onerous. My point was just that having to microparse union types out of the same literal as a UCUM type is more complicated than parsing the UCUM type out of the literal's datatype. Having numeric types separated from their units would allow SPARQL 1.1 queries to avoid cracking the literal form, e.g.

Data:
[[
PREFIX ex: <http://a.example/astro#>
PREFIX ucum: <http://ucum.nlm.nih.gov/#>
PREFIX wipe: <https://en.wikipedia.org/wiki/>

wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
]]

Query:
[[
PREFIX ex: <http://a.example/astro#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ucum: <http://ucum.nlm.nih.gov/#>

SELECT ?planet WHERE {
 ?planet ex:diameter ?d .
 FILTER(datatype(?d) = ucum:km
     && xsd:float(str(?d)) > 1E5)
}
]]

Results:
┌─────────────────────────────────────────┐
│ ?planet                                 │
│  <https://en.wikipedia.org/wiki/saturn> │
│ <https://en.wikipedia.org/wiki/jupiter> │
└─────────────────────────────────────────┘

Here, SPARQL's xsd:float cast takes care of parsing the value types for us ("1.1e5" or "139822"), and the unit check is an exact match on the datatype (datatype(?d) = ucum:km), so the query is safe and pretty easy to compose. The same query is substantially more tedious with data like:
  wipe:saturn  ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
which is likely to lead to unsafe shortcuts.
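
To make the extra step concrete, here's a rough Python sketch of the two cases. It isn't tied to any RDF library; the function names are made up for illustration, the regex is a simplified reading of the <NUMBER> <SPACES> <UCUMCODE> grammar quoted below, and the ucum: namespace is just the hypothetical one from my example data.

```python
import re

# Hypothetical namespace, as used in the example data above.
UCUM_NS = "http://ucum.nlm.nih.gov/#"

# Simplified NUMBER (decimal with optional exponent), whitespace, then a
# UCUM code (UCUM codes contain no whitespace).
_QUANTITY = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s+(\S+)\s*$"
)


def split_combined(lexical):
    """Microparse a combined lexical form like '1.1e5 km' into (value, unit)."""
    m = _QUANTITY.match(lexical)
    if m is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> form: {lexical!r}")
    return float(m.group(1)), m.group(2)


def split_by_datatype(lexical, datatype_iri):
    """Unit lives in the datatype IRI, so the lexical form is only a number."""
    if not datatype_iri.startswith(UCUM_NS):
        raise ValueError(f"not a ucum: datatype: {datatype_iri}")
    return float(lexical), datatype_iri[len(UCUM_NS):]
```

In the second function the only parsing left is the number itself, which is what the xsd:float cast does in the query above. In the first, the consumer has to get the number/unit split right before it can even test the unit.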

So Maxime and Antoine, how hard would it be to transplant your semantics to apply directly to ucum (presuming their cooperation)?


On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> Yes you would need a UCUM parser. 
> 
> Note however that UCUM is not a "large vocabulary". 
> There is a relatively small set of terminals here http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine these into a countably infinite set. 
> The rule is described here: http://unitsofmeasure.org/ucum.html#section-Syntax-Rules 
> 
> There are a number of implementations listed here https://unitsofmeasure.org/trac at 'Implementation Support'. 
> This documentation has not been updated for about 3 years, so some of the links might be stale, and there may be others.   
> 
> A units-of-measure library, with UCUM support, that was available to be integrated into RDF applications would be a significant contribution to the community. 
> 
> Simon 
> 
> > -----Original Message-----
> > From: Hugh Glaser <hugh@glasers.org>
> > Sent: Friday, 24 July, 2020 08:58
> > To: Eric Prud'hommeaux <eric@w3.org>
> > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> > <semantic-web@w3.org>; Maxime Lefrançois
> > <maxime.lefrancois@emse.fr>
> > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> > existential variables?]
> > 
> > If I understand correctly.
> > I will need to add a UCUM parser to my system to be able to process these
> > datatypes, if people send them to me in their RDF?
> > In fact, I will need a UCUM to RDF converter to be able to "understand"
> > properly what they "mean"?
> > Does such an animal exist?
> > 
> > It looks to me that UCUM is quite a large vocabulary of units, for a start -
> > what would the URI for the "liter" unit of measurement be, for example?
> > 
> > I'm very happy to have widely adopted standards like this - I just want to
> > keep my Semantic Web processing in the Semantic Web (RDF), and as simple
> > as possible.
> > Or at least be helped to do that.
> > 
> > Cheers
> > 
> > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > >
> > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > >> Regarding physical quantities, such as "5 inches", etc., my colleague
> > >> Maxime Lefrançois and myself coauthored a specification for a
> > >> datatype for physical quantities [1]. It is quite simple: we reuse
> > >> the Unified Code for Units of Measurement (UCUM), a standard that is
> > >> used in many scientific applications, and combine it with a number:
> > >>
> > >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > >> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> > >>
> > >> Since UCUM has a well defined semantics, so does our datatype.
> > >> Better, since UCUM is implemented in many programming languages, my
> > >> colleague Maxime could easily integrate it into Jena and its SPARQL
> > >> engine [2].
> > >>
> > >> So, with our Jena fork, one can write:
> > >>
> > >> SELECT ?planet WHERE {
> > >>  ?planet a ex:Planet;
> > >>    ex:diameter ?s .
> > >>  FILTER(?s > "2e11 mm"^^cdt:ucum)
> > >> }
> > >
> > > I applaud the work to extend XSD's numeric types so that RDF can have
> > > standard measurement types. But why not leverage your work by adding
> > > SPARQL support for UCUM types? e.g.
> > >
> > > SELECT ?planet WHERE {
> > >  ?planet a ex:Planet;
> > >    ex:diameter ?s .
> > >  FILTER(?s > "2e11"^^ucum:mm)
> > > }
> > >
> > > It feels cleaner to me to embed the entire type of the data in the literal's
> > > datatype rather than spreading it across an aggregator type (cdt:ucum) and
> > > the lexical value (" mm").
> > >
> > > In either case we probably have a union type in the lexical value so we'd
> > > have to micro-parse doubles, decimals and integers, but the parsing is easier
> > > if the measurement unit is broken out into the end of the datatype URL.
> > >
> > > There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"),
> > > but I think we can encode around that (e.g. "m_s.s") in a way that still makes
> > > ucum: a practical namespace for datatypes.
> > >
> > >
> > >> This works if the size of the planet is encoded as a cdt:ucum, no
> > >> matter what unit one is using. One can even use "link for Gunter's
> > >> chain" (unit "[lk_us]"), or "cubic meters per acre" (unit
> > >> "m3/[acr_us]") [3], which are both units of length.
> > >>
> > >> With some of our industrial partners, we are using this for energy
> > >> data, and they seem to be very pleased with this approach, compared
> > >> to an ontology-based approach.
> > >>
> > >>
> > >> [1] https://w3id.org/lindt/custom_datatypes#ucum
> > >> [2] You can try it at
> > >> https://ci.mines-stetienne.fr/lindt/playground.html
> > >> [3] Try this query in the playground:
> > >>
> > >> """
> > >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > >> PREFIX ex: <http://example.org/>
> > >>
> > >> SELECT ?length ?normalized
> > >>
> > >> WHERE{
> > >>
> > >>  VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }  # convert to meters
> > >>  BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > >>
> > >> }
> > >> """
> > >>
> > >> --AZ
> > >>
> > >> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
> > >>> Yeah, the atomicity of the chunk is the point. This even applies to
> > >>> quantities. 25.4mm is *identical* to 1” – they are the same thing.
> > >>> Any engine that operates with quantities needs to understand that. ‘25.4’
> > >>> and ‘mm’ cannot be separated. Coordinates are slightly more complex
> > >>> but it comes down to the same thing. A single element within a set
> > >>> of coordinates that describes a position in space is not independent
> > >>> of the other numbers in the tuple, or of the coordinate reference
> > >>> system within which they are expressed. One value should *never* be
> > >>> used independent of the others. Exactly the same position on the
> > >>> earth will be denoted by three different numbers if embedded in a
> > >>> different coordinate reference system. You can only ‘reason’ over them
> > >>> as a group, not individually.
> > >>>
> > >>> *From:*Dan Brickley <danbri@danbri.org>
> > >>> *Sent:* Thursday, 16 July, 2020 23:58
> > >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> > >>> *Cc:* Semantic Web <semantic-web@w3.org>
> > >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
> > >>> - existential variables?]
> > >>>
> > >>> …
> > >>>
> > >>> I believe the big appeal of putting it all into the zone we call
> > >>> "literals" is that you get a kind of atomicity; that chunk of data
> > >>> is either there, or not there; it is asserted, or not asserted. With
> > >>> a triples-based (description of a ) data structure you have to be
> > >>> constantly on your guard that every subset of the full graph pattern
> > >>> is at least sensible and harmless, even when subsetting these chunks
> > >>> is often confusing or misleading for data consumers. I can't help
> > >>> wondering whether notions of graph shapes from shacl, shex (and
> > >>> sparql) could be exploited to create an RDF-based data format which
> > >>> had atomicity at the level of entire shapes.
> > >>>
> > >>> Dan
> > >>>
> > >>>    Jeen
> > >>>
> > >>
> > >> --
> > >> Antoine Zimmermann
> > >> Institut Henri Fayol
> > >> École des Mines de Saint-Étienne
> > >> 158 cours Fauriel
> > >> CS 62362
> > >> 42023 Saint-Étienne Cedex 2
> > >> France
> > >> Tél:+33(0)4 77 42 66 03
> > >> Fax:+33(0)4 77 42 66 66
> > >> http://www.emse.fr/~zimmermann/
> > >> Member of team Connected Intelligence, Laboratoire Hubert Curien
> > 
> > --
> > Hugh
> > 023 8061 5652
> > 
> 

Received on Friday, 24 July 2020 15:38:30 UTC