- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Mon, 27 Jul 2020 23:02:09 +0300
- To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Cc: Hugh Glaser <hugh@glasers.org>, "Eric Prud'hommeaux" <eric@w3.org>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Hi Antoine,

you have a vocabulary that you'd like users to adopt. But not only
that, they would also have to adopt a software library specific to
that vocabulary (the Jena fork), because your syntax does not use
standard SPARQL datatype URIs. Am I getting it right?

I'm all for new vocabularies, but having to depend on your Jena fork
in our projects would be a no-go for me. If you want broad adoption,
standard SPARQL is the only way IMO.

A side question (haven't followed the thread very much): how does UCUM
compare to QUDT? https://qudt.org/

Martynas

On Fri, Jul 24, 2020 at 9:49 PM Antoine Zimmermann
<antoine.zimmermann@emse.fr> wrote:
>
> Hugh,
>
> I will answer as if you are asking about our own proposal (cdt:ucum),
> and not Eric's proposal (ucum:m, ucum:W, ucum:s, etc).
>
> On 24/07/2020 at 00:58, Hugh Glaser wrote:
> > If I understand correctly.
> > I will need to add a UCUM parser to my system to be able to process these datatypes, if people send them to me in their RDF?
>
> Yes, exactly.
>
> > In fact, I will need a UCUM to RDF converter to be able to "understand" properly what they "mean"?
>
> No, you need RDF with support for cdt:ucum. Same thing as when you want
> to use wkt:Literal, from the GeoSPARQL standard. Other RDF parsers will
> still work but they won't be able to understand what the value is.
>
> > Does such an animal exist?
>
> Yes. My colleague Maxime Lefrançois implemented it as a fork of Apache
> Jena. You can try it at
> https://ci.mines-stetienne.fr/lindt/playground.html. There are example
> queries to play with.
>
> > It looks to me that UCUM is quite a large vocabulary of units, for a start - what would the URI for the "liter" unit of measurement be, for example?
>
> There are infinitely many units. E.g., meters (m), square meters (m2),
> cubic meters (m3), meters to the fourth (m4), etc. There is an infinite
> algebra of units of measures.
> Nothing to be afraid of: there are
> infinitely many integers, and yet xsd:integer is well supported.
>
> > I'm very happy to have widely adopted standards like this - I just want to keep my Semantic Web processing in the Semantic Web (RDF), and as simple as possible.
> > Or at least be helped to do that.
>
> The alternative approaches always require specific code. If you exclude
> new datatypes (stick to XSDs and RDF datatypes), you need code that
> interprets some vocabulary, e.g.:
>
> <quantity> :numericValue 10;
>            :unit <inches> .
>
> You have to retrieve the relevant triples to reconstruct the quantity.
> You have to deal with the cases where there are missing triples:
>
> <quantity> :numericValue 10 .
>
> or multiple values:
>
> <quantity> :numericValue 10;
>            :unit <inches> .
> <quantity> :numericValue 25.4;
>            :unit <meters> .
>
> It's quite complicated to use, implement, and cumbersome to query.
>
>
> --AZ
>
> >
> > Cheers
> >
> >> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> >>
> >> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> >>> Regarding physical quantities, such as "5 inches", etc., my colleague Maxime
> >>> Lefrançois and myself coauthored a specification for a datatype for physical
> >>> quantities [1]. It is quite simple: we reuse the Unified Code for Units of
> >>> Measurement (UCUM), a standard that is used in many scientific applications,
> >>> and combine it with a number:
> >>>
> >>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> >>>
> >>> Since UCUM has a well defined semantics, so does our datatype. Better, since
> >>> UCUM is implemented in many programming languages, my colleague Maxime could
> >>> easily integrate it into Jena and its SPARQL engine [2].
> >>>
> >>> So, with our Jena fork, one can write:
> >>>
> >>> SELECT ?planet WHERE {
> >>>   ?planet a ex:Planet;
> >>>           ex:diameter ?s .
> >>>   FILTER(?s > "2e11 mm"^^cdt:ucum)
> >>> }
> >>
> >> I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? e.g.
> >>
> >> SELECT ?planet WHERE {
> >>   ?planet a ex:Planet;
> >>           ex:diameter ?s .
> >>   FILTER(?s > "2e11"^^ucum:mm)
> >> }
> >>
> >> It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
> >>
> >> In either case we probably have a union type in the lexical value so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is broken out into the end of the datatype URL.
> >>
> >> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
> >>
> >>
> >>> This works if the size of the planet is encoded as a cdt:ucum, no matter
> >>> what unit one is using. One can even use "link for Gunter's chain" (unit
> >>> "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which are
> >>> both units of length.
> >>>
> >>> With some of our industrial partners, we are using this for energy data, and
> >>> they seem to be very pleased with this approach, compared to an
> >>> ontology-based approach.
> >>>
> >>>
> >>> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >>> [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
> >>> [3] Try this query in the playground:
> >>>
> >>> """
> >>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>> PREFIX ex: <http://example.org/>
> >>>
> >>> SELECT ?length ?normalized
> >>>
> >>> WHERE{
> >>>
> >>> VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> >>> # convert to meters
> >>> BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> >>>
> >>> }
> >>> """
> >>>
> >>> --AZ
> >>>
> >>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> >>>> Yeah, the atomicity of the chunk is the point. This even applies to
> >>>> quantities. 25.4mm is *identical* to 1” – they are the same thing. Any
> >>>> engine that operates with quantities needs to understand that. ’25.4’
> >>>> and ‘mm’ cannot be separated. Coordinates are slightly more complex but
> >>>> it comes down to the same thing. A single element within a set of
> >>>> coordinates that describes a position in space is not independent of the
> >>>> other numbers in the tuple, or of the coordinate reference system within
> >>>> which they are expressed. One value should *never* be used independent
> >>>> of the others. Exactly the same position on the earth will be denoted by
> >>>> three different numbers if embedded in a different coordinate reference
> >>>> system. You can only ‘reason’ over them as a group, not individually.
> >>>>
> >>>> *From:*Dan Brickley <danbri@danbri.org>
> >>>> *Sent:* Thursday, 16 July, 2020 23:58
> >>>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> >>>> existential variables?]
> >>>>
> >>>> …
> >>>>
> >>>> I believe the big appeal of putting it all into the zone we call
> >>>> "literals" is that you get a kind of atomicity; that chunk of data is
> >>>> either there, or not there; it is asserted, or not asserted. With a
> >>>> triples-based (description of a) data structure you have to be
> >>>> constantly on your guard that every subset of the full graph pattern is
> >>>> at least sensible and harmless, even when subsetting these chunks is
> >>>> often confusing or misleading for data consumers. I can't help wondering
> >>>> whether notions of graph shapes from shacl, shex (and sparql) could be
> >>>> exploited to create an RDF-based data format which had atomicity at the
> >>>> level of entire shapes.
> >>>>
> >>>> Dan
> >>>>
> >>>> Jeen
> >>>>
> >>>
> >>> --
> >>> Antoine Zimmermann
> >>> Institut Henri Fayol
> >>> École des Mines de Saint-Étienne
> >>> 158 cours Fauriel
> >>> CS 62362
> >>> 42023 Saint-Étienne Cedex 2
> >>> France
> >>> Tel:+33(0)4 77 42 66 03
> >>> Fax:+33(0)4 77 42 66 66
> >>> http://www.emse.fr/~zimmermann/
> >>> Member of team Connected Intelligence, Laboratoire Hubert Curien
> >
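
For readers of the archive: the cdt:ucum lexical form quoted above
(<QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>) and the triple-based
alternative can be contrasted with a minimal Python sketch. It is not
the Jena fork's implementation. The names Quantity, parse_quantity and
quantity_from_triples are hypothetical, <SPACES> is assumed to mean one
or more whitespace characters, the UCUM code is kept as an opaque string
(no validation, no unit conversion), and triples are assumed to be plain
(subject, predicate, object) tuples.

```python
import re
from decimal import Decimal
from typing import NamedTuple


class Quantity(NamedTuple):
    value: Decimal
    unit: str  # unit code kept as an opaque string (assumption: not validated)


# <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>  (assumption: one or more spaces)
_QUANTITY = re.compile(rf"({_NUMBER})\s+(\S+)")


def parse_quantity(lexical: str) -> Quantity:
    """Split a literal such as '2e11 mm' into a number and a unit code."""
    match = _QUANTITY.fullmatch(lexical.strip())
    if match is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> literal: {lexical!r}")
    return Quantity(Decimal(match.group(1)), match.group(2))


def quantity_from_triples(triples, subject) -> Quantity:
    """Rebuild a quantity from separate value and unit triples.

    Unlike the single literal, this has to reject the missing-triple and
    multiple-value cases described in the thread.
    """
    values = [o for s, p, o in triples if s == subject and p == "numericValue"]
    units = [o for s, p, o in triples if s == subject and p == "unit"]
    if len(values) != 1 or len(units) != 1:
        raise ValueError(
            f"{subject!r}: expected one value and one unit, "
            f"got {len(values)} value(s) and {len(units)} unit(s)"
        )
    return Quantity(Decimal(str(values[0])), units[0])
```

With the literal form, parse_quantity("2.7e3 m3/[acr_us]") either yields
a complete quantity or fails as a whole. With the triple form, the caller
has to decide what an incomplete or duplicated description means, which
is the extra code the thread refers to.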
Received on Monday, 27 July 2020 20:02:35 UTC