Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

Hi all,

Three years after implementing and publishing this proposal together with
Antoine, my feeling is that the real blocker for the adoption of cdt:ucum
is not really the UCUM license, but rather the lack of support in existing
RDF frameworks and triplestores. With a little more impact and some more
implementations, we would definitely consider pushing it as a W3C Member
Submission (@Paul).

@Dan, I remember we discussed this very same license issue at ISWC in
Monterey. I'd be glad if those walls could be pushed some day.
@Simon, in the W3C SOSA/SSN spec back in 2017 we did mention how useful and
concise this cdt:ucum notation is [1].
@Eric, encoding the (countably infinite set of) units in their datatype IRI
would definitely raise some voices. We didn't want to require
implementations to parse IRIs.
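
As an illustration of what parsing the lexical form involves (a minimal
Python sketch, not the actual Jena implementation), the <QUANTITY> grammar
quoted below can be split with one regular expression. The name
parse_quantity is hypothetical, and the sketch assumes that <SPACES> means
one or more space characters and that the UCUM code is any run of
non-whitespace characters; it does not validate the code against UCUM.

```python
import re
from decimal import Decimal

# xsd:decimal lexical form, optionally followed by an exponent (xsd:integer).
NUMBER = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"

# Assumption: <SPACES> is one or more spaces and <UCUMCODE> is any run of
# non-whitespace characters. The unit code is NOT checked against UCUM.
QUANTITY = re.compile(rf"({NUMBER}) +(\S+)")


def parse_quantity(lexical):
    """Split a cdt:ucum lexical form into (number, unit code)."""
    match = QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a valid quantity lexical form: {lexical!r}")
    return Decimal(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_quantity("2e11 mm"))
    print(parse_quantity("2.7e3 m3/[acr_us]"))
```

Comparing or converting the resulting quantities would still need a real
UCUM library; the sketch only covers the lexical split.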

In the same vein as cdt:ucum and its sub-datatypes, other datatypes could
be defined to conveniently represent other engineering-related data, such
as complex numbers, complex numbers with a unit, geocoordinates, lists,
sets, matrices, colours, RGB colours, ... each having its own lexical
space, value space, and lexical-to-value (L2V) mapping.
By the way, "sub-datatype" is underspecified: one datatype could have a
sub-lexical space, a sub-value space, and/or a sub-L2V mapping of another
one.
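
To make the three ingredients concrete, here is a minimal Python sketch of
a datatype as a lexical space plus an L2V mapping, instantiated for complex
numbers. Everything in it is an assumption made for illustration: the
Datatype class, the IRI http://example.org/complex, and the "a+bi" lexical
form are hypothetical and not taken from any specification.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    iri: str                      # hypothetical IRI, for illustration only
    lexical_space: re.Pattern     # which strings are valid lexical forms
    l2v: Callable[[str], object]  # lexical-to-value (L2V) mapping

    def value(self, lexical: str):
        """Map a lexical form to its value, rejecting ill-typed literals."""
        if self.lexical_space.fullmatch(lexical) is None:
            raise ValueError(f"{lexical!r} is not in the lexical space of {self.iri}")
        return self.l2v(lexical)


# Assumed lexical form: real part, sign, imaginary part, then 'i' ("1.5+2i").
_REAL = r"[0-9]+(?:\.[0-9]+)?"
COMPLEX = Datatype(
    iri="http://example.org/complex",
    lexical_space=re.compile(rf"[+-]?{_REAL}[+-]{_REAL}i"),
    # The value space is Python's complex type; swap the 'i' suffix for 'j'.
    l2v=lambda s: complex(s[:-1] + "j"),
)

if __name__ == "__main__":
    # Two different lexical forms can denote the same value.
    print(COMPLEX.value("1+2i") == COMPLEX.value("1.0+2.0i"))
```

A "complex number with a unit" datatype could then be sketched the same
way, by pairing this lexical form with a unit code.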

And what about datatypes that encode XML [2], HTML [3], JSON [4], Turtle
documents, RDF graphs ;-) ... Where is the limit before the idea becomes
outrageous?

If the object of a triple is a literal whose value is an RDF graph, could
that be related to RDF Dataset named graphs and dereferencing? ...

[1] Example 3 in https://www.w3.org/TR/vocab-ssn/
[2] rdf:XMLLiteral or
https://www.iana.org/assignments/media-types/application/xml
[3] rdf:HTML or https://www.iana.org/assignments/media-types/text/html
[4] https://www.iana.org/assignments/media-types/application/json

Best regards,
Maxime Lefrançois
MINES Saint-Étienne
http://maxime-lefrancois.info/


On Fri, 24 Jul 2020 at 00:06, Eric Prud'hommeaux <eric@w3.org> wrote:

> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > Regarding physical quantities, such as "5 inches", etc., my colleague
> > Maxime Lefrançois and myself coauthored a specification for a datatype
> > for physical quantities [1]. It is quite simple: we reuse the Unified
> > Code for Units of Measurement (UCUM), a standard that is used in many
> > scientific applications, and combine it with a number:
> >
> > <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> >
> > Since UCUM has a well defined semantics, so does our datatype. Better,
> > since UCUM is implemented in many programming languages, my colleague
> > Maxime could easily integrate it into Jena and its SPARQL engine [2].
> >
> > So, with our Jena fork, one can write:
> >
> > SELECT ?planet WHERE {
> >   ?planet a ex:Planet;
> >     ex:diameter ?s .
> >   FILTER(?s > "2e11 mm"^^cdt:ucum)
> > }
>
> I applaud the work to extend XSD's numeric types so that RDF can have
> standard measurement types. But why not leverage your work by adding
> SPARQL support for UCUM types? e.g.
>
> SELECT ?planet WHERE {
>   ?planet a ex:Planet;
>     ex:diameter ?s .
>   FILTER(?s > "2e11"^^ucum:mm)
> }
>
> It feels cleaner to me to embed the entire type of the data in the
> literal's datatype rather than spreading it across an aggregator type
> (cdt:ucum) and the lexical value (" mm").
>
> In either case we probably have a union type in the lexical value so we'd
> have to micro-parse doubles, decimals and integers, but the parsing is
> easier if the measurement unit is broken out into the end of the datatype
> URL.
>
> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"),
> but I think we can encode around that (e.g. "m_s.s") in a way that still
> makes ucum: a practical namespace for datatypes.
>
>
> > This works if the size of the planet is encoded as a cdt:ucum, no matter
> > what unit one is using. One can even use "link for Gunter's chain" (unit
> > "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which
> > are both units of length.
> >
> > With some of our industrial partners, we are using this for energy data,
> > and they seem to be very pleased with this approach, compared to an
> > ontology-based approach.
> >
> >
> > [1] https://w3id.org/lindt/custom_datatypes#ucum
> > [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
> > [3] Try this query in the playground:
> >
> > """
> > PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > PREFIX ex: <http://example.org/>
> >
> > SELECT ?length ?normalized
> >
> > WHERE{
> >
> >   VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> >   # convert to meters
> >   BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> >
> > }
> > """
> >
> > --AZ
> >
> > On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> > > Yeah, the atomicity of the chunk is the point. This even applies to
> > > quantities. 25.4mm is *identical* to 1” – they are the same thing. Any
> > > engine that operates with quantities needs to understand that. ‘25.4’
> > > and ‘mm’ cannot be separated. Coordinates are slightly more complex but
> > > it comes down to the same thing. A single element within a set of
> > > coordinates that describes a position in space is not independent of
> > > the other numbers in the tuple, or of the coordinate reference system
> > > within which they are expressed. One value should *never* be used
> > > independent of the others. Exactly the same position on the earth will
> > > be denoted by three different numbers if embedded in a different
> > > coordinate reference system. You can only ‘reason’ over them as a
> > > group, not individually.
> > >
> > > *From:*Dan Brickley <danbri@danbri.org>
> > > *Sent:* Thursday, 16 July, 2020 23:58
> > > *To:* Jeen Broekstra <jeen@fastmail.com>
> > > *Cc:* Semantic Web <semantic-web@w3.org>
> > > *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> > > existential variables?]
> > >
> > > …
> > >
> > > I believe the big appeal of putting it all into the zone we call
> > > "literals" is that you get a kind of atomicity; that chunk of data is
> > > either there, or not there; it is asserted, or not asserted. With a
> > > triples-based (description of a) data structure you have to be
> > > constantly on your guard that every subset of the full graph pattern is
> > > at least sensible and harmless, even when subsetting these chunks is
> > > often confusing or misleading for data consumers. I can't help
> > > wondering whether notions of graph shapes from shacl, shex (and sparql)
> > > could be exploited to create an RDF-based data format which had
> > > atomicity at the level of entire shapes.
> > >
> > > Dan
> > >
> > >     Jeen
> > >
> >
> > --
> > Antoine Zimmermann
> > Institut Henri Fayol
> > École des Mines de Saint-Étienne
> > 158 cours Fauriel
> > CS 62362
> > 42023 Saint-Étienne Cedex 2
> > France
> > Tél:+33(0)4 77 42 66 03
> > Fax:+33(0)4 77 42 66 66
> > http://www.emse.fr/~zimmermann/
> > Member of team Connected Intelligence, Laboratoire Hubert Curien
> >
>

Received on Thursday, 23 July 2020 22:52:12 UTC