Re: Thing Description for existing data sources from Dave Raggett on 2016-06-10 (public-wot-ig@w3.org from June 2016)

From: Dave Raggett <dsr@w3.org>
Date: Fri, 10 Jun 2016 18:28:54 +0100
To: "Charpenay, Victor" <victor.charpenay@siemens.com>
Cc: "public-web-of-things@w3.org" <public-web-of-things@w3.org>, Public Web of Things IG <public-wot-ig@w3.org>
Message-Id: <91454D82-8512-4C66-9752-E3B5335904E0@w3.org>
> On 10 Jun 2016, at 16:50, Charpenay, Victor <victor.charpenay@siemens.com> wrote:
> 
> Hi all,
>  
> I have recently been trying to design a Thing Description for existing IoT data, within the frame of a European project called BIG IoT (http://big-iot.eu/). I wanted to share with you the conclusions I drew from that exercise. In general, it made me think the current version of the TD model is still rather limited; I describe here the features I found missing.
>  
> The data consists of measurements collected in the region of Piedmont, Italy, and exposed by an open-source cloud platform called Yucca. See e.g. https://userportal.smartdatanet.it/userportal/#/dashboard/stream/quadrante/2d5dbb35-9565-4eaa-cdfc-0fb9aa04e296/TrFl. The platform has a RESTful interface and uses the OData framework. Streams and datasets are modeled as resources, associated with standard query parameters and specific serialization formats.
>  
> 1.     Modeling of data stream
> The API is designed in such a way that streams are modeled as collections of single events (this had been debated on this mailing list), each having an identifier (a URI). One can either access the events individually (the measurements) or access the stream as a whole (the collection). Obviously, the TD should contain an Event to describe a measurement. At the same time, however, the collection that would normally act as the Event also returns data and can be seen as a Property. I guess this measurement pattern is pretty common. It is even captured in the SSN ontology (where a measurement is called an observation). How should the TD deal with that?

When you say a URI, what does that mean? In a data model, it presumably just means an RDF URI, or are you thinking in protocol terms? If each measurement has its own URI, I presume this involves a time stamp or sequence number. What kinds of streams are there in the BIG IoT project? For contrast, I am working on a demo with 250 measurements per second from a 6-axis accelerometer. I don’t yet see a need for a URI for each measurement, so perhaps your environment requires additional assumptions. If so, could you please explain what they are?

You say "One can either access the events individually (the measurements) or access the stream as a whole". Another model involves reading or writing a sequence of measurements, analogous to POSIX sockets. In other words, the API doesn’t need to expose each measurement as a separate event.
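
A minimal sketch of that stream-oriented model, in Python, may make the contrast clearer. It assumes a hypothetical wire format of six little-endian 32-bit floats per sample (three acceleration and three rotation axes); the format and the names `write_samples` and `read_samples` are illustrative only and come from neither the TD model nor the demo mentioned above.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical sample layout (an assumption for illustration):
# six little-endian 32-bit floats: ax, ay, az, gx, gy, gz.
SAMPLE = struct.Struct("<6f")

Sample = Tuple[float, float, float, float, float, float]


def write_samples(stream: BinaryIO, samples: Iterable[Sample]) -> None:
    """Append each sample to the stream, back to back, with no per-sample identifier."""
    for sample in samples:
        stream.write(SAMPLE.pack(*sample))


def read_samples(stream: BinaryIO) -> Iterator[Sample]:
    """Yield samples in order until the stream runs out of complete records."""
    while True:
        chunk = stream.read(SAMPLE.size)
        if len(chunk) < SAMPLE.size:
            return
        yield SAMPLE.unpack(chunk)


# An in-memory buffer stands in for a socket or file here.
buf = io.BytesIO()
write_samples(buf, [
    (0.0, 0.0, 1.0, 0.0, 0.0, 0.0),
    (0.5, 0.0, 1.0, 0.0, 0.25, 0.0),
])
buf.seek(0)
received = list(read_samples(buf))
```

In this sketch a measurement is identified only by its position in the sequence, so no URI per measurement is needed; a time stamp or sequence number could be added to the record layout if the application requires one.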

> 2.     Query parameters
> A proposal has just come up in the group for dynamic query parameters: https://github.com/w3c/wot/tree/master/proposals/resource-parameters. OData specifies some standard parameters like “filter” or “orderby”. It would be great to have these in the TD model. See http://docs.oasis-open.org/odata/odata/v4.0/errata02/os/complete/part1-protocol/odata-v4.0-errata02-os-part1-protocol-complete.html#_Toc406398291.

What kinds of query are you talking about, and how do they fit into the architecture?

> 3.     Value type definition (I)
> OData defines its own schema language (CSDL). Although I could translate the schemas the platform uses into JSON Schema, I doubt this can be a long-term solution. Moreover, it would require all data providers to re-engineer their data models. I would advocate instead that data type definitions should be outside of a Thing Description. Here, all schemas are hosted by the platform: WoT devices could actually do the same. Value types would then just need to contain the local URI of the type definition and its schema language. This way, no cloud connectivity is required for a client to retrieve the type definition. A JSON serialization of the type definition is also no longer mandatory. One could use Relax-NG, XML Schema or even CSDL. This is the simplest solution I found to re-use existing value types declared by the platform.

This is why I propose to have a type system that is independent of the schema language.
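
As a rough illustration of keeping the type reference separate from any one schema language, here is a Python sketch. Everything in it is hypothetical: the `ValueTypeRef` structure, the language labels, the example URI and the handler registry are assumptions made for the example, not part of the TD model or of the Yucca platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ValueTypeRef:
    """A value type given by reference rather than inlined (hypothetical structure)."""
    uri: str               # where the type definition is hosted, e.g. on the device itself
    schema_language: str   # label for the schema language, e.g. "csdl" or "json-schema"


# Each client registers handlers only for the schema languages it understands.
Handler = Callable[[str], str]
HANDLERS: Dict[str, Handler] = {}


def register(language: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler for one schema language."""
    def decorator(handler: Handler) -> Handler:
        HANDLERS[language] = handler
        return handler
    return decorator


@register("json-schema")
def load_json_schema(uri: str) -> str:
    # Placeholder: a real client would fetch and parse the document here.
    return f"JSON Schema definition at {uri}"


@register("csdl")
def load_csdl(uri: str) -> str:
    # Placeholder: a real client would fetch and parse the document here.
    return f"CSDL definition at {uri}"


def resolve(ref: ValueTypeRef) -> str:
    """Dispatch on the declared schema language; fail clearly if it is unsupported."""
    handler = HANDLERS.get(ref.schema_language)
    if handler is None:
        raise LookupError(f"no handler for schema language {ref.schema_language!r}")
    return handler(ref.uri)


# Hypothetical local URI for a type definition hosted next to the data.
description = resolve(ValueTypeRef(uri="/types/TrFl", schema_language="csdl"))
```

The design choice this sketch highlights is that the description only names the definition and its language, so a client that does not support a given language can detect that up front instead of failing while parsing.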

> 4.     Value type definition (II)
> This solution still needs to be refined. For instance, CSDL is formalized in XML Schema, which means one could theoretically parse any CSDL document with the sole knowledge of XML Schema. Which schema language is it better to declare? CSDL (more efficient but too specialized) or XML Schema (more generic but resource-consuming)? Should the decision be left to the implementer of WoT software?

I had a discussion with the technical lead for schema.org today, and one of the things that came up in conversation was how prescriptive the schema language is. We both agreed that there is a role for varying levels of prescriptiveness. A fully prescriptive language can be very brittle. However, some ability to describe constraints is often useful, e.g. to state that a particular field is required. It is often desirable to tolerate new fields. The RDF open world hypothesis provides another perspective on this.
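
To make the middle ground concrete, here is a small Python sketch of a check that insists on certain fields while tolerating any others. The field names and the `missing_required` helper are assumptions chosen for illustration, not taken from any existing schema language.

```python
from typing import Any, Dict, List, Set


def missing_required(record: Dict[str, Any], required: Set[str]) -> List[str]:
    """Return the required field names absent from the record, sorted.

    Fields that are present but not listed as required are ignored,
    so producers can add new fields without breaking consumers.
    """
    return sorted(required - set(record))


# Hypothetical constraint: every measurement needs a time stamp and a value.
REQUIRED = {"timestamp", "value"}

with_extra = {"timestamp": "2016-06-10T17:28:54Z", "value": 21.5, "unit": "Cel"}
incomplete = {"value": 21.5}

print(missing_required(with_extra, REQUIRED))   # the extra "unit" field is tolerated
print(missing_required(incomplete, REQUIRED))   # reports the absent "timestamp"
```

A fully prescriptive check would also reject the unknown "unit" field, which is where the brittleness comes from.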

Another dimension to the discussion is whether to stick to existing approaches, even when they are awkward for the use cases in hand, or to come up with new leaner approaches. This will depend on the community you are targeting.

All this explains why I feel we need to explore a range of approaches and to analyse their strengths and weaknesses, from both a technical and a social perspective, for specific developer communities.

Best regards,
—
   Dave Raggett <dsr@w3.org>
Received on Friday, 10 June 2016 17:29:09 UTC