- From: Francois-Paul Servant <francoispaulservant@gmail.com>
- Date: Sat, 3 May 2014 14:21:57 +0200
- To: martin.hepp@ebusiness-unibw.org
- Cc: Niklas Lindström <lindstream@gmail.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
- Message-Id: <68E39E85-B2C1-4B6A-8A70-28C50D29FF60@gmail.com>
Hi,

What does it take to improve data published using PropertyValue, and to share the enhancements?

On 2 May 2014, at 22:37, martin.hepp@ebusiness-unibw.org wrote:

<snip>

> Ideal Version: External Property with Qualitative Value
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   ...
>   Operating Voltage: <div itemprop="http://acme.org/vocab/#voltage" itemscope
>     itemtype="http://schema.org/QuantitativeValue">
>     <span itemprop="minValue">100</span>-
>     <span itemprop="maxValue">220</span>
>     <meta itemprop="unitCode" content="VLT" > V
>   </div>
>
> with this
>
> Variant 1: Property name instead of URI
>
> <div itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="minValue">100</span>-
>     <span itemprop="maxValue">250</span>
>     <meta itemprop="unitCode" content="VLT"> V
>   </div>
> </div>
>
> or this
>
> Variant 2: Unit as text instead of UN/CEFACT Common Code and range as a single field
>
> <div itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="value">100-250</span>-
>     <span itemprop="unitText">V</span>
>   </div>
> </div>
>
> or in worst case this:
>
> Variant 3: Range and Unit in a joint field
>
> <div itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="value">100-250 V</span>-
>   </div>
> </div>
>
> It is obvious that the version with a dedicated property URI and a proper http://schema.org/QuantitativeValue node is easier to process.
>
> But from a data provider's perspective, who typically has the product properties in very light-weight property-value structures, with often proprietary properties, even the step to Variant 1 makes data publication much, much simpler, because he does not have to map the local property name to a standard property URI nor determine the type of the value (quantitative, qualitative, or Boolean). That is VERY difficult from typical Web applications, even if the back-end systems (PDM/PIM) had this additional data.

One interesting exercise is to take data published in the non-ideal variants and see what it takes to get to the ideal one, with one constraint: we must imagine that a lot of data has already been published in the non-ideal variants, and that we want to lift it without republishing it all. This corresponds to the real situation of a client or a third party who wants to make use of this data and share the results. It even corresponds to the situation of the publishing corporation itself, which may not be able to change its publishing process without a lot of work (and, of course, cannot change anything that has already been published).

Is it possible to publish some extra statements, in an independent, supplementary process, to improve the non-ideal published data? (In an ideal situation, we publish the data, and we can improve it afterwards.)

Note that a player such as a search engine can handle the situation quite easily: from

<span itemprop="name">Operating Voltage</span>

it can recognize the corresponding http://acme.org/vocab/#voltage property in its "knowledge graph of known entities and properties", and then index the product in question correctly. What about the rest of us?

In the three variants you describe, as they are, I think there is no way to efficiently publish improved data. One can use NLP techniques to make effective use of the data, but one cannot easily publish the results.
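The heuristic side of this exercise is the easy part. As a minimal sketch (in Python, with a hypothetical function `lift_value` and a hypothetical unit table that only knows the "V" to "VLT" mapping used in the examples above), a program can split the free-text value of Variant 2 or Variant 3 into the separate fields of Variant 1. The hard part is publishing its output.

```python
import re
from typing import Optional

# Hypothetical mapping from unit symbols to UN/CEFACT Common Codes.
# Only "V" -> "VLT" (volt) comes from the examples above.
UNIT_CODES = {"V": "VLT"}

# A number, optionally "-" and a second number, then an optional unit symbol.
RANGE_PATTERN = re.compile(
    r"^\s*(?P<min>\d+(?:\.\d+)?)\s*(?:-\s*(?P<max>\d+(?:\.\d+)?))?\s*(?P<unit>[^\d\s-]+)?\s*$"
)


def lift_value(value: str, unit_text: Optional[str] = None) -> Optional[dict]:
    """Turn a free-text PropertyValue value such as "100-250 V" into
    Variant 1 style fields (minValue/maxValue or value, plus unitCode).

    `unit_text` is the separate unitText of Variant 2, used when the value
    itself carries no unit. Returns None if the text does not match.
    """
    match = RANGE_PATTERN.match(value)
    if match is None:
        return None
    low = float(match.group("min"))
    high = match.group("max")
    result: dict
    if high is None:
        result = {"value": low}
    else:
        result = {"minValue": low, "maxValue": float(high)}
    unit = match.group("unit") or unit_text
    if unit is not None:
        if unit in UNIT_CODES:
            result["unitCode"] = UNIT_CODES[unit]
        else:
            # Unknown symbol: keep it as text rather than guess a code.
            result["unitText"] = unit
    return result
```

For example, `lift_value("100-250 V")` (Variant 3) and `lift_value("100-250", unit_text="V")` (Variant 2) both return `{'minValue': 100.0, 'maxValue': 250.0, 'unitCode': 'VLT'}`. Computing this is not the problem; the question is where to publish it.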
The first reason is that the PropertyValue is not identified: in RDF terms, it is a blank node. There is no way to say anything about it, and therefore no way to lift it. So if I have, for instance, a small program that knows that a unitText of "V" is equivalent to the unitCode "VLT", I cannot simply publish something that would lift data published in Variant 2 to the level of Variant 1.

On the other hand, if the data had been published with an identifier for each PropertyValue, it would have been possible. Suppose, for instance, that we had published in the first place:

<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">ACME Electric Anvil</span>
  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue" itemid="http://ex.com/ov_100_250">
    <span itemprop="name">Operating Voltage</span>
    <span itemprop="value">100-250</span>
    <span itemprop="unitText">V</span>
  </div>
</div>

One could then simply state, somewhere:

<http://ex.com/ov_100_250> schema:unitCode "VLT" .

to improve the description of *all* the products published by ex.com that have an operating voltage of 100-250. With that, the non-ideal variants are basically equivalent: one can use any ML or heuristic technique to do the work, and easily share the results. The publisher of the "non-ideal" data can keep its systems running as they are, and just publish a small set of triples to improve all its data, both already published and yet to be published.

Now, can we reach the "ideal version" state as easily? Yes, but it requires the use of the propertyID property:

<http://ex.com/ov_100_250> schema:propertyID <http://acme.org/vocab/#voltage> .

together with the rule that, if the propertyID is the URI of a property, then

if   s additionalProperty pv .
     pv propertyID p .
then s p pv .

This is not completely in line with Martin's proposal.
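The lifting described above can be sketched in a few lines of Python, assuming the page has already been parsed into triples (plain tuples here, not any particular RDF library). The supplementary statements are merged with the published ones simply because they share the URI http://ex.com/ov_100_250, and the propertyID rule is then applied. The names `published`, `supplement` and `apply_property_id_rule` are hypothetical, and the rule is simplified: any propertyID value that looks like an http(s) URI is treated as the URI of a property. The Product itself has no itemid, so it appears as the blank-node label `_:product`; such labels are local to one parsed page, which is why nothing published elsewhere could address the PropertyValue if it, too, were a blank node.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SCHEMA = "http://schema.org/"

# Triples extracted from the page that carries an itemid on the PropertyValue.
# "_:product" is a blank-node label: the Product has no itemid.
published: Set[Triple] = {
    ("_:product", SCHEMA + "name", "ACME Electric Anvil"),
    ("_:product", SCHEMA + "additionalProperty", "http://ex.com/ov_100_250"),
    ("http://ex.com/ov_100_250", SCHEMA + "name", "Operating Voltage"),
    ("http://ex.com/ov_100_250", SCHEMA + "value", "100-250"),
    ("http://ex.com/ov_100_250", SCHEMA + "unitText", "V"),
}

# Supplementary statements, published separately, about the same URI.
supplement: Set[Triple] = {
    ("http://ex.com/ov_100_250", SCHEMA + "unitCode", "VLT"),
    ("http://ex.com/ov_100_250", SCHEMA + "propertyID", "http://acme.org/vocab/#voltage"),
}


def apply_property_id_rule(graph: Set[Triple]) -> Set[Triple]:
    """If s additionalProperty pv and pv propertyID p, add s p pv.

    Simplification (assumption): every propertyID value starting with
    http:// or https:// is taken to be the URI of a property.
    """
    inferred: Set[Triple] = set()
    for s, p1, pv in graph:
        if p1 != SCHEMA + "additionalProperty":
            continue
        for s2, p2, prop in graph:
            if s2 == pv and p2 == SCHEMA + "propertyID" and prop.startswith(("http://", "https://")):
                inferred.add((s, prop, pv))
    return graph | inferred


# Merging is a plain set union: statements about the same URI simply meet.
merged = apply_property_id_rule(published | supplement)
```

After the merge, `merged` contains the unitCode statement for http://ex.com/ov_100_250 and a direct link from the Product to that PropertyValue through http://acme.org/vocab/#voltage, although the original page was never republished.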
If this is a problem, there is a Variant 0, an almost-ideal version:

Variant 0: additionalProperty with External Type

<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">ACME Electric Anvil</span>
  ...
  Operating Voltage: <div itemprop="additionalProperty" itemscope itemtype="http://acme.org/vocab/#Voltage" itemid="http://ex.com/ov_100_250">
    <span itemprop="minValue">100</span>-
    <span itemprop="maxValue">250</span>
    <meta itemprop="unitCode" content="VLT" > V
  </div>
</div>

(possibly adding the propertyID to this markup)

Note, by the way, that I do not consider the external property pattern to be the "ideal version":

- there will never be enough properties in a vocabulary, so we need an "additionalProperty" anyway;
- in practice, it is sufficient to define types of features: if you say that your product has (= "additionalProperty") a given "Voltage", do you really have to say that it "has voltage" the Voltage in question?
- it doesn't work well for "configurations" (partially defined products), cf. http://events.linkeddata.org/ldow2013/papers/ldow2013-paper-11.pdf

But this is another story.

To summarize: data published in "non-ideal" versions can be easily enhanced, and the results shared, if and (I think) only if they include URIs for the PropertyValues in the first place. In that case, publishing some statements, independently of the original publication, can improve a lot of data at once. The use of URIs for PropertyValues (local ones are fine) should therefore be encouraged.

(This assumes, of course, that users of the data make use of URIs and merge statements published about the same URI in different places. Without that, the whole idea of a web of data is defeated. This may seem obvious, but the last time I checked Google's Structured Data Testing Tool, it didn't do it even for statements on the same page.)

fps
Received on Saturday, 3 May 2014 12:22:28 UTC