
Re: Enhancing PropertyValue-based data (was Re: Generic Property-Value Proposal for Schema.org)

From: Jarno van Driel <jarno@quantumspork.nl>
Date: Sun, 4 May 2014 01:21:55 +0200
Message-ID: <CAFQgrbbPONctu6wxCnr35xzKLr4T8APgkiv7KyLgE62+Ok7fQw@mail.gmail.com>
To: Francois-Paul Servant <francoispaulservant@gmail.com>
Cc: Martin Hepp <martin.hepp@ebusiness-unibw.org>, Niklas Lindström <lindstream@gmail.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
By the way, I'm not quite sure whether "PropertyValue
http://acme.org/vocab/#Voltage" is considered to be a multi-type entity in
RDFa or not. If so, I guess one could use sameAs in RDFa as well:

<div vocab="http://schema.org/" typeof="Product">
    ...
    <div property="additionalProperty" typeof="PropertyValue" id="http://ex.com/ov_100_250">
      <link property="sameAs" href="http://acme.org/vocab/#Voltage">
      ...
    </div>
</div>


On Sun, May 4, 2014 at 12:05 AM, Jarno van Driel <jarno@quantumspork.nl> wrote:

> Sorry, should have been:
>
> <div vocab="http://schema.org/" typeof="Product">
>     ...
>     <div property="additionalProperty" typeof="PropertyValue http://acme.org/vocab/#Voltage" id="http://ex.com/ov_100_250">
>     ...
>     </div>
> </div>
>
> as opposed to:
>
> <div itemscope itemtype="http://schema.org/Product">
>     ...
>     <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue" itemid="http://ex.com/ov_100_250">
>     <link itemprop="sameAs" href="http://acme.org/vocab/#Voltage">
>     ...
>     </div>
> </div>
>
> And would this serve your purpose?
>
>
> On Sat, May 3, 2014 at 11:29 PM, Jarno van Driel <jarno@quantumspork.nl> wrote:
>
>> Forgive me if I misunderstand your point, but doesn't:
>>
>> <div vocab="http://schema.org/" typeof="Product">
>>     ...
>>     <div property="additionalProperty" typeof="PropertyValue http://ex.com/ov_100_250" id="http://ex.com/ov_100_250">
>>     ...
>>     </div>
>> </div>
>>
>> get the same result as:
>>
>> <div itemscope itemtype="http://schema.org/Product">
>>     ...
>>     <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue" itemid="http://ex.com/ov_100_250">
>>     <link itemprop="sameAs" href="http://acme.org/vocab/#Voltage">
>>     ...
>>     </div>
>> </div>
>>
>> Would the @propertyID still be needed then?
>>
>>
>> On Sat, May 3, 2014 at 2:21 PM, Francois-Paul Servant <
>> francoispaulservant@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> what does it take to improve data published using PropertyValue, and to
>>> share the enhancements?
>>>
>>> On 2 May 2014 at 22:37, martin.hepp@ebusiness-unibw.org wrote:
>>> <snip>
>>>
>>> Ideal Version: External Property with Quantitative Value
>>>
>>> <div itemscope itemtype="http://schema.org/Product">
>>>  <span itemprop="name">ACME Electric Anvil</span>
>>> ...
>>>  Operating Voltage: <div itemprop="http://acme.org/vocab/#voltage" itemscope
>>>       itemtype="http://schema.org/QuantitativeValue">
>>>      <span itemprop="minValue">100</span>-
>>>      <span itemprop="maxValue">220</span>
>>>      <meta itemprop="unitCode" content="VLT"> V
>>>  </div>
>>> </div>
>>>
>>> with this
>>>
>>> Variant 1: Property name instead of URI
>>>
>>> <div itemscope itemtype="http://schema.org/Product">
>>>  <span itemprop="name">ACME Electric Anvil</span>
>>>  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>>>   <span itemprop="name">Operating Voltage</span>
>>>   <span itemprop="minValue">100</span>-
>>>   <span itemprop="maxValue">250</span>
>>>   <meta itemprop="unitCode" content="VLT"> V
>>>  </div>
>>> </div>
>>>
>>> or this
>>>
>>> Variant 2: Unit as text instead of UN/CEFACT Common Code and range as a
>>> single field
>>>
>>>
>>> <div itemscope itemtype="http://schema.org/Product">
>>>  <span itemprop="name">ACME Electric Anvil</span>
>>>  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>>>   <span itemprop="name">Operating Voltage</span>
>>>   <span itemprop="value">100-250</span>
>>>   <span itemprop="unitText">V</span>
>>>  </div>
>>> </div>
>>>
>>> or in worst case this:
>>>
>>> Variant 3: Range and Unit in a joint field
>>>
>>> <div itemscope itemtype="http://schema.org/Product">
>>>  <span itemprop="name">ACME Electric Anvil</span>
>>>  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>>>   <span itemprop="name">Operating Voltage</span>
>>>   <span itemprop="value">100-250 V</span>
>>>  </div>
>>> </div>
>>>
>>>
>>> It is obvious that the version with a dedicated property URI and a
>>> proper http://schema.org/QuantitativeValue node is easier to process.
>>>
>>> But for a data provider, who typically holds product properties in very
>>> light-weight property-value structures, often with proprietary
>>> properties, even the step to Variant 1 makes data publication much, much
>>> simpler: the provider has to neither map the local property name to a
>>> standard property URI nor determine the type of the value (quantitative,
>>> qualitative, or Boolean). That is VERY difficult in typical Web
>>> applications, even if the back-end systems (PDM/PIM) have this
>>> additional data.
>>>
>>>
>>>
>>> One interesting exercise is to take data published in the non-ideal
>>> variants and see what it takes to get to the ideal one, with one
>>> constraint: we must imagine that a lot of data has already been
>>> published in the non-ideal variants, and that we want to lift it without
>>> republishing it all. This corresponds to the real situation of a client
>>> or a third party who wants to make use of the data and share the results,
>>> or even of the publishing corporation, which may not be able, without a
>>> lot of work, to change its whole publishing process (nor, of course, to
>>> change anything that has already been published). Is it possible to
>>> publish some extra statements (in an independent, supplementary process)
>>> to improve the non-ideal published data?
>>> (In an ideal situation, we publish the data, and we can improve it
>>> afterwards.)
>>>
>>> Note that a player such as a search engine can quite easily handle the
>>> situation: from
>>> <span itemprop="name">Operating Voltage</span>
>>> it can easily recognize the corresponding http://acme.org/vocab/#voltage
>>> property in its "knowledge graph of known entities and properties" and then
>>> correctly index the product in question.
>>>
>>> What about the rest of us?
>>>
>>> In the 3 variants that you describe, as they are, I think that there is
>>> no way to efficiently publish improved data. One can use NLP techniques to
>>> effectively use the data, but he/she cannot easily publish the results.
>>>
>>> The first reason is that the PropertyValue is not identified: in RDF
>>> terms, it is a blank node. There is no way to say anything about it, and
>>> therefore no way to lift it.
>>> So, if I have, for instance, a small program that knows that a unitText
>>> of "V" is equivalent to the unitCode "VLT", I can't simply publish
>>> something that would lift data published in variant 2 to the level of
>>> variant 1.
>>>
>>> On the other hand, if the data had been published using an identifier
>>> for the PropertyValues, it would have been possible: if we had for instance
>>> published in the first place:
>>> <div itemscope itemtype="http://schema.org/Product">
>>>    <span itemprop="name">ACME Electric Anvil</span>
>>>    <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue" itemid="http://ex.com/ov_100_250">
>>>            <span itemprop="name">Operating Voltage</span>
>>>            <span itemprop="value">100-250</span>
>>>            <span itemprop="unitText">V</span>
>>>    </div>
>>> </div>
>>>
>>> one could simply state somewhere
>>> <http://ex.com/ov_100_250> schema:unitCode "VLT".
>>>
>>> to improve *all* the descriptions of products published by ex.com that
>>> have an operating voltage of 100-250.
>>>
>>> With that, variants 2, 3 4 are basically equivalent: one can use any ML
>>> / heuristic technique to do the work, and easily share the results.
>>> The publisher of the "non-ideal" data can keep its systems running as
>>> they are, and just publish a small set of triples to improve all the
>>> already published and the to-be-published data.
>>>
>>> Now, can we reach the "ideal version" state as easily?
>>>
>>> Yes, but it requires the use of the propertyID property:
>>> <http://ex.com/ov_100_250> schema:propertyID <http://acme.org/vocab/#voltage>.
>>> and considering that, if the propertyID is the URI of a property, then if
>>>   s additionalProperty pv.
>>>   pv propertyID p.
>>> then
>>>   s p pv.
>>> which is not completely in line with Martin's proposal.
>>>
>>> If this is a problem, there is a variant 0, which is an almost ideal
>>> version
>>> Variant 0: additionalProperty with External Type
>>>
>>> <div itemscope itemtype="http://schema.org/Product">
>>>    <span itemprop="name">ACME Electric Anvil</span>
>>>  ...
>>>   Operating Voltage: <div itemprop="additionalProperty" itemscope
>>> itemtype="http://acme.org/vocab/#Voltage"
>>> itemid="http://ex.com/ov_100_250">
>>>        <span itemprop="minValue">100</span>-
>>>        <span itemprop="maxValue">220</span>
>>>        <meta itemprop="unitCode" content="VLT"> V
>>>   </div>
>>> </div>
>>> (possibly, add the propertyID to this markup)
>>>
>>> Note BTW that I do not consider the external property pattern as the
>>> "ideal version":
>>> - there will never be enough properties in a vocab: we need an
>>> "additionalProperty" anyway
>>> - it's sufficient to just define types of features in practical uses: if
>>> you say that your product has (="additionalProperty") a given "Voltage", do
>>> you really have to say that it "has voltage" the Voltage in question?
>>> - it doesn't work well for "configurations" (partially defined
>>> products), cf
>>> http://events.linkeddata.org/ldow2013/papers/ldow2013-paper-11.pdf
>>>
>>> But this is another story. To summarize:
>>> data published in "non-ideal" versions can be easily enhanced, and the
>>> results shared, if and (I think) only if they include URIs for the
>>> PropertyValue in the first place. In this case, publishing some statements,
>>> independently of the original publishing, can improve a lot of data at once.
>>> The use of URIs for PropertyValues - local ones are fine - should
>>> therefore be encouraged.
>>>
>>> (this assumes, of course, that users of the data make use of URIs and
>>> conflate statements published about the same URI in two different places.
>>> But without that, it's the whole idea of a web of data which is defeated.
>>> This may seem obvious, but last time I checked Google's structured data
>>> testing tool, it didn't do it even for statements in the same page.)
>>>
>>> fps
>>>
>>>
>>
>
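
For illustration, the lifting idea in the quoted message (publish extra statements about an identified PropertyValue, then apply "s additionalProperty pv. pv propertyID p. => s p pv.") could be sketched roughly as below. This is only a sketch under assumptions: triples are plain Python tuples rather than a real RDF store, the product URI http://ex.com/anvil and the function name lift are hypothetical, and a propertyID is treated as "the URI of a property" simply when it starts with http:// or https://.

```python
SCHEMA = "http://schema.org/"
PV = "http://ex.com/ov_100_250"      # identifier of the PropertyValue (from the thread)
PRODUCT = "http://ex.com/anvil"      # hypothetical product URI, not from the thread

# Data as originally published (Variant 2, with an identifier on the PropertyValue).
published = {
    (PRODUCT, SCHEMA + "additionalProperty", PV),
    (PV, SCHEMA + "name", "Operating Voltage"),
    (PV, SCHEMA + "value", "100-250"),
    (PV, SCHEMA + "unitText", "V"),
}

# Statements published later, by an independent process, about the same URI.
supplement = {
    (PV, SCHEMA + "unitCode", "VLT"),
    (PV, SCHEMA + "propertyID", "http://acme.org/vocab/#voltage"),
}


def lift(triples):
    """Return triples plus 's p pv' for each 's additionalProperty pv' whose
    pv has a propertyID p that looks like a URI."""
    property_ids = {
        (s, o) for (s, p, o) in triples
        if p == SCHEMA + "propertyID" and o.startswith(("http://", "https://"))
    }
    inferred = set()
    for (s, p, pv) in triples:
        if p != SCHEMA + "additionalProperty":
            continue
        for (node, prop) in property_ids:
            if node == pv:
                inferred.add((s, prop, pv))
    return set(triples) | inferred


# Conflating statements about the same URI is just a set union here.
merged = lift(published | supplement)
```

Because the supplementary statements only mention the PropertyValue's URI, the same two triples would enrich every product that points at that URI; with a blank node there would be nothing to attach them to.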
Received on Saturday, 3 May 2014 23:22:25 UTC
