Re: Generic Property-Value Proposal for Schema.org from martin.hepp@ebusiness-unibw.org on 2014-04-30 (public-vocabs@w3.org from April 2014)

From: <martin.hepp@ebusiness-unibw.org>
Date: Wed, 30 Apr 2014 23:43:24 +0200
To: Francois-Paul Servant <francoispaulservant@gmail.com>
Cc: W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-Id: <5DD54BB9-323E-47BA-B2B9-158C249DB6A8@ebusiness-unibw.org>

Dear Francois-Paul:

On 30 Apr 2014, at 09:14, Francois-Paul Servant <francoispaulservant@gmail.com> wrote:

> Dear Martin,
> 
> some remarks regarding your proposal.
> 
> Regarding the motivations:
> - I agree that there is a strong motivation for such a proposal, and you name it in your second design principle: "No Lifting and Cleansing Barrier: Do not force site owners to lift or cleanse existing data."
> You may have very precise data describing your products in a table that you could very well publish as is, but it is difficult to map columns and cells to external vocabularies (if such vocabularies exist). It should be possible to lift the data later.

Great, thanks! I think automotive is a really nice example - we typically have lots of relevant car features, but it will be very tiring to define a global standard for all marketing-relevant features (and their authoritative translations etc.).

> 
> - I'm less convinced by the argument "generic extension mechanism for properties at the level of schema.org". As you note, using external properties is a problem in microdata. But it is not the case in RDFa or in JSON-LD: RDF, by itself, provides a generic pattern for exposing characteristics for entities. I don't think that it is a big effort for a site owner to mint a URI for an additional property.
> 
That is perfectly fine from my perspective. In fact, this is just a by-product of the proposal and I wanted to disclose that properly.
However, note that for smaller sites, for example, minting a URI for an additional property is indeed a problem, or at least a perceived one. Even big automotive players had external support for defining their OWL vocabularies ;-)

Think of hotels for instance - if they define room features, they will often not be able to use an existing URI nor define their own.

We are in agreement that 

1. in non-microdata syntax, external properties are in principle no problem and
2. in general, properly defined properties with a URI will be better, if available.

The proposal is about filling that gap.


> Regarding the proposal itself: in order to avoid having to define many properties in schema.org, you propose an alternative, simplified way to write s p o triples when describing a resource s, using one and only one property, schema:additionalProperty, whose range is schema:PropertyValue. Basically, PropertyValue is a pair (property,value). You describe a PropertyValue using a few properties: schema:name, schema:value, schema:unitText, etc.
> 
> I would keep and make explicit the (property,value) pair structure, using two dedicated properties (say): schema:property and schema:object, both with domain PropertyValue
> Why? to make it possible to easily lift data published using schema:additionalProperty, in bulk.

If I understand you correctly, you are proposing to create individual nodes for the property name part and for the value part. I have looked at the proposal, but 

- I see no gain in using a dedicated property node for the property name. If you already have a URI for the property, simply use propertyID with the URI of the property and omit the schema:name. That is as simple as your proposal.

- If you already have a URI for the value (e.g. for a qualitative value), you can use schema:value directly with that URI.

I will show that in your examples:

> 
> Let's take some of your examples to explain it:
> 
> <div itemscope itemtype="http://schema.org/Product">
> 	<img itemprop="image" src="camera123.jpg" />
> 	<span itemprop="name">Digital Camera 123</span>
> 	<div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
> 		<span itemprop="name">Approx. Weight</span>
> 		<span itemprop="value">450</span>
> 		<span itemprop="unitText">gram</span>
> 	</div> 
>   	<div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">	  
> 		<span itemprop="name">Interface</span>:	  
> 		<span itemprop="value">USB</span>
> 	</div>
> </div>
> 
> that is, in Turtle (for readability):
> 
> [	a schema:Product;
> 	schema:image x:camera123.jpg;
> 	schema:name "Digital Camera 123";
> 	schema:additionalProperty [
> 		a schema:PropertyValue;
> 		schema:name "Approx. Weight";
> 		schema:value "450";
> 		schema:unitText "gram"
> 	];
> 	schema:additionalProperty [
> 		a schema:PropertyValue;
> 		schema:name "Interface";
> 		schema:value "USB";
> 	]
> ]


Yes

> 
> I suggest writing instead:
> 
> [	a schema:Product;
> 	schema:image x:camera123.jpg;
> 	schema:name "Digital Camera 123";
> 	schema:additionalProperty [
> 		a schema:PropertyValue;
> 		schema:property [
> 			schema:name "Approx. Weight"
> 		];
> 		schema:object [
> 			schema:value "450";
> 			schema:unitText "gram"
> 		]
> 	];
> 	schema:additionalProperty [
> 		a schema:PropertyValue;
> 		schema:property [
> 			schema:name "Interface"
> 		];
> 		schema:object [
> 			schema:value "USB";
> 		]
> 	]
> ]
> 
> Not really different, not more difficult to produce, arguably more blank nodes.

In Microdata, it would be more difficult to produce. We would also need (or at least should have) a type for these subnodes.

Your proposal in Microdata would look as follows:

<div itemscope itemtype="http://schema.org/Product">
	<img itemprop="image" src="camera123.jpg" />
	<span itemprop="name">Digital Camera 123</span>
	<div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
		<div itemprop="property" itemscope itemtype="http://schema.org/Property">
			<span itemprop="name">Approx. Weight</span>
		</div>
		<div itemprop="object" itemscope itemtype="http://schema.org/StructuredValue">
			<span itemprop="value">450</span>
			<span itemprop="unitText">gram</span>
		</div>
	</div> 
  	<div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">	
		<div itemprop="property" itemscope itemtype="http://schema.org/Property">		  
			<span itemprop="name">Interface</span>:	
		</div>
		<div itemprop="object" itemscope itemtype="http://schema.org/QuantitativeValue">		  
			<span itemprop="value">USB</span>
		</div>
	</div>
</div>

That is 21 lines, compared with 13 lines for the initial proposal:

<div itemscope itemtype="http://schema.org/Product">
	<img itemprop="image" src="camera123.jpg" />
	<span itemprop="name">Digital Camera 123</span>
	<div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
		<span itemprop="name">Approx. Weight</span>
		<span itemprop="value">450</span>
		<span itemprop="unitText">gram</span>
	</div> 
  	<div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">	  
		<span itemprop="name">Interface</span>:	  
		<span itemprop="value">USB</span>
	</div>
</div>

It is doable to modify the proposal, but from a Web markup perspective, I am not convinced. My main concern is not so much the additional code as such, but the experience that each additional level of nesting makes RDFa and Microdata coding more error-prone and intellectually more challenging.

Imagine doing this in a non-trivial table in RDFa or Microdata. It will be very painful.
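
To make the nesting argument concrete, the following Python sketch counts how deeply itemscope elements are nested in each variant. It is only an illustration under stated assumptions: the ItemscopeDepth class and the two abbreviated snippets are made up for this example (one additionalProperty each, with itemscope on the outer Product element), and it relies on nothing beyond Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; they cannot contain nested items.
VOID_ELEMENTS = {"img", "meta", "link", "br", "hr", "input"}


class ItemscopeDepth(HTMLParser):
    """Hypothetical helper: records the deepest nesting of itemscope elements."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one bool per open element: does it start an item?
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        starts_item = any(name == "itemscope" for name, _ in attrs)
        self.open_elements.append(starts_item)
        self.max_depth = max(self.max_depth, sum(self.open_elements))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.open_elements:
            self.open_elements.pop()


def itemscope_depth(markup):
    """Return the maximum number of itemscope elements open at the same time."""
    parser = ItemscopeDepth()
    parser.feed(markup)
    parser.close()
    return parser.max_depth


# Abbreviated version of the flat pattern (assumed snippet).
FLAT = """
<div itemscope itemtype="http://schema.org/Product">
  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
    <span itemprop="name">Approx. Weight</span>
    <span itemprop="value">450</span>
  </div>
</div>
"""

# Abbreviated version of the pattern with separate property/object nodes.
NESTED = """
<div itemscope itemtype="http://schema.org/Product">
  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
    <div itemprop="property" itemscope itemtype="http://schema.org/Property">
      <span itemprop="name">Approx. Weight</span>
    </div>
    <div itemprop="object" itemscope itemtype="http://schema.org/StructuredValue">
      <span itemprop="value">450</span>
    </div>
  </div>
</div>
"""

print(itemscope_depth(FLAT), itemscope_depth(NESTED))
```

The flat pattern needs two levels of items, the variant with dedicated property and object nodes needs three, and every extra level is one more place where a template in a real table layout can go wrong.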


> The point is that in many cases, you have URIs for the values, or you can easily mint them from your own codification. And you can therefore easily produce, say:
> 
> [	a schema:Product;
> 	schema:image x:camera123.jpg;
> 	schema:name "Digital Camera 123";
> 	schema:additionalProperty [
> 		a schema:PropertyValue;
> 		schema:property foo:approxWeight;
> 		schema:object [
> 			schema:value "450";
> 			schema:unitText "gram"
> 		]
> 	];
> 	schema:additionalProperty [
> 		a schema:PropertyValue;
> 		schema:property foo:interface;
> 		schema:object foo:USB
> 	]
> ]
> foo:approxWeight schema:name "Approx. Weight".
> foo:interface schema:name "Interface".
> foo:USB schema:value "USB".
> 
> The advantage here is that this data can be later improved, for instance stating:
> 
> foo:approxWeight rdfs:subPropertyOf schema:weight.
> foo:USB owl:sameAs dbpedia:USB.
> 
> this can be done without any impact on the source systems, on the actual production of the data, or on data that are already published: you can write the statements above once and lift all corresponding records at once.

I think we should separate the issue of consuming this data in RDF worlds from the perspective of mark-up. My assumption of consuming such data in RDF worlds is that with SPARQL CONSTRUCT rules (and a few heuristics), RDF-based consumers will transform the property-value pairs into local schemas in RDFS or OWL or map the data to existing vocabularies (like http://purl.org/vso/ns).

As long as the nodes are blank nodes, you cannot add a name later on anyway, so SPARQL CONSTRUCT works as well.
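
As a rough illustration of such consumer-side lifting, here is a minimal Python sketch. It is not a SPARQL CONSTRUCT rule, only the same idea written with plain tuples standing in for triples. The lift function, the NAME_TO_PROPERTY table, the example.org URIs, and the "Colour" entry are all assumptions made up for this example and are not part of the proposal.

```python
# Hypothetical lifting step on the consumer side: property-value pairs are
# copied into direct (subject, property, value) triples in the consumer's
# own data structure, leaving the published data untouched.

# Assumed mapping from human-readable names to target properties.
# A real consumer would maintain or derive such a table itself.
NAME_TO_PROPERTY = {
    "Approx. Weight": "http://example.org/vocab#approxWeight",  # made-up URI
    "Interface": "http://example.org/vocab#interface",          # made-up URI
}


def lift(subject, property_values, mapping=NAME_TO_PROPERTY):
    """Return (lifted_triples, unmapped_pairs) for one described entity."""
    lifted, unmapped = [], []
    for pv in property_values:
        # Prefer an explicit propertyID; otherwise fall back to the name lookup.
        target = pv.get("propertyID") or mapping.get(pv.get("name"))
        if target is None:
            unmapped.append(pv)  # keep the raw pair for later cleansing
            continue
        # Unit handling (unitText) is deliberately left out of this sketch.
        lifted.append((subject, target, pv["value"]))
    return lifted, unmapped


camera = [
    {"name": "Approx. Weight", "value": "450", "unitText": "gram"},
    {"name": "Interface", "value": "USB"},
    {"name": "Colour", "value": "black"},  # made-up pair with no mapping
]

triples, rest = lift("_:camera123", camera)
```

Here triples holds the two lifted statements and rest holds the unmapped "Colour" pair, so nothing is lost and the mapping table can be extended later without touching the published markup.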

It may not be obvious, but we disagree only on one small point: whether future lifting and cleansing should happen on the original node (often a BNode) or on a copy of that data in the target data structure.

Note also that in pure RDF worlds, including RDFa, there is no strong need to use the new pattern. You can always use proper RDF or OWL properties. The only downside is that search engines may skip such additional properties.

If you are referring to externally defined URIs for the value or property, you can directly use those:

<div itemscope itemtype="http://schema.org/Car">
  <img itemprop="image" src="station_waggon123.jpg" />
  <span itemprop="name">Station Waggon 123</span>
  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
	  <span itemprop="name">Gearbox Type</span>:
	  <link itemprop="value" href="http://purl.org/vvo/ns#GearboxDSG" />VW DSG
	  <link itemprop="propertyID" href="http://purl.org/vvo/ns#gearbox" />
  </div>  
</div>

In RDFa and JSON-LD, you could of course directly use the equivalent of

s vvo:gearbox vvo:GearboxDSG .

But even in this borderline case, I think that my proposal has advantages, since a search engine can partly process the metadata without fully understanding the external vocabulary.
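
To sketch what "partly process" could mean in practice, the following Python fragment groups items by property name and value while treating the value URI as an opaque identifier. This is only an assumption about how a generic consumer might behave, not a description of what any search engine does; the build_facets function and the second car ("Hatchback 456") are made up for this example.

```python
from collections import defaultdict


def build_facets(items):
    """Group item names by (property name, value), using only generic fields."""
    facets = defaultdict(list)
    for item in items:
        for pv in item.get("additionalProperty", []):
            # The value URI is treated as an opaque identifier; it is never
            # dereferenced or interpreted against the external vocabulary.
            facets[(pv["name"], pv["value"])].append(item["name"])
    return dict(facets)


cars = [
    {
        "name": "Station Waggon 123",
        "additionalProperty": [
            {
                "name": "Gearbox Type",
                "value": "http://purl.org/vvo/ns#GearboxDSG",
                "propertyID": "http://purl.org/vvo/ns#gearbox",
            }
        ],
    },
    {
        "name": "Hatchback 456",  # made-up second item for illustration
        "additionalProperty": [
            {
                "name": "Gearbox Type",
                "value": "http://purl.org/vvo/ns#GearboxDSG",
                "propertyID": "http://purl.org/vvo/ns#gearbox",
            }
        ],
    },
]

facets = build_facets(cars)
```

Both cars end up under the same ("Gearbox Type", value URI) key, so a consumer could offer a "Gearbox Type" filter with a human-readable label even though it knows nothing about the vocabulary behind the URIs.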


> 
> A question we would then ask is the question of rules that can be linked to the use of schema:additionalProperty. Is it equivalent to state: 
> s schema:additionalProperty [
> 	schema:property p;
> 	schema:object o
> ]
> 
> and s p o?
> 
In my proposal: Formally, no. But a client would likely consolidate this.

However, I would like to keep the discussion of the exact processing of such data out of this thread, because in the end the sponsors of schema.org will have to decide whether and how they will use such mark-up.


> Also note that in many cases, you actually don't care about the property. An example describing cars:
> [	a vso:Vehicle;
> 	schema:additionalProperty [
> 		schema:object [ schema:name "Sunroof" ]
> 	],[
> 		schema:object dbpedia:Diesel
> 	]
> ]
> 
> but we probably would prefer to write something like:
> [	a vso:Vehicle;
> 	schema:feature [ schema:name "Sunroof"],
> 	schema:feature dbpedia:Diesel
> ]
> 	

I think that "Sunroof: Yes" and "Fuel type: Diesel" would be better and not more difficult to produce:

<div itemscope itemtype="http://schema.org/Car">
  <img itemprop="image" src="station_waggon123.jpg" />
  <span itemprop="name">Station Waggon 123</span>
  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
	  <span itemprop="name">Sunroof</span>
	  <meta itemprop="value" content="True">
  </div>  
  <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
	  <span itemprop="name">Fuel type</span>:
	  <link itemprop="value" href="http://dbpedia.org/resource/Diesel" />Diesel
  </div>  
</div>


> 
> (note BTW that your use of schema:name for the PropertyValue is a bit incorrect, as you do not use it to label the PropertyValue pair, but the property. A schema:name for the second of the examples should probably be "Interface: USB" - but Ok, that's not important)

That is a separate issue to discuss. I thought about schema:propertyName, but then again, it is in most cases redundant, and I see little harm in overloading schema:name here. I have added it to the list of issues.



> Best Regards,
> 
> fps
> 

Thanks for your substantial feedback!

Best

Martin

> Le 29 avr. 2014 à 11:42, martin.hepp@ebusiness-unibw.org a écrit :
> 
> Dear all:
> 
> I have just finalized a proposal on how to add support for generic property-value pairs to schema.org. This serves three purposes:
> 
> 1. It will make it possible to expose product feature information from thousands of product detail pages from retailers and manufacturers.
> 2. It will simplify the development of future extensions for specific types of products and services, because we no longer need to standardize and define all relevant properties in schema.org and can instead defer the interpretation to the client.
> 3. It will serve as a clean, generic extension mechanism for properties in schema.org
> 
> The proposal with all examples is here:
> 
>   https://www.w3.org/wiki/WebSchemas/PropertyValuePairs
> 
> Your feedback will be very welcome.
> 
> Best wishes / Mit freundlichen Grüßen
> 
> Martin Hepp
> -----------------------------------
> martin hepp  http://www.heppnetz.de
> mhepp@computer.org          @mfhepp
> 
Received on Wednesday, 30 April 2014 21:43:56 UTC