
Re: Decision tree schema

From: Nicolas Torzec <torzecn@yahoo-inc.com>
Date: Mon, 1 May 2017 16:29:16 +0000 (UTC)
To: Krzysztof Tomasz Zembrowski <krzysztof@zembrowski.com>
Cc: Timothy Holborn <timothy.holborn@gmail.com>, public-schemaorg@w3.org, <owen@ambur.net>
Message-ID: <23813315.1583040.1493656156216@mail.yahoo.com>
Hi Krzysztof,
Thanks for sharing details about your use case. 
A few comments given that context:

1) Besides "decision trees", which is a very specific term used in machine learning, also look into "association rules", a similar but more generic concept used in data mining.
2) I understand that PMML may be a good fit: it is designed primarily to exchange machine-generated predictive models between machines. It started with decision trees but has expanded beyond them.

3) RIF is designed to formalize/serialize "business logic", but in a generic way: i.e. it has nothing to do with "business" and everything to do with describing "rules" and "processes" using (a subset of) logic.
4) I don't know anything about StratML. Thanks for the link, Owen Ambur.

Regarding schema.org specifically:
1) It has generic support for lists via ItemList and ListItem (i.e. one could encode trees via recursive lists) but nothing specific for association rules.
2) Supporting association rules would require introducing one or more new classes to capture rules, each having an antecedent (i.e. the "if" part that describes the context/conditions) and a consequent (i.e. the "then" part that describes the consequences: context update, actions, etc.). Antecedent and Consequent could be made as simple as Text fields, or as specific/formal as RIF...
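
As a rough illustration of the antecedent/consequent idea with plain Text fields, here is a minimal Python sketch. The type name "Rule", the property names "antecedent" and "consequent", and the vocabulary URL https://example.org/rules# are all hypothetical placeholders; schema.org does not define any of them, and the rule text is a made-up example.

```python
import json
from dataclasses import dataclass


@dataclass
class Rule:
    """One rule: an "if" part and a "then" part, both stored as plain text."""
    antecedent: str
    consequent: str

    def to_jsonld(self) -> dict:
        # "Rule", "antecedent" and "consequent" are hypothetical terms in a
        # placeholder vocabulary; they are NOT part of schema.org.
        return {
            "@context": {"@vocab": "https://example.org/rules#"},
            "@type": "Rule",
            "antecedent": self.antecedent,
            "consequent": self.consequent,
        }


# Made-up example text, only to show the shape of the output.
rule = Rule(antecedent="condition X holds", consequent="take action Y")
print(json.dumps(rule.to_jsonld(), indent=2))
```

Swapping the Text fields for something more formal (e.g. RIF) would only change the value types, not the overall antecedent/consequent shape.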


On Sunday, April 30, 2017, 5:10:15 AM PDT, Krzysztof Tomasz Zembrowski <krzysztof@zembrowski.com> wrote:
Dear Nicolas,
Dear Timothy,

Thank you both for your input. The information is very valuable for me.

My goal is to make legal/law decision trees crawlable and understandable for machines using a set standard (i.e. schema.org).

Currently, the decision trees I use have a custom but simple nested XML structure and are not crawlable or understandable (standalone) by machines. They are visualized dynamically (question by question, choice by choice) using JavaScript (jQuery). Because of the dynamic visualisation, the whole decision tree is loaded into the browser from a single URL. There is no communication with the server while making a choice, and there are no URLs for the specific questions/choices in a decision tree, because the whole decision path/data is already loaded in the browser. I am currently not planning to use prerender.io or SEO.js to make the decision tree crawlable. That is why I asked my initial question: I wondered whether a schema for a decision tree already exists, or at least a way to translate the nested XML decision tree into JSON-LD.
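
One possible way to translate such a nested XML tree into JSON-LD is sketched below in Python, using only the generic schema.org types ItemList and ListItem. The actual XML format is not shown in this thread, so the element names (question, choice, answer) and the text attribute are assumptions, and the sample content is a placeholder. Mapping an answer to a Thing with a description is also just one choice among several.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical input format: element and attribute names are assumptions.
SAMPLE_XML = """
<question text="Q1">
  <choice text="Yes">
    <answer text="A1"/>
  </choice>
  <choice text="No">
    <question text="Q2">
      <choice text="Yes"><answer text="A2"/></choice>
      <choice text="No"><answer text="A3"/></choice>
    </question>
  </choice>
</question>
"""


def question_to_itemlist(question):
    """Map one <question> to an ItemList whose ListItems are its choices."""
    elements = []
    for position, choice in enumerate(question.findall("choice"), start=1):
        child = choice[0]  # assumes exactly one <question> or <answer> per choice
        if child.tag == "question":
            item = question_to_itemlist(child)  # recurse into the sub-tree
        else:
            item = {"@type": "Thing", "description": child.get("text")}
        elements.append({
            "@type": "ListItem",
            "position": position,
            "name": choice.get("text"),
            "item": item,
        })
    return {
        "@type": "ItemList",
        "name": question.get("text"),
        "itemListElement": elements,
    }


def xml_to_jsonld(xml_text):
    """Parse the XML and wrap the converted tree in a schema.org context."""
    root = ET.fromstring(xml_text.strip())
    doc = {"@context": "https://schema.org"}
    doc.update(question_to_itemlist(root))
    return doc


doc = xml_to_jsonld(SAMPLE_XML)
print(json.dumps(doc, indent=2))
```

The printed document could be embedded in a script element of type application/ld+json. Note that ItemList/ListItem only carry the tree shape; they do not tell a consumer that the lists represent questions and choices.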

I am convinced that having a decision tree schema is very useful. On the one hand, a decision tree is a logical and easy-to-understand way to visualize the decision-making process (for humans). On the other hand, it is a logical path for the machine to make a decision (using the human logic defined in the decision tree/schema).

In my specific case: translating paragraphs of legal acts/documents into a decision tree firstly makes them comprehensible for humans; secondly, it gives machines a path for decision making whenever they have enough input information.

At first glance, PMML seems to be a solution. The only issue with PMML is that it is way too big and complicated for this small project. Currently, the source of the nested XML is very easy for humans with minimal technical knowledge to understand, and it is very easy to traverse branches, calculate the depth of the decision tree/branch and even make jumps between branches.
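
To make the point about easy traversal concrete, here is a small Python sketch that computes the depth of a tree and lists every path from the root question to an answer. As the real XML format is not shown here, the element names (question, choice, answer), the text attribute and the placeholder content are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format: element and attribute names are assumptions.
SAMPLE_XML = """
<question text="Q1">
  <choice text="Yes"><answer text="A1"/></choice>
  <choice text="No">
    <question text="Q2">
      <choice text="Yes"><answer text="A2"/></choice>
      <choice text="No"><answer text="A3"/></choice>
    </question>
  </choice>
</question>
"""


def depth(question):
    """Number of questions on the longest path starting at this question."""
    nested = [depth(q) for q in question.findall("choice/question")]
    return 1 + max(nested, default=0)


def paths(question, prefix=()):
    """Yield every path from this question down to an answer, as text tuples."""
    for choice in question.findall("choice"):
        step = prefix + (question.get("text"), choice.get("text"))
        for child in choice:
            if child.tag == "question":
                yield from paths(child, step)
            elif child.tag == "answer":
                yield step + (child.get("text"),)


root = ET.fromstring(SAMPLE_XML.strip())
print("depth:", depth(root))
for path in paths(root):
    print(" > ".join(path))
```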

I hope this explains my goal clearly.

I would appreciate any input on translating nested XML into a schema.org schema.

Thank you in advance.

Best regards,

Am 26.04.2017 um 03:56 schrieb Nicolas Torzec <torzecn@yahoo-inc.com>:
Krzysztof,
What is your goal?
RIF was designed to encode and exchange (business) rules in a formal way using (extensions of) first-order logic. PMML was designed to serialize and exchange predictive models such as decision trees and other (statistical) ML models. It is supported by libraries such as R, Scikit-Learn, or Spark/mllib.
On Tuesday, April 25, 2017, 6:27:31 PM PDT, Timothy Holborn <timothy.holborn@gmail.com> wrote:

On Wed, 26 Apr 2017 at 11:10 Nicolas Torzec <torzecn@yahoo-inc.com> wrote:

The closest thing to a standard schema for predictive models such as decision trees is PMML. PMML is an XML-based interchange format for predictive models such as decision trees, linear regression, etc.
See https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language.

On Thursday, April 20, 2017, 12:28:40 AM PDT, Krzysztof Tomasz Zembrowski <krzysztof@zembrowski.com> wrote:
Dear all,

does a schema for a decision tree structure exist?

I'm wondering whether it's reasonable to output the whole decision tree (with all questions, decision possibilities and answers [the whole tree with all branches]) in the ld+json format.

Moreover, does it make sense to nest it, like the following example:

__ Choice
____ Answer
__ Choice
____ Question
______ Choice
________ Answer
______ Choice
________ Answer
cf. https://jsfiddle.net/zembrowski/q50L7t6f/


or make it completely flat (with or without dependencies), like this example:

and add a parameter to each element with the correlation.
cf. https://jsfiddle.net/zembrowski/hevu3jof/
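
To compare the two layouts, the following Python sketch turns a nested tree into a flat list in which each element carries a reference to its parent. The field names (type, text, children, id, parent) are hypothetical and are not taken from the linked fiddles. The sample mirrors the nested outline above, with an implied root question added as an assumption.

```python
import json

# Nested form, mirroring the outline above. The root question is implied.
# All field names here are hypothetical.
nested = {
    "type": "Question", "text": "Q1", "children": [
        {"type": "Choice", "text": "C1", "children": [
            {"type": "Answer", "text": "A1", "children": []},
        ]},
        {"type": "Choice", "text": "C2", "children": [
            {"type": "Question", "text": "Q2", "children": [
                {"type": "Choice", "text": "C3", "children": [
                    {"type": "Answer", "text": "A2", "children": []},
                ]},
                {"type": "Choice", "text": "C4", "children": [
                    {"type": "Answer", "text": "A3", "children": []},
                ]},
            ]},
        ]},
    ],
}


def flatten(node, parent_id=None, out=None):
    """Return a flat, pre-order list where each element points to its parent."""
    if out is None:
        out = []
    node_id = len(out) + 1  # ids are assigned in visiting order, starting at 1
    out.append({
        "id": node_id,
        "parent": parent_id,  # None marks the root
        "type": node["type"],
        "text": node["text"],
    })
    for child in node.get("children", []):
        flatten(child, node_id, out)
    return out


flat = flatten(nested)
print(json.dumps(flat, indent=2))
```

Both forms carry the same information: the nested one is easier to read and traverse top-down, while the flat one makes every element individually addressable by its id.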

Would be more than happy to read your thoughts on this topic.

See RIF / OWL: https://www.w3.org/TR/rif-rdf-owl/   

Thank you in advance. 

Best regards,
Received on Monday, 1 May 2017 16:29:53 UTC
