
Re: schema.org Markup for DITA XML-based Technical Documentation

From: Richard Wallis <richard.wallis@dataliberate.com>
Date: Thu, 23 Jun 2016 16:09:51 +0100
Message-ID: <CAD47Kz5nMvU_mAvtdS6jKYEJO1t0Cr=ddzLNJJgXQg12W907ZQ@mail.gmail.com>
To: Colin Maudry <colin@maudry.com>
Cc: Felix Sasaki <fsasaki@w3.org>, "Young,Jeff (OR)" <jyoung@oclc.org>, John Walker <john.walker@semaku.com>, Keith Schengili-Roberts <keith.roberts@ixiasoft.com>, Martynas Jusevičius <martynas@graphity.org>, "public-schemaorg@w3.org" <public-schemaorg@w3.org>
I would suggest initially starting [for Schema.org] with what can be
described using the vocabulary as it is today.

Focus on metadata that would be shared via the web and that would help a
user, or search engine, discover a document/resource either directly or via
entities it is related to.

Take example documents already visible on the web that have been created
using DITA and explore how you would mark up their web presence using
Schema.org terms.

Starting simple should quickly establish something that can be shared and
discussed with others to identify related use cases and potentially more
in-depth needs.
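
As a rough sketch of what "starting simple" could look like, the Python below builds a Schema.org TechArticle description as JSON-LD for one published topic. The input dictionary, its keys (title, url, shortdesc, authors, keywords), the function names and the example.org URL are all assumptions made for illustration; they are not part of DITA, the DITA-OT or any existing plugin.

```python
import json


def topic_to_jsonld(topic):
    """Describe one published topic as a Schema.org TechArticle.

    `topic` is a hypothetical dict of metadata pulled from the DITA source.
    """
    doc = {
        "@context": "http://schema.org/",
        "@type": "TechArticle",
        "name": topic["title"],
        "url": topic["url"],
    }
    # Optional fields are only emitted when the topic provides them.
    if topic.get("shortdesc"):
        doc["description"] = topic["shortdesc"]
    if topic.get("authors"):
        doc["author"] = [{"@type": "Person", "name": n} for n in topic["authors"]]
    if topic.get("keywords"):
        doc["keywords"] = ", ".join(topic["keywords"])
    return doc


def as_script_tag(doc):
    """Wrap the JSON-LD in a script element for embedding in an HTML page."""
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    example = {
        "title": "Installing the widget",
        "url": "http://example.org/docs/install.html",
        "shortdesc": "How to install the widget on a server.",
        "authors": ["A. Writer"],
        "keywords": ["install", "widget"],
    }
    print(as_script_tag(topic_to_jsonld(example)))
```

The resulting script element could be pasted into the head of an existing page and then discussed or tested with whichever structured-data tools people prefer.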

~Richard.


Richard Wallis
Founder, Data Liberate
http://dataliberate.com
Linkedin: http://www.linkedin.com/in/richardwallis
Twitter: @rjw

On 23 June 2016 at 15:45, Colin Maudry <colin@maudry.com> wrote:

> Hello,
>
> I'm the developer of the DITA RDF plugin [1] for the DITA OT, that enables
> the extraction of the DITA documentation metadata (titles, links, authors,
> keywords, etc.) and its materialization as RDF triples.
>
> Schema.org is popular for the Web because it's well understood by search
> engines. However, it's a somewhat generic vocabulary. It's not meant to
> express the details of documentation metadata. This is the purpose of the
> DITA ontology [2], and it is not especially meant to be understood by
> search engines.
>
> As others pointed out, if Schema.org and SEO are the objective and the DITA
> content is properly tagged (for instance with SKOS-inspired SubjectScheme),
> one could inject Schema.org metadata about the products in the generated
> HTML content during publication. For me, that's the most promising use case.
>
> Though possible, I wouldn't add RDF/XML within DITA content, at least not
> manually. I like DITA content as light as possible, otherwise it's hard to
> maintain. Data must sit in databases, not in DITA (XML) files.
>
> Colin
> https://twitter.com/CMaudry
>
> [1] https://github.com/ColinMaudry/dita-rdf
> [2]
> https://www.lucidchart.com/documents/view/4478-99e0-5162a0ee-a67f-27dc0a000cd9
>
> On 23/06/16 07:15, Felix Sasaki wrote:
>
> Thanks, Jeff. Such solutions have the drawback that you have to change the
> underlying schema (of DocBook, DITA or other XML vocabularies). I recently
> had a discussion with a company offering semantic enrichment services to
> another company. The offer was rejected because the enrichment required a
> change to the schema, which is part of a literally expensive workflow that
> involves many tools and people, potentially across organizations.
>
> Best,
>
> Felix
>
> Am 22.06.2016 um 22:24 schrieb Young,Jeff (OR) <jyoung@oclc.org>:
>
> Sorry if this got mentioned already, but you could add RDFa (Schema.org
> or otherwise) directly to DocBook XML as described here:
>
> http://www.devx.com/semantic/Article/42543
>
> If and when the DocBook XML got transformed into HTML, the RDFa could be
> mapped as part of that transformation (e.g. using XSL).
>
> Jeff
>
> From: Felix Sasaki <fsasaki@w3.org>
> Date: Wednesday, June 22, 2016 at 3:54 PM
> To: John Walker <john.walker@semaku.com>
> Cc: Keith Schengili-Roberts <keith.roberts@ixiasoft.com>, Martynas
> Jusevičius <martynas@graphity.org>, "public-schemaorg@w3.org" <
> public-schemaorg@w3.org>, Colin Maudry <colin@maudry.com>
> Subject: Re: schema.org Markup for DITA XML-based Technical Documentation
> Resent-From: <public-schemaorg@w3.org>
> Resent-Date: Wednesday, June 22, 2016 at 3:54 PM
>
> A use case is enrichment of technical documentation content with
> identifiers for named entities. These may provide links to general data
> sets or to specific ones, e.g. provided by the tech doc company.
>
> I have explored this with others and produced this demo, showing the
> process with DocBook and other XML vocabularies. I will present a DITA demo
> at this year's TCWorld conference in autumn.
> See the demo here:
> http://fsasaki.github.io/stuff/feisgiltt2016/
>
> Am 22.06.2016 um 18:47 schrieb John Walker <john.walker@semaku.com>:
>
> Hi Keith
>
> Given that linked data and DITA are two subjects close to my heart, I
> would be happy to spend time on this subject both to work out ideas and (in
> due course) to implement things.
>
> A first thought is whether a mapping could be generic or would depend on the
> use case at hand. In the latter case, how could one define mappings from a
> DITA specialisation to concepts from an ontology (schema.org or
> otherwise) that could be passed into a generic processor?
>
> Alternatively it *should* be perfectly possible to use RDFa directly in
> the source DITA XML and pass this through into HTML+RDFa output, but I have
> not seen any deployments of this approach in the wild.
>
>
> In the demo, choose approach 2, "embed linked data via structured
> markup". This uses microdata attributes rather than RDFa, but the
> difference is just syntactic sugar. See an XLIFF example below.
>
> <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0"
> srcLang="en" trgLang="fr">
>  <file id="f1">
>   <unit id="u1">
>    <segment>
>    <source>We very much welcome you in the city of <mrk
> vocab="http://schema.org/" typeof="Place" property="name"
> resource="http://dbpedia.org/resource/Prague">Prague</mrk>, a home of XML!</source>
>    </segment>
>   </unit>
>  </file>
> </xliff>
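
To illustrate how such inline annotations could be picked up downstream, here is a small Python sketch that reads the XLIFF sample and lists the entity annotations carried on `mrk` elements. It is a deliberately naive reading of the `vocab`, `typeof`, `property` and `resource` attributes (the marked text is taken as the value of the property on the identified resource). It is not an RDFa processor, and a conforming one may interpret this attribute combination differently. The function name and the tuple layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0"
 srcLang="en" trgLang="fr">
 <file id="f1">
  <unit id="u1">
   <segment>
    <source>We very much welcome you in the city of <mrk
 vocab="http://schema.org/" typeof="Place" property="name"
 resource="http://dbpedia.org/resource/Prague">Prague</mrk>, a home of XML!</source>
   </segment>
  </unit>
 </file>
</xliff>"""


def extract_annotations(xml_text):
    """Return (subject, predicate, object) tuples for annotated mrk elements."""
    root = ET.fromstring(xml_text)
    triples = []
    for mrk in root.iter("{%s}mrk" % XLIFF_NS):
        subject = mrk.get("resource")
        if not subject:
            continue  # no identifier, nothing to link to
        vocab = mrk.get("vocab", "")
        if mrk.get("typeof"):
            triples.append((subject, "rdf:type", vocab + mrk.get("typeof")))
        if mrk.get("property"):
            # Naive assumption: the marked text is the property value.
            text = "".join(mrk.itertext()).strip()
            triples.append((subject, vocab + mrk.get("property"), text))
    return triples


if __name__ == "__main__":
    for triple in extract_annotations(SAMPLE):
        print(triple)
```

A publishing step could use output like this to decide which entities to describe in the generated HTML, without changing how the source content is validated.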
>
>
> Just thinking out loud but perhaps also feasible to embed RDF/XML into the
> DITA XML source (similar to how MathML and SVG can be embedded).
>
>
> See approach 4 in the demo. This embeds Turtle rather than RDF/XML;
> changing this to RDF/XML is no big issue.
>
> This is possible in SVG for example.
>
> Otherwise perhaps an approach like embedding JSON-LD in HTML using the
> script tag with appropriate MIME type might work.
>
>
> See approach 5 for a JSON-LD example, where the information is stored as a
> web annotation.
>
>
> First step would be to define a few concrete use cases.
>
>
> All use cases in the above demo are related to SEO. The different approaches
> are supplied because they have different influences on existing XML
> workflows, e.g. they may (or may not) break validation. That topic will be
> explored further in a new CG, see
> https://www.w3.org/community/rax/
>
>
> Best,
>
> Felix
>
>
>
> Regards
> John
>
>
>
> -------- Original message --------
> From: Keith Schengili-Roberts <keith.roberts@ixiasoft.com>
> Date: 22/06/2016 16:23 (GMT+01:00)
> To: John Walker <john.walker@semaku.com>,
> Martynas Jusevičius <martynas@graphity.org>
> Cc: public-schemaorg@w3.org, Colin Maudry <colin@maudry.com>
> Subject: Re: schema.org Markup for DITA XML-based Technical Documentation
>
> I'll be honest and say that I can't give you a straight answer as to which
> of those options I would go for as I am still researching what is feasible
> in terms of a bridge between DITA and Schema.org. I was not previously
> aware of the TechArticle class that you mention, and have added that to my
> list of things to review.
>
> Offhand I would say a combination of #1 and #3. Though not designed with
> SEO in mind, Colin Maudry's DITA OT plugin that produces RDF seems to me to
> be a natural stepping stone to producing content in RDFa that Schema.org
> could parse, though asking for Schema.org-aware descriptors to be built
> into the DITA-OT is also a possibility.
>
> At the moment there is no effective bridge between DITA-based content and
> Schema.org, and I really just want to get the ball rolling... (and educate
> myself as to what is required in the process).
>
> Cheers!
>
> -
>
> *Keith Schengili-Roberts*
> DITA Information Architect / DITA Specialist
>
> *IXIASOFT*
> 825 Querbes, Suite 200, Montréal, Québec, Canada, H2V 3X1
> tel + 1 514 279-4942 / toll free + 1 877 279-4942
> robertsk@ixiasoft.com / www.ixiasoft.com
>
> *Interested in attending? Visit our event website
> <http://www.ixiasoft.com/en/news-and-events/ixiasoft-user-conference-2016> for
> more information.*
> ------------------------------
> *From:* John Walker <john.walker@semaku.com>
> *Sent:* Wednesday, June 22, 2016 9:04:34 AM
> *To:* Keith Schengili-Roberts; Martynas Jusevičius
> *Cc:* public-schemaorg@w3.org; Colin Maudry
> *Subject:* RE: schema.org Markup for DITA XML-based Technical
> Documentation
>
> Hi Keith,
>
>
> Could you elaborate on what kind of (meta)data you would want to expose
> and the sort of use cases you would want to support?
>
>
> For example is it to:
> 1. annotate the HTML output (in which case schema.org already has
> quite broad coverage)
> 2. give some insights into the 'underlying' DITA resources (maps,
> topics, references between them, etc.) to, for example, better analyze
> re-use and other metrics
> 3. improve SEO by describing the subject matter of the content (for
> example what product or subject the content is about)
>
>
> An existing class such as http://schema.org/TechArticle might already map
> well to certain DITA concepts.
> Alternatively, is there some way to classify/type DITA content according to
> some external classification scheme (more specific than SubjectScheme in
> that it should assert the rdf:type of the content resource)?
>
>
> Regards,
>
>
> John Walker
> Principal Consultant & co-founder
> Semaku B.V.
> SFJ 4.009, Torenallee 20, 5617 BC Eindhoven
> Mobile: +31 6 475 22030
> Email: john.walker@semaku.com
> Skype: jaw111
> Web: http://semaku.com/
>
>
> KvK: 58031405
> BTW: NL852842156B01
> IBAN: NL94 INGB 0008 3219 95
>
>
> *From:* Keith Schengili-Roberts [mailto:keith.roberts@ixiasoft.com]
> *Sent:* Wednesday, June 22, 2016 2:25 PM
> *To:* Martynas Jusevičius <martynas@graphity.org>
> *Cc:* public-schemaorg@w3.org; Colin Maudry <colin@maudry.com>
> *Subject:* Re: schema.org Markup for DITA XML-based Technical
> Documentation
>
>
> Thanks for mentioning that. I have been in contact with Colin Maudry about
> this already, and I can see how it might be a stepping stone towards getting
> Schema.org-readable data from DITA.
>
>
> I am still doing research into the feasibility of the whole thing, so am
> not clear as to what you mean with your "shoehorn" comment.
>
>
> Cheers!
>
>
> ------------------------------
> *From:* Martynas Jusevičius <martynas@graphity.org>
> *Sent:* Tuesday, June 21, 2016 5:18:36 PM
> *To:* Keith Schengili-Roberts
> *Cc:* public-schemaorg@w3.org; Colin Maudry
> *Subject:* Re: schema.org Markup for DITA XML-based Technical
> Documentation
>
>
> If you want to use DITA in RDF, there is this effort by Colin Maudry:
> http://colin.maudry.com/dita-rdf/#concept/welcome.html
>
>
> If you for some reason want to shoehorn it into schema.org specifically,
> then it sounds like a bad idea.
>
>
> On Fri, Jun 17, 2016 at 8:46 PM, Keith Schengili-Roberts
> <keith.roberts@ixiasoft.com> wrote:
>
> Hello there:
>
>
> I am wondering if there's the possibility of coming up with a Schema.org
> format for content produced using the DITA XML structured format? It is
> primarily (but not exclusively) used by technical writing departments to
> produce content. It is estimated to be used by somewhere between 5-10% of
> all technical writing groups, mainly at medium to large firms. The standard
> is open, and is managed by OASIS
> (https://www.oasis-open.org/committees/dita/).
>
>
> DITA is topic based, with the latest standard (DITA 1.3) having six topic
> types: a generic "topic" type, then more specific concept, task, reference,
> glossary and troubleshooting types. Best practices suggest that each topic
> come with a short description, so it is possible to easily identify the
> type of topic and what it describes.
>
>
> XHTML output from DITA currently uses Dublin Core descriptive metadata,
> but it could just as easily use something that Schema.org could recognize,
> likely using either the RDFa or Microdata formats.
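
A minimal Python sketch of that idea follows: it takes Dublin Core style meta names and re-expresses the ones it knows as RDFa `property` attributes using Schema.org terms. The mapping table is an assumption made for illustration (both the exact meta names a given DITA-OT version emits and the choice of Schema.org property for each), and the function name is hypothetical. The generated elements are meant to sit inside an element that declares `vocab="http://schema.org/"` and `typeof="TechArticle"`.

```python
from html import escape

# Illustrative mapping only: Dublin Core meta name -> Schema.org property.
DC_TO_SCHEMA = {
    "DC.Title": "name",
    "DC.Creator": "author",
    "DC.Subject": "keywords",
    "DC.Language": "inLanguage",
    "description": "description",
}


def rdfa_meta(dc_meta):
    """Return RDFa meta elements for the Dublin Core entries we can map."""
    lines = []
    for dc_name, value in dc_meta.items():
        prop = DC_TO_SCHEMA.get(dc_name)
        if prop is None:
            continue  # no agreed Schema.org equivalent in this sketch
        lines.append(
            '<meta property="%s" content="%s"/>' % (prop, escape(value, quote=True))
        )
    return lines


if __name__ == "__main__":
    sample = {
        "DC.Title": "Installing the widget",
        "DC.Language": "en",
        "DC.Format": "XHTML",  # unmapped here, so it is skipped
    }
    print("\n".join(rdfa_meta(sample)))
```

Entries without a mapping are skipped rather than guessed, which keeps the open question of which DITA concepts map to which Schema.org terms visible for discussion.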
>
>
> Is there interest in helping devise a bridge between DITA-based output and
> something that Schema.org could use? I am happy to be an expert on the DITA
> end of things if there is someone willing to help guide me through the
> process as to what's needed on the Schema.org end of things.
>
>
> Cheers!
Received on Thursday, 23 June 2016 15:10:35 UTC