Re: schema.org Markup for DITA XML-based Technical Documentation

Hello,

I'm the developer of the DITA RDF plugin [1] for the DITA-OT, which 
extracts DITA documentation metadata (titles, links, authors, 
keywords, etc.) and materializes it as RDF triples.
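
To give an idea of what "metadata as RDF triples" means, here is a 
minimal sketch in Python. It is not the plugin's actual implementation 
(the plugin runs inside the DITA-OT). The sample topic, the 
example.org base URI and the choice of Dublin Core properties are 
assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic; real DITA topics carry a DOCTYPE and more metadata.
SAMPLE_TOPIC = """<concept id="welcome">
  <title>Welcome</title>
  <prolog>
    <author>Jane Doe</author>
    <metadata>
      <keywords><keyword>RDF</keyword><keyword>DITA</keyword></keywords>
    </metadata>
  </prolog>
  <conbody><p>Hello.</p></conbody>
</concept>"""

# Dublin Core terms, chosen here for illustration only (an assumption,
# not the vocabulary used by the DITA ontology).
DCT = "http://purl.org/dc/terms/"


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def topic_to_triples(xml_text, base_uri):
    """Return N-Triples lines for the title, authors and keywords of one topic."""
    root = ET.fromstring(xml_text)
    subject = f"<{base_uri}{root.get('id')}>"
    triples = []
    title = root.findtext("title")
    if title:
        triples.append(f"{subject} <{DCT}title> {literal(title.strip())} .")
    for author in root.iter("author"):
        if author.text:
            triples.append(f"{subject} <{DCT}creator> {literal(author.text.strip())} .")
    for keyword in root.iter("keyword"):
        if keyword.text:
            triples.append(f"{subject} <{DCT}subject> {literal(keyword.text.strip())} .")
    return triples


if __name__ == "__main__":
    for line in topic_to_triples(SAMPLE_TOPIC, "http://example.org/topic/"):
        print(line)
```

Each printed line is one triple: the topic as subject, a property, and 
the extracted value as a literal.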

Schema.org is popular on the Web because it's well understood by search 
engines. However, it's a fairly generic vocabulary that isn't meant to 
express the details of documentation metadata. Expressing those details 
is the purpose of the DITA ontology [2], which in turn is not especially 
meant to be understood by search engines.

As others pointed out, if Schema.org and SEO are the objective and the 
DITA content is properly tagged (for instance with a SKOS-inspired 
SubjectScheme), one could inject Schema.org metadata about the products 
into the generated HTML content during publication. For me, that's the 
most promising use case.
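
As a rough sketch of that injection step, assuming the product names 
have already been resolved from the subject scheme at publication time: 
the Python below builds a Schema.org description as JSON-LD and inserts 
it before the closing head tag of a generated page. The function names, 
the sample values and the choice of TechArticle with "about" pointing at 
Product are my assumptions for illustration; a real DITA-OT 
customization would more likely do this in XSLT.

```python
import json


def build_jsonld(title, description, product_names):
    """Describe one generated page as a schema.org TechArticle about some products."""
    return {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": title,
        "description": description,
        "about": [{"@type": "Product", "name": name} for name in product_names],
    }


def inject_jsonld(html, data):
    """Insert the JSON-LD as a script element just before </head>."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    script = f'<script type="application/ld+json">\n{payload}\n</script>\n'
    index = html.find("</head>")
    if index == -1:
        raise ValueError("no </head> found in the generated HTML")
    return html[:index] + script + html[index:]


if __name__ == "__main__":
    # Hypothetical page and tags, standing in for DITA-OT output.
    page = "<html><head><title>Install the pump</title></head><body></body></html>"
    data = build_jsonld("Install the pump", "How to install the pump.", ["Pump X200"])
    print(inject_jsonld(page, data))
```

The DITA source stays untouched; only the published HTML carries the 
Schema.org description.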

Though it is possible, I wouldn't add RDF/XML within DITA content, at 
least not manually. I like to keep DITA content as light as possible; 
otherwise it's hard to maintain. Data must sit in databases, not in 
DITA (XML) files.

Colin
https://twitter.com/CMaudry

[1] https://github.com/ColinMaudry/dita-rdf
[2] https://www.lucidchart.com/documents/view/4478-99e0-5162a0ee-a67f-27dc0a000cd9

On 23/06/16 07:15, Felix Sasaki wrote:
> Thanks, Jeff. Such solutions have the drawback that you have to change 
> the underlying schema (of DocBook, DITA or other XML vocabularies). I 
> recently had a discussion with a company offering semantic enrichment 
> services to another company. The offer was rejected because the 
> enrichment required a change to the schema, which is part of a 
> literally expensive workflow that involves many tools and people, 
> potentially across organizations.
>
> Best,
>
> Felix
>
>> On 22.06.2016 at 22:24, Young,Jeff (OR) <jyoung@oclc.org> wrote:
>>
>> Sorry if this got mentioned already, but you could add RDFa 
>> (Schema.org or otherwise) directly to DocBook XML as described here:
>>
>> http://www.devx.com/semantic/Article/42543
>>
>> If and when the DocBook XML got transformed into HTML, the RDFa could 
>> be mapped as part of that transformation (e.g. using XSL).
>>
>> Jeff
>>
>> From: Felix Sasaki <fsasaki@w3.org>
>> Date: Wednesday, June 22, 2016 at 3:54 PM
>> To: John Walker <john.walker@semaku.com>
>> Cc: Keith Schengili-Roberts <keith.roberts@ixiasoft.com>, Martynas 
>> Jusevičius <martynas@graphity.org>, "public-schemaorg@w3.org" 
>> <public-schemaorg@w3.org>, Colin Maudry <colin@maudry.com>
>> Subject: Re: schema.org Markup for DITA XML-based Technical 
>> Documentation
>> Resent-From: <public-schemaorg@w3.org>
>> Resent-Date: Wednesday, June 22, 2016 at 3:54 PM
>>
>> A use case is enrichment of technical documentation content with 
>> identifiers for named entities. These may provide links to general 
>> data sets or to specific ones, e.g. provided by the tech doc company.
>>
>> I have explored this with others and produced a demo showing the 
>> process with DocBook and other XML vocabularies. I will present a 
>> DITA demo at this year's TCWorld conference in autumn.
>> See the demo here:
>> http://fsasaki.github.io/stuff/feisgiltt2016/
>>
>>> On 22.06.2016 at 18:47, John Walker <john.walker@semaku.com> wrote:
>>>
>>> Hi Keith
>>>
>>> Given that linked data and DITA are two subjects close to my heart, 
>>> I would be happy to spend time on this subject both to work out 
>>> ideas and (in due course) to implement things.
>>>
>>> A first thought is whether a mapping could be generic or would 
>>> depend on the use case at hand. In the latter case, how could one 
>>> define mappings from a DITA specialisation to concepts from an 
>>> ontology (schema.org or otherwise) that could be passed into a 
>>> generic processor?
>>>
>>> Alternatively, it *should* be perfectly possible to use RDFa directly 
>>> in the source DITA XML and pass this through into HTML+RDFa output, 
>>> but I have not seen any deployments of this approach in the wild.
>>
>> In the demo, choose approach 2, "embed linked data via structured 
>> markup". It uses Microdata attributes rather than RDFa, but the 
>> difference is just syntactic sugar. See an XLIFF example below.
>>
>> <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" 
>> srcLang="en" trgLang="fr">
>>  <file id="f1">
>>   <unit id="u1">
>>    <segment>
>>    <source>We very much welcome you in the city of <mrk 
>> vocab="http://schema.org/" typeof="Place" property="name" 
>> resource="http://dbpedia.org/resource/Prague">Prague</mrk>, a home of 
>> XML!</source>
>>    </segment>
>>   </unit>
>>  </file>
>> </xliff>
>>
>>>
>>> Just thinking out loud but perhaps also feasible to embed RDF/XML 
>>> into the DITA XML source (similar to how MathML and SVG can be 
>>> embedded).
>>
>> See approach 4 in the demo. It embeds Turtle rather than RDF/XML, 
>> but changing this to RDF/XML is no big issue.
>>
>>> This is possible in SVG for example.
>>>
>>> Otherwise perhaps an approach like embedding JSON-LD in HTML using 
>>> the script tag with appropriate MIME type might work.
>>
>> See approach 5 for a JSON-LD example, in which the information is 
>> stored as a web annotation.
>>
>>>
>>> First step would be to define a few concrete use cases.
>>
>> All use cases in the above demo are related to SEO. The different 
>> approaches are supplied because they have different influences on 
>> existing XML workflows, e.g. they may (or may not) break validation. 
>> That topic will be explored further in a new CG, see
>> https://www.w3.org/community/rax/
>>
>>
>> Best,
>>
>> Felix
>>
>>>
>>>
>>> Regards
>>> John
>>>
>>>
>>>
>>> -------- Original message --------
>>> From: Keith Schengili-Roberts <keith.roberts@ixiasoft.com>
>>> Date: 22/06/2016 16:23 (GMT+01:00)
>>> To: John Walker <john.walker@semaku.com>, Martynas Jusevičius 
>>> <martynas@graphity.org>
>>> Cc: public-schemaorg@w3.org, Colin Maudry <colin@maudry.com>
>>> Subject: Re: schema.org Markup for DITA XML-based Technical 
>>> Documentation
>>>
>>> I'll be honest and say that I can't give you a straight answer as to 
>>> which of those options I would go for, as I am still researching what 
>>> is feasible in terms of a bridge between DITA and Schema.org. I was 
>>> not previously aware of the TechArticle class that you mention, and 
>>> have added that to my list of things to review.
>>>
>>> Offhand I would say a combination of #1 and #3. Though not designed 
>>> with SEO in mind, Colin Maudry's DITA OT plugin that produces RDF 
>>> seems to me to be a natural stepping stone to producing content in 
>>> RDFa that Schema.org could parse, though asking for Schema.org-aware 
>>> descriptors to be built into the DITA-OT is also a possibility.
>>>
>>> At the moment there is no effective bridge between DITA-based 
>>> content and Schema.org, and I really just want to get the ball 
>>> rolling... (and educate myself as to what is required in the process).
>>>
>>> Cheers!
>>>
>>> -
>>>
>>> *Keith Schengili-Roberts*
>>> DITA Information Architect / DITA Specialist
>>> *IXIASOFT *
>>> 825 Querbes, Suite 200, Montréal, Québec, Canada, H2V 3X1
>>> tel +1 514 279-4942 / toll free +1 877 279-4942
>>> robertsk@ixiasoft.com / www.ixiasoft.com
>>>
>>> Interested in attending? Visit our event website 
>>> <http://www.ixiasoft.com/en/news-and-events/ixiasoft-user-conference-2016> 
>>> for more information.
>>> ------------------------------------------------------------------------
>>> *From:* John Walker <john.walker@semaku.com>
>>> *Sent:* Wednesday, June 22, 2016 9:04:34 AM
>>> *To:* Keith Schengili-Roberts; Martynas Jusevičius
>>> *Cc:* public-schemaorg@w3.org; Colin Maudry
>>> *Subject:* RE: schema.org Markup for DITA XML-based Technical 
>>> Documentation
>>>
>>> Hi Keith,
>>>
>>> Could you elaborate on what kind of (meta)data you would want to 
>>> expose and the sort of use cases you would want to support?
>>>
>>> For example, is it to:
>>> 1. annotate the HTML output (in which case schema.org already has 
>>> quite broad coverage)
>>> 2. give some insights into the 'underlying' DITA resources (maps, 
>>> topics, references between them, etc.) to, for example, better 
>>> analyze re-use and other metrics
>>> 3. improve SEO by describing the subject matter of the content (for 
>>> example what product or subject the content is about)
>>>
>>> An existing class such as http://schema.org/TechArticle might 
>>> already map well to certain DITA concepts.
>>> Alternatively, is there some way to classify/type DITA content 
>>> according to some external classification scheme (more specific than 
>>> SubjectScheme in that it should assert the rdf:type of the content 
>>> resource)?
>>>
>>> Regards,
>>>
>>> John Walker
>>> Principal Consultant & co-founder
>>> Semaku B.V.
>>> SFJ 4.009, Torenallee 20, 5617 BC Eindhoven
>>> Mobile: +31 6 475 22030
>>> Email: john.walker@semaku.com
>>> Skype: jaw111
>>> Web: http://semaku.com/
>>>
>>> KvK: 58031405
>>> BTW: NL852842156B01
>>> IBAN: NL94 INGB 0008 3219 95
>>>
>>> *From:* Keith Schengili-Roberts [mailto:keith.roberts@ixiasoft.com]
>>> *Sent:* Wednesday, June 22, 2016 2:25 PM
>>> *To:* Martynas Jusevičius <martynas@graphity.org>
>>> *Cc:* public-schemaorg@w3.org; Colin Maudry <colin@maudry.com>
>>> *Subject:* Re: schema.org Markup for DITA XML-based Technical 
>>> Documentation
>>>
>>> Thanks for mentioning that. I have been in contact with Colin Maudry 
>>> about this already, and I can see how it might be a stepping stone 
>>> towards getting Schema.org-readable data from DITA.
>>>
>>> I am still doing research into the feasibility of the whole thing, 
>>> so I am not clear as to what you mean by your "shoehorn" comment.
>>>
>>> Cheers!
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Martynas Jusevičius <martynas@graphity.org>
>>> *Sent:* Tuesday, June 21, 2016 5:18:36 PM
>>> *To:* Keith Schengili-Roberts
>>> *Cc:* public-schemaorg@w3.org; Colin Maudry
>>> *Subject:* Re: schema.org Markup for DITA XML-based Technical 
>>> Documentation
>>>
>>> If you want to use DITA in RDF, there is this effort by Colin 
>>> Maudry: http://colin.maudry.com/dita-rdf/#concept/welcome.html
>>>
>>> If you for some reason want to shoehorn it into schema.org 
>>> specifically, then it sounds like a bad idea.
>>>
>>> On Fri, Jun 17, 2016 at 8:46 PM, Keith Schengili-Roberts 
>>> <keith.roberts@ixiasoft.com> wrote:
>>>> Hello there:
>>>>
>>>> I am wondering if there's the possibility of coming up with a 
>>>> Schema.org format for content produced using the DITA XML 
>>>> structured format? It is primarily (but not exclusively) used by 
>>>> technical writing departments to produce content. It is estimated 
>>>> to be used by somewhere between 5 and 10% of all technical writing 
>>>> groups, mainly in medium to large firms. The standard is open, and 
>>>> is managed by OASIS (https://www.oasis-open.org/committees/dita/).
>>>>
>>>> DITA is topic-based, with the latest standard (DITA 1.3) having six 
>>>> topic types: a generic "topic" type, then more specific concept, 
>>>> task, reference, glossary and troubleshooting types. Best practices 
>>>> suggest that each topic come with a short description, so it is 
>>>> possible to easily identify the type of topic and what it describes.
>>>>
>>>> XHTML output from DITA currently uses Dublin Core descriptive 
>>>> metadata, but it could just as easily use something that Schema.org 
>>>> could recognize, likely using either the RDFa or Microdata formats.
>>>>
>>>> Is there interest in helping devise a bridge between DITA-based 
>>>> output and something that Schema.org could use? I am happy to be an 
>>>> expert on the DITA end of things if there is someone willing to 
>>>> help guide me through the process as to what's needed on the 
>>>> Schema.org end of things.
>>>>
>>>> Cheers!
>>>
>>
>

Received on Thursday, 23 June 2016 14:46:23 UTC