Re: How can TDMRep be used in PDF? from Laurent Le Meur on 2023-03-10 (public-tdmrep@w3.org from March 2023)

From: Laurent Le Meur <laurent@edrlab.org>
Date: Fri, 10 Mar 2023 18:21:30 +0100
To: "public-tdmrep@w3.org" <public-tdmrep@w3.org>
Message-Id: <1138F7C9-BB8E-4194-9E9B-5D5173488CEC@edrlab.org>
Hi Vincent, 

The community group will stay alive, and we should certainly set up a call to check for renewed interest in this project, now that ChatGPT has shown what an AI agent can get from freely available content on the Web. 

The work done by the group was initially scoped by the desire to keep things simple: we used robots.txt as an example of something sufficiently simple to be universally accepted. This led to the three proposed techniques (use HTTP headers, use a central file, or use HTML meta tags). A content provider can choose among these techniques, but a TDM Actor (an AI Agent) MUST support all three. 
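To make the "MUST support all three" point concrete, here is a minimal Python sketch of how a TDM Actor might look for the two properties in each place. It is only an illustration under stated assumptions: the function names are hypothetical, the central file is assumed to be a JSON list of rules with a "location" pattern, the glob-style matching is a simplification, and the order in which the three sources are consulted is my own choice, not something stated here.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser

TDM_KEYS = ("tdm-reservation", "tdm-policy")


class _TdmMetaParser(HTMLParser):
    """Collects <meta name="tdm-..." content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in TDM_KEYS and attr.get("content") is not None:
            # Keep the first occurrence of each property.
            self.found.setdefault(name, attr["content"])


def from_headers(headers):
    """Technique 1: HTTP response headers (names compared case-insensitively)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {k: str(lowered[k]) for k in TDM_KEYS if k in lowered}


def from_central_file(rules, path):
    """Technique 2: a central file, assumed already parsed into a list of rules.

    Assumption: each rule has a "location" pattern; the first match wins.
    """
    for rule in rules:
        if fnmatchcase(path, rule.get("location", "")):
            return {k: str(rule[k]) for k in TDM_KEYS if k in rule}
    return {}


def from_html(html):
    """Technique 3: meta tags embedded in the HTML resource."""
    parser = _TdmMetaParser()
    parser.feed(html)
    return dict(parser.found)


def resolve_tdm(path, headers=None, central_rules=None, html=None):
    """Return the first source that declares tdm-reservation (assumed order)."""
    for found in (
        from_headers(headers or {}),
        from_central_file(central_rules or [], path),
        from_html(html or ""),
    ):
        if "tdm-reservation" in found:
            return found
    return {}
```

A content provider would only populate one of the three inputs; the agent has to be ready for any of them, which is where the implementation cost sits.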

I could argue that using an HTTP header or a central file was sufficient: this means acting at the level of the transfer protocol, before the AI Agent gets the content. The possibility to embed information in an HTML resource was added essentially because this was a known practice for web indexing (cf. robots.txt). But now that we have this metadata embedding capability, I understand that publishers would like to use it on non-HTML resources, namely PDF and EPUB. 

So, OK, we can study this evolution. We should have good reasons to do so, because if we go there, AI Agents will have to (= MUST) support these additional techniques, and this may become an issue. 

Note: If this group decides to use XMP for embedding metadata in PDF (and other formats supporting XMP), I think that these new properties should be equivalent to the properties currently embedded in HTML pages and elsewhere, i.e. "tdm-reservation" (0 or 1) and "tdm-policy" (a URL).
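
As an illustration only, here is what reading such properties from an XMP packet could look like in Python. Nothing here has been decided by the group: the namespace URI `https://example.org/ns/tdm#`, the `tdm` prefix and the element names `reservation` and `policy` are placeholders I made up for the sketch, and the function name is hypothetical. The only point is that the two values map one-to-one onto "tdm-reservation" and "tdm-policy".

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder namespace: NOT an agreed vocabulary, purely for illustration.
TDM_NS = "https://example.org/ns/tdm#"

SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:tdm="https://example.org/ns/tdm#">
      <tdm:reservation>1</tdm:reservation>
      <tdm:policy>https://example.org/tdm-policy.json</tdm:policy>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_tdm_from_xmp(xmp_packet):
    """Extract the two TDM properties from an XMP packet given as a string."""
    root = ET.fromstring(xmp_packet)
    result = {"tdm-reservation": None, "tdm-policy": None}
    # An XMP packet may hold several rdf:Description blocks; scan them all.
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        reservation = desc.findtext(f"{{{TDM_NS}}}reservation")
        policy = desc.findtext(f"{{{TDM_NS}}}policy")
        if reservation is not None:
            result["tdm-reservation"] = int(reservation.strip())
        if policy is not None:
            result["tdm-policy"] = policy.strip()
    return result


print(read_tdm_from_xmp(SAMPLE_XMP))
```

Extracting the XMP packet from the PDF file itself is left out; that step depends on the PDF library used.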

Best regards
Laurent
(currently co-chair of the TDMRep CG) 


> On 9 March 2023 at 15:56, Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com> wrote:
> 
> Dear TDMRep community,
>  
> There was a recent notice from W3C that this community group might automatically close soon, so this seems like the time to get a question in for this group. I work for a publisher that distributes content in multiple formats including web pages, EPub, and PDF. The PDF format is still very popular among our readers. The PDF files that we publish contain embedded XMP metadata that provides the title, authors, digital object identifier (DOI), version (using NISO JAV terminology), copyright, and license information. In considering a possibility of someday having to implement TDMRep for our content, the TDMRep Final Community Report shows how TDMRep metadata can be encoded in HTML meta tags which could be used in web pages and EPub files. It is unclear as to whether, or how, TDMRep metadata can be encoded in RDF XML to be included in the XMP metadata of a PDF file. Is this use case far outside of the intended scope of TDMRep? If this is a reasonable use of TDMRep, can you provide any guidance on how to encode TDMRep metadata in the metadata of a PDF file?
>  
> Thank you,
> Vincent
>  
> ______________________________________________
> Vincent M. Lizzi
> Head of Information Standards | Taylor & Francis Group
> 530 Walnut St., Suite 850, Philadelphia, PA 19106
> E-Mail: vincent.lizzi@taylorandfrancis.com <mailto:vincent.lizzi@taylorandfrancis.com>
> Web: www.tandfonline.com <http://www.tandfonline.com/>
>  
> Taylor & Francis is a trading name of Informa UK Limited,
> registered in England under no. 1072954
>  
> "Everything should be made as simple as possible, but not simpler."
>  
> 
> Information Classification: General
>
Received on Friday, 10 March 2023 17:21:45 UTC