- From: <andy.carver@yahoo.com>
- Date: Tue, 21 Jan 2025 23:45:39 +0000 (UTC)
- To: "xproc-dev@w3.org" <xproc-dev@w3.org>
- Message-ID: <238722800.1532727.1737503139465@mail.yahoo.com>
Hi folks,

I've got two questions about p:load's ability to capture "webpages" (p:document seems to behave the same way). The context is a divergence, in the general case, between what the step actually returns in the pipeline (viz., the raw HTML as delivered by the web server) and what one sees in the browser window (i.e., the completed DOM tree). I see this divergence currently in both Morgana XProcIII and XML Calabash 3.x. (I imply no disparagement at all by pointing to it; I'm really quite impressed that this step handles even HTTPS -- getting and decrypting the HTML document.)

So then:

1. Is this bare-bones HTML output from the p:load step actually all that the spec (or other XProc documentation) has in mind when it speaks of the ability of p:load (or p:document) to retrieve documents (i.e. "webpages") from the Web?

2. Assuming that is all the spec has in mind -- but also that what the user might desire (or even expect) is a whole lot more, when visions of getting-and-loading "webpages" fill one's head -- is there, so to speak, some other link one can add to the chain (even, say, some NPM package called from the command line), perhaps some other step in one's pipeline, that will perform, or get a browser to perform, the DOM completion (including any AJAX calls) and return the actual HTML that is built within and displayed by a web browser?

A simple example: say one wants to process, as XML, some data about Wyoming state legislators, as seen at https://www.wyoleg.gov/Legislators/2025/H . The naive newbie (such as myself?) might expect all this lovely data to be captured by p:load -- for lovely XML processing -- and be chagrined to discover that the HTML captured is so elementary that, when opened in a web browser, it shows a blank white screen. For all the lovely data is not in the HTML served -- not, that is, until some AJAX queries retrieve it from the server and add it to the DOM.
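To make the divergence concrete, here is a minimal Python sketch (standard library only, and nothing to do with XProc itself). Both HTML strings are made-up stand-ins, not the markup that wyoleg.gov actually serves, and `RowCounter` and `summarize` are hypothetical names chosen for illustration. The point is only that a skeleton of the kind a script-driven page might serve contains no data rows until a script has filled them in.

```python
from html.parser import HTMLParser

# ASSUMPTION: a made-up skeleton of the kind a script-driven page might serve.
# The table is empty; a script is expected to populate it later.
SERVED_HTML = """<!DOCTYPE html>
<html>
  <head><title>Legislators</title></head>
  <body>
    <table id="legislators"><tbody></tbody></table>
    <script src="app.js"></script>
  </body>
</html>"""

# ASSUMPTION: a made-up picture of the same document after the browser
# has run the script and added rows to the DOM.
RENDERED_HTML = SERVED_HTML.replace(
    "<tbody></tbody>",
    "<tbody><tr><td>Member A</td></tr><tr><td>Member B</td></tr></tbody>",
)


class RowCounter(HTMLParser):
    """Counts <tr> and <script> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.rows = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1
        elif tag == "script":
            self.scripts += 1


def summarize(html):
    """Return (number of table rows, number of script elements)."""
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows, counter.scripts


print(summarize(SERVED_HTML))    # (0, 1): no rows, only a script reference
print(summarize(RENDERED_HTML))  # (2, 1): rows exist only after scripting
```

A step that only fetches and parses the served document sees the first case; a browser that also executes the script ends up with the second.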
I will mention that I'm a Windows user, so I apologize if the answer to 2. is kindergarten stuff to Linux gurus :D In any case, I'm hoping for a solution that will work (eventually) in Windows.

Many thanks,
Andy
Received on Thursday, 23 January 2025 13:09:04 UTC