2026-05-05
Highlighted a full chatgpt.com conversation, hit Cmd-C, and pasted into a Markdown file. Got only the first message. Scrolled up to confirm the rest was visible on screen, ran it again. Same result. Looked like a clipboard bug. It wasn’t.
chatgpt.com renders its message list with windowing: only the on-screen turns are in the DOM at any moment. The browser’s selection model can’t see what isn’t rendered. Highlight-all selects the visible nodes, the clipboard gets exactly what was selected, and “the rest of the conversation” was never in the document to begin with. The DOM was not the source of truth.
The conversation lives behind /backend-api/conversation/<id>, the same endpoint the official Share button calls. Pull the access token from /api/auth/session, fetch the conversation JSON, walk current_node back through parent pointers to build the active branch, and you have every turn the UI ever rendered plus every turn it hasn’t. The whole thing fits in a bookmarklet. The endpoint is undocumented, so expect this to break when OpenAI changes the API shape.
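The token fetch and the backward walk fit in a few lines. A sketch, not the bookmarklet itself: the `mapping` / `parent` / `message.content.parts` field names are assumptions about the undocumented JSON shape, and `fetchConversation` only works from a logged-in chatgpt.com tab.

```javascript
// Walk current_node back through parent pointers, then reverse
// (via unshift) so the active branch comes out in chronological order.
// Assumes conversation JSON shaped like:
//   { current_node: "<id>", mapping: { "<id>": { parent, message } } }
function activeBranch(conversation) {
  const turns = [];
  let id = conversation.current_node;
  while (id) {
    const node = conversation.mapping[id];
    // Skip empty/system nodes; keep turns that carry actual content.
    if (node.message && node.message.content?.parts?.length) {
      turns.unshift(node.message);
    }
    id = node.parent; // root node has parent = null, ending the loop
  }
  return turns;
}

// Browser-only half: grab the access token the way the Share button does,
// then hit the same endpoint it calls.
async function fetchConversation(id) {
  const { accessToken } = await (await fetch("/api/auth/session")).json();
  const res = await fetch(`/backend-api/conversation/${id}`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  return res.json();
}
```

The walk is pure, so it can be exercised against a canned `mapping` without touching the network.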
The interesting part was citation resolution. Search-augmented assistant responses contain markers like citeturn346779view0turn707315view0 where the rendered UI shows footnote links. My first attempt recursed into metadata.content_references[].items[] looking for URLs, parsed turn<G>view<I> fragments out of the marker strings, and built a lookup keyed by fragment. Zero matches. The lookup key, the literal marker string, lives on the parent ref as matched_text, and I had dropped that association by recursing past it. Once I treated matched_text as a literal substring to replace, the resolver collapsed to a six-line loop.
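Treating matched_text as a literal substring makes the resolver a plain replace-per-reference loop. A sketch under assumptions: the URL is taken from `items[0].url` (the source only says the URLs live somewhere under `items[]`), and the Markdown-link replacement format is my choice, not the UI’s.

```javascript
// For each content reference, replace its literal marker (matched_text)
// wherever it appears in the message text. No parsing of the marker
// string itself is needed -- it is only a lookup key.
function resolveCitations(text, contentReferences) {
  for (const ref of contentReferences) {
    if (!ref.matched_text) continue;
    const url = ref.items?.[0]?.url; // assumed location of the URL
    const replacement = url ? `([source](${url}))` : "";
    text = text.split(ref.matched_text).join(replacement);
  }
  return text;
}
```

Using split/join instead of a regex replace sidesteps escaping: the markers contain private-use Unicode delimiters that would otherwise need quoting.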
The general shape: any SPA that paginates, virtualizes, or lazy-loads its content is a UI that has decoupled what’s on screen from what’s selectable. Clipboard, screen readers, and Find-in-Page all share the same blind spot. When you need the underlying data, the DevTools Network tab tells you what to ask for and the session cookie tells you how to ask for it. Scraping the rendered page is the wrong abstraction for a page that wasn’t fully rendered.