Format Transparency: Why AI Can Read Some of Your Files and Not Others

Abstract

For 30 years, choosing a file format meant balancing two dimensions: fidelity (does it preserve what matters?) and size (how much disk and bandwidth?). PNG vs JPEG, FLAC vs MP3, BMP vs WebP — every comparison ran along these two axes.

The arrival of LLMs introduces a new dimension that did not previously exist as a load-bearing concern: transparency — how directly the bytes on disk correspond to the semantic content of the file. The more transparent a format, the simpler its rules, the harder it is to write something malformed, and the easier it is for any reader — human, script, or LLM — to interpret and edit.

This dimension was almost free for three decades. The dominant consumer of file formats was “human-via-GUI,” who never saw the bytes anyway. Opacity was the default and it cost nothing.

A new class of consumer has now arrived. LLMs can only natively read text. Transparency is no longer an aesthetic preference. It is a decision about who, and what, can collaborate with your files.

The Two Dimensions That Used to Be Enough

Open any file-format comparison from the last 30 years and you will see the same axes. PNG vs JPEG: PNG is lossless (high fidelity), JPEG is smaller (better compression). FLAC vs MP3: FLAC preserves every sample, MP3 is one tenth the size. ZIP vs raw: identical content, different size.

These are real trade-offs and they are not going away. The point is narrower: our format-choice vocabulary needed only two words for 30 years — what does it preserve and how big is it. That was sufficient because the only entity reading the files was a human, working through a GUI tool. Nobody inspected the bytes directly. A third dimension would have been an academic curiosity.

A New Reader, A New Dimension

Consider two formats that store the same image content:

A PNG file: a binary blob with magic bytes, chunked structure, zlib-compressed pixel data, optional palette tables, and a CRC at the end.
A PPM (P3) file: the literal text P3\n16 16\n255\n220 38 38 220 38 38 ...

Both produce the same picture. Fidelity is comparable (PPM is actually higher — no compression artifacts). PNG wins decisively on size. By the two dimensions of the GUI era, PNG dominates. But by an axis that was almost invisible until LLMs arrived, the two formats are radically different:

	PNG	PPM (P3)
Spec length to read first byte	~50 pages	1 paragraph
What you can do with `cat file`	Garbage on stdout	Read pixel values directly
Tools that can edit it	Image libraries	Any text editor
LLM can write one from scratch	No (needs a library)	Yes (just print numbers)
Smallest valid file	Hard to construct by hand	Four lines of ASCII

This is not fidelity. It is not size. It is something else, and it is worth naming.

Transparency is the inverse of interpretive distance: how many decoding steps stand between the bytes on disk and the semantic content they represent. PPM has near-zero interpretive distance — numbers are pixels. PNG has many layers (header → chunks → row filters → zlib decompression → 2D pixel array). An encrypted ZIP has even more.

Source transparency and rendered transparency

The dimension splits in two depending on who is reading.

For an AI, transparency means source transparency: are the actual bytes easy to read? An LLM has no rendering pipeline. It reads files as text. Tags, syntax, structure are all low-cost for AI. Compressed binary blobs are not.

For a human, transparency usually means rendered transparency: when the file is opened in its intended viewer, is the output easy to comprehend? Color, layout, typography, charts, interactivity — these all increase human rendered transparency. The bytes themselves are usually irrelevant; humans rarely read source.

The two notions align for many formats but diverge for some. The interesting cases are the divergent ones:

Format	Source transparency (for AI)	Rendered transparency (for humans)
Plain text / Markdown	High	High (raw or rendered)
HTML	High (it is text)	High (rich rendered output); low if a human reads the raw tags
PPM	High	Low (raw is tedious; needs a viewer)
PNG	Low (binary)	High
Photoshop `.psd`	Very low	High (when opened in Photoshop)

HTML is the most instructive case. The source is full of tags — humans don’t enjoy reading it raw. But HTML is still text, so it is highly transparent to AI. And the rendered output is far richer than rendered Markdown. HTML splits the difference: low human source transparency, high AI source transparency, high human rendered transparency. The choice of “AI source vs human rendered” turns out to drive a lot of contemporary format debates — including a popular argument we will revisit later in this post.

The properties of transparency

The properties that follow from high source transparency are striking:

Simple rules. PPM’s entire grammar fits in a tweet.
Hard to violate. It is almost impossible to write a syntactically invalid PPM by hand. Misalign a single PNG chunk and the whole file is unreadable.
Universal tooling. A text editor is enough. No SDK, no library, no specialized viewer.
Diff-friendly. Version control shows you what actually changed, line by line.
Self-describing. You can often guess the format from a single look without reading the spec.

These are not separate features. They are downstream consequences of the same property: bytes are close to meaning.

The Spectrum

Every format sits somewhere on this axis. The sketch below ranks formats by source transparency — the dimension AI cares about. Human rendered transparency is largely orthogonal: a PNG renders beautifully but is opaque to AI; an SVG can be both transparent source and beautifully rendered.

flowchart LR
    A["<b>Fully transparent</b><br/>Plain text · source code<br/>Markdown · CSV<br/>SVG · PPM"]
    B["<b>Highly transparent</b><br/>JSON · YAML · XML<br/>HTML · INI · TOML<br/>biblatex"]
    C["<b>Semi-transparent</b><br/>BMP · WAV<br/>uncompressed TIFF<br/>SQLite"]
    D["<b>Semi-opaque</b><br/>PNG · MP3 · PDF<br/>DOCX · PPTX · XLSX<br/>EPUB"]
    E["<b>Opaque</b><br/>Photoshop .psd<br/>Figma .fig · .blend<br/>compiled binaries<br/>video codecs (H.264)"]
    F["<b>Fully opaque</b><br/>Encrypted blobs<br/>obfuscated formats<br/>neural network weights"]
    A --> B --> C --> D --> E --> F

The boundaries are fuzzy. DOCX is “zip + XML” — technically transparent if you unzip it, but the XML schema is so byzantine that it is effectively semi-opaque. PDF has a text layer for content but layout information is binary-encoded and lossy to extract. SQLite is binary, but its file format is exhaustively documented and stable across decades, making it more transparent in practice than its appearance suggests.

What matters is not the exact ranking. What matters is that the dimension exists and has consequences.

A Demonstration: PPM

To make the spectrum concrete, here is a 16×16 pixel-art heart I drew by typing RGB triples directly into a text editor. No image library was involved in the writing of the file. An excerpt of the source:

P3
# 16x16 hand-typed pixel-art heart in PPM (Plain Portable PixMap)
# Each "R G B" triple is one pixel, left-to-right, top-to-bottom.
# Palette: bg = 30 41 59  (background)
#          R  = 220 38 38 (heart body)
#          H  = 252 165 165 (highlight)
16 16
255
 30 41 59    30 41 59    30 41 59    30 41 59  ...  30 41 59
 30 41 59    30 41 59    30 41 59   252 165 165 220 38 38   ...   220 38 38   30 41 59
 30 41 59    30 41 59   252 165 165 252 165 165 220 38 38   ...   220 38 38   30 41 59
 30 41 59   252 165 165 220 38 38   220 38 38   220 38 38   ...   220 38 38   30 41 59
220 38 38   220 38 38   220 38 38   220 38 38   220 38 38   ...   220 38 38   220 38 38
... 11 more rows ...

The first line declares it is a PPM. The next four are documentation comments (PPM has comments — # to end of line). Line six declares the image dimensions, line seven declares the maximum channel value, and everything after is pixel data: three numbers per pixel, separated by whitespace, left-to-right, top-to-bottom.

Rendered (and upscaled 24× for visibility):

Hand-typed PPM heart, rendered

There is no library involved in this image’s source. There is no decoding step beyond “split on whitespace and group in threes.” A change to a single pixel is one string-replace away. The format is so transparent that it can act as the source code of an image — version-controllable, diffable, mergeable, AI-editable, human-readable.

The cost is real: this 16×16 heart, stored as PPM, is 2.6 KB. Stored as PNG it would be ~300 bytes. A 1920×1080 PPM image exceeds 25 MB. Transparency is not free. But that is the point of having a dimension — there is a trade-off worth talking about.

Why We Chose Opacity for 30 Years

If transparency is so useful, why did the industry move so decisively in the opposite direction? Opacity is not an accident. It was bought, often deliberately, for four distinct reasons.

First, size matters when storage is expensive and bandwidth is slow. The 1990s and 2000s saw an explosion of binary formats specifically optimized for compression: JPEG, MP3, MP4, WebP, AVIF. These formats made the modern web possible. The price is a decompression step before any inspection — opacity is the natural shape of a compressed format.

Second, performance matters when computation is constrained. GPU shaders, video decoders, and audio pipelines are designed around binary tensor formats. ASCII representations would devastate decode performance.

Third, opacity preserves coupled invariants. Some formats maintain internal consistency that fully transparent representations would break. Word’s automatic numbering, cross-references, table of contents, and outline levels all depend on the format owning the relationships between parts of the document. Expose those mechanisms as plain text and naive edits desynchronize them — a cross-reference points to the wrong section the moment a heading is renamed; a numbered list resets when a new item is inserted in the middle. The opaque format protects these invariants by mediating every change through its own interpreter. The same logic applies to spreadsheet formulas with cell dependencies, presentation themes that propagate styling, and design systems with linked components. Opacity here is not laziness; it is enforcement of a coordination requirement that plain text cannot express.

Fourth, GUI tools made opacity invisible to users. Photoshop, Word, Illustrator, and Figma all hide the file format from the human entirely. The file is an implementation detail; the canvas is the interface. There was no reason to demand a readable format because the human never read it.

For three decades the consumer of file formats was “human-via-GUI.” Opacity cost almost nothing — and in the case of coupled-invariant formats, it actively bought correctness. Transparency was a niche concern of Unix purists and version-control enthusiasts.

Why the Calculation Has Changed

For the first time, transparency is not an aesthetic preference. It is a decision about which of your collaborators can participate. LLMs cannot operate a GUI, cannot decode binary chunks unaided, and can write code that handles binary only at the cost of an extra dependency layer and a wider error surface. The new consumer can only natively read text.

flowchart LR
    H["<b>Human</b>"] --> GUI[GUI tool]
    H --> CLI[CLI / editor]
    AI["<b>LLM</b>"] --> CLI
    AI -.weak link.-> GUI

    GUI --> O["Opaque format<br/>(.psd, .fig, .docx)"]
    CLI --> T["Transparent format<br/>(.svg, .md, .json)"]

The dotted line is real. LLMs can call MCP servers that wrap GUI tools (image editors, design tools), but every operation goes through a translation layer that was never required for the human. Each translation is a place where intent is lost.

A transparent format eliminates the translation. The LLM reads the same bytes the human reads. Edits that take a sentence (“change all the section headers to bold”) become single-shot operations. There is no MCP server to install, no API to authenticate, no GUI to remote-control.

Two Strategies for Opaque Formats

You will not always have the freedom to choose. PowerPoint, Word, Photoshop, and Figma are entrenched in workflows you do not control. When the deliverable has to be .docx and the recipient cannot read Markdown, you ship .docx. The question is how to keep AI in the loop without surrendering to the format.

Two strategies are available, and they have very different power structures.

Strategy 1: Wrap the opaque format

Build an MCP server, a CLI, or a plugin that exposes operations on the file: insert paragraph, set cell value, add slide, change layer opacity. The AI invokes these operations one at a time. The format stays opaque. The wrapper provides the interpreter.

This is how most AI-powered productivity tools are built today. It works. But it carries a structural cost: every edit is a tool call. The AI never sees the document directly — it works through a keyhole, requesting one change at a time. Complex edits (rename a variable across 80 occurrences, restructure the headings of a 60-page report, refactor a slide deck) become 80 tool calls instead of one text edit.

Strategy 2: Maintain a transparent twin

Pick a text format that captures everything the deliverable needs. Make the opaque format the rendered output, never the source. Use a CLI to compile text → opaque whenever a deliverable is required.

The canonical example is LaTeX. The .tex file is fully transparent and AI-editable. The .pdf is the deliverable. pdflatex bridges them. Nobody edits the PDF; it is regenerated from the source whenever a fresh copy is needed.

The same pattern shows up across many domains:

Transparent source	Opaque deliverable	CLI bridge
`.tex`	`.pdf`	`pdflatex` / `xelatex`
`.qmd` (Quarto)	`.html`, `.pdf`, `.docx`	`quarto render`
`.md` (Marp / Slidev)	`.pptx`, `.pdf`	`marp` / `slidev export`
`.svg`	`.png`, `.pdf`	`rsvg-convert`, ImageMagick
`.mmd` (Mermaid)	`.png`, `.svg`	`mmdc`
`.ly` (Lilypond)	`.pdf`, MIDI	`lilypond`
`.bib` (biblatex)	Formatted citations	`biber`, `pandoc`

The defining property of this strategy is isomorphism in the direction that matters: the text source contains everything the opaque deliverable will contain. The mapping is one-way — text renders to deliverable, not the reverse — but that is fine, because nobody edits the deliverable. The text is the canonical form. The “lossy round trip” problem that defeats naive sync schemes never arises, because you never go the other way.

The architectural difference

In Claude’s infrastructure, the two strategies map onto two distinct construction patterns:

Strategy 1 is built on MCP servers — long-lived processes that hold state and expose typed operations. The AI calls tools.
Strategy 2 is built on rules (format conventions the AI must follow) and skills (workflows that handle the harder cases). The AI edits text directly. A CLI is invoked from a skill or hook only when a render is needed.

flowchart TB
    subgraph s1 ["Strategy 1: Wrap the format"]
        direction TB
        A1["<b>LLM</b>"] -->|tool call| M["MCP server / plugin"]
        M -->|edit op| O1["Opaque file<br/>(.docx, .pptx, .psd)"]
    end
    subgraph s2 ["Strategy 2: Transparent twin"]
        direction TB
        A2["<b>LLM</b>"] -->|direct edit| T["Transparent source<br/>(.tex, .md, .svg)"]
        T -->|cli render| O2["Opaque deliverable<br/>(.pdf, .pptx, .png)"]
        R["rules"] -.governs.-> A2
        S["skill"] -.handles edge cases.-> A2
    end

The difference in power structure is real. In Strategy 1, the wrapper is the interpreter. The AI is a client. If the wrapper does not expose an operation, the operation cannot be performed. The wrapper holds the keys.

In Strategy 2, the AI is the interpreter. Rules tell it how the format works. Skills handle the edge cases. The CLI is dumb — it only converts. There is no operation the AI cannot perform, because text edits are unbounded. The AI holds the keys.

Both strategies keep AI in the loop. Only Strategy 2 keeps AI in charge.

When each strategy fits

Strategy 1 is the right answer when:

The opaque format has no reasonable text twin (Photoshop layers, Figma component instances, video timelines)
The opaque format is the canonical form by external requirement (collaborators edit the .docx and the workflow has to round-trip)
The required edit operations are well-defined and stable (insert row, change cell color, reorder slides)

Strategy 2 is the right answer when:

A text format exists, or can be designed, that captures the deliverable’s content
You control the workflow and own the canonical form
The space of edits is open-ended (LLMs are most useful when the operation space is unrestricted text)
Long-term collaboration with AI matters more than short-term GUI ergonomics

Most knowledge work — papers, reports, presentations, technical diagrams, websites, bibliographies, configuration — fits Strategy 2 cleanly. That this is not yet the default reflects 30 years of GUI momentum, not the underlying merits.

Applications: Format Choices in Practice

The dimension becomes useful when applied to specific decisions. But “specific” matters. The right format depends not on the content type but on the workflow around it — who edits, who reads, on which surface, and how often the revision cycle repeats. The same document type can call for different formats when the workflow changes.

Four common workflow patterns illustrate the analysis. Each one has a clear answer for its pattern, and a clear boundary where that answer flips. A fifth case — AI-generated reports for human reading, where Thariq Shihipar has argued HTML beats Markdown — gets its own treatment in the next section.

Living documents you revise with AI

The workflow: notes, drafts, internal specs, technical documentation. You write; AI assists with edits and refactors. The source itself is the artifact — you (or close collaborators) read it directly, often in an editor or terminal.

Transparency requirement: high source readability for both human and AI; clean diffs for version control.

The format: Markdown. Plain-text source means AI edits are unlimited and free. Diffs review well. Conversion to PDF, HTML, or even Word for occasional polished delivery is one Pandoc command away.

Where the answer flips: when non-technical reviewers need to comment-and-suggest inside the source (Word’s tracked-changes UX is genuinely hard to beat); when the deliverable’s reader ecosystem only handles .docx (legal, HR, traditional academic administration).

Vector graphics, icons, and diagrams

The workflow: icons, diagrams, charts, illustrations. Generated or modified by code or AI; re-rendered at different sizes; sometimes embedded in HTML or print.

Transparency requirement: high AI source transparency for editing; resolution-independent rendering.

The format: SVG. Text source means AI can write, modify, batch-style, or extract from it directly. Vector means infinite resolution.

Where the answer flips: photographs, screenshots, or any content that is fundamentally raster — no vector representation exists, or compression dominates. Use PNG or JPEG. SVG is wrong for these and trying to force-fit it produces enormous unwieldy files.

Tabular data flowing through pipelines

The workflow: data analyzed by code or AI; transformed, queried, exported. Possibly versioned. Possibly multi-GB.

Transparency requirement: high AI source transparency; queryability without proprietary tools; predictable schema.

The format: CSV for small simple data; SQLite for anything queryable; Parquet for analytical workloads at scale.

Where the answer flips: when humans need formulas, charts, conditional formatting, or pivot-table interaction, Excel is the right answer. Excel does things no transparent format can. Trying to replace it for genuinely interactive human analysis is a mistake — the right rule is “Excel for human analysis surface, transparent format for everything that flows around it.”

Math- or revision-heavy academic writing

The workflow: papers, theses, technical books. Heavy use of math, citations, cross-references, figures. Multi-author revision cycles over months or years.

Transparency requirement: high source transparency for AI-assisted revision; deterministic rendering for citations and math.

The format: LaTeX, or Quarto (which compiles through LaTeX). Source is editable by AI and humans. Math, citations, cross-references are first-class. PDF output is at least as polished as Word’s, often more so.

Where the answer flips: when co-authors refuse to use anything but Word; when an institutional template is locked to .docx. In those cases, the wrapper-MCP route (Strategy 1 from the previous section) becomes the practical answer.

Quick reference

Workflow pattern	Format	Where the answer flips
Living docs you revise with AI	Markdown	Word when reviewer ecosystem requires it
Vector graphics, icons, diagrams	SVG	PNG / JPEG for true raster content
Pipeline / AI-driven tabular data	CSV, SQLite, Parquet	Excel for formula-driven interactive analysis
Math-heavy academic writing	LaTeX / Quarto	Word when co-authors require it

The recommended format is never “best in general.” It is best for this combination of editor, reader, surface, and revision cadence. Change any one of those, and the answer can change. The transparency dimension is a tool for analyzing the workflow, not a verdict on the formats themselves.

Putting the Framework to Work: Re-examining Popular Format Arguments

The framework is most useful when it disagrees with intuition. Two popular arguments about format choice are circulating in the AI-developer community right now. Both contain real observations. Both reach conclusions the framework can correct.

”HTML beats Markdown for AI-generated reports” (Thariq Shihipar)

The argument, summarized: AI is now writing more reports, specs, and explainers than humans are. Humans mostly read these artifacts; they rarely edit the source. For that workflow, Thariq Shihipar argues that rendered HTML beats rendered Markdown — color, layout, tables, embedded SVG, interactivity — and that the tag clutter of HTML source does not matter because humans never read it. AI source transparency is preserved (HTML is text). Human transparency comes from rendering, not from source.

The framework’s verdict: half right. Rendered richness genuinely matters when humans only consume the output. But the conclusion — “use HTML as your format” — collapses the source-versus-rendered distinction this post has been building.

HTML, as a source format, mixes content and presentation. A Markdown source separates them: structure and meaning live in the source, appearance is decided at render time. That separation is what makes Markdown easy to migrate, refactor, version-control, and republish in new forms. HTML source loses it. Diffs become noisy. Sections become hard to lift out for reuse. The same content cannot be re-rendered to PDF, EPUB, or a different layout without significant rework.

The cleaner move is exactly the transparent twin strategy from earlier: keep a transparent source, render to rich output. Markdown with embedded SVG, Quarto, MDX, AsciiDoc, or Markdown plus a custom render pipeline can produce output as rich as Thariq’s HTML examples — without surrendering source-level benefits.

The real question is not “Markdown or HTML.” It is “what is your source, and what is your render target?” Treat HTML as a render target produced from a transparent source, and you keep both sets of benefits. Treat HTML as the source, and you have collapsed two distinct things into one — and the cost arrives the moment you want to restructure, migrate, or re-render.

Where the answer genuinely is HTML-as-source: when the content is fundamentally interactive (sliders, calculators, custom widgets), or when rendering choices are inseparable from the content (a bespoke data visualization). In those cases there is no meaningful source-versus-render distinction; the HTML is the content.

”MCP is dead, use CLI” (Holmes, Zechner)

The argument, summarized: LLMs can call shell commands directly. Mature local CLIs — sqlite3, jq, ripgrep, ffmpeg — are composable, fast, easy to debug, and well-documented. MCP adds protocol overhead with no real-world benefit; for most local tasks, a good CLI with a good README is all an agent needs. Versions of this argument appear in Eric Holmes’s MCP is dead, long live the CLI and Mario Zechner’s Local coding agents: the CLI is all you need.

I have argued at length elsewhere that this view misses two architectural goods MCP provides: explicit resource binding and encapsulated state. The transparency framework gives a complementary diagnosis.

The framework’s verdict: the argument silently assumes all relevant formats are transparent enough that text-based tools can mediate edits correctly. That assumption holds for genuinely transparent formats — text files, CSV, JSON, source code. It collapses for opaque formats with coupled invariants (the third reason for opacity, above). No CLI command — no cat, no grep, no sed — can correctly edit a Word document’s cross-references, a spreadsheet’s formula dependencies, or a design system’s linked components. The invariants live inside the format. Naive text edits break them.

This is exactly what the two strategies formalize. Strategy 2 (transparent twin) is the answer when a transparent source can express everything the deliverable needs. Strategy 1 (wrap with MCP) is the answer when opacity is genuinely required — either by coupled invariants, or by an opaque external system (a database, a SaaS API, an OS service) that has no transparent twin. MCP is the formal name of Strategy 1. “CLI is enough” tacitly denies that Strategy 1 is ever needed. For coupled-invariant formats, it always is.

The two posts compose. The earlier one argued for MCP from architectural goods. This one explains which formats genuinely require it — and, by implication, which formats let CLIs win precisely because they are transparent enough not to need a wrapper.

Where Transparency Doesn’t Apply

Several content classes resist this argument and probably always will:

Compressed media. Video, audio, and high-resolution photography lose their economics if you remove compression. A 4K movie cannot be PPM.
Trained model weights. Neural network parameters are dense floating-point tensors with no transparent representation that would not dwarf the original.
Final-form deliverables. A polished PDF, a printed brochure, a typeset book — these exist to be consumed, not edited. Opacity is appropriate.
Large-scale databases. A 10 TB warehouse fundamentally cannot be CSV. SQLite, Parquet, and similar binary-but-well-documented formats are the right answer.

The dimension still applies to all of these — it is just that for some content classes, the cost of transparency is genuinely too high. The point is to make this an active decision rather than a default.

Conclusion

Fidelity and size are real dimensions of file format choice. They are not going away. But they are no longer enough.

Transparency — the directness of correspondence between bytes and semantic content — is now load-bearing. It determines which collaborators can read what you create, edit it, version-control it, and participate in the workflows that involve it. The dimension splits in two: source transparency for AI, rendered transparency for humans. The right format is the one that matches the workflow’s pattern of who edits, who reads, and on which surface.

For most of computing’s history this dimension was free. Today it is a leverage decision. Default to transparent. Have a specific reason for opacity — and the reasons can be real (compression, performance, coupled invariants protecting cross-references). When opacity is genuinely required, Strategy 1 (wrap with MCP) is the right shape. When it is not, Strategy 2 (transparent source with a one-way render) keeps AI in charge instead of as a client of someone else’s wrapper.

The longer trend is visible already. Markdown-based note systems are eating Word. Code-based design systems are eating monolithic design tools. CLI- and text-first agents are eating GUI-bound IDE plugins. The pull is not aesthetic. It is the natural consequence of LLMs becoming load-bearing collaborators on more and more of the work humans care about. Pick formats today that respect this. The dimension is permanent; the protocol is negotiable.

This post grew out of a conversation about whether I could edit pixels directly. The answer led step by step from PNG to SVG to BMP to PPM, and somewhere along the way the transparency spectrum became impossible to unsee. The heart image above was hand-typed during that conversation as proof.