Editor's note (2026-05). Chrome 150 deprecated
navigator.modelContextin favour ofdocument.modelContext(per WebMCP spec PR #184). Mentions of the API in this post use the forward-compatible feature-detection pattern:const modelContext = document.modelContext || navigator.modelContext; if (modelContext)WebConverter's own bootstrap uses this exact fallback, so the same site code keeps working on browsers that still ship the older identifier.
When we first shipped WebMCP support, only image conversion was wired up. That was useful — but it was a sliver of what WebConverter can do. As of today every feature on the site is exposed as a WebMCP tool: an AI agent can call document.modelContext and convert images, build and edit PDFs, OCR scans, extract PDF text, convert documents with Pandoc, convert and trim video, convert and extract audio, transcribe speech with Whisper, and remove image backgrounds — all locally in the browser, all without uploading anything.
The Full Tool Catalogue
The thirteen tools are registered globally — on every page of WebConverter, not just /webmcp.html — so an agent can use them from wherever the user is. They all return a base64 file plus a data: URL (and, where appropriate, the structured output as plain text or JSON):
convert_image— image to PNG, JPEG, WebP, BMP, TGA, HDR, EXR or KTX2. Accepts every format the WASM importer reads (BMP, DDS, GIF, HDR, ICO, JPEG, KTX, PGM, PIC, PNG, PPM, PSD, TGA, WebP) plus, via a browser-decode fallback, HEIC on Safari/iOS and AVIF.images_to_pdf— one or more images into a single PDF, one image per page.images_to_searchable_pdf— images plus Tesseract OCR so the resulting PDF has a selectable, searchable text layer on top of the pixels.merge_pdfs— whole-document merge of several PDFs.reorder_pdf_pages— a PDF and a new page order, out comes a re-ordered PDF.delete_pdf_pages— drop pages by index.extract_pdf_text— text or simple Markdown out of any PDF, viapdf.js.convert_document— DOCX, ODT, RTF, HTML, Markdown, LaTeX, RST, MediaWiki, EPUB ↔ Markdown, HTML, plain, LaTeX, RST, AsciiDoc, DOCX, ODT — with Pandoc compiled to WebAssembly. Lazy ~56 MB download on first call.convert_audio— any audio (and the audio track of a video) to MP3, OGG, WAV or FLAC; decoding uses the browser's own codecs.convert_video— MP4 (H.264 + AAC), WebM (VP9 + Opus) or animated GIF with ffmpeg-wasm; lazy per-variant download.trim_video— cut a clip fromstartTimetoendTimeseconds.remove_image_background— transparent PNG/WebP via a tiny U²-Net-P ONNX model plus a deterministic WASM matting pass for clean edges.transcribe_audio— Whisper (quantised, ~40 MB) running in WebAssembly, with timestamped segments.list_supported_formats— call this first; the agent gets a structured map of every tool's inputs, outputs and engine.
Why This Matters for Agents
An AI assistant that wants to do something with a file today usually has three bad options: upload it to a third-party API, run a server-side tool that touches your data, or refuse. WebMCP changes that because the tool is the page's own JavaScript. The agent gets the capability; your file never leaves the tab. There is no API key, no rate limit, no cost, and near-zero carbon because the upload-process-download round trip never happens.
That model finally scales beyond images. An agent can now:
- Take a folder of phone photos and produce a single, OCR'd PDF — locally.
- Transcribe a meeting recording into searchable text, in the browser, without sending audio to anyone.
- Crop a 90-second clip out of a 4K video for a presentation, locally, with ffmpeg-wasm.
- Strip backgrounds out of a batch of product photos for a shop listing — no subscription, no upload.
- Pull the text out of a contract PDF, run a docx → markdown conversion on the related Word file, and merge several PDFs into one — all from the same tool surface.
Lazy by Design
The webmcp.js bootstrap is tiny. The expensive parts — Pandoc's ~56 MB WASM, the ffmpeg cores, the U²-Net ONNX model, the Whisper model, Tesseract's language data — are only fetched the first time the matching tool is called, then the browser caches them. Agents that never call convert_video never pay the ffmpeg download. Asset budget intact.
Privacy, Safety, Honesty
Every tool is annotated readOnlyHint — they take bytes and return bytes; they never write files, never make network requests outside the lazy WASM/model downloads, never read other tabs. The single network dependency is the lazy download of the engine itself (Pandoc, ffmpeg, Whisper, Tesseract), each from a stable URL and cached.
And we are honest about formats: HEIC is in list_supported_formats as "Safari/iOS only" because that is the truth — Chrome and Firefox do not decode HEIC natively, and shipping a multi-megabyte HEIC decoder would violate the project's asset-size budget. The fallback decodes whatever the browser itself can decode and no more.
Try It
The WebMCP page lists every registered tool and includes a working live demo. If you are building an in-browser agent — or just want to see how a serious, complete WebMCP server looks — this is what an honest, private, zero-cost file-tools surface looks like. And it is just a web page.
Ready to convert your images?
Try WebConverter Free