Markdown AST with mdast: Node Types, Semantics, and Structure

Step 16 • Expert

Model Markdown as structured data with mdast and reason about nodes, parents, and literals.

AST-based thinking is required for robust linting, migration, and transformation workflows.

Why an AST changes how you see Markdown

Markdown source is plain text, but tools often need a structured representation. An Abstract Syntax Tree (AST) represents parsed source as a tree of nodes that describe the document’s structure. Instead of treating Markdown as a string, an AST-based tool can identify headings, paragraphs, links, images, code blocks, list items, emphasis, and other constructs directly.

mdast is a Markdown AST specification in the syntax-tree ecosystem. It gives JavaScript and unified-based tools a shared node model for Markdown documents. This matters because many operations are difficult or unsafe with raw string replacement. If you want to change heading levels, validate image alt text, rewrite links, extract a table of contents, or enforce style rules, an AST is the right level of abstraction.

An AST does not eliminate the need to understand Markdown parsing. The parser still decides what nodes exist. If the source parses a line as a paragraph instead of a heading, the AST will reflect that. But once parsing is complete, the AST makes structure explicit and programmable.
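To make the idea concrete, here is a minimal sketch of the tree a parser might produce for a two-line document. The tree is written by hand rather than generated, positions are omitted, and the `MdNode` type and `outline` helper are hypothetical names used only for illustration.

```typescript
// Minimal hand-rolled node shape for this sketch. Real projects would
// normally rely on published mdast type definitions instead.
type MdNode = {
  type: string;
  depth?: number;
  url?: string;
  value?: string;
  children?: MdNode[];
};

// Hand-written tree for this source (positions omitted):
//
//   # Hello
//
//   See [docs](https://example.com).
const helloTree: MdNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Hello" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "See " },
        {
          type: "link",
          url: "https://example.com",
          children: [{ type: "text", value: "docs" }],
        },
        { type: "text", value: "." },
      ],
    },
  ],
};

// Build an indented outline of node types, one entry per node.
function outline(node: MdNode, prefix: string = ""): string[] {
  const lines: string[] = [prefix + node.type];
  for (const child of node.children ?? []) {
    for (const line of outline(child, prefix + "  ")) {
      lines.push(line);
    }
  }
  return lines;
}

console.log(outline(helloTree).join("\n"));
```

The outline lists `root`, then `heading` and `paragraph` beneath it, with the `link` nested inside the paragraph. That nesting is the structure a string-based tool would have to rediscover on every pass.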

mdast node families

The mdast model groups nodes into several broad categories. Parent nodes contain children. Literal nodes carry a string value. Resource nodes carry a url and an optional title, as links and images do. Reference nodes represent reference-style links and images, which point to a definition elsewhere in the document. These categories let tools operate generically.

For example, the root node, which represents the whole document, is a parent. A paragraph is a parent containing inline children. Text is a literal. A link is both a parent and a resource because it contains children and points to a URL. An image is a resource that carries its alt text in an alt field; it has no children and is not a literal. A heading is a parent with a depth property from 1 to 6.
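The categories above can be sketched as TypeScript types with small type guards. These are simplified shapes written by hand for illustration, not the official mdast type definitions, and every name here (`MdText`, `isParent`, `categoriesOf`, and so on) is an assumption of this sketch.

```typescript
// Simplified, hand-written shapes that mirror the categories in the text.
interface MdText { type: "text"; value: string }                          // literal
interface MdImage { type: "image"; url: string; title?: string; alt?: string } // resource with alt text
interface MdLink { type: "link"; url: string; title?: string; children: MdInline[] } // parent + resource
type MdInline = MdText | MdImage | MdLink;
interface MdHeading { type: "heading"; depth: 1 | 2 | 3 | 4 | 5 | 6; children: MdInline[] }
interface MdParagraph { type: "paragraph"; children: MdInline[] }
interface MdRoot { type: "root"; children: Array<MdHeading | MdParagraph> }
type MdAny = MdRoot | MdHeading | MdParagraph | MdInline;

// Category checks based on which fields a node carries.
function isParent(node: MdAny): node is MdRoot | MdHeading | MdParagraph | MdLink {
  return "children" in node;
}
function isLiteral(node: MdAny): node is MdText {
  return "value" in node;
}
function isResource(node: MdAny): node is MdLink | MdImage {
  return "url" in node;
}

// List every category a node belongs to.
function categoriesOf(node: MdAny): string[] {
  const found: string[] = [];
  if (isParent(node)) found.push("parent");
  if (isLiteral(node)) found.push("literal");
  if (isResource(node)) found.push("resource");
  return found;
}

const sampleText: MdText = { type: "text", value: "docs" };
const sampleLink: MdLink = { type: "link", url: "https://example.com", children: [sampleText] };
const sampleImage: MdImage = { type: "image", url: "logo.png", alt: "Project logo" };
const sampleHeading: MdHeading = { type: "heading", depth: 2, children: [sampleLink] };

console.log(categoriesOf(sampleLink).join(", "));
console.log(categoriesOf(sampleImage).join(", "));
```

Generic checks like these are what allow one traversal utility to serve many node types: code that only needs children can accept any parent, and code that only needs a URL can accept any resource.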

For authoring tools, this structured model preserves more than rendered HTML does. HTML output may lose source-level distinctions: a reference-style link and an inline link may produce the same HTML. In mdast the two forms stay distinguishable, because an inline link is a link node while a reference-style link is a linkReference node that points to a separate definition node. That distinction matters for format-preserving transformations.
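The difference between the two link forms can be sketched with hand-written nodes. The shapes below are simplified for illustration, and `resolveUrl` and `sourceForm` are hypothetical helpers, not part of any library.

```typescript
// Simplified, hand-written shapes for the two link forms and a definition.
interface MdText { type: "text"; value: string }
interface MdLink { type: "link"; url: string; children: MdText[] }
interface MdLinkReference {
  type: "linkReference";
  identifier: string;
  referenceType: "full" | "collapsed" | "shortcut";
  children: MdText[];
}
interface MdDefinition { type: "definition"; identifier: string; url: string }

// Source: [docs](https://example.com)
const inlineLink: MdLink = {
  type: "link",
  url: "https://example.com",
  children: [{ type: "text", value: "docs" }],
};

// Source: [docs][d]  together with  [d]: https://example.com
const referenceLink: MdLinkReference = {
  type: "linkReference",
  identifier: "d",
  referenceType: "full",
  children: [{ type: "text", value: "docs" }],
};
const definitions: MdDefinition[] = [
  { type: "definition", identifier: "d", url: "https://example.com" },
];

// Both forms resolve to the same target, as they would in rendered HTML.
function resolveUrl(node: MdLink | MdLinkReference, defs: MdDefinition[]): string | undefined {
  if (node.type === "link") return node.url;
  for (const def of defs) {
    if (def.identifier === node.identifier) return def.url;
  }
  return undefined;
}

// The node type still records how the author wrote the link.
function sourceForm(node: MdLink | MdLinkReference): string {
  return node.type === "link" ? "inline" : "reference (" + node.referenceType + ")";
}

console.log(resolveUrl(inlineLink, definitions), sourceForm(inlineLink));
console.log(resolveUrl(referenceLink, definitions), sourceForm(referenceLink));
```

A format-preserving tool would keep the reference form when rewriting the URL by editing the definition node, rather than converting every link to the inline form.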

AST transformations

AST transformations let you modify documents safely. A plugin can visit every heading and generate slugs. Another can ensure each image has alt text. Another can rewrite external links. Another can collect references. Because the tool works on nodes, it avoids accidentally modifying code blocks or unrelated text.
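A heading-slug pass might look like the following sketch. The `visit`, `textOf`, and `slugify` helpers are hand-rolled stand-ins written for this example; established utilities exist for each job, and real slug algorithms differ in how they treat punctuation, Unicode, and duplicates.

```typescript
// Minimal hand-rolled node shape for this sketch.
type MdNode = { type: string; depth?: number; value?: string; children?: MdNode[] };

// Call fn for every node of the given type, in document order.
function visit(node: MdNode, type: string, fn: (found: MdNode) => void): void {
  if (node.type === type) fn(node);
  for (const child of node.children ?? []) visit(child, type, fn);
}

// Concatenate the text content of a node and its descendants.
function textOf(node: MdNode): string {
  if (typeof node.value === "string") return node.value;
  let collected = "";
  for (const child of node.children ?? []) collected += textOf(child);
  return collected;
}

// Simplistic slug rule, assumed for illustration: lowercase ASCII, hyphen separators.
function slugify(input: string): string {
  return input.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

const guideTree: MdNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Getting Started" }] },
    // A code block whose content merely looks like a heading.
    { type: "code", value: "# Not a heading" },
    {
      type: "heading",
      depth: 2,
      children: [
        { type: "text", value: "Install " },
        { type: "inlineCode", value: "npm" },
      ],
    },
  ],
};

const slugs: string[] = [];
visit(guideTree, "heading", (heading) => {
  slugs.push(slugify(textOf(heading)));
});

console.log(slugs);
```

The code block is never visited as a heading, even though its content starts with `#`. A line-based search for `#` would have treated it as one.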

Consider replacing all instances of a URL. A string replacement might change code examples, escaped text, or reference definitions incorrectly. An AST transform can target only link and image URL fields. This is safer and easier to reason about.
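The URL case can be sketched as a transform that touches only the `url` field of link and image nodes. The tree, the hosts, and the `rewriteUrls` helper are all hypothetical and exist only for this example.

```typescript
// Minimal hand-rolled node shape for this sketch.
type MdNode = { type: string; url?: string; alt?: string; value?: string; children?: MdNode[] };

// Replace a URL prefix on link and image nodes only; returns the number of changes.
function rewriteUrls(node: MdNode, from: string, to: string): number {
  let changed = 0;
  const isTarget = node.type === "link" || node.type === "image";
  if (isTarget && typeof node.url === "string" && node.url.indexOf(from) === 0) {
    node.url = to + node.url.slice(from.length);
    changed += 1;
  }
  for (const child of node.children ?? []) {
    changed += rewriteUrls(child, from, to);
  }
  return changed;
}

// Named nodes so the result is easy to inspect afterwards.
const codeNode: MdNode = { type: "code", value: "curl http://old.example.com/api" };
const proseNode: MdNode = { type: "text", value: "The legacy host was http://old.example.com." };
const linkNode: MdNode = {
  type: "link",
  url: "http://old.example.com/guide",
  children: [{ type: "text", value: "the guide" }],
};
const imageNode: MdNode = { type: "image", url: "http://old.example.com/logo.png", alt: "Logo" };

const urlTree: MdNode = {
  type: "root",
  children: [
    { type: "paragraph", children: [proseNode, linkNode, imageNode] },
    codeNode,
  ],
};

const changedCount = rewriteUrls(urlTree, "http://old.example.com", "https://docs.example.com");
console.log(changedCount, linkNode.url, imageNode.url);
console.log(codeNode.value); // still mentions the old host
```

The code example and the prose sentence keep the old host because the transform never reads their `value` fields. Whether reference definitions should also be rewritten is a policy decision the transform can make explicitly.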

AST transforms are also useful for migrations. If a documentation site changes internal route structure, a transform can update Markdown links across thousands of files. If a style guide changes heading hierarchy, a transform can adjust heading depths while preserving prose.
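A heading-hierarchy migration can be sketched the same way. The example below demotes every heading by one level and clamps the result to the valid range of 1 to 6; `shiftHeadings` and `headingDepths` are hypothetical helpers, and a real migration would also need to serialize the tree back to Markdown.

```typescript
// Minimal hand-rolled node shape for this sketch.
type MdNode = { type: string; depth?: number; value?: string; children?: MdNode[] };

// Shift every heading depth by delta, keeping it within 1..6.
function shiftHeadings(node: MdNode, delta: number): void {
  if (node.type === "heading" && typeof node.depth === "number") {
    node.depth = Math.min(6, Math.max(1, node.depth + delta));
  }
  for (const child of node.children ?? []) shiftHeadings(child, delta);
}

// Collect heading depths in document order, for inspection.
function headingDepths(node: MdNode, out: number[] = []): number[] {
  if (node.type === "heading" && typeof node.depth === "number") out.push(node.depth);
  for (const child of node.children ?? []) headingDepths(child, out);
  return out;
}

const bodyText: MdNode = { type: "text", value: "Prose stays exactly as written." };
const migrationTree: MdNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Overview" }] },
    { type: "paragraph", children: [bodyText] },
    { type: "heading", depth: 2, children: [{ type: "text", value: "Details" }] },
    { type: "heading", depth: 6, children: [{ type: "text", value: "Fine print" }] },
  ],
};

const depthsBefore = headingDepths(migrationTree);
shiftHeadings(migrationTree, 1);
const depthsAfter = headingDepths(migrationTree);

console.log(depthsBefore, depthsAfter);
```

The depth-6 heading stays at 6 because Markdown has no deeper level. A production migration might report that case instead of clamping silently.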

Linting with structure

Markdown linting becomes more powerful with an AST. A linter can enforce one top-level heading, require descriptive link text, reject empty headings, detect duplicate heading text, limit heading depth jumps, require alt text, or flag raw HTML. These rules are structural. They should not be implemented with fragile regular expressions.
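Three of these rules can be sketched as a single function over a hand-written tree. The rule wording, the `collect` helper, and `lintStructure` are assumptions made for this example, not the behavior of any existing linter.

```typescript
// Minimal hand-rolled node shape for this sketch.
type MdNode = { type: string; depth?: number; alt?: string; value?: string; children?: MdNode[] };

// Gather all nodes of one type in document order.
function collect(node: MdNode, type: string, out: MdNode[] = []): MdNode[] {
  if (node.type === type) out.push(node);
  for (const child of node.children ?? []) collect(child, type, out);
  return out;
}

// Check: exactly one top-level heading, no depth jumps, alt text on images.
function lintStructure(tree: MdNode): string[] {
  const messages: string[] = [];
  const headings = collect(tree, "heading");

  const topLevelCount = headings.filter((h) => h.depth === 1).length;
  if (topLevelCount !== 1) {
    messages.push(`expected exactly one depth-1 heading, found ${topLevelCount}`);
  }

  let previousDepth = 0;
  for (const heading of headings) {
    const depth = heading.depth ?? 1;
    if (previousDepth > 0 && depth > previousDepth + 1) {
      messages.push(`heading depth jumps from ${previousDepth} to ${depth}`);
    }
    previousDepth = depth;
  }

  for (const image of collect(tree, "image")) {
    if (!image.alt) messages.push("image is missing alt text");
  }
  return messages;
}

const lintTree: MdNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Title" }] },
    { type: "heading", depth: 3, children: [{ type: "text", value: "Skipped a level" }] },
    { type: "paragraph", children: [{ type: "image" }, { type: "image", alt: "Logo" }] },
    { type: "heading", depth: 1, children: [{ type: "text", value: "Second title" }] },
  ],
};

const lintMessages = lintStructure(lintTree);
console.log(lintMessages.join("\n"));
```

Each rule reads a field that the parser has already resolved, so the rule does not need to know whether a heading was written with `#` markers or with an underline.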

AST-based linting is especially valuable for page-level quality. Beyond the rules above, a linter can detect missing metadata fields and enforce a minimum content structure. The Markdown source remains pleasant to edit, while tooling protects the quality of the rendered page.

mdast and positional data

Nodes can carry a position field with start and end points, each giving a line, a column, and optionally an offset into the source. This allows tools to report useful diagnostics. Instead of saying “invalid link found,” a tool can say “invalid link in file X at line 42.” Good diagnostics are essential for adoption. Authors are more likely to fix issues when tools point to the exact source location.
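A diagnostic built from positional data might be sketched as follows. The file name, the coordinates, the message text, and the helper names are all hypothetical; only the general shape of the position field (start and end points with line, column, and optional offset) follows the description above.

```typescript
// Hand-written shapes for positional data in this sketch.
interface MdPoint { line: number; column: number; offset?: number }
interface MdPosition { start: MdPoint; end: MdPoint }
type MdNode = { type: string; url?: string; position?: MdPosition; children?: MdNode[] };

// Format "file:line:column: message", or fall back to the file name alone.
function formatDiagnostic(file: string, node: MdNode, message: string): string {
  if (!node.position) return file + ": " + message;
  const start = node.position.start;
  return file + ":" + start.line + ":" + start.column + ": " + message;
}

// Report every link whose URL is empty.
function findEmptyLinks(node: MdNode, file: string, out: string[] = []): string[] {
  if (node.type === "link" && !node.url) {
    out.push(formatDiagnostic(file, node, "link has an empty URL"));
  }
  for (const child of node.children ?? []) findEmptyLinks(child, file, out);
  return out;
}

// Hand-written tree: a link written as [text]() on line 42 of a hypothetical file.
const positionedTree: MdNode = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        {
          type: "link",
          url: "",
          position: {
            start: { line: 42, column: 5, offset: 1310 },
            end: { line: 42, column: 13, offset: 1318 },
          },
          children: [{ type: "text" }],
        },
      ],
    },
  ],
};

const diagnostics = findEmptyLinks(positionedTree, "docs/guide.md");
console.log(diagnostics.join("\n"));
```

The fallback branch matters because nodes created by a transform, rather than by the parser, usually have no position at all.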

Positional data also helps editors implement selections, previews, and synchronized scrolling. A preview pane can map rendered output back to source lines. A spellchecker can ignore code nodes. A link checker can highlight the exact Markdown link that failed.
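The spellchecker case reduces to collecting prose text while skipping code nodes. The sketch below assumes the node type names `code` and `inlineCode` for block and inline code, and `proseText` is a hypothetical helper.

```typescript
// Minimal hand-rolled node shape for this sketch.
type MdNode = { type: string; value?: string; children?: MdNode[] };

// Collect text values in document order, ignoring block and inline code.
function proseText(node: MdNode, out: string[] = []): string[] {
  if (node.type === "code" || node.type === "inlineCode") return out;
  if (node.type === "text" && typeof node.value === "string") out.push(node.value);
  for (const child of node.children ?? []) proseText(child, out);
  return out;
}

const spellTree: MdNode = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Run " },
        { type: "inlineCode", value: "npm i -D tsx" },
        { type: "text", value: " to set up the project." },
      ],
    },
    { type: "code", value: "npx tsx scripts/chk.ts" },
  ],
};

const spellcheckInput = proseText(spellTree);
console.log(spellcheckInput);
```

Only the two prose fragments reach the spellchecker, so command names and abbreviations inside code never trigger false positives.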

AST versus rendered HTML

Rendered HTML is useful for browsers, but it is not always the best transformation layer. HTML may include renderer-specific wrappers, classes, sanitized output, or syntax highlighting markup. It may also lose source intent. An AST sits closer to the Markdown language.

That said, ASTs are not universal across all ecosystems. mdast is one specification, especially strong in the unified/remark world. Other parsers may expose different tree shapes. A production system should choose an AST model and treat it as part of the internal contract.

AST contracts in teams

If multiple tools operate on Markdown, agree on the AST contract. A linter, link checker, search extractor, and renderer should not each invent a different understanding of the document. Using mdast as a shared model lets teams reuse traversal utilities and plugin patterns.

This also helps onboarding. Engineers can learn one node vocabulary and apply it across validation, transformation, and migration tasks. The source remains Markdown, but the engineering layer becomes consistent.

FAQ

What is a Markdown AST?

A Markdown AST is a tree representation of a parsed Markdown document, with nodes for structures such as headings, paragraphs, links, and code blocks.

What is mdast?

mdast is a Markdown AST specification used in the syntax-tree and unified ecosystems.

Why use an AST instead of regex?

An AST lets tools target document structures precisely without accidentally changing code blocks, literals, or unrelated text.

Can mdast help document quality?

Yes. AST tooling can enforce heading structure, link quality, alt text, and other content rules that affect rendered page correctness.

Is mdast the same as HTML?

No. mdast represents Markdown structure before final rendering, while HTML is a browser output format.

Learn parser foundations in Markdown Parser Implementation Theory.
Build processing steps in unified and remark Pipelines.
Transform links from Markdown Links, Images, and Reference Definitions.
Compare extension nodes from GFM Extensions.
Apply AST validation in Designing Production Markdown Systems.

Continue with unified and remark Pipelines.

References

mdast Specification
