Markdown for documents, React for interaction, MDX for both! But how do Markdown and MDX arrive at HTML and JSX? The answer is Abstract Syntax Trees.
Markdown is the perfect format for writing documents, documentation, blog posts, static content, and more. React, on the other hand, is great for building interactive interfaces. That said, have you ever tried writing a blog post in React or raw HTML? There's a reason Markdown exists! But what if you want to add some interactive elements to a Markdown document? Maybe an embedded YouTube video, or a chart that pulls in dynamic data? Or a form to collect contact information on a sales page?
MDX gives you the best of both worlds. Write your documents in Markdown, but feel free to import and use React components right there inside of your document. Beautiful.
In this article we're going to go beyond the surface and dive into some of the inner workings of Markdown and MDX. How does a Markdown file get converted into HTML, and how does MDX get converted into JSX?
We are going to explore Abstract Syntax Trees (AST) and what Markdown and MDX have to do with them. The code samples in this article can be found here.
MDX Real-World Usage (A Warning)
The examples in this article are meant to provide a glimpse of what MDX is doing behind the scenes and what ASTs are like and used for. If you'd like to use MDX in Gatsby, Next.js, or Create React App, the MDX website provides examples and documentation on how to easily use it within your app.
Syntax Trees
The ability to view code as data, rather than simply as text in a file, opens up a world of possibilities. Take Prettier, for example. It can take poorly formatted JavaScript or Markdown and give you something nicely formatted in return. You may think the conversion goes directly from ugly Markdown to formatted Markdown, but the key to this process is an intermediary step: a data structure called an Abstract Syntax Tree (AST).
Think of what you can produce from a Markdown file. Yes, you can produce HTML, but you can also produce formatted Markdown (as Prettier does), check it for lint errors, count how many words are in it, and more.
Markdown -> AST -> HTML
Markdown -> AST -> Formatted Markdown
Markdown -> AST -> Lint Errors
Markdown -> AST -> Word Counts
It is with ASTs that MDX is able to combine Markdown and React so beautifully.
Abstract Syntax Trees in Action
To see ASTs in action, let's look at this small Markdown example with a Level 1 Heading and a Paragraph:
# Welcome
A paragraph.
If we process this Markdown with unified and the remark-parse plugin, we turn the Markdown input into an AST that represents it.
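// unified 9 and earlier export a default; unified 10+ uses `import { unified } from "unified"`
import unified from "unified";
import markdown from "remark-parse";
const input = `
# Welcome
A paragraph.
`;
// Parse the Markdown into an mdast tree and print it
const tree = unified()
.use(markdown)
.parse(input);
console.log(JSON.stringify(tree, null, 2));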
If you run this yourself, you'll also see position data (line, column, and offset) for every node, but I have stripped it out to make the tree more digestible. Each node (an object) in this tree has a number of properties:
- type: What kind of node is this? Heading, paragraph, emphasis, strong, etc.
- children: Nested nodes contained within the current one. Imagine an image inside a link, or a link within a paragraph.
- depth: Used to differentiate level 1, 2, and 3 headings (h1, h2, h3)
- value: Text nodes have a value property that contains their actual text
{
"type": "root",
"children": [
{
"type": "heading",
"depth": 1,
"children": [
{
"type": "text",
"value": "Welcome"
}
]
},
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "A paragraph."
}
]
}
]
}
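Once you have this tree, going back to Markdown is the "Markdown -> AST -> Formatted Markdown" path from earlier. Here's a minimal sketch of it, assuming the tree is hard-coded rather than produced by remark-parse. toMarkdown and inline are hypothetical helpers that cover only a few node types; in the unified ecosystem the real job is done by remark-stringify. Because the tree no longer records how the source was spaced, an input like "#    Welcome" parses to the same heading and comes back out normalized.

```javascript
// Hard-coded copy of the mdast tree above (position data omitted).
const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] },
  ],
};

// Serialize inline (phrasing) nodes. Only text, emphasis, and strong are handled.
function inline(node) {
  const inner = () => node.children.map(inline).join("");
  switch (node.type) {
    case "text":
      return node.value;
    case "emphasis":
      return "*" + inner() + "*";
    case "strong":
      return "**" + inner() + "**";
    default:
      throw new Error(`Unsupported inline node: ${node.type}`);
  }
}

// Serialize block nodes, separating blocks with a blank line.
function toMarkdown(node) {
  switch (node.type) {
    case "root":
      return node.children.map(toMarkdown).join("\n\n") + "\n";
    case "heading":
      return "#".repeat(node.depth) + " " + node.children.map(inline).join("");
    case "paragraph":
      return node.children.map(inline).join("");
    default:
      throw new Error(`Unsupported block node: ${node.type}`);
  }
}

console.log(toMarkdown(tree));
```

Whether a source wrote "_big_" or "*big*", both parse to the same emphasis node, so this serializer always writes one consistent marker.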
Using the AST for Calculations
We can process the AST to count how many nodes of each type we see (recursive function alert):
function counts(acc, node) {
// add 1 to an initial or existing value
acc[node.type] = (acc[node.type] || 0) + 1;
// find and add up the counts from all of this node's children
return (node.children || []).reduce(
(childAcc, childNode) => counts(childAcc, childNode),
acc
);
}
Run against a longer document, this produces something like the following (the exact numbers depend on your input):
{
"root": 1,
"heading": 1,
"text": 7,
"paragraph": 3,
"strong": 1,
"emphasis": 1
}
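To try the function without installing unified, here's a sketch that runs it against a hard-coded copy of the small "Welcome" tree from the previous section. Since counts writes into the accumulator it's given, the sketch passes a fresh {} so repeated runs don't add up.

```javascript
// The counts function from above, unchanged. It mutates acc,
// so pass a fresh {} for each tree you count.
function counts(acc, node) {
  acc[node.type] = (acc[node.type] || 0) + 1;
  return (node.children || []).reduce(
    (childAcc, childNode) => counts(childAcc, childNode),
    acc
  );
}

// Hard-coded mdast for "# Welcome" followed by "A paragraph." (no positions).
const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] },
  ],
};

const result = counts({}, tree);
console.log(result); // { root: 1, heading: 1, text: 2, paragraph: 1 }
```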
Counting Words with the AST
The word count tool I'm using in VS Code right now counts "## Welcome" as 2 words, when it is really a single word that happens to be in an h2 tag. Using an AST, we can provide a more accurate word count by only counting the values of text nodes.
import unified from "unified";
import markdown from "remark-parse";
function wordCount(count, node) {
if (node.type === "text") {
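// split on runs of whitespace and drop empty strings left by leading/trailing spaces
return count + node.value.split(/\s+/).filter(Boolean).length;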
} else {
return (node.children || []).reduce(
(childCount, childNode) => wordCount(childCount, childNode),
count
);
}
}
// Our markdown input
const input = `## Welcome`;
// Convert markdown into an AST
const tree = unified()
.use(markdown)
.parse(input);
// Extract Word Count from AST
const words = wordCount(0, tree);
console.log(words); // 1
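One subtlety worth checking: text nodes next to inline formatting often carry leading or trailing spaces, so splitting a value on a single space (" ") produces empty strings that get counted as words. The sketch below counts runs of non-whitespace instead and runs against a hard-coded tree for "Hello *big* world", written the way I'd expect remark-parse to split it (position data omitted). countWords is a hypothetical helper name.

```javascript
// Count words by splitting on runs of whitespace and dropping the
// empty strings that leading/trailing whitespace would otherwise produce.
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Sum word counts over every text node in the tree.
function wordCount(count, node) {
  if (node.type === "text") {
    return count + countWords(node.value);
  }
  return (node.children || []).reduce(
    (childCount, childNode) => wordCount(childCount, childNode),
    count
  );
}

// Hard-coded mdast for "Hello *big* world".
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Hello " },
        { type: "emphasis", children: [{ type: "text", value: "big" }] },
        { type: "text", value: " world" },
      ],
    },
  ],
};

console.log(wordCount(0, tree)); // 3
// Splitting on a single space leaves an empty string behind:
console.log("Hello ".split(" ")); // [ 'Hello', '' ]
```

This still counts per text node, so a word split by formatting, such as "**bold**ness", is counted as two.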
Visualizing the AST
With this AST we can also create a React component called Node, which renders a node and its children, using padding to display the tree-like structure:
const Node = ({ node }) => (
<div style={{ paddingLeft: `15px` }}>
<strong>
{node.type}
{node.depth && <span> (d{node.depth})</span>}
</strong>
{node.value && <div style={{ paddingLeft: "15px" }}>{node.value}</div>}
{/* Render additional Nodes for each child */}
{node.children &&
node.children.map(child => {
const { line, column, offset } = child.position.start;
return <Node key={`${line}-${column}-${offset}`} node={child} />;
})}
</div>
);
This output allows us to see how the tree is structured and indented:
root
  heading (d1)
    text
      Welcome
  paragraph
    text
      A paragraph.
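If you'd like to inspect a tree without setting up React, the same idea works as a plain function that returns indented lines of text. renderTree is a hypothetical helper that mirrors the Node component: it prints the type (plus the depth for headings), then the value, then each child one level deeper.

```javascript
// Render an mdast node as indented lines, two spaces per level.
function renderTree(node, indent = 0) {
  const pad = "  ".repeat(indent);
  let label = node.type;
  if (node.depth) label += ` (d${node.depth})`;
  const lines = [pad + label];
  // Text values sit one level below their node, like the padded div in Node.
  if (node.value) lines.push("  ".repeat(indent + 1) + node.value);
  for (const child of node.children || []) {
    lines.push(...renderTree(child, indent + 1));
  }
  return lines;
}

// Hard-coded copy of the "Welcome" tree (position data omitted).
const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] },
  ],
};

console.log(renderTree(tree).join("\n"));
```

Returning an array of lines rather than printing directly keeps the function easy to test and to reuse, for example by writing the lines to a file.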
MDX
If you came here for MDX and not Markdown, you're in luck! We're now going to transition into exploring how MDX works and how it is related to the Markdown examples shown above.
AST Explorer
For all the visual learners, there is a great website called AST Explorer which allows you to visualize the AST produced by a number of different input formats such as Markdown and MDX. We're going to be diving into MDX a bit further now, so let's take a look at the AST produced by an MDX file.
MDAST, HAST, MDXAST, MDXHAST... What??
That's a lot of acronyms! But what do they mean and what does this have to do with Markdown and MDX? In order to convert Markdown into an AST, we need a specification, or a set of rules to follow so we know what types of Nodes are available (heading, paragraph, link, etc.) and what properties they might have (type, children, value).
This set of rules for Markdown is called mdast. Similarly, the rules for HTML trees are called hast. With both specifications, someone could write code that converts a Markdown AST (mdast) into an HTML AST (hast), which is exactly what remark-rehype does.
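To get a feel for what an mdast-to-hast conversion involves, here's a toy version that handles only root, heading, paragraph, and text nodes. mdastToHast and hastToHtml are hypothetical helpers written for this article, not remark-rehype's code; the real plugin covers every mdast node type plus details this sketch ignores. The output nodes follow the hast shape: elements have a tagName, properties, and children.

```javascript
// Convert a small subset of mdast into hast.
function mdastToHast(node) {
  const children = () => node.children.map(mdastToHast);
  switch (node.type) {
    case "root":
      return { type: "root", children: children() };
    case "heading":
      return { type: "element", tagName: `h${node.depth}`, properties: {}, children: children() };
    case "paragraph":
      return { type: "element", tagName: "p", properties: {}, children: children() };
    case "text":
      return { type: "text", value: node.value };
    default:
      throw new Error(`Unsupported mdast node: ${node.type}`);
  }
}

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize hast to an HTML string (no attributes, since properties are empty here).
function hastToHtml(node) {
  if (node.type === "text") return escapeHtml(node.value);
  const inner = node.children.map(hastToHtml).join("");
  if (node.type === "root") return inner;
  return `<${node.tagName}>${inner}</${node.tagName}>`;
}

const mdast = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] },
  ],
};

const hast = mdastToHast(mdast);
console.log(hastToHtml(hast)); // <h1>Welcome</h1><p>A paragraph.</p>
```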
MDX is a superset of Markdown, meaning that everything you can do in Markdown you can also do in MDX, plus three additional features, which are:
- jsx (replacing html)
- import statements
- export statements
This specification is called MDXAST. Its HTML-side counterpart, hast extended with the same kinds of nodes, is called MDXHAST.
Compiling MDX into an AST
Unless you are developing a plugin for MDX, you probably won't need to deal directly with the MDX AST, but since this article is about learning, let's write some code which produces an AST.
const { createMdxAstCompiler } = require("@mdx-js/mdx");
// A "unified" compiler
const compiler = createMdxAstCompiler({ remarkPlugins: [] });
const input = `
import YouTube from "./YouTube";
# Welcome
<YouTube id="123" />
`;
const ast = compiler.parse(input);
const astString = JSON.stringify(ast, null, 2);
console.log(astString);
After we strip out some of the position data, the AST ends up looking like the data below. Notice that we are seeing two of the custom MDX node types: import and jsx.
{
"type": "root",
"children": [
{
"type": "import",
"value": "import YouTube from \"./YouTube\";"
},
{
"type": "heading",
"depth": 1,
"children": [
{
"type": "text",
"value": "Welcome"
}
]
},
{
"type": "jsx",
"value": "<YouTube id=\"123\" />"
}
]
}
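Because imports and JSX show up as their own node types, simple checks become easy to write. Here's a rough, hypothetical lint rule that reports components used in jsx nodes but never imported. It reads the raw value strings with regular expressions, which is a crude stand-in for a real JavaScript parser (only default imports like "import X from ..." are recognized). MDX can also supply components at render time, so treat a hit as a hint rather than a definite error.

```javascript
// Hard-coded copy of the MDXAST above (position data omitted).
const ast = {
  type: "root",
  children: [
    { type: "import", value: 'import YouTube from "./YouTube";' },
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "jsx", value: '<YouTube id="123" />' },
  ],
};

// Walk the whole tree and collect regex capture group 1 from nodes of `type`.
function collect(node, type, regex, names = new Set()) {
  if (node.type === type) {
    for (const match of node.value.matchAll(regex)) names.add(match[1]);
  }
  for (const child of node.children || []) collect(child, type, regex, names);
  return names;
}

// Components (capitalized tags) used in jsx nodes but missing from default imports.
function missingImports(tree) {
  const imported = collect(tree, "import", /import\s+([A-Za-z_$][\w$]*)\s+from/g);
  const used = collect(tree, "jsx", /<([A-Z][\w$]*)/g);
  return [...used].filter((name) => !imported.has(name));
}

console.log(missingImports(ast)); // []
```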
Compiling MDX into JSX
What we really want MDX to produce is JSX, not an AST. This code is similar to the previous example, but we're adding on the utility function mdxHastToJsx, which compiles the tree into JSX. As its name suggests, it works on the MDXHAST (HTML-side) version of the tree rather than the MDXAST printed above.
const { createMdxAstCompiler } = require("@mdx-js/mdx");
const mdxHastToJsx = require("@mdx-js/mdx/mdx-hast-to-jsx");
const input = `
import YouTube from "./YouTube";
# Welcome
<YouTube id="123" />
`;
const compiler = createMdxAstCompiler({ remarkPlugins: [] }).use(mdxHastToJsx);
const jsx = compiler.processSync(input).toString();
console.log(jsx);
What is produced is valid JSX, which looks like:
import YouTube from "./YouTube";
const layoutProps = {};
const MDXLayout = "wrapper";
export default function MDXContent({ components, ...props }) {
return (
<MDXLayout
{...layoutProps}
{...props}
components={components}
mdxType="MDXLayout"
>
<h1>{`Welcome`}</h1>
<YouTube id="123" mdxType="YouTube" />
</MDXLayout>
);
}
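To see the basic shape of that transformation without @mdx-js/mdx installed, here's a deliberately tiny, hypothetical compiler that works straight from the MDXAST shown earlier. It hoists import and export nodes to the top, turns headings and paragraphs into JSX elements, and passes jsx nodes through untouched. It skips everything that makes the real output useful (MDXLayout, mdxType, the components prop, and the MDXHAST step), so treat it as an illustration of the idea only.

```javascript
// Hard-coded copy of the MDXAST from earlier (position data omitted).
const ast = {
  type: "root",
  children: [
    { type: "import", value: 'import YouTube from "./YouTube";' },
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "jsx", value: '<YouTube id="123" />' },
  ],
};

// Concatenate the text of a node and its descendants.
function toText(node) {
  if (node.type === "text") return node.value;
  return (node.children || []).map(toText).join("");
}

// Turn one block-level node into a line of JSX.
function toJsx(node) {
  switch (node.type) {
    case "heading": // JSON.stringify yields a safely quoted JS string literal
      return `<h${node.depth}>{${JSON.stringify(toText(node))}}</h${node.depth}>`;
    case "paragraph":
      return `<p>{${JSON.stringify(toText(node))}}</p>`;
    case "jsx":
      return node.value; // already JSX, passed through as-is
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hoist imports/exports, then wrap everything else in a component.
function compile(tree) {
  const isModule = (n) => n.type === "import" || n.type === "export";
  const hoisted = tree.children.filter(isModule).map((n) => n.value);
  const body = tree.children.filter((n) => !isModule(n)).map(toJsx);
  return [
    ...hoisted,
    "",
    "export default function MDXContent() {",
    "  return (",
    "    <>",
    ...body.map((line) => "      " + line),
    "    </>",
    "  );",
    "}",
  ].join("\n");
}

console.log(compile(ast));
```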
Conclusion
I hope you've enjoyed learning about ASTs and the role they play with Markdown and MDX. With ASTs we're able to process and tweak our code on its way to the desired result. It could be as simple as counting how many words are in a Markdown document, or as complex as Prettier or Babel. They open the door to possibilities that may once have seemed far-fetched. Take MDX itself, for example. It started as an idea a few people had and, with the help of ASTs and some hard work by some smart people, became a reality.