Small Sharp Software Tools

Harness the Combinatoric Power of Command-Line Tools and Utilities

Creating an HTML Document from Multiple Markdown Files with Pandoc


Published July 15, 2020

If you have a folder full of Markdown documents and you want to convert them to a single HTML file for easy distribution, you can do this quickly with Pandoc.

Pandoc is an open source document converter available on multiple platforms, and is an invaluable tool if you work with documents for any significant amount of time.

In this tutorial, you’ll create several Markdown documents and convert them into a single HTML document with a table of contents and a custom template.

Installing Pandoc

You can install Pandoc with your package manager of choice, or by visiting the installation page on the Pandoc web site.

On macOS, you can install Pandoc with Homebrew using the following command:

$ brew install pandoc

On Ubuntu and Debian systems, install Pandoc with APT:

$ sudo apt install pandoc
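
To confirm the installation worked, you can check that the pandoc command is on your PATH and print its version. This is a minimal sketch; the pandoc_status variable is only an illustrative name.

```shell
# Report whether the pandoc command can be found on the PATH.
if command -v pandoc >/dev/null 2>&1; then
  pandoc_status="installed"
  # The first line of the --version output names the installed release.
  pandoc --version | head -n 1
else
  pandoc_status="missing"
  echo "pandoc was not found on your PATH" >&2
fi
echo "Pandoc is $pandoc_status"
```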

Now that Pandoc is installed, you can create some Markdown files to convert.

Creating the Markdown Files

For this example, you’ll create three Markdown files that will represent chapters of a book.

Start by creating a folder for your project:

$ mkdir mybook

Switch to that directory:

$ cd mybook

Now create a few Markdown files with some content:

$ cat << 'EOF' > 1.md
> # Chapter 1
>
> This is the first chapter. It has some text.
> EOF

Use this first file to create two more files using the sed command. First, replace 1 and first with 2 and second to create 2.md:

$ sed -e 's/1/2/' -e 's/first/second/' 1.md > 2.md

Then replace 1 and first with 3 and third to create the file 3.md:

$ sed -e 's/1/3/' -e 's/first/third/' 1.md > 3.md
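
Each -e option adds one substitution, and because neither expression uses the g flag, only the first match on each line changes. If you want to confirm what sed changed, the following sketch rebuilds 1.md in a scratch directory (so it doesn't touch your project) and compares it with the generated 2.md using diff:

```shell
# Work in a scratch directory so existing files are untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the first chapter from the tutorial.
printf '# Chapter 1\n\nThis is the first chapter. It has some text.\n' > 1.md

# Apply both substitutions in a single pass over the file.
sed -e 's/1/2/' -e 's/first/second/' 1.md > 2.md

# diff exits with a non-zero status when files differ, so ignore it here.
diff 1.md 2.md || true
```

Only the heading line and the sentence containing "first" should show up as different.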

You can also create the files yourself:

$ cat << 'EOF' > 2.md
> # Chapter 2
>
> This is the second chapter. It has some text.
> EOF
$ cat << 'EOF' > 3.md
> # Chapter 3
>
> This is the third chapter. It has some text.
> EOF

You now have three Markdown files. View all three of them at once with the cat command:

$ cat 1.md 2.md 3.md

You’ll see this output:

# Chapter 1

This is the first chapter. It has some text.
# Chapter 2

This is the second chapter. It has some text.
# Chapter 3

This is the third chapter. It has some text.

Now let’s convert these to a single HTML file with Pandoc.

Converting with Pandoc

When you use Pandoc to convert files, you specify the input and output formats, along with the filename you want to create and the files you want to process.

To convert several files from Markdown to HTML, use the -f markdown flag to specify that the source documents are Markdown, use -t html to specify that the output should be HTML, and use the -o flag to specify the output filename. Then list the input files last:

$ pandoc -f markdown -t html -o index.html 1.md 2.md 3.md

Pandoc stitches the files together and generates an HTML fragment containing the content of all three files. View the result with the cat command:

$ cat index.html

You’ll see this output:

<h1 id="chapter-1">Chapter 1</h1>
<p>This is the first chapter. It has some text.</p>
<h1 id="chapter-2">Chapter 2</h1>
<p>This is the second chapter. It has some text.</p>
<h1 id="chapter-3">Chapter 3</h1>
<p>This is the third chapter. It has some text.</p>

The output doesn’t contain the typical HTML boilerplate though. Pandoc creates fragments by default. This fragment is meant to be injected or copied into another file.
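
With only three chapters, typing the filenames is easy. For a larger folder you might build the list instead. The sketch below assumes chapter files named with plain numbers and no spaces (1.md through 12.md here, created in a scratch directory). A shell glob sorts names as text, which puts 10.md before 2.md, so the list is sorted numerically first.

```shell
# Work in a scratch directory so nothing in your project is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create twelve tiny chapter files named 1.md through 12.md.
for n in 1 2 3 4 5 6 7 8 9 10 11 12; do
  printf '# Chapter %s\n\nSome text.\n' "$n" > "$n.md"
done

# Sort the filenames numerically and join them with spaces.
chapters=$(printf '%s\n' *.md | sort -n | paste -s -d ' ' -)
echo "Chapter order: $chapters"

# Run the conversion only if Pandoc is available.
# $chapters is left unquoted on purpose so it splits into filenames.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f markdown -t html -o index.html $chapters
fi
```

This approach breaks on filenames that contain spaces, so treat it as a starting point rather than a general solution.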

Creating Standalone Documents

Pandoc can create standalone documents rather than fragments. When you add the -s option, Pandoc generates a complete document that includes this boilerplate:

$ pandoc -s -f markdown -t html -o index.html 1.md 2.md 3.md

When you run this, Pandoc uses its default template, and you’ll see a warning that the document has no title:

[WARNING] This document format requires a nonempty <title> element.
  Defaulting to '1' as the title.
  To specify a title, use 'title' in metadata or --metadata title="...".

To fix that, set the title with the --metadata flag:

$ pandoc -s --metadata title="My book" -f markdown -t html -o index.html 1.md 2.md 3.md
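
The warning also mentions setting 'title' in metadata. One way to do that, assuming your Pandoc release supports the --metadata-file option, is to keep the title in a small YAML file. The metadata.yaml filename and the scratch directory in this sketch are illustrative choices.

```shell
# Work in a scratch directory so existing files are untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the first chapter from the tutorial.
printf '# Chapter 1\n\nThis is the first chapter. It has some text.\n' > 1.md

# Store the document title in a YAML metadata file.
cat << 'EOF' > metadata.yaml
title: My book
EOF

# Run the conversion only if Pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -s --metadata-file=metadata.yaml \
    -f markdown -t html -o index.html 1.md
  # Show the <title> element Pandoc generated.
  grep '<title>' index.html || true
fi
```

A metadata file is handy once you have more than one value to set, since the command line stays the same as the metadata grows.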

Finally, Pandoc can create an in-document table of contents for you if you add the --toc argument.

Here’s the full command, broken across multiple lines for easier readability:

$ pandoc -s \
  --metadata title="My book" \
  --toc \
  -f markdown \
  -t html -o index.html \
  1.md 2.md 3.md

The resulting file contains the typical HTML skeleton, some CSS, and your content. But you might want more control than that.
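
Some of that control is available without a template. For instance, the --toc-depth option limits which heading levels appear in the table of contents. The sketch below uses a scratch directory and adds a hypothetical "Background" subsection to the first chapter so there is a second-level heading to leave out.

```shell
# Work in a scratch directory so existing files are untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One chapter with a second-level heading beneath the chapter title.
cat << 'EOF' > 1.md
# Chapter 1

## Background

This is the first chapter. It has some text.
EOF

# --toc-depth=1 asks for only top-level headings in the table of contents.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -s --metadata title="My book" --toc --toc-depth=1 \
    -f markdown -t html -o index.html 1.md
  # List the lines of the document that contain internal links.
  grep 'href="#' index.html || true
fi
```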

Using a Custom Template

You can use your own HTML template by specifying the --template flag and a path to the file. But you’ll need a template first.

Use the following command to see the default HTML template:

$ pandoc -D html

You’ll see this output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$lang$" xml:lang="$lang$"$if(dir)$ dir="$dir$"$endif$>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
$for(author-meta)$
  <meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
  <meta name="dcterms.date" content="$date-meta$" />
$endif$
$if(keywords)$
  <meta name="keywords" content="$for(keywords)$$keywords$$sep$, $endfor$" />
$endif$
  <title>$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$</title>
  <style>
    $styles.html()$
  </style>
$for(css)$
  <link rel="stylesheet" href="$css$" />
$endfor$
$if(math)$
  $math$
$endif$
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
$for(header-includes)$
  $header-includes$
$endfor$
</head>
<body>
$for(include-before)$
$include-before$
$endfor$
$if(title)$
<header id="title-block-header">
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
$if(date)$
<p class="date">$date$</p>
$endif$
</header>
$endif$
$if(toc)$
<nav id="$idprefix$TOC" role="doc-toc">
$if(toc-title)$
<h2 id="$idprefix$toc-title">$toc-title$</h2>
$endif$
$table-of-contents$
</nav>
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
</body>
</html>

Throughout the file, you’ll see variables that act as placeholders. For example, the $title$ variable holds the value of the page title, and the $body$ variable holds the converted content of your documents. You’ll also see some conditional logic. For example, the table of contents is only displayed if Pandoc was told to generate it:

$if(toc)$
<nav id="$idprefix$TOC" role="doc-toc">
$if(toc-title)$
<h2 id="$idprefix$toc-title">$toc-title$</h2>
$endif$
$table-of-contents$
</nav>
$endif$
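
To see the placeholders and the conditional in isolation, here is a stripped-down template built from the same variables. It's a minimal sketch, not a replacement for the default template; the minimal.html, plain.html, and with-toc.html filenames are illustrative, and everything happens in a scratch directory.

```shell
# Work in a scratch directory so existing files are untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the first chapter from the tutorial.
printf '# Chapter 1\n\nThis is the first chapter. It has some text.\n' > 1.md

# Quoting 'EOF' stops the shell from expanding the $...$ placeholders.
cat << 'EOF' > minimal.html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <title>$pagetitle$</title>
</head>
<body>
$if(toc)$
<nav>
$table-of-contents$
</nav>
$endif$
$body$
</body>
</html>
EOF

# Run the conversions only if Pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
  # Without --toc, the $if(toc)$ block is skipped.
  pandoc -s --metadata title="My book" --template minimal.html \
    -f markdown -t html -o plain.html 1.md
  # With --toc, the <nav> element and the generated list are emitted.
  pandoc -s --metadata title="My book" --toc --template minimal.html \
    -f markdown -t html -o with-toc.html 1.md
  # Count the <nav> lines in each output file.
  grep -c '<nav>' plain.html with-toc.html || true
fi
```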

You can write a template from scratch and include only the statements you want, but it’s faster to start from the existing one. The following command saves the output of the pandoc -D command to the file template.html:

$ pandoc -D html > template.html

Now open template.html in your editor and make the modifications you want. For this exercise, locate the line of the file that inserts the body of the document:

$body$

Modify it so it’s wrapped in an HTML <main> element:

<main>
  $body$
</main>

Save the file.
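
If you'd rather script that edit than make it by hand, a short awk filter can do the wrapping. This sketch assumes the template has a line containing exactly $body$ with no surrounding whitespace, as in the default template shown earlier; wrap_body is a hypothetical helper name, not part of Pandoc.

```shell
# wrap_body reads a template on standard input and wraps the line
# that is exactly $body$ in a <main> element. Other lines pass through.
wrap_body() {
  awk '$0 == "$body$" { print "<main>"; print "  $body$"; print "</main>"; next }
       { print }'
}

# Demonstrate on a three-line stand-in for the real template.
printf '<body>\n$body$\n</body>\n' | wrap_body
```

To apply it to the real template, you could run pandoc -D html | wrap_body > template.html in place of the manual edit.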

You can now tell Pandoc to use the modified template with the --template argument:

$ pandoc -s \
  --metadata title="My book" \
  --toc \
  --template template.html \
  -f markdown \
  -t html -o index.html \
  1.md 2.md 3.md

Your resulting file uses the custom template you created. Your main content will be wrapped in a <main> tag.

Conclusion

In this tutorial you used Pandoc to create a single HTML document from a collection of Markdown documents. Explore this further by looking at Pandoc’s support for different types of Markdown, and its support for including headers and footers from other files. You’ll find Pandoc to be an incredibly powerful tool, and what you’ve used here can translate into using Pandoc to create ebooks and printed material as well.