


Pandoc has long supported filters, which allow the pandoc abstract syntax tree (AST) to be manipulated between the parsing and the writing phases. Traditional pandoc filters accept a JSON representation of the pandoc AST and produce an altered JSON representation of the AST. They may be written in any programming language and are invoked from pandoc using the --filter option.
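To make the JSON round trip concrete, here is a minimal traditional filter written in Python. It is a sketch, not an official example. It assumes the JSON layout of recent pandoc versions, in which every AST element is an object with a "t" (type) key and, optionally, a "c" (contents) key, and it upper-cases the text of every Str element in the document body. The helper names walk, upcase_str, and filter_json are hypothetical.

```python
import json
import sys


def walk(node, action):
    """Apply `action` bottom-up to every AST element (a dict with a "t" key)."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        node = {key: walk(value, action) for key, value in node.items()}
        return action(node) if "t" in node else node
    return node  # strings, numbers, and None are left untouched


def upcase_str(elem):
    """Upper-case the text of Str elements; return other elements unchanged."""
    if elem["t"] == "Str":
        return {"t": "Str", "c": elem["c"].upper()}
    return elem


def filter_json(text):
    """Parse a JSON AST, transform its blocks, and serialize it again."""
    doc = json.loads(text)
    doc["blocks"] = walk(doc["blocks"], upcase_str)  # metadata is not touched
    return json.dumps(doc)


# pandoc passes the output format as the first argument to JSON filters,
# so the stdin/stdout round trip only runs when invoked via --filter
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(filter_json(sys.stdin.read()))
```

Saved as, say, upcase.py, it could be run with `pandoc input.md --filter ./upcase.py -o output.html`. Both JSON serializations (pandoc to filter, filter to pandoc) are exactly the overhead discussed next.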

Although traditional filters are very flexible, they have a couple of disadvantages. First, there is some overhead in writing JSON to stdout and reading it from stdin (twice, once on each side of the filter). Second, whether a filter works depends on details of the user’s environment. A filter may require an interpreter for a certain programming language to be available, as well as a library for manipulating the pandoc AST in JSON form. One cannot simply provide a filter that can be used by anyone who has a certain version of the pandoc executable.

Starting with version 2.0, pandoc makes it possible to write filters in Lua without any external dependencies at all. A Lua interpreter (version 5.3) and a Lua library for creating pandoc filters are built into the pandoc executable. Pandoc data types are marshaled to Lua directly, avoiding the overhead of writing JSON to stdout and reading it from stdin.


Pandoc, the universal document converter, can serve as a nice introduction to functional programming with Haskell. For many contributors, including the author of this guide, pandoc was their first real exposure to this language. Despite its impressive size of more than 60,000 lines of Haskell code (excluding the test suite), pandoc is still very approachable due to its modular architecture, which makes it an interesting subject for learning.

This guide is meant to help navigate the large body of source code, to lay out a path that can be followed for learning, and to explain the underlying concepts.

A basic understanding of Haskell and of pandoc’s functionality is assumed.

Getting the code

Pandoc has a publicly accessible git repository on GitHub: https://github.com/jgm/pandoc. To get a local copy of the source:

git clone https://github.com/jgm/pandoc

The source for the main pandoc program is app/pandoc.hs. The source for the pandoc library is in src/, the source for the tests is in test/, and the source for the benchmarks is in benchmark/.


This document describes pandoc’s handling of JATS.

Metadata Values

Article summary. Added to the document’s front matter via the <abstract> element.

list of article contributors. Each author should have a surname and a given name listed in the entry; if the author has no surname value, then the item is used as the contributor’s string-name.

the contributor’s ORCID identifier.

surname of the contributor. Usually the family name in western names.

See <surname>.


personal names of the contributor; this includes middle names (if any) in western-style names.

See <given-names>.

full name of the author; included only as a fallback if author.surname is not available. Tagged with <string-name>.

the contributor’s email address.

Used as the contents of the <email> element.


either full affiliation entries as described for the top-level affiliation field, or a list of affiliation identifiers.

The identifiers link to the organizations with which an author is affiliated. Each identifier in this list must also occur as the id of an affiliation listed in the top-level affiliation list.

If the top-level affiliation field is set, then this entry is assumed to be a list of identifiers; if that field is unset, it is assumed to be a list of full entries.

Full entries must be given if the articleauthoring tag set is used, as affiliation links are not allowed in that schema.

boolean attribute used to mark authors who contributed equally to the work. The equal-contrib attribute, set to yes, is added to the author’s <contrib> element if this is set to a truthy value.
identifier linking to the contributor’s correspondence information. The information itself must be stored as an item in article.author-notes.corresp. If the cor-id value is set, an <xref> link of ref-type corresp is added. The rid attribute is set to cor-<ID>, where <ID> is the stringified value of this attribute.
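The author fields above can be sketched in Python with xml.etree.ElementTree. This is an illustration under stated assumptions, not pandoc’s own implementation: an author entry is modeled as a plain dict whose keys (surname, given-names, name, email, equal-contrib, cor-id) mirror the field names mentioned in the text, and Python truthiness stands in for pandoc’s notion of a truthy value. The function name build_contrib is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_contrib(author):
    """Render one author entry (a dict) as a JATS <contrib> element."""
    contrib = ET.Element("contrib")
    if author.get("equal-contrib"):
        contrib.set("equal-contrib", "yes")
    if "surname" in author:
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        if "given-names" in author:
            ET.SubElement(name, "given-names").text = author["given-names"]
    else:
        # no surname: fall back to the full name as a <string-name>
        ET.SubElement(contrib, "string-name").text = author.get("name", "")
    if "email" in author:
        ET.SubElement(contrib, "email").text = author["email"]
    if "cor-id" in author:
        # rid points at the correspondence entry: "cor-" + stringified cor-id
        ET.SubElement(contrib, "xref",
                      {"ref-type": "corresp", "rid": f"cor-{author['cor-id']}"})
    return contrib
```

For example, `ET.tostring(build_contrib({"name": "Jane Doe"}), encoding="unicode")` yields a <contrib> containing only a <string-name>.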

the list of organizations with which contributors are affiliated. Each institution is added as an <aff> element to the author’s contrib-group.

The fields are given in the order in which they are included in the output.

internal identifier; used as the <aff> element’s id value, prefixed with aff-.
name of the research group or other low-level organizational structure; used as value of an <institution> element with content-type set to group.
name of the department or other mid-level organizational structure; used as value of an <institution> element with content-type set to dept.
name of the company, university, or other top-level organizational structure; used as value of an <institution> element. The institution element is wrapped in an <institution-wrap> element; any identifiers, like ringgold or ror, are added to the wrapper and must hence belong to this organization (not the department or group).
International Standard Name Identifier of the organization. Added via an <institution-id> element with institution-id-type set to ISNI.
Ringgold identifier of the organization. Added via an <institution-id> element with institution-id-type set to Ringgold.
Research Organization Registry identifier of the organization. Added via an <institution-id> element with institution-id-type set to ROR.
Array of persistent identifiers which are added as <institution-id> elements. Each item must contain a map with keys type, used as institution-id-type, and id, used as element content.
The organization’s street address; each list item is wrapped in an <addr-line> element, and the items are separated by a comma followed by a space.
City in which the organization is located; used only if street-address is not given, in which case the value is wrapped in a <city> element.
Country in which the organization is located; used as the value of a <country> element.
Two-letter ISO 3166-1 country code; used as the country attribute of the <country> element (if the latter is present).
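The affiliation mapping above can be sketched in Python as follows. It is a minimal sketch, not pandoc’s implementation: an affiliation is assumed to be a plain dict, and apart from id, street-address, ringgold, and ror, which the text mentions, the key names (group, department, organization, isni, pid, city, country, country-code) are assumptions. For simplicity it omits the comma separators between address lines and only emits identifiers when an organization is present, since they belong to the <institution-wrap>.

```python
import xml.etree.ElementTree as ET


def build_aff(aff):
    """Render one affiliation entry (a dict) as a JATS <aff> element."""
    elem = ET.Element("aff", {"id": f"aff-{aff['id']}"})
    # lower-level units become <institution> elements with a content-type
    for key, content_type in (("group", "group"), ("department", "dept")):
        if key in aff:
            ET.SubElement(elem, "institution",
                          {"content-type": content_type}).text = aff[key]
    if "organization" in aff:
        # identifiers go into the wrapper, so they describe the organization only
        wrap = ET.SubElement(elem, "institution-wrap")
        ET.SubElement(wrap, "institution").text = aff["organization"]
        for key, id_type in (("isni", "ISNI"), ("ringgold", "Ringgold"), ("ror", "ROR")):
            if key in aff:
                ET.SubElement(wrap, "institution-id",
                              {"institution-id-type": id_type}).text = aff[key]
        for pid in aff.get("pid", []):
            ET.SubElement(wrap, "institution-id",
                          {"institution-id-type": pid["type"]}).text = pid["id"]
    if "street-address" in aff:
        for line in aff["street-address"]:
            ET.SubElement(elem, "addr-line").text = line
    elif "city" in aff:
        # the city is used only when no street address is given
        ET.SubElement(elem, "city").text = aff["city"]
    if "country" in aff:
        country = ET.SubElement(elem, "country")
        country.text = aff["country"]
        if "country-code" in aff:
            country.set("country", aff["country-code"])
    return elem
```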

Copyright and licensing information. This information is rendered via the <permissions> element.

It is recommended to use the license field (described below) for licensing information. If licensing information is included under copyright, then the variables type, link, and text should always be used together.

copyright notice or statement; used as content of the <copyright-statement>. Use a list for multiple statements.
the year of copyright; used as content of the <copyright-year>. Use a list for multiple copyright years. The JATS documentation states that this field need not be used if the year is included in the copyright statement.
the copyright holder; included via the <copyright-holder> element. Use a list for multiple copyright holders.
inline text setting the license under which the text is published; included via the <license-p> element.
type of the license; used as value of the license-type attribute.
external link describing the license; used as value of an xlink:href attribute in the <license> element.
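A small Python sketch of how such a copyright map could turn into a <permissions> element. It assumes the map is a plain dict; type, link, and text are the variable names given above, while statement, year, and holder are assumed key names. It is an illustration, not pandoc’s own code.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)  # serialize the attribute as xlink:href


def as_list(value):
    """Accept a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def build_permissions(info):
    """Render a copyright map (a dict) as a JATS <permissions> element."""
    perms = ET.Element("permissions")
    for key, tag in (("statement", "copyright-statement"),
                     ("year", "copyright-year"),
                     ("holder", "copyright-holder")):
        for value in as_list(info.get(key, [])):
            ET.SubElement(perms, tag).text = str(value)
    if "text" in info:
        # type, link, and text are meant to be used together
        attrs = {}
        if "type" in info:
            attrs["license-type"] = info["type"]
        if "link" in info:
            attrs[f"{{{XLINK}}}href"] = info["link"]
        license_elem = ET.SubElement(perms, "license", attrs)
        ET.SubElement(license_elem, "license-p").text = info["text"]
    return perms
```

Accepting either a scalar or a list for each field mirrors the “use a list for multiple values” rule above.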

publication date. This value should usually be a string representation of a date. Pandoc will parse and deconstruct the date into the components given below. It is also possible to pass these components directly.

The publication date is recorded in the document via the <pub-date> element and its sub-elements. The publication-format attribute is always set to electronic.


ISO-8601 representation of the publication date. Used as the value of the <pub-date> element’s iso-8601-date attribute.

This value is set automatically if pandoc can parse the date value as a date.

day, month, year

Day, month, and year of the publication date. Only the publication year is required. The values are used as the contents of the elements with the respective names.

The values are set automatically if pandoc can parse the date value as a date.

The type of event marked by this date. The value is set as the date-type attribute on the <pub-date> element and defaults to “pub” if not specified.
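The date handling described above can be sketched in Python. Pandoc itself understands a range of date formats; this sketch assumes an ISO 8601 input string (parsed with datetime.date.fromisoformat) and writes day and month in two-digit form, which is a choice of the sketch rather than documented pandoc behavior. The function name build_pub_date is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET


def build_pub_date(value, date_type="pub"):
    """Parse an ISO 8601 date string and render it as a <pub-date> element."""
    date = datetime.date.fromisoformat(value)  # e.g. "2024-03-07"
    elem = ET.Element("pub-date", {
        "publication-format": "electronic",  # always electronic, as noted above
        "date-type": date_type,              # defaults to "pub"
        "iso-8601-date": date.isoformat(),
    })
    ET.SubElement(elem, "day").text = f"{date.day:02d}"
    ET.SubElement(elem, "month").text = f"{date.month:02d}"
    ET.SubElement(elem, "year").text = str(date.year)
    return elem
```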

information concerning the article that identifies or describes it. The key-value pairs within this map are typically used within the <article-meta> element.

external article identifier assigned by the publisher. Used as the content of the <article-id> element with attribute pub-id-type set to publisher-id.
Digital Object Identifier (DOI) assigned to the article. Used as the content of the <article-id> element with attribute pub-id-type set to doi.
PubMed Identifier (PubMed ID) assigned to the article. Used as the content of the <article-id> element with attribute pub-id-type set to pmid.
PubMed Central Identifier assigned to the article. Used as the content of the <article-id> element with attribute pub-id-type set to pmcid.
generic article accession identifier. Used as the content of the <article-id> element with attribute pub-id-type set to art-access-id.
name of a subject or topic describing the article. Used as the content of the <subject> element, nested in a <subj-group> element which has heading as its subj-group-type attribute.
list of subjects or topics describing the article. Each item is used as the content of a <subject> element; the items are grouped in a single <subj-group> element with its subj-group-type attribute set to categories.

Additional information about authors, like conflict of interest statements and corresponding author contact info. Wrapped in an <author-notes> element.

Conflict of interest statement. Rendered as a footnote (<fn>) of fn-type conflict.
Contributed-by information. Rendered as a footnote (<fn>) of fn-type con.
Correspondence information. This must be a list of contributor correspondence items, where each item must have the properties id and email. The info is then rendered via a <corresp> element.
Prose describing the funding. Added to the article’s front matter via the <funding-statement> element.
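The identifier and subject fields of the article map can be sketched in Python as below. This is an illustration, not pandoc’s implementation: the key names (publisher-id, doi, pmid, pmcid, art-access-id, heading, categories) are assumptions chosen to mirror the pub-id-type and subj-group-type values listed above, and the <article-categories> wrapper is the standard JATS container for <subj-group> rather than something stated in the text.

```python
import xml.etree.ElementTree as ET

# identifier fields in output order; each key doubles as its pub-id-type
ARTICLE_ID_KEYS = ("publisher-id", "doi", "pmid", "pmcid", "art-access-id")


def build_article_ids(article):
    """Return one <article-id> element per identifier present in `article`."""
    ids = []
    for key in ARTICLE_ID_KEYS:
        if key in article:
            elem = ET.Element("article-id", {"pub-id-type": key})
            elem.text = article[key]
            ids.append(elem)
    return ids


def build_categories(article):
    """Group the heading and categories fields into <subj-group> elements."""
    cats = ET.Element("article-categories")
    if "heading" in article:
        group = ET.SubElement(cats, "subj-group", {"subj-group-type": "heading"})
        ET.SubElement(group, "subject").text = article["heading"]
    if article.get("categories"):
        group = ET.SubElement(cats, "subj-group", {"subj-group-type": "categories"})
        for subject in article["categories"]:
            ET.SubElement(group, "subject").text = subject
    return cats
```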

information on the journal in which the article is published. This must be a map; the following key/value pairs are recognized.

journal identifier assigned by the publisher. Used as content of element <journal-id> with attribute journal-id-type set to publisher-id.
journal identifier assigned by PubMed. Used as content of element <journal-id> with attribute journal-id-type set to nlm-ta.
journal identifier assigned by PubMed Central. Used as content of element <journal-id> with attribute journal-id-type set to pmc.
full title of the journal in which the article is published. Used as content of the <journal-title> element.
short form of the journal title. Used as content of the <abbrev-journal-title> element.
ISSN identifier of the publication’s print version. Used as content of the <issn> element with the publication-format attribute set to print.
ISSN identifier of the publication’s electronic version. Used as content of the <issn> element with the publication-format attribute set to electronic.
name of the publishing entity (person, company, or other). Used as the content of the <publisher-name> element.
place of publication. Used as the content of the <publisher-loc> element.
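A Python sketch of the journal mapping above, under the assumption that the journal map is a plain dict. The key names (publisher-id, nlm-ta, pmc, title, abbrev-title, pissn, eissn, publisher-name, publisher-loc) are assumptions, and the <journal-title-group> and <publisher> wrappers come from the JATS tag set rather than from the text above. It is not pandoc’s own implementation.

```python
import xml.etree.ElementTree as ET


def build_journal_meta(journal):
    """Render a journal map (a dict) as a JATS <journal-meta> element."""
    meta = ET.Element("journal-meta")
    # each identifier key doubles as its journal-id-type
    for key in ("publisher-id", "nlm-ta", "pmc"):
        if key in journal:
            ET.SubElement(meta, "journal-id",
                          {"journal-id-type": key}).text = journal[key]
    if "title" in journal or "abbrev-title" in journal:
        group = ET.SubElement(meta, "journal-title-group")
        if "title" in journal:
            ET.SubElement(group, "journal-title").text = journal["title"]
        if "abbrev-title" in journal:
            ET.SubElement(group, "abbrev-journal-title").text = journal["abbrev-title"]
    for key, fmt in (("pissn", "print"), ("eissn", "electronic")):
        if key in journal:
            ET.SubElement(meta, "issn",
                          {"publication-format": fmt}).text = journal[key]
    if "publisher-name" in journal or "publisher-loc" in journal:
        publisher = ET.SubElement(meta, "publisher")
        if "publisher-name" in journal:
            ET.SubElement(publisher, "publisher-name").text = journal["publisher-name"]
        if "publisher-loc" in journal:
            ET.SubElement(publisher, "publisher-loc").text = journal["publisher-loc"]
    return meta
```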

Article licensing information. Each item of this field is rendered as a <license> element within the <permissions> element.

Item content should be either a single paragraph, or a map with the fields listed below.

inline text describing a license under which the text is published; included via the <license-p> element.
type of the license; used as value of the license-type attribute.
external link describing the license; used as value of an xlink:href attribute in the <license> element.
Additional notes concerning the whole article. Added to the article’s front matter via the <notes> element.
Subordinate part of the document title. Added to the document’s front matter as a <subtitle> element.
list of keywords. Items are used as contents of the <kwd> element; the elements are grouped in a <kwd-group> with the kwd-group-type value author.
The article title. Added to the document’s front matter via the <article-title> element.
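The title, subtitle, and keyword fields can be sketched in Python as well. The function names are hypothetical, and the <title-group> wrapper around <article-title> and <subtitle> is taken from the JATS tag set rather than from the text above. This is an illustration, not pandoc’s implementation.

```python
import xml.etree.ElementTree as ET


def build_title_group(title, subtitle=None):
    """Wrap the article title and optional subtitle in a <title-group>."""
    group = ET.Element("title-group")
    ET.SubElement(group, "article-title").text = title
    if subtitle is not None:
        ET.SubElement(group, "subtitle").text = subtitle
    return group


def build_kwd_group(tags):
    """Render a list of keywords as a <kwd-group> with kwd-group-type author."""
    group = ET.Element("kwd-group", {"kwd-group-type": "author"})
    for tag in tags:
        ET.SubElement(group, "kwd").text = tag
    return group
```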


Pandoc’s handling of org files is similar to that of Emacs org-mode. This document highlights the cases where pandoc’s behavior differs, either because matching org-mode is not possible or because it has not been implemented yet.

Export options

The following export keywords are supported:

  • AUTHOR: comma-separated list of author(s); fully supported.

  • CREATOR: output generator; passed as plain-text metadata entry creator, but not used by any default templates.

  • DATE: creation or publication date; well supported by pandoc.

  • EMAIL: author email address; passed as plain-text metadata field email, but not used by any default templates.

  • LANGUAGE: document language; included as plain-text metadata field lang. The value should be a BCP47 language tag.

  • SELECT_TAGS: tags which select a tree for export.

  • EXCLUDE_TAGS: tags which prevent a subtree from being exported. Fully supported.

  • TITLE: document title; fully supported.

  • EXPORT_FILE_NAME: target filename; unsupported. The output defaults to stdout unless a target is given as a command line option.
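A rough Python sketch of how export keywords like those listed above could be collected into a metadata map. It only illustrates the #+KEYWORD: value syntax and is not how pandoc’s org reader works: pandoc parses values such as TITLE as org markup, while this sketch keeps every value as a plain string. Following the list above, it splits AUTHOR on commas and stores LANGUAGE under lang. The names KEYWORD_LINE and read_export_keywords are hypothetical.

```python
import re

# matches lines such as "#+TITLE: My document"
KEYWORD_LINE = re.compile(r"^#\+(\w+):\s*(.*)$")


def read_export_keywords(text):
    """Collect #+KEY: value lines from an org document into a dict."""
    meta = {}
    for line in text.splitlines():
        match = KEYWORD_LINE.match(line.strip())
        if not match:
            continue
        key, value = match.group(1).lower(), match.group(2).strip()
        if key == "author":
            # AUTHOR holds a comma-separated list of names
            meta["author"] = [name.strip() for name in value.split(",") if name.strip()]
        elif key == "language":
            meta["lang"] = value  # stored under "lang"
        else:
            meta[key] = value
    return meta
```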


