Pandoc has long supported filters, which allow the pandoc abstract syntax tree (AST) to be manipulated between the parsing and writing phases. Traditional pandoc filters accept a JSON representation of the pandoc AST and produce an altered JSON representation of the AST. They may be written in any programming language, and invoked from pandoc using the --filter option.
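To illustrate the JSON round trip, here is a minimal traditional filter written in Python using only the standard library. It is a sketch that assumes the JSON layout of current pandoc versions: a top-level object with a "blocks" list, and AST elements encoded as objects with a "t" (type) and an optional "c" (contents) key. The function name caps is hypothetical. Real filters usually rely on a helper library such as pandocfilters or panflute instead of walking the JSON by hand.

```python
import json
import sys


def caps(node):
    """Recursively walk a pandoc JSON AST value, upper-casing every Str element."""
    if isinstance(node, list):
        return [caps(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: caps(value) for key, value in node.items()}
    return node  # strings, numbers and other leaves are left unchanged


def main():
    data = sys.stdin.read()
    if not data.strip():
        return  # nothing on stdin, so there is nothing to filter
    doc = json.loads(data)
    # Only the document body is transformed; metadata is passed through untouched.
    doc["blocks"] = caps(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as caps.py, it could be invoked with pandoc input.md --filter caps.py -o output.html. The JSON is serialized twice, once by pandoc and once by the filter, which is the overhead discussed next.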
Although traditional filters are very flexible, they have a couple of disadvantages. First, there is some overhead in writing JSON to stdout and reading it from stdin (twice, once on each side of the filter). Second, whether a filter will work will depend on details of the user’s environment. A filter may require an interpreter for a certain programming language to be available, as well as a library for manipulating the pandoc AST in JSON form. One cannot simply provide a filter that can be used by anyone who has a certain version of the pandoc executable.
Starting with version 2.0, pandoc makes it possible to write filters in Lua without any external dependencies at all. A Lua interpreter (version 5.3) and a Lua library for creating pandoc filters are built into the pandoc executable. Pandoc data types are marshaled to Lua directly, avoiding the overhead of writing JSON to stdout and reading it from stdin.
Pandoc, the universal document converter, can serve as a good introduction to functional programming with Haskell. For many contributors, including the author of this guide, pandoc was their first real exposure to this language. Despite its impressive size of more than 60,000 lines of Haskell code (excluding the test suite), pandoc is still very approachable due to its modular architecture, which makes it an interesting subject for learning.
This guide exists to help navigate the large body of source code, to lay out a path that can be followed for learning, and to explain the underlying concepts.
A basic understanding of Haskell and of pandoc’s functionality is assumed.
Pandoc has a publicly accessible git repository on GitHub: https://github.com/jgm/pandoc. To get a local copy of the source:
git clone https://github.com/jgm/pandoc
The source for the main pandoc program is app/pandoc.hs. The source for the pandoc library is in src/, the source for the tests is in test/, and the source for the benchmarks is in benchmark/.
This document describes pandoc’s handling of JATS.
abstract
article abstract. Used as the contents of the <abstract> element.
author
list of article contributors. Each author should have a surname and a given name listed in the entry; if the author has no surname value, then the item will be used as the contributor's string-name.
orcid
surname
surname of the contributor. Usually the family name in Western names. See <surname>.
given-names
personal names of the contributor; this includes middle names (if any) in Western-style names. See <given-names>.
name
the contributor's name; used if author.surname is not available. Tagged with <string-name>.
email
the contributor's email address. Used as the contents of the <email> element.
affiliation
either full affiliation entries as described in field affiliation, or a list of affiliation identifiers.
The identifiers link to the organizations with which an author is affiliated. Each identifier in this list must also occur as the id of an affiliation listed in the top-level affiliation list.
If the top-level affiliation field is set, then this entry is assumed to be a list of identifiers; if that field is unset, it is assumed to be a list of full entries.
Full entries must be given if the articleauthoring tag set is used, as affiliation links are not allowed in that schema.
equal-contrib
The equal-contrib attribute, set to yes, is added to the author's <contrib> element if this is set to a truthy value.
cor-id
identifier linking the author to a corresponding-author entry in article.author-notes.corresp. If the cor-id value is set, then an <xref> link of ref-type corresp is added. The rid attribute is set to cor-<ID>, where <ID> is the stringified value of this attribute.
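The author rules above can be summarized in a short Python sketch built on xml.etree.ElementTree. It is an illustration of the documented behavior, not pandoc's JATS template: the function name contrib, the contrib-type="author" attribute, and the child element order are assumptions.

```python
import xml.etree.ElementTree as ET


def contrib(author):
    """Sketch: render one entry of the author list as a <contrib> element."""
    if isinstance(author, str):
        # A plain string author has no surname, so it becomes the string-name.
        author = {"name": author}
    # contrib-type="author" is an assumption made for this illustration.
    el = ET.Element("contrib", {"contrib-type": "author"})
    if author.get("equal-contrib"):
        el.set("equal-contrib", "yes")
    if "surname" in author:
        name = ET.SubElement(el, "name")
        ET.SubElement(name, "surname").text = str(author["surname"])
        if "given-names" in author:
            ET.SubElement(name, "given-names").text = str(author["given-names"])
    elif "name" in author:
        # Without a surname, the name is tagged as <string-name>.
        ET.SubElement(el, "string-name").text = str(author["name"])
    if "email" in author:
        ET.SubElement(el, "email").text = str(author["email"])
    if "cor-id" in author:
        # rid points at the corresponding-author entry, cor-<ID>.
        ET.SubElement(el, "xref", {"ref-type": "corresp", "rid": f"cor-{author['cor-id']}"})
    return el
```

For example, contrib({"surname": "Doe", "given-names": "Jane", "cor-id": 1}) yields a <name> element and an <xref> with rid cor-1, while contrib("Jane Doe") falls back to <string-name>.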
affiliation
the list of organizations with which contributors are affiliated. Each institution is added as an <aff>
element to the author’s contrib-group.
The fields are given in the order in which they are included in the output.
id
identifier used to link authors to the organization. Used as the <aff> element's id value, prefixed with aff-.
group
Used as the contents of an <institution> element with content-type set to group.
department
Used as the contents of an <institution> element with content-type set to dept.
organization
Used as the contents of an <institution> element. The institution element is wrapped in an <institution-wrap> element; any identifiers, like ringgold or ror, are added to the wrapper and must therefore belong to this organization (not the department or group).
isni
Used as the contents of an <institution-id> element with institution-id-type set to ISNI.
ringgold
Used as the contents of an <institution-id> element with institution-id-type set to Ringgold.
ror
Used as the contents of an <institution-id> element with institution-id-type set to ROR.
pid
Rendered as <institution-id> elements. Each item must contain a map with keys type, used as institution-id-type, and id, used as element content.
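street-address
lines of the street address, used as the contents of the <addr-line> element, separated by a comma and space (", ").
city
the city in which the organization is located. If street-address is not given, the value is wrapped in a <city> element.
country
Used as the contents of the <country> element.
country-code
code of the organization's country. Used as the value of the country attribute in the <country> element (if the latter is present).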
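The following Python sketch renders one top-level affiliation entry as an <aff> element, following the field descriptions above, and checks that every identifier used in author entries exists in the top-level list. The function names aff and unknown_affiliation_ids are hypothetical. The exact output pandoc produces may differ, in particular in element order and in how city is combined with street-address, a case that is not modelled here.

```python
import xml.etree.ElementTree as ET

# Identifier fields from the list above, mapped to their institution-id-type values.
ID_TYPES = {"isni": "ISNI", "ringgold": "Ringgold", "ror": "ROR"}


def aff(entry):
    """Sketch: render one top-level affiliation entry as an <aff> element."""
    el = ET.Element("aff", {"id": "aff-" + str(entry["id"])})
    for key, content_type in (("group", "group"), ("department", "dept")):
        if key in entry:
            inst = ET.SubElement(el, "institution", {"content-type": content_type})
            inst.text = str(entry[key])
    if "organization" in entry:
        # Identifiers go into the wrapper, so they describe the organization itself.
        wrap = ET.SubElement(el, "institution-wrap")
        ET.SubElement(wrap, "institution").text = str(entry["organization"])
        for key, id_type in ID_TYPES.items():
            if key in entry:
                iid = ET.SubElement(wrap, "institution-id", {"institution-id-type": id_type})
                iid.text = str(entry[key])
        for pid in entry.get("pid", []):
            iid = ET.SubElement(wrap, "institution-id", {"institution-id-type": pid["type"]})
            iid.text = str(pid["id"])
    if "street-address" in entry:
        lines = entry["street-address"]
        if isinstance(lines, str):
            lines = [lines]
        ET.SubElement(el, "addr-line").text = ", ".join(lines)
    elif "city" in entry:
        # Only the documented case: without street-address, city gets its own element.
        ET.SubElement(el, "city").text = str(entry["city"])
    if "country" in entry:
        country = ET.SubElement(el, "country")
        country.text = str(entry["country"])
        if "country-code" in entry:
            country.set("country", str(entry["country-code"]))
    return el


def unknown_affiliation_ids(authors, affiliations):
    """Return author affiliation identifiers with no matching top-level entry."""
    known = {str(a["id"]) for a in affiliations}
    return [str(ref) for author in authors
            for ref in author.get("affiliation", [])
            if str(ref) not in known]
```

A non-empty result from unknown_affiliation_ids corresponds to the error case described for the author affiliation field: an identifier that does not occur as the id of any top-level affiliation.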
copyright
Copyright and licensing information. This information is rendered via the <permissions> element.
It is recommended to use the license field (described below) for licensing information. If licensing information is included below copyright, then the variables type, link, and text should always be used together.
statement
Used as the contents of the <copyright-statement> element. Use a list for multiple statements.
year
Used as the contents of the <copyright-year> element. Use a list for multiple copyright years. The JATS documentation states that this field need not be used if the year is included in the copyright statement.
holder
Used as the contents of the <copyright-holder> element. Use a list for multiple copyright holders.
text
Used as the contents of the <license-p> element.
type
Used as the value of the license-type attribute.
link
Used as the value of the xlink:href attribute in the <license> element.
date
publication date. This value should usually be a string representation of a date. Pandoc will parse and deconstruct the date into the components given below. It is also possible to pass these components directly.
The publication date is recorded in the document via the <pub-date> element and its sub-elements. The publication-format attribute is always set to electronic.
iso-8601
ISO-8601 representation of the publication date. Used as the value of the <pub-date> element's iso-8601-date attribute.
This value is set automatically if pandoc can parse the date value as a date.
day, month, year
Day, month, and year of the publication date. Only the publication year is required. The values are used as the contents of the elements with the respective names.
The values are set automatically if pandoc can parse the date value as a date.
type
Used as the value of the date-type attribute on the <pub-date> element; defaults to “pub” if not specified.
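To make the date handling concrete, here is a Python sketch that turns a date value into a <pub-date> element as described above. It is deliberately narrower than pandoc: it only parses ISO strings of the form YYYY-MM-DD with datetime.date.fromisoformat, whereas pandoc accepts more date formats. The function name pub_date is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET


def pub_date(date):
    """Sketch: render a date string or component map as a <pub-date> element."""
    if isinstance(date, str):
        try:
            d = datetime.date.fromisoformat(date)
            # Deconstruct the parsed date into the components listed above.
            date = {"iso-8601": d.isoformat(), "day": d.day, "month": d.month, "year": d.year}
        except ValueError:
            date = {}  # not an ISO date; pandoc itself understands more formats
    el = ET.Element("pub-date", {
        "publication-format": "electronic",      # always electronic
        "date-type": str(date.get("type", "pub")),  # defaults to "pub"
    })
    if "iso-8601" in date:
        el.set("iso-8601-date", str(date["iso-8601"]))
    for key in ("day", "month", "year"):
        if key in date:
            ET.SubElement(el, key).text = str(date[key])
    return el
```

Passing the components directly, for example pub_date({"year": 2020, "type": "collection"}), skips parsing and produces only a <year> child with date-type set to collection.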
article
information concerning the article that identifies or describes it. The key-value pairs within this map are typically used within the <article-meta> element.
publisher-id
Used as the contents of an <article-id> element with attribute pub-id-type set to publisher-id.
doi
Used as the contents of an <article-id> element with attribute pub-id-type set to doi.
pmid
Used as the contents of an <article-id> element with attribute pub-id-type set to pmid.
pmcid
Used as the contents of an <article-id> element with attribute pub-id-type set to pmcid.
art-access-id
Used as the contents of an <article-id> element with attribute pub-id-type set to art-access-id.
heading
Used as the contents of a <subject> element, nested in a <subj-group> element which has heading as its subj-group-type attribute.
categories
Each category is used as the contents of a <subject> element; all of them are grouped in a single <subj-group> element with its subj-group-type attribute set to categories.
author-notes
Additional information about authors, like conflict of interest statements and corresponding author contact info. Wrapped in an <author-notes> element.
conflict
conflict of interest statement, rendered as a footnote (<fn>) of fn-type conflict.
con
Rendered as a footnote (<fn>) of fn-type con.
corresp
contact information of corresponding authors; each entry has the fields id and email. The info is then rendered via a <corresp> element.
funding-statement
Used as the contents of the <funding-statement> element.
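The article fields map onto <article-meta> children in a regular way. The Python sketch below builds the <article-id> and <subj-group> elements and an <author-notes> element. The function names are hypothetical. Giving each <corresp> element the id cor-<ID> is an assumption, chosen so that the rid of an author's cor-id link resolves. Wrapping each footnote text in a <p> follows the JATS content model for <fn>.

```python
import xml.etree.ElementTree as ET

# Keys rendered as <article-id>, in the order listed above.
ARTICLE_ID_KEYS = ("publisher-id", "doi", "pmid", "pmcid", "art-access-id")


def article_meta_parts(article):
    """Sketch: build <article-id> and <subj-group> elements from the article map."""
    parts = []
    for key in ARTICLE_ID_KEYS:
        if key in article:
            el = ET.Element("article-id", {"pub-id-type": key})
            el.text = str(article[key])
            parts.append(el)
    if "heading" in article:
        grp = ET.Element("subj-group", {"subj-group-type": "heading"})
        ET.SubElement(grp, "subject").text = str(article["heading"])
        parts.append(grp)
    if article.get("categories"):
        # All categories share a single subj-group.
        grp = ET.Element("subj-group", {"subj-group-type": "categories"})
        for category in article["categories"]:
            ET.SubElement(grp, "subject").text = str(category)
        parts.append(grp)
    return parts


def author_notes(notes):
    """Sketch: build the <author-notes> element from article.author-notes."""
    el = ET.Element("author-notes")
    corresp = notes.get("corresp", [])
    if isinstance(corresp, dict):
        corresp = [corresp]
    for entry in corresp:
        # Assumed id scheme: matches the rid="cor-<ID>" used by author links.
        c = ET.SubElement(el, "corresp", {"id": f"cor-{entry['id']}"})
        ET.SubElement(c, "email").text = str(entry["email"])
    for key in ("conflict", "con"):
        if key in notes:
            fn = ET.SubElement(el, "fn", {"fn-type": key})
            ET.SubElement(fn, "p").text = str(notes[key])
    return el
```

With this id scheme, an author whose cor-id is 1 links to the <corresp> element with id cor-1.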
journal
information on the journal in which the article is published. This must be a map; the following key/value pairs are recognized.
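publisher-id
Used as the contents of a <journal-id> element with attribute journal-id-type set to publisher-id.
nlm-ta
Used as the contents of a <journal-id> element with attribute journal-id-type set to nlm-ta.
pmc
Used as the contents of a <journal-id> element with attribute journal-id-type set to pmc.
title
Used as the contents of the <journal-title> element.
abbrev-title
Used as the contents of the <abbrev-journal-title> element.
pissn
Used as the contents of an <issn> element with the publication-format attribute set to print.
eissn
Used as the contents of an <issn> element with the publication-format attribute set to electronic.
publisher-name
Used as the contents of the <publisher-name> element.
publisher-loc
Used as the contents of the <publisher-loc> element.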
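As a sketch of how the journal map could become a <journal-meta> element, the following Python function applies the key mappings above. The grouping of title elements in <journal-title-group> and of publisher elements in <publisher> follows the JATS content model. Whether pandoc's own output matches this exactly is an assumption, and the function name journal_meta is hypothetical.

```python
import xml.etree.ElementTree as ET


def journal_meta(journal):
    """Sketch: render the journal metadata map as a <journal-meta> element."""
    if not isinstance(journal, dict):
        raise TypeError("journal must be a map")
    meta = ET.Element("journal-meta")
    for key in ("publisher-id", "nlm-ta", "pmc"):
        if key in journal:
            jid = ET.SubElement(meta, "journal-id", {"journal-id-type": key})
            jid.text = str(journal[key])
    if "title" in journal or "abbrev-title" in journal:
        group = ET.SubElement(meta, "journal-title-group")
        if "title" in journal:
            ET.SubElement(group, "journal-title").text = str(journal["title"])
        if "abbrev-title" in journal:
            ET.SubElement(group, "abbrev-journal-title").text = str(journal["abbrev-title"])
    for key, fmt in (("pissn", "print"), ("eissn", "electronic")):
        if key in journal:
            issn = ET.SubElement(meta, "issn", {"publication-format": fmt})
            issn.text = str(journal[key])
    if "publisher-name" in journal or "publisher-loc" in journal:
        publisher = ET.SubElement(meta, "publisher")
        for key in ("publisher-name", "publisher-loc"):
            if key in journal:
                ET.SubElement(publisher, key).text = str(journal[key])
    return meta
```

The TypeError reflects the requirement that journal must be a map rather than, for example, a plain title string.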
license
Article licensing information. Each item of this field is rendered as a <license> element within the <permissions> element.
Item content should be either a single paragraph, or a map with the fields listed below.
text
Used as the contents of the <license-p> element.
type
Used as the value of the license-type attribute.
link
Used as the value of the xlink:href attribute in the <license> element.
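Both copyright and license end up in the <permissions> element. The Python sketch below combines them, accepting either single values or lists for statement, year, and holder, and either a paragraph string or a map for each license item. The function names are hypothetical. The xlink:href attribute is written with the XLink namespace so that the serialized XML declares the xlink prefix.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def as_list(value):
    """Treat a single value as a one-element list."""
    return value if isinstance(value, list) else [value]


def add_license(parent, item):
    """Append one <license>; item is a paragraph string or a text/type/link map."""
    if isinstance(item, str):
        item = {"text": item}
    lic = ET.SubElement(parent, "license")
    if "type" in item:
        lic.set("license-type", str(item["type"]))
    if "link" in item:
        lic.set(f"{{{XLINK}}}href", str(item["link"]))
    if "text" in item:
        ET.SubElement(lic, "license-p").text = str(item["text"])
    return lic


def permissions(meta):
    """Sketch: build <permissions> from the copyright and license fields."""
    perm = ET.Element("permissions")
    copyright_ = meta.get("copyright", {})
    for key, tag in (("statement", "copyright-statement"),
                     ("year", "copyright-year"),
                     ("holder", "copyright-holder")):
        for value in as_list(copyright_.get(key, [])):
            ET.SubElement(perm, tag).text = str(value)
    # Licensing info below copyright: type, link, and text used together.
    if any(k in copyright_ for k in ("text", "type", "link")):
        add_license(perm, copyright_)
    for item in as_list(meta.get("license", [])):
        add_license(perm, item)
    return perm
```

ET.tostring(permissions(meta), encoding="unicode") returns the element as a string for inspection.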
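notes
Used as the contents of the <notes> element.
subtitle
Used as the contents of the <subtitle> element.
tags
list of keywords. Each tag is used as the contents of a <kwd> element; the elements are grouped in a <kwd-group> with the kwd-group-type value author.
title
Used as the contents of the <article-title> element.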
Pandoc’s handling of org files is similar to that of Emacs org-mode. This document highlights the cases where pandoc's behavior differs, either because matching org-mode is not possible or because it has not been implemented yet.
The following export keywords are supported:
AUTHOR: comma-separated list of author(s); fully supported.
CREATOR: output generator; passed as plain-text metadata entry creator, but not used by any default templates.
DATE: creation or publication date; well supported by pandoc.
EMAIL: author email address; passed as plain-text metadata field email, but not used by any default templates.
LANGUAGE: document language; included as plain-text metadata field lang. The value should be a BCP47 language tag.
SELECT_TAGS: tags which select a tree for export.
EXCLUDE_TAGS: tags which prevent a subtree from being exported. Fully supported.
TITLE: document title; fully supported.
EXPORT_FILE_NAME: target filename; unsupported. The output defaults to stdout unless a target is given as a command-line option.
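Org export keywords are written as #+KEYWORD: value lines. The following Python sketch collects the metadata keywords from the list above into a dictionary using the metadata field names given there: creator, email, lang, and a comma-split author list. It ignores EXPORT_FILE_NAME and the tag keywords, which affect export rather than metadata. It is an illustration of the mapping only, not pandoc's org reader, and the function name org_export_keywords is hypothetical.

```python
import re

# Matches lines such as "#+TITLE: My document".
KEYWORD_LINE = re.compile(r"^#\+([A-Za-z_]+):\s*(.*)$")

# Export keywords mapped to the metadata field names listed above.
FIELDS = {"TITLE": "title", "DATE": "date", "CREATOR": "creator",
          "EMAIL": "email", "LANGUAGE": "lang"}


def org_export_keywords(text):
    """Sketch: collect metadata export keywords from org source text."""
    meta = {}
    for line in text.splitlines():
        match = KEYWORD_LINE.match(line.strip())
        if not match:
            continue
        key, value = match.group(1).upper(), match.group(2).strip()
        if key == "AUTHOR":
            # AUTHOR holds a comma-separated list of authors.
            authors = [a.strip() for a in value.split(",") if a.strip()]
            meta.setdefault("author", []).extend(authors)
        elif key in FIELDS:
            meta[FIELDS[key]] = value
        # Other keywords (e.g. EXPORT_FILE_NAME, SELECT_TAGS) are ignored here.
    return meta
```

For a file whose header contains #+AUTHOR: Ada Lovelace, Charles Babbage, the result holds the list ["Ada Lovelace", "Charles Babbage"] under author.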