LEFT TEXT
Pandoc, the universal document converter, can serve as a good introduction to functional programming with Haskell. For many contributors, including the author of this guide, pandoc was their first real exposure to the language. Despite its impressive size of more than 60,000 lines of Haskell code (excluding the test suite), pandoc is still very approachable due to its modular architecture, which makes it an interesting subject for learning.
This guide exists to help readers navigate the large body of source code, to lay out a path that can be followed for learning, and to explain the underlying concepts.
A basic understanding of Haskell and of pandoc’s functionality is assumed.
Pandoc has a publicly accessible git repository on GitHub: https://github.com/jgm/pandoc. To get a local copy of the source:
git clone https://github.com/jgm/pandoc
The source for the main pandoc program is app/pandoc.hs. The source for the pandoc library is in src/, the source for the tests is in test/, and the source for the benchmarks is in benchmark/.
Pandoc has long supported filters, which allow the pandoc abstract syntax tree (AST) to be manipulated between the parsing and the writing phase. Traditional pandoc filters accept a JSON representation of the pandoc AST and produce an altered JSON representation of the AST. They may be written in any programming language, and invoked from pandoc using the --filter option.
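As an illustration of the JSON round trip described above, here is a minimal sketch of a traditional filter in Python that upper-cases ordinary text. It assumes only the JSON shape that pandoc emits, in which every AST element is an object with a "t" (type) key and usually a "c" (content) key. The names upcase_strings and apply_filter are made up for this example; they are not part of pandoc or of any filter library.

```python
#!/usr/bin/env python3
"""Sketch of a traditional JSON filter: upper-case all Str elements."""
import json
import sys


def upcase_strings(node):
    """Return a copy of an AST fragment with every Str element upper-cased."""
    if isinstance(node, list):
        return [upcase_strings(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            # Str carries its text directly in "c".
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upcase_strings(value) for key, value in node.items()}
    # Plain strings, numbers, etc. (e.g. the text of a Code element) are kept.
    return node


def apply_filter(doc):
    """Return a new document whose body text is upper-cased; metadata is untouched."""
    return {**doc, "blocks": upcase_strings(doc["blocks"])}


if __name__ == "__main__":
    source = sys.stdin.read()
    if source.strip():  # do nothing when no document is piped in
        json.dump(apply_filter(json.loads(source)), sys.stdout)
```

Saved under a hypothetical name such as upcase.py and made executable, the script could be invoked with pandoc --filter ./upcase.py. Real filters often use a helper library for walking the AST; the hand-written recursion here only keeps the sketch free of dependencies.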
Although traditional filters are very flexible, they have a couple of disadvantages. First, there is some overhead in writing JSON to stdout and reading it from stdin (twice, once on each side of the filter). Second, whether a filter will work will depend on details of the user’s environment. A filter may require an interpreter for a certain programming language to be available, as well as a library for manipulating the pandoc AST in JSON form. One cannot simply provide a filter that can be used by anyone who has a certain version of the pandoc executable.
Starting with version 2.0, pandoc makes it possible to write filters in Lua without any external dependencies at all. A Lua interpreter (version 5.4) and a Lua library for creating pandoc filters are built into the pandoc executable. Pandoc data types are marshaled to Lua directly, avoiding the overhead of writing JSON to stdout and reading it from stdin.
This document describes pandoc’s handling of JATS.
abstract: … <abstract> element.
author: … surname value, then the item will be used as the contributor’s string-name.
orcid: …
surname: … <surname>.
given-names: … <given-names>.
name: … author.surname is not available. Tagged with <string-name>.
email: … <email> element.
affiliation: … affiliation, or a list of affiliation identifiers. The identifiers link to the organizations with which an author is affiliated. Each identifier in this list must also occur as the id of an affiliation listed in the top-level affiliation list. If the top-level affiliation field is set, then this entry is assumed to be a list of identifiers, and a list of full entries if that field is unset. Full entries must be given if the articleauthoring tag set is used, as affiliation links are not allowed in that schema.
roles: … <role> element to the author’s <contrib> element. The following examples illustrate:
An ad-hoc role:
A role specified with CRediT.
The credit-name is automatically looked up from the CRediT taxonomy, but you can also specify it yourself:
A role specified with CRediT, including an optional degree of contribution. Note that specifying the degree is only allowed when using CRediT roles, not ad-hoc roles.
A role specified with CRediT with a label override, useful for internationalization:
The value for credit and credit-name must be from one of the 14 terms from the Contribution Role Taxonomy (CRediT):

credit                  credit-name
----------------------  ----------------------------
conceptualization       Conceptualization
data-curation           Data curation
formal-analysis         Formal analysis
funding-acquisition     Funding acquisition
investigation           Investigation
methodology             Methodology
project-administration  Project administration
resources               Resources
software                Software
supervision             Supervision
validation              Validation
visualization           Visualization
writing-original-draft  Writing – original draft
writing-review-editing  Writing – review & editing
For <degree-contribution>, JATS suggests using one of the following three values when specifying the degree of contribution: Lead, Equal, Supporting.
equal-contrib: … equal-contrib attribute, set to yes, is added to the author’s <contrib> element if this is set to a truthy value.
cor-id: … article.author-notes.corresp. If the cor-id value is set, then an <xref> link of ref-type corresp is added. The rid attribute is set to cor-<ID>, where <ID> is the stringified value of this attribute. Furthermore, the corresp attribute on the author’s <contrib> element is set to yes if this attribute is set to a truthy value.
affiliation: … <aff> element to the author’s contrib-group. The fields are given in the order in which they are included in the output.
id: … <aff> element’s id value, prefixed with aff-.
group: … <institution> element with content-type set to group.
department: … <institution> element with content-type set to dept.
organization: … <institution> element. The institution element is wrapped in an <institution-wrap> element; any identifiers, like ringgold or ror, are added to the wrapper and must hence belong to this organization (not the department or group).
isni: … <institution-id> element with institution-id-type set to ISNI.
ringgold: … <institution-id> element with institution-id-type set to Ringgold.
ror: … <institution-id> element with institution-id-type set to ROR.
pid: … <institution-id> elements. Each item must contain a map with keys type, used as institution-id-type, and id, used as element content.
street-address: … <addr-line> element, separated by a comma and space (", ").
city: … street-address is not given, in which case the value is wrapped in a <city> element.
country: … <country> element.
country-code: … country attribute in element <country> (if the latter is present).
copyright: … <permissions> element. It is recommended to use the license field (described below) for licensing information. If licensing information is included below copyright, then the variables type, link, and text should always be used together.
statement: … <copyright-statement>. Use a list for multiple statements.
year: … <copyright-year>. Use a list for multiple copyright years. The JATS documentation states that this field need not be used if the year is included in the copyright statement.
holder: … <copyright-holder> element. Use a list for multiple copyright holders.
text: … <license-p> element.
type: … license-type attribute.
link: … xlink:href attribute in the <license> element.
date: … <pub-date> element and its sub-elements. The publication-format attribute is always set to electronic.
iso-8601: … <pub-date> element’s iso-8601-date attribute. This value is set automatically if pandoc can parse the date value as a date.
day, month, year: … date value as a date.
type: … date-type attribute on the <pub-date> element and defaults to “pub” if not specified.
article: … <article-meta> element.
publisher-id: … <article-id> element with attribute pub-id-type set to publisher-id.
doi: … <article-id> element with attribute pub-id-type set to doi.
pmid: … <article-id> element with attribute pub-id-type set to pmid.
pmcid: … <article-id> element with attribute pub-id-type set to pmcid.
art-access-id: … <article-id> element with attribute pub-id-type set to art-access-id.
heading: … <subject> element, nested in a <subj-group> element which has heading as its subj-group-type attribute.
categories: … <subject> element, grouped in a single <subj-group> element with its subj-group-type attribute set to categories.
author-notes: … <author-notes> element.
conflict: … <fn> of fn-type conflict.
con: … <fn> of fn-type con.
corresp: … id and email. The info is then rendered via a <corresp> element.
funding-statement: … funding-statement element.
journal: …
publisher-id: … <journal-id> with attribute journal-id-type set to publisher-id.
nlm-ta: … <journal-id> with attribute journal-id-type set to nlm-ta.
pmc: … <journal-id> with attribute journal-id-type set to pmc.
title: … <journal-title> element.
abbrev-title: … <abbrev-journal-title> element.
pissn: … <issn> element with the publication-format attribute set to print.
eissn: … <issn> element with the publication-format attribute set to electronic.
publisher-name: … <publisher-name> element.
publisher-loc: … <publisher-loc> element.
license: … <license> element within the <permissions> element. Item content should be either a single paragraph, or a map with the fields listed below.
text: … <license-p> element.
type: … license-type attribute.
link: … xlink:href attribute in the <license> element.
notes: … <notes> element.
subtitle: … <subtitle> element.
tags: … <kwd> element; the elements are grouped in a <kwd-group> with the kwd-group-type value author.
title: … <article-title> element.
supplementary-material: … <supplementary-material> element. Only available with jats_articlepublishing.
floats-group: … <floats-group> element. Only available with jats_publishing and jats_archiving.
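Two of the rules stated in the field descriptions lend themselves to a quick pre-flight check: affiliation identifiers used by an author must occur in the top-level affiliation list, and a degree of contribution is only allowed together with a CRediT role. The following is a rough sketch of such a check, not pandoc's own validation. It assumes the metadata has already been loaded into plain Python dictionaries and lists (for example from a YAML block), that the degree is stored under a key named degree, and that ad-hoc roles appear as plain strings; all three are assumptions made for this example. The function name check_metadata is hypothetical.

```python
# The 14 CRediT identifiers from the "credit" column of the table.
CREDIT_TERMS = {
    "conceptualization", "data-curation", "formal-analysis",
    "funding-acquisition", "investigation", "methodology",
    "project-administration", "resources", "software", "supervision",
    "validation", "visualization", "writing-original-draft",
    "writing-review-editing",
}


def check_metadata(meta):
    """Return a list of human-readable problems found in `meta`."""
    problems = []
    # Identifiers declared in the top-level affiliation list.
    known_ids = {
        str(aff["id"]) for aff in meta.get("affiliation", []) if "id" in aff
    }

    for author in meta.get("author", []):
        label = author.get("surname") or author.get("name") or "<unnamed>"

        # With a top-level affiliation list, author entries are identifiers.
        if "affiliation" in meta:
            refs = author.get("affiliation", [])
            if not isinstance(refs, list):
                refs = [refs]
            for ref in refs:
                if str(ref) not in known_ids:
                    problems.append(f"{label}: unknown affiliation id {ref!r}")

        for role in author.get("roles", []):
            if not isinstance(role, dict):
                continue  # ad-hoc role given as plain text (assumed shape)
            credit = role.get("credit")
            if credit is not None and credit not in CREDIT_TERMS:
                problems.append(f"{label}: {credit!r} is not a CRediT term")
            # "degree" is an assumed key name for the degree of contribution.
            if "degree" in role and credit is None:
                problems.append(f"{label}: degree requires a CRediT role")
    return problems
```

An empty result means that none of the checked rules were violated; it says nothing about whether the resulting JATS output validates against a given tag set.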
Pandoc’s handling of org files is similar to that of Emacs org-mode. This document aims to highlight the cases where this is not possible or just not the case yet.
The following export keywords are supported. (Because they populate metadata fields, they will not generally affect the output unless you use the -s/--standalone option to generate a standalone document with metadata.)
AUTHOR: comma-separated list of author(s); fully supported.
CREATOR: output generator; passed as plain-text metadata entry creator, but not used by any default templates.
DATE: creation or publication date; well supported by pandoc.
EMAIL: author email address; passed as plain-text metadata field email, but not used by any default templates.
LANGUAGE: document language; included as plain-text metadata field lang. The value should be a BCP47 language tag.
SELECT_TAGS: tags which select a tree for export.
EXCLUDE_TAGS: tags which prevent a subtree from being exported. Fully supported.
TITLE: document title; fully supported.
EXPORT_FILE_NAME: target filename; unsupported. The output defaults to stdout unless a target is given as a command-line option.
RIGHT TEXT
| Albert Krewinkel | 4 |
| Author 1 | 30 |
| Author 2 | 30 |
| Author 3 | 30 |
| Gary B. Genett | 36 |
| Gordon Woodhull | 1 |
| John MacFarlane | 10 |
| Mauro Bieg | 1 |
| massifrg@gmail.com | 1 |