Introduction

Pandoc has long supported filters, which allow the pandoc abstract syntax tree (AST) to be manipulated between the parsing and the writing phase. Traditional pandoc filters accept a JSON representation of the pandoc AST and produce an altered JSON representation of the AST. They may be written in any programming language, and invoked from pandoc using the --filter option.
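As an illustration of the JSON round trip described above, here is a minimal sketch of a traditional filter in Python that uses only the standard library. It assumes the JSON AST encodes each element as an object with a "t" (type) key and an optional "c" (contents) key; the helper names walk, upper_str, and run_filter are made up for this example and are not part of any pandoc library.

```python
#!/usr/bin/env python3
import json
import sys


def walk(node, action):
    """Recursively apply `action` to every AST element (a dict with a "t" key)."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            node = action(node)
        return {key: walk(value, action) for key, value in node.items()}
    return node


def upper_str(element):
    """Uppercase the text of Str elements; leave all other elements unchanged."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return element


def run_filter(in_stream, out_stream):
    """Read a JSON AST from in_stream and write the altered AST to out_stream."""
    source = in_stream.read()
    if not source.strip():
        return  # empty input: nothing to transform
    json.dump(walk(json.loads(source), upper_str), out_stream)


if __name__ == "__main__":
    run_filter(sys.stdin, sys.stdout)
```

Saved as an executable script (the name upper.py is hypothetical), this could be passed to pandoc with --filter ./upper.py. The sketch also shows the dependency problem discussed next: it only works where a Python interpreter is installed.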

Although traditional filters are very flexible, they have a couple of disadvantages. First, there is some overhead in writing JSON to stdout and reading it from stdin (twice, once on each side of the filter). Second, whether a filter will work will depend on details of the user’s environment. A filter may require an interpreter for a certain programming language to be available, as well as a library for manipulating the pandoc AST in JSON form. One cannot simply provide a filter that can be used by anyone who has a certain version of the pandoc executable.

Starting with version 2.0, pandoc makes it possible to write filters in Lua without any external dependencies at all. A Lua interpreter (version 5.3) and a Lua library for creating pandoc filters are built into the pandoc executable. Pandoc data types are marshaled to Lua directly, avoiding the overhead of writing JSON to stdout and reading it from stdin.

[…] (link to full page)

Pandoc, the universal document converter, can serve as a nice introduction to functional programming with Haskell. For many contributors, including the author of this guide, pandoc was their first real exposure to this language. Despite its impressive size of more than 60,000 lines of Haskell code (excluding the test suite), pandoc is still very approachable due to its modular architecture. It can serve as an interesting subject for learning.

This guide exists to help readers navigate the large body of source code, to lay out a path that can be followed for learning, and to explain the underlying concepts.

A basic understanding of Haskell and of pandoc’s functionality is assumed.

Getting the code

Pandoc has a publicly accessible git repository on GitHub: https://github.com/jgm/pandoc. To get a local copy of the source:

git clone https://github.com/jgm/pandoc

The source for the main pandoc program is app/pandoc.hs. The source for the pandoc library is in src/, the source for the tests is in test/, and the source for the benchmarks is in benchmark/.

[…] (link to full page)

This document describes pandoc’s handling of JATS.

Metadata Values

abstract
Article summary. Added to the document’s front matter via the <abstract> element.
author

list of article contributors. Each author should have a surname and a given name listed in the entry; if the author has no surname value, then the item will be used as the contributor’s string-name.

orcid
the contributor’s ORCID identifier.
surname

surname of the contributor. Usually the family name in western names.

See <surname>.

given-names

personal names of the contributor; this includes middle names (if any) in western-style names.

See <given-names>.

name
full name of the author; included only as a fallback if author.surname is not available. Tagged with <string-name>.
email

the contributor’s email address.

Used as the contents of the <email> element.

affiliation

either full affiliation entries as described in field affiliation, or a list of affiliation identifiers.

The identifiers link to the organizations with which an author is affiliated. Each identifier in this list must also occur as the id of an affiliation listed in the top-level affiliation list.

If the top-level affiliation field is set, then this entry is assumed to be a list of identifiers; if that field is unset, it is assumed to be a list of full entries.

Full entries must be given if the articleauthoring tag set is used, as affiliation links are not allowed in that schema.

equal-contrib
boolean attribute used to mark authors who contributed equally to the work. The equal-contrib attribute, set to yes, is added to the author’s <contrib> element if this is set to a truthy value.
cor-id
identifier linking to the contributor’s correspondence information. The info itself must be stored as an item in article.author-notes.corresp. If the cor-id value is set, then an <xref> link of ref-type corresp is added. The rid attribute is set to cor-<ID>, where <ID> is the stringified value of this attribute.
affiliation

the list of organizations with which contributors are affiliated. Each institution is added as an <aff> element to the author’s contrib-group.

The fields are given in the order in which they are included in the output.

id
internal identifier; used as the <aff> element’s id value, prefixed with aff-.
group
name of the research group or other low-level organizational structure; used as value of an <institution> element with content-type set to group.
department
name of the department or other mid-level organizational structure; used as value of an <institution> element with content-type set to dept.
organization
name of the company, university, or other top-level organizational structure; used as value of an <institution> element. The institution element is wrapped in an <institution-wrap> element; any identifiers, like ringgold or ror, are added to the wrapper and must hence belong to this organization (not the department or group).
isni
International Standard Name Identifier of the organization. Added via an <institution-id> element with institution-id-type set to ISNI.
ringgold
Ringgold identifier of the organization. Added via an <institution-id> element with institution-id-type set to Ringgold.
ror
Research Organization Registry identifier of the organization. Added via an <institution-id> element with institution-id-type set to ROR.
pid
Array of persistent identifiers which are added as <institution-id> elements. Each item must contain a map with keys type, used as institution-id-type, and id, used as element content.
street-address
The organization’s street address; each list item is wrapped in an <addr-line> element, and the elements are separated by a comma and a space.
city
City in which the organization is located; used only if street-address is not given, in which case the value is wrapped in a <city> element.
country
Country in which the organization is located; used as the value of a <country> element.
country-code
Two-letter ISO-3166-1 country identifier; used as the country attribute of the <country> element (if the latter is present).
copyright

Copyright and licensing information. This information is rendered via the <permissions> element.

It is recommended to use the license field (described below) for licensing information. If licensing information is included below copyright, then the variables type, link, and text should always be used together.

statement
copyright notice or statement; used as content of the <copyright-statement> element. Use a list for multiple statements.
year
the year of copyright; used as content of the <copyright-year> element. Use a list for multiple copyright years. The JATS documentation states that this field need not be used if the year is included in the copyright statement.
holder
the copyright holder; included via the <copyright-holder> element. Use a list for multiple copyright holders.
text
inline text setting the license under which the text is published; included via the <license-p> element.
type
type of the license; used as value of the license-type attribute.
link
external link describing the license; used as value of an xlink:href attribute in the <license> element.
date

publication date. This value should usually be a string representation of a date. Pandoc will parse and deconstruct the date into the components given below. It is also possible to pass these components directly.

The publication date is recorded in the document via the <pub-date> element and its sub-elements. The publication-format attribute is always set to electronic.

iso-8601

ISO-8601 representation of the publication date. Used as the value of the <pub-date> element’s iso-8601-date attribute.

This value is set automatically if pandoc can parse the date value as a date.

day, month, year

Day, month, and year of the publication date. Only the publication year is required. The values are used as the contents of the elements with the respective names.

The values are set automatically if pandoc can parse the date value as a date.

type
The type of event marked by this date. The value is set as the date-type attribute on the <pub-date> element and defaults to “pub” if not specified.
article

information concerning the article that identifies or describes it. The key-value pairs within this map are typically used within the <article-meta> element.

publisher-id
external article identifier assigned by the publisher. Used as the content of the <article-id> element with attribute pub-id-type set to publisher-id.
doi
Digital Object Identifier (DOI) assigned to the article. Used as the content of the <article-id> element with attribute pub-id-type set to doi.
pmid
PubMed Identifier (PubMed ID) assigned to the article. Used as the content of the <article-id> element with attribute pub-id-type set to pmid.
pmcid
PubMed Central Identifier assigned to the article. Used as the content of the <article-id> element with attribute pub-id-type set to pmcid.
art-access-id
generic article accession identifier. Used as the content of the <article-id> element with attribute pub-id-type set to art-access-id.
heading
name of a subject or topic describing the article. Used as the content of the <subject> element, nested in a <subj-group> element which has heading as its subj-group-type attribute.
categories
list of subjects or topics describing the article. Each item is used as the content of a <subject> element; the elements are grouped in a single <subj-group> element with its subj-group-type attribute set to categories.
author-notes

Additional information about authors, like conflict of interest statements and corresponding author contact info. Wrapped in an <author-notes> element.

conflict
Conflict of interest statement. Rendered as a footnote (<fn>) of fn-type conflict.
con
Contributed-by information. Rendered as a footnote (<fn>) of fn-type con.
corresp
Correspondence information. This must be a list of contributor correspondence items, where each item must have the properties id and email. The info is then rendered via a <corresp> element.
funding-statement
Prose describing the funding. Added to the article’s front matter via the <funding-statement> element.
journal

information on the journal in which the article is published. This must be a map; the following key/value pairs are recognized.

publisher-id
journal identifier assigned by the publisher. Used as content of element <journal-id> with attribute journal-id-type set to publisher-id.
nlm-ta
journal identifier assigned by PubMed. Used as content of element <journal-id> with attribute journal-id-type set to nlm-ta.
pmc
journal identifier assigned by PubMed Central. Used as content of element <journal-id> with attribute journal-id-type set to pmc.
title
full title of the journal in which the article is published. Used as content of the <journal-title> element.
abbrev-title
short form of the journal title. Used as content of the <abbrev-journal-title> element.
pissn
ISSN identifier of the publication’s print version. Used as content of the <issn> element with the publication-format attribute set to print.
eissn
ISSN identifier of the publication’s electronic version. Used as content of the <issn> element with the publication-format attribute set to electronic.
publisher-name
name of the publishing entity (person, company, or other). Used as the content of the <publisher-name> element.
publisher-loc
place of publication. Used as the content of the <publisher-loc> element.
license

Article licensing information. Each item of this field is rendered as a <license> element within the <permissions> element.

Item content should be either a single paragraph, or a map with the fields listed below.

text
inline text describing a license under which the text is published; included via the <license-p> element.
type
type of the license; used as value of the license-type attribute.
link
external link describing the license; used as value of an xlink:href attribute in the <license> element.
notes
Additional notes concerning the whole article. Added to the article’s front matter via the <notes> element.
subtitle
Subordinate part of the document title. Added to the document’s front matter as a <subtitle> element.
tags
list of keywords. Items are used as contents of the <kwd> element; the elements are grouped in a <kwd-group> with the kwd-group-type value author.
title
The article title. Added to the document’s front matter via the <article-title> element.
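To make the date entry above more concrete, the following Python sketch deconstructs a date string into its components and assembles a <pub-date> element with the attributes named in that entry. It is an illustration of the described mapping, not pandoc's implementation: the function name pub_date_element is hypothetical, only ISO-formatted input (YYYY-MM-DD) is handled here, and whether pandoc zero-pads the day and month values is not stated in this document, so the sketch leaves them unpadded.

```python
import datetime
import xml.etree.ElementTree as ET


def pub_date_element(date_string, date_type="pub"):
    """Build a <pub-date> element from an ISO-formatted date string.

    Assumption: only YYYY-MM-DD input is supported in this sketch.
    """
    parsed = datetime.date.fromisoformat(date_string)
    pub_date = ET.Element("pub-date", {
        "publication-format": "electronic",  # always electronic, per the text
        "date-type": date_type,              # defaults to "pub"
        "iso-8601-date": parsed.isoformat(),
    })
    # Each component becomes the content of the element with the same name.
    for name, value in (("day", parsed.day),
                        ("month", parsed.month),
                        ("year", parsed.year)):
        ET.SubElement(pub_date, name).text = str(value)
    return pub_date


print(ET.tostring(pub_date_element("2021-03-04"), encoding="unicode"))
```

Passing a different second argument, for example pub_date_element("2021-03-04", "collection"), would correspond to setting the type field; the value "collection" is only an example.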

[…] (link to full page)

Pandoc’s handling of org files is similar to that of Emacs org-mode. This document aims to highlight the cases where this is not possible or just not the case yet.

Export options

The following export keywords are supported:

  • AUTHOR: comma-separated list of author(s); fully supported.

  • CREATOR: output generator; passed as plain-text metadata entry creator, but not used by any default templates.

  • DATE: creation or publication date; well supported by pandoc.

  • EMAIL: author email address; passed as plain-text metadata field email, but not used by any default templates.

  • LANGUAGE: document language; included as plain-text metadata field lang. The value should be a BCP47 language tag.

  • SELECT_TAGS: tags which select a tree for export.

  • EXCLUDE_TAGS: tags which prevent a subtree from being exported. Fully supported.

  • TITLE: document title; fully supported.

  • EXPORT_FILE_NAME: target filename; unsupported. The output goes to stdout unless a target is given as a command line option.
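The keyword handling listed above can be pictured with a small Python sketch that collects export keywords from Org text into a metadata dictionary. This is a simplified model and not pandoc's Org reader: the function export_metadata and the table FIELD_NAMES are hypothetical, the field names for AUTHOR, DATE, and TITLE are assumed to be author, date, and title, and the tag-related keywords are left out because they affect tree selection rather than metadata.

```python
import re

# Keyword-to-metadata-field mapping; creator, email, and lang come from the
# list above, the remaining names are assumptions made for this sketch.
FIELD_NAMES = {
    "AUTHOR": "author",
    "CREATOR": "creator",
    "DATE": "date",
    "EMAIL": "email",
    "LANGUAGE": "lang",
    "TITLE": "title",
}

KEYWORD_LINE = re.compile(r"^#\+(\w+):\s*(.*)$")


def export_metadata(org_text):
    """Collect supported export keywords from Org text into a dict."""
    meta = {}
    for line in org_text.splitlines():
        match = KEYWORD_LINE.match(line.strip())
        if not match:
            continue
        keyword, value = match.group(1).upper(), match.group(2).strip()
        field = FIELD_NAMES.get(keyword)
        if field is None:
            continue  # e.g. EXPORT_FILE_NAME is ignored
        if keyword == "AUTHOR":
            # AUTHOR is a comma-separated list of authors.
            meta[field] = [name.strip() for name in value.split(",") if name.strip()]
        else:
            meta[field] = value
    return meta


sample = "#+TITLE: Notes\n#+AUTHOR: Ada Lovelace, Alan Turing\n#+LANGUAGE: en-GB\n"
print(export_metadata(sample))
```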

[…] (link to full page)
