EXMARaLDA (Dulko) is a set of tools for the EXMARaLDA Partitur-Editor with transformation scenarios (actually, XSLT 2.0 stylesheets) for the annotation of learner data in learner corpora, supporting tokenisation, part-of-speech tagging, lemmatisation, sentence-span computation, editing target hypotheses, detection of differences between target hypotheses and the learner text, error analysis, and metadata management (Hirschmann and Nolda 2019, Nolda 2019). It has been developed for the Dulko learner-corpus project at the University of Szeged.
This repository provides the sources of EXMARaLDA (Dulko). The latest release is
available as a ZIP archive which contains, in particular, an executable for
Microsoft Windows (exmaralda-dulko.exe) as well as start-up scripts for MacOS
(exmaralda-dulko.command) and Linux.
Unless already installed, install a Java runtime environment (JRE) or Java development kit (JDK), e.g. Oracle Java[^1] or Amazon Corretto; on Linux, you can also use OpenJDK. Note that currently, Java version 8 is required.
Unless already installed, download TreeTagger and install it into some directory <DIR1>.
On Microsoft Windows, extract the downloaded ZIP archive into
C:\Program Files\TreeTagger or another directory. Note this directory for
On MacOS, there should be a directory called
or similar in the Downloads folder. Drag this directory into the
Applications folder or onto the desktop and rename it to
On Linux, extract the downloaded TAR.GZ archive into
After the installation, there should be a directory
<DIR1>/bin with the
tree-tagger.exe on Microsoft Windows or
tree-tagger on MacOS and
Create a subdirectory
lib in the TreeTagger directory
Download the German parameter file for TreeTagger.
On Microsoft Windows, uncompress this GZ file with
7-Zip or another tool, rename it to
german-utf8.par and copy or move this file to
On MacOS, there should be a file called
german.par in the Downloads
folder. Rename it to
german-utf8.par and drag it into
On Linux, uncompress the GZ file, rename it to
german-utf8.par, and copy
or move it into
Unless already installed, download the
release version of EXMARaLDA (1.6.1)
corresponding to your system and install it into some directory
Note that EXMARaLDA (Dulko) no longer works with older versions of EXMARaLDA.
On Microsoft Windows, it is recommended to use the default path for
If you are running MacOS and have Oracle Java installed on your system, you
only need the
Partitur-Editor disk image for Oracle Java,
which you can install by dragging the icon called
PartiturEditor_OJ.dmg into the
Applications folder or onto the desktop.
On Linux, install EXMARaLDA into
and install it into some directory
On Microsoft Windows, you can use the setup program
exmaralda-dulko-<VERSION>-setup.exe for this task; it is included in the
downloaded ZIP archive. Note that on this system the installation directory
<DIR3> must be a sibling directory of <DIR2>. This is the setup program’s
default (typically C:\Program Files\EXMARaLDA (Dulko)).
On MacOS, there should be a directory called
the Downloads folder. While you may run EXMARaLDA (Dulko) from there, it is
recommended to drag the directory into the
Applications folder or onto the
desktop and rename it to
On Linux, extract the ZIP archive to
$HOME/exmaralda-dulko, or another directory of your choice.
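The installation steps above can be sanity-checked with a short script. The following Python sketch is not part of EXMARaLDA (Dulko), and its function names are hypothetical. It assumes that `java -version` reports Java 8 with a version string beginning with `1.8`, and that TreeTagger has the layout described above (`bin/tree-tagger` or `bin/tree-tagger.exe`, plus `lib/german-utf8.par`).

```python
import re
from pathlib import Path
from typing import List, Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from the text printed by `java -version`.

    Java 8 and earlier report versions such as "1.8.0_292"; later
    releases report "11.0.2", "17", and so on.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Old-style "1.x" version strings carry the major version in second place.
    if first == 1 and second is not None:
        return int(second)
    return first


def missing_treetagger_files(treetagger_dir: Path, windows: bool = False) -> List[Path]:
    """List the files expected under the TreeTagger directory that are absent."""
    executable = "tree-tagger.exe" if windows else "tree-tagger"
    expected = [
        treetagger_dir / "bin" / executable,
        treetagger_dir / "lib" / "german-utf8.par",
    ]
    return [path for path in expected if not path.is_file()]
```

Note that `java -version` prints to standard error, so the text passed to `java_major_version` would have to be captured from there. An empty list returned by `missing_treetagger_files` means that both expected files were found.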
On Microsoft Windows, search for
SystemPropertiesAdvanced and create a
system environment variable with the name
TREETAGGER_HOME and the path
to the TreeTagger directory
<DIR1> which you noted during the installation
of the TreeTagger.
On MacOS and Linux, the environment variable
TREETAGGER_HOME is set by the
<DIR3> (unless already set by the environment). If you have installed
TreeTagger into one of the directories recommended in the installation
instructions above, nothing needs to be done. If you have installed it into
a non-standard directory
<DIR1>, open the start-up script with a text
editor and set the variable
If you have installed EXMARaLDA into a non-standard directory
MacOS or Linux, set the variable
EXMARALDADIR in the start-up script
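The precedence described above for TREETAGGER_HOME (a value from the environment wins; otherwise the start-up script's own setting applies) can be illustrated as follows. This is a minimal Python sketch, not the start-up script itself. The function name and the example paths are hypothetical, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_treetagger_home(default_dir: str,
                            environ: Optional[Mapping[str, str]] = None) -> str:
    """Return TREETAGGER_HOME from the environment, else the given default."""
    if environ is None:
        environ = os.environ
    value = environ.get("TREETAGGER_HOME", "")
    # Assumption: an empty value counts as "not set".
    return value if value else default_dir


# Hypothetical paths, for illustration only.
print(resolve_treetagger_home("/opt/treetagger", {}))
print(resolve_treetagger_home("/opt/treetagger", {"TREETAGGER_HOME": "/srv/tt"}))
```

With the hypothetical inputs above, the first call falls back to the default directory and the second returns the value taken from the environment.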
Run EXMARaLDA (Dulko).
In order to run EXMARaLDA (Dulko) on Microsoft Windows, click on the
EXMARaLDA (Dulko) icon on the desktop or run it from the EXMARaLDA submenu
in the start menu.
On MacOS, run the start-up script
the script cannot be run with a double click, right-click on it and open it
with the terminal.
On Linux, run the start-up script
<DIR3>. If you
export PATH=<DIR3>:$PATH to
copy or move the desktop file
can also run EXMARaLDA (Dulko) from your desktop’s application menu.
Open the annotation panel (‘View’ > ‘Annotation panel’) and open the file
Optionally, open the preferences (‘Edit’ > ‘Preferences’), switch to the
‘Stylesheets’ tab, and set the ‘Transcription to format table’ stylesheet to
<DIR3>/dulko.template.exb in EXMARaLDA (Dulko) (‘File’ > ‘Open’) and
save it under a new name (‘File’ > ‘Save as’).[^2]
Open the metainformation dialog (‘Transcription’ > ‘Metainformation’) and edit general metadata.
Open the speakertable (‘Transcription’ > ‘Speakertable’) and edit the speaker metadata.[^3]
On the main window, write or paste the learner text into one or several cells of the first tier. You can also first work on a proper part of the learner text (e.g. the first sentence) and add further parts later on.
Apply the transformation scenario ‘Dulko: word-Spur (Lernertext)’ (‘Transcription’ > ‘Transformation’), which tokenises the learner text and normalises punctuation marks.
If you want to annotate editorial changes by the learner, apply the
transformation scenario ‘Dulko: orig-Spur (Lernertext)’, which adds a tier
for the original, unchanged, learner text. When editing this tier, you can
use the symbols
_ for marking paragraph breaks, line
breaks, hyphenations, and omissions, respectively.[^4]
Apply the transformation scenario ‘Dulko: S-, pos- und lemma-Spuren (Lernertext)’ for part-of-speech tagging, lemmatisation, and sentence-span identification of the learner text.[^5]
If you have added a tier for the original learner text in step 6, apply the transformation scenario ‘Dulko: Diff-Spur (Lernertext)’, which detects editorial changes.
If you have used some of the symbols
_, mentioned above
in step 6, on the tier for the original learner text, apply the
transformation scenario ‘Dulko: Layout-Spur (Lernertext)’, which
automatically tags those symbols.
Optionally, apply the transformation scenario ‘Dulko: Graph-Spur (Lernertext)’, which adds a tier on which you can tag graphical renditions of the learner text by means of the annotation panel.
Apply the transformation scenario ‘Dulko: trans-Spur (Lernertext)’ in case the learner text is a translation. Write or paste the text translated by the learner into the cells of the new tier.
Apply the transformation scenario ‘Dulko: ZH- und Fehler-Spuren (1. Zielhypothese)’, which adds tiers for a target hypothesis and for error analysis. Edit the target hypothesis, and tag errors by means of the annotation panel.
Apply the transformation scenario ‘Dulko: ZHS-, ZHpos- und ZHlemma-Spuren (1. Zielhypothese)’ for part-of-speech tagging, lemmatisation, and sentence-span identification of the target hypothesis.
Finally, apply the transformation scenario ‘Dulko: ZHDiff-Spur (1. Zielhypothese)’, which detects differences between the target hypothesis and the learner text.[^6]
In order to annotate further target hypotheses, apply the transformation scenarios for ‘2. Zielhypothese’, ‘3. Zielhypothese’, or ‘weitere Zielhypothese’. These transformation scenarios do not operate on the learner text but on the preceding target hypothesis.
Note that you can re-apply any of the above transformation scenarios in case you want to update the corresponding tiers, e.g. in order to revise the annotations or annotate further parts of the learner text.[^7]
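The sentence-span identification performed by the ‘S-, pos- und lemma-Spuren’ scenarios can be pictured with a small example. The Python sketch below is only an illustration, not the XSLT stylesheet shipped with EXMARaLDA (Dulko). It handles just the simplest case, in which a span ends at a token that TreeTagger tags as `$.`, and it ignores the abbreviation heuristic. The function name and the `(token, tag)` input format are assumptions.

```python
from typing import List, Tuple


def sentence_spans(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs, with end inclusive."""
    spans = []
    start = 0
    for index, (_token, tag) in enumerate(tagged):
        if tag == "$.":  # sentence-final punctuation closes the current span
            spans.append((start, index))
            start = index + 1
    if start < len(tagged):
        # Trailing tokens without final punctuation form a last span.
        spans.append((start, len(tagged) - 1))
    return spans


tagged_tokens = [("Ich", "PPER"), ("gehe", "VVFIN"), (".", "$."),
                 ("Du", "PPER"), ("auch", "ADV")]
print(sentence_spans(tagged_tokens))  # [(0, 2), (3, 4)]
```

Spans that end differently would still have to be split by hand, as in the editor itself.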
If required, additional timeline items can be inserted by clicking on the next timeline item and choosing ‘Timeline’ > ‘Insert timeline item’. The transformation scenario ‘Dulko: Zeitachse’, in turn, removes unused timeline items.
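The difference detection performed by the ‘ZHDiff-Spur’ scenarios can be approximated at the token level with a generic sequence comparison. The Python sketch below uses the standard `difflib` module and is an assumption-laden illustration, not the algorithm of the Dulko stylesheet. It does not detect movements, and the labels `replace`, `delete`, and `insert` are those of `difflib`, not the tags used on the Dulko difference tiers. The function name is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def token_differences(learner: List[str],
                      target: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """List the non-matching stretches between learner text and target hypothesis."""
    matcher = SequenceMatcher(a=learner, b=target, autojunk=False)
    differences = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            differences.append((op, learner[i1:i2], target[j1:j2]))
    return differences


learner_tokens = ["Ich", "habe", "gegangen", "."]
target_tokens = ["Ich", "bin", "gegangen", "."]
print(token_differences(learner_tokens, target_tokens))
# [('replace', ['habe'], ['bin'])]
```

For a second target hypothesis, the same comparison would be run against the preceding target hypothesis rather than the learner text, in line with the description above.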
Apply the transformation scenario ‘Dulko: HTML-Version’ to export the table sentence by sentence into an HTML file, which can be viewed and printed in a web browser.
Run ‘Transcription’ > ‘Export segmented transcription’ to export the table to an EXS file, which can be used in COMA and EXAKT.[^8]
Apply the transformation scenarios ‘Dulko: ANNIS-kompatible Version’ and ‘Dulko:
Pepper-kompatible Metadaten-Liste’ before exporting the final EXMARaLDA file to
ANNIS via Pepper. The former transformation
scenario deletes redundant annotations and adds namespace prefixes like
ZH2 to the target-hypothesis and error tiers; those namespace prefixes are
needed for properly ordering the tiers in ANNIS. The latter transformation
scenario outputs an attribute-value list with corpus-level metadata for Pepper
(cf. Pepper’s customisation property
Andreas Nolda (email@example.com)
[^1]: A user of Microsoft Windows 8.1 reported that the installation program of
the Oracle Java runtime environment does not set the system environment
variable JAVA_HOME to the JRE installation path, which prevented
EXMARaLDA from running. Cf. the configuration instructions in this README
on how to set such variables. Alternatively, you can install the Oracle
Java development kit or Amazon Corretto, which both appear to set this
variable properly.
[^2]: Alternatively, you may start from a blank table (‘File’ > ‘New’). Metadata
can be imported from
<DIR3>/dulko.template.exb by applying the
transformation scenario ‘Dulko: Metadaten’.
[^3]: Part of the speaker metadata (viz. the value of the ‘Abbreviation’ field)
is used to generate tier names. If changed, the tier names can be updated
by means of the transformation scenario ‘Dulko: Spurnamen’.
[^4]: In order to mark a hyphenation in the learner text, the corresponding word
on the tier for the original learner text has to be split into three
events consisting of the first part of the word, the symbol
-, and the
second part of the word, respectively. Optionally, you can add a further
event with the symbol
- as an explicit line-break mark.
[^5]: The stylesheet for sentence-span tiers (Satzspannen) automatically
identifies sentence spans ending in a punctuation character that
TreeTagger tags as
$. or ending in an abbreviation followed by a
capitalised version of a non-noun. Sentence spans with different endings
have to be tagged manually by splitting the corresponding sentence-span
event inserted by the stylesheet; the sentence-span names can then be
regenerated with the transformation scenario ‘Dulko: Satzspannen’.
[^6]: The stylesheet for difference tiers (Differenz-Spuren) tries hard to
detect movement source and target pairs, which are tagged with
MOV[EMENT]T[ARGET], respectively. If unsure, it
tags potential movement sources and targets with the tags
MOVT/INS, which have to be manually disambiguated (e.g. by means of the
[^7]: The only exception is the transformation scenario ‘Dulko: ZH- und
Fehler-Spuren (weitere Zielhypothese)’, which always creates new tiers.
[^8]: In EXMARaLDA (Dulko), this menu entry runs the XSLT stylesheet
exb2exs.xsl on the current EXB file.