
EMTk

Collab Emotion Mining Toolkit

Table of Contents

About

Download

How to run

Programming languages, 3rd party libs, and OS

How to cite

License

About EMTk

EMTk is a toolkit for training custom sentiment and emotion classifiers from text. The emotion-mining toolkit comprises two main modules: a polarity module, based on Senti4SD, and an emotion module, based on EmoTXT.

Choose Collab EMTk if:

Download

The Docker image

To ease installation and setup, we have packaged EMTk as a lightweight Docker container image and published it on Docker Hub as collabuniba/emtk. To use EMTk, first install Docker. Then, to pull the latest version of the EMTk image and start a container from it, run the following from the command line:

$ docker pull collabuniba/emtk
$ docker run --rm -v <sharedFolderPath>:/shared -ti collabuniba/emtk

where <sharedFolderPath> is the path to the folder on the host that will be shared with the container to allow file exchange at runtime.
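If you prefer to launch the container from a script, the sketch below assembles the same docker run command in Python. It resolves <sharedFolderPath> to an absolute path first, because Docker treats a bare name such as shared in -v shared:/shared as a named volume rather than a host folder. The helper name emtk_docker_run is hypothetical and not part of EMTk; it only mirrors the command shown above.

```python
import shlex
from pathlib import Path


def emtk_docker_run(shared_folder: str, image: str = "collabuniba/emtk") -> list[str]:
    """Build the argument list for `docker run` with a shared bind mount.

    Hypothetical helper: it reproduces the command from this README only.
    """
    # Expand "~" and make relative paths absolute, so that Docker performs
    # a bind mount of the host folder instead of creating a named volume.
    host_path = Path(shared_folder).expanduser().resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:/shared",  # host folder mounted at /shared
        "-ti",                          # interactive session with a terminal
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; passing the list to
    # subprocess.run(...) would actually start the container.
    print(shlex.join(emtk_docker_run("~/shared")))
```

Returning an argument list rather than a single string avoids shell-quoting problems when the host path contains spaces.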

Direct download

EMTk and all other software developed by Collab are available on GitHub. If you prefer not to run the software from the Docker container, you can download the modules directly from their repositories, EmoTXT and Senti4SD.

How to run

First, start the Docker container in interactive mode. By default, the command below runs the latest version of the image.

$ docker run --rm -it collabuniba/emtk

Polarity module

The -it option starts the container in interactive mode with a terminal attached, so the run command logs you into the container's shell (prompt >). From there, to execute the polarity module, run:

> emtk polarity -F A -i input.csv -oc output.csv -vd 600 [-W dsm.bin] [-L] [-ul unigramList -bl bigramList]

where:

Users can test-drive the polarity module with the file /polarity_sample.csv, which contains only a handful of documents.
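The bracketed parts of the polarity synopsis are optional, and -ul and -bl sit inside the same brackets, so they are presumably meant to be passed together. The sketch below assembles the polarity command as an argument list under that reading. The function and parameter names (polarity_command, flag_l, and so on) are hypothetical, and the sketch deliberately does not interpret what each flag does; it only copies the synopsis.

```python
import shlex
from typing import Optional


def polarity_command(
    input_csv: str,
    output_csv: str,
    dsm: Optional[str] = None,
    flag_l: bool = False,
    unigram_list: Optional[str] = None,
    bigram_list: Optional[str] = None,
) -> list[str]:
    """Assemble `emtk polarity ...` following the synopsis in this README."""
    # Required part of the synopsis, copied verbatim.
    cmd = ["emtk", "polarity", "-F", "A", "-i", input_csv,
           "-oc", output_csv, "-vd", "600"]
    if dsm is not None:
        cmd += ["-W", dsm]   # optional [-W dsm.bin]
    if flag_l:
        cmd.append("-L")     # optional [-L]
    # [-ul unigramList -bl bigramList] is one bracketed group, so this
    # sketch assumes both lists are given together or not at all.
    if (unigram_list is None) != (bigram_list is None):
        raise ValueError("pass unigram_list and bigram_list together")
    if unigram_list is not None and bigram_list is not None:
        cmd += ["-ul", unigram_list, "-bl", bigram_list]
    return cmd


if __name__ == "__main__":
    # Writing the output under /shared/ makes it visible on the host.
    print(shlex.join(polarity_command("/polarity_sample.csv", "/shared/output.csv")))
```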

Emotion module

For the emotion classification module, we first show how to train a new model and then how to test it on unseen data. To train a new model on a training set, run:

> emtk emotions train -i file.csv -d delimiter [-g] -e emotion

where:

As a result, the script will generate an output folder in the present working directory named training_<file.csv>_<emotion>/, containing:

Finally, to perform the classification task, run:

> emtk emotions classify -i file.csv -d delimiter -e emotion [-m model] [-f /path/to/.../idfs] [-o /path/to/.../ngrams] [-l]

where:

As a result, the script will create an output folder in the present working directory named classification_<file.csv>_<emotion>/, containing:

Users can test-drive the emotion classification module with the file /emotions_sample.csv, which contains only a handful of documents. Larger, more complex sample datasets are available in /emotions/java/DatasetSO/StackOverflowCSV.
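The train and classify commands can be assembled the same way. The sketch below builds both argument lists, keeping the optional flags in the order the synopses show them, plus the name of the training output folder from the pattern training_<file.csv>_<emotion>/. It assumes that <file.csv> stands for the base name of the file passed to -i, which this README does not spell out. The value for -d is left to the caller because the accepted delimiter values are not listed here, and "joy" in the tests is only a placeholder label. All function and parameter names are hypothetical.

```python
import os
from typing import Optional


def train_command(input_csv: str, delimiter: str, emotion: str,
                  flag_g: bool = False) -> list[str]:
    """`emtk emotions train -i file.csv -d delimiter [-g] -e emotion`."""
    cmd = ["emtk", "emotions", "train", "-i", input_csv, "-d", delimiter]
    if flag_g:
        cmd.append("-g")  # optional [-g]
    return cmd + ["-e", emotion]


def classify_command(input_csv: str, delimiter: str, emotion: str,
                     model: Optional[str] = None, idfs: Optional[str] = None,
                     ngrams: Optional[str] = None,
                     flag_l: bool = False) -> list[str]:
    """`emtk emotions classify -i file.csv -d delimiter -e emotion [...]`."""
    cmd = ["emtk", "emotions", "classify", "-i", input_csv,
           "-d", delimiter, "-e", emotion]
    # Optional [-m model] [-f .../idfs] [-o .../ngrams], in synopsis order.
    for flag, value in (("-m", model), ("-f", idfs), ("-o", ngrams)):
        if value is not None:
            cmd += [flag, value]
    if flag_l:
        cmd.append("-l")  # optional [-l]
    return cmd


def training_folder(input_csv: str, emotion: str) -> str:
    """Expected training output folder (assumes <file.csv> is the base name)."""
    return f"training_{os.path.basename(input_csv)}_{emotion}"
```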

The /shared/ folder

To use the EMTk modules with custom datasets, users must go through the /shared/ folder, which is mounted with the -v option of the docker run command shown above. The -v option maps a folder on the host machine to a folder in the container:

$ docker run -v <pathInTheHostMachine>:<pathInTheContainer> [...]

For instance, on a Linux machine, -v ~/shared:/shared creates a folder named shared in the host's home directory (if it does not already exist) and mounts it as a folder named shared in the container's root. Any file placed in the shared folder is visible on both systems, which allows input and output files to be exchanged. This relies on Docker bind mounts.
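Because a bind mount exposes the same files under two different roots, a file's path on the host and its path inside the container differ only in the mount prefix. The sketch below translates a host path under the shared folder into the path an emtk command inside the container should receive. The function name to_container_path is hypothetical, and its defaults simply mirror the -v ~/shared:/shared example above.

```python
from pathlib import Path, PurePosixPath


def to_container_path(host_file: str, host_shared: str = "~/shared",
                      container_shared: str = "/shared") -> str:
    """Map a file inside the host's shared folder to its path in the container."""
    host_root = Path(host_shared).expanduser().resolve()
    target = Path(host_file).expanduser().resolve()
    # relative_to raises ValueError if the file lies outside the shared
    # folder, i.e. the container cannot see it at all.
    relative = target.relative_to(host_root)
    # Paths inside the Linux container always use forward slashes.
    return str(PurePosixPath(container_shared, *relative.parts))
```

For example, a file saved on the host as ~/shared/input.csv would be passed to the container as -i /shared/input.csv.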

Programming languages, 3rd party libs, and OS

Collab EMTk is developed in a mix of Java, Python, and R, and it runs on Linux, macOS, and Windows. The following 3rd party libraries are also used:

How to cite

If you use Collab EMTk in your work, please cite the following papers:

License

Collab EMTk is licensed under the MIT License.

Support or Contact

Having trouble with our toolkit? Contact us and we’ll help you sort it out.