.. _configuration:
Configuration
=============
**Requirements**
For the development of KGTorrent we used the following environment:
- ``MySQL 5.7.11+``
- ``Conda 4.9.2+``
- ``Python 3.7+``
**Environment setup**
To run the scripts in this repository, first make sure you have a running installation of MySQL (v. 5.7.11 or later). If you don't, follow the `official guide `_ to install the DBMS.
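
As a quick sanity check, the version requirement above can be verified programmatically. The following is only a minimal sketch: the helper names ``parse_version`` and ``is_supported`` are hypothetical and not part of KGTorrent, and the input is assumed to be a version string such as the one returned by MySQL's ``SELECT VERSION()``.

```python
import re

# Minimum MySQL version stated in the requirements above.
MIN_MYSQL_VERSION = (5, 7, 11)


def parse_version(text):
    """Extract the first 'major.minor.patch' triple found in a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version_text):
    """Return True if the reported version is at least MIN_MYSQL_VERSION."""
    # Comparing integer tuples avoids the pitfalls of comparing strings,
    # where "5.7.9" would wrongly sort after "5.7.11".
    return parse_version(version_text) >= MIN_MYSQL_VERSION
```

Comparing the version as a tuple of integers, rather than as text, is the design choice that matters here.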
Then, create a new Conda environment with the Python dependencies of this project (listed in ``environment.yml``)::

    conda env create -f environment.yml

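Once the environment is ready, activate it using the following Conda command, replacing ``<env-name>`` with the name given in the ``name`` field of ``environment.yml``::

    conda activate <env-name>
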
Finally, this project requires a set of environment variables to be defined. You can customize the template file ``/.env`` and source it from your terminal session to set up all the required variables at once.
Here we provide definitions for each required environment variable.

``DB_HOST``
    The address of the machine hosting your MySQL installation (``localhost`` if the DBMS runs locally).

``DB_PORT``
    The connection port of your MySQL installation.

``DB_NAME``
    The name of the MySQL database where KGTorrent metadata will be stored. By default, it is ``kaggle_torrent``.

``MYSQL_USER``
    Your MySQL username.

``MYSQL_PWD``
    Your MySQL password.

``METAKAGGLE_PATH``
    The path to the folder containing the uncompressed Meta Kaggle dataset.

``NB_DEST_PATH``
    The path to the folder containing the KGTorrent dataset (the Jupyter notebooks archive). Leave this folder empty if you are using the scripts to generate the dataset from scratch. If you instead want to refresh an existing dataset using the latest version of Meta Kaggle, this folder should contain the collection of notebooks from the previous version of the dataset.

``LOG_DEST_PATH``
    The path to the folder where KGTorrent will save its log files.
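
Before running the scripts, it can help to confirm that every variable listed above is actually defined in the current session. The snippet below is a minimal sketch of such a check; the function ``find_config_problems`` is a hypothetical helper, not part of KGTorrent, and treating an empty value as missing is an assumption made here for illustration.

```python
import os

# Variable names taken from the definitions above.
REQUIRED_VARS = (
    "DB_HOST",
    "DB_PORT",
    "DB_NAME",
    "MYSQL_USER",
    "MYSQL_PWD",
    "METAKAGGLE_PATH",
    "NB_DEST_PATH",
    "LOG_DEST_PATH",
)


def find_config_problems(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = []
    for name in REQUIRED_VARS:
        # Assumption: an empty value is treated the same as a missing one.
        if not env.get(name):
            problems.append(f"{name} is not set")
    port = env.get("DB_PORT")
    if port and not port.isdigit():
        problems.append(f"DB_PORT must be a number, got {port!r}")
    return problems


if __name__ == "__main__":
    # Report every problem found in the current shell environment.
    for problem in find_config_problems(os.environ):
        print(problem)
```

Run it from the same terminal session in which you sourced the ``.env`` file; if nothing is printed, all required variables are defined.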