Configuration
Requirements
For the development of KGTorrent we used the following environment:
MySQL 5.7.11+
Conda 4.9.2+
Python 3.7+
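To check an existing setup against these minimum versions, compare the version strings numerically, one component at a time. A plain string comparison would wrongly rank 3.10 below 3.7. The POSIX shell function below is a minimal sketch. `version_ge` is a hypothetical helper name and is not part of KGTorrent.

```shell
# Hypothetical helper (not part of KGTorrent): succeed (status 0) if the
# dotted version in $1 is greater than or equal to the one in $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Split both versions on "." and compare component by component
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing components count as 0; "+ 0" keeps only the leading
            # digits, so a suffix such as "11-log" compares as 11
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

For example, `version_ge "$(python -c 'import platform; print(platform.python_version())')" 3.7 && echo "Python OK"` checks the interpreter on your PATH. `conda --version` prints a string such as `conda 4.9.2`, and its second field can be checked against 4.9.2 in the same way.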
Environment setup
To run the scripts in this repository, first make sure you have a running MySQL installation (version 5.7.11 or later). If you don't, follow the official MySQL installation guide.
Then, create a new Conda environment and install the Python dependencies of this project (listed in environment.yml):
conda env create --name <environment name> -f environment.yml
Once the environment is ready, activate it using the following Conda command:
conda activate <environment name>
Finally, this project requires a set of environment variables to be defined. You can customize the template file /.env and source it from your terminal session to set up all the required variables at once.
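As an illustration, a customized copy of the template might look like the following. All values are placeholders to replace with your own. The use of `export` is an assumption of this sketch, and the actual template may be written differently. Exporting makes the variables visible to the Python scripts started from the same terminal session. 3306 is MySQL's default port.

```shell
# Hypothetical customized .env: variable names come from this page,
# values are placeholders to replace with your own settings.
export DB_HOST=localhost
export DB_PORT=3306            # MySQL's default port
export DB_NAME=kaggle_torrent  # default database name for KGTorrent
export MYSQL_USER=kgtorrent_user
export MYSQL_PWD=change_me
export METAKAGGLE_PATH=/data/meta-kaggle
export NB_DEST_PATH=/data/kgtorrent/notebooks
export LOG_DEST_PATH=/data/kgtorrent/logs
```

After editing the values, load them with `source .env` (or `. .env` in a POSIX shell) from the folder that contains the file.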
Here we provide definitions for each required environment variable.
DB_HOST
The address of the machine hosting your MySQL installation (localhost if the DBMS runs locally).
DB_PORT
The connection port to your MySQL installation.
DB_NAME
The name of the MySQL database where KGTorrent metadata will be stored. By default, it is kaggle_torrent.
MYSQL_USER
Your MySQL username.
MYSQL_PWD
Your MySQL password.
METAKAGGLE_PATH
The path to the folder containing the uncompressed Meta Kaggle dataset.
NB_DEST_PATH
The path to the folder containing the KGTorrent dataset (the archive of Jupyter notebooks). If you are using the scripts to generate the dataset from scratch, this folder should be empty. If you instead want to refresh an existing dataset using the latest version of Meta Kaggle, this folder should contain the notebooks from the previous version of the dataset.
LOG_DEST_PATH
The path to the folder where KGTorrent will save its log files.
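Before running the scripts, it can help to confirm that every variable listed above is set. The function below is a hypothetical sketch and is not part of KGTorrent. It reports any missing variable and checks that METAKAGGLE_PATH and NB_DEST_PATH point to existing folders. Following the description of NB_DEST_PATH, it also reports whether the notebook folder is empty (build from scratch) or not (refresh). It does not check LOG_DEST_PATH, because this page does not say whether the scripts create that folder themselves.

```shell
# Hypothetical helper (not part of KGTorrent): sanity-check the
# environment variables required by the KGTorrent scripts.
check_kgtorrent_env() {
    missing=0
    for var in DB_HOST DB_PORT DB_NAME MYSQL_USER MYSQL_PWD \
               METAKAGGLE_PATH NB_DEST_PATH LOG_DEST_PATH; do
        # POSIX-compatible indirect lookup of the variable named in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1

    # Meta Kaggle and the notebook destination must be existing folders
    for dir in "$METAKAGGLE_PATH" "$NB_DEST_PATH"; do
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            return 1
        fi
    done

    # Empty notebook folder: build from scratch; otherwise: refresh
    if [ -z "$(ls -A "$NB_DEST_PATH")" ]; then
        echo "mode: build from scratch"
    else
        echo "mode: refresh existing dataset"
    fi
}
```

Source the file that defines the function in your terminal session, then run `check_kgtorrent_env`. It returns a non-zero status if a variable is missing or a folder does not exist.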