Example project - movie ratings

This example project is a small demonstration of TaskChain's capabilities and tries to show its main features and constructs.

The project offers a quick hands-on experience and can serve as a template for new projects. You can start by running this notebook.

Keep in mind that the goal of the project is to showcase various features, so the chosen solutions to the given problems are not always optimal.

Install

pip install taskchain

git clone https://github.com/flowerchecker/taskchain
cd taskchain/example
python setup.py develop

Description

The project works with the IMDB movie dataset downloaded from Kaggle. Its goals are to explore the dataset and to train a model that predicts the rating of a new movie.

The project is split into three pipelines:

Movies

Files: tasks (src/movie_ratings/tasks/movies.py), configs (configs/movies), notebook (scripts/movies.ipynb)

This pipeline has the following functions:

load the movie data
filter it
get basic statistics: year and duration histograms
extract the directors, actors, genres and countries of the movies
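The steps above can be sketched in plain Python. This is not the project's TaskChain code, only a minimal illustration under stated assumptions: the CSV is assumed to have `title`, `year`, `duration`, `genre`, `country`, `director`, `actors` and `votes` columns, with comma-separated values in the multi-valued ones, and the `min_votes` filter is a hypothetical criterion. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real data/source_data/IMDB_movies.csv has more columns.
SAMPLE_CSV = """title,year,duration,genre,country,director,actors,votes,avg_vote
A,1999,120,"Drama, Crime",USA,Dir One,"Actor A, Actor B",5000,7.5
B,2005,95,Comedy,"UK, USA",Dir Two,"Actor B, Actor C",150,6.1
C,2005,110,Drama,France,Dir One,Actor A,3000,8.0
"""


def load_movies(text):
    """Load movies from CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_movies(movies, min_votes=1000):
    """Keep only movies with enough votes (hypothetical filtering criterion)."""
    return [m for m in movies if int(m["votes"]) >= min_votes]


def histogram(movies, column, bin_size):
    """Count movies per bin of a numeric column, e.g. decades or 30-minute bins."""
    return Counter(int(m[column]) // bin_size * bin_size for m in movies)


def extract(movies, column):
    """Map each movie title to the list of values in a comma-separated column."""
    return {
        m["title"]: [v.strip() for v in m[column].split(",") if v.strip()]
        for m in movies
    }


if __name__ == "__main__":
    movies = filter_movies(load_movies(SAMPLE_CSV))
    print(histogram(movies, "year", 10))  # Counter({1990: 1, 2000: 1})
    print(extract(movies, "genre"))       # {'A': ['Drama', 'Crime'], 'C': ['Drama']}
```

Keeping each step a separate function mirrors the idea of one task per step, so filtering can change without touching the statistics or extraction.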

Features

Files: tasks (src/movie_ratings/tasks/features.py), configs (configs/features), notebook (scripts/features.ipynb)

This pipeline builds on the movies pipeline and has the following functions:

select the most relevant actors and directors (to use them as features)
prepare all features: year, duration, and binary features based on the movies' genres, countries, actors and directors
select requested feature types
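A plain-Python sketch of these steps follows. It is not the project's implementation: it assumes "most relevant" means "most frequent", names binary features as `type=value` (for example `actor=Actor A`), and all function names are hypothetical.

```python
from collections import Counter


def select_relevant(values_by_movie, top_n):
    """Return the top_n most frequent values (frequency used as a stand-in for relevance)."""
    counts = Counter(v for values in values_by_movie.values() for v in values)
    return [value for value, _ in counts.most_common(top_n)]


def binary_features(values_by_movie, vocabulary, prefix):
    """Create one 0/1 feature per vocabulary item, named '<prefix>=<value>'."""
    return {
        title: {f"{prefix}={v}": int(v in values) for v in vocabulary}
        for title, values in values_by_movie.items()
    }


def merge(*feature_dicts):
    """Merge per-movie feature dicts produced for different feature types."""
    merged = {}
    for features in feature_dicts:
        for title, row in features.items():
            merged.setdefault(title, {}).update(row)
    return merged


def select_feature_types(features, types):
    """Keep only features whose type (the part before '=') was requested."""
    return {
        title: {name: value for name, value in row.items() if name.split("=")[0] in types}
        for title, row in features.items()
    }


if __name__ == "__main__":
    actors = {"A": ["Actor A", "Actor B"], "B": ["Actor B"], "C": ["Actor A", "Actor B", "Actor C"]}
    top_actors = select_relevant(actors, top_n=2)
    print(top_actors)  # ['Actor B', 'Actor A']
    features = merge({"A": {"year": 1999}}, binary_features(actors, top_actors, "actor"))
    print(select_feature_types(features, {"actor"})["A"])
```

Restricting actors and directors to a fixed vocabulary keeps the number of binary features bounded; rare names would otherwise add many almost-always-zero columns.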

Rating model

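Files: tasks (src/movie_ratings/tasks/rating_model.py), configs (configs/rating_model), notebook (scripts/rating_model.ipynb)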

This pipeline builds on the features pipeline and has the following functions:

create training and eval data from features
train a model (models are defined in src/movie_ratings/models)
evaluate the model
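The train/evaluate flow can be sketched as below. This is a hypothetical illustration, not the project's models: it assumes the baseline predicts the mean training rating, uses an sklearn-like `fit`/`predict` interface, and evaluates with RMSE. The 20% split and seed are arbitrary choices.

```python
import math
import random


def train_eval_split(rows, eval_fraction=0.2, seed=42):
    """Shuffle rows deterministically and split them into training and eval parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_eval = max(1, int(len(rows) * eval_fraction))
    return rows[n_eval:], rows[:n_eval]


class MeanBaseline:
    """Hypothetical baseline model: always predicts the mean training rating."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Made-up (features, rating) pairs for illustration only.
    data = [({"year": 1990 + i}, 5.0 + 0.1 * i) for i in range(10)]
    train, eval_rows = train_eval_split(data)
    model = MeanBaseline().fit([x for x, _ in train], [y for _, y in train])
    predictions = model.predict([x for x, _ in eval_rows])
    print(rmse([y for _, y in eval_rows], predictions))
```

A mean baseline gives a reference error that any real model, such as linear regression, should beat on the same eval data.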

Project files

example
├── configs
│   ├── features
│   │   ├── all.yaml
│   │   └── basic.yaml
│   ├── movies
│   │   ├── imdb.all.yaml
│   │   └── imdb.filtered.yaml
│   └── rating_model
│       ├── all_features
│       │   ├── baseline.yaml
│       │   ├── linear_regression.yaml
│       │   ├── nn.yaml
│       │   └── tf_linear_regression.yaml
│       └── basic_features
│           ├── baseline.yaml
│           └── linear_regression.yaml
├── data
│   ├── source_data
│   │   ├── IMDB_movies.csv
│   │   └── ratings.Thran.csv
│   └── task_data       # computed task data is stored here
├── scripts
│   ├── features.ipynb
│   ├── introduction.ipynb
│   ├── movies.ipynb
│   ├── personal_rating_model.ipynb
│   ├── rating_model.ipynb
│   └── tasks_run.py
├── setup.py
└── src
    └── movie_ratings
        ├── config.py
        ├── __init__.py
        ├── models
        │   ├── core.py
        │   ├── __init__.py
        │   ├── sklearn.py
        │   └── tensorflow.py
        └── tasks
            ├── features.py
            ├── __init__.py
            ├── movies.py
            └── rating_model.py