Example project - movie ratings
The example project is a small demonstration of TaskChain's capabilities that tries to show its main features and constructs.
This project offers a quick hands-on experience and can serve as a template for new projects. You can start by running this notebook.
Keep in mind that the goal of the project is to showcase various features, so the chosen solutions are not always optimal for the given problems.
Install
pip install taskchain
git clone https://github.com/flowerchecker/taskchain
cd taskchain/example
python setup.py develop
Description
The project works with the IMDB movie dataset downloaded from Kaggle. The goals of the project are to explore the dataset and to train a model that can predict the rating of a new movie.
The project is split into 3 pipelines.
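The three pipelines form a chain: Features builds on Movies, and Rating model builds on Features. The toy runner below is plain Python, not TaskChain's API, and every name in it is hypothetical. It only sketches the dependency idea: each pipeline receives the outputs of the pipelines it builds on, and a computed result is reused instead of being recomputed, loosely echoing how computed data is kept in data/task_data.

```python
from typing import Callable, Dict, List

# Hypothetical toy runner; this is NOT TaskChain's API.
Pipeline = Callable[..., object]


def run_chain(pipelines: Dict[str, Pipeline], deps: Dict[str, List[str]],
              target: str, cache: Dict[str, object]) -> object:
    """Compute `target`, first computing (or reusing) its dependencies."""
    if target in cache:
        return cache[target]  # already computed, reuse the stored result
    inputs = [run_chain(pipelines, deps, d, cache) for d in deps.get(target, [])]
    cache[target] = pipelines[target](*inputs)
    return cache[target]


calls: List[str] = []  # records which pipelines actually ran


def movies() -> list:
    calls.append("movies")
    return [{"title": "A", "year": 2001}, {"title": "B", "year": 1999}]


def features(movie_list: list) -> list:
    calls.append("features")
    return [[m["year"]] for m in movie_list]


def rating_model(feature_rows: list) -> int:
    calls.append("rating_model")
    return len(feature_rows)  # stand-in for a trained model


pipelines = {"movies": movies, "features": features, "rating_model": rating_model}
deps = {"features": ["movies"], "rating_model": ["features"]}
cache: Dict[str, object] = {}

result = run_chain(pipelines, deps, "rating_model", cache)
run_chain(pipelines, deps, "features", cache)  # served from the cache, nothing reruns
```

With results reused like this, training several rating models on top of the same features only pays for the feature computation once.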
Movies
This pipeline has the following functions
- load movie data
- filter them
- get basic statistics: year and duration histograms
- extract directors, actors, genres and countries of movies
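A minimal sketch of the Movies steps in plain Python, using a few hypothetical in-memory rows instead of IMDB_movies.csv. The column names (title, year, duration, votes, director, actors, genre, country) and the vote threshold used for filtering are assumptions for illustration; the project's own tasks live in src/movie_ratings/tasks/movies.py and may work differently.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical rows; the real CSV columns may be named differently.
MOVIES: List[Dict] = [
    {"title": "A", "year": 1994, "duration": 142, "votes": 2000,
     "director": "D1", "actors": "X, Y", "genre": "Drama", "country": "USA"},
    {"title": "B", "year": 1994, "duration": 88, "votes": 50,
     "director": "D2", "actors": "Y", "genre": "Comedy", "country": "UK"},
    {"title": "C", "year": 2003, "duration": 121, "votes": 900,
     "director": "D1", "actors": "Z, X", "genre": "Drama, Crime",
     "country": "USA, France"},
]


def filter_movies(movies: List[Dict], min_votes: int = 100) -> List[Dict]:
    """Keep only movies with enough votes (a hypothetical criterion)."""
    return [m for m in movies if m["votes"] >= min_votes]


def histogram(movies: List[Dict], key: str, bin_size: int) -> Dict[int, int]:
    """Count movies per bin of `key`, e.g. per decade or per 30 minutes."""
    return dict(Counter(m[key] // bin_size * bin_size for m in movies))


def split_values(movies: List[Dict], key: str) -> Dict[str, List[str]]:
    """Map each title to the comma-separated values of `key`."""
    return {m["title"]: [v.strip() for v in m[key].split(",")] for m in movies}


filtered = filter_movies(MOVIES)
years = histogram(filtered, "year", 10)           # decade histogram
durations = histogram(filtered, "duration", 30)   # 30-minute bins
genres = split_values(filtered, "genre")
actors = split_values(filtered, "actors")
```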
Features
This pipeline builds on the Movies pipeline and has the following functions
- select the most relevant actors and directors (to use them as features)
- prepare all features: year, duration, and binary features based on a movie's genres, countries, actors and directors
- select requested feature types
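Selecting relevant actors and building the binary features can be sketched as follows. The sketch assumes that "most relevant" means "appearing in the most movies", which is a guess rather than the project's documented rule; ACTORS, YEARS, most_frequent, binary_features and build_features are hypothetical names, and the feature_types argument stands in for selecting the requested feature types.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical per-movie data, as the Movies pipeline might extract it.
ACTORS: Dict[str, List[str]] = {"A": ["X", "Y"], "B": ["Y"], "C": ["Z", "X"]}
YEARS: Dict[str, int] = {"A": 1994, "B": 1994, "C": 2003}


def most_frequent(values_per_movie: Dict[str, List[str]], top_n: int) -> List[str]:
    """Pick the `top_n` values occurring in the most movies (ties: first seen wins)."""
    counts = Counter(v for values in values_per_movie.values() for v in values)
    return [value for value, _ in counts.most_common(top_n)]


def binary_features(values_per_movie: Dict[str, List[str]], selected: List[str],
                    prefix: str) -> Dict[str, Dict[str, int]]:
    """One 0/1 column per selected value, e.g. 'actor=X'."""
    return {
        title: {f"{prefix}={v}": int(v in values) for v in selected}
        for title, values in values_per_movie.items()
    }


def build_features(selected_actors: List[str], feature_types: Set[str]) -> Dict[str, Dict]:
    """Assemble only the requested feature types for every movie."""
    actor_cols = binary_features(ACTORS, selected_actors, "actor")
    rows = {}
    for title in ACTORS:
        row: Dict[str, int] = {}
        if "year" in feature_types:
            row["year"] = YEARS[title]
        if "actors" in feature_types:
            row.update(actor_cols[title])
        rows[title] = row
    return rows


selected = most_frequent(ACTORS, top_n=2)
all_features = build_features(selected, {"year", "actors"})
basic_features = build_features(selected, {"year"})
```

Each selected actor adds one column, so top_n directly controls how wide the binary part of the feature vector gets.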
Rating model
This pipeline builds on the Features pipeline and has the following functions
- create training and evaluation data from the features
- train a model (models are defined in src/movie_ratings/models)
- evaluate the model
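The config names in configs/rating_model (baseline.yaml, linear_regression.yaml) suggest that a baseline and a linear regression are among the compared models. The sketch below illustrates the split, train and evaluate steps with plain-Python stand-ins: a mean-rating baseline and one-feature least squares, scored by mean absolute error. It is not the project's sklearn or tensorflow code, and the data, the 25 % eval fraction and the choice of metric are assumptions.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical (feature, rating) pairs; real rows would come from the Features pipeline.
Data = List[Tuple[float, float]]
Model = Callable[[float], float]


def train_eval_split(data: Data, eval_fraction: float, seed: int = 0) -> Tuple[Data, Data]:
    """Shuffle deterministically and hold out `eval_fraction` of the rows."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_eval = int(len(rows) * eval_fraction)
    return rows[n_eval:], rows[:n_eval]


def fit_baseline(train: Data) -> Model:
    """Baseline: always predict the mean training rating."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def fit_linear(train: Data) -> Model:
    """Ordinary least squares with one feature: rating = a * x + b."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def mean_absolute_error(model: Model, data: Data) -> float:
    return sum(abs(model(x) - y) for x, y in data) / len(data)


# Toy data where the rating grows linearly with the feature.
data = [(float(x), 5.0 + 0.1 * x) for x in range(20)]
train, evaluation = train_eval_split(data, eval_fraction=0.25)
baseline_mae = mean_absolute_error(fit_baseline(train), evaluation)
linear_mae = mean_absolute_error(fit_linear(train), evaluation)
```

Keeping a trivial baseline next to the real models gives a reference point: a model is only useful if it beats always predicting the average rating.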
Project files
example
├── configs
│   ├── features
│   │   ├── all.yaml
│   │   └── basic.yaml
│   ├── movies
│   │   ├── imdb.all.yaml
│   │   └── imdb.filtered.yaml
│   └── rating_model
│       ├── all_features
│       │   ├── baseline.yaml
│       │   ├── linear_regression.yaml
│       │   ├── nn.yaml
│       │   └── tf_linear_regression.yaml
│       └── basic_features
│           ├── baseline.yaml
│           └── linear_regression.yaml
├── data
│   ├── source_data
│   │   ├── IMDB_movies.csv
│   │   └── ratings.Thran.csv
│   └── task_data          # computed data will be stored here
├── scripts
│   ├── features.ipynb
│   ├── introduction.ipynb
│   ├── movies.ipynb
│   ├── personal_rating_model.ipynb
│   ├── rating_model.ipynb
│   └── tasks_run.py
├── setup.py
└── src
    └── movie_ratings
        ├── config.py
        ├── __init__.py
        ├── models
        │   ├── core.py
        │   ├── __init__.py
        │   ├── sklearn.py
        │   └── tensorflow.py
        └── tasks
            ├── features.py
            ├── __init__.py
            ├── movies.py
            └── rating_model.py