bike_hire

Description:

To view the bike hire analysis, open the notebooks folder. It has five parts:

time and seasonal analysis
spatial analysis
prediction
prediction - member
prediction - nonmember

For question 2 (prediction for Sept 4 to Sept 10), please review the prediction notebook.

For question 3, splitting the model into separate member and non-member models does seem to improve performance. This is consistent with the time and seasonal analysis: members and non-members use the bike hire service at different times of the week and in different months. In particular, non-members tend to hire bikes more at weekends and in the summer months of July and August.
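
As a rough illustration of the weekday/weekend comparison described above, the sketch below computes the share of weekend hires per rider type. The records, the `hires` list and the `weekend_share` helper are all made up for illustration; they are not taken from the notebooks or the data.

```python
from collections import defaultdict
from datetime import date

# Toy records, invented for illustration: (hire date, rider type)
hires = [
    (date(2017, 7, 1), "non-member"),  # Saturday
    (date(2017, 7, 2), "non-member"),  # Sunday
    (date(2017, 7, 3), "non-member"),  # Monday
    (date(2017, 7, 3), "member"),      # Monday
    (date(2017, 7, 4), "member"),      # Tuesday
    (date(2017, 7, 8), "member"),      # Saturday
]


def weekend_share(records):
    """Return {rider_type: fraction of hires that fall on Saturday or Sunday}."""
    total = defaultdict(int)
    weekend = defaultdict(int)
    for day, rider_type in records:
        total[rider_type] += 1
        # date.weekday(): Monday is 0, so Saturday is 5 and Sunday is 6
        if day.weekday() >= 5:
            weekend[rider_type] += 1
    return {rider: weekend[rider] / total[rider] for rider in total}


shares = weekend_share(hires)
for rider, share in sorted(shares.items()):
    print(f"{rider}: {share:.0%} of hires at the weekend")
```

A large gap between the two shares on the real data would support training separate models for the two groups.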

If you would like to run the notebooks or change settings, please set up your environment as follows.

Set up environment:

There are three ways to set up your environment:

Run the Jupyter notebook inside a Docker container.

  $ docker build -t bike-hire:1.0 .
  $ docker compose up
  # click the link on port 8888 to open the Jupyter notebook server

Use Poetry to create the virtual env and install all the dependencies.

  # install poetry (https://python-poetry.org/docs/)
  $ curl -sSL https://install.python-poetry.org | python3 -
  
  $ poetry install
  $ poetry shell
  $ jupyter notebook

Use your own virtual env and install the dependencies from requirements.txt.

Further thoughts

The preliminary analysis should give a good rationale for which features to consider for model training. If time allows, I could try a better clustering algorithm to create more granular clusters and improve prediction performance, or try feature crossing between bucketised latitude and longitude.
It will be hard to train on and predict daily numbers for a single route between two stations alone, because the data are quite noisy. To achieve better performance in this exercise, I focus training on macro features.
While I have put in place a basic framework for evaluating different models with Bayesian hyperparameter tuning, I have NOT properly run it myself because of time and machine limitations. For the purpose of showing predictions, I use a toy xgboost model without CV. To get the best performance, I would need to increase the number of iterations for the Bayesian hyperparameter tuning, run the model evaluation pipeline, and then use the best model for the prediction.
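
The feature crossing idea mentioned above could look roughly like the following. This is a minimal sketch and not code from the notebooks: the `bucketise` and `cross_lat_lon` helpers, the 0.01 degree cell size and the example coordinates are all assumptions chosen for illustration.

```python
import math


def bucketise(value, lower, width):
    """Return the index of the fixed-width bucket that contains value."""
    return math.floor((value - lower) / width)


def cross_lat_lon(lat, lon, lat_min, lon_min, cell_deg=0.01):
    """Cross the latitude and longitude buckets into one categorical key.

    cell_deg is a hypothetical grid size in degrees; a smaller value gives
    more granular cells.
    """
    lat_bucket = bucketise(lat, lat_min, cell_deg)
    lon_bucket = bucketise(lon, lon_min, cell_deg)
    return f"{lat_bucket}_{lon_bucket}"


# Arbitrary example coordinates (not taken from the data set)
station_cell = cross_lat_lon(45.505, -73.565, lat_min=45.40, lon_min=-73.70)
print(station_cell)
```

Each station then maps to a single grid-cell key that can be one-hot or target encoded, so a tree model can learn location effects that depend on latitude and longitude jointly.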
