Many large cities have adopted rental bikes to make getting around more convenient. Rental bikes need to be available and accessible to the public at the right time, because this reduces waiting. Maintaining a steady supply of rental bikes across the city therefore becomes a top priority, and doing so depends on predicting how many bikes will be needed each hour.
The Seoul bike sharing demand data set contains the number of bikes rented at each hour in the Seoul bike-sharing system, together with information about weather conditions. The final product is a model that predicts the number of bikes rented in a given hour of a given day from the hour and weather-related variables such as rainfall and humidity. The model's predictions are intended to help ensure that enough bikes are available to meet demand for the service.
This project revolves around predictive modeling for Seoul's bike rental demand, leveraging a dataset comprising 8,760 rows and 14 columns. The workflow adheres to a systematic and formal structure, commencing with data collection and preliminary analysis to ascertain the dataset's fundamental characteristics, including its dimensions and data types. Subsequently, data filtering and cleaning are executed to enhance data quality by eliminating superfluous columns and addressing missing values.
The project progresses to Exploratory Data Analysis (EDA), where insightful visualizations are generated to illuminate relationships between dependent and independent variables. This phase also encompasses an analysis of mean distributions and correlations between columns. With a well-informed understanding of the data, attention shifts towards data preparation, encompassing feature engineering, encoding, and the division of data into training and testing sets.
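The correlation analysis mentioned above can be illustrated with a minimal sketch of the Pearson correlation coefficient between two columns. The sample values below are hypothetical and are not taken from the dataset; the function name `pearson` is also only illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical hourly readings (illustrative only, not from the dataset).
temperature = [2.0, 8.5, 14.0, 19.5, 25.0, 30.5]
rentals = [120, 310, 540, 800, 1100, 950]

r = pearson(temperature, rentals)
print(f"correlation between temperature and rentals: {r:.2f}")
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.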
Data scaling then brings the features onto comparable ranges before modeling. Model selection is a deliberate process aimed at choosing the most suitable algorithm. Model evaluation uses several metrics to gauge performance, and hyperparameter tuning is applied to improve accuracy and reduce overfitting. Finally, a comparison of results on the training and test data shows how well the model generalizes and where its errors lie.
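The split and scaling steps can be sketched as follows. This is a minimal illustration that assumes standardization (zero mean, unit variance) and a 75/25 split; the project does not state which scaler or split ratio it used. The helper names and the temperature values are hypothetical. The point it shows is that the scaling statistics are computed on the training set only and then reused for the test set.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(values):
    """Return (mean, std) computed from the given (training) values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for a constant column
    return mean, std

def scale(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical temperature readings in °C (not taken from the dataset).
temperatures = [-5.2, 0.1, 3.4, 8.8, 12.5, 17.0, 21.3, 25.9, 29.4, 33.0, 15.5, 6.7]

train, test = train_test_split(temperatures, test_fraction=0.25, seed=42)
mean, std = fit_scaler(train)            # statistics come from the training set only
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)     # the test set reuses the training statistics
```

Fitting the scaler on the training data alone keeps information from the test set out of the model, so the test score remains an honest estimate.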
The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.
Date is of str type and needs to be converted to datetime format (DD/MM/YYYY). The new features extracted from Date are Day, Month and Year.
Number of bikes rented is the dependent variable according to the problem statement; it is of int type.
Hour is the hour of the day in 24-hour format, the interval over which rentals are counted; it is of int type.
Temperature is in degrees Celsius (°C) and is of float type.
Humidity is the humidity of the air in % and is of int type.
Wind Speed is in m/s and is of float type.
Visibility is in units of 10 m and is of int type.
Dew point Temperature is in °C and is of float type; the dew point is the temperature at which the air becomes saturated with water vapour.
Solar radiation is of float type.
Rainfall is in mm (1 mm of rainfall equals 1 litre of water per square metre) and is of float type.
Snowfall is in cm and is of float type.
Season takes four values, one for each season present in the data, and is of str type.
Holiday indicates whether the day is a holiday or not and is of str type.
Functioning Day indicates whether the day is a functioning day or not and is of str type.
Weekend is extracted from Day: it is 1 when the day is Saturday or Sunday and 0 on weekdays.
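The date conversion and the derived Day, Month, Year and Weekend features can be sketched with the standard library alone. The function name `date_features` is hypothetical, the two dates are chosen only for illustration, and Day is assumed here to mean the day of the month.

```python
from datetime import datetime

def date_features(date_str):
    """Parse a DD/MM/YYYY string and derive Day, Month, Year and Weekend."""
    parsed = datetime.strptime(date_str, "%d/%m/%Y")
    return {
        "Day": parsed.day,      # assumption: Day means day of the month
        "Month": parsed.month,
        "Year": parsed.year,
        # weekday(): Monday is 0, so Saturday is 5 and Sunday is 6
        "Weekend": 1 if parsed.weekday() >= 5 else 0,
    }

friday = date_features("01/12/2017")
saturday = date_features("02/12/2017")
print(friday)
print(saturday)
```

If the data is held in a pandas DataFrame named `df` (an assumption), the same parsing step would typically be `pd.to_datetime(df["Date"], format="%d/%m/%Y")`.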
1. Summer was the season with the highest number of bike rentals, followed by Autumn, Spring, and Winter. The busiest months for renting bikes are May through July, while December through February are the least popular months.
2. Working people appear to make up the majority of customers: the EDA shows that in Seoul the demand for bikes is higher on weekdays, when people are at work.
3. Conditions for renting are best in the late afternoon and evening, from 4 to 8 pm, when the humidity is between 40% and 60% and the temperature is between 20 and 30 °C.
4. Major factors influencing the demand for rental bikes include temperature, daytime, solar radiation, humidity, and hour of the day.
5. The linear model's predictive accuracy was low because the relationship between the features and the label is only weakly linear.
Model                  Train Accuracy  Test Accuracy
Linear Regression      68.62%          68.87%
Polynomial Regression  80.88%          79.97%
RidgeCV                76.30%          74.71%
LassoCV                75.32%          74.07%
ElasticNetCV           71.87%          70.85%
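The gap between the linear and polynomial rows can be illustrated with a toy example. This sketch assumes that "accuracy" in the table means the R² score, which the project does not state explicitly, and it uses a hand-written least-squares fit rather than the project's own code. The data is synthetic and deliberately exaggerated: demand peaks at a mid-range temperature, so a straight line explains almost none of it while a degree-2 polynomial fits it well. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 + ..."""
    powers = range(degree + 1)
    design = [[x ** p for p in powers] for x in xs]
    # Normal equations: (A^T A) c = A^T y
    ata = [[sum(row[i] * row[j] for row in design) for j in powers] for i in powers]
    aty = [sum(row[i] * y for row, y in zip(design, ys)) for i in powers]
    return solve(ata, aty)

def predict(coeffs, xs):
    """Evaluate the fitted polynomial at each x."""
    return [sum(c * x ** p for p, c in enumerate(coeffs)) for x in xs]

def r2_score(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic data: demand peaks at 20 °C and falls off on both sides.
temps = [0, 5, 10, 15, 20, 25, 30, 35, 40]
demand = [100 - (t - 20) ** 2 / 4 for t in temps]

r2_linear = r2_score(demand, predict(fit_polynomial(temps, demand, 1), temps))
r2_quadratic = r2_score(demand, predict(fit_polynomial(temps, demand, 2), temps))
print(f"degree 1 R^2: {r2_linear:.3f}")
print(f"degree 2 R^2: {r2_quadratic:.3f}")
```

On real data the difference is smaller than in this toy case, and a higher polynomial degree raises the risk of overfitting, which is why the table compares train and test scores side by side.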