A ride-sharing company (Company X) is interested in predicting rider retention. To help explore this question, there is a sample dataset of a cohort of users who signed up for an account in January 2014. The data was pulled on July 1, 2014; we consider a user retained if they were “active” (i.e. took a trip) in the preceding 30 days (from the day the data was pulled). In other words, a user is "active" if they have taken a trip since June 1, 2014.
city
: city this user signed up in phone: primary device for this user signup_date: date of account registration, in the formYYYYMMDD
last_trip_date
: the last time this user completed a trip, in the formYYYYMMDD
avg_dist
: the average distance (in miles) per trip taken in the first 30 days after signupavg_rating_by_driver
: the rider’s average rating over all of their tripsavg_rating_of_driver
: the rider’s average rating of their drivers over all of their tripssurge_pct
: the percent of trips taken with surge multiplier > 1avg_surge
: The average surge multiplier over all of this user’s tripstrips_in_first_30_days
: the number of trips this user took in the first 30 days after signing upluxury_car_user
: TRUE if the user took a luxury car in their first 30 days; FALSE otherwiseweekday_pct
: the percent of the user’s trips occurring during a weekday