System Design-Uber
If you would rather skip the reading, you can watch the 30-minute video instead.
What is Uber?
Uber is a ride sharing app where a user can request a ride from one location to another and the app will connect the user to the nearest available Uber drivers.
Functional Requirements:-
- The Uber app should know the locations of all drivers who are currently logged in. (P0)
- The app should be able to find nearby available drivers. (P0)
- User should be able to request a ride, after which the nearest drivers should be notified. (P0)
- At the start of the trip, the system should initiate the payment flow. (P1)
- The system should be able to estimate the time of arrival (ETA of the driver). (P1)
- Drivers should be able to confirm that they have picked up the rider. (P0)
- After the ride has been accepted, the driver and rider should be able to view trip updates. (P0)
- The driver should be able to mark the ride as completed and should then become available for the next ride. (P0)
Non-functional Requirements:-
- Availability — The system should be highly available
- Scalability — The system should be scalable to handle ever increasing number of drivers and riders.
- Reliability — The system should provide fast and error free services.
- Consistency — The system must be strongly consistent from both the drivers' and the riders' perspective.
- Fraud Detection — The system should have the ability to detect any fraudulent activity related to payments.
App Usage Estimation
Let’s assume that we have around 500 million riders and about 5 million drivers.
- We have 20 million daily active users.
- We have 2 million daily active drivers.
- We have 20 million daily trips.
- All active drivers send a notification about their current location every 4 seconds.
Storage Estimation
Rider’s Metadata
- UserId
- User name
- User email
- User address
In total, we can approximate one rider's data at 1000 bytes.
With 500 million riders in total, this comes to 500 * 10⁶ * 1000 bytes = 500 GB.
For newly registered riders, let's assume a high figure of 500K per day, which will require 500 MB of additional storage daily.
Driver’s Metadata
- Driver Id
- Driver name
- Driver email
- Driver Vehicle Type
- Driver License number
In total, let's approximate one driver's metadata at around 1000 bytes.
With 5 million drivers, we would need 5 GB of total storage. Additionally, if we have 100K new drivers registering daily, we would need 100 MB of additional storage per day for them.
Driver’s Location Metadata
We will need to store the latitude and longitude of each driver along with a few other data points. Let's assume this takes around 36 bytes per driver, which comes to 5 * 10⁶ * 36 bytes = 180 MB of data.
Trip Metadata
We can assume 100 bytes of data will be required for each trip.
- User Id
- Driver Id
- Trip Id
So, with 20 million trips per day, the total data per day will be 20 * 10⁶ * 100 bytes = 2 GB.
Total data in a year = 2 GB * 365 = 730 GB, which is a little under 1 TB.
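The storage figures above are simple multiplications, so they are easy to re-check in a few lines. The snippet below is only a rough sketch that uses the record counts and record sizes quoted in this section; the helper name `storage_bytes` and the dictionary keys are made up for illustration, and decimal gigabytes (10⁹ bytes) are assumed.

```python
# Back-of-the-envelope storage estimate using the figures quoted above.
GB = 10**9  # decimal gigabyte (assumption)


def storage_bytes(record_count: int, bytes_per_record: int) -> int:
    """Total bytes needed to store record_count records of a fixed size."""
    return record_count * bytes_per_record


estimates = {
    "rider_metadata": storage_bytes(500_000_000, 1000),
    "new_riders_per_day": storage_bytes(500_000, 1000),
    "driver_metadata": storage_bytes(5_000_000, 1000),
    "new_drivers_per_day": storage_bytes(100_000, 1000),
    "driver_locations": storage_bytes(5_000_000, 36),
    "trips_per_day": storage_bytes(20_000_000, 100),
}
# One year of trip data is the daily figure times 365.
estimates["trips_per_year"] = estimates["trips_per_day"] * 365

for name, size in estimates.items():
    print(f"{name:>20}: {size / GB:8.2f} GB")
```

Note that one year of trip data comes to 730 GB, so "around 1 TB" is a comfortable upper bound rather than an exact figure.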
Bandwidth Estimation
Location updates and trip data account for most of the bandwidth, so we will ignore all other factors in this calculation.
We have 20 million rides in a day.
Total rides in a second = 20 * 10⁶ / (24 * 3600) ≈ 231 trips per second.
Now, each trip takes around 100 bytes, so the calculation gives us 231 * 100 bytes * 8 bits ≈ 185 kbps, or roughly 200 kbps.
Each driver location update contains 3 bytes of driver ID and 16 bytes of location information. We have 2 million active drivers, each sending one update every 4 seconds, so
= 2 * 10⁶ * (3+16) bytes * 8 bits / 4 seconds = 76 Mbps
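To make the bandwidth arithmetic easy to re-run with different inputs, here is a small sketch. It takes its inputs from the App Usage Estimation section (20 million daily trips, 2 million daily active drivers, one location update every 4 seconds); the function names are hypothetical and only exist for this example.

```python
SECONDS_PER_DAY = 24 * 3600


def trip_bandwidth_bps(trips_per_day: int, bytes_per_trip: int) -> float:
    """Average bits per second needed to carry trip records."""
    trips_per_second = trips_per_day / SECONDS_PER_DAY
    return trips_per_second * bytes_per_trip * 8


def location_bandwidth_bps(active_drivers: int, bytes_per_update: int,
                           interval_seconds: int) -> float:
    """Average bits per second when every driver sends one update per interval."""
    updates_per_second = active_drivers / interval_seconds
    return updates_per_second * bytes_per_update * 8


trip_bps = trip_bandwidth_bps(20_000_000, 100)
location_bps = location_bandwidth_bps(2_000_000, 3 + 16, 4)

print(f"trip data:        {trip_bps / 1e3:.0f} kbps")
print(f"location updates: {location_bps / 1e6:.0f} Mbps")
```

The result is sensitive to the number of active drivers: plugging in 3 million drivers instead of 2 million gives 114 Mbps for the same update interval.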
Servers estimation
Let's assume each server can handle 8,000 requests at a time.
In order to handle 20 million requests, we would need 20 million / 8,000 = 2,500 servers.
Components involved in our design:-
- Database — store metadata of riders, drivers and trips.
- Cache — store most requested data
- CDNs — effectively deliver content to end users, reducing delay and burden on end-servers.
- Load balancer — to distribute read/write requests among servers.
Workflow of the App
API Design
Update Driver location
- driverID
- old_latitude
- old_longitude
- new_latitude
- new_longitude
Find nearby drivers
- riderID
- lat
- long
Request a ride
- riderID
- lat
- long
- dropOffLat
- dropOffLong
- typeOfVehicle
Show Driver ETA
- driverID
- eta
Confirm pickup
- driverID
- riderID
- timestamp
Show Trip Updates
- tripID
- riderID
- driverID
- driverLat
- driverLong
- time_elapsed
- time_remaining
End Trip
- tripID
- driverID
- riderID
- time_elapsed
- lat
- long
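The parameter lists above can be written down as typed request payloads. The sketch below covers three of the calls and is only one possible shape: the class names, the snake_case field names, the use of strings for IDs and the coordinate range check are all assumptions made for this example, not part of the design above.

```python
from dataclasses import dataclass


def check_coordinates(lat: float, long: float) -> None:
    """Reject coordinates outside the valid latitude/longitude ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= long <= 180.0:
        raise ValueError(f"longitude out of range: {long}")


@dataclass(frozen=True)
class UpdateDriverLocation:
    driver_id: str
    old_latitude: float
    old_longitude: float
    new_latitude: float
    new_longitude: float

    def __post_init__(self) -> None:
        check_coordinates(self.old_latitude, self.old_longitude)
        check_coordinates(self.new_latitude, self.new_longitude)


@dataclass(frozen=True)
class FindNearbyDrivers:
    rider_id: str
    lat: float
    long: float

    def __post_init__(self) -> None:
        check_coordinates(self.lat, self.long)


@dataclass(frozen=True)
class RequestRide:
    rider_id: str
    lat: float
    long: float
    drop_off_lat: float
    drop_off_long: float
    type_of_vehicle: str

    def __post_init__(self) -> None:
        check_coordinates(self.lat, self.long)
        check_coordinates(self.drop_off_lat, self.drop_off_long)


# Example payload with made-up values.
ride = RequestRide("rider-1", 12.97, 77.59, 13.00, 77.60, "sedan")
print(ride)
```

Validating coordinates at the edge keeps malformed locations out of the location and matching services.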
High Level Detailed Design
Location Manager Service
Both riders and drivers connect to the Location Manager service. Riders connect to the Request Vehicle service, which in turn connects to the Location Manager, whose task is to find all the drivers in the nearby area.
This service also receives frequent updates about the drivers' locations. It stores this information in the QuadTree Map service and also saves it in the database for use by other services, such as the ETA calculation.
Quad Tree Map service
Let’s look at an example:-
Driver 1:- (0,2)
Driver 2:- (400,400)
Driver 3:- (200,200)
Driver 4:- (30, 40)
We will store the count of drivers in their corresponding blocks. This helps us achieve low latency when we want to retrieve all the drivers located near a rider.
Workflow:-
- When the service comes up for the first time, we split our 2D world map into 4 sub-parts and assign each (lat, long) region a unique segmentID, which we will use as the shard key.
- We can set a threshold like 5 miles or 10 miles and keep splitting the map until each segment reaches that size.
- Now, one question arises: what if some remote areas don't have many drivers? Instead of splitting everything up front, we can do it dynamically, and every time a region gets 20 drivers or more, we split that area into 4 more parts.
- Now, as soon as drivers log in, we will update their location in the QuadTree service.
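The dynamic splitting described in the workflow above can be sketched as a small in-memory quadtree. This is a minimal illustration, not the service itself: the class name `QuadTree`, the `min_size` guard (which stops splitting once a cell gets very small, so many drivers at the same point cannot split forever) and the flat x/y coordinates are assumptions made for the example.

```python
class QuadTree:
    """A region that splits into 4 sub-regions once it holds `capacity` drivers."""

    def __init__(self, x0, y0, x1, y1, capacity=20, min_size=1.0):
        self.bounds = (x0, y0, x1, y1)  # half-open: x0 <= x < x1, y0 <= y < y1
        self.capacity = capacity
        self.min_size = min_size
        self.drivers = {}   # leaf payload: driver_id -> (x, y)
        self.children = []  # four sub-regions once this region has split

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, driver_id, x, y):
        if not self._contains(x, y):
            return False
        if self.children:
            return any(c.insert(driver_id, x, y) for c in self.children)
        self.drivers[driver_id] = (x, y)
        x0, _, x1, _ = self.bounds
        if len(self.drivers) >= self.capacity and (x1 - x0) > self.min_size:
            self._split()
        return True

    def remove(self, driver_id, x, y):
        if not self._contains(x, y):
            return False
        if self.children:
            return any(c.remove(driver_id, x, y) for c in self.children)
        return self.drivers.pop(driver_id, None) is not None

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.min_size)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        pending, self.drivers = self.drivers, {}
        for driver_id, (x, y) in pending.items():
            self.insert(driver_id, x, y)  # re-route into the new sub-regions

    def query(self, qx0, qy0, qx1, qy1):
        """Return the IDs of drivers inside the inclusive search rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # search rectangle does not touch this region
        if self.children:
            return [d for c in self.children for d in c.query(qx0, qy0, qx1, qy1)]
        return [d for d, (x, y) in self.drivers.items()
                if qx0 <= x <= qx1 and qy0 <= y <= qy1]

    def count(self):
        if self.children:
            return sum(c.count() for c in self.children)
        return len(self.drivers)


# The four example drivers, with a capacity of 2 so that splits actually happen.
tree = QuadTree(0, 0, 512, 512, capacity=2)
tree.insert("driver1", 0, 2)
tree.insert("driver2", 400, 400)
tree.insert("driver3", 200, 200)
tree.insert("driver4", 30, 40)
print(sorted(tree.query(0, 0, 50, 50)))
```

A location update is then a `remove` at the old coordinates followed by an `insert` at the new ones, which matches the old and new coordinates carried by the Update Driver location call.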
Now, to keep lookups fast, let's calculate the size of our QuadTree in case we store it in each server's RAM.
There are 200 million unique places in the world; let's assume each lat, long and segmentID takes up 16 bytes in total.
With 5 million unique drivers, each driverID, lat and long will take up another 16 bytes.
Total data = 5 * 10⁶ * 16 + 200 * 10⁶ * 16 bytes ≈ 3.28 GB, which is small enough to fit easily on one server. For better availability, we can keep replicas of that server.
Request Vehicle Service
When the rider requests a vehicle, we take in the drop-off location and vehicle type, which are then fed into the Find Match service.
Find Match Service
Once we receive data from the Request Vehicle service, we use the Location Manager service to find drivers in the segment and send a new ride accept request to all of them.
Note that this step takes some time: we have to find suitable drivers, send them the notification, and then give them 1 minute to either accept or ignore the ride.
If a driver accepts, the Find Match service forwards the request to the Trip Manager service, which records the trip.
Trip Manager service
When a driver accepts the ride, the Find Match service forwards the request to Trip Manager, which inserts a new row with the riderID, driverID and rideID into the MySQL DB. The reason for using a relational database is that we can rely on its ACID properties. In this table we also mark the ride status as complete once the ride is finished, so that the driver's status becomes available again.
Trip Manager is also responsible for sending the request to the payment service, which can then call a third-party service such as Stripe to process the payment.
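As a rough illustration of why ACID guarantees help here, the sketch below records a trip when a driver accepts and marks it complete at the end. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for MySQL; the table name, column names, status values and function names are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trips (
        ride_id   TEXT PRIMARY KEY,
        rider_id  TEXT NOT NULL,
        driver_id TEXT NOT NULL,
        status    TEXT NOT NULL
    )
""")


def accept_ride(ride_id: str, rider_id: str, driver_id: str) -> bool:
    """Create the trip row; only the first driver to accept a ride succeeds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO trips (ride_id, rider_id, driver_id, status) "
                "VALUES (?, ?, ?, 'IN_PROGRESS')",
                (ride_id, rider_id, driver_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the primary key already exists: someone else accepted


def complete_ride(ride_id: str) -> bool:
    """Mark an in-progress trip as completed; returns False if nothing changed."""
    with conn:
        cursor = conn.execute(
            "UPDATE trips SET status = 'COMPLETED' "
            "WHERE ride_id = ? AND status = 'IN_PROGRESS'",
            (ride_id,),
        )
    return cursor.rowcount == 1


def driver_is_available(driver_id: str) -> bool:
    """A driver is available when they have no trip that is still in progress."""
    row = conn.execute(
        "SELECT COUNT(*) FROM trips WHERE driver_id = ? AND status != 'COMPLETED'",
        (driver_id,),
    ).fetchone()
    return row[0] == 0


first = accept_ride("ride-1", "rider-1", "driver-1")
second = accept_ride("ride-1", "rider-1", "driver-2")
print(first, second, driver_is_available("driver-1"))
```

Because the ride ID is the primary key, two drivers accepting the same ride at the same moment cannot both get it; the database rejects the second insert.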
Databases
There are two types of databases that we use for our application:-
- We can use a NoSQL database for storing the driver's last known location, as well as the payment and route information once the trip is completed, because this is a write-heavy workload.
- We can use MySQL once a rider and driver are matched and while the trip is in progress, as we need updates to the trip status to be consistent across the whole application.
ETA service
For the ETA service, the basic approach would be to apply an algorithm like Dijkstra's to find the shortest route between the driver and the rider's pickup point.
We can treat each road intersection as a node and each road between two intersections as an edge. For each road, we can use a third-party API to get the amount of traffic and use that value as the weight of the corresponding edge.
Now, obviously there are other factors to be considered like:-
- speed limit on the edge
- time of the day
- crossings
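The basic approach can be sketched with Dijkstra's algorithm over a small road graph. This is a minimal sketch under stated assumptions: the function name, the dictionary-based graph and the node names are made up, and each edge weight is assumed to already be a travel time in minutes with traffic and the other factors folded in.

```python
import heapq


def shortest_travel_time(graph, source, target):
    """Dijkstra's algorithm: smallest total edge weight from source to target.

    graph maps a node to {neighbour: weight}; weights must be non-negative.
    Returns None when the target cannot be reached.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        time_so_far, node = heapq.heappop(heap)
        if node == target:
            return time_so_far
        if time_so_far > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = time_so_far + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return None


# Made-up road graph: intersections A and B sit between the driver and the pickup.
roads = {
    "driver": {"A": 4, "B": 2},
    "B": {"A": 1, "pickup": 7},
    "A": {"pickup": 3},
}
eta_minutes = shortest_travel_time(roads, "driver", "pickup")
print(eta_minutes)
```

Here the quickest route is driver, B, A, pickup, even though it passes through more intersections than the direct route through A.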
Once we have our system in place, we can build a machine learning model that could predict ETA between two points for a given time.
Payment Service
For payment, we can use a third-party integration such as Stripe, and the payment transaction can be initiated after the completion of the trip.
Conclusion
Today we looked at how to design a ride sharing app like Uber.
Follow me for updates like this.
Connect with me on:-
Twitter 👦🏻:- https://twitter.com/kmmtmm92
Youtube 📹:- https://www.youtube.com/channel/UCpmw7QtwoHXV05D3NUWZ3oQ
Github 💭:- https://github.com/Kavit900
Instagram 📸:- https://www.instagram.com/code_with_kavit/