System Design: Uber

Kavit (zenwraight)
Nov 2, 2023

If you’d rather skip the reading, you can watch the 30-minute video instead.

What is Uber?

Uber is a ride-sharing app where a user can request a ride from one location to another, and the app connects the user with the nearest available Uber drivers.

Functional Requirements:-

  • The app should know the locations of all drivers who are currently logged in. (P0)
  • The app should be able to find nearby available drivers. (P0)
  • Users should be able to request a ride, after which the nearest drivers are notified. (P0)
  • At the start of the trip, the system should initiate the payment flow. (P1)
  • The system should be able to estimate the driver’s time of arrival (ETA). (P1)
  • Drivers should be able to confirm that they have picked up the rider. (P0)
  • After the ride has been accepted, the driver and rider should be able to view trip updates. (P0)
  • The driver should be able to mark the ride as completed and then become available for the next ride. (P0)

Non-functional Requirements:-

  • Availability: The system should be highly available.
  • Scalability: The system should scale to handle an ever-increasing number of drivers and riders.
  • Reliability: The system should provide fast and error-free services.
  • Consistency: The system must be strongly consistent from the perspective of both drivers and riders.
  • Fraud Detection: The system should be able to detect any fraudulent activity related to payments.

App Usage Estimation

Let’s assume that we have around 500 million riders and about 5 million drivers.

  • We have 20 million daily active users.
  • We have 2 million daily active drivers.
  • We have 20 million daily trips.
  • All active drivers send a notification about their current location every 4 seconds.
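The 4-second reporting interval dominates the write load. A quick sketch of the resulting update rate, using only the numbers above and assuming, as an upper bound, that all 2 million daily active drivers are online at the same time:

```python
# Location update rate implied by the usage assumptions above.
DAILY_ACTIVE_DRIVERS = 2_000_000  # 2 million daily active drivers
UPDATE_INTERVAL_S = 4             # each driver reports its location every 4 seconds


def location_updates_per_second(drivers: int, interval_s: float) -> float:
    """Upper bound: assumes every active driver is online simultaneously."""
    return drivers / interval_s


rate = location_updates_per_second(DAILY_ACTIVE_DRIVERS, UPDATE_INTERVAL_S)
print(f"{rate:,.0f} location updates per second")  # 500,000 location updates per second
```

In practice only a fraction of the daily active drivers are online at any one moment, so the real peak would be lower than this bound.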

Storage Estimation

Rider’s Metadata

  • UserId
  • User name
  • User email
  • User address

In total, we can approximate one user’s data at 1000 bytes.

With 500 million total users, the data will be around 500 * 10⁶ * 1000 bytes = 500 GB.

For newly registered users, let’s assume a generous 500K per day, which requires 500 MB of additional storage each day.

Driver’s Metadata

  • Driver Id
  • Driver name
  • Driver email
  • Driver Vehicle Type
  • Driver License number

In total, let’s approximate one driver’s metadata at around 1000 bytes.

With 5 million drivers, we would need 5 * 10⁶ * 1000 bytes = 5 GB of total storage. Additionally, if 100K new drivers register daily, we would need 100 MB of additional storage per day to store them.
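The metadata figures above can be reproduced with a few lines of arithmetic. This sketch simply multiplies counts by the assumed 1000-byte record size, in decimal units (1 GB = 10⁹ bytes), and it assumes the 500K new riders are a per-day figure, like the 100K new drivers.

```python
# Metadata storage estimates for riders and drivers (record size assumed above).
GB = 10**9
MB = 10**6
RECORD_BYTES = 1000  # ~1000 bytes per rider or driver record


def storage_bytes(count: int, record_bytes: int = RECORD_BYTES) -> int:
    """Total bytes needed to store `count` fixed-size records."""
    return count * record_bytes


riders_total = storage_bytes(500_000_000)  # 500 million riders
riders_daily = storage_bytes(500_000)      # 500K new riders per day (assumed daily)
drivers_total = storage_bytes(5_000_000)   # 5 million drivers
drivers_daily = storage_bytes(100_000)     # 100K new drivers per day

print(f"riders:  {riders_total / GB:.0f} GB total, +{riders_daily / MB:.0f} MB/day")
print(f"drivers: {drivers_total / GB:.0f} GB total, +{drivers_daily / MB:.0f} MB/day")
```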

Driver’s Location Metadata

We need to store the latitude and longitude of each driver plus a few other data points. Assuming around 36 bytes per driver, this comes to 5 * 10⁶ * 36 bytes = 180 MB of data.
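One way the 36-byte figure could break down is a fixed-size binary record. The field layout below (a 64-bit driver ID, two 64-bit float coordinates, a 64-bit Unix timestamp and a 32-bit status code) is purely an assumption chosen to land on 36 bytes; the article does not list the extra fields.

```python
import struct

# Hypothetical packed layout for one driver location record:
#   Q = uint64 driver_id, d = float64 latitude, d = float64 longitude,
#   Q = uint64 Unix timestamp (seconds), i = int32 status code.
# "<" means little-endian with no padding, so the size is exactly 8+8+8+8+4.
LOCATION_RECORD = struct.Struct("<QddQi")


def pack_location(driver_id: int, lat: float, lng: float, ts: int, status: int) -> bytes:
    """Serialize one location record into its 36-byte binary form."""
    return LOCATION_RECORD.pack(driver_id, lat, lng, ts, status)


def unpack_location(buf: bytes) -> tuple:
    """Inverse of pack_location."""
    return LOCATION_RECORD.unpack(buf)


record = pack_location(42, 37.7749, -122.4194, 1_698_883_200, 1)
print(LOCATION_RECORD.size)                      # 36
print(LOCATION_RECORD.size * 5_000_000 / 10**6)  # 180.0 (MB for 5 million drivers)
```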

Trip Metadata

We can assume 100 bytes of data will be required for each trip.

  • User Id
  • Driver Id
  • Trip Id

So, with 20 million daily trips, the total trip data per day will be 20 * 10⁶ * 100 bytes = 2 GB.

Total data in a year = 2 GB * 365 = 730 GB, which is roughly 1 TB.
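The same arithmetic for trips, as a small sketch using the assumed 100 bytes per trip and decimal units:

```python
# Trip metadata storage, using the ~100 bytes per trip assumed above.
TRIP_BYTES = 100
DAILY_TRIPS = 20_000_000
GB = 10**9

daily_bytes = DAILY_TRIPS * TRIP_BYTES
yearly_bytes = daily_bytes * 365

print(f"per day:  {daily_bytes / GB:.1f} GB")   # 2.0 GB
print(f"per year: {yearly_bytes / GB:.0f} GB")  # 730 GB, i.e. roughly 1 TB
```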

Bandwidth Estimation

Most of the bandwidth is consumed by location updates and trip data, so we will ignore all other factors in our bandwidth calculation.

We have 20 million rides in a day.

Total rides per second = 20 * 10⁶ / (24 * 3600) ≈ 230 trips per second.

Each trip takes around 100 bytes, so the trip bandwidth is 230 * 100 bytes * 8 bits ≈ 184 kbps, or roughly 200 kbps.

Each driver location update contains 3 bytes of driver ID and 16 bytes of location information. Assuming up to 3 million active drivers, each sending an update every 4 seconds:

= 3 * 10⁶ * (3 + 16) bytes * 8 bits / 4 s ≈ 114 Mbps, or roughly 120 Mbps
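Both bandwidth figures as a short sketch. It assumes, as in the usage estimates, that each driver sends one location update every 4 seconds; dividing by that interval is what brings 3 million drivers * 19 bytes * 8 bits (456 Mbps if every driver reported once per second) down to roughly 114 Mbps.

```python
# Bandwidth estimate for trip data and driver location updates.
SECONDS_PER_DAY = 24 * 3600


def trip_bandwidth_bps(trips_per_day: int, trip_bytes: int) -> float:
    """Average bits/s for trip records, spread evenly over a day."""
    trips_per_second = trips_per_day / SECONDS_PER_DAY
    return trips_per_second * trip_bytes * 8


def location_bandwidth_bps(drivers: int, record_bytes: int, interval_s: float) -> float:
    """Bits/s if every driver sends one record every `interval_s` seconds."""
    return drivers * record_bytes * 8 / interval_s


trip_bps = trip_bandwidth_bps(20_000_000, 100)
# 3-byte driver ID + 16 bytes of location, 3 million drivers, one update every 4 s
location_bps = location_bandwidth_bps(3_000_000, 3 + 16, 4)

print(f"trip data:        {trip_bps / 1e3:.0f} kbps")      # 185 kbps
print(f"location updates: {location_bps / 1e6:.0f} Mbps")  # 114 Mbps
```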

Servers estimation

Let’s assume each server can handle 8000 requests within a given time window.

To handle 20 million requests in that window, we would need 20 million / 8000 = 2500 servers.

Components involved in our design:-

  • Database: stores metadata of riders, drivers and trips.
  • Cache: stores the most frequently requested data.
  • CDNs: deliver content to end users efficiently, reducing latency and load on the origin servers.
  • Load balancer: distributes read/write requests among servers.

Workflow of the App

API Design

Update Driver location

  • driverID
  • old_latitude
  • old_longitude
  • new_latitude
  • new_longitude

Find nearby drivers

  • riderID
  • lat
  • long

Request a ride

  • riderID
  • lat
  • long
  • dropOffLat
  • dropOffLong
  • typeOfVehicle

Show Driver ETA

  • driverID
  • eta

Confirm pickup

  • driverID
  • riderID
  • timestamp

Show Trip Updates

  • tripID
  • riderID
  • driverID
  • driveLat
  • driveLong
  • time_elapsed
  • time_remaining

End Trip

  • tripID
  • driverID
  • riderID
  • time_elapsed
  • lat
  • long
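The endpoints above could be written down as typed request payloads. This is a hypothetical sketch covering a few of them: the class names, Python types, `VehicleType` values and timestamp units are assumptions; only the field names come from the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class VehicleType(Enum):
    # Hypothetical categories; the article only names the field "typeOfVehicle".
    ECONOMY = "economy"
    PREMIUM = "premium"
    XL = "xl"


@dataclass(frozen=True)
class UpdateDriverLocation:
    driverID: str
    old_latitude: float
    old_longitude: float
    new_latitude: float
    new_longitude: float


@dataclass(frozen=True)
class FindNearbyDrivers:
    riderID: str
    lat: float
    long: float


@dataclass(frozen=True)
class RequestRide:
    riderID: str
    lat: float
    long: float
    dropOffLat: float
    dropOffLong: float
    typeOfVehicle: VehicleType


@dataclass(frozen=True)
class ConfirmPickup:
    driverID: str
    riderID: str
    timestamp: int  # assumed: Unix epoch seconds


@dataclass(frozen=True)
class EndTrip:
    tripID: str
    driverID: str
    riderID: str
    time_elapsed: int  # assumed: seconds
    lat: float
    long: float


ride = RequestRide("rider-1", 37.77, -122.42, 37.80, -122.27, VehicleType.ECONOMY)
print(ride)
```

Carrying both the old and new coordinates in `UpdateDriverLocation` presumably lets the Location Manager remove the driver’s previous QuadTree entry without an extra lookup.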

High Level Detailed Design

Location Manager Service

Riders and drivers connect to the Location Manager service. Riders connect through the Request Vehicle service, which in turn calls the Location Manager, whose task is to find all the drivers in the nearby area.

This service also receives frequent updates about drivers’ locations. It stores this information in the QuadTree Map service and also saves it to the database, where other services, such as ETA calculation, can use it.

Quad Tree Map service

Let’s look at an example:-

Driver 1:- (0,2)

Driver 2:- (400,400)

Driver 3:- (200,200)

Driver 4:- (30, 40)

We store the count of drivers in each corresponding block. This helps us achieve low latency when retrieving all the drivers located near a rider.

Workflow:-

  1. When the service starts for the first time, we split our 2D world map into 4 sub-parts and assign each (lat, long) region a unique segmentID, which we use to shard the data.
  2. We can set a threshold, such as 5 or 10 miles, and keep splitting the map until each segment reaches that size.
  3. One question arises: what if some remote areas don’t have many drivers? Instead of splitting everything up front, we can split dynamically: every time a region holds 20 or more drivers, we split it into 4 smaller parts.
  4. As soon as drivers log in, we update their location in the QuadTree service.
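The workflow above can be sketched as a point QuadTree: a cell holds drivers until it reaches `capacity` (20 in step 3), then splits into four quadrants, and a query only descends into cells that overlap the search box. This is an in-memory, single-process sketch on a flat x/y plane with hypothetical class and method names; real lat/long geometry, segmentID-based sharding and replication are left out, and `min_size` is an assumed stand-in for the distance threshold in step 2 (it also prevents endless splitting when many drivers share one point).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Half-open rectangle [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def intersects(self, other: Box) -> bool:
        return (other.x0 < self.x1 and self.x0 < other.x1 and
                other.y0 < self.y1 and self.y0 < other.y1)


@dataclass
class QuadTree:
    bounds: Box
    capacity: int = 20     # split a cell once it holds this many drivers
    min_size: float = 1.0  # never split cells narrower than this
    drivers: dict[str, tuple[float, float]] = field(default_factory=dict)
    children: list[QuadTree] = field(default_factory=list)

    def insert(self, driver_id: str, x: float, y: float) -> bool:
        """Add a driver; returns False if the point lies outside this tree."""
        if not self.bounds.contains(x, y):
            return False
        if self.children:
            return self._child_for(x, y).insert(driver_id, x, y)
        self.drivers[driver_id] = (x, y)
        if (len(self.drivers) >= self.capacity
                and self.bounds.x1 - self.bounds.x0 > self.min_size):
            self._split()
        return True

    def remove(self, driver_id: str, x: float, y: float) -> bool:
        """Remove a driver using its last known (old) coordinates."""
        if not self.bounds.contains(x, y):
            return False
        if self.children:
            return self._child_for(x, y).remove(driver_id, x, y)
        return self.drivers.pop(driver_id, None) is not None

    def move(self, driver_id: str, old: tuple[float, float], new: tuple[float, float]) -> bool:
        """Handle a location update: drop the old entry, insert the new one."""
        self.remove(driver_id, *old)
        return self.insert(driver_id, *new)

    def query(self, area: Box) -> list[str]:
        """Return the IDs of all drivers whose position lies inside `area`."""
        if not self.bounds.intersects(area):
            return []
        found = [d for d, (x, y) in self.drivers.items() if area.contains(x, y)]
        for child in self.children:
            found.extend(child.query(area))
        return found

    def _child_for(self, x: float, y: float) -> QuadTree:
        return next(c for c in self.children if c.bounds.contains(x, y))

    def _split(self) -> None:
        b = self.bounds
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        self.children = [
            QuadTree(Box(b.x0, b.y0, mx, my), self.capacity, self.min_size),
            QuadTree(Box(mx, b.y0, b.x1, my), self.capacity, self.min_size),
            QuadTree(Box(b.x0, my, mx, b.y1), self.capacity, self.min_size),
            QuadTree(Box(mx, my, b.x1, b.y1), self.capacity, self.min_size),
        ]
        # Push this cell's drivers down into the new quadrants.
        for driver_id, (x, y) in self.drivers.items():
            self._child_for(x, y).insert(driver_id, x, y)
        self.drivers.clear()


if __name__ == "__main__":
    # The article's four example drivers on a 512 x 512 grid. The capacity is
    # lowered to 2 so that these few points are enough to trigger splits.
    tree = QuadTree(Box(0, 0, 512, 512), capacity=2)
    for driver_id, (x, y) in {"D1": (0, 2), "D2": (400, 400),
                              "D3": (200, 200), "D4": (30, 40)}.items():
        tree.insert(driver_id, x, y)
    print(sorted(tree.query(Box(0, 0, 100, 100))))  # ['D1', 'D4']
```

`move` takes the old and new coordinates, mirroring the old_latitude/old_longitude and new_latitude/new_longitude fields of the Update Driver location API. Empty quadrants are never merged back in this sketch; a production version would likely need that to keep the tree compact as drivers log off.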

To keep data fetching fast, let’s calculate the size of our QuadTree if we store it in each server’s RAM.

Assume there are 200 million unique places in the world, and that each place’s lat, long and segmentID together take up 16 bytes.

Assume 5 million unique drivers, where each driverID, lat and long together take up another 16 bytes.

Total data = 5 * 10⁶ * 16 + 200 * 10⁶ * 16 bytes ≈ 3.3 GB, which is small enough to fit on a single server; for better availability, we can replicate that server.

Request Vehicle Service

When the rider requests a vehicle, we take in the drop-off location and vehicle type and feed them into the Find Match service.

Find Match Service

Once we receive data from the Request Vehicle service, we use the Location Manager service to find the drivers in the rider’s segment and send a new ride request to all of them.

Note that it takes some time to find the appropriate drivers; we then send them a notification and give them 1 minute to either accept or ignore the ride.
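A minimal sketch of the 1-minute accept window with asyncio, assuming a hypothetical `ask` coroutine that notifies a single driver (for example via a push notification) and resolves to True when they accept. The offer goes to all candidates at once, the first acceptance wins and the offer is withdrawn from everyone else. The article does not say how near-simultaneous accepts are resolved, so this sketch just takes the first accepting task it inspects.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical notifier: offers ride `ride_id` to one driver and resolves to
# True if that driver accepts, False if they decline.
AskDriver = Callable[[str, str], Awaitable[bool]]


async def offer_ride(ride_id: str, driver_ids: list[str], ask: AskDriver,
                     timeout_s: float = 60.0) -> Optional[str]:
    """Offer the ride to all candidate drivers at once and return the first
    driver who accepts within `timeout_s` seconds, or None if nobody does."""
    tasks = {asyncio.create_task(ask(d, ride_id)): d for d in driver_ids}
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    pending = set(tasks)
    winner: Optional[str] = None
    try:
        while pending and winner is None:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break  # the accept window has closed
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # A failed notification counts as "no answer" from that driver.
                if task.exception() is None and task.result():
                    winner = tasks[task]
                    break
    finally:
        # Withdraw the offer from everyone who has not answered yet.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return winner
```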

If a driver accepts, the Find Match service passes the riderID, driverID and rideID on to the Trip Manager service, which records the match.

Trip Manager service

When the Find Match service forwards an accepted ride, the Trip Manager inserts a new row with the riderID, driverID and rideID into a MySQL database. We use a relational database so that we can rely on its ACID properties. Once the ride is over, the Trip Manager marks the ride status as complete in this table, so that the driver becomes available again.
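The ACID argument is easiest to see with both writes in one transaction: claiming the driver and inserting the trip row either both happen or neither does, so two riders cannot be matched to the same driver. This sketch uses Python’s built-in `sqlite3` module as a stand-in for MySQL, and the table and column names are hypothetical; the same pattern (a conditional UPDATE checked via its affected-row count, inside a transaction) should carry over to MySQL.

```python
import sqlite3

# Hypothetical minimal schema; sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE drivers (driver_id TEXT PRIMARY KEY, available INTEGER NOT NULL);
CREATE TABLE trips (
    ride_id   TEXT PRIMARY KEY,
    rider_id  TEXT NOT NULL,
    driver_id TEXT NOT NULL REFERENCES drivers(driver_id),
    status    TEXT NOT NULL
);
"""


def create_trip(conn: sqlite3.Connection, ride_id: str, rider_id: str, driver_id: str) -> bool:
    """Atomically claim an available driver and insert the trip row.
    Returns False (changing nothing) if the driver is no longer available;
    any error during the insert rolls back the driver claim as well."""
    with conn:  # commits on success, rolls back if an exception escapes
        claimed = conn.execute(
            "UPDATE drivers SET available = 0 WHERE driver_id = ? AND available = 1",
            (driver_id,),
        ).rowcount
        if claimed != 1:
            return False
        conn.execute(
            "INSERT INTO trips (ride_id, rider_id, driver_id, status) "
            "VALUES (?, ?, ?, 'in_progress')",
            (ride_id, rider_id, driver_id),
        )
    return True


def complete_trip(conn: sqlite3.Connection, ride_id: str) -> None:
    """Mark the trip complete and make its driver available again, atomically."""
    with conn:
        row = conn.execute(
            "SELECT driver_id FROM trips WHERE ride_id = ? AND status = 'in_progress'",
            (ride_id,),
        ).fetchone()
        if row is None:
            raise ValueError(f"no in-progress trip {ride_id!r}")
        conn.execute("UPDATE trips SET status = 'complete' WHERE ride_id = ?", (ride_id,))
        conn.execute("UPDATE drivers SET available = 1 WHERE driver_id = ?", (row[0],))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO drivers VALUES ('d1', 1)")
    print(create_trip(conn, "ride-1", "rider-1", "d1"))  # True
    print(create_trip(conn, "ride-2", "rider-2", "d1"))  # False: d1 is busy
    complete_trip(conn, "ride-1")
    print(create_trip(conn, "ride-3", "rider-3", "d1"))  # True: d1 is free again
```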

The Trip Manager is also responsible for sending a request to the Payment service, which can then call a third-party service such as Stripe to process the payment.

Databases

We use two types of databases in our application:-

  1. We can use a NoSQL database to store each driver’s last known location, as well as payment information and route information after a trip is completed, because these workloads are write-heavy.
  2. We can use MySQL when a rider and driver are matched and while the trip is taking place, as we need to make sure that updates to the trip status are consistent across the whole application.

ETA service

For the ETA service, the basic approach would be to apply an algorithm like Dijkstra’s to find the shortest route between the driver and the pickup location.

Treating each road intersection as a node, we can use a third-party API to get the amount of traffic on each road and use that value as the weight of the edge between the two nodes it connects.
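A minimal sketch of that idea: each intersection is a node, each road is an edge whose weight is a traffic-adjusted travel time, and Dijkstra’s algorithm (with a binary heap) finds the cheapest route. The graph and weights in the example are made up, and fetching live traffic from a third-party API is out of scope here.

```python
import heapq
from typing import Dict, List, Tuple

# Road graph: intersection -> list of (neighbouring intersection, edge weight).
# The weight stands in for traffic-adjusted travel time and must be non-negative.
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_time(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: return (total weight, path) from source to target,
    or (inf, []) if the target is unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return float("inf"), []


roads = {
    "driver": [("A", 4.0), ("B", 2.0)],
    "B": [("A", 1.0), ("pickup", 7.0)],
    "A": [("pickup", 3.0)],
}
print(shortest_time(roads, "driver", "pickup"))  # (6.0, ['driver', 'B', 'A', 'pickup'])
```

If the edge weights are travel times in minutes, the returned total is directly the ETA estimate; other weightings would need a conversion step.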

Of course, there are other factors to consider, such as:-

  • speed limit on the edge
  • time of the day
  • crossings

Once our system is in place, we can build a machine learning model that predicts the ETA between two points at a given time.

Payment Service

For payments, we can use third-party services such as Stripe, and the payment transaction could be initiated after the trip is completed.

Conclusion

Today we looked at how to design a ride-sharing system like Uber.

Follow me for updates like this.

Connect with me on:-

Twitter 👦🏻:- https://twitter.com/kmmtmm92

Youtube 📹:- https://www.youtube.com/channel/UCpmw7QtwoHXV05D3NUWZ3oQ

Github 💭:- https://github.com/Kavit900

Instagram 📸:- https://www.instagram.com/code_with_kavit/
