Photo credit: Wikimedia Commons

Identifying locations for a car franchise in Mumbai using Data Science

The scope of my capstone study is to identify optimal showroom locations for a car franchise in Mumbai. I worked in Python with the Foursquare API and other publicly available data. Since no ready-made dataset existed, I scraped the web for the relevant data. I then used the k-means clustering algorithm, choosing a feasible number of clusters and computing their cluster centers. The result is 3 sets of geographical coordinates, each in a locality suited to a mid-range car showroom. Exploring the surroundings of these coordinates with the Foursquare API corroborated their viability.

Mumbai City is huge (approximately 233 square miles) and densely populated (approximately 1.88 crore people, i.e. 18.8 million). Unlike Tier-B cities, it has pockets of diverse income groups: low, medium and high. Because there was no free dataset on the demographics of Mumbai by locality and area, the challenge was to segment the localities by the purchasing power of their residents. Hence, I used real estate prices as a proxy: the higher the real estate price, the higher the cost of maintaining a home there, which suggests higher purchasing power. Before building the model, I computed the correlation between property rates and rental rates. It was positive at 0.87, indicating that property prices and rents point to the same purchasing power, whether one is buying or renting.
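To make the correlation check concrete, here is a minimal sketch of a Pearson correlation between property rates and rents. The numbers are hypothetical placeholders, not the scraped 99acres.com figures, and the helper name `pearson` is my own; the notebook may well have used a library call instead.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values, for illustration only.
property_rate = [12000, 18000, 25000, 40000, 52000]  # Rs. per sq. ft.
monthly_rent = [30000, 42000, 70000, 95000, 150000]  # Rs. per month

r = pearson(property_rate, monthly_rent)
print(round(r, 2))
```

A value close to 1 means localities rank almost the same way by purchase price as by rent, which is what justifies using either one as the proxy.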

I used the Foursquare API and data from 99acres.com to prepare the dataset. The Foursquare API provided the top 100 ‘most popular’ venues (Figure 1) in Mumbai, along with their names, addresses, postal codes and geographical coordinates. I scraped publicly available data from 99acres.com for real estate prices per square foot and rental rates. To merge the two datasets, I also scraped a list of postal codes and locality names from the web. The locality names served as the key for the merge.
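The merge described above can be sketched roughly as follows: each venue is mapped to a locality through its postal code, and the locality name is then used as the key to look up the property rate. All records, field names and the `norm` helper are hypothetical placeholders; I am also assuming that locality names need light normalization before they match.

```python
def norm(name):
    """Normalize a locality name so minor spacing/case differences still match."""
    return " ".join(name.lower().split())

# All records below are hypothetical placeholders, not the scraped data.
venues = [
    {"venue": "Venue A", "postal_code": "400001", "lat": 19.01, "lng": 72.81},
    {"venue": "Venue B", "postal_code": "400002", "lat": 19.02, "lng": 72.82},
    {"venue": "Venue C", "postal_code": "400003", "lat": 19.03, "lng": 72.83},
]
postal_to_locality = {"400001": "Locality One", "400002": "Locality Two"}
rates = [
    {"locality": "locality one ", "rate_per_sqft": 42000},
    {"locality": "Locality  Two", "rate_per_sqft": 12000},
]

# Index the rates by normalized locality name, the key for the merge
rate_by_locality = {norm(r["locality"]): r["rate_per_sqft"] for r in rates}

merged = []
for v in venues:
    locality = postal_to_locality.get(v["postal_code"])
    if locality is None:
        continue  # no locality known for this postal code
    rate = rate_by_locality.get(norm(locality))
    if rate is None:
        continue  # no property rate known for this locality
    merged.append({**v, "locality": locality, "rate_per_sqft": rate})

for row in merged:
    print(row["venue"], row["locality"], row["rate_per_sqft"])
```

This behaves like an inner join: venues whose postal code or locality cannot be matched are dropped rather than kept with missing values.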

After data wrangling, I decided which localities to keep for the study. The assumption: localities where the property rate is below Rs. 15,000 per sq. ft. ($206) can be removed from the dataset. The reasoning: a person who can afford to buy or rent a property at or above that rate will have the capacity to purchase a mid-range car, given the inclination. As a result, 4 localities were eliminated altogether. In practice too, these localities would be off the mark for such a showroom.
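The cutoff rule is simple enough to show in a few lines. This sketch uses hypothetical localities and rates; only the Rs. 15,000 threshold comes from the text above, and the constant name is my own.

```python
MIN_RATE_PER_SQFT = 15000  # Rs. per sq. ft., the cutoff stated in the article

# Hypothetical localities and rates, for illustration only.
localities = [
    {"locality": "Locality One", "rate_per_sqft": 42000},
    {"locality": "Locality Two", "rate_per_sqft": 12000},
    {"locality": "Locality Three", "rate_per_sqft": 15000},
]

# Keep localities at or above the cutoff; only rates strictly below it are removed
kept = [row for row in localities if row["rate_per_sqft"] >= MIN_RATE_PER_SQFT]
dropped = [row for row in localities if row["rate_per_sqft"] < MIN_RATE_PER_SQFT]

print([row["locality"] for row in kept])
print([row["locality"] for row in dropped])
```

Note that a locality sitting exactly at the cutoff is kept, since the rule removes only rates below Rs. 15,000.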

On this scrubbed dataset, I used k-means, a centroid-based clustering model. K-means is appropriate here because it works on unlabelled data, detects groupings and identifies the center of each cluster. Combined with the elbow method, it also helps choose a suitable number of clusters.

For the model, I used the geographical coordinates of the venues and the mean real estate rate per sq. ft. I used the elbow method (Figure 2.a) to choose the number of clusters the model should create. The algorithm then generated the cluster centers (Figure 2.b), which are the geographical coordinates and the central tendency of the localities’ property rates, and I mapped them (Figure 3).
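For readers who want to see the mechanics, here is a minimal, dependency-free sketch of k-means (Lloyd's algorithm) with an elbow scan over k. It is not the notebook's code: the data points are hypothetical, the function names are my own, and standardizing the features is my assumption, since the article does not say whether latitude, longitude and rate were scaled before clustering.

```python
import random

def standardize(rows):
    """Scale each column to zero mean and unit variance (assumed preprocessing)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    scaled = [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]
    return scaled, means, stds

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia

def best_kmeans(points, k, restarts=10):
    """Run several random starts and keep the lowest-inertia result."""
    return min((kmeans(points, k, seed=s) for s in range(restarts)), key=lambda r: r[1])

# Hypothetical (latitude, longitude, mean rate per sq. ft.) rows, for illustration only.
raw = [
    (19.060, 72.860, 40000), (19.070, 72.870, 42000), (19.065, 72.865, 41000),
    (19.070, 72.830, 55000), (19.075, 72.835, 57000), (19.072, 72.832, 56000),
    (19.090, 72.910, 20000), (19.085, 72.905, 21000), (19.088, 72.908, 19000),
]
points, means, stds = standardize(raw)

# Elbow scan: inertia (within-cluster sum of squares) for each candidate k
inertias = {k: best_kmeans(points, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))

# Cluster centers for the chosen k, converted back to original units
centers_scaled, _ = best_kmeans(points, 3)
centers = [tuple(c * s + m for c, m, s in zip(center, means, stds))
           for center in centers_scaled]
```

The elbow is the value of k after which the inertia stops dropping sharply; it is read off the plotted curve rather than computed exactly. Without scaling, the rate (in tens of thousands) would dominate the distance and the coordinates would barely influence the clusters.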

The resulting coordinates (Figure 4) fall in Bandra Kurla Complex (Bandra East), Khar West and Ghatkopar West. Exploring a 500 m radius around each of them with the Foursquare API showed that they are surrounded by commercial and residential complexes, shopping complexes, cinema halls and cafes. A franchise set up at these locations is therefore likely to attract a lot of footfall.
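In the study the radius search was handled by the Foursquare API itself. The sketch below only illustrates what "within 500 m" means geometrically, using the haversine great-circle distance on hypothetical coordinates; the function name, the venue list and the Earth-radius constant are my own choices.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (common approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Hypothetical cluster center and venues, for illustration only.
center = (19.0600, 72.8300)
venues = [
    ("Venue A", 19.0630, 72.8300),  # roughly 330 m north of the center
    ("Venue B", 19.0600, 72.8340),  # roughly 420 m east of the center
    ("Venue C", 19.0700, 72.8300),  # roughly 1.1 km north of the center
]

RADIUS_M = 500
nearby = [name for name, lat, lng in venues
          if haversine_m(center[0], center[1], lat, lng) <= RADIUS_M]
print(nearby)
```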

Selecting a mid-range branded car is a well-researched, practical decision, but it also requires enough purchasing power to end in a purchase; the locations above indicate just that.

For the code on GitHub, please click the link here. Use nbviewer (jupyter.org) to view the maps.




Ex-Software Developer, MBA, Data Analyst Enthusiast. Bring together business focus and data skills

Priya Yogendra Rana
