How to Scrape Flipkart Product Data with Beautiful Soup and Python?

May 14, 2021

In this blog, we will see how to scrape Flipkart product data using BeautifulSoup and Python in a simple and elegant manner.

The goal of this blog is to get you started on solving a practical problem while keeping things simple, so that you get familiar, practical results as quickly as possible.

The first thing you need is Python 3. If you don't have it yet, install Python 3 before you proceed. Then install Beautiful Soup:

pip3 install beautifulsoup4

We will also need the requests, lxml, and soupsieve libraries to fetch the data, parse it as HTML or XML, and use CSS selectors. Install them using:

pip3 install requests soupsieve lxml

Once they are installed, open your editor and type in:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Now visit the Flipkart listing page and examine what information we can extract.

Now let's get back to the code. Let's fetch the data while pretending to be a browser, like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import re

# Send a browser-like User-Agent so the request looks like it comes from a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.flipkart.com/mobile-accessories/power-banks/pr?sid=tyy,4mr,fu6&otracker=categorytree&otracker=nmenu_sub_Electronics_0_Power Banks'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
# Print the parsed page so we can inspect what came back
print(soup.prettify())

Save this as scrapeFlipkart.py and run it:

python3 scrapeFlipkart.py

You will be able to see the full HTML page.
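Before parsing, it can help to confirm that the request actually succeeded, since a blocked or failed request would otherwise leave you parsing an error page. The following is a minimal sketch and not part of the original script: the helper name fetch_html, the 10-second timeout, and the optional session parameter are assumptions made for illustration.

```python
import requests

# Same browser-like User-Agent as in the script above.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) "
        "AppleWebKit/601.3.9 (KHTML, like Gecko) "
        "Version/9.0.2 Safari/601.3.9"
    )
}


def fetch_html(url, session=None, timeout=10):
    """Hypothetical helper: download a page and return its HTML as text."""
    # Use the given session if there is one, otherwise the requests module itself.
    client = session if session is not None else requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    # Raise an error for 4xx/5xx responses instead of silently continuing.
    response.raise_for_status()
    return response.text
```

The returned text can be passed to BeautifulSoup in the same way as response.content in the script above.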

Now, use CSS selectors to get the data you need. To do that, open the page in Chrome and use the Inspect tool.

We observe that each product's information is contained in an element with the attribute data-id. You will also notice that the attribute value is some gibberish that keeps changing. But the hint is the presence of the data-id attribute itself. That is all we need. So let's scrape that.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import re

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.flipkart.com/mobile-accessories/power-banks/pr?sid=tyy,4mr,fu6&otracker=categorytree&otracker=nmenu_sub_Electronics_0_Power Banks'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Every product container carries a data-id attribute
for item in soup.select('[data-id]'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        # raise e
        pass

This will print the content of each of the containers that hold the product information.
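To see the attribute-presence selector in isolation, here is a small sketch that runs without a network connection. The markup and the data-id values are invented for illustration, and it uses Python's built-in html.parser rather than lxml so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the shape of a listing page.
SAMPLE_HTML = """
<div data-id="PWBX1A2B3C"><a href="/p/one">Power bank one</a></div>
<div data-id="PWBZ9Y8X7W"><a href="/p/two">Power bank two</a></div>
<div class="banner">Not a product</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [data-id] matches any element that has the attribute, whatever its value is.
product_cards = soup.select("[data-id]")
product_ids = [card["data-id"] for card in product_cards]

print(len(product_cards))
print(product_ids)
```

Because the selector tests only for the presence of the attribute, it keeps working even when the values change from one page load to the next.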

Now let's work on each field we need. This is tricky because Flipkart's HTML has no meaningful CSS class names we can use. So we will resort to a few tricks that are dependable.

print(item.select('a img')[0]['alt'])
print(item.select('a')[0]['href'])

The lines above give us the product name (taken from the image's alt text) and the URL of the listing.
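Links on listing pages are often relative paths rather than full URLs. Assuming that is the case here, a short sketch using the standard library's urljoin can turn an href into an absolute URL. The helper name absolute_url and the example path are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.flipkart.com"


def absolute_url(href):
    """Hypothetical helper: resolve an href against the site root."""
    # urljoin leaves already-absolute URLs unchanged.
    return urljoin(BASE_URL, href)


# Invented relative path, for illustration only.
relative = "/example-power-bank/p/itm000?pid=EXAMPLE"
full = absolute_url(relative)
print(full)
```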

For the rating, we can use the *= operator to select any element whose id contains the term productRating, like this:

print(item.select('[id*=productRating]')[0].get_text().strip())
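The *= operator is a substring match on the attribute value. The sketch below shows it on invented markup (the id values are made up), again using the built-in html.parser so it runs offline. It also guards against a product that has no rating yet, which would otherwise raise an IndexError.

```python
from bs4 import BeautifulSoup

# Made-up markup: only the first span has "productRating" inside its id.
SAMPLE_HTML = """
<div data-id="PWBX1A2B3C">
  <span id="productRating_abc123_xyz"> 4.3 </span>
  <span id="somethingElse">ignored</span>
</div>
"""

item = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one("[data-id]")

# [id*=productRating] matches any element whose id contains that substring.
matches = item.select("[id*=productRating]")
rating = matches[0].get_text().strip() if matches else None
print(rating)
```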

Extracting the price is more interesting, as it does not have a visible ID or class name to use as a hint. But it always contains the currency symbol ₹.

prices = item.find_all(string=re.compile('₹'))
print(prices[0])

We do the same to get the discount rates.

discounts = item.find_all(string=re.compile('off'))
print(discounts[0])
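The text search returns strings such as a price with a currency symbol or a discount label. If you want numbers rather than text, a sketch along these lines may help. The sample markup, the helper names to_rupees and to_percent, and the assumption that prices are whole rupees are all illustrative.

```python
import re
from bs4 import BeautifulSoup

# Made-up product container with two prices and one discount label.
SAMPLE_HTML = """
<div data-id="PWBX1A2B3C">
  <div>₹1,299</div>
  <div>₹2,499</div>
  <div><span>48% off</span></div>
</div>
"""

item = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one("[data-id]")

# Find text nodes that contain the rupee symbol or the word "off".
prices = item.find_all(string=re.compile("₹"))
discounts = item.find_all(string=re.compile("off"))


def to_rupees(text):
    """Hypothetical helper: keep only the digits (assumes whole rupees)."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def to_percent(text):
    """Hypothetical helper: read the number in front of the % sign."""
    match = re.search(r"(\d+)\s*%", text)
    return int(match.group(1)) if match else None


print(to_rupees(prices[0]), to_percent(discounts[0]))
```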

Putting everything together:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import re

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.flipkart.com/mobile-accessories/power-banks/pr?sid=tyy,4mr,fu6&otracker=categorytree&otracker=nmenu_sub_Electronics_0_Power Banks'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('[data-id]'):
    try:
        print('----------------------------------------')
        # print(item)
        print(item.select('a img')[0]['alt'])    # product name
        print(item.select('a')[0]['href'])       # product URL
        print(item.select('[id*=productRating]')[0].get_text().strip())
        prices = item.find_all(string=re.compile('₹'))
        print(prices[0])
        discounts = item.find_all(string=re.compile('off'))
        print(discounts[0])
    except Exception as e:
        # Skip items that are missing one of the fields
        # raise e
        pass

If you run it, it will print all the required details.
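In the script above, a single try/except around all the fields means that a product missing one field, for example one with no rating yet, stops printing partway through. One possible variation, sketched here on invented markup, collects each field separately and stores None when a field is absent. The function names and dictionary keys are assumptions, not part of the original script.

```python
import re
from bs4 import BeautifulSoup

# Made-up listing: the second product has no rating and no discount.
SAMPLE_HTML = """
<div data-id="A1">
  <a href="/p/one"><img alt="Power Bank One" src="x.jpg"></a>
  <span id="productRating_A1">4.3</span>
  <div>₹1,299</div>
  <div>48% off</div>
</div>
<div data-id="B2">
  <a href="/p/two"><img alt="Power Bank Two" src="y.jpg"></a>
  <div>₹899</div>
</div>
"""


def first_or_none(values):
    """Return the first element of a list, or None if the list is empty."""
    return values[0] if values else None


def extract_product(item):
    """Hypothetical helper: build one record per product container."""
    image = first_or_none(item.select("a img"))
    link = first_or_none(item.select("a"))
    rating = first_or_none(item.select("[id*=productRating]"))
    price = first_or_none(item.find_all(string=re.compile("₹")))
    discount = first_or_none(item.find_all(string=re.compile("off")))
    return {
        "name": image.get("alt") if image is not None else None,
        "url": link.get("href") if link is not None else None,
        "rating": rating.get_text().strip() if rating is not None else None,
        "price": str(price).strip() if price is not None else None,
        "discount": str(discount).strip() if discount is not None else None,
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
products = [extract_product(item) for item in soup.select("[data-id]")]
for product in products:
    print(product)
```

Keeping the records as dictionaries also makes it straightforward to write them to CSV or JSON later.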

If you need to scale up your scraping and don't want to set up your own infrastructure, you can use our Flipkart product data crawler to scrape millions of URLs at high speed.

If you are looking for the best Flipkart Data Scraping Services, then you can contact Scraping Intelligence for all your queries.
