There are many circumstances in which a developer may need to work with Google (Maps) reviews. Everyone familiar with the Google My Business API knows that an account ID is required for each location (business) in order to collect its reviews. Scraping Google reviews is useful when a developer wants to work with reviews from many different locations, or does not have access to a Google Business account.
This blog will show you how to use Selenium and BeautifulSoup to collect Google reviews.
Web scraping is the act of extracting information from the web, and various Python libraries can assist you with this, including Selenium and BeautifulSoup. We'll use Selenium to navigate the page and load more content, and then parse the resulting HTML with BeautifulSoup to extract the reviews.
Selenium can be installed with pip or conda (package managers):
#Installing with pip
pip install selenium

#Installing with conda
conda install -c conda-forge selenium
To communicate with a particular browser, Selenium requires a driver (for example, ChromeDriver for Chrome).
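If the driver executable is not on your PATH, you can point Selenium at it explicitly. A minimal sketch, assuming you have downloaded ChromeDriver; the path below is a placeholder, not a real location:

from selenium import webdriver

# Placeholder path -- replace with wherever you saved ChromeDriver
# (executable_path is the Selenium 3 style used throughout this post)
driver = webdriver.Chrome(executable_path='/path/to/chromedriver')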
BeautifulSoup will be used to parse the HTML and retrieve the information we need (in our case: review text, reviewer, date, and so on).
BeautifulSoup must also be installed:
#Installing with pip
pip install beautifulsoup4

#Installing with conda
conda install -c anaconda beautifulsoup4
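As a quick illustration of the parsing step, here is a toy example using a made-up HTML snippet (not Google's actual markup):

from bs4 import BeautifulSoup

# A made-up snippet standing in for a review element
html = '<div class="review"><span class="text">Great museum!</span></div>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('span', class_='text').text)  # prints: Great museum!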
To get started, import and initialize the web driver. Then, using the get method, pass in the Google Maps URL of the place whose reviews we want:
from selenium import webdriver

driver = webdriver.Chrome()

#London Victoria & Albert Museum URL
url = 'https://www.google.com/maps/place/Victoria+and+Albert+Museum/@51.4966392,-0.17218,15z/data=!4m5!3m4!1s0x0:0x9eb7094dfdcd651f!8m2!3d51.4966392!4d-0.17218'
driver.get(url)
Before reaching the actual page we requested through the url variable, the web driver will most likely encounter Google's cookie-consent page. If so, we can proceed by clicking the "I agree" button.
To copy the button's XPath, right-click anywhere on the page and choose Inspect; once the source panel appears on the right side, right-click the "I agree" button's element in the code and choose Copy > Copy XPath.
Selenium provides numerous ways to locate elements on a page; in this case, I used find_element_by_xpath():
driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button').click()

#to make sure content is fully loaded we can use time.sleep() after navigating to each page
import time
time.sleep(3)
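Note that the consent page does not appear in every region or session, and a copied XPath can change whenever Google updates the page. One way to keep the script from crashing when the button is absent is to wrap the click in a try/except; a sketch, reusing the XPath copied above:

try:
    # Consent button XPath copied from the inspector; may change over time
    driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button').click()
    time.sleep(3)
except Exception:
    # No consent page was shown -- continue to the requested page
    pass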
On Google Maps there are a few different types of profile pages. In many cases the location will be presented alongside a lot of other locations listed on the left side, and there will almost certainly be some paid ads displayed at the top of that list. The following URL, for example, would take us to such a page:
url = 'https://www.google.com/maps/search/bicycle+store/@51.5026862,-0.1430242,13z/data=!3m1!4b1'
To avoid getting stuck on one of these layouts or loading the wrong page, we will add some error-handling code and then move on to loading the reviews:
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

try:
    driver.find_element(By.CLASS_NAME, "widget-pane-link").click()
except Exception:
    response = BeautifulSoup(driver.page_source, 'html.parser')
    # Check if there are any paid ads and avoid them
    if response.find_all('span', {'class': 'ARktye-badge'}):
        ad_count = len(response.find_all('span', {'class': 'ARktye-badge'}))
        li = driver.find_elements(By.CLASS_NAME, "a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd")
        li[ad_count].click()
    else:
        driver.find_element(By.CLASS_NAME, "a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd").click()
    time.sleep(5)
    driver.find_element(By.CLASS_NAME, "widget-pane-link").click()
The code above does the following steps:

- Tries to click the reviews link ("widget-pane-link") directly, which works when a single location page has loaded.
- If that fails, parses the page source with BeautifulSoup, counts any paid-ad badges at the top of the results list, and clicks the first organic (non-ad) result below them.
- Waits for the location page to load, then clicks its reviews link.
This should take us to the page where the reviews can be found. However, the initial load only delivers ten reviews, with another ten added on each subsequent scroll. To retrieve all of the reviews for the location, we will calculate how many times we need to scroll and then use the Chrome driver's execute_script() method.
#Find the total number of reviews
total_number_of_reviews = driver.find_element_by_xpath('//*[@id="pane"]/div/div[1]/div/div/div[2]/div[2]/div/div[2]/div[2]').text.split(" ")[0]
total_number_of_reviews = int(total_number_of_reviews.replace(',','')) if ',' in total_number_of_reviews else int(total_number_of_reviews)

#Find scroll layout
scrollable_div = driver.find_element_by_xpath('//*[@id="pane"]/div/div[1]/div/div/div[2]')

#Scroll as many times as necessary to load all reviews
for i in range(0, (round(total_number_of_reviews/10 - 1))):
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
    time.sleep(1)
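The hard-coded XPath for the review count is fragile. If it cannot be read reliably, an alternative (a sketch of my own, not part of the original approach) is to keep scrolling until the container's scrollHeight stops growing:

# Alternative: scroll until no new reviews load (scrollHeight stops changing)
last_height = driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
while True:
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
    time.sleep(1)
    new_height = driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
    if new_height == last_height:
        break
    last_height = new_height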
We can now parse the loaded reviews and extract the information we want (reviewer, review text, review rating, and so on). Simply find the class name of the outer element rendered inside the scroll layout, which holds all the data for an individual review, and use it to extract a list of reviews.
response = BeautifulSoup(driver.page_source, 'html.parser')
reviews = response.find_all('div', class_='ODSEW-ShBeI NIyLF-haAclf gm2-body-2')
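These class names are Google's obfuscated, auto-generated ones, so they may change over time. A quick sanity check confirms the selector still matches before we build the extraction function:

# Sanity check: how many review elements did we capture?
print(len(reviews))                  # should roughly match the total review count
print(reviews[0].prettify()[:300])   # peek at the first element's markup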
We can now write a function to extract the useful data from the result set produced by the HTML parsing. The code below accepts the result set and returns a Pandas DataFrame with the extracted data in Review Rate, Review Time, and Review Text columns.
import pandas as pd

def get_review_summary(result_set):
    rev_dict = {'Review Rate': [],
                'Review Time': [],
                'Review Text': []}
    for result in result_set:
        review_rate = result.find('span', class_='ODSEW-ShBeI-H1e3jb')["aria-label"]
        review_time = result.find('span', class_='ODSEW-ShBeI-RgZmSc-date').text
        review_text = result.find('span', class_='ODSEW-ShBeI-text').text
        rev_dict['Review Rate'].append(review_rate)
        rev_dict['Review Time'].append(review_time)
        rev_dict['Review Text'].append(review_text)
    return pd.DataFrame(rev_dict)
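Putting it all together, a minimal usage sketch (the output filename is arbitrary):

df = get_review_summary(reviews)
print(df.head())

# Persist the extracted reviews; the filename is just an example
df.to_csv('victoria_albert_reviews.csv', index=False)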
This blog provided a basic overview of scraping Google reviews with Python, along with a simple example. Data sourcing can require complicated collection methods, and it can be a time-consuming and costly task if the proper tools are not used.
If you are looking for web scraping services or want to extract Google Reviews using Selenium and BeautifulSoup, contact Scraping Intelligence today or request a quote!