
How To Scrape And Analyze Competitor Pricing Data?


Category
Services
Publish Date
March 25, 2025
Author
Scraping Intelligence

In today's fast-moving business world, staying competitive requires tracking price fluctuations regularly so your strategies stay current. Gathering pricing information manually, however, consumes time and effort that a growing business can rarely afford.

    What Is Competitor Pricing Data Scraping?

    Competitor pricing data scraping is the ethical collection of real-time pricing information from target platforms such as websites, ecommerce stores, and marketplaces. The goal is to help businesses monitor and analyze the prices of the similar products and services their competitors offer.

    With that data, it becomes easier to make informed pricing decisions that strike the right balance and protect profit margins. And with the industry's dynamic pricing models, you have a better chance of beating the competition by delivering affordable services to your target customers.

    How To Scrape Competitor Pricing Data With Python?

    Real-time data analysis lets business strategy continuously shift toward whatever works best. Here are some simple steps to scrape and compare prices for the same product across different platforms:

    Project Setup

    We need the following libraries for the scraper:

    httpx

    This is used for sending HTTP requests to the webpages and gathering data as HTML.

    parsel

    This library helps in parsing HTML and collecting data using CSS and XPath selectors.

    asyncio

    It will ensure that your scrapers run asynchronously, which boosts the speed of web scraping.

    loguru

    It helps in monitoring and logging the competitor price tracker.

    As asyncio comes bundled with Python, you only need to install the remaining libraries with this command:

    pip install httpx parsel loguru
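
    To see how these four libraries fit together before diving into the full scrapers, here is a minimal, self-contained sketch that fetches a page title (example.com is just a placeholder URL):

    import asyncio
    from httpx import AsyncClient
    from parsel import Selector
    from loguru import logger as log

    async def fetch_title(url: str) -> str:
        """fetch a page and extract its <title> text"""
        async with AsyncClient(follow_redirects=True) as client:
            response = await client.get(url)
        selector = Selector(response.text)
        title = selector.xpath("//title/text()").get()
        log.info(f"fetched {url}: {title}")
        return title

    if __name__ == "__main__":
        asyncio.run(fetch_title("https://example.com"))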

    Web Scraping Prices

    We will be scraping data from three competitors, Walmart, Amazon, and BestBuy, to compare PlayStation 5 prices. The search keyword for each site will be “PS5 Digital Edition.” Let us start with Walmart.

    Scraping Walmart
    import urllib.parse
    import asyncio
    import json
    from httpx import AsyncClient, Response
    from parsel import Selector
    from typing import Dict, List
    from loguru import logger as log
    
    # create an HTTP client with headers that look like a real web browser
    client = AsyncClient(
        headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
        },
        follow_redirects=True,
        http2=True
    )
    
    async def scrape_walmart(search_query: str) -> List[Dict]:
        """scrape Walmart search pages"""
    
        def parse_walmart(response: Response) -> List[Dict]:
            """parse Walmart search pages"""
            selector = Selector(response.text)
            data = []
            # parse only the first (best match) search result
            product_box = selector.xpath("//div[@data-testid='item-stack']/div[1]")
            link = product_box.xpath(".//a[@link-identifier]/@link-identifier").get()
            title = product_box.xpath(".//a[@link-identifier]/span/text()").get()
            price = product_box.xpath(".//div[@data-automation-id='product-price']/span/text()").get()
            # keep everything after "$" and drop thousands separators
            price = float(price[price.find("$")+1:].replace(",", "")) if price else None
            rate = product_box.xpath(".//span[@data-testid='product-ratings']/@data-value").get()
            review_count = product_box.xpath(".//span[@data-testid='product-reviews']/@data-value").get()
            data.append({
                    "link": "https://www.walmart.com/ip/" + link,
                    "title": title,
                    "price": price,
                    "rate": float(rate) if rate else None,
                    "review_count": int(review_count) if review_count else None
                })
            return data
        
        search_url = "https://www.walmart.com/search?q=" + urllib.parse.quote_plus(search_query) + "&sort=best_seller"
        response = await client.get(search_url)
        if response.status_code == 403:
            raise Exception("Walmart requests are blocked")       
        data = parse_walmart(response)
        log.success(f"scraped {len(data)} products from Walmart")
        return data
    
    Run The Code
    async def run():
        data = await scrape_walmart(
            search_query="PS5 digital edition"
        )
        # print the data in JSON format
        print(json.dumps(data, indent=2))
    
    if __name__=="__main__":
        asyncio.run(run())
    
    Functions

    In the above code, we defined two functions:

    • scrape_walmart() requests the search results page and gathers the HTML from Walmart.
    • parse_walmart() parses that HTML to collect the product's price, title, link, review count, and rating.
    Scraping Amazon
    import urllib.parse
    import asyncio
    import json
    from httpx import AsyncClient, Response
    from parsel import Selector
    from typing import Dict, List
    from loguru import logger as log
    
    # create HTTP client with headers that look like a real web browser
    client = AsyncClient(
        headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
        },
        follow_redirects=True,
        http2=True
    )
    
    async def scrape_amazon(search_query: str) -> List[Dict]:
        """scrape Amazon search pages"""
    
        def parse_amazon(response: Response) -> List[Dict]:
            """parse Amazon search pages"""
            selector = Selector(response.text)
            data = []
            product_box = selector.xpath("//div[contains(@class, 'search-results')]/div[@data-component-type='s-search-result']")
            product_id = product_box.xpath(".//div[@data-cy='title-recipe']/h2/a[contains(@class, 'a-link-normal')]/@href").get().split("/dp/")[-1].split("/")[0]
            title = product_box.xpath(".//div[@data-cy='title-recipe']/h2/a/span/text()").get()
            price = product_box.xpath(".//span[@class='a-price']/span/text()").get()
            price = float(price.replace("$", "").replace(",", "")) if price else None  # strip "$" and commas
            rate = product_box.xpath(".//span[contains(@aria-label, 'stars')]/@aria-label").re_first(r"(\d+\.*\d*) out")
            review_count = product_box.xpath(".//div[contains(@data-csa-c-content-id, 'ratings-count')]/span/@aria-label").get()
            data.append({
                    "link": f"https://www.amazon.com/dp/{product_id}",
                    "title": title,
                    "price": price,
                    "rate": float(rate) if rate else None,
                    "review_count": int(review_count.replace(',','')) if review_count else None,
                })
            return data
        
        search_url = "https://www.amazon.com/s?k=" + urllib.parse.quote_plus(search_query)
        response = await client.get(search_url)
        if response.status_code in (403, 503):
            raise Exception("Amazon requests are blocked")   
        data = parse_amazon(response)
        log.success(f"scraped {len(data)} products from Amazon")
        return data
    
    Run The Code
    async def run():
        amazon_data = await scrape_amazon(
            search_query="PS5 digital edition"
        )
        # print the data in JSON format
        print(json.dumps(amazon_data, indent=2, ensure_ascii=False))
    
    if __name__=="__main__":
        asyncio.run(run())
    
    Scraping BestBuy
    import urllib.parse
    import asyncio
    import json
    from httpx import AsyncClient, Response
    from parsel import Selector
    from typing import Dict, List
    from loguru import logger as log
    
    # create an HTTP client with headers that look like a real web browser
    client = AsyncClient(
        headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
        },
        follow_redirects=True,
        http2=True
    )
    
    async def scrape_bestbuy(search_query: str) -> List[Dict]:
        """scrape BestBuy search pages"""
    
        def parse_bestbuy(response: Response) -> List[Dict]:
            """parse BestBuy search pages"""
            selector = Selector(response.text)
            data = []
            product_box = selector.xpath("//ol[contains(@class, 'sku-item-list')]/li[@class='sku-item']")
            product_id = product_box.xpath(".//h4[@class='sku-title']/a/@href").get().split("?skuId=")[-1]
            title = product_box.xpath(".//h4[@class='sku-title']/a/text()").get()
            price = product_box.xpath(".//div[contains(@class, 'priceView')]/span/text()").get()
            price = float(price.replace("$", "").replace(",", "")) if price else None  # strip "$" and commas
            rate = product_box.xpath(".//div[contains(@class, 'ratings-reviews')]/p/text()").get()
            review_count = product_box.xpath(".//span[@class='c-reviews ']/text()").get()
            data.append({
                    "link": f"https://www.bestbuy.com/site/{product_id}.p",
                    "title": title,
                    "price": price,
                    "rate": float(rate.split()[1]) if rate else None,
                    "review_count": int(review_count[1:-1].replace(",", "")) if review_count else None
                })
            return data
        
        search_url = "https://www.bestbuy.com/site/searchpage.jsp?st=" + urllib.parse.quote_plus(search_query)
        response = await client.get(search_url)
        if response.status_code == 403:
            raise Exception("BestBuy requests are blocked")   
        data = parse_bestbuy(response)
        log.success(f"scraped {len(data)} products from BestBuy")
        return data
    
    Run The Code
    async def run():
        bestbuy_data = await scrape_bestbuy(
            search_query="PS5 digital edition"
        )
        # print the data in JSON format
        print(json.dumps(bestbuy_data, indent=2, ensure_ascii=False))
    
    if __name__=="__main__":
        asyncio.run(run())
    

    Combine Scraping Logic

    In this step, we combine all the scraping logic into a single competitor price tracker:

    async def track_competitor_prices(
            search_query: str
        ):
        """scrape products from different competitors"""
        data = {}
        data["walmart"] = await scrape_walmart(
            search_query=search_query
        )
        data["amazon"] = await scrape_amazon(
            search_query=search_query
        )
        data["bestbuy"] = await scrape_bestbuy(
            search_query=search_query
        )
        product_count = sum(len(products) for products in data.values())
        log.success(f"successfully scraped {product_count} products")
        # save the results into a JSON file
        
        with open("data.json", "w", encoding="utf-8") as file:
            json.dump(data, file, indent=2, ensure_ascii=False)
    
    async def run():
        await track_competitor_prices(
            search_query="PS5 digital edition"
        )
    
    if __name__=="__main__":
        asyncio.run(run())
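
    The tracker above calls the three scrapers one after another. Since the sites are independent, they can also run concurrently; here is a sketch of the same function using asyncio.gather() with the scrapers defined earlier:

    async def track_competitor_prices(search_query: str):
        """scrape all competitors concurrently"""
        # launch the three independent scrapers at the same time
        walmart, amazon, bestbuy = await asyncio.gather(
            scrape_walmart(search_query=search_query),
            scrape_amazon(search_query=search_query),
            scrape_bestbuy(search_query=search_query),
        )
        data = {"walmart": walmart, "amazon": amazon, "bestbuy": bestbuy}
        log.success(f"successfully scraped {sum(len(p) for p in data.values())} products")
        # save the results into a JSON file
        with open("data.json", "w", encoding="utf-8") as file:
            json.dump(data, file, indent=2, ensure_ascii=False)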
    

    Output File

    {
      "walmart": [
        {
          "link": "https://www.walmart.com/ip/5113183757",
          "title": "Sony PlayStation 5 (PS5) Digital Console Slim",
          "price": 449.0,
          "rate": 4.6,
          "review_count": 369
        }
      ],
      "amazon": [
        {
          "link": "https://www.amazon.com/dp/B0CL5KNB9M",
          "title": "PlayStation®5 Digital Edition (slim)",
          "price": 449.0,
          "rate": 4.7,
          "review_count": 2521
        }
      ],
      "bestbuy": [
        {
          "link": "https://www.bestbuy.com/site/6566040.p",
          "title": "Sony - PlayStation 5 Slim Console Digital Edition - White",
          "price": 449.99,
          "rate": 4.8,
          "review_count": 769
        }
      ]
    }
    

    Compare Pricing From Competitors

    Scraping the product data is only half the job; the real value comes from analyzing it to understand competitors' performance. Here is a simple monitoring function to analyze the information:

    def generate_insights(data):
        """analyze the data for insight values"""
    
        def calculate_average(lst):
            # Calculate the averages
            non_none_values = [value for value in lst if value is not None]
            return round(sum(non_none_values) / len(non_none_values), 2) if non_none_values else None
    
        # Extract all products across competitors
        all_products = [product for products in data.values() for product in products]
    
        # Calculate overall averages
        overall_average_price = calculate_average([product["price"] for product in all_products])
        overall_average_rate = calculate_average([product["rate"] for product in all_products])
        overall_average_review_count = calculate_average([product["review_count"] for product in all_products])
    
        # Find the lowest priced, highest priced, highest rated, and most reviewed products
        # (assumes every product has a price and rate; add None-handling for production use)
        lowest_priced_product = min(all_products, key=lambda x: x["price"])
        highest_reviewed_product = max(all_products, key=lambda x: x.get("review_count", 0) if x.get("review_count") is not None else 0)
        highest_priced_product = max(all_products, key=lambda x: x["price"])
        highest_rated_product = max(all_products, key=lambda x: x["rate"])
    
        # Map each retailer to the domain in its product links; this works here because
        # the domain (e.g. "walmart") matches the retailer key used in `data`
        website_names = {retailer: products[0]["link"].split(".")[1] for retailer, products in data.items()}
    
        insights = {
            "Overall Average Price": overall_average_price,
            "Overall Average Rate": overall_average_rate,
            "Overall Average Review Count": overall_average_review_count,
            "Lowest Priced Product": {
                "Product": lowest_priced_product,
                "Competitor": website_names.get(lowest_priced_product["link"].split(".")[1])
            },
            "Highest Priced Product": {
                "Product": highest_priced_product,
                "Competitor": website_names.get(highest_priced_product["link"].split(".")[1])
            },
            "Highest Rated Product": {
                "Product": highest_rated_product,
                "Competitor": website_names.get(highest_rated_product["link"].split(".")[1])
            },                
            "Highest Reviewed Product": {
                "Product": highest_reviewed_product,
                "Competitor": website_names.get(highest_reviewed_product["link"].split(".")[1])
            }
        }
    
        # Save the insights to a JSON file
        with open("insights.json", "w") as json_file:
            json.dump(insights, json_file, indent=2, ensure_ascii=False)
    

    We have introduced the generate_insights() function, which calculates various metrics, including:

    • The lowest and highest priced products.
    • The average price, review count, and rating.
    • The top products by rating and review count.
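
    To run the analysis on the scraped results, load data.json and pass it to generate_insights(); a short usage sketch:

    import json

    # load the results saved earlier by track_competitor_prices()
    with open("data.json", "r", encoding="utf-8") as file:
        data = json.load(file)

    generate_insights(data)  # writes insights.json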

    Final Output

    The insight data below boils the results down to a few statistics, making them easier to analyze. You can now compare product prices across competitors:

    {
      "Overall Average Price": 449.33,
      "Overall Average Rate": 4.7,
      "Overall Average Review Count": 1219.67,
      "Lowest Priced Product": {
        "Product": {
          "link": "https://www.walmart.com/ip/5113183757",
          "title": "Sony PlayStation 5 (PS5) Digital Console Slim",
          "price": 449.0,
          "rate": 4.6,
          "review_count": 369
        },
        "Competitor": "walmart"
      },
      "Highest Priced Product": {
        "Product": {
          "link": "https://www.bestbuy.com/site/6566040.p",
          "title": "Sony - PlayStation 5 Slim Console Digital Edition - White",
          "price": 449.99,
          "rate": 4.8,
          "review_count": 769
        },
        "Competitor": "bestbuy"
      },
      "Highest Rated Product": {
        "Product": {
          "link": "https://www.bestbuy.com/site/6566040.p",
          "title": "Sony - PlayStation 5 Slim Console Digital Edition - White",
          "price": 449.99,
          "rate": 4.8,
          "review_count": 769
        },
        "Competitor": "bestbuy"
      },
      "Highest Reviewed Product": {
        "Product": {
          "link": "https://www.amazon.com/dp/B0CL5KNB9M",
          "title": "PlayStation 5 Digital Edition (slim)",
          "price": 449.0,
          "rate": 4.7,
          "review_count": 2521
        },
        "Competitor": "amazon"
      }
    }
    

    What Are The Challenges Of Scraping Competitor Pricing Data?

    Competing in this dynamic market comes with hurdles that demand advanced solutions. Here are some common challenges you might face while extracting and analyzing competitor pricing data:

    Real-Time Data

    Prices on ecommerce websites change frequently based on stock levels, demand, and competitors' own moves. This makes gathering information in real time, or even every few hours, technically challenging.
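
    A common workaround is to re-run the tracker built above on a fixed schedule; a minimal sketch (the one-hour interval is an arbitrary choice):

    async def monitor_prices(search_query: str, interval_seconds: int = 3600):
        """re-scrape competitor prices on a fixed schedule"""
        while True:
            await track_competitor_prices(search_query=search_query)
            # wait before the next round; shorten the interval for fresher data
            await asyncio.sleep(interval_seconds)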

    Complex Page Structure

    Product pages on online stores carry far more than pricing: descriptions, reviews, related products, ratings, and more. Extracting exactly the fields you need requires a carefully targeted scraper.
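
    One defensive pattern is to try several candidate selectors in order and take the first that matches, so a markup change does not silently break the scraper. A sketch (the XPaths are illustrative, not tied to any specific site; product_box is a parsel node like those in the parsers above):

    def first_match(selector, xpaths):
        """return the first non-empty value among candidate XPaths"""
        for xpath in xpaths:
            value = selector.xpath(xpath).get()
            if value:
                return value
        return None

    # illustrative fallbacks for a price field
    price = first_match(product_box, [
        ".//span[@data-testid='price']/text()",
        ".//div[contains(@class, 'price')]/span/text()",
    ])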

    Price Changes

    Many ecommerce sellers use dynamic pricing, where prices shift based on browsing history, location, active time zone, and market fluctuations. Accounting for these changes while keeping your own pricing model current and profitable is difficult.
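
    A simple way to keep up with dynamic pricing is to diff each new scrape against the previous snapshot and flag what moved; a sketch assuming the data.json structure produced above:

    def detect_price_changes(old: dict, new: dict) -> list:
        """compare two scrape snapshots and report price movements"""
        changes = []
        for retailer, products in new.items():
            # index the previous snapshot by product link
            old_prices = {p["link"]: p["price"] for p in old.get(retailer, [])}
            for product in products:
                previous = old_prices.get(product["link"])
                if previous is not None and previous != product["price"]:
                    changes.append({
                        "retailer": retailer,
                        "link": product["link"],
                        "old_price": previous,
                        "new_price": product["price"],
                    })
        return changes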

    Location-Sensitive

    Prices also vary by geographic location due to differing tax rates, regional strategies, and shipping costs. This makes it essential for scrapers to simulate browsing from different locations using VPNs or proxy servers.
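
    With httpx, routing requests through a proxy is a constructor argument; a sketch (the proxy URL is a placeholder, and note that older httpx versions spell the argument proxies rather than proxy):

    from httpx import AsyncClient

    # placeholder endpoint; substitute a real geo-located proxy
    client = AsyncClient(
        proxy="http://username:password@proxy.example.com:8080",
        follow_redirects=True,
    )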

    Product Variations

    Ecommerce platforms often sell the same product in different variations, such as size, color, packaging, or seller, each with its own price. It is important that your competitor data scraping tool captures the right variant so your price comparisons stay accurate.

    What Are The Benefits Of Analyzing Extracted Competitor Pricing Data?

    Competitor price data scraping has become essential for businesses looking to beat the competition and earn better returns. Here are some reasons to invest in scraping your competitors' pricing data:

    Informed Pricing Model

    Knowing competitor prices helps you set a pricing model that attracts your target audience's attention. If a competitor constantly offers deals, your business can seize the opportunity to provide attractive discounts of its own.

    Trend Analysis

    Gathering pricing information over a period of time reveals recurring patterns and seasonal shifts. Analyzed with professional scraping tools, this history helps business owners anticipate those shifts and adjust pricing accordingly.
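
    Trend analysis needs a price history, which means storing each scrape with a timestamp. A minimal sketch that appends one row per product to a CSV log (price_history.csv is an assumed filename):

    import csv
    from datetime import datetime, timezone

    def append_price_history(data: dict, path: str = "price_history.csv"):
        """append one timestamped row per product to a CSV price log"""
        timestamp = datetime.now(timezone.utc).isoformat()
        with open(path, "a", newline="", encoding="utf-8") as file:
            writer = csv.writer(file)
            for retailer, products in data.items():
                for product in products:
                    writer.writerow([timestamp, retailer, product["link"], product["price"]])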

    Stock Management

    Monitoring competitors' stock levels helps you price your own inventory correctly. Balancing popular and less in-demand products becomes far easier while still delivering the best customer service.

    What’s Next?

    We have shared the essentials of scraping and analyzing competitor pricing data. At Scraping Intelligence, you can access advanced technologies and up-to-date strategies for gathering the latest information on your competitors.

    Web scraping is a powerful solution for competitive analysis, extracting valuable, up-to-date information from target websites. We respect data privacy and terms of service to uphold our ethical responsibilities while extracting information.


