    How to Scrape Amazon Best Sellers Using Python?

    Category: E-commerce & Retail | Publish Date: March 30, 2026 | Author: Scraping Intelligence

    Product research on Amazon used to mean hours of manual browsing. Now, a well-written Python script can pull hundreds of ranked product records in minutes. The Amazon Best Sellers list is one of the most data-rich pages on the internet for e-commerce research. Rankings shift every hour based on real sales, not estimates, which gives the data a freshness that most paid market tools cannot match.

    This guide is written for developers who want working code and a clear understanding of what is involved. It covers library selection, two complete scraper implementations, pagination, anti-detection, and data storage. Businesses that prefer managed delivery without maintaining scrapers can use Scraping Intelligence to access structured Amazon product data on a set schedule.

    What Is Amazon Best Sellers Data and Why Does It Matter?

    Every product category on Amazon carries a Best Sellers page. The products listed there are ranked by recent purchase volume, and the rankings refresh every hour. That frequency is what makes the data genuinely useful. A product jumping from rank 80 to rank 12 inside a single day tells you something a weekly sales report never could.

    Each product record on a Best Sellers page contains several extractable fields:

    • Product title and ASIN: Amazon assigns a unique ASIN to every listing; this is the identifier used for all downstream product lookups
    • Best Seller Rank (BSR): the numeric position within the category at the exact time of scraping
    • Listed price and any visible discount or promotional badge on the card
    • Average star rating and the total number of customer reviews
    • Brand name and the primary product image URL
    • Category and subcategory path, visible in the breadcrumb trail above the listings

    What Tools Do You Need to Scrape Amazon Best Sellers Data Using Python?

    Tool selection depends on scale and frequency. A single-category job that runs once a week needs something different from a daily pipeline covering 30 categories across multiple marketplaces. The table below covers the standard options and their roles.

    Library or Tool | Role | When to Use It
    requests | HTTP client | Lightweight scraping of static page content
    BeautifulSoup4 | HTML parser | Navigating product cards and extracting field values
    Selenium or Playwright | Browser automation | Pages where price or review data loads via JavaScript
    Scrapy | Full crawl framework | Multi-category pipelines that run on a schedule
    Rotating proxy service | IP management | Distributing requests to avoid IP-level blocks
    pandas | Data processing | Cleaning, deduplicating, and exporting scraped records
    fake-useragent | Header rotation | Cycling browser fingerprints across requests
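
    Everything in the table installs from PyPI; Playwright or Selenium is only needed if JavaScript rendering turns out to be required:

    # Run once in a terminal to install the stack used in this guide:
    # pip install requests beautifulsoup4 lxml pandas fake-useragent scrapy
    # pip install playwright && playwright install chromium   # only for JS-rendered pages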

    How to Scrape Amazon Best Sellers Using Python: Full Working Code

    The scraper below is built on requests and BeautifulSoup. It rotates user agents, retries on failure with increasing wait times, spaces requests to stay under detection thresholds, and exports results to CSV. This is the starting point the Scraping Intelligence team uses for single-category Amazon Best Sellers extraction before scaling up.

    Method 1: Requests and BeautifulSoup

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    import time
    import random
    from fake_useragent import UserAgent
    
    
    class AmazonBestSellerScraper:
        # Scraping Intelligence | scrapingintelligence.com
    
        def __init__(self):
            self.ua       = UserAgent()
            self.session  = requests.Session()
            self.products = []
    
        def get_headers(self):
            return {
                'User-Agent':      self.ua.random,
                'Accept-Language': 'en-US,en;q=0.9',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept':          'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Connection':      'keep-alive',
                'Referer':         'https://www.amazon.com',
                'DNT':             '1'
            }
    
        def fetch_page(self, url, retries=3):
            # Retries with progressively longer waits; headers are re-rolled per attempt.
            for attempt in range(retries):
                try:
                    resp = self.session.get(url, headers=self.get_headers(), timeout=15)
                    if resp.status_code == 200:
                        return resp.text
                    elif resp.status_code == 503:
                        print(f'Rate limited. Waiting before attempt {attempt + 1}...')
                        time.sleep(random.uniform(5, 15) * (attempt + 1))
                except requests.RequestException as e:
                    print(f'Request error on attempt {attempt + 1}: {e}')
                    time.sleep(random.uniform(3, 8) * (attempt + 1))
            return None
    
        def parse_products(self, html):
            soup  = BeautifulSoup(html, 'lxml')
            cards = soup.select('.zg-grid-general-faceout')
            for card in cards:
                try:
                    rank_el    = card.select_one('.zg-bdg-text')
                    title_el   = card.select_one('._cDEzb_p13n-sc-css-line-clamp-3_g3dy1')
                    if not title_el:
                        title_el = card.select_one('.p13n-sc-truncate-desktop-type2')
                    price_el   = card.select_one('.p13n-sc-price')
                    rating_el  = card.select_one('.a-icon-alt')
                    reviews_el = card.select_one('.a-size-small')
                    link_el    = card.select_one('a.a-link-normal')
                    asin = ''
                    if link_el and '/dp/' in (link_el.get('href') or ''):
                        asin = link_el['href'].split('/dp/')[1].split('/')[0].split('?')[0]
                    self.products.append({
                        'rank':    rank_el.text.strip()    if rank_el    else 'N/A',
                        'title':   title_el.text.strip()   if title_el   else 'N/A',
                        'price':   price_el.text.strip()   if price_el   else 'N/A',
                        'rating':  rating_el.text.strip()  if rating_el  else 'N/A',
                        'reviews': reviews_el.text.strip() if reviews_el else 'N/A',
                        'asin':    asin,
                        'url':     f'https://www.amazon.com/dp/{asin}' if asin else 'N/A'
                    })
                except Exception as e:
                    print(f'Skipped card: {e}')
    
        def scrape_category(self, category_url, max_pages=2):
            for page in range(1, max_pages + 1):
                if page == 1:
                    url = category_url
                else:
                    url = f'{category_url}ref=zg_bs_pg_{page}?_encoding=UTF8&pg={page}'
                print(f'Page {page}: {url}')
                html = self.fetch_page(url)
                if html:
                    self.parse_products(html)
                    wait = random.uniform(2, 5)
                    print(f'  Sleeping {wait:.1f}s...')
                    time.sleep(wait)
    
        def save_to_csv(self, filename='best_sellers.csv'):
            df = pd.DataFrame(self.products)
            df.drop_duplicates(subset='asin', inplace=True)
            df.to_csv(filename, index=False, encoding='utf-8-sig')
            print(f'Saved {len(df)} products to {filename}')
    
    
    if __name__ == '__main__':
        scraper = AmazonBestSellerScraper()
        scraper.scrape_category(
            'https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/',
            max_pages=2
        )
        scraper.save_to_csv('electronics_best_sellers.csv')
    

    Method 2: Scrapy Spider for Volume Extraction

    Once a project grows past a handful of categories or needs to run on a daily schedule, Scrapy is the right move. The request queue manages concurrency, the pipeline system handles export, and middleware lets you plug in proxy rotation without touching the core spider logic. Scraping Intelligence uses Scrapy for any Amazon data extraction job covering four or more categories regularly.

    import scrapy
    import re
    
    
    class BestSellersSpider(scrapy.Spider):
        name = 'best_sellers'
        custom_settings = {
            'DOWNLOAD_DELAY':           3,
            'RANDOMIZE_DOWNLOAD_DELAY': True,
            'DEFAULT_REQUEST_HEADERS': {
                'User-Agent':      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
                'Accept-Language': 'en-US,en;q=0.9',
            },
            'FEEDS': {
                'best_sellers.csv': {'format': 'csv', 'overwrite': True}
            }
        }
        start_urls = [
            'https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/',
            'https://www.amazon.com/Best-Sellers-Books/zgbs/books/',
        ]
        def parse(self, response):
            for card in response.css('.zg-grid-general-faceout'):
                title = card.css('._cDEzb_p13n-sc-css-line-clamp-3_g3dy1::text').get()
                if not title:
                    title = card.css('.p13n-sc-truncate-desktop-type2::text').get()
                link       = card.css('a.a-link-normal::attr(href)').get()
                asin_match = re.search(r'/dp/([A-Z0-9]{10})', link or '')
                yield {
                    'rank':   card.css('.zg-bdg-text::text').get('N/A').strip(),
                    'title':  title.strip() if title else 'N/A',
                    'price':  card.css('.p13n-sc-price::text').get('N/A').strip(),
                    'rating': card.css('.a-icon-alt::text').get('N/A').strip(),
                    'asin':   asin_match.group(1) if asin_match else '',
                    'source': response.url
                }
            next_page = response.css('li.a-last a::attr(href)').get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)
    
    # Run from a terminal (the FEEDS setting above already writes best_sellers.csv):
    # scrapy runspider best_sellers_spider.py
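
    The proxy rotation mentioned above plugs in as a downloader middleware, without touching the spider. A minimal sketch with placeholder proxy URLs; swap in real endpoints from your provider:

    import random

    class RotatingProxyMiddleware:
        # Assigns a random proxy per request; Scrapy's downloader honors request.meta['proxy'].
        PROXIES = [
            'http://user:pass@proxy1.example.com:8000',
            'http://user:pass@proxy2.example.com:8000',
        ]

        def process_request(self, request, spider):
            request.meta['proxy'] = random.choice(self.PROXIES)

    # Enable it via custom_settings or settings.py:
    # 'DOWNLOADER_MIDDLEWARES': {'yourproject.middlewares.RotatingProxyMiddleware': 350}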
    

    How Do You Avoid Getting Blocked While Scraping Amazon?

    Amazon's detection system looks at multiple signals in parallel: request rate, header patterns, IP history, and behavioral consistency across sessions. Fixing one layer without the others gives inconsistent results. What works reliably in production:

    • Use residential proxies, not datacenter IPs: Datacenter addresses carry higher bot-association scores. Residential pools from providers like Bright Data or Oxylabs produce traffic profiles that pass Amazon's scoring much more cleanly.
    • Randomize request timing: a perfectly steady request rhythm is a clear machine signature. Python's random.uniform() generates delays that follow no fixed pattern.
    • Rotate the browser fingerprint on every request: the fake-useragent library does this automatically. Reusing one user-agent string across hundreds of requests is an easy pattern to flag.
    • Use a headless browser when content loads via JavaScript: Playwright and Selenium render pages the way a real browser does. Some price and review fields only appear after JavaScript execution, so plain HTTP requests return incomplete data (see the sketch after this list).
    • Automate CAPTCHA resolution: 2Captcha and Anti-Captcha integrate via Python client libraries. Both return a solved token within a few seconds. The better long-term fix is preventing CAPTCHAs from triggering through lower request rates and better proxy quality.
    • Offload to a scraping API if infrastructure overhead is a concern: ScraperAPI and ZenRows bundle proxy management and CAPTCHA handling into a single endpoint. That removes the need to maintain the infrastructure yourself.
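
    For those JavaScript-rendered cases, here is a minimal Playwright sketch (assuming pip install playwright and playwright install chromium have been run). It fetches a fully rendered page and hands the HTML to the Method 1 parser:

    from playwright.sync_api import sync_playwright

    def fetch_rendered(url):
        # Launch headless Chromium, wait for product cards, return the full HTML.
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(url, wait_until='domcontentloaded')
            page.wait_for_selector('.zg-grid-general-faceout', timeout=15000)
            html = page.content()
            browser.close()
        return html

    # html = fetch_rendered('https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/')
    # AmazonBestSellerScraper().parse_products(html)  # reuse the Method 1 parser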

    What Data Fields Can You Extract from Amazon Best Sellers?

    The table below maps extractable fields to their location in the page HTML and notes the primary business use case for each one. CSS selectors occasionally change when Amazon updates its frontend, so validating them against the live page before any major run is a good habit.

    Data Field | CSS Selector or Source | Primary Use
    Best Seller Rank | .zg-bdg-text | Track rank movement over time
    Product Title | ._cDEzb_p13n-sc-css-line-clamp-3_g3dy1 | Keyword research and competitive analysis
    ASIN | Parsed from /dp/ in href | Unique product ID for downstream API calls
    Price | .p13n-sc-price | Price monitoring and repricing decisions
    Star Rating | .a-icon-alt | Quality filter across product categories
    Review Count | .a-size-small | Demand signal and social proof measure
    Product Image URL | src attribute of .s-image | Visual catalog building and ML training sets
    Brand Name | .a-size-small.a-color-base | Brand share and market concentration analysis
    Category Path | Breadcrumb elements | Taxonomy mapping and segment classification
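
    The raw values arrive as display strings ('#3', '$29.99', '4.6 out of 5 stars'). A small cleaning sketch, assuming US-format strings as produced by the scrapers above; the patterns need adjusting for other marketplaces:

    import re

    def clean_record(rec):
        # '#3' -> 3, '$29.99' -> 29.99, '4.6 out of 5 stars' -> 4.6
        rank = re.sub(r'\D', '', rec.get('rank', ''))
        price = re.search(r'[\d,]+(?:\.\d+)?', rec.get('price', ''))
        rating = re.search(r'([\d.]+) out of 5', rec.get('rating', ''))
        return {
            **rec,
            'rank': int(rank) if rank else None,
            'price': float(price.group(0).replace(',', '')) if price else None,
            'rating': float(rating.group(1)) if rating else None,
        }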

    How Do You Handle Pagination in Amazon Best Sellers Data?

    Amazon displays 50 products per page and provides one additional page of 50, giving 100 products per category total. Subcategories carry their own independent Best Sellers lists with the same two-page structure. The pagination URL pattern is predictable and easy to generate programmatically.

    def scrape_all_pages(self, base_url):
        # Best Sellers categories expose exactly two pages of 50 products each
        for page_num in range(1, 3):   # Pages 1 and 2
            if page_num == 1:
                url = base_url
            else:
                url = f"{base_url}ref=zg_bs_pg_{page_num}?_encoding=UTF8&pg={page_num}"
            print(f'Fetching page {page_num}: {url}')
            html = self.fetch_page(url)
            if html:
                self.parse_products(html)
                time.sleep(random.uniform(2.0, 4.5))
        return self.products   # parse_products appends to self.products
    

    How to Store and Export the Scraped Data?

    Storage format is dictated by what happens to the data after extraction. Scraping Intelligence delivers CSV for clients doing spreadsheet analysis and JSON or database format for teams feeding the data into internal tools or APIs. Both options are covered below.

    Export to CSV

    def save_to_csv(products, filename='best_sellers.csv'):
        df = pd.DataFrame(products)
        df.drop_duplicates(subset='asin', inplace=True)
        df['scraped_at'] = pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
        df.to_csv(filename, index=False, encoding='utf-8-sig')
        print(f'Exported {len(df)} records to {filename}')
    

    Save to SQLite for Historical Rank Tracking

    import sqlite3
    
    def save_to_sqlite(products, db_name='amazon_data.db'):
        df = pd.DataFrame(products)
        df['scraped_at'] = pd.Timestamp.now()
        conn = sqlite3.connect(db_name)
        df.to_sql('best_sellers', conn, if_exists='append', index=False)
        conn.close()
        print(f'Saved {len(df)} records to {db_name}')
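
    With history accumulating in SQLite, rank movement per product becomes a single query. A quick sketch against the table and columns created by save_to_sqlite above:

    import sqlite3

    def rank_history(asin, db_name='amazon_data.db'):
        # Returns (scraped_at, rank) rows for one product, oldest first.
        conn = sqlite3.connect(db_name)
        rows = conn.execute(
            'SELECT scraped_at, rank FROM best_sellers '
            'WHERE asin = ? ORDER BY scraped_at',
            (asin,)
        ).fetchall()
        conn.close()
        return rows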
    

    Common Errors When Scraping Amazon and How to Fix Them

    These are the errors that show up repeatedly across Amazon scraping projects. Each one has a direct fix.

    Error | Root Cause | Fix
    503 Service Unavailable | Request rate too high | Add random delays; rotate proxy IPs
    Empty product list | CSS selectors are outdated | Inspect live page HTML and update selectors
    CAPTCHA page returned | Bot pattern detected by Amazon | Slow requests; integrate CAPTCHA solver API
    Price field missing | Price loads via JavaScript | Switch to Playwright or Selenium for rendering
    Duplicate records in output | Pagination pages overlap on shared listings | Deduplicate on the ASIN field using pandas
    Connection timeout | Network or proxy failure | Add retry logic with progressive backoff
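
    Note that a CAPTCHA or robot-check page still returns HTTP 200, so the "empty product list" and "CAPTCHA page returned" rows are easiest to catch with a content check. A small sketch; the marker strings are ones commonly seen on Amazon's interstitial page and should be treated as assumptions to verify:

    def looks_blocked(html):
        # Amazon's robot-check page returns 200, so inspect the body instead.
        markers = (
            'Robot Check',
            'Enter the characters you see below',
            'api-services-support@amazon.com',
        )
        return any(marker in html for marker in markers)

    # html = scraper.fetch_page(url)
    # if html and looks_blocked(html):
    #     print('Blocked: rotate the proxy and slow down before retrying')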

    Scraping Amazon Best Sellers Across Multiple Marketplaces

    Amazon runs separate storefronts in more than ten countries. The Best Sellers URL structure is consistent across nearly all of them (India's path differs slightly, as the mapping below shows), so scraping international Amazon product data is mostly a matter of swapping the base domain. The selector and pagination logic carries over without modification.

    import time
    import random

    MARKETPLACES = {
        'US': 'https://www.amazon.com/Best-Sellers/zgbs/',
        'UK': 'https://www.amazon.co.uk/Best-Sellers/zgbs/',
        'DE': 'https://www.amazon.de/Best-Sellers/zgbs/',
        'IN': 'https://www.amazon.in/gp/bestsellers/',
        'JP': 'https://www.amazon.co.jp/Best-Sellers/zgbs/',
        'CA': 'https://www.amazon.ca/Best-Sellers/zgbs/',
        'FR': 'https://www.amazon.fr/Best-Sellers/zgbs/',
        'IT': 'https://www.amazon.it/Best-Sellers/zgbs/',
        'ES': 'https://www.amazon.es/Best-Sellers/zgbs/',
        'AU': 'https://www.amazon.com.au/Best-Sellers/zgbs/'
    }

    scraper = AmazonBestSellerScraper()   # the Method 1 scraper
    for market, base_url in MARKETPLACES.items():
        url = f'{base_url}electronics/'   # category slug may vary by marketplace
        print(f'Scraping {market}: {url}')
        scraper.scrape_category(url, max_pages=2)
        time.sleep(random.uniform(3, 7))
    


    Conclusion

    The techniques covered here give you everything needed to extract Amazon Best Sellers data reliably: a working single-category scraper, a Scrapy implementation for larger jobs, pagination logic, detection countermeasures, and storage options. The BeautifulSoup approach is the right starting point. Scrapy takes over once the scope or frequency grows beyond what a simple script can sustain.

    Teams that need Amazon product data on a consistent schedule but do not own the extraction infrastructure can contact Scraping Intelligence. The service covers Amazon Best Sellers scraping across all major categories and marketplaces, with hourly delivery, custom schemas, and selector maintenance included.


    Frequently Asked Questions


    What is the best Python library for scraping Amazon Best Sellers?
    BeautifulSoup4 with requests covers single-category jobs well. For multi-category Amazon Best Sellers scraping that runs on a schedule, Scrapy handles concurrency and export far more efficiently than anything built manually around a requests loop.

    How often does Amazon update its Best Sellers rankings?
    Amazon refreshes the rankings every hour. A scraper that runs once a day captures one data point out of 24, so if the use case is catching short-term demand shifts or rank volatility, set up hourly runs.

    Can you scrape Amazon without getting blocked?
    Yes, with the right setup. Start with rotating residential proxies, randomized request delays, and realistic browser headers. For high-volume jobs, adding a CAPTCHA solver or using a managed service like Scraping Intelligence keeps the pipeline running.

    Does Amazon have an official API for Best Sellers data?
    The Product Advertising API (PA-API 5.0) exposes some Best Sellers data, but it requires an approved affiliate account and enforces strict rate limits. For higher-frequency or wider extraction, custom scraping is the practical route.

    How many products can I scrape per Best Sellers category?
    Amazon shows up to 100 products per category across two pages. Every subcategory carries its own Best Sellers list with the same structure, so the total scrapable volume across the full taxonomy is far larger than 100.

    Is Scrapy better than BeautifulSoup for Amazon scraping?
    For volume and scheduling, Scrapy is clearly better. BeautifulSoup is faster to prototype with and practical for smaller Amazon product scraper projects. Most teams start with BeautifulSoup and move to Scrapy once the scope expands past what a single-threaded requests loop can handle efficiently.

    About the Author


    Scraping Intelligence

    Scraping Intelligence Editorial Team is a collective of data specialists, analysts, and researchers with expertise in web scraping, data extraction, and market intelligence. The team produces well-researched guides, actionable insights, and industry-focused resources that help businesses unlock the value of data and make informed, strategic decisions.
