Retailers don't always offer public APIs, so developers often turn to web scraping to collect product data, pricing, and inventory details directly from retail websites. This guide walks you through building a functional custom API using web scraping techniques. You will learn how to extract structured data, handle dynamic content, and serve it through a clean API endpoint, all without needing official access.
A Costco API is a self-built service that pulls product prices, descriptions, availability, and categories directly from Costco's website and delivers that data through an endpoint your application can query. Since Costco restricts third-party API access, this scraping-based approach is the only realistic path.
Developers reach for this kind of setup for different reasons:
Scraping Intelligence sees retail data extraction as one of the most requested services across its client base, particularly for e-commerce and competitive intelligence use cases.
Picking the wrong tools early creates problems that compound later. Costco loads product listings via JavaScript, so anything that only fetches raw HTML will return empty results. Here is what the full stack looks like:
| Tool or Library | Role in the Pipeline | Language |
|---|---|---|
| Python + Requests | Sends HTTP requests, retrieves raw page content | Python |
| BeautifulSoup | Parses HTML and selects DOM elements | Python |
| Playwright or Selenium | Renders JavaScript before extraction begins | Python or Node |
| Scrapy | Manages large crawls with built-in pipelines | Python |
| FastAPI or Flask | Creates and serves REST API endpoints | Python |
| MongoDB or PostgreSQL | Stores product records between scraping runs | Database |
| Rotating Proxies | Prevents IP blocks and sidesteps rate limits | Middleware |
Playwright is the better choice over Selenium here. The stealth plugin ecosystem for Playwright is more mature and handles modern JavaScript rendering with fewer configuration headaches.
Follow these six steps to set up your environment, write the scraper, handle bot detection, store the data, build the API endpoint, and automate the refresh cycle.
Run this in your terminal to get every dependency in place:
```bash
pip install playwright beautifulsoup4 requests fastapi uvicorn pymongo apscheduler
playwright install
```
Then build a project structure that keeps things organized from the start:
```
costco-scraper-api/
│
├── scraper/
│   ├── __init__.py
│   └── costco_scraper.py
├── api/
│   └── main.py
├── data/
│   └── products.json
└── requirements.txt
```
The scraper and the API live in separate folders on purpose. When the site changes and your scraping logic breaks, you fix one module without touching the endpoint layer. That separation saves real time during maintenance.
Playwright handles the JavaScript rendering. BeautifulSoup takes the rendered HTML and pulls out the fields you want. The function below does both:
```python
# scraper/costco_scraper.py
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

BASE_URL = "https://www.costco.com"


def scrape_costco_products(category_url: str):
    products = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.set_extra_http_headers({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        })
        # Wait for network activity to settle so JS-rendered tiles are present
        page.goto(category_url, wait_until="networkidle")
        html = page.content()
        browser.close()

    soup = BeautifulSoup(html, "html.parser")
    product_cards = soup.find_all("div", class_="product-tile-set")

    for card in product_cards:
        name = card.find("span", class_="description")
        price = card.find("div", class_="price")
        link = card.find("a", href=True)

        # Skip tiles that are missing a name or a price
        if name and price:
            products.append({
                "name": name.text.strip(),
                "price": price.text.strip(),
                # urljoin handles both relative and absolute hrefs
                "url": urljoin(BASE_URL, link["href"]) if link else None
            })

    return products
```
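The scraper stores `price` as the raw text taken from the page. If you want to sort or compare prices later, a small normalization step can help. The sketch below is one possible approach, assuming prices are formatted in the US style (for example `$1,299.99`); `parse_price` is a hypothetical helper and not part of the scraper above.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches the first number in the text, allowing thousands separators
# and an optional decimal part (US-style formatting is an assumption).
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d{1,2})?")


def parse_price(raw: str) -> Optional[Decimal]:
    """Return the first price found in raw text, or None if there is none."""
    match = _PRICE_RE.search(raw)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

`Decimal` is used instead of `float` so that currency values are not subject to binary rounding when they are compared or summed.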
CSS class names on retail sites do not stay fixed. Costco has updated its front-end layout multiple times. Always open browser developer tools and confirm the current class names against what is in your scraper before each run.
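Because selectors go stale without warning, it can be useful to check each run's output before trusting it. The following is a minimal sketch, not a complete monitoring solution: `check_scrape_result` is a hypothetical helper that assumes records shaped like the dictionaries returned by `scrape_costco_products`.

```python
def check_scrape_result(products: list, min_expected: int = 1) -> list:
    """Return a list of human-readable problems; empty means the run looks sane."""
    problems = []
    if len(products) < min_expected:
        # An empty result usually points to changed class names, not an empty category.
        problems.append(
            f"expected at least {min_expected} products, got {len(products)}"
        )
    for index, product in enumerate(products):
        for field in ("name", "price"):
            if not str(product.get(field) or "").strip():
                problems.append(f"product {index} has an empty {field!r}")
    return problems
```

If the returned list is not empty, skip the database write and inspect the page in developer tools before running again.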
Costco runs bot detection. Requests that look automated get throttled or blocked. The barriers are predictable, and each one has a known fix:
Scraping Intelligence consistently ranks randomized request delays among the most effective and simplest defenses against automated detection. Two to five seconds between requests closely mimics real browsing behavior and passes most rate-limit checks.
```python
import time
import random

time.sleep(random.uniform(2, 5))
```
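When scraping several category pages in one run, the same delay can be applied between every pair of requests. The sketch below shows one way to wrap that pattern; `fetch_politely` and its parameters are illustrative names, and the `sleep` argument exists only so the pause can be replaced in tests.

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

For example, `fetch_politely(category_urls, scrape_costco_products)` would scrape each URL in a list called `category_urls` with a two-to-five-second gap between pages.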
In-memory data disappears when the script ends. MongoDB is a natural fit for product records because each item comes back as a nested object with variable fields. The function below saves records and handles duplicates cleanly:
```python
from pymongo import MongoClient


def save_to_mongo(products: list):
    client = MongoClient("mongodb://localhost:27017/")
    db = client["costco_data"]
    collection = db["products"]

    for product in products:
        # Match on name: update the existing record, or insert a new one
        collection.update_one(
            {"name": product["name"]},
            {"$set": product},
            upsert=True
        )

    print(f"Saved {len(products)} products to MongoDB.")
```
upsert=True is doing the heavy lifting on deduplication. When a product already exists in the collection, it gets updated rather than duplicated. Run the scraper 100 times, and the database stays clean.
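Matching on `name` works as long as product names are unique and stable. If two listings share a name, or a title is reworded, a key derived from the product URL may hold up better. The following sketch assumes each record carries the `url` field produced by the scraper; `dedupe_key` is a hypothetical helper, not something the original code defines.

```python
from urllib.parse import urlsplit, urlunsplit


def dedupe_key(product: dict) -> str:
    """Build a stable key: the product URL without query string or fragment.

    Falls back to the lower-cased name when no URL was scraped.
    """
    url = product.get("url")
    if url:
        parts = urlsplit(url)
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return product["name"].strip().lower()
```

One way to use it would be to store the result in an extra field (say `key`) and pass `{"key": dedupe_key(product)}` as the filter to `update_one` instead of the name.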
Stored data is only useful if something can query it. FastAPI makes this part fast to build. It automatically generates interactive documentation, speeding up testing without requiring a separate API client.
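```python
# api/main.py
from fastapi import FastAPI
from pymongo import MongoClient

from scraper.costco_scraper import scrape_costco_products

app = FastAPI(title="Costco Scraping API", version="1.0")
client = MongoClient("mongodb://localhost:27017/")
db = client["costco_data"]


@app.get("/products")
def get_products(category: str = "electronics"):
    # Exclude MongoDB's _id, which is not JSON-serializable by default
    products = list(db["products"].find({"category": category}, {"_id": 0}))
    return {"count": len(products), "results": products}


@app.post("/scrape")
def run_scraper(url: str, category: str = "electronics"):
    data = scrape_costco_products(url)
    for product in data:
        # Tag and store each record so GET /products can filter by category
        product["category"] = category
        db["products"].update_one(
            {"name": product["name"]}, {"$set": product}, upsert=True
        )
    return {"scraped": len(data), "sample": data[:3]}
```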
Launch it with:
```bash
uvicorn api.main:app --reload
```
Go to http://localhost:8000/docs and the Swagger UI loads automatically. Both endpoints are testable from the browser without writing any client code.
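Once the server is running, other programs can query it like any HTTP API. Here is a minimal client sketch using only the standard library. It assumes the default uvicorn address `http://localhost:8000` and the `/products` endpoint with its `category` query parameter; `products_url` and `fetch_products` are illustrative names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumes the default uvicorn host and port


def products_url(category: str, base_url: str = BASE_URL) -> str:
    """Build the GET /products URL with the category safely encoded."""
    return f"{base_url}/products?{urlencode({'category': category})}"


def fetch_products(category: str, base_url: str = BASE_URL) -> dict:
    """Call GET /products and decode the JSON body (needs the API running)."""
    with urlopen(products_url(category, base_url)) as response:
        return json.load(response)
```

Encoding the parameter with `urlencode` keeps category names that contain spaces or ampersands from breaking the query string.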
Prices shift. Availability changes. A dataset that was accurate yesterday may not reflect what is on the site today. APScheduler handles background refresh jobs without the need for a dedicated service or task queue.
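```python
from apscheduler.schedulers.background import BackgroundScheduler

# Assumes run_scraper is in scope, e.g. at the bottom of api/main.py.
# Placeholder: set this to the category page you want refreshed.
CATEGORY_URL = "https://www.costco.com/..."

scheduler = BackgroundScheduler()
# run_scraper requires a url argument, so supply it through args
scheduler.add_job(run_scraper, "interval", hours=6, args=[CATEGORY_URL])
scheduler.start()
```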
Price monitoring works well at six-hour intervals. For slower-moving catalog data, once or twice per day is sufficient. Align the refresh rate with the speed at which your specific use case needs fresh Costco product data.
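A fixed interval is the simplest policy. Another option is to record when each category was last scraped and refresh only when that timestamp is too old. The sketch below assumes you store a timezone-aware `last_scraped` value yourself (the code in this guide does not do so), and `needs_refresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    last_scraped: Optional[datetime],
    max_age: timedelta = timedelta(hours=6),
    now: Optional[datetime] = None,
) -> bool:
    """Return True when data was never scraped or is older than max_age."""
    if last_scraped is None:
        return True
    # Allow the current time to be injected so the check is easy to test.
    now = now or datetime.now(timezone.utc)
    return now - last_scraped >= max_age
```

Price data might use the six-hour default, while slower-moving catalog data could pass `max_age=timedelta(hours=24)`.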
This section matters. Ignoring it creates real risk.
Since Costco does not offer a public API, web scraping is often the most practical option for developers who need structured retail data.
| Factor | Official API | Web Scraping |
|---|---|---|
| Data Freshness | Real time | Refreshed on a set schedule |
| Cost | Free or tiered subscription | Infrastructure and proxy costs only |
| Reliability | High, with guaranteed availability | Moderate, breaks on front-end changes |
| Data Scope | Limited to API-exposed fields | Any publicly visible page content |
| Legal Exposure | None | Moderate, depends on Terms of Service |
Scraping Intelligence offers managed web scraping services that handle the heavy lifting, including proxy rotation, CAPTCHA solving, dynamic rendering, and structured data delivery. Instead of maintaining a scraper yourself, you can receive clean, formatted product data via API on a schedule that works for your business.
Their platform supports e-commerce data extraction, price monitoring, and product catalog scraping at scale, which makes it especially valuable for developers who need reliable retail data without building the infrastructure from scratch.
A working Costco API built on web scraping is a realistic engineering project with a clear, repeatable structure. Playwright renders the pages. BeautifulSoup extracts the fields. MongoDB stores the records. FastAPI serves the endpoint. Each piece is modular, meaning nothing about this stack is locked in. Swap MongoDB for PostgreSQL, or FastAPI for Flask, and the rest still holds.
Teams that want to skip infrastructure entirely can work directly with Scraping Intelligence for on-demand managed retail data extraction.
Scraping Intelligence Editorial Team is a collective of data specialists, analysts, and researchers with expertise in web scraping, data extraction, and market intelligence. The team produces well-researched guides, actionable insights, and industry-focused resources that help businesses unlock the value of data and make informed, strategic decisions.