How to Web Scrape Amazon with Python?

**MyrinNew** · 10-13-2025, 05:56 AM

Want a fast, practical guide to scraping Amazon product data with Python? Here’s a concise walkthrough using requests + BeautifulSoup, with anti-bot tips, pagination, and clean parsing. For a working reference, check the GitHub repo: https://github.com/maivyly52-gif/ama...scraper-python

What You’ll Learn

Send realistic HTTP requests (headers, delays)
Parse titles, prices, ratings, URLs with BeautifulSoup
Handle pagination safely
Reduce blocks with rotating user agents/proxies
Know ethical & legal guardrails

pip install requests beautifulsoup4 fake-useragent

(Need SOCKS proxy support? Install requests[socks] or httpx[socks], or use your proxy provider's SDK.)

Core Steps

1) Build a “human-like” request

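import time, random, requests
from fake_useragent import UserAgent

ua = UserAgent()

def build_headers():
    # Pick a fresh User-Agent for every request instead of once at import time.
    return {
        "User-Agent": ua.random,
        "Accept-Language": "en-US,en;q=0.9",
    }

def fetch(url, *, retries=3, backoff=2):
    """Return the page HTML, or None if every attempt fails or looks blocked."""
    for i in range(retries):
        try:
            resp = requests.get(url, headers=build_headers(), timeout=20)
        except requests.RequestException:
            resp = None  # network error: fall through to the backoff below
        if resp is not None and resp.status_code == 200 and "Robot Check" not in resp.text:
            return resp.text
        # Linear backoff plus random jitter before the next attempt.
        time.sleep(backoff * (i + 1) + random.uniform(0.2, 1.1))
    return None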
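
The "Robot Check" test in fetch() covers only one block signal. A slightly broader check might look like the sketch below. The helper name looks_blocked, the status codes, and the two extra marker strings are assumptions for illustration; verify them against the pages you actually receive.

```python
# Hypothetical helper: decide whether a response looks like a block page.
BLOCK_MARKERS = (
    "Robot Check",                         # the marker used in fetch()
    "Enter the characters you see below",  # assumed CAPTCHA page wording
    "/errors/validateCaptcha",             # assumed CAPTCHA form path
)

def looks_blocked(status_code, html):
    """Return True if the status code or body suggests throttling or a CAPTCHA."""
    if status_code in (403, 429, 503):  # assumed "go away" statuses
        return True
    if not html:
        return True  # an empty body is treated as a failed fetch
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in BLOCK_MARKERS)
```

Inside fetch(), the success condition could then become: not looks_blocked(resp.status_code, resp.text).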

2) Parse product cards

from bs4 import BeautifulSoup
from urllib.parse import urljoin

def parse_search(html):
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # Each result card is a div with data-asin and data-component-type attributes.
    for card in soup.select("div.s-main-slot div[data-asin][data-component-type='s-search-result']"):
        asin = card.get("data-asin")
        title_el = card.select_one("h2 a span")
        price_el = card.select_one("span.a-price > span.a-offscreen")  # full price text
        rating_el = card.select_one("span.a-icon-alt")
        link_el = card.select_one("h2 a[href]")
        if not (asin and title_el and link_el):
            continue  # skip cards missing a required field
        items.append({
            "asin": asin,
            "title": title_el.get_text(strip=True),
            "price": price_el.get_text(strip=True) if price_el else None,
            "rating": rating_el.get_text(strip=True) if rating_el else None,
            # urljoin handles both relative and absolute hrefs.
            "url": urljoin("https://www.amazon.com", link_el["href"].split("?")[0]),
        })
    return items
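
The parser above returns price and rating as display strings. If you want numbers for sorting or analysis, a small normalization step helps. This is a minimal sketch that assumes US-style formatting ("$1,299.99", "4.5 out of 5 stars"); parse_price and parse_rating are hypothetical helper names, not part of any library.

```python
import re
from decimal import Decimal, InvalidOperation

def parse_price(text):
    """Convert a display price such as '$1,299.99' to Decimal, or None."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    try:
        # Drop thousands separators before converting.
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None

def parse_rating(text):
    """Convert '4.5 out of 5 stars' to 4.5, or None if the text does not match."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None
```

Decimal is used for prices to avoid float rounding surprises; other locales (comma decimals, different currency symbols) would need their own handling.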

3) Walk pagination (carefully)

from urllib.parse import urlencode

def search_amazon(query, pages=1):
    base = "https://www.amazon.com/s"
    results = []
    for page in range(1, pages + 1):
        params = {"k": query, "page": page}
        html = fetch(f"{base}?{urlencode(params)}")
        if not html:
            break  # blocked or failed: stop instead of hammering the site
        results.extend(parse_search(html))
        time.sleep(random.uniform(1.2, 3.1))  # be gentle
    return results

if __name__ == "__main__":
    data = search_amazon("wireless earbuds", pages=2)
    for row in data[:5]:
        print(row)

Anti-Bot Tips (Reduce Blocks)

Rotate User-Agents per request (fake-useragent or a maintained list).
Respectful delays (1–5s jitter) and low concurrency.
Proxies: residential/mobile work best; rotate IPs and subnets.
Keep URLs minimal: strip unneeded query parameters and avoid rigid, machine-like request patterns.
Fallback strategies: try different storefronts or narrower filters when you hit captchas.
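
The rotation tips above can be sketched as a helper that builds fresh keyword arguments for each requests.get call. The User-Agent strings and proxy URLs below are placeholders, and build_request_kwargs is a hypothetical name; swap in a maintained UA list and your provider's real endpoints.

```python
import random

# Placeholder values for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]

def build_request_kwargs(user_agents=USER_AGENTS, proxies=PROXIES, rng=random):
    """Return kwargs for requests.get with a random User-Agent and proxy."""
    kwargs = {
        "headers": {
            "User-Agent": rng.choice(user_agents),
            "Accept-Language": "en-US,en;q=0.9",
        },
        "timeout": 20,
    }
    if proxies:
        proxy = rng.choice(proxies)
        # requests expects a scheme-to-proxy-URL mapping.
        kwargs["proxies"] = {"http": proxy, "https": proxy}
    return kwargs
```

Usage would be requests.get(url, **build_request_kwargs()). Passing an empty proxy list skips the proxy entirely, which is handy for local testing.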

Data You Can Extract (Typical)

Title, price, list price, rating, review count
ASIN, product URL, image URL
Badges (e.g., “Best Seller”, “Amazon’s Choice”)
Availability snippets
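
One way to keep these fields consistent across search and detail pages is a small record type. The sketch below is an assumed schema, not something Amazon or the repo defines; the field names and types are illustrative, and anything you fail to parse simply stays None.

```python
from dataclasses import dataclass, asdict, field
from typing import List, Optional

@dataclass
class Product:
    # Required fields: a card without these is usually not worth keeping.
    asin: str
    title: str
    url: str
    # Optional fields default to None (or an empty list) when missing.
    price: Optional[str] = None
    list_price: Optional[str] = None
    rating: Optional[str] = None
    review_count: Optional[int] = None
    image_url: Optional[str] = None
    badges: List[str] = field(default_factory=list)
    availability: Optional[str] = None

def to_record(product):
    """Flatten a Product into a plain dict, ready for JSON or CSV export."""
    return asdict(product)
```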

Legal & Ethical Notes

Check Amazon’s Terms of Use and your local laws before scraping.
Prefer official APIs when possible (e.g., Amazon Product Advertising API) for reliability.
Don’t overload servers; throttle requests and cache results.
Use scraped data only where you have the right to use it.
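
The throttling and caching advice above can be sketched as a thin wrapper around any fetch function. PoliteFetcher is a hypothetical class, the interval and TTL defaults are arbitrary, and the cache lives in memory only; a real project might persist it to disk.

```python
import time

class PoliteFetcher:
    """Wrap a fetch function with a minimum request interval and a TTL cache."""

    def __init__(self, fetch_fn, min_interval=2.0, ttl=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch_fn
        self._min_interval = min_interval
        self._ttl = ttl
        self._clock = clock
        self._sleep = sleep
        self._cache = {}           # url -> (fetched_at, body)
        self._last_request = None  # time of the most recent real request

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit: no network call
        if self._last_request is not None:
            wait = self._min_interval - (now - self._last_request)
            if wait > 0:
                self._sleep(wait)  # throttle back-to-back requests
        body = self._fetch(url)
        self._last_request = self._clock()
        if body is not None:
            self._cache[url] = (self._last_request, body)
        return body
```

With the fetch() function from step 1 you could write polite = PoliteFetcher(fetch) and call polite.get(url) in place of fetch(url). The clock and sleep parameters exist only so the class can be tested without real delays.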

Next Steps

Turn results into CSV/JSON for analysis.
Add retry with CAPTCHA detection and proxy rotation.
Expand parsing to product detail pages (features, bullets, specs).
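
For the CSV/JSON step, the standard library is enough. This is a minimal sketch that assumes rows shaped like the dicts returned by parse_search; save_json and save_csv are hypothetical helper names.

```python
import csv
import json

FIELDS = ["asin", "title", "price", "rating", "url"]

def save_json(rows, path):
    """Write rows as a pretty-printed JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, ensure_ascii=False, indent=2)

def save_csv(rows, path, fields=FIELDS):
    """Write rows as CSV with a header; keys not in fields are ignored."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells
```

Usage: after data = search_amazon("wireless earbuds", pages=2), call save_json(data, "results.json") or save_csv(data, "results.csv").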

If you find it useful, ⭐ the repo. To dive deeper, copy the boilerplate and tweak it for your use case: https://github.com/maivyly52-gif/ama...scraper-python
