Want a fast, practical guide to scraping Amazon product data with Python? Here’s a concise walkthrough using requests + BeautifulSoup, with anti-bot tips, pagination, and clean parsing. For a working reference, check the GitHub repo: https://github.com/maivyly52-gif/ama...scraper-python
What You’ll Learn
- Send realistic HTTP requests (headers, delays)
- Parse titles, prices, ratings, URLs with BeautifulSoup
- Handle pagination safely
- Reduce blocks with rotating user agents/proxies
- Know ethical & legal guardrails
Explore the full example code here: https://github.com/maivyly52-gif/ama...scraper-python
pip install requests beautifulsoup4 fake-useragent
(Need proxy support? Install requests[socks] or httpx[socks], or use a provider SDK.)
Core Steps
1) Build a “human-like” request
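import random
import time

import requests
from fake_useragent import UserAgent

ua = UserAgent()

def build_headers():
    # Pick a fresh User-Agent for every request instead of reusing one
    return {
        "User-Agent": ua.random,
        "Accept-Language": "en-US,en;q=0.9",
    }

def fetch(url, *, retries=3, backoff=2):
    for i in range(retries):
        try:
            resp = requests.get(url, headers=build_headers(), timeout=20)
        except requests.RequestException:
            resp = None  # network error: fall through to the backoff and retry
        # A 200 response can still be a CAPTCHA ("Robot Check") page
        if resp is not None and resp.status_code == 200 and "Robot Check" not in resp.text:
            return resp.text
        time.sleep(backoff * (i + 1) + random.uniform(0.2, 1.1))
    return None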
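The "Robot Check" test in fetch catches one kind of block page. A slightly broader check might look like the sketch below. The names BLOCK_MARKERS, BLOCK_STATUSES, and looks_blocked are hypothetical, and the extra marker strings and status codes are assumptions about what a block response may contain, so verify them against real responses before relying on them.

```python
# Hypothetical helper: decide whether a response looks like a block page.
BLOCK_MARKERS = (
    "Robot Check",                          # the marker fetch() already checks
    "Enter the characters you see below",   # assumed CAPTCHA prompt text
    "/errors/validateCaptcha",              # assumed CAPTCHA form action
)
BLOCK_STATUSES = {403, 429, 503}            # assumed "blocked or throttled" codes


def looks_blocked(status_code, text):
    """Return True if the status code or the body suggests a block."""
    if status_code in BLOCK_STATUSES:
        return True
    lowered = text.lower()
    return any(marker.lower() in lowered for marker in BLOCK_MARKERS)
```

Inside fetch, the success condition could then become `not looks_blocked(resp.status_code, resp.text)`.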
2) Parse product cards
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def parse_search(html):
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # Each search result is a div that carries the product's ASIN
    for card in soup.select("div.s-main-slot div[data-asin][data-component-type='s-search-result']"):
        asin = card.get("data-asin")
        title_el = card.select_one("h2 a span")
        price_el = card.select_one("span.a-price > span.a-offscreen")
        rating_el = card.select_one("span.a-icon-alt")
        link_el = card.select_one("h2 a")
        # Skip cards that lack an ASIN, a title, or a usable link
        if not (asin and title_el and link_el and link_el.get("href")):
            continue
        items.append({
            "asin": asin,
            "title": title_el.get_text(strip=True),
            "price": price_el.get_text(strip=True) if price_el else None,
            "rating": rating_el.get_text(strip=True) if rating_el else None,
            # Drop the query string; urljoin copes with relative and absolute hrefs
            "url": urljoin("https://www.amazon.com", link_el["href"].split("?")[0]),
        })
    return items
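parse_search returns price and rating as raw text. If you want numbers for sorting or analysis, a conversion step along these lines may help. This is a minimal sketch: parse_price and parse_rating are hypothetical helpers, and they assume US-style strings such as "$1,299.99" and "4.5 out of 5 stars", which other storefronts may not use.

```python
import re


def parse_price(text):
    """Turn a price string such as "$1,299.99" into a float, or None."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    # Assumes "," is a thousands separator and "." the decimal point
    return float(match.group().replace(",", ""))


def parse_rating(text):
    """Turn a string such as "4.5 out of 5 stars" into a float, or None."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None
```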
3) Walk pagination (carefully)
from urllib.parse import urlencode

def search_amazon(query, pages=1):
    base = "https://www.amazon.com/s"
    results = []
    for page in range(1, pages + 1):
        params = {"k": query, "page": page}
        html = fetch(f"{base}?{urlencode(params)}")
        if not html:
            break
        results.extend(parse_search(html))
        time.sleep(random.uniform(1.2, 3.1))  # be gentle
    return results

if __name__ == "__main__":
    data = search_amazon("wireless earbuds", pages=2)
    for row in data[:5]:
        print(row)
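When you walk several pages, the same product can appear more than once (an assumption worth checking on your own queries). A small de-duplication pass keyed on the ASIN keeps the result list clean. dedupe_by_asin is a hypothetical helper that works on the dicts parse_search produces.

```python
def dedupe_by_asin(rows):
    """Keep the first row seen for each ASIN, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        asin = row.get("asin")
        if not asin or asin in seen:
            continue  # skip rows without an ASIN and repeats
        seen.add(asin)
        unique.append(row)
    return unique
```

For example, `data = dedupe_by_asin(search_amazon("wireless earbuds", pages=2))`.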
Prefer a ready-to-run example? See the repo’s code paths and notes: https://github.com/maivyly52-gif/ama...scraper-python
Anti-Bot Tips (Reduce Blocks)
- Rotate User-Agents per request (fake-useragent or a maintained list).
- Respectful delays (1–5s jitter) and low concurrency.
- Proxies: residential/mobile work best; rotate IPs and subnets.
- Keep URLs minimal: drop parameters you don't need and avoid patterns that look automated, such as fixed intervals between requests.
- Fallback strategies: try different storefronts or narrower filters when you hit captchas.
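The rotation and delay tips above can be sketched as follows. Everything here is illustrative: USER_AGENTS and PROXIES hold placeholder values (the proxy hosts are not real), and next_request_kwargs and jittered_delay are hypothetical helper names. The returned dict uses the headers, proxies, and timeout arguments that requests.get accepts.

```python
import itertools
import random

# Placeholder values: swap in a maintained UA list and your provider's proxy URLs.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]
PROXIES = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
]

_proxy_cycle = itertools.cycle(PROXIES)


def next_request_kwargs():
    """Return keyword arguments for requests.get with a rotated UA and proxy."""
    proxy = next(_proxy_cycle)  # round-robin through the proxy pool
    return {
        "headers": {
            "User-Agent": random.choice(USER_AGENTS),
            "Accept-Language": "en-US,en;q=0.9",
        },
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 20,
    }


def jittered_delay(low=1.0, high=5.0):
    """Random pause length in seconds, matching the 1 to 5 s jitter tip."""
    return random.uniform(low, high)
```

A request would then look like `requests.get(url, **next_request_kwargs())`, followed by `time.sleep(jittered_delay())`.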
You’ll find a compact starter you can adapt in the GitHub project: https://github.com/maivyly52-gif/ama...scraper-python
Data You Can Extract (Typical)
- Title, price, list price, rating, review count
- ASIN, product URL, image URL
- Badges (e.g., “Best Seller”, “Amazon’s Choice”)
- Availability snippets
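One way to keep these fields organized is a small record type. The sketch below is an assumption about how you might model them: ProductRecord, its field names, and to_record are hypothetical. parse_search in this guide fills only asin, title, price, rating, and url, so the other fields stay at their defaults until you extend the parser.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProductRecord:
    asin: str
    title: str
    url: str
    price: Optional[str] = None
    list_price: Optional[str] = None
    rating: Optional[str] = None
    review_count: Optional[int] = None
    image_url: Optional[str] = None
    badges: List[str] = field(default_factory=list)
    availability: Optional[str] = None


def to_record(row):
    """Build a ProductRecord from a parse_search()-style dict, ignoring unknown keys."""
    known = ProductRecord.__dataclass_fields__
    return ProductRecord(**{k: v for k, v in row.items() if k in known})
```

`asdict(record)` turns a record back into a plain dict when you need to serialize it.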
Legal & Ethical Notes
- Check Amazon’s Terms of Use and your local laws before scraping.
- Prefer official APIs when possible (e.g., Amazon Product Advertising API) for reliability.
- Don’t overload servers; throttle requests and cache results.
- Use scraped data only where you have the right to use it.
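The throttle-and-cache advice can be sketched as a thin wrapper around any fetch function, such as the fetch from step 1. ThrottledCache and its parameters are hypothetical, and the cache lives in memory only, so it is a starting point rather than a complete solution.

```python
import time


class ThrottledCache:
    """Cache fetched pages for `ttl` seconds and space out real fetches."""

    def __init__(self, fetcher, *, min_interval=2.0, ttl=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetcher = fetcher            # callable: url -> body or None
        self._min_interval = min_interval  # minimum seconds between real fetches
        self._ttl = ttl                    # how long a cached body stays fresh
        self._clock = clock
        self._sleep = sleep
        self._store = {}                   # url -> (fetched_at, body)
        self._last_fetch = None

    def get(self, url):
        now = self._clock()
        cached = self._store.get(url)
        if cached and now - cached[0] < self._ttl:
            return cached[1]               # fresh cache hit: no network call
        if self._last_fetch is not None:
            wait = self._min_interval - (now - self._last_fetch)
            if wait > 0:
                self._sleep(wait)          # throttle before hitting the server
        body = self._fetcher(url)
        self._last_fetch = self._clock()
        if body is not None:               # failed fetches are not cached
            self._store[url] = (self._last_fetch, body)
        return body
```

Usage would look like `cache = ThrottledCache(fetch)` followed by `html = cache.get(url)`. The clock and sleep parameters exist so the behavior can be tested without real waiting.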
Next Steps
- Turn results into CSV/JSON for analysis.
- Add retry with CAPTCHA detection and proxy rotation.
- Expand parsing to product detail pages (features, bullets, specs).
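For the first item, the standard library is enough. The sketch below serializes the dicts returned by search_amazon. rows_to_json, rows_to_csv, and FIELDS are hypothetical names, and the column list simply mirrors the keys parse_search produces.

```python
import csv
import io
import json

FIELDS = ["asin", "title", "price", "rating", "url"]


def rows_to_json(rows):
    """Serialize rows as pretty-printed JSON, keeping non-ASCII characters."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def rows_to_csv(rows, fields=FIELDS):
    """Serialize rows as CSV text with a header; extra keys are ignored."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buf.getvalue()
```

To write a file, open it with `open(path, "w", encoding="utf-8", newline="")` and write the returned string.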
Dive deeper, copy the boilerplate, and tweak it for your use case here: https://github.com/maivyly52-gif/ama...scraper-python (if you find it useful, ⭐ the repo).