# Scraping Amazon Products

Learn how to scrape Amazon product pages with the ScrapingForge API and extract structured data including titles, prices, ratings, reviews, images, and availability.
## Overview
Amazon uses heavy JavaScript rendering and bot detection. ScrapingForge handles this automatically with:
- JavaScript rendering for dynamic content
- Residential proxies to avoid IP blocks
- Automatic retry logic for failed requests
## Product Data Structure

We'll extract the following data from Amazon product pages:

| Field | Description |
|---|---|
| title | Product name and title |
| price | Current price |
| original_price | Original price (if on sale) |
| rating | Average customer rating (1-5 stars) |
| reviews_count | Total number of reviews |
| availability | Stock status |
| images | Product image URLs |
| features | Key product features/bullets |
## Python Example

```python
import requests
import json

api_key = "sf_your_api_key"
url = "https://api.scrapingforge.com/api/v1/scraper"

# Amazon product URL
product_url = "https://www.amazon.com/dp/B08N5WRWNW"

payload = {
    "url": product_url,
    "render_js": True,
    "premium_proxy": True,
    "country": "US",
    "wait_for": "#productTitle",
    "extract_rules": {
        "title": {
            "selector": "#productTitle",
            "type": "text"
        },
        "price": {
            "selector": ".a-price .a-offscreen",
            "type": "text"
        },
        "rating": {
            "selector": "#acrPopover",
            "type": "attr",
            "attr": "title"
        },
        "reviews_count": {
            "selector": "#acrCustomerReviewText",
            "type": "text"
        },
        "availability": {
            "selector": "#availability span",
            "type": "text"
        },
        "images": {
            "selector": "#altImages img",
            "type": "list",
            "attr": "src"
        },
        "features": {
            "selector": "#feature-bullets li span.a-list-item",
            "type": "list"
        }
    }
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
product_data = response.json()
print(json.dumps(product_data, indent=2))
```
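The request above can fail or come back with an empty extraction, for example when Amazon serves a CAPTCHA page and none of the selectors match. A minimal defensive wrapper, as a sketch: it assumes a non-200 status signals failure and that a blocked page yields an empty `title`, which you should verify against real responses.

```python
def parse_scrape_response(status_code, body):
    """Return extracted product data, or raise if the scrape failed.

    Assumes (not documented behavior): a non-200 status means the request
    failed, and a blocked/CAPTCHA page yields an empty "title" because the
    extract_rules selectors matched nothing.
    """
    if status_code != 200:
        raise RuntimeError(f"Scrape failed with HTTP {status_code}: {body}")
    if not body.get("title"):
        raise RuntimeError("No product title extracted; page may be blocked")
    return body
```

Use it in place of the bare `response.json()` call: `product_data = parse_scrape_response(response.status_code, response.json())`.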
## Node.js Example

```javascript
const axios = require('axios');

const apiKey = 'sf_your_api_key';
const apiUrl = 'https://api.scrapingforge.com/api/v1/scraper';

// Amazon product URL
const productUrl = 'https://www.amazon.com/dp/B08N5WRWNW';

const payload = {
  url: productUrl,
  render_js: true,
  premium_proxy: true,
  country: 'US',
  wait_for: '#productTitle',
  extract_rules: {
    title: {
      selector: '#productTitle',
      type: 'text'
    },
    price: {
      selector: '.a-price .a-offscreen',
      type: 'text'
    },
    rating: {
      selector: '#acrPopover',
      type: 'attr',
      attr: 'title'
    },
    reviews_count: {
      selector: '#acrCustomerReviewText',
      type: 'text'
    },
    availability: {
      selector: '#availability span',
      type: 'text'
    },
    images: {
      selector: '#altImages img',
      type: 'list',
      attr: 'src'
    },
    features: {
      selector: '#feature-bullets li span.a-list-item',
      type: 'list'
    }
  }
};

axios.post(apiUrl, payload, {
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  }
})
  .then(response => {
    console.log(JSON.stringify(response.data, null, 2));
  })
  .catch(error => {
    console.error('Error:', error.response?.data || error.message);
  });
```
## Response Example

```json
{
  "title": "Apple AirPods Pro (2nd Generation)",
  "price": "$189.99",
  "rating": "4.7 out of 5 stars",
  "reviews_count": "54,321 ratings",
  "availability": "In Stock",
  "images": [
    "https://m.media-amazon.com/images/I/61SUj2aKoEL._AC_SL75_.jpg",
    "https://m.media-amazon.com/images/I/61f1YfTkTtL._AC_SL75_.jpg"
  ],
  "features": [
    "Active Noise Cancellation reduces unwanted background noise",
    "Adaptive Transparency lets outside sound in while reducing loud noise",
    "Personalized Spatial Audio with dynamic head tracking"
  ]
}
```
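Note that the extracted fields are display strings, not numbers. A small normalizer can convert them for storage or comparison. This sketch assumes the US-marketplace formats shown in the example response; other Amazon locales use different currency symbols and number separators.

```python
import re

def normalize_product(raw):
    """Convert extract_rules string fields into numeric types.

    Assumes US formats: "$189.99", "4.7 out of 5 stars", "54,321 ratings".
    """
    price = float(raw["price"].replace("$", "").replace(",", ""))
    rating = float(raw["rating"].split(" ")[0])
    reviews = int(re.sub(r"[^\d]", "", raw["reviews_count"]))
    return {**raw, "price": price, "rating": rating, "reviews_count": reviews}
```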
## Best Practices

**Rate Limiting:** Amazon enforces strict rate limits. Recommendations:

- Use residential proxies (`premium_proxy: true`)
- Add delays between requests (5-10 seconds minimum)
- Rotate user agents via `custom_headers`
- Consider using asynchronous jobs for bulk scraping
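The delay and user-agent recommendations can be combined in a small helper that prepares one request at a time. This is only a sketch: the `USER_AGENTS` entries are truncated placeholders you should replace with real, current browser strings, and the yielded `custom_headers` dict is meant to be merged into the request payload.

```python
import random
import time

# Placeholder pool -- substitute real, current browser User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def throttled_payloads(urls, min_delay=5.0, max_delay=10.0):
    """Yield (url, extra_fields) pairs, sleeping 5-10 s between products."""
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized delay is harder to fingerprint than a fixed one.
            time.sleep(random.uniform(min_delay, max_delay))
        yield url, {"custom_headers": {"User-Agent": random.choice(USER_AGENTS)}}
```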
**Cost Optimization:** This request costs approximately:

- Base: 1 credit
- JS rendering: 5 credits
- Premium proxy: 15 credits
- Total: ~21 credits per product
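Before submitting a large batch, it can help to make the arithmetic above explicit. A small estimator using the illustrative per-feature figures from this page (check your plan for actual pricing):

```python
# Illustrative credit costs from the breakdown above, not guaranteed pricing.
BASE_CREDITS = 1
JS_RENDER_CREDITS = 5
PREMIUM_PROXY_CREDITS = 15

def estimate_credits(n_products, render_js=True, premium_proxy=True):
    """Estimated total credit cost for scraping n_products product pages."""
    per_product = BASE_CREDITS
    if render_js:
        per_product += JS_RENDER_CREDITS
    if premium_proxy:
        per_product += PREMIUM_PROXY_CREDITS
    return per_product * n_products

# estimate_credits(100) == 2100
```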
## Scraping Multiple Products

For scraping multiple products, use asynchronous jobs:

```python
import requests
import time

api_key = "sf_your_api_key"
base_url = "https://api.scrapingforge.com/api/v1/scraper"

product_urls = [
    "https://www.amazon.com/dp/B08N5WRWNW",
    "https://www.amazon.com/dp/B0BSHF7WHW",
    "https://www.amazon.com/dp/B09B8RXYYH"
]

# Submit jobs
job_ids = []
for product_url in product_urls:
    payload = {
        "url": product_url,
        "render_js": True,
        "premium_proxy": True,
        "country": "US",
        "extract_rules": {...}  # Same as above
    }
    response = requests.post(
        f"{base_url}/jobs",
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"}
    )
    job_ids.append(response.json()["job_id"])
    print(f"Submitted job: {response.json()['job_id']}")

# Wait for completion and fetch results
time.sleep(30)  # Wait for jobs to complete

results = []
for job_id in job_ids:
    result_response = requests.get(
        f"{base_url}/jobs/{job_id}/result",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    results.append(result_response.json())

print(f"Scraped {len(results)} products")
```
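The fixed `time.sleep(30)` wastes time when jobs finish early and can fetch incomplete results when they finish late. A generic poll-with-timeout helper avoids both. How the API signals an unfinished job (an HTTP 404, a status field, an empty body) is an assumption here, so the helper takes a callable that you adapt to the real `/jobs/{job_id}/result` behavior, returning `None` until the result is ready.

```python
import time

def poll_until_ready(fetch_result, timeout=120.0, interval=5.0):
    """Call fetch_result() every `interval` seconds until it returns a
    non-None value, or raise TimeoutError after `timeout` seconds.

    fetch_result must return None while the job is still running; adapt
    that check to how the jobs endpoint actually reports "in progress".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_result()
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError("job did not complete before timeout")
```

In the loop above you would replace `time.sleep(30)` plus the bare GET with something like `poll_until_ready(lambda: fetch(job_id))`, where `fetch` is your own wrapper around the result request.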

