Scraping Amazon Products

Learn how to scrape Amazon product pages with the ScrapingForge API and extract structured data, including title, price, ratings, review counts, images, and availability.

Overview

Amazon relies heavily on client-side JavaScript rendering and on bot detection. ScrapingForge handles both automatically with:

  • JavaScript rendering for dynamic content
  • Residential proxies to avoid IP blocks
  • Automatic retry logic for failed requests

Product Data Structure

We'll extract the following data from Amazon product pages:

Field           Description
title           Product name and title
price           Current price
original_price  Original price (if on sale)
rating          Average customer rating (1-5 stars)
reviews_count   Total number of reviews
availability    Stock status
images          Product image URLs
features        Key product features/bullets
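
The field list above can be written down as a type, which makes it easier to notice when a selector stops matching. The following is a minimal sketch, not part of the ScrapingForge API: the names `AmazonProduct` and `missing_fields` are hypothetical, and the value types assume the API returns raw strings and lists of strings, as in the response example on this page.

```python
from typing import List, TypedDict, get_type_hints


class AmazonProduct(TypedDict, total=False):
    """Hypothetical shape of one scraped product (all values as raw text)."""
    title: str
    price: str
    original_price: str
    rating: str
    reviews_count: str
    availability: str
    images: List[str]
    features: List[str]


# Field names taken from the table above.
EXPECTED_FIELDS = tuple(get_type_hints(AmazonProduct))


def missing_fields(record: dict) -> list:
    """Return the expected fields that are absent or empty in a scraped record."""
    return [name for name in EXPECTED_FIELDS if not record.get(name)]
```

A non-empty result from `missing_fields` is a hint that the corresponding selector did not match, or that no extraction rule was defined for that field.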

Python Example

import requests
import json

api_key = "sf_your_api_key"
url = "https://api.scrapingforge.com/api/v1/scraper"

# Amazon product URL
product_url = "https://www.amazon.com/dp/B08N5WRWNW"

payload = {
    "url": product_url,
    "render_js": True,
    "premium_proxy": True,
    "country": "US",
    "wait_for": "#productTitle",
    "extract_rules": {
        "title": {
            "selector": "#productTitle",
            "type": "text"
        },
        "price": {
            "selector": ".a-price .a-offscreen",
            "type": "text"
        },
        "rating": {
            "selector": "#acrPopover",
            "type": "attr",
            "attr": "title"
        },
        "reviews_count": {
            "selector": "#acrCustomerReviewText",
            "type": "text"
        },
        "availability": {
            "selector": "#availability span",
            "type": "text"
        },
        "images": {
            "selector": "#altImages img",
            "type": "list",
            "attr": "src"
        },
        "features": {
            "selector": "#feature-bullets li span.a-list-item",
            "type": "list"
        }
    }
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

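# JS rendering through a proxy can be slow, so set an explicit timeout
response = requests.post(url, json=payload, headers=headers, timeout=120)
response.raise_for_status()  # Fail early on HTTP 4xx/5xx responses
product_data = response.json()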

print(json.dumps(product_data, indent=2))

Node.js Example

const axios = require('axios');

const apiKey = 'sf_your_api_key';
const apiUrl = 'https://api.scrapingforge.com/api/v1/scraper';

// Amazon product URL
const productUrl = 'https://www.amazon.com/dp/B08N5WRWNW';

const payload = {
  url: productUrl,
  render_js: true,
  premium_proxy: true,
  country: 'US',
  wait_for: '#productTitle',
  extract_rules: {
    title: {
      selector: '#productTitle',
      type: 'text'
    },
    price: {
      selector: '.a-price .a-offscreen',
      type: 'text'
    },
    rating: {
      selector: '#acrPopover',
      type: 'attr',
      attr: 'title'
    },
    reviews_count: {
      selector: '#acrCustomerReviewText',
      type: 'text'
    },
    availability: {
      selector: '#availability span',
      type: 'text'
    },
    images: {
      selector: '#altImages img',
      type: 'list',
      attr: 'src'
    },
    features: {
      selector: '#feature-bullets li span.a-list-item',
      type: 'list'
    }
  }
};

axios.post(apiUrl, payload, {
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  }
})
.then(response => {
  console.log(JSON.stringify(response.data, null, 2));
})
.catch(error => {
  console.error('Error:', error.response?.data || error.message);
});

Response Example

{
  "title": "Apple AirPods Pro (2nd Generation)",
  "price": "$189.99",
  "rating": "4.7 out of 5 stars",
  "reviews_count": "54,321 ratings",
  "availability": "In Stock",
  "images": [
    "https://m.media-amazon.com/images/I/61SUj2aKoEL._AC_SL75_.jpg",
    "https://m.media-amazon.com/images/I/61f1YfTkTtL._AC_SL75_.jpg"
  ],
  "features": [
    "Active Noise Cancellation reduces unwanted background noise",
    "Adaptive Transparency lets outside sound in while reducing loud noise",
    "Personalized Spatial Audio with dynamic head tracking"
  ]
}
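
The values in the response are display strings ("$189.99", "4.7 out of 5 stars", "54,321 ratings"), so they usually need normalizing before storage or comparison. The helpers below are a minimal sketch of that step; the function names are hypothetical, and the parsing assumes US-style number formatting (comma as thousands separator), which matches the `country: "US"` setting used in the examples.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Extract a price such as '$1,299.00' as a float, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text: str) -> Optional[float]:
    """Extract the leading number from text like '4.7 out of 5 stars'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_count(text: str) -> Optional[int]:
    """Extract an integer from text like '54,321 ratings'."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None
```

Each helper returns `None` rather than raising when the field is empty, since a selector may not match on every product page.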

Best Practices

Rate Limiting

Amazon has strict rate limits. Recommendations:
  • Use residential proxies (premium_proxy: true)
  • Add delays between requests (5-10 seconds minimum)
  • Rotate user agents via custom_headers
  • Consider using asynchronous jobs for bulk scraping
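
The delay recommendation can be sketched as a small generator that pauses between items. This is an illustration only: `throttled` is a hypothetical helper, not a ScrapingForge feature, and the random jitter within the 5-10 second window is an added assumption intended to avoid a perfectly regular request pattern.

```python
import random
import time


def throttled(items, min_delay=5.0, max_delay=10.0, sleep=time.sleep):
    """Yield items one by one, pausing a random delay between consecutive items.

    No pause happens before the first item. `sleep` is injectable so the
    helper can be exercised without actually waiting.
    """
    for index, item in enumerate(items):
        if index > 0:
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

Usage would look like `for product_url in throttled(product_urls): ...`, with the request made inside the loop body.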

Cost Optimization

This request costs approximately:
  • Base: 1 credit
  • JS rendering: 5 credits
  • Premium proxy: 15 credits
  • Total: ~21 credits per product

AI extraction adds 10 credits (total: ~31 credits per product).
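
To budget a bulk run, the per-request figures above can be summed per product. The sketch below uses only the approximate credit values listed here; `estimate_credits` is a hypothetical helper, and it assumes that the surcharges are additive and that turning an option off removes its surcharge, which should be checked against actual billing.

```python
# Approximate credit costs taken from the list above.
CREDITS_BASE = 1
CREDITS_JS_RENDERING = 5
CREDITS_PREMIUM_PROXY = 15
CREDITS_AI_EXTRACTION = 10


def estimate_credits(num_products, render_js=True, premium_proxy=True,
                     ai_extraction=False):
    """Estimate total credits for scraping `num_products` product pages."""
    per_product = CREDITS_BASE
    if render_js:
        per_product += CREDITS_JS_RENDERING
    if premium_proxy:
        per_product += CREDITS_PREMIUM_PROXY
    if ai_extraction:
        per_product += CREDITS_AI_EXTRACTION
    return per_product * num_products
```

For example, `estimate_credits(100)` gives 2100 credits for 100 products with JS rendering and a premium proxy enabled.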

Scraping Multiple Products

For scraping multiple products, use asynchronous jobs:

import requests
import time

api_key = "sf_your_api_key"
base_url = "https://api.scrapingforge.com/api/v1/scraper"

product_urls = [
    "https://www.amazon.com/dp/B08N5WRWNW",
    "https://www.amazon.com/dp/B0BSHF7WHW",
    "https://www.amazon.com/dp/B09B8RXYYH"
]

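# Reuse the extract_rules dictionary from the Python example above
# (abbreviated here to two fields)
extract_rules = {
    "title": {"selector": "#productTitle", "type": "text"},
    "price": {"selector": ".a-price .a-offscreen", "type": "text"}
}

# Submit jobs
job_ids = []
for product_url in product_urls:
    payload = {
        "url": product_url,
        "render_js": True,
        "premium_proxy": True,
        "country": "US",
        "extract_rules": extract_rules
    }

    response = requests.post(
        f"{base_url}/jobs",
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"}
    )
    response.raise_for_status()  # Fail early if the job was not accepted
    job_id = response.json()["job_id"]
    job_ids.append(job_id)
    print(f"Submitted job: {job_id}")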

# Wait for completion and fetch results
time.sleep(30)  # Wait for jobs to complete

results = []
for job_id in job_ids:
    result_response = requests.get(
        f"{base_url}/jobs/{job_id}/result",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    results.append(result_response.json())

print(f"Scraped {len(results)} products")
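
A fixed 30-second wait can be too short for slow jobs and wasteful for fast ones. One alternative is to poll the result endpoint until it answers. The sketch below is built on an assumption this page does not confirm: that `/jobs/{job_id}/result` returns a non-200 status while a job is still running and 200 once the result is ready. The names `wait_for_result` and `fetch_result` are hypothetical.

```python
import time

BASE_URL = "https://api.scrapingforge.com/api/v1/scraper"


def fetch_result(job_id, api_key):
    """Return (status_code, parsed_body_or_None) for one job's result."""
    import requests  # imported lazily so the polling loop itself has no dependency

    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    body = response.json() if response.status_code == 200 else None
    return response.status_code, body


def wait_for_result(job_id, api_key, timeout=120.0, interval=5.0,
                    fetch=fetch_result, sleep=time.sleep):
    """Poll until the result is available or `timeout` seconds have passed.

    ASSUMPTION: a 200 response means the job has finished; any other
    status is treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        status, body = fetch(job_id, api_key)
        if status == 200:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} not finished after {timeout} seconds")
        sleep(interval)
```

With this helper, the fetch loop could become `results = [wait_for_result(job_id, api_key) for job_id in job_ids]`. If the API instead reports progress inside a 200 response body, the readiness check would need to inspect that body.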

Next Steps