Web Scraping Steam Store with JavaScript and Node.js

1. Introduction — Web Scraping with JavaScript and Node.js in 2025
If you build scrapers for a living, you already know the pattern: you start with a quick script, it works for five minutes, then Steam (or any modern site) changes something and your code implodes at 2 AM.
The goal of this post isn’t another “Hello world” scraping demo.
We’re going to build a realistic, reproducible Steam Store scraper in plain JavaScript — the kind of tool you could drop into a job queue or analytics pipeline tomorrow.
Why Node.js
Node.js still dominates real‑time data collection for a reason:
- Native async I/O lets you keep thousands of requests open without threads.
- npm’s ecosystem (e.g. cheerio, undici, playwright) covers 95% of what you need.
- Same language on backend and scraper means less context switching.
Python remains great for quick notebooks. But when you want a scraper that runs forever, Node.js gives you event‑loop control, streams, and observability hooks that scale.
Why Steam
The Steam Store is the perfect benchmark site: public pages, predictable markup, light dynamic loading, and data everyone actually cares about — game names, prices, discounts, and genres.
We’ll scrape a few categories (Action, RPG, and Free‑to‑Play), handle pagination, normalize the data into JSON, and keep it polite with rate limits.
By the end, you’ll have a clean, maintainable scraper that reflects the ScrapingForge philosophy:
build once, debug rarely, scale easily.
2. Setting Up Your Node.js Web Scraping Project
We’ll keep things minimal but organized — structure first, hacks later.
Create the project
mkdir steam-scraper && cd steam-scraper
npm init -y
npm install node-fetch cheerio dotenv
We’re sticking with native ES modules, so set "type": "module" in package.json.
{
"name": "steam",
"version": "1.0.0",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "",
"license": "ISC",
"keywords": [],
"description": "",
"dependencies": {
"cheerio": "^1.1.2",
"dotenv": "^17.2.3",
"node-fetch": "^3.3.2"
}
}
Folder layout
steam-scraper/
├── src/
│ ├── index.js # entry point
│ ├── lib/
│ │ └── helpers.js # delay, logging, normalization
│ └── targets/
│ └── steam.js # main scraping logic
├── data/
│ └── output/
├── .env
└── package.json
Environment configuration
Add your .env file to define basic runtime variables:
CATEGORY_URL=https://store.steampowered.com/genre/Action/
REQUEST_DELAY=2000
MAX_PAGES=3
Load it in your code:
import 'dotenv/config'
const CATEGORY_URL = process.env.CATEGORY_URL
const REQUEST_DELAY = Number(process.env.REQUEST_DELAY || 2000)
Minimal utility
Let’s create a tiny delay helper (we’ll need it later for rate limiting):
// src/lib/helpers.js
export const sleep = ms => new Promise(res => setTimeout(res, ms))
Test the baseline
Before you touch Cheerio, confirm your fetch works:
// src/index.js
import "dotenv/config"
import fetch from "node-fetch"
import { sleep } from "./lib/helpers.js"
const url = process.env.CATEGORY_URL
async function main() {
console.log(`Fetching ${url}`)
const res = await fetch(url)
console.log(`Status: ${res.status}`)
await sleep(1000)
}
main()
Run it:
node src/index.js
If you see Status: 200, you’re ready to parse HTML in the next step.
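Optionally, you can confirm Cheerio is wired up as well. A small addition to the baseline script (nothing Steam-specific yet; it just reads the page title and counts links on whatever HTML came back):
// at the top of src/index.js:
import { load } from "cheerio"

// inside main(), after `const res = await fetch(url)`:
const html = await res.text()
const $ = load(html)
console.log(`Page title: ${$("title").text()}`)
console.log(`Links found: ${$("a").length}`)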
3. Understanding the Steam Store Structure (and Why “/category/action” Doesn’t Work)
If you open https://store.steampowered.com/category/action, you’ll notice something strange: there’s no visible pagination in the URL. Scroll down, and new games just appear. That’s because this page is a dynamic “content hub” — the data loads asynchronously through internal API calls, not through static links like ?page=2.
At first glance, that’s bad news for us scrapers…
But Steam also provides a hidden gem: its search endpoint.
🔍 The Real Endpoint for Reliable Data
Under the hood, Steam powers most category listings through:
https://store.steampowered.com/search/results/
This endpoint supports parameters like:
- start – offset index (0, 50, 100, …)
- count – number of items per slice (max ≈ 50)
- tags – category or genre ID (e.g., 19 = Action)
- force_infinite=1 – returns a JSON payload instead of full HTML
A real request looks like this:
https://store.steampowered.com/search/results/?start=0&count=50&tags=19&force_infinite=1&l=english&cc=US
The response includes two keys:
{
"results_html": "<a class='search_result_row' ...> ... </a>",
"total_count": 4872
}
So instead of scraping the dynamically rendered /category page, we’ll call this endpoint directly — it’s cleaner, faster, and gives us built-in pagination metadata.
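Before wiring it into the project, you can sanity-check the endpoint with a few lines (assuming Node 18+ so the global fetch is available; tag 19 = Action, as above):
// probe.mjs — quick check of the search/results endpoint (run: node probe.mjs)
const url =
  "https://store.steampowered.com/search/results/?start=0&count=50&tags=19&force_infinite=1&l=english&cc=US"

const res = await fetch(url, {
  headers: { Accept: "application/json,text/javascript,*/*;q=0.9" }
})
const payload = await res.json()

console.log("total_count:", payload.total_count)
console.log("results_html length:", payload.results_html.length)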
🧩 Why This Is the “ScrapingForge Way”
We’re not trying to hack around JavaScript rendering when there’s a clean, JSON-backed alternative.
The ScrapingForge mindset is:
“Find the layer that machines actually use, not what browsers paint.”
Steam’s search/results endpoint is that layer. It’s structured, efficient, and consistent — perfect for automation.
⚙️ What We’ll Build
We’ll create a Node 22 scraper that:
- Hits the search/results endpoint with proper query params
- Parses results_html with Cheerio
- Automatically paginates until it hits the total count or an empty page
- Saves everything as JSON (and later CSV, if we want)
4. Building the Node 22 Steam Scraper (with Auto-Stop Pagination)
This version is lean, modern, and production-proof:
- Uses Node 22’s native fetch (no node-fetch)
- Handles pagination via start/count
- Stops automatically when no new results appear
- Works in ESM mode ("type": "module")
🧱 Project Structure Recap
steam-scraper/
├── src/
│ ├── index.js
│ ├── lib/helpers.js
│ └── targets/steam_search.js
├── data/output/
├── .env
└── package.json
⚙️ .env
STEAM_TAG_ID=19 # Action
COUNT_PER_PAGE=50 # items per slice
MAX_PAGES=20 # safety cap
MAX_ITEMS=0 # 0 = unlimited
MAX_EMPTY_PAGES=1 # stop after N empty slices
REQUEST_DELAY=2000 # ms between slices
LOCALE=en
COUNTRY=US
OUT_DIR=./data/output
USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
⚙️ src/lib/helpers.js
export const sleep = (ms) => new Promise((res) => setTimeout(res, ms))
export function ensureDirSync(fs, dir) {
if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true })
}
export function normText(s) {
return (s ?? "").replace(/\s+/g, " ").trim()
}
⚙️ src/targets/steam_search.js
import { load } from "cheerio"
import { sleep, normText } from "../lib/helpers.js"
const TAG_ID = process.env.STEAM_TAG_ID
const PER_PAGE = Number(process.env.COUNT_PER_PAGE || 50)
const MAX_PAGES = Number(process.env.MAX_PAGES || 2)
const MAX_ITEMS = Number(process.env.MAX_ITEMS || 0)
const MAX_EMPTY = Number(process.env.MAX_EMPTY_PAGES || 1)
const LOCALE = process.env.LOCALE || "en"
const CC = process.env.COUNTRY || "US"
const UA = process.env.USER_AGENT ||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
function buildSearchUrl(start, tagId = TAG_ID) {
const u = new URL("https://store.steampowered.com/search/results/")
u.searchParams.set("start", String(start))
u.searchParams.set("count", String(PER_PAGE))
u.searchParams.set("force_infinite", "1")
u.searchParams.set("infinite", "1")
u.searchParams.set("dynamic_data", "")
u.searchParams.set("sort_by", "_ASC")
u.searchParams.set("l", LOCALE)
u.searchParams.set("cc", CC)
if (tagId) u.searchParams.set("tags", String(tagId))
return u.toString()
}
async function fetchSlice(start, tagId) {
const url = buildSearchUrl(start, tagId)
const res = await fetch(url, {
headers: {
"User-Agent": UA,
"Accept": "application/json,text/javascript,*/*;q=0.9",
"Accept-Language": "en-US,en;q=0.9",
"Cache-Control": "no-cache",
"Pragma": "no-cache"
}
})
const contentType = res.headers.get("content-type") || ""
if (!contentType.includes("application/json")) {
const text = await res.text()
throw new Error(`Non-JSON response (${res.status}): ${text.slice(0, 100)}...`)
}
return res.json()
}
function parseResultsHtml(resultsHtml) {
const $ = load(resultsHtml)
const items = []
$(".search_result_row").each((_, el) => {
// IMPORTANT: re-check these selectors against the live response HTML from time to time; Steam's markup changes.
const $el = $(el)
const title = normText($el.find(".title").text())
const link = $el.attr("href") || ""
const discountPct = normText($el.find(".search_discount span").text()) || "0%"
const finalPrice = normText(
$el.find(".discounted, .search_price").text()
) || "N/A"
if (title) items.push({ title, price: finalPrice, discount: discountPct, link })
})
return items
}
// tagId defaults to the env value but can be passed explicitly (used for multi-tag runs later)
export async function scrapeByTag(tagId = TAG_ID) {
if (!tagId) throw new Error("No tag ID passed and STEAM_TAG_ID is not set")
let results = []
let start = 0
let empties = 0
for (let page = 1; page <= MAX_PAGES; page++) {
let payload
try {
payload = await fetchSlice(start, tagId)
} catch (err) {
console.error(`Fetch error at page ${page}: ${err.message}`)
console.log("Retrying after 3s...")
await sleep(3000)
continue // retry the same slice on the next iteration (start is not advanced)
}
const html = payload?.results_html || ""
const total = Number(payload?.total_count || 0)
const batch = parseResultsHtml(html)
console.log(`Page ${page}: start=${start}, got=${batch.length}, total=${total || "?"}`)
if (batch.length === 0) {
empties++
if (empties >= MAX_EMPTY) {
console.log(`Auto-stop: ${empties} empty page(s).`)
break
}
} else {
empties = 0
results = results.concat(batch)
}
if (MAX_ITEMS > 0 && results.length >= MAX_ITEMS) {
results = results.slice(0, MAX_ITEMS)
console.log(`Auto-stop: reached MAX_ITEMS=${MAX_ITEMS}.`)
break
}
start += PER_PAGE
if (total && start >= total) {
console.log(`Auto-stop: reached total_count=${total}.`)
break
}
await sleep(Number(process.env.REQUEST_DELAY || 1000))
}
return results
}
⚙️ src/index.js
import "dotenv/config"
import fs from "fs"
import path from "path"
import { fileURLToPath } from "url"
import { ensureDirSync } from "./lib/helpers.js"
import { scrapeByTag } from "./targets/steam_search.js"
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)
const OUT_DIR = process.env.OUT_DIR || "./data/output"
const OUT_FILE = path.join(__dirname, "..", OUT_DIR, `steam_tag_${process.env.STEAM_TAG_ID}.json`)
async function main() {
ensureDirSync(fs, path.join(__dirname, "..", OUT_DIR))
const items = await scrapeByTag()
fs.writeFileSync(OUT_FILE, JSON.stringify(items, null, 2))
console.log(`✅ Saved ${items.length} records → ${OUT_FILE}`)
}
main().catch((e) => {
console.error("Fatal:", e)
process.exit(1)
})
🧪 Run It
node src/index.js
Example output (with MAX_PAGES=2 for a quick test run):
Page 1: start=0, got=50, total=4872
Page 2: start=50, got=50, total=4872
✅ Saved 100 records → ./data/output/steam_tag_19.json
💾 Example JSON Output
[
{
"title": "Deep Rock Galactic",
"price": "$14.99",
"discount": "-50%",
"link": "https://store.steampowered.com/app/548430/"
},
{
"title": "Cyberpunk 2077",
"price": "$29.99",
"discount": "-50%",
"link": "https://store.steampowered.com/app/1091500/"
}
]
5. Handling Dynamic Pages and JavaScript Rendering
The Steam Store doesn’t use a single template for all its pages. While the search endpoint works great for most categories, certain sections such as New & Trending, Specials, or Coming Soon rely heavily on client-side JavaScript to render data.
If you try to scrape those pages with simple HTTP requests, you’ll often end up with an empty HTML response or placeholders like <script>InitPage()</script> instead of real data.
Understanding When HTML Isn’t Enough
When inspecting a page in the browser’s Developer Tools, check the Network tab:
- If you only see .js and .json requests loading after the initial page, the content is being rendered dynamically.
- If Ctrl+F for a game title returns nothing in the raw HTML, it’s definitely JavaScript-driven.
In such cases, you need a way to execute the page’s JavaScript and capture the resulting HTML — something Cheerio alone can’t do. This is where headless browsers like Playwright or Puppeteer come in.
Using Playwright for Dynamic Scraping
Playwright is ideal for modern web scraping because it supports Chromium, Firefox, and WebKit, with automatic waiting for network idle states and page rendering.
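Playwright isn’t part of the dependency list from Section 2, so install it first (the second command fetches the Chromium binary in case the package install didn’t already):
npm install playwright
npx playwright install chromium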
import { chromium } from "playwright"
async function scrapeDynamic(url) {
const browser = await chromium.launch({ headless: true })
const page = await browser.newPage()
await page.goto(url, { waitUntil: "networkidle" })
const games = await page.$$eval(".tab_item_name", els =>
els.map(e => e.textContent.trim())
)
await browser.close()
return games
}
const data = await scrapeDynamic("https://store.steampowered.com/explore/new/")
console.log(data)
This approach works for pages where Steam renders content after load. It’s slower than the JSON scraping method, but it’s bulletproof for smaller, dynamically loaded sections.
If you’re building at scale, you can also use ScrapingForge’s built-in browser rendering API, which provides the same capability via API calls — no local browsers to manage, no Playwright setup.
6. Cleaning and Structuring the Scraped Data
After scraping multiple pages or categories, your data will likely contain inconsistencies — mixed currencies, varying whitespace, missing prices, or discounts formatted differently. A small normalization layer ensures your output is consistent and easy to use.
Common Data Issues
Typical problems you’ll find in raw Steam data:
- Extra whitespace or newline characters in titles
- Prices like “Was $59.99 Now $29.99”
- Empty strings for games with no discounts
- Currency symbols depending on region ($, €, £)
To make analysis easier, clean and structure everything into predictable fields.
Normalizing Price and Discount Fields
Create a helper to clean each record:
// src/lib/normalize.js
export function normalizeGame(game) {
const priceValue = parseFloat(game.price.replace(/[^0-9.]/g, ""))
const discountValue = parseInt(game.discount.replace(/[^0-9-]/g, "")) || 0
return {
...game,
price_usd: isNaN(priceValue) ? null : priceValue,
discount_percent: discountValue,
}
}
You can then apply this normalization step after scraping:
import fs from "fs"
import { normalizeGame } from "../lib/normalize.js"

// make sure data/cleaned/ exists before writing (e.g., with ensureDirSync)
const normalized = items.map(normalizeGame)
fs.writeFileSync("data/cleaned/steam_games_clean.json", JSON.stringify(normalized, null, 2))
This gives you structured JSON that’s easy to query or convert to other formats.
Enriching Data with Additional Fields
You can also extract and compute:
- Discount range categories (e.g., “Small”, “Medium”, “Big Deal”), as sketched after the example below
- Derived fields like is_free or on_sale
- A timestamp to track when the data was collected
Example:
export function enrichGame(game) {
return {
...game,
on_sale: game.discount_percent < 0,
scraped_at: new Date().toISOString(),
}
}
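For the discount range categories mentioned above, a minimal sketch (the thresholds and labels are arbitrary choices, not Steam’s):
// Bucket a discount percentage (e.g. -50) into a coarse category.
// Thresholds are illustrative; tune them to your analysis.
export function discountBucket(discountPercent) {
  const off = Math.abs(discountPercent)
  if (off === 0) return "None"
  if (off < 25) return "Small"
  if (off < 60) return "Medium"
  return "Big Deal"
}

// Usage: { ...game, deal_size: discountBucket(game.discount_percent) }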
Structured and enriched data isn’t just cleaner — it’s more valuable for downstream systems or dashboards.
7. Exporting Results and Using the Data
JSON is great for developers, but most users prefer working with data in spreadsheets or analytics tools. Exporting your results to CSV or SQLite makes it easier to filter, sort, and visualize game data.
Exporting to CSV
Install a converter like json2csv:
npm install json2csv
Then use it in your script:
// src/lib/export.js
import { Parser } from "json2csv"
import fs from "fs"
export function saveAsCSV(data, path) {
const parser = new Parser()
const csv = parser.parse(data)
fs.writeFileSync(path, csv)
console.log(`✅ Saved CSV: ${path}`)
}
Usage:
import { saveAsCSV } from "../lib/export.js"
saveAsCSV(normalized, "data/output/steam_games.csv")
You can then open the resulting file in Excel, Google Sheets, or import it into tools like Tableau or Metabase.
Integrating with Other Systems
Once exported, your data can easily feed:
- Dashboards for price trends and discounts
- Game recommendation bots
- Marketing automation systems tracking top-sellers
With minimal tweaks, you can even stream the JSON results to an API endpoint or store them in MongoDB or PostgreSQL.
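As a sketch, here’s what pushing the normalized records to an ingest endpoint could look like (the URL and INGEST_TOKEN variable are placeholders, not a real service):
// Push scraped records to a downstream API (hypothetical endpoint and token).
export async function pushToApi(records) {
  const res = await fetch("https://example.com/api/steam-games", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.INGEST_TOKEN || ""}`
    },
    body: JSON.stringify(records)
  })
  if (!res.ok) throw new Error(`Ingest failed with status ${res.status}`)
}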
8. Scraping Multiple Categories Automatically
Once your single-category scraper is stable, expanding to multiple categories is straightforward.
Each Steam category (Action, RPG, Indie, Simulation, etc.) has its own tag ID.
Instead of running the script manually for each, you can automate the process in a loop.
Example: Multi-Tag Scraper
You can define all target tag IDs in your .env file or directly in your script:
STEAM_TAG_IDS=19,122,492,597
Then update your scraper entry point to loop over the tags and pass each one to scrapeByTag (no need to mutate process.env; the function falls back to STEAM_TAG_ID when called without an argument):
import "dotenv/config"
import fs from "fs"
import path from "path"
import { fileURLToPath } from "url"
import { ensureDirSync } from "./lib/helpers.js"
import { scrapeByTag } from "./targets/steam_search.js"
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)
const OUT_DIR = process.env.OUT_DIR || "./data/output"
ensureDirSync(fs, path.join(__dirname, "..", OUT_DIR))
const tags = (process.env.STEAM_TAG_IDS || "").split(",").filter(Boolean)
for (const tag of tags) {
  const tagId = tag.trim()
  console.log(`\n--- Scraping tag: ${tagId} ---`)
  const items = await scrapeByTag(tagId)
  const outPath = path.join(__dirname, "..", OUT_DIR, `steam_tag_${tagId}.json`)
  fs.writeFileSync(outPath, JSON.stringify(items, null, 2))
  console.log(`✅ Saved ${items.length} records for tag ${tagId}`)
}
This setup can handle multiple genres in one run and store each as a separate dataset.
Expanding the Usefulness
With multiple JSON files, you can:
- Compare genres side by side
- Aggregate price ranges or discount averages
- Build datasets for machine learning or recommendation systems
This turns your scraper from a one-off script into a small data pipeline.
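For example, a short script can read the per-tag files produced above and compare average discounts (file names follow the steam_tag_<id>.json pattern from the loop):
// aggregate.js — average discount per tag, using the per-tag JSON output
import fs from "fs"

const tags = ["19", "122", "492", "597"]

for (const tag of tags) {
  const games = JSON.parse(fs.readFileSync(`./data/output/steam_tag_${tag}.json`, "utf8"))
  const onSale = games.filter(g => g.discount && g.discount !== "0%")
  const avg = onSale.reduce((sum, g) => sum + Math.abs(parseInt(g.discount, 10) || 0), 0) /
    (onSale.length || 1)
  console.log(`Tag ${tag}: ${games.length} games, avg discount on sale ≈ ${avg.toFixed(1)}%`)
}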
9. Error Handling and Rate Limiting
Real-world scraping doesn’t always go smoothly.
Pages change, servers rate-limit requests, and network timeouts can occur.
Building resilience into your scraper keeps it reliable and professional.
1. Detecting and Handling Non-JSON Responses
As seen earlier, Steam sometimes responds with HTML instead of JSON (for example, when it temporarily blocks automated requests).
Your scraper should handle that gracefully:
async function safeFetch(url) {
try {
const res = await fetch(url)
const type = res.headers.get("content-type") || ""
if (!type.includes("application/json")) {
const text = await res.text()
console.warn(`Non-JSON response (${res.status}), skipping slice: ${text.slice(0, 80)}...`)
return null
}
return await res.json()
} catch (e) {
console.error("Network error:", e.message)
return null
}
}
This prevents one failed page from crashing the entire job.
2. Rate Limiting and Retry Logic
Always include delays between requests.
A 1–2 second delay is enough to prevent throttling.
Add exponential backoff when consecutive errors occur:
async function fetchWithRetry(url) {
  let delay = 1000
  for (let attempt = 1; attempt <= 5; attempt++) {
    const data = await safeFetch(url)
    if (data) return data
    console.log(`Retrying in ${delay} ms...`)
    await sleep(delay)
    delay *= 2 // exponential backoff: 1s, 2s, 4s, 8s, 16s
  }
  return null // give up after 5 attempts
}
This kind of structured retry system helps maintain stability even when API limits change.
3. Logging and Debugging
For long-running scrapers, keep logs:
import fs from "fs"
function logMessage(message) {
const ts = new Date().toISOString()
fs.appendFileSync("scraper.log", `[${ts}] ${message}\n`)
}
Logs help trace failures, detect pattern changes, and debug silently failed scrapes.
10. Optimizing and Maintaining Your Scraper
A good scraper doesn’t just work once — it stays reliable as websites evolve.
Here’s how to keep your Steam scraper robust and high-performing.
Minimize Redundant Requests
Steam data doesn’t change every minute.
Use caching or If-Modified-Since headers to avoid unnecessary downloads:
const headers = {
"If-Modified-Since": new Date(Date.now() - 86400000).toUTCString(), // 1 day ago
"User-Agent": process.env.USER_AGENT
}
This reduces bandwidth and avoids flagging from frequent polling.
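A minimal sketch of acting on the response, assuming the endpoint honors conditional requests (a 304 means the cached copy is still current; CACHE_FILE is an arbitrary location, not part of the project above):
import fs from "fs"

const CACHE_FILE = "./data/output/last_page.html" // hypothetical cache location

async function fetchIfChanged(url, headers) {
  const res = await fetch(url, { headers })
  if (res.status === 304 && fs.existsSync(CACHE_FILE)) {
    console.log("Not modified, reusing cached copy")
    return fs.readFileSync(CACHE_FILE, "utf8")
  }
  const body = await res.text()
  fs.writeFileSync(CACHE_FILE, body)
  return body
}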
Handle Data Changes Gracefully
When new HTML structures appear, Cheerio selectors might break.
Build a small diagnostic step to detect when expected fields are missing:
if (!title || !price) {
console.warn("Incomplete record detected:", link)
}
This lets you adapt early instead of silently producing bad data.
Schedule Regular Runs
Once the scraper works well, automate it:
- Cron job (Linux/macOS): 0 */6 * * * node src/index.js >> scraper.log 2>&1
- PM2 for continuous jobs (see the ecosystem sketch below)
- A lightweight CI/CD runner for reproducible datasets
Routine scheduling ensures fresh data, useful for trend tracking or price alerts.
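If you go the PM2 route, a minimal ecosystem file sketch (worth verifying against your PM2 version; the .cjs extension keeps it CommonJS since the project uses "type": "module"):
// ecosystem.config.cjs — re-run the scraper every 6 hours under PM2
module.exports = {
  apps: [
    {
      name: "steam-scraper",
      script: "src/index.js",
      autorestart: false,          // the script exits when a run finishes
      cron_restart: "0 */6 * * *"  // PM2 restarts (re-runs) it on this schedule
    }
  ]
}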
Keep It Ethical and Maintainable
Even with a solid scraper, always respect:
- Robots.txt and Terms of Service
- Reasonable request rates
- Avoiding data misuse or personal information
Professional scrapers succeed long-term because they balance technical excellence with responsible use.
Wrapping Up
At this point, you’ve built a robust, production-grade Steam Store web scraper in Node.js that can:
- Handle static and dynamic pages
- Clean and normalize structured data
- Export to CSV or JSON for analytics
- Recover from errors and scale to multiple categories
You’ve also seen how a thoughtful scraping architecture — built on modular helpers, retries, and structured output — saves time and keeps your data pipelines maintainable.
This approach isn’t limited to Steam.
You can apply the same structure to scrape e-commerce sites, marketplaces, or product APIs safely and efficiently.
A well-engineered scraper is not about hacking websites — it’s about building resilient data pipelines that keep up with the web’s evolution.