Scraping 20,000 Poshmark Listings (For Real This Time)

Robert James Gabriel

Last week, I wrote a blog post titled How I Scraped 20,000 Poshmark Listings Using Node.js and Puppeteer. The idea seemed solid: scroll through a Poshmark closet, grab every listing URL, and scrape each listing one by one.

But in reality? The script crashed constantly.

It would hang, freeze, or get stuck on a single listing. Chrome would balloon in memory, and Puppeteer would throw vague timeout errors like:


Runtime.callFunctionOn timed out. Increase the 'protocolTimeout' setting in launch/connect calls...

After way too many failed runs, I finally figured out why — and how to actually fix it. This post is the follow-up to that original blog, walking through what went wrong and how I made it right.


🧨 What Went Wrong

Here’s how my original scraper worked:

const puppeteer = require('puppeteer');

// One browser, one page, one giant loop over every listing
const browser = await puppeteer.launch();
const page = await browser.newPage();

for (const url of allListingUrls) {
  await page.goto(url);                  // no timeout guard
  const images = await page.$$eval(...); // pull the image URLs off the page
  await saveImages(images);
}

It looked simple, but under the hood, there were big issues:

  • One long browser session for 20,000 listings = eventual crash

  • No timeout protection meant a single slow page could freeze everything

  • No resume support, so if it crashed at item #1375, I’d have to start all over

  • No skip logic, so re-running would re-scrape everything

✅ The New Approach: Batch It, Restart It, Skip It

I rewrote the entire scraper with resilience in mind. The result is a scraper that:

  • Splits all listings into batches of 10
  • Starts a fresh Puppeteer browser for each batch
  • Skips already-scraped listings
  • Handles timeouts gracefully
  • Saves progress with a simple text file so I can resume anytime

🔁 Batching the Work

const BATCH_SIZE = 10;
const totalBatches = Math.ceil(allListingUrls.length / BATCH_SIZE);

for (let b = 0; b < totalBatches; b++) {
  const batch = allListingUrls.slice(b * BATCH_SIZE, (b + 1) * BATCH_SIZE);
  // process this batch
}

This keeps memory usage low and makes failures isolated and easy to recover from.


🧼 Fresh Browser Per Batch


const { browser, page } = await createBrowser();
// ... scrape listings ...
await browser.close();

No more memory leaks, zombie tabs, or weird state.
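createBrowser() is just a small helper in my script. Here's a minimal sketch of what it could look like (the launch flags and the 30-second navigation cap are my assumptions, not anything Poshmark requires):

const puppeteer = require('puppeteer');

// Sketch of a createBrowser() helper (assumed implementation)
async function createBrowser() {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-dev-shm-usage'], // keeps Chrome lean, especially in containers
  });
  const page = await browser.newPage();
  page.setDefaultNavigationTimeout(30000); // backstop so no navigation hangs forever
  return { browser, page };
}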


⏱️ Timeout Safety with Promise.race


await Promise.race([
  page.goto(url, { timeout: 3000 }),
  sleep(4000).then(() => { throw new Error("Timeout!") })
]);

If a page doesn't load quickly, we kill it and move on.
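sleep() is just a promise-based delay, and the whole race lives inside the per-listing loop wrapped in a try/catch. A sketch of how those pieces fit together:

// Tiny promise-based delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Inside the per-listing loop:
try {
  await Promise.race([
    page.goto(url, { timeout: 3000 }),
    sleep(4000).then(() => { throw new Error('Timeout!'); }),
  ]);
} catch (err) {
  console.log(`Skipping ${url}: ${err.message}`);
  continue; // give up on this listing and move on to the next one
}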


🧠 Skip Already-Scraped Listings


if (fs.existsSync(folderPath)) {
  console.log("Already scraped, skipping.");
  continue;
}

This lets me run the scraper over and over without repeating work.
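folderPath is just the output folder for that listing. A sketch of how it could be derived, assuming the listing ID is the last hyphen-separated token of the URL and each listing's images go in their own folder:

const path = require('path');

// Assumed folder layout: output/<listing-id>/
const listingId = url.split('-').pop();
const folderPath = path.join('output', listingId);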

💾 Save and Resume with last_batch.txt

writeLastBatchIndex(currentBatchIndex);

On each batch, the script saves its progress. So even if I kill it, crash it, or lose power — it’ll pick right back up where it left off.
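writeLastBatchIndex() and its read counterpart are tiny wrappers around fs. A sketch of what they could look like (the file name comes from the post; the bodies are my assumption):

const fs = require('fs');

function writeLastBatchIndex(index) {
  fs.writeFileSync('last_batch.txt', String(index), 'utf8');
}

function readLastBatchIndex() {
  if (!fs.existsSync('last_batch.txt')) return 0;
  return parseInt(fs.readFileSync('last_batch.txt', 'utf8'), 10) || 0;
}

// On startup, skip straight to the saved batch:
for (let b = readLastBatchIndex(); b < totalBatches; b++) {
  // ... scrape batch b ...
  writeLastBatchIndex(b + 1);
}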


🤯 The Difference

| Feature | Old Script ❌ | New Script ✅ |
| --- | --- | --- |
| One browser for all listings | Yes | No, fresh browser per batch |
| Timeout handling | No | Yes, fast fail and move on |
| Resume support | No | Yes, via last_batch.txt |
| Re-scrape protection | No | Yes, skips done folders |
| Batching | No (1 big loop) | Yes, smaller and stable |
| Reliable over 1,000+ listings | Unstable | Solid and scalable 🚀 |

🚀 Final Thoughts

The first version was a cool proof-of-concept, but the new version is a production-grade scraper I can actually trust. It’s crash-safe, restartable, and skips already-processed listings.

If you're trying to scrape large datasets with Puppeteer and running into crashes, batching and timeout management will save your sanity.

Want the final working code? DM me on Twitter or email me; I'm happy to share!