Bright Data for Rakuten, Amazon.co.jp, and Yahoo! Shopping 2026: A Practical Scraping Guide

How to scrape Japan's top three e-commerce sites (Rakuten, Amazon.co.jp, Yahoo! Shopping) with Bright Data: product selection, marketplace-specific tips, Dify/n8n integration, cost design, and legal notes.


You want to pull product data from Rakuten, Amazon.co.jp, and Yahoo! Shopping every day, but Datacenter IPs get blocked within days. Combining Bright Data Residential, Web Unlocker, and Scraping Browser at the right boundaries keeps monthly costs in the low hundreds of dollars while holding success rates above 95%. This guide walks through marketplace-specific difficulty, product selection, Dify/n8n integration, cost design, and legal notes, all based on our experience running Tra-bell on Bright Data.

Why Scraping Japan's Big-Three E-commerce Is Hard

Rakuten, Amazon.co.jp, and Yahoo! Shopping all run advanced bot detection on both search results and product detail pages. Datacenter IPs get flagged within days and start receiving CAPTCHAs, 403s, or soft blocks (HTTP 200 responses with empty or stripped HTML). The most common bot pattern (same IP, same product URL, same time of day, every day) is exactly what these systems are tuned to catch.
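
One way to make these failure modes visible is to classify every response before parsing it. The sketch below is illustrative only: the CAPTCHA marker strings and the 2,000-byte "suspiciously small page" threshold are assumptions to tune per marketplace, not values published by the marketplaces or Bright Data.

```python
# Minimal sketch: classify a fetched response as ok / blocked / captcha / soft_block / error.
# CAPTCHA_MARKERS and MIN_BODY_BYTES are illustrative assumptions; tune them per marketplace.
CAPTCHA_MARKERS = ("captcha", "robot check", "are you a human")
MIN_BODY_BYTES = 2_000


def classify_response(status: int, html: str) -> str:
    """Return one of 'ok', 'blocked', 'captcha', 'soft_block', 'error'."""
    if status in (403, 429):
        return "blocked"  # explicit refusal or rate limiting
    lowered = html.lower()
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        return "captcha"  # challenge page served instead of content
    if status == 200 and len(html.encode("utf-8")) < MIN_BODY_BYTES:
        return "soft_block"  # 200 OK but (nearly) empty HTML
    if status >= 400:
        return "error"  # other client/server errors
    return "ok"
```

Counting these labels per day is what reveals the day-three-to-five degradation described below, well before anyone inspects the data by hand.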

Marketplace Difficulty

  • Amazon.co.jp: 5 of 5. Strict on both search and detail pages, with Cloudflare-style behaviors mixed in. Web Unlocker or Scraping Browser is effectively required.
  • Rakuten: 3 of 5. Single product detail pages often work on Residential alone, but search results, rankings, and review listings are more stable with Web Unlocker.
  • Yahoo! Shopping: 3 of 5. Similar to Rakuten. IP rotation and request spacing matter most.

In experiments adjacent to Tra-bell, we ran 1,000 SKUs for a week on Datacenter IPs only and saw a 30 to 50% data loss rate. Switching to Residential + Web Unlocker brought that down to 1 to 3%. The first 24 hours on Datacenter typically look acceptable, which is why teams underestimate the problem. The failure curve steepens around days three to five, by which time the pipeline is already in production and ops has scheduled meetings about it.

The other dimension is consistency. Even when a Datacenter request returns HTML, the response often differs from what a residential user sees: stripped-down product descriptions, missing reviews, different prices keyed off region detection. That makes downstream analytics quietly unreliable in ways that are hard to debug.

How This Relates to Official APIs

Rakuten Web Service and Amazon's PA-API should be the first, legally safest option for product data. But rate limits, missing fields, and update delays mean most real-world projects end up with an API-plus-scraping hybrid. The methods in this article are designed to fill the gaps that APIs leave open, or to power internal benchmarking samples. Before commercial use, read each marketplace's terms and robots.txt, and respect Crawl-Delay and rate limits.[1]

A few common gaps push teams toward scraping: Amazon's PA-API throttles aggressively, ties its request quota to referred sales, and can revoke access for accounts without qualifying sales; Rakuten Web Service exposes only a subset of attributes for items not enrolled in the affiliate program; and Yahoo! Shopping's APIs cover store search but not the full per-product detail pages many merchandising workflows need.
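
In a hybrid setup, a simple rule keeps the API as the source of truth: take every field the API returns, and use scraped values only where the API is silent. The sketch below assumes records are plain dicts with hypothetical field names, and it also returns per-field provenance so downstream users can see which values came from scraping.

```python
# Minimal sketch of an API-first merge: official API values win, scraped values only fill gaps.
# Field names such as "price" or "review_count" are hypothetical.
from typing import Any


def merge_api_first(
    api: dict[str, Any], scraped: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, str]]:
    merged: dict[str, Any] = {}
    sources: dict[str, str] = {}
    for key in api.keys() | scraped.keys():
        if api.get(key) is not None:
            merged[key], sources[key] = api[key], "api"  # API is authoritative
        elif scraped.get(key) is not None:
            merged[key], sources[key] = scraped[key], "scrape"  # fill an API gap
    return merged, sources
```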

If your use case is narrower (price monitoring only), see our deeper write-up at Bright Data Residential Proxy for Price Monitoring.

Bright Data Products That Fit Japan EC Scraping

Bright Data has eight main products, but for Japan EC scraping you mostly use three: Residential Proxy, Web Unlocker, and Scraping Browser.

The Three Products You Actually Use

| Product | Primary use | Fit for Japan EC | Unit price |
| --- | --- | --- | --- |
| Residential Proxy | General HTML fetch, high detection resistance | Rakuten and Yahoo! product detail | from $8.4/GB |
| Web Unlocker | Auto-bypass CAPTCHA and bot defenses via API | Amazon.co.jp, Rakuten search results | from $3/1k requests |
| Scraping Browser | Puppeteer/Playwright-compatible headless browser | JavaScript-heavy pages | from $9/GB |
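
To turn those list prices into a monthly estimate, multiply each product's usage by its unit price. This is back-of-envelope arithmetic using the "from" prices in the table; the traffic volumes in the example are hypothetical, and actual invoices depend on plan tier and commitments.

```python
# Back-of-envelope monthly cost from the table's "from" list prices.
# Traffic volumes are hypothetical inputs; real bills depend on plan tier and commitments.
RESIDENTIAL_USD_PER_GB = 8.4
UNLOCKER_USD_PER_1K_REQ = 3.0
BROWSER_USD_PER_GB = 9.0


def monthly_cost_usd(residential_gb: float, unlocker_requests: int, browser_gb: float) -> float:
    return (
        residential_gb * RESIDENTIAL_USD_PER_GB
        + unlocker_requests / 1000 * UNLOCKER_USD_PER_1K_REQ
        + browser_gb * BROWSER_USD_PER_GB
    )


# Example: 10 GB Residential, 30,000 Web Unlocker requests, 2 GB Scraping Browser
print(round(monthly_cost_usd(10, 30_000, 2), 2))  # 84 + 90 + 18 -> 192.0
```

Note how the per-request Web Unlocker line dominates in this example, which is why scoping the Unlocker URL set matters more than shaving Residential bandwidth.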

How to Choose

  1. Static HTML, single product detail: Residential alone is enough. Cheapest path.
  2. Search results, rankings, review listings: Move to Web Unlocker. Bills by request, so scope the URL set.
  3. Pages with heavy JavaScript rendering or cart flows: Scraping Browser. Existing Puppeteer or Playwright code runs with minimal changes.[2]
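
The three rules above fit in a small routing table. This is a sketch: the page-type labels are hypothetical names for whatever your own URL classifier emits, and the Amazon.co.jp override reflects the difficulty ratings earlier in this guide rather than any Bright Data setting.

```python
# Minimal sketch of the three-way product choice as a routing table.
# Page-type labels and the "amazon_jp" marketplace key are hypothetical names.
from enum import Enum


class Product(Enum):
    RESIDENTIAL = "residential"
    WEB_UNLOCKER = "web_unlocker"
    SCRAPING_BROWSER = "scraping_browser"


ROUTES = {
    "product_detail": Product.RESIDENTIAL,   # static HTML, cheapest path
    "search_results": Product.WEB_UNLOCKER,  # billed per request: keep the URL set scoped
    "ranking": Product.WEB_UNLOCKER,
    "review_listing": Product.WEB_UNLOCKER,
    "js_heavy": Product.SCRAPING_BROWSER,    # heavy client-side rendering, cart flows
}


def choose_product(page_type: str, marketplace: str) -> Product:
    product = ROUTES.get(page_type, Product.RESIDENTIAL)
    # Amazon.co.jp is strict on detail pages too, so never start it on plain Residential.
    if marketplace == "amazon_jp" and product is Product.RESIDENTIAL:
        return Product.WEB_UNLOCKER
    return product
```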

Bright Data operates over 150 million residential IPs across 195 countries, with city and ISP-level targeting. For Japan EC, set country=jp on the Zone to pin traffic to Japanese IPs. The supply side runs KYC on IP providers, which simplifies the conversation with your compliance team about where the traffic actually comes from.
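
Besides the Zone setting, Bright Data also supports targeting through parameters appended to the proxy username. The sketch below assumes the `brd-customer-<id>-zone-<zone>-country-jp` convention and an optional `-session-<id>` suffix for a sticky IP; check the exact format in your zone's access details before relying on it. The customer ID, zone name, and password in any call are placeholders.

```python
# Sketch: build a proxy URL that pins traffic to Japanese IPs via username targeting.
# Assumes Bright Data's "brd-customer-<id>-zone-<zone>-country-<cc>" username convention
# and an optional "-session-<id>" suffix; verify both against your zone's access details.
from typing import Optional
from urllib.parse import quote


def proxy_url(customer_id: str, zone: str, password: str,
              country: str = "jp", session: Optional[str] = None) -> str:
    user = f"brd-customer-{customer_id}-zone-{zone}-country-{country}"
    if session:
        user += f"-session-{session}"  # reuse the same session id to keep the same exit IP
    # Percent-encode the password so characters like "@" or ":" cannot break the URL.
    return f"http://{user}:{quote(password, safe='')}@brd.superproxy.io:33335"
```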

Two other products, ISP Proxy and Mobile Proxy, have narrow use cases for Japan EC. ISP Proxy is interesting when you want a long-lived static residential IP for session continuity (logged-in browsing, shopping carts), and Mobile Proxy is mostly relevant when targeting mobile-only experiences such as Rakuten's app-only campaigns. Both are markedly more expensive than Residential, so default to Residential first and escalate only when a measured need emerges.

Figure: Decision matrix (Residential vs Web Unlocker vs Scraping Browser for Japan EC): when to use which Bright Data product across Japan's three major marketplaces.

If you want a closer look at Residential versus ISP trade-offs specifically, see Bright Data Residential vs ISP Proxy 2026.

Per-Marketplace Implementation Notes

A shared Python skeleton (Web Unlocker) is the starting point, then we vary headers, pacing, and product choice per marketplace.

Shared Python Skeleton (Web Unlocker)

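import os
import ssl

import httpx

# Bright Data super proxy. BD_USER is the full zone username
# (brd-customer-<id>-zone-<zone>), BD_PASS is the zone password.
PROXY = f"http://{os.environ['BD_USER']}:{os.environ['BD_PASS']}@brd.superproxy.io:33335"

# Web Unlocker re-signs HTTPS traffic, so trust Bright Data's CA certificate.
# BD_CA_CERT (path to the downloaded certificate) is an assumed variable name.
SSL_CTX = ssl.create_default_context(cafile=os.environ.get("BD_CA_CERT"))

def fetch(url: str) -> str:
    """Fetch one URL through the proxy and return the response body."""
    with httpx.Client(
        proxy=PROXY,  # httpx >= 0.26; the old proxies={...} argument was removed in 0.28
        verify=SSL_CTX,
        timeout=30.0,
        headers={"User-Agent": "Mozilla/5.0"},
    ) as client:
        r = client.get(url)
        r.raise_for_status()  # surface 403/429/5xx instead of parsing an error page
        return r.text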

Use this as the base and swap User-Agent strings, wait times, and Web Unlocker enablement per marketplace.
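
One way to keep those per-marketplace differences out of the fetch code is a small profile table. Everything here is a hypothetical starting point: the delay ranges, the User-Agent string, and the `use_unlocker` flag (meaning "send this marketplace through your Web Unlocker zone credentials") are assumptions to adjust against your own success-rate data.

```python
# Hypothetical per-marketplace settings layered on the fetch() skeleton.
# Delay ranges, the UA string, and use_unlocker flags are illustrative assumptions.
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketplaceProfile:
    user_agent: str
    min_delay_s: float
    max_delay_s: float
    use_unlocker: bool  # route through Web Unlocker zone credentials instead of Residential


DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)

PROFILES = {
    "amazon_jp": MarketplaceProfile(DESKTOP_UA, 3.0, 8.0, use_unlocker=True),
    "rakuten": MarketplaceProfile(DESKTOP_UA, 2.0, 5.0, use_unlocker=False),
    "yahoo": MarketplaceProfile(DESKTOP_UA, 2.0, 5.0, use_unlocker=False),
}


def polite_wait(profile: MarketplaceProfile) -> float:
    """Sleep a randomized interval so requests do not land on a fixed cadence."""
    delay = random.uniform(profile.min_delay_s, profile.max_delay_s)
    time.sleep(delay)
    return delay
```

Randomizing the delay, rather than sleeping a fixed interval, directly targets the "same time, same cadence" pattern that bot detection is tuned to catch.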

Marketplace Tips

Rakuten

  • Product detail pages (item.rakuten.co.jp/<shop>/<itemId>/) reach a 90%+ success rate on Residential alone
  • Add Web Unlocker for search results and RMS-linked sections, and watch the keyword, tag, and s (sort) query parameters.
  • Prefer Rakuten Web Service for fields the API exposes
  • Rakuten's per-shop HTML layouts vary wildly: tens of thousands of independent merchants run their own templates within Rakuten's frame. Build extractors per category rather than per page, and accept that 5 to 10% of detail pages will need custom handling.
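
"Extractors per category, plus a custom-handling queue" can be expressed as a registry. In this sketch the category key `food` and the `1,280円`-style price regex are hypothetical placeholders; an extractor returns `None` when it does not recognize the shop's layout, and those pages are queued for manual work instead of silently producing bad rows.

```python
# Sketch of per-category extractors with an explicit queue for unrecognized layouts.
# Category keys and the price regex are hypothetical placeholders.
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[dict]]
EXTRACTORS: dict[str, Extractor] = {}
custom_queue: list[tuple[str, str]] = []  # (category, url) pairs needing custom handling

PRICE_RE = re.compile(r"([0-9,]+)\s*円")


def extractor(category: str) -> Callable[[Extractor], Extractor]:
    def register(fn: Extractor) -> Extractor:
        EXTRACTORS[category] = fn
        return fn
    return register


@extractor("food")
def extract_food(html: str) -> Optional[dict]:
    m = PRICE_RE.search(html)
    # None signals "layout not recognized" rather than a guessed value.
    return {"price_jpy": int(m.group(1).replace(",", ""))} if m else None


def extract(category: str, url: str, html: str) -> Optional[dict]:
    fn = EXTRACTORS.get(category)
    result = fn(html) if fn else None
    if result is None:
        custom_queue.append((category, url))  # the expected 5 to 10% tail lands here
    return result
```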

Amazon.co.jp

  • Use Web Unlocker or Scraping Browser for both search and product detail
  • Extract ASIN from the product URL pattern /dp/<ASIN>/. Stock and seller info HTML changes often, so CSS selectors need periodic maintenance.
  • Sponsored and organic listings are interleaved. Identify them with data-component-type.
  • Pricing on Amazon.co.jp is dynamic per session in ways the other marketplaces are not (Subscribe & Save price, Prime price, shipping options). If you only capture the headline price, you will miss a meaningful fraction of merchandising signal. Plan to fetch and normalize at least three price fields.
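
Two of these tips translate directly into code: pulling the ASIN out of `/dp/<ASIN>/` URLs, and normalizing several yen price strings into integers. The `AmazonPrices` field names are hypothetical; map them to whatever selectors you maintain. `parse_yen` assumes it receives a single price string, not a block of text containing several numbers.

```python
# Sketch: extract the ASIN from a product URL and normalize yen price strings.
# AmazonPrices field names are hypothetical; parse_yen expects one price per string.
import re
from dataclasses import dataclass
from typing import Optional

ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url: str) -> Optional[str]:
    m = ASIN_RE.search(url)
    return m.group(1) if m else None


def parse_yen(text: Optional[str]) -> Optional[int]:
    """Turn '￥1,980', '¥1,980', or '1,980円' into 1980; None if there are no digits."""
    if not text:
        return None
    digits = re.sub(r"[^0-9]", "", text)
    return int(digits) if digits else None


@dataclass
class AmazonPrices:
    asin: str
    headline: Optional[int]
    subscribe_and_save: Optional[int]
    prime: Optional[int]
```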

Yahoo! Shopping

  • Product detail pages are relatively stable. Residential alone often works.
  • Search result sort order (recommended, price, newest) is controlled via the s query parameter. Pin it explicitly for reproducibility.
  • LOHACO and PayPay Mall items may appear in the same listing. Filter them upstream if needed.
  • Yahoo! aggressively promotes PayPay points campaigns inside product detail pages. If your downstream consumer cares about effective price after points, capture both the listed price and the points multiplier.
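
The sort-pinning and points tips can be sketched as two small helpers. Treat every specific here as an assumption to confirm against live pages: the search base URL, the `p` keyword parameter, the sort values, a base points rate of 1% per multiplier step, 1 point = 1 yen, and points rounded down.

```python
# Sketch for Yahoo! Shopping: a reproducible search URL and an effective price after points.
# Base URL, the "p" keyword parameter, sort values, the 1% base rate, and rounding down
# are assumptions, not documented Yahoo! parameters; confirm against live URLs.
from urllib.parse import urlencode

SEARCH_BASE = "https://shopping.yahoo.co.jp/search"


def pinned_search_url(keyword: str, sort: str) -> str:
    # Always emit the sort parameter so two runs never silently use different defaults.
    return f"{SEARCH_BASE}?{urlencode({'p': keyword, 's': sort})}"


def effective_price(listed_jpy: int, points_multiplier: float, base_rate: float = 0.01) -> int:
    """Listed price minus the yen value of points earned (points rounded down)."""
    points = int(listed_jpy * points_multiplier * base_rate)
    return listed_jpy - points
```

Storing both `listed_jpy` and the multiplier, and computing the effective price downstream, keeps the raw signal intact if the points assumptions later turn out to be wrong.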

Orchestrating With Dify and n8n

Bright Data's Web Scraper API pairs well with LLM orchestration and workflow tools like Dify and n8n. Several implementation examples have been shared on X.

"Combining Dify (an LLM app platform) with Bright Data Web Scraper for stable scraping — example shows Amazon product fetch end to end." (Translated from the Japanese post)

DifyJapan's example pipes Bright Data Web Scraper output into a Dify knowledge base, then has an LLM summarize Amazon product information. The entire scrape-to-summary-to-notify chain lives inside one Dify app, which lets non-engineers tweak workflows from the dashboard.


The n8n example uses a cron trigger to call Bright Data Web Scraper daily and drops results into Google Sheets or Notion. Low maintenance overhead is a recurring theme in both examples.

Even without a dedicated Python or infra engineer, a Dify or n8n hybrid setup can have a PoC running within a week.

Cost Optimization and Operating Rules

An "every marketplace, every SKU, every hour" rollout makes the bill explode. Because Bright Data pricing is mostly consumption-based, scoping targets is the single biggest lever.

Five Practices That Cut Costs 30 to 50%

  • Tiered fallback: Start on Residential alone. Only escalate to Web Unlocker or Scraping Browser when you see 403s or CAPTCHAs.
  • Switch to delta crawling: Re-fetch only SKUs whose price, stock, or ranking changed the previous day.
  • Block static assets: Skip images, CSS, and analytics tags. Bandwidth drops 50 to 70%.
  • Long sessions: Reuse the same IP for 30 minutes to a few hours to amortize handshake bandwidth.
  • Spread time-of-day: Run during off-peak windows. Marketplace load is lower and success rates rise.
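
Delta crawling reduces to a set comparison between the last two daily snapshots. The snapshot shape `{sku: {"price": ..., "stock": ..., "rank": ...}}` is a hypothetical storage format; any SKU that is new or whose tracked fields moved gets re-fetched.

```python
# Minimal sketch of delta crawling: re-fetch only SKUs whose tracked fields changed.
# The snapshot shape {sku: {"price": ..., "stock": ..., "rank": ...}} is hypothetical.
TRACKED = ("price", "stock", "rank")


def skus_to_refetch(yesterday: dict[str, dict], day_before: dict[str, dict]) -> set[str]:
    changed: set[str] = set()
    for sku, snap in yesterday.items():
        prev = day_before.get(sku)
        if prev is None or any(snap.get(f) != prev.get(f) for f in TRACKED):
            changed.add(sku)  # new SKU, or price/stock/rank moved
    return changed
```

In this sketch an SKU that never changes is never revisited, so pair it with a periodic full pass if missed changes would matter.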

A practical instrumentation trick worth highlighting: have every worker emit per-request labels for proxy_type, marketplace, and result_status, then dashboard these as a daily heatmap. Cost overruns almost always trace back to one or two outlier categories (for instance, a specific Amazon search query escalating to Scraping Browser more often than expected). Without per-call labeling, that signal is invisible until the invoice arrives.
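
In its simplest form, per-request labeling is a counter keyed by the three labels, plus a derived metric such as "share of requests that escalated to Scraping Browser". This in-process `Counter` is a stand-in; in production you would emit the same labels to your metrics backend. The label values are hypothetical.

```python
# Sketch: count requests per (marketplace, proxy_type, result_status) so outliers stand out.
# Label values are hypothetical; production code would emit them to a metrics backend.
from collections import Counter

counts: Counter = Counter()


def record(marketplace: str, proxy_type: str, result_status: str) -> None:
    counts[(marketplace, proxy_type, result_status)] += 1


def escalation_share(marketplace: str) -> float:
    """Fraction of a marketplace's requests that ended up on Scraping Browser."""
    total = sum(n for (m, _, _), n in counts.items() if m == marketplace)
    browser = sum(
        n for (m, p, _), n in counts.items() if m == marketplace and p == "scraping_browser"
    )
    return browser / total if total else 0.0
```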

For a full cost picture, see Bright Data Pricing Cheat Sheet 2026.

In Tra-bell experiments, a three-tier fallback design (Residential, then Web Unlocker, then Scraping Browser) delivered the best balance of cost and success rate. Using Web Unlocker on every request inflated the bill 2 to 3x compared to a tiered approach.
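
The three-tier fallback can be sketched as a loop over fetchers ordered from cheapest to most expensive. Here each fetcher is an injected callable returning `(status, html)`; wiring those callables to real Residential, Web Unlocker, and Scraping Browser zones is left out, and the "blocked" test (4xx/5xx, a CAPTCHA string, or an empty body) is a simplifying assumption.

```python
# Minimal sketch of a three-tier fallback: cheapest product first, escalate only on failure.
# Fetchers are injected callables returning (status, html); the block test is simplified.
from typing import Callable, Sequence

Fetcher = Callable[[str], tuple[int, str]]


def fetch_with_fallback(url: str, tiers: Sequence[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Return (tier_name, html) from the first tier that is not blocked."""
    last_error = "no tiers configured"
    for name, fetch in tiers:
        status, html = fetch(url)
        blocked = status >= 400 or "captcha" in html.lower() or not html.strip()
        if not blocked:
            return name, html
        last_error = f"{name} failed with HTTP {status}"
    raise RuntimeError(f"all tiers failed for {url}: {last_error}")
```

Returning the tier name alongside the HTML makes it easy to feed the proxy_type label for per-request dashboards.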

Legal Notes and Where Smile Comfort Helps

Scraping Japanese EC sites is technically doable, but the legal and ethical setup deserves the same engineering rigor as the pipeline itself.

Minimum Checklist

  1. Terms of service: Read and document each marketplace's terms and any scraping-related clauses
  2. robots.txt: Follow each marketplace's robots.txt and Crawl-Delay
  3. Exclude PII: Skip fields like review bodies and usernames that may include personal data
  4. Document purpose: Codify internally that the use is limited to "business use of publicly available product information"
  5. Access frequency: Cap daily fetches at 1 to 2 per SKU and avoid peak hours
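
Items 2 and 3 of the checklist can be enforced in code with the standard library's `urllib.robotparser` and a field filter. This sketch parses robots.txt text you have already downloaded (in production, `RobotFileParser.set_url()` plus `read()` fetches it for you). The PII field names are hypothetical; list whatever your extractors actually produce.

```python
# Sketch: honor robots.txt (including Crawl-Delay) and drop PII-bearing fields before storage.
# PII field names are hypothetical; list whatever your extractors actually produce.
from urllib.robotparser import RobotFileParser

PII_FIELDS = {"review_body", "reviewer_name", "username"}


def load_robots(robots_txt: str) -> RobotFileParser:
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def allowed(rp: RobotFileParser, user_agent: str, url: str) -> tuple[bool, float]:
    """Return (may_fetch, seconds_to_wait_between_requests) for this agent and URL."""
    delay = rp.crawl_delay(user_agent)  # None when robots.txt sets no Crawl-delay
    return rp.can_fetch(user_agent, url), float(delay or 0)


def strip_pii(record: dict) -> dict:
    return {k: v for k, v in record.items() if k not in PII_FIELDS}
```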

The checklist is not optional polish: legal counsel will ask about it, and so will Bright Data's compliance team during onboarding. KYC on the residential supply side is one half of the equation; documented purpose and request hygiene on the demand side is the other. Teams that treat compliance as a Day-1 deliverable rather than a Day-180 fire drill tend to scale through review cycles without having to redesign the pipeline.

For a more detailed take on GDPR and Japan's APPI, see Bright Data and GDPR / APPI Compliance 2026.

Smile Comfort operates Tra-bell, a hotel price tracking service, on Bright Data Residential and Web Unlocker. That production experience translates directly into requirements design, PoC, and production rollout for Japan EC scraping.

Wrap-Up

With Bright Data, you can run cross-marketplace scraping across Rakuten, Amazon.co.jp, and Yahoo! Shopping at a few tens of thousands of yen per month. Plan on Web Unlocker or Scraping Browser for Amazon, Residential plus optional Web Unlocker for Rakuten and Yahoo!, and orchestrate the whole thing with Dify or n8n to get a PoC live in a week. Cost optimization comes from tiered fallbacks and delta crawling. The legal foundation is API-first plus disciplined respect for each marketplace's terms.


Information current as of 2026-05-21. Please check the official sites for the latest updates.

This article contains affiliate links.

Footnotes

  1. METI "Guidelines on Electronic Commerce and Information Property Transactions" https://www.meti.go.jp/policy/it_policy/ec/index.html

  2. Bright Data Proxy Networks / Scraping Browser https://brightdata.com/proxy-types

  3. Bright Data Pricing https://brightdata.com/pricing

Frequently asked questions

Amazon.co.jp is clearly harder. Both its search results and product detail pages run heavy bot detection, so you should plan for Web Unlocker or Scraping Browser from day one. Rakuten product detail pages often run on Residential Proxy alone, but search results and RMS-linked sections are more stable with Web Unlocker. Yahoo! Shopping sits at a similar difficulty to Rakuten.
