How to Scrape Product Data from eBay
eBay is an American e-commerce company best known for its online auctions. It is one of the world's
largest online marketplaces, with millions of products from sellers located around the world. In this guide, I
will show you how to get data on the graphics cards listed for sale. We will fetch the data from eBay.com
without Selenium, which keeps retrieval fast because no browser has to be launched. We will collect the following fields:
- NAME - product name
- BRAND - product brand
- COND - condition
- PRICE - price
- LOC - item location
- SOLD - number of units sold
- AVAL - number of units available
- IMG - image URL
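Fetching a listing page without Selenium can be sketched as a plain HTTP request. The snippet below is a minimal illustration using only the Python standard library; the function names and the User-Agent string are hypothetical, and it assumes the page is served as UTF-8 and that the site accepts such a request.

```python
import urllib.request

# Hypothetical User-Agent string (an assumption for this sketch);
# the site may serve different markup or reject some clients.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper/0.1)"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as text, replacing undecodable bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_html with the URL of a listing returns the raw HTML as a string, which is what the regular expressions in the rest of this guide operate on.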
Let's create a regular expression for the product name. This is fairly easy to do since the title is enclosed
between the <h1> and </h1> tags. Let's use the "Regular expressions designer" and fill in the appropriate fields.
NAME
Step 1:
(?<=<h1).*?(?=</h1>)
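Outside the designer, the same expression can be tried with Python's re module. This is only a sketch: extract_name and the sample markup are made up for illustration, and real eBay pages will differ. Because the lookbehind stops right after "<h1", the match still contains the rest of the opening tag, so the sketch drops it and strips any nested tags.

```python
import html
import re
from typing import Optional

# The NAME expression from this guide; DOTALL lets the title span several lines.
NAME_PATTERN = re.compile(r"(?<=<h1).*?(?=</h1>)", re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_name(page: str) -> Optional[str]:
    """Return the text of the first <h1> element, or None if there is none."""
    match = NAME_PATTERN.search(page)
    if match is None:
        return None
    # The match begins right after "<h1", so cut off the tail of the opening tag.
    raw = match.group(0).split(">", 1)[-1]
    # Replace nested tags with spaces, decode entities, and collapse whitespace.
    text = TAG_PATTERN.sub(" ", raw)
    return " ".join(html.unescape(text).split())


# Made-up sample markup, not copied from a real listing.
sample = '<h1 class="title"><span>NVIDIA GeForce GTX 1080</span> 8GB</h1>'
name = extract_name(sample)
```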
Next, we get the BRAND value. Typically, the brand name is listed under "Item specifics".
Select "Brand: NVIDIA" on the page and look at the HTML code of the selection. As you can see, eBay uses itemprop
microdata. This is very convenient, as we can use it to locate the data we need.
BRAND
Step 1:
(?<=<span\ itemprop="brand").*?(?=<\/td>)
What is itemprop?
Microdata is markup embedded in a page's HTML that helps search engine crawlers recognize the content of the page. The syntax is as follows: itemprop="<property>"
This attribute is placed on the tag that contains the corresponding information. The property names are defined by the
Schema.org vocabulary, which was founded by Google, Microsoft, Yahoo, and Yandex.
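To see which itemprop values a page exposes before writing an expression for each one, a small parser can list them. The sketch below uses html.parser from the Python standard library instead of regular expressions; the class name, the helper, and the sample markup are hypothetical. As a simplification, it takes the value from a content, src, or href attribute when one is present and from the element's text otherwise.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect the values of every element that carries an itemprop attribute."""

    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.values = {}   # property name -> list of values, one per element
        self._open = []    # stack of (tag, property or None) for open elements

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        # Simplification: prefer an attribute value over the element's text.
        value = next((attributes[key] for key in ("content", "src", "href")
                      if attributes.get(key)), None)
        if prop and value is not None:
            self.values.setdefault(prop, []).append(value)
            prop = None
        elif prop and tag not in self.VOID_TAGS:
            self.values.setdefault(prop, []).append("")
        if tag not in self.VOID_TAGS:
            self._open.append((tag, prop))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for index in range(len(self._open) - 1, -1, -1):
            if self._open[index][0] == tag:
                del self._open[index:]
                break

    def handle_data(self, data):
        # Text belongs to the innermost open element that has a property.
        for _, prop in reversed(self._open):
            if prop:
                self.values[prop][-1] += data
                break


def collect_itemprops(page):
    """Return {property: [values]} with whitespace collapsed."""
    parser = ItempropCollector()
    parser.feed(page)
    parser.close()
    return {prop: [" ".join(value.split()) for value in values]
            for prop, values in parser.values.items()}


# Made-up sample markup, not copied from a real listing.
sample = (
    '<div itemprop="itemCondition">Used</div>\n'
    '<span itemprop="price" content="249.99">US $249.99</span>\n'
    '<img itemprop="image" src="https://example.com/card.jpg">'
)
props = collect_itemprops(sample)
```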
Next, we will do the same for the rest of the fields. You can also read a guide on
how to get product data from Amazon. The result of our actions will be regular expressions for scraping:
COND
Step 1:
(?<=itemprop="itemCondition").*?(?=</div>)
PRICE
Step 1:
(?<=itemprop="price").*?(?=</)
LOC
Step 1:
(?<=itemprop=['"]availableAtOrFrom['"]>).*?(?=</)
SOLD
Step 1:
(?<=id="why2buy")[\w\W]*?(?=Sold)
AVAL
Step 1:
(?<=<span\ id="qtySubTxt">)[\w\W]*?(?=</span)
IMG
Step 1:
(?<=itemprop="image").*?(?=style\=)
Step 2:
(?<=src=").*?(?=")
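The multi-step idea, where each step runs on the result of the previous one, can be sketched in Python as follows. The expressions are the ones listed above; the helper names, the cleanup function, and the sample markup are assumptions made for illustration, and real eBay pages will look different. The cleanup is needed because several lookbehinds end inside an opening tag, so the raw match still carries the tail of that tag.

```python
import html
import re

# Field -> list of expressions; step N is applied to the match from step N-1.
FIELD_STEPS = {
    "COND": [r'(?<=itemprop="itemCondition").*?(?=</div>)'],
    "PRICE": [r'(?<=itemprop="price").*?(?=</)'],
    "LOC": [r'''(?<=itemprop=['"]availableAtOrFrom['"]>).*?(?=</)'''],
    "SOLD": [r'(?<=id="why2buy")[\w\W]*?(?=Sold)'],
    "AVAL": [r'(?<=<span\ id="qtySubTxt">)[\w\W]*?(?=</span)'],
    "IMG": [r'(?<=itemprop="image").*?(?=style\=)', r'(?<=src=").*?(?=")'],
}


def apply_steps(text, patterns):
    """Run the patterns in order, feeding each match into the next step."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match is None:
            return None
        text = match.group(0)
    return text


def clean(raw):
    """Drop a leftover opening-tag tail, strip nested tags, tidy whitespace."""
    head, separator, tail = raw.partition(">")
    if separator and "<" not in head:
        raw = tail
    text = re.sub(r"<[^>]*>", " ", raw)
    return " ".join(html.unescape(text).split())


def scrape_fields(page):
    """Return {field: cleaned value or None} for every configured field."""
    result = {}
    for field, patterns in FIELD_STEPS.items():
        raw = apply_steps(page, patterns)
        result[field] = None if raw is None else clean(raw)
    return result


# Made-up single-line sample markup, not copied from a real listing.
sample = (
    '<div itemprop="itemCondition">Used</div>'
    '<span itemprop="price" content="249.99">US $249.99</span>'
    '<span itemprop="availableAtOrFrom">Austin, Texas</span>'
    '<div id="why2buy"><span>12 Sold</span></div>'
    '<span id="qtySubTxt">3 available</span>'
    '<img itemprop="image" src="https://example.com/card.jpg" style="width:100%">'
)
fields = scrape_fields(sample)
```

Note that the expressions using ".*?" do not cross line breaks unless re.DOTALL is passed, whereas those using "[\w\W]*?" do, which is presumably why the SOLD and AVAL expressions are written that way.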
Hopefully this guide will help you get the data you need for your business!