Web Scraping Using Python (Part 2)


Take a look at how we can scrape multiple details from a web page with this example, which scrapes and formats the details of multiple eBay items.
In this article, I will outline how to efficiently automate web scraping to return any data you may desire and then store it in a structured format for analysis.
This is the second part of the series. Part one went through just the basics of web scraping, so if you’d like a quick overview, you can check part one first.
Alright, with that said, let’s get to the exciting stuff!
Note that I will continue with the eBay example we did in part one.
Just like before, I will start by importing our libraries.
I have decided to scrape data for cell phones from eBay, starting from this link.
First, just like we did in part one, we will:
Here I am interested in 13 categories and those are the ones I will be getting for all products:
I need to highlight a few things to keep in mind here:
Alright, let’s do this for each attribute one by one.
Here I simply used the find method just like we did in part one, specifying ‘h3’ as the tag and ‘s-item__title’ as the class with .text at the end to return only the text we need.
The only difference this time is that I used try and except to ask Python to store "None" in the variable if an error is raised, which will come in handy if this item does not have that attribute (a title in this case).
Printing the result gives exactly what we want: the title of the first product on the webpage.
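A minimal sketch of this try/except pattern is shown below. It assumes BeautifulSoup (bs4) is the parser carried over from part one. The inline HTML snippet, the product name, and the variable names are hypothetical stand-ins for a real eBay listing, and the sketch stores Python's None object (rather than a string) when the tag is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one eBay listing; the real page markup may differ.
html = """
<li class="s-item">
  <h3 class="s-item__title">Example Phone 64GB</h3>
</li>
"""
item = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so calling .text on the
# result raises AttributeError; we catch that and fall back to None.
try:
    title = item.find("h3", class_="s-item__title").text
except AttributeError:
    title = None

# The same lookup for a class that is not in the snippet hits the except branch.
try:
    missing = item.find("h3", class_="no-such-class").text
except AttributeError:
    missing = None

print(title)
print(missing)
```

Catching AttributeError specifically, instead of using a bare except, keeps unrelated errors visible while still handling items that lack the attribute.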
The same way I did the title, here I used .find with the relevant tag ‘div’ and the relevant class ‘s-item__subtitle’, with .text at the end.
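Since the title and the description follow the same steps, one way to avoid repeating the try/except is a small helper. This is only a sketch: the helper name text_or_none, the inline HTML, and the sample values are hypothetical and not part of the original walkthrough. It again assumes BeautifulSoup (bs4).

```python
from bs4 import BeautifulSoup


def text_or_none(item, tag, css_class):
    """Return the text of the first matching tag, or None if there is no match."""
    try:
        return item.find(tag, class_=css_class).text
    except AttributeError:
        # find() returned None because the item lacks this attribute
        return None


# Hypothetical stand-in for one eBay listing; the real page markup may differ.
html = """
<li class="s-item">
  <h3 class="s-item__title">Example Phone 64GB</h3>
  <div class="s-item__subtitle">Unlocked, Brand New</div>
</li>
"""
item = BeautifulSoup(html, "html.parser")

title = text_or_none(item, "h3", "s-item__title")
subtitle = text_or_none(item, "div", "s-item__subtitle")
shipping = text_or_none(item, "span", "s-item__shipping")  # not in the snippet

print(title)
print(subtitle)
print(shipping)
```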
Again, printing the result gives us the description we want.
Ok perfect, everything is the same as before.
Let’s print the result:
Hmm… looks ok, but we do not want "Brand:" written before the actual brand for every product. This will look a bit messy if we want to have this in an Excel sheet later.
Let’s try again with a minor modification at the end of the second line of code:
Let’s print:
Great, we got just the brand. What I did here is very simple: I added ".split(‘ ‘)" at the end, which splits any text we give it on whatever separator we specify between its brackets.
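To illustrate what the split does, here is a small sketch on a hard-coded string. The sample values are hypothetical, and picking the piece with index 1 is an assumption about how the brand is selected from the resulting list.

```python
# Hypothetical scraped text; the real value comes from the product page.
raw_brand = "Brand: Apple"

# Splitting on a single space yields a list of the pieces.
parts = raw_brand.split(" ")
print(parts)   # ['Brand:', 'Apple']

# The brand is the second piece (index 1).
brand = parts[1]
print(brand)   # Apple

# Caveat: a multi-word brand is cut short when splitting on spaces.
# Splitting once on ": " keeps everything after the label.
raw_brand_2 = "Brand: Sony Ericsson"
brand_2 = raw_brand_2.split(": ", 1)[1]
print(brand_2)  # Sony Ericsson
```

Splitting on a space is enough for single-word brands. The second form is an alternative worth considering if brand names can contain spaces.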