
Web Scraping in Java Using jsoup and OkHttp


A Java expert shows us how to build a custom HTML/CSS Theme Templates page by scraping Bootstrap-based theme sites.
Web scraping is a fundamental skill that is extremely useful for collecting data and automating tasks. The following examples show how we scrape sites such as WrapBootstrap and ThemeForest to populate the HTML/CSS Theme Templates page. We will be using jsoup for DOM parsing and OkHttp for HTTP. Although jsoup is capable of handling HTTP for us, we prefer to stick with OkHttp in case we need anything more complex than a simple GET request, such as special headers and cookies. Why learn two HTTP clients when one will do?
We like to start simple, so we are only gathering four fields: title, URL, image URL, and the number of downloads, if available.
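A model for those four fields could look roughly like the sketch below. The names Theme, imageUrl, and withoutDownloads are assumptions made for illustration, and the example URLs are placeholders; the download count is an OptionalInt because not every site publishes one.

```java
import java.util.Objects;
import java.util.OptionalInt;

class ThemeDemo {
    public static void main(String[] args) {
        // Placeholder values, not data from a real theme site.
        Theme counted = new Theme("Admin Kit", "https://example.com/theme/admin-kit",
                "https://example.com/img/admin-kit.png", OptionalInt.of(1200));
        Theme uncounted = Theme.withoutDownloads("Landing",
                "https://example.com/theme/landing", "https://example.com/img/landing.png");
        System.out.println(counted.title() + " has a download count: " + counted.downloads().isPresent());
        System.out.println(uncounted.title() + " has a download count: " + uncounted.downloads().isPresent());
    }
}

// Hypothetical model: one scraped theme with the four fields listed above.
record Theme(String title, String url, String imageUrl, OptionalInt downloads) {

    // Reject nulls up front so templates never have to null-check a theme.
    Theme {
        Objects.requireNonNull(title, "title");
        Objects.requireNonNull(url, "url");
        Objects.requireNonNull(imageUrl, "imageUrl");
        Objects.requireNonNull(downloads, "downloads");
    }

    // Convenience factory for sites that do not expose a download count.
    static Theme withoutDownloads(String title, String url, String imageUrl) {
        return new Theme(title, url, imageUrl, OptionalInt.empty());
    }
}
```

Keeping the optional field explicit in the type means the template has to decide what to render when the count is missing, rather than printing a misleading zero.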
Our scraper is fairly simple. All it needs to do is make a single GET request and extract the data we are interested in. We use failsafe for retry logic and jOOλ for a simplified streaming API. Setting up OkHttpClient logging interceptors is very useful for tracking down bugs. We only show the WrapBootstrap scraper, but the rest can be found here.
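As a rough illustration of the extraction step only, the sketch below parses an HTML string with jsoup and pulls out the four fields. The markup, the CSS selectors (div.item, a.item-link, .item-title, .item-downloads), and the class and method names are all hypothetical and do not reflect WrapBootstrap's real page structure. In a real scraper the HTML string would be the body of the OkHttp GET response; retries and logging are left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ThemeScraperDemo {
    // Made-up markup standing in for a fetched theme listing page.
    static final String SAMPLE_HTML = """
            <div class="item">
              <a class="item-link" href="/theme/admin-kit"><img src="/img/admin-kit.png"></a>
              <h3 class="item-title">Admin Kit</h3>
              <span class="item-downloads">1,200</span>
            </div>
            <div class="item">
              <a class="item-link" href="/theme/landing"><img src="/img/landing.png"></a>
              <h3 class="item-title">Landing</h3>
            </div>
            """;

    public static void main(String[] args) {
        parseThemes(SAMPLE_HTML, "https://example.com/").forEach(System.out::println);
    }

    // Parses the page and maps each item element to a Theme.
    static List<Theme> parseThemes(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<Theme> themes = new ArrayList<>();
        for (Element item : doc.select("div.item")) {
            Element link = item.selectFirst("a.item-link");
            Element image = item.selectFirst("img");
            Element title = item.selectFirst(".item-title");
            if (link == null || image == null || title == null) {
                continue; // skip items that lack a required field
            }
            Element count = item.selectFirst(".item-downloads");
            OptionalInt downloads = count == null ? OptionalInt.empty() : parseCount(count.text());
            // absUrl resolves relative links against the base URI given to Jsoup.parse.
            themes.add(new Theme(title.text(), link.absUrl("href"), image.absUrl("src"), downloads));
        }
        return themes;
    }

    // Strips separators such as "1,200" down to digits; empty when no digits remain.
    static OptionalInt parseCount(String text) {
        String digits = text.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(digits));
    }
}

// Hypothetical model holding the four scraped fields.
record Theme(String title, String url, String imageUrl, OptionalInt downloads) {}
```

Separating parsing from fetching keeps the selectors easy to test against saved HTML, without touching the network.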
Our naming convention for the service layer is generally just pluralizing the model. We don’t care how it’s getting the data as long as it gets it. We are caching the results of each scraper so we don’t upset the websites’ maintainers. In an ideal world, we might periodically scrape and store the data in our own database.
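One way to cache a scraper's results is to wrap it in a supplier that remembers its last value for a fixed time. The sketch below is hand-rolled with the standard library only; the ExpiringCache name and the four-hour lifetime are assumptions, and a real service may well use a caching library instead.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class CachedScraperDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a real scraper; it only counts how often it is invoked.
        Supplier<List<String>> scraper = () -> {
            calls.incrementAndGet();
            return List.of("Admin Kit", "Landing");
        };
        Supplier<List<String>> cached = new ExpiringCache<>(scraper, Duration.ofHours(4));
        cached.get();
        cached.get(); // served from the cache, the scraper is not called again
        System.out.println("scraper invoked " + calls.get() + " time(s)");
    }
}

// Hypothetical helper: memoizes a supplier's value until the time-to-live elapses.
final class ExpiringCache<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final long ttlNanos;
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringCache(Supplier<T> delegate, Duration ttl) {
        this.delegate = delegate;
        this.ttlNanos = ttl.toNanos();
    }

    // synchronized so concurrent requests trigger at most one scrape per refresh.
    @Override
    public synchronized T get() {
        long now = System.nanoTime();
        if (!loaded || now - loadedAt >= ttlNanos) {
            value = delegate.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

Because the cache implements Supplier, the service layer can hand it to callers without revealing whether the data came from a fresh scrape or from memory.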
Now we simply create a custom HttpHandler and pass the themes along to the HTML template.
Finally, we hook it into our router, and we have a functioning HTML/CSS Theme Templates page.
