How To Find Broken Links Using Selenium WebDriver

Want to find broken links on your website? Here’s how you can do broken link testing on your website using Selenium in Java, C#, Python, and PHP.
What thoughts come to mind when you come across 404/Page Not Found/dead hyperlinks on a website? Aargh! Broken hyperlinks are annoying, which is exactly why you should continuously work on removing broken links from your web product (or website). Instead of inspecting links manually, you can automate broken link testing using Selenium WebDriver.

When a link is broken and a visitor lands on the page, it affects that page’s functionality and results in a poor user experience. Dead links can hurt your product’s credibility, as they might give visitors the impression that little attention is paid to the experience.

If your web product has many pages (or links) that result in a 404 error (or page not found), the product’s rankings on search engines (e.g., Google) will also be badly affected. Removing dead links is an integral part of SEO (Search Engine Optimization).

In this part of the Selenium WebDriver tutorial series, we deep dive into finding broken links using Selenium WebDriver. We demonstrate broken link testing using Selenium Python, Selenium Java, Selenium C#, and Selenium PHP.

In simple terms, broken links (or dead links) in a website (or web app) are links that are not reachable and do not work as anticipated. The links could be temporarily down due to server issues or wrongly configured at the back end. Apart from pages that result in a 404 error, other prominent examples of broken links are malformed URLs and links to content (e.g., documents, PDFs, images) that has been moved or deleted.

Here are some of the common reasons behind the occurrence of broken links (dead links or link rot):

- Incorrect or misspelled URL entered by the user.
- Structural changes in the website (i.e., permalinks) where URL redirects or internal redirects are not properly configured.
- Links to content like videos, documents, etc. that has been either moved or deleted. If the content is moved, the ‘internal links’ should be redirected to the designated links.
- Temporary website downtime due to site maintenance, which makes the website temporarily inaccessible.
- Broken HTML tags, JavaScript errors, incorrect HTML/CSS customizations, broken embedded elements, etc., within the page, which can lead to broken links.
- Geolocation restrictions that prevent access to the website from certain IP addresses (if they are blacklisted) or from specific countries. Geolocation testing with Selenium helps ensure that the experience is tailor-made for the location (or country) from where the site is accessed.

Broken links are a big turn-off for the visitors who land on your website. Here are some of the major reasons why you should check for broken links on your website:

- Broken links can hurt the user experience.
- Removal of broken (or dead) links is essential for SEO (Search Engine Optimization), as broken links can affect the site’s rankings on search engines (e.g., Google).

Broken link testing can be done on a web page using Selenium WebDriver, and its results can in turn be used to remove the site’s dead links.

When a user visits a website, the browser sends a request to the site’s server. The server responds to the browser’s request with a three-digit code called the ‘HTTP Status Code.’ These HTTP Status Codes can be thought of as the conversation between the browser (from which the URL request is sent) and the server. Though different HTTP Status Codes are used for different purposes, most of them are useful for diagnosing issues in the site, minimizing site downtime, reducing the number of dead links, and more.

The first digit of every three-digit status code is a number from 1 to 5. The codes are written as 1xx, 2xx, ..., 5xx to indicate the status codes in that particular range.
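To make the range notation concrete, here is a minimal Python sketch that maps a status code to its class using the first digit. The names `STATUS_CLASSES` and `status_class` are hypothetical helpers introduced for illustration; the class labels follow the standard HTTP semantics.

```python
# Hypothetical helper: map an HTTP status code to its class by first digit.
STATUS_CLASSES = {
    1: "informational",   # 1xx
    2: "successful",      # 2xx
    3: "redirection",     # 3xx
    4: "client error",    # 4xx
    5: "server error",    # 5xx
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # Integer division by 100 keeps only the first digit.
    return STATUS_CLASSES[code // 100]
```

For example, `status_class(404)` falls in the 4xx range and `status_class(503)` in the 5xx range, the two ranges that usually matter when hunting for dead links.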
As each of these ranges represents a different class of server response, we limit the discussion to the HTTP Status Codes presented for broken links.

Here are the common status code classes that are useful in detecting broken links with Selenium:

Here are some of the common HTTP Status Codes presented by the web server on encountering a broken link:

Irrespective of the language used with Selenium WebDriver, the guiding principles for broken link testing using Selenium remain the same. Here are the steps for broken link testing using Selenium WebDriver:

1. Use the <a> tag to collect details of all the links present on the web page.
2. Send an HTTP request for every link.
3. Verify the response code received for the request sent in the previous step.
4. Validate whether the link is broken or not based on the response code sent by the server.
5. Repeat steps (2-4) for every link present on the page.

In this Selenium WebDriver tutorial, we demonstrate how to perform broken link testing using Selenium WebDriver in Python, Java, C#, and PHP. The tests are conducted on the (Chrome 85.0 + Windows 10) combination, and the execution is carried out on the cloud-based Selenium Grid provided by LambdaTest. To get started, create an account on the platform and note the user-name & access-key available from the profile section. The browser capabilities are generated using Capabilities Generator.

Here is the test scenario used for finding broken links on a website using Selenium:

Test Scenario

1. Go to LambdaTest Blog, i.e., https://www.lambdatest.com/blog, on Chrome 85.0
2. Collect all the links present on the page
3. Send an HTTP request for each link
4. Print whether the link is broken or not on the terminal

It is important to note that the time spent in broken link testing using Selenium depends on the number of links present on the ‘web page under test.’ The more links there are on the page, the more time is spent finding broken links. For example, LambdaTest has a huge number of links (~150+); hence, the process of finding broken links might take some time (approximately a few minutes).
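The steps and test scenario above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the tutorial's own implementation: it drives a local Chrome instead of the cloud grid, sends requests with the standard-library `urllib` module, and treats any 4xx/5xx response (or no response at all) as broken. The function names `is_broken`, `http_links`, `fetch_status`, and `check_blog_links` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_broken(status_code):
    """Assumption of this sketch: no response, or any 4xx/5xx code, is broken."""
    return status_code is None or status_code >= 400


def http_links(hrefs):
    """Keep only http(s) URLs, dropping empty values and duplicates."""
    kept = [h for h in hrefs if h and h.startswith(("http://", "https://"))]
    return list(dict.fromkeys(kept))  # de-duplicate while preserving order


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        # HEAD avoids downloading the body; some servers may not support it.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status
    except (urllib.error.URLError, http.client.HTTPException,
            ValueError, OSError):
        return None  # malformed URL, DNS failure, timeout, dropped connection


def check_blog_links(page_url="https://www.lambdatest.com/blog"):
    """Collect every <a> link on the page and print whether it is broken."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: local Chrome, not a remote grid
    try:
        driver.get(page_url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        links = http_links(a.get_attribute("href") for a in anchors)
    finally:
        driver.quit()

    for link in links:
        status = fetch_status(link)
        verdict = "broken" if is_broken(status) else "ok"
        print(f"{verdict}: {link} (status: {status})")
```

Calling `check_blog_links()` runs the whole scenario. Because each link is requested one after another, the run time grows with the number of links on the page, which matches the note above about pages with 150+ links taking a few minutes.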
