Finding broken links on a website using Selenium WebDriver and HttpClient

Earlier we saw how to find broken images; here we will see how to find invalid URLs. In this example, a URL is treated as valid only if a request to it returns status code 200. HTTP defines many status codes, each used for a different purpose. You can check the Wikipedia page on HTTP status codes for more information.

The 2xx class of status codes indicates that the request sent by the client was received and processed successfully.

The 4xx class of status codes is intended for cases in which the client seems to have erred.

The 5xx class of status codes is intended for cases in which the server seems to have erred.

The following figure lists the different HTTP status codes.

[Figure: HTTP status codes]
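
As a rough illustration of these classes, the sketch below maps a numeric status code to its class by looking at the first digit. The class name StatusClass, the method describe, and the text labels are made up for this example and are not part of any library; the 1xx and 3xx ranges are included only for completeness.

```java
class StatusClass {

    // Returns a short label for the class a status code belongs to.
    // The labels are illustrative; only the numeric ranges come from HTTP.
    static String describe(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 500};
        for (int code : samples) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

Integer division by 100 is enough here because the class of a status code is determined by its first digit alone.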

Just by looking at a link in the UI, we cannot confirm whether it works until we click it and verify the result.

To automate this, we can use the Apache HttpClient library to check the status codes of the URLs on a page. You need to download it and add it to the build path.

If a request was NOT processed correctly, the response carries one of the other status codes listed above rather than 200. This lets us tell whether a link is broken just by looking at its status code.
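
Before bringing in WebDriver, the status check can be tried on its own. The sketch below is not the HttpClient approach used in this post; it swaps in the JDK's built-in java.net.HttpURLConnection so that it runs without extra jars. The class name LinkStatus, the helper names isBroken and fetchStatus, and the 5-second timeouts are assumptions for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class LinkStatus {

    // Mirrors the rule used in this post: anything other than 200 counts as broken.
    static boolean isBroken(int statusCode) {
        return statusCode != HttpURLConnection.HTTP_OK;
    }

    // Sends a GET request to an http/https URL and returns the status code.
    // Throws IOException if the URL is malformed or the server cannot be reached.
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(5000); // 5 seconds, an arbitrary choice
        connection.setReadTimeout(5000);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) {
        // Pass a URL as the first argument, or fall back to the one used in this post.
        String url = args.length > 0 ? args[0] : "http://google.com";
        try {
            int status = fetchStatus(url);
            System.out.println(url + " -> " + status + (isBroken(status) ? " (broken)" : " (ok)"));
        } catch (IOException e) {
            System.out.println(url + " -> request failed: " + e.getMessage());
        }
    }
}
```

Keeping the "is this status broken?" decision in its own method makes it easy to loosen the rule later, for example to accept every 2xx code instead of only 200.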

Now let us jump into the example. First, we find all anchor tags on the page using WebDriver, with the syntax below:

List<WebElement> anchorTagsList = driver.findElements(By.tagName("a"));

We then iterate through each link and verify the status code of the response. It should be 200; if it is not, we increment the invalid links count.
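
The filtering part of this loop can be sketched without a browser. The example below takes raw href values (as getAttribute("href") would return them, including null) and keeps only the distinct http/https URLs worth requesting. HrefFilter and checkableUrls are hypothetical names, and skipping javascript: and mailto: links instead of counting them as invalid is a choice made for this sketch, not what the full example in this post does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HrefFilter {

    // Keeps distinct http/https URLs in their original order;
    // drops nulls, blanks, and other schemes such as javascript: or mailto:.
    static List<String> checkableUrls(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue;
            }
            String trimmed = href.trim();
            String lower = trimmed.toLowerCase();
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Sample values only; in a real test these would come from the anchor elements.
        List<String> hrefs = Arrays.asList(
                "http://example.com/a",
                null,
                "javascript:void(0)",
                "http://example.com/a",
                "mailto:someone@example.com",
                "https://example.com/b");
        System.out.println(checkableUrls(hrefs));
    }
}
```

Removing duplicates before sending requests also cuts down the run time on pages that repeat the same link in the header, body, and footer.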

Let us look at the example:

package com.linked;

import java.util.List;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class FindBrokenLinksExample {
	
	private WebDriver driver;
	private int invalidLinksCount;

	@BeforeClass
	public void setUp() {

		driver = new FirefoxDriver();
		driver.get("http://google.com");
	}

	@Test
	public void validateInvalidLinks() {

		try {
			invalidLinksCount = 0;
			List<WebElement> anchorTagsList = driver.findElements(By
					.tagName("a"));
			System.out.println("Total no. of links are "
					+ anchorTagsList.size());
			for (WebElement anchorTagElement : anchorTagsList) {
				if (anchorTagElement != null) {
					String url = anchorTagElement.getAttribute("href");
					if (url != null && !url.contains("javascript")) {
						verifyURLStatus(url);
					} else {
						invalidLinksCount++;
					}
				}
			}

			System.out.println("Total no. of invalid links are "
					+ invalidLinksCount);

		} catch (Exception e) {
			e.printStackTrace();
			System.out.println(e.getMessage());
		}
	}

	@AfterClass
	public void tearDown() {
		if (driver != null)
			driver.quit();
	}

	public void verifyURLStatus(String URL) {

		HttpClient client = HttpClientBuilder.create().build();
		HttpGet request = new HttpGet(URL);
		try {
			HttpResponse response = client.execute(request);
			// The status code should be 200; if it is not, increment the
			// invalid link count. To flag only missing pages, compare against
			// 404 instead: response.getStatusLine().getStatusCode() == 404
			if (response.getStatusLine().getStatusCode() != 200)
				invalidLinksCount++;
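		} catch (Exception e) {
			// A request that fails outright (for example, an unreachable
			// host) is also counted as an invalid link
			invalidLinksCount++;
			e.printStackTrace();
		}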
	}
}

Comments

Could you please post Jenkins-related information? It might be helpful for us.

Hi, I liked all the posts on Selenium. They are very helpful.
But in this example of broken links I am not getting any response for the code below...

HttpResponse response = client.execute(request);

Please help..

Error:..
Total no. of links are 48
Oct 14, 2015 12:16:25 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset
Oct 14, 2015 12:16:25 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request

For the links which use JavaScript, how can I get the response?

The test stops when it gets a 404 link.

What about websites where we have to authorize first?
How do we deal with such a situation?

How do we validate links whose href is javascript?

Hi, thank you for sharing this info.
I have two pages, one with broken links and another without broken links.
I ran the test on both to see the difference and how it would work, but it did not.
The test passed on both pages, and it caught neither the number of links nor the broken ones.

Total no. of links are 0
Total no. of invalid links are 0

===============================================
Default Suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================

Sorry, my bad. I was using baseUrl and private String baseUrl; instead of driver.get.
It works now. A bit slow, but it works.
I wonder if headless would be much faster. I just don't know how to do it.
