Avoid Bot Detection With Selenium in Spring Boot
2025-11-01 03:50 UTC by Sachin Verma

1. Overview

When considering web automation, Selenium is often the first tool that comes to mind. It's widely used for automating web browsers, testing applications, and even extracting information from websites. However, as websites have grown more sophisticated, they've implemented various bot detection mechanisms to distinguish genuine users from automated tools.

If we've ever run a Selenium script and hit barriers such as access blocks, redirects, or CAPTCHAs, we've experienced this challenge firsthand. In this tutorial, we'll walk through the setup needed to bypass the most common bot detection mechanisms when using Selenium with Spring Boot.

2. Setting Up a Spring Boot Project

Before we write any code, we need to set up our project. We’ll use Maven for dependency management, as it’s the standard for Spring Boot projects. Here’s a look at the essential dependencies we’ll need in our pom.xml file:

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.18.1</version>
</dependency>

This is the core library for Selenium. We need a recent version to ensure compatibility with modern browsers and access to advanced features like the Chrome DevTools Protocol (CDP).

3. Implementation

Now let's write the code that uses the Chrome DevTools Protocol (CDP) to bypass bot detection.

3.1. Creating a WebDriver

Let's create a WebDriverFactory class to configure and create the ChromeDriver:

public class WebDriverFactory {
    public static ChromeDriver createDriver() {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--disable-blink-features=AutomationControlled");
        // ...
    }
}

First, we create a ChromeOptions object, which lets us pass custom arguments to the browser when it starts. We then add the argument --disable-blink-features=AutomationControlled. This simple but effective flag disables a Blink feature that is a well-known indicator of automation, and it serves as our initial line of defense:

public static ChromeDriver createDriver() {
    ChromeOptions options = new ChromeOptions();
    options.addArguments("--disable-blink-features=AutomationControlled");
    ChromeDriver driver = new ChromeDriver(options);
    return driver;
}

Next, we create a new ChromeDriver instance, passing the options object we just configured. This ensures the browser starts up with the first anti-detection setting already in place:

public static ChromeDriver createDriver() {
    ChromeOptions options = new ChromeOptions();
    options.addArguments("--disable-blink-features=AutomationControlled");
    ChromeDriver driver = new ChromeDriver(options);

    // Inject a script that hides navigator.webdriver before any page script runs
    Map<String, Object> params = new HashMap<>();
    params.put("source", "Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
    driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", params);
    return driver;
}

After the driver is created, we move to the second, more robust technique. We use the driver.executeCdpCommand() method to execute a command from the Chrome DevTools Protocol (CDP). The command Page.addScriptToEvaluateOnNewDocument is key here: it tells the browser to run a specific JavaScript snippet every time a new document (web page) is loaded. The snippet itself, Object.defineProperty(navigator, 'webdriver', { get: () => undefined }), redefines the navigator.webdriver property. Instead of returning true, as it does in an automated browser, the property now reports undefined, so detection scripts that check it find nothing suspicious. This is a very effective way to defeat a website's bot detection scripts.
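
To confirm that the spoof works, we can ask the browser directly. Here's a minimal sanity check (our own addition, using example.com as a placeholder target); since ChromeDriver implements JavascriptExecutor, we can call executeScript() on it directly, and a JavaScript undefined comes back as null on the Java side:

ChromeDriver driver = WebDriverFactory.createDriver();
driver.get("https://example.com");

// With the CDP script in place, this should print null (JavaScript's undefined)
// instead of true:
Object webdriverFlag = driver.executeScript("return navigator.webdriver;");
System.out.println("navigator.webdriver = " + webdriverFlag);

driver.quit();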

3.2. Search Execution

Now that we've implemented our WebDriverFactory, let's implement a GoogleSearchService class for our automated search:

public class GoogleSearchService {
    private final WebDriver driver;
    public GoogleSearchService(WebDriver driver) {
        this.driver = driver;
    }
    public void navigateToGoogle() {
        driver.get("https://www.google.com");
    }
    public void search(String query) {
        WebElement searchBox = driver.findElement(By.name("q"));
        searchBox.sendKeys(query);
        searchBox.sendKeys(Keys.ENTER);
    }
    public String getPageTitle() {
        return driver.getTitle();
    }
    public void quit() {
        driver.quit();
    }
}

Here we use the standard Selenium call driver.get("https://www.google.com") to navigate to the Google search engine; we can swap in any other public website we want to test our script on. The driver.findElement(By.name("q")) call finds the search box element on the page by its name attribute (q), a basic but reliable way to locate elements. Then, searchBox.sendKeys(query) types the query (in our case, "baeldung") into the search box, simulating a user's input, and searchBox.sendKeys(Keys.ENTER) simulates pressing the Enter key, submitting the search.
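
Note that getPageTitle() is only meaningful once the results page has actually loaded, and sendKeys() returns before that happens. As a minimal sketch of how we might harden the service, a hypothetical searchAndGetTitle() variant (our own addition) could use Selenium's WebDriverWait and ExpectedConditions from org.openqa.selenium.support.ui, plus java.time.Duration, to block until the page title reflects the query:

public String searchAndGetTitle(String query) {
    WebElement searchBox = driver.findElement(By.name("q"));
    searchBox.sendKeys(query);
    searchBox.sendKeys(Keys.ENTER);

    // Wait up to ten seconds for the results page title to contain the query
    new WebDriverWait(driver, Duration.ofSeconds(10))
      .until(ExpectedConditions.titleContains(query));
    return driver.getTitle();
}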

3.3. Main Execution Flow

Let’s now look at the main class that ties everything together. The AvoidBotDetectionSelenium.java class acts as the main entry point to create the ChromeDriver, initialize the GoogleSearchService, and then trigger the subsequent search operations via the service object:

public class AvoidBotDetectionSelenium {
    public static void main(String[] args) {
        ChromeDriver driver = WebDriverFactory.createDriver();
        GoogleSearchService googleService = new GoogleSearchService(driver);
        googleService.navigateToGoogle();
        googleService.search("baeldung");
        System.out.println(googleService.getPageTitle());
        googleService.quit();
    }
}

We're using the AvoidBotDetectionSelenium class as the main entry point to orchestrate our stealthy browser automation. The process is sequential: we begin by calling WebDriverFactory.createDriver() to get a fully configured ChromeDriver with the anti-detection features already in place (like hiding the navigator.webdriver property). This configured driver is then passed to the GoogleSearchService, which executes the core automation logic: navigating to Google with googleService.navigateToGoogle(), simulating a user search with googleService.search("baeldung"), and printing the resulting page title via googleService.getPageTitle(). Finally, googleService.quit() ensures the browser is properly closed and system resources are released.

One last consideration is the choice between headless and headed browsers. While we often use headless browsers for their speed and efficiency, some websites have advanced detection mechanisms that can identify them. In those cases, running the browser in visible mode (one that is displayed on the screen) can be the more reliable approach. It's a trade-off between speed and stealth, and a decision we need to make based on the specific behavior of the target website.
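
As a sketch of that trade-off, we could add an overload of createDriver() that takes a boolean (our own addition, not part of the factory above) and enables Chrome's --headless=new mode only when speed matters more than stealth:

public static ChromeDriver createDriver(boolean headless) {
    ChromeOptions options = new ChromeOptions();
    options.addArguments("--disable-blink-features=AutomationControlled");
    if (headless) {
        // Chrome's current headless implementation; faster and CI-friendly,
        // but easier for some sites to detect than a visible browser
        options.addArguments("--headless=new");
    }
    ChromeDriver driver = new ChromeDriver(options);

    // Same CDP spoof as before
    Map<String, Object> params = new HashMap<>();
    params.put("source", "Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
    driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", params);
    return driver;
}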

4. Ethical Considerations and Safer Alternatives

Now that we've seen how to bypass detection technically, let's talk about responsibility. Just because we can bypass detection doesn't mean we always should. Many websites explicitly prohibit automated scraping in their Terms of Service, and violating these can expose us to legal repercussions.

Furthermore, we need to respect the unwritten rules of the internet. A site's robots.txt file is a clear signal from the website's owner about what's okay to crawl and what's off-limits, and ignoring it is generally viewed as unethical within the developer community.

When we consider safer, more professional alternatives, we should always look for public APIs first. Many companies provide these specifically for structured and authorized access to their data. It's a win-win: we get the data we need in a clean format while respecting the site's infrastructure. If a public API isn't available, we can use scraping libraries like Jsoup or Selenium where a site explicitly allows it or provides open, public endpoints.

Finally, before starting any scraping project, it's good practice to check whether the data we need is already available. There are numerous open datasets on platforms like Kaggle and on various government websites. Using them saves time and ensures we're not violating terms when a perfectly good, legal alternative already exists.

5. Conclusion

Automating a browser with Selenium can be a frustrating experience when we're constantly blocked by bot detection. In this tutorial, we saw how combining the appropriate ChromeOptions settings with the capabilities of the Chrome DevTools Protocol (CDP) lets us build a powerful and stealthy automation tool. The techniques we covered, like disabling AutomationControlled and spoofing the navigator.webdriver property, are essential for making our automated scripts indistinguishable from a human user.

Since we've unlocked this potential to bypass defenses, let's commit to using it ethically and responsibly: always respect a website's robots.txt file and Terms of Service, and be mindful of the impact our scripts have on their servers.

As always, the code for this article is available over on GitHub.
