
How to Use Headless Chrome and Puppeteer with an Authenticated Proxy Server

Scraping Intelligence delivers an open-source package called proxy-chain that lets headless Chrome and Puppeteer run through other proxies.


Category
Other
Publish Date
April 25, 2022
Author
Scraping Intelligence

    The addition of a headless mode to Google Chromium, together with Google's release of a corresponding Node.js API called Puppeteer earlier this year, has made it much easier for developers to automate web operations such as filling out forms and taking screenshots of web pages. You can use the --proxy-server command-line option to make Chromium use a custom proxy server:

    chrome --proxy-server=http://proxy.example.com:8080

    Note that chrome must be an alias for your Chromium executable (see how to do this). Because Chrome does not support the --proxy-server option in non-headless (headful) mode, you must use Chromium instead of Chrome.

    If the proxy server requires authentication, the browser displays a dialog asking you for a username and password.

    When you start Chromium in headless mode, though, you won't see this prompt, since the browser has no windows. Chromium has no command-line option for passing proxy credentials, and at the time of writing neither Puppeteer's API nor the underlying Chrome DevTools Protocol (CDP) provides a mechanism to pass them to the browser programmatically. Forcing headless Chromium to use a specific proxy username and password turned out not to be simple.

    After some trial and error, we found that embedding the credentials in the proxy URL does not work either:

    chrome --proxy-server=http://John_Doe:123@Pass!@proxy.example.com:8080
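    The --proxy-server option only makes use of the scheme, host, and port of the proxy. As a minimal sketch of that distinction, the snippet below separates the part of a proxy URL that the flag can use from the credentials that have to be supplied some other way. The splitProxyUrl function is a hypothetical helper written for this illustration, not part of Chromium, Puppeteer, or proxy-chain. It also shows that a password containing "@" should be percent-encoded before it is placed in a URL.

```javascript
// Hypothetical helper (not part of Puppeteer or proxy-chain): splits a proxy
// URL into the part usable with --proxy-server and the credentials.
function splitProxyUrl(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        // Scheme, host, and port only, e.g. "http://proxy.example.com:8080"
        server: `${url.protocol}//${url.host}`,
        // The URL object stores credentials percent-encoded, so decode them
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password),
    };
}

// Special characters such as "@" must be percent-encoded inside a URL
const encodedPassword = encodeURIComponent('123@Pass!');
const parts = splitProxyUrl(`http://John_Doe:${encodedPassword}@proxy.example.com:8080`);

// Only this part can be handed to Chromium on the command line
console.log(`--proxy-server=${parts.server}`);
```

    The username and password returned here still need a separate channel to reach the proxy, which is the gap the rest of this article deals with.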

    To get around this Chromium limitation, you can set up an open local proxy server that forwards traffic to the upstream authenticated proxy, and then tell Chromium to use it. Squid and its cache_peer configuration option can be used to build such a proxy chain. The following is an example of a Squid configuration file (squid.conf):

    http_port 3128
    cache_peer proxy.example.com parent 8080 0 \
      no-query \
      login=John_Doe:123@Pass! \
      connect-fail-limit=99999999 \
      proxy-only \
      name=my_peer
    cache_peer_access my_peer allow all
    

    Run the following command to start Squid:

    squid -f squid.conf -N

    Now that the proxy is running locally on port 3128, Chromium can use it:

    chrome --proxy-server=http://localhost:3128

    This technique becomes laborious if you want to control the proxy directly from your code or need to switch proxies on the fly. In that case you have to either rewrite the Squid configuration dynamically or run a separate Squid instance for each proxy.
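    To give a sense of what changing the Squid configuration dynamically involves, here is a rough sketch that renders a squid.conf like the one above from a set of parameters. The renderSquidConf function and its parameter names are assumptions made for this example and do not come from Squid or any library.

```javascript
// Hypothetical helper: renders a squid.conf equivalent to the example above
// for one upstream proxy. Parameter names are illustrative assumptions.
function renderSquidConf({ listenPort, host, port, username, password }) {
    return [
        `http_port ${listenPort}`,
        `cache_peer ${host} parent ${port} 0 \\`,
        '  no-query \\',
        `  login=${username}:${password} \\`,
        '  connect-fail-limit=99999999 \\',
        '  proxy-only \\',
        '  name=my_peer',
        'cache_peer_access my_peer allow all',
        '',
    ].join('\n');
}

const squidConf = renderSquidConf({
    listenPort: 3128,
    host: 'proxy.example.com',
    port: 8080,
    username: 'John_Doe',
    password: '123@Pass!',
});

console.log(squidConf);
```

    Each upstream proxy would still need its own configuration file, its own local port, and its own Squid process to manage, which is where this approach starts to get heavy.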

    Squid processes would hang or fail to start at all, each platform behaved differently, and so on. To address this, we created proxy-chain, a new npm package that we released as open source on GitHub. With it, you can easily "anonymize" an authenticated proxy and then use Puppeteer to launch headless Chromium with the following Node.js code:

    const puppeteer = require('puppeteer');
    const proxyChain = require('proxy-chain');

    (async () => {
        const oldProxyUrl = 'http://John_Doe:123@Pass!@proxy.example.com:8080';
        const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

        // Prints something like "http://127.0.0.1:45678"
        console.log(newProxyUrl);

        const browser = await puppeteer.launch({
            args: [`--proxy-server=${newProxyUrl}`],
        });

        // Do your magic here...
        const page = await browser.newPage();
        await page.goto('https://www.example.com');
        await page.screenshot({ path: 'example.png' });
        await browser.close();

        // Clean up, forcibly close all pending connections
        await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
    })();
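    If the page navigation or screenshot throws, the cleanup call at the end of that example is never reached. One possible way to guard against this is a small wrapper with try/finally. The withAnonymizedProxy name is hypothetical, and the proxy-chain module is passed in as an argument here only so that the sketch has no hard dependency; in real code it would be the result of require('proxy-chain').

```javascript
// Hypothetical wrapper: opens a local anonymized proxy, runs `task` with its
// URL, and always closes the proxy afterwards, even if `task` throws.
// `proxyChain` is expected to expose anonymizeProxy() and
// closeAnonymizedProxy(), as used in the example above.
async function withAnonymizedProxy(proxyChain, upstreamProxyUrl, task) {
    const localProxyUrl = await proxyChain.anonymizeProxy(upstreamProxyUrl);
    try {
        return await task(localProxyUrl);
    } finally {
        // `true` forcibly closes all pending connections
        await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
    }
}

// Usage idea (requires the puppeteer and proxy-chain packages):
//   await withAnonymizedProxy(require('proxy-chain'), oldProxyUrl, async (proxyUrl) => {
//       const browser = await puppeteer.launch({ args: [`--proxy-server=${proxyUrl}`] });
//       ...
//       await browser.close();
//   });
```

    Because the wrapper takes the upstream URL as a parameter, switching to a different proxy is just another call with a different URL.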
    

    The proxy-chain package supports both standard HTTP proxy forwarding and HTTP CONNECT tunneling, so it can also handle protocols such as HTTPS and FTP. We'll be using many more features of the package in our forthcoming projects, so follow us on Twitter.
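    To illustrate what tunneling through an authenticated upstream proxy involves, the sketch below builds the kind of CONNECT request a local forwarding proxy could send upstream, with the credentials carried in a Proxy-Authorization header using the Basic scheme. This is not proxy-chain's actual implementation; buildConnectRequest is a hypothetical function written only to show the shape of the request.

```javascript
// Hypothetical helper: builds a CONNECT request for an upstream proxy that
// requires Basic authentication. Not taken from proxy-chain's source.
function buildConnectRequest(targetHost, targetPort, username, password) {
    // Basic auth is base64("username:password")
    const credentials = Buffer.from(`${username}:${password}`).toString('base64');
    return [
        `CONNECT ${targetHost}:${targetPort} HTTP/1.1`,
        `Host: ${targetHost}:${targetPort}`,
        `Proxy-Authorization: Basic ${credentials}`,
        '', // blank line that ends the header section
        '',
    ].join('\r\n');
}

const connectRequest = buildConnectRequest('www.example.com', 443, 'John_Doe', '123@Pass!');
console.log(connectRequest);
```

    Once the upstream proxy accepts the request, the local proxy can relay bytes in both directions, so the browser never has to know the credentials.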

    If you need a proxy for a web scraping service, check out Scraping Intelligence Proxy, an HTTP proxy service that gives you access to both datacenter and residential IP addresses, as well as smart IP address rotation.


    Get in touch with us for any web scraping services.



    About the author


    Zoltan Bettenbuk

    Zoltan Bettenbuk is the CTO of ScraperAPI - helping thousands of companies get access to the data they need. He’s a well-known expert in data processing and web scraping. With more than 15 years of experience in software development, product management, and leadership, Zoltan frequently publishes his insights on our blog as well as on Twitter and LinkedIn.
