How to use Puppeteer with Docker
Puppeteer is a Node.js library that provides a high-level API for controlling Chrome and Chromium browsers; support for Firefox has also been added recently.
This article runs Puppeteer inside a Docker container based on the official Node.js image. Note that if you install Chromium from apt on the Node.js v14 (LTS Gallium) image, you will get Chromium v90, which may cause compatibility issues; the examples here were tested against Chrome's latest stable release instead.
A Puppeteer script can automate testing, archive webpage data, and generate screenshots of live web content. Through a clear API, you can navigate to pages, click form controls, and issue browser commands.
Running headless Chrome in a Docker container can be complex, since many dependencies are required. Once containerized, though, you can run Puppeteer in a Kubernetes cluster, in an isolated container on your dev machine, or in a CI pipeline.
Basic Requirements
The image used in this article is based on Debian; if you are using a different base, adjust the package manager commands accordingly. You do not need to install Node.js manually if you use the official Node.js image as your starting point.
To install Puppeteer, you will need npm, the Node.js package manager. Because the package bundles a recent version of Chromium, an npm install puppeteer should theoretically be all it takes to get running. Ideally, you would be able to run Chrome in a clean Docker environment without installing any further dependencies.
In practice, Chrome is a heavyweight GUI program: it depends on libraries for fonts, graphics, configuration, and window management. You will need to include all of these in your Dockerfile.
Dockerfile
FROM node:slim AS app
# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install your app here...
Using Puppeteer in Docker
Using Puppeteer in a Dockerized environment requires special consideration when launching Chrome: even after installing all the dependencies, you still need to pass additional launch flags.
A simple script might launch a headless Chrome instance, navigate to a URL, and take a screenshot of the page. The browser is then closed so it does not keep wasting system resources.
The following arguments are passed to Chromium via launch():
disable-gpu
– In most cases, you won't be able to access a GPU inside a Docker container unless you have specially configured the host to provide one. Setting this flag explicitly stops Chrome from attempting to render using the GPU.
no-sandbox and disable-setuid-sandbox
– These disable Chrome's sandboxing, a step that is required when the application runs as the root user (the default in a Docker container). With the sandbox disabled, malicious web content could escape the browser process and compromise the host, so make sure your Docker containers are strongly isolated from your host machine. If you are uncomfortable with this, you will need to manually configure a working Chrome sandbox, which is a more involved process.
disable-dev-shm-usage
– Docker's default shared memory space of 64MB is too small for Chrome and can cause it to crash. With this flag set, Chrome writes its shared memory files into /tmp instead of /dev/shm.
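Putting these flags together, a screenshot script might look like the sketch below. The URL, output file name, and the takeScreenshot helper are placeholders, not part of Puppeteer's API; Puppeteer is imported lazily inside the helper so the launch options can be inspected without a browser present.

```javascript
// A minimal sketch; URL, file name, and helper name are placeholders.
const launchOptions = {
  executablePath: '/usr/bin/google-chrome', // the apt-installed Chrome
  args: [
    '--disable-gpu',            // no GPU access inside the container
    '--no-sandbox',             // required when running as root
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',  // write shared memory to /tmp, not /dev/shm
  ],
};

async function takeScreenshot(url, path) {
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch(launchOptions);
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close(); // always release the browser's resources
  }
}

// takeScreenshot('https://example.com', 'screenshot.png');
```

The try/finally ensures the browser is closed even when navigation or the screenshot fails, which matters in long-running containers.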
Add your JavaScript to the container with a COPY instruction. With the proper Chrome flags in place, your Puppeteer script should execute successfully.
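Continuing the Dockerfile above, the app-installation step might look like this sketch (file names such as index.js are assumptions about your project layout):

```dockerfile
# Install dependencies first so Docker can cache this layer.
COPY package.json package-lock.json ./
RUN npm ci
# Copy your Puppeteer script (index.js is a placeholder name).
COPY index.js ./
CMD ["node", "index.js"]
```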
The code configuration
Remember to point Puppeteer at the installed browser, instead of its own bundled one, inside your app's code.
import puppeteer from 'puppeteer';
...
const browser = await puppeteer.launch({
  executablePath: '/usr/bin/google-chrome',
  args: [...], // add the flags discussed above if you need them
});
Conclusion
Puppeteer can run in a Docker container, allowing you to automate websites as part of your CI pipelines and production infrastructure. It also isolates your environment during development, so you do not have to install Chrome locally. Installing the browser via apt resolves the dependencies needed to run a headless browser inside a Docker container without any manual intervention on your part; the Node.js Docker images do not include these dependencies by default.
For your container to work properly, you need to install the right dependencies, and you must also set Chrome's launch arguments so it operates correctly in the Dockerized environment. After that, you should be able to use the Puppeteer API without further special consideration.
The easiest way to use Puppeteer inside a Docker container is to install Google Chrome: unlike the Chromium package offered by Debian, Google's repository provides the latest stable version of the browser.
Finally, keep Chrome's resource usage in mind. If a single container instance runs multiple browsers at the same time, Docker's memory limits can quickly be exceeded. Either raise the container's limits, or implement a system that restricts the number of scripts running concurrently or reuses running browser instances.
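As a sketch of that last option, a small queue-based limiter (a hypothetical helper, not part of Puppeteer) could cap how many browser tasks run at once:

```javascript
// Run at most `limit` async tasks concurrently, queueing the rest.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  function next() {
    if (active >= limit || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => {
      active -= 1;
      next(); // start the next queued task, if any
    });
  }
  // Returns a wrapper: pass it a function that starts the task.
  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}
```

Each Puppeteer job would then be submitted as, say, limit(() => takeScreenshot(url, path)), keeping at most the chosen number of browsers alive at once (takeScreenshot being whatever function your script exposes).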