Headless mode is now standard in Google Chrome 59, so let's try it out
Hello.
I'm Mandai, in charge of Wild on the development team.
Google Chrome version 59 includes headless mode.
Until now, PhantomJS and Selenium have been the best-known options for headless browsing (I have personally used Watir as well; see the headless browser list).
All of them need some setup before you have a working environment. With Google Chrome 59, however, installing the browser is all it takes to get a headless environment, so I tried it.
Note that as of version 59, headless mode is not available on Windows, so I verified everything on CentOS 7.
Update Google Chrome
Updating is easy if you installed it via yum.
sudo yum -y upgrade google-chrome-stable
If you have not installed it yet, download the rpm package from this page and install it.
A repository for Google Chrome is added automatically as well, so future updates can be done with the yum command.
Let's try it for now ~Check the DOM~
First, let's access the Beyond homepage with Google Chrome.
google-chrome --headless --disable-gpu --dump-dom https://beyondjapan.com
<body id="index" style=""> ...
By running it with the option "--dump-dom", I was able to get the DOM of the site.
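Since the DOM is written to standard output, it can be piped into other commands. The following is a minimal sketch, assuming the dumped markup contains a plain <title> element with no attributes; the helper name extract_title is hypothetical and not part of Chrome.

```shell
# Hypothetical helper: print the text of the first <title> element read from stdin.
extract_title() {
  # Join lines so a title split across lines still matches, then strip the tags.
  tr '\n' ' ' \
    | grep -o '<title>[^<]*</title>' \
    | head -n 1 \
    | sed -e 's/<title>//' -e 's/<\/title>//'
}

# Usage (needs Chrome 59+ on the PATH):
#   google-chrome --headless --disable-gpu --dump-dom https://beyondjapan.com | extract_title
```

Keeping the parsing in a small function means it can be checked against a hand-written HTML string without launching the browser.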
Let's try it for now ~Screenshot~
Next, I will take a screenshot.
google-chrome --headless --disable-gpu --screenshot --window-size=1280,1440 https://beyondjapan.com
[0608/054855.748933:INFO:headless_shell.cc(436)] Written to file screenshot.png.
If you do not specify the browser window size, the screenshot covers only a fairly small area, so set the size with the "--window-size=[width],[height]" option.
To save the screenshot under a specific file name:
google-chrome --headless --disable-gpu --screenshot=top.png --window-size=1280,1440 https://beyondjapan.com
[0608/055147.536344:INFO:headless_shell.cc(436)] Written to file top.png.
Specify the file name as an argument for the "--screenshot" option.
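As a rough sketch of how the two options combine, the snippet below captures the same page at several window sizes and derives each file name from the size. The helper names screenshot_name and capture_sizes, the "top-" prefix, and the second size in the usage line are all assumptions made for illustration.

```shell
# Hypothetical helper: build a file name such as top-1280x1440.png from "1280,1440".
screenshot_name() {
  printf 'top-%s.png\n' "$(printf '%s' "$1" | tr ',' 'x')"
}

# Capture one URL at each given window size (needs Chrome 59+ on the PATH).
capture_sizes() {
  url=$1
  shift
  for size in "$@"; do
    google-chrome --headless --disable-gpu \
      --screenshot="$(screenshot_name "$size")" \
      --window-size="$size" "$url"
  done
}

# Usage:
#   capture_sizes https://beyondjapan.com 1280,1440 375,667
```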
Let's try it for now ~Convert to PDF~
Next, let's convert the site to PDF.
google-chrome --headless --disable-gpu --print-to-pdf https://beyondjapan.com
[0608/033512.266562:INFO:headless_shell.cc(436)] Written to file output.pdf.
A PDF was output with the name output.pdf.
Looking at the contents, the PDF has the same layout as the site itself.
This is not a special case, but because the site header is fixed to the top, it is drawn on every page and hides the top of the second and subsequent pages.
Quite a few sites are built like this, so it is worth keeping in mind.
Also, because this command always writes to output.pdf, converting several pages in a row overwrites the file each time.
google-chrome --headless --disable-gpu --print-to-pdf=top.pdf https://beyondjapan.com
[0608/033723.196640:INFO:headless_shell.cc(436)] Written to file top.pdf.
By specifying a file name as the argument of the "--print-to-pdf" option like this, you can save the PDF with a name.
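When converting several pages, one way to avoid overwriting the output is to derive a separate file name from each URL. The following is only a sketch under that assumption; the helper names pdf_name_for_url and print_all_to_pdf and the naming scheme are hypothetical.

```shell
# Hypothetical helper: turn a URL into a file-system-friendly PDF name.
pdf_name_for_url() {
  # Drop the scheme, trim one trailing slash, replace anything other than
  # letters, digits, "." and "-" with "_", then append ".pdf".
  printf '%s\n' "$1" \
    | sed -e 's|^[a-zA-Z]*://||' \
          -e 's|/$||' \
          -e 's|[^A-Za-z0-9.-]|_|g' \
          -e 's|$|.pdf|'
}

# Convert every URL given as an argument (needs Chrome 59+ on the PATH).
print_all_to_pdf() {
  for url in "$@"; do
    google-chrome --headless --disable-gpu \
      --print-to-pdf="$(pdf_name_for_url "$url")" "$url"
  done
}

# Usage:
#   print_all_to_pdf https://beyondjapan.com https://beyondjapan.com/blog/
```

With this scheme, https://beyondjapan.com would be saved as beyondjapan.com.pdf, and each additional URL gets its own file.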
Summary
I use Google Chrome every day without much thought, and it updates itself to the latest version on its own, but this update seemed like a fairly significant one, so I wrote about it.
In fact, a Chrome instance started in headless mode can be controlled from Node.js via the DevTools Protocol, which I feel makes it possible to work with the browser both more easily and in more depth.
I would like to write about that in another article.
Since today is Rock Day (6/9), I wanted the content to rock, but it did not turn out that way.
Addendum: I wrote a related article, [Try using headless Google Chrome from Node.js | Beyond Co., Ltd.](https://beyondjapan.com/blog/2017/07/headless-chrome-with-nodejs).
Addendum 2: I also wrote another related article, [Google Chrome's headless mode seems to be able to monitor the appearance in detail, so I tried it | Beyond Co., Ltd.](https://beyondjapan.com/blog/2017/07/headless-chrome-networks).
That's it.