[nginx] How to view, configure, and locate access logs
Hello everyone.
I am a member of the System Solutions Department and spend my days getting sleepy when I don't want to, and having trouble sleeping when I want to.
When it comes to operating and maintaining web servers, one thing we inevitably deal with is the access log.
In this article I will explain how to locate, read, and configure the access log of nginx, which in recent years has finally surpassed Apache in global market share.
Test environment
- Linux environment
OS: AlmaLinux release 9.2 (VirtualBox 7.0.12 environment)
Middleware: nginx (1:1.20.1-14.el9_2.1.alma.1), HTTP(80)
- Browser
Chrome: 120.0.6099.217 (Official Build) (64 bit)
Test page
- Domain: example.com
* Because this is a local environment, the domain is reached by rewriting the hosts file.
- HTML: index.html (top page), FAQ.html (FAQ page)
Nginx access log location and log examples
The default access log location is "/var/log/nginx/access.log".
To take a first look at the access log, we recommend opening it with the less command.
less /var/log/nginx/access.log
1️⃣ URL:example.com (index.html) Access log
192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
2️⃣ index.html (internal link) → FAQ.html Access log
192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
I built a site in my local environment that answers to example.com, accessed it from a browser, and excerpted part of the resulting log.
The two entries above were recorded when I opened the top page (index.html) of example.com and then followed the link from it to the FAQ page (FAQ.html).
The leading IP address and timestamp are easy to read, but the remaining fields may be less obvious, so I will explain them by comparing them with the configuration items.
About the log format setting (log_format)
The basic configuration file for nginx is "/etc/nginx/nginx.conf".
In this, the "log_format" directive (setting) defines the format of the access log.
*The access log output destination is also defined.
less /etc/nginx/nginx.conf
~Excerpt~
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
The "log_format main" part gives this format the name "main".
What follows is the format itself: nginx variables, arranged with literal hyphens, square brackets, quotation marks, and so on to shape the output.
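To make the mapping between the format string and a log line concrete, here is a minimal Python sketch that splits one "main"-format line into named fields. The regular expression and the function name parse_main_line are my own illustration (nginx does not ship them), and the sample is access log 2️⃣ quoted earlier.

```python
import re

# Illustrative pattern mirroring the "main" log_format; the group names
# reuse the nginx variable names without the leading "$".
MAIN_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main_line(line):
    """Return a dict of fields for one "main"-format line, or None if it does not match."""
    match = MAIN_PATTERN.match(line)
    return match.groupdict() if match else None


# Access log 2 from this article (index.html -> FAQ.html).
sample = (
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" '
    '200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"'
)

fields = parse_main_line(sample)
for name, value in fields.items():
    print(f"{name}: {value}")
```

If you change the log_format definition, a pattern like this has to be changed to match, so treat it as a starting point rather than a general-purpose parser.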
Log format explanation / Comparison table with access log 2️⃣
log format | Content | Access log 2️⃣ value | Remarks |
$remote_addr | Connected IP address | 192.168.33.1 | In this test the client connected directly, so the client's IP is recorded. When the request goes through an LB, the LB's IP is recorded instead. |
- | Separator hyphen | - | |
$remote_user | Username specified for Basic authentication | - (blank) | Basic authentication is mostly used during development and maintenance, so this field is usually empty. |
[$time_local] | [Local time at the time of processing completion + time zone] | [17/Jan/2024:08:50:33 +0000] | The "+0000" part is the offset from UTC. "+0000" is UTC (Coordinated Universal Time) and "+0900" is JST (Japan Standard Time). |
"$request" | "Request content" (method, request path, HTTP version) | "GET /FAQ.html HTTP/1.1" | A "GET" (retrieve for display) request for "FAQ.html" was received over "HTTP/1.1". |
$status | Status code | 200 (success) | |
$body_bytes_sent | Number of bytes sent to the client | 34 (bytes) | Size in bytes of the response body (here, FAQ.html). |
"$http_referer" | "Referrer" (URL the access came from) | "http://example.com/" *Top page | The FAQ was reached from the top page. If the value is "-" (blank), the URL was accessed directly. |
"$http_user_agent" | "User agent" (browser/OS information) | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | The access came from a Chrome browser on Windows (OS). |
"$http_x_forwarded_for" | "X-Forwarded-For" (source IP) | "-" | When the request goes through a proxy or LB, the original source IP in front of it is shown. |
There is a lot of information available
The above table can be summarized more concisely as follows.
- IP : 192.168.33.1
- Username : None (=not authenticated)
- Access time : 08:50:33 on January 17, 2024, UTC (add 9 hours for Japan time)
- Access destination : FAQ site page (FAQ.html)
- Connection status : Success (200)
- Data amount : 34 B (bytes)
- Access source : From the top page (http://example.com)
- Environment : Windows OS using Chrome browser (declared)
- Via LB or Proxy : No (the X-Forwarded-For field is empty)
In this way, you can get quite a lot of information from the access log.
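The "+9 hours in Japan time" conversion can be sketched in Python as follows. This is only an illustration: the helper name parse_time_local is hypothetical, the timestamp is the $time_local value from access log 2️⃣ quoted earlier, and the parsing assumes the default English month abbreviations.

```python
from datetime import datetime, timedelta, timezone

# Fixed +09:00 offset for Japan Standard Time (no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")


def parse_time_local(value):
    """Parse an nginx $time_local string such as '17/Jan/2024:08:50:33 +0000'."""
    # %b matches the month abbreviation; this assumes the C/English locale.
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")


utc_time = parse_time_local("17/Jan/2024:08:50:33 +0000")
jst_time = utc_time.astimezone(JST)
print(jst_time.isoformat())  # 2024-01-17T17:50:33+09:00
```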
By aggregating these fields across many requests, you can investigate access trends and check whether there has been any malicious access.
The default log format is very convenient, so please make use of it.
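As a rough idea of what such aggregation can look like, the following Python sketch counts requests per client IP and per status code. The function name count_by_ip_and_status is hypothetical, the input is just the two log entries quoted in this article, and the splitting logic assumes the default "main" format.

```python
from collections import Counter


def count_by_ip_and_status(lines):
    """Count requests per client IP and per status code for "main"-format lines."""
    ip_counts = Counter()
    status_counts = Counter()
    for line in lines:
        # Splitting on double quotes gives: [prefix, request, " status bytes ", ...]
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not a line in the expected format
        prefix = parts[0].split()
        after_request = parts[2].split()
        if not prefix or not after_request:
            continue
        ip_counts[prefix[0]] += 1
        status_counts[after_request[0]] += 1
    return ip_counts, status_counts


# The two entries quoted in this article.
log_lines = [
    '192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"',
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 '
    '"http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"',
]

ip_counts, status_counts = count_by_ip_and_status(log_lines)
print(ip_counts.most_common(5))
print(status_counts.most_common())
```

On a real server you would pass the lines of /var/log/nginx/access.log instead of the hard-coded list. An IP with an unusually high count, or a sudden rise in 4xx/5xx codes, is a reasonable place to start looking.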
Terminology explanation
Basic Authentication
This is a simple authentication mechanism that asks for a predefined user name and password.
Because it is so minimal, it is typically used for temporary purposes such as during site construction or emergency maintenance.
Note that over HTTP (80) the credentials are sent without encryption (they are only Base64-encoded), so security is weak.
Even for temporary use, it is desirable to serve the site over HTTPS (443) only.
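To illustrate why Basic authentication over plain HTTP is weak, here is a small Python sketch showing that the Authorization header value is only Base64-encoded and can be decoded by anyone who sees it. The function names are hypothetical, and the credentials are the well-known example from RFC 7617, not real ones.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authentication header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# Example credentials taken from RFC 7617.
header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```

No key is needed to reverse the encoding, which is why the connection itself has to be encrypted with HTTPS.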
Referer
This is the URL of the page that contained the link to the page being accessed.
If you open the home page from a Google search, the Google URL will be recorded in the log, and if you open the FAQ from the site's home page, the URL of the home page will be recorded in the log.
The name is actually a misspelling of the English word "referrer". It has an interesting history: the misspelling made its way into the specification when it was being written, and it has been used that way ever since.
HTTP status codes
This is a code that tells you the processing result of an HTTP(S) request.
Listing every code would take too long, so I will omit that, but the first digit (the hundreds place) is the important part.
- 2xx: Success response
- 3xx: Redirect response
- 4xx: Client error response
- 5xx: Server error response
As shown above, the first digit gives a rough idea of what happened.
The codes you will see most often are 200 (success), 302 (temporary redirect), 404 (the requested resource does not exist), and 503 (the server is temporarily unable to process the request).
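The grouping above can be expressed as a short Python sketch. The function name status_class is hypothetical, and the 1xx (informational) class is included for completeness even though it is not listed above.

```python
def status_class(status):
    """Map an HTTP status code (int or str) to its response class."""
    classes = {
        1: "Informational response",  # not listed above, added for completeness
        2: "Success response",
        3: "Redirect response",
        4: "Client error response",
        5: "Server error response",
    }
    # Integer division by 100 keeps only the hundreds digit.
    return classes.get(int(status) // 100, "Unknown")


for code in ("200", "302", "404", "503"):
    print(code, status_class(code))
```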
User-Agent
User agent as a term refers to "the software used to communicate with a website."
Generally, websites are accessed using a browser, so this information is treated as "information about the browser the user is using (along with information about the OS, etc.)".
X-Forwarded-For
This is a header that carries the original source IP when communication passes through an LB or proxy.
When an LB or proxy sits between the client (user) and the web server, the web server records the IP of the LB or proxy and has no way of knowing the IP of the client in front of it.
For that reason, when communicating via an LB or proxy, storing the source IP in the X-Forwarded-For header has become a de facto standard.
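When analyzing logs, this usually means preferring the X-Forwarded-For value over $remote_addr when it is present. The Python sketch below shows one way to do that. The function name and the sample addresses (documentation-only IP ranges) are hypothetical, and it assumes the leftmost entry is the original client, which only holds if every LB or proxy in the chain is trusted, because the header can be set freely by the client.

```python
def client_ip_from_log(remote_addr, x_forwarded_for):
    """Pick the most likely client IP from the two logged fields.

    Assumes the leftmost X-Forwarded-For entry is the original client.
    """
    if x_forwarded_for in ("", "-"):
        # Empty field: the client connected directly, as in this article's logs.
        return remote_addr
    # The header may contain a comma-separated chain: client, proxy1, proxy2, ...
    return x_forwarded_for.split(",")[0].strip()


# Direct access, as in the test environment of this article.
print(client_ip_from_log("192.168.33.1", "-"))
# Hypothetical access through an LB (addresses are from documentation ranges).
print(client_ip_from_log("198.51.100.2", "203.0.113.7, 198.51.100.2"))
```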
Side note: Regarding defining the name “main” in the log format
Why define a name? Because when you configure log output, you specify which log format to use by its name.
less /etc/nginx/nginx.conf
~Excerpt~
access_log /var/log/nginx/access.log main;
The name is used in the "access_log" directive, which specifies the log output destination. Because the place where a format is defined and the place where it is used are separate, the name is needed to connect them.
In other words, multiple formats can be defined.
For example, you can define a simplified log format named "easy" that leaves out unnecessary information, or conversely, if you want more detail, you can define a format with more log items (variables) named "detailed".
Therefore, you can use different definitions for each domain and environment.
What happens if you don't specify a format name in the access_log directive?
Some readers may be thinking, "My environment does not specify a format name."
That is fine: it causes no problem for the syntax check or for operation.
If no format name is specified in the access_log directive, nginx uses the predefined "combined" format, which is built in and does not appear in the conf file.
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
The nginx official documentation states that the definition above is used.
It differs slightly from "main", which is written in the conf by default, in that "$http_x_forwarded_for" is not included at the end.
Incidentally, this definition has the same name and the same output content as the "combined" format in Apache.
Lastly
We work with Apache logs often, and there is plenty of information available about them.
By comparison, I have fewer opportunities to work with nginx, so I thought it would be handy to have the information gathered in one place and decided to write this article.
Personally, I like nginx's log format because it is easier to understand than Apache's log format specification.
I hope this article provides some useful knowledge to those who read it.
Thank you for reading this far.
*If you want to know more about nginx, please also check out this blog.
[Super beginner] Just read this! NGINX explanation that even beginners can understand
Reference materials
Module ngx_http_log_module
https://nginx.org/en/docs/http/ngx_http_log_module.html
Module ngx_http_core_module
https://nginx.org/en/docs/http/ngx_http_core_module.html
The 'Basic' HTTP Authentication Scheme
https://datatracker.ietf.org/doc/html/rfc7617
Referer
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/Referer
HTTP response status code
https://developer.mozilla.org/ja/docs/Web/HTTP/Status
User agent
https://developer.mozilla.org/ja/docs/Glossary/User_agent
X-Forwarded-For
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/X-Forwarded-For