[nginx] Explaining how to view, configure, and locate access logs

Hello everyone.
I'm Nakano from the Systems Solutions Department, and I spend my days feeling sleepy when I don't want to sleep, and unable to sleep when I do want to.
This time, I will be talking about "access logs," which you are certain to deal with regularly when operating and maintaining a web server.
In recent years, nginx has finally surpassed Apache in global market share, so I would like to explain how to locate, view, and configure nginx access logs.
Test environment
- Linux environment
  OS: AlmaLinux release 9.2 (VirtualBox 7.0.12 environment)
  Middleware: nginx (1:1.20.1-14.el9_2.1.alma.1), HTTP(80)
- Browser
  Chrome: 120.0.6099.217 (Official Build) (64-bit)
Test page
- Domain: example.com
  *Access requires modifying the hosts file because this is a local environment.
- HTML: index.html (for the top page), FAQ.html (FAQ page)
Location of nginx access logs and log examples
The default location for access logs is "/var/log/nginx/access.log".
If you just want to check the access logs quickly, I recommend opening them with the less command, which is light on resources.
less /var/log/nginx/access.log
1️⃣ URL:example.com (index.html) access log
192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
2️⃣ index.html (link within the page) → FAQ.html Access log
192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
I created a site in my local environment that accepts requests at example.com, and excerpted part of the log from when I accessed it with a browser.
These are the entries written when you access the top page (index.html) of example.com and then move from there to the FAQ page (FAQ.html).
The IP address and time at the start are easy to understand, but the rest may be harder to read, so I will explain each part by comparing it with the configuration.
Log format settings
The basic configuration file for nginx is "/etc/nginx/nginx.conf".
Within this, the "log_format" directive (setting) defines the format (style) of the access log.
* The destination for outputting the access log is also defined here.
less /etc/nginx/nginx.conf
~Excerpt~
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
The part "log_format main" defines the format name as "main".
Following that is the format (syntax) describing what to output, which consists of nginx variables plus hyphens, brackets, and other characters used to lay out the line.
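To make the structure of the "main" format concrete, here is a minimal Python sketch that splits one log line into the same fields. The regular expression and the function name parse_main_line are assumptions made for this illustration, not something nginx provides, and the sample line is access log 2️⃣ from above.

```python
import re

# Regex mirroring the "main" log_format shown above. The group names reuse
# the nginx variable names purely as labels for this sketch.
MAIN_LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main_line(line: str) -> dict[str, str]:
    """Split one "main"-format access log line into named fields."""
    match = MAIN_LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"line does not match the main format: {line!r}")
    return match.groupdict()


sample = (
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" '
    '200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"'
)
fields = parse_main_line(sample)
print(fields["request"], fields["status"], fields["body_bytes_sent"])
```

Each named group corresponds to one variable in the log_format definition, so a change to the format would need a matching change to the pattern.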
Log format explanation / Comparison table with Access Log 2️⃣
| Log format | Content | Value in access log 2️⃣ | Remarks |
| $remote_addr | Connecting IP address | 192.168.33.1 | This is the IP address that made the request directly, so if the request passes through a load balancer (LB), the IP address of the LB is recorded. |
| - | Separating hyphen | - | |
| $remote_user | Username specified for Basic authentication | - (empty) | Basic authentication is mostly used only during development and maintenance, so this is usually empty. |
| [$time_local] | Local time at which processing completed | [17/Jan/2024:08:50:33 +0000] | The "+0000" part indicates the time difference. "+0000" is UTC (Coordinated Universal Time), and "+0900" is JST (Japan Standard Time). |
| "$request" | Request details (method, request path, HTTP version) | "GET /FAQ.html HTTP/1.1" | This means that a GET (display) request for "FAQ.html" was received using "HTTP/1.1". |
| $status | Status code | 200 (success) | |
| $body_bytes_sent | Bytes sent to the client | 34 (bytes) | Number of bytes in the main data (body) portion, such as the contents of FAQ.html. |
| "$http_referer" | Referer (URL from which the access originated) | "http://example.com/" *Homepage | The FAQ was reached from the homepage. When the URL is specified directly, this is "-". |
| "$http_user_agent" | User agent (browser and OS information) | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | An access from a Chrome-based browser on a Windows (OS) system. |
| "$http_x_forwarded_for" | X-Forwarded-For (source IP address) | "-" | When routing through a proxy or load balancer, the source IP address before the proxy is displayed. |
There is a lot of information available
The above table can be summarized more simply as follows:
- IP : 192.168.33.1
- Username : None (= Not authenticated)
- Access time : 08:50:33 on January 17, 2024, UTC (Japan Standard Time is 9 hours ahead)
- Access : FAQ site page (FAQ.html)
- Connection status : Success (200)
- Data size : 34 B (bytes)
- Access source : From the homepage (http://example.com)
- Environment : Windows OS with a Chrome-based browser (as declared)
- Via a load balancer or proxy? : No (because the field is empty)
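The time conversion mentioned in the summary can be sketched in Python as follows. The timestamp is the one from access log 2️⃣; the function name time_local_to_jst is an assumption for this example, and the "%b" month parsing assumes an English (C) locale.

```python
from datetime import datetime, timedelta, timezone

# $time_local is written as day/month/year:hour:minute:second offset.
# Note: "%b" (abbreviated month name) depends on the process locale.
TIME_LOCAL_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# JST is a fixed offset of UTC+9 with no daylight saving time.
JST = timezone(timedelta(hours=9), name="JST")


def time_local_to_jst(value: str) -> datetime:
    """Parse a $time_local value (without the brackets) and convert it to JST."""
    parsed = datetime.strptime(value, TIME_LOCAL_FORMAT)
    return parsed.astimezone(JST)


jst_time = time_local_to_jst("17/Jan/2024:08:50:33 +0000")
print(jst_time.isoformat())  # 2024-01-17T17:50:33+09:00
```

Because the offset is part of the log entry, the same function also works for servers that log in a time zone other than UTC.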
As you can see, a lot of information can be obtained from access logs.
By aggregating this data in various ways, it becomes possible to investigate access trends and determine whether an access attempt is malicious.
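As one possible way to do this kind of aggregation, the following Python sketch counts requests per client IP, status code, and path. The sample lines are the two entries shown earlier with the user agent shortened for readability, and the function name summarize is an assumption for this example.

```python
from collections import Counter

# Hypothetical sample: the two entries from this article, user agent shortened.
SAMPLE_LINES = [
    '192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0" "-"',
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0" "-"',
]


def summarize(lines):
    """Count requests per client IP, status code, and request path."""
    by_ip, by_status, by_path = Counter(), Counter(), Counter()
    for line in lines:
        # Splitting on double quotes isolates the quoted fields:
        # parts[0] = 'IP - user [time] ', parts[1] = request, parts[2] = ' status bytes '
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not an access log entry in the expected layout
        head, request, tail = parts[0].split(), parts[1].split(), parts[2].split()
        if not head or not tail:
            continue
        by_ip[head[0]] += 1
        by_status[tail[0]] += 1
        # The path is the second token of "METHOD PATH VERSION".
        by_path[request[1] if len(request) >= 2 else parts[1]] += 1
    return by_ip, by_status, by_path


by_ip, by_status, by_path = summarize(SAMPLE_LINES)
print(by_ip.most_common(), by_status.most_common(), by_path.most_common())
```

To run this against a real log, the list could be replaced with the lines of /var/log/nginx/access.log; an unusually high count for one IP or many 4xx responses would be a starting point for further investigation.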
The default log format is very useful, so be sure to make use of it.
Terminology
Basic Authentication
This is a simple authentication function that requires the input of a predetermined username and password.
Because it is only a basic, minimal authentication method, it is intended for temporary use, such as during system setup or emergency maintenance.
In particular, it is vulnerable from a security standpoint over HTTP(80), because the authentication information is transmitted in plain text (unencrypted).
Even for temporary use, it is preferable that the site accept only HTTPS(443) communication.
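To illustrate why the credentials count as unencrypted, here is a small Python sketch of the Authorization header value that Basic authentication sends. The value is only Base64-encoded, so anyone who can read the HTTP traffic can decode it. The credentials "user" and "pass" and both function names are placeholders chosen for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value used by Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover the username and password from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    # Base64 is an encoding, not encryption: decoding needs no secret key.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# Placeholder credentials for illustration only.
header = basic_auth_header("user", "pass")
print(header)
print(decode_basic_auth(header))
```

Over HTTPS the header is still only Base64-encoded, but the whole request travels inside the encrypted connection.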
Referer
Refers to the URL of the page visited immediately before the page that was accessed.
If you open the homepage from a Google search, the Google URL is recorded in the log, and if you open the FAQ from the homepage of a site, the URL of the homepage is recorded in the log.
This term is actually a misspelling of the English word "referrer" (meaning: source of reference), but it has an interesting history in that the misspelling was adopted during the specification development process and is still used today.
HTTP status codes
This is the code that tells you the processing result when HTTP(S) communication is performed.
I'll omit the full list of codes to keep this concise, but the first digit (the hundreds place) is the important part.
- 2xx: Successful response
- 3xx: Redirect response
- 4xx: Client error response
- 5xx: Server error response
As shown above, the first digit of the number indicates the general category of the result.
Codes you will see often include 200 (success), 302 (temporary redirect), 404 (attempt to access a location that does not exist), and 503 (server unable to process the request).
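The grouping by leading digit can be expressed as a short Python sketch. The function name classify_status and the label strings are assumptions for this example, and codes outside the four classes listed above simply fall into "other".

```python
# Map the leading digit of a status code to the classes listed above.
STATUS_CLASSES = {
    "2": "success",
    "3": "redirect",
    "4": "client error",
    "5": "server error",
}


def classify_status(code: int) -> str:
    """Return the response class for an HTTP status code based on its first digit."""
    return STATUS_CLASSES.get(str(code)[0], "other")


for code in (200, 302, 404, 503):
    print(code, classify_status(code))
```

When scanning a log, filtering on the class (for example, everything classified as "server error") is often quicker than looking for individual codes.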
User-Agent
As a term, a user agent refers to "software used to communicate with a website."
Generally, websites are accessed using a browser, so by extension, it is treated as "information about the browser (and OS, etc.) that the user is using."
X-Forwarded-For
This is the item (header) that carries the source IP address when communication passes through an LB or proxy.
When something such as a load balancer (LB) or proxy sits between the client (user) and the web server, the web server records the IP address of the LB or proxy, but it does not know the IP address of the original client behind them.
Therefore, when traffic passes through LBs or proxies, saving the source IP address in X-Forwarded-For has become the de facto standard.
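As a rough sketch of how the two log fields relate, the following Python function picks the most likely client IP from $remote_addr and $http_x_forwarded_for. The function name and the sample addresses are assumptions for this example. It relies on the common convention that the header is a comma-separated list with the original client first, and the header can be forged by a client, so the result should only be trusted when your own LB or proxy sets it.

```python
def original_client_ip(remote_addr: str, x_forwarded_for: str) -> str:
    """Pick the most likely client IP for one access log entry.

    remote_addr corresponds to $remote_addr and x_forwarded_for to
    $http_x_forwarded_for as they appear in the log.
    """
    if x_forwarded_for in ("", "-"):
        # No LB or proxy added the header: the direct peer is the client.
        return remote_addr
    # By convention the leftmost entry is the original client and the
    # following entries are the proxies the request passed through.
    return x_forwarded_for.split(",")[0].strip()


# Direct access, as in the log examples in this article.
print(original_client_ip("192.168.33.1", "-"))
# Hypothetical request that passed through a proxy (addresses are made up).
print(original_client_ip("10.0.0.5", "203.0.113.7, 10.0.0.1"))
```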
Aside: Defining the name "main" in the log format
The reason for defining a name is that, when configuring log output, the log format to be used is specified by name.
less /etc/nginx/nginx.conf
~Partial excerpt~
access_log /var/log/nginx/access.log main;
The name is used in the "access_log" directive, which specifies the log output destination. Since the format is defined in one directive and used in another, a name is required to link the two.
This means you can set multiple definitions.
For example, you can define a simplified log format that leaves out unnecessary information as "easy", or conversely, if you want more detailed information, you can define a log format with more items (variables) as "detailed".
In other words, you can use different definitions for each domain and environment.
What happens if you don't specify a format name in the access_log directive?
Some of you may be working in an environment where the format name is not specified.
In this case, there are no problems with syntax checking or operation.
If no format name is specified in the access_log directive, the "combined" definition, which is built in but not written in the conf file, is used as the default setting.
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
The use of the above definition is described in the official nginx documentation.
In terms of content, the format is slightly different from the default "main" written in the conf file: it does not have "$http_x_forwarded_for" specified at the end.
Incidentally, this definition has the same name as the default "combined" definition in Apache, and the output is also in the same format.
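Since the only difference between the two formats is the trailing X-Forwarded-For field, one pattern with an optional last field can handle both. The following Python sketch shows the idea; the pattern, the function name detect_format, and the shortened sample lines are assumptions for this example.

```python
import re

# Shared layout of both formats. The final quoted field is optional, so the
# same pattern accepts "combined" (no X-Forwarded-For) and "main".
ACCESS_LOG_PATTERN = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
    r'(?: "(?P<http_x_forwarded_for>[^"]*)")?$'
)


def detect_format(line: str) -> str:
    """Return "main", "combined", or "unknown" for one access log line."""
    match = ACCESS_LOG_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return "unknown"
    # The optional group is None when the trailing field is absent.
    return "combined" if match.group("http_x_forwarded_for") is None else "main"


# Hypothetical sample lines with a shortened user agent.
combined_line = (
    '192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" '
    '200 37 "-" "Mozilla/5.0"'
)
main_line = combined_line + ' "-"'
print(detect_format(combined_line), detect_format(main_line))
```

This can be handy when log files from servers with and without an explicit format name end up in the same analysis.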
Lastly
We work with Apache logs often, and there is a lot of information available about them.
In comparison, I don't have many opportunities to work with nginx, so I thought it would be convenient to have the information compiled in one place, and that is why I wrote this article.
Personally, I find nginx's log format specification easier to understand than Apache's.
I hope this article provides some useful information to those who read it.
Thank you for reading this far.
*If you want to learn more about nginx, please also check out this blog:
[Super Beginner's Guide] Read this and you'll be all set! An explanation of NGINX that even beginners can understand.
Reference materials
Module ngx_http_log_module
https://nginx.org/en/docs/http/ngx_http_log_module.html
Module ngx_http_core_module
https://nginx.org/en/docs/http/ngx_http_core_module.html
The 'Basic' HTTP Authentication Scheme
https://datatracker.ietf.org/doc/html/rfc7617
Referer
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/Referer
HTTP response status codes
https://developer.mozilla.org/ja/docs/Web/HTTP/Status
User agent
https://developer.mozilla.org/ja/docs/Glossary/User_agent
X-Forwarded-For
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/X-Forwarded-For
