[nginx] Explaining how to view, configure, and locate access logs

Hello everyone.
I work in the System Solutions Department and I spend my days feeling sleepy when I don't want to sleep and unable to sleep when I do.
This time I will talk about "access logs", which you will inevitably deal with on a regular basis when operating and maintaining a web server.
nginx has overtaken Apache in global market share in recent years, so in this article I explain where nginx access logs are located, how to read them, and how to configure them.
Test environment
- Linux environment
OS: AlmaLinux release 9.2 (VirtualBox 7.0.12 environment)
Middleware: nginx (1:1.20.1-14.el9_2.1.alma.1), HTTP(80)
- Browser
Chrome: 120.0.6099.217 (Official Build) (64-bit)
Test Page
- Domain: example.com
* Because this is a local environment, the domain is resolved by editing the hosts file.
- HTML: index.html (top page), FAQ.html (FAQ page)
Location of nginx access logs and log examples
The default location of the access log is "/var/log/nginx/access.log".
To take a first look at the access log, I recommend opening it with the less command, which puts little load on the server.
less /var/log/nginx/access.log
1️⃣ URL:example.com (index.html) access log
192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
2️⃣ index.html (link within the page) → FAQ.html Access log
192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
I built a site in my local environment that accepts requests for example.com, and these are excerpts from the log produced when I accessed it from a browser.
The first line was written when I opened the top page (index.html) of example.com, and the second when I followed a link from there to the FAQ page (FAQ.html).
The IP address and time at the start are easy to read, but the rest is less obvious, so I will explain each item by comparing it with the corresponding setting.
Log format settings
The main configuration file for nginx is "/etc/nginx/nginx.conf".
In this file, the "log_format" directive (setting) defines the format of the access log.
* The same file also defines the output destination of the access log.
less /etc/nginx/nginx.conf
~Excerpt~
http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
The "log_format main" part names the format "main".
What follows defines the content to output: nginx variables, plus the hyphens, brackets, and quotation marks that shape the display.
Log format explanation / Comparison table with Access Log 2️⃣
| Log Format | Content | Value in access log 2️⃣ | Remarks |
| $remote_addr | IP address of the connecting client | 192.168.33.1 | This is the IP address that connected to nginx directly, so when a request arrives via a load balancer (LB), the LB's IP address is recorded. |
| - | Separator hyphen | - | |
| $remote_user | Username supplied for Basic authentication | - (empty) | Basic authentication is mostly used during development and maintenance, so this is usually empty. |
| [$time_local] | [Local time when processing completed + time zone] | [17/Jan/2024:08:50:33 +0000] | The "+0000" part is the offset from UTC. "+0000" is UTC (Coordinated Universal Time) and "+0900" is JST (Japan Standard Time). |
| "$request" | Request line (method, request path, HTTP version) | "GET /FAQ.html HTTP/1.1" | A "GET" (retrieve) request for "/FAQ.html" was received over "HTTP/1.1". |
| $status | Status code | 200 (success) | |
| $body_bytes_sent | Bytes sent to the client | 34 (bytes) | Size of the response body (here, FAQ.html), not counting the response headers. |
| "$http_referer" | Referer (URL of the page the request came from) | "http://example.com/" *Top page | The FAQ page was reached from the top page. "-" (empty) means no Referer was sent, for example when the URL was entered directly. |
| "$http_user_agent" | User agent (browser/OS information) | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | Access from a Chrome-based browser on Windows. |
| "$http_x_forwarded_for" | X-Forwarded-For (source IP) | "-" | When the request passes through a proxy or load balancer, the original source IP address appears here. |
That is a lot of information for a single line.
The above table can be summarized more simply as follows:
- IP : 192.168.33.1
- Username : None (= not authenticated)
- Access time : 08:50:33 UTC on January 17, 2024 (add 9 hours for Japan time)
- Access : FAQ site page (FAQ.html)
- Connection Status : Successful (200)
- Data volume : 34 B (bytes)
- Access source : From the top page (http://example.com)
- Environment : Windows OS using Chrome-based browser (as declared)
- Is it via LB or Proxy ?: No (because it is empty)
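As a rough illustration of the comparison above, the following Python sketch splits a "main" format line into the same fields with a regular expression. The pattern, the group names, and the parse_main_line helper are my own assumptions rather than anything shipped with nginx, and the timestamp parsing assumes an English (C) locale so that "Jan" is recognized as a month name.

```python
import re
from datetime import datetime, timedelta, timezone

# Regex mirroring the "main" log_format; group names are labels chosen for this sketch.
MAIN_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main_line(line):
    """Return a dict of fields for one "main" format line, or None if it does not match."""
    match = MAIN_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    # $time_local looks like 17/Jan/2024:08:50:33 +0000
    fields["time"] = datetime.strptime(fields["time_local"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


# Access log 2 from this article, split across string literals only for readability.
SAMPLE = (
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 '
    '"http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"'
)

entry = parse_main_line(SAMPLE)
JST = timezone(timedelta(hours=9), "JST")
print(entry["remote_addr"], entry["status"], entry["body_bytes_sent"])
print(entry["time"].astimezone(JST).isoformat())  # the same instant expressed in Japan time
```

A regular expression is used here instead of a plain split on spaces because the user agent and the timestamp both contain spaces.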
As you can see, a lot of information can be obtained from access logs.
By aggregating this information in various ways, it is possible to investigate access trends and whether or not access is malicious.
The default log format is very useful as it is, so be sure to make use of it.
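To give a feel for that kind of aggregation, here is a minimal Python sketch that counts requests per client IP, per status code, and per request path. The summarize function is a hypothetical helper of mine, and the third sample line (the 404) is invented for the example; only the first two lines come from the log excerpts in this article.

```python
from collections import Counter


def summarize(lines):
    """Count requests per IP, status code, and path for "main"/"combined" style lines."""
    by_ip, by_status, by_path = Counter(), Counter(), Counter()
    for line in lines:
        # Splitting on double quotes gives:
        #   parts[0] = 'IP - user [time] ', parts[1] = request line, parts[2] = ' status bytes '
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not in the expected shape, skip it
        head = parts[0].split()
        request = parts[1].split()
        tail = parts[2].split()
        if not head or not tail:
            continue
        by_ip[head[0]] += 1
        by_status[tail[0]] += 1
        if len(request) >= 2:
            by_path[request[1]] += 1
    return by_ip, by_status, by_path


UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
LINES = [
    f'192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "{UA}" "-"',
    f'192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 '
    f'"http://example.com/" "{UA}" "-"',
    # Hypothetical line, not from the article: a request for a page that does not exist.
    f'192.168.33.1 - - [17/Jan/2024:08:51:02 +0000] "GET /missing.html HTTP/1.1" 404 153 "-" "{UA}" "-"',
]

by_ip, by_status, by_path = summarize(LINES)
print(by_ip.most_common(3))
print(by_status.most_common())
print(by_path.most_common(3))
```

On a real server, the same function could be fed an open file object for /var/log/nginx/access.log in place of LINES. An IP with an unusually high count, or a burst of 4xx codes, would be a starting point for a closer look, not proof of malicious access.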
Terminology
Basic Authentication
This is a simple authentication function that requires the entry of a predetermined username and password.
Because it is a very basic and simple function, it is used for temporary purposes such as during construction or emergency maintenance.
In particular, it is weak from a security standpoint over HTTP (80), because the credentials are sent without encryption (they are only Base64-encoded).
Even for temporary use, it is advisable to serve the site over HTTPS (443) only.
Referer
The Referer is the URL of the page the visitor was on immediately before the page that was accessed.
If you open a site's home page from a Google search, the Google URL is recorded in the log, and if you open the FAQ from the home page of a site, the home page URL is recorded.
The term is actually a misspelling of the English word "referrer" (meaning: the source of a reference). The misspelling was adopted as is when the specification was written, and it has been used that way ever since.
HTTP status codes
This code tells you the result of processing an HTTP(S) request.
It would be too long to list them all here, but the first digit (the hundreds place) is the important one.
- 2xx: Successful response
- 3xx: Redirect response
- 4xx: Client error response
- 5xx: Server error response
As shown above, the first digit gives you a general idea of the outcome.
The codes you will see most often are 200 (success), 302 (temporary redirect), 404 (the requested resource does not exist),
and 503 (the server is temporarily unable to handle the request).
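The grouping above can be expressed in a few lines of Python. This is only a sketch: status_class is a hypothetical helper name, and it reads the class from the hundreds digit of the code.

```python
# Class labels follow the list above; anything else is reported as "other".
STATUS_CLASSES = {
    2: "successful response",
    3: "redirect response",
    4: "client error response",
    5: "server error response",
}


def status_class(code):
    """Return the general class of an HTTP status code based on its hundreds digit."""
    return STATUS_CLASSES.get(int(code) // 100, "other")


for code in (200, 302, 404, 503):
    print(code, status_class(code))
```

Accepting either an int or a numeric string is a small convenience, since values read from a log line are strings.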
User-Agent
The term user agent refers to "the software used to communicate with a website."
Generally, websites are accessed using a browser, so by extension, it is treated as "information about the browser (and OS, etc.) that the user is using."
X-Forwarded-For
This is the header that carries the source IP when a request passes through an LB or proxy.
When an LB or proxy sits between the client (user) and the web server, the web server records the IP of the LB or proxy and cannot see the IP of the client behind it.
For that reason, storing the original source IP in the X-Forwarded-For header has become the de facto standard when traffic goes through an LB or proxy.
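When reading logs, this means the likely client address comes from X-Forwarded-For when it is present and from the connecting IP otherwise. The Python sketch below shows one way to apply that rule; client_ip is a hypothetical helper, and the addresses in the example are documentation placeholders, not values from my environment. Note that the header is supplied by the sender and can be forged, so it should only be relied on when the request is known to have come through your own LB or proxy.

```python
def client_ip(remote_addr, x_forwarded_for):
    """Pick the likely original client IP for one log entry."""
    # nginx logs "-" when the header was not sent at all.
    if not x_forwarded_for or x_forwarded_for == "-":
        return remote_addr
    # With several hops the header holds a comma-separated list;
    # by convention the leftmost entry is the original client.
    first = x_forwarded_for.split(",")[0].strip()
    return first or remote_addr


# Direct access, as in the log excerpts in this article.
print(client_ip("192.168.33.1", "-"))
# Hypothetical access via an LB at 10.0.0.5 from a client at 203.0.113.7.
print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9"))
```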
Aside: Defining the name "main" in the log format
Why does the format need a name? Because the log output setting refers to the log format by name.
less /etc/nginx/nginx.conf
~Partial excerpt~
access_log  /var/log/nginx/access.log  main;
The name is used by the "access_log" directive, which specifies where the log is written. Because the format is defined in one place and used in another, it needs a name.
This also means you can define multiple formats.
For example, you can define a simplified format with the unnecessary items removed as "easy", or, if you want more detail, a format with more items (variables) as "detailed".
You can then use a different format for each domain or environment.
What happens if you don't specify a format name in the access_log directive?
Some of you may be working in an environment where no format name is specified.
In that case, nginx still passes the syntax check and runs without problems.
When the access_log directive does not specify a format name, nginx uses the built-in "combined" format, which is predefined and does not appear in the conf file.
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
This behavior is described in the official nginx documentation.
The layout differs slightly from the "main" format written in the conf by default, and there is no "$http_x_forwarded_for" at the end.
Incidentally, it shares its name with the "combined" log format in Apache, and the output is the same as well.
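Since the two formats differ only in that one trailing field, a log line can be checked for it to guess which format produced it. The following Python sketch does that; the pattern and the detect_format helper are hypothetical, and the result is only a heuristic, because a custom log_format could produce lines of the same shape. The "combined" sample is constructed by dropping the last field from access log 2️⃣, not taken from a real log.

```python
import re

# Same fields as "combined", with the trailing X-Forwarded-For field made optional.
LINE_PATTERN = re.compile(
    r'^\S+ - \S+ \[[^\]]+\] "[^"]*" \d{3} \d+ "[^"]*" "[^"]*"'
    r'(?: "(?P<http_x_forwarded_for>[^"]*)")?$'
)


def detect_format(line):
    """Guess "main" or "combined" from the presence of the trailing field; None if neither."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    return "combined" if match.group("http_x_forwarded_for") is None else "main"


MAIN_LINE = (
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 '
    '"http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"'
)
# Constructed for illustration: the same line without the final "$http_x_forwarded_for" field.
COMBINED_LINE = MAIN_LINE[: -len(' "-"')]

print(detect_format(MAIN_LINE))
print(detect_format(COMBINED_LINE))
```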
Lastly
Apache logs are something we work with often, and plenty of information about them is available.
By comparison, I have fewer opportunities to work with nginx, so I thought it would be handy to gather the information in one place, which is why I wrote this article.
Personally, I find nginx's log format specification easier to understand than Apache's.
I hope this article will be of some use to those who read it.
Thank you for reading this far.
*If you want to know more about nginx, please also check out this blog:
[Super Beginner] Just read this! NGINX explanation that even beginners can understand
Reference materials
Module ngx_http_log_module
https://nginx.org/en/docs/http/ngx_http_log_module.html
Module ngx_http_core_module
https://nginx.org/en/docs/http/ngx_http_core_module.html
The 'Basic' HTTP Authentication Scheme
https://datatracker.ietf.org/doc/html/rfc7617
Referer
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/Referer
HTTP response status code
https://developer.mozilla.org/ja/docs/Web/HTTP/Status
User agent
https://developer.mozilla.org/en/docs/Glossary/User_agent
X-Forwarded-For
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/X-Forwarded-For