[nginx] Explaining how to view, configure, and locate access logs

Hello everyone.
I'm Nakano from the Systems Solutions Department, and I spend my days feeling sleepy when I don't want to sleep, and unable to sleep when I do want to.

This time, I will be talking about "access logs," which you will inevitably work with on a regular basis when operating and maintaining a web server.

In recent years, nginx has finally surpassed Apache in global market share, so in this article I would like to explain how to view, configure, and locate nginx access logs.

Test environment

  • Linux environment
    OS: AlmaLinux release 9.2 (VirtualBox 7.0.12 environment)
    Middleware: nginx (1:1.20.1-14.el9_2.1.alma.1), HTTP(80)
  • Browser:
    Chrome: 120.0.6099.217 (Official Build) (64-bit)

Test Page

  • Domain: example.com
    *Access requires modifying the hosts file as it's a localhost environment.
  • HTML: index.html (for the top page), FAQ.html (FAQ page)

Location of nginx access logs and log examples

The default location for access logs is "/var/log/nginx/access.log".
If you just want to quickly check the access logs, it's recommended to open them using the less command, which is less resource-intensive.

less /var/log/nginx/access.log

1️⃣ URL:example.com (index.html) access log

192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"

2️⃣ index.html (link within the page) → FAQ.html Access log

192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"

I created a site in my local environment that accepts requests at example.com, and excerpted some of the logs recorded when I accessed it from a browser.

These are the logs output when you access the top page (index.html) of example.com and then follow the link from there to the FAQ page (FAQ.html).

The IP address and time at the start are easy to understand, but the rest may be harder to read, so I will explain each part by comparing it with the configuration items.

Log format settings

The main configuration file for nginx is "/etc/nginx/nginx.conf".

Within this, the "log_format" directive (setting) defines the format (style) of the access log.
* The destination for outputting the access log is also defined here.

less /etc/nginx/nginx.conf

~Excerpt~
http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

The part "log_format main" defines the format name as "main".
Following that is the format (syntax) specifying what content to output, consisting of nginx variables plus hyphens, brackets, and other characters for formatting the display.
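To make the mapping between the format and an actual log line concrete, here is a minimal Python sketch that splits one "main"-format line into its variables. The regular expression and the function name parse_main_line are my own illustration (they are not part of nginx), and the sketch assumes the log_format shown above is used unchanged.

```python
import re

# Regex mirroring the "main" log_format; the group names simply reuse the
# nginx variable names. This pattern is an illustrative assumption.
MAIN_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main_line(line):
    """Return a dict of fields for one "main"-format line, or None if it does not match."""
    match = MAIN_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared and summed later.
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


# Access log 2 from this article (the FAQ.html request).
SAMPLE = (
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" '
    '200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"'
)

if __name__ == "__main__":
    for name, value in parse_main_line(SAMPLE).items():
        print(f"{name}: {value}")
```

If you change the log_format in your own environment, the pattern needs to be adjusted to match, so please treat this as a starting point only.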

Log format explanation / Comparison table with Access Log 2️⃣

Log format | Content | Value in access log 2️⃣ | Remarks
--- | --- | --- | ---
$remote_addr | Connecting IP address | 192.168.33.1 | This is the IP address that made the direct request, so if the request goes through a load balancer (LB), the IP address of the LB is recorded.
- | Separating hyphen | - |
$remote_user | Username specified for Basic authentication | - (blank) | Basic authentication is mostly used during development and maintenance, so this is usually empty.
[$time_local] | Local time when processing completed | [17/Jan/2024:08:50:33 +0000] | The "+0000" part indicates the time offset. "+0000" is UTC (Coordinated Universal Time), and "+0900" is JST (Japan Standard Time).
"$request" | Request details (method, request path, HTTP version) | "GET /FAQ.html HTTP/1.1" | This means that a GET (display) request for "FAQ.html" was received using "HTTP/1.1".
$status | Status code | 200 | Success
$body_bytes_sent | Bytes sent to client | 34 | In bytes. The number of bytes in the main data (body) portion, such as FAQ.html.
"$http_referer" | Referrer (URL from which the access originated) | "http://example.com/" | This is the homepage, so the access went homepage ⇨ FAQ. When the URL is specified directly, the value is empty ("-").
"$http_user_agent" | User agent (browser and OS information) | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | Access from a Chrome-based browser on a Windows (OS) system.
"$http_x_forwarded_for" | X-Forwarded-For (source IP address) | "-" | When routing through a proxy or load balancer, the source IP address before the proxy is displayed.
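The time offset in $time_local can be handled mechanically. As a minimal sketch, the following converts the timestamp of access log 2️⃣ from UTC to JST using only the Python standard library. The function name to_jst is hypothetical, and the sketch assumes the default (English) locale so that the month name "Jan" is recognized.

```python
from datetime import datetime, timedelta, timezone

# strptime pattern for the $time_local layout, e.g. "17/Jan/2024:08:50:33 +0000".
# %b (month abbreviation) depends on the locale; the default C/English locale is assumed.
TIME_LOCAL_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# JST is a fixed offset of +09:00 from UTC.
JST = timezone(timedelta(hours=9), "JST")


def to_jst(time_local):
    """Parse a $time_local value (without the brackets) and convert it to JST."""
    parsed = datetime.strptime(time_local, TIME_LOCAL_FORMAT)
    return parsed.astimezone(JST)


if __name__ == "__main__":
    print(to_jst("17/Jan/2024:08:50:33 +0000").isoformat())
```

Because the offset is parsed together with the time, a value that is already recorded as "+0900" comes out unchanged.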

There is a lot of information available

The above table can be summarized more simply as follows:

  • IP : 192.168.33.1
  • Username : None (= Not authenticated)
  • Access time : 08:50:33 on January 17, 2024, UTC (17:50:33 in Japan Standard Time, which is 9 hours ahead)
  • Access : FAQ site page (FAQ.html)
  • Connection status : Success (200)
  • Data size : 34 B (bytes)
  • Access source : From the homepage (http://example.com)
  • Environment : Windows OS with a Chrome-based browser (as declared)
  • Via a load balancer or proxy : No (because the value is empty)

As you can see, a lot of information can be obtained from access logs.
By aggregating this data in various ways, it becomes possible to investigate access trends and determine whether an access attempt is malicious.

The default log format is very useful, so be sure to make use of it.
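As a rough idea of what such aggregation can look like, here is a small Python sketch that counts requests per IP address, status code, and request path. The function name summarize is my own, the first two sample lines are the logs from this article with the User-Agent shortened, and the third line is a made-up request added purely for illustration.

```python
from collections import Counter

# Lines 1-2: the article's examples with a shortened User-Agent.
# Line 3: an invented request (documentation-range IP) for illustration only.
SAMPLE_LINES = [
    '192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0" "-"',
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0" "-"',
    '203.0.113.5 - - [17/Jan/2024:08:51:02 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0" "-"',
]


def summarize(lines):
    """Count requests per client IP, per status code, and per request path."""
    by_ip, by_status, by_path = Counter(), Counter(), Counter()
    for line in lines:
        # Splitting on double quotes gives: [prefix, request, " status bytes ", ...]
        parts = line.split('"')
        head = parts[0].split()
        if len(parts) < 3 or not head:
            continue  # skip lines that do not look like access log entries
        tail = parts[2].split()
        if not tail:
            continue
        request = parts[1].split()
        by_ip[head[0]] += 1
        by_status[tail[0]] += 1
        # A well-formed request is "METHOD PATH VERSION"; otherwise keep it raw.
        by_path[request[1] if len(request) >= 2 else parts[1]] += 1
    return by_ip, by_status, by_path


if __name__ == "__main__":
    ips, statuses, paths = summarize(SAMPLE_LINES)
    for label, counter in (("IP", ips), ("Status", statuses), ("Path", paths)):
        for key, count in counter.most_common():
            print(f"{label}: {key} -> {count}")
```

For example, a single IP address producing many 404 responses for paths that do not exist on your site is one pattern worth a closer look.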

Terminology

Basic Authentication

This is a simple authentication function that requires the input of a predetermined username and password. Because it is only a basic, minimal authentication method, it is intended for temporary use, such as during system setup or emergency maintenance.

In particular, over HTTP (80) the authentication information is transmitted in plain text (unencrypted), which makes it vulnerable from a security standpoint.
Even for temporary use, it is preferable that the site uses only HTTPS (443) communication.
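To show why unencrypted Basic authentication is risky, here is a minimal Python sketch of how the credentials are put into the Authorization header. They are only Base64-encoded, so anyone who can see the header can decode them. The two function names are hypothetical, and the credentials are the example pair from RFC 7617.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)
    # No key or secret is needed to reverse the encoding.
    print(decode_basic_auth(header))
```

Over HTTPS the whole header is encrypted in transit, which is why HTTPS is preferable even for temporary use.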

Referer

Refers to the URL of the page visited immediately before the page that was accessed.

If you open the homepage from a Google search, the Google URL is recorded in the log, and if you open the FAQ from the homepage of a site, the URL of the homepage is recorded in the log.

This term is actually a misspelling of the English word "referrer" (meaning: source of reference), but it has an interesting history in that the misspelling was adopted during the specification development process and is still used today.
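As a small illustration of how the recorded Referer can be used, the following Python sketch classifies a logged value as coming from the same site, from another site, or as missing. The function name referer_kind and the labels it returns are my own assumptions, not anything defined by nginx.

```python
from urllib.parse import urlsplit


def referer_kind(referer, own_host):
    """Classify a logged Referer value relative to our own host name."""
    if referer in ("", "-"):
        # nginx logs "-" when the request carried no Referer header.
        return "none"
    host = urlsplit(referer).hostname
    return "internal" if host == own_host else "external"


if __name__ == "__main__":
    for value in ("-", "http://example.com/", "https://www.google.com/"):
        print(value, "->", referer_kind(value, "example.com"))
```

Note that the Referer is sent by the client, so it can be omitted or altered and should be treated as a hint rather than proof.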

HTTP status codes

This is the code that tells you the processing result when HTTP(S) communication is performed.
I'll omit the full list of codes to keep it concise, but the first digit (the hundreds place) is the important part.

  • 2xx: Successful response
  • 3xx: Redirect response
  • 4xx: Client error response
  • 5xx: Server Error Response

As shown above, the first digit of the number generally indicates the status.

Common status codes you'll see include 200 (success), 302 (temporary redirect), 404 (the requested resource does not exist), and 503 (server unable to process the request).
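The grouping by the hundreds digit is easy to express in code. Here is a minimal Python sketch; the function name status_class is hypothetical, and the 1xx entry (informational responses) is added for completeness even though it is not in the list above.

```python
def status_class(code):
    """Return the response class of an HTTP status code based on its hundreds digit."""
    classes = {
        1: "Informational response",
        2: "Successful response",
        3: "Redirect response",
        4: "Client error response",
        5: "Server error response",
    }
    # Integer division by 100 keeps only the hundreds digit (404 -> 4).
    return classes.get(code // 100, "Unknown")


if __name__ == "__main__":
    for code in (200, 302, 404, 503):
        print(code, "->", status_class(code))
```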

User-Agent

As a term, a user agent refers to "software used to communicate with a website."

Generally, websites are accessed using a browser, so by extension, it is treated as "information about the browser (and OS, etc.) that the user is using."

X-Forwarded-For

This is the item (header) that carries the source IP address when communication passes through an LB or proxy.

When something sits between the client (user) and the web server, such as a load balancer (LB) or proxy, the web server records the IP address of the LB or proxy, but it does not know the IP address of the original client behind them.

Therefore, when traffic passes through LBs or proxies, storing the source IP address in the X-Forwarded-For header has become the de facto standard.
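When several proxies are involved, the header holds a comma-separated list with the original client first. As a minimal sketch, the following Python function picks the likely client IP from the two logged values. The function name client_ip is hypothetical, and the sketch assumes every proxy in front of the server is trusted, because the header can be set freely by the client.

```python
def client_ip(remote_addr, x_forwarded_for):
    """Pick the most likely original client IP from the logged values.

    remote_addr is the direct peer ($remote_addr) and x_forwarded_for is the
    logged $http_x_forwarded_for value ("-" when the header was absent).
    """
    if not x_forwarded_for or x_forwarded_for == "-":
        return remote_addr
    # The header looks like "client, proxy1, proxy2"; the leftmost entry is
    # the address reported for the original client.
    first = x_forwarded_for.split(",")[0].strip()
    return first or remote_addr


if __name__ == "__main__":
    print(client_ip("192.168.33.1", "-"))
    print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.1"))
```

In the test environment of this article there is no LB or proxy, so the value is "-" and the direct peer address is used as-is.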

Aside: Defining the name "main" in the log format

The reason for defining a name is that, when configuring log output, the log format to be used is specified by name.

less /etc/nginx/nginx.conf

~Partial excerpt~
access_log  /var/log/nginx/access.log  main;

The name is used in the "access_log" directive, which specifies the log output destination. Since the place where a format is defined and the place where it is used are separate, a name is required.

This means you can set multiple definitions.

For example, you can define a simplified log format that omits unnecessary information as "easy", or conversely, if you want more detailed information, you can define a log format with more items (variables) as "detailed".

This means that you can use different definitions for each domain and environment.

What happens if you don't specify a format name in the access_log directive?

Some of you may be in an environment where the format name is not specified.

In this case, there are no problems with syntax checking or operation.

If no format name is specified in the access_log directive, the "combined" definition, which is built in and not written in the conf file, is used as the default setting.

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

The use of the above definition is described in the official nginx documentation.

In terms of content, the format is slightly different from the default "main" written in the conf file: it does not have "$http_x_forwarded_for" specified at the end.

Incidentally, this definition has the same name as Apache's default "combined" format, and the output content is also in the same format.
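Since the only difference between the two formats is the trailing "$http_x_forwarded_for" field, a log line can be checked for either layout with one pattern. The following Python sketch does this; the regular expression and the function name detect_format are my own illustration, and the sample lines use a shortened User-Agent.

```python
import re

# Fields shared by "combined" and "main" (group names reuse the nginx variable names).
COMMON = (
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# "main" adds one more quoted field at the end, so it is matched as optional.
LINE_PATTERN = re.compile(COMMON + r'(?: "(?P<http_x_forwarded_for>[^"]*)")?$')


def detect_format(line):
    """Return "main", "combined", or None for a single access log line."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    if match.group("http_x_forwarded_for") is not None:
        return "main"
    return "combined"


if __name__ == "__main__":
    base = '192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0"'
    print(detect_format(base))
    print(detect_format(base + ' "-"'))
```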

Lastly

Apache logs are something we often work with, and there is a lot of information available about them.

In comparison, I don't have many opportunities to work with nginx, so I thought it would be handy to compile the information in one place, which is why I wrote this article.

Personally, I find it easier to understand than Apache's log format specification.

I hope this article provides some useful information to those who read it.
Thank you for reading this far.

*If you want to learn more about nginx, please also check out this blog:
[Super Beginner's Guide] Read this and you'll be all set! An explanation of NGINX that even beginners can understand.

Reference materials

Module ngx_http_log_module
https://nginx.org/en/docs/http/ngx_http_log_module.html

Module ngx_http_core_module
https://nginx.org/en/docs/http/ngx_http_core_module.html

The 'Basic' HTTP Authentication Scheme
https://datatracker.ietf.org/doc/html/rfc7617

Referer
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/Referer

HTTP response status codes
https://developer.mozilla.org/ja/docs/Web/HTTP/Status

User agent
https://developer.mozilla.org/ja/docs/Glossary/User_agent

X-Forwarded-For
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/X-Forwarded-For


About the author

inside

I joined Beyond as a mid-career hire and am currently in the Systems Solutions Department.
I hold LPIC-3 304 and AWS SAA certifications.