[nginx] Explaining how to view, configure, and locate access logs

Hello everyone.
I work in the System Solutions Department and I spend my days feeling sleepy when I don't want to sleep and unable to sleep when I do.

This time, the topic is "access logs," which you will inevitably deal with on a regular basis when operating and maintaining a web server.

In recent years nginx has finally surpassed Apache in global market share, so in this article I would like to explain where nginx access logs are located, how to read them, and how to configure them.

Test environment

  • Linux environment
    OS: AlmaLinux release 9.2 (VirtualBox 7.0.12 environment)
    Middleware: nginx (1:1.20.1-14.el9_2.1.alma.1), HTTP(80)
  • Browser
    Chrome: 120.0.6099.217 (Official Build) (64-bit)

Test Page

  • Domain: example.com
    * Because this is a local environment, the domain is resolved by editing the hosts file
  • HTML: index.html (for the top page), FAQ.html (FAQ page)

Location of nginx access logs and log examples

The default location of the access log is "/var/log/nginx/access.log".
To take a first look at the access log, we recommend opening it with the less command, which keeps the load low even for large files.

less /var/log/nginx/access.log

1️⃣ URL:example.com (index.html) access log

192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"

2️⃣ index.html (link within the page) → FAQ.html Access log

192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"

I created a site in my local environment that accepts requests at example.com, and excerpted some of the log lines written when I accessed it from a browser.

These are the lines output when you access the top page (index.html) of example.com and then move from there to the FAQ page (FAQ.html).

The IP address and time at the beginning are easy to understand, but the rest may be harder to read, so I will explain each item by comparing it with the configuration.

Log format settings

The basic configuration file for nginx is "/etc/nginx/nginx.conf".

Within it, the "log_format" directive (setting) defines the format of the access log.
* The same file also defines the output destination of the access log.

less /etc/nginx/nginx.conf

~Excerpt~
http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

The "log_format main" part names the format "main".
What follows defines the content to output: nginx variables, plus the hyphens, brackets, and similar characters that shape the display.

Log format explanation / Comparison with access log 2️⃣

Each entry below lists the log format item, its content, the value in access log 2️⃣, and remarks.

  • $remote_addr
    Content: IP address of the connecting client
    Value in access log 2️⃣: 192.168.33.1
    Remarks: This is the IP address that made the request directly, so when a request arrives via a load balancer (LB), the IP address of the LB is recorded.
  • -
    Content: Separator hyphen
    Value in access log 2️⃣: -
  • $remote_user
    Content: The username specified for Basic authentication
    Value in access log 2️⃣: - (blank)
    Remarks: Basic authentication is mostly used during development and maintenance, so this is generally empty.
  • [$time_local]
    Content: [Local time when processing completed + time zone]
    Value in access log 2️⃣: [17/Jan/2024:08:50:33 +0000]
    Remarks: The "+0000" part is the offset from UTC. "+0000" is UTC (Coordinated Universal Time) and "+0900" is JST (Japan Standard Time).
  • "$request"
    Content: "Request content" (method, request path, HTTP version)
    Value in access log 2️⃣: "GET /FAQ.html HTTP/1.1"
    Remarks: A "GET" (retrieval) request for "FAQ.html" was received over "HTTP/1.1".
  • $status
    Content: Status code
    Value in access log 2️⃣: 200 (success)
  • $body_bytes_sent
    Content: Bytes sent to the client
    Value in access log 2️⃣: 34 (bytes)
    Remarks: Number of bytes in the body (the main data, such as FAQ.html), not counting the response headers.
  • "$http_referer"
    Content: "Referrer" (URL from which the page was accessed)
    Value in access log 2️⃣: "http://example.com/" * Top page
    Remarks: The access moved from the top page to the FAQ. When "-" (blank) is displayed, the request carried no referrer, for example because the URL was entered directly.
  • "$http_user_agent"
    Content: "User agent" (browser/OS information)
    Value in access log 2️⃣: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    Remarks: Access from a Chrome-based browser on Windows (OS).
  • "$http_x_forwarded_for"
    Content: "X-Forwarded-For" (source IP)
    Value in access log 2️⃣: "-"
    Remarks: When the request passes through a proxy or load balancer, the source IP address from before that hop is displayed.
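
To make the mapping between the "main" format and a log line concrete, here is a minimal Python sketch that splits access log 2️⃣ into the fields described above. The regular expression and the function name parse_main_line are my own assumptions for illustration, not something nginx provides, and the pattern only fits the "main" format shown in this article.

```python
import re

# Hand-written pattern for the "main" log_format from this article.
# Group names mirror the nginx variable names (an illustrative choice).
MAIN_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main_line(line):
    """Return a dict of fields for one "main" format line, or None if it does not match."""
    match = MAIN_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


# Access log 2 from this article.
sample = (
    '192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" '
    '200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"'
)

entry = parse_main_line(sample)
print(entry["request"], entry["status"], entry["body_bytes_sent"])
```

A line that does not follow the format returns None instead of raising, which is convenient when a log file contains truncated or unusual lines.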

There is a lot of information available

The above table can be summarized more simply as follows:

  • IP : 192.168.33.1
  • Username : None (= not authenticated)
  • Access time : 08:50:33 UTC on January 17, 2024 (17:50:33 Japan time, which is UTC+9)
  • Access : FAQ page (FAQ.html)
  • Connection status : Successful (200)
  • Data volume : 34 B (bytes)
  • Access source : From the top page (http://example.com/)
  • Environment : Windows OS using a Chrome-based browser (as declared)
  • Is it via LB or Proxy? : No (because it is empty)
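
The time conversion in the summary can be checked with a short Python sketch. It parses the $time_local value of access log 2️⃣ and converts it to Japan time. The names JST and parse_time_local are assumptions made for this example, and the "%b" month name is parsed according to the current locale, so the sketch assumes an English (or default C) locale.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+9 offset for Japan time (name chosen for this example).
JST = timezone(timedelta(hours=9), "JST")


def parse_time_local(value):
    """Parse an nginx $time_local value such as '17/Jan/2024:08:50:33 +0000'."""
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")


accessed = parse_time_local("17/Jan/2024:08:50:33 +0000")
print(accessed.astimezone(JST).isoformat())  # 2024-01-17T17:50:33+09:00
```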

As you can see, a lot of information can be obtained from access logs.
By aggregating this information in various ways, it is possible to investigate access trends and whether or not access is malicious.
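
As one possible way to do such aggregation, the following Python sketch counts requests per IP address, per path, and per status code. The pattern and the function name summarize are assumptions for illustration, and the input is just the two log lines shown earlier in this article.

```python
import re
from collections import Counter

# Picks out only the IP address, request path, and status code.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count requests per IP address, per path, and per status code."""
    by_ip, by_path, by_status = Counter(), Counter(), Counter()
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        by_ip[match["ip"]] += 1
        by_path[match["path"]] += 1
        by_status[match["status"]] += 1
    return by_ip, by_path, by_status


UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')
# Access logs 1 and 2 from this article.
logs = [
    f'192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "{UA}" "-"',
    f'192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "{UA}" "-"',
]

by_ip, by_path, by_status = summarize(logs)
print(by_ip.most_common(), by_path.most_common(), by_status.most_common())
```

Because summarize accepts any iterable of lines, an open file object for /var/log/nginx/access.log could be passed in the same way.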

The default log format is very useful, so be sure to make use of it.

Terminology

Basic Authentication

This is a simple authentication function that requires the entry of a predetermined username and password.
Because it is a very basic and simple function, it is used for temporary purposes such as during site construction or emergency maintenance.

With HTTP (80) communication in particular, security is weak because the authentication information is also sent in plain text (unencrypted).
Even if Basic authentication is only used temporarily, it is recommended that the site accept only HTTPS (443) communication.
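
To show why the credentials count as plain text, here is a small Python sketch of how a Basic authentication header is built according to RFC 7617: the username and password are joined with a colon and Base64-encoded, which anyone who sees the header can reverse. The function names are assumptions for this example, and the credentials are the well-known sample pair from the RFC, not real ones.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an 'Authorization: Basic ...' header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic header value; no key is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)                     # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(decode_basic_auth(header))  # ('Aladdin', 'open sesame')
```

Base64 is an encoding, not encryption, so over HTTP the password is effectively readable by anyone on the network path.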

Referer

This refers to the URL of the page from which the accessed page was reached.

If you open a homepage from a Google search, the Google URL is recorded in the log, and if you open the FAQ from the homepage of a site, the URL of the homepage is recorded in the log.

The term is actually a misspelling of the English word "referrer" (meaning: source of reference). The misspelling was fixed into the specification when it was being formulated, and, as an interesting bit of history, it is still used as is today.
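
As a rough illustration of how the referrer value can be used, the Python sketch below sorts a logged referrer into "none", "internal", or "external" by comparing its host name with the site's own. The function name classify_referer and the three labels are assumptions made for this example.

```python
from urllib.parse import urlsplit


def classify_referer(referer, site_host):
    """Classify a logged referrer relative to our own site (labels are illustrative)."""
    if referer in ("", "-"):
        return "none"      # no Referer header was sent
    if urlsplit(referer).hostname == site_host:
        return "internal"  # link followed from within the same site
    return "external"      # arrived from another site, e.g. a search engine


print(classify_referer("http://example.com/", "example.com"))      # internal
print(classify_referer("https://www.google.com/", "example.com"))  # external
print(classify_referer("-", "example.com"))                        # none
```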

HTTP status codes

This code tells you the result of processing an HTTP(S) request.
It would take too long to list them all here, but the important part is the first digit (the hundreds place).

  • 2xx: Successful response
  • 3xx: Redirect response
  • 4xx: Client error response
  • 5xx: Server error response

As shown above, the first digit gives you a general idea of the outcome.

The most common codes you will see are 200 (success), 302 (temporary redirect), 404 (the requested resource does not exist),
and 503 (the server is temporarily unable to process the request).
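
The grouping by leading digit (the hundreds place) can be written as a tiny Python sketch. The function name status_class and the label strings are assumptions for this example, and codes outside the four classes listed above are simply returned as "other".

```python
STATUS_CLASSES = {
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map an HTTP status code to its class using the hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "other")


for code in (200, 302, 404, 503):
    print(code, status_class(code))
```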

User-Agent

The term user agent refers to "the software used to communicate with a website."

Generally, websites are accessed using a browser, so by extension, it is treated as "information about the browser (and OS, etc.) that the user is using."

X-Forwarded-For

This is the item (header) that carries the source IP address when communication passes through a LB or proxy.

When a LB or proxy sits between the client (user) and the web server, the web server records the IP address of the LB or proxy, and the IP address of the original client in front of it is not known.

Therefore, when communication goes through a LB or proxy, it is the de facto standard to store the source IP address in the X-Forwarded-For header.
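
A minimal Python sketch of how the two logged values could be combined: X-Forwarded-For is a comma-separated list whose leftmost entry is the original client, so the sketch takes that entry when the header is present and falls back to $remote_addr otherwise. The function name client_ip and the sample addresses are hypothetical. Note also that the header can be set freely by the sender, so this is only meaningful when requests are known to come through your own LB or proxy.

```python
def client_ip(remote_addr, x_forwarded_for):
    """Pick the most likely client IP from the two logged values."""
    if x_forwarded_for in ("", "-"):
        return remote_addr  # direct access: no LB or proxy in between
    # "client, proxy1, proxy2": the leftmost entry is the original client.
    return x_forwarded_for.split(",")[0].strip()


print(client_ip("192.168.33.1", "-"))                 # 192.168.33.1
print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.1"))  # 203.0.113.7
```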

Aside: Defining the name "main" in the log format

Why do we need to define a name? The answer is "because when configuring log output, the log format to be used is specified by name."

less /etc/nginx/nginx.conf

~Partial excerpt~
    access_log  /var/log/nginx/access.log  main;

The name is used in the "access_log" directive, which is the setting (directive) that specifies the log output destination. Because the format is defined in one place (log_format) and used in another (access_log), a name is needed to link the two.

This means you can set multiple definitions.

For example, you can define a simplified log format that drops unnecessary information as "easy", or conversely, if you want more detailed information, you can define a log format with more items (variables) as "detailed".

This means that you can use different definitions for each domain and environment.

What happens if you don't specify a format name in the access_log directive?

Some of you may be working in an environment where the format name is not specified.

That causes no problem for the syntax check or for operation.

If no format name is specified in the access_log directive, the built-in "combined" definition, which is not written in the conf, is used by default.

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

This behavior is described in the official nginx documentation.

The definition differs slightly from the "main" format written in the conf by default: it does not have "$http_x_forwarded_for" at the end.

By the way, this is the same name as Apache's default "combined" definition, and the output content is also the same.
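
Since the format depends on which definition is in use, a parser can be derived from the log_format string itself. The Python sketch below turns each $variable into a named group and treats everything else as literal text. The function name format_to_regex is an assumption for this example, and the non-greedy matching is a simplification that can misread values containing the separator characters. The "combined" sample line is access log 1️⃣ without its last field, which is how that format would write it.

```python
import re

MAIN = ('$remote_addr - $remote_user [$time_local] "$request" '
        '$status $body_bytes_sent "$http_referer" '
        '"$http_user_agent" "$http_x_forwarded_for"')
COMBINED = ('$remote_addr - $remote_user [$time_local] '
            '"$request" $status $body_bytes_sent '
            '"$http_referer" "$http_user_agent"')


def format_to_regex(log_format):
    """Convert a log_format string into a compiled regex with one group per variable."""
    # re.split with a capturing group alternates literal text and variable names.
    parts = re.split(r'\$(\w+)', log_format)
    pattern = ''
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal separators, quotes, brackets
        else:
            pattern += f'(?P<{part}>.*?)'    # variable value, matched non-greedily
    return re.compile(pattern)


UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')
main_line = ('192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" '
             f'200 34 "http://example.com/" "{UA}" "-"')
combined_line = ('192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" '
                 f'200 37 "-" "{UA}"')

main_fields = format_to_regex(MAIN).fullmatch(main_line).groupdict()
combined_fields = format_to_regex(COMBINED).fullmatch(combined_line).groupdict()
print(main_fields["http_x_forwarded_for"], sorted(combined_fields))
```

With this approach, switching between "main", "combined", or a custom definition only requires passing a different format string.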

Lastly

Apache logs are something we work with often, and there is a lot of information available about them.

In comparison, I don't have many opportunities to work with nginx, so I thought it would be convenient to compile the information in one place, and that is why I wrote this article.

Personally, I find nginx's log format specification easier to understand than Apache's.

I hope this article will be of some use to those who read it.
Thank you for reading this far.

*If you want to know more about nginx, please also check out this blog:
[Super Beginner] Just read this! NGINX explanation that even beginners can understand

Reference materials

Module ngx_http_log_module
https://nginx.org/en/docs/http/ngx_http_log_module.html

Module ngx_http_core_module
https://nginx.org/en/docs/http/ngx_http_core_module.html

The 'Basic' HTTP Authentication Scheme
https://datatracker.ietf.org/doc/html/rfc7617

Referer
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/Referer

HTTP response status codes
https://developer.mozilla.org/ja/docs/Web/HTTP/Status

User agent
https://developer.mozilla.org/en/docs/Glossary/User_agent

X-Forwarded-For
https://developer.mozilla.org/ja/docs/Web/HTTP/Headers/X-Forwarded-For


About the author

inside

I joined Beyond as a mid-career hire and work in the System Solutions Department.
I hold the LPIC-3 304 and AWS SAA certifications.