[For absolute beginners] What is suspicious access?

Introduction

Hello. I'm Paru, a 2024 graduate and second-year infrastructure engineer in the System Solutions Department.

To get straight to the point, can you all immediately picture what an access log is?
As an infrastructure engineer dealing with alerts, it's one of the basic logs I see every day, but in my first year I had a strong aversion to it.

So this time I'm writing an article for those who are still unfamiliar with access logs: people who "don't know what's written in them," "don't know how to investigate access logs," or "can't determine what kind of access is an attack."

*This article is about how to conduct an investigation, so it does not mention any commands for investigation, but there are many excellent articles listed in the reference sites at the bottom of the page, so I hope you will take a look at them!

Learn about access logs

What is the purpose of an access log?

It's fairly common to encounter slow websites or websites displaying errors that prevent you from viewing them.
Increased traffic and the resulting server load aren't the only causes of these issues, but examining the access logs can help determine whether traffic is the root cause.
Outputting access logs is essential for isolating the cause.

Let's take a look at the access log

For example, let's say you access a certain website (http://example.net).
A line like the following is then written to the access log on the server.

192.168.100.101 - - [1/Nov/2025:10:20:50 +0900] "GET /index.php HTTP/1.1" 200 1042 "http://example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"

At first glance, it looks like a simple string of characters that resembles a code, but in reality, it's just information about "when, who, where, and what they came for" written in a fixed order.
It's a bit long, but let's break it down below.

  1. 192.168.100.101 ... Client IP address (the IP address from which the access came)
  2. - ... Client identifier (usually not used, so it is -)
  3. - ... Username of the requester (the name of the user who accessed a page that requires authentication; usually -)
  4. [1/Nov/2025:10:20:50 +0900] ... Date and time of access (+0900 means the time is 9 hours ahead of UTC)
  5. "GET ... HTTP method (there are various types, such as GET and POST)
  6. /index.php ... Request URI (the path that tells the server, "I want this file!")
  7. HTTP/1.1" ... HTTP version
  8. 200 ... Status code (200 means the request was successful!)
  9. 1042 ... Response bytes (size of the data returned by the server, in bytes)
  10. "http://example.com" ... Referrer URL (the URL of the previous page from which you arrived via a link, i.e. the page you were on before http://example.net/index.php)
  11. "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" ... User agent (browser and OS information; here it shows that Chrome is being used)
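If it helps to see the breakdown above as something a program can do, here is a minimal Python sketch that splits the example line into the same fields. The regular expression and the names LOG_PATTERN and parse_line are my own illustration, and they assume the log uses exactly the format shown above; a differently configured format would need a different pattern.

```python
import re
from datetime import datetime

# One named group per field, in the order explained above.
# Assumption: the log uses the same layout as the example line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %z parses UTC offsets such as +0900
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

sample = ('192.168.100.101 - - [1/Nov/2025:10:20:50 +0900] "GET /index.php HTTP/1.1" '
          '200 1042 "http://example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
          'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"')

entry = parse_line(sample)
print(entry["ip"], entry["method"], entry["uri"], entry["status"])
```

Lines that do not match simply return None, so broken or unexpected lines can be skipped instead of stopping the investigation.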

How to set the output format

There isn't a fixed format for access logs output by web servers; you can freely define "what information is recorded and in what order" in a configuration file.
This configuration file is generally located under /etc, but the syntax differs depending on the middleware being used (Apache, Nginx, etc.). Below is an example of how to write it.

Apache ver.

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Nginx ver.

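# "combined" is predefined in Nginx and cannot be redefined, so this example uses the (arbitrary) name combined_xff
log_format combined_xff '$http_x_forwarded_for - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"';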

Let's briefly touch upon %{X-Forwarded-For}i in the Apache example and $http_x_forwarded_for in the Nginx example.

This item is not written by default, but by adding it you can find out which IP address the access came from, even in environments where there is a load balancer or reverse proxy in front of the server.
Conversely, if this setting is not written, only the IP address of the load balancer or reverse proxy in front of the server will be recorded, and detailed investigation will not be possible.

In short, the log can be freely customized depending on how you write the configuration file.
Changing the order in which you write the items will change the output order, and you can also record special values (header information) as in this case.

For a detailed explanation of the format, please see the article below.

[Apache] A simple guide to reading access logs! *Updated February 2025

[nginx] Explaining how to view, configure, and locate access logs

Let's investigate the access logs.

This is a very rough guide to investigating access logs, showing you what you might want to look at.
Since access content and investigation methods vary widely, please consider this merely one example.

Premise: Your server's CPU load average (LA, roughly the number of processes waiting to run) is rising! Let's investigate!

Step 1: Check the trend in access numbers

First, to check whether access is the cause of the server load, let's count the number of accesses per minute.

   10 [29/Oct/2025:12:00
   10 [29/Oct/2025:12:01
   11 [29/Oct/2025:12:02
2529 [29/Oct/2025:12:03
5107 [29/Oct/2025:12:04
2714 [29/Oct/2025:12:05
   26 [29/Oct/2025:12:06
   30 [29/Oct/2025:12:07

The number of accesses jumps sharply for just three minutes starting at 12:03, which is quite strange.
This appears to be the reason for the increased CPU load.
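As a rough illustration of this step, the per-minute count could be sketched in Python like this. The function names, the made-up sample lines, and the "10 times the median" rule for calling something a spike are all my own assumptions, so adjust them to your site's normal traffic.

```python
import re
from collections import Counter

# Captures the timestamp up to the minute, e.g. "29/Oct/2025:12:03"
MINUTE_PATTERN = re.compile(r'\[(\d{1,2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} ')

def count_per_minute(lines):
    """Count log lines per minute."""
    counts = Counter()
    for line in lines:
        match = MINUTE_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def find_spikes(counts, factor=10):
    """Return minutes with at least `factor` times the median count (an arbitrary rule)."""
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return [minute for minute, n in counts.items() if n >= factor * median]

# Made-up sample lines, only to show the counting
sample_lines = [
    '192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:02 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.22 - - [29/Oct/2025:12:04:10 +0900] "GET /index.php HTTP/1.1" 200 1042 "-" "-"',
]
print(count_per_minute(sample_lines))

# The per-minute numbers from the table above
table = Counter({
    "29/Oct/2025:12:00": 10, "29/Oct/2025:12:01": 10, "29/Oct/2025:12:02": 11,
    "29/Oct/2025:12:03": 2529, "29/Oct/2025:12:04": 5107, "29/Oct/2025:12:05": 2714,
    "29/Oct/2025:12:06": 26, "29/Oct/2025:12:07": 30,
})
print(find_spikes(table))
```

With the numbers from the table, only the three minutes from 12:03 to 12:05 are reported as spikes.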

Step 2: Identify suspicious activity

Once you've identified a sudden surge in access, the next step is to determine "who came and why."
From here, you'll focus on the logs from the time period when the access surged and look for anything suspicious.

① Access source IP

Check if there are frequent accesses from a specific IP address.

# Access count IP address
10143 192.0.2.115
    128 192.0.2.22
      62 192.0.2.88
      17 192.0.2.54

If you check the IP address on an IP lookup site and find that a website intended for Japanese users is receiving a huge number of visits from overseas IP addresses, it's suspicious. It's highly likely to be a crawler or an attack.
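Counting accesses per IP address for the suspicious time period might look like the following Python sketch. It assumes the client IP is the first field of each line, as in the format shown earlier; the function name top_ips and the sample lines are made up for illustration.

```python
from collections import Counter

def top_ips(lines, minutes, limit=5):
    """Count requests per client IP, only for lines logged in the given minutes."""
    counts = Counter()
    for line in lines:
        if any(f"[{minute}:" in line for minute in minutes):
            ip = line.split(" ", 1)[0]  # assumption: the first field is the client IP
            counts[ip] += 1
    return counts.most_common(limit)

# Made-up sample lines
sample_lines = [
    '192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:02 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:04:15 +0900] "GET /?author=1 HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.22 - - [29/Oct/2025:12:04:20 +0900] "GET /index.php HTTP/1.1" 200 1042 "-" "-"',
    '192.0.2.88 - - [29/Oct/2025:12:07:00 +0900] "GET /index.php HTTP/1.1" 200 1042 "-" "-"',
]

surge = ["29/Oct/2025:12:03", "29/Oct/2025:12:04", "29/Oct/2025:12:05"]
print(top_ips(sample_lines, surge))
```

The access at 12:07 is outside the surge window, so 192.0.2.88 does not appear in the result.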

② Access path (request URI)

We'll look at which paths on the site are receiving the most traffic.
Focusing on suspicious IP addresses and examining the paths will make it even clearer.

# Access destinations for IP 192.0.2.115: Path
105 /wp-login.php
  98 /xmlrpc.php
  42 /?author=1
  27 /wp-admin/admin-ajax.php
  15 /wp-config.php.bak
  15 /wp-admin/profile.php
  14 /wp-content/plugins/file-manager/readme.txt

It's quite difficult to distinguish between "suspicious" and "not suspicious" here, so let's look at them one by one.
Let's think of the website as a shop, with legitimate access (simply clicking the URL) as the front entrance and suspicious access (trying to reach the WordPress administration panel, etc.) as the back entrance.

wp-login.php and xmlrpc.php (suspicious)
Purpose: Unauthorized login (brute-force attack)
Example: It's like trying every possible ID and password to rattle the doorknob of a store's "employee-only back door".

/?author=1 (Suspicious)
Purpose: To identify the username
Example: Asking "Is there an employee with ID number 1?" from the front entrance, trying to steal the employee's name (username for login)

wp-config.php.bak (suspicious)
Purpose: Theft of confidential information
Example: It's like checking whether a backup copy (.bak) of a memo containing the store's "safe combination" (such as a database password) has been left where it can be downloaded.

/wp-admin/admin-ajax.php (legitimate)
Purpose: A WordPress function (automatic saving of articles, dynamic processing, etc.). If the number of accesses is unusually high, it may be being exploited for attacks.

wp-admin/profile.php (may not be suspicious, depending on the accessing IP address)
Purpose: Lets logged-in users view their own profile.

plugins/.../readme.txt (possibly reconnaissance)
Purpose: Scanning for installed plugins.

The above is just one example, but access to such obviously suspicious paths is quite common.
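To make the "front entrance / back entrance" check a little more concrete, here is a small Python sketch that counts the paths requested by one IP address and attaches the labels from the explanation above. The marker list and the simple substring matching are my own simplification, not a complete detection rule, and the sample lines are made up.

```python
from collections import Counter

# Labels taken from the walkthrough above; substring matching is a simplification.
SUSPICIOUS_MARKERS = {
    "/wp-login.php": "brute-force login attempt",
    "/xmlrpc.php": "brute-force login attempt",
    "?author=": "username enumeration",
    ".bak": "probing for leaked backup files",
    "/readme.txt": "plugin reconnaissance",
}

def classify_uri(uri):
    """Return the label of the first marker found in the URI."""
    for marker, label in SUSPICIOUS_MARKERS.items():
        if marker in uri:
            return label
    return "no known marker"

def summarize_paths(lines, ip):
    """Count request URIs for one client IP and label each of them."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if not line.startswith(ip + " ") or len(parts) < 2:
            continue
        request = parts[1].split()  # e.g. ["GET", "/index.php", "HTTP/1.1"]
        if len(request) >= 2:
            counts[request[1]] += 1
    return [(uri, n, classify_uri(uri)) for uri, n in counts.most_common()]

# Made-up sample lines
sample_lines = [
    '192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:02 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:04:15 +0900] "GET /?author=1 HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.22 - - [29/Oct/2025:12:04:20 +0900] "GET /index.php HTTP/1.1" 200 1042 "-" "-"',
]

for uri, count, label in summarize_paths(sample_lines, "192.0.2.115"):
    print(count, uri, label)
```

A path with "no known marker" is not automatically safe; as with admin-ajax.php, the number of accesses still matters.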

③ GET communication and POST communication

I've talked at length about paths, but how the information is exchanged is also important.
I'd like to touch upon the commonly seen "GET communication" and "POST communication."

GET communication (image: postcard)
Purpose: When you want to ask for information. (Example: display a page, view information with /?author=1)
Features: The information you send is attached to the URL (like ?author=1). It's completely visible, so you can't send important information.

POST communication (Image: A letter in an envelope)
Purpose: To ask someone to "please receive this information!" (Examples: logging in, submitting a form)
Features: The information being sent (ID and password) can be hidden by placing it in an envelope. The information does not appear in the URL.

Why is this important?
In this investigation example, login attempts against /wp-login.php (the back door) need to send sensitive information like "ID and password," so they are done using POST requests.
If you look at the logs and see only GET requests, you can determine that "this is probably a scan, not an attack."
Conversely, if there are a lot of POST requests, you can determine that "this might be a brute-force attack!"
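The GET versus POST check can be sketched in Python as follows. The names count_methods and guess_intent, the threshold of 20 POST requests, and the sample lines are assumptions for illustration only; what counts as "a lot" depends on the site.

```python
from collections import Counter

def count_methods(lines, path):
    """Count HTTP methods used for one path (the query string is ignored)."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()  # e.g. ["POST", "/wp-login.php", "HTTP/1.1"]
        if len(request) >= 2 and request[1].split("?")[0] == path:
            counts[request[0]] += 1
    return counts

def guess_intent(counts, post_threshold=20):
    """Very rough rule of thumb; post_threshold is an arbitrary assumption."""
    if counts["POST"] >= post_threshold:
        return "possible brute-force attack"
    if counts["GET"] > 0:
        return "probably a scan"
    return "nothing notable"

# Made-up sample lines: 25 POST requests and 2 GET requests to the login page
post_line = '192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"'
get_line = '192.0.2.115 - - [29/Oct/2025:12:03:05 +0900] "GET /wp-login.php HTTP/1.1" 200 512 "-" "-"'
sample_lines = [post_line] * 25 + [get_line] * 2

login_counts = count_methods(sample_lines, "/wp-login.php")
print(login_counts, guess_intent(login_counts))
```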

④ Status code

See how the server responds.

# Access count Status code
  6821 404
    153 500
    112 403
      35 200

If there are many 4xx (Not Found, etc.) or 5xx (Internal Server Error, etc.) errors, it indicates that access attempts are failing.
With 4xx errors the web server is still able to respond normally, but 5xx errors mean the server failed to process the request (i.e., the program is throwing an error), which can slow down the server and lead to increased load.
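For reference, a status code tally like the one above could be produced with a Python sketch such as this. It assumes the status code is the first value after the quoted request line, as in the format shown earlier; the function names and the sample lines are my own.

```python
from collections import Counter

def count_statuses(lines):
    """Count status codes; the code is the first value after the quoted request line."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) >= 3:
            fields = parts[2].split()  # e.g. ["404", "196"]
            if fields and fields[0].isdigit():
                counts[fields[0]] += 1
    return counts

def error_ratio(counts):
    """Share of responses that were 4xx or 5xx."""
    total = sum(counts.values())
    errors = sum(n for code, n in counts.items() if code[0] in "45")
    return errors / total if total else 0.0

# Made-up sample lines
sample_lines = [
    '192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "GET /wp-config.php.bak HTTP/1.1" 404 196 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:02 +0900] "GET /xmlrpc.php HTTP/1.1" 403 199 "-" "-"',
    '192.0.2.22 - - [29/Oct/2025:12:04:20 +0900] "GET /index.php HTTP/1.1" 200 1042 "-" "-"',
]
print(count_statuses(sample_lines))

# The numbers from the table above
table = Counter({"404": 6821, "500": 153, "403": 112, "200": 35})
print(f"{error_ratio(table):.1%} of responses were errors")
```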

What if it returns 200 OK (normal) to suspicious access?

It's important to note that 200 OK (request successful) simply means that the server returned what was requested, not that the access was safe.
To use the store analogy, 404 is like a dead end, meaning "the safe combination is not here," but 200 means you've handed over the safe combination, saying "here you go."

How to read the login attack (wp-login.php) log

POST /wp-login.php → 200 OK (login failed)
Meaning: Incorrect ID or password. The login page is displayed again.
If this happens repeatedly, you can conclude that a brute-force attack is in progress.

POST /wp-login.php → 302 Found (login successful)
Meaning: Authentication OK! The user is redirected to the dashboard (/wp-admin/).
If a 302 is returned to the attacking IP address, it is a dangerous sign that your site has been compromised by an attacker.

Therefore, if 200 OK or 302 Found is being returned for access to suspicious paths, it may mean that the attack is starting to succeed, and you should investigate more carefully.
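Here is a rough Python sketch of reading the login log in this way: it counts, per IP address, how many POST requests to /wp-login.php ended in 200 (treated as a failed login) and how many ended in 302 (treated as a successful login), following the interpretation described above. The regular expression, the function name, and the sample lines are assumptions for illustration.

```python
import re
from collections import defaultdict

# Client IP at the start of the line, then a POST to /wp-login.php and its status code
LOGIN_PATTERN = re.compile(r'^(\S+) .*?"POST /wp-login\.php[^"]*" (\d{3}) ')

def login_results(lines):
    """Per IP: POSTs answered with 200 (failed) and with 302 (succeeded)."""
    results = defaultdict(lambda: {"failed": 0, "succeeded": 0})
    for line in lines:
        match = LOGIN_PATTERN.match(line)
        if not match:
            continue
        ip, status = match.group(1), match.group(2)
        if status == "200":
            results[ip]["failed"] += 1
        elif status == "302":
            results[ip]["succeeded"] += 1
    return dict(results)

# Made-up sample lines
sample_lines = [
    '192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:02 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:03 +0900] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.115 - - [29/Oct/2025:12:03:04 +0900] "POST /wp-login.php HTTP/1.1" 302 0 "-" "-"',
    '192.0.2.22 - - [29/Oct/2025:12:04:20 +0900] "GET /wp-login.php HTTP/1.1" 200 1042 "-" "-"',
]

for ip, result in login_results(sample_lines).items():
    warning = " <- check for compromise!" if result["succeeded"] else ""
    print(ip, result, warning)
```

Many failures followed by a success from the same IP address is exactly the pattern that deserves a closer look.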

There are many other types of status codes, so please see the article below for more details.

[This is all you need to remember] A quick review of HTTP status code errors

⑤ User Agent

Finally, look at what the client used to access the site.

# Examples of user agents
"Mozilla/5.0 (compatible; DotBot/1.2; ...)"
"Mozilla/5.0 (... compatible; Google-Read-Aloud; ...)"

Some user agents in the logs contain terms like "bot" or "crawler," and there are also malicious ones among them.
Furthermore, if this field contains strange strings of characters, or if access is heavily skewed towards a specific user agent, it can be a sign of an attack.
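A tally of user agents can be sketched in Python like this. It assumes the user agent is the last quoted field of each line, as in the format shown earlier, and the keyword list only covers the two terms mentioned above, so a clean result does not prove the access is harmless. The function names and sample lines are made up.

```python
from collections import Counter

BOT_KEYWORDS = ("bot", "crawler")  # the two terms mentioned above

def user_agent_of(line):
    """Return the last quoted field of the line (the user agent in this format)."""
    parts = line.split('"')
    return parts[-2] if len(parts) >= 3 else ""

def summarize_user_agents(lines):
    """Return (user agent, count, looks_like_bot) sorted by count."""
    counts = Counter(user_agent_of(line) for line in lines)
    return [
        (ua, n, any(keyword in ua.lower() for keyword in BOT_KEYWORDS))
        for ua, n in counts.most_common()
    ]

# Made-up sample lines
bot_line = ('192.0.2.115 - - [29/Oct/2025:12:03:01 +0900] "GET /index.php HTTP/1.1" '
            '200 1042 "-" "Mozilla/5.0 (compatible; DotBot/1.2; ...)"')
browser_line = ('192.0.2.22 - - [29/Oct/2025:12:04:20 +0900] "GET /index.php HTTP/1.1" '
                '200 1042 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"')
sample_lines = [bot_line] * 3 + [browser_line]

for ua, count, looks_like_bot in summarize_user_agents(sample_lines):
    print(count, looks_like_bot, ua)
```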

That's a lot of information, it's tiring...
Roughly speaking, based on the information above, we'll determine whether the server load is caused by access and what kind of access can be considered "suspicious."

How to deal with suspicious access?

When actually handling such access, you will need to reach an agreement with the customer, but here are some typical countermeasures.

① Block the IP address (quickest method)

After investigating, we found that an abnormal number of accesses were coming from a specific IP address (e.g., 192.0.2.115), so the quickest solution would be to block access from this address.

The method of blocking varies depending on the environment.
With Nginx, you can write `deny 192.0.2.115;` in the conf file, or with an Apache environment, you can restrict access using .htaccess. You can also block access through the OS firewall or, if there is a WAF in front of it, there.
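If there are several offending IP addresses, writing the deny lines by hand gets tedious, so here is a small Python sketch that turns a per-IP tally into Nginx deny lines. The function name nginx_deny_lines and the threshold of 1000 accesses are my own assumptions; which addresses to block should still be decided by a person (and agreed with the customer).

```python
import ipaddress

def nginx_deny_lines(ip_counts, threshold=1000):
    """Build 'deny <ip>;' lines for IPs at or above the threshold (an arbitrary value)."""
    lines = []
    for ip, count in ip_counts:
        if count < threshold:
            continue
        try:
            # The logged value may come from a client-supplied header,
            # so make sure it really is an IP address before using it.
            ipaddress.ip_address(ip)
        except ValueError:
            continue
        lines.append(f"deny {ip};")
    return lines

# The per-IP numbers from the investigation above
tally = [("192.0.2.115", 10143), ("192.0.2.22", 128), ("192.0.2.88", 62), ("192.0.2.54", 17)]
print("\n".join(nginx_deny_lines(tally)))
```

The validation step matters when the first log field is taken from X-Forwarded-For, because that header is sent by the client and can contain anything.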

② Restrict access to specific paths

Attackers will target "employee-only back doors" such as wp-login.php, so it is essential to strictly restrict access to these "back doors."

For example, for wp-login.php you would add a setting to the Nginx configuration file that states, "Only your company's IP address should be allowed to access this."
*Note that the syntax may vary depending on the Nginx version!

location = /wp-login.php {
    # Allow only my IP address
    allow 1.2.3.4;
    # Deny all others
    deny all;

    # Don't forget the settings that pass the request to PHP:
    include fastcgi_params;
    fastcgi_pass unix:/run/php-fpm/www.sock;
}

For paths that you do not want to be accessed from an unspecified number of IP addresses (for example, xmlrpc.php, which is a frequent attack target and is not needed on many sites), it is effective to use deny all; to block access to them completely in the first place.

Summary

What did you think?

I used to hate access logs, but after looking at them for a year, I gradually started to understand them. I hope this will be of some help to anyone struggling with investigating access logs.

Thank you for reading to the end 🌷

Reference sites:
How to check Linux access logs | A clear explanation of location, how to view, and usage examples
Explaining the commands "grep" and "awk" used to make logs easier to read
[Case study] How we recovered and took countermeasures after a WordPress site was tampered with
Learning the basics of HTTP requests: the difference between GET and POST
Setting access restrictions to the admin screen as a security measure for WordPress


About the author

Paru

Graduated in 2024, Systems Solutions Department.
My future dream is to rent a slightly larger room and keep a cat.