How to: Measure your NginX Cache Performance using $upstream_cache_status in a custom Cache log
How to: Measure your NginX Cache Performance using $upstream_cache_status in a custom Cache log
One of the great features of NginX is it’s ability to cache content from a number of sources including but not limited to a proxy server or a fast cgi server. But in order to understand truly how well your cache is performing it is important to log the access to your server and record how the cache behaves. There are other things worth capturing which is why I recommend creating a new log which format you can customize and therefore monitor to see different metrics.
I. Define a new logging format:
As I mentioned I recommend creating a new logging format so you can over time customize it to the information that matters to you and study the behavior of your caching server.
log_format nginx_cache ‘$remote_addr – $upstream_cache_status [$time_local] ‘
‘”$request” $status $body_bytes_sent ‘
‘”$http_referer” “$http_user_agent” ‘;
Now, I have additional information that I am interesting in as my caching is skipped under certain conditions and I group the user agents in two groups: Desktop and Mobile clients so as you can see below I have added additional variables to record this:
log_format nginx_cache ‘$remote_addr – $upstream_cache_status [$time_local] – $device_type – ‘
‘”$request” $status $body_bytes_sent ‘
‘”$http_referer” “$http_user_agent” ‘
‘”$cookie_nocache” – “$arg_nocache” – “$arg_comment” – “$http_pragma” – “$http_authorization” – “$skip_cache” ‘;
The big things to notice here are:
- I have a mapping that sets $device_type as I mentioned into two possible values: Desktop or Mobile
- The last line includes all the variables I use to determine if I will bypass the cache. This are useful for troubleshooting but not necessary if you don’t bypass the cache.
II. Register your new log in the server block to capture caching data
Just like your access and error logs, this new cache log we defined needs to be activated for logging to take place. Simply add a line like your access log line to enable it with a few differences:
access_log /var/log/nginx/cache.log nginx_cache;
As you may notice, we are overwriting the access_log implementation by adding at the end the format we wish to follow which we defined previously as nginx_cache. Also, our implementation of the nginx_cache log format replicates to a degree the access log. If you wish to you could consolidate your access log into your caching log or keep them separate, up to you.
III. Analyze your log
As expected we want to get some valuable information out of our log. We need to develop a method to parse all those lines and output useful information to us.
Here are a few commands useful for that:
A. Get a sense of how many times the cache is used, bypassed, etc. In the case of a proxy cache ‘-‘ might be used as Bypassed to the upstream server
awk '{print $3}' /var/log/nginx/cache.log | sort | uniq -c | sort -r
Here is some sample output:
9 MISS
6 STALE
19 BYPASS
156 –
127 HIT
Here are what those different results mean:
- Miss
- When a page that is candidate for caching was not found in the cache.
- Stale
- A cached page that has expired and was visited. NginX will update the cached content.
- Bypass
- When a page has been indicated to not be cached.
- –
- A number of things I don’t quite understand yet. Seems to be used also as a Bypass by my proxy cache, although technically it means the request never reached the upstream server. I see this on connections that do not fall under that category so I am confused.
- Hit
- What you really want to see. That a page that was tried to be accessed was found in the cache and returned from it.
So at the beginning you probably are going to get lots of Misses but eventually the Hits need to predominate. If you are getting too many Stales you should review your expiration policy.
B. Desktop vs Mobile visitors
This is another interesting metric. In my case I wasn’t sure how worth it was to cache responses for mobile users. Knowing how many kinds of visitors you get helps you understand your audience. Remember, all this queries depend on the order you use in your log file so keep that in mind.
awk '{print $7}' /var/log/nginx/cache.log | sort | uniq -c | sort -r
will output something like:
649 desktop
3 mobile
Now, keep in mind that anything that is not mobile I am cataloging as desktop on my user agent mapping. This might be misleading as bots, etc. will appear as desktop clients and that would mess up the results. You can see that by using the user agent instead of the device type variable. I created a new mapping table to identify some common user agents for things like availability monitoring which pings my server every x seconds. This can help you further understand your visitors (or just use Google Analytics):
52 Desktop
1 HighEnd-Mobile
16 Microsoft-NetworkProbe
C. Response codes
Understanding how often a request for a non-existent page is made, etc. is also important. Again, leverage awk to get a summary:
awk '{print $12}' /var/log/nginx/cache.log | sort | uniq -c | sort -r
will output:
8 403
34 304
27 404
25 302
2 500
19 499
142 301
1305 200
D. Many more!
As you can see we are just scratching the surface of what you can do here with custom logs. I just haven’t had the chance or interest to dig too much into them, but there are resources out there to help you if that is what you want. Below are a few resources I found on the web that deal with this and go much deeper into what you can do. At the end of the day it is all about how you parse your log file, so keep that in mind and have it in an easy to parse format.
- How to parse log files: http://www.the-art-of-web.com/system/logs/
- Parsing request size to size your NginX buffers: https://rtcamp.com/tutorials/nginx/tweaking-fastcgi-buffers/
- Parsing log to find a) Broken links b) Who requests them c) 404 requests: https://rtcamp.com/tutorials/nginx/log-parsing/
Happy parsing!