Recently I moved this blog from WordPress to my own server and started serving my articles with mynt, a static site generator. Besides several benefits, there is one downside: no statistics about my blog’s traffic. I thought about using Google Analytics or Piwik, but I’m neither interested in tracking the visitors of my blog nor in knowing details like their eye color. I just need to know how many unique visitors each blog post gets (per day, per month, overall). After some quick research I realized that the available options are either paid services or self-hosted solutions mostly written in PHP. Since I’d like to have a self-hosted Python tool, I decided to write some parts of it myself.
During my research I stumbled upon Parsible, a tool written in Python that parses logs in real time. Because it is highly customizable through plugins, it can parse nginx webserver logs and process the data further however you want. So first I forked Parsible, customized the nginx parser and wrote a MongoDB processor which stores each log item in a MongoDB database.
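To illustrate the idea, here is a minimal sketch of what such a parser/processor pair could look like. The function names, the regex and the field names are my own simplified assumptions for this post, not the actual code from my fork:

```python
import re
import datetime

# Simplified pattern for nginx's default "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<client>[^"]*)"'
)

def parse_nginx(line):
    """Parse one access-log line into a dict, or return None on mismatch."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    item = match.groupdict()
    item["status"] = int(item["status"])
    item["timestamp"] = datetime.datetime.strptime(
        item["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return item

def process_mongodb(item, collection):
    """Store a parsed log item; `collection` can be a pymongo
    collection, e.g. MongoClient().log_db.<your collection>."""
    if item is not None:
        collection.insert_one(item)
```

Parsible calls the parser for each new line of the log file and hands the result to the processors, which is why the two functions are kept separate here.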
Now that the log data is in a database, a tool is needed to analyze and visualize these log items. I decided to write my own, called plogx. Basically it aggregates all log data, applies filters and visualizes statistics about the visitors of a website. The statistics are served to the browser by Flask, a lightweight microframework.
Four things are needed: MongoDB, Parsible, supervisor and plogx.
To install MongoDB, follow the instructions on the MongoDB website for your system. For Ubuntu servers this howto is recommended: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
To install Parsible you can clone my fork of it:
git clone https://github.com/pedesen/parsible.git
Parsible runs as a service using supervisor. Here’s the supervisor config script, located in
[program:parsible]
command = python /path/to/parsible/parsible.py --log-file /path/to/your/logfile.log \
    --pid-file /tmp/parsible.pid --parser parse_nginx
autostart=true
autorestart=true
stopsignal=QUIT
stderr_logfile=/var/log/supervisor/parsible_err.log
stdout_logfile=/var/log/supervisor/parsible_out.log
Please don’t forget to change the path to the Parsible script and the location of the nginx log file you want to parse (usually located in
As mentioned in the Parsible instructions, you have to adjust the logrotate config file if you are using logrotate. In my case I had to add the line referencing /tmp/parsible.pid below to the config file for my log file (located in
prerotate
    if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
        run-parts /etc/logrotate.d/httpd-prerotate; \
    fi; \
endscript
postrotate
    [ ! -f /var/run/nginx.pid ] || kill -USR1 `cat /var/run/nginx.pid`
    [ ! -f /tmp/parsible.pid ] || kill -USR1 `cat /tmp/parsible.pid`
endscript
The Parsible supervisor service can be started like this (supervisor will also start it automatically at system startup):

sudo supervisorctl start parsible
If all went as expected, Parsible should start parsing your log file in real time and storing the items as documents in the MongoDB database log_db, in a collection named
To install plogx, first clone it and create a Python virtual environment; installing the requirements will pull in the dependencies Flask and Flask-PyMongo:

git clone https://github.com/pedesen/plogx.git
cd plogx
virtualenv env
source env/bin/activate
pip install -r requirements
deactivate
To test plogx, you can use the built-in Flask development server. Please don’t use this in production! Always serve the Flask app with uWSGI and nginx (or Apache, for example), and at least secure it with basic auth:
cd path/to/plogx
source env/bin/activate
python plogx/app.py
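For production, a minimal uWSGI setup could look like the following. Note that the paths, the module name and the socket location are assumptions based on the plogx layout above, not a tested configuration:

```ini
; uwsgi.ini - minimal sketch, adjust paths to your setup
[uwsgi]
chdir = /path/to/plogx/plogx
module = app:app
virtualenv = /path/to/plogx/env
master = true
processes = 2
socket = /tmp/plogx.sock
chmod-socket = 660
vacuum = true
```

nginx can then forward requests to the socket with `uwsgi_pass unix:/tmp/plogx.sock;` inside a location block, combined with `auth_basic` for the basic-auth protection mentioned above.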
This starts plogx on port 5000. If all went well, you can browse the web interface at http://127.0.0.1:5000. If you’d like to configure plogx, copy config_example.py to config.py in the plogx directory and make your adjustments. Here’s an example config.py which excludes some unwanted visitors from the stats:
# List of paths to ignore (useful for css and js files)
excluded_paths = [
    "/favicon.ico",
    "/feed.xml",
]

# List of IPs to ignore if you want to blacklist some requests
excluded_ips = []

# Ignore log items which contain one of these strings in the client field.
# Note that you can use regular expressions here. Example to ignore Googlebots:
# excluded_clients = [".*Googlebot.*"]
excluded_clients = [
    ".*Googlebot.*",
    ".*Twitterbot.*",
    ".*bingbot.*",
]
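To show how such a blacklist might be applied, here is a small sketch of a filter. The function name and the item fields are my assumptions for this post; plogx’s actual filtering code may look different:

```python
import re

def is_excluded(item, excluded_paths=(), excluded_ips=(), excluded_clients=()):
    """Return True if a log item should be left out of the statistics."""
    if item["path"] in excluded_paths or item["ip"] in excluded_ips:
        return True
    # excluded_clients holds regular expressions matched against the
    # client (user-agent) field, e.g. ".*Googlebot.*".
    return any(re.match(pattern, item["client"])
               for pattern in excluded_clients)
```

Applied while aggregating, this keeps bots and asset requests out of the unique-visitor counts.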
Maybe I will simplify the installation and configuration with tools like Docker. Any advice is appreciated, just write a comment below!