Tuning ElasticSearch for running on one node

The default settings for shards and replicas are 5 and 1 respectively, so every index ends up as 5 shards in 2 copies. This is a bit overkill with my tiny data. I’m running on one node only, and I have < 1GB data per month.

First off, turn off replication by setting number_of_replicas to 0 for the existing indices.
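A minimal sketch of that call, assuming Elasticsearch answers on localhost:9200 (the ES_URL variable and the function names are my own, not part of Elasticsearch):

```shell
# Assumption: Elasticsearch answers here; override ES_URL if it does not.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body that sets the replica count (0 unless told otherwise).
replica_settings() {
    printf '{"index": {"number_of_replicas": %d}}' "${1:-0}"
}

# PUT /_settings without an index name applies to all existing indices.
disable_replicas() {
    curl -sf -XPUT "$ES_URL/_settings" \
        -H 'Content-Type: application/json' \
        -d "$(replica_settings 0)"
}
```

Running disable_replicas sends the request; nothing happens until the function is called.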

This will take effect immediately.

Tuning down the number of shards used per index is a bit more complex. We need to create an index template with this setting, and it will only affect future indices.
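Roughly, such a template could be created like this. The template name single_shard and the helper names are made up, and the body assumes Elasticsearch 6 or later, where the pattern lives in index_patterns (5.x used a "template" key instead):

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Body of a template giving every new index that matches the
# pattern in $1 a single shard and no replicas.
shard_template() {
    printf '{"index_patterns": ["%s"], "settings": {"number_of_shards": 1, "number_of_replicas": 0}}' "$1"
}

# Store the template under the hypothetical name "single_shard".
put_shard_template() {
    curl -sf -XPUT "$ES_URL/_template/single_shard" \
        -H 'Content-Type: application/json' \
        -d "$(shard_template "$1")"
}
```

For example, put_shard_template 'logstash-*' would cover indices created by a default Logstash setup.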

If the database already contains indices split over several shards, this data needs to be reindexed.

Example: Reindex multiple (e.g. daily) logs:

First reindex into temporary indices

Delete the originals

Reindex back to the original name
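The three steps can be sketched with the _reindex API like this. The helper names and the -tmp suffix are my own choices, and the URL is an assumption:

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body for _reindex: copy every document from index $1 into index $2.
reindex_body() {
    printf '{"source": {"index": "%s"}, "dest": {"index": "%s"}}' "$1" "$2"
}

reindex() {
    curl -sf -XPOST "$ES_URL/_reindex" \
        -H 'Content-Type: application/json' \
        -d "$(reindex_body "$1" "$2")"
}

delete_index() {
    curl -sf -XDELETE "$ES_URL/$1"
}

# The three steps for one (hypothetical) daily index:
#   reindex logstash-2017.01.01 logstash-2017.01.01-tmp
#   delete_index logstash-2017.01.01
#   reindex logstash-2017.01.01-tmp logstash-2017.01.01
```

The new index picks up the shard count from the template when it is created by the reindex call. The temporary index still has to be deleted afterwards.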

Any of these might time out during the call, so it could be a good idea to keep track of the document count manually.
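One way to keep track is to compare counts before deleting anything. A small sketch using the _cat/count endpoint (the helper names are hypothetical):

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the number of documents in index $1 as a bare number.
doc_count() {
    curl -sf "$ES_URL/_cat/count/$1?h=count" | tr -d '[:space:]'
}

# Succeed only when both counts are non-empty and identical.
same_count() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example:
#   same_count "$(doc_count my-index)" "$(doc_count my-index-tmp)" \
#       && echo "safe to delete my-index"
```

An empty count (for instance when the request itself failed) is treated as a mismatch, so a failed lookup never reads as "safe to delete".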

Example: Reindex to aggregate logs

Reindexing can also be used to aggregate daily logs into monthly. I started out with daily logs but this was really overkill.
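As a sketch, the source of a reindex can be a wildcard pattern, so one call can fold a month of daily indices into one. The naming scheme prefix-YYYY.MM.DD is an assumption about how the daily indices are named:

```shell
# Body that merges all daily indices of one month into a monthly index.
# $1: index prefix (e.g. logstash), $2: month as YYYY.MM
monthly_reindex_body() {
    printf '{"source": {"index": "%s-%s.*"}, "dest": {"index": "%s-%s"}}' \
        "$1" "$2" "$1" "$2"
}

# Example (assumes Elasticsearch on localhost:9200):
#   curl -sf -XPOST 'http://localhost:9200/_reindex' \
#       -H 'Content-Type: application/json' \
#       -d "$(monthly_reindex_body logstash 2017.01)"
```

The pattern requires a dot after the month, so the monthly index itself never matches its own source pattern.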

Doing this I cut down on the number of shards used from ~3800 to 28, and the system suddenly feels responsive again – even for Kibana dashboards with 10+ visualizations pulling data from 2 years back.

Automatically mount/dismount USB hdd with lvm

So I got me a 4TB external HDD so I can use Docker on my work laptop without running into a panicked state of being out of disk space every month or so.

I set up an lvm using pvcreate / vgcreate / lvcreate as detailed on my devops/sysadmin wiki.

Since this is a laptop I can’t really keep the HDD connected at all times. Just pulling the cord results in nasty I/O errors, or so I’ve heard. >_>;;;

Don’t do that.

I prepared a matching pair of bash files to detach and attach the volumes.

detach_volumes.bash

attach_volumes.bash
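A rough sketch of what the two scripts boil down to. The volume group name, logical volume name and mount point are placeholders, not my real setup, and the DRY_RUN switch is only there to make it easy to see what would be run:

```shell
# Placeholders; adjust to the actual volume group, volume and mount point.
VG="${VG:-vg_external}"
LV="${LV:-docker}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/external}"

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Unmount first, then deactivate every logical volume in the group.
detach_volumes() {
    run umount "$MOUNTPOINT" &&
    run vgchange -an "$VG"
}

# Activate the logical volumes, then mount the one we care about.
attach_volumes() {
    run vgchange -ay "$VG" &&
    run mount "/dev/$VG/$LV" "$MOUNTPOINT"
}
```

If something (Docker, for instance) still has files open on the volume, umount will refuse, so that has to be stopped first.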

To have these scripts run when my computer suspends and wakes up, respectively, I need to add a file to /lib/systemd/system-sleep/

external-hdd
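systemd-sleep calls every executable in that directory with two arguments: pre or post, followed by the kind of sleep (suspend, hibernate and so on). A sketch of such a hook, assuming the two scripts live in /usr/local/bin (that path is a guess for illustration):

```shell
#!/bin/sh
# Assumed location of detach_volumes.bash and attach_volumes.bash.
SCRIPT_DIR="${SCRIPT_DIR:-/usr/local/bin}"

# Map the first argument from systemd-sleep to the script to run.
script_for_phase() {
    case "$1" in
        pre)  echo "$SCRIPT_DIR/detach_volumes.bash" ;;
        post) echo "$SCRIPT_DIR/attach_volumes.bash" ;;
        *)    return 1 ;;
    esac
}

# Run the matching script; silently ignore anything unexpected.
sleep_hook() {
    script=$(script_for_phase "${1:-}") || return 0
    "$script"
}

sleep_hook "$@"
```

The file has to be executable, otherwise systemd-sleep skips it.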

Ok. Almost there. I also want an easy way to run these scripts when I wanna take the laptop to a meeting. I added launcher icons to my window manager, and added the following in my sudoers file:

Read environment variables from files in bash

In the Docker world it’s increasingly common to send secrets through files instead of environment variables by appending _FILE to the regular environment variable name.

Here’s a bash function which reads such files and updates the corresponding environment variable with the file contents.

Example usage:

Change date format in Chrome

Yeah, you can’t. You’ll have to change the locale Chrome is running in. And if you’re on Linux you can’t do it in the browser – it looks at your environment variables.

So if you want your dates in nice ISO-8601 standard (or if you want to run Chrome in Swedish but the rest of your OS in English) you have to set up the environment just for Chrome.

Said and done, I changed my application launcher to run

/bin/sh -c 'LANGUAGE=sv_SE.UTF-8 /usr/bin/google-chrome-stable'
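A variable assignment only reaches the child process when the variable is exported or when the assignment is written as a prefix to the command itself. A small demonstration, using a made-up variable DEMO_LANG so nothing real gets clobbered:

```shell
# What a child process reports about DEMO_LANG.
child='echo "child sees: ${DEMO_LANG:-nothing}"'

unset DEMO_LANG

# Assignment followed by &&: a plain shell variable, not passed on
# unless it was already exported.
with_and=$( (DEMO_LANG=sv_SE.UTF-8 && sh -c "$child") )

# Assignment as a command prefix: placed in the child's environment.
with_prefix=$(DEMO_LANG=sv_SE.UTF-8 sh -c "$child")

echo "$with_and"
echo "$with_prefix"
```

On a desktop where LANGUAGE is already exported both forms behave the same, which is why the && variant can appear to work.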

Updating from pip 7.1.0 to pip 9.0.1 in python 3.6 virtualenv on windows

I’m using PyCharm, and I really like it. However, today I ran into a rather annoying problem.

  • I installed Python 3.6
  • I created a virtualenv for a new project
  • I noticed that pip was outdated – version 7.1.0 instead of 9.0.1
  • I tried to update it and things went south; see https://github.com/pypa/pip/issues/3964

The solution can be found buried in that issue, but for a tl;dr:

  1. Download https://bootstrap.pypa.io/get-pip.py
  2. Run it with the Python interpreter of the virtualenv

Every new virtualenv I create through PyCharm still installs pip 7.1.0.

This is because PyCharm has its own version of pip installed inside %APPDATA%\..\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\163.10154.50\helpers.

It looks like a python.egg, only tar.gz’ed. I wanted to add a corresponding file for pip 9.0.1, but I couldn’t find one online – so I guess I’ll have to wait until PyCharm updates.

Instead, for every 3.6 virtualenv created with old!pip:

Or just create the virtualenvs outside of PyCharm. 🙁

Make a win32 exe of a Python 3 flask app

Well, this was a bit more annoying than I had expected.

I first tried out py2exe, since that’s what I’ve used to successfully build exe files with before. No dice, it doesn’t like werkzeug and its weird module magic.

Next step: Moving on to cx_Freeze then, since people on StackOverflow had reported success. Almost there but..

  • it couldn’t find my data files
  • …nor my templates
  • …nor my static files

All of these were located next to my script & my “library.zip”. Despite the error messages trying to trick me, it was not enough to copy the files into the zip.

So I just need a way to tell Flask & Jinja how to find them – preferably while also allowing me to run everything as a non-exe in my development environment.

Here’s what I did:

__init__.py

my_app.py

setup.py

Log POST data with Apache

We want to debug our web applications, or rather the input to them. This is mainly POST data, so normal Apache logging doesn’t do the trick. [1]

I tried out mod_dumpio which at first glance looks like a perfect match – but it’s so spammy. I’m sure there are lots of uses for all that data, but it’s too much for our needs.

Next suggestion was mod_security, which has a rather intimidating reference manual.

First attempt:

Looks good! But look at all those plaintext passwords we’re logging. Not impressive.

My next attempt was to just filter out all log rows which contain the text “pass”. While it works, it looks like a hack (“yeah, uh… just stop auditing after this rule!”) and there might be interesting data in that log line that we want to log. For posterity, this is what I did:

I read through the manual some more and finally found sanitiseArg. [2]

SecAuditLog also supports pipes, so let’s cronolog it. [3]

So this is what we’re running. Still a bit spammy, but now we can toggle log parts, and easily add more filters.

References

1. Yes, the applications will have their own logging, eventually.
2. And I didn’t notice until this writeup that I’m using -ize while the manual states -ise. How about that. Thanks, dev team!
3. Of course logrotate works just fine, but I’m not a fan of just enumerating the files. Filenames with dates are easier to use.