Oliver is a PhD student and research assistant at the chair for Data Modelling and Knowledge Generation at the University of Bayreuth. He holds a BSc in Media Informatics and an MSc in Computer Science from the Ludwig-Maximilians-University in Munich. Oliver spent some time in corporate software development, designing and maintaining distributed systems for the Web.

His research interests include

  • contextualized ontologies,
  • natural language processing,
  • network analysis,
  • machine learning, as well as
  • interdisciplinary research at the intersection of computer science and the humanities.

Recent posts

A comparable counter for Python

In this post, we’re going to investigate Python’s powerful collections.Counter class, and how we can use it to, well, count stuff. We’ll find that comparisons with other Counter instances are supported, but unintuitive, and that scalar comparisons are not supported. We’ll take a plunge into special methods, a somewhat low-level concept that enables, among other things, comparisons between objects. With this knowledge, we’ll build a ComparableCounter class that we can use to apply a threshold to the actual value-counts and filter a Counter in a more intuitive way!

ca. 1700 words, 8 minute(s) reading time.
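
To give a flavour of the idea behind the Counter post, here is a minimal sketch of what such a class might look like. It is not the implementation from the post: the method bodies, the use of `numbers.Number` to detect a scalar, and the choice to return a filtered counter are all assumptions made for illustration.

```python
from collections import Counter
from numbers import Number


class ComparableCounter(Counter):
    """Illustrative sketch: comparing against a number filters by count."""

    def __gt__(self, other):
        # Assumption: a scalar on the right-hand side means "apply a threshold".
        if isinstance(other, Number):
            return ComparableCounter(
                {key: count for key, count in self.items() if count > other}
            )
        # Anything else falls back to whatever the parent class does.
        return super().__gt__(other)

    def __ge__(self, other):
        if isinstance(other, Number):
            return ComparableCounter(
                {key: count for key, count in self.items() if count >= other}
            )
        return super().__ge__(other)


letters = ComparableCounter("abracadabra")
frequent = letters > 1  # keep only the letters that occur more than once
```

With this sketch, `letters > 1` yields a new counter holding only the entries whose count exceeds the threshold, instead of raising a `TypeError` as a plain `Counter` would.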

Three patterns for speedy data processing with AWK

Have you ever fought with larger-than-memory (LTM) datasets? Wanted to filter some column-oriented data against another file, or aggregate counts on groups? In this post, we’ll explore the problem-space around LTM data, analyze how I failed (multiple times…), and look at three useful patterns involving awk!

ca. 1400 words, 7 minute(s) reading time.

A flock of fish: locking functions for the fish-shell

As I was hacking away at a shell-script the other day, I realised the script should prevent multiple instances from running at the same time. In my case, I wanted a simple egg-timer on the command line that runs for X minutes, shows a notification and beeps annoyingly, then exits. Obviously, there’s no point in having multiple timers stack up, so there should only ever be one ticking away. There are multiple other reasons you may only want a single instance running: a long-running backup-task should complete before the next iteration starts, or all writes to a file should be synced to disk before modifying the file again.