Have you ever fought with larger-than-memory datasets? Wanted to filter some column-oriented data against another file, or aggregate counts on groups? In this post, we’ll explore the problem space around larger-than-memory (LTM) data, analyze how I failed (multiple times…), and look at three useful patterns involving awk!
Oliver is a PhD student and research assistant at the chair for Data Modelling and Knowledge Generation at the University of Bayreuth. He holds a BSc in Media Informatics and an MSc in Computer Science from the Ludwig-Maximilians-University in Munich. Oliver spent some time in corporate software development, designing and maintaining distributed systems for the Web.
His research interests include
- contextualized ontologies,
- natural language processing,
- network analysis,
- machine learning, as well as
- interdisciplinary research at the intersection of computer science and the humanities.
You can contact him via email at firstname.lastname@example.org or through social media.
As I was hacking away at a shell script the other day, I realised the script should prevent multiple instances from running at the same time. In my case, I wanted a simple egg timer on the command line that runs for X minutes, shows a notification and beeps annoyingly, then exits. Obviously, there’s no point in having multiple timers stack up, so there should only ever be one ticking away. There are other reasons you may want only a single instance running: a long-running backup task should complete before the next iteration starts, or all writes to a file should be synced to disk before the file is modified again.
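To make the single-instance idea concrete, here is a minimal sketch of one common way to guard a script. It uses a lock directory, which works because `mkdir` fails if the directory already exists. This is an illustration under assumptions, not necessarily the approach this post settles on: the lock path and the helper names `acquire_lock` and `release_lock` are made up for the example.

```shell
#!/bin/sh
# Hypothetical lock location (an assumption for this sketch); every
# instance of the script must agree on the same path.
LOCKDIR="${TMPDIR:-/tmp}/egg-timer.lock"

# acquire_lock: succeeds only for the first caller, because mkdir
# fails when the directory already exists.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

# release_lock: remove the lock directory so a later run can start.
# Never fails, so it is safe to call more than once.
release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null || true
}

if acquire_lock; then
    # Release the lock whenever the script exits.
    trap release_lock EXIT
    # Some shells skip the EXIT trap when interrupted by a signal;
    # turning INT/TERM into a regular exit makes the cleanup run.
    trap 'exit 1' INT TERM
    echo "lock acquired, starting timer"
    # ... the actual work (sleep, notify, beep) would go here ...
else
    echo "another instance is already running" >&2
    exit 1
fi
```

One caveat with this sketch: if the script is killed with `SIGKILL` or the machine loses power, the lock directory stays behind and has to be removed by hand before the next run.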