In this post I explain how to optimize Pandas to process huge amounts of data, covering the three optimizations that allowed me to analyze more than 100 million rows and 59 columns on a regular computer: 1. looping correctly by using Pandas’ built-ins, NumPy, and SIMD vectorization; 2. tweaking dtypes; and 3. parallelizing across all CPU cores and unlocking bigger-than-RAM datasets with Dask.
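The first two optimizations can be illustrated with a minimal sketch, assuming pandas and NumPy are installed. The DataFrame and its `price`/`quantity` columns are hypothetical stand-ins for a real dataset, and the Dask part is not shown. The row loop exists only as a slow reference; the column-wise multiplication and the `pd.to_numeric(..., downcast=...)` calls are the optimizations.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million rows with two numeric columns.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "price": rng.uniform(1, 100, size=1_000_000),     # float64 by default
    "quantity": rng.integers(1, 50, size=1_000_000),  # int64 by default
})


def total_with_loop(frame: pd.DataFrame) -> list:
    """Slow reference: a Python-level loop over rows (avoid on big frames)."""
    return [row.price * row.quantity for row in frame.itertuples(index=False)]


# 1. Vectorized: one column-wise operation executed in compiled NumPy code.
df["total"] = df["price"] * df["quantity"]

# 2. Tweak dtypes: downcast each column to the smallest type that fits its values.
before = df.memory_usage(deep=True).sum()
df["price"] = pd.to_numeric(df["price"], downcast="float")          # -> float32
df["quantity"] = pd.to_numeric(df["quantity"], downcast="integer")  # -> int8
after = df.memory_usage(deep=True).sum()
print(f"memory: {before:,} -> {after:,} bytes")
```

Downcasting `float64` to `float32` trades precision for memory, so it is worth checking that the lost digits do not matter for the analysis at hand.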
Optimize Pandas & Dask for big datasets: the example
Execute anything when booting using systemd
A concise explanation of how to execute a script at boot with systemd using a regular user account instead of root.
Big Sheets – Domain Driven Design with a hexagonal architecture
Big Sheets is my attempt at building software with concepts from clean architecture, hexagonal architecture, Domain-Driven Design (DDD), and a bit of event-driven programming, following the amazing book Architecture Patterns with Python.
This post introduces the software’s architecture, so you can take a peek at the code afterwards, and discusses a few challenges and lessons learned from the experience. If you are a practitioner, Big Sheets can serve as an elaborate example.
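The core idea of a hexagonal architecture (the domain depends on a port, and adapters implement it) can be sketched in the repository-pattern style used by Architecture Patterns with Python. This is a minimal illustration, not Big Sheets’ actual code: `Sheet`, `AbstractSheetRepository`, `InMemorySheetRepository`, and `create_sheet` are hypothetical names.

```python
from __future__ import annotations

import abc
from dataclasses import dataclass, field


@dataclass
class Sheet:
    """Hypothetical domain entity: it knows nothing about storage."""
    name: str
    rows: list[list[str]] = field(default_factory=list)

    def add_row(self, row: list[str]) -> None:
        self.rows.append(row)


class AbstractSheetRepository(abc.ABC):
    """Port: the interface the service layer depends on."""

    @abc.abstractmethod
    def add(self, sheet: Sheet) -> None: ...

    @abc.abstractmethod
    def get(self, name: str) -> Sheet: ...


class InMemorySheetRepository(AbstractSheetRepository):
    """Adapter: one concrete implementation of the port (handy in tests)."""

    def __init__(self) -> None:
        self._sheets: dict[str, Sheet] = {}

    def add(self, sheet: Sheet) -> None:
        self._sheets[sheet.name] = sheet

    def get(self, name: str) -> Sheet:
        return self._sheets[name]


def create_sheet(name: str, repo: AbstractSheetRepository) -> Sheet:
    """Service-layer use case: it only talks to the port, never to a database."""
    sheet = Sheet(name)
    repo.add(sheet)
    return sheet
```

Swapping the in-memory adapter for, say, a SQL-backed one would leave `Sheet` and `create_sheet` untouched, which is the point of the pattern.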
The retry-requests package
Networks and servers are unreliable: they can fail, saturate… and our requests should handle these scenarios. I created a small Python package, retry-requests, to do it for us easily.
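The general technique, retrying a failed call with exponential backoff, can be sketched with the standard library alone. This is not the retry-requests API; `retry` and its parameters are hypothetical. The defaults catch the built-in `ConnectionError` and `TimeoutError`; with requests you would pass `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout` instead, since those do not inherit from the built-ins.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    attempts: int = 3,
    backoff_factor: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry the decorated call on `exceptions`, sleeping between attempts.

    The wait after failed attempt n (0-based) is backoff_factor * 2**n seconds,
    e.g. 0.5 s, 1 s, 2 s, ... with the default factor.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    time.sleep(backoff_factor * 2 ** attempt)
            raise AssertionError("unreachable")

        return wrapper

    return decorator
```

With requests specifically, a common alternative is to mount an `HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504]))` on a `Session`, using urllib3’s `Retry`, which also retries on selected HTTP status codes.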
Distribute your apps with apt
Debian’s app distribution tool (apt) is a simple, native, and frictionless way to manage apps and their dependencies. With just the combo apt update && apt upgrade you ensure that systems are up to date, including your software and its dependencies, and it works with Ubuntu and friends. In this tutorial we learn how to package apps, generating
Debug segmentation faults in Apache from mod_wsgi
In this guide we show how to get information from Apache segmentation faults that come from Python’s mod_wsgi.
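A complementary technique, not necessarily the one the guide uses, is Python’s standard-library `faulthandler`, which dumps the Python-level traceback when the process receives a fatal signal such as SIGSEGV. It shows only Python frames, not the C stack. A minimal sketch for the top of a WSGI script, assuming a writable log path; `FAULT_LOG_PATH` and the trivial `application` are hypothetical placeholders for the real app.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; pick any path the Apache user can write to.
FAULT_LOG_PATH = os.path.join(tempfile.gettempdir(), "wsgi-faulthandler.log")

# Keep a module-level reference: faulthandler writes to the file's raw
# descriptor, so the file must stay open for the life of the process.
_fault_log = open(FAULT_LOG_PATH, "a")

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, dump the Python traceback
# of every thread to the log before the process dies.
faulthandler.enable(file=_fault_log, all_threads=True)


def application(environ, start_response):
    """Minimal WSGI entry point, standing in for the real app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

A dedicated file is used here rather than `sys.stderr` because `faulthandler` needs an object with a real file descriptor.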
Create a Sphinx extension to customize your docs
With Sphinx it is easy to generate documentation for your Python project, as long as you don’t need custom code. This is a tutorial on how to create a quick & dirty Sphinx extension to customize the docs of your project.
DAGs with materialized paths using Postgres ltree
Building directed acyclic graphs (DAGs) using materialized paths with PostgreSQL’s ltree.
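The idea behind materialized paths for a DAG can be emulated in plain Python: because a node may have several parents, it gets one dot-separated, ltree-style path per route from the root, and "is X under Y" becomes a prefix check, which ltree answers in SQL with the `<@` operator (for example `path <@ 'root.b'`). This sketch assumes simple alphanumeric node names, which are valid ltree labels; `materialize_paths` and `is_descendant` are hypothetical helpers, not part of ltree.

```python
from __future__ import annotations

from collections import defaultdict


def materialize_paths(edges: dict[str, list[str]], root: str) -> dict[str, list[str]]:
    """Return every root-to-node path for each node, as ltree-style strings.

    A node with several parents gets several paths, so a table built from
    this would store one row per (node, path), not one path per node.
    """
    paths: dict[str, list[str]] = defaultdict(list)

    def walk(node: str, prefix: str) -> None:
        path = f"{prefix}.{node}" if prefix else node
        paths[node].append(path)
        for child in edges.get(node, []):
            walk(child, path)

    walk(root, "")
    return dict(paths)


def is_descendant(path: str, ancestor: str) -> bool:
    """Emulates ltree's `path <@ ancestor` (also true when they are equal)."""
    return path == ancestor or path.startswith(ancestor + ".")


# A small diamond-shaped DAG: c has two parents, a and b.
edges = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
paths = materialize_paths(edges, "root")
print(paths["c"])  # ['root.a.c', 'root.b.c']
```

The trade-off is that the number of paths can grow quickly when many nodes share descendants, in exchange for cheap ancestor and descendant queries.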
pip, Python packages, and venv
Python package and virtual environment management can be a bit tricky. This post covers some good and bad practices from a personal point of view.
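For reference, the standard library can create a virtual environment programmatically through the `venv` module, the same machinery behind `python -m venv`. A minimal sketch; `make_env` and `runs_inside_venv` are hypothetical helper names, and `with_pip=False` is chosen only to keep the example fast and offline.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def make_env(env_dir: Path) -> Path:
    """Create an isolated virtual environment and return its interpreter."""
    venv.create(
        env_dir,
        clear=True,                          # start from scratch if it exists
        symlinks=(sys.platform != "win32"),  # same default as `python -m venv`
        with_pip=False,                      # fast/offline; True bootstraps pip
    )
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


def runs_inside_venv(python: Path) -> bool:
    """Ask the interpreter whether it is running inside a virtual environment."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = make_env(Path(tmp) / "env")
        print(python, runs_inside_venv(python))
```

When pip is available in an environment, calling it as `<env python> -m pip install ...` makes it unambiguous which environment the packages land in.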
Router as a WiFi extender & switch
Did you know you can use your old, abandoned router as a switch and a WiFi extender? If you plug an Ethernet cable into the router, it will share that connection over WiFi and the remaining Ethernet ports. So you can, for example, take the router to another room with poor WiFi signal
Code in PyCharm on your PC and execute in a Linux VirtualBox VM
I need to develop software that only runs on Linux, but I program on a Mac. Moreover, I want to run it in isolation, as the software could potentially destroy the OS and I will run it as root. Let’s configure what we need to develop on our beloved machine A and run/debug in our
Useful SSH and SCP commands and config
SSH and SCP are really powerful commands. In this entry I quickly explain them and show some cool use cases and configurations.
Create a custom live Debian 9 and 10 the pro way
In the following tutorial we create a custom Debian 9 and 10 live installation image by using debian-live. Our live image carries some private software that we want to execute every time someone boots the live CD. The resulting live CD boots from a USB stick, a CD, or over the network via PXE.