Why Lua?
Felipe, a Cloud Architect and software engineer originally from Brazil, wanted to explore a language close to his roots: Lua was created in Brazil, at PUC-Rio. It is a small, embeddable language with a runtime that fits in just 300KB, and it is often used in game engines, embedded systems, and niche applications.
“It doesn’t have many features — and that’s exactly the point,” said Felipe. “I wanted something simple, stable, and fast. Plus, I was a bit ashamed as a Brazilian developer not to know Lua well.”
Baseline: 8.5 minutes to process One Billion Rows
Felipe started with the simplest approach: read each line one by one, extract the values, and update a hash map per station. The script ran in 8.5 minutes on a high-end MacBook Pro M2 Max with 64GB RAM.
Respectable for a scripting language — but nowhere near the Java benchmark.
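To make the baseline concrete, here is a minimal sketch of the "read a line, extract the values, update a hash map per station" loop. It is not Felipe's actual script: the function name `aggregate`, the field names, and the sample stations are illustrative assumptions, and the row format `Station;12.3` is the one used by the One Billion Row Challenge.

```lua
-- Baseline sketch: one pass over the rows, one table entry per station.
-- `lines` is any iterator that yields one row per call, e.g. io.lines(path).
local function aggregate(lines)
  local stats = {}
  for line in lines do
    local name, value = line:match("^([^;]+);(-?%d+%.?%d*)$")
    if name then
      local t = tonumber(value)
      local s = stats[name]
      if s then
        if t < s.min then s.min = t end
        if t > s.max then s.max = t end
        s.sum = s.sum + t
        s.count = s.count + 1
      else
        stats[name] = { min = t, max = t, sum = t, count = 1 }
      end
    end
  end
  return stats
end

-- With a real file (the path is a placeholder):
--   local stats = aggregate(io.lines("measurements.txt"))
-- Small in-memory demo with made-up rows:
local sample = "Recife;28.1\nCuritiba;12.4\nRecife;30.3\n"
local stats = aggregate(sample:gmatch("[^\n]+"))
```

Every row costs a pattern match, a `tonumber` call, and a table lookup keyed by the station name, which is the work the later versions try to reduce.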
Iterative optimizations: Lua style
Felipe didn’t stop there. With each iteration, he applied smart changes and squeezed more speed from Lua:
🔁 Version 2: Load entire file into RAM
Instead of reading line by line, the program now read the entire 13GB file into memory first.
Improvement: Processing time dropped to 6 minutes.
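A minimal sketch of the idea, assuming plain Lua I/O: read the file into one string with a single call, then scan that string in memory. The helper names `read_all` and `count_rows` are hypothetical, and the demo uses a tiny temporary file rather than the 13GB input.

```lua
-- Read a whole file into a single Lua string.
local function read_all(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")   -- "*a" is accepted by Lua 5.1 to 5.4 and LuaJIT
  f:close()
  return data
end

-- Scan the in-memory string row by row, with no further I/O calls.
local function count_rows(data)
  local n = 0
  for _ in data:gmatch("[^\n]+") do n = n + 1 end
  return n
end

-- Demo with a small temporary file and made-up rows
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
out:write("Recife;28.1\nCuritiba;12.4\nManaus;31.0\n")
out:close()
local data = read_all(path)
os.remove(path)
```

The trade-off is memory: the whole input has to fit in RAM at once, which the 64GB machine in the article can afford.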
⚙️ Version 3: Skip hash calculations
The input has a billion rows but only a few distinct station names, so most hash operations on those names are redundant. This version eliminated billions of them.
Improvement: smaller than hoped. The time decreased by only 10 seconds.
🚀 Version 4: LuaJIT… sort of
He tried LuaJIT, a just-in-time compiler for Lua. It didn’t work because of its memory limit (4GB RAM max), so Felipe moved to multi-processing instead: one Lua process per CPU core.
Result: even though each process still read its input line by line, the time dropped to 45 seconds.
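One way to run one Lua process per core from plain Lua is `io.popen`, which starts a child process and returns without waiting for it. The sketch below is an assumption about how such a parent process could look, not the code from the talk: the script name `worker.lua`, its arguments, and its output format (`name;min;max;sum;count`, one station per line) are all hypothetical. `io.popen` is also not available on every platform.

```lua
-- Start one worker per slice. io.popen returns immediately, so all workers
-- run concurrently. "worker.lua" and its arguments are hypothetical.
local function launch_workers(path, slices)
  local pipes = {}
  for i, s in ipairs(slices) do
    local cmd = string.format("lua worker.lua '%s' %d %d", path, s.first, s.last)
    pipes[i] = assert(io.popen(cmd, "r"))
  end
  return pipes
end

-- Fold one line of worker output into the combined table.
local function merge_line(total, line)
  local name, mn, mx, sum, count =
    line:match("^([^;]+);([^;]+);([^;]+);([^;]+);([^;]+)$")
  if not name then return end
  mn, mx, sum, count = tonumber(mn), tonumber(mx), tonumber(sum), tonumber(count)
  local s = total[name]
  if not s then
    total[name] = { min = mn, max = mx, sum = sum, count = count }
  else
    if mn < s.min then s.min = mn end
    if mx > s.max then s.max = mx end
    s.sum = s.sum + sum
    s.count = s.count + count
  end
end

-- Read every worker's output and merge the partial results.
local function collect(pipes)
  local total = {}
  for _, p in ipairs(pipes) do
    for line in p:lines() do merge_line(total, line) end
    p:close()
  end
  return total
end

-- Demo of the merge step alone, with two made-up partial results
local total = {}
merge_line(total, "Recife;281;303;584;2")
merge_line(total, "Recife;250;310;560;2")
```

Merging works because min, max, sum, and count can each be combined from partial values; the mean is computed only at the end, as sum divided by count.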
🔍 Version 5: Efficient multi-process design
Each process was optimized to read a slice of the file and aggregate results in parallel.
Result: Further improved to 30 seconds.
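Giving each process its own slice raises one practical question: a slice boundary chosen by byte count will usually land in the middle of a row. A common approach, sketched here as an assumption rather than as Felipe's implementation, is to move each boundary forward to the next newline. The function name `split_slices` and the demo rows are illustrative.

```lua
-- Split an open file into at most `n` byte ranges that each end just after a
-- newline, so no worker ever sees half a row. Ranges are half-open,
-- [first, last), measured in bytes from the start of the file.
local function split_slices(f, n)
  local size = f:seek("end")
  local target = math.floor(size / n)
  local slices, first = {}, 0
  for i = 1, n do
    if first >= size then break end
    local last = size
    if i < n then
      f:seek("set", math.max(first, i * target))
      f:read("*l")            -- skip to the end of the row we landed in
      last = f:seek("cur")    -- offset just past that row's newline
    end
    slices[#slices + 1] = { first = first, last = last }
    first = last
  end
  return slices
end

-- Demo with a small temporary file and made-up rows
local rows = "Recife;28.1\nCuritiba;12.4\nManaus;31.0\nBelem;27.5\n"
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
out:write(rows)
out:close()
local f = assert(io.open(path, "rb"))
local slices = split_slices(f, 2)
f:close()
os.remove(path)
```

Each worker can then seek to `first` and read `last - first` bytes, knowing that its slice starts and ends on a row boundary.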
Advanced tricks that pushed Lua to the limit
🧠 Preallocate hash tables
Pre-sizing the hash maps for the maximum of 10,000 stations avoided costly resizing as the tables grew, which gave another boost.
New time: 30 seconds → 20 seconds.
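How a table gets pre-sized depends on the runtime, and the article does not say which mechanism was used. The sketch below assumes LuaJIT 2.1, whose `table.new(narray, nhash)` extension allocates the array and hash parts up front. Stock Lua 5.1 to 5.4 has no pure-Lua equivalent, so the sketch falls back to an ordinary empty table there. `MAX_STATIONS` and the sample entry are illustrative.

```lua
-- table.new is a LuaJIT 2.1 extension; on other runtimes the require fails
-- and we fall back to a plain table that grows on demand.
local ok, table_new = pcall(require, "table.new")
if not ok then
  table_new = function(narray, nhash) return {} end
end

local MAX_STATIONS = 10000                 -- upper bound given in the article
local stats = table_new(0, MAX_STATIONS)   -- no array part, 10,000 hash slots

-- Inserting stations no longer triggers a rehash while the table fills up.
stats["Recife"] = { min = 281, max = 281, sum = 281, count = 1 }
```

Without pre-sizing, a Lua table rehashes each time it outgrows its current capacity, and every worker process pays that cost again.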
🔢 Parse numbers as integers
Instead of floating-point math, Felipe parsed temperatures as integers (e.g. 19.7°C becomes 197), speeding up calculations significantly.
New time: 6.5 seconds.
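The integer trick can be sketched with `string.byte`: read the digits directly and skip the decimal point, so 19.7 becomes 197 without calling `tonumber` or creating a substring. This is an illustration under the assumption that every reading has exactly one digit after the decimal point, as in the challenge input. The name `parse_tenths` is hypothetical and the function does no error handling for malformed rows.

```lua
local byte = string.byte
local ZERO, MINUS, DOT = byte("0"), byte("-"), byte(".")

-- Parse a reading such as "19.7" or "-5.3" that starts at position `pos` in
-- `s`. Returns the value in tenths of a degree (197, -53) and the position
-- just after the last digit.
local function parse_tenths(s, pos)
  local sign = 1
  local c = byte(s, pos)
  if c == MINUS then
    sign = -1
    pos = pos + 1
    c = byte(s, pos)
  end
  local n = 0
  while c ~= DOT do                 -- integer part, one digit at a time
    n = n * 10 + (c - ZERO)
    pos = pos + 1
    c = byte(s, pos)
  end
  n = n * 10 + (byte(s, pos + 1) - ZERO)   -- the single fractional digit
  return sign * n, pos + 2
end

-- Demo: the reading in "Recife;28.1\n" starts at position 8
local value, next_pos = parse_tenths("Recife;28.1\n", 8)
```

Min, max, and sum then stay in integer arithmetic, and the values only need dividing by 10 when the final results are printed.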
🧵 Overloading CPU cores
By running more processes than CPU cores, Lua kept the CPU busy during coordination time. He also disabled garbage collection and used FFI (foreign function interface) for lightweight C-based number parsing.
Final time: 2.8 seconds, a reduction of more than 99% from the original 8.5 minutes (roughly 180 times faster).
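Disabling garbage collection is a one-line change in Lua, through the standard `collectgarbage` function. The wrapper below is a sketch of how it could be scoped to the hot loop. `without_gc` is a hypothetical helper, and the summing function merely stands in for the real aggregation work.

```lua
-- Run `fn` with the garbage collector paused, then switch it back on.
-- Reasonable for a short-lived worker whose memory use is bounded, because
-- the operating system reclaims everything when the process exits.
local function without_gc(fn, ...)
  collectgarbage("stop")
  local ok, result = pcall(fn, ...)
  collectgarbage("restart")
  if not ok then error(result, 0) end
  return result
end

-- Stand-in workload
local total = without_gc(function(n)
  local sum = 0
  for i = 1, n do sum = sum + i end
  return sum
end, 1000)
```

With the collector stopped, memory only grows, so this fits code that allocates little per row, such as a loop that parses bytes in place and updates existing table entries.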
Lessons learned
Felipe’s journey wasn’t just about chasing performance. It revealed powerful lessons about:
Hardware-awareness: Understanding memory, disk I/O, and CPU usage can unlock massive improvements.
Simplicity in tooling: Lua might be tiny, but with the right design, it delivers serious power.
Iterative thinking: Felipe treated each version as an experiment. Measure, tweak, repeat.
Community collaboration: Inspired by the Java community, he borrowed ideas and adapted them to Lua’s constraints.