
This is what an "ETL" (Extract-Transform-Load) tool is for. Something like FME Server [1] would handle the first two points and the last point well.

For unzipping something that crazy, I'm interested in your solution; I think I'd have to write a custom zip library and use a RAM disk or similar.

1: https://www.safe.com/fme/fme-server/



Yes, that's ETL. Classic ETL dealt with databases; the modern variant has relaxed that constraint.

As for the zip: we simply run "unzip -p" and carefully stream-process the output with a custom program that reads the XML and transforms it. This cuts processing time from hours (extracting the zip, creating all the directories, then visiting each file) to minutes (reading from a single stream).
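For anyone wanting the same effect without a shell pipeline, here is a minimal Python sketch of the idea. It is an assumption, not the poster's program. It reads each XML member straight out of the archive with `zipfile.ZipFile.open`, which decompresses lazily, and parses it incrementally with `xml.etree.ElementTree.iterparse`, so nothing is extracted to disk. The helper name `stream_records`, the `item` tag, and the `.xml` filename filter are all hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator, Union


def stream_records(source: Union[str, BinaryIO], tag: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per <tag> element found in the archive's XML members.

    Hypothetical helper: nothing is written to disk; each member is
    decompressed and parsed incrementally as it is read.
    """
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            # Skip directory entries and non-XML files (assumed filter).
            if info.is_dir() or not info.filename.endswith(".xml"):
                continue
            # zf.open() returns a file-like object that inflates on demand.
            with zf.open(info) as member:
                for _event, elem in ET.iterparse(member, events=("end",)):
                    if elem.tag != tag:
                        continue
                    record = dict(elem.attrib)
                    record["text"] = (elem.text or "").strip()
                    record["source"] = info.filename
                    yield record
                    # Drop the element's children, text, and attributes
                    # once it has been turned into a record.
                    elem.clear()


if __name__ == "__main__":
    # Build a small archive in memory to demonstrate usage.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a/one.xml", "<root><item id='1'>x</item></root>")
        zf.writestr("b/two.xml", "<root><item id='2'>y</item></root>")
    buf.seek(0)
    for rec in stream_records(buf, "item"):
        print(rec)
```

Parsing per member also sidesteps one wrinkle of `unzip -p`: it writes every member to stdout back to back, so a downstream XML reader sees several documents concatenated rather than one well-formed document and has to split them itself. If the XML uses namespaces, `tag` must be given in ElementTree's `{uri}local` form.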



