While I love SQLite as much as the next person (and the performance and reliability is really quite remarkable), I can’t understand all the effusive praise when you can’t do basic things like dropping columns. How do people get around this? Do you just leave columns in forever? Or go through the dance of recreating tables every time you need to drop a column?
SQLite is for storing data in an environment where SQL is meaningful. Anyone wanting to do database admin tasks (like adjusting table schema) would be well advised to go with a Real Database.
SQLite really shines when you have a more-or-less final idea of what tables you want to build and don't have to worry about users inputting crazy data. Anything up to that and it is a model of simplicity. Anything beyond that and it is undercooked.
I just sucked the existing table into RAM and recreated the table. I did it in a transaction so there was no risk of data loss.
In my case the data was always 10s of MBs.
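Roughly, that looks like this in SQLite. This is only a sketch; events, id, and payload are made-up names, and PRAGMA temp_store = MEMORY keeps the temp copy in RAM:

pragma temp_store = MEMORY;  -- hold the temp copy in RAM
begin;
-- copy only the columns we want to keep
create temp table events_backup as select id, payload from events;
drop table events;
create table events (
  id integer primary key,
  payload text
);
insert into events select id, payload from events_backup;
drop table events_backup;
commit;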
Remember, the point of SQLite is to be a replacement for inventing your own file format. Although it's a database, it lets us (developers) re-use our knowledge of databases when doing basic file I/O.
When do you need to drop a column in a production DB? Maybe my anecdotal bubble is about to burst, but I've worked in the public sector for a while, and across our 200 different production DBs behind around 300 systems we've never dropped a column.
Depends on the maturity of your schema - if it's all figured out based on your business domain it won't happen much. If you're still finding product-market fit (or equivalent) splitting a table into two happens sometimes.
"Splitting" a table usually means creating two new ones and dropping the old one after migrating its content with a complex migration script followed by thorough testing. Dropping columns is not only abnormal (adding columns is far more common: features tent to be added, not removed, over time) but also a very crude tool.
Well what I meant was: when you break one table out of another. The kind of thing that comes up when you learn that there's a one-to-many in the domain that you didn't know about when you started.
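For illustration, a minimal sketch of that kind of split, assuming a made-up orders table that started out with customer fields inlined:

begin;
create table customers (
  id integer primary key,
  name text,
  email text
);
insert into customers (name, email)
select distinct customer_name, customer_email from orders;

create table orders2 (
  id integer primary key,
  customer_id integer references customers(id),
  total int
);
insert into orders2 (id, customer_id, total)
select o.id, c.id, o.total
from orders o
join customers c on c.name = o.customer_name
               and c.email = o.customer_email;

drop table orders;
alter table orders2 rename to orders;
commit;

A real migration would also have to worry about duplicates and NULLs in the matching columns, plus all the testing the parent comment mentions.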
There are also operational concerns here. Dropping columns may require rebuilding indices, which can have a high cost that isn't worth paying for just to keep the schema clean.
Pretty sure they must: with row-based storage on disk, you'd practically have to rewrite the rows, otherwise you'd leave a gap in every single row and waste all the space you just gained from deleting the column.
If adding a nullable column is free, it probably means that the DBMS is able to distinguish multiple layouts for the same table: existing rows in which the new column doesn't actually exist and is treated as NULL, and newly written rows in which there is space for the new column.
But dropping a column is different: even if the DBMS performs a similar smart trick (ignoring the value of the dropped column that is contained in old rows) space is still wasted, and it can only be reclaimed by rewriting old files.
Dropping a column in postgres is also instant, so yes, it uses the same trick.
Deleting a row is similar too - the row is not removed from the heap page and the database does not get smaller (though if that page gets rewritten, the row is not kept). Last time I used InnoDB it didn't actually return free heap pages to the filesystem at all, so no matter how much you deleted, the database file never got smaller.
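You can watch Postgres do this trick from the catalog side. A sketch, assuming a hypothetical table t with a payload column:

alter table t drop column payload;  -- returns immediately; metadata-only

-- the column is still in the catalog, just flagged as dropped
select attname, attisdropped
from pg_attribute
where attrelid = 't'::regclass and attnum > 0;

-- old rows keep the dead bytes until the heap is rewritten
vacuum full t;  -- rewrites the table and reclaims the space

Note that VACUUM FULL takes an exclusive lock on the table while it rewrites, which is exactly the cost the trick avoids paying at ALTER time.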
I agree with you. SQLite drove me nuts when it came to changing the schema. This is one of the reasons I just use DB Browser for SQLite (DB4S). It takes care of all the complexity.
The general strategy is to create a new table, insert the data from the old table, drop the old table, rename the new table, and re-create the indexes, all inside a transaction:
begin;
create table foo2 (
  col1 int,
  col2 text
);
insert into foo2 select col1, col2 from foo;
drop table foo;
alter table foo2 rename to foo;
create index foo_col1 on foo(col1);  -- sqlite requires a name for the index
commit;
As for the reason, see the next section at that link. It's not perfect, but it is what it is. SQLite wasn't designed to solve 100% of the use cases, and that's not a bad thing.
We use sqlite as a smaller cache of a much larger postgres db. The cache is generated by a job and, yes, is regenerated every time before being pushed to the machines that need it.
Think of SQLite as a file format which happens to have a query interface, and not a database.
MySQL did DDL for years with full table rewrites behind the scenes. It's not particularly hard to emulate, though not entirely pleasant.
(Although I really raise an eyebrow when people talk about performance. Every time I write anything server-like in SQLite I end up regretting it, and get an enormous speed boost, not to mention actual concurrency, by switching to PostgreSQL.)
For data analysis workloads I just load in my raw source data and then develop a series of scripts that create new tables or views on top of those raw inputs.
For my use cases I've thus never had to drop/alter a column... but I understand it could be very annoying.
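As a sketch of that pattern (the table and column names here are invented), the raw table stays untouched and cleanup lives in a view you can drop and recreate freely:

-- raw data lands as-is, everything as text
create table raw_sales (
  sold_at text,
  amount text,
  region text
);

-- cleanup happens in a view layered on top
create view sales_clean as
select
  date(sold_at)        as sold_on,
  cast(amount as real) as amount,
  trim(region)         as region
from raw_sales
where amount is not null;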