The Design Of SQLite4 (sqlite.org)
252 points by ingve on May 25, 2014 | hide | past | favorite | 67 comments


Doubling down on the key/value model gives me the impression that a real and fully functional ALTER command is becoming even more unlikely than it already is, not to mention that this widely desired feature is not even mentioned here.

A key advantage to the relational model is that we get to have schemas. But if you can't make reasonable changes to those schemas without explicitly rewriting all your data from scratch, that works pretty hard against those advantages. Note that an ALTER command that rewrites all the data automatically, while it would perform like crap, would be better than none at all. Because that's what everyone is doing anyway.


I'm perplexed how you arrived at this conclusion from the linked document.. every major DBMS with a "fully functional ALTER" has a key/value store underneath it, as did SQLite 3, as do PostgreSQL and MySQL, as does..

The only difference with the new design is that the SQL engine and storage engine have been explicitly modularized and exported as a public API, potentially allowing simultaneous k/v and SQL use within a single transaction, or flexible use of SQL with an existing store.


> I'm perplexed how you arrived at this conclusion from the linked document..

it is based on:

https://news.ycombinator.com/item?id=5887053

> I am very familiar with SQLite internals. The answer is already in there. SQLite stores each row as each column value encoded sequentially corresponding to the declared order of the columns. Changing column order or deletions/inserts require a rewrite of every row...

> ...A SQLite provided ALTER TABLE implementation would do exactly what was stated - start a transaction, rename the existing table to a temporary name, create a new one with the desired schema, and copy data across mangling as appropriate before deleting the old table and finishing the transaction.

the document here appears to suggest that the internal structure idea is being largely maintained as is, except that it will be organized in one giant blob, rather than blob-per-table. If the structure were being changed such that ALTER were suddenly much more feasible, I'd assume that would be one of the giant headlines of this story. But it's not.

Of course we'd always welcome hearing from actual SQLite developers someday on this issue.


SQLite 4 will have one storage engine record for each table row, just as it was in SQLite 3.

Fundamentally, there is little difference in how SQLite stores data in the storage engine as compared to, say, Postgres, except that Postgres' SQL implementation is more tightly bound to its storage engine (e.g. index tuples are encoded using knowledge of the engine, whereas in SQLite they simply use the record's primary key).

Postgres and suchlike don't have some magical data structure that makes ALTER TABLE possible, all row oriented stores have the same choice: either implement the alter immediately (involves a scan and rewrite) or lazily (on next record update).

Not sure if there are any that take the latter approach, but the method is commonplace elsewhere


To be able to defer the re-encoding of rows (e.g. when ALTER changes column order) the row header must store the version/identifier of the format used to encode it. You can also do it block-at-a-time instead of row-at-a-time, so only each block would store a version/identifier of the format. But any way, deferrable re-encoding means some storage overhead and inconsistent UPDATE performance after ALTER until all rows (or all blocks) have been re-encoded.
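A hypothetical sketch of that versioned-row idea (all names invented; this is not SQLite's actual on-disk format): each stored row carries the schema version it was encoded under, and is re-encoded only when next touched.

```python
# Upgrade steps keyed by the schema version they upgrade *from*.
UPGRADES = {
    1: lambda row: {**row, "country": None},       # v1 -> v2: ALTER ... ADD country
    2: lambda row: {"uid": row.pop("id"), **row},  # v2 -> v3: rename id -> uid
}
CURRENT_VERSION = 3

def read_row(stored):
    """Decode a (version, row) pair, re-encoding lazily if it is stale."""
    version, row = stored
    while version < CURRENT_VERSION:
        row = UPGRADES[version](dict(row))  # copy, then apply one upgrade step
        version += 1
    return version, row

# A row written before two ALTERs is upgraded transparently on first read.
version, row = read_row((1, {"id": 7, "name": "alice"}))
```

This makes the storage overhead (a version tag per row) and the uneven post-ALTER performance mentioned above concrete: the first touch of a stale row pays for all pending upgrade steps.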


I don't think "lazy alter" is a particularly good idea in an RDBMS, as it could lead to the operation failing after its transaction has been committed, which is something that is generally avoided in relational databases.


1) That would be true for any row-oriented DBMS.

2) LSM tree is blazingly fast at inserts (my own experiments show a two-orders-of-magnitude difference between BerkeleyDB and my own LSM tree implementation in C# for bulk random inserts).

So I think SQLite4 is safe here.


Great! So they can add a full blown ALTER. But so far, the only indication we can get is, "never".


Please correct me if I'm wrong, but isn't a major use-case for SQLite a database for embedded systems? In that light, I can't picture too many situations off the top of my head where one would want to change the schema.

That being said, I've personally seen SQLite used for more prototype-y stuff (default db for new instances of a Rails app, ad-hoc data stores for mobile phone apps, etc)


I am asked constantly for ALTER support in SQLite, including by the maintainer of GNU Mailman, which is neither embedded nor a prototype one-off. SQLite is extremely flexible and used in an enormous variety of situations.

References:

http://dustycloud.org/blog/sqlite-alter-pain/

https://news.ycombinator.com/item?id=5886898

https://bitbucket.org/zzzeek/alembic/issue/21/column-renames...

plus https://bitbucket.org/zzzeek/alembic/issue/129/column-additi... in case sqlite devs care to look at that one...


Very interesting! Thanks for the references.

Having read some of these, though, it's pretty clear that ALTER support is a major pain point. Would you have any insight as to why the SQLite maintainers would not choose to prioritize that issue? From what I know of SQLite, it's a very well built piece of software. It seems odd to me that such a smart bunch of guys would disregard community feedback without a good reason.


The problem is that it's hard to support for a database like sqlite. Things have a very specific on-disk representation, so you can't just add data to the middle of a file; rewriting the file also means others can't read it at the same time, so you can't do long operations.

When you look at this operation on a normal database, it isn't too bad. The database can change the on-disk representation while still serving requests, and it can use whatever storage it wants, since it's the only one that operates on the data.


Wouldn't a database generally be offline when you ALTER it? At least one that is used in scenarios where SQLite is appropriate.

In that case it should not hurt too much if the operation takes some time.


> Please correct me if I'm wrong, but isn't a major use-case for SQLite a database for embedded systems?

SQLite is used in OSX (spotlight metadata, Core Data framework, iTunes, Mail), iOS, Android, WebSQL, etc.

I am pretty sure every computer gadget you own runs some instance of SQLite: anti-virus apps, Firefox, Thunderbird, Oracle products, Skype, Adobe software, PHP (SQLite2 & SQLite3), Qt, and even many Windows Phone 8 apps incorporate the SQLite3 library.


It's needed, for example, when upgrading applications that use sqlite for storage. Like in the mobile phone case, if a new version of the app comes out that has a new feature that requires a new column in one of the tables, when you upgrade the app on your phone, the app probably wants to migrate the sqlite db to the new schema rather than throwing away all your saved data.
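That upgrade flow can be sketched with Python's bundled sqlite3 module: the user_version pragma records which schema revision a database file is at, and any pending steps run on app startup. Table and column names here are invented; note that ADD COLUMN is one of the few ALTER operations SQLite3 does support.

```python
import sqlite3

# Invented example schema; each list entry advances user_version by one.
MIGRATIONS = [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)",  # -> version 1
    "ALTER TABLE notes ADD COLUMN created_at TEXT",            # -> version 2
]

def migrate(conn):
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    for statement in MIGRATIONS[version:]:
        version += 1
        with conn:  # each step committed in its own transaction
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {version}")
    return version

conn = sqlite3.connect(":memory:")
version = migrate(conn)  # a fresh db runs both steps; an old one only the tail
```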


This is something I hadn't considered. Being able to do firmware updates to a device with an embedded instance of SQLite seems huge.

Seeing as ALTER support is currently lacking, what kinds of approaches are commonly taken to work around this issue? I'm particularly interested in hearing about how people work around this problem for things like upgrading embedded devices. I imagine the scarcity of resources (RAM, CPU) available to most embedded devices would make this a difficult problem to solve on the hardware at hand.


The usual way to deal with that (and recommended by the SQLite docs) is creating a new table, copying the data, and dropping the old table.
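The rebuild dance can be sketched with Python's bundled sqlite3 module; here a column is dropped, which SQLite3's ALTER TABLE cannot do. Names are invented, and a real migration would also have to recreate indexes, triggers, and views, which is where the error-proneness comes in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "alice", 0), (2, "bob", 1)])
conn.commit()

with conn:  # the whole rebuild in one transaction
    conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users_new SELECT id, name FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")
```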

Needless to say, this is VERY error prone, and limited resources make it even more so.


People want to change schema in embedded systems, programs get updated. Having an explicit mechanism to deal with that is one thing IndexedDB and webSQL got right.

We use ALTER tables in pouchdb on webSQL schema upgrade, I am not familiar with the missing functionality, we don't do a lot of complicated things though.


I think at least part of the confusion here is the definition of 'embedded'. To some people 'embedded' means 'within the same process or thread as the core application', to others it means 'firmware running on small hardware devices'.


> To some people 'embedded' means 'within the same process or thread as the core application'

Only that definition is silly here, since SQLite is by its serverless nature always embedded inside another program.


> Please correct me if I'm wrong, but isn't a major use-case for SQLite a database for embedded systems?

What makes you think that? It is used in many Desktop applications. E.g. most browsers (Firefox, Chrome, Safari), Skype and apparently even Apple Mail, iTunes and the Steam runtime use SQLite. Do you want to store complex indexed data in your desktop or smartphone application? Maybe something like a music library? Then you usually use SQLite. Even if you store relatively simple data you may use SQLite: no need to invent your own data file format. SQLite runs virtually everywhere and has bindings for virtually every programming language.

And as someone else already said: applications get updates that add features and need to update their DB accordingly.


I agree that adding this, even if it is horribly inefficient, would be better than everyone reimplementing it at the application layer. I just don't see how you've drawn this conclusion from the linked summary. This sounds to me like a pet gripe of yours and you are just looking for a forum to air it.


We had a discussion about SQLite4 on HN (2012): https://news.ycombinator.com/item?id=4168645

Someone mentioned it on the SQLite mailing list and the first reply was this:

  LOL.  Those Hacker News guys are hardcore.  Make some of
  my mailing lists look almost civil.
-- http://sqlite.1065341.n5.nabble.com/SQLite4-don-t-scream-td6...


"SQLite4 requires all elements of the PRIMARY KEY to be non-null. This is an SQL standard. Due to an oversight in early versions, SQLite3 does not enforce the NOT NULL constraint on PRIMARY KEY columns since by the time the omission was discovered SQLite3 was in such widespread use, activation of NOT NULL enforcement would have broken too many programs."

I find this sad, and I wonder what those too many programs are. Why would one put NULL on PRIMARY KEY?
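The quirk is easy to reproduce with Python's bundled SQLite3: a TEXT (i.e. non-INTEGER) PRIMARY KEY column accepts NULL, and since NULLs compare as distinct in unique indexes, it even accepts several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (word TEXT PRIMARY KEY)")
conn.execute("INSERT INTO t VALUES (NULL)")  # accepted without error
conn.execute("INSERT INTO t VALUES (NULL)")  # even twice: NULLs are distinct
rows = conn.execute("SELECT word FROM t").fetchall()
```

(An INTEGER PRIMARY KEY is the exception: there a NULL is silently replaced with an auto-assigned rowid rather than stored.)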


Thankfully from what I have seen sqlite is maintained with that old-school style of library development, the one you scarcely see these days, and don't hear much about on HN: the one where "thou shalt not break existing code" has some weight. I'd be nervous about depending on a library where the whim of the developer thinking "I find this sad" breaks my program.

sqlite4 OTOH seems to be the "let's break things" phase. Probably healthy to have both attitudes at certain stages.


Upvoted. SQLite as an API does so many things right:

- Providing its own malloc/free "proxies" to work around different linked CRT versions (Microsoft)

- Prefixing ("namespacing") everything, and not exporting symbols outside of the API

- Configuration of the API before it's started, where it makes sense (threading model for example)

- Ability to retrieve compilation options back (e.g. how it was compiled and with what features)

- many other things


Agree with you on the importance of backwards compatibility. But the idea that you should at some point draw the line and break compatibility so that you can really fix things is also defensible. In my opinion, by releasing the library/project under a new name, like Python does.

The worst and most common, though, is "break all things for no reason at all"...


Hopefully there aren't many people designing data models with NULL primary keys.

However, if SQLite never enforced non-NULL behavior then it's certainly possible that along the way some applications could have written an errant record to a database with a NULL primary key (instead of treating it as an error case). And if a "fixed" version of SQLite3 ever shipped, then these previously-valid data files would suddenly become invalid.

Imagine being an app developer and having random customers whose data files suddenly break because the OS happened to upgrade the SQLite3 library to a version that included this fix. I completely understand why the SQLite maintainers may have decided it wasn't worth the hassle to fix things prior to SQLite4.


> the OS happened to upgrade the SQLite3 library

I don't think this actually detracts from your point, but, are there cases where that can actually happen? Isn't SQLite always statically linked with the application?


E.g. on iOS and Android you get access to the system's SQLite library and apparently can't even opt-out (because it is used by system APIs that you will use, if I understand correctly). See: https://news.ycombinator.com/item?id=7756332


> Isn't SQLite always statically linked with the applicaton?

SQLite can be dynamically linked, though static linking is probably more common.


According to

`apt-cache rdepends sqlite3`

on Ubuntu 14.04, packages including the following link to SQLite dynamically (this may be partially due to Debian heavily discouraging statically linked packages):

  zabbix-proxy-sqlite3
  yokadi
  uwsgi
  usermetricsservice
  trove-common
  sympa
  sqliteman
  sisu-sqlite
  rt4-db-sqlite
  roundcube-sqlite3
  redmine-sqlite
  ratbox-services-sqlite
  postfix-cluebringer-sqlite3
  pdns-backend-sqlite3
  pbnj
  opendnssec-enforcer-sqlite3
  lire
  krecipes
  kamailio-sqlite-modules
  inspircd
  imms-common
  hud-tools
  gbrowse
  eurephia
  dotclear
  beancounter
  adminer
  bacula-director-sqlite3


sqlite3 is the package that provides the command-line tool. A lot more packages depend on the dynamic library: "apt-cache rdepends libsqlite3-0 | wc -l" returns 407 results.


It's not super unreasonable for a multi-element primary key, is it? For example, at work we have a MySQL table that stores advertising stats with a (Publisher ID, Country, Day) primary key[1]. All those fields are non-nullable, but I can imagine someone using NULL for the country[2] if we couldn't determine it (it's based on IP).

[1] It's slightly more complicated, but it's basically similar.

[2] "zz" might be appropriate to represent an unknown country, but other domains might not have a good default value

Edit: double asterisks messing up my formatting


it's unreasonable. NULL isn't a value, it means, "the value can't be known". If a primary key of a table has part of it that is unknown, you've designed your schema completely incorrectly, and you should brush up on normal form: http://en.wikipedia.org/wiki/Database_normalization#Normal_f...


If you think about it for a couple of minutes you will [probably] understand that "gaps" (NULLs) in a PRIMARY KEY column are plain stupidity.


Actual table alterations please, so ORM developers don't have to use a load-bearing hack to implement migrations in SQLite.


I'm sure this sounds arbitrary, but I'd love it if the SQLite file header allowed for more room to add program-specific metadata. Given that more and more apps are using SQLite files to store user-facing documents, it would be nice if space in the header were set aside to identify the expected usage of the file.


You're aware of the application_id pragma? That's what this guy was created for: http://www.sqlite.org/pragma.html#pragma_application_id


Hah, I read that and thought "Hmmm, I wonder if Gus knows about this? He should use it in Acorn". Then I read your username :)


That's good, but it's not enough. Metadata support would be useful!

E.g. it's like a photo (JPEG) where you would only know it was shot with a Canon camera, or a music file (MP3) that only stores the name of the media player application.

Most common file formats support metadata standards, like JPEG with EXIF, IPTC, and XMP, and MP3 with ID3 v1 & v2, etc.

So for SQLite4 a very simple key/value metadata format would be great, with some predefined keys.


You can always embed a "metadata" table in your schema. Not ideal but would probably do the job in a lot of cases.
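A minimal sketch of that workaround (table and key names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("application", "ExampleApp"), ("schema_version", "3")],
)
value = conn.execute(
    "SELECT value FROM metadata WHERE key = 'application'"
).fetchone()[0]
```

The limitation is that, unlike a header field, this is invisible to tools that don't open the database and know the table's name by convention.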


Ah, no I wasn't. The file format page simply marks that region as reserved. http://www.sqlite.org/fileformat.html


Reading the "Key Changes" section, it looks like the sqlite4_env environment will address the global mutex problem that was discussed in a recent HN thread.

The pluggable db engine looks interesting but if I remember right, in MySQL the pluggable db engine initially attracted some niche providers but the marketplace of ideas eventually settled on MyISAM/InnoDB.

What's still not clear from the Executive Summary is if there would be any compelling use case for the older SQLite3 for reasons other than backward compatibility. It would be great if the SQLite designers could answer: "You'd want to use v3 instead of v4 for greenfield projects if...?"


I don't think there is an answer to your question beyond "it's been battle tested extremely well", but there are strong reasons to move to sqlite4 (in particular, multiple linked instances in eg mobile apps and mobile OSes). It's not like Python where the weight of the existing community has drawn you towards python 2 (something that has changed a lot recently).


That's fair but I guess I was looking for a stronger statement from them because even for new minor point releases, the sqlite.org home page says clearly:

"Version 3.8.4.3 of SQLite is recommended for all new development."

Every time they change the "z" in v.x.y.z, they'll still state "is recommended for all new development."

So, it seems like they could just clearly say, "SQLite4 is recommended for all new development."

...unless there's some architecture tradeoff (unrelated to backward compatibility) that still makes SQLite3 desirable over SQLite4.


Sqlite3 is used in so many places, they would be fools to deprecate it. Maybe in five years, they'll revisit the issue, but for now they want to improve the fundamentals of sqlite.

One good thing is I expect most of their testing code can be ported over to sqlite4 http://www.sqlite.org/testing.html


I don't know if the author of SQLite also reads here, but...

I am not sure whether having a single key space might have counter-productive effects on overall speed.

One example: you want to scan all db entries of one table, and the keys are random English words. The table itself only has several hundred entries, but there are also tables with text keys and millions of entries.

If I understand the concept right (I may not), wouldn't that mean that a huge (shared) index would have to be searched and many, many entries skipped, because of the single key space? Wouldn't that hurt performance very much for that specific operation?
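For what it's worth, the design document has each table's keys share a distinct prefix (a table number), so one table's rows occupy a contiguous range of the single keyspace and a scan never wades through other tables' entries. A toy sketch of why the prefix keeps scans cheap:

```python
import bisect

# Keys modeled as (table_number, primary_key) tuples in one sorted keyspace.
keys = sorted(
    [(2, w) for w in ("apple", "zebra")]          # small table, number 2
    + [(9, f"word{i:07d}") for i in range(1000)]  # big neighbour, number 9
)

# Scanning table 2 is a range scan between (2,) and (3,): the big table is
# skipped in one step (binary search), not entry by entry.
lo = bisect.bisect_left(keys, (2,))
hi = bisect.bisect_left(keys, (3,))
table2 = keys[lo:hi]
```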


I wonder if this will have any implications for spatialite? If spatialite were a little better able to make automatic use of spatial indexes, I could easily see it becoming the default foundation for many open source GIS.


Is there any evidence this is actually moving ahead? Seems somewhat stalled.


Is this available already? I'm quite confused.


Doesn't look like it from the home page.


Think I might start a kickstarter to fund PostgreSQL-SQLite replication over WebSocket. Good idea?


Yes! I need a sync engine between both of these, and it is harder than I thought. I wish to join forces on this, but there's not much love for SQL these days, with everyone saying "all backends are NoSQL!"


I love sqlite and find it extremely useful. I just wish they could figure the multi-user issues out (maybe it's not practical...probably not, I don't know how it works internally). But, if they could do this it would be truly awesome!


From: http://www.sqlite.org/whentouse.html

SQLite is not designed to replace Oracle. It is designed to replace fopen().


Easy: Put a server process in front of it and route access to the DB through that. :P


Doesn't that just defeat the purpose? At that point you're probably much better off just using PostgreSQL (or similar).


It might be a neat solution if you already have a server process for other tasks anyway and the data access needs are quite simple.


That's what I'm thinking. Probably less trouble just to set up postgres at that point and not deal with unexpected troubles and complications.

But I hadn't thought of putting a server process in front of sqlite so thanks for the tip. It might come in handy at some point.


Would someone care to explain why they downvoted that comment?


It was a joke! Hence the ":P".


I did that, using Go and its channels feature. Not much code and works like a charm.
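The same funnel-everything-through-one-owner pattern can be sketched in Python with a queue and a worker thread (the Go version would use a channel and a goroutine); all names here are invented.

```python
import queue
import sqlite3
import threading

requests = queue.Queue()  # all callers funnel through this

def db_worker(path):
    # The only thread that ever touches the connection.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    while True:
        sql, params, reply = requests.get()
        if sql is None:  # shutdown sentinel
            conn.close()
            return
        reply.put(conn.execute(sql, params).fetchall())
        conn.commit()

worker = threading.Thread(target=db_worker, args=(":memory:",))
worker.start()

def run(sql, params=()):
    """Send a statement to the owning thread and wait for its result."""
    reply = queue.Queue()
    requests.put((sql, params, reply))
    return reply.get()

run("INSERT INTO log VALUES (?)", ("hello",))
rows = run("SELECT msg FROM log")
requests.put((None, None, None))
worker.join()
```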


The new features look promising. I'd also like to see UPDATE ... FROM. It's kludgy to achieve the equivalent of this in SQLite3.


Huh. Reading this makes me realize that I can totally use SQLite4 as an alternative array backend for my array lib. It'll be a cute and likely useless hack! :-)


I did a skim read. So what exactly is new? Because I thought this design document was a few years old already.

When is SQLite 4 expected?


My db engine also has pluggable key-value back ends. It gives great flexibility.



