The Design Of SQLite4 (sqlite.org)
252 points by ingve on May 25, 2014 | hide | past | favorite | 67 comments


Doubling down on the key/value model gives me the impression that a real and fully functional ALTER command is becoming even more unlikely than it already is, not to mention that this widely desired feature is not even mentioned here.

A key advantage to the relational model is that we get to have schemas. But if you can't make reasonable changes to those schemas without explicitly rewriting all your data from scratch, that works pretty hard against those advantages. Note that an ALTER command that rewrites all the data automatically, while it would perform like crap, would be better than none at all. Because that's what everyone is doing anyway.


I'm perplexed how you arrived at this conclusion from the linked document.. every major DBMS with a "fully functional ALTER" has a key/value store underneath it, as did SQLite 3, as do PostgreSQL and MySQL, as does..

The only difference with the new design is that the SQL engine and storage engine have been explicitly modularized and exported as a public API, potentially allowing simultaneous k/v and SQL use within a single transaction, or flexible use of SQL with an existing store.


> I'm perplexed how you arrived at this conclusion from the linked document..

it is based on:

https://news.ycombinator.com/item?id=5887053

> I am very familiar with SQLite internals. The answer is already in there. SQLite stores each row as each column value encoded sequentially corresponding to the declared order of the columns. Changing column order or deletions/inserts require a rewrite of every row...

> ...A SQLite provided ALTER TABLE implementation would do exactly what was stated - start a transaction, rename the existing table to a temporary name, create a new one with the desired schema, and copy data across mangling as appropriate before deleting the old table and finishing the transaction.

the document here appears to suggest that the internal structure idea is being largely maintained as is, except that it will be organized in one giant blob, rather than blob-per-table. If the structure were being changed such that ALTER were suddenly much more feasible, I'd assume that would be one of the giant headlines of this story. But it's not.

Of course we'd always welcome hearing from actual SQLite developers someday on this issue.


SQLite 4 will have one storage engine record for each table row, just as it was in SQLite 3.

Fundamentally, there is little difference in how SQLite stores data in the storage engine as compared to, say, Postgres, except that Postgres' SQL implementation is more tightly bound to its storage engine (e.g. index tuples are encoded using knowledge of the engine, whereas in SQLite they simply use the record's primary key).

Postgres and suchlike don't have some magical data structure that makes ALTER TABLE possible, all row oriented stores have the same choice: either implement the alter immediately (involves a scan and rewrite) or lazily (on next record update).

Not sure if there are any that take the latter approach, but the method is commonplace elsewhere


To be able to defer the re-encoding of rows (e.g. when ALTER changes column order) the row header must store the version/identifier of the format used to encode it. You can also do it block-at-a-time instead of row-at-a-time, so only each block would store a version/identifier of the format. But any way, deferrable re-encoding means some storage overhead and inconsistent UPDATE performance after ALTER until all rows (or all blocks) have been re-encoded.
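A hypothetical sketch of that versioned-row idea (all names invented; this is not SQLite's actual on-disk format): each stored row carries the schema version it was encoded under, and is re-encoded only when next touched.

```python
# Upgrade steps keyed by the schema version they upgrade *from*.
UPGRADES = {
    1: lambda row: {**row, "country": None},       # v1 -> v2: ALTER ... ADD country
    2: lambda row: {"uid": row.pop("id"), **row},  # v2 -> v3: rename id -> uid
}
CURRENT_VERSION = 3

def read_row(stored):
    """Decode a (version, row) pair, re-encoding lazily if it is stale."""
    version, row = stored
    while version < CURRENT_VERSION:
        row = UPGRADES[version](dict(row))  # copy, then apply one upgrade step
        version += 1
    return version, row

# A row written before two ALTERs is upgraded transparently on first read.
version, row = read_row((1, {"id": 7, "name": "alice"}))
```

This makes the storage overhead (a version tag per row) and the uneven post-ALTER performance mentioned above concrete: the first touch of a stale row pays for all pending upgrade steps.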


I don't think "lazy alter" is a particularly good idea in an RDBMS, as it could lead to the operation failing after its transaction has been committed, which is something that is generally avoided in relational databases.


1) That would be true for any row-oriented DBMS.

2) LSM tree is blazingly fast at inserts (my own experiments show a two-orders-of-magnitude difference between BerkeleyDB and my own LSM tree implementation in C# for bulk random inserts).

So I think SQLite4 is safe here.


Great! So they can add a full blown ALTER. But so far, the only indication we can get is, "never".


Please correct me if I'm wrong, but isn't a major use-case for SQLite a database for embedded systems? In that light, I can't picture too many situations off the top of my head where one would want to change the schema.

That being said, I've personally seen SQLite used for more prototype-y stuff (default db for new instances of a Rails app, ad-hoc data stores for mobile phone apps, etc)


I am asked constantly for ALTER support in SQLite, including by the maintainer of GNU Mailman, which is neither embedded nor a prototype one-off. SQLite is extremely flexible and used in an enormous variety of situations.

References:

http://dustycloud.org/blog/sqlite-alter-pain/

https://news.ycombinator.com/item?id=5886898

https://bitbucket.org/zzzeek/alembic/issue/21/column-renames...

plus https://bitbucket.org/zzzeek/alembic/issue/129/column-additi... in case sqlite devs care to look at that one...


Very interesting! Thanks for the references.

Having read some of these, though, it's pretty clear that ALTER support is a major pain point. Would you have any insight as to why the SQLite maintainers would not choose to prioritize that issue? From what I know of SQLite, it's a very well built piece of software. It seems odd to me that such a smart bunch of guys would disregard community feedback without a good reason.


The problem is that it's hard to support for a database like sqlite. Things have a very specific on-disk representation, so you can't just add data to the middle of a file; rewriting the file also means others can't read it at the same time, so you can't do long operations.

When you look at this operation on a normal database, it isn't too bad. The database can change the on-disk representation while still serving requests, and it can use whatever storage it wants, since it's the only one that operates on the data.


Wouldn't a database generally be offline when you ALTER it? At least one that is used in scenarios where SQLite is appropriate.

In that case it should not hurt too much if the operation takes some time.


> Please correct me if I'm wrong, but isn't a major use-case for SQLite a database for embedded systems?

SQLite is used in OSX (spotlight metadata, Core Data framework, iTunes, Mail), iOS, Android, WebSQL, etc.

I am pretty sure every computer gadget you own runs some instance of SQLite: anti-virus apps, Firefox, Thunderbird, Oracle products, Skype, Adobe software, PHP (SQLite2 & SQLite3), Qt, and even many Windows Phone 8 apps incorporate the SQLite3 library.


It's needed, for example, when upgrading applications that use sqlite for storage. Like in the mobile phone case, if a new version of the app comes out that has a new feature that requires a new column in one of the tables, when you upgrade the app on your phone, the app probably wants to migrate the sqlite db to the new schema rather than throwing away all your saved data.
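That upgrade flow can be sketched with Python's bundled sqlite3 module: the user_version pragma records which schema revision a database file is at, and any pending steps run on app startup. Table and column names here are invented; note that ADD COLUMN is one of the few ALTER operations SQLite3 does support.

```python
import sqlite3

# Invented example schema; each list entry advances user_version by one.
MIGRATIONS = [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)",  # -> version 1
    "ALTER TABLE notes ADD COLUMN created_at TEXT",            # -> version 2
]

def migrate(conn):
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    for statement in MIGRATIONS[version:]:
        version += 1
        with conn:  # each step committed in its own transaction
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {version}")
    return version

conn = sqlite3.connect(":memory:")
version = migrate(conn)  # a fresh db runs both steps; an old one only the tail
```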


This is something I hadn't considered. Being able to do firmware updates to a device with an embedded instance of SQLite seems huge.

Seeing as ALTER support is currently lacking, what kinds of approaches are commonly taken to work around this issue? I'm particularly interested in hearing about how people work around this problem for things like upgrading embedded devices. I imagine the scarcity of resources (RAM, CPU) available to most embedded devices would make this a difficult problem to solve on the hardware at hand.


The usual way to deal with that (and recommended by the SQLite docs) is creating a new table, copying the data, and dropping the old table.
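The rebuild dance can be sketched with Python's bundled sqlite3 module; here a column is dropped, which SQLite3's ALTER TABLE cannot do. Names are invented, and a real migration would also have to recreate indexes, triggers, and views, which is where the error-proneness comes in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "alice", 0), (2, "bob", 1)])
conn.commit()

with conn:  # the whole rebuild in one transaction
    conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users_new SELECT id, name FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")
```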

Needless to say, this is VERY error prone, and limited resources make it even more so.


People want to change schema in embedded systems, programs get updated. Having an explicit mechanism to deal with that is one thing IndexedDB and webSQL got right.

We use ALTER tables in pouchdb on webSQL schema upgrade, I am not familiar with the missing functionality, we don't do a lot of complicated things though.


I think at least part of the confusion here is the definition of 'embedded'. To some people 'embedded' means 'within the same process or thread as the core application', to others it means 'firmware running on small hardware devices'.


> To some people 'embedded' means 'within the same process or thread as the core application'

Only that definition is silly here, since SQLite is by its serverless nature always embedded inside another program.


> Please correct me if I'm wrong, but isn't a major use-case for SQLite a database for embedded systems?

What makes you think that? It is used in many Desktop applications. E.g. most browsers (Firefox, Chrome, Safari), Skype and apparently even Apple Mail, iTunes and the Steam runtime use SQLite. Do you want to store complex indexed data in your desktop or smartphone application? Maybe something like a music library? Then you usually use SQLite. Even if you store relatively simple data you may use SQLite: no need to invent your own data file format. SQLite runs virtually everywhere and has bindings for virtually every programming language.

And as someone else already said: applications get updates that add features and need to update their DB accordingly.


I agree that adding this, even if it is horribly inefficient, would be better than everyone reimplementing it at the application layer. I just don't see how you've drawn this conclusion from the linked summary. This sounds to me like a pet gripe of yours and you are just looking for a forum to air it.


We had a discussion about SQLite4 on HN (2012): https://news.ycombinator.com/item?id=4168645

Someone mentioned it on the SQLite mailing list and the first reply was this:

  LOL.  Those Hacker News guys are hardcore.  Make some of
  my mailing lists look almost civil.
-- http://sqlite.1065341.n5.nabble.com/SQLite4-don-t-scream-td6...


"SQLite4 requires all elements of the PRIMARY KEY to be non-null. This is an SQL standard. Due to an oversight in early versions, SQLite3 does not enforce the NOT NULL constraint on PRIMARY KEY columns since by the time the omission was discovered SQLite3 was in such widespread use, activation of NOT NULL enforcement would have broken too many programs."

I find this sad, and I wonder what those too many programs are. Why would one put NULL on PRIMARY KEY?
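The quirk is easy to reproduce with Python's bundled SQLite3: a TEXT (i.e. non-INTEGER) PRIMARY KEY column accepts NULL, and since NULLs compare as distinct in unique indexes, it even accepts several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (word TEXT PRIMARY KEY)")
conn.execute("INSERT INTO t VALUES (NULL)")  # accepted without error
conn.execute("INSERT INTO t VALUES (NULL)")  # even twice: NULLs are distinct
rows = conn.execute("SELECT word FROM t").fetchall()
```

(An INTEGER PRIMARY KEY is the exception: there a NULL is silently replaced with an auto-assigned rowid rather than stored.)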


Thankfully from what I have seen sqlite is maintained with that old-school style of library development, the one you scarcely see these days, and don't hear much about on HN: the one where "thou shalt not break existing code" has some weight. I'd be nervous about depending on a library where the whim of the developer thinking "I find this sad" breaks my program.

sqlite4 OTOH seems to be the "let's break things" phase. Probably healthy to have both attitudes at certain stages.


Upvoted. SQLite as an API does so many things right:

- Providing its own malloc/free "proxies" to work around different linked CRT versions (Microsoft)

- Prefixing ("namespacing") everything, and not exporting symbols outside of the API

- Configuration of the API before it's started, where it makes sense (threading model for example)

- Ability to retrieve compilation options back (e.g. how it was compiled and with what features)

- many other things


Agree with you on the importance of backwards compatibility. But the idea that you should at some point draw the line and break compatibility so that you can really fix things is also defensible. In my opinion, by releasing the library/project under a new name, like Python does.

The worst and most common, though, is "break all things for no reason at all"...


Hopefully there aren't many people designing data models with NULL primary keys.

However, if SQLite never enforced non-NULL behavior then it's certainly possible that along the way some applications could have written an errant record to a database with a NULL primary key (instead of treating it as an error case). And if a "fixed" version of SQLite3 ever shipped, then these previously-valid data files would suddenly become invalid.

Imagine being an app developer and having random customers whose data files suddenly break because the OS happened to upgrade the SQLite3 library to a version that included this fix. I completely understand why the SQLite maintainers may have decided it wasn't worth the hassle to fix things prior to SQLite4.


> the OS happened to upgrade the SQLite3 library

I don't think this actually detracts from your point, but, are there cases where that can actually happen? Isn't SQLite always statically linked with the application?


E.g. on iOS and Android you get access to the system's SQLite library and apparently can't even opt-out (because it is used by system APIs that you will use, if I understand correctly). See: https://news.ycombinator.com/item?id=7756332


> Isn't SQLite always statically linked with the applicaton?

SQLite can be dynamically linked, though static linking is probably more common.


According to

`apt-cache rdepends sqlite3`

on Ubuntu 14.04, packages including the following link to SQLite dynamically (this may be partially due to Debian heavily discouraging statically linked packages):

  zabbix-proxy-sqlite3
  yokadi
  uwsgi
  usermetricsservice
  trove-common
  sympa
  sqliteman
  sisu-sqlite
  rt4-db-sqlite
  roundcube-sqlite3
  redmine-sqlite
  ratbox-services-sqlite
  postfix-cluebringer-sqlite3
  pdns-backend-sqlite3
  pbnj
  opendnssec-enforcer-sqlite3
  lire
  krecipes
  kamailio-sqlite-modules
  inspircd
  imms-common
  hud-tools
  gbrowse
  eurephia
  dotclear
  beancounter
  adminer
  bacula-director-sqlite3


sqlite3 is the package that provides the command-line tool. A lot more packages depend on the dynamic library: "apt-cache rdepends libsqlite3-0 | wc -l" returns 407 results.


It's not super unreasonable for a multi-element primary key, is it? For example, at work we have a MySQL table that stores advertising stats with a (Publisher ID, Country, Day) primary key[1]. All those fields are non-nullable, but I can imagine someone using NULL for the country[2] if we couldn't determine it (it's based on IP).

[1] It's slightly more complicated, but it's basically similar.

[2] "zz" might be appropriate to represent an unknown country, but other domains might not have a good default value

Edit: double asterisks messing up my formatting


it's unreasonable. NULL isn't a value, it means, "the value can't be known". If a primary key of a table has part of it that is unknown, you've designed your schema completely incorrectly, and you should brush up on normal form: http://en.wikipedia.org/wiki/Database_normalization#Normal_f...


If you think about it for a couple of minutes you will [probably] understand that "gaps" (NULLs) in a PRIMARY KEY column are plain stupidity.


Actual table alterations please, so ORM developers don't have to use a load-bearing hack to implement migrations in SQLite.


I'm sure this sounds arbitrary, but I'd love it if the SQLite file header allowed for more room to add program-specific metadata. Given that more and more apps are using SQLite files to store user-facing documents, it would be nice if space in the header were set aside to identify the expected usage of the file.


You're aware of the application_id pragma? That's what this guy was created for: http://www.sqlite.org/pragma.html#pragma_application_id


Hah, I read that and thought "Hmmm, I wonder if Gus knows about this? He should use it in Acorn". Then I read your username :)


That's good, but it's not enough. Metadata support would be useful!

E.g. it's like a photo (JPEG) where you would only know it was shot with a Canon camera, or a music file (MP3) that only stores the name of the media player application.

Most common file formats support metadata standards, like JPEG with EXIF, IPTC, and XMP, and MP3 with ID3 v1 & v2, etc.

So for SQLite4 a very simple key/value metadata format would be great, with some predefined keys.


You can always embed a "metadata" table in your schema. Not ideal but would probably do the job in a lot of cases.
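A minimal sketch of that workaround (table and key names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("application", "ExampleApp"), ("schema_version", "3")],
)
value = conn.execute(
    "SELECT value FROM metadata WHERE key = 'application'"
).fetchone()[0]
```

The limitation is that, unlike a header field, this is invisible to tools that don't open the database and know the table's name by convention.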


Ah, no I wasn't. The file format page simply marks that region as reserved. http://www.sqlite.org/fileformat.html


Reading the "Key Changes" section, it looks like the sqlite4_env environment will address the global mutex problem that was discussed in a recent HN thread.

The pluggable db engine looks interesting but if I remember right, in MySQL the pluggable db engine initially attracted some niche providers but the marketplace of ideas eventually settled on MyISAM/InnoDB.

What's still not clear from the Executive Summary is if there would be any compelling use case for the older SQLite3 for reasons other than backward compatibility. It would be great if the SQLite designers could answer: "You'd want to use v3 instead of v4 for greenfield projects if...?"


I don't think there is an answer to your question beyond "it's been battle tested extremely well", but there are strong reasons to move to sqlite4 (in particular, multiple linked instances in eg mobile apps and mobile OSes). It's not like Python where the weight of the existing community has drawn you towards python 2 (something that has changed a lot recently).


That's fair but I guess I was looking for a stronger statement from them because even for new minor point releases, the sqlite.org home page says clearly:

"Version 3.8.4.3 of SQLite is recommended for all new development."

Every time they change the "z" in v.x.y.z, they'll still state "is recommended for all new development."

So, it seems like they could just clearly say, "SQLite4 is recommended for all new development."

...unless there's some architecture tradeoff (unrelated to backward compatibility) that still makes SQLite3 desirable over SQLite4.


Sqlite3 is used in so many places, they would be fools to deprecate it. Maybe in five years, they'll revisit the issue, but for now they want to improve the fundamentals of sqlite.

One good thing is I expect most of their testing code can be ported over to sqlite4 http://www.sqlite.org/testing.html


I don't know if the author of SQLite also reads here, but...

I am not sure whether having a single key space might have counter-productive effects on overall speed.

One example: you want to scan all db entries of one table, and the keys are random English words. The table itself only has several hundred entries, but there are also tables with text keys and millions of entries.

If I understand the concept right (I may not), wouldn't that mean that a huge (shared) index would have to be searched and many, many entries skipped, because of the single key space? Wouldn't that hurt performance very much for that specific operation?
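For what it's worth, the design document has each table's keys share a distinct prefix (a table number), so one table's rows occupy a contiguous range of the single keyspace and a scan never wades through other tables' entries. A toy sketch of why the prefix keeps scans cheap:

```python
import bisect

# Keys modeled as (table_number, primary_key) tuples in one sorted keyspace.
keys = sorted(
    [(2, w) for w in ("apple", "zebra")]          # small table, number 2
    + [(9, f"word{i:07d}") for i in range(1000)]  # big neighbour, number 9
)

# Scanning table 2 is a range scan between (2,) and (3,): the big table is
# skipped in one step (binary search), not entry by entry.
lo = bisect.bisect_left(keys, (2,))
hi = bisect.bisect_left(keys, (3,))
table2 = keys[lo:hi]
```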


I wonder if this will have any implications for spatialite? If spatialite were a little better able to make automatic use of spatial indexes, I could easily see it becoming the default foundation for many open source GIS.


Is there any evidence this is actually moving ahead? Seems somewhat stalled.


Is this available already? I'm quite confused.


Doesn't look like it from the home page.


Think I might start a kickstarter to fund PostgreSQL-SQLite replication over WebSocket. Good idea?


Yes! I need a sync engine between both of these, and it is harder than I thought. I wish to join forces on this, but there's not much love for SQL these days, with everyone saying "all backends are NoSQL!"


I love sqlite and find it extremely useful. I just wish they could figure the multi-user issues out (maybe it's not practical...probably not, I don't know how it works internally). But, if they could do this it would be truly awesome!


From: http://www.sqlite.org/whentouse.html

SQLite is not designed to replace Oracle. It is designed to replace fopen().


Easy: Put a server process in front of it and route access to the DB through that. :P


Doesn't that just defeat the purpose? At that point you're probably much better off just using PostgreSQL (or similar).


It might be a neat solution if you already have a server process for other tasks anyway and the data access needs are quite simple.


That's what I'm thinking. Probably less trouble just to set up postgres at that point and not deal with unexpected troubles and complications.

But I hadn't thought of putting a server process in front of sqlite so thanks for the tip. It might come in handy at some point.


Would someone care to explain why they downvoted that comment?


It was a joke! Hence the ":P".


I did that, using Go and its channels feature. Not much code and works like a charm.
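The same funnel-everything-through-one-owner pattern can be sketched in Python with a queue and a worker thread (the Go version would use a channel and a goroutine); all names here are invented.

```python
import queue
import sqlite3
import threading

requests = queue.Queue()  # all callers funnel through this

def db_worker(path):
    # The only thread that ever touches the connection.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    while True:
        sql, params, reply = requests.get()
        if sql is None:  # shutdown sentinel
            conn.close()
            return
        reply.put(conn.execute(sql, params).fetchall())
        conn.commit()

worker = threading.Thread(target=db_worker, args=(":memory:",))
worker.start()

def run(sql, params=()):
    """Send a statement to the owning thread and wait for its result."""
    reply = queue.Queue()
    requests.put((sql, params, reply))
    return reply.get()

run("INSERT INTO log VALUES (?)", ("hello",))
rows = run("SELECT msg FROM log")
requests.put((None, None, None))
worker.join()
```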


The new features look promising. I'd also like to see UPDATE ... FROM. It's kludgy to achieve the equivalent of this in SQLite3.


Huh. Reading this makes me realize that I can totally use SQLite4 as an alternative array backend for my array lib. It'll be a cute and likely useless hack! :-)


I did a skim read. So what exactly is new? Because I thought this design document was a few years old already.

When is SQLite 4 expected?


My db engine also has pluggable key-value back ends. It gives great flexibility.



