I'm experimenting with sqlite as backend database for orchestrator. While orchestrator manages MySQL replication topologies, it also uses MySQL as backend. For some deployments, and I'm looking into such one, having MySQL as backend is a considerable overhead.
This sent me to the route of looking into a self contained orchestrator binary + backend DB. I would have orchestrator spawn up its own backend database instead of connecting to an external one.
Why even relational?
Can't orchestrator just use a key-value backend?
Maybe it could. But frankly I enjoy the power of relational databases, and the versatility they offer has proven itself multiple times with orchestrator, being able to answer interesting, new, complex questions about one's topology by crafting SQL queries.
Moreover, orchestrator is already heavily invested in the relational model. At this time, replacing all SQL queries with key-value reads seems to me as a significant investment in time and risk. So I was looking for a relational, SQL accessible embeddable database for orchestrator.
Why sqlite?
I am in particular looking at two options: sqlite (via the go-sqlite3 binding) and TiDB. sqlite does not need much introduction, and I'll just say it's embeddable within the golang-built binary.
TiDB is a pure-Go, MySQL dialect compatible server which provides relational model on top of key-value store. The store could be local or distributed, and the TiDB project cleverly separates involved layers.
Of the two, sqlite is mature, alas uses a different SQL dialect and has different behavior. TiDB's compatibility with MySQL is an impressive feat, but still ongoing.
Both require adaptations of SQL code. Here are some observations on adaptations required when moving an existing app from MySQL backend to sqlite backend.
Differences
Just to answer an obvious question: can't everything be abstracted away by an ORM that speaks both dialects?
I don't think so. I always exploit SQL beyond the standard insert/delete/update/select, exploits that ORMs just don't support.
Here's an incomplete list of differences I found. Some purely syntactical, some semantical, some behavioral, and some operational.
- Data types: no
CHARACTER SETclause - Data types: you can't associate
UNSIGNEDto any int type - Data types: no
enum. However there's an alternative in the form of:
race text check (race in ('human', 'dog', 'alien')) auto_incrementis calledinteger- Data types: timestamps are not a thing. There's no timezone info.
TIMESTAMPhas noON UPDATEclause.- No
afterclause for adding columns - Indexes are not part of table creation. Only
PRIMARY KEYis. The rest of indexes are created viaCREATE INDEXstatement - Indexes have unique names across the schema. This is unfortunate, since it forces me to use longer names for indexes so as to differentiate them. For example, in MySQL I can have an index named
name_idxin two different tables; insqliteI append table name for "namespacing" - Temporal values and functions: poor support for time arithmetic.
- Getting the diff between two datetimes is non-trivial (what's the diff in months for a leap year?)
INTERVALkeyword not respected. Appending/subtracting dates can be done via:
datetime('now', '-3 hour')
Can you see the problem in the above? What is the number3is a positional argument? In MySQL I would useNOW() - INTERVAL ? HOUR. To make positional arguments work insqlite, the above gets to bedatetime('now', printf('-%d hour', ?)). How would you even translateNOW() - INTERVAL (? * 2) SECOND?UNIX_TIMESTAMPnot respected. Instead usingstrftime('%s', 'now'), I dislike the use of string functions to generate times.
- Getting the diff between two datetimes is non-trivial (what's the diff in months for a leap year?)
insert ignoreturns toinsert or ignorereplace intoremainsreplace into. Yay!insert on duplicate key updatehas no equivalent. It's worthwhile noting areplace intodoes not replace (pun intended) aninsert ... on duplicate keyas the latter can choose to only update a subset of column in the event of a constraint violation. It's really very powerful.IS TRUEandIS FALSEare not respected.ALTER TABLE:- When adding a
not nullcolumn one must specify the default value (e.g.0for anint) - You cannot add a timestamp column that defaults to
CURRENT_TIMESTAMP. You can have such column in yourCREATE TABLEdefinition, but you cannot add such a column. The reason being thatCURRENT_TIMESTAMPis not a constant value. When adding a column to a table,sqlitedoes not actually apply the change to all existing rows, but only to newly created ones. It therefore does not know what value to provide to those older rows. - You cannot
DROP COLUMN. I'll say it again. You cannotDROP COLUMN - You cannot modify a
PRIMARY KEY - You cannot rename a column or change its datatype.
- In fact, the only supported
ALTER TABLEstatements areADD COLUMNandRENAME(renaming the table itself)
- When adding a
Regarding the ALTER TABLE limitations, the advice for dropping/changing/renaming columns or changing the primary key is to "create a new table with the new format, copy the rows, drop the old table, rename". This is certainly feasible, but requires a substantial overhead from the user. And it's non atomic. It requires a change in the state of mind but also a change in state of operations, the latter being non-trivial when moving from one DBMS to another, or when wishing to support both.
I'm still looking into this, and trying to work my way around differences with cheap regular expressions for as much as possible. I'm mainly interested right now in finding all semantic/logic differences that would require application changes. So far the TIMESTAMP behavior is such, and so is the INSERT ... ON DUPLICATE KEY statement.