TIL: SQLite Virtual Tables
I’ve been working on implementing virtual tables
for my zombiezen.com/go/sqlite
package.
I hadn’t used virtual tables in SQLite before this,
so to get a feel for the API, I played around with the feature and read up on the documentation.
Since it’s not a feature I’ve seen talked about a lot,
I wanted to share what virtual tables are, why you might want to use them, and what some limitations are.
What is a Virtual Table?
A virtual table is a table that is not stored in SQLite’s on-disk format, but rather, implemented by making callbacks into code your application provides. The SQLite distribution provides its own virtual table implementations, such as CSV, full-text search, and JSON. But you’re not limited to these virtual tables: if you use the virtual table API, you can create your own.
Some virtual tables can be brought into scope with a CREATE VIRTUAL TABLE
statement,
like the full-text search (FTS) extension:
CREATE VIRTUAL TABLE email USING fts5(sender, title, body);
These act like normal tables, except they cannot have triggers or indices, nor can their schema be modified.
Other virtual tables are “eponymous”,
which means that they are in scope without having to explicitly create them.
The generate_series
extension is a simple eponymous table
and can be used like this:
SELECT value FROM generate_series WHERE start=5 AND stop=50;
Eponymous virtual tables can also be used as table-valued functions. SQLite maps arguments to hidden columns in the virtual table in the order they were defined by the virtual table implementation. For example, the above query can be written more succinctly as:
SELECT value FROM generate_series(5, 50);
This is because in the generate_series
virtual table implementation,
it declares its schema to be:
CREATE TABLE generate_series(
value,
start HIDDEN,
stop HIDDEN,
step HIDDEN
);
Why are Virtual Tables useful?
A very useful thing virtual tables allow you to do is create table-valued functions
that can parse data from some other format.
The json_each
function is a great example of this:
it takes in JSON and then converts the object or array into a query-able table.
This way, rather than having to perform additional filtering or processing in your application,
you can keep the logic in your SQLite query.
Virtual tables can also be helpful for creating data access abstractions, like the full-text search extension or the Geopoly extension. These can be hard to create yourself, but usually in these cases you’re using the hard work of people who have already made them.
Why shouldn’t I use a Virtual Table?
Like with any technique, it’s important to consider tradeoffs of virtual tables. By their nature, virtual tables obfuscate what their data access looks like, so their performance and storage characteristics can be harder to reason about. Non-virtual SQLite tables perform well in a variety of circumstances out-of-the-box and you can control their indices as it suits your application.
Another factor to consider:
from my limited experience, implementing your own virtual tables can be somewhat tricky.
For best results, the workhorse of the virtual table callbacks,
xBestIndex
,
requires you to examine WHERE
clauses to assist in query planning.
You could just return all the data to SQLite and it will filter it for you,
but then your queries will slow down from discarding all the unused rows.
And finally, virtual tables are not available in all SQLite drivers or languages. You may need to dynamically load virtual table implementations from C shared libraries if they’re not already compiled in. If you’re wanting to implement your own virtual table, you may need to write C or another low-level language. Because the callbacks get called frequently, they are quite performance-sensitive, so it wouldn’t make sense to implement a virtual table in Python or Ruby, for example. Depending on your application, you may need to skip the virtual table and rely on normal SQLite behavior.
Conclusion
Virtual tables are a pretty powerful SQLite feature: they let you bring just about anything into SQLite’s data model. It’s not the tool for every job, but in the right circumstances, it makes SQLite even more powerful than it already is.