Monday, July 8, 2013

Database intro for developers

Most of my academic background focused on low level database research. I wrote a dynamic programming based join-order optimizer for the open-source SQLite database as project in school. I have submitted a paper for publication in an academic conference that deals with optimizing low-level page structures and optimizer heuristics to improve throughput for certain kinds of workloads on solid-state drives. All in all, I've been a "nuts and bolts" database geek for a while -- more on the systems-level side than on the data mining and data warehousing side. I can talk about exotic fractal trees and various B*-tree variants for efficient indexing and lock-free versions ad nauseum ;)

In any case, the level of willful ignorance about the basics of how databases work among typical enterprise developers has always surprised me. The are the workhorses of most enterprise apps, and getting them to perform well requires some knowledge of how they work. At least a basic understanding of how indexing works is useful in building up your mental intuition about what will perform and what won't in production. Everyone should also at least know the basic processing pipeline: parsing, optimization, execution.

This talk was meant to be a simple introduction to some of the basics of how databases process your queries for you. I'd like to do more follow on talks to add more depth in the future. It is geared towards developers -- not DBAs. So there is nothing in here about best practices for backups or transaction log shipping.