A Lesson in Using the Right Tools … and Humility!

I was playing with an open source disk indexer. It looked like
it had potential, except it was too slow. Too slow to read the disk,
and too slow to restore the “database”.

Too Slow to Read the Disk

It was using the .NET DirectoryInfo methods to read the disk. I
know from experience these are very slow, so I used Interop (P/Invoke)
to call FindFirstFileW / FindNextFileW instead. Much better!
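
For readers who want to see the shape of this kind of scan, here is a rough sketch. It is in Python rather than the C# the project used, and it is not the indexer's code. Python's os.scandir is documented as being built on FindFirstFileW / FindNextFileW on Windows, and like those calls it hands back the entry's type and size along with its name, so no extra per-file lookup is needed there. The function name index_tree and the (path, is_dir, size) tuple are my own inventions for illustration.

```python
import os

def index_tree(root):
    """Yield (path, is_dir, size) for every entry under root.

    Uses an explicit stack instead of recursion so that very deep
    directory trees cannot hit the recursion limit.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Do not follow symlinks, to avoid cycles.
                    is_dir = entry.is_dir(follow_symlinks=False)
                    size = 0 if is_dir else entry.stat(follow_symlinks=False).st_size
                    yield entry.path, is_dir, size
                    if is_dir:
                        stack.append(entry.path)
        except (PermissionError, FileNotFoundError):
            # Skip directories that cannot be read or that vanished mid-scan.
            continue
```

Calling list(index_tree("C:\\")) would collect the whole drive in memory; a real indexer would more likely stream the tuples straight into its store.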

Too Slow to Restore the “Database”

Now, why was the original taking so long to restore its “database”?
Digging into the code I determined the “database” was a simple
serialization of the objects using BinaryFormatter. Google-sensei
repeatedly said that BinaryFormatter serialization was
slow, especially on restore.

I tried a few other serialization libraries (Wire, protobuf, Jil)
but couldn’t get them to work easily. I decided this was my
opportunity to try SQLite.

My first attempt was very quick-and-dirty. No ORM: hand-built SQL
strings, manually escaping single quotes, etc. A couple of
iterations over the POCOs, and I had something basically working.
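
To make "manually escaping single quotes" concrete, here is a small sketch using Python's built-in sqlite3 module (again, not the C# I actually wrote). In SQLite a single quote inside a string literal is escaped by doubling it. The table name entries and the helper sql_literal are made up for this example. The second INSERT shows the usual alternative, a bound parameter, which sidesteps escaping entirely.

```python
import sqlite3

def sql_literal(text):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT)")

name = "Bob's files"

# Quick-and-dirty: build the statement by hand, escaping the quote ourselves.
conn.execute("INSERT INTO entries (name) VALUES (" + sql_literal(name) + ")")

# Alternative: let the driver bind the value, no escaping needed.
conn.execute("INSERT INTO entries (name) VALUES (?)", (name,))

stored = [row[0] for row in conn.execute("SELECT name FROM entries")]
```

Both rows come back identical, so the hand escaping works, but the bound parameter is the less error-prone of the two.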

So I tried a simple disk. Time to read: instant. Time to store:
instant. Time to restore: instant! Excellent!

Then I tried a more complex disk. Time to read: pretty fast. Time to
store: a few seconds, pretty fast. Time to restore: wait, what???
Waaaay slow. Much slower than the original!!

SQLite is Slower???

Here is the lesson in humility. I simply assumed a SQLite version
would implicitly be faster. I also used the wrong tool: I wasted
some time running VS Profiler, which merely confirmed the database
reads were the problem.

Google-sensei had a lot to say about INSERT performance, but that
wasn’t the problem. Long story short, it eventually occurred to me
that my select statements had a WHERE clause involving an un-indexed
column.

A couple of indices later and voilà! Much better!
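
Here is a minimal sketch of the kind of fix involved, using Python's built-in sqlite3 module. The schema is hypothetical: the table entries, the column parent_id, and the index name idx_entries_parent are stand-ins, not the real ones from the project. SQLite's EXPLAIN QUERY PLAN reports whether a query will scan the whole table or use an index, which is a much more direct way to spot this problem than a general-purpose profiler.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the description

def build_demo_db():
    """Create an in-memory table of 1000 rows, 10 children per parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO entries (parent_id, name) VALUES (?, ?)",
        [(i // 10, f"file{i}") for i in range(1000)],
    )
    conn.commit()
    return conn

# The restore-style query: fetch all children of one parent.
CHILDREN_SQL = "SELECT name FROM entries WHERE parent_id = ?"

conn = build_demo_db()

# Without an index, the WHERE clause forces a full table scan.
plan_before = query_plan(conn, CHILDREN_SQL, (5,))

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_entries_parent ON entries (parent_id)")

# The same query should now be answered via the index.
plan_after = query_plan(conn, CHILDREN_SQL, (5,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a SCAN of the table and the second a SEARCH using idx_entries_parent. With a scan, each lookup costs time proportional to the table size, so running one lookup per parent across hundreds of thousands of rows adds up fast.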

Final Numbers

[Image: numbers]

Timings are in seconds. Both versions were run against the same disk, under
the same circumstances. About 760,000 POCOs were involved.