My odyssey in the world of bioinformatics, with Delphi

After a conversation about the low popularity of the Delphi language—further worsened by the lack of libraries for Delphi—I decided to share my odyssey in the world of bioinformatics, using Delphi as my primary development tool.

My toy tool that turned into a commercial product

Back in 2007, I wrote a small tool for a biologist friend to help him automatically reverse-complement DNA sequences. At the time, they were using a $5000 piece of software to assemble sequences, yet it couldn’t perform this simple task.
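For readers outside biology: reverse-complementing means reading the sequence backwards and swapping each base for its pair (A with T, C with G). A minimal Delphi sketch of the idea follows. It is not the code from the original tool, and the names ComplementBase and ReverseComplement are made up for illustration. Anything that isn't A, C, G or T (such as N) is assumed to pass through unchanged.

```pascal
program ReverseComplementDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}  // lets the same source build with Free Pascal
{$APPTYPE CONSOLE}

// Hypothetical helper: returns the pairing base for A/C/G/T (either case).
// Any other character (N, gaps, ambiguity codes) is returned unchanged.
function ComplementBase(Base: Char): Char;
begin
  case Base of
    'A': Result := 'T';
    'T': Result := 'A';
    'C': Result := 'G';
    'G': Result := 'C';
    'a': Result := 't';
    't': Result := 'a';
    'c': Result := 'g';
    'g': Result := 'c';
  else
    Result := Base;
  end;
end;

// Hypothetical helper: walks the input left to right and writes each
// complemented base into the mirrored position of the result.
function ReverseComplement(const Seq: string): string;
var
  i, Len: Integer;
begin
  Len := Length(Seq);
  SetLength(Result, Len);
  for i := 1 to Len do
    Result[Len - i + 1] := ComplementBase(Seq[i]);
end;

begin
  Writeln(ReverseComplement('ATGCCN'));
end.
```

Compiled as a console program, this prints NGGCAT.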

 

The tool was a success. It worked so well that I ended up founding a company, writing a commercial product from scratch in Delphi, and selling it against the competition for only $500 per license.

Why start from scratch?

Back then (and it’s still true today), there were no scientific libraries for Delphi. And in the scientific world, C is just as unknown as Delphi when it comes to data analysis. I would have had to translate multiple libraries from languages other than C, and code written in functional languages doesn’t translate well to procedural code.

Going for “auto-pilot”

The program I wrote wasn’t just a simple data importer. It performed the entire DNA sequence assembly process, from start to finish. It was the first fully automated solution for this task in the world. One DNA contig could be processed in seconds, where it used to take 30 minutes of manual labor.

The “Need for speed”

The math for sequence assembly has been around since the ’60s–’70s, ready to use. However, the languages scientists preferred—Java, Python, Perl—were too slow to take advantage of it. This limitation pushed most existing programs/libraries to use K-mers, a faster but less accurate approach.
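For context, a K-mer is simply a substring of length K, and K-mer methods compare reads by looking for the short words they share instead of computing a full alignment. The sketch below is only a naive illustration of that idea, not what any particular assembler does. CountSharedKmers is a hypothetical helper, and real tools index K-mers in hash tables rather than scanning with Pos.

```pascal
program KmerDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}  // lets the same source build with Free Pascal
{$APPTYPE CONSOLE}

// Hypothetical helper: counts the start positions in A whose K-mer
// (substring of length K) also occurs somewhere in B.
// Deliberately naive: every K-mer of A triggers a linear scan of B.
function CountSharedKmers(const A, B: string; K: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  if K < 1 then
    Exit;
  // If A is shorter than K the loop body never runs and the result is 0.
  for i := 1 to Length(A) - K + 1 do
    if Pos(Copy(A, i, K), B) > 0 then
      Inc(Result);
end;

begin
  // K-mers of 'ACGTAC' for K=3: ACG, CGT, GTA, TAC.
  // 'GTACGG' contains ACG, GTA and TAC, but not CGT.
  Writeln(CountSharedKmers('ACGTAC', 'GTACGG', 3));
end.
```

This prints 3. A high count suggests the two reads overlap, but it says nothing about where a single base differs, which is one way to see why the shortcut is faster yet less precise than a full alignment.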

 

The result? Fully automated assembly wasn’t possible. Scientists had to manually review and fix K-mer outputs. This meant hiring lots of students for the tedious task. The problem? After an hour of soul-crushing, repetitive work, they were producing results worse than the K-mer algorithms they were supposed to correct. Who can blame them? That wasn’t work fit for humans.

From this point of view, Delphi was the right choice.

Scientists’ taste in programming languages

Scientists gravitate toward functional languages like R and Julia for data analysis. Math is done in MATLAB, even though it’s 100x slower than C/Delphi. Data analysis often runs on sprawling bash scripts—if you want to call that a language!

 

Why slow languages? Scientists focus on their fields—biology, math, economics—not complex languages like C. If they need speed, they hire someone to write inner loops in C or C++.

 

R is incredibly slow, but it calls C where speed matters. Julia takes a different route and compiles to native code through LLVM. Ironically, large parts of both are themselves written in C or C++.

 

Another reason? Most tools are used once for a specific dataset and then discarded. When a new dataset arrives, they build new tools.

Frighteningly large amounts of data

Once, I saw a table sagging under the weight of hard drives—a half-meter-tall stack of them. Input data was measured in cubic meters, not gigabytes. There was a time when they accumulated more data than they could process.

More lack of libraries

I also had to write a base caller. Back then, only two programs could do it, both priced in the five-digit range. So, I wrote my own. The math wasn’t readily available—I had to invent the algorithm myself. Later, I realized I’d reinvented the wheel. It turned out my algorithm was very similar to audio recognition techniques. If only I’d started there!

 

Somebody once said in a forum that most Delphi libraries are crappy because their authors don’t have the skills to write proper code. Honestly, at that time I would have been glad to work with (and fix) crappy libraries instead of starting everything from:

program DnaSequenceAssembler;

begin

end.

Back then even a crappy library could have encouraged more scientists to use Delphi, helping build an ecosystem.

Now, looking back, I realize how “from Rome to the moon and back to Paris” crazy I was to start such a project in Delphi.

Coding peppered with a bit of hacking

Not all my time was spent writing new libraries. Some was “wasted” hacking proprietary file formats. A big corporation—the same one behind the five-digit base caller—refused to share the binary format output by their sequencing machines. So, I spent many nights with a hex editor trying to decode their files.

 

Eventually, I discovered the output was in Motorola format, that is, big-endian byte order. Once I figured that out, the rest became easier.
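“Motorola format” is taken here to mean big-endian byte order, where the most significant byte of a number comes first, the opposite of what x86 PCs use natively. Under that assumption, reading a 32-bit value from a raw buffer looks roughly like this. ReadUInt32BE is a hypothetical helper and the sample bytes are invented, not taken from the vendor’s files.

```pascal
program BigEndianDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}  // lets the same source build with Free Pascal
{$APPTYPE CONSOLE}

// Hypothetical helper: assembles an unsigned 32-bit integer from four
// consecutive bytes stored most-significant-byte first (big-endian).
// Building the value with shifts gives the same result on any host CPU.
// The caller must make sure Offset + 3 is inside the buffer.
function ReadUInt32BE(const Buf: array of Byte; Offset: Integer): Cardinal;
begin
  Result := (Cardinal(Buf[Offset])     shl 24) or
            (Cardinal(Buf[Offset + 1]) shl 16) or
            (Cardinal(Buf[Offset + 2]) shl 8)  or
             Cardinal(Buf[Offset + 3]);
end;

const
  // Invented sample data: the bytes 00 00 01 02 as they would sit in a file.
  Sample: array[0..3] of Byte = ($00, $00, $01, $02);

begin
  // Read as big-endian this is $00000102.
  Writeln(ReadUInt32BE(Sample, 0));
end.
```

This prints 258. Read naively on a little-endian PC, the same four bytes would come out as $02010000, which is the kind of nonsense value that makes a hex dump look undecipherable until the byte order clicks.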

(Now, the Delphi code to import these files is free on my Git account. Fun fact: the “specification” I used was just the binary files themselves.)

 

I also had to write a post-parser to fix their files because 20% of them were corrupted. A multi-billion-dollar company couldn’t hire a proper software engineer to write reliable code. Instead, they had an army of salespeople for every programmer.

 

Today, there are plenty of base callers. But they aren’t written line by line in C or Delphi. Machine learning has taken over. Developers train AI by feeding it examples: “This looks like a G; this looks like a T, A, C, etc.” AI does the rest. No more advanced math or speed-optimized algorithms. Just throw it on an HPC and let it compute.

Missing the proper tools to write a better product

Unfortunately, my program was Windows-only because FMX was unusable at the time (and wasn’t included in my Delphi license anyway).

 

I did build some Linux tools for data processing clusters using FPC/Lazarus, which was a great cross-platform tool at the time. Still is, but now FMX can finally complete it.

 

If I’d been able to create the program for Mac, it would’ve sold 100x better, because the bioinformatics world has little connection to Windows.

RAM was another limitation. Delphi’s 64-bit support was terrible back then, forcing me to rewrite parts of the math to avoid *out of memory* errors.

 

Now FMX is better, but it’s too late, both for me and for Embarcadero. They lost a huge opportunity between 2005 and 2015 for three main reasons:

  • No free license.
  • No usable cross-platform support.
  • No libraries for the scientific world.

What about the present?

If Embarcadero wants to stay in the game, especially in this new AI-driven era, they need a dedicated “for scientists” department. And they can’t bury it in a “buy it separately” edition, as they did with FMX and as they do now with Linux.

 

To win, Embarcadero must lose some money now. Isn’t that the definition of “investment”?

AI: The new horizon

People argue that today’s AI isn’t better than a good programmer. Who cares about today? AI is coming whether you like it or not, but older people have trouble accepting that reality. Unless you’re near retirement, you should care!

Think of any new technology, like the first airplane more than 100 years ago, which flew just a few dozen meters. Today, we can fly to the moon.

 

Now replace “airplane” with “AI” and ask yourself: how high will it fly?
