Thursday, May 29, 2008

Edit and compare giant binary files with lfhex

Many hex editors try to copy an entire file into memory before they let you edit it, which effectively limits the size of the files you can view or edit to what fits in RAM. lfhex is designed to let you edit binary files that are larger than your computer's memory.

While you might normally not be working with binary files that are larger than your memory, it's good to know that your hex editor can scale to such large files when that situation arises. lfhex can load huge files quickly and does not require large amounts of memory to do so. For example, the documentation mentions that loading a 2GB file requires less than 2MB of RAM.
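The idea behind that small footprint is to read only the part of the file currently being displayed. How lfhex does this internally is not covered here, so the following is only an illustration of the general approach using the standard od tool, whose -j and -N options skip to an offset and limit the number of bytes dumped. The helper name hexwindow is hypothetical.

```shell
# Illustration only: dump a small window of a file at a given offset.
# hexwindow is a hypothetical helper name, not part of lfhex.
hexwindow() {
    file=$1      # path to the file
    offset=$2    # byte offset where the window starts
    length=$3    # number of bytes to show
    # -A x  : print offsets in hexadecimal
    # -t x1 : print each byte as two hex digits
    # -j    : skip this many bytes before dumping
    # -N    : dump at most this many bytes
    od -A x -t x1 -j "$offset" -N "$length" "$file"
}
```

For example, hexwindow Fedora-8-x86_64-DVD.iso 2147483648 64 would print 64 bytes starting 2GB into the ISO without dumping anything before that point.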

There are no packages of lfhex for Ubuntu, openSUSE, or Fedora. For this article I'll build version 0.4 from source on a 64-bit Fedora 8 machine. Building lfhex follows the usual qmake-then-make procedure, but you have to make sure you use the qmake executable from Qt 4 rather than the one from Qt 3.x. The easiest way to see which qmake will be invoked is to run type -all qmake, which lists the qmake executables bash knows about.

On my machine the Qt 3.x qmake is the default, so in the procedure below I use the full path to the Qt 4 qmake executable to force the correct Qt version for the build. I found no "install" target, but copying the lfhex binary to /usr/local/bin worked fine as a final installation step.


$ type -all qmake
qmake is /usr/lib64/qt-3.3/bin/qmake
$ /usr/lib64/qt4/bin/qmake lfhex.pro
$ make
$ su -l
# cp lfhex /usr/local/bin/
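If you are unsure which of several qmake executables belongs to Qt 4, a small check along these lines may help. It is a sketch that assumes the Qt 4 qmake prints a line containing "Qt version 4." when run with -v. The helper name find_qt4_qmake is hypothetical, and the paths in the usage comment are just the ones from the Fedora 8 machine above.

```shell
# Sketch: print the first qmake on a candidate list that reports Qt 4.
# find_qt4_qmake is a hypothetical helper, not something shipped with Qt.
find_qt4_qmake() {
    for candidate in "$@"; do
        # Skip anything that is not an executable file.
        [ -x "$candidate" ] || continue
        # Assumption: qmake -v names the Qt version it was built against.
        if "$candidate" -v 2>&1 | grep -q 'Qt version 4\.'; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # no Qt 4 qmake among the candidates
}

# Example: try the default qmake first, then the Fedora Qt 4 location.
# find_qt4_qmake "$(command -v qmake)" /usr/lib64/qt4/bin/qmake
```

The printed path can then be used in place of a bare qmake when generating the Makefile.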

lfhex lets you select between binary, octal, hex, or ASCII display. You can adjust the number of bytes shown per column. Multiple undo and redo are also supported. You can open many editor windows at once in order to compare parts of many binary files.

Large file performance

To test lfhex on a large file I opened the Fedora-8-x86_64-DVD.iso, which is 3.7GB in size, in a virtual machine with 2GB of RAM and no swap file. lfhex opened the binary file almost instantly. The scroll bar allowed me to navigate through the ISO file as soon as I moved it, and seeking using a binary offset was instant. With the Fedora ISO open, gnome-system-monitor reported that lfhex was using 170MB of virtual memory, 18.5MB of resident memory, 10.6MB of writable memory, and 8MB of shared memory.
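The virtual and resident figures can also be read from a terminal. The sketch below assumes a Linux system with /proc mounted; proc_mem is a hypothetical helper name, and it reports only the VmSize and VmRSS fields, not the writable and shared breakdown that gnome-system-monitor shows.

```shell
# Sketch: print virtual (VmSize) and resident (VmRSS) memory for a process.
# proc_mem is a hypothetical helper; it relies on Linux's /proc/PID/status.
proc_mem() {
    pid=$1
    # Each matching line looks like "VmRSS:     18944 kB".
    awk '/^(VmSize|VmRSS):/ { print $1, $2, $3 }' "/proc/$pid/status"
}

# Example, with a single lfhex process running:
# proc_mem "$(pgrep -x lfhex)"
```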

Unfortunately the compare mode is disabled in the current 0.4 version of lfhex, likely due to 0.4 being the first version to use Qt 4, and that functionality not having been ported to Qt 4 yet. The previous version, 0.3.7.2, which uses Qt 3.x, supports binary diffs on large files. To try it, you can compile and install version 0.3.7.2 using the standard ./configure; make; sudo make install process:


$ wget http://stoopidsimple.com/files/lfhex-0.3.7.2.tar.gz
$ tar xzf lfhex-0.3.7.2.tar.gz
$ cd lfhex-0.3.7.2    # assumed name of the unpacked source directory
$ ./configure
$ make
$ cp /usr/local/bin/lfhex /tmp/lfhex
$ cp /usr/local/bin/lfhex /tmp/lfhex.2
$ dd if=/dev/urandom of=/tmp/lfhex.2 bs=100 count=1 seek=24 conv=notrunc
$ ./bin/lfhex -c /tmp/lfhex /tmp/lfhex.2

To test navigation speed between binary differences I created two 100MB files that were almost identical, the second file having a tainted 1KB block of data 24KB from the start of the file:


$ cd /tmp
$ dd if=/dev/urandom of=test1 bs=1024 count=102400
$ cp test1 test2
$ dd if=/dev/urandom of=test2 bs=1024 count=1 seek=24 conv=notrunc
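Before opening the two files in an editor, it can be worth confirming that they differ only where expected. The following sketch uses cmp -l, which lists every differing byte, and keeps just the first and last offsets. The helper name diff_range is hypothetical. Note that cmp numbers bytes from 1, so for the files created above the reported range would be expected to fall between 24577 and 25600.

```shell
# Sketch: print the first and last differing byte offsets between two files.
# diff_range is a hypothetical helper; offsets are 1-based, as in cmp -l.
diff_range() {
    # cmp -l prints one line per differing byte: offset, then both byte values.
    cmp -l "$1" "$2" |
        awk 'NR == 1 { first = $1 }   # remember the first offset seen
             { last = $1 }            # keep overwriting with the latest one
             END { if (NR) print first, last }'
}

# Example: diff_range /tmp/test1 /tmp/test2
```

Identical files produce no output at all.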

When searching for the first change in a virtual machine running two cores on an Intel Q6600 CPU, lfhex responded quickly, as expected, because the first difference was near the start of the file. Searching for the last change causes the search to begin at the end of the file, and this search took 10 seconds because it had to scan backwards through almost 100MB of data to find the difference.
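To see why the backward search has to touch nearly the whole file, here is a rough sketch of a block-by-block scan from the end. It is not lfhex's implementation, only an illustration of the general technique. The helper name last_diff_block is hypothetical, and the sketch assumes GNU cmp (for the -i and -n options) and two files of equal size.

```shell
# Sketch: find the last block that differs by scanning backwards.
# last_diff_block is a hypothetical helper; it assumes GNU cmp (-i, -n)
# and that both files have the same size.
last_diff_block() {
    file_a=$1
    file_b=$2
    block=${3:-65536}                         # block size in bytes
    size=$(wc -c < "$file_a")                 # file size in bytes
    index=$(( (size + block - 1) / block ))   # number of blocks, rounded up
    while [ "$index" -gt 0 ]; do
        index=$(( index - 1 ))
        offset=$(( index * block ))
        # Compare one block only: skip "offset" bytes, read at most "block".
        if ! cmp -s -i "$offset" -n "$block" "$file_a" "$file_b"; then
            printf '%s\n' "$offset"           # start of the differing block
            return 0
        fi
    done
    return 1                                  # the files are identical
}

# Example: last_diff_block /tmp/test1 /tmp/test2 1024
```

With the test files above, every block from the end of the file back to the tainted region must be compared before a difference turns up, which is why the cost grows with the distance from the end of the file.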

lfhex lacks the ability to insert or delete content, which is a major drawback. If such functionality were implemented, even with a reasonable speed penalty and without the ability to undo changes, it would make lfhex more useful for general binary file editing. lfhex could also benefit from the ability to save your preferences. Having to select the eight-bytes-at-a-time view every time you open lfhex gets annoying quickly.

Still, if you need to examine a large binary file, lfhex is well worth a look. The speed of initial loading and the ability to seek without waiting are wonderful features.

Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.