Study on an Approach for Automated Patch Creation and Distribution
Updates are annoying because they are quite frequent and occupy various resources of the user's PC for a significant amount of time. Very often, a user has to decide whether to ignore the update notification and accept the security risk of using vulnerable or buggy software, or to wait for the updates to complete.
While waiting for an update, the user may wonder why the download for a "bugfix" has almost the same size as the initial download of the entire software. A rough analysis of updates from various software distributors reveals that most of those updates are in fact complete software downloads, and some of them aren't even compressed or use weak compression methods. So, the user ends up downloading a lot of redundant data.
Network bandwidth and connectivity are still an issue due to outdated infrastructure (for example in central Europe), limited bandwidth on undersea broadband cables, and overloaded network nodes (switches) in between. Thus, a significant number of users are still stuck with 10 Mbit/s or less. Furthermore, an estimate of the potential savings through patches revealed that even fiber networks would not be enough to make patches obsolete.
The ideal solution to this update debacle is actual patches, where a patch contains only those data portions that are not yet on the local machine, accompanied by instructions on how to modify the old data to reach the new software state. As a result, a patch is usually just a fraction of the size of the entire software.
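To make the idea of "new data plus instructions" concrete, here is a minimal sketch in Python. It is not the tool discussed in this study. The function names (make_patch, apply_patch, payload_size) and the instruction format are made up for illustration, and difflib from the standard library stands in for a real binary delta algorithm, which would be far more efficient on large files.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Build a list of instructions that rebuilds `new` from `old`.

    Hypothetical instruction format:
      ("copy", start, end)  -> reuse old[start:end], already on the machine
      ("insert", data)      -> literal bytes that must be shipped in the patch
    """
    ops = []
    # autojunk=False keeps difflib from ignoring very frequent byte values.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" and "insert" both contribute new bytes;
            # a pure "delete" contributes nothing.
            ops.append(("insert", new[j1:j2]))
    return ops


def apply_patch(old: bytes, ops: list) -> bytes:
    """Replay the instructions against the locally available old data."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def payload_size(ops: list) -> int:
    """Number of literal bytes the patch has to carry."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")
```

Applying the instructions to the old data has to reproduce the new data exactly. The literal payload is then the part of the update that actually has to travel over the network.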
The various issues of patch creation and application point to one particular improvement that can be made: a tool which manages and controls the delta creation and application process. A tool which increases the similarity of data streams by normalising their content through unpacking, uncompressing and ordering. A tool which is capable of detecting the steps required to properly repack formerly unpacked and uncompressed data, so that it produces exactly the same output and not just similar output. This would be some kind of meta-diff tool, which has its very own challenges, mostly driven by the limited processing time available, which will be discussed in another section.
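The "repack to the exact same output" requirement can be sketched for the simplest possible case, a single zlib stream. The function name find_zlib_level is hypothetical, and the assumption that the stream was produced by the same zlib build with default settings apart from the level is a strong one. A real meta-diff tool would have to search over far more parameters, such as the compressor, its version, window size and member ordering.

```python
import zlib
from typing import Optional


def find_zlib_level(compressed: bytes) -> Optional[int]:
    """Search for a compression level that reproduces `compressed` byte for byte.

    Returns the first level (0..9) for which recompressing the unpacked
    data yields exactly the original stream, or None if no level does.
    """
    raw = zlib.decompress(compressed)
    for level in range(10):
        if zlib.compress(raw, level) == compressed:
            return level
    return None


def normalise(compressed: bytes):
    """Unpack a stream for diffing and remember how to repack it.

    Returns (raw_data, level). If level is None, the stream cannot be
    reproduced exactly with this simple search and should be treated
    as opaque data instead.
    """
    return zlib.decompress(compressed), find_zlib_level(compressed)
```

The delta would then be computed on the unpacked data of the old and the new version, and the recorded level is used on the target machine to rebuild the compressed stream. The exhaustive search also hints at why processing time becomes the limiting factor.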
I considered implementing a patch creation and distribution system as an extension of the Debian package management system (dpkg). As stated above, Debian now uses LZMA as the default compression method for packages. My original idea was to build the patch distribution around Debian without touching any of the source code related to dpkg. I wanted to publish that first and see how people would react to it. However, the processing effort of LZMA made the advantage of patches over full downloads very slim, and depending on the network connection the patch can even come out worse. This disadvantage can only be circumvented by integrating patch creation and application more deeply into the package manager software.
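The trade-off between download time and processing effort can be written down as a small calculation. This is only a back-of-the-envelope model, and the function names and all numbers in the comments are invented for illustration, not measurements from this study. The single term processing_s lumps together unpacking, patching and recompressing on the user's machine.

```python
def full_download_seconds(full_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to fetch the complete package."""
    return full_bytes / bandwidth_bytes_per_s


def patch_update_seconds(patch_bytes: float, bandwidth_bytes_per_s: float,
                         processing_s: float) -> float:
    """Time to fetch the patch plus the local processing effort."""
    return patch_bytes / bandwidth_bytes_per_s + processing_s


def break_even_bandwidth(full_bytes: float, patch_bytes: float,
                         processing_s: float) -> float:
    """Bandwidth in bytes/s at which both update paths take equally long.

    Derived from full/B = patch/B + processing, so B = (full - patch) / processing.
    Above this bandwidth the full download is faster than the patch.
    """
    if processing_s <= 0:
        raise ValueError("processing_s must be positive")
    return (full_bytes - patch_bytes) / processing_s


# Invented example: 50 MB package, 5 MB patch, 30 s of local processing.
# On a 10 Mbit/s line (1.25e6 bytes/s) the patch still wins: 34 s vs. 40 s.
# On a 100 Mbit/s line (12.5e6 bytes/s) the full download is clearly faster.
```

In this model, the only ways to move the break-even point are a smaller patch or less local processing, and the latter is where deeper integration into the package manager would help.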
But there are other issues, too. Successfully creating an actual patch for a binary file or package generally requires a tool that understands the file format. File compression and packaging, which I mainly focused on, is just one example. File types such as images or movies have to be handled quite differently, and those files usually make up the larger share of the data in a software update. So, when trying to reduce the size of software updates, a system would also need specific tools for at least those file types which contribute the most to the update size.
Considering these issues and the effort such a system would require, I decided to put the project on hold for now and at least publish my conclusions.
Holger Machens, 26-Aug-2019