cakelab

Delta Service

Study on an Approach for Automated Patch Creation and Distribution

18-Sep-2019

About

Updates are annoying because they occur quite frequently and occupy various resources of the user's PC for a significant amount of time. Very often a user has to decide whether to ignore the update notification and accept the security risk of running vulnerable or buggy software, or to wait for the update to complete.

While waiting for an update, he may wonder why the download for a "bugfix" is almost as large as the initial download of the entire software. A rough analysis of updates from various software distributors reveals that most of those updates are in fact complete software downloads, some of which aren't even compressed or use only weak compression methods. So the user ends up downloading a lot of redundant data.

Network bandwidth and connectivity are still an issue due to outdated infrastructure (for example in central Europe), limited bandwidth on undersea broadband cables, and overloaded network nodes (switches) in between. Thus, a significant number of users are still stuck with 10 Mbit/s or less. Furthermore, an estimate of the potential savings through patches revealed that even fiber networks would not be enough to make patches obsolete.

The ideal solution to this update debacle is actual patches, where a patch contains only those portions of data which are not yet on the local machine, accompanied by instructions on how to modify the old data to achieve the new software state. As a result, a patch is usually just a fraction of the size of the entire software.
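The idea of a patch as "new data plus instructions for reusing old data" can be illustrated with a minimal sketch. It uses Python's `difflib.SequenceMatcher` as a stand-in matcher (dedicated tools such as bsdiff or xdelta3 use much faster matching) and a hypothetical patch format of `("copy", start, length)` operations, which reuse bytes already on the local machine, and `("insert", data)` operations, which carry only the bytes that are actually new.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus inserted new bytes (hypothetical format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Data already present locally: only its position is transmitted.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Data not present locally: must be shipped inside the patch.
            ops.append(("insert", new[j1:j2]))
        # "delete": the old bytes are simply not copied.
    return ops


def apply_patch(old: bytes, ops: list) -> bytes:
    """Rebuild the new version from the old data and the patch operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def inserted_bytes(ops: list) -> int:
    """Number of payload bytes the patch has to carry."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")
```

For example, `make_patch(b"version 1.0: hello world", b"version 1.1: hello new world")` carries only a handful of new bytes, and `apply_patch` on the old data reproduces the new version exactly.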

The various issues of patch creation and application point to one particular improvement that can be made: a tool which manages and controls the delta creation and application process. A tool which increases the similarity of data streams by normalising their content through unpacking, decompressing and ordering. A tool which is capable of detecting the steps required to properly repack formerly unpacked and decompressed data, so that it produces the exact same output and not just a similar one. This would be a kind of meta-diff tool, which has its very own challenges, mostly driven by the lack of processing time; these will be discussed in another section.
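The "repack to the exact same output" requirement can be illustrated with a minimal sketch, assuming the packed data is a plain zlib stream produced by Python's own `zlib` module. The hypothetical `find_zlib_level` decompresses the stream (the normalisation step) and then searches for a compression level that reproduces the original bytes exactly. Real packages involve many more parameters (gzip headers with timestamps, xz/LZMA settings, tar member order), so this only shows the principle.

```python
import zlib
from typing import Optional


def find_zlib_level(compressed: bytes) -> Optional[int]:
    """Return a zlib level whose output is byte-identical to `compressed`, or None."""
    raw = zlib.decompress(compressed)  # normalised content, suitable for diffing
    for level in range(10):
        # Several levels may yield identical output; any of them is good enough,
        # because only the byte-exact result matters for reproducing the package.
        if zlib.compress(raw, level) == compressed:
            return level
    return None  # stream was made with unknown settings: cannot repack exactly
```

If a level is found, a patch can be computed on the decompressed data and the patched result recompressed with that level on the client. If none is found, the tool has to fall back to diffing the compressed bytes directly, which usually yields a much larger patch.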

Goals

Review of existing solutions
Review existing solutions to find out why no one uses them.
Analyse and find caveats
Develop a prototype to evaluate the practical use of such a system.
Project Risk Analysis
Estimate the overall implementation and maintenance effort.

Preliminary results

Existing Solutions Are Rare and/or Closed Source
Found only one substantial work on this topic. Patching is usually supported for source code only; other approaches are closed source. Further analysis requires a prototypical implementation.
Prototype of a Meta-Diff Tool
Developed a prototype of a meta-diff tool and tested it on Debian binary packages. Used it to evaluate the runtime performance of such systems.
Project Risk Analysis
Developed a design for a non-invasive infrastructure for the creation and distribution of patches for Debian binary packages. The estimated implementation effort for an extension to the Debian package management system is reasonable. Unfortunately, dpkg switched to LZMA as the default compression for packages a couple of years ago. LZMA requires considerable processing effort and thereby significantly reduces the advantage of a patch over a full update, unless the patching system is integrated more deeply into dpkg.
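A Debian binary package is a Unix `ar` archive containing `debian-binary`, `control.tar.*` and `data.tar.*`, so the first unpacking step of a meta-diff tool is to split this container. The following is a minimal sketch, not the prototype's code: `list_ar_members` parses the common `ar` header layout, and the hypothetical inverse `build_ar` repacks members with fixed metadata (mtime, uid and gid set to 0, mode 100644), which is an assumption; a byte-exact repack of a real package would have to restore the original header fields as well.

```python
AR_MAGIC = b"!<arch>\n"


def list_ar_members(data: bytes) -> list:
    """Split an ar archive (the container of a .deb) into (name, content) pairs."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members, pos = [], len(AR_MAGIC)
    while pos < len(data):
        # 60-byte header: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2)
        header = data[pos:pos + 60]
        if len(header) < 60 or header[58:60] != b"`\n":
            raise ValueError(f"corrupt member header at offset {pos}")
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + 60
        members.append((name, data[start:start + size]))
        pos = start + size + (size % 2)  # member data is padded to an even offset
    return members


def build_ar(members: list) -> bytes:
    """Repack (name, content) pairs with fixed, hypothetical header metadata."""
    out = bytearray(AR_MAGIC)
    for name, content in members:
        if len(name) > 16:
            raise ValueError("long member names are not supported in this sketch")
        header = f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(content):<10}`\n"
        out += header.encode("ascii") + content
        if len(content) % 2:
            out += b"\n"  # padding byte
    return bytes(out)
```

Each extracted member (for example `data.tar.xz`) would then be decompressed and unpacked further before diffing, which is exactly where the LZMA processing cost comes in.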

Conclusions

I considered implementing a patch creation and distribution system as an extension to the Debian package management system (dpkg). As stated above, Debian now uses LZMA as the default compression for packages. My original idea was to build the patch distribution around Debian, without touching any of the source code related to dpkg. I wanted to publish that first and see how people reacted to it. However, the processing effort of LZMA made the advantage of patches over full downloads very slim, and depending on the network connection a patch can even be slower than a full download. This disadvantage can only be circumvented by integrating patch creation and application more deeply into the package manager software (see draft for details). Originally, I considered the separation of both systems a vital feature, since it would have allowed users to try out the delta service without any risk.
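Why the benefit depends on the network connection can be shown with a simple cost model. It is a sketch under stated assumptions: a patch only pays off if the download time it saves exceeds the extra local processing (here, re-creating the exact LZMA-compressed package); installation cost common to both paths is ignored, and all sizes and timings in the example are hypothetical, not measurements.

```python
def transfer_seconds(size_mb: float, bandwidth_mbit_s: float) -> float:
    """Download time, converting megabytes to megabits (1 byte = 8 bits)."""
    return size_mb * 8 / bandwidth_mbit_s


def patch_pays_off(full_mb: float, patch_mb: float,
                   bandwidth_mbit_s: float, apply_seconds: float) -> bool:
    """True if download + patch application beats downloading the full package."""
    patch_time = transfer_seconds(patch_mb, bandwidth_mbit_s) + apply_seconds
    return patch_time < transfer_seconds(full_mb, bandwidth_mbit_s)


def break_even_bandwidth(full_mb: float, patch_mb: float, apply_seconds: float) -> float:
    """Bandwidth (Mbit/s) above which the patch is no longer faster."""
    if apply_seconds <= 0:
        return float("inf")  # free application: the patch always wins
    return (full_mb - patch_mb) * 8 / apply_seconds
```

With a hypothetical 50 MB package, a 5 MB patch and 30 s of recompression, the patch wins at 10 Mbit/s (34 s versus 40 s) but loses at 100 Mbit/s (30.4 s versus 4 s); the break-even point lies at 12 Mbit/s.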

But there are other issues too. Successfully creating an actual patch for a binary file or package generally requires a tool that understands the file format. File compression and packaging, which I mainly focused on, are just one example. File types such as images or movies have to be handled quite differently, and those files usually make up the larger share of data in a software update. So, to reduce the size of software updates, a system would also need specific tools for at least those file types which contribute the most to the update size.
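A format-aware system needs at least a way to recognise file types and route each one to a specific handler. The sketch below detects a few formats by their well-known magic bytes and applies a normaliser where one exists (decompression for gzip and xz); the handler table and the decision to pass images through unchanged are assumptions for illustration, standing in for the dedicated image or video tools such a system would really need.

```python
import gzip
import lzma

# Well-known file signatures ("magic numbers") checked at the start of the data.
MAGIC_NUMBERS = [
    (b"!<arch>\n", "ar"),  # also the container format of .deb packages
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"PK\x03\x04", "zip"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]

# Hypothetical normalisers; formats without an entry are diffed as raw bytes.
NORMALISERS = {
    "gzip": gzip.decompress,
    "xz": lzma.decompress,
}


def detect_format(data: bytes) -> str:
    for magic, name in MAGIC_NUMBERS:
        if data.startswith(magic):
            return name
    return "unknown"


def normalise(data: bytes) -> tuple:
    """Return the detected format and the content the diff should operate on."""
    fmt = detect_format(data)
    handler = NORMALISERS.get(fmt, lambda d: d)
    return fmt, handler(data)
```

Extending the table with handlers for images and videos is where most of the effort would go, since those formats cannot simply be decompressed into a diff-friendly form.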

Considering these issues and the effort required for such a system, I decided to put the project on hold for now and to at least publish my conclusions.


Holger Machens, 02-Jan-2021