Software configuration management. Part 5. Version control
Previous articles explained what a configuration item (CI) is: the atomic element placed under the control of SCM activities. Now we need a way to create versions of these items and to manage how they appear, while keeping the whole team working simultaneously and without interruption.
The definition of a version was given earlier. A version control system is software that lets you create versions of elements and work with those versions as independent elements. This covers both creating the versions themselves and building the structures that store them, typically either a chain or a tree. English sources use the term version control system, abbreviated VCS. Below we describe the main techniques implemented in the vast majority of version control systems. How they are implemented in the particular tools the reader uses is left to the numerous user manuals, how-tos, FAQs and other documents, which are easy to find. The main thing is to understand the principles at work and why things work the way they do.
Placement of elements and versions
Before working with elements and their versions, you need to create the elements, i.e. tell the version control system to take existing objects and put them under its control. The first version of an element is always created together with the element itself.
The elements most often placed under version control are:
- files;
- directories;
- hard- and softlinks.
Inside the version control system, elements can be stored in different ways; that is up to the VCS architects. The user only needs to know that an element is placed in the repository and that it is handled through the commands of the chosen tool.
Branching and merging versions
As already mentioned, version control systems must provide structures for storing versions. The most common such structure is the version tree. It organizes the versions of an element so that several separate sequences of versions can start from any version of the configuration item. Each such sequence, originating from a particular version, is called a branch. And since a branch consists of versions, each of those versions can in turn be the origin of other branches.
The name of the model speaks for itself: the plant (the element) has buds and leaves (versions), and branches grow from them in turn. On the branches there are more leaves (other versions) and more branches, and the same greenery grows on those again. The result is a tree whose crown is a multitude of versions. One element is one tree.
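The tree described above can be sketched as a small data model. This is only an illustration, not the storage format of any particular VCS: the class names Version and Branch, their methods, and the directory-like path notation are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Version:
    """One version of an element; any version can be the origin of new branches."""
    number: int
    content: str
    branch: "Branch" = field(repr=False)
    branches: Dict[str, "Branch"] = field(default_factory=dict, repr=False)

    def path(self) -> str:
        return f"{self.branch.path()}/{self.number}"

    def grow_branch(self, name: str) -> "Branch":
        """Start a new branch that originates from this version."""
        child = Branch(name=name, origin=self)
        self.branches[name] = child
        return child


@dataclass(eq=False)
class Branch:
    """A named sequence of versions growing from an origin version (None for the root)."""
    name: str
    origin: Optional[Version] = field(default=None, repr=False)
    versions: List[Version] = field(default_factory=list, repr=False)

    def path(self) -> str:
        # A branch is addressed through the branch its origin version lives on.
        parent = self.origin.branch.path() if self.origin else ""
        return f"{parent}/{self.name}"

    def add_version(self, content: str) -> Version:
        """Append the next version to the end of this branch."""
        version = Version(len(self.versions) + 1, content, self)
        self.versions.append(version)
        return version


# Hypothetical usage: one root branch, one branch grown from its first version.
release = Branch("release_1.x")
v1 = release.add_version("first stable text")
cr_branch = v1.grow_branch("rec98_user2")
cr_v1 = cr_branch.add_version("work on the change request")
print(cr_v1.path())  # /release_1.x/rec98_user2/1
```

Every Version keeps its own dictionary of branches, which is exactly what makes the structure a tree rather than a chain.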
Why do we need this whole construction? Can’t we simply add versions one after another? Of course we can. However, that immediately limits how such a system can be used. If versions appear strictly one after another, then only one user at a time can create a new version, and the rest are forced to wait. Moreover, every time a new version appears, everyone else has to combine their changes with it. This goes on until everyone who wants to has put their work into the chain of versions, and each of them has to make sure that merging did not break the system. Furthermore, until their changes are placed under control, all those waiting have to keep intermediate results somewhere locally, without mixing them with what is currently in progress. That is tolerable when a couple of people work on a dozen elements: they can always come to an agreement. But what if the scale is much larger? Add a dozen people, even without increasing the number of elements, and such a simple chain will stall the work completely. In short, a linear version structure creates a number of difficulties.
So it is clear that we cannot do without branches. But surely we should not grow a branch for every little thing a developer does? Let’s see in which cases branches are grown. Typical examples are as follows:
- change request branch: started for the versions created while working on a change request (a “development” or “CR” branch);
- integration branch: serves as intermediate storage during the stabilization process;
- release branch: holds the versions that belong to stabilized configurations (see the relevant section of the first part of the article). Some versions on this branch can later be declared part of a basic configuration;
- debug branch: for short-term storage of versions, mainly in order to verify some decision.
The element.c version tree
The diagram shows an example of a version tree. The file element.c has a release branch, release_1.x, to which the stabilized versions of this element (1 to 5) are added. To store the delta for each change request, a separate branch with a special name format is created. In our case the format is rec<record_number>_<username>, where record_number is the ID of the change request in the tracking system. To integrate the deltas from different developers, integration branches named int_<username>_<suffix> are created, where the suffix holds a description of the integration or the number of the stabilized configuration. You can also see a debug branch; such branches are often named dbg_<username>_<random_commentary>, and trial versions of changes are placed on them. How each branch in the example was grown is described in more detail later in the text.
Each project can have its own conventions for creating and naming branches, but the main kinds are those listed above. If you use product lines, you will need all of the listed types.
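Naming formats like these are easy to check mechanically. Below is a minimal sketch that classifies a branch name by the formats just described. The function name classify_branch and the exact regular expressions are assumptions: in particular, the sketch assumes that user names contain no underscores, that the comment part of a debug or change request branch may be omitted, and that release branches start with release_.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical patterns derived from the naming formats in the text.
PATTERNS = {
    "change request": re.compile(
        r"^rec(?P<record>\d+)_(?P<user>[A-Za-z0-9]+)(?:_(?P<comment>.+))?$"
    ),
    "integration": re.compile(r"^int_(?P<user>[A-Za-z0-9]+)_(?P<suffix>.+)$"),
    "debug": re.compile(r"^dbg_(?P<user>[A-Za-z0-9]+)(?:_(?P<comment>.+))?$"),
    "release": re.compile(r"^release_(?P<line>.+)$"),
}


def classify_branch(name: str) -> Optional[Tuple[str, Dict[str, Optional[str]]]]:
    """Return (kind, parsed fields) for a branch name, or None if no format matches."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            return kind, match.groupdict()
    return None


for branch in ("release_1.x", "rec98_user2", "int_user7_1.5", "dbg_user2", "my_branch"):
    print(branch, "->", classify_branch(branch))
```

A check of this kind could run when a branch is created, so that badly named branches never reach the repository.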
The version tree grows and grows, and sooner or later the results of the work have to be combined. For example, a developer has grown a branch from one of the element’s versions to work on a change request. He has placed several versions on it, the last of which contains debugged and tested code. Meanwhile there is a release branch holding the versions that were issued as part of basic configurations and stable releases. The two results need to be combined.
The version merge mechanism is used for this. Typically it creates a new version of the element, taking the version on the selected branch (the base) as a starting point and applying the changes contained in another selected version (the source). English sources use the term merge.
The branch holding the source version may have grown either from the base version itself or from one of its earlier ancestors.
Existing VCSs let you merge both manually and automatically. The automatic way is the main one; a manual merge is requested only when conflicts arise.
Merge conflicts occur when both versions of the element change the same fragment. This happens when the ancestor of the source version is not the version from which the new version will grow. A typical example is a revision history kept at the top of a source file so that each version immediately shows who last changed it and what was done. When versions that grew from different ancestors are merged, this line is guaranteed to cause a conflict, and the only resolution is to insert both lines into the history. In more complex cases, the developer or an expert in the affected code must carefully make the necessary changes by hand.
On the subject of common ancestors and merging: besides being manual or automatic, a merge can be two-way or three-way. A two-way merge simply compares the two versions and adds their delta (the difference between versions of the element). The algorithm works on the principle of diff or something close to it: take the delta and insert, delete or change the required lines.
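The diff-like principle can be shown in a few lines. The sketch below computes a line delta between two versions with Python’s standard difflib module and then applies that delta to the older version. The names compute_delta and apply_delta are hypothetical, and difflib stands in here for whatever comparison algorithm a real VCS uses.

```python
import difflib
from typing import List, Tuple

# One delta entry: replace old[start:end] with the given lines.
Delta = List[Tuple[int, int, List[str]]]


def compute_delta(old: List[str], new: List[str]) -> Delta:
    """Collect the inserted, deleted and changed line ranges between two versions."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: List[str], delta: Delta) -> List[str]:
    """Apply a delta to the version it was computed against."""
    result = list(old)
    # Work from the end so that earlier line indices stay valid.
    for start, end, lines in reversed(delta):
        result[start:end] = lines
    return result


old = ["int f() {", "  return 1;", "}"]
new = ["int f() {", "  int x = 1;", "  return x;", "}"]
delta = compute_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)  # True
```

Note what is missing: with only two versions there is no way to tell which of them changed a given line, which is exactly the gap that the common ancestor fills.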
A three-way merge takes into account the common ancestor of both versions and calculates the deltas from the change history of the element on the corresponding branches. Accordingly, when a merge conflict occurs, the developer is offered three versions of the element: the common ancestor and the two variants that this ancestor turned into over time. This approach helps to assess the extent and importance of the delta on each branch and to decide whether to integrate the conflicting fragment, often even without involving the authors of the changes.
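A simplified line-based sketch of a three-way merge follows. It is an illustration under stated assumptions, not the algorithm of any particular tool: the names changes, apply_in_range and merge3 are hypothetical, changes that touch or overlap on both sides are conservatively treated as one conflict, and a conflicting fragment is left as it was in the ancestor while all three variants are reported.

```python
import difflib
from typing import List, Tuple

Change = Tuple[int, int, List[str]]  # replace base[start:end] with these lines
Conflict = Tuple[List[str], List[str], List[str]]  # (ancestor, ours, theirs)


def changes(base: List[str], other: List[str]) -> List[Change]:
    """Delta of one branch relative to the common ancestor."""
    sm = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_in_range(base: List[str], lo: int, hi: int, chs: List[Change]) -> List[str]:
    """What base[lo:hi] looks like after applying one side's changes to it."""
    out, pos = [], lo
    for start, end, lines in chs:
        out += base[pos:start] + lines
        pos = end
    return out + base[pos:hi]


def merge3(base: List[str], ours: List[str], theirs: List[str]
           ) -> Tuple[List[str], List[Conflict]]:
    tagged = sorted(
        [(c, "ours") for c in changes(base, ours)]
        + [(c, "theirs") for c in changes(base, theirs)],
        key=lambda item: (item[0][0], item[0][1]),
    )
    clusters = []  # groups of touching or overlapping changes
    for change, side in tagged:
        start, end, _ = change
        if clusters and start <= clusters[-1]["hi"]:
            clusters[-1]["hi"] = max(clusters[-1]["hi"], end)
        else:
            clusters.append({"lo": start, "hi": end, "ours": [], "theirs": []})
        clusters[-1][side].append(change)

    merged, conflicts, pos = [], [], 0
    for c in clusters:
        merged += base[pos:c["lo"]]
        mine = apply_in_range(base, c["lo"], c["hi"], c["ours"])
        yours = apply_in_range(base, c["lo"], c["hi"], c["theirs"])
        if not c["theirs"] or mine == yours:
            merged += mine            # only we changed it, or both made the same change
        elif not c["ours"]:
            merged += yours           # only they changed it
        else:
            ancestor = base[c["lo"]:c["hi"]]
            conflicts.append((ancestor, mine, yours))
            merged += ancestor        # left for manual resolution
        pos = c["hi"]
    return merged + base[pos:], conflicts


base = ["// history: v1 by user0", "int f() {", "  return 1;", "}"]
ours = ["// history: v2 by user1", "int f() {", "  return 1;", "}", "int g() { return 2; }"]
theirs = ["// history: v2 by user2", "int f() {", "  return 42;", "}"]
merged, conflicts = merge3(base, ours, theirs)
print(merged)
print(conflicts)
```

In this example the changed return statement and the added function are merged automatically, while the revision history line, edited on both branches, is reported as a conflict together with its ancestor.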
Once the merge is complete, information about it should be preserved if possible. Most mature VCSs can save “merge arrows”: meta-information about from where, to where and at what moment the changes were merged, and who did it.
Let’s consider an example: the version tree of an element in the diagram below, which shows the order in which branches are grown and merged. As you may have guessed, the tree is taken entirely from the diagram above, with merge arrows added to it.
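What such a merge arrow might hold can be sketched as a small record. The field names, the MergeLog class and the author name cm_engineer are assumptions made for illustration; real systems store this meta-information in their own formats.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class MergeRecord:
    source: str   # version the changes were taken from
    target: str   # version created by the merge
    author: str   # who performed the merge
    merged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MergeLog:
    """Append-only list of merge arrows for one element."""

    def __init__(self) -> None:
        self._records: List[MergeRecord] = []

    def record(self, source: str, target: str, author: str) -> MergeRecord:
        entry = MergeRecord(source, target, author)
        self._records.append(entry)
        return entry

    def into(self, target: str) -> List[MergeRecord]:
        """All arrows that point at the given version."""
        return [r for r in self._records if r.target == target]


log = MergeLog()
arrow = log.record("/release_1.x/rec121_user1/2", "/release_1.x/3", "cm_engineer")
print(arrow.source, "->", arrow.target, "by", arrow.author)
```

Because the record is frozen, an arrow cannot be quietly edited after the fact, which suits information that is kept for traceability.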
Example of merging changes between different branches
So, the project produces a certain product that includes the file element.c. The team has agreed that all stable or basic versions are stored on the “release_1.x” branch, which we will call the “release branch”. Our element is no exception, and its initial version, 1, is created on the release branch.
For simplicity of notation, we will describe branches as if they were directories on a disk. Accordingly, the first version will be called /release_1.x/1.
Next, one of the managers entered record number 98 in the change request tracking system (we will simply call this system the bugtracker), describing new functionality for the product. Naturally, he also assigned one of the users to be responsible for the task; let it be user2. user2 thought for a bit, started working on the task, and after some time decided to put the resulting source code into the version control system. According to the naming standards adopted by the project (its CM policies), a branch for making changes is named rec<record-number>_<user>[_<comments>]. The new branch was therefore named rec98_user2; its creator chose to do without comments. Work is in full swing: version /release_1.x/rec98_user2/1 appears, and then /release_1.x/rec98_user2/2. Here we leave developer user2 for a while; let him think the task over.
While he was working, record (CR) number 121 was registered in the bugtracker, describing a new error found by the testers. The record was assigned to user1, who set about fixing the error. Once it was fixed, he decided to start a branch to store the results. Following project policy, he named the new branch rec121_user1. Note that by the time he started work and created the branch, someone had already added another stable version to the release branch: /release_1.x/2. The branch therefore grows from the latest version at that moment, the second one. Once the branch exists, versions can be created on it. The end result is version /release_1.x/rec121_user1/2.
What next? The bug is fixed and the patch has been tested (we leave that layer of work behind the scenes for now), so it is time to make these changes part of a stable configuration and, possibly, of a new basic configuration. This is where the CM engineer, or the team member performing that role, gets to work. With the help of a crowbar and a sledgehammer... sorry, with the help of the merge command, he creates a new version on the release branch: /release_1.x/3. Note arrow A, which represents exactly this merge.
Let’s return to user2. He has just made some further changes, but first wants to check quickly what comes of them and let colleagues look at his solution. To do this, he creates a debug branch. The project’s CM policy says it should be named dbg_<user>[_<comment>], so the new branch is /release_1.x/rec98_user2/dbg_user2. On it the user creates version /release_1.x/rec98_user2/dbg_user2/1. It was decided to accept the resulting solution into the main code, so the author merged the new delta with the version from which the debug branch had been grown. Along the way he cleaned up and optimized the code so that he would not be ashamed to hand it over for integration; the result was version /release_1.x/rec98_user2/3. Bright arrow B shows this merge graphically.
However, user2 learns that while he was working, a serious error was fixed under CR #121, and that this fix may affect how the new functionality works. A decision is made to combine the two deltas and see what comes of it. Version /release_1.x/rec98_user2/4 is created, based on /release_1.x/rec98_user2/3 with the changes from /release_1.x/rec121_user1/2 applied. Merge arrow C appears accordingly. The new version is checked for operability and for errors, and the decision is made: it must be integrated! The CM engineer again takes up his tools and creates version /release_1.x/4, drawing the corresponding arrow D to it.
However, life does not stand still. While our two developers were making and merging their deltas, other team members were already changing the same file. Two CRs, 130 and 131, were created and assigned to user3. He completed them successfully and made two branches, one for each record. Since the tasks were set and solved at different times, the branches for them were grown from different versions on the release branch. The result was versions /release_1.x/rec130_user3/1 and /release_1.x/rec131_user3/1, which originate from version /release_1.x/3.
The changes exist; now they have to be combined, stabilized and, if all is well, made into a basic configuration. For this purpose the CM engineer, who goes by the operational nickname user7 in the version control system, creates an integration branch, which in this project is named int_<user>_<future-release-number>. Thus the /release_1.x/int_user7_1.5 branch appears. The two deltas are merged on it: first the changes for record 130, giving version /release_1.x/int_user7_1.5/1, then the changes for record 131, for which version 2 is created on the same branch. Merge arrows are drawn for all these operations.
The CM engineer’s final chord is merging version /release_1.x/int_user7_1.5/2 into the release branch, which produces version /release_1.x/5. Later this version will become part of the product’s basic configuration.
That is a rather long description of a small picture. A picture is worth a hundred words, as the saying rightly goes.
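The merges of this walkthrough can be written down as plain data and queried. In the sketch below the version paths and the labels A to D come from the description; the MergeArrow type, the empty labels for the arrows the text does not name, and the helper sources_of are assumptions made for illustration.

```python
from typing import List, NamedTuple


class MergeArrow(NamedTuple):
    label: str   # letter used in the text, empty if the arrow is not named there
    source: str  # version the changes were taken from
    target: str  # version created by the merge


ARROWS = [
    MergeArrow("A", "/release_1.x/rec121_user1/2", "/release_1.x/3"),
    MergeArrow("B", "/release_1.x/rec98_user2/dbg_user2/1", "/release_1.x/rec98_user2/3"),
    MergeArrow("C", "/release_1.x/rec121_user1/2", "/release_1.x/rec98_user2/4"),
    MergeArrow("D", "/release_1.x/rec98_user2/4", "/release_1.x/4"),
    MergeArrow("", "/release_1.x/rec130_user3/1", "/release_1.x/int_user7_1.5/1"),
    MergeArrow("", "/release_1.x/rec131_user3/1", "/release_1.x/int_user7_1.5/2"),
    MergeArrow("", "/release_1.x/int_user7_1.5/2", "/release_1.x/5"),
]


def sources_of(target: str) -> List[str]:
    """Versions whose changes were merged directly into the given version."""
    return [arrow.source for arrow in ARROWS if arrow.target == target]


release_versions = [f"/release_1.x/{n}" for n in range(1, 6)]
for version in release_versions:
    print(version, "<-", sources_of(version) or "no merge arrow")

# Release versions that were created without any merge.
without_arrow = [v for v in release_versions if not sources_of(v)]
print(without_arrow)  # ['/release_1.x/1', '/release_1.x/2']
```

Apart from the initial version, only /release_1.x/2 has no incoming arrow, so a query like this quickly shows which stable versions did not arrive through a merge.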
An attentive reader will immediately have a question: if everything here is done through branches and merge arrows, where did version /release_1.x/2 come from? After all, no arrow points to it from any branch! A natural question, and the answer is just as natural. Yes, there are situations in which changes are made directly on the release branch. For example, a terrible mistake was found in the very first version: someone forgot to note in the revision history section who made the changes! That is a joke, of course; nobody breaks policy for such trifles. Nevertheless, direct changes do happen. The main thing is to know exactly who created the new version and why. It is best if the version control system lets you restrict the right to create versions on each branch separately. In that case we can further protect the project by granting the right to add versions on the release branch only to the SCM engineer. At the very least, such a restriction makes it easier to find out who is responsible.
After all of the above, need we say that the ability to work with branches is a basic feature of any mature version control system? Without branches, a version control system deserves the name only formally: it can store and hand out versions, but nothing more.
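A per-branch restriction of this kind can be sketched as a small rule table. Everything here is hypothetical: the rule format, the function may_create_version and the choice of user7 as the only writer on release branches are assumptions for illustration, and real VCSs configure such rights in their own ways.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Set, Tuple

# (branch name pattern, users allowed to create versions; None means everyone).
# The first matching rule wins.
RULES: List[Tuple[str, Optional[Set[str]]]] = [
    ("release_*", {"user7"}),  # only the SCM engineer writes to release branches
    ("*", None),               # any other branch is open to the whole team
]


def may_create_version(user: str, branch_name: str) -> bool:
    """Check whether the user may add a version on the named branch."""
    for pattern, allowed in RULES:
        if fnmatchcase(branch_name, pattern):
            return allowed is None or user in allowed
    return False  # no rule matched: refuse by default


print(may_create_version("user7", "release_1.x"))  # True
print(may_create_version("user2", "release_1.x"))  # False
print(may_create_version("user2", "rec98_user2"))  # True
```

With a rule like this in place, any version that appears on the release branch without a merge arrow can have only one possible author.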