concurrent builds

Mon Jan 17 14:01:41 CET 2011

I've done some thinking and want to try with another example to show the difference between the way the current lin works and how I think a concurrent lin may work. The approach I have in mind needs no modification of moonbase, only of lin itself.

THE EXAMPLE:

imaginary dependency tree:

A
|--->B1
|    |--->C1
|    |--->C2
|    |    |--->D
|--->B2
|    |--->D
|--->[B3]
|    |--->D

None of these modules are installed.
The user wants to 'lin A' and intends to choose 'B3'.

CURRENT lin (I have not looked at the source code, but I guess this is how it works):

The current lin uses some kind of depth-first building algorithm: lin starts with the root (A). Now it figures out which dependencies to include by looking at DEPENDS and asking the user for options (B1, B2 and optional B3, the user chooses it).

The next step is to do the same with the first dependency (B1), then with its first dependency (C1).

C1 is the first module with no missing dependencies, so it gets built. Then C2 which needs D, D needs nothing: build D then C2 then B1 ... this is classic depth-first tree traversing.

CONCURRENT lin:

With concurrent lin, you have to use a different approach, because the whole dependency has to be known before any module can be built. This means, that ALL questions from CONFIGURE and DEPENDS of the whole dependency tree have to be asked and answered before any building starts, which can be done using the same depth-first algorithm as in non-concurrent lin.

This idea is kinda event-based: First we need to look at all the leaves of the tree. Leafs are those modules with no missing dependencies. Those are module 'C1' and three times module 'D' -- that is our first 'building block': {C1,D}, which can be built concurrently. Now to use our leaf-logic further, we remove all leafs from the tree:

A
|--->B1
|    |--->C2
|--->B2
|--->B3

Now again: take all leaves, C2, B2 and B3 and define the next building block {C2,B2,B3} and build them concurrently.

A
|--->B1

The last two steps are rather simple: build only block {B1} and then only {A}.

Let us now compare the build order by using single-module building blocks for non-concurrent lin:
non-concurrent: {C1},{D},{C2},{B2},{B3},{A}
concurrent: {C1,D},{C2,B2,B3},{B1},{A}

In case a module fails during the concurrent build process, everything not depending on that module can obviously continue building. After nothing can be built anymore, there should be some kind of output which explains which module(s) failed and which part of the tree could not be built, so the user knows what to do to try building the tree again.

As you can see, there are no module-script-file modifications necessary for this basic approach.

best regards
Christian Krause aka wookietreiber

On Sun, Jan 16, 2011 at 07:23:32PM +0100, Duncan Gibson wrote:
> > I wanted to present a use-case for concurrent builds with lin.
> >
> > If all modules could be built with 'make -j CORES' this wouldn't be
> > necessary, but unfortunately this is not the case. Some modules aren't
> > PSAFE (they can't build concurrently), others don't use make. So with
> > non-concurrent builds it is not always possible to achieve optimal
> > resource usage and the minimum amount of time to install modules (with
> > modern CPU architectures in mind, that have many cores).
> 
> I'm no expert on the internals of the dependency tracking so what comes
> next might be completely wrong.
> 
> I have the feeling that for your idea to work, every possible dependency
> in the moonbase would need to be explicitly specified somewhere, whether
> it's a build time dependency (eg it uses another tool during installation)
> or it's a run-time dependency (eg it needs another tool/library to run).
> Both depends and optional_depends would need to be tracked.
> 
> This would make the maintenance of each module in the moonbase much more
> time-consuming because it means that potentially every configure option
> needs to be checked, and added/updated/deleted with each version bump.
> 
> As it stands, many modules have configuration files that only pull in
> features from other modules if those modules are already installed. In
> some cases the other module is definitely required at build or run time,
> and so the module needs a "depends other_module". In others the module
> may build/run without another module, but it relates to a common task
> so the user might want to consider installing it, ie an "optional_depends".
> 
> Requiring explicit specification of all possible dependencies would make
> the system brittle, because all modules' dependencies would need to be
> consistent with each other, or else you could build out of order.
> 
> It isn't clear to me that the potential benefits of being able to build
> modules in parallel to speed up installation outweigh the extra mainenance
> required to keep all of the dependencies up-to-date and consistent.
> 
> Just my 2c.
> 
> D.