

July 26th, 2011

This is the first post in a short series of blog posts about the new GNU make based build system that will soon be integrated into the DEV300 codeline. It covers the memory usage of the new GNU make build system when using full dependencies.

Since the first announcement of plans to replace the old build.pl/dmake build system with a new solution, one of the focus points has been the correct handling of dependencies. Several approaches were tried to handle this correctly. One of them was to keep all dependencies in one make process, because recursive make is considered harmful. The gnumake2 cws (child workspace) is finally approaching the end of a feature branch's lifecycle, moving from "ready for QA" to "integrated". So it is time to address one of the main concerns community members have raised about it: memory usage.

The very idea of having all information about an OpenOffice.org build in one process, including all dependencies, might sound obnoxious and megalomaniac at first. But memory has become cheap in recent years, so it was time to at least consider this option.

By now, we can say it was well worth it: gnumake2 is now capable of building eight modules (framework, sfx2, svl, svtools, sw, toolkit, tools, xmloff) in one process, and we have a solid basis for approximating the memory usage of a build that contains all dependencies in one process.

First we have to find a reasonable metric of "dependency-intensity" of a part of the build. Then we can try to relate that metric to the memory usage measured on migrated modules to extrapolate to a full build.

A simple metric is the number of #include statements in hxx and cxx files. (That might be a very naïve assumption, but I have spent too much of my lifetime in the physics department of a university to be scared of a spherical cow.)

Two methods were used to measure the memory usage:

  1. Adding $(info finished parsing.) $(shell sleep 60) to the end of the main makefile makes make pause for a minute once all makefiles have been parsed. This allowed me to measure the heap size of a real make process on Linux 64-Bit with pmap -d $(pgrep -x make) | grep -Eo 'writeable/private: [0-9]+K' | cut -f2 -d' '
  2. valgrind's massif tool provided a simple but synthetic way to do the same (using useful-heap as the measure).
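The pmap half of method 1 can be wrapped in two small shell functions. This is a minimal sketch, not the script used for the measurements. It assumes procps pmap, whose -d output ends with a summary line such as "mapped: ...K    writeable/private: ...K    shared: ...K". It also uses pgrep -x make to find processes named exactly make. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report the writeable/private memory (in KiB) of running make
# processes, as `pmap -d` prints it in its final summary line.

# private_kib: read `pmap -d` output on stdin, print the writeable/private value.
private_kib() {
  sed -n 's/.*writeable\/private: *\([0-9][0-9]*\)K.*/\1/p'
}

# make_heap_report: print "PID KiB" for every process named exactly "make".
# Intended to run while make sits in the $(shell sleep 60) at the end of
# the main makefile, i.e. after all makefiles have been parsed.
make_heap_report() {
  local pid
  for pid in $(pgrep -x make); do
    printf '%s %s\n' "$pid" "$(pmap -d "$pid" | private_kib)"
  done
}
```

Only private_kib depends on the pmap output format, so it can be checked against a captured summary line without a running build.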
modules built                 include statements  pmap (KiB)  massif (KiB)
no module                                      0         632           407
tools                                        964        1184          1115
svl                                         1652        1660          1645
toolkit                                     2276        3524          3210
svtools                                     3768        5548          5179
framework                                   6049        5188          4812
sfx2                                        6065        6196          5961
xmloff                                      6496        6860          6582
framework, sfx2                            12514       10276          9276
sw                                         24087       29340         25943
all migrated but sw                        27670       23124         21145
all migrated but xmloff, sfx2              38796       40088         35550
all migrated                               51757       50812         45129

 

A simple linear regression over the data with OpenOffice.org Calc's LINEST function gives a heap usage of 1010±40 bytes per include statement for the pmap data and 890±35 bytes for the massif data. The plot shows no obvious systematic error in the assumptions of the model:

[Figure: gbuild memory plot]
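The LINEST result can be cross-checked from the shell. The sketch below is an ordinary least-squares fit in awk over the measurements in the table above. It converts the slope from KiB to bytes, assuming 1 KiB = 1024 bytes. Unlike LINEST, it reports only point estimates, not the ± uncertainties. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Least-squares fit of heap size against #include count, using the
# measurements from the table (columns: includes, pmap KiB, massif KiB).

gb_data() {
  cat <<'EOF'
0 632 407
964 1184 1115
1652 1660 1645
2276 3524 3210
3768 5548 5179
6049 5188 4812
6065 6196 5961
6496 6860 6582
12514 10276 9276
24087 29340 25943
27670 23124 21145
38796 40088 35550
51757 50812 45129
EOF
}

# linfit COL: fit column COL (2 = pmap, 3 = massif) against column 1 and
# print "<slope in bytes per include> <intercept in KiB>".
linfit() {
  gb_data | awk -v col="$1" '
    { n++; x = $1; y = $col
      sx += x; sy += y; sxx += x * x; sxy += x * y }
    END {
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # KiB per include
      intercept = (sy - slope * sx) / n                   # KiB
      printf "%.0f %.0f\n", slope * 1024, intercept
    }'
}

echo "pmap:   $(linfit 2)"
echo "massif: $(linfit 3)"
```

The slopes come out at about 1008 and 893 bytes per include, in line with the LINEST figures quoted above.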

 

The last two data points are the predictions of the model for a full build with and without binfilter:

  • without binfilter: 170-190 MiB (pmap), 150-170 MiB (massif)
  • with binfilter: 190-210 MiB (pmap), 170-180 MiB (massif)
So a full build with all dependencies should be handled decently even by slower and smaller hardware, as even common ARM machines have enough RAM for this.

Finishing note: gnumake2 will be integrated soon, so if you are actively developing on the DEV300 codeline, it is advisable to check out its basics. A good starting point is the talk "Rebooting the OpenOffice.org Build System" (available as ODP and PDF slides and as a video) that I gave at OOoCon 2010. For more detailed information, see the Build Environment Effort section of the OOo wiki.

I hope to keep the posts about the build system coming in the next days. Next up: How to migrate a module to gbuild.

 

P.S.: In case an interested GNU make hacker comes across this post, here is some output from make -p for the 8 migrated modules:

2385 variable set hash-tables
# files hash-table stats:
# Load=21210/32768=65%, Rehash=5, Collisions=605939/1022290=59%


# # of strings in strcache: 32753 / lookups = 991181 / hits = 958428
# # of strcache buffers: 284 (* 8176 B/buffer = 2321984 B)
# strcache used: total = 2313570 (5196) / max = 8176 / min = 8170 / avg = 8175
# strcache free: total = 238 (2980) / max = 6 / min = 0 / avg = 0

# strcache hash-table stats:
# Load=32869/65536=50%, Rehash=3, Collisions=545530/1044947=52%


(This is a very raw mirror of the original blog post made to blogs.sun.com on 16 Nov 2010. As per http://web.archive.org/web/20090627144253/http://www.sun.com/termsofuse.jsp "... You grant Sun and all other users of the Website an irrevocable, worldwide, royalty-free, nonexclusive license to use, reproduce, modify, distribute, transmit, display, perform, adapt, resell and publish such Content (including in digital form) ..." )

gbuild: How to set up a repository

This is the third post in a short series of blog posts about the new GNU make based build system that will soon be integrated into the DEV300 codeline. It covers how to use the multi-repository support of gbuild.

Welcome back to the little blog series about the new GNU make build system. After covering the setup of a module in the gbuild system in the last post, this one is about how repositories are set up with gbuild. A repository in gbuild is a directory where source files are found. While OpenOffice.org currently has every source file in one big repository, gbuild supports multiple repositories and will find a file in any of them.

Making repositories known to gbuild

gbuild needs to be told about the repositories it should look for source files. This is done with the variable gb_REPOS which contains a whitespace separated list of source directories. For backwards compatibility, if gb_REPOS is not set, it defaults to $(SOLARSRC) for now (we might get rid of that one day and set gb_REPOS in ./configure along with SOLARSRC).
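gbuild does this lookup inside make. As an illustration of the idea, here is a minimal shell sketch. The first-match-wins order and the function name find_in_repos are assumptions for illustration, not gbuild's actual implementation.

```bash
#!/usr/bin/env bash
# Sketch: locate a source file in the repositories listed in gb_REPOS.
# Assumption: repositories are searched in list order and the first hit wins.

# find_in_repos RELPATH: print the full path of RELPATH in the first
# repository that contains it; fall back to $SOLARSRC if gb_REPOS is
# unset or empty, mirroring the default described above.
find_in_repos() {
  local rel=$1 repo
  for repo in ${gb_REPOS:-${SOLARSRC:-}}; do
    if [ -e "$repo/$rel" ]; then
      printf '%s\n' "$repo/$rel"
      return 0
    fi
  done
  echo "error: $rel not found in any repository" >&2
  return 1
}
```

For example, with gb_REPOS="/src/ooo /src/extra" (hypothetical paths), find_in_repos tools/source/misc/pathutils.cxx prints the first of /src/ooo/tools/... or /src/extra/tools/... that exists. This sketch splits the list on whitespace, so it assumes repository paths contain no spaces.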

Setting up a repository for gbuild

Three files need to be present in the root of a gbuild repository:

  • Repository.mk
  • RepositoryFixes.mk
  • and Module_*.mk

In Repository.mk we define some things that need to be globally known for the build. The first thing that needs to be globally set is a variable name for this repository, so that we can refer to it -- for example when defining include paths. This is done by a statement like this:

 

$(eval $(call gb_Helper_register_repository,SRCDIR))

Because of this statement the OpenOffice.org source repository can be referenced as $(SRCDIR) when defining include paths. Other repositories should obviously use a different variable name for their directory.

In addition, we need to declare the naming scheme of libraries and executables and the layer they end up in within the final product. With the statement:

$(eval $(call gb_Helper_register_libraries,OOOLIBS, [...] fwk [...] sw [...] ))

We register the libraries "fwk" and "sw" as "OOOLIBS". What this means exactly depends on the platform. On Linux 64-Bit, for example, the libraries are named "libfwklx.so" and "libswlx.so" and will end up in the OpenOffice.org layer of the product. This information needs to be declared globally in the repository. It cannot simply be declared in the local files of the modules, like framework/Library_fwk.mk. When one does a partial build (only one module, e.g. sw), that information is still needed, because the filenames of the linked-against libraries need to be known. On OSX, the exact location of the linked-against library (and thus its layer) additionally needs to be known at link time.
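A small shell sketch can show why the registration must be global. A partial build of sw still has to turn the library name fwk into a file name to link against. The lookup below stands in for what Repository.mk registers. Only the OOOLIBS rule for Linux 64-bit (lib<name>lx.so) comes from this post. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: global library registry (as Repository.mk would declare it via
# gb_Helper_register_libraries) and the Linux 64-bit file name it implies.

# lib_group NAME: print the group a library was registered in.
lib_group() {
  case "$1" in
    fwk|sw) echo OOOLIBS ;;   # subset of the registration shown above
    *) return 1 ;;            # not registered
  esac
}

# lib_filename NAME: file name used when linking against NAME.
lib_filename() {
  local group
  if ! group=$(lib_group "$1"); then
    echo "error: library $1 is not registered in Repository.mk" >&2
    return 1
  fi
  case "$group" in
    OOOLIBS) printf 'lib%slx.so\n' "$1" ;;
    *) echo "error: no naming rule for group $group in this sketch" >&2; return 1 ;;
  esac
}
```

Because the table lives at repository level, building only sw can still resolve fwk to libfwklx.so without ever reading framework/Library_fwk.mk.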

The second file, RepositoryFixes.mk, is for those cases where developers got too creative for their own good and followed a naming scheme only mostly, for example by naming something slightly differently on one platform. When gbuild starts up, it parses the Repository.mk in each repository. Then it loads the platform defaults and assigns the names and layers accordingly. After that, the RepositoryFixes.mk file in each repository gets parsed and can fix up "special cases". It should contain all the hacks that work around things that break the usual systematics.

The final file is the Module_*.mk. In the case of the OpenOffice.org repository this is the Module_ooo.mk file. When everything is migrated to gbuild, one will not build module by module, but with one make process. After parsing all other setup code, this make process will go into each repository it finds in gb_REPOS and include the one Module_*.mk file it finds there. The modules in those files are then added as dependencies to the all target. They need to be built, and in turn they ensure that all modules that are part of them are built. Gbuild modules can include other modules (yes, you can build "module trees" of arbitrary depth). Module_ooo.mk is just a module that includes all migrated modules.
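A module tree can be expanded recursively to obtain everything the all target ends up depending on. The sketch below uses a hard-coded, hypothetical tree in place of parsing Module_*.mk files. In it, mygroup is an invented intermediate module, not a real OpenOffice.org module.

```bash
#!/usr/bin/env bash
# Sketch: expand a tree of gbuild-style modules depth-first.

# module_children MODULE: modules included by MODULE (hypothetical tree).
module_children() {
  case "$1" in
    ooo)     echo "tools svl mygroup" ;;   # plays the role of Module_ooo.mk
    mygroup) echo "sw sfx2" ;;             # hypothetical nested module
    *)       ;;                            # leaf module
  esac
}

# collect_modules MODULE: print MODULE and all modules below it, one per line.
collect_modules() {
  local mod=$1 child
  printf '%s\n' "$mod"
  for child in $(module_children "$mod"); do
    collect_modules "$child"
  done
}

# The all target would end up depending on every collected module.
printf 'all: %s\n' "$(collect_modules ooo | tr '\n' ' ')"
```

The recursion is what allows trees of arbitrary depth. A module reachable through two parents would be listed twice here, which is harmless for make prerequisites.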

Ok, enough already of this dry and rather boring topic. The next post will be about eyecandy for developers.

(This is a very raw mirror of the original blog post made to blogs.sun.com on 23 Nov 2010. As per http://web.archive.org/web/20090627144253/http://www.sun.com/termsofuse.jsp "... You grant Sun and all other users of the Website an irrevocable, worldwide, royalty-free, nonexclusive license to use, reproduce, modify, distribute, transmit, display, perform, adapt, resell and publish such Content (including in digital form) ..." )

gbuild: Eyecandy for developers

This is the fourth post in a short series of blog posts about the new GNU make based build system that will soon be integrated into the DEV300 codeline. It covers getting nice output from gbuild.

Welcome back to the little blog series about the new GNU make build system. After the dry topic of repositories, this is just a short post about the output of the new build system, which tries to keep the output calm and clean by default. When you start a build with:

make -srj9

you will get a kbuild-like output:

 

[ build CXX ] tools/source/misc/pathutils
[ build LOG ] tools
[ build LNK ] Library/libtllx.so
...
[ build MOD ] tools
[ build ALL ] top level modules: tools
[ build ALL ] loaded modules: tools

A make clean command results in the same clean output, but with "clean" instead of "build". When setting some variables:

export gb_TITLES=T gb_COLOR=T

The output gets a little more attractive:

The left column shows a make clean, the middle column a make/make all (top: with color, bottom: without color). The __.oO and Xx.__ ASCII art represents my best attempt at symbolizing a building/cleaning target. If you have a better idea, drop me a note. The colored output also helps with a verbose build, as it sticks out between all the other output and makes it easier to find your way around. Setting gb_TITLES=T additionally shows the progress in the terminal title. Screenshots are not very good at conveying that, unfortunately.
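The following shell sketch shows one way such a status line could be printed. gb_COLOR=T adds an ANSI color, and gb_TITLES=T writes the xterm title escape sequence. It is an illustration, not gbuild's code. The choice of bold green and the function name are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: kbuild-like status line, honouring gb_COLOR and gb_TITLES.

# status_line ACTION KIND TARGET, e.g. status_line build CXX tools/source/misc/pathutils
status_line() {
  local line="[ $1 $2 ] $3"
  # gb_TITLES=T: put the progress into the terminal title (OSC 0 sequence),
  # but only when stdout is a terminal.
  if [ "${gb_TITLES:-}" = T ] && [ -t 1 ]; then
    printf '\033]0;%s\007' "$line"
  fi
  # gb_COLOR=T: make the line stand out in otherwise verbose output.
  if [ "${gb_COLOR:-}" = T ]; then
    printf '\033[1;32m%s\033[0m\n' "$line"
  else
    printf '%s\n' "$line"
  fi
}

status_line build CXX tools/source/misc/pathutils
```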

The two terminal windows on the right show some of the verbose error messages that the gbuild system issues when it deems something wrong. Please note that these errors are reported early (before starting to really build anything) and not late (when trying to actually compile/link something that does not exist).

Here are a few conditions that gbuild will try to detect and complain about:

  • initial makefile outside of the source repositories
  • no call to gb_Helper_register_repository in the Repository.mk
  • adding an executable/library to an invalid group in gb_Helper_register_* (The error message will report the valid groups.)
  • corrupted module stacks
  • adding an object to a library which has no C/C++ source file in any of the repositories
  • generating a component file which has no source file in any of the repositories
  • generating a resource for which there is no source file in any of the repositories
  • linking against a library that was not registered in Repository.mk
  • defining a library that was not registered in Repository.mk
  • unknown platform
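The first check in the list, refusing to run from a makefile outside the source repositories, boils down to a path-prefix test. Here is a minimal shell sketch of it. It assumes gb_REPOS holds whitespace-separated directories without spaces in their names. The function name is hypothetical, and gbuild performs its own checks in make.

```bash
#!/usr/bin/env bash
# Sketch: is directory DIR inside one of the repositories in gb_REPOS?

# inside_repos DIR: exit 0 if DIR (after resolving symlinks) lies in a
# repository listed in gb_REPOS (or $SOLARSRC as the fallback).
inside_repos() {
  local dir repo
  dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
  for repo in ${gb_REPOS:-${SOLARSRC:-}}; do
    repo=$(cd "$repo" 2>/dev/null && pwd -P) || continue
    # Appending "/" to DIR keeps /src/repo2 from matching /src/repo.
    case "$dir/" in
      "$repo"/*) return 0 ;;
    esac
  done
  return 1
}
```

Called on the directory of the initial makefile before any target is considered, a failing inside_repos would let the build bail out early, in the spirit of the checks listed above.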
That is it for this post. The next one will be about issuing build commands and how the commands in the new build system compare to those in the old build.pl/dmake combination.

(This is a very raw mirror of the original blog post made to blogs.sun.com on 21 Dec 2010. As per http://web.archive.org/web/20090627144253/http://www.sun.com/termsofuse.jsp "... You grant Sun and all other users of the Website an irrevocable, worldwide, royalty-free, nonexclusive license to use, reproduce, modify, distribute, transmit, display, perform, adapt, resell and publish such Content (including in digital form) ..." )

This is the fifth post in a short series of blog posts about the new GNU make based build system that was integrated into the m96 milestone of the DEV300 codeline. It covers gbuild commands and usage.

Welcome back to the little blog series about the new GNU make build system. After showing off some ANSI color eyecandy, it is time to look at how to command the new build system in the usual use cases. All commands assume the shell is in the root directory of the module in question unless an explicit cd command is given:

 

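build.pl/dmake compared to the GNU make build system:

  • builds the current module
    build.pl/dmake: build && deliver
    GNU make build system: make -sr
  • clears the module from the $OUTDIR (solver) and clears local build directories
    build.pl/dmake: deliver -undeliver && rm -rf $PLATFORM
    GNU make build system: make -sr clean
  • builds the module and all modules it depends on (no separate deliver step needed)
    build.pl/dmake: build --all && deliver
    GNU make build system: build --all
  • builds all
    build.pl/dmake: cd instsetoo_native && build --all
    GNU make build system: cd $SRC_ROOT && make -sr
  • clears all modules and all local build directories
    build.pl/dmake: cd instsetoo_native && build --prepare --from sal
    GNU make build system: cd $SRC_ROOT && make -sr clean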


Some things changed from the old build system. Here is an overview:

no local output tree

The GNU make build system does not use a "local module output directory". All modules use a $WORKDIR (by default a directory named "workdir" in the platform directory in the $OUTDIR/solver) for intermediate files. This makes the source tree read-only for the migrated modules.

cleaning up of modules

build --prepare will not clear the $WORKDIR of files from migrated modules. However, calling make -sr clean in the module or in $SRC_ROOT will.

current directory when starting make

Unlike with build.pl, one cannot call make in an arbitrary subdirectory to build the module. Either cd to the module root before calling make, or explicitly pass the makefile in the module root to the make command: cd sw/source/core && make -srf ../../Makefile.
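The second variant can be wrapped in a small helper that walks up from the current directory. This is a minimal sketch that assumes the nearest directory containing a file named Makefile is the module root. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run gbuild's make from anywhere inside a module.

# module_root: print the nearest ancestor of $PWD (or $PWD itself)
# that contains a file named "Makefile".
module_root() {
  local dir
  dir=$(pwd -P)
  while :; do
    if [ -f "$dir/Makefile" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    if [ "$dir" = / ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}

# gmake_module [ARGS...]: same as `make -srf ../../Makefile` run from
# sw/source/core, but without counting the ../ levels by hand.
gmake_module() {
  local root
  root=$(module_root) || { echo "error: no Makefile found above $PWD" >&2; return 1; }
  make -sr -f "$root/Makefile" "$@"
}
```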

changes in parallelization

The old build system used one dmake process per directory, while the new one is, for now, hooked into build.pl as one make process per module. Big modules like sw therefore only benefit from the parallelization given by the second -P switch of the build --all command. As more and more modules get migrated, the second -P switch in a build --all -P4 -- -P4 command will become more important. In the end, after getting rid of build.pl, only one make process will be used for the whole build. The maximum number of jobs will then be given via the -j switch. This eliminates the guesswork of how to split the parallelization between the two levels (build.pl and the per-module make).

 

precompiled debug headers

On Windows, precompiled headers are now also supported in debug builds. This results, for example, in a speedup of ~40% for a debug build of module sw.

 

no separation of build and deliver

The new build system does not separate the build and deliver steps of a module. Libraries are always linked against the solver/$OUTDIR. So in module framework, where the library fwk links against the library fwi, fwi will be copied to the solver/$OUTDIR before fwk is linked. This lifts the artificial dependency barriers introduced by modules, but also means that building a module always modifies the solver/$OUTDIR. It also avoids the confusion of building a module but forgetting to deliver it.

no local module builds

One cannot simply copy a module to "anywhere" and build it there. The build system will notice this and bail out. And even if it did not bail out, it would use the copied module for nothing but the makefiles. It would still look for the files to compile in the directories given in the variable $gb_REPOS, as described in the post about multiple repository support.

gbuild provides a setuplocal target as a workaround for the rare use case where one wants to build only one module with some quick or risky changes without changing the solver. For example, to do experimental stuff on the tools module one would:

export gb_LOCALBUILDDIR=/tmp/myoootoolsexperiment
cd tools
make setuplocal # this will create a copy of the tools module and the solver at $gb_LOCALBUILDDIR and tune the build system to that location
cd /tmp/myoootoolsexperiment/srcdir/tools
# hack away
cd $SRC_ROOT/tools
make removelocal # clears $gb_LOCALBUILDDIR and allows work directly on the source

This is an extension to the gbuild system (because it relies on rsync, which the gbuild core itself should not do) and thus can be found in the extensions directory of the build system.

full dependencies

Migrated modules always have full dependencies, so changing one header in a low-level module will trigger a rebuild of all objects using that header. On Windows, that means all headers except compiler headers and headers from the platform, DirectX and Java SDKs. On the other platforms, it means all headers.
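Full dependencies can be illustrated with the kind of dependency file a compiler emits (for example GCC with -MD). Such a file holds one rule listing the object and every header it read. The shell sketch below mimics make's timestamp check on such a file. In a real build, make itself performs this check. The sketch assumes a single rule per file and paths without spaces. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: does the target in a make-style dependency file need rebuilding?
# Expected file format (one rule, backslash line continuations allowed):
#   obj.o: src.cxx \
#     low.hxx

# needs_rebuild DEPFILE: exit 0 if the target is missing or older than
# any of its prerequisites, exit 1 if it is up to date.
needs_rebuild() {
  local rule target prereq
  # Join continuation lines: turn backslashes and newlines into spaces.
  rule=$(tr '\\\n' '  ' < "$1")
  target=${rule%%:*}
  [ -e "$target" ] || return 0
  for prereq in ${rule#*:}; do
    if [ "$prereq" -nt "$target" ]; then
      echo "$target is older than $prereq" >&2
      return 0
    fi
  done
  return 1
}
```

Every object that read a header has that header in its own dependency file. Touching the header therefore marks each of those objects as out of date.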

faster no-op builds

Checking that nothing (or almost nothing) needs to be rebuilt is faster. On a sample system (a notebook with a 2 GHz Core 2 Duo) running Windows XP with anti-virus software installed, rechecking that nothing needs to be done for module sw takes 7 sec with a warm cache. On the same machine, build.pl/dmake took 210 sec with the same "full" header dependencies, so the no-op check is about 30 times faster.

(This is a very raw mirror of the original blog post made to blogs.sun.com on 21 Dec 2010. As per http://web.archive.org/web/20090627144253/http://www.sun.com/termsofuse.jsp "... You grant Sun and all other users of the Website an irrevocable, worldwide, royalty-free, nonexclusive license to use, reproduce, modify, distribute, transmit, display, perform, adapt, resell and publish such Content (including in digital form) ..." )

old gbuild blog posts

Since blogs.sun.com/gullFOSS went offline a while ago, some of my blog posts about the new gbuild build system are not available anymore. I mirror those posts here now for that reason. They might be outdated (after all the world is moving quite fast at LibreOffice) and contain broken links, but still might help explain some of the insane design decisions made with gbuild.

Here they are. The posts themselves are untagged so as not to spam the planets with old content.

Profile

sweetshark
Bjoern Michaelsen