In-place conversion of FreeBSD systems from i386 to amd64

I used this process on 05 July 2015 to perform an in-place conversion of my "infrastructure" machines here at home (running FreeBSD stable/10) from i386 to amd64.

I do not claim that this is ideal, or that it will work for others. It is, however, what actually worked for me.

NOTE: I do almost everything described below within an invocation of script(1) -- and usually, that is done within an invocation of tmux(1).

Initial Background and Environment

Over the years, I have mostly performed source-centric updates to my systems, usually at relatively high frequencies (e.g., weekly for "production" machines; daily for experimental/development machines). Please see the page on "upgrading" for more details on that (if you're interested).

One drawback of the approach I have used historically is that, while updating FreeBSD base was fairly fast and easy, updating installed ports by building them in-place was extremely variable, and could take several hours (e.g., if firefox and libxul were both being built on the same day).

Further, with the change of architecture, I was going to want to replace all of the installed ports. The last time I had done that was for the migration from stable/9 to stable/10 -- and I was not eager to re-experience that.

One of the catalysts for this effort was the replacement of my former build machine (a loud, slow, space-constrained 1U rack-mount chassis) with a relatively quiet, fast mini-tower. I was able to make it faster still (by replacing its primary disk drive with an SSD), and then added three 2TB disk drives to remove "space constraints" from the list of issues I had.

I have been using UFS systems since about 1989; while it has presented challenges from time to time, UFS qualified as "the devil I know" (vs., say, ZFS). Further, my old build machine had but 2GB RAM and a total of about 80 GB usable disk space: There's no way I had incentive to try to run ZFS on that machine.

The new build machine, though, has 32GB RAM; the SSD helped make it ... quite snappy. And I was able to put the 3 2TB drives on the system (which weren't actually being used for anything yet), and set them up as a raidz zpool. And if there was a problem, scratching them and starting over would only take a bit of time, and perhaps a bit of my pride: nothing critical would be lost.

Preparation - Base FreeBSD

First, on the build machine: I booted to slice 1 (stable/10, i386). I then issued:

setenv TARGET_ARCH amd64
setenv TMPDIR /tmp
cd /usr/src
make -j16 buildworld
make -j16 buildkernel
This created /usr/obj/amd64.amd64.

Then, on a target machine: First, "clone" the running slice (2) to the target slice (1), to ensure that all configuration information is exactly as it should be:

umount /S1/usr
umount /S1
newfs -U /dev/ada0s1a && mount /dev/ada0s1a /S1 && dump 0Lf - /dev/ada0s2a | \
(cd /S1 && restore -rf - && rm restoresymtable) && sync && \
df -k /dev/ada0s2a /dev/ada0s1a
newfs -U /dev/ada0s1d && mount /dev/ada0s1d /S1/usr && dump 0Lf - /dev/ada0s2d | \
(cd /S1/usr && restore -rf - && rm restoresymtable) && sync && \
df -k /dev/ada0s2d /dev/ada0s1d
cd /S1/etc && rm fstab && ln -fs fstab.S1 fstab
cd /S1/usr && rm obj && ln -fs /common/S1/obj .
(Note that these operations may be done in parallel.)

Preparation - Ports

In order to avoid the hours-long "rebuild ports in-place" scenario (referenced above), I resolved to try using poudriere -- largely based on the success the FreeBSD.org "cluster admin" team has had with it. And while poudriere can be used with UFS, it was designed to take advantage of features of ZFS -- and I had a nearly perfect setup for experimenting with ZFS while presenting minimal risk to anything important -- as long as the build machine was booted from the stable/10 amd64 slice. So....

The first stop, then, was to the ZFS Quick Start Guide in the FreeBSD Handbook to set up my first zpool.

Based on that guidance, I decided I wanted to use a 3-spindle raidz; I had no reason to care about anything fancy for this exercise -- recall that the goal here is to migrate machines from i386 to amd64; learning a bit about ZFS is nice, but not my present primary focus. Accordingly, I issued zpool create tank raidz ada{1,2,3}; a subsequent "df" showed 3.5TB available there, which should be plenty of room for what needs to be done.

The next stop was to the Building Packages with Poudriere section of the FreeBSD Handbook. And at this point, I started encountering... issues.

As noted above, my general approach to updating FreeBSD systems has been pretty far to the "get the source and build it" end of the scale (vs. "install everything from pre-built packages"). Further, while the new build machine is probably fast enough that I could let it build a whole new world (for the poudriere jail(s)), that seemed silly and wasteful: I already have the machine doing that. But it wasn't apparent to me how to make use of that already-done work... so I posted a query to the FreeBSD Forum.

I won't repeat what's in the forum post & the replies, but the responses got me "on track" to making further progress.

Here are the significant things I changed from poudriere.conf.sample (for my poudriere.conf):

@@ -26,7 +26,7 @@
 #
 # Also note that every protocols supported by fetch(1) are supported here, even
 # file:///
-FREEBSD_HOST=_PROTO_://_CHANGE_THIS_
+FREEBSD_HOST=file:///repo/svn/freebsd/

 # By default the jails have no /etc/resolv.conf, you will need to set
 # RESOLV_CONF to a file on your hosts system that will be copied has
@@ -62,11 +62,11 @@
 # all - Run the entire build in memory, including builder jails.
 # yes - Only enables tmpfs(5) for wrkdir
 # EXAMPLE: USE_TMPFS="wrkdir data"
-USE_TMPFS=yes
+USE_TMPFS=wrkdir

 # How much memory to limit tmpfs size to for *each builder* in GiB
 # (default: none)
 #TMPFS_LIMIT=8
@@ -85,6 +85,7 @@
 # The full mirror list is available here:
 # http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/svn.html#svn-mirrors
 #SVN_HOST=svn0.us-west.FreeBSD.org
+SVN_HOST=/svn/freebsd

 # Automatic OPTION change detection
 # When bulk building packages, compare the options from kept packages to
@@ -215,15 +216,15 @@
 # Define the building jail hostname to be used when building the packages
 # Some port/packages hardcode the hostname of the host during build time
 # This is a necessary setup for reproducible builds.
-#BUILDER_HOSTNAME=pkg.FreeBSD.org
+BUILDER_HOSTNAME=freebeast.catwhisker.org

 # Define to get a predictable timestamp on the ports tree
 # This is a necessary setup for reproducible builds.
 #PRESERVE_TIMESTAMP=yes

As mentioned elsewhere, I have been using portmaster to update installed ports -- and in particular, I have used minor variations on the procedure at the end of the portmaster(8) man page to perform complete re-build/replacement of all installed ports on a machine.

I recalled that that procedure included a step to get a list of the installed ports on a machine, so I ran portmaster --list-origins > /tmp/ports.`hostname -s` on each of the production machines, on my build machine, and on my laptop, copied them over to the build machine, concatenated them, ran them through sort -u to remove duplicates, then edited what was left to remove ports that were obviously only build dependencies (e.g., autoconf*; automake*; most things in the devel/* category). I called the result of all this /usr/local/etc/poudriere.d/10amd64-home-pkglist.
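
The merge step just described can be sketched roughly as follows. This is a minimal sketch in POSIX sh, not necessarily the exact commands used: the function name merge_port_lists is hypothetical, the input files are assumed to hold one port origin (category/name) per line, and the filter pattern only illustrates the idea of dropping obvious build-only tools. The real pruning was a manual edit.

```shell
#!/bin/sh
# merge_port_lists FILE...: combine per-host origin lists into one list.
# Assumption: each FILE holds one port origin per line (e.g. shells/zsh).
merge_port_lists() {
    # sort -u removes origins that appear on more than one host;
    # grep -v -E drops origins matching an illustrative "build-only" pattern.
    cat "$@" | sort -u | grep -v -E '^devel/(autoconf|automake)'
}
```

With the per-host files copied to the build machine, something like merge_port_lists /tmp/ports.* > /usr/local/etc/poudriere.d/10amd64-home-pkglist would produce a starting point for hand editing.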

I had also (following the suggestion in the above-referenced Forum thread) tarred up /var/db/ports on the above machines, then un-tarred that mess into /usr/local/etc/poudriere.d/10amd64-home-options on the build machine. That spared me from going through the options selection for about a thousand ports -- and enabled me to keep the same options I had already been using.
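
The options transfer can be sketched along these lines. It is a minimal sketch under stated assumptions: the helper names pack_options and unpack_options are hypothetical, and when archives from several machines are unpacked into one directory, a later archive simply overwrites same-named option files from an earlier one.

```shell
#!/bin/sh
# pack_options SRC_DIR ARCHIVE: archive the contents of a saved-options
# directory (for example /var/db/ports) into ARCHIVE.
pack_options() {
    tar -cf "$2" -C "$1" .
}

# unpack_options ARCHIVE DEST_DIR: extract ARCHIVE into DEST_DIR, creating
# DEST_DIR if needed. Existing files with the same name are overwritten.
unpack_options() {
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

For example, pack_options /var/db/ports /tmp/options.tar on a production machine, followed (after copying the archive over) by unpack_options /tmp/options.tar /usr/local/etc/poudriere.d/10amd64-home-options on the build machine. The /tmp/options.tar path is only an illustration.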

So the next step is to create a jail for poudriere; I did that:

cd /usr/local/etc/poudriere.d
poudriere jail -m src=/usr/src -c -j 10amd64 -v stable/10 -a amd64

As noted above, I already had a ports tree. Now, one thing that isn't necessarily obvious from the material above is that, because of space constraints on the build machine, I had moved the ports tree to the ReadyNAS -- so the build machine and the production machines were each able to access the same copy of the ports tree (an SVN working copy) via NFS. Also, a week before I did this, I migrated from using amd(8) to autofs(8). But I still needed to tell poudriere where the ports tree was:

poudriere ports -c -F -M /usr/ports -p ports

Now to start the process:

poudriere bulk -j 10amd64 -p ports -z home -f 10amd64-home-pkglist

This ran for about 10 to 15 minutes, then started throwing errors, e.g.:

make: chdir /usr/ports/devel/gsettings-desktop-schemas: No such file or directory

which caused me to wonder "Huh??!?", open a new window on the build machine, and look:

freebeast(10.2-P)[10] ls -laT /usr/ports/devel/gsettings-desktop-schemas
total 192
drwxr-xr-x     2 cvsupin  cvsupin    4096 Nov 20 03:39:31 2014 .
drwxr-xr-x  5106 cvsupin  cvsupin  172032 Jul  5 03:54:49 2015 ..
-rw-r--r--     1 cvsupin  cvsupin    2015 Nov 20 03:39:31 2014 Makefile
-rw-r--r--     1 cvsupin  cvsupin     186 Nov 20 03:39:31 2014 distinfo
-rw-r--r--     1 cvsupin  cvsupin      86 Jan 23 04:34:08 2014 pkg-descr
-rw-r--r--     1 cvsupin  cvsupin    2873 Nov 20 03:39:31 2014 pkg-plist
freebeast(10.2-P)[11] 
And before I go further, I should point out that one of the folks on the FreeBSD.org cluster admin team suggested that I use ports-mgmt/poudriere-devel; so I ran portmaster -o ports-mgmt/poudriere-devel poudriere-3.1.7 to effect the switch... but that didn't seem to help: I still saw the "No such file or directory" message for ports directories that existed (though by virtue of NFS and autofs).

So around this point, I started wondering if there might be an issue with NFS or autofs -- there was an issue that had somewhat-similar symptoms just around the time 7.1 was released (January, 2009): amd(8) would count on the system's reference counter to prevent unmounting an NFS-mounted file system that was actually in use; unfortunately, that didn't always work, and unmount(2) would sometimes unmount an in-use file system. (A reasonably easy way to reproduce the problem was to perform an "rm -rf" on a large, deep NFS-mounted directory that was managed by amd(8). Ref. r189287.) (Note: I filed bug 201378 about this.)

So I created another ports SVN working copy -- this time, in the zpool (as /tank/ports). But I made its "distfiles" a symlink to the NFS-mounted "real" one. That done, I removed the old definition:

poudriere ports -d -p ports

and created a new one:

poudriere ports -c -p ports -M /tank/ports -F

And found that my ports working copy disappeared! But

poudriere ports -d -p ports

made it appear again ... but, of course, poudriere knew nothing about it.

So I renamed it, then told poudriere to play its sleight-of-hand game without me:

poudriere ports -c -p ports -M /tank/ports -F

figuring I'd just rename /tank/ports to something else, then rename the working directory to /tank/ports. Silly me:

mv: cannot rename a mount point

After that, I figured that I could move everything out from the working copy to the new /tank/ports ... and that finally(!) worked.

Building the ports turned out to be a bit anticlimactic after that:

poudriere bulk -j 10amd64 -p ports -z home -f 10amd64-home-pkglist

[00:00:00] ====>> Creating the reference jail... done
[00:00:00] ====>> Mounting system devices for 10amd64-ports-home
[00:00:00] ====>> Mounting ports/packages/distfiles
[00:00:00] ====>> Using packages from previously failed build
[00:00:00] ====>> Mounting packages from: /usr/local/poudriere/data/packages/10amd64-ports-home
[00:00:00] ====>> Copying /var/db/ports from: /local/amd64/local/etc/poudriere.d/10amd64-home-options
/etc/resolv.conf -> /usr/local/poudriere/data/.m/10amd64-ports-home/ref/etc/resolv.conf
[00:00:00] ====>> Starting jail 10amd64-ports-home
[00:00:00] ====>> Logs: /usr/local/poudriere/data/logs/bulk/10amd64-ports-home/2015-07-04_13h20m32s
[00:00:00] ====>> Loading MOVED
[00:00:01] ====>> Calculating ports order and dependencies
[00:00:09] ====>> Sanity checking the repository
[00:00:09] ====>> Checking packages for incremental rebuild needed
[00:00:11] ====>> Deleting stale symlinks
[00:00:11] ====>> Deleting empty directories
[00:00:11] ====>> Cleaning the build queue
[00:00:12] ====>> Recording filesystem state for prepkg... done
[00:00:13] ====>> Building 836 packages using 8 builders
[00:00:13] ====>> Starting/Cloning builders
[00:00:14] ====>> Hit CTRL+t at any time to see build progress and stats
[00:00:14] ====>> [01][00:00:00] Starting build of x11-fonts/mkfontscale
[00:00:14] ====>> [02][00:00:00] Starting build of graphics/libGL
...
[02:05:29] ====>> Failed ports: audio/openal-soft:package net-im/skype-devel:run-depends games/pinball:configure
[02:05:29] ====>> Skipped ports: multimedia/mencoder
[02:05:29] ====>> Ignored ports: astro/xephem multimedia/win32-codecs
[10amd64-ports-home] [2015-07-04_13h20m32s] [committing:] Queued: 836 Built: 830 Failed: 3 Skipped: 1 Ignored: 2 Tobuild: 0 Time: 02:05:28
[02:05:29] ====>> Logs: /usr/local/poudriere/data/logs/bulk/10amd64-ports-home/2015-07-04_13h20m32s
[02:05:29] ====>> Cleaning up
[02:05:29] ====>> Umounting file systems
root@freebeast:/usr/local/etc/poudriere.d # echo $?
4
root@freebeast:/usr/local/etc/poudriere.d # exit

Finally, ensure that the file system where the built packages reside is exported (by updating /etc/exports, then issuing service mountd restart).

Testing

For this purpose, I used a machine whose hardware was very similar to that of one of my production machines. I had created its disk image by restoring backup images from that production machine, then changed the hostname and IP address while it was still in single-user mode.

First, base FreeBSD: ensure that the build machine is booted from its slice 1 (stable/10 i386 -- but /usr/obj/amd64.amd64 has just recently been populated), then on the target machine, install the amd64 kernel and world from the build machine to the target machine's target slice:

setenv TARGET_ARCH amd64
mount -u -r /
mount -u -r /usr
umount /S1/usr
umount /S1
mount /dev/ada0s1a /S1
mount /dev/ada0s1d /S1/usr
mount -u -w /S1
mount -u -w /S1/usr
mv -f /S1/var /S1/var.dir
ln -fhs /var /S1/var
mount -o ro freebeast:/usr/src /usr/src
mount -o ro freebeast:/usr/obj /usr/obj
mount
cd /usr/src
make installkernel DESTDIR=/S1
mergemaster -U -u 0022 -p -D /S1
rm -fr /S1/usr/include.old
mv -f /S1/usr/include /S1/usr/include.old
rm -fr /S1/usr/share/man
make installworld DESTDIR=/S1
mergemaster -F -U -u 0022 -i -D /S1
make delete-old DESTDIR=/S1
That done, reboot the target machine from the just-written slice:
gpart set -a active -i 1 ada0
shutdown -r now
noting that some ports (such as tmux) may work, while others (such as sudo) will not.

If the target machine failed to come up in a usable way, you can just reboot from the previous slice, and nothing should be lost (save for time).

If, however, it did come up, and you are proceeding, this would be a good time to mount the / and /usr file systems read-write (if necessary) and recall the Building Packages with Poudriere section of the FreeBSD Handbook: near the bottom of the page is a section on "Configuring pkg Clients to Use a Poudriere Repository" -- that's what we use next.

Following the recommendation on that page, I created a couple of files:

pogo(10.2-P)[21] ls -lTR /usr/local/etc/pkg
total 2
drwxr-xr-x  2 root  wheel  512 Jul  4 17:25:18 2015 repos

/usr/local/etc/pkg/repos:
total 4
-rw-r--r--  1 root  wheel   26 Jul  4 17:18:59 2015 FreeBSD.conf
-rw-r--r--  1 root  wheel  116 Jul  4 17:25:18 2015 custom.conf
pogo(10.2-P)[22]
The FreeBSD.conf file is exactly as described in the Handbook:
pogo(10.2-P)[22] cat /usr/local/etc/pkg/repos/FreeBSD.conf
FreeBSD: {
    enabled: no
}
pogo(10.2-P)[23]
For this target machine, which is using autofs, we can let autofs handle the mount as needed:
pogo(10.2-P)[23] cat /usr/local/etc/pkg/repos/custom.conf
custom: {
    url: file:///net/freebeast/local/amd64/local/poudriere/data/packages/10amd64-ports-home
    enabled: yes,
}
pogo(10.2-P)[24]

Now that those files are set up, we install the desired ports. Remember the "installed-port-list" file we created in /var/tmp? Now is where we use it:

cd /var/tmp
pkg install `cat installed-port-list`
Then respond "y" to the "Proceed with this action? [y/N]:" prompt.

Then there's a bit of base FreeBSD clean-up:

mount -u -w / && \
mount -u -w /usr && \
mount -o ro {freebeast:,}/usr/src && \
mount -w freebeast:/usr/obj /usr/obj && \
cd /usr/src && \
make delete-old-libs && \
cp /var/run/dmesg.boot /var/tmp/dmesg.boot.`uname -r` && \
cd /usr && umount /usr/obj
cd /usr && umount /usr/src
cd / && mount -u -r /usr ; mount -u -r /

Then reboot (shutdown -r now)... and watch for any "magic smoke" leaks.

Actually Doing It

Just as the "Testing" section shows, except that one of my production machines doesn't use NFS except during an update, so it is not using either amd(8) or autofs(8). For that system, the /usr/local/etc/pkg/repos/custom.conf file looked like:
bats(10.2-P)[1] cat /usr/local/etc/pkg/repos/custom.conf
custom: {
    url: file:///mnt
    enabled: yes,
}
bats(10.2-P)[2]
and I manually issued mount freebeast:/local/amd64/local/poudriere/data/packages/10amd64-ports-home /mnt before performing the pkg install `cat installed-port-list`.
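
Since the two custom.conf variants differ only in the url value, generating the file could be sketched as below. This is a minimal sketch in POSIX sh; the helper name write_repo_conf is hypothetical, and the layout simply mirrors the files shown above rather than anything required by pkg.

```shell
#!/bin/sh
# write_repo_conf URL FILE: write a pkg repository definition named
# "custom" that points at URL, in the same layout as the files above.
write_repo_conf() {
    cat > "$2" <<EOF
custom: {
    url: $1
    enabled: yes,
}
EOF
}
```

On the machine without autofs, write_repo_conf file:///mnt /usr/local/etc/pkg/repos/custom.conf would reproduce the file shown for bats; on an autofs machine, the /net/freebeast/... URL would be passed instead.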

And one of those systems is the one running the Web server you are using to view this page.

Postscript: Subsequent Maintenance

The following weekend, I first tested my usual FreeBSD base upgrade procedure on the same test machine referenced above -- and initially, the make installkernel terminated with a Signal 11 (SIGSEGV) after only a few kernel modules had been installed. I was unable to identify a cause for this... and a re-try was successful (at which point I tested updating the ports on the machine -- and that also worked).

The next day, I performed the base/ports upgrade to my production machines -- successfully.

For base FreeBSD, the process is as described above under "Testing."

For ports, there are a couple of parts:


$Id: convert_i386_amd64.html,v 1.13 2018/11/20 17:49:10 david Exp $