Skip to main content

Bisecting Kernel Using PXE Boot

Recently I found that an Atheros AR5BXB64 wireless card in my Acer Aspire One AOA150 netbook stopped working properly in 13.04 with 3.8 kernel. The problem was that any file download over the wireless interface had a high chance of being corrupted. I was not able to download updates - every time something big (like a kernel upgrade) got downloaded, I got the file, but the checksum did not match. The card was able to inject copies of previous buffers into the stream in a way that still passed IP and TCP checksum verification.

This was exactly the same symptom we had when Active State Power Management (ASPM) L0s was enabled for that PCI-E device, when the hardware itself was not actually able to handle this state properly. It was fixed in 2009 but suddenly got broken in 3.8 again.

I filed a bug (LP:1137513) and, since I had a workaround, I just continued using the netbook with a setpci call in rc.local until I was asked whether any prior Raring kernels exhibited this behavior.

I booted 12.10 from a Flash drive and found that the card was working properly there. Then I checked 3.7 kernels and found that the issue did not exist there and appeared only in 3.8.0-rc1, which suggested there was indeed a regression in the kernel. And the difference was that 3.7 and earlier kernels had LinkCtl: ASPM Disabled while 3.8+ had LinkCtl: ASPM L0s L1 Enabled in the lspci -vv output.

Since I was not able to find any relevant commits that looked like they were deliberately doing something to ASPM handling, the only way to proceed was to run a binary search. Since it was possible to check whether the bug was there or not just by checking lspci, it could and should have been automated.

Bisect works by splitting the commits between known good and known bad revision in half with every iteration and testing the commit in between. Rinse and repeat until you hit a commit that made the switch from good to bad and that's where the things broke.

I had to test 5840 commits, with roughly 13 steps (213=8192, 212=4096). 13 manual reboots are 12 reboots too many so I went on to set up an environment that would do everything for me. Additionally I did not want to build the kernels on that netbook itself as it is pretty slow. The build host and the netbook should have been separated if I wanted to finish the task in a reasonable amount of time.

So, here's the plan:

  • Set up a PXE environment.

  • Find the minimal kernel config that exhibits the issue.

  • Create a custom initrd containing all the tools needed to detect whether an issue is there.

  • Create a script that can be reliably used by git bisect to make it all run automatically.

PXE

PXE is a Pre-boot Execution Environment which allows network boot without any local disk installation. It works by starting a DHCP client early in the boot process, obtaining the address of the TFTP server that hosts the boot images, downloading the image and passing the control over. TFTP is a Trivial File Transfer Protocol, a really simple connection-less and authentication-less protocol that's running over UDP.

In my case I already had a DHCP server which was running on my OpenWRT-based router. I set up a virtual machine and had my DHCP server direct the queries for the boot images to that virtual machine. This was added to my /etc/dnsmasq.conf on the router:

# PXE boot
dhcp-boot=pxelinux.0,raring-server-i386.lappyfamily.net,192.168.100.102

Then I switched to the "raring-server-i386" and installed atftpd.

Ubuntu 12.04 Note

The version in 12.04 suffers from a bug that prevents rlinetd from being set up properly. If you happen to hit the bug, you can create the configuration manually by executing the following:

. /etc/default/atftpd
sudo update-inetd --group BOOT --add \
"tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd $OPTIONS"
unset OPTIONS

In order to run Linux with PXE you will need to use PXELINUX. The required image, pxelinux.0 can be obtained after installing syslinux package from /usr/lib/syslinux/pxelinux.0. By default atftpd uses /srv/tftp as a root for the shared filesystem, so pxelinux.0 should go to /srv/tftp and be readable by nobody user.

Then I set up the netbook to boot from PXE. This will vary between different BIOS setup interfaces, but it boils down to enabling integrated LAN adapter, enabling boot ROM on that adapter and setting the network device as the first one to boot from. A wired connection to an IPv4 network is needed.

With pxelinux.0 in place I booted the netbook:

Intel UNDI, PXE-2.1 (build 082)
Copyright (C) 1997-2000  Intel Corporation

This Product is covered by one or more of the following patents:
US5,307,459, US5,434,872, US5,732,094, US6,570,804, US6,115,776 and
US6,327,625

Realtek PCI Express Fast Ethernet Controller Series v1.15 (090224)

CLIENT MAC ADDR: 00 1E 68 E2 10 AE  GUID: 40709CDA-A8E0-D411-8C6C-001E68E210AE
CLIENT IP: 192.168.1.14  MASK: 255.255.255.0  DHCP IP: 192.168.1.1
GATEWAY IP: 192.168.1.1



!PXE entry point found (we hope) at 9D17:0109 via plan A
UNDI code segment at 9D17 len 19CA
UNDI data segment at 930A len A0D0
Getting cached packet  01 02 03
My IP address seems to be C0A8010E 192.168.1.14
ip=192.168.1.14:192.168.100.102:192.168.1.1:255.255.255.0
BOOTIF=01-00-1e-68-e2-10-ae
SYSUUID=40709cda-a8e0-d411-8c6c-001e68e210ae
TFTP prefix:
Unable to locate configuration file

Boot failed: press a key to retry, or wait for reset...

The line starting with "My IP address" is our pxelinux.0 working. Now we need to put configuration and kernel in place.

You can just grab the kernel and initrd from existing installation and put them in /srv/tftp. In my case I put kernel to /srv/tftp/vmlinuz and initrd to /srv/tftp/initrd.img. If you check /var/log/syslog on the TFTP server, you will see that pxelinux wanted to get the configuration from the following locations:

pxelinux.cfg/40709cda-a8e0-d411-8c6c-001e68e210ae
pxelinux.cfg/01-00-1e-68-e2-10-ae
pxelinux.cfg/C0A8010E
pxelinux.cfg/C0A8010
pxelinux.cfg/C0A801
pxelinux.cfg/C0A80
pxelinux.cfg/C0A8
pxelinux.cfg/C0A
pxelinux.cfg/C0
pxelinux.cfg/C
pxelinux.cfg/default

If you are wondering, C0A8010E is 192.168.1.14 in hexadecimal notation.

So we need to put configuration at some location it is querying. I decided to use the MAC address, 01:00:1e:68:e2:10:ae and put the following configuration to /srv/tftp/pxelinux.cfg/01-00-1e-68-e2-10-ae (yes, the folder is named pxelinux.cfg):

DEFAULT linux
LABEL linux
KERNEL vmlinuz
APPEND initrd=initrd.img panic=10

You can reboot your device now and you will see that the kernel is loaded, initrd.img is loaded too but the boot breaks since the root device is undefined. While you can put the root information from your grub configuration, we don't actually need to do that. As we are going to build our own kernel with minimal number of features, we will run everything from the intitial RAM drive, initrd. I put the panic=10 option so that the kernel reboots after 10 seconds in case it fails so that I would not need to shut the netbook down manually.

Kernel

Well, that was easy. I grabbed the git repository of Linus, git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, checked out v3.8-rc1 and ran make menuconfig

/galleries/dropbox/make-menuconfig.png

Since I was interested in ath5k module only I disabled loadable module support and made ath5k a built-in module. I left quite a few features I did not know/care about enabled which I believe has somehow increased the build time, but in the end it took 3 minutes to build a kernel on my VM with 4 concurrent builds (make -j 4, the VM had 4 CPUs allocated)

initrd

When kernel boots with initrd, it mounts that image as a root device and runs init as PID 1 trying /sbin/init, /etc/init, /bin/init and /bin/sh. In case /bin/sh is handled by busybox and it is running as PID 1, it attempts to run /init and /sbin/init. So I needed to create a new initramfs that had everything inside the /init script itself:

  • Setting up /dev, /proc, /sys and /tmp in case I need that for debugging.

  • Configure the network interface.

  • Execute the required test.

  • Wait for a reboot command.

Here's what I came up with, the first part is stolen from /init that ships with Ubuntu:

#!/bin/sh

[ -d /dev ] || mkdir -m 0755 /dev
[ -d /root ] || mkdir -m 0700 /root
[ -d /sys ] || mkdir /sys
[ -d /proc ] || mkdir /proc
[ -d /tmp ] || mkdir /tmp
mkdir -p /var/lock
mount -t sysfs -o nodev,noexec,nosuid sysfs /sys
mount -t proc -o nodev,noexec,nosuid proc /proc
# Some things don't work properly without /etc/mtab.
ln -sf /proc/mounts /etc/mtab

ifconfig eth0 192.168.1.14
route add default gw 192.168.1.1

sleep 5

lspci -vv -d 168c:001c | nc 192.168.100.102 5555

nc -l -p 5555 && reboot -f

So this init script configures the wired interface, sends the output of lspci to the host running the bisect script and listens to an incoming TCP connection on port 5555 so that it would reboot the host when connection is terminated. There is no need for anything fancier.

The initramfs contained the minimal set of applications and libraries, but you will want to grab busybox from a working system as it has a reboot applet which is missing from the version shipped with initrd by default on Ubuntu systems. I needed to do a bit of tweaking before I got everything working as I wanted so I had /bin/sh (symlink to /bin/busybox) originally in place of all these lspci calls. Also I used nc from netcat-traditional since it had less dependencies.

ubuntu@raring-server-i386:~/minroot$ find
./init
./bin
./bin/sh
./bin/lspci
./bin/nc
./bin/bash
./bin/busybox
./lib
./lib/ld-linux.so.2
./lib/libresolv.so.2
./lib/libc.so.6
./lib/libz.so.1
./lib/libpci.so.3
./dev
./dev/ram0
./dev/console
./dev/null
ubuntu@raring-server-i386:~/minroot$ find | cpio --format=newc -o | gzip > /srv/tftp/initrd.img
7806 blocks

Bisect script

git-bisect run expects the script to return 0 when this build is considered a success or 1 when the build is considered a failure, so in our case a successful build will have "ASPM Disabled" in the output. grep returns 0 when a string is found, 1 when not found and 2 when error, so we can just use that.

I called the script 'bisect-helper' and put it in my home directory on the VM.

#!/bin/bash

cp ~/kernel.config ~/linux/.config
cd ~/linux
make olddefconfig
make clean
make -j 4
cp arch/i386/boot/bzImage /srv/tftp/vmlinuz

# reboot netbook
echo "reboot" | nc 192.168.1.14 5555

# when it goes up, it will send us a message
nc -l 5555 | grep 'LnkCtl:\s*ASPM Disabled'
exit $?

The script needs to be tested manually first to check whether it is actually producing the results expected:

ubuntu@raring-server-i386:~/linux$ git checkout v3.7
Checking out files: 100% (11706/11706), done.
Previous HEAD position was 19f949f... Linux 3.8
HEAD is now at 2959440... Linux 3.7
ubuntu@raring-server-i386:~/linux$ ~/bisect-helper; echo $?
...
Kernel: arch/x86/boot/bzImage is ready  (#40)
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
0
ubuntu@raring-server-i386:~/linux$ git checkout v3.8
Checking out files: 100% (11706/11706), done.
Previous HEAD position was 2959440... Linux 3.7
HEAD is now at 19f949f... Linux 3.8
ubuntu@raring-server-i386:~/linux$ ~/bisect-helper; echo $?
...
Kernel: arch/x86/boot/bzImage is ready  (#41)
1

Yep, looks like it is working. Now let the madness begin!

ubuntu@raring-server-i386:~/linux$ git bisect start v3.8-rc1 v3.7
Bisecting: 5840 revisions left to test after this (roughly 13 steps)
[6be35c700f742e911ecedd07fcc43d4439922334] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
ubuntu@raring-server-i386:~/linux$ time git bisect run ../bisect-helper
running ../bisect-helper
...
(an hour later)
...
8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 is the first bad commit
commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
Date:   Tue Oct 30 15:27:13 2012 +0900

    PCI/ACPI: Request _OSC control before scanning PCI root bus

    This patch moves up the code block to request _OSC control in order to
    separate ACPI work and PCI work in acpi_pci_root_add().

    Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

:040000 040000 ac58db59f5eaf541d5cb36197aaaf5dfe319ea37 d25e645d59da508e3a1bcb01509cbfb5c7a8239b M  drivers
bisect run success

real    63m39.674s
user    155m30.732s
sys 27m33.772s

It took ~64 minutes to run 13 kernel builds and the offending commit was discovered. I filed a bug, it got picked up by Yinghai Lu, who found that this commit triggered an issue in another function that differentiated between ASPM handling for non-hotplug and hotplug cases. This was affecting iwlwifi as well as other modules which were using pci_disable_link_state call, so it was a real issue.

Yinghai Lu has posted the patch to the kernel-pci mailing list and it should be merged at some point.

Summary

I've always wanted to do git bisect on something, but I had no idea what to start with. Bisecting is not specific to git, there's a Bazaar plugin as well and it is fairly easy to use this technique with any other VCS.

Minimizing the kernel build time is important. Originally I spent nearly 2 hours fiddling with the kernel configuration until I removed most of the things that were completely irrelevant to the issue being debugged. Having a powerful machine to build the software is a must in case the actual test needs to be performed on a slower machine.

Booting linux with PXE involves quite a few moving parts (DHCP, TFTP, boot image, kernel and initrd) that may take a while to configure, but all the time invested in this configuration will eventually pay off.