ZFS Root Boot – Oops! GRUB >

Solaris ZFS root boot mirror pools are usually pretty easy to clone and move to different boxes. There is quite a bit of information on the web about this already, so I'm not going to get into the nitty-gritty. I might add some better examples with complete shell output later, but I'm not near a broken system at the moment. In a nutshell, if you've built your O/S on a pair of disks using the ZFS boot option you can later move one of the mirrored disks to another box and clone away. (You're thinking, 'why don't you just use Jumpstart?' Yeah, me too.)

After the initial install you need to make sure you can boot from both drives. If drive 0 goes away you kinda hope that drive 1 will be bootable. By default the second drive (drive 1) – which is a mirror of the first (drive 0) – is not bootable. In the x86 world you need to run installgrub to make driveN bootable.

Example:


[root@someserver:~]# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t0d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 272 sectors starting at 50 (abs 16115)
stage1 written to master boot sector
[root@someserver:~]# bootadm update-archive
updating /platform/i86pc/boot_archive
[root@someserver:~]#

You may also want to run bootadm update-archive at this point. On recently installed systems, or systems where disks have changed, you'll notice that the boot archive gets updated just prior to the actual reboot.

Once you've grabbed a mirrored disk and plopped it into a similar bit of kit you should be able to boot from it. Then you can add a second disk to the system, fdisk it, give it the same partition table as the disk you borrowed from the other system, replace the device in the pool, run installgrub on the second disk in the second box, and so on (roughly the sketch below). I'll have to write something else about that simple process later.
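The whole dance on the second box looks something like the following. All of the device names here are made-up placeholders, so check your own format/cfgadm output before typing anything:

fdisk -B /dev/rdsk/c3t1d0p0                                     # lay down a default Solaris fdisk partition on the new disk
prtvtoc /dev/rdsk/c3t0d0s2 | fmthard -s - /dev/rdsk/c3t1d0s2    # copy the slice table from the borrowed disk
zpool replace rpool c2t1d0s0 c3t1d0s0                           # missing mirror half first, then the new disk
zpool status rpool                                              # wait for the resilver to finish
installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t1d0s0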

So — you've done all the requisite stuff, replaced the borrowed disk, given everyone some boot blocks, everything is good to go and it's time to turn over the cloned box to whomever will be using it. Based on $foo, you decide to reboot the thing one more time, just because.

Oh Carp! It boots to a SPLAT> prompt, errr.. I mean, GRUB> prompt.

Now, one of the common solutions involves some boot media (read: CD or DVD), importing the pool, mounting things on /a, and updating the boot archive on /a. In some cases that also means a trip to the datacenter.

Since I like to break as many things as possible I've seen the Grand Unified Boot Loader prompt 'a few' times. Depending on the hardware (read: boot time), sometimes you're better off just booting from the CD and fixing things. On the other hand, if you know the basic details about where things are supposed to be you can use GRUB as designed. (Okay, you use this method because you really don't want to drive to the datacenter in the afternoon. If it breaks in the morning, you might opt for the CD method. It's technical, don't ask.)

I don't have the proper screenshots for this tonight, but these are some bits of data that I recently used on a borked Sun X4200M2. Everything was going fine: I had installed the boot blocks with installgrub and it returned the correct output, but the box still rebooted to a GRUB prompt, so I used the data below to boot the box and reinstall the boot loader.

From the GRUB> prompt I issued the following statements, in order; the kernel loaded and I was in like Flynn! It's probably a good time to mention that this was done over a serial console that allows remote access to the system console. Also, without some form of LOM (Lights Out Management) or remote power control, you'll be headed to the datacenter for the reboots.


findroot (pool_rpool,0,a)
kernel$ /platform/i86pc/multiboot -B $ZFS-BOOTFS
module /platform/i86pc/boot_archive
boot

The snippet above is taken directly from the GRUB config; as each line is entered, GRUB prints some details while it locates the item in question, and then the system boots. In the example, the system was built with the default name of rpool for the root pool, which is why the findroot device is (pool_rpool,0,a).

This is not going to work in every situation, so it's a good idea to keep a copy of the menu.lst that bootadm generates for your systems. Example:


-bash-3.00$ pwd
/rpool/boot/grub
-bash-3.00$ tail -10 menu.lst | head -3
findroot (pool_rpool,0,a)
kernel$ /platform/i86pc/multiboot -B $ZFS-BOOTFS
module /platform/i86pc/boot_archive
-bash-3.00$

Even when you boot from the CD and import the root pool you may run into issues. For example, booting to a shell from the install media usually tries to find an installed OS, but with ZFS root pools you might not get things automagically mounted on /a. Better yet, /a may then be a read-only filesystem, so importing the pool and fixing the boot archive can be fun. (Think /tmp/foo, it's writable.)
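For what it's worth, the rescue path from the install media looks roughly like the sketch below. The altroot follows the /tmp/foo idea above, and the boot environment name is just an example; use whatever zfs list shows on your system:

mkdir /tmp/foo
zpool import -f -R /tmp/foo rpool          # import the root pool under a writable altroot
zfs list -r rpool                          # find the root dataset, e.g. rpool/ROOT/s10x_u6wos_07b
zfs mount rpool/ROOT/s10x_u6wos_07b        # root BEs are canmount=noauto, so mount it by hand
bootadm update-archive -R /tmp/foo         # rebuild the boot archive under the altroot
zpool export rpool                         # export the pool before rebooting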

Too many words about a simple fix :) Thanks for reading.

Been Busy….

It's been a while. Like many of you I've been pretty busy for quite some time now. While I don't have anything spectacular to report today, I do plan on dumping some new information pretty soon.

I thought I'd leave the tidbit below. It looks like a bunch of innocent 404s, but it's not. Sure, it's possible that these few hosts from diverse locations just happened to all visit my tiny website at the same time. It's also possible that they all use a special version of the Opera browser on a Windows platform. It's also possible that they all looked for a specific URL. Ha!

Too bad so many home users have their cable modems/firewalls/routers and PCs wide open or just poorly patched. It's ugly out there :)

68.48.145.24 - - [19/Sep/2010:11:50:47 -0700] "GET http://tm.fidosoft.org/index.php HTTP/1.0" "http://tm.fidosoft.org/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
71.235.200.89 - - [19/Sep/2010:11:51:14 -0700] "GET http://tm.fidosoft.org/tm.fidosoft.org/ HTTP/1.0" "http://tm.fidosoft.org/tm.fidosoft.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
216.15.107.121 - - [19/Sep/2010:11:51:17 -0700] "GET /tm.fidosoft.org/ HTTP/1.0" "http://tm.fidosoft.org/tm.fidosoft.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
76.123.104.140 - - [19/Sep/2010:11:51:24 -0700] "GET http://tm.fidosoft.org/tm.fidosoft.org/ HTTP/1.0"  "http://tm.fidosoft.org/tm.fidosoft.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
74.80.40.243 - - [19/Sep/2010:11:52:30 -0700] "GET http://tm.fidosoft.org/tm.fidosoft.org/ HTTP/1.0"  "http://tm.fidosoft.org/tm.fidosoft.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
71.75.42.8 - - [19/Sep/2010:11:52:54 -0700] "GET http://tm.fidosoft.org/tm.fidosoft.org/ HTTP/1.0"  "http://tm.fidosoft.org/tm.fidosoft.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"
71.225.200.93 - - [19/Sep/2010:11:53:18 -0700] "GET http://tm.fidosoft.org/ HTTP/1.0"  "http://tm.fidosoft.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"

Web garbage

Some ISPs / hosting firms just don't get it. Either that or they really don't care. You would think that some months after the whole roundcube sploit there would be some outbound filtering or monitoring for signatures like the following. Apparently not. Note the first hit, a request for the "file" HTTP/1.1. Nice.

209.250.238.240 - - [26/May/2009:20:03:25 -0700] "GET HTTP/1.1 HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:25 -0700] "GET /roundcubemail-0.1/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:26 -0700] "GET /roundcubemail/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:26 -0700] "GET /roundcubemail-0.2/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:26 -0700] "GET /roundcube-0.1/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:26 -0700] "GET /webmail/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:26 -0700] "GET /mail/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:27 -0700] "GET /bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:27 -0700] "GET /roundcube/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"
209.250.238.240 - - [26/May/2009:20:03:27 -0700] "GET /rc/bin/msgimport HTTP/1.1" ? 224 "-" "Toata dragostea mea pentru diavola"

If you would like to filter this kind of traffic but don't have access to a hardware-based IDS/IPS, you should take a look at mod_security.

ZFS send and ZFS receive with mbuffer examples on X4500

After settling on a disk layout for one of our Sun X4500 Thumper boxes I had the chance to do some limited testing of transfer speeds over the wire and to /dev/null. Once one of our other Thumpers is freed up I’ll be able to perform more complete testing.

I've found mbuffer to be a very useful tool for transporting ZFS send/receive datasets within a secured private network. Depending on the data classification, source, and destination, you might be limited (by your business environment) to using SSH as the transport, since mbuffer is not designed to be super secure. You can read more about mbuffer on the website of its author, Thomas Maier-Komor.
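If you are stuck with SSH, the classic pipe still works; just expect lower throughput than mbuffer. A minimal sketch (the hostname is made up, the dataset and snapshot names match the examples further down):

zfs send storage/foo@052209 | ssh otherthumper zfs recv storage/foo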

Using one Thumper with a poorly optimized setup/disk layout I was able to send about 230GB of data via mbuffer, over the wire to another Thumper (with a somewhat nicer disk layout). Measured externally via SNMP, this transfer averaged about 45MB/s over gigabit interfaces. While this was sufficient, I expect higher rates when I get the chance to rebuild the poorly built Thumper with a proper pool layout.

The limiting factor for that transfer was not the network; disk layout plays a role here, as the examples below show.

Example 1 – Thumper with POS disk layout:

Sending a 233GB snapshot to the trash (/dev/null) yields about 38MB/s off the disk.


[root@thumper4:~]# time zfs send foo@052209 | mbuffer -v3 -s512k -m1G > /dev/null
in @ 9036 kB/s, out @ 9036 kB/s,  233 GB total, buffer   0% full
summary:  233 GByte in 1 h 43 min 38.5 MB/s
real    103m20.665s
user    0m10.837s
sys     11m2.920s
[root@thumper4:~]# zpool status foo
  pool: foo
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        foo    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
errors: No known data errors

Notes: How I squeezed out an additional 10MB/s over the wire is unknown at this point. I have an idea, but until I have the chance to re-run that job without the -q flag I won't speculate.
There are multiple smaller pools on the POS system. It was built prior to the release that allowed for ZFS root/boot and was initially a test box that somehow got moved into production in its sorry state! (That ever happen to you?)

Example 2 – Thumper with better disk layout:

Sending the same 233GB snapshot to the trash (/dev/null) yields about 115MB/s off the disk.


[root@thumper2:~]# time zfs send storage/foo@052209 | mbuffer -v3 -s512k -m1G > /dev/null
in @ 77.5 MB/s, out @ 77.5 MB/s,  233 GB total, buffer   0% full
summary:  233 GByte in 34 min 43.2 sec - average of  115 MB/s
real    34m43.871s
user    0m9.913s
sys     14m14.391s
[root@thumper2:~]# zpool status storage
  pool: storage
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c8t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c8t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
        spares
          c0t1d0    AVAIL
          c1t2d0    AVAIL
          c8t3d0    AVAIL
          c6t5d0    AVAIL
          c7t6d0    AVAIL
          c5t7d0    AVAIL
errors: No known data errors

Additional info: here is how the data can be sent over the wire.

Run this on the host where you want to receive the snapshot. 192.168.1.1 is the IP address of the sending host.


mbuffer -I 192.168.1.1:10000 -q -s128k -m1G -P10 | zfs recv  storage/foo

Run this on the host with the snapshot you want to send, after you start the listener on the recipient host. 192.168.1.2 is the IP of the receiving host.


zfs send  foo@052209 | mbuffer -q -s128k -m1G -O 192.168.1.2:10000

Note that the -q, -m1G and -P10 options should probably be removed until you have a better understanding of how mbuffer will work in your environment; stripped-down versions are shown below.
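For reference, here are the same two commands with those tuning flags removed (same addresses, dataset, and snapshot as the examples above):

mbuffer -I 192.168.1.1:10000 -s128k | zfs recv storage/foo
zfs send foo@052209 | mbuffer -s128k -O 192.168.1.2:10000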

More testing will be done when equipment becomes available and time permits. I’ll try to post some simple performance comparisons using raidz vs raidz2.
