Solaris 10 zones and ZFS

06/19/2009: Minor update at bottom of page

Intro:

A couple of my favorite Solaris 10 features provide the ability to create virtual machines with ease and run them on a very efficient, full-featured file system.

You can learn more about the Zettabyte File System (ZFS) in Sun’s documentation. In summary, if you are not already using ZFS then it’s time to read up and start.

Zones are virtual machines running within a Solaris system. If you don’t know what a virtual machine is, just picture a single physical server running multiple independent and isolated copies of the operating system. Solaris 10 zones (or containers) allow you to run a number of virtual machines on one physical machine. Although there are plenty of benefits to this approach, the example below mainly focuses on the ease of deployment.

In the example there is a one-to-one relationship between physical machine and zone. The same process could be used to create multiple zones on one physical machine if there were a need and the hardware resources were sufficient.

If you stumble upon this page keep in mind that with newer releases of software some of the options and methods may have changed. For obvious reasons I had to make some changes to hostnames, etc. If you find typos please let me know.

I don’t take any credit for inventing any of this. This info is available in many places on the web already. I’ve left a copy here mainly so I can refer back to when needed.

Background:

This is a simple process used to clone a bunch of web slingers. It assumes that each physical machine has the same base O/S, packages and patch level. Trying to use this process on systems with different packages or patch levels guarantees a headache. There are plenty of ways to install the base OS. While outside the scope of this blurb, I’m fond of using Flash Archives (flar). Since many of the systems I work with are highly customized (and, ugh.. constantly changing) having a handful of these systems available as flars is helpful.
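For reference, capturing a golden system as a flash archive is basically a one-liner. A sketch (the archive name and output directory are made up here):

```shell
# Capture the running system into a compressed flash archive.
# -n sets the archive's internal name, -c compresses it, and
# -x excludes the output directory so the archive doesn't eat itself.
flarcreate -n golden-web -c -x /flars /flars/golden-web.flar
```

The resulting flar can then be used as the install source for a JumpStart or interactive install on the other boxes.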

The environment in the example is basic: each server has one child zone installed, with its own instance of Apache/PHP/MySQL installed and running from a mounted NFS volume. (Note: I think that web farms with multiple boxes in front of some load balancing hardware provide more redundancy than getting one monster server and dropping multiple zones on it. Just my opinion though.)

Process:

Part One – Create some disk pools and ZFS space to run the virtual machines in.

1. What disks do we have? (I already knew disk c3t2d0 was in use; check df -h first if you are not sure.)

[root@someserver:/]# format

Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c3t2d0
/pci@7b,0/pci1022,7458@11/pci1000,3060@2/sd@2,0
1. c3t3d0
/pci@7b,0/pci1022,7458@11/pci1000,3060@2/sd@3,0
Specify disk (enter its number): ^C

We don’t really want to do anything here, so a Ctrl-C is issued to exit without screwing things up horribly.

2. Create a pool called zones from physical disk c3t3d0. Notice that there is no mucking with mkfs or editing vfstab; the device is mounted and available instantly. Also note that if we had multiple slices or disks we would use a similar command to create a mirror or raidz of said slices and/or disks (example: zpool create zones mirror c0t0d0s5 c0t1d0s5). The name of the pool can be foo or fred or whatever. I picked the name ‘zones’.


[root@someserver:/]# zpool create zones c3t3d0
[root@someserver:/]# df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c3t2d0s0       50G   3.9G    45G     8%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                    14G   744K    14G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/dev/dsk/c3t2d0s3       38G   2.1G    36G     6%    /usr
/usr/lib/libc/libc_hwcap2.so.1
38G   2.1G    36G     6%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
/dev/dsk/c3t2d0s4       38G   174M    38G     1%    /var
swap                    14G    32K    14G     1%    /tmp
swap                    14G    16K    14G     1%    /var/run
zones        134G    24K   134G     1%    /zones

3. Take a look at what we have and look at the status.


[root@someserver:/]# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
zones         136G     88K    136G     0%  ONLINE     -

[root@someserver:/]# zpool status
pool: zones
state: ONLINE
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
zones       ONLINE       0     0     0
  c3t3d0    ONLINE       0     0     0
errors: No known data errors

4. Here we create a ZFS file system called zones/webserver1 and pre-allocate 50GB for use in webserver1.


[root@someserver:/]# zfs create zones/webserver1
[root@someserver:/]# zfs set quota=50g zones/webserver1
[root@someserver:/]# zfs set reservation=50g zones/webserver1

[root@someserver:/]# df -kh
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c3t2d0s0       50G   3.9G    45G     8%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                    14G   744K    14G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/dev/dsk/c3t2d0s3       38G   2.1G    36G     6%    /usr
/usr/lib/libc/libc_hwcap2.so.1
38G   2.1G    36G     6%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
/dev/dsk/c3t2d0s4       38G   174M    38G     1%    /var
swap                    14G    32K    14G     1%    /tmp
swap                    14G    16K    14G     1%    /var/run
zones        134G    25K    84G     1%    /zones
zones/webserver1
50G    24K    50G     1%    /zones/webserver1

[root@someserver:/]# zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
zones        50.0G  83.9G  25.5K  /zones
zones/webserver1  24.5K  50.0G  24.5K  /zones/webserver1

Okay, in four easy steps we set up some ZFS space. Why would we do this? Well, for one thing because we can, and more importantly because we can do things like exporting the pool to another box, making a snapshot, or (not so cool) deleting the whole thing with one simple command.
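A few sketches of what I mean (hostnames and snapshot names are made up; the commands themselves are standard zfs/zpool usage):

```shell
# Take a point-in-time snapshot of the file system (cheap and instant).
zfs snapshot zones/webserver1@before-patching

# Ship that snapshot to another box running ZFS.
zfs send zones/webserver1@before-patching | ssh otherhost zfs recv tank/webserver1-copy

# Move the whole pool to another machine (after physically moving the disks).
zpool export zones    # on this host
zpool import zones    # on the new host

# And the not-so-cool one: nuke the file system, no questions asked.
zfs destroy -r zones/webserver1
```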

Part Two – Create a zone to run application $foo in with manageable resources.

Think of a zone as a virtual machine or a chrooted environment.

We want a virtual machine to run a web application but we only want to allow nn RAM and nn CPU for this new machine. This way, when we run our forkbomb – or our standard Poorly Optimized Software – the damage is limited to the virtual machine’s isolated environment. In theory this should work well; in practice I’ve seen a virtual machine take quite some time to reboot after the hosing I gave it. The actual server, called the global zone, was running fine though, which is what we want. Still, the slow response to kill and restart the virtual machine was/is a concern.

1. Here we discover that capped-cpu did not make it into our release of Solaris (10 8/07). Argh! This was the latest release at the time this was written.


[root@someserver:~/bin]# zonecfg -z webserver1
webserver1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:webserver1> create
zonecfg:webserver1> set zonepath=/zones/webserver1
zonecfg:webserver1> set autoboot=true
zonecfg:webserver1> set scheduling-class=FSS
zonecfg:webserver1> add capped-cpu
usage:
	add <resource-type>
		(global scope)
	add <property-name> <property-value>
		(resource scope)
zonecfg:webserver1>

2. Since we cannot use capped-cpu, which would give us more granularity, we will have to stick with dedicated-cpu. There are other methods that can be used, but I prefer the simple approach. (You could use pooladm and create processor sets, which I may do again depending on how well ‘dedicated-cpu’ works.)
Here we give the zone a single CPU and 4 GB of RAM.


...
zonecfg:webserver1> add dedicated-cpu
zonecfg:webserver1:dedicated-cpu> set ncpus=1
zonecfg:webserver1:dedicated-cpu> end
zonecfg:webserver1> add capped-memory
zonecfg:webserver1:capped-memory> set physical=4G
zonecfg:webserver1:capped-memory> end
...
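For completeness, the pooladm/processor-set alternative mentioned above looks roughly like this (a sketch; the pset and pool names are made up):

```shell
# Enable the resource pools facility and save the current config to /etc/pooladm.conf.
pooladm -e
pooladm -s

# Create a one-CPU processor set and an FSS-scheduled pool bound to it.
poolcfg -c 'create pset web-pset (uint pset.min = 1; uint pset.max = 1)' /etc/pooladm.conf
poolcfg -c 'create pool web-pool (string pool.scheduler = "FSS")' /etc/pooladm.conf
poolcfg -c 'associate pool web-pool (pset web-pset)' /etc/pooladm.conf

# Activate the new configuration.
pooladm -c
```

The zone would then be bound to it with "set pool=web-pool" in zonecfg instead of add dedicated-cpu.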

3. Almost there. Now we limit the number of processes to 2000, for now. Again, there are other ways to add limits. There are other limits that can be added too.


...
zonecfg:webserver1> add rctl
zonecfg:webserver1:rctl> set name=zone.max-lwps
zonecfg:webserver1:rctl> add value (priv=privileged,limit=2000,action=deny)
zonecfg:webserver1:rctl> end
...
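Once the zone is up, the limit can be verified from the global zone with prctl (a sketch):

```shell
# Show the zone.max-lwps resource control on the running zone.
prctl -n zone.max-lwps -i zone webserver1
```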

4. Add some standard stuff and save the zone. Here is the complete output.


[root@someserver:~/bin]# zonecfg -z webserver1
webserver1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:webserver1> create
zonecfg:webserver1> set zonepath=/zones/webserver1
zonecfg:webserver1> set autoboot=true
zonecfg:webserver1> set scheduling-class=FSS
zonecfg:webserver1> add dedicated-cpu
zonecfg:webserver1:dedicated-cpu> set ncpus=1
zonecfg:webserver1:dedicated-cpu> end
zonecfg:webserver1> add capped-memory
zonecfg:webserver1:capped-memory> set physical=4G
zonecfg:webserver1:capped-memory> end
zonecfg:webserver1> add rctl
zonecfg:webserver1:rctl> set name=zone.max-lwps
zonecfg:webserver1:rctl> add value (priv=privileged,limit=2000,action=deny)
zonecfg:webserver1:rctl> end
zonecfg:webserver1> add net
zonecfg:webserver1:net> set physical=nge0
zonecfg:webserver1:net> set address=192.168.1.10
zonecfg:webserver1:net> end
zonecfg:webserver1> verify
zonecfg:webserver1> commit
zonecfg:webserver1> exit
[root@someserver:~/bin]#

4 easy steps and we've configured a zone.

Part Three - Install, boot and use the new zone.

1. Perms need to be set properly or else this will fail.


[root@someserver:~/bin]# zoneadm -z webserver1 install
/zones/webserver1 must not be group readable.
/zones/webserver1 must not be group executable.
/zones/webserver1 must not be world readable.
/zones/webserver1 must not be world executable.
could not verify zonepath /zones/webserver1 because of the above errors.
zoneadm: zone webserver1 failed to verify
[root@someserver:~/bin]# chmod go-rwx /zones/webserver1/
[root@someserver:~/bin]# zoneadm -z webserver1 install
Preparing to install zone <webserver1>.
Creating list of files to copy from the global zone.
Copying <831> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <700> packages on the zone.
Initialized <700> packages on zone.
Zone <webserver1> is initialized.
The file </zones/webserver1/root/var/sadm/system/logs/install_log> contains a log of the zone installation.
[root@someserver:~/bin]#

2. Time to fire this zone up!


[root@someserver:~/bin]# time zoneadm -z webserver1 boot

real    0m1.344s
user    0m0.016s
sys     0m0.012s
[root@someserver:~/bin]#

3. Okay, let’s log in. (I did not paste the initial screens; when you first log in there are a couple of questions to answer about hostname, timezone, root password, DNS, Kerberos and such.) You only have to do this for the first zone you create. It takes about 30 seconds to answer those and then you have a prompt.


webserver1 console login: root
Password:
Last login: Mon Feb 18 12:14:00 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
Feb 18 12:17:16 webserver1 login: ROOT LOGIN /dev/console

bash-3.00# export "TERM=vt100"
bash-3.00# ifconfig -a
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000
nge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
	inet 192.168.1.10 netmask ffffff00 broadcast 192.168.1.255

bash-3.00# ping 192.168.1.1
192.168.1.1 is alive
bash-3.00# mkdir /web
bash-3.00# mount nfsserver:/web /web
bash-3.00# cd /web
bash-3.00# ls -l
drwxr-xr-x  11 root     root        4096 Feb 14 16:10 mysql
drwxrwxrwx   3 root     root        4096 Feb 14 14:58 sw
bash-3.00# hostname
webserver1
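As an aside, you can skip the interactive questions entirely by dropping a sysidcfg file into the zone before its first boot. A sketch (the values are made up; the encrypted password hash is a placeholder you would take from your own shadow file):

```shell
# Place this at <zonepath>/root/etc/sysidcfg before the zone's first boot.
cat > /zones/webserver1/root/etc/sysidcfg <<'EOF'
system_locale=C
terminal=vt100
network_interface=PRIMARY { hostname=webserver1 }
security_policy=NONE
name_service=NONE
timezone=US/Eastern
root_password=<encrypted-password-hash>
EOF
```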

After a certain amount of customization within the running zone called webserver1, it is ready to deploy to other servers. Examples of customization would be adding users or groups, editing vfstab, compiling some software or toggling services to meet your needs. The key here is that once you’ve created your ‘golden’ zone (which has a base install taken from your ‘golden’ flar) you can then clone this zone with minimal effort. Yeah, I’m sure there are lots of other ways to do this. This works well for me.

Part Four – Clone / Migrate an installed zone to other machines.

Note: In order to pull this off, all of the physical servers need to be running the same versions of software and the same patches/packages must be installed.

Here we shut down the original zone, detach it and transfer it to another physical server. When we detach the zone, a configuration file is dropped in the zone’s home. This is an important step.

1. Halt and detach the running zone.


[root@someserver:~]# zoneadm -z webserver1 halt
[root@someserver:~]# zoneadm -z webserver1 detach

2. Copy the zone to another server and then restart the original zone.


[root@someserver:~]# cd /zones/webserver1
[root@someserver:~]# gtar czpf - . | ssh user@anotherserver " cat - >> webserver1.tgz "

[root@someserver:~]# zoneadm -z webserver1 attach
[root@someserver:~]# zoneadm -z webserver1 boot

3. The rest of the steps are performed on the new physical server. If you have not already done so create the zfs file system for webserver2. See Part One for details.


[root@anotherserver:~]# zpool create zones $device
[root@anotherserver:~]# zfs create zones/webserver2
[root@anotherserver:~]# zfs set quota=50g zones/webserver2
[root@anotherserver:~]# zfs set reservation=50g zones/webserver2

Extract the archive sent from someserver.


[root@anotherserver:/zones/webserver2]# cp ~user/webserver1.tgz .
[root@anotherserver:/zones/webserver2]# gtar zxpf webserver1.tgz

After you extract the webserver1.tgz archive you should edit nodename and hosts. You can do so after booting the zone, but if you forget you’re sure to get a headache. These files live at:

/zones/webserver2/root/etc/{nodename,hosts}
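A quick sketch of that edit from the global zone (the hostnames and addresses follow this example; Solaris sed has no -i, hence the temp file):

```shell
# Give the clone its own identity before booting it.
echo webserver2 > /zones/webserver2/root/etc/nodename

# Update the zone's hosts file to match the new name and address.
sed 's/192.168.1.10[[:space:]]*webserver1/192.168.1.20 webserver2/' \
    /zones/webserver2/root/etc/hosts > /tmp/hosts.$$ &&
    mv /tmp/hosts.$$ /zones/webserver2/root/etc/hosts
```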

4. Now that the trivial stuff is done it’s time to do the hard part.
Pay attention here: the important line below is "create -a /zones/webserver2". This is the path to where you extracted the webserver1.tgz archive. Don’t forget the -a.

The only zonecfg changes needed are the name of the new zone and its IP address. (You will also need to change the net physical device if the hardware is different.)


[root@anotherserver:~]# zonecfg -z webserver2
webserver2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:webserver2> create -a /zones/webserver2

Wow! That was hard!
Now let’s take a look at the new zone configuration.


zonecfg:webserver2> info
zonename: webserver2
zonepath: /zones/webserver2
brand: native
autoboot: true
bootargs:
pool:
limitpriv:
scheduling-class: FSS
ip-type: shared
[max-lwps: 2000]
inherit-pkg-dir:
	dir: /lib
inherit-pkg-dir:
	dir: /platform
inherit-pkg-dir:
	dir: /sbin
inherit-pkg-dir:
	dir: /usr
net:
	address: 192.168.1.10
	physical: nge0
dedicated-cpu:
	ncpus: 1
capped-memory:
	physical: 4G
rctl:
	name: zone.max-lwps
	value: (priv=privileged,limit=2000,action=deny)

Okay, looks good except for the IP address, let’s change that.


zonecfg:webserver2> select net address=192.168.1.10
zonecfg:webserver2:net> set address=192.168.1.20
zonecfg:webserver2:net> end
zonecfg:webserver2> commit
zonecfg:webserver2> exit
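Another way to carry a configuration between hosts is zonecfg export, which dumps the config as a replayable command file. A sketch (the filename is made up):

```shell
# On the original host, dump webserver1's configuration to a file.
zonecfg -z webserver1 export -f /tmp/webserver1.cfg

# Edit the zonepath/address in the file, then replay it on the new host.
zonecfg -z webserver2 -f /tmp/webserver1.cfg
```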

You should now have all of the files and configuration needed to attach and boot the new zone.

5. Permissions must be set properly; the same thing happens here as when you first set up a zone. Perms :).


[root@anotherserver:~]# zoneadm -z webserver2 attach
/zones/webserver2 must not be group readable.
/zones/webserver2 must not be group executable.
/zones/webserver2 must not be world readable.
/zones/webserver2 must not be world executable.
could not verify zonepath /zones/webserver2 because of the above errors.
zoneadm: zone webserver2 failed to verify
[root@anotherserver:~]# chmod go-rwx /zones/webserver2/
[root@anotherserver:~]# zoneadm -z webserver2 attach

6. Okay, boot the new virtual machine. Note that all of your customizations made on webserver1 are there.


[root@anotherserver:~]# zoneadm -z webserver2 boot
[root@anotherserver:/zones/webserver2]# ping webserver2
webserver2 is alive

[root@anotherserver:/zones/webserver2]# ssh user@192.168.1.20
Password:
Last login: Tue Feb 19 09:48:38 2008 from somewhere
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005

-bash-3.00$

Addendum:

After speaking with the Sun Data Center Architect who visited last week, I found out that some of the Sun documents may have been misinterpreted regarding CPUs/cores in zones. Previously I was under the impression that setting ncpus=1 in the zone would provide one dedicated CPU. This is not the case in the version of software we have; ncpus=1 is only one core. This applies to x86 hardware; see the comments section for more info.

Here is how we add more CPU:

The virtual server shows 1 CPU, which can be viewed with mpstat. Note that it is 90% idle. This is actually one core.


[root@webserver1:~]# mpstat
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
0   83   0    6   427  214  233   34    0    7    0   539   10   1   0  90

In the global zone we have 4 cores.


[root@someserver:~]# mpstat
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
0   83   0    6   427  214  233   34    0    7    0   539   10   1   0  90
1    0   0   23    63    3  118    0    4    2    0     1    0   0   0 100
2    0   0    1   433  408   32    0    4    9    0     3    0   0   0 100
3    0   0    1    18    0   15    0    1    1    0     6    0   0   0 100
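To see how the OS counts processors versus physical chips on the hardware, psrinfo from the global zone is handy (a sketch):

```shell
# One line per virtual processor (what ncpus= hands out).
psrinfo

# -p counts physical processors; -vp shows how the virtual CPUs map onto them.
psrinfo -p
psrinfo -vp
```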

In the global zone we use zonecfg to add another core.


[root@someserver:~]# zonecfg -z webserver1
zonecfg:webserver1> select dedicated-cpu ncpus=1
zonecfg:webserver1:dedicated-cpu> set ncpus=2
zonecfg:webserver1:dedicated-cpu> end
zonecfg:webserver1> commit
zonecfg:webserver1> exit

There is probably another way to do this, but I simply reboot the zone and the cores appear. Note the server is down hard for almost 3 seconds.


[root@someserver:~]# time zoneadm -z webserver1 reboot

real    0m2.697s
user    0m0.013s
sys     0m0.007s

Back in the virtual machine using mpstat again we see the extra core.


[root@webserver1:~]# mpstat
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
0   83   0    6   427  214  233   34    0    7    0   539   10   1   0  90
1    0   0   23    63    3  118    0    4    2    0     2    0   0   0 100

So there you have it. Mixed with too many words you have a list of simple commands to create and deploy a Solaris 10 zone. Whoopie.

06/19/2009 Minor update:
The search pattern below came in and reminded me that some people still use top on Solaris.

Search pattern: 'solaris capped memory set but top shows more'

Memory will show up, but the cap is there. To see what resources are used by a specific zone you can use prstat with the -z option (prstat is similar to top, and native to Solaris).

Example:
prstat -mvaL -z $zonename


5 Responses to “Solaris 10 zones and ZFS”

  1. Brett Robblee Says:

    Nice article. As a pointy-haired boss, I don’t spend any time in the trenches and it’s nice to be able to understand technical topics such as these in short order.

    Have you thought about approaching some trade rags to do a column?

    Cheers,
    Brett

  2. tm Says:

    Hey Brett,

    Thanks. This ‘article’ was more of a cut-n-paste of emails sent to co-workers documenting some basic stuff. I added some explanation which I hope makes it easy to follow. Trade Rags? Well, you and I may still read them…anyone else?

    Tim

  3. Wrex Allen Says:

    You may want to addendum:
    “Previously I was under the impression that setting ncpus=1 in the zone would provide 1 dedicated CPU. This is not the case in the version of software we have, ncpus=1 is only one core. ”

    Not necessarily true, as with a Sun CMT server (Sunt5240, for example) ncpus=1 would equate to one thread.

    Just thought you might want to know :-)

  4. tm Says:

    Thanks Wrex.

    The following doc on the Sun site states the ncpus setting is for the number of CPUs (or just another core on the chip die, as I view it). I may not have mentioned it above, but these are all x86 boxes; I have not been able to play with any of the cool threads / Niagara boxes like the one you mention.

    http://docs.sun.com/app/docs/doc/817-1592/gepsd?a=view
    “Solaris 10 8/07: dedicated-cpu Resource
    The dedicated-cpu resource sets limits for ncpus, and optionally, importance.
    ncpus:Specify the number of CPUs or specify a range, such as 2–4 CPUs.”

    Thanks for the note.

    Tim

  5. Wrex Says:

    Ah yes, I am sure it would make a huge difference on an x86 platform. Yeah, the threaded tech treats it entirely differently. And if you want a nice little headache, try staying within license compliance with Oracle and their DB products on one. It’s a mathematical circus, since it’s (CMT servers) all threaded, heh.

    Still a great blog post! I wasn’t meaning to discredit anything.
