Sunday, April 27, 2008

Change boot device target number (Rev: 1.1)



Situation

This post follows on from my previous post, "Copy Solaris boot disk to another disk with different partitions layout". After copying the data from disk 1 to disk 2 and installing the bootblocks on disk 2, we moved disk 2 from slot 1 to slot 0.

After powering on the Sun Fire 280R server, the following boot error messages appeared:
Boot device: disk file and args:
Evaluating: boot
Can't open boot device

At the OpenBoot PROM (OBP) command prompt,
ok devalias
disk1 /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@2,0
disk0 /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@1,0
disk /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@1,0

ok printenv boot-device
boot-device disk disk

According to the devalias output, the default boot device (disk) was set to boot from target 1, LUN 0 (disk@1,0), which matched disk 2 while it was still in slot 1.

After we moved disk 2 from slot 1 to slot 0, the server could not open the boot device: the disk alias still pointed to target 1, but disk 2 now sat at target 0.
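
The mismatch can be checked by pulling the target number out of an alias path. The helper below is a small sketch in POSIX shell, meant to be run on any workstation rather than at the ok prompt. The function name obp_target is made up for illustration, and it assumes the path ends in disk@<target>,<lun>, optionally followed by a ":a"-style suffix.

```shell
# obp_target: print the target number from an OBP disk path.
# Hypothetical helper; assumes the last component looks like
# disk@<target>,<lun>[:<partition>].
obp_target() (
    unit=${1##*disk@}             # keep what follows the last "disk@", e.g. "1,0"
    unit=${unit%%:*}              # drop an optional ":a"-style partition suffix
    printf '%s\n' "${unit%%,*}"   # the part before the comma is the target
)

# The "disk" alias from the devalias output points at target 1:
obp_target /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@1,0
```

If the printed target differs from the slot the boot disk now occupies, the alias needs to be updated.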


Solution

I created a new devalias called disk2 and set it to boot from disk target 0:
ok nvalias disk2 /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@0,0

Then, I set the default boot device to boot from disk2:
ok setenv boot-device disk2

Finally, I reset the system so that the new settings take effect (nvalias and setenv already store their values in NVRAM):
ok reset-all

The system resets and, provided auto-boot? is set to true, boots from disk 2 at target 0 in its new slot 0 position.
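
For anyone who prefers to generate the alias line instead of typing the long path by hand, here is a rough sketch in POSIX shell. It only prints the nvalias command for pasting at the ok prompt. The function name nvalias_for_target is hypothetical, and it assumes the path ends in disk@<target>,<lun>.

```shell
# nvalias_for_target: print an nvalias command that points ALIAS at NEW_TARGET.
# Hypothetical helper; it only prints text and does not touch NVRAM.
nvalias_for_target() (
    alias_name=$1
    path=$2
    new_target=$3
    prefix=${path%disk@*}     # everything up to the final "disk@"
    unit=${path##*disk@}      # e.g. "1,0"
    lun=${unit#*,}            # keep the LUN unchanged
    printf 'nvalias %s %sdisk@%s,%s\n' "$alias_name" "$prefix" "$new_target" "$lun"
)

# Re-point the old target 1 path at target 0 under the alias disk2:
nvalias_for_target disk2 /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@1,0 0
```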



Thursday, April 10, 2008

Copy Solaris boot disk to another disk with different partitions layout (Rev: 1.5)



In April 2008, I upgraded the Solaris 8 Operating System (OS) on the Sun Fire 280R server to Solaris 10 8/07 with the following partition layout on disk 1:

Disk 1 (Solaris boot disk, c1t0d0, slot 0)
Filesystem   Size      Mounted on
c1t0d0s0     2 GB      /
c1t0d0s1     2 GB      swap
c1t0d0s3     8 GB      /usr
c1t0d0s7     24 GB     /export/home


Before swapping disk 1 (original slot 0 position) and disk 2 (original slot 1 position)

The Sun Fire 280R server comes with two 36 GB hard disks: disk 1 (c1t0d0, the Solaris boot disk) and disk 2 (c1t1d0). After I installed some software, the root (/) filesystem on disk 1 had only about 500 MB of free space left, which might not be enough for future use.

I decided to increase the size of the root (/) filesystem from 2 GB to 4.5 GB. With Solaris 10 running from disk 1, I modified the partition layout of disk 2 using the format command as shown below:

Disk 2 (c1t1d0, slot 1)
Filesystem   Size      Mounted on
c1t1d0s0     4.5 GB    /
c1t1d0s1     2 GB      swap
c1t1d0s3     8 GB      /usr
c1t1d0s7     21.5 GB   /export/home
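
As a quick sanity check on the two layouts, the 2.5 GB added to / has to come out of /export/home so that both layouts still fit the same 36 GB disk. The sketch below does that sum in POSIX shell. Sizes are written in tenths of a GB (4.5 GB becomes 45) because shell arithmetic is integer-only, and sum_tenths is a name invented for this example.

```shell
# sum_tenths: add up slice sizes given in tenths of a GB.
# Hypothetical helper for checking the layouts in this post.
sum_tenths() (
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    printf '%s\n' "$total"
)

old=$(sum_tenths 20 20 80 240)   # disk 1: 2 + 2 + 8 + 24 GB
new=$(sum_tenths 45 20 80 215)   # disk 2: 4.5 + 2 + 8 + 21.5 GB
[ "$old" -eq "$new" ] && echo "both layouts use $((new / 10)) GB"
```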


After quitting the format command, new UFS file systems have to be created with the newfs command on every partition of disk 2 except the swap partition:
# newfs -v /dev/rdsk/c1t1d0s0
# newfs -v /dev/rdsk/c1t1d0s3
# newfs -v /dev/rdsk/c1t1d0s7
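
The same three commands can be produced with a loop, which helps if the slice list ever changes. This is only a dry-run sketch in POSIX shell: it prints the newfs commands rather than running them, and newfs_cmds is a hypothetical helper name.

```shell
# newfs_cmds: print one newfs command per UFS slice of a disk (dry run only).
# Slice 1 is swap in this layout and is left out on purpose.
newfs_cmds() (
    disk=$1
    shift
    for slice in "$@"; do
        printf 'newfs -v /dev/rdsk/%ss%s\n' "$disk" "$slice"
    done
)

newfs_cmds c1t1d0 0 3 7
```

Review the printed lines before running them as root, since newfs destroys any data already on the slice.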



Copy data from disk 1 to disk 2

[Step 1] Mount the c1t1d0s0 filesystem of disk 2 on /mnt:
# mount -F ufs /dev/dsk/c1t1d0s0 /mnt

[Step 2] Copy the data from the c1t0d0s0 filesystem of disk 1 to the c1t1d0s0 filesystem of disk 2:
# ufsdump 0f - / | (cd /mnt; ufsrestore xvf -)
...
DUMP: DUMP IS DONE
Add links
Set directory mode, owner, and times.

Set owner/mode for ‘.’? [yn] y
Directories already exist, set modes anyway? [yn] y
...





During the ufsdump and ufsrestore of the / filesystem, I encountered the following error messages:
...
DUMP: DUMP IS DONE
Changing volumes on pipe input
abort? [yn] y
dump core? [yn] n

I tried four used hard disks and also tried the command below to specify a large "tape", but the same error messages appeared:
# ufsdump 0sdbf 13000 54000 126 - / | (cd /mnt; ufsrestore xvf -)


In the end, I switched to a new hard disk and repeated Step 2, and it completed with the output shown in Step 2.



[Step 3] Unmount the /mnt filesystem:
# umount /mnt

Repeat Step 1 to Step 3 for partition 3 (c1t1d0s3) and partition 7 (c1t1d0s7) of disk 2, so that the data in the disk 1 partitions (c1t0d0s0, c1t0d0s3 and c1t0d0s7) matches the data in the corresponding disk 2 partitions (c1t1d0s0, c1t1d0s3 and c1t1d0s7). The commands are shown below:

# mount -F ufs /dev/dsk/c1t1d0s3 /mnt
# ufsdump 0f - /usr | (cd /mnt; ufsrestore xvf -)
# umount /mnt

# mount -F ufs /dev/dsk/c1t1d0s7 /mnt
# ufsdump 0f - /export/home | (cd /mnt; ufsrestore xvf -)
# umount /mnt
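
Since the mount, copy and unmount sequence is identical for every slice, it can be written once and looped over. The following is a dry-run sketch in POSIX shell that only prints the commands. The helper name copy_cmds and the "mountpoint:slice" pair notation are my own conventions for this example, not anything Solaris defines.

```shell
# copy_cmds: print the mount / ufsdump / umount sequence for one slice (dry run).
# $1 = source mount point on disk 1, $2 = destination slice on disk 2.
copy_cmds() (
    src_mount=$1
    dst_slice=$2
    printf 'mount -F ufs /dev/dsk/%s /mnt\n' "$dst_slice"
    printf 'ufsdump 0f - %s | (cd /mnt; ufsrestore xvf -)\n' "$src_mount"
    printf 'umount /mnt\n'
)

# Source mount point and destination slice, as used in this post.
for pair in /:c1t1d0s0 /usr:c1t1d0s3 /export/home:c1t1d0s7; do
    copy_cmds "${pair%%:*}" "${pair##*:}"
done
```

Keep in mind that ufsrestore still asks its interactive [yn] questions, so the printed commands are best run one slice at a time.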



Install bootblocks on the hard disk

We have to make disk 2 bootable by installing the bootblocks using the installboot command:
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0
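
The backquoted uname -i in the command above expands to the platform name, which selects the matching bootblk file. The sketch below shows how that path is put together. The function name bootblk_path is hypothetical, and the platform string passed in the example call is an assumption about what uname -i prints on this server, so check it on your own machine.

```shell
# bootblk_path: print the UFS bootblk path for a platform name.
# Falls back to `uname -i` when no platform is given.
bootblk_path() (
    platform=${1:-$(uname -i)}
    printf '/usr/platform/%s/lib/fs/ufs/bootblk\n' "$platform"
)

# Assumed platform string for a Sun Fire 280R; verify with `uname -i`.
bootblk_path 'SUNW,Sun-Fire-280R'
```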


Before swapping disk 2 and disk 1, place your Solaris 10 DVD in the DVD drive. Make sure that no one is logged in to the server (# w), then shut down the server:
# shutdown -y -i0

After the server has shut down, power it off and swap disk 2 and disk 1 so that the server can boot from disk 2 with the new partition layout, which has a root (/) filesystem of about 4.5 GB.


After swapping disk 1 (new slot 1 position) and disk 2 (new slot 0 position)

The Fibre Channel (FC) hard disks in the Sun Fire 280R server use the World Wide Name (WWN) as the disk target:
# cldevice show | grep Device
...
DID Device Name: /dev/did/rdsk/d1
Full Device Path: phys-sun:/dev/rdsk/c1tWWNd0

...

If we simply move disk 2 from slot 1 to slot 0 (previously used by disk 1), the device name of disk 2 in Solaris 10 does not change because we have not rebuilt the /etc/path_to_inst file yet. The /devices tree on disk 2, which was copied from disk 1, still refers to the WWN of disk 1. As a result, the booted kernel will fail to mount the /usr filesystem because it is looking for the WWN of disk 1. You will encounter the error below if you boot the Sun Fire 280R server:
...
ERROR: svc: /system/filesystem/root: default failed to mount /usr
(see 'svcs -x' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information)
Console login service(s) cannot run
...

To boot disk 2 in the new slot 0 position, we have to remove these device links and rebuild them. The steps below assume that you have already changed the boot-device variable at the OpenBoot PROM (OBP) prompt so that the Sun Fire 280R server boots from disk 2 in the new slot 0 position.


Rebuild the Solaris device tree after swapping disks

1) Power on the Sun Fire 280R server. At the ok prompt, boot the server from the Solaris 10 DVD:
ok boot cdrom -s

2) Mount the root (/) filesystem of disk 2 on /mnt:
# mount /dev/dsk/c1t0d0s0 /mnt

3) Rename the /etc/path_to_inst file to keep a backup copy. Do not delete the existing /etc/path_to_inst.old file, as it is needed to rebuild the WWN of disk 2 during boot:
# mv /mnt/etc/path_to_inst /mnt/etc/path_to_inst_org

4) Delete the old device links:
# rm -f /mnt/dev/rdsk/c*
# rm -f /mnt/dev/dsk/c*
# rm -f /mnt/dev/cfg/c*

5) Rebuild the device structure:
# devfsadm -r /mnt -p /mnt/etc/path_to_inst

6) Unmount the root (/) filesystem and reboot:
# cd /
# umount /mnt
# init 6
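
Steps 3 to 5 all operate under the same mounted root, so the commands can be generated from a single variable. This makes it harder to forget the /mnt prefix and touch the DVD's own /dev by mistake. The sketch below is a dry run in POSIX shell that only prints the commands, and rebuild_cmds is a name made up for this example.

```shell
# rebuild_cmds: print the commands for steps 3 to 5 against a mounted root (dry run).
# $1 = where the disk 2 root filesystem is mounted (defaults to /mnt).
rebuild_cmds() (
    root=${1:-/mnt}
    printf 'mv %s/etc/path_to_inst %s/etc/path_to_inst_org\n' "$root" "$root"
    for dir in rdsk dsk cfg; do
        printf 'rm -f %s/dev/%s/c*\n' "$root" "$dir"
    done
    printf 'devfsadm -r %s -p %s/etc/path_to_inst\n' "$root" "$root"
)

rebuild_cmds /mnt
```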


The Sun Fire 280R server will now boot from disk 2 in slot 0 and automatically create a new /etc/path_to_inst file based on the /etc/path_to_inst.old file.


Notes:

After disk 2 booted successfully in the Sun Fire 280R server, I noticed that the system had the wrong date (e.g. year 2007 instead of 2008) and time. To set the correct date and time, use the date command (# date MonthDayHourMinuteYear):
# date 040312072008
(sets the date to 3 April 2008 and the time to 12:07 pm)
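
The twelve-digit argument is easy to mistype, so here is a small sketch in POSIX shell that assembles it from its parts. The function name date_arg is hypothetical. It expects plain decimal numbers without leading zeros, because printf would read a leading zero as octal.

```shell
# date_arg: build the MonthDayHourMinuteYear string for the date command.
# Arguments: month day hour(24h) minute year, without leading zeros.
date_arg() (
    printf '%02d%02d%02d%02d%04d\n' "$1" "$2" "$3" "$4" "$5"
)

# 3 April 2008, 12:07
date_arg 4 3 12 7 2008    # prints 040312072008
```

The result can then be passed to date as root, for example with # date "$(date_arg 4 3 12 7 2008)".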


[August 2008] Solaris 10 8/07 had a problem where the system hung while patches were being updated. It is advisable to install Solaris 10 5/08 or a later release, which has neither the patch-update hang nor the wrong-date problem.

