Cell Cluster at WelfenLab
General Information
The Welfenlab bought an HPC cluster under a DFG grant for VR research equipment. The cluster is part of a VR system that comprises several pieces of state-of-the-art VR equipment, such as the Inca6D haptic device, IOTracker/4 camera tracking, ...
Technical Specifications
The cluster consists of 12 QS22 blades, which are connected via an InfiniBand DDR 4x high-speed network. Each blade has:
- 2 Cell B.E. processors running at 3.2 GHz, each containing
  - PowerPC Processor Element
    - AltiVec Vector Unit
  - 8 Synergistic Processing Elements
    - 256 KiB Local Storage
    - Memory Flow Controller (MFC) connected to the EIB
    - Synergistic Processing Unit (SPU)
    - Peak Performance:
      - 24 GFLOP/s (single precision)
      - 10 GFLOP/s (double precision)
  - Element Interconnect Bus
    - four 32-bit rings clocked at 1.6 GHz
    - maximum transfer rate 204 GB/s
- 2 Broadcom BCM5704S NICs (GigaBit Ethernet)
- 8 GB DDR2-800 (very low profile)
(NUMA-based access, 8 Banks with 1GB each)
- Mellanox MT25418 ConnectX IB DDR, PCIe 2.0 2.5GT/s
(incomplete)
Installation
As the blades are diskless, it is necessary to provide an OS image via NFS. For this we use a dedicated server that is connected to the blades over a private Ethernet TCP/IP network.
Both the blades and the server run RHEL 5.3, which IBM has already optimized for the QS22 blades.
For the boot process of the QS22 machines we use Cobbler, which manages the TFTP preboot stage that downloads the kernel image and the initrd of the system.
Unfortunately, a custom-built initrd is needed to boot the system over NFS, because the NFS server IP and the network modules have to be specified. Moreover, a nasty feature of mkinitrd leads to boot failures on the other machines: it writes the MAC address of eth0 into the initrd, which prevents a network adapter with a different MAC address from initializing.
This can be fixed by commenting out the MAC address in /etc/sysconfig/network-scripts/ifcfg-eth0 or /etc/sysconfig/network.
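The workaround can also be scripted. The following is only a minimal sketch: it assumes the MAC address is stored in a line starting with HWADDR=, as is usual for ifcfg files on RHEL, and the function name comment_out_hwaddr is made up for this example. The function is only defined here and changes nothing until it is called on a file.

```shell
#!/bin/sh
# Sketch: comment out the MAC address binding in an ifcfg-style file.
# Assumption: the MAC address is stored in a line starting with HWADDR=.
comment_out_hwaddr() {
    cfg="$1"
    # Keep a backup copy next to the original file.
    cp -p "$cfg" "$cfg.bak" || return 1
    # Prefix every active HWADDR= line with '#'; all other lines stay untouched.
    sed 's/^[[:space:]]*HWADDR=/#&/' "$cfg.bak" > "$cfg"
}

# Usage (as root, on the system where the initrd is built):
#   comment_out_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0
```

Lines that are already commented out do not match the pattern, so running the function twice does not add a second '#'.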
Afterwards, execute the following command for the initrd build:
mkinitrd --with=tg3 --rootfs=nfs --net-dev=eth0 --rootdev=192.168.100.1:/export/bladeOS --without-dmraid --omit-raid-modules --omit-lvm-modules --fstab=/etc/fstab initrd-{kernel-version} {kernel-version}
Steps for upgrading the kernel
After a yum upgrade or install that brings in a new kernel, create the initrd as described above. Then copy both the kernel image (installed in /boot) and the newly created initrd to navier.
Next, update the boot process managed by Cobbler. To do so, place the two files in the corresponding image directory of the distribution.
In our case, we must replace (after backing up the old files, of course) the files in
/opt/cobbler/cobblerwww/images/RHEL5-ppc64-nfsboot
as it is the active profile for the QS22 blades. You can verify this in the Cobbler web interface: http://navier/cobbler/web -> systems -> blade01 - 12 -> profile = RHEL5-ppc64-nfsboot
Finally, sync the configuration, either on the command line of navier or in the web interface:
# cobbler sync
You can check whether the copy was successful by looking into the image folder used by tftpd:
/tftpboot/images/RHEL5-ppc64-nfsboot
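The upgrade steps above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not the procedure actually used on navier. The function names, the separate backup directory and the source file paths are hypothetical. It also assumes that the files in the image directory carry the same names as the files being copied, and that cobbler sync places a copy of each of them in the tftpd image folder.

```shell
#!/bin/sh
# Sketch: copy a new kernel and initrd into the Cobbler image directory,
# saving the files they replace into a separate backup directory.
# Assumption: destination file names equal the source file names.
install_boot_files() {
    image_dir="$1"; backup_dir="$2"; shift 2
    stamp=$(date +%Y%m%d-%H%M%S)
    for src in "$@"; do
        name=$(basename "$src")
        # Back up an existing file before overwriting it.
        if [ -e "$image_dir/$name" ]; then
            cp -p "$image_dir/$name" "$backup_dir/$name.$stamp" || return 1
        fi
        cp "$src" "$image_dir/$name" || return 1
    done
}

# Sketch: compare every file in the image directory with its tftpd copy.
verify_tftp_copy() {
    image_dir="$1"; tftp_dir="$2"
    for f in "$image_dir"/*; do
        [ -f "$f" ] || continue
        if ! cmp -s "$f" "$tftp_dir/$(basename "$f")"; then
            echo "mismatch: $(basename "$f")" >&2
            return 1
        fi
    done
}

# Usage on navier (backup directory and source paths are placeholders):
#   install_boot_files /opt/cobbler/cobblerwww/images/RHEL5-ppc64-nfsboot \
#       /root/boot-backup /path/to/kernel-image /path/to/initrd
#   cobbler sync
#   verify_tftp_copy /opt/cobbler/cobblerwww/images/RHEL5-ppc64-nfsboot \
#       /tftpboot/images/RHEL5-ppc64-nfsboot
```

Keeping the backups outside the image directory avoids leaving stray files in the tree that Cobbler manages.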