Welfenlab - Leibniz Universität Hannover

Cell Cluster at WelfenLab


General Information


The Welfenlab acquired an HPC cluster through a DFG grant for VR research equipment. The cluster is part of a VR system that comprises several state-of-the-art devices, such as the Inca6D haptic device, IOTracker/4 Camera Tracking, ...

Technical Specifications


The cluster consists of 12 QS22 blades, which are connected over an InfiniBand 4x DDR high-speed network. Each blade has

  • 2 Cell B.E. processors running at 3.2 GHz, each containing
    • PowerPC Processor Element
    • 8 Synergistic Processing Elements
      • 256 KiB Local Store
      • Memory Flow Controller (MFC) connected with EIB
      • Synergistic Processing Unit (SPU)
        • Peak performance:
          • 25.6 GFLOP/s (single precision)
          • 12.8 GFLOP/s (double precision)
    • Element Interconnect Bus
      • four 16-byte-wide data rings clocked at 1.6 GHz
      • maximum transfer rate 204.8 GB/s
  • 2 Broadcom BCM5704S NICs (Gigabit Ethernet)
  • 8 GB DDR2-800 memory (very low profile DIMMs; NUMA-based access, 8 banks of 1 GB each)
  • Mellanox MT25418 ConnectX IB DDR, PCIe 2.0 2.5GT/s (incomplete)
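
The per-blade figures above add up to the following cluster-wide totals. This is only a small arithmetic sketch; the variable names are ours, and the counts are taken directly from the list.

```shell
# Counts taken from the specification list above.
blades=12
cpus_per_blade=2
spes_per_cpu=8
mem_per_blade_gb=8

# Cluster-wide totals.
total_cpus=$((blades * cpus_per_blade))
total_spes=$((total_cpus * spes_per_cpu))
total_mem_gb=$((blades * mem_per_blade_gb))

echo "processors: $total_cpus"
echo "SPEs:       $total_spes"
echo "memory:     $total_mem_gb GB"
```

So the whole cluster offers 24 processors with 192 SPEs and 96 GB of main memory.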

Installation


Because the blades are diskless, the OS image must be provided via NFS. For this we use a dedicated server that is connected to the blades over a private Ethernet TCP/IP network.
Both the blades and the server run RHEL 5.3, which IBM has already optimized for use on the QS22 blades.
To boot the QS22 machines we use cobbler, which manages the TFTP preboot stage in which the kernel image and the initrd of the system are downloaded.

Unfortunately, a custom-built initrd is needed to boot the system over NFS, because the NFS server IP and the network modules have to be specified. Moreover, a quirk of mkinitrd leads to boot failures on the other machines: it writes the MAC address of eth0 into the initrd. This prevents a network adapter with a different MAC address from initializing.

This can be fixed by commenting out the MAC address entry in /etc/sysconfig/network-scripts/ifcfg-eth0 or /etc/sysconfig/network before building the initrd.
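
A minimal sketch of this fix, assuming the address is stored under the HWADDR key (the usual key in ifcfg files on RHEL 5). The example works on a scratch copy with made-up values, so it can be tried without touching the real configuration; on the blade image the same sed expression would be applied to the real file.

```shell
# Scratch copy standing in for /etc/sysconfig/network-scripts/ifcfg-eth0.
# All values below are made-up placeholders.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:11:22:33:44:55
ONBOOT=yes
EOF

# Comment out the hardware address so the initrd is not tied to one adapter.
sed 's/^HWADDR=/#HWADDR=/' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

cat "$cfg"
```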

Afterwards, execute the following command for the initrd build:

mkinitrd --with=tg3 --rootfs=nfs --net-dev=eth0 --rootdev=192.168.100.1:/export/bladeOS --without-dmraid --omit-raid-modules --omit-lvm-modules --fstab=/etc/fstab initrd-{kernel-version} {kernel-version}
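
Since the kernel version appears twice in this command, it may be convenient to generate the line from a small helper. The function below is a hypothetical sketch of ours (build_initrd_cmd is not part of mkinitrd); it only prints the command for review and does not execute it. The version string in the example call is a placeholder.

```shell
# build_initrd_cmd KERNEL_VERSION [NFS_ROOT]
# Prints the mkinitrd command line from above for the given kernel version.
# NFS_ROOT defaults to the export used on this page.
build_initrd_cmd() {
    kver=$1
    nfsroot=${2:-192.168.100.1:/export/bladeOS}
    printf 'mkinitrd --with=tg3 --rootfs=nfs --net-dev=eth0 --rootdev=%s --without-dmraid --omit-raid-modules --omit-lvm-modules --fstab=/etc/fstab initrd-%s %s\n' \
        "$nfsroot" "$kver" "$kver"
}

# Example call with a placeholder version string.
build_initrd_cmd "2.6.18-128.el5"
```

On the machine where the initrd is built, the printed line can be checked and then run as root.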


Steps for upgrading the kernel


After a kernel upgrade or installation via yum, create the initrd as described above. Then copy both the kernel image (installed in /boot) and the newly created initrd to navier, the server that runs cobbler.

Next, update the boot files managed by cobbler by placing the two files in the image directory of the corresponding distribution.
In our case, we replace the files (after backing up the old ones) in
/opt/cobbler/cobblerwww/images/RHEL5-ppc64-nfsboot

because RHEL5-ppc64-nfsboot is the active profile for the QS22 blades. You can verify this on the cobbler web page: http://navier/cobbler/web -> systems -> blade01 - 12 -> profile = RHEL5-ppc64-nfsboot
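
The backup-and-replace step can be sketched as follows. replace_image is a hypothetical helper of ours, not a cobbler command, and the file name vmlinuz is only a placeholder, since the actual file names in the image directory are not listed here. The demonstration runs on scratch directories.

```shell
# replace_image SRC DIR: install SRC into DIR, keeping any existing file
# of the same name as <name>.bak (hypothetical helper, not part of cobbler).
replace_image() {
    src=$1
    dir=$2
    dst="$dir/$(basename "$src")"
    if [ -e "$dst" ]; then
        cp -p "$dst" "$dst.bak"
    fi
    cp -p "$src" "$dst"
}

# Demonstration on scratch directories with placeholder file names.
new_dir=$(mktemp -d)
image_dir=$(mktemp -d)
echo "old kernel" > "$image_dir/vmlinuz"
echo "new kernel" > "$new_dir/vmlinuz"

replace_image "$new_dir/vmlinuz" "$image_dir"
ls "$image_dir"
```

On navier, the second argument would be /opt/cobbler/cobblerwww/images/RHEL5-ppc64-nfsboot, and the helper would be called once for the kernel image and once for the initrd.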

Finally, sync the configuration, either on the command line of navier or on the web page:
# cobbler sync

You can check whether the copy was successful by looking into the image directory served by tftpd:
/tftpboot/images/RHEL5-ppc64-nfsboot
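
Instead of inspecting the directory by eye, the two image directories can be compared byte for byte. The following is a sketch under our own assumptions: dirs_match is a hypothetical helper, and the file names in the demonstration are placeholders on scratch directories.

```shell
# dirs_match SRC DST: succeed if every regular file in SRC has a
# byte-identical copy in DST (hypothetical helper).
dirs_match() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        cmp -s "$f" "$2/$(basename "$f")" || return 1
    done
    return 0
}

# Demonstration on scratch directories with placeholder file names.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "kernel" > "$src/vmlinuz"
echo "initrd" > "$src/initrd"
cp "$src"/* "$dst"/

if dirs_match "$src" "$dst"; then
    echo "copy ok"
else
    echo "copy differs"
fi
```

On navier, the call would be dirs_match /opt/cobbler/cobblerwww/images/RHEL5-ppc64-nfsboot /tftpboot/images/RHEL5-ppc64-nfsboot.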

