Setting up Lustre in under an hour

I work for the NRAO (National Radio Astronomy Observatory), where we store large quantities of radio telescope data and scientific research materials across two massive Lustre file systems. We are currently sitting on about 3.6 petabytes of storage. Now, you may be wondering, "Caleb, how do you keep up with that amount of data? Do you have a backup?" To answer those questions in order: it's not that difficult, really, and kind of, sort of. That's actually where this post comes from: how did I start setting up our backup Lustre environment for the production-level file system?

Getting Started

To get started, you'll need two machines to act as servers, the MDS (metadata server) and the OSS (object storage server), and a third to act as the client, which confirms that everything is working and able to store data. You can do this with any size system, since a Lustre file system can grow with ease as you progress to bigger, better things. This blog post assumes you are familiar enough with CentOS/RHEL, yum, and basic Linux commands to not accidentally blow something up or destroy your computing environment beyond repair. I take no responsibility for your actions should you choose to follow this post in any way. You have been warned; you can break stuff easily by doing this. We will be using Ethernet for this setup since you may not have access to InfiniBand; if you do, simply replace eth0 or em0 with your InfiniBand interface (ex: ib0) in the instructions.

Requirements

  • Two systems to act as servers, one with multiple hard drives or a single, large drive to act as the Lustre OSS
  • One system to act as the client
  • CentOS installed on all three devices before you begin
  • CentOS updated on all three devices before you begin
  • Some knowledge of how Linux works and is structured
  • A drink and a healthy snack, this could be stressful

Installation

To start, we will add the Lustre repositories into our OS across all of our systems:

sudo sh -c 'cat >/etc/yum.repos.d/lustre.repo' <<'EOF'
[lustre-server]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/server/
gpgcheck=0

[e2fsprogs]
name=CentOS-$releasever - Ldiskfs
baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el7/
gpgcheck=0

[lustre-client]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/client/
gpgcheck=0
EOF

# Upgrade e2fsprogs
sudo yum upgrade -y e2fsprogs

# Install lustre-tests
sudo yum install -y lustre-tests

# Create the lnet module configuration (use appropriate interconnect in place of "tcp0" and appropriate interface in place of "eth0". You usually don't need to change "tcp0".)
sudo sh -c 'cat > /etc/modprobe.d/lnet.conf' <<EOF
options lnet networks=tcp0(eth0)
EOF
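
If you are swapping in a different interconnect, the options line is the only part of lnet.conf that changes. Here is a minimal sketch of generating it. The helper name lnet_options_line is made up for illustration, and o2ib0 is assumed here to be the LNet network type for InfiniBand, so check it against the documentation for your Lustre version:

```shell
#!/bin/sh
# Hypothetical helper: print the modprobe options line for a given
# LNet network name and network interface.
lnet_options_line() {
    net="$1"
    iface="$2"
    printf 'options lnet networks=%s(%s)\n' "$net" "$iface"
}

# Ethernet, as used in this post
lnet_options_line tcp0 eth0

# InfiniBand (assumption: network type o2ib0 on interface ib0)
lnet_options_line o2ib0 ib0
```

To write the result out, pipe a single call into the file, for example: lnet_options_line tcp0 em0 | sudo tee /etc/modprobe.d/lnet.conf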

Now that we have the Lustre kernel installed on our systems, let's go ahead and reboot: sudo reboot

On the MDS and OSS only, have the lnet module auto-load on boot:

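sudo sh -c 'cat > /etc/sysconfig/modules/lnet.modules' <<'EOF'
#!/bin/sh

# Load lnet only if its character device does not exist yet.
# The file name must end in .modules to be picked up at boot.
if [ ! -c /dev/lnet ] ; then
    exec /sbin/modprobe lnet >/dev/null 2>&1
fi
EOF
sudo chmod 744 /etc/sysconfig/modules/lnet.modules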
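
Before formatting any disks, it is worth confirming that LNet actually comes up on the interface you configured. Here is a minimal sketch using lctl, which ships with the Lustre utilities. The function name show_nids is only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: load lnet, bring the network up, and print this
# node's LNet NIDs. Bails out early if lctl is not installed.
show_nids() {
    if ! command -v lctl >/dev/null 2>&1; then
        echo "lctl not found: are the Lustre packages installed?" >&2
        return 1
    fi
    sudo modprobe lnet &&
        sudo lctl network up >/dev/null &&
        sudo lctl list_nids
}
```

Run show_nids on the MDS and make a note of the address it prints. That address is what the --mgsnode option on the OSS and the mount command on the client need to point at.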

On the MDS:

  • Initialize a disk or partition to use for Lustre, then create a Lustre MDT on it: mkfs.lustre --fsname=whatevs --mgs --mdt --index=0 /dev/sdX
  • Create a mount point and mount the Lustre FS: mkdir /mnt/mdt && mount -t lustre /dev/sdX /mnt/mdt
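
One thing to watch when picking your own file system name: Lustre limits --fsname to 8 characters. Here is a small pre-flight check you could run before mkfs.lustre. The function check_fsname is a hypothetical helper and only checks the length, nothing else:

```shell
#!/bin/sh
# Hypothetical pre-flight check: --fsname must be 1 to 8 characters long.
check_fsname() {
    name="$1"
    if [ -z "$name" ] || [ "${#name}" -gt 8 ]; then
        echo "invalid fsname '$name': must be 1-8 characters" >&2
        return 1
    fi
    echo "fsname '$name' looks OK"
}

check_fsname whatevs
```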

On the OSS:

  • Initialize a disk or partition to use for Lustre, then create a Lustre OST on it: mkfs.lustre --ost --fsname=whatevs --mgsnode=192.168.N.N@tcp0 --index=0 /dev/sdX
  • Adjust the --mgsnode parameter to match the address and protocol used by the MGS.
  • Create a mount point and mount the Lustre FS: mkdir /ostoss_mount && mount -t lustre /dev/sdX /ostoss_mount
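
If your OSS has several drives, each one becomes its own OST and needs a unique --index. The sketch below is a dry run that only prints the mkfs.lustre commands so you can review them before running anything destructive. The device names are placeholders, and print_ost_commands is a made-up helper name:

```shell
#!/bin/sh
# Dry run: print one mkfs.lustre command per disk, each with a unique
# OST index. Nothing is formatted; the commands are only echoed.
FSNAME=whatevs
MGS_NID="192.168.N.N@tcp0"   # replace with the NID of your MGS

print_ost_commands() {
    index=0
    for dev in "$@"; do
        echo "mkfs.lustre --ost --fsname=$FSNAME --mgsnode=$MGS_NID --index=$index $dev"
        index=$((index + 1))
    done
}

# Placeholder device names; substitute the disks on your own OSS.
print_ost_commands /dev/sdb /dev/sdc /dev/sdd
```

Each OST also needs its own mount point, so plan on one directory per device when you get to the mount step.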

On the client:

sudo modprobe lustre

Create a script to load the Lustre module on boot:

sudo sh -c 'cat > /etc/sysconfig/modules/lustre.modules' <<'EOF'
#!/bin/sh

# Load the lustre module only if it is not already loaded
if ! /sbin/lsmod | /bin/grep -q lustre ; then
   /sbin/modprobe lustre >/dev/null 2>&1
fi
EOF
sudo chmod 744 /etc/sysconfig/modules/lustre.modules

Create a mount point: mkdir /mnt/lustre

Mount the Lustre FS: mount -t lustre 192.168.N.N@tcp0:/whatevs /mnt/lustre
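
The whole point of the client is to confirm the file system can store data, so here is a rough smoke test: write 1 MiB of random data, read it back, and compare. The function smoke_test is a hypothetical helper that works on any writable directory, not a Lustre tool:

```shell
#!/bin/sh
# Hypothetical smoke test: write a 1 MiB file under the given directory,
# compare it against the source, then clean up both files.
smoke_test() {
    dir="$1"
    src="$(mktemp)" || return 1
    dst="$dir/lustre_smoke_test.$$"
    dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null
    cp "$src" "$dst" && cmp -s "$src" "$dst"
    status=$?
    rm -f "$src" "$dst"
    if [ "$status" -eq 0 ]; then
        echo "OK: wrote and verified 1 MiB in $dir"
    else
        echo "FAILED: could not write or verify data in $dir" >&2
    fi
    return "$status"
}
```

On the client, run smoke_test /mnt/lustre. If it passes, lfs df -h /mnt/lustre should then list your MDT and OST along with their usage.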

Complete

At this point, you should have a three-node Lustre environment that you can expand at any time by adding another OSS to the mix: format its OSTs against the same MGS and mount them on the new OSS. For more information about Lustre, check out the Lustre Wiki. Hope to see you all at LUG 2020 in Berkeley, CA!


Hello World

Testing out this blog software to see how it works. However, I can't see what I'm typing as I've messed up some theme files on my Red Hat installation. So I'm having to Ctrl+A every so often to see if I have typed anything remotely legible. Anyways, more soon.