GENERAL PARALLEL FILE SYSTEM (GPFS)

IBM General Parallel File System (GPFS)

IBM General Parallel File System (GPFS) is a scalable, high-performance, shared-disk clustered file system for AIX®, Linux® and Windows, developed by IBM. It provides efficient storage management for big data applications.

Like some other cluster filesystems, GPFS provides concurrent high-speed file access to applications executing on multiple nodes of clusters. It can be used with AIX 5L clusters, Linux clusters, or a heterogeneous cluster of AIX and Linux nodes. In addition to providing filesystem storage capabilities, GPFS provides tools for management and administration of the GPFS cluster and allows for shared access to file systems from remote GPFS clusters.

GPFS has been available on AIX since 1998 and on Linux since 2001, and is offered as part of the IBM System Cluster 1350.

Versions of GPFS:

GPFS 3.2, September 2007
GPFS 3.2.1-2, April 2008
GPFS 3.2.1-4, July 2008
GPFS 3.1
GPFS 2.3.0-29

Architecture:

GPFS provides high performance by allowing data to be accessed over multiple computers at once. Most existing file systems are designed for a single server environment, and adding more file servers does not improve performance. GPFS provides higher input/output performance by “striping” blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel. Other features provided by GPFS include high availability, support for heterogeneous clusters, disaster recovery, security, DMAPI, HSM and ILM.
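As a rough illustration of striping, the sketch below maps consecutive block numbers of a file to disks in round-robin order. Round-robin placement and the helper name stripe_disk are assumptions made for this example; the actual GPFS block allocation policy is not described in this article.

```shell
#!/bin/sh
# Hypothetical helper: map a file block number to a disk index using
# simple round-robin placement (an assumption, not the documented GPFS policy).
stripe_disk() {
    block=$1      # zero-based block number within the file
    num_disks=$2  # number of disks in the file system
    echo $(( block % num_disks ))
}

# Show where the first 8 blocks of a file would land on 4 disks.
i=0
while [ "$i" -lt 8 ]; do
    echo "block $i -> disk $(stripe_disk "$i" 4)"
    i=$(( i + 1 ))
done
```

Because neighboring blocks land on different disks, a large sequential read or write can keep several disks busy at the same time.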

GPFS File System:

A GPFS file system is built from a collection of disks which contain the file system data and metadata. A file system can be built from a single disk or contain thousands of disks, storing petabytes of data. A GPFS cluster can contain up to 256 mounted file systems. There is no limit placed upon the number of simultaneously opened files within a single file system. As an example, current GPFS customers are using single file systems up to 2 PB in size and others containing tens of millions of files.

Application interfaces:

Applications can access files through standard UNIX® file system interfaces or through enhanced interfaces available for parallel programs. Parallel and distributed applications can be scheduled on GPFS clusters to take advantage of the shared access architecture. This makes GPFS a key component in many grid-based solutions. Parallel applications can concurrently read or update a common file from multiple nodes in the cluster. GPFS maintains the coherency and consistency of the file system using sophisticated byte-level locking, token (lock) management and logging. In addition to the standard interfaces, GPFS provides a unique set of extended interfaces which can be used to provide high performance for applications with demanding data access patterns. These extended interfaces are more efficient for traversing a file system, for example, and provide more features than the standard POSIX interfaces.

Performance and scalability:

GPFS provides unparalleled performance especially for larger data objects and excellent performance for large aggregates of smaller objects. GPFS achieves high performance I/O by:
• Striping data across multiple disks attached to multiple nodes.
• Efficient client-side caching.
• Supporting a large block size, configurable by the administrator, to fit I/O requirements.
• Utilizing advanced algorithms that improve read-ahead and write-behind file functions.

Why Choose GPFS:

GPFS is highly scalable (2000+ nodes):

  1. Symmetric, scalable software architecture
  2. Distributed metadata management
  3. Allows for incremental scaling of system (nodes, disk space) with ease

GPFS is a high-performance file system:

  1. Large block size (tunable) support with wide striping (across nodes and disks)
  2. Parallel access to files from multiple nodes
  3. Thorough token refinement (sector range) and token management
  4. Efficient deep prefetching: read ahead, write behind
  5. Recognize access patterns (adaptable mechanism)
  6. Highly multithreaded daemon
  7. Data shipping mode for MPI-IO and other applications

GPFS is highly available and fault tolerant:

  1. Data protection mechanisms include journaling, replication, mirroring, shadowing (these are standard file system techniques)
  2. Heartbeat mechanism to recover from multiple disk, node, connectivity failures
  3. Recovery software mechanisms implemented in all layers

GPFS is transparent to most applications, so virtually any application can work with GPFS as though it were using a local file system. There are some restrictions, though, which must be understood to make sure that your application delivers the expected results when using GPFS (application concurrency mechanisms, application locking characteristics, etc.).

Install and configure a GPFS cluster on AIX

  1. Verify the system environment
  2. Create a GPFS cluster
  3. Define NSDs
  4. Create a GPFS file system

GPFS minimum requirements

  1. Two AIX 6.1 or 7.1 operating systems (LPARs)
  2. At least 4 hdisks
  3. GPFS 3.4 software with the latest PTFs, including these filesets:
    gpfs.base
    gpfs.docs.data
    gpfs.msg.en_US

Note: The installation is very similar to the Linux installation. AIX LPP packages replace the Linux RPMs, and some of the administrative commands are different.

Step 1: Verify Environment

  1. Verify nodes properly installed
    1. Check that the operating system level is supported
      On the system run oslevel
      Check the GPFS
    2. Is the installed OS level supported by GPFS? Yes No
    3. Is there a specific GPFS patch level required for the installed OS? Yes No
    4. If so what patch level is required? ___________
  2. Verify nodes configured properly on the network(s)
    1. Write the name of Node1: ____________
    2. Write the name of Node2: ____________
    3. From node 1 ping node 2
    4. From node 2 ping node 1
      If the pings fail, resolve the issue before continuing.
  3. Verify node-to-node ssh communications (For this lab you will use ssh and scp for secure remote commands/copy)
    1. On each node create an ssh key using the ssh-keygen command. If you do not specify a blank passphrase with -N, press Enter each time you are prompted for a passphrase, until you are returned to a prompt. The result should look something like this:

# ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/.ssh'.
Your identification has been saved in /.ssh/id_rsa.
Your public key has been saved in /.ssh/id_rsa.pub.
The key fingerprint is:
7d:06:95:45:9d:7b:7a:6c:64:48:70:2d:cb:78:ed:61 root@node1

  1. On node1 copy the $HOME/.ssh/id_rsa.pub file to $HOME/.ssh/authorized_keys

# cp $HOME/.ssh/id_rsa.pub $HOME/.ssh/authorized_keys

  1. From node1 copy the $HOME/.ssh/id_rsa.pub file from node2 to /tmp/id_rsa.pub

# scp node2:/.ssh/id_rsa.pub /tmp/id_rsa.pub

  1. Add the public key from node2 to the authorized_keys file on node1

# cat /tmp/id_rsa.pub >> $HOME/.ssh/authorized_keys

  1. Copy the authorized key file from node1 to node2

# scp $HOME/.ssh/authorized_keys node2:/.ssh/authorized_keys

  1. To test your ssh configuration, ssh as root from each node to itself and to the other node until you are no longer prompted for a password or for addition to the known_hosts file.
    node1# ssh node1 date
    node1# ssh node2 date
    node2# ssh node1 date
    node2# ssh node2 date
  2. Suppress ssh banners by creating a .hushlogin file in the root home directory

# touch $HOME/.hushlogin

  1. Verify the disks are available to the system
    For this lab you should have 4 disks available for use: hdiskw through hdiskz.
  2. Use lspv to verify the disks exist
  3. Ensure you see 4 unused disks besides the existing rootvg disks and/or other volume groups.
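The disk check can be partly automated. The sketch below is a minimal example that filters lspv-style output for disks that belong to no volume group. It assumes that unassigned disks show None in the third (volume group) column; the helper name unused_disks and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of disks whose volume group
# column is "None". Reads lspv-style output on stdin, assumed to have
# the columns: disk name, PVID, volume group, [state].
unused_disks() {
    awk '$3 == "None" { print $1 }'
}

# On AIX this would be used as:  lspv | unused_disks
# Illustrative sample input (not captured from a real system):
printf '%s\n' \
    'hdisk0 00c478de00655246 rootvg active' \
    'hdisk1 none None' \
    'hdisk2 none None' | unused_disks
```

For this lab the filter should list at least four disks; if it lists fewer, resolve the disk configuration before continuing.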

Step 2: Install the GPFS software

On node1

  1. Locate the GPFS software in /yourdir/GPFS/base/

# cd /yourdir/GPFS/base/

  1. Run the inutoc command to create the table of contents, if not done already

# inutoc .

  1. Install the base GPFS code using the installp command

# installp -aXY -d/yourdir/GPFS/base all

  1. Locate the latest GPFS updates in /yourdir/GPFS/fixes/

# cd /yourdir/GPFS/fixes/

  1. Run the inutoc command to create the table of contents, if not done already

# inutoc .

  1. Install the GPFS PTF updates using the installp command

# installp -aXY -d/yourdir/GPFS/fixes all

  1. Repeat the preceding installation steps on node2. Then, on node1 and node2, confirm GPFS is installed using the lslpp command

# lslpp -L gpfs.\*
The output should look similar to this:

Fileset                      Level  State Type  Description (Uninstaller)
----------------------------------------------------------------------------------------
gpfs.base                  3.4.0.11    A    F    GPFS File Manager
gpfs.docs.data             3.4.0.4     A    F    GPFS Server Manpages and Documentation
gpfs.gnr                   3.4.0.2     A    F    GPFS Native RAID
gpfs.msg.en_US             3.4.0.11    A    F    GPFS Server Messages U.S. English

Note 1: Exact versions of GPFS may vary from this example. The important part is that the base, docs and msg filesets are present.
Note 2: The gpfs.gnr fileset is used by the Power 775 HPC cluster only.
  1. Confirm the GPFS binaries are in your $PATH using the mmlscluster command

# mmlscluster
mmlscluster: This node does not belong to a GPFS cluster.
mmlscluster: Command failed.  Examine previous error messages to determine cause.

Note: The path to the GPFS binaries is: /usr/lpp/mmfs/bin

Step 3: Create the GPFS cluster
For this exercise the cluster is initially created with a single node. When creating the cluster make node1 the primary configuration server and give node1 the designations quorum and manager. Use ssh and scp as the remote shell and remote file copy commands.
*Primary configuration server (node1): __________
*Verify the fully qualified paths to ssh and scp:
 ssh path: __________
 scp path: __________

  1. Use the mmcrcluster command to create the cluster

# mmcrcluster -N node1:manager-quorum -p node1 -r /usr/bin/ssh -R /usr/bin/scp
Thu Mar 1 09:04:33 CST 2012: mmcrcluster: Processing node node1
mmcrcluster: Command successfully completed
mmcrcluster: Warning: Not all nodes have proper GPFS license designations.
Use the mmchlicense command to designate licenses as needed.

  1. Run the mmlscluster command again to see that the cluster was created

# mmlscluster
===============================================================================
| Warning:                                                                    |
|   This cluster contains nodes that do not have a proper GPFS license        |
|   designation.  This violates the terms of the GPFS licensing agreement.    |
|   Use the mmchlicense command and assign the appropriate GPFS licenses      |
|   to each of the nodes in the cluster.  For more information about GPFS     |
|   license designation, see the Concepts, Planning, and Installation Guide.  |
===============================================================================
GPFS cluster information
========================
GPFS cluster name:         node1.ibm.com
GPFS cluster id:           13882390374179224464
GPFS UID domain:           node1.ibm.com
Remote shell command:      /usr/bin/ssh
Remote file copy command:  /usr/bin/scp
GPFS cluster configuration servers:
-----------------------------------
  Primary server:    node1.ibm.com
  Secondary server:  (none)

Node Daemon node name            IP address       Admin node name             Designation
-----------------------------------------------------------------------------------------------
1  node1.lab.ibm.com          10.0.0.1         node1.ibm.com               quorum-manager

  1. Set the license mode for the node using the mmchlicense command. Use a server license for this node.

# mmchlicense server --accept -N node1

The following nodes will be designated as possessing GPFS server licenses:
node1.ibm.com
mmchlicense: Command successfully completed

Step 4: Start GPFS and verify the status of all nodes

  1. Start GPFS on all the nodes in the GPFS cluster using the mmstartup command

# mmstartup -a

  1. Check the status of the cluster using the mmgetstate command

# mmgetstate -a
Node number  Node name  GPFS state
------------------------------------------
     1       node1      active
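When the cluster grows, reading the state column by eye becomes tedious. The following sketch checks mmgetstate-style output and succeeds only if every node row reports active. It assumes the three-column layout shown above, with the node number first and the state last; the helper name all_nodes_active is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if every node row reports "active".
# Reads mmgetstate-style output on stdin. Node rows are recognized by a
# numeric first column, so the header and rule lines are skipped.
all_nodes_active() {
    awk '$1 ~ /^[0-9]+$/ { seen = 1; if ($3 != "active") bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}

# On a cluster node this would be used as:  mmgetstate -a | all_nodes_active
# Illustrative sample input matching the layout above:
if printf '%s\n' \
    'Node number  Node name  GPFS state' \
    '------------------------------------------' \
    '     1       node1      active' | all_nodes_active
then
    echo "all nodes active"
else
    echo "at least one node is not active"
fi
```

The check also fails when no node rows are found at all, so an empty or unexpected output is not mistaken for a healthy cluster.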

Step 5: Add the second node to the cluster

  1. On node1, use the mmaddnode command to add node2 to the cluster

# mmaddnode -N node2

  1. Confirm the node was added to the cluster using the mmlscluster command

# mmlscluster

  1. Use the mmchcluster command to set node2 as the secondary configuration server

# mmchcluster -s node2

  1. Set the license mode for the node using the mmchlicense command. Use a server license for this node.

# mmchlicense server --accept -N node2

  1. Start node2 using the mmstartup command

# mmstartup -N node2

  1. Use the mmgetstate command to verify that both nodes are in the active state

# mmgetstate -a

Step 6: Collect information about the cluster
Now we will take a moment to check a few things about the cluster. Examine the cluster configuration using the mmlscluster command

  1. What is the cluster name? ______________________
  2. What is the IP address of node2? _____________________
  3. What date was this version of GPFS “Built”? ________________
    Hint: look in the GPFS log file: /var/adm/ras/mmfs.log.latest

Step 7: Create NSDs

You will use the 4 hdisks.

• Each disk will store both data and metadata
• The storage pool column is left blank (not assigning storage pools at this time)
• The NSD server field (ServerList) is left blank (both nodes have direct access to the shared LUNs)

  1. On node1 create the directory /yourdir/data
  2. Create a disk descriptor file /yourdir/data/diskdesc.txt using the format:

#DiskName:ServerList::DiskUsage:FailureGroup:DesiredName:StoragePool
hdiskw:::dataAndMetadata::nsd1:
hdiskx:::dataAndMetadata::nsd2:
hdisky:::dataAndMetadata::nsd3:
hdiskz:::dataAndMetadata::nsd4:

Note: hdisk numbers will vary per system.

  1. Create a backup copy of the disk descriptor file /yourdir/data/diskdesc_bak.txt

# cp /yourdir/data/diskdesc.txt /yourdir/data/diskdesc_bak.txt

  1. Create the NSDs using the mmcrnsd command

# mmcrnsd -F /yourdir/data/diskdesc.txt
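With more than a handful of disks, typing the descriptor file by hand is error-prone. The sketch below generates descriptor lines in the format used in this step: data and metadata on every disk, with the server list, failure group and storage pool fields left blank. The helper name make_descriptors and the nsd1, nsd2, ... naming scheme are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one descriptor line per disk name given as
# an argument, in the format
#   DiskName:ServerList::DiskUsage:FailureGroup:DesiredName:StoragePool
# NSDs are named nsd1, nsd2, ... in argument order.
make_descriptors() {
    n=1
    for disk in "$@"; do
        echo "${disk}:::dataAndMetadata::nsd${n}:"
        n=$(( n + 1 ))
    done
}

# Print descriptors for the four lab disks; redirect the output to a
# file to use it with mmcrnsd -F.
make_descriptors hdiskw hdiskx hdisky hdiskz
```

Substitute the hdisk names that lspv reports as unused on your system.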

Step 8: Collect information about the NSDs
Now collect some information about the NSDs you have created.

  1. Examine the NSD configuration using the mmlsnsd command
    1. What mmlsnsd flag do you use to see the operating system device (/dev/hdisk?) associated with an NSD? _______

Step 9: Create a file system
Now that there is a GPFS cluster and some NSDs are available, you can create a file system with the following settings:

• Set the file system block size to 64 KB
• Mount the file system at /GPFS

  1. Create the file system using the mmcrfs command

# mmcrfs /GPFS fs1 -F /yourdir/data/diskdesc.txt -B 64k

  1. Verify the file system was created correctly using the mmlsfs command

# mmlsfs fs1

Is the file system automatically mounted when GPFS starts? _________________

  1. Mount the file system using the mmmount command

# mmmount all -a

  1. Verify the file system is mounted using the df command

# df -k
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4            65536      6508   91%     3375    64% /
/dev/hd2          1769472    465416   74%    35508    24% /usr
/dev/hd9var        131072     75660   43%      620     4% /var
/dev/hd3           196608    192864    2%       37     1% /tmp
/dev/hd1            65536     65144    1%       13     1% /home
/proc                   -         -    -         -     -  /proc
/dev/hd10opt       327680     47572   86%     7766    41% /opt
/dev/fs1        398929107 398929000    1%        1     1% /GPFS

  1. Use the mmdf command to get information on the file system.

# mmdf fs1

How many inodes are currently used in the file system? ______________