pNFS TESTBEDS
pNFS Scale-up Clusters
Clients
- 40x Gigabyte H62-Z6A-Y9 nodes, 4 blades per node
- 100 blades have 3x Samsung PM9A3 Gen4 x4 M.2 NVMe SSDs (1.92 TB each); each blade with:
- Single-socket AMD EPYC 7313, 24 cores (48 threads with SMT)
- 128 GB DDR4 3200 MT/s
- 1x 200 Gb/s Ethernet (ConnectX-6)
- 1 Gb/s management
Data Servers
- 12x Gigabyte R272-Z32-0 servers, each with
- Single-socket AMD EPYC 7502, 32 cores (64 threads with SMT)
- 16x NVMe Gen3 x4 U.2 SSDs
- 128 GB DDR4 2933 MT/s
- 2x 200 Gb/s Ethernet (ConnectX-6)
- 1 Gb/s management
Mongo Data Servers
- 4x Aeon Eclipse 2U Gen5 servers, each with
- Dual-socket AMD EPYC 9575F, 64 cores per socket at 3.3 GHz (128 cores / 256 threads with SMT)
- 16x 2 TB Samsung 9100 Pro M.2 NVMe SSDs with U.2 adapters
- 24x 32 GB DDR5 5600 MT/s (768 GB total)
- 4x 400 Gb/s Ethernet (ConnectX-7)
- 1 Gb/s Management
Grace-Grace Server
- 1x Supermicro ARS-221GL-NR-01, with
- Dual-socket 72-core 3.4 GHz NVIDIA Arm Neoverse V2
- 2x Samsung PM9A3 Gen4 x4 M.2 3.84 TB
- 2x 240 GB 8532 MT/s LPDDR5X
- 1x 400 Gb/s Ethernet (ConnectX-7) NDR
- 1 Gb/s Management
Networking Hardware
- 1x Arista DCS-7804-CH, with
- 2x Supervisor DCS-7800-SUP1A Modules
- 4x 36-port (400G) Line cards (7800R3-36P-LC)
- 5x Arista DCS-7010TX-48-R (1Gb) Switches
pNFS Scale-out Clusters
64-node 100GbE cluster (2 racks)
- Single Mellanox 100GbE non-blocking 64-port switch
- Each Dell PowerEdge R640 node with:
- 2x Intel Xeon Gold 6244 8C/16T CPUs
- ConnectX-5 VPI NIC in 100 Gigabit Ethernet mode
- 10GbE control network
- 42x "Client" role nodes with 192 GB DDR4 ECC RAM
- 22x "Server" role nodes with 192 GB RAM + 5x Samsung 990 Pro 1 TB SSDs
324-node EDR InfiniBand cluster (8 racks + switches)
- 27x Mellanox EDR IB switches in Fat-Tree topology
- Each Dell PowerEdge R640 node with:
- 2x Intel Xeon Gold 6244 8C/16T CPUs
- ConnectX-4 VPI NIC in 100 Gb/s EDR InfiniBand mode
- 10GbE control network
- 216x "Client" role nodes with 192 GB DDR4 ECC RAM
- 108x "Server" role nodes with 192 GB RAM + 5x Samsung 990 Pro 1 TB SSDs
SOFTWARE/TEST ENVIRONMENTS
EMULAB
Emulab is a software platform that manages the nodes of a testbed cluster. It provides Emulab users with full bare-metal access to nodes, which allows researchers to use a wide range of environments in which to develop, debug, and evaluate their systems. The primary Emulab installation is run by the Flux Group, part of the School of Computing at the University of Utah. There are also installations of the Emulab software at more than two dozen sites around the world, ranging from testbeds with a handful of nodes up to testbeds with hundreds of nodes. Emulab is widely used by computer science researchers in the fields of networking and distributed systems. It is also designed to support education and has been used to teach classes in those fields.
MVPNET
MVPNet is an MPI application that allows users to launch a set of qemu-based virtual machines (VMs) as an MPI job. Users are free to choose the guest operating systems to run and have full root access to the guest. Each mvpnet MPI rank runs its own guest VM under qemu. Guest operating systems communicate with each other over an MPI-based virtual private network managed by mvpnet. Each mvpnet guest VM has a virtual Ethernet interface configured using qemu's -netdev stream or -netdev dgram flags. The qemu program connects this type of virtual Ethernet interface to a Unix domain socket file on the host system. The mvpnet application reads Ethernet frames sent by its guest OS from its Ethernet interface socket file and then uses MPI point-to-point operations to forward each frame to the mvpnet rank running the destination guest VM. The destination mvpnet rank delivers the Ethernet frame to its guest VM by writing it to the VM's socket file. To route Ethernet frames, mvpnet uses a fixed mapping between its MPI rank number, the guest VM IP address, and the guest VM Ethernet hardware address. mvpnet also supports IPv4 ARP and Ethernet broadcast operations.
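The sketch below illustrates the kind of fixed rank-to-address mapping and frame-forwarding logic described above, written with mpi4py for readability. The 10.0.0.0/24 subnet, the 02:00:00:00:00:xx MAC layout, the tag value, and the helper names are assumptions made for this example and are not taken from mvpnet itself.

# Illustrative sketch of mvpnet-style frame routing (not the actual mvpnet code).
# Assumed conventions for this example: rank r hosts guest r, guest r has IP
# 10.0.0.(r+1) and MAC 02:00:00:00:00:rr, and frames travel with tag FRAME_TAG.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
FRAME_TAG = 1  # hypothetical MPI tag reserved for Ethernet frames

def rank_to_ip(r):
    """Fixed mapping: MPI rank -> guest IPv4 address (assumed layout)."""
    return f"10.0.0.{r + 1}"

def rank_to_mac(r):
    """Fixed mapping: MPI rank -> guest Ethernet hardware address (assumed layout)."""
    return bytes([0x02, 0x00, 0x00, 0x00, 0x00, r & 0xFF])

def mac_to_rank(mac):
    """Inverse mapping used to pick the destination rank for a unicast frame."""
    return mac[5]

def forward_frame(frame):
    """Route one Ethernet frame read from the local guest's -netdev socket."""
    dst_mac = frame[0:6]
    if dst_mac == b"\xff\xff\xff\xff\xff\xff":
        # Ethernet broadcast (e.g., IPv4 ARP requests): copy to every other rank.
        for r in range(comm.Get_size()):
            if r != rank:
                comm.send(frame, dest=r, tag=FRAME_TAG)
    else:
        # Unicast: the destination rank follows from the fixed MAC mapping.
        comm.send(frame, dest=mac_to_rank(dst_mac), tag=FRAME_TAG)

def deliver_loop(write_to_guest_socket):
    """Receive frames from peer ranks and write them to the local VM's socket file."""
    while True:
        frame = comm.recv(source=MPI.ANY_SOURCE, tag=FRAME_TAG)
        write_to_guest_socket(frame)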
OpenCHAMI
The LANL testbed uses OpenCHAMI (GitHub) to boot and manage its nodes. OpenCHAMI is an open-source, microservice-based system management platform that adheres to cloud principles. Nodes boot images over the network so that images can be managed centrally; the images are SquashFS archives built using OpenCHAMI's image-builder tool. To reduce image complexity, post-boot configuration is handled by OpenCHAMI's cloud-init server, a replacement for Canonical's upstream cloud-init that organizes post-boot configuration by node group.
Images are built in layers, with each new layer built on top of existing ones. This compartmentalizes image changes so that only the affected components of an image need to be rebuilt. Partners can perform tasks in booted images, such as installing packages, with the expectation that these changes are ephemeral and lost on reboot. If something in the image needs to be changed persistently, partners can request changes to the configuration of their own image layer, which will then be rebuilt using the OpenCHAMI image-builder tool; otherwise, they can submit their own image to be added to the image repository. If modifying an image is not desired, changes to the cloud-init post-boot configuration for the partner's image can be requested instead.
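As a rough illustration of the layered-rebuild idea (this is not OpenCHAMI's image-builder configuration or code), the sketch below rebuilds a layer only when its own recipe, or the recipe of a layer beneath it, has changed; the layer names and package lists are hypothetical.

# Conceptual sketch of layered image builds; hypothetical names throughout.
import hashlib

def layer_id(parent_id, recipe):
    """A layer's identity depends on its own recipe and on every layer below it."""
    return hashlib.sha256((parent_id + recipe).encode()).hexdigest()[:12]

def build_image(layers, cache):
    """Rebuild only the layers whose recipe (or an ancestor's recipe) changed."""
    parent = ""
    for name, recipe in layers:
        lid = layer_id(parent, recipe)
        if lid not in cache:
            print(f"rebuilding layer {name} ({lid})")
            cache[lid] = f"squashfs-for-{name}"  # stand-in for the real build step
        else:
            print(f"reusing cached layer {name} ({lid})")
        parent = lid
    return parent  # identity of the final (topmost) image

# Hypothetical stack: a base OS layer, a site layer, and a partner layer.
stack = [("base-os", "kernel systemd"),
         ("site", "slurm lustre-client"),
         ("partner", "partner-tools")]
cache = {}
build_image(stack, cache)                   # first build: every layer is built
stack[2] = ("partner", "partner-tools v2")  # partner changes their own layer
build_image(stack, cache)                   # only the partner layer is rebuilt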
