Overlap Of Communication And Computation (part 2)

In part 1 of this series, I discussed various peer-wise technologies and techniques that MPI implementations typically use for communication / computation overlap.

MPI-3.0, published in 2012, forced a change in the overlap game.

Specifically: most prior overlap work had focused on individual messages between a pair of peers. That work was very helpful for point-to-point messages, especially those of the non-blocking variety. But MPI-3.0 introduced the concept of non-blocking collective (NBC) operations. This fundamentally changed the requirements for network hardware offload.

Let me explain.

An MPI collective operation potentially includes more than two peers, and can be implemented with all manner of algorithms.

For example, consider a simple tree-based broadcast from a root to N peers. The root will send its message to M peers. Each of those peers will then send the message to M more peers. Rinse, repeat until the message has been delivered to all N peers.

In this case, something has to happen at each non-root MPI process before messages are sent on to the next set of MPI processes. This is different from the typical overlap semantics of splitting initiation and completion: we must wait for a specific action before something else can occur.

Here's some simplified pseudocode showing how a broadcast can be implemented:

if (I_have_a_parent)
    MPI_Recv(message, ..., parent_rank, tag, comm, &status);
if (I_have_children)
    for (i = 0; i < num_children; ++i)
        MPI_Send(message, ..., child_ranks[i], tag, comm);

That is, non-root MPI processes receive the message from their parent and, if they have children of their own, send the message to each of them.

Question: How would this be offloaded to NIC hardware?
Answer: With existing overlap concepts, it can't. New overlap concepts need to be developed.

It can't be offloaded with existing concepts because they were focused on simple, unconditional sends and receives - they had no notion of logic, conditional actions, or delayed operations.

Enter the concept of the triggered send.

A triggered send is pretty much exactly what it sounds like: a send that doesn't occur until some event happens. As applied to our simple tree-based broadcast above, the "if I have children" block can be a set of triggered sends; the trigger can be the message receipt from the parent.

Fun note: if the NIC is powerful enough, the triggered sends can send directly from the same (payload) buffer into which the parent's message was just received.

Hence, the whole tree-based broadcast can be offloaded to hardware, and CPU / NIC overlap is achieved. Huzzah!

There are tricky parts, of course, such as:

What happens when an MPI process with a parent in the broadcast tree is late to join the broadcast op, such that the message from the parent has already arrived (i.e., the trigger has already occurred)?
How expensive is it to set up triggered operations (i.e., how much time does it take)?
How fast can the hardware respond to triggered events?
How should resource exhaustion on the NIC be handled when a triggered operation is only partially complete (e.g., what if there are no send credits)?
How well do these triggered operations scale (e.g., how many pending NBCs can be outstanding before a) resource exhaustion, and/or b) performance degradation)?

...and so on. This is still an area of active research.

The point here is that MPI-3's NBC operations have changed the game, and introduced new ideas into how to achieve communication / computation overlap. More new ideas will undoubtedly emerge over time as researchers and vendors continue to explore this space.

This is an area to keep watching.
