High-Performance Intra-Node MPI Communication
See the
Download page for details about all releases.
News:
Use the KNEM git repository if building for recent kernels.
No new releases are published anymore, but fixes for the newest kernels are still applied to the repository.
(2024)
News:
Mellanox OFED
distribution includes KNEM starting with release MLNX_OFED-1.5.3-3.0.0.
(2011/12/08)
News:
Open MPI 1.5 released,
enables KNEM support by default.
(2010/10/10)
News:
NetPIPE 3.7.2
released with KNEM support.
(2010/08/20)
News:
MVAPICH2 1.5 released, includes MPICH2 1.2.1p1, which contains KNEM support.
(2010/07/12)
News:
MPICH2 1.1.1 released, includes KNEM support.
(2009/07/21)
Summary
KNEM is a Linux kernel module enabling high-performance intra-node
MPI communication for large messages.
KNEM works on all Linux kernels since 2.6.15 and supports
asynchronous and vectorial data transfers as well as offloading
memory copies onto Intel I/OAT hardware.
MPICH2
(since release 1.1.1) uses KNEM in its DMA LMT (Large Message Transfer) module
to improve large-message performance within a single node.
Open MPI has also included KNEM support
in its SM BTL component since release 1.5.
Additionally, NetPIPE
includes a KNEM backend since version 3.7.2.
Discover how to use them here.
The general documentation covers installing, running and using KNEM,
while the interface documentation describes the programming
interface and how to port an application or MPI implementation to KNEM.
To get the latest KNEM news, report issues, or ask questions, subscribe to the
knem mailing list.
See also the news archive.
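For a first impression of that programming interface, here is a minimal sketch of the cookie-based flow described in the interface documentation: the sender declares a memory region and obtains a cookie, then the receiver pulls the data out of that region through a single ioctl. For brevity both sides run in the same process here; in a real MPI implementation the cookie would be passed to the peer process, for instance over the existing shared-memory channel. The constant and field names (KNEM_CMD_CREATE_REGION, struct knem_cmd_inline_copy, and so on) follow the knem_io.h header, but check them against the interface documentation of your installed KNEM version before relying on this sketch.

    /* Minimal sketch of the KNEM cookie-based copy flow, both sides in one
     * process for brevity.  Names such as KNEM_CMD_CREATE_REGION are taken
     * from the knem_io.h header as documented; verify them against your
     * installed KNEM version. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <knem_io.h>

    #define LEN 4096

    int main(void)
    {
      char sendbuf[LEN], recvbuf[LEN];
      memset(sendbuf, 0x42, LEN);

      /* every process using KNEM opens the pseudo-device once */
      int fd = open("/dev/knem", O_RDWR);
      if (fd < 0) { perror("open /dev/knem"); return 1; }

      /* sender side: declare the send buffer as a region and get a cookie back */
      struct knem_cmd_param_iovec siov = { .base = (uintptr_t) sendbuf, .len = LEN };
      struct knem_cmd_create_region create;
      memset(&create, 0, sizeof(create));
      create.iovec_array = (uintptr_t) &siov;
      create.iovec_nr = 1;
      create.protection = PROT_READ;        /* the peer will only read from this region */
      create.flags = KNEM_FLAG_SINGLEUSE;   /* region is destroyed after its first use */
      if (ioctl(fd, KNEM_CMD_CREATE_REGION, &create) < 0) { perror("create region"); return 1; }
      /* create.cookie would normally be sent to the peer process here */

      /* receiver side: fetch the data from the region in a single kernel copy */
      struct knem_cmd_param_iovec riov = { .base = (uintptr_t) recvbuf, .len = LEN };
      struct knem_cmd_inline_copy icopy;
      memset(&icopy, 0, sizeof(icopy));
      icopy.local_iovec_array = (uintptr_t) &riov;
      icopy.local_iovec_nr = 1;
      icopy.remote_cookie = create.cookie;
      icopy.remote_offset = 0;
      icopy.write = 0;                      /* 0 = read from the remote region */
      icopy.flags = 0;                      /* synchronous, no I/OAT offload */
      if (ioctl(fd, KNEM_CMD_INLINE_COPY, &icopy) < 0) { perror("inline copy"); return 1; }
      if (icopy.current_status != KNEM_STATUS_SUCCESS) { fprintf(stderr, "copy failed\n"); return 1; }

      printf("copied %d bytes: %s\n", LEN, memcmp(sendbuf, recvbuf, LEN) ? "MISMATCH" : "ok");
      close(fd);
      return 0;
    }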
Why?
MPI implementations usually offer a user-space, double-copy-based intra-node communication
strategy. It is very good for small-message latency, but it wastes many CPU cycles, pollutes
the caches, and saturates memory buses.
KNEM transfers data from one process to another through a single copy within the Linux kernel.
The system call overhead (about 100ns these days) isn't good for small-message latency, but
having a single memory copy is very good for large messages (usually starting from dozens
of kilobytes).
Some vendor-specific MPI stacks (such as Myricom MX, QLogic PSM, ...) offer similar abilities,
but they may only run on specific hardware interconnects, while KNEM is generic (and open-source).
Also, none of these competitors offers asynchronous completion models, I/OAT copy offload,
or vectorial memory buffer support as KNEM does.
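To make the motivation concrete, here is a purely illustrative C program (not taken from any existing MPI implementation) that spells out the conventional double-copy scheme: the payload goes through an intermediate shared-memory bounce buffer, so every byte is copied twice and both the sender and the receiver spend CPU cycles in memcpy().

    /* Purely illustrative double-copy scheme (not the code of any particular
     * MPI implementation): the payload is copied by the sender into a shared
     * bounce buffer, then copied again by the receiver, so every byte crosses
     * the memory hierarchy twice and both processes burn cycles in memcpy(). */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define MSG_LEN 4096

    struct bounce {
      volatile int full;    /* naive flag; real code uses proper synchronization */
      char data[MSG_LEN];   /* bounce buffer mapped by both processes */
    };

    int main(void)
    {
      /* anonymous shared mapping stands in for the MPI shared-memory segment */
      struct bounce *shm = mmap(NULL, sizeof(*shm), PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      if (shm == MAP_FAILED) { perror("mmap"); return 1; }
      shm->full = 0;

      if (fork() == 0) {
        /* receiver process: second copy, shared segment -> destination buffer */
        char recvbuf[MSG_LEN];
        while (!shm->full)
          ;                                  /* busy-wait for the sender */
        memcpy(recvbuf, shm->data, MSG_LEN); /* copy #2 */
        shm->full = 0;
        printf("receiver got '%c'...\n", recvbuf[0]);
        return 0;
      }

      /* sender process: first copy, source buffer -> shared segment */
      char sendbuf[MSG_LEN];
      memset(sendbuf, 'A', MSG_LEN);
      memcpy(shm->data, sendbuf, MSG_LEN);   /* copy #1 */
      shm->full = 1;

      wait(NULL);
      return 0;
    }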
Download
KNEM is freely available under the terms of the BSD license (user-space tools)
and of the GPL license (Linux kernel driver).
Source code access and all tarballs are available from the Download page.
Papers
-
Brice Goglin and Stéphanie Moreaud.
KNEM: a Generic and Scalable Kernel-Assisted Intra-node MPI Communication Framework.
In Journal of Parallel and Distributed Computing (JPDC).
73(2):176-188, February 2013.
Elsevier.
Available here.
This paper describes the design of KNEM and summarizes how it was successfully integrated
into MPICH and Open MPI to improve point-to-point and collective operations.
If you are looking for a general-purpose KNEM citation, please use this one.
-
Teng Ma, George Bosilca, Aurelien Bouteiller, Brice Goglin, Jeffrey M. Squyres, and Jack J. Dongarra.
Kernel Assisted Collective Intra-node Communication Among Multi-core and Many-core CPUs.
In Proceedings of the 40th International Conference on Parallel Processing (ICPP-2011),
Taipei, Taiwan, September 2011.
Available here.
This article describes the implementation of native collective operations in Open MPI
on top of the KNEM RMA interface and the knowledge of the machine topology,
leading to dramatic performance improvement on various multicore and manycore servers.
-
Stéphanie Moreaud, Brice Goglin, Dave Goodell, and Raymond Namyst.
Optimizing MPI Communication within large Multicore nodes with Kernel assistance.
In CAC 2010: The 10th Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2010.
Atlanta, GA, April 2010.
IEEE Computer Society Press.
Available here.
This paper discusses the use of kernel assistance and memory copy offload for various
point-to-point and collective operations on a wide variety of modern shared-memory
multicore machines up to 96 cores.
-
Darius Buntinas, Brice Goglin, Dave Goodell, Guillaume Mercier, and Stéphanie Moreaud.
Cache-Efficient, Intranode Large-Message MPI Communication with MPICH2-Nemesis.
In Proceedings of the 38th International Conference on Parallel Processing (ICPP-2009),
Vienna, Austria, September 2009.
IEEE Computer Society Press.
Available here.
This paper describes the initial design and performance of the KNEM implementation
when used within MPICH2/Nemesis and compares it to a vmsplice-based implementation
as well as the usual double-buffering strategy.
-
Stéphanie Moreaud.
Adaptation des communications MPI intra-noeud aux architectures multicoeurs modernes (Adapting intra-node MPI communications to modern multicore architectures).
In 19ème Rencontres Francophones du Parallélisme (RenPar'19),
Toulouse, France, September 2009.
Available here.
This French paper presents KNEM and its use in MPICH2/Nemesis before looking in depth
at its performance for point-to-point and collective MPI operations.
-
Brice Goglin.
High Throughput Intra-Node MPI Communication with Open-MX.
In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2009),
Weimar, Germany, February 2009.
IEEE Computer Society Press.
Available here.
The Open-MX intra-node communication subsystem achieves very high throughput
thanks to overlapped memory pinning and I/OAT copy offload.
This paper led to the development of KNEM to provide generic MPI implementations
with similar performance without requiring Open-MX.
There are several papers from people using KNEM:
-
Teng Ma, George Bosilca, Aurélien Bouteiller, and Jack Dongarra.
HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters.
In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS '12).
Best paper award.
Shanghai, China, May 2012.
IEEE Computer Society Press.
This article presents a framework that orchestrates multi-layer hierarchical collective algorithms
with inter-node and kernel-assisted intra-node communication.
-
Teng Ma, Thomas Herault, George Bosilca, and Jack Dongarra.
Process Distance-aware Adaptive MPI Collective Communications.
In Proceedings of the International Conference on Cluster Computing.
Austin, TX, September 2011.
IEEE Computer Society Press.
This article presents the distance- and topology-aware implementation of some collective
operations over KNEM in Open MPI.
-
Teng Ma, Aurélien Bouteiller, George Bosilca, and Jack Dongarra.
Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW.
In Proceedings of the 18th EuroMPI conference.
Santorini, Greece, September 2011.
LNCS, Springer.
This article shows how Open MPI KNEM-based collective operations improve CPMD and FFTW performance.
-
Teng Ma, George Bosilca, Aurélien Bouteiller, and Jack Dongarra.
Locality and Topology aware Intra-node Communication Among Multicore CPUs.
In Proceedings of the 17th EuroMPI conference.
Stuttgart, Germany, September 2010.
LNCS, Springer.
This article describes a framework for tuning shared memory communications
in Open MPI according to locality and topology.
-
Ping Lai, Sayantan Sur, and Dhabaleswar Panda.
Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems.
In the International Supercomputing Conference (ISC'10).
Hamburg, Germany, May-June 2010.
Springer.
This paper compares the one-sided performance of a dedicated custom implementation in MVAPICH2
with that of MPICH2 and Open MPI using their generic KNEM support.
Last updated on 2024/02/06.