Arm SIMD | Arm Developer Hub

TOPIC 1

Scalable Matrix Extension (SME)

Learn how to use SME in your code, either through C/C++ intrinsics or in assembly, and explore several SME2 code examples.

Get started using SME and SME2 in your code, enabling matrix operations such as multiplication, inversion and on-the-fly transposition. SME2 extends SME by introducing multi-vector data-processing instructions, load to and store from multi-vectors, and a multi-vector predication mechanism. Several SME2 code examples are showcased at the end.

The SME glossary, covering the SME keyword attributes, SME types, SME functions, and intrinsics.

SME introduction blogs:

A blog giving an overview of the key SME features.
A blog introducing some of the instructions that SME provides.

A hands-on learning path to get you started using SME2 in both assembly and intrinsics code, built around a matrix multiplication example.
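To make the matrix multiplication use case concrete, here is a minimal scalar sketch in plain C that computes a product as a running sum of outer products. This is the accumulate-an-outer-product-per-step pattern that SME outer-product instructions apply to the ZA array. The function name `matmul_outer_product` and the row-major layout are assumptions for illustration; the code uses no SME instructions or intrinsics and is not taken from the learning path.

```c
#include <stddef.h>

/* Hypothetical helper (not an Arm API): C (m x n) += A (m x k) * B (k x n).
 * All matrices are row-major. The outer loop walks the shared dimension k,
 * so each iteration adds one rank-1 (outer-product) update to C:
 *     C += column p of A  *  row p of B
 * This is scalar C only; it illustrates the access pattern, nothing more. */
void matmul_outer_product(size_t m, size_t n, size_t k,
                          const float *a, const float *b, float *c)
{
    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < m; ++i) {
            const float a_ip = a[i * k + p];      /* element of column p of A */
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * b[p * n + j]; /* times row p of B */
            }
        }
    }
}
```

The caller is expected to zero `c` first when a plain product, rather than an accumulation, is wanted.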

TOPIC 2

Optimize with Arm SIMD

Learn how to optimize in assembly and in C/C++ using Neon, SVE, and SVE2 intrinsics. Arm intrinsics are a set of C/C++ functions whose precise implementation is known to the Arm compiler, GCC, and LLVM. LLVM (open-source Clang) supports SVE from version 5 onwards and SVE2 from version 9 onwards.
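As a small illustration of what intrinsics code looks like, the sketch below computes `y[i] += a * x[i]` using Neon intrinsics from `arm_neon.h`. The function name `saxpy_f32` is a hypothetical helper, and the `__ARM_NEON` guard with a scalar fallback is an assumption added so the same file also builds on targets without Neon.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: y[i] = y[i] + a * x[i] for n floats. */
void saxpy_f32(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Handle four floats per iteration in 128-bit Neon registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        /* vmlaq_n_f32(acc, v, s) returns acc + v * s for each lane. */
        vst1q_f32(y + i, vmlaq_n_f32(vy, vx, a));
    }
#endif
    /* Scalar tail, and the whole loop on targets without Neon. */
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

The scalar tail handles lengths that are not a multiple of four, which keeps the vector loop free of bounds checks.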

The Arm intrinsics search engine can be filtered by SIMD ISA (Neon, SVE, SVE2, Helium), base type (floating point, integer, etc.), bit size, and architecture.

Optimizing C/C++ and Assembly Code with Arm SIMD

The linked resources explain how to use intrinsics in your C/C++ code to take advantage of SIMD in Armv8 and Armv9. A separate resource covers the IoT Cortex-M ecosystem.

C/C++ Case Studies with Open-Source Libraries

Case studies on optimizing open-source libraries with Neon intrinsics, including for Arm Neoverse CPUs.

C compilers have limited ability to vectorize loops that contain conditional statements. Learn how to use Arm Neon intrinsics to get well-optimized code from C compilers in these cases.
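One common way to handle a conditional inside a loop with Neon is to compute both outcomes for every lane and then pick one with a compare mask, instead of branching. The sketch below applies this to `out[i] = (x[i] > t) ? x[i] - t : 0`. The function name `shift_above_threshold_f32` and the choice of operation are assumptions for illustration, and the scalar fallback is there so the code builds on targets without Neon.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = (x[i] > t) ? x[i] - t : 0.0f */
void shift_above_threshold_f32(const float *x, float *out, size_t n, float t)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t vt = vdupq_n_f32(t);
    const float32x4_t vzero = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        /* Lane mask: all ones where x > t, all zeros elsewhere. */
        uint32x4_t gt = vcgtq_f32(vx, vt);
        /* Compute the "then" value for every lane, branch-free. */
        float32x4_t diff = vsubq_f32(vx, vt);
        /* Bitwise select: diff where the mask is set, zero otherwise. */
        vst1q_f32(out + i, vbslq_f32(gt, diff, vzero));
    }
#endif
    /* Scalar tail, and the whole loop on targets without Neon. */
    for (; i < n; ++i) {
        out[i] = (x[i] > t) ? x[i] - t : 0.0f;
    }
}
```

Because both sides of the condition are evaluated for every lane, this approach suits conditions whose branches are cheap and free of side effects.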

TOPIC 3

Migrate from x86 and x64 to Arm Intrinsics

Learn about the different methods of porting existing x86 and x64 code to Arm SIMD, and get inspired by several case studies from cloud to edge.

Learn about the different libraries for migrating x86 and x64 intrinsics code to Arm intrinsics, and how to find intrinsics in large code bases.
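For a sense of what such a migration involves, the sketch below shows a hand-written port of a small SSE loop, with the x86 and Neon versions side by side behind preprocessor guards. This is a manual port, not the output of any porting library. The function name `add_f32_portable` and the `USE_SSE` / `USE_NEON` macros are hypothetical names chosen for this example.

```c
#include <stddef.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define USE_SSE 1   /* hypothetical macro for this sketch */
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#define USE_NEON 1  /* hypothetical macro for this sketch */
#endif

/* Hypothetical helper: dst[i] = x[i] + y[i] for n floats. */
void add_f32_portable(const float *x, const float *y, float *dst, size_t n)
{
    size_t i = 0;
#if defined(USE_SSE)
    /* Original x86 path: unaligned load, add, unaligned store. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(vx, vy));
    }
#elif defined(USE_NEON)
    /* Arm path: the same three steps with the Neon equivalents. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(dst + i, vaddq_f32(vx, vy));
    }
#endif
    /* Scalar tail, and the whole loop when neither ISA is available. */
    for (; i < n; ++i) {
        dst[i] = x[i] + y[i];
    }
}
```

Simple arithmetic like this maps one-to-one between the two instruction sets. Intrinsics without a direct counterpart are where a porting library, or a careful rewrite, becomes more useful.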

Vectorscan is a portable fork of Intel’s Hyperscan. Learn about the porting challenges and the success of the porting project.

Optimize with Arm Intrinsics for Android

Resources on how to get started using Arm intrinsics (Neon and SVE2) with the Android NDK.

A case study on how the H.266 codecs (VVenC and VVdeC) were converted from x86 and x64 to Arm Neon with SIMDe, achieving performance gains of over 200%.

Read the list of considerations to weigh when deciding which library best suits your SIMD porting needs.

A blog that walks through the different options for migrating x86 or x64 code to Arm intrinsics, with the pros and cons of each.

Arm Developer Program

How to Use Arm SIMD to Achieve Huge Performance Gains

Scalable Matrix Extension (SME)

Optimize Your Programs

Optimize with Arm SIMD

Migrate from x86 to Arm

Scalable Matrix Extension (SME)

Optimize with Arm SIMD

Migrate from x86 and x64 to Arm Intrinsics

Join the Arm Developer Program

Community Support

Learn from the Community

George Steed

Tell Us What We Are Missing
