CUDA Thrust Introduction
Thrust is analogous to the C++ Standard Template Library (STL). Its algorithms can run on both the host and the device. With Thrust you can:
Sort, scan, reduce, reduce-by-key
Transform input vector(s), e.g. vector addition with thrust::plus, or apply user-defined transformation functors
Interoperate with CUDA code
Thrust is a good fit when you don't need anything that is not built into Thrust or that is difficult to do with Thrust.
    thrust::device_vector<float> X(3);
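The transform item above (vector addition with thrust::plus, or a user-defined functor) can be sketched on the host with the standard library alone. This is a minimal sketch, not Thrust code: `add_vectors`, `saxpy` and `saxpy_vectors` are hypothetical names chosen for illustration, and std::transform with std::plus stands in for thrust::transform with thrust::plus, which takes its iterator arguments in the same order.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: host-side analogue of
//   thrust::transform(x.begin(), x.end(), y.begin(), out.begin(), thrust::plus<float>());
std::vector<float> add_vectors(const std::vector<float>& x,
                               const std::vector<float>& y) {
    assert(x.size() == y.size());  // both inputs must have the same length
    std::vector<float> out(x.size());
    std::transform(x.begin(), x.end(), y.begin(), out.begin(), std::plus<float>());
    return out;
}

// A user-defined transformation functor: computes a * x + y.
// (For device execution with Thrust, operator() would also need the
// __host__ __device__ qualifiers; they are omitted in this host-only sketch.)
struct saxpy {
    float a;
    float operator()(float x, float y) const { return a * x + y; }
};

// Hypothetical helper: applies the functor element-wise to two vectors.
std::vector<float> saxpy_vectors(float a,
                                 const std::vector<float>& x,
                                 const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    std::transform(x.begin(), x.end(), y.begin(), out.begin(), saxpy{a});
    return out;
}
```

Swapping the built-in functor for a user-defined one changes only the last argument of the transform call, which is the point the list makes.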
Some Algorithms
These algorithms exist in Thrust but not in the C++ STL.
    thrust::gather();
thrust::sequence()
The equivalent algorithm in the STL is std::iota, which is not a good name. Thrust renames it to sequence.
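To make the std::iota / thrust::sequence correspondence concrete, here is a small host-side sketch. `make_sequence` is a hypothetical helper, not a Thrust function; it mirrors the four-argument form thrust::sequence(first, last, init, step), in which element i receives init + i * step, and it builds on std::iota.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper mirroring thrust::sequence(first, last, init, step):
// element i of the result is init + i * step.
std::vector<int> make_sequence(std::size_t n, int init = 0, int step = 1) {
    std::vector<int> out(n);
    std::iota(out.begin(), out.end(), 0);    // 0, 1, 2, ...
    for (int& v : out) v = init + v * step;  // init, init + step, init + 2 * step, ...
    return out;
}
```

With the default arguments the result is the same as a plain std::iota starting at 0, which matches thrust::sequence(first, last).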
Some algorithms in the <numeric> header
There are 12 by-key algorithms in Thrust, and they are very useful. You can think of them as segmented versions of their corresponding algorithms. An inclusive scan by key, for example, is exactly like an inclusive scan, but instead of scanning the whole range it scans each segment of the range (each run of consecutive equal keys) separately, which comes in handy in many cases.
    thrust::exclusive_scan_by_key();
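The "segmented" behaviour described above can be illustrated with a sequential host-side reference. This is only a sketch of what thrust::inclusive_scan_by_key computes with the default equality and addition operators; `inclusive_scan_by_key_ref` is a hypothetical name, and the real algorithm works on iterator ranges and runs in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference: a running sum that restarts whenever
// the key differs from the previous key (i.e. at the start of each segment).
std::vector<int> inclusive_scan_by_key_ref(const std::vector<int>& keys,
                                           const std::vector<int>& values) {
    assert(keys.size() == values.size());  // one key per value
    std::vector<int> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        const bool same_segment = i > 0 && keys[i] == keys[i - 1];
        out[i] = same_segment ? out[i - 1] + values[i] : values[i];
    }
    return out;
}
```

For keys 0 0 0 1 1 2 3 3 3 3 and values that are all 1, the result is 1 2 3 1 2 1 1 2 3 4: each run of equal keys gets its own scan.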
The following three examples show how to use these algorithms.
Problem 1
In the C++ STL view: what is std::iota + std::transform?
In the Thrust view: what is thrust::sequence + thrust::transform?
In Thrust, these two algorithms combine into a single algorithm: thrust::tabulate. It is a very interesting algorithm. As an example, generate the first 10 odd numbers.
In C++
    auto odds = std::vector<int>(10);
    std::iota(odds.begin(), odds.end(), 0);
    std::transform(
        odds.begin(),
        odds.end(),
        odds.begin(),
        [](auto e) { return e * 2 + 1; });
In CUDA With Thrust
    auto odds = std::vector<int>(10);
    thrust::tabulate(
        odds.begin(), odds.end(),
        [](auto e) { return e * 2 + 1; });
This code runs on the host, and it is easy to move it to the device:
    auto odds = thrust::device_vector<int>(10);
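What thrust::tabulate does can be spelled out with a short host-side reference: element i of the output is the result of calling the function with the index i. This is a sketch under that assumption; `tabulate_ref` and `first_odds` are hypothetical names, not part of Thrust or the STL.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sequential reference for thrust::tabulate: out[i] = f(i).
template <typename F>
std::vector<int> tabulate_ref(std::size_t n, F f) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = f(static_cast<int>(i));
    }
    return out;
}

// The first n odd numbers: index i maps to 2 * i + 1.
std::vector<int> first_odds(std::size_t n) {
    return tabulate_ref(n, [](int i) { return i * 2 + 1; });
}
```

This is the iota-then-transform pair fused into one pass: the index plays the role of the value std::iota would have written.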
Problem 2
Copy every other number.
The answer is thrust::gather().
    auto const deck = std::vector<int>{13, 2, 14, 3, 6, 7};
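As a rough host-side illustration of how a gather solves "copy every other number": thrust::gather reads input[map[i]] into output position i, so a map of the even indices 0, 2, 4, ... picks every other element. `gather_ref` and `every_other` are hypothetical helpers written for this sketch, and the choice to start at index 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference for thrust::gather: out[i] = input[map[i]].
std::vector<int> gather_ref(const std::vector<std::size_t>& map,
                            const std::vector<int>& input) {
    std::vector<int> out(map.size());
    for (std::size_t i = 0; i < map.size(); ++i) {
        assert(map[i] < input.size());  // every map entry must be a valid index
        out[i] = input[map[i]];
    }
    return out;
}

// Copy every other element, starting with the first one (assumed here).
std::vector<int> every_other(const std::vector<int>& input) {
    std::vector<std::size_t> map((input.size() + 1) / 2);
    for (std::size_t i = 0; i < map.size(); ++i) {
        map[i] = 2 * i;  // indices 0, 2, 4, ...
    }
    return gather_ref(map, input);
}
```

For the deck 13, 2, 14, 3, 6, 7 this selects 13, 14, 6.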
Problem 3 — MCO
Maximum Consecutive Ones
Use thrust::reduce_by_key() to solve this problem.
    // 1 1 1 0 0 1 0 1 1 1 1
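One plausible way to use a reduce-by-key here, sketched sequentially on the host: use the bit sequence as both the keys and the values, so that each run of equal bits is summed (a run of ones sums to its length, a run of zeros sums to 0), and then take the maximum of the sums. `reduce_by_key_ref` and `max_consecutive_ones` are hypothetical names; this approach is an assumption and may differ from the full solution the original post showed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential reference for thrust::reduce_by_key with default
// operators: for each run of consecutive equal keys, emit the key and the
// sum of the corresponding values.
std::pair<std::vector<int>, std::vector<int>>
reduce_by_key_ref(const std::vector<int>& keys, const std::vector<int>& values) {
    assert(keys.size() == values.size());  // one key per value
    std::vector<int> out_keys;
    std::vector<int> out_sums;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            out_keys.push_back(keys[i]);   // a new run starts here
            out_sums.push_back(values[i]);
        } else {
            out_sums.back() += values[i];  // extend the current run
        }
    }
    return {out_keys, out_sums};
}

// Length of the longest run of ones in a sequence of 0/1 values.
int max_consecutive_ones(const std::vector<int>& bits) {
    const std::vector<int> sums = reduce_by_key_ref(bits, bits).second;
    return sums.empty() ? 0 : *std::max_element(sums.begin(), sums.end());
}
```

For the input 1 1 1 0 0 1 0 1 1 1 1 the per-run sums are 3, 0, 1, 0, 4, so the answer is 4.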
https://wtffqbpl.github.io/2022/11/22/CUDA-Thrust-Introduction/