- CUBLAS uses column-major storage and 1-based indexing to maintain compatibility with Fortran (I'm no Fortran expert, but I know at least this much about it), whereas C and C++ use row-major storage and 0-based indexing. C and C++ programs therefore need macros or inline functions to implement matrices on top of 1-dimensional arrays. Since I code primarily in C and C++, the macro I use, as recommended in NVIDIA's CUDA documentation, is
#define IDX2C(i, j, ld) (((j)*(ld))+(i))
Here's how I use the macro above to compute the index of a 1-D array element while treating the array as an M x N matrix stored column by column:
for (int j = 0; j < N; ++j) {
    for (int i = 0; i < M; ++i) {
        a[IDX2C(i, j, M)] = value;
    }
}
...
// Call a CUBLAS function (e.g. cublasSscal), using IDX2C to compute the array index
cublasSscal(..., ..., &a[IDX2C(p, q, ldm)], ldm);
- Don't forget to include the header file cublas.h in your programs. I normally write #include "cublas.h" rather than #include <cublas.h>, for obvious reasons.
- Remember to link your apps against the CUBLAS dynamic library: cublas.so (Linux), cublas.dll (Windows), or cublas.dylib (Mac OS X). Separate dynamic libraries are provided for device emulation, following a naming convention like cublasemu.so.
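To make the indexing note above concrete, here is a minimal sketch in plain C (no GPU or CUBLAS calls involved). It reuses the IDX2C macro quoted earlier; the helper names fill_column_major and get_column_major are hypothetical, made up just for this illustration. The 1-based counterpart is called IDX2F here, which is the name I recall from NVIDIA's documentation, so treat that name as an assumption.

```c
#include <assert.h>

/* 0-based column-major index, same as the macro quoted above. */
#define IDX2C(i, j, ld) (((j) * (ld)) + (i))

/* 1-based (Fortran-style) counterpart. The name IDX2F is an assumption. */
#define IDX2F(i, j, ld) ((((j) - 1) * (ld)) + ((i) - 1))

/* Hypothetical helper: store 10*i + j at logical position (i, j) of a
 * rows x cols matrix packed column by column in the 1-D array a.
 * ld is the leading dimension (number of rows actually allocated). */
static void fill_column_major(float *a, int rows, int cols, int ld)
{
    assert(ld >= rows); /* a column must fit inside the leading dimension */
    for (int j = 0; j < cols; ++j) {
        for (int i = 0; i < rows; ++i) {
            a[IDX2C(i, j, ld)] = (float)(10 * i + j);
        }
    }
}

/* Hypothetical helper: read element (i, j) using 0-based indices. */
static float get_column_major(const float *a, int i, int j, int ld)
{
    return a[IDX2C(i, j, ld)];
}
```

For a 3 x 2 matrix with ld = 3, the elements of column 0 occupy a[0] to a[2] and those of column 1 occupy a[3] to a[5], which is the layout CUBLAS expects. IDX2F(1, 1, ld) and IDX2C(0, 0, ld) both refer to the first element.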
Notes on using CUFFT
- NVIDIA modeled CUFFT on FFTW, and like FFTW it works with what's known as a plan: a configuration that specifies the optimal (that is, minimum-flop) way to execute an FFT of a particular size and data type.
- Depending on your graphics card's shared memory configuration, it's best to fit your computation entirely in CUDA's shared memory to minimize use of global memory.
- Refer to the CUFFT documentation for more details, as it's continually evolving while the hardware and software mature over the next few months.