# pycuda reference
# accessing a gpu
Google Colab: https://colab.research.google.com/
Kaggle Kernels: https://www.kaggle.com/kernels
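On either platform, pycuda is usually not preinstalled. A minimal setup sketch
for a notebook cell (the '!' shell prefix and the 'nvidia-smi' check are
assumptions about the notebook environment, not part of pycuda itself):
!nvidia-smi
!pip install pycuda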
# executing a kernel
import pycuda.autoinit
from pycuda.compiler import SourceModule
""" each thread doubles one element of the 4x4 array; the 4x4 thread
block is flattened into a linear index """
mod = SourceModule("""
__global__ void doubleval(float *a) {
    int i = threadIdx.x + threadIdx.y*4;
    a[i] = 2 * a[i];
}
""")
func = mod.get_function("doubleval")
# transferring data
import numpy as np
import pycuda.driver as drv
""" 4x4 array of random numbers """
a = np.random.randn(4,4)
""" 'a' consists of double-precision numbers, but single precision is
generally much faster on nvidia gpus (and the only option on some older
devices), so convert to float32 """
a = a.astype(np.float32)
""" combine the above two steps """
a = np.random.randn(4,4).astype(np.float32)
""" allocate memory on the device """
d_a = drv.mem_alloc(a.nbytes)
""" transfer the data to the device """
drv.memcpy_htod(d_a, a)
""" transfer the data back to the host """
drv.memcpy_dtoh(a, d_a)
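Putting the explicit-copy path together (a minimal sketch, assuming 'func' is
the doubleval kernel compiled above): launch with one thread per element of
the 4x4 array, then copy the result back.
func(d_a, block=(4, 4, 1))
drv.memcpy_dtoh(a, d_a)
print(a)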
# shortcuts for explicit memory copies
The pycuda.driver.In, pycuda.driver.Out, and pycuda.driver.InOut argument
handlers can simplify the memory transfers. For example, instead of
allocating 'd_a' and copying explicitly, if overwriting 'a' in place is
acceptable, the kernel can be launched directly on the host array:
func(drv.InOut(a), block=(4, 4, 1))
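drv.InOut copies 'a' to the device before the launch and back afterwards, so
the host array already holds the doubled values after the call (a small usage
note added here):
print(a)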
# abstracting away the complications
Using a pycuda.gpuarray.GPUArray, the same effect can be achieved with much
less writing:
import pycuda.gpuarray as gpuarray
import pycuda.driver as drv
import pycuda.autoinit
import numpy as np
d_a = gpuarray.to_gpu(np.random.randn(4,4).astype(np.float32))
h_a = (2 * d_a).get()
print(h_a)
print(d_a)
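As a quick sanity check (a small usage sketch, not part of the original
notes), the device result can be compared against plain numpy:
print(np.allclose(h_a, 2 * d_a.get()))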