I'm trying to speed up my NASTRAN V12.0 computations by using my NVIDIA Quadro M1200 GPU. I installed the CUDA v8.0 runtime. As a command line option, I set
However, this did not seem to work (judging by the task manager performance tab). Indeed, in the .f04 file, I find
CUDA: User has requested a GPU device
CUDA: cudart64_80.dll not found in PATH
First thing I did, of course, was check the system environment variables. I found both CUDA_PATH and CUDA_PATH_V8_0 pointing to the correct path (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0). For good measure, I also added this path to the PATH variable. This still didn't work, so I even added C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin to the PATH variable, which is where the file cudart64_80.dll is located. Still, nothing.
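For anyone who wants to double-check what a PATH lookup actually sees, here is a minimal Python sketch. It is not part of NASTRAN, and the helper name find_on_path is made up for illustration. It lists every PATH directory that contains a given file. Run it from the same console you launch NASTRAN from, since a process only sees the environment it was started with.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return the full path of every copy of `filename` found in PATH directories.

    `path_value` defaults to the PATH of the current process; passing it
    explicitly makes the function easy to test.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # tolerate quoted or padded entries
        if not entry:
            continue
        candidate = Path(entry) / filename
        if candidate.is_file():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    # The two DLL names are the ones reported in the .f04 messages.
    for dll in ("cudart64_80.dll", "cublas64_80.dll"):
        hits = find_on_path(dll)
        print(dll, "->", hits if hits else "not found on PATH")
```

If the DLL does show up here but the .f04 still reports it as missing, the problem is probably not the PATH variable itself.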
What would be a logical next step?
I honestly wouldn't bother with GPGPU. GPGPU only brings value for specific types of problems, typically large ones. A laptop GPU isn't going to bring much to the table in terms of GPGPU; it would probably hurt more than anything, even if you were able to make it work...
I was able to reproduce the problem with Simcenter Nastran (NX Nastran 13), and found a workaround: copy cudart64_80.dll into the em64tntL directory inside the NX Nastran root directory.
But wait, there's more! Once that is fixed, the next hurdle is: CUDA: cublas64_80.dll not found in PATH. Same deal: copy the DLL, and once that's done:
CUDA: User has requested a GPU device
CUDA: Successful init of GeForce RTX 2070
For those who don't have CUDA 8.0 installed (it's a 1.3GB download), the CUDA 8.0 Patch 2 overlay contains both DLLs, and it's only 43MB. Unzip it and copy both DLLs into the em64tntL folder...
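If you'd rather script the copy than do it by hand, something along these lines should work. It is only a sketch: copy_cuda_dlls is a made-up helper name, and the NX Nastran root in the commented usage is a placeholder you have to replace with your own install location.

```python
import shutil
from pathlib import Path

# The two DLLs that the .f04 file reported as missing.
DLLS = ("cudart64_80.dll", "cublas64_80.dll")


def copy_cuda_dlls(source_dir, target_dir, names=DLLS):
    """Copy each named DLL from source_dir into target_dir.

    Returns the list of destination paths. Raises FileNotFoundError if the
    target directory or any of the source DLLs is missing, so nothing is
    silently skipped.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    if not target_dir.is_dir():
        raise FileNotFoundError(f"target directory not found: {target_dir}")
    copied = []
    for name in names:
        src = source_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"missing source DLL: {src}")
        copied.append(Path(shutil.copy2(src, target_dir / name)))
    return copied


# Example usage (paths are assumptions; NASTRAN_ROOT is a placeholder):
# cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin"
# copy_cuda_dlls(cuda_bin, Path(NASTRAN_ROOT) / "em64tntL")
```

Writing into the install directory will likely need an elevated (administrator) prompt. If you are using the patch overlay instead of a full CUDA install, point source_dir at wherever you unzipped it.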
The acceleration is actually not as bad as I thought it would be, pretty impressive actually! Here's a SOL103 example of a 3M DOF model run with and without GPGPU (32-core AMD Threadripper, 128GB RAM, GeForce RTX 2070):
Real: 1160.197 seconds ( 0:19:20.197)
User: 1101.625 seconds ( 0:18:21.625)
Sys: 61.796 seconds ( 0:01:01.796)
Real: 703.380 seconds ( 0:11:43.380)
User: 483.750 seconds ( 0:08:03.750)
Sys: 221.546 seconds ( 0:03:41.546)
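To put a number on it, here is a quick sketch of the arithmetic. It assumes the slower set of timings is the run without GPGPU and the faster set is the run with it. The function name speedup is just for illustration.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline time to accelerated time (values > 1 mean faster)."""
    return baseline_seconds / accelerated_seconds


# Timings copied from the two runs above, in seconds.
# Assumption: the first set is without GPGPU, the second set is with GPGPU.
without_gpu = {"real": 1160.197, "user": 1101.625, "sys": 61.796}
with_gpu = {"real": 703.380, "user": 483.750, "sys": 221.546}

if __name__ == "__main__":
    for key in ("real", "user", "sys"):
        ratio = speedup(without_gpu[key], with_gpu[key])
        print(f"{key:>4}: {ratio:.2f}x")
```

Under that assumption, the wall-clock (Real) time improves by roughly 1.65x. User time drops by more than half, while Sys time goes up.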