
GPU Technology Revolution – More Cores, Faster Processing


Five years ago, there was just one way to speed up complex computing tasks: buy more computers. With CAD and CAE software focused on the CPU, the only way for an enterprise to accelerate complex simulation jobs was to build huge, costly supercomputing networks or computer clusters.


Now there is another solution: GPU computing. Whereas even the most powerful professional CPUs typically have only four or six processing cores, modern GPUs such as AMD FirePro™ have hundreds. By tapping into the power of the GPU for general processing tasks, even an ordinary desktop workstation can become a supercomputer in its own right.


In order for this to happen, your simulation software needs to be able to talk to the GPU—and for that, you need an API. This is OpenCL. Originally developed by Apple and supported by leading hardware manufacturers Intel, IBM, Nvidia and AMD, this open standard offers a powerful, flexible—and, above all, universal—way to harness the power of modern GPUs.


OpenCL™ is the first truly open and royalty-free programming standard for general-purpose computations on heterogeneous systems. OpenCL allows programmers to preserve source code investment and easily target multi-core CPUs, GPUs and APUs.


Applications accelerated with OpenCL can access the combined processing power of a computer or server’s GPU and CPU or APU cores under a single unified platform—a game-changing development for HPC users.


Siemens PLM simulation experts saw the potential to increase NX Nastran simulation performance using the massively parallel architecture of GPUs. With customer interest in GPU computing also evident, the Siemens-AMD cooperation expanded into a technical partnership to apply AMD FirePro GPU processing power and the open-standard GPU programming framework OpenCL to optimizing specific calculations within NX Nastran.




To learn more about these technologies, visit FirePro Graphics.


Antoine Reymond is an industry executive for Professional Graphics at AMD. He has been with AMD since 2007 and has a well-rounded background in CAE simulation, CAD, and enterprise data management. His postings are his own opinions and may not represent AMD’s positions, strategies, or opinions.



Hi Antoine,


We'd love to try this, but are not familiar with the AMD product line...  We've been nVIDIA all the way from day 1, but the FirePro line seems interesting...  What recommendations would you have as far as workstation-based acceleration for NX NASTRAN?  We're looking at W4300 and W7100, is that going to bring any performance improvement?  I'm a bit worried about the double-precision performance... 


I would suggest you test the AMD FirePro W9100. The latest version has 32GB of GPU memory and 2.62 TFLOPS of double-precision compute performance.

Recommendations are shown in our collateral on the NX page at:

AMD FirePro W4300 and W7100 are designed for other design engineering workflows so in this case AMD FirePro W9100 is recommended.


Hi Antoine, thanks for your reply. Unfortunately, sticking a $4,000 card in our workstations isn't quite something I want to do :)


Too bad you guys don't have the "Experience FirePro" program anymore, we would have at least tried an 8100...


Happy to say that you can buy up to 5x W9100 and receive a 30% rebate on each card using the link:


That's excellent news, thanks so much for sharing this!

We'll start with a W8100, can't wait to check it against the Nvidia K40 :)


OK, now that I've got my W8100 (what a beast BTW!), what are the best settings for NXN10 to take full advantage of it?


Congratulations on your new AMD FirePro W8100.




There are no settings to change on the driver or CCC side.
You just have to be running a large enough FRRD1 model to take advantage of the GPU and you have to add some command line options when running the job: nrec, rdscale and cl_frrd.
For the conrod model (a model AMD created), here are the settings we used on a 12 core / 128GB system:

CPU only:
scratch=yes batch=no sdirectory=/scratch memory=7.8gb sscr=400gb nrec=128 rdscale=2.5 parallel=12

CPU + GPU:
scratch=yes batch=no sdirectory=/scratch memory=7.8gb sscr=400gb nrec=128 rdscale=2.5 cl_frrd=1 parallel=12
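
The only difference between the two keyword lines above is cl_frrd=1, so both can be generated from one place when scripting an A/B comparison. The Python sketch below is illustrative only: the keyword names and values are copied from the conrod settings, while the launcher name `nastran`, the input file `conrod.dat`, and the helper names `build_keywords` and `build_command` are assumptions that you would replace with whatever your installation uses.

```python
# Keywords copied from the conrod settings above. Values are kept as strings
# so they are emitted exactly as written.
BASE_KEYWORDS = {
    "scratch": "yes",
    "batch": "no",
    "sdirectory": "/scratch",
    "memory": "7.8gb",
    "sscr": "400gb",
    "nrec": "128",
    "rdscale": "2.5",
}


def build_keywords(parallel, use_gpu):
    """Return the keyword string for a CPU-only or GPU-accelerated run."""
    keywords = dict(BASE_KEYWORDS)
    if use_gpu:
        # The only keyword that differs between the two runs.
        keywords["cl_frrd"] = "1"
    keywords["parallel"] = str(parallel)
    return " ".join(f"{key}={value}" for key, value in keywords.items())


def build_command(input_file, parallel=12, use_gpu=False, launcher="nastran"):
    """Assemble a full command line.

    `launcher` and `input_file` are assumptions, not taken from the post:
    substitute the command and deck name used on your system.
    """
    return f"{launcher} {input_file} {build_keywords(parallel, use_gpu)}"


if __name__ == "__main__":
    print(build_command("conrod.dat", use_gpu=False))
    print(build_command("conrod.dat", use_gpu=True))
```

Running the same deck with both command lines and comparing the time reported for FRRD1 is a simple way to check whether the model is large enough to benefit.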

Some additional comments:

Just to manage expectations properly: there are two areas where the GPU is used, in DCMP (as excerpted below) and in FRRD1 for SOL 111 jobs. DCMP performance with GPU will probably be on par with 4-way SMP (that’s a big max front size btw, which for GPU purposes is good).

You are likely to see more impressive results in FRRD1, given 10k or more modes computed.

With respect to EIGRL and FREQ1, roughly:
• Increasing EIGRL size (number of modes) increases the relative performance of the FRRD1 module with GPU as compared to CPU only.
• Increasing FREQ1 number of responses increases the total time spent in the FRRD1 module.

So to demonstrate best possible end-to-end speedup, you need:
• A large number of modes so that the FRRD1 module delivers a good improvement with GPU, and
• A large number of responses so that FRRD1 time is dominant.
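
The two conditions above are an instance of Amdahl's law: the end-to-end gain depends on how much faster FRRD1 gets and on what share of the run FRRD1 accounts for. Here is a minimal Python sketch, assuming for simplicity that only FRRD1 is accelerated and everything else takes the same time as before. The function name and the numbers in the loop are hypothetical, not measurements from NX Nastran.

```python
def end_to_end_speedup(frrd1_fraction, frrd1_speedup):
    """Amdahl's law for a job where only the FRRD1 module is accelerated.

    frrd1_fraction: share of the CPU-only wall time spent in FRRD1 (0 to 1)
    frrd1_speedup:  how many times faster FRRD1 runs with the GPU
    """
    if not 0.0 <= frrd1_fraction <= 1.0:
        raise ValueError("frrd1_fraction must be between 0 and 1")
    if frrd1_speedup <= 0:
        raise ValueError("frrd1_speedup must be positive")
    unaccelerated = 1.0 - frrd1_fraction
    accelerated = frrd1_fraction / frrd1_speedup
    return 1.0 / (unaccelerated + accelerated)


if __name__ == "__main__":
    # Hypothetical inputs: FRRD1 runs 4x faster on the GPU, and takes
    # 20%, 50% or 90% of the CPU-only run time.
    for fraction in (0.2, 0.5, 0.9):
        overall = end_to_end_speedup(fraction, 4.0)
        print(f"FRRD1 share {fraction:.0%}: overall speedup {overall:.2f}x")
```

With the same module-level gain, the overall speedup grows as the FRRD1 share grows, which is why a large number of responses matters as much as a large number of modes.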

Information on GPU acceleration can also be found in the NX Nastran 9.1 Release Guide – Chapter 1: GPU Computing.