
NX Nastran Performance Improvements

Siemens Phenom

I have been doing a little performance testing with the latest release of NX Nastran (10.2), which is included in the newly announced Femap release (version 11.2.1), and I thought I would share a few things I learned in the process.

 

I was particularly interested in new technologies like GPU processing and a new style of disk drive (PCIe SSD), which we added to my desktop machine. My desktop is a Dell T3600 with dual quad-core Xeon processors and 64 GB of RAM. It has a 1 TB hybrid drive, a 512 GB SSD, and we just added a new 256 GB PCIe SSD. My graphics card is an AMD FirePro W9100 with 16 GB.

 

NX Nastran performance improvements

 

In the general information category, NX Nastran 10.2 appears to be about 10% faster than version 10.0 for standard modal dynamics solutions. I have also been happy with the new config file (.rcf) settings that were introduced with NX Nastran 10.0. In case you had not noticed, there were some significant changes there to help users take better advantage of machines with more memory. The settings are also designed to keep users from requesting too many resources, which leads to performance issues. The new settings look like this:

buffsize=32769

memory=.45*physical

smem=20.0X

buffpool=20.0X
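
As a rough illustration of what the memory line works out to on a given machine, here is a minimal Python sketch. It assumes memory=.45*physical means 45% of installed physical RAM; the function name is hypothetical and not part of NX Nastran, and the smem and buffpool settings are left out because their scaling is not described here.

```python
def rcf_memory_gb(physical_gb: float, fraction: float = 0.45) -> float:
    """Memory requested by 'memory=.45*physical', in GB.

    Assumption: the setting is a plain fraction of installed physical RAM.
    """
    return physical_gb * fraction


# The desktop described above has 64 GB of RAM.
print(f"{rcf_memory_gb(64):.1f} GB")
```

On the 64 GB desktop this comes to about 28.8 GB, under that assumption.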

 

Observations on GPU performance: GPU computing can help reduce run time for certain problems, namely modal frequency response runs where many modes are needed. The GPU implementation so far has focused on two modules in Nastran: DCMP and FRRD. The DCMP implementation is not working very well currently; it appears to increase run time, which shows up in the time needed to calculate the modes. The FRRD implementation, however, works very well with a capable graphics card. The good news is that we can turn on only the FRRD part by using the keyword “CL_FRRD=1” on the command line instead of the advertised “gpgpu=any” method.

 

The elapsed time numbers look like this for NXN 10.2:

No GPU processing       9:06
gpgpu=any processing    9:59  (all of the increase can be found in the modes calculation)
CL_FRRD=1 processing    6:47
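
To put the GPU timings above in relative terms, the following Python sketch converts each elapsed time to seconds and compares it with the no-GPU run. It assumes the times are min:sec (as confirmed in the comments); the helper names are illustrative only.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'min:sec' elapsed time string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Elapsed times reported above for NX Nastran 10.2.
gpu_runs = {"no GPU": "9:06", "gpgpu=any": "9:59", "CL_FRRD=1": "6:47"}
baseline = to_seconds(gpu_runs["no GPU"])


def percent_change(label: str) -> float:
    """Change in elapsed time relative to the no-GPU run, in percent."""
    return (to_seconds(gpu_runs[label]) - baseline) / baseline * 100


for label in gpu_runs:
    print(f"{label:10s} {to_seconds(gpu_runs[label]):4d} s  {percent_change(label):+.1f}%")
```

For this one model, the CL_FRRD=1 run is roughly 25% shorter than the no-GPU run, while gpgpu=any is about 10% longer.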

 

PCIe SSD performance: Adding a new disk drive for Nastran scratch in a PCIe slot is very effective. It is basically plug and play, and the benefit applies to all Nastran solutions: the more I/O Nastran performs, the more it helps. The elapsed time numbers look like this for NX Nastran 10.2:

SATA        9:19
SSD/SATA    9:15
PCIe/SSD    9:06
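
For a sense of scale, this short Python sketch works out how much time each scratch drive saved relative to the SATA run in the figures above. The times are read as min:sec; the variable and function names are illustrative only.

```python
# Elapsed times (min:sec) reported above for each scratch drive.
disk_runs = {"SATA": (9, 19), "SSD/SATA": (9, 15), "PCIe/SSD": (9, 6)}

# Convert each (minutes, seconds) pair to total seconds.
disk_seconds = {name: m * 60 + s for name, (m, s) in disk_runs.items()}
sata = disk_seconds["SATA"]


def seconds_saved(name: str) -> int:
    """Seconds saved compared with the SATA scratch drive."""
    return sata - disk_seconds[name]


for name in disk_runs:
    saved = seconds_saved(name)
    print(f"{name:9s} saved {saved:2d} s ({saved / sata * 100:.1f}%)")
```

On this particular run the PCIe SSD saves 13 seconds, a little over 2% of the SATA time; a more I/O-bound job would be expected to show a larger difference.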

 

 

As with any discussion of performance, your mileage may vary depending on your combination of hardware etc.

 

Comments
Solution Partner Phenom

Dear Joe,

Very good post, congratulations!

Please clarify the meaning of the elapsed time numbers in your benchmark: does "9:19" mean 9 hours + 19 minutes, or 9 minutes + 19 seconds?

Best regards,

Blas.

Siemens Phenom

Blas,

The run times are elapsed times in min:sec, taken from the f04 file.

Legend

Thanks for posting the GPU info, I've been asking for some real life benchmark since NXN9!

Question for you: you chose cl_frrd=1, yet you have an AMD card.  The rcf file in 10 shows cl_frrd=2 for AMD...  Have you tried cl_frrd=2 to compare?

Legend

Also, to add to the I/O bandwidth discussion, we have equipped our machines with caching software (Primocache is the one we use), and managed to get reads/writes one order of magnitude faster than our original 4xSSD RAID0 config, which was already pretty darn good...

Siemens Phenom

Currently, GPU processing has two levels of control. The first is which GPU device to use. If you only have one card, specifying AMD or NVIDIA does not matter; that is only useful if you have multiple cards and want to specify which one to use for computing. By default, the first card available is used, regardless of the brand.

The second level specifies which operations to pass to the GPU. By default, both decomp and frrd1 operations are passed over. What we have found on Windows is that the decomp does not help (I believe this is Windows-specific; on Linux it does help). Using CL_FRRD=1 sends only the FRRD operations to the GPU, which can really speed up the response data recovery.

 

See the NXN 10 Release Guide, under GPU computing for the precise interpretation of the settings and defaults.

Legend

Thanks for the info, we'll test this out on NVIDIA K4000 and see what kind of benefit we get.

Experimenter

I tested GPU acceleration for DCMP on NX 10.0 using an AMD FirePro W8100 for SOL 101. Without GPU acceleration it takes 1062.220 seconds, and with GPU acceleration it takes 1184.446 seconds. Maybe for the next test I should try Linux.