
Nonlinear matrix decomp. slow and fragmented

Genius

I'm seeing some strange behavior on a model solve. I think it should be solving faster.

I'm using 6 CPUs (probably relevant). The CPU history is shown below, as is the reduction history. I've marked the points on the reduction history where the solution 'hangs'. I've got 20 GB of HD free, and 50 GB of RAM free.

 

[Unbenannt43.png, Unbenannt44.png: CPU history and reduction history]

 

It's really not a big model, only 73k nodes, but does have block connectivity between the parts with CGAP elements, and two very large RBE3 elements.

 

[Unbenannt66.png]

 

9 REPLIES

Re: Nonlinear matrix decomp. slow and fragmented

PLM World Member Legend
Large RBE elements can really "do a number" on an analysis. They will bog it down something awful. I had a similar sized model one time with a large-ish RBE3 element and the analysis was so slow that I replaced it with a non-structural mass to make it work.

Re: Nonlinear matrix decomp. slow and fragmented

Genius

The same model runs cleanly in linear static analysis, though. Not exactly the same, actually, but with the same RBE3 clusters. I'll try some other approach tomorrow.

 

*Update: It runs the same way in linear analysis, so matrix density due to the RBE3s is almost definitely the problem. I'm still surprised that the solver stops being parallel, but that could be because there is so much coupling that there is no inherent parallelizability left when the factorization works through those RBE3s.

Re: Nonlinear matrix decomp. slow and fragmented

Phenom
The reason this occurs is that the sparse solver is most efficient with a narrow bandwidth. The easiest way to think about this is "how connected is the structure?". If the model is all plates and beams, then each node is only directly connected to maybe 4 to 8 other nodes. For solid meshes, each node may be connected to e.g. 8 to 20+ nodes (a higher average for tet than for brick, usually), which is why solid models are usually slower to solve with the default sparse solver than similar-sized plate/beam models.

When you use large RBE elements, you are directly connecting hundreds or thousands of nodes, which creates a huge lump of big bandwidth / front size in the stiffness matrix. The sparse solver (probably any solver) is much slower at working through this lump, which is why your graph crawls through that slow/steep section.

As Chris says... you could consider an alternative. If you are using the RBE to distribute load, you could consider applying the load (divided by the number of nodes) to all the RBE nodes. Not so easy if the load induces a moment as well. But it is worth thinking about alternatives to RBE if you want the fastest runtime.
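To put a rough number on that lump, here is a small SciPy sketch (a toy matrix, not your model; the 5000 DOF, the 500 coupled DOF and the coefficient values are made-up illustration numbers). It compares the sparse LU factor fill for a narrow-band matrix against the same matrix with one RBE-like row and column that couple hundreds of DOF:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 5000  # toy DOF count, nothing to do with the actual model

    # Narrow-band "plate/beam-like" matrix: each DOF only talks to a few neighbours.
    diagonals = [4.0 * np.ones(n), -1.0 * np.ones(n - 1), -1.0 * np.ones(n - 1),
                 -0.5 * np.ones(n - 3), -0.5 * np.ones(n - 3)]
    K = sp.diags(diagonals, [0, 1, -1, 3, -3], format="lil")

    # "RBE-like" coupling: one reference DOF tied to 500 scattered DOF,
    # i.e. one dense row/column dropped into an otherwise narrow-band matrix.
    rng = np.random.default_rng(0)
    coupled = rng.choice(np.arange(1, n), size=500, replace=False)
    K_rbe = K.copy()
    for j in coupled:
        K_rbe[0, j] = -0.001
        K_rbe[j, 0] = -0.001

    for label, A in [("banded only", K), ("with RBE-like coupling", K_rbe)]:
        lu = spla.splu(A.tocsc())      # sparse LU decomposition (SuperLU)
        fill = lu.L.nnz + lu.U.nnz     # nonzeros in the factors ~ decomposition work
        print(f"{label:24s} factor nonzeros: {fill}")

The nonzero count of L and U is only a crude proxy for decomposition effort, but it shows how a single heavily coupled DOF grows the factors.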

Re: Nonlinear matrix decomp. slow and fragmented

Genius

Good points. But it isn't bandwidth, it's matrix density that kills sparse matrix solvers. Your points still stand.

Re: Nonlinear matrix decomp. slow and fragmented

Siemens Phenom

Have you checked the f04 file to ensure you have allocated enough RAM?

 

Also, please remember that RBE elements are linear only and that gaps assume no relative motion in the shear direction; both of these can lead to issues if you are considering large displacements.

 

Another thing that can hurt performance in nonlinear: if your model is mostly solids, did you constrain the 456 rotations? Be aware that AUTOSPC is off for nonlinear. Did you check the grid point singularity table of the linear run to see what was being constrained there?

 

Regards,

 

Joe

Re: Nonlinear matrix decomp. slow and fragmented

Genius

I'm doing small-displacement stuff, but I'm surprised to hear that the coupling code is linear-only. Even the edge-solid weld connection? So such a connection couldn't be rotated 90° in an NLA, for example?

Re: Nonlinear matrix decomp. slow and fragmented

Siemens Phenom

RBE2 and RBE3 elements are small displacement only by default. The equations are written based on undeformed geometry and are not updated as the model deforms.

The exception would be if you specified "thermal expansion rigids" to trigger Femap to write the case control command "RIGID=LAGRAN". This enables thermal expansion, large displacement and differential stiffness.

 

Glue connections are not a problem in nonlinear.

 

 

Also, just to reinforce, the f04 file contains the information required to understand any performance issues; you could post it if you would like some opinions on possible performance improvements.

 

Joe

Re: Nonlinear matrix decomp. slow and fragmented

Phenom
I'm always happy to learn something new, so my formal terminology may not be 100% for this forum.
I found this link useful...
http://mae.uta.edu/~lawrence/me5310/course_materials/me5310_notes/4_Computational_Aspects/4-3_Matrix...
Big RBEs add both density and bandwidth, but I am not sure whether the bandwidth, rather than the density, contributes any further penalty to the solve time. Note that sparse solvers do internally renumber to minimize matrix bandwidth.

But one other thing to keep in mind is that if there is any moderate rotation in your RBEs in SOL106, the analysis will bisect to the smallest allowable load step every time (noting that RBEs are small-displacement elements). If you want to avoid excessive bisection, you can replace the RBEs with beams of appropriate stiffness... quite high for RBE2, quite low for RBE3 (but don't choose either too stiff or too flexible). Custom Tools | Element Update | Convert RBE to active Beam can be used to replace a big RBE with multiple beams. That's if you can't otherwise replace your current RBEs simply by distributing the load to multiple nodes.

Re: Nonlinear matrix decomp. slow and fragmented

Genius

EndZ wrote:
 Note that sparse solvers do internally renumber to minimize matrix bandwidth.

Not really. The one reordering I know of for sparse solvers is AMD (approximate minimum degree), which doesn't minimize bandwidth at all; usually quite the opposite. What it (heuristically) minimizes is the number of reduction operations and the storage required for the factors.

https://en.wikipedia.org/wiki/Minimum_degree_algorithm
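If anyone wants to see what such an ordering buys, here is a quick SciPy sketch (toy matrix, all values made up): SuperLU's permc_spec option switches between keeping the natural numbering and a COLAMD column ordering (an approximate-minimum-degree-style heuristic), and the difference shows up as fill in the factors rather than as bandwidth:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 3000  # toy size
    rng = np.random.default_rng(1)

    A = sp.lil_matrix((n, n))
    A.setdiag(10.0)
    # A few random long-range couplings per row, so the natural numbering is poor.
    for i in range(n):
        for j in rng.choice(n, size=3, replace=False):
            A[i, j] = A[i, j] - 0.1
            A[j, i] = A[j, i] - 0.1
    A = A.tocsc()

    for spec in ("NATURAL", "COLAMD"):
        lu = spla.splu(A, permc_spec=spec)   # column ordering heuristic
        print(f"{spec:8s} factor nonzeros: {lu.L.nnz + lu.U.nnz}")

Both orderings give the same solution (up to round-off); only the work and storage for the factorization change.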