
Distributed experiments in 12.1

Legend

Hello,

I'm having problems running distributed simulations with 12.1. When the Experiment Manager closes the remote applications, they crash just as the dialog says "Closing model". I'm not sure how to reproduce the crashes, though.


Re: Distributed experiments in 12.1

Siemens Phenom

Hello Verbalins,

at the end of an experiment study the controlling Experiment Manager shows the following message:

The experiment run is finished.

The running time was ...

At this point the controlling Experiment Manager has already read all results. After you confirm this message, the Experiment Manager terminates all remote Plant Simulation processes.

If you want to reproduce the results of a single observation of an experiment, you can set the corresponding parameter in the Advanced Settings dialog (tab Validation).

When a remote Plant Simulation process terminates unexpectedly, the controlling Experiment Manager detects that a remote process is missing. It then reopens the simulation job and tries to start a new process on the remote computer.

Regards,

Peter

Re: Distributed experiments in 12.1

Legend

Thank you for your reply!

The program crashes without exiting and requires user input to close the window. This leaves instances of Plant Simulation open, which eventually exhausts the RAM on the machine.

I have not seen it restart a distributed simulation; usually it just fails and returns an empty result in the ExperimentManager.

When running distributed experiments on several machines, this creates an uncontrollable situation. See the attached screenshot. If you dismiss the crash dialog with "close the program", Plant Simulation stops and displays a screen like the second screenshot, but if you close the error dialog with the x, it exits.

This is only an issue in 12.1, and it affects both remote hosts and distributed simulations run on localhost.

Re: Distributed experiments in 12.1

Siemens Phenom

Hello verbalins,

the second screenshot shows that some method called closeModel.

The model for a distributed simulation must not contain this instruction.

Please search the source code for the word closeModel.
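One way to run such a search outside the tool is sketched below in Python. It assumes the method source code has already been exported to plain-text files in a folder; the export step, the `*.txt` pattern, and the helper name `find_keyword` are illustrative assumptions and are not part of Plant Simulation. The match is case-insensitive so that spellings such as `CloseModel` are also caught.

```python
from pathlib import Path


def find_keyword(root, keyword="closeModel", pattern="*.txt"):
    """Return (path, line number, stripped line) for each case-insensitive match under root."""
    needle = keyword.lower()
    hits = []
    # Walk the folder tree; the file pattern is an assumption about the export format.
    for path in sorted(Path(root).rglob(pattern)):
        # errors="replace" keeps the scan going if a file has an unexpected encoding.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_keyword("exported_methods")` on a hypothetical export folder returns an empty list when no file mentions the keyword.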

 

On the other hand, there are too many Plant Simulation processes.

Clearly, this quickly overloads your computer.

Therefore I recommend using the latest version of the Experiment Manager.

Open the Advanced Settings dialog, tab Distribution. Press the button Protocol and look at the Cardfiles in column 8 (to open a Cardfile, select the corresponding cell of the MachinesTable and press F2).

All actions on the remote computers are recorded there.

Regards,

Peter

Re: Distributed experiments in 12.1

Legend

Thank you for your reply, Peter.

I have searched through the source code for references to closeModel, but I can't find any in any of my models.

This issue affects several computers with different models, all with ExperimentManager version 12.0.3.1.

Yes, there are many processes, but that isn't my doing; they are left over because they do not exit correctly when running distributed experiments.

When the ExperimentManager runs the method EndOfAllExperiments and reaches line 128, which calls the method TerminateAllRemoteeMPlants, the remote clients crash instead of quitting. So the experiments are finished, but the remote clients can't be closed correctly through DCOM. I can't see why, since TerminateAllRemoteeMPlants can't be debugged. The logging in Protocol shows no problems, but as you can see from the first screenshot in my last post, the processes aren't killed. After several experiments, crashed Plant Simulation instances are still present and taking up resources, and I have to go into Task Manager and kill the Plant Simulation processes manually.
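As a stopgap, the manual Task Manager cleanup described above could be scripted. The Python sketch below only wraps the standard Windows `taskkill` command; the executable image name is not given in this thread, so it must be looked up in Task Manager and passed in (the value `PlantSimulation.exe` in the usage note is a hypothetical placeholder). Note that this terminates every process with that image name, including any interactive session that is still needed.

```python
import subprocess
import sys


def build_kill_command(image_name):
    """Build a Windows taskkill command that force-terminates all processes with this image name."""
    return ["taskkill", "/F", "/IM", image_name]


def kill_leftover_processes(image_name):
    """Run taskkill on Windows and return its exit code; return None on other platforms."""
    if not sys.platform.startswith("win"):
        return None
    # check=False: a non-zero exit code (e.g. no matching process) is reported, not raised.
    return subprocess.run(build_kill_command(image_name), check=False).returncode
```

For example, `kill_leftover_processes("PlantSimulation.exe")` would be run on each machine after a study, once the real image name has been confirmed.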

Re: Distributed experiments in 12.1

Legend

After further testing I noticed something strange. When running distributed experiments on localhost, the Experiment Manager would record that a timeout had occurred and that the experiment had finished, and it would show the report (filled with zeroes). But the remote processes would continue to run.

We have now started reverting to Experiment Manager 12.0.2.

This version of the library works for us: the crashes still happen, but the results are delivered back to the Experiment Manager. I would recommend that everyone else with the same issue stop using 12.0.3.1 and go back to the earlier version found in Plant Simulation 12.0.2.

Re: Distributed experiments in 12.1

Siemens Phenom

Hello verbalins,

the analytical capabilities of the Experiment Manager (EM) were improved for Plant Simulation 11 and 12. For a distributed simulation the model must satisfy some conditions: for each parameterization the simulation must terminate, and the debugger must never open. Otherwise the EM has to handle errors during the simulation. These conditions are not always satisfied.

The current version of the EM detects infinite, non-terminating simulation runs. For this purpose it uses the time value Terminate after (tab Distribution of the Advanced Settings dialog), which is the maximum CPU time for a simulation run. If a simulation run needs more time, a timeout is recognized and the simulation is stopped. This is recorded in column 8 of the table that is opened by the button Protocol on the tab Distribution. The CPU time needed for each simulation run can be seen in the JobTable, which is opened by the context menu Show Distributed Simulation. A simulation run with a timeout is also marked as 'ready' in the JobTable. Of course, the corresponding observation of the output value is missing. The GAWizard, which is also equipped with a Distributed Simulation, evaluates such an individual with a so-called Penalty Value. The remote process continues to run while there are further jobs.
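The timeout rule described above can be illustrated with a small Python sketch. This is not the Experiment Manager's actual implementation; the names `JobResult`, `close_job`, and `fitness` are invented for illustration. It only mirrors the stated behaviour: a run that needs more CPU time than the limit is still marked 'ready', its observation is missing, and a GA-style evaluation substitutes a penalty value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobResult:
    status: str                   # 'ready' in both cases, as described above
    timed_out: bool
    observation: Optional[float]  # None when the run hit the CPU-time limit


def close_job(cpu_seconds, output, terminate_after):
    """Classify one simulation run against the CPU-time limit."""
    if cpu_seconds > terminate_after:
        # Timeout: the job is closed, but no output value is recorded.
        return JobResult("ready", True, None)
    return JobResult("ready", False, output)


def fitness(result, penalty_value):
    """GA-style evaluation: a timed-out individual receives the penalty value."""
    return penalty_value if result.timed_out else result.observation
```

This also explains a report with missing values after a timeout: the job counts as finished even though no observation was delivered.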

 

The method TerminateAllRemoteeMPlants is a built-in method. When the controlling EM calls it, all remote Plant Simulation processes started by that EM are closed. Other processes of the same version are not closed.

If the remote processes of an EM are not closed, the cause may be a DCOM problem. I do not expect an error in the method TerminateAllRemoteeMPlants.

Regards,

Peter

Re: Distributed experiments in 12.1

Legend

Hello again,

I've updated to Experiment Manager 12.0.4 and Distributed Simulation 12.0.4, but we are still getting the same issue: models crash on completion when running distributed experiments.

Re: Distributed experiments in 12.1

Siemens Phenom

Hello verbalins,

it seems that this problem occurs when the model has a 3D part. Is that the case for you?

For a distributed simulation study it is a good idea to save the model without 3D. The study will also complete more quickly.

Regards,

Peter


Re: Distributed experiments in 12.1

Legend

Hello Peter!

In the models we have tested, 3D should not be activated. When I press the Switch between 2D/3D button, it says that the 3D model will be created from scratch. Is there a way to verify that the 3D components are switched off?