MCL V3: How can I speed up the data ingestion?

Siemens Creator

I built an agent based on MCL, running on Raspbian (Raspberry Pi 3 B+).

My experience is that the time to process a request (transferring a data store with timeseries) is proportional to the size of the payload, i.e. to how much timeseries data the payload contains.

These MCL calls are obviously synchronous/blocking: the call only returns after the request has been processed in MindSphere. That means you have to wait (until all plausibility checks are done by MindSphere?). The bottleneck is the response time of MCL/MindSphere. The agent / RPi has enough CPU power for much more ...

My tests showed a maximum of approx. 8-10 datapoints per second.

This means the ingest rate into MindSphere via MCL is very poor.

Any suggestions to improve this behaviour?

15 REPLIES

Re: MCL V3: How can I speed up the data ingestion?

Valued Contributor

I am really pretty sure that the MindConnect API returns 200 for well-formed data (there are no checks on the API side whether the data is semantically correct; this is actually the reason why we have the diagnostic endpoint).

 

Are you sure that it's not because of your bandwidth to the internet? Because even with synchronous operations that can be a major issue.

 

 

Re: MCL V3: How can I speed up the data ingestion?

Siemens Creator

Colleagues at Atos in India report the same ingest rate. The Siemens wifi at the office is pretty fast. As far as I can see, the plausibility checks of the data on the MindSphere services take too much time. There is a clear linear correlation between the number of data points handed over in the data store to MindSphere and the response time.

Re: MCL V3: How can I speed up the data ingestion?

Valued Contributor

How many entries per message and how many different data points are you using? 

 

I was playing around a little bit with the Node.js library, where all operations are asynchronous: you can achieve quite a speedup by not waiting for the previous operation to finish. However, this can cause considerable load, and it will also affect your resource consumption in MindSphere, as it will most probably go over the limits (not to mention that there are contractual limits in place as well).

With the C library I would guess threads/forks, but because the agents need to keep state (due to the token and client secret rotation) you would probably have to implement external locks.

 

// the code in js looks something like this:

let promises = [];
const parallel = 5;
for (let index = 0; index < iterations; index++) {
  promises.push(agent.BulkPostData(data));
  if (promises.length % parallel === 0) {
    await Promise.all(promises);
    promises = [];
  }
}
await Promise.all(promises);

 

 

 

 

Re: MCL V3: How can I speed up the data ingestion?

Valued Contributor

Hello @HorstRieger ,

 

I guess you are doing it this way:

[Buffer data] --> [Send data] --> [Buffer data] --> [Send data]

The operation cycle above is not good practice.

Many things can cause delays, which limits the Buffer data frequency.

The recommended minimum period between two Send data calls is 10 s, which also limits the Buffer data frequency.

 

You should try this one:

[Buffer data] --> ... --> [Buffer data] --> [Send data] --> [Buffer data] --> ... --> [Buffer data] --> [Send data]

You can implement the cycle above with one store and one thread, but then you will not be able to Buffer data while Send data is in progress.

Using two stores and two threads (one for Buffer, one for Send) would be best practice.

 

Please check the attachment for a real operation cycle log where:

  • Buffer[1] means Buffer data to store 1
  • Buffer[2] means Buffer data to store 2
  • Send[1] means Send data from store 1
  • Send[2] means Send data from store 2
  • OK means Send data was successful.

Re: MCL V3: How can I speed up the data ingestion?

Siemens Creator

Buffering is not the problem. My agent is divided into two functional parts.

 

1. Data acquisition and buffering

Done by Python scripts that access the sensors or any other data source. The data is buffered in a text file on a ram disk, for maximum performance and to avoid using the SD card as disk memory.

 

2. Communication with MindSphere

The agent checks every 10 s for the input file on the ram disk and reads all data contained there. This data is transferred into the MCL data store, which is then handed over to MindSphere. During this time the Python scripts for data acquisition and for writing to the buffer file keep running in parallel in the background.

 

The only problem is that the call MCL uses to hand over the data takes a long time to return, and this time is proportional to the number of data points / timeseries entries contained in the data store.

At a maximum transfer speed of around 10 datapoints per second, you can only ingest approx. 100 datapoints per 10 s.

That is the bottleneck I see.

 

Re: MCL V3: How can I speed up the data ingestion?

Valued Contributor

I did a quick test on my setup (Windows laptop).

100 datapoints take around 500 ms and 20,000 datapoints take less than 3 seconds.

These times may vary for several reasons, but 10 s for 100 datapoints is far too much.

Can you please check MCL's log level in your setup? I guess it is Debug.

If you redirect Debug logs to a file, this may cause some delay on the Raspberry Pi.

Re: MCL V3: How can I speed up the data ingestion?

Siemens Creator

I set the debug_level from 4 (debug) to 255 (none).

Things are the same.

 

Do I have to recompile everything with a different debug settings option?

Re: MCL V3: How can I speed up the data ingestion?

Valued Contributor

This function call should also work:

mcl_log_util_set_output_level(MCL_LOG_UTIL_LEVEL_WARN);

Re: MCL V3: How can I speed up the data ingestion?

Valued Contributor

I attached the complete log file from my Raspberry Pi 3.

Each Buffer[1] collects 5 datapoints, more than 1,500 datapoints in total before Send[1].

You can observe that the time between Send[1] and OK is around 1 s.