Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Slamtec LIDAR Connection Issues with MATLAB #2

Open
ismet55555 opened this issue May 19, 2020 · 12 comments
Open

Multiple Slamtec LIDAR Connection Issues with MATLAB #2

ismet55555 opened this issue May 19, 2020 · 12 comments

Comments

@ismet55555
Copy link

ismet55555 commented May 19, 2020

I'm running into some initial LIDAR connection issue with simultaneously connecting 4 Slamtec RPLIDAR A3 using this MATALB interface llibrary.

The issue is that I am having to retry the connection on at least one of the LIDARs before it connects.
And it can also vary with LIDAR that is. That is, all but one LIDAR connects the first time.
One time, it could be LIDAR on one COM port, another time it could be a LIDAR on another COM port.

This is the way it is set up right now.
Basically MATALB loads the provided interface library, hardwarex.dll. That exposes some library methods to be used by MATLAB.

The method to connect the LIDAR does the following:

  1. Opens the RS232 port
  2. Sets port options
  3. Gets some info and health statuses form lidar
  4. Sets the motor PWM to zero (stop lidar motor)
  5. Uses express scan mode option

Here somewhere the communication will error out.
Using a serial sniffer I was able to see that the LIDAR errors out after the following message to the LIDAR:

a5 f0 02   ff 03 ab   a5 25   a5 82 05 00 00 00 00 00 22

Which I tracked to the following library methods, in that order

SetMotorPWMRequestRPLIDAR()
CheckMotorControlSupportRequestRPLIDAR()
StopRequestRPLIDAR()
StartExpressScanRequestRPLIDAR()   <-- Error here

To which the LIDAR responds with:

a5 5a 54 00 00 40 82 

Where as a successfully connection response from the LIDAR much longer in content.


Things I've tried

  • Drain (force all write data) the write buffer with the interface libraries DrainComputerRS232Port() method before and/or after any write to lidar.
  • Setting the TX/Write OS FIFO buffer to FILE_FLAG_NO_BUFFERING (ie. WriteFile()).
  • Changing the Hardware FIFO buffer form max (16) to min (1).
  • Using MATLAB's serial() command to flush any input or output buffers prior to loading the library or trying the connections.

This is the system and settings I am working with

Lidar (x4):

  • Slamtec RPLIDAR A3
  • Firmware 1.26
  • Connected via USB (no USB hub used)
  • No other COM port devices connected

Computer

  • OS: Windows 10 Pro - Build 1903
  • CPU: Intel Xeon 3.00Ghz
  • RAM: 64 GB
  • HD: SSD - 512GB NVMe

Serial Port Settings

  • Boud Rate: 256000
  • Timeout: 1000

Software

  • MATLAB R2018b (9.5.0)
@lebarsfa
Copy link
Member

lebarsfa commented May 20, 2020

Thank you for your detailed report. Here are some info, ideas, suggestions, questions:

  • I have not tested the RPLIDAR A3 so it is possible that something is not correctly done in the communication protocol, I would have to check more in details.
  • The code has not been designed to be able to use multiple devices of the same type in the same program, though I would say it should work for the RPLIDAR as long as you do not use the RPLIDAR MATLAB functions with "Thread" in their name (this is why they are just in comments in test_rplidar.m)...
  • Maybe you already do this but in case you do otherwise, I would recommend to copy the files of the repo in 4 folders, launch 4 separate instances of MATLAB and each instance would run its own version of test_rplidar.m. This would be a workaround to isolate as much as we can the code in case it does not behave well with multiple devices in the same program. This would also naturally add some delay between the launch of each device in case starting them exactly in the same time causes problems (e.g. power spikes). So maybe try to wait 5 s before launching each program...
  • I wonder if the RPLIDAR A3 tries to get some power from the USB port, even if it has its own external power supply. Having multiple power-consuming devices on the USB bus often cause power supply problems : the PC might not be really able to provide enough power to all the USB ports in the same time (sometimes even if theoretically we would expect it should). If you have USB hubs with external power supply, it might be worth to try to connect the RPLIDAR through them to the computer, it case there is some kind of power supply issue.
  • Could you confirm that everything works with 3 devices?
  • What did you mean by

Where as a successfully connection response from the LIDAR much longer in content

?

@ismet55555
Copy link
Author

ismet55555 commented May 21, 2020

Hey, first of all thank you very much for your quick and thorough response!
Also, this entire repo has been amazing help for us here.

I just saw your response, however I think we have gotten it to work and it may be something to consider and change.

To answer your questions first

  • The less lidars, the less issues, however the connection issue would happen here and there
  • What I meant was, the response from the lidar seemed to be a full payload and not a partial one that would fail the initial connection

Maybe it would be ok to keep this up, so others can reference it if they come to the same issue.

As I mentioned when debugging, the error seemed to happen ConnectRPLIDAR() --> StartExpressScanRequestRPLIDAR(), then specifically here:

	// Receive the first data response (2 data responses needed for angles computation...).
	memset(pRPLIDAR->esdata_prev, 0, sizeof(pRPLIDAR->esdata_prev));
	if (ReadAllRS232Port(&pRPLIDAR->RS232Port, pRPLIDAR->esdata_prev, sizeof(pRPLIDAR->esdata_prev)) != EXIT_SUCCESS)
	{
		// Failure
		printf("A RPLIDAR is not responding correctly. \n");
		return EXIT_FAILURE;
	}

What seemed to have happened before that is after the command being send out in WriteAllRS232Port(), sometimes it would not read a response in the ReadAllRS232Port(), esdata_prev would be nothing.

We tried implementing a mSleep(500) delay before that second ReadAllRS232Port(), and it seemed to help (my guess that the lidar was slow to respond), but the issue did not get resolved fully.

The following is what made it work every time with 4 lidars:

inline int StartExpressScanRequestRPLIDAR(RPLIDAR* pRPLIDAR)
{
	unsigned char reqbuf[] = { START_FLAG1_RPLIDAR,EXPRESS_SCAN_REQUEST_RPLIDAR,0x05,0,0,0,0,0,0x22 };
	unsigned char descbuf[7];
	unsigned char sync = 0;
	unsigned char ChkSum = 0;

	// Send request to output/tx OS FIFO buffer for port
	if (WriteAllRS232Port(&pRPLIDAR->RS232Port, reqbuf, sizeof(reqbuf)) != EXIT_SUCCESS)
	{
		printf("Error writing data to a RPLIDAR. \n");
		return EXIT_FAILURE;
	}


	// Receive the response descriptor.
	memset(descbuf, 0, sizeof(descbuf));  // Alocate memory
	if (ReadAllRS232Port(&pRPLIDAR->RS232Port, descbuf, sizeof(descbuf)) != EXIT_SUCCESS)
	{
		printf("A RPLIDAR is not responding correctly. \n");
		return EXIT_FAILURE;
	}


	// Quick check of the response descriptor.
	if ((descbuf[2] != 0x54) || (descbuf[5] != 0x40) || (descbuf[6] != MEASUREMENT_CAPSULED_RESPONSE_RPLIDAR))
	{
		printf("A RPLIDAR is not responding correctly. \n");
		return EXIT_FAILURE;
	}


	// Keep anticipating a port read buffer for 1 second
	int timeout = 1500;

	// Check it every 5 ms
	// Note on Checking Period Value:
	//    Waiting on 82 bytes in lidar payload
	//    10 bits per byte for the serial communication
	//    820 bits / 256000 baud = 0.0032s = 3.2ms
	int checkingperiod = 5;

	RS232PORT* pRS232Port = &pRPLIDAR->RS232Port;
	int i;
	int count = 0;

	// Wait for something to show up on the input buffer on port
	if (!WaitForRS232Port(&pRPLIDAR->RS232Port, timeout, checkingperiod))
	{
		//Success - Something is there

		// If anything is on the input buffer, wait until there is enough
		count = 0;
		for (i = 0; i < 50; i++)
		{
			// Check the input FIFO buffer on the port
			GetFIFOComputerRS232Port(pRS232Port->hDev, &count);

			// Check if there is enough to get a full payload read
			if (count >= sizeof(pRPLIDAR->esdata_prev))
			{
				// Thre is enough, stop waiting
				break;
			}
			else
			{
				// Not enough, wait a little
				mSleep(checkingperiod);
			}
		}

	}
	else
	{
		//Failure - After waiting for an input buffer, it wasn't there
		printf("[StartExpressScanRequestRPLIDAR] : Failed to detect response on the input FIFO buffer. \n");
		return EXIT_FAILURE;
	}


	// Receive the first data response (2 data responses needed for angles computation...).
	memset(pRPLIDAR->esdata_prev, 0, sizeof(pRPLIDAR->esdata_prev));
	if (ReadAllRS232Port(&pRPLIDAR->RS232Port, pRPLIDAR->esdata_prev, sizeof(pRPLIDAR->esdata_prev)) != EXIT_SUCCESS)
	{
		// Failure
		printf("A RPLIDAR is not responding correctly. \n");
		return EXIT_FAILURE;
	}


	// Analyze the first data response.
	sync = (pRPLIDAR->esdata_prev[0] & 0xF0)|(pRPLIDAR->esdata_prev[1]>>4);
	if (sync != START_FLAG1_RPLIDAR)
	{ 
		printf("A RPLIDAR is not responding correctly : Bad sync1 or sync2. \n");
		return EXIT_FAILURE;	
	}

	ChkSum = (pRPLIDAR->esdata_prev[1]<<4)|(pRPLIDAR->esdata_prev[0] & 0x0F);
	// Force ComputeChecksumRPLIDAR() to compute until the last byte...
	if (ChkSum != ComputeChecksumRPLIDAR(pRPLIDAR->esdata_prev+2, sizeof(pRPLIDAR->esdata_prev)-1))
	{ 
		printf("A RPLIDAR is not responding correctly : Bad ChkSum. \n");
		return EXIT_FAILURE;	
	}

	return EXIT_SUCCESS;
}

So in the above code, we are waiting for the OS read FIFO buffer to show something within 1.5s, checking every 5ms (WaitForRS232Port()). If anything shows up, makes sure to wait to have enough, the size of the payload (GetFIFOComputerRS232Port()).

I'm not sure if it made a difference but we also removed the OS write FIFO buffer by changing it from 0 to FILE_FLAG_NO_BUFFERING:

File: OSComputerRS232Port.h
...
	hDev = CreateFile( 
		tstr,
		GENERIC_READ|GENERIC_WRITE,
		0, // Must be opened with exclusive-access.
		NULL, // No security attributes.
		OPEN_EXISTING, // Must use OPEN_EXISTING.
		FILE_FLAG_NO_BUFFERING, // Not overlapped I/O. Should use FILE_FLAG_WRITE_THROUGH and maybe also FILE_FLAG_NO_BUFFERING?
		NULL // hTemplate must be NULL for comm devices.
		);
...

I'm definitely not a C++ ninja, but this seems to consistently work. If you have any suggestions, I would love to hear them!

@lebarsfa
Copy link
Member

lebarsfa commented May 21, 2020

Thank you very much for sharing your fix!

So it seems to be a timeout problem...

In the configuration file RPLIDAR0.txt, there is a timeout parameter which is 1000 ms by default (you mentioned in your first message that your serial timeout was also 1000). From your fix, I would say you expect the problematic read operation might take up to 1500 ms to get the first byte, then we might have to wait up to 50x5 ms for the rest of the bytes, but in practice the mSleep() function relies on the Windows API Sleep() function which is typically with a resolution of around 16 ms (see e.g. https://stackoverflow.com/questions/9518106/winapi-sleep-function-call-sleeps-for-longer-than-expected) so it is probably up to 50x16=800 ms.

From that, I wonder if just changing the timeout parameter in RPLIDAR0.txt to something like 2500 could also solve the problem without changing anything in the C++ code. If this does not work, I would highly expect that adding WaitForRS232Port(&pRPLIDAR->RS232Port, 1500, 5) (or even just mSleep(1500), or with a bit more than 1500) alone is the real fix (the ReadAllRS232Port() is supposed to already do something quite equivalent to the loop with GetFIFOComputerRS232Port() assuming the RPLIDAR0.txt timeout is 1000)...

@ismet55555
Copy link
Author

I see what you are staying.
Currently we are using one configuration file for each LIDAR, and each of which will set the port timeout parameter to 1000ms. I did try to mess with it and set it to 1250ms, and I didn't see much difference, but now to think of it, maybe it was not enouch.

Another thing to point out that makes these delays tricky with 4 lidars is the port os buffer overflow.

So let's say you have the first LIDAR trying to connect with your MATLAB script. The library does its thing and connects and everything in about 1.8s. Now the second LIDAR tries to do the same thing within the same amount of time. In the meantime, no data has been read for the first LIDAR (no data consumed in MATLAB), potentially overflowing the FIFO buffer for that first LIDAR. And if you do this for all 4 LIDARS, those delays matter. So we tried to keep delays to a minimum specifically during the connection process.

Our workaround was to purge all connected ports for all LIDARs right after connection and everywhere in MATLAB before all 4 LIDARS are connected and ready and ready to receive real data.

I hope that makes sense.

@lebarsfa
Copy link
Member

lebarsfa commented May 21, 2020

OK, I understand the additional problem with the serial buffer overflow.

An alternate MATLAB-only workaround could be to call the ConnectXXX() functions in a parfor loop (assuming that each device takes a similar time to start, or it could be the whole test_rplidar.m code modified to save data in a file and have a fifth parallel job that reads the latest lines of the files to do whatever we want with the data of each device), but this would need Parallel Computing Toolbox (without it parfor will only run as a normal for loop in reverse order, the maximum number of jobs might be also limited to the number of cores of the CPU, see https://fr.mathworks.com/help/parallel-computing/parallel-preferences.html), here is a sample of a parfor loop:

function test
    function parallel_job(x)
        pause(x)
        disp(x)
    end

    Funcs = {@(x) parallel_job(x), @(x) parallel_job(x+1), @(x) parallel_job(x+2), @(x) parallel_job(x+3)}; 
    x = 1:4; %N number of different inputs
    parfor j = 1:4 %Have each worker process a function in parallel
        Funcs{j}(x(j));
    end
end

But the best would be that I update the C++ code so that independent threads are created internally for each device...

@ismet55555
Copy link
Author

Yeah, that's a good idea. Thank you for the suggestion and the code snippet.

We have actually thought about using the parfor loop a while back, which right now seems like a good idea to make sure the buffer doesn't overrun.

What I was thinking is to just use the parfor loop for the connection process, then merge back to the main process for the rest of the program. By the way, for the rest of the program for the process to be separate is problematic the way we have it set up.

@ismet55555
Copy link
Author

Hey just by the way, something that really helped us out was to seperate the connection of the lidar from the scan request.

That is, remove the following section and make it its own endpoint to be called by MATLAB as (ie. HARDWAREX_API int StartExpressScanRequestRPLIDARx(RPLIDAR* pRPLIDAR);)

	// Start scan request (Takes the longest typically)
	memset(pRPLIDAR->esdata_prev, 0, sizeof(pRPLIDAR->esdata_prev));
	switch (pRPLIDAR->ScanMode)
	{
	case SCAN_MODE_RPLIDAR:
		if (StartScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
		{
			printf("Unable to connect to a RPLIDAR : SCAN failure.\n");
			SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);
			CloseRS232Port(&pRPLIDAR->RS232Port);
			return EXIT_FAILURE;
		}
		break;
	case EXPRESS_SCAN_MODE_RPLIDAR:
		if (StartExpressScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
		{
			printf("Unable to connect to a RPLIDAR : EXPRESS_SCAN failure.\n");
			SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);  // Stop Lidar
			CloseRS232Port(&pRPLIDAR->RS232Port);    // Close the port
			return EXIT_FAILURE;
		}
		break;
	case FORCE_SCAN_MODE_RPLIDAR:
		if (StartForceScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
		{
			printf("Unable to connect to a RPLIDAR : FORCE_SCAN failure.\n");
			SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);
			CloseRS232Port(&pRPLIDAR->RS232Port);
			return EXIT_FAILURE;
		}
		break;
	default:
		if (StartScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
		{
			printf("Unable to connect to a RPLIDAR : SCAN failure.\n");
			SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);
			CloseRS232Port(&pRPLIDAR->RS232Port);
			return EXIT_FAILURE;
		}
		break;
	}

This way, when using multiple LIDARS, you can connect all LIDARS using ConnectRPLIDAR, set everything up, then request the scan with StartExpressScanRequestRPLIDAR, and quickly jump into DAQ.

As opposed to initially, the remaining LIDARS have to wait for the connection (~0.6s) and scan request (~1s). This way the remaining LIDARS only accumulate the read FIFO for ~1s.

@ismet55555
Copy link
Author

Question. Have you ever encountered the issue where a LIDAR needs to be unplugged and plugged back into the USB port? Where it will not even bind to the port when trying to connect.

Further, if you are using the simple frame_grabber.exe provided by SLAMTEC, the message of Cannot bind to port appears for the locked up LIDAR.

Resetting the port or even restarting the computer, does not get it out of its locked state until the LIDAR is unplugged and plugged back in. What do you think the issue there is and what would you suggest to do about that?

@lebarsfa
Copy link
Member

lebarsfa commented Jun 5, 2020

OK, thank you for the suggestion, I will consider integrating it next time I update this.

I have not used personally a RPLIDAR since quite a long time, maybe I got also that error message but did not try to understand why since this can probably happen for a vast number of reasons. From what you describe, I would say it could be the device itself that has its firmware crashing, where only cutting the power of its USB port would reset the crashing part of the firmware. The power from the USB ports might not be cut when we restart, while it might be cut when we shutdown the computer, also some computers might have 1 (or sometimes all) of their USB port that is always powered even when shut down (e.g. for the purpose of charging smartphones), and all of this can be probably configured in the BIOS or other computer-specific tool.... Also the fast startup option of Windows might change the behavior of a shut down (when enabled, shutting down might be a kind of "half hibernation/sleep" so not everything might be reset...)...

@lebarsfa
Copy link
Member

lebarsfa commented Jul 2, 2020

I added the option bStartScanModeAtStartup in RPLIDAR0.txt and related functions in MATLAB, now it should be possible to do manually what was suggested in #2 (comment).
Additionally, now the functions use SLAMTEC SDK.
Also, I noticed I got several BSOD when using my RPLIDAR A3, I will see if upgrading the driver from https://www.silabs.com/products/development-tools/software/usb-to-uart-bridge-vcp-drivers will help...

@ismet55555
Copy link
Author

Oh nice! Very helpful indeed. Thanks!
Do you know if all this would work for the RPLIDAR S1?
I've been trying to port our lidar code from the A3 to S1, however, I'm getting some odd readings.
Seems to be working seamlessly with the SLAMTEC frame grabber application. Looking at the SDK frame grabber source code, it's hard for me to navigate from what you put together and their SDK source code.

@lebarsfa
Copy link
Member

lebarsfa commented Jul 2, 2020

Since now the MATLAB code calls SLAMTEC SDK functions internally, there might be a chance that it works with RPLIDAR S1 if it supports standard scan or express scan modes...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants