Saturday, 5 May 2012

Debugging the eyes

The cameras are mounted on a head, they're talking to the Arduino, it's talking to the blue tooth, the blue tooth is talking to a PC and the PC is displaying stereo images on screen:

Stereo image from MmBot's eyes
Unfortunately I'm still hitting an issue where after a few photos, something goes wrong and the PC stops getting data from the blue tooth. This post is gonna be another one of the debugging ones - I'll document my progress working out the issue as I work it out so it's gonna be rambling :)

At the moment, I have some basic code on the PC that:
  • Requests an image be taken
  • Waits until it has an image size
  • Repeatedly reads content until image is finished
On the Arduino side I've added some debug prints. The first bit of logging looks like this:

Requesting camera take picture
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
Camera picture taken
Requesting camera file size
Camera file size received: 6244
Requesting camera take picture
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
Camera picture taken
Requesting camera file size
Camera file size received: 6824

Here the PC has requested a picture from each eye, and then requested the file size. The unprocessed bytes are from the camera system, which show it's processing commands other than 'get content', so doesn't need to read the response. Here the unprocessed bytes are the response from the take picture request.

Next up, the Arduino starts printing out a log of what's being sent to the pc:

L[0] sz=32, strm=384
R[0] sz=32, strm=416
L[32] sz=32, strm=608
R[32] sz=32, strm=640
L[64] sz=32, strm=832
R[64] sz=32, strm=832
L[96] sz=32, strm=1024
R[96] sz=32, strm=1024
L[128] sz=32, strm=1024
R[128] sz=32, strm=1024
L[160] sz=32, strm=1024
R[160] sz=32, strm=1024

This shows the cameras (L or R), the address in the image being sent inside '[ ]', the size of the chunk, and how full the stream buffer is. Currently I can read from the camera faster than I can send to the PC, so you see the buffers gradually fill up to 1024 bytes (the total buffer size).

After a while, a camera will reach the end of the image, and it's stream buffer will start emptying as the PC reads the remaining bytes:

R[5184] sz=32, strm=1024
L[5216] sz=32, strm=1024
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
R[5216] sz=32, strm=1024
L[5248] sz=32, strm=1023
R[5248] sz=32, strm=1024
L[5280] sz=32, strm=991
R[5280] sz=32, strm=1024

Here you can see the 'unprocessed bytes' from the 'stop picture' command that the left camera received when the left image was complete. Following that, the left stream buffer begins emptying.

A little while later, the right camera does the same:

L[5792] sz=32, strm=479
R[5792] sz=32, strm=1024
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
L[5824] sz=32, strm=447
R[5824] sz=32, strm=1011
L[5856] sz=32, strm=415
R[5856] sz=32, strm=979

And eventually both buffers, empty, the PC shows the image, and the process repeats:

R[6784] sz=32, strm=51
R[6816] sz=19, strm=19
Requesting camera take picture
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
Camera picture taken

OK, potentially not too interesting overview, but important to understanding what's going on. In this example everything went fine and I acquired an image similar to the one at the top.

Now it's been going wrong in a few different ways, all of which end with the PC waiting for data that isn't coming. Whether it's all the same problem just showing different symptoms, or several problems I don't yet know. Here's an example of something going wrong:

R[7232] sz=32, strm=102
R[7264] sz=32, strm=70
R[7296] sz=32, strm=38
Requesting camera take picture                        <--- succesfully requested the left camera take a picture
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
Camera picture taken
Requesting camera file size
Camera file size received: 7828                        <--- got the file size
L[0] sz=32, strm=352                                <--- begin reading left stream
R[7328] sz=6, strm=6                                <--- eh? right feed still reading???
L[32] sz=32, strm=544                                
L[64] sz=32, strm=736
L[96] sz=32, strm=928
L[128] sz=32, strm=1024
L[160] sz=32, strm=1024
L[192] sz=32, strm=1024

What appears to have happened is the PC assumed the right image was finished before it was. I'm guessing the series of events went:
  • PC continues reading left/right images until it has got all data from both
  • PC then requests a new picture be taken from both cameras
  • For some reason, the assumption that the right camera was finished was incorrect!
  • As a result, the left state machine restarts as normal, but the right state machine does nothing, as it's still in the 'read content' state
  • The PC now requests file sizes from both cameras. It gets the new size from the left camera, but the size of the previous image (as the Arduino hasn't finished yet) from the right image
  • We now start reading content. the left works as normal - starting at address 0 and filling up the buffer. However the right simply reads the final few bytes, and then, having not got a command to take another picture, simply stops.
  • Finally, the PC reads the entire left image, then sits there waiting for the right image to come through - which isn't happening as it was never sent!
Phew! So, err, why? Well, first I do a few more tests to see if the symptoms change. Previously I've seen 2 other symptoms - sometimes the stream values seem to go complete corrupt (and I start reading from position 1827361928), and occasionally a byte just seems to get lost - the pc receives 63 bytes when it was supposed to get 64. After a few tests I see both problems occur again.

This could be to do with loss of data over blue tooth, but some of these problems stink of a logic bug somewhere - especially the first one. I make a couple of tweaks to the PC code:
  • I make sure it waits until the message comes back that the 'start picture' was succesful - my Arduino code already supported this (by sending back 1 or 0), so I may as well use it
  • I print out the image sizes it's expecting
The PC quickly gets stuck after a couple of images, failing to start taking the right photo. Basically the first problem I mentioned has occured, but I've caught it early. I check the print outs and the right camera was expecting 6176 bytes, which the Arduino agrees with. The last block requested and sent to the camera was at address 6144, of size 32 bytes, but there were 38 in the buffer. Clearly the PC made the right call - it got exactly 6176 bytes and then moved on to the next shot. So where did the extra 6 bytes come from?

Then I notice a really scary line in the Arduino output:

R[5152] sz=32, strm=1030

My 1024 byte stream buffer has 1030 bytes in it. Oh dear. It doesn't matter what the blue tooth is doing - there is no way this should ever happen if the code is correct. This explains the getting stuck due to 6 bytes, and also explains the corrupt data - memory trampling.  It's time to revisit my stream filling and see how on earth this can happen.

At first glance, I can see a line of code that would cause the problem if something else had gone wrong:


          //this is the main content bit
          //first, we bail out if we haven't yet sent the remote the data that is in the picture buffer
          if(PictureStreamUsed >= PICTURE_STREAM_SIZE)
            break;

This 'bail out' is fine provided you always read the same sized chunks (except the very last one), and PICTURE_STREAM_SIZE is divisible by the chunk size. In theory I do exactly this - only getting 32 bytes at a time. However in order for my stream to work I must ensure that:
  • The 'PictureStreamWrite' is always a multiple of 32, and never goes above 1024-32
  • The 'PictureStreamUsed' is always a multiple of 32, and never goes above 1024
If any chunk other than the last one is not 32 bytes, this code will fail. I could change the above code to handle it, but that'd just be hiding the deeper problem. Why am I getting chunks that aren't 32 bytes? First thing to do - print out what I am getting and see!

Requesting 32 bytes to camera stream at 992
Received 32 bytes to camera stream at 992
L[5120] sz=32, strm=1024
Requesting 32 bytes to camera stream at 0                     <----- left requests 32 bytes
Received 31 bytes to camera stream at 0                       <----- left gets 31 bytes back - finished?
Requesting 32 bytes to camera stream at 31                    <----- no! left requests 32 bytes again
Received 7 bytes to camera stream at 31                       <----- left gets 7 bytes back!
R[5120] sz=32, strm=1024
Requesting 32 bytes to camera stream at 0
Received 32 bytes to camera stream at 0
L[5152] sz=32, strm=1030
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
.........
Requesting 32 bytes to camera stream at 320
Received 32 bytes to camera stream at 320
L[5472] sz=32, strm=710
R[5472] sz=32, strm=1024
Requesting 32 bytes to camera stream at 352                  <---- right requests 32 bytes
Received 11 bytes to camera stream at 352                    <---- right gets 11 bytes back - finished?
Requesting 32 bytes to camera stream at 363                  <---- no! right requests 11 bytes again
Received 3 bytes to camera stream at 363                     <---- right gets 3 bytes back
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 

Well, it seems my assumption that the camera would keep giving me the data I asked for until there was none left was incorrect. There's 2 chunks at the end of each image that are of none-32 byte multiples. This is already bad and explains corruption, but in theory, provided I end up with the right amount of data, it should work most of the time. Unless....

Requesting 32 bytes to camera stream at 928
Received 23 bytes to camera stream at 928 total read 7095 of 7096
Requesting 32 bytes to camera stream at 951
Received 7 bytes to camera stream at 951 total read 7102 of 7096
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
R[6048] sz=32, strm=1024

Yes. It appears the camera happily returns more data than is actually in the image, and the code they suggest to mark the actual end (detect 0xff,0xd9) doesn't actually... well... work properly. I guess at this point I'd expect no less from this camera. Can you tell it's not my favourite thing in the world?

So there's 2 problems to handle:
  • The fact that I get additional bytes
  • The none-32 byte results.
The first issue is trivial. I'm already recording how many bytes have been sent, so I simply change the wait condition at the end of the camera state machine to be:

          //finally, wait for remote to drain the camera stream
          if(PictureSendAddress < PictureSize)
            break;

And for good measure, adjust the command code to avoid sending extra data:

          //clamp the amount to discard extra bytes at the end of the image
          if( (CS1.PictureSendAddress+bytes_to_send) > CS1.PictureSize )
          {
            bytes_to_send = CS1.PictureSize - CS1.PictureSendAddress;
          }

Things are much more stable now, and the code gets loads of pictures. Now all I have to deal with is the much less common issue of data corruption. This only occurs if the none-32 byte chunks arrive right at the end of the stream. This diagram shows the problem (it's got 8 byte chunks, not 32 byte, but the principle is the same):


Each row shows the buffer filling up, initially in full sized chunks nice and uniformly. However on row 4 we get a smaller chunk back, leaving a gap at the end. This stops the write pointer wrapping round. Now if the data stopped there this would be fine, but the image is not finished and another chunk is requested. The chunk is not full size, but it does go off the end of the buffer, corrupting any data that follows it. As it happens, the data following the stream buffers in my code is the image size and stream information, which explains why I occasionally end up reading from insane stream positions.

So what to do? Well first, I need to make the problem happen again. I've not seen it since the last fix which is mildly concerning - have I accidently made it go away? If so I need to know why. If not, I need to fix it. The issue occurs due to a buffer overrun, so I can make it more common by reducing the buffer size. I shrink it down to 96 bytes (from 1024). This means there's a 1 in 3 chance that a read will be at the end of the buffer. It takes a few photos to occur, but eventually I get:

Requesting 32 bytes to camera stream at 64
Received 15 bytes to camera stream at 64 total read 6319 of 6320
Requesting 32 bytes to camera stream at 79
Received 7 bytes to camera stream at 79 total read 3732081255 of 1000798054
Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 

The read at byte 79 can cause up to 32 bytes to arrive (even if the camera api only gives you back 7, more can still be written). As a result, the stream buffer could be written up to byte 114, and it's only 96 bytes in size. The question is, do I actually need those extra bytes? i.e. The final 2 reads (the ones that are none-uniform) can be up to 64 bytes in total, but it's conceivable that I only need the first 32 bytes of them. It sounds weird, but print outs like this make me think it's possible:

Received 31 bytes to camera stream at 64 total read 7103 of 7104
Received 7 bytes to camera stream at 95 total read 7110 of 7104

Look carefully and you'll see the first none-uniform chunk is 31 bytes in size (1 byte off a full chunk), and we end up with 7103 bytes of a 7104 byte image. All I need is 1 byte to finish the image, and all I need is 1 byte to finish the chunk. This pattern seems to repeat - they almost always match up, and even when they don't, I never end up needing more than 1 full chunk to finish the image. If this is the case, I can simply extend the stream buffer to have 32 bytes of padding in, and allow it to overflow. I decide to try and see what happens, as this will make life much easier! You can see the small change here:

    byte PictureBuffer[PICTURE_STREAM_SIZE+32];

This seems to work and my images come through fine. The whole system is much more stable now, and head upstairs for a cup of tea. When I come back, it's still taking pictures happily!

However I'm still getting occasional image corruption, and once or twice the pc code stops responding, waiting for data that doesn't come. My gut says this is due to overloading the blue tooth and I recall reading a small note in the modem docs about wiring up the ready-to-receive and ready-to-send pins being advisable. This is because blue tooth is wireless, so isn't a perfectly stable (in terms of timing) connection, and I could easily end up overloading the transmit buffer. To test this, I remove a lot of the serial print outs (which always slow things town), increase the stream buffer, and turn up the rate at which I attempt to feed data back to the PC. With both commands set to transmit up to 256 bytes back to PC at a time and a stream buffer of 1024 bytes I start getting print outs like this:

Unprocessed bytes: 5: 0x76 0x00 0x36 0x00 0x00 
Camera picture taken
Requesting camera file size
Camera file size received: 7200
L[0] sz=256, strm=288
R[0] sz=256, strm=320
L[256] sz=256, strm=256
R[256] sz=256, strm=320
L[512] sz=256, strm=288
R[512] sz=256, strm=288
L[768] sz=256, strm=256
R[768] sz=256, strm=288
L[1024] sz=256, strm=256
R[1024] sz=256, strm=288
L[1280] sz=256, strm=256
R[1280] sz=256, strm=288

What you see here is very good! The stream buffer is no longer filling up, meaning I am transmitting to PC at the same rate as I receive from the cameras - exactly what you want from a streaming system, and I'm back to my 1 frame every 5 to 10 seconds rate. It's still very stable, and I'm tempted to say I'll leave the fancy blue tooth flow control for another day, and simply make my PC code more robust against lost data. Not by getting all clever with none-blocking code - simply by making it time out if data hasn't arrived, and tweak the PC+MmBot code so it is able to give up and restart a picture.

Wow! Eyes now work. Given image processing will be limited with this version (until I can do it onboard - either on a Raspberry Pie or some other more hardcore processor), I think I'll say that's as good as the camera feed is getting for now. 0.1fps isn't great, but I can still do stereo imaging of the background and basic face recognition.

Next up I'll get some nice controls in the PC app for driving MmBot around manually and getting the camera feed. Then I'll add some infra-red range finders and ultra sound to the equation and get MmBot wondering randomly around the office, taking pictures and feeding them back. Not long before the first hints of autonomy!

-Chris


No comments:

Post a Comment