Sunday, 27 October 2013

GPU Accelerated Camera Processing On The Raspberry Pi

Hallo!

Over the past few days I've been hacking away at the camera module for the raspberry pi. I made a lot of headway creating a simple and nice api for the camera which is detailed here:

http://robotblogging.blogspot.co.uk/2013/10/an-efficient-and-simple-c-api-for.html

However I wanted to get some real performance out of it and that means GPU TIME! Before I start explaining things, the code is here:

http://www.cheerfulprogrammer.com/downloads/picamgpu/picam_gpu.zip

Here's a picture:



And a video of the whole thing (with description of what's going on!)




The api I designed could use mmal for doing colour conversion and downsampling the image, but it was pretty slow and got in the way of opengl. However, I deliberately allowed the user to ask the api for the raw YUV camera data. This is provided as a single block of memory, but really contains 3 separate grey scale textures - one containing the 'luminosity' (Y) at full resolution, and another 2 (U and V), each at half the resolution in both dimensions, that specify the colour of a pixel:



I make a few tweaks to my code to generate these 3 textures:

        //lock the chosen frame buffer, and copy it into textures
        {
            const uint8_t* data = (const uint8_t*)frame_data;
            int ypitch = MAIN_TEXTURE_WIDTH;
            int ysize = ypitch*MAIN_TEXTURE_HEIGHT;
            int uvpitch = MAIN_TEXTURE_WIDTH/2;
            int uvsize = uvpitch*MAIN_TEXTURE_HEIGHT/2;
            int upos = ysize;
            int vpos = upos+uvsize;
            ytexture.SetPixels(data);
            utexture.SetPixels(data+upos);
            vtexture.SetPixels(data+vpos);
            cam->EndReadFrame(0);
        }
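
For reference, these are just single channel greyscale textures - the Y one at MAIN_TEXTURE_WIDTH x MAIN_TEXTURE_HEIGHT and the U/V ones at half that in each dimension. The demo wraps the creation up in its GfxTexture class, but a rough sketch of it in raw GLES2 calls (names here are illustrative, not the actual implementation) looks like:

    //create a single channel luminance texture of the given size
    GLuint texid;
    glGenTextures(1, &texid);
    glBindTexture(GL_TEXTURE_2D, texid);
    //no initial data - it gets filled from the camera buffer each frame
    glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, height, 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);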

And write a very simple shader to convert from yuv to rgb:

varying vec2 tcoord;
uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;
void main(void) 
{
    float y = texture2D(tex0,tcoord).r;
    float u = texture2D(tex1,tcoord).r;
    float v = texture2D(tex2,tcoord).r;

    vec4 res;
    res.r = (y + (1.370705 * (v-0.5)));
    res.g = (y - (0.698001 * (v-0.5)) - (0.337633 * (u-0.5)));
    res.b = (y + (1.732446 * (u-0.5)));
    res.a = 1.0;

    gl_FragColor = clamp(res,vec4(0),vec4(1));
}
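
The only other fiddly bit is pointing the shader's three samplers at the three textures, which is just the standard GLES2 texture unit dance. A sketch, where 'program' is the compiled shader program id and ytex/utex/vtex are the GL ids of the textures above:

    glUseProgram(program);
    //bind each plane to its own texture unit and tell the matching sampler which unit to read from
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, ytex);
    glUniform1i(glGetUniformLocation(program, "tex0"), 0);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, utex);
    glUniform1i(glGetUniformLocation(program, "tex1"), 1);
    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, vtex);
    glUniform1i(glGetUniformLocation(program, "tex2"), 2);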


Now I simply run the shader to read in the 3 yuv textures, and write out an rgb one, ending up with this little number:


Good hat yes? Well, hat aside, the next thing to do is provide downsamples so we can run image processing algorithms at different levels. I don't even need a new shader for that, as I can just run the earlier shader, but aiming it at successively lower resolution textures. Here's the lowest one now:


The crucial thing is that in opengl you can create a texture, and then tell it to also double as a frame buffer using the following code:

bool GfxTexture::GenerateFrameBuffer()
{
    //Create and bind a new frame buffer
    glGenFramebuffers(1,&FramebufferId);
    check();
    glBindFramebuffer(GL_FRAMEBUFFER,FramebufferId);
    check();

    //point it at the texture (the id passed in is the Id assigned when we created the open gl texture)
    glFramebufferTexture2D(GL_FRAMEBUFFER,GL_COLOR_ATTACHMENT0,GL_TEXTURE_2D,Id,0);
    check();

    //cleanup
    glBindFramebuffer(GL_FRAMEBUFFER,0);
    check();
    return true;
}

Once you have a texture as a frame buffer you can set it to be the target to render to (don't forget to set the viewport as well):

        glBindFramebuffer(GL_FRAMEBUFFER,render_target->GetFramebufferId());
        glViewport ( 0, 0, render_target->GetWidth(), render_target->GetHeight() );
        check();
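
Chaining those two ideas together, generating the downsampled levels is just a loop that draws each level into the next, smaller one using the same shader. Something like this sketch (rgbtextures and num_levels are illustrative names; it assumes each GfxTexture has had GenerateFrameBuffer called on it, and that DrawTextureRect renders into whatever framebuffer is currently bound):

        //each pass reads the previous (larger) level and renders it into the next (smaller) one
        for(int level = 1; level < num_levels; level++)
        {
            GfxTexture* src = &rgbtextures[level-1];
            GfxTexture* dst = &rgbtextures[level];
            glBindFramebuffer(GL_FRAMEBUFFER, dst->GetFramebufferId());
            glViewport(0, 0, dst->GetWidth(), dst->GetHeight());
            DrawTextureRect(src, -1.f, -1.f, 1.f, 1.f); //full screen quad into the smaller target
        }
        glBindFramebuffer(GL_FRAMEBUFFER, 0); //back to rendering to the screen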


And also use the read pixels function to read the results back to the cpu (which I do here to save to disk using the lodepng library):

void GfxTexture::Save(const char* fname)
{
    void* image = malloc(Width*Height*4);
    glBindFramebuffer(GL_FRAMEBUFFER,FramebufferId);
    check();
    glReadPixels(0,0,Width,Height,IsRGBA ? GL_RGBA : GL_LUMINANCE, GL_UNSIGNED_BYTE, image);
    check();
    glBindFramebuffer(GL_FRAMEBUFFER,0);

    unsigned error = lodepng::encode(fname, (const unsigned char*)image, Width, Height, IsRGBA ? LCT_RGBA : LCT_GREY);
    if(error) 
        printf("error: %d\n",error);

    free(image);
}

These features give us a massive range of capability. We can now chain together various shaders to apply multiple levels of filtering, and once the gpu is finished with them the data can be read back to the cpu and fed into image processing libraries such as opencv. This is really handy, as algorithms such as object detection often have to do costly filtering before they can operate. Using the gpu as above we can avoid the cpu needing to do the work.
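
As a tiny illustration of that last point, once the filtered pixels are back on the cpu (via glReadPixels, exactly as in GfxTexture::Save above) they can be wrapped in an OpenCV matrix without another copy. A sketch, assuming a 4 channel RGBA readback and that <opencv2/opencv.hpp> and <vector> are included:

    //read the currently bound framebuffer back and hand it straight to opencv
    std::vector<unsigned char> image(Width*Height*4);
    glReadPixels(0, 0, Width, Height, GL_RGBA, GL_UNSIGNED_BYTE, &image[0]);

    //cv::Mat can wrap the existing buffer rather than copying it
    cv::Mat rgba(Height, Width, CV_8UC4, &image[0]);
    cv::Mat blurred;
    cv::GaussianBlur(rgba, blurred, cv::Size(5,5), 0);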

Thus far I've written the following filters:

  • Gaussian blur
  • Dilate
  • Erode
  • Median
  • Threshold
  • Sobel
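As a flavour of what these look like, here's a minimal threshold filter - a sketch with a hard-coded 0.5 cutoff, not necessarily identical to the one in the download:

varying vec2 tcoord;
uniform sampler2D tex0;
void main(void)
{
    //greyscale the pixel, then snap it to black or white around the threshold
    float lum = dot(texture2D(tex0,tcoord).rgb, vec3(0.299, 0.587, 0.114));
    float t = step(0.5, lum);
    gl_FragColor = vec4(t, t, t, 1.0);
}
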
Here's a few of them in action:
 

Enjoy!

p.s. my only annoyance right now is that I still have to go through the cpu to get my data from mmal and into opengl. If anyone knows a way of getting from mmal straight to opengl that'd be super awesome!

p.p.s. right at the end, here's a tiny shameless advert for my new venture - http://www.happyrobotgames.com/no-stick-shooter. If you like my writing, check out the dev blog for regular updates on my first proper indie title!

Saturday, 26 October 2013

An Efficient And Simple C++ API for the Raspberry Pi Camera Module

For the past few days I've been messing around with my new raspberry pi camera modules (see earlier blog posts for excessive details) and part of that has involved putting together a nice and easy to use api to access the camera in c++ and read its frames. This post is a guide to installation, an overview of the very simple api and a description of the sample application.



One word of caution - as with any tinkering there is always a chance something will go wrong and result in a dead pi. If this worries you, back up first. I didn't bother, but I didn't have anything on there I was worried about losing!

Installation

Make sure you're on a recent Raspberry Pi build, and have a working Camera!

I'm assuming at this point you've got a camera module and it's working. If you've not set it up yet you may need to update your raspberry pi (depends when you bought it). I won't go over this process as it's been described 100 times already, but here's a link to get you going just in case:

http://www.raspberrypi.org/archives/3890

Once all is up and running type:

raspivid -t 10000

That should show you the raspberry pi video feed on screen for 10 seconds.

Get CMake

If you haven't already got it, you'll need cmake for building just about anything:

sudo apt-get install cmake

Download and install the latest 'userland-master'

This is the bit of the raspberry pi OS that contains the code for the camera applications and the various libraries they use. At time of writing it isn't supplied as part of the install, so you need to download, build and install it manually. To do so:

Download the latest userland-master.zip from here

Unzip it into your /opt/vc directory. You should now have a folder called /opt/vc/userland-master with various folders in it such as "host_applications" and "interfaces".

Change to the /opt/vc/userland-master folder, then build it with the following commands:

sudo mkdir build
cd build
sudo cmake -DCMAKE_BUILD_TYPE=Release ..
sudo make
sudo make install

Test everything worked by running raspivid again. You may see some different messages pop up (I got some harmless errors probably due to the build being so recent), but the crucial thing is that you still get the video feed on screen.

Download and build the PiCam API/Samples

The api and samples can all be downloaded here:
http://www.cheerfulprogrammer.com/downloads/picamtutorial/picamdemo.zip

Extract them into a folder in your home directory called 'picamdemo'. You should have a few cpp files in there, plus a make file and some shaders. 

Change to the folder and build the application with:

cmake .
make

Then run the sample with

./picamdemo

If all goes well you should see some text like this:
Compiled vertex shader simplevertshader.glsl:
<some shader code here>

Compiled fragment shader simplefragshader.glsl:
<some shader code here>

mmal: mmal_vc_port_parameter_set: failed to set port parameter 64:0:ENOSYS
mmal: Function not implemented
Init camera output with 512/512
Creating pool with 3 buffers of size 1048576
Init camera output with 256/256
Creating pool with 3 buffers of size 262144
Init camera output with 128/128
Creating pool with 3 buffers of size 65536
Init camera output with 64/64
Creating pool with 3 buffers of size 16384
Camera successfully created
Running frame loop

And your tv should start flicking between various resolutions of the camera feed like this:



(Edit - I've had some reports of the blogger you-tube link not working. You can see the full video here on proper you tube: http://www.youtube.com/watch?v=9bWJBSNxeXk)

The API (and what it does!)

PiCam is designed to be very simple but also useful for image processing algorithms. Right now it lets you:
  • Start up the camera with a given width, height and frame rate
  • Specify a number of 'levels'. More on that later.
  • Choose whether to automatically convert the camera feed to RGBA format
Basic Initialisation

All this is done just by calling StartCamera and passing in the right parameters. It returns a pointer to a CCamera object as follows:

CCamera* mycamera = StartCamera(512,512,30,1,true);

That's a 512x512 image at 30hz, with 1 level and rgba conversion enabled.

Reading

Once started you can extract frames from the camera by calling ReadFrame and passing in a buffer:

char mybuffer[512*512*4];
mycamera->ReadFrame(0,mybuffer,sizeof(mybuffer));

ReadFrame will return the number of bytes actually read, 0 if no frame was available, or -1 if your buffer was not large enough to hold the frame.

In addition to ReadFrame there are 2 functions: BeginReadFrame and EndReadFrame. These slightly more advanced versions are shown in the demo, and allow you to be more efficient by locking the actual camera buffer, using it, then releasing it. Internally ReadFrame is implemented using these functions.
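
In use they look like this (the same pattern as the demo code further down - DoSomethingWithFrame is just a stand-in for your own processing):

const void* frame_data; int frame_sz;
if(mycamera->BeginReadFrame(0,frame_data,frame_sz))
{
    //frame_data points at the camera's own buffer, so use it and release it promptly
    DoSomethingWithFrame(frame_data,frame_sz);
    mycamera->EndReadFrame(0);
}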

Shutting down

Once done, call 'StopCamera'.

Levels

In image processing it is often useful to have your data provided at different resolutions. Expensive operations need to be performed on low res images to run at a good frame rate, but you may still want higher res versions around for other operations or even just showing on screen. The PiCam api will do this for you automatically (for up to 3 additional levels). If we modify the StartCamera call to this:

CCamera* mycamera = StartCamera(512,512,30,4,true);

The system will automatically generate the main image plus an additional 3 down-sampled ones (at half res, quarter res and 1/8th res). These are then accessed by specifying a level other than 0 in the call to ReadFrame (or BeginReadFrame):

mycamera->ReadFrame(0,mybuffer,sizeof(mybuffer)); //get full res frame
mycamera->ReadFrame(1,mybuffer,sizeof(mybuffer)); //get half res frame
mycamera->ReadFrame(2,mybuffer,sizeof(mybuffer)); //get quarter res frame
mycamera->ReadFrame(3,mybuffer,sizeof(mybuffer)); //get 1/8th res frame

RGBA Conversions

For most purposes you'll want the data in a nice friendly RGBA format, however if you actually want the raw YUV data feed from the camera, specify false as the last parameter to StartCamera and no conversions will be done for you.
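
For reference, a raw I420 frame is width*height*3/2 bytes - a full resolution Y plane plus quarter-size U and V planes (which is where the 1382400 byte buffers for 1280x720 come from later in this blog). So the raw path looks something like this sketch:

CCamera* rawcam = StartCamera(512,512,30,1,false); //false = no RGBA conversion, raw YUV frames
char yuvbuffer[512*512*3/2];                       //I420: Y plane + quarter-size U and V planes
rawcam->ReadFrame(0,yuvbuffer,sizeof(yuvbuffer));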

The demo application

The picamdemo application consists of the core camera code as these files:
  • camera.h/camera.cpp
  • cameracontrol.h/cameracontrol.cpp
  • mmalincludes.h
A very simple opengl graphics api (which you are welcome to use/modify/change in any way you please):
  • graphics.h/graphics.cpp
And the main demo app itself:
  • picam.cpp
Which looks like this:

#include <stdio.h>
#include <unistd.h>
#include <string.h> //for the memcpy used below
#include "camera.h"
#include "graphics.h"

#define MAIN_TEXTURE_WIDTH 512
#define MAIN_TEXTURE_HEIGHT 512

char tmpbuff[MAIN_TEXTURE_WIDTH*MAIN_TEXTURE_HEIGHT*4];

//entry point
int main(int argc, const char **argv)
{
    //should the camera convert frame data from yuv to argb automatically?
    bool do_argb_conversion = true;

    //how many detail levels (1 = just the capture res, >1 goes down by halves, 4 max)
    int num_levels = 4;

    //init graphics and the camera
    InitGraphics();
    CCamera* cam = StartCamera(MAIN_TEXTURE_WIDTH, MAIN_TEXTURE_HEIGHT,30,num_levels,do_argb_conversion);

    //create 4 textures of decreasing size
    GfxTexture textures[4];
    for(int texidx = 0; texidx < num_levels; texidx++)
        textures[texidx].Create(MAIN_TEXTURE_WIDTH >> texidx, MAIN_TEXTURE_HEIGHT >> texidx);

    printf("Running frame loop\n");
    for(int i = 0; i < 3000; i++)
    {
        //pick a level to read based on current frame (flicking through them every 30 frames)
        int texidx = (i / 30)%num_levels;

        //lock the chosen buffer, and copy it directly into the corresponding texture
        const void* frame_data; int frame_sz;
        if(cam->BeginReadFrame(texidx,frame_data,frame_sz))
        {
            if(do_argb_conversion)
            {
                //if doing argb conversion just copy data directly
                textures[texidx].SetPixels(frame_data);
            }
            else
            {
                //if not converting argb the data will be the wrong size so copy it in
                //via a temporary buffer just so we can observe something happening!
                memcpy(tmpbuff,frame_data,frame_sz);
                textures[texidx].SetPixels(tmpbuff);
            }
            cam->EndReadFrame(texidx);
        }

        //begin frame, draw the texture then end frame (the bit of maths just fits the image to the screen while maintaining aspect ratio)
        BeginFrame();
        float aspect_ratio = float(MAIN_TEXTURE_WIDTH)/float(MAIN_TEXTURE_HEIGHT);
        float screen_aspect_ratio = 1280.f/720.f;
        DrawTextureRect(&textures[texidx],-aspect_ratio/screen_aspect_ratio,-1.f,aspect_ratio/screen_aspect_ratio,1.f);
        EndFrame();
    }

    StopCamera();
}


That's the full code for exploiting all the features of the api. It is designed to loop through each detail level and render them in turn. At the top of the main function you will find a couple of variables to enable argb or change level count, and higher up you can see the frame size settings.

Questions? Problems? Comments?

I'm happy to answer any questions, hear any comments, and if you hit issues I'd like to fix them. Either comment on this blog or email me (wibble82@hotmail.com) with a sensible subject like 'pi cam problem' (so it doesn't go into the junk mail box!).

p.s. right at the end, here's a tiny shameless advert for my new venture - http://www.happyrobotgames.com/no-stick-shooter. If you like my writing, check out the dev blog for regular updates on my first proper indie title!

Pi Eyes Stage 6

Right, we've got all kinds of bits working but there's another ingredient I need before the system is just about 'functional'. For image processing I need the camera feed at multiple resolutions, so I can do cheap processing operations on high res feeds, and expensive ones on low res feeds. To do this I use the video splitter component, and have reworked my camera api to:
  • Create 4 separate outputs, each with its own resizer that does the RGB conversion but generates a different resolution.
  • Output 0 = full res, output 1 = half res etc
  • You still use ReadFrame or Begin/EndReadFrame, but now you pass in a 'level' as well
  • Internally the camera code has become a bit more complex to handle this multi output system but it's mostly just rearranging code.
I won't go into the code here as it was lots of tweaks all over the place and isn't easy to present concisely. Here is a nice image of 2 of the outputs to make it more clear:


As you can see, in the top image I am at full resolution, however in the lower one it's displaying me at (in this case) 1/8th of the upper resolution. Just to demonstrate it is actually getting all the feeds (and the above isn't just from running the app twice!), this video shows it flicking between them live:




Here's the actual application code for the program above:

//entry point
int main(int argc, const char **argv)
{
    printf("PI Cam api tester\n");
    InitGraphics();
    printf("Starting camera\n");
    CCamera* cam = StartCamera(MAIN_TEXTURE_WIDTH, MAIN_TEXTURE_HEIGHT,15);

    //create 4 textures of decreasing size
    GfxTexture textures[4];
    for(int texidx = 0; texidx < 4; texidx++)
        textures[texidx].Create(MAIN_TEXTURE_WIDTH >> texidx, MAIN_TEXTURE_HEIGHT >> texidx);

    printf("Running frame loop\n");
    for(int i = 0; i < 3000; i++)
    {
        //pick a level to read based on current frame (flicking through them every second)
        int texidx = (i / 30)%4;

        //lock the chosen frame buffer, and copy it directly into the corresponding open gl texture
        const void* frame_data; int frame_sz;
        if(cam->BeginReadFrame(texidx,frame_data,frame_sz))
        {
            textures[texidx].SetPixels(frame_data);
            cam->EndReadFrame(texidx);
        }

        //begin frame, draw the texture then end frame
        BeginFrame();
        DrawTextureRect(&textures[texidx],-0.9f,-0.9f,0.9f,0.9f);
        EndFrame();
    }

    StopCamera();
}

Note the really crucial point is that my app above is just reading 1 of the levels each frame, however they are all available every frame, so if I chose (and the cpu was available) I could do something with every level. That's really key and undoubtedly what I'll need going forwards. Code for the whole thing is here:

http://www.cheerfulprogrammer.com/downloads/pi_eyes_stage6/picam_multilevel.zip
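
If you did want to touch every level each frame, the read loop just becomes something like this sketch:

    for(int level = 0; level < 4; level++)
    {
        const void* frame_data; int frame_sz;
        if(cam->BeginReadFrame(level,frame_data,frame_sz))
        {
            //do whatever per-level processing you like here; for now just upload to the matching texture
            textures[level].SetPixels(frame_data);
            cam->EndReadFrame(level);
        }
    }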

In terms of frame rate it has suffered unfortunately. That image resizer really seems to chew through frame time for some reason. Maybe there's lots of copies going on or something else funky, but it is going disappointingly slow. At 1280x720 the frame rate is probably worse than 10hz when reading the hi res data.

Next up I reckon I'll clean up the api a little - give the user options as to what to enable/disable, make sure everything shuts down properly, and write a post with a little tutorial on its use. Once that's done, onto gpu acceleration land....

Pi Eyes Stage 5

I'm making real progress now getting the camera module simpler and more efficient. My next goal is to rework the camera API to be a more synchronous process (no more callbacks) where the user can simply call 'ReadFrame' to get the next frame.

A Simple Synchronous API

The first step turned out to be pretty simple thanks to the 'queue' structure in mmal. I simply create my own little queue called 'OutputQueue' and change the internal camera callback to be:

void CCamera::OnVideoBufferCallback(MMAL_PORT_T *port, MMAL_BUFFER_HEADER_T *buffer)
{
    //first, add the buffer to the output queue
    mmal_queue_put(OutputQueue,buffer);
}

That code used to lock the buffer, call a callback, then return it to the port for recycling. However now it just pushes the buffer into an output list for processing by the user. Next up, I add a 'ReadFrame' function:

int CCamera::ReadFrame(void* dest, int dest_size)
{
    //default result is 0 - no data available
    int res = 0;

    //get buffer
    if(MMAL_BUFFER_HEADER_T *buffer = mmal_queue_get(OutputQueue))
    {
        //check if buffer has data in
        if(buffer->length)
        {
            //got data so check if it'll fit in the memory provided by the user
            if(buffer->length <= dest_size)
            {
                //it'll fit - yay! copy it in and set the result to be the size copied
                mmal_buffer_header_mem_lock(buffer);
                memcpy(dest,buffer->data,buffer->length);
                mmal_buffer_header_mem_unlock(buffer);
                res = buffer->length;
            }
            else
            {
                //won't fit so set result to -1 to indicate error
                res = -1;
            }
        }

        // release buffer back to the pool from whence it came
        mmal_buffer_header_release(buffer);

        // and send it back to the port (if still open)
        if (VideoCallbackPort->is_enabled)
        {
            MMAL_STATUS_T status;
            MMAL_BUFFER_HEADER_T *new_buffer;
            new_buffer = mmal_queue_get(BufferPool->queue);
            if (new_buffer)
                status = mmal_port_send_buffer(VideoCallbackPort, new_buffer);
            if (!new_buffer || status != MMAL_SUCCESS)
                printf("Unable to return a buffer to the video port\n");
        }    
    }

    return res;
}

This gets the next buffer in the output queue, copies it into memory provided by the user, and then returns it back to the port for reuse, just like the old video callback used to do.

It all worked fine first time, so my actual application code is now as simple as:

//this is the buffer my graphics code uses to update the main texture each frame
extern unsigned char GTextureBuffer[4*1280*720];

//entry point
int main(int argc, const char **argv)
{
    printf("PI Cam api tester\n");
    InitGraphics();
    printf("Starting camera\n");
    CCamera* cam = StartCamera(1280,720,15);

    printf("Running frame loop\n");
    for(int i = 0; i < 3000; i++)
    {
        BeginFrame();

        //read next frame into the texture buffer
        cam->ReadFrame(GTextureBuffer,sizeof(GTextureBuffer));

        //tell graphics code to draw the texture
        DrawMainTextureRect(-0.9f,-0.9f,0.9f,0.9f);

        EndFrame();
    }

    StopCamera();
}

As an added benefit, doing it synchronously means I don't accidentally write to the buffer while it's being copied to the texture, so no more screen tearing! Nice!

A bit more efficient

Now that I'm accessing the buffer synchronously there's the opportunity to get things more efficient and remove a frame of lag. Basically the current system goes:

  • BeginFrame (updates the main texture from GTextureBuffer - effectively a memcpy)
  • camera->ReadFrame (memcpy latest frame into GTextureBuffer)
  • DrawMainTextureRect (draws the main texture)
  • EndFrame (refreshes the screen)
There's 2 problems here. First up, our read frame call is updating GTextureBuffer after it's been copied into the opengl texture. This means we're always seeing a frame behind, although that could be easily fixed by calling it before BeginFrame. Worse though, we're doing 2 memcpys - first from camera to GTextureBuffer, and then from GTextureBuffer to the opengl texture. With a little reworking of the api however this can be fixed...

First, I add 'BeginReadFrame' and 'EndReadFrame' functions, which effectively do the same as the earlier ReadFrame (minus the memcpy), but split across 2 function calls:

bool CCamera::BeginReadFrame(const void* &out_buffer, int& out_buffer_size)
{
    //try and get buffer
    if(MMAL_BUFFER_HEADER_T *buffer = mmal_queue_get(OutputQueue))
    {
        //lock it
        mmal_buffer_header_mem_lock(buffer);

        //store it
        LockedBuffer = buffer;
        
        //fill out the output variables and return success
        out_buffer = buffer->data;
        out_buffer_size = buffer->length;
        return true;
    }
    //no buffer - return false
    return false;
}

void CCamera::EndReadFrame()
{
    if(LockedBuffer)
    {
        // unlock and then release buffer back to the pool from whence it came
        mmal_buffer_header_mem_unlock(LockedBuffer);
        mmal_buffer_header_release(LockedBuffer);
        LockedBuffer = NULL;

        // and send it back to the port (if still open)
        if (VideoCallbackPort->is_enabled)
        {
            MMAL_STATUS_T status;
            MMAL_BUFFER_HEADER_T *new_buffer;
            new_buffer = mmal_queue_get(BufferPool->queue);
            if (new_buffer)
                status = mmal_port_send_buffer(VideoCallbackPort, new_buffer);
            if (!new_buffer || status != MMAL_SUCCESS)
                printf("Unable to return a buffer to the video port\n");
        }    
    }
}

The key here is that instead of returning the buffer straight away, I simply store a pointer to it in BeginReadFrame and return the address and size of the data to the user. In EndReadFrame, I then proceed to unlock and release it as normal.

This means my ReadFrame function now changes to:

int CCamera::ReadFrame(void* dest, int dest_size)
{
    //default result is 0 - no data available
    int res = 0;

    //get buffer
    const void* buffer; int buffer_len;
    if(BeginReadFrame(buffer,buffer_len))
    {
        if(dest_size >= buffer_len)
        {
            //got space - copy it in and return size
            memcpy(dest,buffer,buffer_len);
            res = buffer_len;
        }
        else
        {
            //not enough space - return failure
            res = -1;
        }
        EndReadFrame();
    }

    return res;
}

In itself that's not much help. However, if I make a tweak to the application so it can copy data straight into the opengl texture and switch it to use BeginReadFrame and EndReadFrame I can avoid one of the memcpys. In addition, by moving the camera read earlier in the frame I lose a frame of lag:

//entry point
int main(int argc, const char **argv)
{
    printf("PI Cam api tester\n");
    InitGraphics();
    printf("Starting camera\n");
    CCamera* cam = StartCamera(MAIN_TEXTURE_WIDTH, MAIN_TEXTURE_HEIGHT,15);

    printf("Running frame loop\n");
    for(int i = 0; i < 3000; i++)
    {
        //lock the current frame buffer, and copy it directly into the open gl texture
        const void* frame_data; int frame_sz;
        if(cam->BeginReadFrame(frame_data,frame_sz))
        {
            UpdateMainTextureFromMemory(frame_data);
            cam->EndReadFrame();
        }

        //begin frame, draw the texture then end frame
        BeginFrame();
        DrawMainTextureRect(-0.9f,-0.9f,0.9f,0.9f);
        EndFrame();
    }

    StopCamera();
}
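
For completeness, UpdateMainTextureFromMemory lives in the graphics code; it's presumably just a glTexSubImage2D upload into the existing texture - a sketch, with GMainTexture standing in for however the texture id is actually stored:

void UpdateMainTextureFromMemory(const void* data)
{
    glBindTexture(GL_TEXTURE_2D, GMainTexture); //GMainTexture is illustrative, not the real variable name
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, MAIN_TEXTURE_WIDTH, MAIN_TEXTURE_HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, data);
}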

Much better! Unfortunately I'm still only at 15hz due to the weird interplay between opengl and the mmal resizing/converting components, but it is a totally solid 15hz at 720p - about 10hz at 1080p. I suspect I'm going to have to ditch the mmal resize/convert components eventually and rewrite them as opengl shaders, but not just yet.

Here's a video of the progress so far:



And as usual, here's the code:


Next up - downsampling!

Friday, 25 October 2013

Pi Eyes Stage 4

Last time round I got to the point at which I was pulling data out of the camera and using opengl to efficiently render it to screen. There's a few performance improvements to be made, but it runs at a respectable frame rate and could hit 30hz at 720p no problem.


Next up it's time to get the camera feed from the native YUV format into RGB ready for image processing. I'll also be needing to get some down sampled versions of the image, as the more expensive image processing algorithms aren't fast enough to run on a hi def video feed. This'll be a fairly brief post though, as it's late and my brain is going to sleep...

The Image Resizer

My first port of call was the image resize component built into mmal (thanks to a handy tip on this post), which uses the hardware resizer to take an image and... well... resize it! However, as a handy side effect it can also convert between YUV and RGB in the process. At this point massive thanks goes out to Mr Jason Van Cleave, who made all the mmal component documentation available on his web site.

So in short, I need to adjust the camera code to:
  • Create an image resize component (I eventually worked out it's the "vc.ril.resizer" component)
  • Connect it to the camera's video output port (the one we're currently displaying)
  • Set its input format to that of the video output, and the output format to RGBA. We leave the image sizes the same for now though, so it's not really doing any resizing - just the conversion
I do a little code cleanup first so it's easier to add to, plonk in the new code and after a few iterations...



We're in business! Code here:


Unfortunately on the first attempt performance appeared to be very poor. Interestingly though if I remove the actual rendering of the image it runs fine. This leads me to believe that the image resizer is chomping through most of the gpu time and consequently I can't render fast enough. This is really annoying as I frankly don't see why it should be so slow - maybe it's just some interplay between opengl and mmal.

I'll know more about the resizer performance once I get multiple resizers running, generating different downsampled images and we'll see what the actual costs are. If necessary I'm fairly confident I could write a shader that did the convert and downsample quite efficiently. I'm now getting 15hz, which I'm not happy about but it'll do for the moment.

A quick restructure

My next goal is to get multiple outputs at different resolutions coming out of the camera. This allows me to analyse the data at different levels in order to pick and choose where I spend my cpu. It should be doable using the 'video splitter' component, but it raises a few problems in terms of my architecture.

Right now the camera code simply runs, then calls a callback for each frame. Once the splitter is running I'll be receiving blocks of data constantly from different sources and will need a nice way of managing this and providing it in an api to the user. As a result, before going onto the multiple output world, I'm going to adjust the camera code so it internally buffers up the frames and allows the user to read them in a synchronous manner. If I can make use of the queuing system in mmal then I should be able to set it up as follows:


The basic idea is that the camera wraps up all the mmal stuff as usual, but rather than providing a callback, the application simply calls 'ReadFrame' to get the current frame from the camera. It passes in a 'level' to choose the downsampling level (0=full res, 1=half res, 2=quarter res) and obviously a place to put the data. 

Internally those output queues will be added to by the internal callbacks on the resizer output ports. Crucially, the resizer buffers will be passed directly into the output queue. A buffer will then only be returned to the resizer when:
  • The application calls ReadFrame, thus the buffer is no longer needed
  • An output queue contains more than x (potentially 1) entry, indicating the application isn't reading data fast enough (or at all) so frames can be dropped
This'll all be a lot easier if I can use the mmal queue code, but if not I'll roll my own. 

The only problem with this plan is that it involves fiddling around with a complex api and reworking lots of fiddly code, and it's past my bed time. Even coders need sleep, so I'll have to get to downsampling another day.


Pi Eyes Stage 3

For the past few days I've been working on getting a pair of raspberry pi camera modules working and accessing their data in a c++ program ready for image processing. Last night I got to the first version of my camera api which can be found here. So far I can:
  • Start the camera up
  • Have a callback triggered once per frame that gets passed the buffer + its size
  • Shut the camera down
Very soon I'll get to work on converting the YUV data the camera submits into nice friendly RGB data, and get downsampling of the images going. Both will either need to be done using mmal, or through my own GPU code if they stand a chance of being usable for real time processing. Once they're going I'll be in a great position to get more complex stuff like feature analysis working.

However, while I've got a host of ideas of how to move forwards, the first thing to do is get the output from the camera rendering on screen so I can actually see it in action. As such my next goal is to get opengl going, and render the output from the camera callback to the screen. Initially it'll look like garbage (as it'll actually be yuv data), but it'll be garbage that changes when things move in front of the camera! Once it's working I'll be in a position to do things like downsampling, rgb conversion etc and actually see the results to verify they're functioning correctly.

Getting OpenGL going

I've not endeavoured to get opengl working on the pi yet, but there's a couple of examples called hello_triangle and hello_triangle2. On looking at them, hello_triangle2 is both the simplest and most interesting as it uses shaders to do its rendering. I start by copying the graphics init code from hello_triangle2, and then add begin/end frame calls that basically clear the screen and swap the buffers respectively. This rather uninteresting photo is the result:


OK so it's not much, but crucially it shows opengl is operating correctly - I have a render loop that is clearing the screen to blue (and all the while I'm still reading the camera every frame in the background).

Shaders, Buffers and Boxes

I'm not gonna mess with fixed function pipelines and then have to go back and change it to shaders as soon as I want something funky - this is the 21st century after all! As a result I need to get shaders working which I've never done in opengl. From the example it basically seems to be a case of:
  • Load/set source code for a shader
  • Compile it, and get a shader id
  • Create a 'program', and assign it a vertex shader and a fragment shader
So I come up with this code inside a little GfxShader class:

bool GfxShader::LoadVertexShader(const char* filename)
{
    //cheeky bit of code to read the whole file into memory
    assert(!Src);
    FILE* f = fopen(filename, "rb");
    assert(f);
    fseek(f,0,SEEK_END);
    int sz = ftell(f);
    fseek(f,0,SEEK_SET);
    Src = new GLchar[sz+1];
    fread(Src,1,sz,f);
    Src[sz] = 0; //null terminate it!
    fclose(f);

    //now create and compile the shader
    GlShaderType = GL_VERTEX_SHADER;
    Id = glCreateShader(GlShaderType);
    glShaderSource(Id, 1, (const GLchar**)&Src, 0);
    glCompileShader(Id);
    check();
    printf("Compiled vertex shader:\n%s\n",Src);

    return true;
}

That just loads up a file, and fires it through the open gl code to create a shader program. Next, I knock together the simplest vertex shader and fragment shader I can think of:

SimpleVertShader.glsl:


attribute vec4 vertex;
void main(void) 
{
    vec4 pos = vertex;
    gl_Position = pos;
};

SimpleFragShader.glsl:

void main(void) 
{
    gl_FragColor = float4(1,1,1,1);
};


And now it's time to try and render a triangle using those shaders!!! Please note - at time of writing I still don't know if this is going to work, or if those shaders are entirely wrong... Unless I've missed something, it appears the old way of specifying vertices 1 by 1 isn't present in OpenGLES2, so I'm gonna need to create me a vertex buffer. I knock together these bits to create and draw it...

Create it...
    //create an ickle vertex buffer
    static const GLfloat quad_vertex_positions[] = {
        0.0f, 0.0f,    1.0f, 1.0f,
        1.0f, 0.0f, 1.0f, 1.0f,
        1.0f, 1.0f, 1.0f, 1.0f,
        0.0f, 1.0f, 1.0f, 1.0f
    };
    glGenBuffers(1, &GQuadVertexBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, GQuadVertexBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(quad_vertex_positions), quad_vertex_positions, GL_STATIC_DRAW);
    check();

Draw it...
    glUseProgram(GSimpleProg.GetId());
    printf("gl error: %d\n",glGetError());
    check();
    glBindBuffer(GL_ARRAY_BUFFER, GQuadVertexBuffer);
    GLuint loc = glGetAttribLocation(GSimpleProg.GetId(),"vertex");
    glVertexAttribPointer(loc, 4, GL_FLOAT, 0, 16, 0);
    glEnableVertexAttribArray(loc);
    check();
    glDrawArrays ( GL_TRIANGLE_STRIP, 0, 4 );
    check();
    glFinish();
    glFlush();
    check();

But glUseProgram is giving me errors so it's thinking hat time....

And... an hour of fiddling later I've discovered open gl doesn't return an error if compiling a shader or linking it into a program fails. Instead it returns happy success, unless you specifically ask it how things went!
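
That 'asking' boils down to querying the compile and link status - roughly this, where Id is the shader id from the GfxShader code above and ProgId just stands in for the program id (so treat it as a sketch rather than the actual code from the download):

    GLint compiled = 0;
    glGetShaderiv(Id, GL_COMPILE_STATUS, &compiled);
    if(!compiled)
    {
        char log[1024];
        glGetShaderInfoLog(Id, sizeof(log), NULL, log);
        printf("Shader compile failed:\n%s\n", log);
    }

    //and the equivalent check after linking the program
    GLint linked = 0;
    glGetProgramiv(ProgId, GL_LINK_STATUS, &linked);
    if(!linked)
    {
        char log[1024];
        glGetProgramInfoLog(ProgId, sizeof(log), NULL, log);
        printf("Program link failed:\n%s\n", log);
    }

Having fixed some compile errors in my earlier shaders I run it and am presented with my first quad: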



And after adding offset and scale uniforms and passing them into this draw function...

void DrawWhiteRect(float x0, float y0, float x1, float y1)
{
    glUseProgram(GSimpleProg.GetId());
    check();

    glUniform2f(glGetUniformLocation(GSimpleProg.GetId(),"offset"),x0,y0);
    glUniform2f(glGetUniformLocation(GSimpleProg.GetId(),"scale"),x1-x0,y1-y0);

    glBindBuffer(GL_ARRAY_BUFFER, GQuadVertexBuffer);
    check();

    GLuint loc = glGetAttribLocation(GSimpleProg.GetId(),"vertex");
    check();

    glVertexAttribPointer(loc, 4, GL_FLOAT, 0, 16, 0);
    check();

    glEnableVertexAttribArray(loc);
    check();

    glDrawArrays ( GL_TRIANGLE_STRIP, 0, 4 );
    check();

    glFinish();
    check();

    glFlush();
    check();
}

Hmm - not sure what the glFlush does yet. One for later though. The point is I can make a box anywhere I want:



OK, it's.....

Texture Time

My ultimate goal here is to get the camera texture on screen, which will involve filling an open gl texture with data each frame and then displaying it on a quad like the one above. Before getting that far I'm just gonna try filling a texture with random data each frame and seeing where that gets me...

...half an hour later... well having grasped opengles2 a bit better, that was actually fairly easy. We have a 32x32 random texture (and a code base that's getting messier by the second):



Woohooo!

From camera to texture

This is it folks. If I can get from camera output to something on screen at a decent frame rate then it paves the way for all kinds of wonders on this raspberry pi. I'll start with a hacky not-thread-safe approach which will also waste a bit of cpu time doing extra memcpys and generally be bad. But quick to write.

So we've got a callback in the app that is submitting new frames from a separate thread, and a call on the main frame to render a texture on screen. I just need to get the data from the thread into the texture and all will be well. I start by bodgily setting my earlier 'random texture' to be exactly the right size for a 1280x720 camera image, resulting in something a little more 'trippy':



Now to try regenerating that random data each frame - works although very slow. Not even worth uploading the video I made of it really!

However, I now have code that generates some new data, pumps it to open gl and then draws it on screen. All I need to do now is use my camera feed as the 'new data' instead of random numbers. I write a new function that takes a pointer to a buffer and copies it into the block of memory I was previously filling with random numbers. Remember my camera data is still in YUV so it'll not fill a full RGB texture (and will look weird), so I make it keep copying until it fills the buffer - this gives me a good measure of performance. A bit of jiggery pokery and...


Eureka!!!

At 1080p the memcpys involved (probably 2 - one from camera -> scratch space, another from scratch space -> open gl texture) are heavy enough that it only hits about 10fps. But at 720p (still easily enough for some tasty image fun) it's in at around 25fps. With a little clever engineering I can remove 1 of those copies, so it'll hit a solid 30fps. Here's a video to show it in action:



Pretty tasty yes? Although please note - when I say 'copies it into the cpu' I mean 'into cpu accessible memory'. One doesn't make sense, the other does...

All code is here, although it's in a bit of a state right now so don't take anything as 'the right way to do it' - especially the graphics bits!

http://www.cheerfulprogrammer.com/downloads/pi_eyes_stage3/picam_rendering.zip

Next Steps

Now that I can see what I'm generating (and can do it at an appreciable frame rate) I'm going to look at using the mmal image resizer component to create downsampled versions of the feed (for different granularity in image processing) and in theory do the rgb conversion (if the documentation is telling the truth...).

First though, I need to order a takeaway.






Thursday, 24 October 2013

Pi Eyes Stage 2

Right, in my last post I had got the raspberry pi camera modules up and running, but hit a bit of a blocker in terms of accessing the actual camera feeds in c++. Fortunately a very clever chap called Pierre Raufast had documented his reworking of the raspivid application here. It'd suffered a little over time, probably just due to newer versions of its dependencies so I redid some of his work and ended up with camcv.c, which sets up the camera and provides a point at which we can access each frame of the camera feed. My next task is to rewrite it from scratch, then experiment with decoding the data in an optimal way. First though, just so this post isn't entirely code - a picture of the latest setup:

My raspberry pi 'stereo camera rig'. Good old balsa wood and insulating tape.

Quick Instructions on getting the code

In this post I get to my first version of a working camera api, which can be downloaded here:

http://www.cheerfulprogrammer.com/downloads/pi_eyes_stage2/picam.zip

Note that to use it, you'll need to download and build the raspberry pi userland code from here. Mine is stored in /opt/vc/userland-master.

I'll go into more details once I have something I'm really happy with!

Writing the actual code...

The basic architecture of the camera system is quite simple once you get over the total lack of documentation of the fairly complex mmal layer... We basically:
  • Start up mmal
  • Create a 'camera component'
  • Tell its 'video output port' to call a callback each time it fills in a new buffer
  • In the callback we:
    • Lock the buffer
    • Read it
    • Unlock it
    • Give it back to the port to be recycled
  • And when all is done, we kill the camera component and bail out
The callback is called from a separate thread, so once things are moving the main application can carry on as normal. This lends itself well to a simple initial api of just:
  • StartCamera(some setup options + callback pointer)
  • StopCamera()
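
In header form that first api is basically just this (a sketch based on the signatures used further down, rather than the exact header from the download):

class CCamera; //opaque camera object

//called once per frame from the camera's own thread with the raw frame data
typedef void (*CameraCBFunction)(CCamera* cam, const void* buffer, int buffer_length);

CCamera* StartCamera(int width, int height, int framerate, CameraCBFunction callback);
void StopCamera();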
The main bit of code I'm going to keep from the raspberry pi userland code is the raspicamcontrol stuff, which wraps up setting parameters on the camera in a simple api.

.... imagine moments of intense programming with blondie on in the background here ....

It's a few hours later and I've finished revision one. I've got a basic camera api that gets initialised, does stuff for a while then shuts down. Here's the first 'application' that uses it:

#include <stdio.h>
#include <unistd.h>
#include "camera.h"

void CameraCallback(CCamera* cam, const void* buffer, int buffer_length)
{
    printf("Do stuff with %d bytes of data\n",buffer_length);
}

int main(int argc, const char **argv)
{
    printf("PI Cam api tester\n");
    StartCamera(1280,720,30,CameraCallback);
    sleep(10);
    StopCamera();
}


Neat! With that code I run the application and get this print out

PI Cam api tester
Creating video port pool with 3 buffers of size 1382400
mmal: mmal_vc_port_parameter_set: failed to set port parameter 64:0:ENOSYS
mmal: Function not implemented
Sent buffer 0 to video port
Sent buffer 0 to video port
Sent buffer 0 to video port
Camera successfully created
Do stuff with 1382400 bytes of data
Do stuff with 1382400 bytes of data
Do stuff with 1382400 bytes of data
//... repeat every frame for 10 seconds...
Do stuff with 1382400 bytes of data
Do stuff with 1382400 bytes of data
Do stuff with 1382400 bytes of data
Shutting down camera

Most of the magic is inside 2 files, which I've posted online in full for download, but I'll go over the key bits here. The beefy one is the camera initialization - CCamera::Init

Camera Initialisation

Basic setup / creation of camera component

bool CCamera::Init(int width, int height, int framerate, CameraCBFunction callback)
{
    //init broadcom host - QUESTION: can this be called more than once??
    bcm_host_init();

    //store basic parameters
    Width = width;       
    Height = height;
    FrameRate = framerate;
    Callback = callback;

    // Set up the camera_parameters to default
    raspicamcontrol_set_defaults(&CameraParameters);

    MMAL_COMPONENT_T *camera = 0;
    MMAL_ES_FORMAT_T *format;
    MMAL_PORT_T *preview_port = NULL, *video_port = NULL, *still_port = NULL;
    MMAL_STATUS_T status;

    //create the camera component
    status = mmal_component_create(MMAL_COMPONENT_DEFAULT_CAMERA, &camera);
    if (status != MMAL_SUCCESS)
    {
        printf("Failed to create camera component\n");
        return false;
    }

    //check we have output ports
    if (!camera->output_num)
    {
        printf("Camera doesn't have output ports");
        mmal_component_destroy(camera);
        return false;
    }

    //get the 3 ports
    preview_port = camera->output[MMAL_CAMERA_PREVIEW_PORT];
    video_port = camera->output[MMAL_CAMERA_VIDEO_PORT];
    still_port = camera->output[MMAL_CAMERA_CAPTURE_PORT];

    // Enable the camera, and tell it its control callback function
    status = mmal_port_enable(camera->control, CameraControlCallback);
    if (status != MMAL_SUCCESS)
    {
        printf("Unable to enable control port : error %d", status);
        mmal_component_destroy(camera);
        return false;
    }

    //  set up the camera configuration
    {
        MMAL_PARAMETER_CAMERA_CONFIG_T cam_config;
        cam_config.hdr.id = MMAL_PARAMETER_CAMERA_CONFIG;
        cam_config.hdr.size = sizeof(cam_config);
        cam_config.max_stills_w = Width;
        cam_config.max_stills_h = Height;
        cam_config.stills_yuv422 = 0;
        cam_config.one_shot_stills = 0;
        cam_config.max_preview_video_w = Width;
        cam_config.max_preview_video_h = Height;
        cam_config.num_preview_video_frames = 3;
        cam_config.stills_capture_circular_buffer_height = 0;
        cam_config.fast_preview_resume = 0;
        cam_config.use_stc_timestamp = MMAL_PARAM_TIMESTAMP_MODE_RESET_STC;
        mmal_port_parameter_set(camera->control, &cam_config.hdr);
    }

This first section is pretty simple, albeit fairly long. It's just:

  • Creating the mmal camera component
  • Getting the 3 'output ports'. The main one we're interested in is the video port, but as far as I can tell the others still need setting up for correct operation.
  • Enabling the 'control' port and providing a callback. This basically gives the camera a way of providing us with info about changes of state. Not doing anything with this yet though.
  • Filling out a camera config structure, then sending it to the camera control port
Setting output port formats

Now we have a camera component running, the next step is to configure those output ports:

    // setup preview port format - QUESTION: Needed if we aren't using preview?
    format = preview_port->format;
    format->encoding = MMAL_ENCODING_OPAQUE;
    format->encoding_variant = MMAL_ENCODING_I420;
    format->es->video.width = Width;
    format->es->video.height = Height;
    format->es->video.crop.x = 0;
    format->es->video.crop.y = 0;
    format->es->video.crop.width = Width;
    format->es->video.crop.height = Height;
    format->es->video.frame_rate.num = FrameRate;
    format->es->video.frame_rate.den = 1;
    status = mmal_port_format_commit(preview_port);
    if (status != MMAL_SUCCESS)
    {
        printf("Couldn't set preview port format : error %d", status);
        mmal_component_destroy(camera);
        return false;
    }

    //setup video port format
    format = video_port->format;
    format->encoding = MMAL_ENCODING_I420; //not opaque, as we want to read it!
    format->encoding_variant = MMAL_ENCODING_I420; 
    format->es->video.width = Width;
    format->es->video.height = Height;
    format->es->video.crop.x = 0;
    format->es->video.crop.y = 0;
    format->es->video.crop.width = Width;
    format->es->video.crop.height = Height;
    format->es->video.frame_rate.num = FrameRate;
    format->es->video.frame_rate.den = 1;
    status = mmal_port_format_commit(video_port);
    if (status != MMAL_SUCCESS)
    {
        printf("Couldn't set video port format : error %d", status);
        mmal_component_destroy(camera);
        return false;
    }

    //setup still port format
    format = still_port->format;
    format->encoding = MMAL_ENCODING_OPAQUE;
    format->encoding_variant = MMAL_ENCODING_I420;
    format->es->video.width = Width;
    format->es->video.height = Height;
    format->es->video.crop.x = 0;
    format->es->video.crop.y = 0;
    format->es->video.crop.width = Width;
    format->es->video.crop.height = Height;
    format->es->video.frame_rate.num = 1;
    format->es->video.frame_rate.den = 1;
    status = mmal_port_format_commit(still_port);
    if (status != MMAL_SUCCESS)
    {
        printf("Couldn't set still port format : error %d", status);
        mmal_component_destroy(camera);
        return false;
    }

This is 3 almost identical bits of code - one for the preview port (which would be used for doing the full screen preview of the feed if we were using it), one for the video port (that's the one we're interested in) and one for the still port (presumably for capturing stills). If you read the code it's pretty much just plugging in the numbers provided to configure the camera. The most important part is where we set the video port format to I420 encoding (the native format of the camera). By setting it correctly, this tells mmal that we will be providing a callback for the video output later, and it'll be wanting all the data thank you very much! Otherwise it just passes in the buffer headers but no actual output... Point of note - I tried setting the format to ABGR, but the camera just output I420 data in a dodgy layout, so it's going to need converting.

Create a buffer pool for the video port to write to

    //setup video port buffer and a pool to hold them
    video_port->buffer_num = 3;
    video_port->buffer_size = video_port->buffer_size_recommended;
    MMAL_POOL_T* video_buffer_pool;
    printf("Creating video port pool with %d buffers of size %d\n", video_port->buffer_num, video_port->buffer_size);
    video_buffer_pool = mmal_port_pool_create(video_port, video_port->buffer_num, video_port->buffer_size);
    if (!video_buffer_pool)
    {
        printf("Couldn't create video buffer pool\n");
        mmal_component_destroy(camera);
        return false;    
    }


This little chunk is the first properly 'new' bit when compared to the raspivid. It creates a pool of buffers that we'll be providing to the video port to write the frames to. For now we're just creating it, but later we'll pass all the buffers to the video port and then begin capturing! The buffer_num is 3, as that gives you enough time to have the camera writing 1 buffer, while you read another, with an extra one in the middle for safety. The recommended buffer size comes from the format we specified earlier.

Enable stuff

    //enable the camera
    status = mmal_component_enable(camera);
    if (status != MMAL_SUCCESS)
    {
        printf("Couldn't enable camera\n");
        mmal_port_pool_destroy(video_port,video_buffer_pool);
        mmal_component_destroy(camera);
        return false;    
    }

    //apply all camera parameters
    raspicamcontrol_set_all_parameters(camera, &CameraParameters);

    //setup the video buffer callback
    status = mmal_port_enable(video_port, VideoBufferCallback);
    if (status != MMAL_SUCCESS)
    {
        printf("Failed to set video buffer callback\n");
        mmal_port_pool_destroy(video_port,video_buffer_pool);
        mmal_component_destroy(camera);
        return false;    
    }


This pretty simple code enables the camera, sends it a list of setup parameters using the cam control code, then enables the video port. Note the port enable call, which tells the video port about our VideoBufferCallback function, which we want called for each frame received from the camera.

Give the buffers to the video port


    //send all the buffers in our pool to the video port ready for use
    {
        int num = mmal_queue_length(video_buffer_pool->queue);
        int q;
        for (q=0;q<num;q++)
        {
            MMAL_BUFFER_HEADER_T *buffer = mmal_queue_get(video_buffer_pool->queue);
            if (!buffer)
                printf("Unable to get a required buffer %d from pool queue", q);
            if (mmal_port_send_buffer(video_port, buffer)!= MMAL_SUCCESS)
                printf("Unable to send a buffer to encoder output port (%d)", q);
            printf("Sent buffer %d to video port\n");
        }
    }

OK, so this one looks a bit odd! The basic idea is that we created a pool of 3 buffers earlier, which is basically a queue of pointers to unused blocks of memory. This bit of code removes each buffer from the pool and sends it into the video port. In effect, we're handing the video port the blocks of memory it'll use to store frames in.

Begin capture and return SUCCESS!


    //begin capture
    if (mmal_port_parameter_set_boolean(video_port, MMAL_PARAMETER_CAPTURE, 1) != MMAL_SUCCESS)
    {
        printf("Failed to start capture\n");
        mmal_port_pool_destroy(video_port,video_buffer_pool);
        mmal_component_destroy(camera);
        return false;    
    }

    //store created info
    CameraComponent = camera;
    BufferPool = video_buffer_pool;

    //return success
    printf("Camera successfully created\n");
    return true;

As our final trick, we set the 'capturing' setting to 1, and if all goes well that VideoBufferCallback function should start getting called.

The video callback

What also deserves a mention is the video callback:

void CCamera::OnVideoBufferCallback(MMAL_PORT_T *port, MMAL_BUFFER_HEADER_T *buffer)
{
    //check if buffer has data in
    if(buffer->length)
    {
        //got data so lock the buffer, call the callback so the application can use it, then unlock
        mmal_buffer_header_mem_lock(buffer);
        Callback(this,buffer->data,buffer->length);
        mmal_buffer_header_mem_unlock(buffer);
    }
    
    // release buffer back to the pool
    mmal_buffer_header_release(buffer);

    // and send one back to the port (if still open)
    if (port->is_enabled)
    {
        MMAL_STATUS_T status;
        MMAL_BUFFER_HEADER_T *new_buffer;
        new_buffer = mmal_queue_get(BufferPool->queue);
        if (new_buffer)
            status = mmal_port_send_buffer(port, new_buffer);
        if (!new_buffer || status != MMAL_SUCCESS)
            printf("Unable to return a buffer to the video port\n");
    }
}

The first bit should be fairly simple - we check if the buffer has any data in (hopefully it always does!), and if so, lock it, call the user's callback (remember they passed it into the Init function) so the application can have its merry way with the frame data, then unlock it.

The next bit of code is a little more confusing. It's the second part of the buffer management stuff we saw earlier. Once the buffer is used, we first release it. This frees it up and effectively puts it back in the pool from whence it came! However, the video port is now down a buffer, so (if it's still open), we pull a buffer back out of the pool and send it back into the video port ready for reuse.

What is interesting here is that we have control over when the buffer is returned to the video port. I can see down the line doing stuff with the gpu, where I extend the lifetime of a buffer over a frame so compute shaders can do stuff with it!

What's Next?

Well, I can now read data, at 1080p/30hz if I want but the next question is what to do with it! Currently it's in the funky I420 encoding (basically a common form where 1 channel represents the 'brightness' and the other 2 channels represent the colour). To be useful it'll need converting to rgb, and displaying on screen. I know from Pierre's work that opencv isn't ideal for this, so while I'll need it for proper image processing I think I'll have a look at faster ways to get from camera output -> me on the tv!


Pi Eyes Stage 1

I've got 2 new raspberry pi camera modules and 2 new raspberry pis to go with them. Time to start hooking things up.

Setting up the Pis

First step I get my raspberry pis all setup how I like them. This is something I've done a few times now and should probably write down in detail for people, especially as the setup process these days is quite different to how it was with the earlier versions of the pi - maybe if enough people ask I will do :)

My ideal setup with the cameras included is:

  • Raspberry pi
  • Wireless usb dongle (TPLINK WN727N works out of the box with no additional power)
  • Pi Camera Module
  • Standard power supplies and cases
  • Unpowered usb hub + cheap wired keyboard / mouse for the initial setup steps
  • HDMI cable to plug into tv
Once everything is hooked up I proceed to:
  • Boot the raspberry pis and use the nice 'NOOBS' interface to install Raspbian. The only modification I make in the config screen is to ensure camera support is on.
  • Setup the wifi (or just plugin to network if using a wired connection)
  • Assign a static ip address to each raspberry pi so my computer can find them easily
At this point I can disconnect the mouse and keyboard - everything else can be done from my pc via ssh. Now my setup looks like this:



I can connect to the pis using SSH (with the putty software). Now I proceed to install:
  • TightVNCServer (for remote desktop access from pc)
  • CMake (for compiling all sorts of things)
  • Samba file sharing (so I can access the pi file system from pc)
  • Synergy (handy if I want to run the pi on the tv but use mouse/keyboard from pc)
In the absence of any details from me on this, a few great web sites to look at are:

Testing The Cameras

It's pretty easy to test the pi cams as some software comes packaged to record videos / take stills and show the feed on screen. By typing into putty the command:

raspivid -t 30000 -vf

I tell the video feed to show on screen for 30s, vertically flipped as my cameras are hanging upside down!


And there's me taking a photo of cameras taking a photo of me taking a photo of the cameras....

Getting the camera in c++....

So far so good - both pis work, both cameras work and we're set up ready for development. Or are we? It turns out not really - current support for the cameras in terms of coding is extremely minimal. If what you want to do is write an app that regularly takes a snapshot and sends it somewhere then fine. I however am looking to read the camera feed in c++ and do some image processing with it, and at the time of writing this is not an out-of-the-box task.

The core issue with the cameras in terms of coding is that they don't come with video4linux drivers, so no standard web cam reading software (opencv included) can just read from them. Clearly it's possible as the raspivid application does it, and the source code is available so we have somewhere to start. Fortunately a very clever and helpful chap called Pierre Raufast has already done a load of the digging, and his information is all up here:


As Pierre discovered, the raspivid application uses the mmal (multimedia abstraction layer) library to access the camera data and transfer it to the screen or encode it as stills / videos. His steps (which I recommend at least reading) are in short:
  • Install the raspberry pi camera module and get it going
  • Download / build / install the userland code for raspberry pi - this includes all the latest source code for the raspivid application and libaries it needs. Can be found here: https://github.com/raspberrypi/userland
  • Install opencv (and in his case the face recognition library)
  • Copy the raspivid code and create a new modified one that doesn't do any fancy stuff, and instead just grabs the data from the camera and shoves it into an opencv window
The key result of Pierre's work is in this file: http://raufast.org/download/camcv_vid0.c

Having gone through his steps and made a few tweaks, I eventually got the code running:



However it dies after a few seconds due to some unknown error (probably because the code is a little out of date) and doesn't do exactly what I need it to.

So my next plan - redo some of Pierre's work using the most recent raspivid application and see if I can come up with a nice tidy camera api.

OK, so after a bit of work I've...

  • Stripped out everything to do with encoding from the latest raspivid
  • Re-implemented some of Pierre's work to capture the memory from each frame
In other words, I've got a functioning program that can run the camera at 1080p, and access the memory for each video frame. Here's the very basic code (currently at 720p to speed up disk writing):


This blog's got long enough for now, so I'll leave it there and write up my progress getting from this preliminary code into a nice camera api in the next installment.