Hardware accelerated blending

To implement hardware acceleration, subclass PlatformInterface::DrawingEngine and override the functions that the platform is able to accelerate.

class ExampleDrawingEngine : public PlatformInterface::DrawingEngine
{
public:
    void blendRect(PlatformInterface::DrawingDevice *drawingDevice,
                   const PlatformInterface::Rect &rect,
                   PlatformInterface::Rgba32 color,
                   BlendMode blendMode) QUL_DECL_OVERRIDE;

    void blendRoundedRect(PlatformInterface::DrawingDevice *drawingDevice,
                          const PlatformInterface::Rect &rect,
                          const PlatformInterface::Rect &clipRect,
                          PlatformInterface::Rgba32 color,
                          int radius,
                          BlendMode blendMode) QUL_DECL_OVERRIDE;

    void blendImage(PlatformInterface::DrawingDevice *drawingDevice,
                    const PlatformInterface::Point &pos,
                    const PlatformInterface::Texture &source,
                    const PlatformInterface::Rect &sourceRect,
                    int sourceOpacity,
                    BlendMode blendMode) QUL_DECL_OVERRIDE;

    void blendAlphaMap(PlatformInterface::DrawingDevice *drawingDevice,
                       const PlatformInterface::Point &pos,
                       const PlatformInterface::Texture &source,
                       const PlatformInterface::Rect &sourceRect,
                       PlatformInterface::Rgba32 color,
                       BlendMode blendMode) QUL_DECL_OVERRIDE;

    void blendTransformedImage(PlatformInterface::DrawingDevice *drawingDevice,
                               const PlatformInterface::Transform &transform,
                               const PlatformInterface::RectF &destinationRect,
                               const PlatformInterface::Texture &source,
                               const PlatformInterface::RectF &sourceRect,
                               const PlatformInterface::Rect &clipRect,
                               int sourceOpacity,
                               BlendMode blendMode) QUL_DECL_OVERRIDE;

    void blendTransformedAlphaMap(PlatformInterface::DrawingDevice *drawingDevice,
                                  const PlatformInterface::Transform &transform,
                                  const PlatformInterface::RectF &destinationRect,
                                  const PlatformInterface::Texture &source,
                                  const PlatformInterface::RectF &sourceRect,
                                  const PlatformInterface::Rect &clipRect,
                                  PlatformInterface::Rgba32 color,
                                  BlendMode blendMode) QUL_DECL_OVERRIDE;

    void synchronizeForCpuAccess(PlatformInterface::DrawingDevice *drawingDevice,
                                 const PlatformInterface::Rect &rect) QUL_DECL_OVERRIDE;

    PlatformInterface::DrawingEngine::Path *allocatePath(const PlatformInterface::PathData *pathData,
                                                         PlatformInterface::PathFillRule fillRule) QUL_DECL_OVERRIDE;

    void setStrokeProperties(PlatformInterface::DrawingEngine::Path *path,
                             const PlatformInterface::StrokeProperties &strokeProperties) QUL_DECL_OVERRIDE;

    void blendPath(PlatformInterface::DrawingDevice *drawingDevice,
                   PlatformInterface::DrawingEngine::Path *path,
                   const PlatformInterface::Transform &transform,
                   const PlatformInterface::Rect &clipRect,
                   const PlatformInterface::Brush *fillBrush,
                   const PlatformInterface::Brush *strokeBrush,
                   int sourceOpacity,
                   PlatformInterface::DrawingEngine::BlendMode blendMode) QUL_DECL_OVERRIDE;
};

If any of the DrawingEngine::blendRect, DrawingEngine::blendRoundedRect, DrawingEngine::blendImage, DrawingEngine::blendAlphaMap, DrawingEngine::blendTransformedImage, or DrawingEngine::blendTransformedAlphaMap functions is not overridden, the default implementation is used. It calls synchronizeForCpuAccess() before calling the corresponding fallback implementation on DrawingDevice::fallbackDrawingEngine. If the platform can only partially accelerate a blending function, for example only for certain blend modes or opacity values, its override can itself use the fallbackDrawingEngine to fall back for the remaining parameter combinations.
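The control flow described above can be tried out on a desktop compiler with a small mock. The sketch below doesn't use the Qt Quick Ultralite headers: MockDrawingEngine, MockCpuEngine, MockAcceleratedEngine, and the simplified blendRect(alpha, mode) signature are all hypothetical stand-ins. It only illustrates the order of calls, assuming the default implementation synchronizes first and then delegates to the CPU engine.

```cpp
#include <string>
#include <vector>

// Mock types for illustration only; these are not the Qt Quick Ultralite classes.
enum MockBlendMode { MockBlendMode_Source, MockBlendMode_SourceOver };

typedef std::vector<std::string> CallLog;

// Stands in for the CPU engine returned by fallbackDrawingEngine().
class MockCpuEngine
{
public:
    explicit MockCpuEngine(CallLog *log) : m_log(log) {}
    void blendRect(int /*alpha*/, MockBlendMode /*mode*/) { m_log->push_back("cpu:blendRect"); }

private:
    CallLog *m_log;
};

// Stands in for DrawingEngine: the default blendRect synchronizes with the
// hardware first and then delegates to the CPU engine.
class MockDrawingEngine
{
public:
    MockDrawingEngine(CallLog *log, MockCpuEngine *cpu) : m_log(log), m_cpu(cpu) {}
    virtual ~MockDrawingEngine() {}

    virtual void synchronizeForCpuAccess() { m_log->push_back("sync"); }

    virtual void blendRect(int alpha, MockBlendMode mode)
    {
        synchronizeForCpuAccess();
        m_cpu->blendRect(alpha, mode);
    }

protected:
    CallLog *m_log;
    MockCpuEngine *m_cpu;
};

// Partially accelerated engine: anything that needs real blending goes to the
// default implementation, everything else to the (pretend) hardware blitter.
class MockAcceleratedEngine : public MockDrawingEngine
{
public:
    MockAcceleratedEngine(CallLog *log, MockCpuEngine *cpu) : MockDrawingEngine(log, cpu) {}

    void blendRect(int alpha, MockBlendMode mode) override
    {
        if (alpha != 255 && mode == MockBlendMode_SourceOver) {
            MockDrawingEngine::blendRect(alpha, mode);
            return;
        }
        m_log->push_back("hw:blitRect");
    }
};
```

Calling blendRect(255, MockBlendMode_SourceOver) on the accelerated mock logs only the hardware blit, while blendRect(128, MockBlendMode_SourceOver) logs the synchronization followed by the CPU blend.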

Warning: If you experience a crash when using the fallbackDrawingEngine, it might be because you haven't called PlatformInterface::initializeArgb32CpuFallbackDrawingEngine() or similar in initializePlatform().

Note: If blendImage uses software rendering, either through the default implementation or through fallbackDrawingEngine, the source images are assumed to be in ARGB32_Premultiplied format. To enable blending of other formats with blendImage, set QUL_PLATFORM_DEFAULT_RESOURCE_ALPHA_OPTIONS to Always.

The actual implementation of the blend functions might vary significantly depending on the hardware-acceleration API. To keep the examples independent of any particular hardware, they use a dummy hardware-acceleration API for demonstration purposes. For example, here's how the implementation of DrawingEngine::blendRect might look:

void ExampleDrawingEngine::blendRect(PlatformInterface::DrawingDevice *drawingDevice,
                                     const PlatformInterface::Rect &rect,
                                     PlatformInterface::Rgba32 color,
                                     BlendMode blendMode)
{
    // Implement rectangle blending here

    // If the hardware supports only blitting, use the fallback drawing
    // engine for the blending path, that is, for a semi-transparent color
    // combined with BlendMode_SourceOver.
    if (color.alpha() != 255 && blendMode == BlendMode_SourceOver) {
        synchronizeForCpuAccess(drawingDevice, rect);
        drawingDevice->fallbackDrawingEngine()->blendRect(drawingDevice, rect, color, blendMode);
        return;
    }

    // const float inv = 1 / 255.0f;
    // HW_SetColor(color.red() * inv, color.green() * inv, color.blue() * inv, color.alpha() * inv);

    // HW_BlitRect(toHwRect(rect));
}

This example code assumes that the hardware-acceleration API can only blit rectangles and doesn't offer rectangle blending support, for the sake of demonstrating how the fallback drawing engine could be used to fulfill this task instead.

Otherwise, the color is set up and a call to blit the rectangle is issued.

DrawingEngine::blendRoundedRect is called when a rectangle with rounded corners is blended. The platform implementation also has to clip against the provided clip rectangle if it's smaller than the rectangle to be blended. Here's an example implementation:

void ExampleDrawingEngine::blendRoundedRect(PlatformInterface::DrawingDevice *drawingDevice,
                                            const PlatformInterface::Rect &rect,
                                            const PlatformInterface::Rect &clipRect,
                                            PlatformInterface::Rgba32 color,
                                            int radius,
                                            BlendMode blendMode)
{
    // Implement rounded rectangle blending here

    // const float inv = 1 / 255.0f;
    // HW_SetColor(color.red() * inv, color.green() * inv, color.blue() * inv, color.alpha() * inv);
    // HW_SetClip(clipRect.x(), clipRect.y(), clipRect.width(), clipRect.height());

    // HW_BlitRoundRect(toHwRect(rect), radius);

    // HW_SetClip(0, 0, screen->width(), screen->height());
}
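If the hardware-acceleration API has no clip or scissor state of its own, the clipping can be done by intersecting rectangles before issuing the draw call. The following is a minimal sketch of that intersection. SketchRect and intersect are hypothetical names used only for this illustration, not part of PlatformInterface, and the rounded corners would still have to be computed relative to the unclipped rectangle.

```cpp
#include <algorithm>

// Hypothetical stand-in for PlatformInterface::Rect, for illustration only.
struct SketchRect
{
    int x, y, width, height;
    bool isEmpty() const { return width <= 0 || height <= 0; }
};

// Returns the overlap of two rectangles, or an empty rectangle if they
// don't overlap at all.
inline SketchRect intersect(const SketchRect &a, const SketchRect &b)
{
    const int left = std::max(a.x, b.x);
    const int top = std::max(a.y, b.y);
    const int right = std::min(a.x + a.width, b.x + b.width);
    const int bottom = std::min(a.y + a.height, b.y + b.height);
    if (right <= left || bottom <= top)
        return SketchRect{0, 0, 0, 0};
    return SketchRect{left, top, right - left, bottom - top};
}
```

An empty result means that nothing is visible and the draw call can be skipped entirely.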

Similarly, when blending images and alpha maps you also need to inform the hardware-acceleration API about the texture layout and data somehow. The example assumes that there's a bindTexture function that handles it. This function must be implemented by the platform port according to the needs of the particular hardware-acceleration API.

For blendImage there's a source rectangle, a destination point, and the source opacity in the range 0 to 256 inclusive. Here's how its implementation might look:

void ExampleDrawingEngine::blendImage(PlatformInterface::DrawingDevice *drawingDevice,
                                      const PlatformInterface::Point &pos,
                                      const PlatformInterface::Texture &source,
                                      const PlatformInterface::Rect &sourceRect,
                                      int sourceOpacity,
                                      BlendMode blendMode)
{
    // Fall back to default CPU drawing engine for pixel formats that can't be
    // blended with hardware acceleration
    if (source.format() == Qul::PixelFormat_RGB332 || source.format() == PixelFormat_RLE_ARGB32
        || source.format() == PixelFormat_RLE_ARGB32_Premultiplied || source.format() == PixelFormat_RLE_RGB32
        || source.format() == PixelFormat_RLE_RGB888) {
        DrawingEngine::blendImage(drawingDevice, pos, source, sourceRect, sourceOpacity, blendMode);
        return;
    }

    // Implement image blending here

    // HW_SetBlendMode(toHwBlendMode(blendMode));

    // bindTexture(source);

    // HW_SetColor(1.0f, 1.0f, 1.0f, sourceOpacity * (1 / 256.0f));

    // const Rect destinationRect(pos, sourceRect.size());
    // HW_BlendTexture(toHwRect(sourceRect), toHwRect(destinationRect));
}
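The opacity scaling used in the commented HW_SetColor call can be isolated into a helper. The sketch below assumes that 256 means fully opaque, as the range above suggests, so that full opacity maps exactly to 1.0 and integer code can use a shift instead of a division. opacityToHwAlpha and scaleChannel are hypothetical helper names, not part of the Qt Quick Ultralite API.

```cpp
// Converts a source opacity in the range 0..256 to a floating-point alpha
// in the range 0.0..1.0. Out-of-range values are clamped.
inline float opacityToHwAlpha(int sourceOpacity)
{
    if (sourceOpacity <= 0)
        return 0.0f;
    if (sourceOpacity >= 256)
        return 1.0f;
    return sourceOpacity * (1 / 256.0f);
}

// Integer variant for hardware that takes 8-bit values: scales one 8-bit
// channel by an opacity in the range 0..256 using a shift. The result is
// rounded down.
inline unsigned char scaleChannel(unsigned char value, int sourceOpacity)
{
    return static_cast<unsigned char>((value * sourceOpacity) >> 8);
}
```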

Next up, blendAlphaMap is very similar. The main difference is that the texture has the pixel format PixelFormat_Alpha1 or PixelFormat_Alpha8, meaning one bit or one byte per pixel respectively, indicating opacity. For each pixel, its opacity value must be multiplied by the given color before being blended or blitted, depending on the blendMode.

void ExampleDrawingEngine::blendAlphaMap(PlatformInterface::DrawingDevice *drawingDevice,
                                         const PlatformInterface::Point &pos,
                                         const PlatformInterface::Texture &source,
                                         const PlatformInterface::Rect &sourceRect,
                                         PlatformInterface::Rgba32 color,
                                         BlendMode blendMode)
{
    // Implement alpha map blending here

    // HW_SetBlendMode(toHwBlendMode(blendMode));

    // bindTexture(source);

    // const float inv = 1 / 255.0f;
    // HW_SetColor(color.red() * inv, color.green() * inv, color.blue() * inv, color.alpha() * inv);

    // const PlatformInterface::Rect destinationRect(pos, sourceRect.size());
    // HW_BlendTexture(toHwRect(sourceRect), toHwRect(destinationRect));
}
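When validating a hardware implementation, it can help to have the per-pixel arithmetic written out on the CPU. The sketch below shows one way to do that for PixelFormat_Alpha8, assuming a non-premultiplied 8-bit color. SketchColor, div255, modulate, and sourceOver are hypothetical names for this illustration. For PixelFormat_Alpha1 the coverage would simply be 0 or 255.

```cpp
#include <cstdint>

// Hypothetical non-premultiplied 8-bit color, standing in for
// PlatformInterface::Rgba32 in this sketch.
struct SketchColor
{
    std::uint8_t r, g, b, a;
};

// Divides by 255 with rounding to nearest, for inputs up to 255 * 255.
inline std::uint8_t div255(unsigned value)
{
    return static_cast<std::uint8_t>((value + 127) / 255);
}

// Multiplies the alpha map coverage of one pixel into the color's alpha.
inline SketchColor modulate(SketchColor color, std::uint8_t coverage)
{
    color.a = div255(static_cast<unsigned>(color.a) * coverage);
    return color;
}

// Source-over for one channel of a non-premultiplied source on top of an
// opaque destination.
inline std::uint8_t sourceOver(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    return div255(static_cast<unsigned>(src) * alpha + static_cast<unsigned>(dst) * (255u - alpha));
}
```

Comparing a few pixels rendered by the hardware against these functions is a quick check that the color and blend mode were set up as intended.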

Next, there's transformed blending of images and alpha maps. If the platform supports accelerating this, the implementation of blendTransformedImage might look something like this:

void ExampleDrawingEngine::blendTransformedImage(PlatformInterface::DrawingDevice *drawingDevice,
                                                 const PlatformInterface::Transform &transform,
                                                 const PlatformInterface::RectF &destinationRect,
                                                 const PlatformInterface::Texture &source,
                                                 const PlatformInterface::RectF &sourceRect,
                                                 const PlatformInterface::Rect &clipRect,
                                                 int sourceOpacity,
                                                 BlendMode blendMode)
{
    // Fall back to default CPU drawing engine for pixel formats that can't be
    // blended with hardware acceleration
    if (source.format() == Qul::PixelFormat_RGB332 || source.format() == PixelFormat_RLE_ARGB32
        || source.format() == PixelFormat_RLE_ARGB32_Premultiplied || source.format() == PixelFormat_RLE_RGB32
        || source.format() == PixelFormat_RLE_RGB888) {
        DrawingEngine::blendTransformedImage(drawingDevice,
                                             transform,
                                             destinationRect,
                                             source,
                                             sourceRect,
                                             clipRect,
                                             sourceOpacity,
                                             blendMode);
        return;
    }

    // Implement transformed image blending here

    // float matrix[16];
    // toHwMatrix(transform, matrix);

    // HW_SetTransformMatrix(matrix);
    // HW_SetClip(clipRect.x(), clipRect.y(), clipRect.width(), clipRect.height());

    // HW_SetBlendMode(toHwBlendMode(blendMode));

    // bindTexture(source);

    // HW_SetColor(1.0f, 1.0f, 1.0f, sourceOpacity * (1 / 256.0f));

    // HW_BlendTexture(toHwRect(sourceRect), toHwRect(destinationRect));

    // HW_SetClip(0, 0, screen->width(), screen->height());
    // HW_SetTransformIdentity();
}

In addition to setting a clipping rectangle, you also need to convert the given transform to some matrix representation that the hardware-acceleration API accepts. If a 4x4 matrix is used, the conversion might look like the example toHwMatrix:

static void toHwMatrix(const PlatformInterface::Transform &transform, float *matrix)
{
    matrix[0] = transform.m11();
    matrix[1] = transform.m12();
    matrix[2] = 0;
    matrix[3] = 0;
    matrix[4] = transform.m21();
    matrix[5] = transform.m22();
    matrix[6] = 0;
    matrix[7] = 0;
    matrix[8] = 0;
    matrix[9] = 0;
    matrix[10] = 1;
    matrix[11] = 0;
    matrix[12] = transform.dx();
    matrix[13] = transform.dy();
    matrix[14] = 0;
    matrix[15] = 1;
}
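The index layout in toHwMatrix corresponds to a column-major 4x4 matrix with the translation in elements 12 and 13. A quick way to check that this matches what a given hardware API expects is to map a known point on the CPU. The sketch below does that with a hypothetical SketchTransform struct instead of PlatformInterface::Transform, and assumes the convention x' = m11 * x + m21 * y + dx, y' = m12 * x + m22 * y + dy.

```cpp
// Hypothetical stand-in for the affine part of PlatformInterface::Transform.
struct SketchTransform
{
    float m11, m12, m21, m22, dx, dy;
};

// Fills a 16-element array using the same index layout as toHwMatrix.
inline void toColumnMajor4x4(const SketchTransform &t, float *m)
{
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0f;
    m[0] = t.m11;
    m[1] = t.m12;
    m[4] = t.m21;
    m[5] = t.m22;
    m[10] = 1.0f;
    m[12] = t.dx;
    m[13] = t.dy;
    m[15] = 1.0f;
}

// Applies a column-major 4x4 matrix to the point (x, y, 0, 1) and returns
// the resulting x and y through the output pointers.
inline void mapPoint(const float *m, float x, float y, float *outX, float *outY)
{
    *outX = m[0] * x + m[4] * y + m[12];
    *outY = m[1] * x + m[5] * y + m[13];
}
```

If the hardware API expects row-major matrices instead, the array has to be transposed before it's handed over.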

The blendTransformedAlphaMap sample code looks very similar to blendTransformedImage, apart from taking the color into account as already demonstrated in the blendAlphaMap example code above. Refer to the platform/boards/qt/example-baremetal/platform_context.cpp file for the blendTransformedAlphaMap sample code.

Synchronize CPU access

As hardware blending often happens asynchronously, the Qt Quick Ultralite core calls DrawingEngine::synchronizeForCpuAccess before any CPU-based reading from or writing to the framebuffer. This function needs to sync with the hardware-accelerated blending unit to ensure that every pending blending command has been fully committed to the framebuffer. Here's how that might look with the dummy hardware-acceleration API:

void ExampleDrawingEngine::synchronizeForCpuAccess(PlatformInterface::DrawingDevice *drawingDevice,
                                                   const PlatformInterface::Rect &rect)
{
    // HW_SyncFramebufferForCpuAccess();

    framebufferAccessedByCpu = true;

    unsigned char *backBuffer = framebuffer[backBufferIndex];
    for (int i = 0; i < rect.height(); ++i) {
        unsigned char *pixels = backBuffer + (ScreenWidth * (rect.y() + i) + rect.x()) * BytesPerPixel;
        // CleanInvalidateDCache_by_Addr(pixels, rect.width() * BytesPerPixel);
    }
}

On some hardware, it might also be necessary to invalidate the data cache for the area involved. This ensures that the asynchronous writes done by the blending unit are fully seen by the CPU.
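Cache maintenance functions often operate on whole cache lines, so the per-row address and length computed in the loop above may need to be widened to cache line boundaries first. The sketch below shows one way to do that. The 32-byte line size is an assumption that is typical for Cortex-M7 parts but must be checked against the actual hardware, and CacheRange and alignToCacheLines are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes. Check the reference manual of the
// target hardware before relying on this value.
static const std::uintptr_t CacheLineSize = 32;

struct CacheRange
{
    std::uintptr_t start; // rounded down to a cache line boundary
    std::size_t length;   // multiple of CacheLineSize covering the request
};

// Widens [address, address + length) so that it starts and ends on cache
// line boundaries.
inline CacheRange alignToCacheLines(std::uintptr_t address, std::size_t length)
{
    const std::uintptr_t start = address & ~(CacheLineSize - 1);
    const std::uintptr_t end = (address + length + CacheLineSize - 1) & ~(CacheLineSize - 1);
    return CacheRange{start, static_cast<std::size_t>(end - start)};
}
```

When the rectangle spans the full screen width, the rows are contiguous in memory, so a single call covering all rows is usually cheaper than one call per row.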

A synchronizeAfterCpuAccess function is also useful on some platforms. If any CPU rendering fallback was used, it cleans and invalidates the data cache for the affected area before asynchronous reads by the display controller are allowed to access the memory:

static bool framebufferAccessedByCpu = false;

void synchronizeAfterCpuAccess(const PlatformInterface::Rect &rect)
{
    if (framebufferAccessedByCpu) {
        unsigned char *backBuffer = framebuffer[backBufferIndex];
        for (int i = 0; i < rect.height(); ++i) {
            unsigned char *pixels = backBuffer + (ScreenWidth * (rect.y() + i) + rect.x()) * BytesPerPixel;
            // CleanInvalidateDCache_by_Addr(pixels, rect.width() * BytesPerPixel);
        }
    }
}

In addition, two more functions need to be implemented to ensure Qt Quick Ultralite core can read and write texture data dynamically. They are PlatformContext::waitUntilAsyncReadFinished and PlatformContext::flushCachesForAsyncRead.

Here's how their implementation might look:

void ExamplePlatform::waitUntilAsyncReadFinished(const void * /*begin*/, const void * /*end*/)
{
    // HW_SyncFramebufferForCpuAccess();
}

void ExamplePlatform::flushCachesForAsyncRead(const void * /*addr*/, size_t /*length*/)
{
    // CleanInvalidateDCache_by_Addr(const_cast<void *>(addr), length);
}

waitUntilAsyncReadFinished should ensure that any asynchronous operation reading from the given memory area has finished before returning. Typically this memory area refers to some texture data, which is going to get overwritten. The simplest option here is to sync with the hardware-accelerated blending units to ensure there are no more texture blending operations pending.

flushCachesForAsyncRead ensures that any changes written by the CPU are flushed so that subsequent asynchronous reads get the correct up-to-date memory data. If the given hardware requires this, the function must call the cache maintenance routine that the platform provides for the given memory range.
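Syncing unconditionally in waitUntilAsyncReadFinished is the simplest approach. A port that keeps track of the memory range read by its most recently queued texture operation could skip the sync when that range doesn't overlap the one passed in. The sketch below shows only the overlap test. ByteRange and overlaps are hypothetical names, and the tracking itself is left to the platform port.

```cpp
#include <cstdint>

// Half-open byte range [begin, end).
struct ByteRange
{
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true if the two ranges share at least one byte.
inline bool overlaps(const ByteRange &a, const ByteRange &b)
{
    return a.begin < b.end && b.begin < a.end;
}
```

Ranges that merely touch, such as [0x1000, 0x2000) and [0x2000, 0x3000), don't overlap, so no sync would be needed for them under this scheme.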

That covers all the pieces necessary to implement hardware accelerated blending, to make Qt Quick Ultralite UIs run smoothly and efficiently on the platform that's being ported to.
