AxeCrafted Blog

Wandering thoughts from a wandering mind.

An Attempt at Paint-by-Numbers - Part 2

Posted on January 1st, 2025

Let's recap where this epic tale halted: we have a saturated, color-reduced, brush-stroked image that we want to transform into a "Paint by Number" composition. The main challenge is the abundance of small regions—if we leave them as is, the final result will be a mess of tiny pixel areas, nearly impossible to label or paint.

Attempt 1 - Morphological Transformations

I conferred with my genius bird-photographer brother, and he recommended Morphological Transformations - fundamental operations in image processing used to manipulate the structure of an image by altering its "shape" (hence morphological). To apply them, you need a single-channel image - typically binary (black and white) - and a "structuring element" (a kernel).

There are two essential transformations and two that combine them:


  • Erosion: shrinks the white regions (foreground) of the image by eroding boundaries.
  • Dilation: expands the white regions by growing the boundaries.
  • Opening: a combination of erosion followed by dilation - useful for preserving object shapes while removing noise.
  • Closing: dilation followed by erosion - helpful for closing small gaps and holes in objects.

Let's visualize it to make more sense of the transformations.

Basic Morphological Transformations Demo

Erosion and dilation are relatively straightforward. Opening and closing can be trickier, so think of them in terms of their base transformations:


  • Opening erodes small objects first, then dilates to restore the main shapes, but the tiny objects that disappeared remain gone.
  • Closing does the opposite: it first fills holes and gaps, then erodes only the parts that were expanded.

Compound Morphological Transformations Demo

This problem seems suspiciously likely to benefit from an opening operation that removes small pixels and regions. Because morphological transformations operate on single-channel images, one option is to apply them to the Value channel in HSV, then merge the transformed Value channel back with the original Hue and Saturation channels. This merge can produce colors that fall outside the palette, so one way to return to the 16-color palette is to assign each pixel to its nearest palette color, as in part 1.

import cv2
import numpy as np

def applyMorphologicalTransformationHSV(image, KERNEL_SIZE=3, ITERATIONS=3):
    # Convert to HSV color space
    hsv = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_RGB2HSV)
    h, s, v = cv2.split(hsv)

    kernelSize = (KERNEL_SIZE, KERNEL_SIZE)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, kernelSize)

    # Apply morphological operations to the Value channel
    eroded = cv2.erode(v, kernel, iterations=ITERATIONS)
    dilated = cv2.dilate(eroded, kernel, iterations=ITERATIONS)

    # Merge channels back
    hsvProcessed = cv2.merge([h, s, dilated])
    processedImage = cv2.cvtColor(hsvProcessed, cv2.COLOR_HSV2RGB)

    return processedImage

from scipy.spatial import KDTree

def mapColorsToPalette(image, palette):
    # palette: (N, 3) NumPy array of RGB colors
    # Reshape image for efficient color mapping
    h, w, c = image.shape
    reshaped_image = image.reshape(-1, 3)

    # Use KDTree for efficient nearest neighbor search
    tree = KDTree(palette)
    _, indices = tree.query(reshaped_image)
    closest_colors = palette[indices]

    # Reshape back to original image dimensions
    mapped_image = closest_colors.reshape(h, w, 3)
    return mapped_image
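As a sanity check, the nearest-color snap can be exercised on a tiny input (the 3-color palette and 2x2 "image" here are made up for illustration):

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical 3-color palette: black, white, red
palette = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0]])

# A tiny 2x2 "image" with off-palette colors
image = np.array([[[10, 10, 10], [250, 5, 5]],
                  [[240, 240, 240], [200, 30, 20]]], dtype=np.uint8)

# Same KDTree nearest-neighbor query as mapColorsToPalette
tree = KDTree(palette)
_, indices = tree.query(image.reshape(-1, 3))
snapped = palette[indices].reshape(image.shape)
# near-black -> black, near-red -> red, near-white -> white
```

The KDTree query returns, for each pixel, the index of its closest palette entry in Euclidean RGB distance, so indexing the palette with those indices snaps the whole image at once without any per-pixel loop.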

Below is the image after applying the transformation. Different iterations and kernel sizes will produce varying outcomes—there's a balance between "clean regions" and "image definition." Enlarging the kernel size creates more uniform areas but makes the image less recognizable.

Hallstatt Image After Morphological Transformation

Ok, so did this work? Long story short: no. After applying the same connected components method from part 1, the resulting image has a lot more than 1000 regions. Increasing the kernel size reduces the region count but reduces detail so much that the image barely resembles the original.

Attempt 2 - Brute Force?

The thinking now is: if there is a way to find every region (and there is, using connected components), one could iterate through every region, calculate the region area and, if the area is smaller than a threshold, fill that area with the dominant color outside the area. In pseudocode:


  • Iterate through all colors.
    1. Select small regions.
    2. For each of these small regions:
      1. Identify surrounding pixels.
      2. Calculate the most frequent surrounding color.
      3. Change the region to this color in the original image.

This poses some challenges, the main one being processing speed. First of all, finding the connected components for each of the 16 colors is already a slow operation. On top of that, this image is 1440 x 2560 pixels in RGB - nearly 3.7 million pixels per channel - so any per-pixel work quickly becomes computationally intensive.

A naive approach (nested loops, full-array processing) took more than 10 minutes per color, though results were promising. Optimizing the procedure led to much faster performance:


  • Parallel Processing - using ThreadPoolExecutor to find connected components for each color in parallel speeds up the workflow considerably.
  • Sub-Arrays - each small region can be reduced to just its pixels and its bounding-box coordinates (x0, x1, y0, y1) - smaller arrays reduce compute times significantly.
  • Avoiding Direct Loops - techniques like convolution or OpenCV methods avoid iterative pixel-by-pixel loops - for example, extracting the pixels just outside a region with a dilation operation.

The final code looks like this:

import cv2
import numpy as np
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def processSingleColor(image, color, minArea=500, margin=5):
    h, w, c = image.shape
    # Create a binary mask for the current color
    colorMask = np.all(image == color, axis=-1).astype(np.uint8)

    # Find connected components in the binary mask
    numLabels, labels, stats, centroids = cv2.connectedComponentsWithStats(
        colorMask, connectivity=4
    )

    # Collect small regions
    smallRegions = []
    for i in range(1, numLabels):
        area = stats[i, cv2.CC_STAT_AREA]
        if area < minArea:
            x, y, wr, hr, _ = stats[i]
            # Add margin to the bounding-box
            x0 = max(x - margin, 0)
            y0 = max(y - margin, 0)
            x1 = min(x + wr + margin, w)
            y1 = min(y + hr + margin, h)
            # Collect the region data
            smallRegions.append(
                {
                    'label': i,
                    'bbox': (x0, y0, x1, y1)
                }
            )

    return {"color": color, "smallRegions": smallRegions, "labels": labels}

def processSmallRegion(image, labels, regionInfo, color):
    label = regionInfo['label']
    x0, y0, x1, y1 = regionInfo['bbox']

    # Extract the sub-image
    subImage = image[y0:y1, x0:x1].copy()

    # Extract the sub-region mask
    subLabels = labels[y0:y1, x0:x1]
    regionMask = (subLabels == label).astype(np.uint8)

    # Create the mask for the dilated region
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    dilatedRegionMask = cv2.dilate(regionMask, kernel, iterations=1)
    borderMask = dilatedRegionMask & (~regionMask)

    # Get surrounding pixels
    surroundingPixels = subImage[borderMask.astype(bool)]

    # Exclude pixels that match the region color
    mask = ~np.all(surroundingPixels == color, axis=1)
    surroundingPixels = surroundingPixels[mask]

    if surroundingPixels.size == 0:
        dominantColor = color
    else:
        surroundingPixelsTuples = [tuple(pixel) for pixel in surroundingPixels]
        dominantColor = Counter(surroundingPixelsTuples).most_common(1)[0][0]

    return {
        "coordinates": [x0, y0, x1, y1],
        "regionMask": regionMask,
        "dominantColor": dominantColor
    }

def getRegionsParallel(image, palette, minArea=500, margin=5):
    regions = []
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(processSingleColor, image, color, minArea, margin)
            for color in palette
        ]
        for future in futures:
            regions.append(future.result())

    return [
        {"color": color["color"], "labels": color["labels"], "regionInfo": region}
        for color in regions
        for region in color["smallRegions"]
    ]

def getUpdatesParallel(image, regions):
    updates = []
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(processSmallRegion, image, region["labels"], region["regionInfo"], region["color"])
            for region in regions
        ]
        for future in futures:
            updates.append(future.result())

    return updates

def fillSmallRegions(image, palette, minArea=500, margin=10):
    regions = getRegionsParallel(image, palette, minArea=minArea, margin=margin)
    updates = getUpdatesParallel(image, regions)

    filledImage = image.copy()
    # Apply the updates to the original image
    for update in updates:
        x0, y0, x1, y1 = update["coordinates"]
        regionMask = update["regionMask"].astype(bool)
        dominantColor = np.array(update["dominantColor"], dtype=filledImage.dtype)

        # Apply dominant color to the masked area
        filledImage[y0:y1, x0:x1][regionMask] = dominantColor

    return filledImage

Ultimately, each color is searched in parallel to find connected components - any region smaller than a MIN_AREA parameter is collected, along with its bounding-box. Each region is also processed in parallel: the bounding-box is used to copy that area from the original image, a mask is created, and a dilation operation is used to discover the dominant surrounding color. Once computed, all updates are applied to the image. Although parallelizing this final update step is trickier (due to concurrent memory access), a single loop is usually fast enough.

Final processing time: 3.6 seconds. The result speaks for itself:

Hallstatt Image After Removing Small Regions

This is really good. The small regions are gone and the image preserves its overall shape. Maybe a Paint by Number image is just a few steps away?

Maybe Paint By Number

Generating the final Paint by Number is straightforward: use connected components to find regions, label each color (e.g., green = 1, light green = 2, etc.), compute each region’s center to place the color index, and draw contours.

Let's skip this code for now, because we still have a problem. This is the result:

Hallstatt Image With Paint By Numbers Effect - First Try, Still Not Quite There

A couple of issues and the plan to solve them:


  • Presence of Small Regions Not Removed - this is probably due to the dominant-color step combined with 4-connectivity, which sometimes makes the chosen dominant color match the region's own color, so the fill changes nothing. Possible fixes:
    • Simple Iteration - apply the small region removal step multiple times and hope the remaining regions are eliminated.
    • Exclude Region Color From Dominant Color Search - use the second dominant color in case of match.
  • Centering Numbers - this is a tricky one. The first idea that comes to mind is using the midpoint of the region's bounding-box - this fails for oddly shaped regions. The second idea is the region centroid - think of it as the balance point of the region, like a center of mass, where you could balance the sheet on the tip of a pin without it tipping over. Much like the first approach, this does not solve every case - think of a donut: its centroid falls in the hole, outside the region itself. Researching how to solve this led to:
    • Pole of Inaccessibility - a concept from geography: the point in a landmass farthest from the coastline. Applied here, it is the point farthest from all edges of a region, which guarantees it always lies inside the region (as long as there are no holes). The challenge is implementing it.
  • Thin Large Regions - this defect shows in the right-side house and lamp post borders - this is a thin but long connected region - since the previous algorithm uses the area as the criteria for removing a region, these long but thin regions are not removed because their area is greater than the threshold. Increasing the threshold is not an option because it will also remove other regions that are not supposed to be removed. Possible approaches:
    • Bounding-Box Height-Width Ratio - these regions may have a big height but a small width or vice-versa - one idea is to use the ratio between those values and a threshold to decide whether a region is removed - if the ratio is too small (height is much bigger than width, for example), remove the region. The problem is that regions may be oddly shaped. For example, an L-shaped region wouldn't be removed.
    • Region Area to Bounding-Box Area Ratio - to compensate for oddly shaped regions, an alternative is the ratio between the region area and its bounding-box area. An L-shaped region has a large bounding-box, but its actual area is small; if the ratio is too small, remove the region.

In part 3, we'll dive deeper to refine these methods and tackle the remaining challenges. Until then, let us pray.

Photo of Leonardo Machado

Leonardo Machado

Some guy from Brazil. Loves his wife, cats, coffee, and data. Often found trying to make sense of numbers or cooking something questionable.