AxeCrafted Blog


An Attempt at Paint-by-Numbers - Part 3

Posted on January 25th, 2025

In the previous parts of this series, we tried various techniques to simplify images and achieve a Paint-by-Numbers effect. We saw how morphological transformations didn’t work as hoped but discovered that connected components could effectively eliminate small, unwanted regions. With that done, we’re now focusing on the problem of “thin regions” that still linger in our image. By the end of this post, we’ll also dive into boundary detection and labeling to give our picture that classic Paint-by-Numbers look.

Removing Thin Regions

When it comes to removing thin regions, two initial ideas might spring to mind:


  • Width-to-height ratio. A region is considered “thin” if its width is much larger than its height, or vice versa.
  • Region area to bounding box area. A low ratio means the region occupies only a small fraction of its bounding box, suggesting a thin shape.

First, consider the ratio of width to height using a region’s bounding box. If the bounding-box width is much greater than its height (or vice versa), the region qualifies as “thin.” While this sounds promising, it’s often less effective in practice. Regions tend to be irregularly shaped or angled diagonally, and even a “straight line” might not align horizontally or vertically. As a result, this ratio alone ends up eliminating only a few problematic regions.

Next, we can compare a region’s pixel area to the area of its bounding box. A small ratio indicates that the region occupies only a fraction of its bounding box, marking it as “thin.” This works well for filtering out shapes like L-shaped regions. However, it struggles with complex, sparse forms or regions full of holes — such as leaves in a tree — where the total area is small but spread out in multiple clusters, which aren’t strictly thin.

Example of a sparse image with small area ratio Example of a thin region with a large area ratio

Take a look at these two examples. The region on the left is a “sparse region” from the trees, with a region-to-bounding-box area ratio of 26% — meaning 74% of its bounding box is effectively “background.” Meanwhile, the region on the right is extracted from the fence area, zoomed in, and only a few pixels wide, yet it has a 37% ratio. If we used a 30% threshold, for instance, we’d mistakenly remove the larger region on the left (which we want to keep) but fail to remove the smaller, thin region on the right. One could try combining multiple features — such as total area, width-to-height ratios, or shape descriptors — but the result often feels ad hoc and might only work for this specific image.
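
For completeness, here is a rough sketch (not code from the final pipeline) of how these two heuristics could be computed from the stats array returned by cv2.connectedComponentsWithStats; the threshold values are purely illustrative:

import cv2

def isThinCandidate(stats, label, aspectThreshold=5.0, fillThreshold=0.3):
    # Bounding-box dimensions and pixel area of one connected component
    w = stats[label, cv2.CC_STAT_WIDTH]
    h = stats[label, cv2.CC_STAT_HEIGHT]
    area = stats[label, cv2.CC_STAT_AREA]

    # Heuristic 1: very elongated bounding box (width-to-height ratio)
    aspect = max(w, h) / max(min(w, h), 1)

    # Heuristic 2: region covers only a small fraction of its bounding box
    fill = area / float(w * h)

    return aspect > aspectThreshold or fill < fillThreshold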

The logical approach when no single elegant solution emerges? Sometimes, you just brute force it.

Removing Thin Regions With Scans

When we slice the image horizontally into 1-pixel rows, each row becomes a series of colored segments with distinct lengths. If any segment is shorter than a given threshold, we replace it with the dominant color on its left or right — “dominant” here means the neighboring segment with the larger width. Let’s examine one such region as an example.

Thin Strip With 1 Pixel Height as Example

Counting colors and regions from left to right, we see a light blue region, a gray region, a dark brown region, a small “beige” region that we want to remove, and a light brown region. The beige strip’s left neighbor is the large dark brown region, while its right neighbor is the smaller light brown region. Because the left-side region is larger, we consider it the dominant color and use it to replace the beige strip. In the end, this simple brute-force technique solves our problem.

The algorithm works by iterating over each row and creating an offset version shifted by one pixel. By comparing the original row with its offset, we detect only the transition points, then mark the boundaries at the first and last pixel indices. For each segment between these boundaries, we apply the same removal logic described above. If we need a vertical scan, we can transpose the image, run the same procedure, and then transpose it back.

The code that implements this process is shown below:

# Imports used in the snippets below
import cv2
import numpy as np

def removeThinRegions(image, minLength=5, direction='horizontal'):
    h, w, _ = image.shape

    # For vertical scanning, transpose the image to reuse horizontal logic
    transposed = False
    if direction.lower() == 'vertical':
        image = np.transpose(image, (1, 0, 2))  # Shape becomes (W, H, 3)
        h, w = w, h
        transposed = True

    for rowIndex in range(h):
        # Extract current row (or column, if transposed)
        row = image[rowIndex]

        # Find indexes for color transitions between regions
        transitions = np.any(row[:-1] != row[1:], axis=1)
        boundaries = np.where(transitions)[0] + 1
        boundaries = np.concatenate(([0], boundaries, [w]))

        # Compile all "strips" of same color in current row/column
        strips = []
        for boundaryIndex in range(len(boundaries) - 1):
            start = boundaries[boundaryIndex]
            end = boundaries[boundaryIndex + 1]
            stripLength = end - start
            strips.append({'start': start, 'end': end, 'length': stripLength, 'color': row[start]})

        # Identify strips smaller than threshold and define dominant color based on left/right neighbors
        stripsToRecolor = []

        for i, strip in enumerate(strips):
            if strip['length'] < minLength:
                # Determine left and right neighbors
                leftStrip = strips[i - 1] if i > 0 else None
                rightStrip = strips[i + 1] if i < len(strips) - 1 else None

                # Get lengths of neighboring runs
                leftLength = leftStrip['length'] if leftStrip else 0
                rightLength = rightStrip['length'] if rightStrip else 0

                # Skip strips that have no neighbors (the whole row is one short strip)
                if leftStrip is None and rightStrip is None:
                    continue

                # Choose fill color based on the longer neighbor run
                if leftLength >= rightLength:
                    fillColor = leftStrip['color']
                else:
                    fillColor = rightStrip['color']

                stripsToRecolor.append((strip['start'], strip['end'], fillColor))

        # Recolor the identified thin strips
        for (start, end, fillColor) in stripsToRecolor:
            image[rowIndex, start:end] = fillColor

    # If the image was transposed, transpose it back to original orientation
    if transposed:
        image = np.transpose(image, (1, 0, 2))

    return image

def removeThinStrips2D(image, minLength=5, iterations=2):
    # Apply horizontal and vertical scan iteratively
    out = image.copy()
    for i in range(iterations):
        out = removeThinRegions(out, minLength=minLength, direction='horizontal')
        out = removeThinRegions(out, minLength=minLength, direction='vertical')
    return out
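
As a quick, self-contained sanity check (a toy example, not from the original post), a thin red stripe in the middle of a solid green image gets absorbed by its wider neighbors:

toy = np.zeros((20, 21, 3), dtype=np.uint8)
toy[:, :] = (0, 255, 0)      # solid green background
toy[:, 10:12] = (255, 0, 0)  # 2-pixel-wide red stripe

cleaned = removeThinStrips2D(toy, minLength=5, iterations=1)
print(np.unique(cleaned.reshape(-1, 3), axis=0))  # only the green color remains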

After applying this process, you may notice a few artifacts, because each scan only looks at the current row. However, a second pass with the earlier small-area removal algorithm usually clears them up. The end result, achieved by running a 7-pixel threshold with 3 iterations and then 3 rounds of small-region removal, looks like this:

Hallstatt Image With Thin Regions Removed

Fantastic!

Now, let’s move on to boundary detection, labeling, and label positioning.

Region and Boundary Detection

Regions

Here’s a quick recap: we now have an image divided into clearly defined regions of reasonable size, with a limited color palette. Our goal is to generate a black-and-white version that displays each region’s boundary, then label each region by color number (for instance, blue = 1, brown = 2, and so forth). Eventually, we’ll be able to paint over it, just like a real paint-by-numbers image.

To break down the process: first, detect regions (via Connected Components); second, draw contours between these regions; third, find each region’s “pole of inaccessibility”; and fourth, use that pole to place a label that identifies the region’s color. We can even scale the label’s text size based on the area of the region — smaller labels for smaller areas, bigger labels for larger ones.

We’ve explored Connected Components before. Essentially, we create a mask for each color in the palette, then run connectedComponentsWithStats to locate and measure each contiguous region (using 4-connectivity: up, down, left, right). From there, we gather region areas for reference and build a mapping of labels to colors.

def generateLabelImageAndMapping(image, palette):
    h, w, c = image.shape
    # Prepare output
    labelImage = np.zeros((h, w), dtype=np.int32)
    labelToPaletteIndex = []
    labelAreas = []

    currentLabel = 1
    # For each unique color, find connected components
    for paletteIndex, color in enumerate(palette):
        colorMask = cv2.inRange(image, color, color)
        numLabels, labels, stats, _ = cv2.connectedComponentsWithStats(
            colorMask,
            connectivity=4
        )

        # Label each region found (skip 0 = background)
        mask = labels > 0
        labelImage[mask] = np.arange(currentLabel, currentLabel + numLabels - 1)[labels[mask] - 1]
        labelToPaletteIndex.extend([paletteIndex] * (numLabels - 1))
        labelAreas.extend(stats[1:, cv2.CC_STAT_AREA])
        currentLabel += numLabels - 1

    # Convert labelToPaletteIndex and labelAreas to dictionaries
    labelToPaletteIndex = {label: paletteIndex for label, paletteIndex in enumerate(labelToPaletteIndex, start=1)}
    labelAreas = {label: area for label, area in enumerate(labelAreas, start=1)}

    return labelImage, labelToPaletteIndex, labelAreas

A valuable optimization I’ve discovered is to use cv2.inRange for color masking instead of NumPy operations. It’s noticeably faster, so much so that I went back and updated other sections that needed masking. After running connected components, it might be tempting to loop through each label and apply new masks repeatedly, but in practice, it’s more efficient to work with arrays and minimize explicit loops.
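
For reference, the two masking approaches compared here look roughly like this (the function names are illustrative, not part of the pipeline); both produce the same 0/255 mask for an exact color match:

def colorMaskNumpy(image, color):
    # Pure NumPy: compare every pixel against the color across all three channels
    return np.all(image == color, axis=2).astype(np.uint8) * 255

def colorMaskOpenCV(image, color):
    # cv2.inRange with identical lower and upper bounds gives the same mask, faster
    return cv2.inRange(image, color, color)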

Boundaries

For drawing contours, OpenCV does most of the heavy lifting with its findContours function. Using RETR_LIST ensures you get all boundaries, including holes and nested contours, while CHAIN_APPROX_SIMPLE simplifies the contour data and speeds things up—an ideal combination when you don’t need pixel-perfect edges.

def drawContour(image, palette, CONTOUR_THICKNESS=1):
    h, w, _ = image.shape

    # Start with a blank white image
    outputImage = np.full((h, w, 3), 255, dtype=np.uint8)
    boundaryMask = np.zeros((h, w), dtype=np.uint8)

    # For each palette color, find and draw its contours on boundaryMask
    for color in palette:
        colorMask = cv2.inRange(image, color, color)
        contours, _ = cv2.findContours(colorMask, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        cv2.drawContours(boundaryMask, contours, -1, 255, CONTOUR_THICKNESS)

    # Convert boundaryMask into black contours on our white image
    outputImage[boundaryMask == 255] = (0, 0, 0)

    return outputImage

The result is FINALLY looking like a Paint-by-Numbers image:

Hallstatt Image With Boundaries Drawn

Poles of Inaccessibility and Labeling

The “pole of inaccessibility” is a geographical concept describing the point in a region that’s farthest from its boundaries. This makes it perfect for label placement, because a simple geometric center doesn’t always guarantee the label will remain inside the region.

One way to picture this is to imagine igniting every boundary of the region at once — the fire gradually converges on the point most shielded from the flames. Equivalently, you can keep applying erosion until only a single pixel remains. Conveniently, OpenCV’s distanceTransform gives us exactly what we need: it computes, for every pixel, the distance to the nearest boundary, so the pole is simply the location of the maximum value. A good illustration is the map of my home state, Minas Gerais:

Black and White map of the state of Minas Gerais Map of the state of Minas Gerais with distance transform applied
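
To see why the centroid alone is not enough, here is a small synthetic sketch (not from the original post): for a U-shaped region, the center of mass falls in the empty notch, while the maximum of the distance transform is always a point inside the region.

mask = np.zeros((200, 200), dtype=np.uint8)
mask[40:160, 40:70] = 255     # left arm
mask[40:160, 130:160] = 255   # right arm
mask[130:160, 40:160] = 255   # bottom bar

# Centroid via image moments: lands in the notch, where the mask is 0
m = cv2.moments(mask, binaryImage=True)
centroid = (int(m['m10'] / m['m00']), int(m['m01'] / m['m00']))

# Pole of inaccessibility: the pixel farthest from any boundary
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
_, _, _, pole = cv2.minMaxLoc(dist)

print(centroid, mask[centroid[1], centroid[0]])  # a point outside the region (value 0)
print(pole, mask[pole[1], pole[0]])              # a point inside the region (value 255)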

Processing everything in parallel further accelerates these calculations, and adjusting the label’s font size and thickness based on region size turns out to be surprisingly straightforward.

def findCoordinate(labelImage, label, paletteIndex, labelAreas, minFontScale=0.5, maxFontScale=1.5, minThickness=1, maxThickness=5):
    regionMask = cv2.inRange(labelImage, label, label)

    # Distance transform to find a point farthest from the boundary
    distTransform = cv2.distanceTransform(regionMask, cv2.DIST_L2, 5)
    _, _, _, maxLoc = cv2.minMaxLoc(distTransform)
    centerX, centerY = maxLoc

    # Retrieve area for dynamic text scaling
    area = labelAreas[label]
    minArea, maxArea = min(labelAreas.values()), max(labelAreas.values())
    # Scale font size and thickness by area (guard against all regions having the same area)
    areaFraction = (area - minArea) / float(maxArea - minArea) if maxArea > minArea else 0.0
    fontScale = minFontScale + areaFraction * (maxFontScale - minFontScale)
    thickness = int(minThickness + (areaFraction * (maxThickness - minThickness)))

    return str(paletteIndex), (centerX, centerY), fontScale, thickness

from concurrent.futures import ThreadPoolExecutor

def getCoordinatesParallel(labelImage, labelToPaletteIndex, labelAreas):
    updates = []
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(findCoordinate, labelImage, label, paletteIndex, labelAreas)
            for label, paletteIndex in labelToPaletteIndex.items()
        ]
        for future in futures:
            updates.append(future.result())

    return updates

Finally, the only remaining step is to tackle the (slightly unnecessary) complexity of rendering the text. Since OpenCV treats the specified point as the “bottom-left origin,” we use the font size to recalculate a true center position for the text. We also add extra logic to prevent clipping along the image borders, ensuring each label is both centered and fully visible.

def applyLabels(image, updates):
    def fixPoint(imageDimension, newPoint, textSize, coordinate):
        minBorder = int(textSize / 2) if coordinate == 'x' else int(textSize)
        maxBorder = imageDimension - int(textSize) if coordinate == 'x' else imageDimension - int(textSize / 2)
        # Clamp the point so the text stays fully inside the image borders
        return min(max(newPoint, minBorder), maxBorder)

    imageCopy = image.copy()
    h, w, _ = imageCopy.shape
    for update in updates:
        text = update[0]
        x, y = update[1]
        fontScale = update[2]
        thickness = update[3]
        font = cv2.FONT_HERSHEY_SIMPLEX

        # Calculate the size of the text
        (textWidth, textHeight), baseline = cv2.getTextSize(text, font, fontScale, thickness)

        # Calculate the new origin point to center the text (putText treats it as the bottom-left corner)
        centerX = fixPoint(w, int(x - textWidth / 2), textWidth, 'x')
        centerY = fixPoint(h, int(y + textHeight / 2), textHeight, 'y')

        cv2.putText(
            imageCopy, 
            text, 
            (centerX, centerY), 
            font, 
            fontScale, 
            (0, 0, 0), 
            thickness, 
            cv2.LINE_AA
        )

    return imageCopy
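
Putting the labeling pieces together, the whole stage boils down to a few calls like the sketch below; cleanedImage and palette are placeholder names for the simplified image and color palette produced in the earlier steps:

labelImage, labelToPaletteIndex, labelAreas = generateLabelImageAndMapping(cleanedImage, palette)

# White canvas with black region boundaries
contourImage = drawContour(cleanedImage, palette)

# One label text, position, font scale and thickness per region
updates = getCoordinatesParallel(labelImage, labelToPaletteIndex, labelAreas)

# Final paint-by-numbers template
paintByNumbers = applyLabels(contourImage, updates)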

The (almost) Final Result

Ready for the result?

Hallstatt Image With Paint By Number Effect

I'm extremely satisfied with it. Let's end this exploration here, but we'll come back for a (hopefully) final part, where we test the method on different images, discuss operation order, measure running times, and explore further optimizations. But for now, rejoice, my friend.

Photo of Leonardo Machado

Leonardo Machado

Some guy from Brazil. Loves his wife, cats, coffee, and data. Often found trying to make sense of numbers or cooking something questionable.