An Attempt at Paint-by-Numbers - Part 3
Posted on January 25th, 2025
In the previous parts of this series, we tried various techniques to simplify images and achieve a Paint-by-Numbers effect. We saw how morphological transformations didn’t work as hoped but discovered that connected components could effectively eliminate small, unwanted regions. With that done, we’re now focusing on the problem of “thin regions” that still linger in our image. By the end of this post, we’ll also dive into boundary detection and labeling to give our picture that classic Paint-by-Numbers look.
Removing Thin Regions
When it comes to removing thin regions, two initial ideas might spring to mind:
- Width-to-height ratio. A region is considered “thin” if its width is much larger than its height, or vice versa.
- Region area to bounding box area. A low ratio means the region occupies only a small fraction of its bounding box, suggesting a thin shape.
First, consider the ratio of width to height using a region’s bounding box. If the bounding-box width is much greater than its height (or vice versa), the region qualifies as “thin.” While this sounds promising, it’s often less effective in practice. Regions tend to be irregularly shaped or angled diagonally, and even a “straight line” might not align horizontally or vertically. As a result, this ratio alone ends up eliminating only a few problematic regions.
Next, we can compare a region’s pixel area to the area of its bounding box. A small ratio indicates that the region occupies only a fraction of its bounding box, marking it as “thin.” This works well for filtering out shapes like L-shaped regions. However, it struggles with complex, sparse forms or regions full of holes — such as leaves in a tree — where the total area is small but spread out in multiple clusters, which aren’t strictly thin.


Take a look at these two examples. The region on the left is a “sparse region” from the trees, with a region-to-bounding-box area ratio of 26% — meaning 74% of its bounding box is effectively “background.” Meanwhile, the region on the right is extracted from the fence area, zoomed in, and only a few pixels wide, yet it has a 37% ratio. If we used a 30% threshold, for instance, we’d mistakenly remove the larger region on the left (which we want to keep) but fail to remove the smaller, thin region on the right. One could try combining multiple features — such as total area, width-to-height ratios, or shape descriptors — but the result often feels ad hoc and might only work for this specific image.
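To make these two heuristics concrete, here is a minimal sketch of how the ratios could be computed from the stats returned by connectedComponentsWithStats. The function name and the threshold values are illustrative assumptions, not parameters used anywhere in this pipeline:

import cv2
import numpy as np

def thinRegionCandidates(colorMask, maxAspectRatio=10.0, minFillRatio=0.3):
    # colorMask: uint8 mask (255 where the color is present)
    numLabels, labels, stats, _ = cv2.connectedComponentsWithStats(colorMask, connectivity=4)
    candidates = []
    for label in range(1, numLabels):  # skip background label 0
        boxW = stats[label, cv2.CC_STAT_WIDTH]
        boxH = stats[label, cv2.CC_STAT_HEIGHT]
        area = stats[label, cv2.CC_STAT_AREA]
        # Idea 1: bounding-box width-to-height ratio
        aspectRatio = max(boxW, boxH) / max(1, min(boxW, boxH))
        # Idea 2: region area relative to bounding-box area
        fillRatio = area / float(boxW * boxH)
        if aspectRatio > maxAspectRatio or fillRatio < minFillRatio:
            candidates.append(label)
    return candidates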
The logical approach when no single elegant solution emerges? Sometimes, you just brute force it.
Removing Thin Regions With Scans
When we slice the image horizontally into 1-pixel rows, each row becomes a series of colored segments with distinct lengths. If any segment is shorter than a given threshold, we replace it with the dominant color on its left or right — “dominant” here means the neighboring segment with the larger width. Let’s examine one such region as an example.
Counting colors and regions from left to right reveals four segments: a light blue region, a gray region, a dark brown region, and, finally, a small “beige” region that we want to remove. On the left, there’s a large dark brown region, while on the right, there’s a smaller light brown region. Because the left-side region is larger, we consider it the dominant color and use it to replace the beige strip. In the end, this simple brute-force technique solves our problem.
The algorithm works by iterating over each row and creating an offset version shifted by one pixel. By comparing the original row with its offset, we detect only the transition points, then mark the boundaries at the first and last pixel indices. For each segment between these boundaries, we apply the same removal logic described above. If we need a vertical scan, we can transpose the image, run the same procedure, and then transpose it back.
The code that implements this process is shown below:
import numpy as np

def removeThinRegions(image, minLength=5, direction='horizontal'):
    h, w, _ = image.shape
    # For vertical scanning, transpose the image to reuse the horizontal logic
    transposed = False
    if direction.lower() == 'vertical':
        image = np.transpose(image, (1, 0, 2))  # Shape becomes (W, H, 3)
        h, w = w, h
        transposed = True
    for rowIndex in range(h):
        # Extract current row (or column, if transposed)
        row = image[rowIndex]
        # Find indexes for color transitions between regions
        transitions = np.any(row[:-1] != row[1:], axis=1)
        boundaries = np.where(transitions)[0] + 1
        boundaries = np.concatenate(([0], boundaries, [w]))
        # Compile all "strips" of same color in current row/column
        strips = []
        for boundaryIndex in range(len(boundaries) - 1):
            start = boundaries[boundaryIndex]
            end = boundaries[boundaryIndex + 1]
            stripLength = end - start
            strips.append({'start': start, 'end': end, 'length': stripLength, 'color': row[start]})
        # Identify strips smaller than threshold and define dominant color based on left/right neighbors
        stripsToRecolor = []
        for i, strip in enumerate(strips):
            if strip['length'] < minLength:
                # Determine left and right neighbors
                leftStrip = strips[i - 1] if i > 0 else None
                rightStrip = strips[i + 1] if i < len(strips) - 1 else None
                # Skip rows made of a single short strip (no neighbor to copy from)
                if leftStrip is None and rightStrip is None:
                    continue
                # Get lengths of neighboring runs
                leftLength = leftStrip['length'] if leftStrip else 0
                rightLength = rightStrip['length'] if rightStrip else 0
                # Choose fill color based on the longer neighboring run
                if leftLength >= rightLength:
                    fillColor = leftStrip['color']
                else:
                    fillColor = rightStrip['color']
                stripsToRecolor.append((strip['start'], strip['end'], fillColor))
        # Recolor the identified thin strips
        for (start, end, fillColor) in stripsToRecolor:
            image[rowIndex, start:end] = fillColor
    # If the image was transposed, transpose it back to the original orientation
    if transposed:
        image = np.transpose(image, (1, 0, 2))
    return image

def removeThinStrips2D(image, minLength=5, iterations=2):
    # Apply the horizontal and vertical scans iteratively
    out = image.copy()
    for i in range(iterations):
        out = removeThinRegions(out, minLength=minLength, direction='horizontal')
        out = removeThinRegions(out, minLength=minLength, direction='vertical')
    return out
After applying this process, you may notice a few artifacts, because each scan only looks at the current row. However, a second pass with the earlier small-area removal algorithm usually clears them up. The end result, achieved by running a 7-pixel threshold with 3 iterations and then 3 rounds of small-region removal, looks like this:
Fantastic!
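For reference, a driver that reproduces the result above would look roughly like the sketch below. The smallRegionRemoval call is a stand-in for the small-area removal routine from the previous part; its actual name and signature in your code may differ:

# Thin-region removal with a 7-pixel threshold, 3 iterations
cleaned = removeThinStrips2D(image, minLength=7, iterations=3)
# Then 3 rounds of small-area removal (function name is a placeholder)
for _ in range(3):
    cleaned = smallRegionRemoval(cleaned)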
Now, let’s move on to boundary detection, labeling, and label positioning.
Region and Boundary Detection
Regions
Here’s a quick recap: we now have an image divided into clearly defined regions of reasonable size, with a limited color palette. Our goal is to generate a black-and-white version that displays each region’s boundary, then label each region by color number (for instance, blue = 1, brown = 2, and so forth). Eventually, we’ll be able to paint over it, just like a real paint-by-numbers image.
To break down the process: first, detect regions (via Connected Components); second, draw contours between these regions; third, find each region’s “pole of inaccessibility”; and fourth, use that pole to place a label that identifies the region’s color. We can even scale the label’s text size based on the area of the region — smaller labels for smaller areas, bigger labels for larger ones.
We’ve explored Connected Components before. Essentially, we create a mask for each color in the palette, then run connectedComponentsWithStats to locate and measure each contiguous region (using 4-connectivity: up, down, left, right). From there, we gather region areas for reference and build a mapping of labels to colors.
import cv2

def generateLabelImageAndMapping(image, palette):
    h, w, c = image.shape
    # Prepare output
    labelImage = np.zeros((h, w), dtype=np.int32)
    labelToPaletteIndex = []
    labelAreas = []
    currentLabel = 1
    # For each unique color, find connected components
    for paletteIndex, color in enumerate(palette):
        colorMask = cv2.inRange(image, color, color)
        numLabels, labels, stats, _ = cv2.connectedComponentsWithStats(
            colorMask,
            connectivity=4
        )
        # Label each region found (skip 0 = background)
        mask = labels > 0
        labelImage[mask] = np.arange(currentLabel, currentLabel + numLabels - 1)[labels[mask] - 1]
        labelToPaletteIndex.extend([paletteIndex] * (numLabels - 1))
        labelAreas.extend(stats[1:, cv2.CC_STAT_AREA])
        currentLabel += numLabels - 1
    # Convert labelToPaletteIndex and labelAreas to dictionaries keyed by label
    labelToPaletteIndex = {label: paletteIndex for label, paletteIndex in enumerate(labelToPaletteIndex, start=1)}
    labelAreas = {label: area for label, area in enumerate(labelAreas, start=1)}
    return labelImage, labelToPaletteIndex, labelAreas
A valuable optimization I’ve discovered is to use cv2.inRange for color masking instead of NumPy operations. It’s noticeably faster, so much so that I went back and updated other sections that needed masking. After running connected components, it might be tempting to loop through each label and apply new masks repeatedly, but in practice, it’s more efficient to work with arrays and minimize explicit loops.
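As a quick illustration, the two calls below produce equivalent masks for a single palette color; the cv2 version is the faster one, as noted above, though the exact gap depends on image size:

# Equivalent masks for one palette color (a 3-element BGR value):
# cv2.inRange yields uint8 values of 0/255, the NumPy version yields booleans.
maskCv = cv2.inRange(image, color, color)
maskNp = np.all(image == color, axis=2)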
Boundaries
For drawing contours, OpenCV does most of the heavy lifting with its findContours function. Using RETR_LIST ensures you get all boundaries, including holes and nested contours, while CHAIN_APPROX_SIMPLE simplifies the contour data and speeds things up—an ideal combination when you don’t need pixel-perfect edges.
def drawContour(image, palette, contourThickness=1):
    h, w, _ = image.shape
    # Start with a blank white image
    outputImage = np.full((h, w, 3), 255, dtype=np.uint8)
    boundaryMask = np.zeros((h, w), dtype=np.uint8)
    # For each palette color, find its contours and draw them on boundaryMask
    for color in palette:
        colorMask = cv2.inRange(image, color, color)
        contours, _ = cv2.findContours(colorMask, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        cv2.drawContours(boundaryMask, contours, -1, 255, contourThickness)
    # Convert boundaryMask into black contours on our white image
    outputImage[boundaryMask == 255] = (0, 0, 0)
    return outputImage
The result is FINALLY looking like a Paint By Number image:
Poles of Inaccessibility and Labeling
The “pole of inaccessibility” is a geographical concept describing the point in a region that’s farthest from its boundaries. This makes it perfect for label placement, because a simple geometric center doesn’t always guarantee the label will remain inside the region.
One way to picture this is to imagine igniting every boundary of the region at once — the fire gradually converges on the point most shielded from the flames. Equivalently, you can keep applying erosion until only a single pixel remains. Conveniently, OpenCV implements this technique as distanceTransform. A good illustration is the map of my home state, Minas Gerais:


Processing everything in parallel further accelerates these calculations, and adjusting the label’s font size and thickness based on region size turns out to be surprisingly straightforward.
from concurrent.futures import ThreadPoolExecutor

def findCoordinate(labelImage, label, paletteIndex, labelAreas, minFontScale=0.5, maxFontScale=1.5, minThickness=1, maxThickness=5):
    regionMask = cv2.inRange(labelImage, label, label)
    # Distance transform to find a point farthest from the boundary
    distTransform = cv2.distanceTransform(regionMask, cv2.DIST_L2, 5)
    _, _, _, maxLoc = cv2.minMaxLoc(distTransform)
    centerX, centerY = maxLoc
    # Retrieve area for dynamic text scaling
    area = labelAreas[label]
    minArea, maxArea = min(labelAreas.values()), max(labelAreas.values())
    # Scale font size and thickness by area (guard against a single-region image)
    areaFraction = (area - minArea) / float(maxArea - minArea) if maxArea > minArea else 1.0
    fontScale = minFontScale + areaFraction * (maxFontScale - minFontScale)
    thickness = int(minThickness + (areaFraction * (maxThickness - minThickness)))
    return str(paletteIndex), (centerX, centerY), fontScale, thickness

def getCoordinatesParallel(labelImage, labelToPaletteIndex, labelAreas):
    updates = []
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(findCoordinate, labelImage, label, paletteIndex, labelAreas)
            for label, paletteIndex in labelToPaletteIndex.items()
        ]
        for future in futures:
            updates.append(future.result())
    return updates
Finally, the only remaining step is to tackle the (slightly unnecessary) complexity of rendering the text. Since OpenCV treats the specified point as the “bottom-left origin,” we use the font size to recalculate a true center position for the text. We also add extra logic to prevent clipping along the image borders, ensuring each label is both centered and fully visible.
def applyLabels(image, updates):
    def fixPoint(imageDimension, newPoint, textSize, coordinate):
        # Keep the text origin far enough from the image borders to avoid clipping
        minBorder = int(textSize / 2) if coordinate == 'x' else int(textSize)
        maxBorder = imageDimension - int(textSize) if coordinate == 'x' else imageDimension - int(textSize / 2)
        if newPoint <= minBorder:
            return minBorder
        if newPoint >= maxBorder:
            return maxBorder
        return newPoint
    imageCopy = image.copy()
    h, w, _ = imageCopy.shape
    for update in updates:
        text = update[0]
        x, y = update[1]
        fontScale = update[2]
        thickness = update[3]
        font = cv2.FONT_HERSHEY_SIMPLEX
        # Calculate the size of the text
        (textWidth, textHeight), baseline = cv2.getTextSize(text, font, fontScale, thickness)
        # putText uses the bottom-left corner as origin, so shift to center the text on (x, y)
        centerX = fixPoint(w, int(x - textWidth / 2), textWidth, 'x')
        centerY = fixPoint(h, int(y + textHeight / 2), textHeight, 'y')
        cv2.putText(
            imageCopy,
            text,
            (centerX, centerY),
            font,
            fontScale,
            (0, 0, 0),
            thickness,
            cv2.LINE_AA
        )
    return imageCopy
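Putting this section's pieces together, the labeling stage can be driven by something like the sketch below. Here simplifiedImage stands for the cleaned-up image produced by the earlier steps and palette for its list of colors; the output filename is just an example:

# Assumes simplifiedImage and palette come from the earlier simplification steps
labelImage, labelToPaletteIndex, labelAreas = generateLabelImageAndMapping(simplifiedImage, palette)
contourImage = drawContour(simplifiedImage, palette)
updates = getCoordinatesParallel(labelImage, labelToPaletteIndex, labelAreas)
labeledImage = applyLabels(contourImage, updates)
cv2.imwrite('paint_by_numbers.png', labeledImage)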
The (almost) Final Result
Ready for the result?
I'm extremely satisfied with it. Let's end this exploration here, but we'll come back for a (hopefully) final part, where we test the method on different images, look at operation order, measure running times, and explore more optimizations. But for now, rejoice, my friend.