Apple fooled me once, but they won’t fool me again

For a few years now I have been doing all my work on a Mac, which means I’m also using my Mac for nut.js development. Usually this works flawlessly and I’m really happy with macOS and its ecosystem.

But in December 2020 Apple fooled me for the first time.

The old problem

A former colleague of mine created an issue on the nut.js repo, stating that capturing screen regions on macOS yielded broken images. He also mentioned that capturing the whole screen worked fine, which left me puzzled.

Instead of a clean cut-out, the captured regions came out as tilted images:

Broken image

I spent some time investigating the issue and suspected a memory layout problem, but one day, after upgrading to macOS Big Sur, the issue was gone. Since there were not too many macOS users in the nut.js userbase back then, I just shrugged the issue off and moved on.

The new problem

Several years passed after the aforementioned issue first appeared, and I didn’t receive any further reports about it. The number of macOS users of nut.js continued to grow and I was happy with the stability of the macOS implementation. That was until two months ago, when several users suddenly reported that on-screen image search no longer worked and that screen capture was throwing errors.

I tried to reproduce the issue on both my 2018 Intel MacBook Pro and my M1 Mac Mini, but failed to do so. I was tempted to slap a “Can’t reproduce, closing” onto the issue, but since I knew something like that had happened before, I decided to look for the actual root cause this time.

How nut.js captures screen content

On macOS, nut.js (or more specifically, one of its underlying provider plugins, e.g. libnut) uses the CoreGraphics framework to capture screen content.

It acquires a reference to the image data of the main screen using the following code:

CGDirectDisplayID displayID = CGMainDisplayID();

CGImageRef image = CGDisplayCreateImageForRect(displayID,
                                               CGRectMake(
                                                       rect.origin.x,
                                                       rect.origin.y,
                                                       rect.size.width,
                                                       rect.size.height
                                               )
);

rect is either a user-defined custom region, or defaults to the full screen size.

Following that it determines the required buffer size using CFDataGetLength, allocates memory and creates a copy of the image data for further use via CFDataGetBytes.

This piece of code had worked fine for years, but it suddenly seemed to stop working for some users, so let’s dissect it a bit more.

macOS screen image data

How much data are we actually talking about here? Let’s use the default screen resolution of my 2021 16-inch MacBook Pro, which is 1728x1117 pixels, as reported by the OS.

So we’re talking about a total of 1728 * 1117 = 1,930,176 pixels.

We can determine the number of bytes per pixel using CGImageGetBitsPerPixel, which tells us that a single pixel is represented by 32 bits, or 4 bytes.

1,930,176 * 4 = 7,720,704 bytes.

But there’s one more thing missing: pixel density!

Depending on the display type (e.g. Retina, non-Retina), a display packs more or fewer physical pixels into each point of the resolution reported by the OS. This ratio of pixels per point is called the backing scale factor of a display.

size_t expectedBufferSize = rect.size.width * pixelDensity * rect.size.height * pixelDensity * bytesPerPixel;

Accounting for the backing scale factor in x and y direction, which is 2 for my MacBook Pro with Retina display, we get a total of 7,720,704 * 2 * 2 = 30,882,816 bytes.
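The arithmetic above can be written down as a small helper. This is only a sketch for illustration: the function name and its parameters are made up here and are not part of libnut, and it assumes the same scale factor applies in both directions.

```cpp
#include <cstddef>

// Hypothetical helper for illustration, not part of libnut.
// width and height are in points as reported by the OS, pixelDensity is the
// backing scale factor, bytesPerPixel is the bits per pixel divided by 8.
constexpr std::size_t expectedBufferSize(std::size_t width,
                                         std::size_t height,
                                         std::size_t pixelDensity,
                                         std::size_t bytesPerPixel) {
    return (width * pixelDensity) * (height * pixelDensity) * bytesPerPixel;
}

// The two built-in displays discussed in this post, checked at compile time.
static_assert(expectedBufferSize(1728, 1117, 2, 4) == 30882816, "2021 16-inch MacBook Pro");
static_assert(expectedBufferSize(1680, 1050, 2, 4) == 28224000, "2018 15-inch MacBook Pro");
```

Both static_assert lines use numbers taken from this post, so the compiler confirms the calculation.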

Now that we’ve got the numbers, let’s compare them to what we actually receive.

Size matters

I compared the number of bytes we receive from CGDisplayCreateImageForRect to the number of bytes we expect to receive on:

The built-in display of my 2021 16-inch Apple Silicon MacBook Pro
The built-in display of my 2018 15-inch Intel MacBook Pro
An external 2K display connected to my 2021 16-inch Apple Silicon MacBook Pro

This turned out to be quite interesting.

External 2K display

When connected to my external display I couldn’t find any issues with the screen capture. Both expected and reported buffer sizes matched perfectly and I was able to capture the screen content without any problems.

Built-in display (2018 15-inch Intel MacBook Pro)

On my 2018 15-inch Intel MacBook Pro with a reported resolution of 1680x1050 pixels everything worked as expected as well:

Expected buffer size: 28,224,000
Reported buffer size: 28,224,000

Built-in display (2021 16-inch Apple Silicon MacBook Pro)

When using the built-in display of my 2021 16-inch Apple Silicon MacBook Pro, I was able to reproduce the issue.

Expected buffer size: 30,882,816
Reported buffer size: 30,883,840
Diff: 1,024

As you can see, the reported buffer size is 1,024 bytes larger than the expected buffer size. After cutting off these additional 1,024 bytes we are left with the expected buffer size of 30,882,816 bytes, and full-screen capture worked again.

But that only fixed a single case, not the actual problem.

What if you apply scaling? Or want to capture only a sub-region of the screen?

I ran the test again, but with two types of scaling applied:

Higher resolution: 2056x1329

Reported buffer size: 43,892,736
Expected buffer size: 43,718,784
Diff: 173,952

Cutting off the excess bytes did not fix the image buffer. Instead, it resulted in tilted images similar to the one I provided at the beginning of this post.

Looking at the tilted image I suspected a byte width problem.

The expected byte width would be

size_t expectedByteWidth = expectedBufferSize / (rect.size.height * pixelDensity);

and we can determine the byte width of the CGImageRef using CGImageGetBytesPerRow:

Reported byte width: 16,512
Expected byte width: 16,448
Diff: 64

Every row of the image has an additional 64 bytes, which explains why the image is tilted and shows a black diagonal.
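To put a number on the tilt: if a strided buffer is read as if it were tightly packed, each row starts a little further away from where the reader expects it, and the error grows with every row. The sketch below computes that drift in pixels. Both function names are hypothetical and only serve this illustration.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration, not part of libnut.

// Byte offset at which a given row starts for a given row length in bytes.
constexpr std::size_t rowOffset(std::size_t row, std::size_t bytesPerRow) {
    return row * bytesPerRow;
}

// Number of pixels by which `row` is displaced when a buffer with rows of
// reportedByteWidth bytes is read as if rows were expectedByteWidth bytes long.
// Assumes reportedByteWidth >= expectedByteWidth.
constexpr std::size_t pixelDrift(std::size_t row,
                                 std::size_t reportedByteWidth,
                                 std::size_t expectedByteWidth,
                                 std::size_t bytesPerPixel) {
    return (rowOffset(row, reportedByteWidth) - rowOffset(row, expectedByteWidth))
           / bytesPerPixel;
}
```

With the values above (16,512 reported, 16,448 expected, 4 bytes per pixel), pixelDrift(1, 16512, 16448, 4) is 16 and pixelDrift(100, 16512, 16448, 4) is 1,600, so every row lands 16 pixels further off than the one before it. That steady offset is what draws the diagonal.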

Lower resolution: 1496x967

Reported buffer size: 23,281,664
Expected buffer size: 23,146,112
Diff: 135,552

The same problem happened with a lower screen resolution:

Reported byte width: 12,032
Expected byte width: 11,968
Diff: 64

Smaller screen region: 100x100

Reported buffer size: 180,224
Expected buffer size: 160,000
Diff: 20,224

And with smaller screen regions:

Reported byte width: 896
Expected byte width: 800
Diff: 96

These numbers show that image data is strided:

|XXXXX——|XXXXX——|XXXXX——|XXXXX——|
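As a side note, all reported byte widths in this post happen to equal the expected byte width rounded up to the next multiple of 128. This is purely an observation from these few numbers and an assumption on my side, not documented behavior, so real code should keep asking CGImageGetBytesPerRow instead of computing the stride. The helper name below is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper for illustration: round value up to the next multiple
// of alignment (alignment must be greater than zero).
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return ((value + alignment - 1) / alignment) * alignment;
}

// Expected byte width -> reported byte width, using the numbers from this post.
// The 128-byte alignment is an assumption derived from exactly these values.
static_assert(alignUp(16448, 128) == 16512, "2056 points wide");
static_assert(alignUp(11968, 128) == 12032, "1496 points wide");
static_assert(alignUp(800, 128) == 896, "100 points wide");
```

The default resolution fits the same pattern: 1728 * 2 * 4 = 13,824 bytes per row is already a multiple of 128, which would explain why trimming the trailing bytes was enough in that case.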

Manually assembling the image buffer solves this problem:

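// Copy one row at a time and skip the padding bytes at the end of each source row.
// The row count is taken from the image height instead of bufferSize / reportedByteWidth,
// because the buffer can carry extra trailing bytes and "parts - 1" may drop the last row.
size_t rows = static_cast<size_t>(rect.size.height * pixelDensity);

for (size_t row = 0; row < rows; ++row) {
    std::memcpy(buffer + (row * expectedByteWidth),
                dataPointer + (row * reportedByteWidth),
                expectedByteWidth
    );
}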

Having this logic in place for cases where expectedBufferSize < bufferSize fixes screen capture on macOS and should work across different screen resolutions and display types.
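For anyone who wants to try the idea outside of libnut, here is a minimal, self-contained sketch of the same row-by-row copy. It is not the libnut implementation: the function name, the std::vector return type and the explicit row count are choices made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper for illustration, not the libnut implementation.
// Copies `rows` rows of packedByteWidth bytes each out of a source buffer whose
// rows start stridedByteWidth bytes apart, dropping the padding in between.
// Preconditions: stridedByteWidth >= packedByteWidth, and src holds at least
// (rows - 1) * stridedByteWidth + packedByteWidth bytes.
std::vector<std::uint8_t> removeRowPadding(const std::uint8_t* src,
                                           std::size_t rows,
                                           std::size_t stridedByteWidth,
                                           std::size_t packedByteWidth) {
    std::vector<std::uint8_t> packed(rows * packedByteWidth);
    if (packedByteWidth == 0) {
        return packed; // nothing to copy
    }
    for (std::size_t row = 0; row < rows; ++row) {
        std::memcpy(packed.data() + row * packedByteWidth,
                    src + row * stridedByteWidth,
                    packedByteWidth);
    }
    return packed;
}
```

For a capture, rows would be the image height in pixels, stridedByteWidth the value from CGImageGetBytesPerRow, and packedByteWidth the expected byte width.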

After all these tests I came to the conclusion that this issue was re-introduced in macOS Ventura. As with my first encounter, it comes and goes with the OS release: it was not present in Monterey and appeared in Ventura. I’m curious whether it’ll disappear again in one of the next releases of macOS.

Let’s see if Apple comes up with yet another way to break things in nut.js.

All the best

Simon