Depicting an iOS Vulnerability



On March 31, 2025, Apple released iOS 18.4, reportedly fixing 76 vulnerabilities. One of those vulnerabilities was in IOGPUFamily, a kernel driver responsible for handling communication with the GPU. Apple describes the issue as an out-of-bounds write:

IOGPUFamily

Impact: An app may be able to cause unexpected system termination or write kernel memory

Description: An out-of-bounds write issue was addressed with improved input validation.

CVE-2025-24257: Wang Yu of Cyberserval

While Apple’s descriptions can be misleading or sometimes blatantly wrong, they still give a general idea of what the vulnerability might look like.

Discovery

This bug could be discovered via static analysis; however, since we have both the patched and unpatched kernel images (and know in which driver the bug resides), we can use diffing to find the changed code. Since IOGPUFamily is not open source, we can use a disassembler such as IDA together with diffing tools like Diaphora or BinDiff.

Since macOS is also affected (macOS and iOS share a kernel and many drivers), we can use the symbolicated IOGPUFamily kext from the macOS KDK for our analysis.

Running Diaphora on the latest vulnerable kext and the patched version yields 85 changed functions in the “Interesting matches” tab. Since we don’t know which function contains the vulnerable code, we have to look through each match and analyse the differences.

One of these functions is IOGPUResource::newResourceGroup, and looking at the pseudocode view of the diff shows the following:

Before the patch:

if ( userHashSize )
    localHashSz = userHashSize; // fully user controlled
else
    localHashSz = 0x40;

a4.i32[0] = (int)localHashSz;
v5 = (uint8x8_t)vcnt_s8(a4);  // NEON per-byte population count
v5.i16[0] = vaddlv_u8(v5);    // sum the per-byte counts

if ( v5.i32[0] == 1 )         // exactly one bit set, i.e. a power of two
{
    AGXResourceTexture = IOGPU->createResource(this, a2, 3);
    if ( AGXResourceTexture )
    {
        AGXResourceTexture->lock = IOLockAlloc();
        groupMemory = IOGPUGroupMemory::groupMemory(this, localHashSz);
        AGXResourceTexture->groupMem = groupMemory;
        if ( !groupMemory )
            goto nomem;

        mallocedBuf = IOMallocTypeImpl();
        *(_OWORD *)mallocedBuf = 0u;
        *(_OWORD *)(mallocedBuf + 16) = 0u;
        *(uint64_t *)(mallocedBuf + 32) = 0LL;
        *(uint64_t *)(mallocedBuf + 40) = 0LL;
        *(uint64_t *)(mallocedBuf + 48) = 0LL;
        *(uint64_t *)(mallocedBuf + 56) = 0LL;
        AGXResourceTexture->m_mallocedBuf = mallocedBuf;

        if ( IOGPUCountedMap<unsigned long long,IOGPUResource *,IOGPUResourceCountedMapBucket,IOGPUIOLibAllocatorPolicy>::init(
              mallocedBuf,
              (unsigned int)localHashSz,
              0x100000u) )
        {
            LODWORD(AGXResource_obj->resType) |= 0x200u;
            ...
        }
        else
        {
nomem:
            ((void (__fastcall *)(AGXResource *))AGXResource_obj->vtab->release_0)(AGXResource_obj);
            return 0LL;
        }

And the patched version:

if ( (unsigned int)userHashSize > 0x3F )    // new check: value must be at least 0x40
{
    if ( ((unsigned int)userHashSize ^ ((uint32_t)userHashSize - 1)) <= (int)userHashSize - 1 ) // true when not a power of two
    {
        _os_log_internal(
          &dword_0,
          (os_log_t)&_os_log_default,
          OS_LOG_TYPE_FAULT,
          "%s: newResourceGroup bad initial capacity: %d\n",
          "static OSPtr<IOGPUResource> IOGPUResource::newResourceGroup(IOGPU *, IOGPUDevice *, uint32_t)",
          userHashSize);
        return 0LL;
    }
    else
    {  
        AGXResourceTexture = IOGPU->createResource(this, a2, 3);
        if ( AGXResourceTexture )
        {
            AGXResourceTexture->lock = IOLockAlloc();
            groupMemory = IOGPUGroupMemory::groupMemory(this, userHashSize);
            AGXResourceTexture->groupMem = groupMemory;
            if ( !groupMemory )
                goto nomem;

            mallocedBuf = IOMallocTypeImpl();
            *(_OWORD *)mallocedBuf = 0u;
            *(_OWORD *)(mallocedBuf + 16) = 0u;
            *(uint32_t *)(mallocedBuf + 32) = 0;
            *(uint64_t *)(mallocedBuf + 40) = 0LL;
            *(uint32_t *)(mallocedBuf + 36) = 0;
            *(uint64_t *)(mallocedBuf + 48) = 0LL;
            *(uint64_t *)(mallocedBuf + 56) = 0LL;
            AGXResourceTexture->m_mallocedBuf = mallocedBuf;

            if ( IOGPUCountedMap<unsigned long long,IOGPUResource *,IOGPUResourceCountedMapBucket,IOGPUIOLibAllocatorPolicy>::init(
                   mallocedBuf,
                   (unsigned int)userHashSize,
                   0x100000u) )
            {
                LODWORD(AGXResource_obj->resType) |= 0x200u;
                ...
            }
            else
            {
nomem:  
                ((void (__fastcall *)(AGXResource *))AGXResource_obj->vtab->release_0)(AGXResource_obj);
                return 0LL;
            }

At the start of the function, you can see additional checks were added to validate the user-provided userHashSize value. Note that this is the first time in the call path that this value is validated.

In the original version, the code checks whether the value is zero, and assigns 0x40 to localHashSz if so. If it’s non-zero, the user-provided value is used as-is. In either case, the code then verifies the value is a power of two (this check appears in the pseudocode as the NEON population-count operations vcnt_s8/vaddlv_u8: a value with exactly one bit set is a power of two).

In the patched version, the code still checks whether the value is a power of two (x ^ (x - 1) <= x - 1 holds only for values that are not powers of two, so those are rejected), but now also ensures the value is at least 0x40.
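
To make both checks concrete, here is a minimal standalone sketch of the two bit tricks (plain C with unsigned arithmetic; not the kext’s code):

#include <stdio.h>
#include <stdint.h>

// Pre-patch check: popcount == 1 (the vcnt_s8/vaddlv_u8 sequence).
static int is_pow2_popcount(uint32_t x)
{
    return __builtin_popcount(x) == 1;
}

// Patched check, inverted: x ^ (x - 1) <= x - 1 holds only for values
// that are NOT powers of two, so the kext rejects those.
static int is_pow2_xor(uint32_t x)
{
    return !((x ^ (x - 1)) <= x - 1);
}

int main(void)
{
    uint32_t tests[] = { 0x1, 0x2, 0x3, 0x40, 0x41, 0x80 };
    for (int i = 0; i < 6; i++)
        printf("0x%02x: popcount=%d xor=%d\n",
               tests[i], is_pow2_popcount(tests[i]), is_pow2_xor(tests[i]));
    return 0;
}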

We can see the error message refers to this value as an “initial capacity”, which gives us a good indication that this is the buggy code path.

While this is indeed the bug, it’s not yet clear how this value is used by the code and what issues it may cause. Let’s first take a step back to understand what this code path is doing.

Note: You may notice that the buggy size is also passed in the call to IOGPUGroupMemory::groupMemory. This function internally creates an IOGPUCountedSet object, which also has its own set of issues due to this buggy size check. In the interest of brevity, we will focus on the IOGPUCountedMap object in this blog post, and the IOGPUCountedSet is left as an exercise for the reader.

IOGPUGroupMemory

User processes like Safari, backboardd, or games will create many texture objects as part of their GUI rendering process. In usermode, this is handled by Apple’s Metal framework, which communicates with the IOGPUFamily driver under the hood to render those objects on the GPU. To optimize this process, Apple groups objects (i.e. textures) together. This reduces CPU overhead, improves efficiency, and allows them to extract better performance from the available hardware. The Metal documentation here goes into further detail on this process. While this documentation is a few years old and refers to the usermode API, Metal, it makes sense that Apple would want to apply this optimization in the kernel driver too.

Previously, a user process would create a GPU resource (with, for example, IOSurface backing), retrieve the underlying shared memory, copy in the texture bitmap, submit the buffer (i.e. wire it into the GPU page tables), and issue a draw command.

Using IOGPUGroupMemory, the process can group together multiple objects of the same type. In the driver, Apple implements this by creating a hashmap in the IOGPUResource object. With the new newResourceGroup call, these objects can be batched together in the hashmap. Then, when the user process wants to wire backing pages into the GPU’s page tables, or tear down a group, it can do so in one call.

Hashmaps

A hashmap is a data structure that stores key/value pairs. By applying a hashing function to the key, it allows for near-constant-time insertion, lookup, and deletion of objects within the map. Since the hashing function may produce collisions between unique keys (for example, the keys ‘A’ and ‘B’ may hash to the same index), you need a way to resolve these collisions. To achieve this, each array slot (a “bucket”) can hold a secondary container such as a linked list or dynamic array, which stores all key/value entries whose hash value maps to the same index. This allows multiple items to co-exist at the same index, while preserving good performance.

The bucketing strategy is implementation-defined, but two common types are:

  • Chaining (closed addressing): Each bucket object will have a linked list (or similar structure) pointing to the next entry. If multiple keys map to the same bucket, they are chained (linked) together.

  • Open addressing: All values are stored in the map itself (without using sub-objects), and an algorithm determines the next free slot to be used in the map.

Performance and memory efficiency depend on the task and the implementation itself, but on average hashmaps provide O(1) lookup time, though this also depends on the number of collisions and the size of the table.

A closed-addressing (chained) implementation might look like this (assuming keys ‘A’ and ‘B’ collide in our hashing implementation):

Bucket[1] ->  -------------
              | key: A    |
              | val: 1    |
              | next --------> -------------
              -------------    | key: B    |
                               | val: 2    |
                               | next: null|
                               -------------

Bucket[2] -> null

As opposed to an open-addressing implementation:

Bucket[1] ->  ---------
              | key: A |
              | val: 1 |
              ---------

Bucket[2] ->  ---------
              | key: B |
              | val: 2 |
              ---------

Bucket[3] -> null
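
In C, the difference between the two schemes essentially comes down to the bucket layout. A minimal sketch, with hypothetical types purely for illustration:

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

// Chaining (closed addressing): each bucket heads a linked list of
// all entries whose keys hash to that slot.
struct chain_entry {
    uint64_t            key;
    uint64_t            val;
    struct chain_entry *next;      // next colliding entry, or NULL
};

struct chain_map {
    struct chain_entry **buckets;  // entry list per slot
    size_t               nbuckets;
};

// Open addressing: entries live directly in the array; on a
// collision, a probe sequence picks another free slot.
struct open_entry {
    uint64_t key;
    uint64_t val;
    bool     used;
};

struct open_map {
    struct open_entry *slots;      // probe from slots[hash(key) % nslots]
    size_t             nslots;
};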

With that in mind, we can now take a look at the IOGPUGroupMemory code to understand how the hash table is implemented, and which addressing scheme Apple uses.

IOGPUCountedMap

An astute reader who has spent some time with vulnerabilities might notice that the patched check shown before ensures that the value cannot be lower than 0x40, an indication that there might be some underflow issue.

Indeed, looking at IOGPUCountedMap::init (which is called from IOGPUResource::newResourceGroup, as shown earlier) reveals the following:

bool __fastcall IOGPUCountedMap<unsigned long long,IOGPUResource *,IOGPUResourceCountedMapBucket,IOGPUIOLibAllocatorPolicy>::init(
        IOGPUCountedMap *hopmap,
        uint32_t userHashSize,
        uint32_t a3)
{
    uint64_t *v5; // x0
    bool v6; // zf

    if ( a3 < 2 * userHashSize )
        return 0LL;

    hopmap->capacity = userHashSize;
    hopmap->mask = userHashSize - 1;
    hopmap->group_select_mask = (userHashSize >> 6) - 1; // <-- A
    hopmap->log2Sz = __clz(__rbit32(userHashSize)); // count of trailing zeros, i.e. log2 for powers of two
    *(_DWORD *)&hopmap->buf_dummy[4] = a3;
    hopmap->hashMap = (uint64_t *)IOMallocZeroData();
    hopmap->IOMalloc = (uint64_t *)IOMallocTypeVarImpl(..., 8LL * hopmap->capacity);
    hopmap->IOMalloced_via_user_sz = (void *)IOMallocTypeVarImpl(..., 16LL * hopmap->capacity);
    v5 = (uint64_t *)IOMallocZeroData();
    hopmap->hopMapBits = v5;

    if ( !hopmap->hashMap )
        return 0LL;
    if ( hopmap->IOMalloced_via_user_sz )
        v6 = v5 == 0LL;
    else
        v6 = 1;

    return !v6;
}

Our user-controlled value, with its buggy size check, is passed into this function as userHashSize. At [A], this value is right-shifted by 6 (equivalent to a division by 0x40), and one is subtracted. In most cases this operation would be fine, unless you provide a value smaller than 0x40: for example, 0x2 right-shifted by 6 is zero, and subtracting one causes an integer underflow, leaving the value 0xffffffff assigned to group_select_mask.

Other fields, such as log2Sz, also take on unexpected values. With a value of 0x40 or larger, log2Sz would be at least 6; with 0x2, it becomes 1.
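
A quick userland sketch mirroring the two decompiled expressions makes the damage obvious:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t sizes[] = { 0x40, 0x2 };   // sane vs. buggy userHashSize

    for (int i = 0; i < 2; i++) {
        uint32_t userHashSize = sizes[i];
        // (userHashSize >> 6) - 1 underflows for anything below 0x40:
        uint32_t group_select_mask = (userHashSize >> 6) - 1;
        // __clz(__rbit32(x)) counts trailing zeros, i.e. log2 for
        // powers of two:
        uint32_t log2Sz = (uint32_t)__builtin_ctz(userHashSize);

        printf("userHashSize=0x%x: group_select_mask=0x%x log2Sz=%u\n",
               userHashSize, group_select_mask, log2Sz);
    }
    return 0;   // prints mask 0x0/log2 6, then mask 0xffffffff/log2 1
}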

In order to understand what this bug grants us, we can look at the insertion and deletion code for the hashmap.

Hopscotch

As discussed before, there are many hashmap implementations, but Apple decided to use hopscotch hashing.

Hopscotch hashing is a type of open addressing. The basic idea is that the map consists of a number of buckets, and if a collision occurs, the value is stored in a neighbouring free bucket. Of course, to make this more efficient than plain linear probing, a form of record keeping is needed. Hopscotch does this with a bitmap, essentially keeping an “occupied” bit for each of the neighbouring buckets relative to the bucket the key originally hashed to. This also guarantees that a key will be within a fixed distance of its home bucket before a rehash is needed.

In practice, the bitmap associated with a bucket tracks the n positions ahead of it. Say, for example, the 3rd bit is set; that tells us there is a key which originally hashed to this ‘home’ bucket, but since the bucket was already occupied, the key was placed three slots away, i.e. into position ‘home + 3’. This works well, as during a lookup we can read the bitmap and only check the offsets it indicates.

The diagrams below demonstrate this:

Bucket[0] ->  ---------     bitinfo:  ----------
              | key: A |             | 00000101 | -> the Home bucket (Bucket[0]) is occupied by this key
              | val: 1 |             |          | -> the neighbouring bucket (Bucket[2]) is occupied
              | bitmap | --------->   ----------  bits {0,2} are set: A = Home, B displaced to Home+2
              ---------             

Bucket[1] ->  ---------     bitinfo:  ----------
              |key:null|             | 00000000 | -> This bucket is completely empty
              | val: 0 |             |          | 
              | bitmap | --------->   ----------   
              ---------

Bucket[2] ->  ---------
              | key: B |
              | val: 2 |
              | bitmap |
              ---------


Bucket[3] ->  ---------
              |key:null|
              | val: 0 |
              | bitmap |
              ---------
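
To make the bitmap-driven lookup concrete, here is a minimal hopscotch lookup sketch in plain C. The types and hash are hypothetical, it indexes with a low-bit mask rather than the kext’s top-bits shift, and it ignores wraparound at the end of the table; it is not the kext’s implementation:

#include <stdint.h>

struct bucket {
    uint64_t key;
    uint64_t val;
    uint8_t  hop_bits;  // bit i set => slot (home + i) holds a key that
                        // originally hashed to this bucket (hop range 8)
};

int hopscotch_lookup(struct bucket *table, uint32_t mask,
                     uint64_t key, uint64_t *val_out)
{
    uint32_t home = (uint32_t)(key * 0x9E3779B97F4A7C15ULL) & mask;
    uint8_t  bits = table[home].hop_bits;

    while (bits) {
        int off = __builtin_ctz(bits);        // lowest occupied neighbour
        if (table[home + off].key == key) {   // at most 8 probes needed
            *val_out = table[home + off].val;
            return 1;
        }
        bits &= bits - 1;                     // clear lowest set bit
    }
    return 0;                                 // key not present
}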

Now, the way this is implemented in the kext differs slightly from the diagram above. For example, instead of just creating a ‘random’ number of buckets, the kext creates groups, each containing 64 buckets. This means the total capacity (dictated by userHashSize) must be a multiple of 64; if the input is, for example, 128, then 2 groups of 64 buckets are created.

The actual implementation is quite extensive, and going through it line by line isn’t necessary here. At a high level, the ‘groups’ are laid out as shown:

            capacity: 128

group[0] --+              group[1] --+
           |                         |
           |                         |
           |                         |
 Buckets:  |              |  +-------+
  +--------+              |  |
  |                       |  |
  V                       |  V
 [0], [1], [2], ..., [63] | [64], [65], ..., [127]
  |                       |    
  |
  |
  |
  +----> Bucket[0] ->   ---------    bitinfo:  ----------
                       | key: A |             | 00000101 |
                       | val: 1 |             |          | 
                       | bitmap | --------->   ----------  
                        ---------  

This grouping matters because one of the critical fields is group_select_mask, which is derived from the calculation (userHashSize >> 6) - 1. When this calculation underflows (i.e. userHashSize is smaller than 64), it leaves the whole structure in an inconsistent state.
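
A small sketch of the group-selection arithmetic (the hopMapBits[group_select_mask & (idx >> 6)] pattern that appears in the insertion code below) shows why this matters:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t idx = 0x1234;                    // arbitrary bucket index

    uint32_t ok_mask  = (128u >> 6) - 1;      // capacity 128 -> mask 1
    uint32_t bad_mask = (2u   >> 6) - 1;      // capacity 2 -> 0xffffffff

    // Group selection as performed during insertion:
    printf("ok:  group %u\n", ok_mask  & (idx >> 6));  // bounded to {0,1}
    printf("bad: group %u\n", bad_mask & (idx >> 6));  // unbounded: 72
    return 0;
}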

With the map in this inconsistent state, we can turn to functions like add_group_resources and add_group_resources_fast. These are reachable from userspace and, essentially, fetch one of these AGX objects and add it to the hashmap:

__int64 __fastcall IOGPUResource::add_group_resources_fast(
        IOLock **this,
        IOGPUResource **malloced_obj, // Buffer where to-be-added objects are stored
        unsigned int user_input_scalar) // Number of objects in the buffer
{
    ...
    mallocedObj = malloced_obj;
    while ( 2 )
    {
        AGXResource_fetched = (AGXResource *)mallocedObj[v6];
        if ( !AGXResource_fetched || LOBYTE(AGXResource_fetched->resType) == 3 )
            IOGPUResource::add_group_resources_fast();

        globalObjectIDCounter = AGXResource_fetched->globalObjectIDCounter;
        hash_idx = (0x9E3779B97F4A7C15LL * globalObjectIDCounter) >> -LOBYTE(AGXResource_gpuCountedMap->log2Sz); // <-- A
        object_entry = *((_DWORD *)AGXResource_gpuCountedMap->hashMap + (unsigned int)hash_idx); // <-- B
        do
        {
            if ( !object_entry )                    // The hashmap entry is empty
            {
                while ( 2 )
                {
                    v33 = 0;
                    mask = AGXResource_gpuCountedMap->capacity - 1;
                    group_select_mask = AGXResource_gpuCountedMap->group_select_mask;
                    indexFor = (0x9E3779B97F4A7C15LL * globalObjectIDCounter) >> -LOBYTE(AGXResource_gpuCountedMap->log2Sz);
                    v37 = indexFor & 0x3F;
                    hopMapBits = AGXResource_gpuCountedMap->hopMapBits;
                    occupancy = hopMapBits[(unsigned int)indexFor >> 6] | ~(-1LL << indexFor); // <-- C
                    v39 = (unsigned int)indexFor >> 6 << 6;
                    v40 = ((unsigned int)indexFor >> 6) + 1;
                    ...
                    v32 = (unsigned int)__clz(__rbit64(~occupancy)) + v39;
                    ...
                    hopMapBits[group_select_mask & ((unsigned int)v32 >> 6)] |= 1LL << v32; // <-- D
                    ...
                    AGXResource_gpuCountedMap->hashMap[indexFor] |= 1 << (object_idx - indexFor); // <-- E
                    AGXResource_gpuCountedMap->hopMapBits[group_select_mask & (object_idx >> 6)] |= 1LL << object_idx; // <--- F
                }
                ...
            }
            ...
        }
    }
    ...
}

Note: this code is heavily abridged to highlight the main issues. As such, not all variable definitions are present, and the exact detail of the logic is excluded.

At [A], the code uses the “global object ID” of the object we want to add as the key for the hashmap. This is multiplied by the golden ratio constant (0x9E3779B97F4A7C15, i.e. Fibonacci hashing), resulting in a 64-bit hash. To convert this hash into an index into the backing array, the code right-shifts it by the negated value of log2Sz; since AArch64 only uses the low 6 bits of a variable shift amount, this is effectively a shift by 64 - log2Sz, keeping the top log2Sz bits. However, since we can set up the hashmap with a capacity of 1, log2Sz becomes zero, making the right-shift a no-op. We are then left with a huge value in hash_idx, which is used at [B] as an out-of-bounds index into the hashmap. Later it’s calculated again (as indexFor), and can cause an out-of-bounds read at [C], and three out-of-bounds writes at [D] - [F].
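
The shift behaviour is worth spelling out: shifting by a negative amount is undefined in C, but the AArch64 lsr instruction the decompiler is modelling takes the shift amount modulo 64. A small sketch of the index derivation with that behaviour made explicit (my reconstruction, not the kext’s code):

#include <stdio.h>
#include <stdint.h>

static uint32_t index_for(uint64_t object_id, uint32_t log2Sz)
{
    uint64_t hash = 0x9E3779B97F4A7C15ULL * object_id;  // Fibonacci hash

    // AArch64 lsr uses the shift amount mod 64, so ">> -log2Sz" is
    // really ">> (64 - log2Sz)", keeping the top log2Sz bits...
    // unless log2Sz is 0, in which case nothing is shifted out.
    return (uint32_t)(hash >> (-log2Sz & 63));
}

int main(void)
{
    // capacity 0x40 -> log2Sz = 6: index confined to [0, 0x3F]
    printf("0x%x\n", index_for(1234, 6));
    // capacity 1 -> log2Sz = 0: the full hash leaks through as an
    // index (truncated to 32 bits by the cast, as the kext does at [B])
    printf("0x%x\n", index_for(1234, 0));
    return 0;
}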

There are further out-of-bounds issues in the code, such as when the userclient closes and the objects need to be cleaned up:

void __fastcall IOGPUGroupMemory::removeMemoryFromResourceMap(
        IOLock **this,
        IOGPUCountedMap *gpu_countedMap,
        int const_zero)
{
    uint32_t idx; // w22
    unsigned __int64 bits_set; // x24
    unsigned __int64 bucket_idx; // x8
    __int64 vars8; // [xsp+48h] [xbp+8h]

    IOGPUMemory::lock(this);

    group_idx = 0;
    do
    {
        bits_set = gpu_countedMap->hopMapBits[group_idx]; // <-- A

        while ( bits_set )
        {
            bucket_idx = __clz(__rbit64(bits_set));
            bits_set &= ~(1LL << bucket_idx);

            IOGPUGroupMemory::remove_memory_object(
                (IORegistryEntry **)this,
                *(IOGPUMemory **)(*((_QWORD *)gpu_countedMap->IOMalloced_via_user_sz + 2 * (bucket_idx | (group_idx << 6))) + 0x28LL), // <-- B
                const_zero);
        }

        ++group_idx;
    }
    while ( group_idx <= gpu_countedMap->group_select_mask );

    if ( ((vars8 ^ (2 * vars8)) & 0x4000000000000000LL) != 0 )
        __break(0xC471u);

    IOGPUMemory::unlock(this);
}

Here, the code loops over all the groups from 0 -> group_select_mask, indexing hopMapBits to determine the indexes of the objects set within each group. Eventually, group_idx will be incremented past the end of the hopMapBits array, causing an out-of-bounds read at [A].

Likewise, due to the out-of-bounds group_idx, another out-of-bounds read occurs at [B] when the function fetches the object pointer from IOMalloced_via_user_sz.

Unfortunately, group_select_mask will always be 0xffffffff in the buggy scenario. Even if you could derive useful behaviour from the remove_memory_object call, the loop will eventually perform accesses up to ~32GB out of bounds (at least for the read at [A]), which inevitably leads to a crash. Since there is no way to break out of this loop early, the crash is unavoidable, and as such, this is not a viable path for exploitation.
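
For scale, the arithmetic behind the ~32GB figure: hopMapBits is an array of 64-bit words, one per group, and group_idx runs all the way to 0xffffffff:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t group_select_mask = 0xffffffff;          // underflowed mask
    // one 64-bit hopMapBits word per group:
    uint64_t max_offset = (uint64_t)group_select_mask * sizeof(uint64_t);
    printf("~%.1f GB past the allocation\n",
           max_offset / (1024.0 * 1024.0 * 1024.0));  // ~32.0 GB
    return 0;
}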

Exploitation

In recent years, while bugs have remained in the iOS/macOS kernel, the number of exploits has drastically declined. This is due to Apple’s constant push for stronger and stronger mitigations in the kernel. While there are many, the main one that applies here is Apple’s use of a “data-only” heap within the kernel.

At boot, the kernel’s virtual memory subsystem allocates a zone map. From this, whenever the core kernel or a kext wants to allocate memory, it can do so through the kalloc_* API.

In iOS 14, Apple introduced zone sequestering, allowing them to divide the zone into multiple subzones. Furthermore, in iOS 15, they introduced kalloc_type, which groups objects not just by size but also by their type. Previously, objects were grouped based on their size alone, for example:

kalloc.16                     16
kalloc.32                     32
kalloc.48                     48
kalloc.64                     64

In this setup, any two objects of the same size will be allocated in the same zone, and can be adjacent in memory.

This makes exploitation of UaF vulnerabilities trivial, as an attacker can spray any object of the same size to replace the dangling one. Likewise, for out-of-bounds accesses, an attacker can spray objects of the same size such that the “attacker” and “target” objects end up adjacent on the heap.

Under kalloc_type, kalloc size buckets are further divided into type ranges:

kalloc.type0.16               16
kalloc.type1.16               16
kalloc.type2.16               16
kalloc.type3.16               16
kalloc.type4.16               16
kalloc.type5.16               16
kalloc.type6.16               16
kalloc.type.var1.16           16
kalloc.type.var2.16           16
kalloc.type.var3.16           16
kalloc.type.var4.16           16
kalloc.type.var5.16           16
kalloc.type.var6.16           16
data.kalloc.16                16

With kalloc_type, a structure is created at compile time in the kernel, which holds a “signature” for the object, among other fields. The signature is derived from the field types within an object, at an 8-byte granularity.

For example, a structure with a pointer and a data member would get the following signature:

struct some_struct {
    void *ptr;      // Pointer type - 1
    uint64_t size;  // Data type - 2
};

// Resulting signature: "12"

Statically and variably sized types are further split into kalloc.typeN and kalloc.type.varN zones.

Any types which contain purely data (i.e. no pointers) are allocated in the data.kalloc zones, which are again separated from the kalloc.type{.var} allocations. As such, even with memory corruption in a data zone, you are unable to gain control of the kernel, as there are simply no pointers to attack, for example to build an arbitrary read or write primitive.
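
As a hypothetical illustration (following the 1 = pointer / 2 = data convention from above), two structs of identical size end up in entirely different zones purely because of their signatures:

#include <stdint.h>

// Signature "12": one pointer granule, one data granule.
// kalloc_type places this in one of the kalloc.typeN.16 zones.
struct has_ptr {
    void    *obj;
    uint64_t len;
};

// Signature "22": data-only, no pointers anywhere.
// This lands in data.kalloc.16, fenced off from all typed zones.
struct data_only {
    uint64_t lo;
    uint64_t hi;
};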

As noted above, there are two heap allocations we can potentially access out of bounds: hashMap and hopMapBits.

If we refer to the initialization code, we can see both objects are allocated on the data heap (IOMallocZeroData specifically requests an allocation in the data.kalloc heap, which is zeroed by the allocator):

hopmap->hashMap = IOMallocZeroData(...);
hopmap->hopMapBits = IOMallocZeroData(...);

As such, even if you are able to coerce the code into writing controlled values (which alone is a big undertaking due to the complexity of the hashmap logic), type separation means you are unable to corrupt any useful object within kernel memory.

Conclusion

In this blog post we discussed the use of diffing to discover patched bugs in the XNU kernel (or more specifically, its drivers), and demonstrated how a single missing check can lead to a multitude of state issues within a complex object. We also discussed how this bug is unexploitable on modern Apple kernels, due to heap mitigations added over the last few years.

Credits

We would like to thank Tomi Tokics (@tomitokics) from Dataflow Forensics for this blog post, and Ben Sparkes (@iBSparkes) from Dataflow Security for his assistance.