Thursday, September 3, 2020

Windows Hyper-V Denial of Service Vulnerability in nested virtualization component (CVE-2020-0890)


The vulnerability is triggered from a guest OS that has the nested virtualization option enabled. Tested on Windows Server 2019 with the August 2020 updates, Windows 10 20H1 with the August 2020 updates, and Windows 10 21H1 Preview (build 20206.1000 and earlier).


The bug is also present in the original, unpatched release of Windows Server 2019 (November 2018).


PoC source on GitHub: https://github.com/gerhart01/hyperv_local_dos_poc


Software:

Windows Server 2019 in host OS;

Windows Server 2019 in guest OS;

WinDBG preview

IDA PRO freeware


Hyper-V nested virtualization for Intel CPUs was introduced in Windows Server 2016 and the Windows 10 Anniversary Update (2016): https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/nested-virtualization. It can be used to launch a hypervisor inside a guest OS (or to run features such as Windows Sandbox or MDAG inside a Hyper-V VM).


Technically, the mistake made by the Hyper-V developers is very simple: they did not validate the contents of the VP Assist Page, whose address is written to the Virtual VP Assist MSR (0x40000073). Earlier, MSR 0x40000073 was named HV_X64_MSR_APIC_ASSIST_PAGE; now (in TLFS 6.0) it is named HV_X64_MSR_VP_ASSIST_PAGE.


HV_X64_MSR_VP_ASSIST_PAGE MSR structure:


typedef union _VIRTUAL_VP_ASSIST_PAGE_PFN
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 Enable : 1;
        UINT64 Reserved : 11;
        UINT64 PFN : 52;
    };
} VIRTUAL_VP_ASSIST_PAGE_PFN, * PVIRTUAL_VP_ASSIST_PAGE_PFN;
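

As a minimal sketch of how such an MSR value is composed, the helpers below build it from a page-aligned guest physical address using plain shifts, which avoids relying on compiler-specific bit-field layout. The helper names and the example address are hypothetical and are not taken from the PoC; only the bit positions (Enable in bit 0, PFN in bits 12-63) and the MSR index come from the structure above.

```c
#include <assert.h>
#include <stdint.h>

#define HV_X64_MSR_VP_ASSIST_PAGE 0x40000073u
#define HV_PAGE_SHIFT             12          /* 4 KiB pages */

/* Hypothetical helper: compose the MSR value from a page-aligned GPA. */
static uint64_t make_vp_assist_msr(uint64_t gpa, int enable)
{
    /* The low 12 bits hold Enable and Reserved, so the GPA must be page aligned. */
    assert((gpa & ((1ull << HV_PAGE_SHIFT) - 1)) == 0);

    uint64_t pfn = gpa >> HV_PAGE_SHIFT;      /* PFN field, bits 12..63 */
    return (pfn << HV_PAGE_SHIFT) | (enable ? 1ull : 0ull);
}

/* Hypothetical helpers: decode the two fields again. */
static uint64_t vp_assist_msr_pfn(uint64_t msr)
{
    return msr >> HV_PAGE_SHIFT;
}

static int vp_assist_msr_enabled(uint64_t msr)
{
    return (int)(msr & 1ull);
}
```

In a driver, the composed value would then be passed to __writemsr with the MSR index 0x40000073.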


According to Hyper-V TLFS 6.0, the VP Assist Page is an overlay page. The guest physical page number of that page is written to the PFN field. The page itself has the following layout:


typedef union _HV_VP_ASSIST_PAGE
{
    struct
    {
        //
        // APIC assist for optimized EOI processing.
        //

        HV_VIRTUAL_APIC_ASSIST ApicAssist;
        UINT32 ReservedZ0;

        //
        // VP-VTL control information
        //

        HV_VP_VTL_CONTROL VtlControl;
        HV_NESTED_ENLIGHTENMENTS_CONTROL NestedEnlightenmentsControl;
        BOOLEAN EnlightenVmEntry;
        UINT8 ReservedZ1[7];
        HV_GPA CurrentNestedVmcs;
        BOOLEAN SyntheticTimeUnhaltedTimerExpired;
        UINT8 ReservedZ2[7];

        //
        // VirtualizationFaultInformation must be 16 byte aligned.
        //

        HV_VIRTUALIZATION_FAULT_INFORMATION VirtualizationFaultInformation;
    };
    UINT8 ReservedZBytePadding[HV_PAGE_SIZE];
} HV_VP_ASSIST_PAGE, * PHV_VP_ASSIST_PAGE;


If we write the PFN of a zeroed page to the HV_X64_MSR_VP_ASSIST_PAGE MSR and then execute vmlaunch, we get a BSOD.


[Image]


Windows 10 reboots immediately, even if the automatic reboot option is disabled. If we attach a debugger to hvix64.exe (Windows Server 2019, August 2020 updates, build 10.0.17763.1397), we get:


hv+0x28af50:

fffff982`efc8af50 cc              int     3

1: kd> g

Access violation - code c0000005 (!!! second chance !!!)

hv+0x27747e:

fffff982`efc7747e 384249          cmp     byte ptr [rdx+49h],al


2: kd> k

 # Child-SP          RetAddr               Call Site

00 00000100`00803d08 fffff982`efc75e1b     hv+0x27747e

01 00000100`00803d10 fffff982`efcfd74f     hv+0x275e1b

02 00000100`00803d60 fffff982`efc82729     hv+0x2fd74f

03 00000100`00803d90 fffff982`efc1691f     hv+0x282729

04 00000100`00803df0 fffff982`efc1816b     hv+0x21691f

05 00000100`00803e80 fffff982`efc8c571     hv+0x21816b

06 00000100`00803fc0 00000000`00000000     hv+0x28c571


2: kd> r

rax=ffffe802c560d000 rbx=ffffe802c5607050 rcx=ffffe802c5608d00

rdx=0000000000000000 rsi=0000000000000000 rdi=ffffe802c5608000

rip=fffff982efc7747e rsp=0000010000803d08 rbp=0000000000000014

 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000

r11=0000000000000014 r12=0000000000000000 r13=ffffe802c56078d0

r14=ffffe802c5608d00 r15=ffffe802c5607630

iopl=0         nv up di pl zr na po nc

cs=0010  ss=0020  ds=0020  es=0020  fs=0020  gs=0020             efl=00010046

hv+0x27747e:

fffff982`efc7747e 384249          cmp     byte ptr [rdx+49h],al ds:0020:00000000`00000049=??


Windows Server 2019 generates a crash dump:





The exploit source code targets Intel CPUs only (technically the exploit should work on AMD platforms too, but I had no PC with such a CPU at hand). Briefly:


  1. Enable the VMX feature in the guest OS (for this to work, the following command must be executed in the host OS: Set-VMProcessor -VMName <VMName> -ExposeVirtualizationExtensions $true);

  2. Allocate and activate the VMXON region;

  3. Allocate the VP Assist Page;

  4. Get the physical address of the VP Assist Page and write it to the HV_X64_MSR_VP_ASSIST_PAGE MSR;

  5. Execute vmclear, then vmlaunch, and get a BSOD.
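

Steps 1 and 2 depend on a few architectural bits that a guest driver would normally check before executing vmxon. The sketch below only models those checks as pure functions over register values; it is not taken from the PoC, and the helper names are hypothetical. The bit positions themselves (CPUID.01H:ECX bit 5, CR4 bit 13, and bits 0 and 2 of IA32_FEATURE_CONTROL, MSR 0x3A) are the ones documented in the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_1_ECX_VMX                   (1u << 5)    /* CPUID.01H:ECX.VMX */
#define CR4_VMXE                          (1ull << 13) /* CR4.VMXE */
#define IA32_FEATURE_CONTROL              0x3Au
#define FEATURE_CONTROL_LOCK              (1ull << 0)
#define FEATURE_CONTROL_VMXON_OUTSIDE_SMX (1ull << 2)

static_assert(CR4_VMXE == 0x2000, "CR4.VMXE is bit 13");

/* Hypothetical helper: does the (virtual) CPU advertise VMX to the guest?
   Inside a Hyper-V VM this is expected to be set only after
   ExposeVirtualizationExtensions was enabled on the host. */
static int cpu_reports_vmx(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_1_ECX_VMX) != 0;
}

/* Hypothetical helper: vmxon outside SMX needs both the lock bit and
   the "enable VMX outside SMX" bit in IA32_FEATURE_CONTROL. */
static int vmxon_allowed(uint64_t feature_control)
{
    const uint64_t need = FEATURE_CONTROL_LOCK | FEATURE_CONTROL_VMXON_OUTSIDE_SMX;
    return (feature_control & need) == need;
}

/* Hypothetical helper: CR4 value to load before executing vmxon. */
static uint64_t cr4_with_vmxe(uint64_t cr4)
{
    return cr4 | CR4_VMXE;
}
```

A real driver would obtain these values with __cpuid, __readmsr and __readcr4 and then go on to allocate the VMXON region.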


Vulnerability internals


When the instruction cmp byte ptr [rdx+49h],al is executed, rdx contains 0, so the access hits address 0x49. It is a plain NULL pointer dereference, and rdx is not controlled from the guest OS address space.



There are no symbols for hvix64.exe, so the procedures were given BSOD-related names with a call-level index; the level nearest to the crash is 1. This code block prepares everything necessary for executing the vmlaunch instruction in hypervisor context.


When is this block executed?

The caller block at level 2 is not interesting.



But the next caller level is important.



How is r8b controlled?



When VP Assist Page is zeroed:


WINDBG>dps poi(@rsi+198)+40

ffffe802`c5608040  ffffe802`c561c000 - address of the overlay VP Assist Page. It does not change after a host OS reboot.

ffffe802`c5608048  00000000`000f000f

ffffe802`c5608050  00000000`00000000

ffffe802`c5608058  00000000`00000000

ffffe802`c5608060  00000000`00000000


WINDBG>dps ffffe802`c561c000

ffffe802`c561c000  00000000`00000000

ffffe802`c561c008  00000000`00000000

ffffe802`c561c010  00000000`00000000

ffffe802`c561c018  00000000`00000000

ffffe802`c561c020  00000000`00000000

ffffe802`c561c028  00000000`00000000 – rcx+28h

ffffe802`c561c030  00000000`00000000


If the value at rcx+28h != 0, then r8b = 1.


We can debug the PoC driver in the guest OS step by step and see the physical and virtual addresses of the variables that are passed to the hypervisor:




WINDBG>dps ffffe802`c561c000

ffffe802`c561c000  00000000`00000000

ffffe802`c561c008  00000000`00000000

ffffe802`c561c010  00000000`00000000

ffffe802`c561c018  00000000`00000000

ffffe802`c561c020  00000000`00000000

ffffe802`c561c028  00000000`00000001 - pHvVpPage->EnlightenVmEntry

ffffe802`c561c030  00000000`7ff23000 - pHvVpPage->CurrentNestedVmcs

ffffe802`c561c038  00000000`00000000

ffffe802`c561c040  00000000`00000000

ffffe802`c561c048  00000000`00000000

ffffe802`c561c050  00000000`00000000

ffffe802`c561c058  00000000`00000000


The next step is very simple: if pHvVpPage->EnlightenVmEntry == 0, we get a BSOD. The hypervisor simply does not verify the VP Assist Page content when vmlaunch is executed.
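

The offsets in the dumps above (EnlightenVmEntry at +28h, CurrentNestedVmcs at +30h) can be reproduced with a small layout sketch. The structure below is not the TLFS definition: the sizes used for HV_VP_VTL_CONTROL (24 bytes) and HV_NESTED_ENLIGHTENMENTS_CONTROL (8 bytes) are assumptions chosen so that the offsets match the dump, and the helper name is hypothetical. It only illustrates which byte of the page decides between the crashing and the non-crashing case described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start of the VP Assist Page, with opaque placeholders for nested types. */
typedef struct {
    uint32_t apic_assist;                      /* HV_VIRTUAL_APIC_ASSIST */
    uint32_t reserved_z0;
    uint8_t  vtl_control[24];                  /* assumed size of HV_VP_VTL_CONTROL */
    uint8_t  nested_enlightenments_control[8]; /* assumed size */
    uint8_t  enlighten_vm_entry;               /* BOOLEAN EnlightenVmEntry */
    uint8_t  reserved_z1[7];
    uint64_t current_nested_vmcs;              /* HV_GPA CurrentNestedVmcs */
} vp_assist_prefix;

/* These offsets are the ones visible in the WinDbg output above. */
static_assert(offsetof(vp_assist_prefix, enlighten_vm_entry) == 0x28,
              "EnlightenVmEntry expected at +28h");
static_assert(offsetof(vp_assist_prefix, current_nested_vmcs) == 0x30,
              "CurrentNestedVmcs expected at +30h");

/* Hypothetical helper: returns 1 when EnlightenVmEntry is zero, which is
   the state that led to the BSOD on vmlaunch in the tests described here. */
static int enlighten_vm_entry_is_clear(const void *assist_page)
{
    vp_assist_prefix hdr;
    memcpy(&hdr, assist_page, sizeof hdr);     /* avoid aliasing assumptions */
    return hdr.enlighten_vm_entry == 0;
}
```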

Overlay page initialization problem


The BSOD is not the only problem.

The second problem is that even if the VP Assist Page was filled with valid values before its address was written to the HV_X64_MSR_VP_ASSIST_PAGE MSR, those values are not passed to the hypervisor. Why does this happen? It is a feature (or a bug) of hypervisor overlay pages.


According to section 5.2.1 of Hyper-V TLFS 6.0:


The hypervisor defines several special pages that “overlay” the guest’s GPA space. The hypercall code page is an example of an overlay page. Overlays are addressed by guest physical addresses but are not included in the normal GPA map maintained internally by the hypervisor. Conceptually, they exist in a separate map that overlays the GPA map.

If a page within the GPA space is overlaid, any SPA page mapped to the GPA page is effectively “obscured” and generally unreachable by the virtual processor through processor memory accesses. Furthermore, access rights installed on the underlying GPA page are not honored when accessing an overlay page.


Let's do an experiment. Before writing the address of the buffer to the HV_X64_MSR_VP_ASSIST_PAGE MSR, we need to allocate it. Fill that buffer with 0x11 bytes:


FillBuffer((PCHAR)pHvVpPage, PAGE_SIZE, 0x11);

__writemsr(HV_X64_MSR_APIC_ASSIST_PAGE, guestPFN.AsUINT64);


and look at its content in LiveCloudKd (we know the physical and virtual addresses from standard WinDBG, running in source-level debugging mode):



In the standard kernel debugger attached to the guest OS:



Attach LiveCloudKd to the same VM at the same time. First, we can see the enlightenment structure, with some values in CurrentNestedVmcs and EnlightenVmEntry:



And we see the same values in the VP Assist Page (at the same address as in the standard debugger):



After writing to the HV_X64_MSR_VP_ASSIST_PAGE MSR, we can see some old garbage inside the hypervisor (the address of the overlay page is constant, as we saw earlier; the driver was restarted, but the guest OS was not rebooted):


WINDBG>dps ffffe802`c561c000 – inside hypervisor

ffffe802`c561c000  00000000`00000000

ffffe802`c561c008  00000000`00000000

ffffe802`c561c010  00000000`00000000

ffffe802`c561c018  00000000`00000000

ffffe802`c561c020  00000000`00000000

ffffe802`c561c028  00000000`00000001

ffffe802`c561c030  00000000`7ff23000

ffffe802`c561c038  00000000`00000000

ffffe802`c561c040  00000000`00000000

ffffe802`c561c048  00000000`00000000

ffffe802`c561c050  00000000`00000000

ffffe802`c561c058  00000000`00000000


Return to the guest OS debugger. Yes, the 0x11 values in the overlay page were replaced by stale values left from the previous write to the HV_X64_MSR_VP_ASSIST_PAGE MSR:



If we write something to the overlay page after its address has been written to the HV_X64_MSR_VP_ASSIST_PAGE MSR, everything works correctly.



LiveCloudKd shows the old values because it parses the original guest memory, which is mapped into the host OS using an MDL and the host-PFN-to-guest-PFN page map. The overlay page is stored inside the hypervisor and is substituted transparently when the guest OS tries to read or write memory in that range. Interestingly, the page attributes of the guest page are not changed.
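

The observed behavior can be summarized with a toy model. This is only a user-mode illustration of the experiment, not how the hypervisor is implemented: the type and function names are hypothetical, and the "MSR write" simply flips a flag. The backing array stands for the guest RAM page (what LiveCloudKd sees through the host mapping), and the overlay array stands for the hypervisor-owned page that keeps its content between driver runs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

typedef struct {
    uint8_t backing[TOY_PAGE_SIZE]; /* guest RAM page, seen via host MDL mapping */
    uint8_t overlay[TOY_PAGE_SIZE]; /* hypervisor-owned overlay page */
    int     overlay_enabled;        /* set by the modeled MSR write */
} toy_gpa_page;

/* What the guest reads or writes at this GPA. */
static uint8_t *guest_view(toy_gpa_page *p)
{
    return p->overlay_enabled ? p->overlay : p->backing;
}

/* What a host-side tool reading guest RAM sees. */
static const uint8_t *host_view(const toy_gpa_page *p)
{
    return p->backing;
}

/* Modeled MSR write: nothing is copied from the guest page and nothing is cleared. */
static void toy_write_assist_msr(toy_gpa_page *p, int enable)
{
    p->overlay_enabled = enable;
}

static void toy_experiment(void)
{
    toy_gpa_page p;
    memset(&p, 0, sizeof p);
    p.overlay[0x28] = 1;                         /* stale value from a previous run */

    memset(guest_view(&p), 0x11, TOY_PAGE_SIZE); /* fill the buffer before the MSR write */
    toy_write_assist_msr(&p, 1);

    assert(guest_view(&p)[0x00] == 0x00);        /* guest no longer sees 0x11 */
    assert(guest_view(&p)[0x28] == 0x01);        /* it sees the stale overlay content */
    assert(host_view(&p)[0x00] == 0x11);         /* host-side view still shows 0x11 */

    guest_view(&p)[0x00] = 0x22;                 /* writes after the MSR write go to the overlay */
    assert(guest_view(&p)[0x00] == 0x22);
    assert(host_view(&p)[0x00] == 0x11);
}
```

In this model, filling the page before the MSR write has no effect on what the hypervisor later reads, which matches the order-dependent behavior seen in the experiment.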


What I checked: for the overlay page ffffe802`c561c000 inside the hypervisor, CR3 changes constantly, but the page's physical address does not. This probably needs additional investigation (but it is not related to this bug).


Conclusion: two bugs were found:


  • Incorrect handling of VP Assist Page values during vmlaunch emulation, which causes a NULL pointer dereference and a subsequent BSOD.

  • Incorrect handling of overlay page initialization. The hypervisor simply switches to another memory buffer without clearing the old values stored inside the hypervisor and without copying the values from the guest page whose address was written to the HV_X64_MSR_VP_ASSIST_PAGE MSR.


The patch is simple:




MSRC communication history:

21 August 2020 – the first submission was opened but then rejected, because the attachment was blocked by a mail server.

21 August 2020 – the second submission was opened.

22 August 2020 – a case for the second submission was opened.

24 August 2020 – the DoS behavior of the PoC was confirmed.

27 August 2020 – MSRC message: «Our team has completed a fix for this issue, and we plan on releasing it as part of the September Update Tuesday, assuming there are no issues during the next few weeks of testing. We have assigned CVE-2020-0890 to this issue».

31 August 2020 – MSRC bounty program message: «So the first report was sent to us months before we received you submission».

08 September 2020 – CVE-2020-0890 was published.


I still don't believe the «sent to us months before» part, because the bug was not patched in Windows Insider builds )

A very murky story. It looks like a program manager did not pass some information on to the development team.

Update: according to the CVE-2020-0890 entry, the vulnerability was first discovered by another researcher, yet it is present as far back as Windows 10 version 1803 (April 2018). This means the vulnerability existed for 2.5 years, was discovered by another researcher, and yet was patched within 18 days of my report. You can calculate the probability of such a coincidence yourself )