Friday, June 19, 2020

Hyper-V memory internals. EXO partition memory access

Software used in this article (the operating systems have the June 2020 patches installed):

Windows 10 20H1, build 19041 x64
Windows 10 1803 x64
VMware Workstation 20H2 preview
VirtualBox 6.1.8
LiveCloudKd
Process Hacker
PyKd plugin for WinDBG
WinDBG Preview

The test lab runs on an Intel-based PC, so Intel-specific Hyper-V terms (for example, hvix64.exe) are used throughout the article. By the way, the Windows 10 20H2 preview, starting from build 19640, supports nested virtualization on AMD processors (https://techcommunity.microsoft.com/t5/virtualization/amd-nested-virtualization-support/ba-p/1434841), so it looks like more PCs will be able to work with Hyper-V.

Terms and definitions:

  • WDAG – Windows Defender Application Guard (MDAG – Microsoft Defender Application Guard for newer Windows versions);

  • Full VM (virtual machine) – a virtual machine created in Hyper-V Manager. It differs from a WDAG container, Windows Sandbox, and Docker in Hyper-V isolation mode;

  • Root OS – the operating system in which the server part of Hyper-V runs;

  • Guest OS – an operating system that runs in the Hyper-V emulation context and uses virtual devices presented by the Hyper-V infrastructure. It can be a Full VM, a Hyper-V container, or a WHVP-based VM;

  • TLFS – Hypervisor Top-Level Functional Specification 6.0;

  • GPA (guest physical address) – Guest OS physical memory address;

  • SPA (system physical address) – Root OS physical memory address;

  • Hypercall – a hypervisor service function, invoked by executing vmcall with a specified hypercall number;

  • VBS – Virtualization Based Security;

  • VPN – virtual page number;

  • GPN – guest page number;

  • EXO partition – a partition object created when a virtual machine runs using the Windows Hypervisor Platform API;

  • WHVP API – Windows Hypervisor Platform API.


Intro
The Hyper-V virtualization platform, developed by Microsoft, appeared a long time ago: the first report about it was presented at the WinHEC conference in 2006, and the platform was integrated into Windows Server 2008. At first, Microsoft rather willingly shared descriptions of the API functions (they were even included in Microsoft Windows SDK 7.0), but then the policy changed, and official information about Hyper-V interfaces became scarcer and scarcer. Eventually it was available only in the Hyper-V Top-Level Functional Specification, provided for developers of operating systems who want to make their OS compatible with Hyper-V. The problem arose after Microsoft introduced Virtualization Based Security (VBS) in Windows 10, whose components (Device Guard, Code Integrity and Credential Guard) use Hyper-V to protect critical components of the operating system, even without a guest OS running. It turned out that existing virtualization systems such as Qemu, VirtualBox and VMware Workstation can’t work in such conditions when they use hardware virtualization features. Hyper-V simply blocked them while it was running.

VBS appeared in Windows 10 build 1511 Enterprise (November 2015), where it had to be specially enabled as a Windows feature. In build 1607 the VBS component was already integrated into the OS by default (https://docs.microsoft.com/ru-ru/windows/security/identity-protection/credential-guard/credential-guard-manage) and only needed to be activated. As the next step, in December 2019, Microsoft decided to enable VBS by default (https://techcommunity.microsoft.com/t5/virtualization/virtualization-based-security-enabled-by-default/ba-p/890167), which led to the failure of virtualization applications created by third-party developers.

To solve that problem, Microsoft developed the Windows Hypervisor Platform API (https://docs.microsoft.com/en-us/virtualization/api/), which provides the following features for third-party developers:

1. Creation and management of Hyper-V “partitions”;
2. Memory management for each partition;
3. Management of Hyper-V virtual processors.

The main point of these APIs is to provide the ability to manage processor resources, read\write register values, start\stop a processor, and generate interrupts. The result is an absolute minimum for working with a virtual partition.

These APIs have been available in Windows 10 since build 1803 (April 2018 Update). Almost 2.5 years passed between the introduction of VBS and the realization that the technology blocks third-party virtualization applications.

Also, in the context of these APIs, the term Windows Hypervisor Platform (WHPX) is used: it is the Windows component whose activation makes the specified APIs available. The “X” seems to stand for “executable” (similar to HAXM, Intel Hardware Accelerated Execution Manager, probably because the API was tested on Qemu first), or it simply comes from the word “accelerator”.


For Full VMs and containers such as Windows Sandbox, WDAG or Microsoft Emulator (used to run virtual images with Windows 10 X), the standard Microsoft APIs exported by vid.dll are used.

Qemu was the first emulator for which Microsoft itself developed a WHPX acceleration module, demonstrating that the APIs are functional.

Then the Oracle VirtualBox developers finalized their solution so that it worked together with Hyper-V in Windows 10 version 1809, but after a while it broke (Microsoft blocked the execution of some vid.sys driver functions for partitions created using the WHVP API).

In January 2020, VMware announced the VMware Workstation 20H1 preview build, which works in conjunction with Hyper-V but whose performance was quite low. On May 28, 2020, version 15.5 was released, and it supports running with Hyper-V enabled on the host OS. It is worth mentioning that VMware significantly redesigned VMware Workstation to integrate it with the WHVP API:


At the same time, nested virtualization support was lost when Hyper-V is activated in the root OS. Recall that VMware was one of the first companies to add a nested virtualization subsystem to its software (it was added to VMware Workstation 8 in 2011). There is no information yet about when the nested virtualization feature will become available in this configuration. At the time of publication, there are discussions about a rather large drop in VMware performance when using the WHVP API, but I hope this problem will be solved. Don’t forget, though, that VMware and VirtualBox are essentially competitors to Windows Sandbox\WSL\Hyper-V in Windows 10.

The WHVP API is currently used in:
  • Android emulator from Google: 
https://docs.microsoft.com/ru-ru/xamarin/android/get-started/installation/android-emulator/hardware-acceleration?pivots=windows
https://developer.android.com/studio/run/emulator-acceleration#vm-windows-whpx
VirtualBox has had problems since the Windows 10 1903 release, and (as we will see below) it now doesn’t use part of the WHVP API, replacing it with its own mechanism, because the WHVP API is designed only for user mode. VirtualBox still can’t fully utilize it because of the kernel-mode components it relies on. VMware had to do a lot of work and lost some functionality, and there are performance complaints.
In fact, we can say that the API can be used effectively only in user mode (Qemu, Bochs).
In Microsoft’s defense, I can say that the API exported by vid.dll changes every six months, when a new version of Windows is released, and even when monthly cumulative updates are released. Below we can see the list of functions exported by vid.dll depending on the Windows version. As you can see, the list of changes is significant, especially for Windows Server versions, and we can only guess about changes in function parameters. For third-party developers, changes of this kind are unacceptable in any case.





The situation for WHVP API is much more stable, which is generally logical for public APIs:


In general, the situation for low-level developers in the Hyper-V field is rather complicated, and it is getting worse. Even the Hyper-V TLFS is updated extremely rarely: the latest update added a description of nested virtualization, but the information is extremely scarce. It does allow you to read the internal structures of the hypervisor, which I once did using the LiveCloudKd utility (https://github.com/gerhart01/LiveCloudKd), but so far this is useful only for research purposes: putting it into practice and integrating it, for example, into a debugger still doesn’t work.

But, as Satya Nadella (https://content.techgig.com/windows-is-no-longer-important-hints-microsoft-ceo/articleshow/71474383.cms) said, the Windows era is passing and everyone should move to the cloud, so there is no chance that the situation with low-level APIs will change for the better.

Let's leave the theory aside and proceed to the practical part.

EXO partitions memory internals

Windows 10 x64 Enterprise 20H1 (2004) and Windows 10 x64 Enterprise 1803 (its lifecycle ends in November 2020, so that information is provided only for comparison), both with the June 2020 updates, are used as root OSes. Windows 10 x64 Enterprise 20H1 (2004) was used as the guest OS everywhere.

There are three WHVP headers in Windows SDK 10.0.19041 (Windows 10 20H1 SDK):

WinHvPlatform.h
WinHvPlatformDefs.h
WinHvEmulation.h

The functions are exported by the winhvplatform.dll library, and their definitions are given in the WinHvPlatform.h header file. They are wrappers over procedures provided by vid.dll (Microsoft Hyper-V Virtualization Infrastructure Driver Library), which in turn calls services of the vid.sys driver (Microsoft Hyper-V Virtualization Infrastructure Driver).

Let’s briefly consider what happens when a VM starts. The Qemu source code is available, so anyone interested can study the working algorithms in detail:


When Qemu starts in WHPX hardware acceleration mode, two handles to \Device\VidExo are opened. They give access to the kernel-mode vid.sys device mentioned above:


Both handles are file objects:


If we look at each FsContext, we find different data structures with the signatures Exo and Prtn.


The Prtn structure (VM_PROCESS_CONTEXT) was described in the previous article (http://hvinternals.blogspot.com/2019/09/hyper-v-memory-internals-guest-os-memory-access.html).

Using WinDBG and the pykd plugin, this structure can be parsed and its meaningful elements shown (example for VMware Workstation 20H2 preview):


As you can see, the Exo object is not registered in the winhvr!WinHvpPartitionArray array (only one Prtn object is present), i.e. it is not a full partition object.
The Exo object is the address of the vid.sys variable VidExoDeviceContext. The Prtn object created for an EXO partition doesn’t contain a partition name (a full VM partition has the same name as the VM in Hyper-V Manager, and a container partition has the name “Virtual machine”), but the GUID of the EXO partition is present.

There are not many Exo functions in the vid.sys kernel driver:

Windows 10 x64 20H1                            | Windows 10 x64 1803
0: kd> x /1 vid!*exo*                          | 0: kd> x /1 vid!*exo*
Vid!VsmmExopAccessVaFault                      | -
Vid!VidExopIoControlPreProcess                 | Vid!VidExopIoControlPreProcess
Vid!VidExopFileClose                           | Vid!VidExopFileClose
Vid!VidExopFileObjectDestroy                   | Vid!VidExopFileObjectDestroy
Vid!VidExopUpdateDeviceSecurity                | Vid!VidExopUpdateDeviceSecurity
Vid!VidExopFileCleanup                         | Vid!VidExopFileCleanup
Vid!VidExopDeviceSetupInternal                 | -
Vid!VidExopRegKeyNotificationHandler           | -
Vid!VidExopFileCreate                          | Vid!VidExopFileCreate
Vid!VidExoVpStopCompleteMessageCallback        | -
Vid!VidExoIoControlPartition                   | Vid!VidExoIoControlPartition
Vid!VidExoFastIoControlDriver                  | Vid!VidExoFastIoControlDriver
-                                              | Vid!VidExoDeviceTeardown
Vid!VidExoCurrentProcessIsAccessAllowed        | Vid!VidExoCurrentProcessIsAccessAllowed
Vid!VidExoPartitionInitialize                  | Vid!VidExoPartitionInitialize
-                                              | Vid!VsmmGpaRangeIoctlExoCreateSpecifyUserVa
Vid!VsmmNtSlatExoGpaAccessVaFault              | -
Vid!VsmmExoGpaRangeIoctlUnmap                  | -
Vid!VsmmExoGpaRangeIoctlAccessTrackingControl  | -
Vid!VsmmExoTranslateGuestVirtualAddress        | -
Vid!VidExoDeviceContext                        | Vid!VidExoDeviceContext
-                                              | Vid!VidExoDeviceSetup
Vid!VidExoInterceptsHandlePassthrough          | -
Vid!VsmmExoGpaRangeIoctlMap                    | -
Vid!VsmmExoHandleMemoryIntercept               | -
Vid!VidInformationIoctlExoGetSystemInformation | -
Vid!VidExoFastIoControlPartition               | Vid!VidExoFastIoControlPartition
Vid!VidExoIoControlDriver                      | -


There are two values in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Vid\Parameters key:

  ExoDeviceEnabled
  ExoDeviceEnabledClient


If both are zero, nothing happens, but if one of them is changed, vid.sys!VidExopRegKeyNotificationHandler immediately starts to work (it was registered earlier using nt!ZwNotifyChangeKey).

If one of the values equals 1, the vid.sys!VidExopDeviceSetupInternal function is executed. It creates the device object \Device\VidExo and the symbolic link \DosDevices\VidExo and registers the handler functions:

VidExopFileCreate
VidExopFileClose
VidExopFileCleanup

VidExopIoControlPreProcess 
VidIoControl

Separate handlers for fast I\O:

VidExoFastIoControlPartition
VidExoFastIoControlDriver

The function finishes by calling VidObjectHeaderInitialize(VidExoDeviceContext, ' oxE').
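The tag ' oxE' looks reversed because a four-character constant is stored in little-endian byte order, so it reads as "Exo " in a memory dump. A minimal sketch of that encoding follows; the helper names are my own and are not part of vid.sys, and the literal evaluation rule assumed here is the one used by the Microsoft compiler (first character in the most significant byte).

```python
import struct

def c_multichar(literal: str) -> int:
    """Value of a 4-character C literal such as ' oxE'.

    Assumption: MSVC-style evaluation, where the first character
    ends up in the most significant byte of the 32-bit value.
    """
    value = 0
    for ch in literal:
        value = (value << 8) | ord(ch)
    return value

def tag_to_bytes(tag_value: int) -> bytes:
    """Bytes of a 32-bit tag as they lie in little-endian memory."""
    return struct.pack("<I", tag_value)

exo_tag = c_multichar(" oxE")
print(hex(exo_tag), tag_to_bytes(exo_tag))          # signature of the Exo object
print(tag_to_bytes(c_multichar("ntrP")))            # signature of the Prtn object
```

This is why a signature search in a dump should look for the ASCII bytes "Exo " or "Prtn", not for the reversed literal seen in the disassembly.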

vid.sys!VidExopIoControlPreProcess is the function that processes IOCTL requests sent to the VidExo object. It calls vid.sys!VidIoControlPreProcess, which receives the VM_PROCESS_CONTEXT structure as its first parameter. If VM_PROCESS_CONTEXT contains the “Prtn” signature, vid.sys!VidExoIoControlPartition is executed; if it contains “Exo”, vid.sys!VidExoIoControlDriver is executed (it calls winhvr!WinHvGetSystemInformation with certain parameters; however, I did not encounter a case where this function is executed, because the Exo object is not a partition object). Accordingly, even when working with the WHVP API, almost all work is carried out with a Prtn object.

The following functions can be called from vid.sys!VidExoIoControlPartition:

VidIoControlPartition
WinHvInstallIntercept

WinHvSetLocalInterruptControllerState
WinHvGetLocalInterruptControllerState
VsmmExoGpaRangeIoctlAccessTrackingControl

VsmmExoGpaRangeIoctlUnmap
VsmmExoGpaRangeIoctlMap

vid.sys!VidIoControlPartition handles only a limited set of IOCTL requests for EXO partitions.

IOCTL    | Function                            | Subfunctions (for information)
0x221184 | VidVpIoctlStart                     | VidMessageSlotCancelWait, no winhv call
0x221034 | VsmmDoorbellCreateEntry             | WinHvAllocatePortId, WinHvCreatePort, WinHvConnectPort
0x2210BC | -                                   | WinHvGetXsaveData
0x2210D0 | VidHvStatsIoctlPageMapLocal         | VidClientBufferInitialize, VidClientBufferShare
0x2210D4 | VidHypercallDoorbellIoctlMap        | VidClientBufferShare
0x2210DC | VidVpLookup, VidClientBufferShare   | -
0x2210F0 | VidMessageSlotMap                   | VidVpLookup
0x221134 | VidCpuidResultRegister              | WinHvRegisterInterceptResult
0x221174 | VidPartitionIoctlSetup              | WinHvCreatePartition
0x22117C | -                                   | WinHvSetXsaveData
0x220003 | VidPartitionIoctlPropertyGet        | WinHvGetPartitionId, WinHvGetPartitionProperty
0x22122B | VsmmVaGpaCoreGetGpaPageProperties   | VsmmVaGpaCorepGpnCompareFunctionByPage, VsmmNtSlatAccessFault
0x2210B7 | VidVpIoctlStateGet                  | WinHvGetVpRegisters
0x221013 | WinHvAssertVirtualInterrupt         | -
0x22105F | VsmmDoorbellRemoveEntry             | VidHandleTableFreeEntry
0x2210AF | Get partition id from Prtn object   | -
0x2210EF | VidVpLookup                         | VidVpSuspend, VidMessageSlotHandle, VidVpRun
0x22116F | VidPartitionIoctlPropertySet        | -
0x22117B | VidVpIoctlStateSet                  | WinHvSetVpRegisters
0x2211A3 | VsmmExoTranslateGuestVirtualAddress | WinHvTranslateVirtualAddress

This corresponds to the limited set of functions provided by the WHVP API SDK. If a forbidden request is made, error code C0000002h (STATUS_NOT_IMPLEMENTED) is returned.
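The IOCTL values in the table follow the standard Windows CTL_CODE layout (device type in bits 31..16, access in bits 15..14, function in bits 13..2, transfer method in bits 1..0), so they can be split into fields when comparing driver versions. A small sketch, where the class and function names are my own:

```python
from typing import NamedTuple

class IoctlFields(NamedTuple):
    device_type: int
    access: int
    function: int
    method: int

# Transfer methods as defined by the CTL_CODE macro.
METHODS = ("METHOD_BUFFERED", "METHOD_IN_DIRECT",
           "METHOD_OUT_DIRECT", "METHOD_NEITHER")

def decode_ioctl(code: int) -> IoctlFields:
    """Split a 32-bit IOCTL code into its CTL_CODE fields."""
    return IoctlFields(
        device_type=(code >> 16) & 0xFFFF,
        access=(code >> 14) & 0x3,
        function=(code >> 2) & 0xFFF,
        method=code & 0x3,
    )

# A few codes taken from the table above.
for code in (0x221184, 0x2211A3, 0x220003):
    f = decode_ioctl(code)
    print(f"{code:#x}: device={f.device_type:#x} "
          f"function={f.function:#x} method={METHODS[f.method]}")
```

All codes in the table share device type 0x22 (FILE_DEVICE_UNKNOWN); only the function number and the transfer method differ.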

As you can see, functions for reading and writing guest memory are not available through the WHVP API, so memory access using the official API isn’t possible. We need to go deeper into the vid.sys driver and consider the structure of the created memory blocks.


In short (if you don’t want to read the previous article), a VM_PROCESS_CONTEXT object is created for each virtual machine. Virtual machine memory is described by MEMORY_BLOCK and GPAR_OBJECT structures.
For a full VM created through Hyper-V Manager, the MEMORY_BLOCK structure contains a pointer to the guest OS GPA array, which maps root OS SPAs to GPAs. Each MEMORY_BLOCK describes its own GPA range. Having found the specific block and obtained a GPA, you can call the IoAllocateMdl and MmMapLockedPagesSpecifyCache functions and read or write data in guest OS memory.
When a Hyper-V container is started, a separate kernel-mode vmmem process (a minimal process) is created. The VM_PROCESS_CONTEXT object contains a reference to an array of GPAR objects, which contain GPAs and the offsets of memory blocks in the vmmem process. Guest memory is already mapped into the vmmem process, so for reading or writing it is necessary to find the block that describes the required GPA and access the corresponding memory block in the vmmem address space, for example, using the MmCopyVirtualMemory function built into the Windows kernel.

EXO partitions have a different memory organization:


Memory block mapping is performed through the vid.sys!VsmmExoGpaRangeIoctlMap call, from which vid.sys!VsmmVaGpaCoreMapGpaRange is called.

First, we are interested in vid.sys!VsmmVaGpaCorepFindRange, which is called from vid.sys!VsmmVaGpaCorepCreateGpaToVaMappings and is given pointers to two functions:

VsmmVaGpaCorepGpnCompareFunctionByPage:
cmp     rax, [rdx+40h] – upper GPA
cmp     rax, [rdx+38h] – lower GPA 

VsmmVaGpaCorepVpnCompareFunctionByPage:

cmp     rax, [rdx+20h] – upper vmmem memory boundary offset
cmp     rax, [rdx+18h] – lower vmmem memory boundary offset

vid.sys!VsmmVaGpaCorepGpaRangeAllocate allocates a pool block of 0x70 bytes.

We see the following code:

lea     rcx, [r13+57A0h]
mov     rdx, rdi
call    cs:__imp_RtlRbRemoveNode

lea     rcx, [r13+57B0h]
call    cs:__imp_RtlRbRemoveNode

At offsets 0x57A0 and 0x57B0, the Prtn structure contains structures that are passed as the first parameter to the nt!RtlRbRemoveNode function. That function takes the following parameters:


(_In_ PRTL_RB_TREE Tree, _In_ PRTL_BALANCED_NODE Node)

3: kd> dt -r1 nt!_RTL_RB_TREE
   +0x000 Root             : Ptr64 _RTL_BALANCED_NODE
      +0x000 Children         : [2] Ptr64 _RTL_BALANCED_NODE
      +0x000 Left             : Ptr64 _RTL_BALANCED_NODE
      +0x008 Right            : Ptr64 _RTL_BALANCED_NODE
      +0x010 Red              : Pos 0, 1 Bit
      +0x010 Balance          : Pos 0, 2 Bits
      +0x010 ParentValue      : Uint8B
   +0x008 Encoded          : Pos 0, 1 Bit
   +0x008 Min              : Ptr64 _RTL_BALANCED_NODE
      +0x000 Children         : [2] Ptr64 _RTL_BALANCED_NODE
      +0x000 Left             : Ptr64 _RTL_BALANCED_NODE
      +0x008 Right            : Ptr64 _RTL_BALANCED_NODE
      +0x010 Red              : Pos 0, 1 Bit
      +0x010 Balance          : Pos 0, 2 Bits
      +0x010 ParentValue      : Uint8B

This structure is a red-black tree. I won’t go into the theory (it can be googled easily); for our purposes it is enough to see how it works in practice, because we already have the compiled code.

We have two trees, VPN (probably virtual page number) and GPN (guest page number), whose root addresses are located at offsets 57A0h and 57B0h of the Prtn structure (for 20H1), respectively.

2: kd> dps 0xffffd88344414000+0x57a0
ffffd883`444197a0  ffffd883`44dc9b70 - VPN tree (_RTL_RB_TREE.Root)
ffffd883`444197a8  ffffd883`44dcb060 - VPN tree (_RTL_RB_TREE.Min)
ffffd883`444197b0  ffffd883`443b2890 - GPN tree (_RTL_RB_TREE.Root)
ffffd883`444197b8  ffffd883`48d75890 - GPN tree (_RTL_RB_TREE.Min)
ffffd883`444197c0  00000000`00000000
ffffd883`444197c8  00000000`00000000

Consider each structure separately:

The GPN tree contains nodes and leaves that hold, in addition to links to other tree elements, a payload: the guest page numbers and a link to the VPN node, which contains the start and end addresses of the corresponding memory block in the vmmem process.

2: kd> dt _RTL_RB_TREE ffffd883`444197a0 – VPN tree
nt!_RTL_RB_TREE
   +0x000 Root             : 0xffffd883`44dc9b70 _RTL_BALANCED_NODE
   +0x008 Encoded          : 0y0
   +0x008 Min              : 0xffffd883`44dcb060 _RTL_BALANCED_NODE

2: kd> dt _RTL_RB_TREE ffffd883`444197b0 – GPN tree
nt!_RTL_RB_TREE
   +0x000 Root             : 0xffffd883`443b2890 _RTL_BALANCED_NODE
   +0x008 Encoded          : 0y0
   +0x008 Min              : 0xffffd883`48d75890 _RTL_BALANCED_NODE

We will work with the GPN tree. Its top nodes look something like this (you can see which node is black and which is red):

2: kd> dx -id 0,0,ffffd8833e087040 -r1 ((ntkrnlmp!_RTL_BALANCED_NODE *)0xffffd883443b2890)
((ntkrnlmp!_RTL_BALANCED_NODE *)0xffffd883443b2890)                 : 0xffffd883443b2890 [Type: _RTL_BALANCED_NODE *]
    [+0x000] Children         [Type: _RTL_BALANCED_NODE * [2]]
    [+0x000] Left             : 0xffffd883443a8e10 [Type: _RTL_BALANCED_NODE *]
    [+0x008] Right            : 0xffffd88342f4a690 [Type: _RTL_BALANCED_NODE *]
    [+0x010 ( 0: 0)] Red              : 0x0 [Type: unsigned char]
    [+0x010 ( 1: 0)] Balance          : 0x0 [Type: unsigned char]
    [+0x010] ParentValue      : 0x0 [Type: unsigned __int64]

2: kd> dx -id 0,0,ffffd8833e087040 -r1 ((ntkrnlmp!_RTL_BALANCED_NODE *)0xffffd883443a8e10)
((ntkrnlmp!_RTL_BALANCED_NODE *)0xffffd883443a8e10)                 : 0xffffd883443a8e10 [Type: _RTL_BALANCED_NODE *]
    [+0x000] Children         [Type: _RTL_BALANCED_NODE * [2]]
    [+0x000] Left             : 0xffffd88344398490 [Type: _RTL_BALANCED_NODE *]
    [+0x008] Right            : 0xffffd883443ae090 [Type: _RTL_BALANCED_NODE *]
    [+0x010 ( 0: 0)] Red              : 0x1 [Type: unsigned char]
    [+0x010 ( 1: 0)] Balance          : 0x1 [Type: unsigned char]
    [+0x010] ParentValue      : 0xffffd883443b2891 [Type: unsigned __int64]

2: kd> dx -id 0,0,ffffd8833e087040 -r1 ((ntkrnlmp!_RTL_BALANCED_NODE *)0xffffd88344398490)
((ntkrnlmp!_RTL_BALANCED_NODE *)0xffffd88344398490)                 : 0xffffd88344398490 [Type: _RTL_BALANCED_NODE *]
    [+0x000] Children         [Type: _RTL_BALANCED_NODE * [2]]
    [+0x000] Left             : 0xffffd88348d75890 [Type: _RTL_BALANCED_NODE *]
    [+0x008] Right            : 0xffffd88344398510 [Type: _RTL_BALANCED_NODE *]
    [+0x010 ( 0: 0)] Red              : 0x0 [Type: unsigned char]
    [+0x010 ( 1: 0)] Balance          : 0x0 [Type: unsigned char]
    [+0x010] ParentValue      : 0xffffd883443a8e10 [Type: unsigned __int64]
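In _RTL_BALANCED_NODE the ParentValue field shares its low bits with the Red and Balance bit fields, which is why the second node above shows ParentValue 0xffffd883443b2891 while its parent is the root at 0xffffd883443b2890. A small sketch of the decoding (the function name is my own):

```python
POINTER_MASK = 0xFFFFFFFFFFFFFFFC   # clears the two low flag bits

def decode_parent_value(parent_value: int) -> tuple:
    """Split _RTL_BALANCED_NODE.ParentValue into (parent address, red bit)."""
    parent = parent_value & POINTER_MASK
    red = parent_value & 0x1
    return parent, red

# Values taken from the three nodes dumped above.
for value in (0x0, 0xFFFFD883443B2891, 0xFFFFD883443A8E10):
    parent, red = decode_parent_value(value)
    color = "red" if red else "black"
    print(f"ParentValue={value:#x} -> parent={parent:#x}, {color}")
```

A ParentValue of zero marks the root, which is black, as expected for a red-black tree.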

We are primarily interested in the payload contained in the body of a tree leaf.

2: kd> dps  0xffffd88346564610
ffffd883`46564610  00000000`00000000
ffffd883`46564618  00000000`00000000
ffffd883`46564620  ffffd883`46564910
ffffd883`46564628  ffffd883`4802e558
ffffd883`46564630  ffffd883`4802e558
ffffd883`46564638  fffffc0f`022a73a0
ffffd883`46564640  fffffc0f`022a73a0
ffffd883`46564648  00000000`0000000e – Start GPA
ffffd883`46564650  00000000`0000009f – End GPA
ffffd883`46564658  ffffd883`44414000 – Prtn object
ffffd883`46564660  ffffd883`4802e530 – corresponding VPN tree element
ffffd883`46564668  00000000`00000040
ffffd883`46564670  00000000`00000003
ffffd883`46564678  00000000`00000000

2: kd> dps 0xffffd8834802e530
ffffd883`4802e530  ffffd883`4802caa0
ffffd883`4802e538  ffffd883`4802f250
ffffd883`4802e540  ffffd883`4802f7a0
ffffd883`4802e548  00000000`281af2ee – Start virtual page address
ffffd883`4802e550  00000000`281af37f – end virtual page address
ffffd883`4802e558  ffffd883`46564628
ffffd883`4802e560  ffffd883`46564628
ffffd883`4802e568  00000000`00000001
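The two dumps above can be read as fixed-offset records: the GPN element keeps its page range at +38h/+40h followed by the Prtn pointer and the link to the VPN element, and the VPN element keeps its page range at +18h/+20h. A sketch that parses both from raw bytes; the field names are my own labels, and the buffers are rebuilt from the dumped values only for illustration:

```python
import struct

# Quadwords copied from the two dumps above.
GPN_NODE_DUMP = [
    0x0, 0x0, 0xFFFFD88346564910, 0xFFFFD8834802E558, 0xFFFFD8834802E558,
    0xFFFFFC0F022A73A0, 0xFFFFFC0F022A73A0, 0xE, 0x9F,
    0xFFFFD88344414000, 0xFFFFD8834802E530, 0x40, 0x3, 0x0,
]
VPN_NODE_DUMP = [
    0xFFFFD8834802CAA0, 0xFFFFD8834802F250, 0xFFFFD8834802F7A0,
    0x281AF2EE, 0x281AF37F, 0xFFFFD88346564628, 0xFFFFD88346564628, 0x1,
]

def qwords_to_bytes(qwords):
    """Pack a list of 64-bit values as little-endian memory."""
    return struct.pack(f"<{len(qwords)}Q", *qwords)

def parse_gpn_node(raw: bytes) -> dict:
    """Read the payload of a GPN tree element (offsets observed on 20H1)."""
    start, end, prtn, vpn_node = struct.unpack_from("<4Q", raw, 0x38)
    return {"start_gpn": start, "end_gpn": end,
            "prtn": prtn, "vpn_node": vpn_node}

def parse_vpn_node(raw: bytes) -> dict:
    """Read the page range of a VPN tree element."""
    start, end = struct.unpack_from("<2Q", raw, 0x18)
    return {"start_vpn": start, "end_vpn": end}

gpn = parse_gpn_node(qwords_to_bytes(GPN_NODE_DUMP))
vpn = parse_vpn_node(qwords_to_bytes(VPN_NODE_DUMP))
print({k: hex(v) for k, v in gpn.items()})
print({k: hex(v) for k, v in vpn.items()})
```

Both ranges cover the same number of pages, which is a useful sanity check when walking the trees on another build where the offsets may differ.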


For the Qemu process, you can see that the base address of the memory region coincides with the beginning of the VPN block:

2: kd> dps 0xffffad04c7872c10
ffffad04`c7872c10  00000000`00000000
ffffad04`c7872c18  ffffad04`cc1b7610
ffffad04`c7872c20  ffffad04`cc269610
ffffad04`c7872c28  ffffad04`c96a75b8
ffffad04`c7872c30  ffffad04`c96a75b8
ffffad04`c7872c38  ffffbf8d`0ef103a0
ffffad04`c7872c40  ffffbf8d`0ef103a0
ffffad04`c7872c48  00000000`00000000 – Start GPA
ffffad04`c7872c50  00000000`0000009f – End GPA
ffffad04`c7872c58  ffffad04`c8bd1000 
ffffad04`c7872c60  ffffad04`c96a7590 – corresponding VPN tree element
ffffad04`c7872c68  00000000`00000040
ffffad04`c7872c70  00000000`00000003
ffffad04`c7872c78  00000000`00000000

2: kd> dps ffffad04`c96a7590
ffffad04`c96a7590  ffffad04`ccb86120
ffffad04`c96a7598  ffffad04`c6cd8af0
ffffad04`c96a75a0  ffffad04`c6cd94a0
ffffad04`c96a75a8  00000000`0007fff0 – Start virtual page address
ffffad04`c96a75b0  00000000`0008008f – end virtual page address
ffffad04`c96a75b8  ffffad04`c7872c28
ffffad04`c96a75c0  ffffad04`c7872c28
ffffad04`c96a75c8  00000000`00000001
ffffad04`c96a75d0  53646156`02050000
ffffad04`c96a75d8  00000000`00000000
ffffad04`c96a75e0  00000000`00000000
ffffad04`c96a75e8  00000000`00000000


In some ways, the EXO partition memory organization resembles the VAD tree, which describes the state of a process address space and is built on AVL trees. It also stores the minimum and maximum values of each memory block’s range.

kd> dt ntkrnlmp!_MMVAD_SHORT
   +0x000 NextVad          : Ptr64 _MMVAD_SHORT
   +0x008 ExtraCreateInfo  : Ptr64 Void
   +0x000 VadNode          : _RTL_BALANCED_NODE
   +0x018 StartingVpn      : Uint4B
   +0x01c EndingVpn        : Uint4B
   +0x020 StartingVpnHigh  : UChar
   +0x021 EndingVpnHigh    : UChar
   +0x022 CommitChargeHigh : UChar
   +0x023 SpareNT64VadUChar : UChar
   +0x024 ReferenceCount   : Int4B
   +0x028 PushLock         : _EX_PUSH_LOCK
   +0x030 u                : <anonymous-tag>
   +0x034 u1               : <anonymous-tag>
   +0x038 EventList        : Ptr64 _MI_VAD_EVENT_BLOCK
kd> dt ntkrnlmp!_MMVAD
   +0x000 Core             : _MMVAD_SHORT
   +0x040 u2               : <anonymous-tag>
   +0x048 Subsection       : Ptr64 _SUBSECTION
   +0x050 FirstPrototypePte : Ptr64 _MMPTE
   +0x058 LastContiguousPte : Ptr64 _MMPTE
   +0x060 ViewLinks        : _LIST_ENTRY
   +0x070 VadsProcess      : Ptr64 _EPROCESS
   +0x078 u4               : <anonymous-tag>
   +0x080 FileObject       : Ptr64 _FILE_OBJECT
kd> dt ntkrnlmp!_MI_VAD_SEQUENTIAL_INFO
   +0x000 Length           : Pos 0, 12 Bits
   +0x000 Vpn              : Pos 12, 52 Bits

Finally, to read from or write to the virtual address space of a guest OS running in Qemu in WHPX acceleration mode, you must:

1.  translate the virtual address into a physical one using vid.dll!VidTranslateGvatoGpa;
2.  find the required GPN leaf or node in the tree by comparing its start and end GPA with the physical address obtained in step 1;
3.  get the corresponding VPN leaf and find the memory block offset in the address space of the qemu-system-x86_64.exe or vmware-vmx.exe process;
4.  read or write the corresponding memory block (depending on the operation).
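Steps 2 and 3 come down to simple page arithmetic once the GPN and VPN ranges are known. A minimal sketch, assuming 4 KB pages and that the ranges have already been extracted from the trees as (start GPN, end GPN, start VPN) tuples; that representation is my own, and step 1 and the actual read or write are out of scope here:

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def gpa_to_host_va(gpa: int, ranges):
    """Translate a guest physical address into a virtual address
    inside the VMM process, or return None if the page is not mapped.

    ranges: iterable of (start_gpn, end_gpn, start_vpn) tuples.
    """
    gpn = gpa >> PAGE_SHIFT
    for start_gpn, end_gpn, start_vpn in ranges:
        if start_gpn <= gpn <= end_gpn:
            vpn = start_vpn + (gpn - start_gpn)
            return (vpn << PAGE_SHIFT) | (gpa & PAGE_MASK)
    return None

# The single Qemu range shown in the dumps above.
QEMU_RANGES = [(0x0, 0x9F, 0x7FFF0)]

print(hex(gpa_to_host_va(0x1234, QEMU_RANGES)))
print(gpa_to_host_va(0xA0000, QEMU_RANGES))   # outside the sample range
```

A linear scan is enough for an illustration; the driver itself uses the red-black trees so that the lookup stays logarithmic.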

Option 2 (theoretical; it does not require kernel-mode operations, but hasn’t been tested):

1. Get the physical address from the virtual one using winhvplatform.dll!WHvTranslateGva.
2. Scan the address space of the qemu-system-x86_64.exe or vmware-vmx.exe process and find a block that matches the size of the guest RAM (hoping that it is contiguous, without fragmentation).
3. Treat the physical address as an offset into that process memory block.
4. Read or write, and hope that you are lucky.

Yes, this variant is not 100% reliable, but it doesn’t require parsing the EXO trees in kernel mode.
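The heuristic of Option 2 can be sketched as follows. Like the option itself, this is untested against a real VM: the region list is assumed to come from an address-space scan of the VMM process (for example, with VirtualQueryEx), and the sample values are invented for illustration.

```python
def find_ram_region(regions, ram_size: int):
    """Return the only (base, size) region whose size equals the guest RAM
    size, or None if there is no match or the match is ambiguous."""
    matches = [(base, size) for base, size in regions if size == ram_size]
    return matches[0] if len(matches) == 1 else None

def gpa_to_va_flat(gpa: int, region):
    """Treat the GPA as a plain offset into the RAM region."""
    base, size = region
    if not 0 <= gpa < size:
        return None
    return base + gpa

# Hypothetical scan result for a guest started with -m 3072M.
RAM_SIZE = 3072 * 1024 * 1024
regions = [(0x10000, 0x1000), (0x7FFF0000, RAM_SIZE), (0x7FF700000000, 0x200000)]

ram = find_ram_region(regions, RAM_SIZE)
print(hex(gpa_to_va_flat(0x1234, ram)))
```

Returning None on an ambiguous match is deliberate: guessing between two equally sized regions would silently read the wrong memory.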

We can capture a trace while Qemu is starting:
qemu-system-x86_64.exe -m 3072M -smp 1 -drive file=Win1020H1.gcow2,index=0,media=disk,cache=writeback -accel whpx
with the WinDBG command
bp winhvr!WinHvMapGpaPagesSpecial "r rcx, rdx,r8,r9;g"
0: kd> g
rcx=0000000000000005 rdx=0000000000000000 r8=0000000000080400 r9=00000000000c0000
rcx=0000000000000005 rdx=00000000000fffc0 r8=0000000000080400 r9=0000000000000040
rcx=0000000000000005 rdx=0000000000000000 r8=0000000000080400 r9=00000000000000c0
rcx=0000000000000005 rdx=00000000000000c0 r8=0000000000080400 r9=0000000000000020
rcx=0000000000000005 rdx=00000000000000e0 r8=0000000000080400 r9=0000000000000020
rcx=0000000000000005 rdx=0000000000000100 r8=0000000000080400 r9=00000000000bff00
rcx=0000000000000005 rdx=0000000000000000 r8=0000000000080400 r9=00000000000000a0
rcx=0000000000000005 rdx=00000000000000c0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000d0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000e0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000f0 r8=0000000000080400 r9=00000000000bff10
rcx=0000000000000005 rdx=00000000000000c0 r8=0000000000080400 r9=00000000000bff40
rcx=0000000000000005 rdx=00000000000fd000 r8=0000000000080400 r9=0000000000001000
rcx=0000000000000005 rdx=00000000000febe0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000feb80 r8=0000000000080400 r9=0000000000000040
rcx=0000000000000005 rdx=00000000000000c0 r8=0000000000080400 r9=000000000000000b
rcx=0000000000000005 rdx=00000000000000cb r8=0000000000080400 r9=0000000000000003
rcx=0000000000000005 rdx=00000000000000ce r8=0000000000080400 r9=0000000000000002
rcx=0000000000000005 rdx=00000000000000d0 r8=0000000000080400 r9=0000000000000020
rcx=0000000000000005 rdx=00000000000000f0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=0000000000000100 r8=0000000000080400 r9=00000000000bff00
rcx=0000000000000005 rdx=00000000000000ce r8=0000000000080400 r9=000000000000001a
rcx=0000000000000005 rdx=00000000000000e8 r8=0000000000080400 r9=0000000000000008
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000fd000 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000fd010 r8=0000000000080400 r9=0000000000000ff0
rcx=0000000000000005 rdx=00000000000fd000 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000fd010 r8=0000000000080400 r9=0000000000000ff0
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010
rcx=0000000000000005 rdx=00000000000000a0 r8=0000000000080400 r9=0000000000000010

The VPN and GPN tree data look like this:




The block sizes in the guest OS are approximately the same (bfedd and bff40), but strictly speaking they are not equal.
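To put the difference between the two values in perspective, here is a small arithmetic sketch. It assumes that both numbers are page counts and that the page size is 4 KiB; neither assumption is stated next to the values above, and the helper name block_stats is hypothetical.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def block_stats(pages_a, pages_b, page_size=PAGE_SIZE):
    """Return (difference in pages, difference in bytes) for two block sizes."""
    diff_pages = abs(pages_a - pages_b)
    return diff_pages, diff_pages * page_size


# The two block sizes quoted in the text, treated as page counts (assumption).
diff_pages, diff_bytes = block_stats(0xBFEDD, 0xBFF40)
print(f"difference: {diff_pages:#x} pages, {diff_bytes:#x} bytes")
```

Under these assumptions the two blocks differ by 0x63 pages, which is small compared with the total size but enough to show that the two values do not describe identical ranges.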

For VirtualBox 6.1.8 we get:


Although the VirtualBox developers use winhvplatform!WHvCreatePartition, they do not map memory using winhvplatform!WHvMapGpaRange and winhvr!WinHvMapGpaPagesSpecial. VirtualBox emulation is partially performed in kernel mode, and user-mode performance is insufficient for the virtualization subsystem to function normally. The discussion about Hyper-V and VirtualBox compatibility can be found on the official VirtualBox forum:


The main subsystem working with Hyper-V is described in this file:


WHVP API usage examples can be found in:
https://github.com/ionescu007/Simpleator - Simpleator application;
https://github.com/epakskape/whpexp - NOP generator from Microsoft (https://en.wikipedia.org/wiki/NOP_slide);
https://crates.io/crates/libwhp - WHVP Rust-based API;
https://github.com/0vercl0k/pywinhv - WHVP python-based API.

After adding that algorithm to LiveCloudKd, it became possible to read the memory of all partitions created using the WHVP API. Writing uses the same algorithm with one difference: the copy direction is reversed.
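The idea that reading and writing share one code path and differ only in copy direction can be illustrated with a minimal sketch. This is not the LiveCloudKd implementation: the guest memory block is modelled as a plain bytearray, the address is treated as a direct offset into that block, and the names Direction and access_guest_memory are hypothetical. The copy is split at page boundaries, since contiguous guest pages are not guaranteed to be contiguous in a real mapping.

```python
from enum import Enum

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


class Direction(Enum):
    READ = 0   # guest memory -> caller buffer
    WRITE = 1  # caller buffer -> guest memory


def access_guest_memory(guest_mem, gpa, buf, direction):
    """Copy len(buf) bytes between buf and guest_mem at offset gpa.

    guest_mem is a stand-in (bytearray) for a mapped guest memory block.
    Returns the number of bytes copied.
    """
    if gpa < 0 or gpa + len(buf) > len(guest_mem):
        raise ValueError("range lies outside the mapped block")
    done = 0
    while done < len(buf):
        offset = gpa + done
        # Never cross a page boundary within a single copy.
        chunk = min(len(buf) - done, PAGE_SIZE - offset % PAGE_SIZE)
        if direction is Direction.READ:
            buf[done:done + chunk] = guest_mem[offset:offset + chunk]
        else:
            guest_mem[offset:offset + chunk] = buf[done:done + chunk]
        done += chunk
    return done
```

A write followed by a read of the same range should return the original data, for example writing 0x20 bytes at offset 0xff0 (which crosses a page boundary) and reading them back into a fresh buffer.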


Memory organization for Windows 10 1803 is similar to that of Windows Defender Application Guard/Windows Sandbox containers or Docker containers running in Hyper-V isolation mode.


Google Android emulator (Qemu-based):


Conclusion

In general, some emulators work successfully with the WHVP API (Qemu, the Android emulator), while others haven't been able to switch to it fully (VirtualBox, VMware). Microsoft clearly doesn't want to make life easier for competing products, although there is no direct benefit to Microsoft in this. The performance of guest operating systems running on these APIs also raises questions. I think that a documented kernel-mode API for winhvr.sys could solve this problem.
