Experiment with self-modifying code

Recently I have been porting the Spectre and Meltdown mitigations to OVM 3.3 (xen-4.3.x).
One of the features they depend on is the alternatives mechanism. While porting it, I came across an optimization that drops the serializing CPUID instruction and relies on a plain RET instead.

Below are the details of this commit.
*****************************************
[PATCH 12/16] xen/x86/alternatives: Do not use sync_core() to serialize I$

We use sync_core() in the alternatives code to stop speculative
execution of prefetched instructions because we are potentially changing
them and don't want to execute stale bytes.

What it does on most machines is call CPUID which is a serializing
instruction. And that's expensive.

However, the instruction cache is serialized when we're on the local CPU
and are changing the data through the same virtual address. So then, we
don't need the serializing CPUID but a simple control flow change. Last
being accomplished with a CALL/RET which the noinline causes.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
[Linux commit 34bfab0eaf0fb5c6fb14c6b4013b06cdc7984466]

Ported to Xen.

Signed-off-by: Andrew Cooper <[email protected]>
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Jan Beulich <[email protected]>

Index: xen/arch/x86/alternative.c
===================================================================
--- xen/arch/x86/alternative.c    (revision 13262)
+++ xen/arch/x86/alternative.c    (revision 13264)
@@ -127,13 +127,13 @@
  * handlers seeing an inconsistent instruction while you patch.
  *
  * This routine is called with local interrupt disabled.
+ *
+ * "noinline" to cause control flow change and thus invalidate I$ and
+ * cause refetch after modification.
  */
-static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
+static void *__init noinline text_poke_early(void *addr, const void *opcode, size_t len)
 {
-    memcpy(addr, opcode, len);
-    sync_core();
-
-    return addr;
+    return memcpy(addr, opcode, len);
 }
 
 /*
******************************************

It is based on a recommendation in the Intel manual, and switches from option 2 to option 1.

...snip start...
To write self-modifying code and ensure that it is compliant with current and
future versions of the IA-32 architectures, use one of the following coding options:
(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;
...snip end...
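
A user-space analogue of the two options can be sketched as below. This is only an illustration, not Xen code: it assumes x86-64 Linux and a system policy that allows a mapping that is both writable and executable. The names patch_and_run and stub are hypothetical. The function returns -1 when the platform is unsupported or the mapping fails. Seeing the patched value on one machine does not prove the architectural guarantee; it only shows what that machine happens to do.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) && defined(__linux__)
#include <sys/mman.h>
#endif

/* x86-64 machine code for: mov $1, %eax ; ret */
static const uint8_t stub[] = { 0xb8, 0x01, 0x00, 0x00, 0x00, 0xc3 };

typedef int (*stub_fn)(void);

/*
 * Hypothetical helper: run the stub, patch its immediate from 1 to 2,
 * then run it again.
 *   use_cpuid == 0: option 1, the indirect call is the jump to new code.
 *   use_cpuid != 0: option 2, execute CPUID before the new code.
 * Returns 0 on success, -1 if unsupported or if mmap() fails.
 */
int patch_and_run(int *before, int *after, int use_cpuid)
{
    assert(before != NULL && after != NULL);
#if defined(__x86_64__) && defined(__linux__)
    uint8_t *code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    memcpy(code, stub, sizeof(stub));
    stub_fn fn = (stub_fn)(uintptr_t)code;
    *before = fn();

    code[1] = 0x02;                /* store modified code (as data) */

    if (use_cpuid) {               /* serializing instruction */
        uint32_t a = 0, b, c = 0, d;
        __asm__ volatile("cpuid"
                         : "+a"(a), "=b"(b), "+c"(c), "=d"(d)
                         :
                         : "memory");
    }

    *after = fn();                 /* control transfer, then new code */
    munmap(code, 4096);
    return 0;
#else
    (void)use_cpuid;
    return -1;
#endif
}
```

Calling patch_and_run(&b, &a, 0) exercises option 1 and patch_and_run(&b, &a, 1) exercises option 2; on a supported machine both are expected to leave b == 1 and a == 2.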

But to confirm that dropping the serializing instruction really does not break the instruction stream, I wrote the patch below to test the self-modifying case.

# gendiff . .old1
diff -up --new-file ./arch/x86/alternative.c.old1 ./arch/x86/alternative.c
--- ./arch/x86/alternative.c.old1       2018-01-12 03:56:12.000000000 +0800
+++ ./arch/x86/alternative.c    2018-01-12 06:14:26.000000000 +0800
@@ -131,9 +131,19 @@ static void __init add_nops(void *insns,
  * "noinline" to cause control flow change and thus invalidate I$ and
  * cause refetch after modification.
  */
-static void *__init noinline text_poke_early(void *addr, const void *opcode, size_t len)
+static always_inline void *__init text_poke_early(void *addr, const void *opcode, size_t len)
 {
-    return memcpy(addr, opcode, len);
+    long d0, d1, d2;
+
+    asm volatile (
+        "   rep ; movs"__OS" ; "
+        "   mov %4,%3        ; "
+        "   rep ; movsb        "
+        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+        : "0" (len/BYTES_PER_LONG), "r" (len%BYTES_PER_LONG), "1" (addr), "2" (opcode)
+        : "memory" );
+
+    return addr;
 }

 /*
@@ -184,7 +194,11 @@ static void __init apply_alternatives(st

         add_nops(insnbuf + a->replacementlen,
                  a->instrlen - a->replacementlen);
+
+        if( cpu_has_apic )
+            printk(KERN_INFO "APIC supported\n");
         text_poke_early(instr, insnbuf, a->instrlen);
+        alternative("ud2", "nop", X86_FEATURE_APIC);
     }

     /* Reinstate WP. */


This intrusive patch modifies an instruction immediately before it is executed: a ud2 instruction is executed right after the copy has replaced it with a nop. The printk is there to make sure this path is actually reached. Note that always_inline is added to text_poke_early() and memcpy is replaced with an assembly version to avoid any RET.
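
For readers who do not read inline assembly: the asm block copies len / BYTES_PER_LONG machine words with rep movs, then the remaining len % BYTES_PER_LONG bytes with rep movsb. A portable C sketch of the same split is shown here, using the hypothetical name copy_words_then_bytes. Unlike the assembly version, it gives no guarantee about CALL or RET instructions, because the compiler is free to emit library calls for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy whole words first, then the leftover bytes. */
void *copy_words_then_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = len / sizeof(long);   /* count for "rep movs" */
    size_t tail  = len % sizeof(long);   /* count for "rep movsb" */

    while (words--) {
        long w;
        /* Fixed-size memcpy avoids unaligned-access undefined behavior;
         * compilers usually turn it into a single load and store. */
        memcpy(&w, s, sizeof(w));
        memcpy(d, &w, sizeof(w));
        s += sizeof(long);
        d += sizeof(long);
    }

    while (tail--)
        *d++ = *s++;

    return dst;
}
```

With len = 19 and 8-byte longs, this copies two words and then three bytes.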

Unluckily, my Intel processor simply survived: it did not even need a RET to refresh the processor pipeline.
I'm not sure whether this is model-specific behavior; the Family/Model/Stepping of the CPU is 15/6/4.
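
For reference, a Family/Model/Stepping triple like this is normally derived from the EAX value returned by CPUID leaf 1. The sketch below assumes that is where the 15/6/4 figures come from and applies the display family and display model rules from the Intel manual. The names cpu_sig and decode_cpu_signature are hypothetical, and 0x00000F64 is a value constructed to decode to 15/6/4, not one read from my machine.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Hypothetical helper: decode the EAX value of CPUID leaf 1. */
struct cpu_sig decode_cpu_signature(uint32_t eax)
{
    unsigned stepping   = eax & 0xf;
    unsigned model      = (eax >> 4) & 0xf;
    unsigned family     = (eax >> 8) & 0xf;
    unsigned ext_model  = (eax >> 16) & 0xf;
    unsigned ext_family = (eax >> 20) & 0xff;
    struct cpu_sig sig;

    /* The extended family is added only when the base family is 0xF. */
    sig.family = (family == 0xf) ? family + ext_family : family;

    /* The extended model applies only to base family 6 or 0xF. */
    sig.model = (family == 0x6 || family == 0xf)
                    ? (ext_model << 4) + model
                    : model;

    sig.stepping = stepping;
    return sig;
}
```

decode_cpu_signature(0x00000F64) yields family 15, model 6, stepping 4.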

The disassembly of the two lines below confirms that there is no ret instruction.
        text_poke_early(instr, insnbuf, a->instrlen);
        alternative("ud2", "nop", X86_FEATURE_APIC);

ffff82c4c0295ec5:       4c 89 ff                mov    %r15,%rdi
ffff82c4c0295ec8:       48 8b 74 24 08          mov 0x8(%rsp),%rsi
ffff82c4c0295ecd:       48 89 c1                mov    %rax,%rcx
ffff82c4c0295ed0:       83 e0 07                and    $0x7,%eax
ffff82c4c0295ed3:       48 c1 e9 03             shr    $0x3,%rcx
ffff82c4c0295ed7:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
ffff82c4c0295eda:       48 89 c1                mov    %rax,%rcx
ffff82c4c0295edd:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)

ffff82c4c0295edf:       0f 0b                   ud2a


One interesting aspect of this hack is that I tested it on a Xen host running as an HVM guest. I'll go into more detail about this special setup in the next post.
