De1CTF-xorz-Writeup

"De1CTF-Crypto-xorz-Writeup"

Posted by FitzBC on August 4, 2019

“Yeah It’s on. ”

前言

这两天打了一下 De1CTF 的比赛,贼菜,只出了最简单的几道题,在这里记录一下密码学的 xorz 这道题


正文

题目描述

题目非常简单,给了一个 python 脚本,如下:

1
2
3
4
5
6
7
8
9
10
11
12
from itertools import *
from data import flag,plain

key=flag.strip("de1ctf{").strip("}")
assert(len(key)<38)
salt="WeAreDe1taTeam"
ki=cycle(key)
si=cycle(salt)
cipher = ''.join([hex(ord(p) ^ ord(next(ki)) ^ ord(next(si)))[2:].zfill(2) for p in plain])
print cipher
# output:
# 49380d773440222d1b421b3060380c3f403c3844791b202651306721135b6229294a3c3222357e766b2f15561b35305e3c3b670e49382c295c6c170553577d3a2b791470406318315d753f03637f2b614a4f2e1c4f21027e227a4122757b446037786a7b0e37635024246d60136f7802543e4d36265c3e035a725c6322700d626b345d1d6464283a016f35714d434124281b607d315f66212d671428026a4f4f79657e34153f3467097e4e135f187a21767f02125b375563517a3742597b6c394e78742c4a725069606576777c314429264f6e330d7530453f22537f5e3034560d22146831456b1b72725f30676d0d5c71617d48753e26667e2f7a334c731c22630a242c7140457a42324629064441036c7e646208630e745531436b7c51743a36674c4f352a5575407b767a5c747176016c0676386e403a2b42356a727a04662b4446375f36265f3f124b724c6e346544706277641025063420016629225b43432428036f29341a2338627c47650b264c477c653a67043e6766152a485c7f33617264780656537e5468143f305f4537722352303c3d4379043d69797e6f3922527b24536e310d653d4c33696c635474637d0326516f745e610d773340306621105a7361654e3e392970687c2e335f3015677d4b3a724a4659767c2f5b7c16055a126820306c14315d6b59224a27311f747f336f4d5974321a22507b22705a226c6d446a37375761423a2b5c29247163046d7e47032244377508300751727126326f117f7a38670c2b23203d4f27046a5c5e1532601126292f577776606f0c6d0126474b2a73737a41316362146e581d7c1228717664091c

代码的主要意思是说取出来 flag 中 de1ctf{} 之间的部分作为 key,然后给了一串盐值,然后明文未知,将明文的每一位和key还有盐值做异或,作为新的密文,而最终的密文已知。

分析

核心代码是这一行 ord(p) ^ ord(next(ki)) ^ ord(next(si)),密文、盐值已知,明文和 key 未知,所以直接去做逆运算是不行的。因为加密算法很简单,只是单纯的异或,所以虽然明文和 key 不知道,但是他们的异或结果是可以算出来的,所以第一步需要先计算他们的异或结果,洗掉盐值。代码如下:

1
2
3
4
5
6
7
8
9
10
11
miwen='49380d773440222d1b421b3060380c3f403c3844791b202651306721135b6229294a3c3222357e766b2f15561b35305e3c3b670e49382c295c6c170553577d3a2b791470406318315d753f03637f2b614a4f2e1c4f21027e227a4122757b446037786a7b0e37635024246d60136f7802543e4d36265c3e035a725c6322700d626b345d1d6464283a016f35714d434124281b607d315f66212d671428026a4f4f79657e34153f3467097e4e135f187a21767f02125b375563517a3742597b6c394e78742c4a725069606576777c314429264f6e330d7530453f22537f5e3034560d22146831456b1b72725f30676d0d5c71617d48753e26667e2f7a334c731c22630a242c7140457a42324629064441036c7e646208630e745531436b7c51743a36674c4f352a5575407b767a5c747176016c0676386e403a2b42356a727a04662b4446375f36265f3f124b724c6e346544706277641025063420016629225b43432428036f29341a2338627c47650b264c477c653a67043e6766152a485c7f33617264780656537e5468143f305f4537722352303c3d4379043d69797e6f3922527b24536e310d653d4c33696c635474637d0326516f745e610d773340306621105a7361654e3e392970687c2e335f3015677d4b3a724a4659767c2f5b7c16055a126820306c14315d6b59224a27311f747f336f4d5974321a22507b22705a226c6d446a37375761423a2b5c29247163046d7e47032244377508300751727126326f117f7a38670c2b23203d4f27046a5c5e1532601126292f577776606f0c6d0126474b2a73737a41316362146e581d7c1228717664091c'
salt="WeAreDe1taTeam"
si2=cycle(salt)
miwen_after=''
for i in range(int(len(miwen)/2)): # 遍历整个密文串
    temp=miwen[i*2:i*2+2] # 步长为2
    # print(binascii.a2b_hex(temp))
    temphex_int=ord(binascii.a2b_hex(temp).decode())
    # p = chr(temphex_int ^ ord(next(si2)) ^ ord(next(ki2)))
    temp_re= hex(temphex_int ^ ord(next(si2)))[2:].zfill(2)
    miwen_after=miwen_after+temp_re

洗完盐值之后,这个题就比较纯粹了,通过两步来完成:1.猜解密钥长度;2.解出明文。下面来详细解释一下如何去做。

1。猜解密钥长度

最简单的猜解方式是爆破,枚举密钥长度,因为本题中设置了一个断言,密钥长度不会超过38,所以完全可以直接爆破。而另外一个高级一点的思路是计算密文的汉明距离,然后大概猜测一下密钥的位数,然后只用从排序靠前的几个密钥值中尝试枚举即可,该方法通用性较强,基本不受密钥长度限制。汉明距离的详细原理解释及计算实现可参考小记一类ctf密码题解题思路。下面只是先记录一下我对于汉明距离的理解,如果对于汉明距离有一定了解,或者读完上述文章可直接跳到解出明文部分

先来引一下 Wiki 的定义:

在信息论中,两个等长字符串之间的汉明距离(英语:Hamming distance)是两个字符串对应位置的不同字符的个数。换句话说,它就是将一个字符串变换成另外一个字符串所需要替换的字符个数。 —— Wiki-汉明距离

举几个例子,例如:

  • 10<font color=blue>1</font>1<font color=blue>1</font>01 与 10<font color=red>0</font>1<font color=red>0</font>01 之间的汉明距离是2。
  • 2<font color=blue>14</font>3<font color=blue>8</font>96 与 2<font color=red>23</font>3<font color=red>7</font>96 之间的汉明距离是3。
  • “<font color=blue>t</font>o<font color=blue>n</font>e<font color=blue>d</font>” 与 “<font color=red>r</font>o<font color=red>s</font>e<font color=red>s</font>” 之间的汉明距离是3。

所以,汉明距离可以表示字符之间的相似性程度,那对于我们求解密钥长度有什么帮助呢。是因为所有的可打印字符其实都有它对应的 ASCII 值,大写 A-Z 的 ASCII 值是 65-90,小写 a-z 的 ASCII 值是 97-122。65 的二进制表示是01000001,01011010,字母的汉明距离平均是2-3,任意字符(非纯字母)的两两汉明距离平均为4,那么虽然我们不知道明文,但是可以计算两两字母之间汉明距离,如果使用了正确的密钥长度,那么整个密文的汉明距离应该较小(不一定是最小,排除误差。因为明文都是可打印字符,汉明距离比较接近,而如果是一堆不可打印字符,他们的汉明距离可能较大)。所以通过计算汉明距离即可猜测出密钥的有限可能长度。代码参考小记一类ctf密码题解题思路

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
normalized_distances = []

for KEYSIZE in range(2, 40):
    #我们取其中前6段计算平局汉明距离
    b1 = b[: KEYSIZE]
    b2 = b[KEYSIZE: KEYSIZE * 2]
    b3 = b[KEYSIZE * 2: KEYSIZE * 3]
    b4 = b[KEYSIZE * 3: KEYSIZE * 4]
    b5 = b[KEYSIZE * 4: KEYSIZE * 5]
    b6 = b[KEYSIZE * 5: KEYSIZE * 6]

    normalized_distance = float(
        hamming_distance(b1, b2) +
        hamming_distance(b2, b3) +
        hamming_distance(b3, b4) +
        hamming_distance(b4, b5) +
        hamming_distance(b5, b6)
    ) / (KEYSIZE * 5)
    normalized_distances.append(
        (KEYSIZE, normalized_distance)
    )
normalized_distances = sorted(normalized_distances,key=lambda x:x[1])

2.解出明文

因为是 Writeup,所以直接分为两部分,第一部分为比赛思路。即如果没有什思路的话,可以找到类似的题目,快速将题目问题转换为相似题目问题,然后参考别人 Writeup 思路,快速解决,不去细究其算法原理(尤其是密码学)。第二部分为详细算法解释。

比赛思路

因为已经找到类似题目解题思路——小记一类ctf密码题解题思路,故只需要复现即可。本题和该解题思路的区别就是加入了盐值,而第一步我们已经将盐值洗去,所以问题已经基本转换过来,可以直接用其 Writeup 中的代码,稍加修改即可完成解出明文过程。详细代码见完整解题代码

详细算法解释

附:完整解题代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
import base64
import binascii
import base64
import string
from itertools import *

def bxor(a, b):     # xor two byte strings of different lengths
    if len(a) > len(b):
        return bytes([x ^ y for x, y in zip(a[:len(b)], b)])
    else:
        return bytes([x ^ y for x, y in zip(a, b[:len(a)])])

def hamming_distance(b1, b2):
    differing_bits = 0
    for byte in bxor(b1, b2):
        differing_bits += bin(byte).count("1")
    return differing_bits

# text = ''
# with open("6.txt","r") as f:
#     for line in f:
#         text += line


# b = base64.b64decode(text)

def break_single_key_xor(text):
    key = 0
    possible_space=0
    max_possible=0
    letters = string.ascii_letters.encode('ascii')
    for a in range(0, len(text)):
        maxpossible = 0
        for b in range(0, len(text)):
            if(a == b):
                continue
            c = text[a] ^ text[b]
            if c not in letters and c != 0:
                continue
            maxpossible += 1
        if maxpossible>max_possible:
            max_possible=maxpossible
            possible_space=a
    key = text[possible_space]^ 0x20
    return chr(key)


miwen='49380d773440222d1b421b3060380c3f403c3844791b202651306721135b6229294a3c3222357e766b2f15561b35305e3c3b670e49382c295c6c170553577d3a2b791470406318315d753f03637f2b614a4f2e1c4f21027e227a4122757b446037786a7b0e37635024246d60136f7802543e4d36265c3e035a725c6322700d626b345d1d6464283a016f35714d434124281b607d315f66212d671428026a4f4f79657e34153f3467097e4e135f187a21767f02125b375563517a3742597b6c394e78742c4a725069606576777c314429264f6e330d7530453f22537f5e3034560d22146831456b1b72725f30676d0d5c71617d48753e26667e2f7a334c731c22630a242c7140457a42324629064441036c7e646208630e745531436b7c51743a36674c4f352a5575407b767a5c747176016c0676386e403a2b42356a727a04662b4446375f36265f3f124b724c6e346544706277641025063420016629225b43432428036f29341a2338627c47650b264c477c653a67043e6766152a485c7f33617264780656537e5468143f305f4537722352303c3d4379043d69797e6f3922527b24536e310d653d4c33696c635474637d0326516f745e610d773340306621105a7361654e3e392970687c2e335f3015677d4b3a724a4659767c2f5b7c16055a126820306c14315d6b59224a27311f747f336f4d5974321a22507b22705a226c6d446a37375761423a2b5c29247163046d7e47032244377508300751727126326f117f7a38670c2b23203d4f27046a5c5e1532601126292f577776606f0c6d0126474b2a73737a41316362146e581d7c1228717664091c'
salt="WeAreDe1taTeam"
si2=cycle(salt)
miwen_after=''
for i in range(int(len(miwen)/2)):
    temp=miwen[i*2:i*2+2]
    # print(binascii.a2b_hex(temp))
    temphex_int=ord(binascii.a2b_hex(temp).decode())
    # p = chr(temphex_int ^ ord(next(si2)) ^ ord(next(ki2)))
    temp_re= hex(temphex_int ^ ord(next(si2)))[2:].zfill(2)
    miwen_after=miwen_after+temp_re



b=binascii.a2b_hex(miwen_after)

normalized_distances = []


for KEYSIZE in range(2, 40):
    #我们取其中前6段计算平局汉明距离
    b1 = b[: KEYSIZE]
    b2 = b[KEYSIZE: KEYSIZE * 2]
    b3 = b[KEYSIZE * 2: KEYSIZE * 3]
    b4 = b[KEYSIZE * 3: KEYSIZE * 4]
    b5 = b[KEYSIZE * 4: KEYSIZE * 5]
    b6 = b[KEYSIZE * 5: KEYSIZE * 6]

    normalized_distance = float(
        hamming_distance(b1, b2) +
        hamming_distance(b2, b3) +
        hamming_distance(b3, b4) +
        hamming_distance(b4, b5) +
        hamming_distance(b5, b6)
    ) / (KEYSIZE * 5)
    normalized_distances.append(
        (KEYSIZE, normalized_distance)
    )
normalized_distances = sorted(normalized_distances,key=lambda x:x[1])


for KEYSIZE,_ in normalized_distances[:5]:
    block_bytes = [[] for _ in range(KEYSIZE)]
    for i, byte in enumerate(b):
        block_bytes[i % KEYSIZE].append(byte)
    keys = ''
    try:
        for bbytes in block_bytes:
            keys += break_single_key_xor(bbytes)
        key = bytearray(keys * len(b), "utf-8")
        plaintext = bxor(b, key)
        print("keysize:", KEYSIZE)
        print("key is:", keys, "n")
        s = bytes.decode(plaintext)
        print(s)
    except Exception:
        continue


本文总阅读量