Making a Simple HTML Minifier with Python

Introduction

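Summoning every ounce of strength and all the essence of heaven and earth, I rolled up my sleeves to play around with Python myself.
As the old saying goes: "Though there be fine food, if you do not eat it, you will not know its taste; though there be the perfect Way, if you do not study it, you will not know its worth."
This minifier is very crude and nothing special. It is extremely simple, and the output is only slightly smaller than the input.
That is its weakness; its strength is that it won't break anything!!!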

Process

Approach

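All it really does is strip line breaks and, where possible, spaces. Spaces may only be removed when two of them appear together; removing single spaces would break the tags. Because the approach is so crude, JavaScript and CSS are left unminified.
Load the program → walk the files in the directory → minify them one by one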
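To see why spaces are only removed in pairs, the sketch below compares the two replacements on one line of HTML. The sample string is made up for illustration; the only operations used are the `str.replace` calls the article itself relies on.

```python
line = '<div class="post">\n    <p>Hello world</p>\n'

# Removing every single space glues the tag name to its attribute
single = line.replace("\n", "").replace(" ", "")

# Removing only pairs of spaces strips the indentation but keeps "div class"
double = line.replace("\n", "").replace("  ", "")

print(single)  # <divclass="post"><p>Helloworld</p>
print(double)  # <div class="post"><p>Hello world</p>
```

The four-space indentation is two pairs, so it disappears entirely, while the single space in `div class` survives.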

Implementation

I'm fairly lazy, so I simply borrowed a function from the internet:

```python
import os

def getFiles(dir, suffix):  # directory to search, file suffix
    res = []
    for root, directory, files in os.walk(dir):  # => current root, its subdirectories, the files in it
        for filename in files:
            name, suf = os.path.splitext(filename)  # => file name, file suffix
            if suf == suffix:
                res.append(os.path.join(root, filename))  # => join the parts into a path
    return res

for file in getFiles("./", '.py'):  # => find files ending in .py
    print(file)
```
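For comparison, here is a sketch of the same search written with the standard library's `pathlib` instead of `os.walk`. This is a swapped-in alternative, not the borrowed function, and `get_files` is a hypothetical name. `Path.rglob("*")` walks the tree recursively, and `Path.suffix` gives the file extension, as `os.path.splitext` does for ordinary file names.

```python
from pathlib import Path

def get_files(directory, suffix):
    # rglob("*") visits every entry below `directory`, recursively like os.walk;
    # keep only regular files whose extension matches `suffix`
    return [str(p) for p in Path(directory).rglob("*")
            if p.is_file() and p.suffix == suffix]

for file in get_files("./", ".py"):
    print(file)
```

It returns plain strings, so the rest of the script can use the result exactly as it uses `getFiles`.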

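I removed the comments, changed the search folder and suffix, used `replace` to make the path usable, and added a check for whether a given path should be processed. The modified version looks like this: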

Modified function

```python
import os

def getFiles(dir, suffix):
    res = []
    for root, directory, files in os.walk(dir):
        for filename in files:
            name, suf = os.path.splitext(filename)
            if suf == suffix:
                res.append(os.path.join(root, filename))
    return res

for file in getFiles(r"folder containing the HTML files", '.html'):
    if file == r"D:\blog\public\404.html":
        continue  # leave the 404 page untouched

    # Double every backslash (open() would also accept `file` as-is)
    path = file.replace("\\", "\\\\")
    print(path)
```
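The check above compares against a hard-coded absolute path, so it only matches on the author's machine. A sketch of a more portable variant compares just the file name against a set of exclusions; `EXCLUDE` and `should_skip` are hypothetical names introduced here. Note that the path produced by `os.path.join` can be passed straight to `open()`, so no backslash doubling is needed.

```python
import os

# Hypothetical set of file names that should never be minified
EXCLUDE = {"404.html"}

def should_skip(file):
    # Compare only the last path component, so the check works
    # no matter where the site folder lives
    return os.path.basename(file) in EXCLUDE

print(should_skip(os.path.join("blog", "public", "404.html")))    # True
print(should_skip(os.path.join("blog", "public", "index.html")))  # False
```

The trade-off is that a `404.html` in any subfolder is skipped too, which is usually what a static blog wants.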

Defining the processing function

```python
def delete(string):
    # Remove line breaks, and remove spaces only where two occur together
    res = string.replace("\n", "").replace("  ", "")
    return res
```
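One side effect is worth knowing: deleting pairs of spaces outright can merge two words that were separated by exactly two spaces. A commonly used alternative, swapped in here rather than taken from the article, collapses every run of whitespace into a single space with `re.sub`. The sample line and the names `delete_pairs` and `collapse` are made up for illustration.

```python
import re

def delete_pairs(string):
    # The article's rule: drop newlines and double spaces
    return string.replace("\n", "").replace("  ", "")

def collapse(string):
    # Alternative: turn each whitespace run (spaces, tabs, newlines) into one space
    return re.sub(r"\s+", " ", string)

line = "<p>Hello  world</p>\n"
print(repr(delete_pairs(line)))  # '<p>Helloworld</p>'
print(repr(collapse(line)))      # '<p>Hello world</p> '
```

Collapsing saves slightly fewer characters, because one space is kept wherever the original had whitespace, but it never glues two words or a tag name and an attribute together.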

Processing each file

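```python
# Body of the per-file loop: `file` and `path` come from the loop above
long = 0   # character count before minifying
short = 0  # character count after minifying

text_list = []
with open(path, "r", encoding="UTF-8") as f:

    Not_Change = False  # True while inside a multi-line <script> or <style> block
    for each in f:
        long += len(each)

        # <script> or <style> opened and closed on one line: keep it as-is
        if "<script" in each and "</script>" in each:
            text_list.append(each)
            continue

        if "<style" in each and "</style>" in each:
            text_list.append(each)
            continue

        # A block starts: keep the opening line and stop minifying
        if "<script" in each:
            Not_Change = True
            text_list.append(each)
            continue

        # A block ends: minify the closing line and resume minifying
        if "</script>" in each:
            Not_Change = False
            text_list.append(delete(each))
            continue

        if "<style" in each:
            Not_Change = True
            text_list.append(each)
            continue

        if "</style>" in each:
            Not_Change = False
            text_list.append(delete(each))
            continue

        # Ordinary line: untouched inside a block, minified outside
        if Not_Change:
            text_list.append(each)
        else:
            text_list.append(delete(each))

# Overwrite the original file with the processed lines
with open(path, "w", encoding="UTF-8") as f:
    for each in text_list:
        short += len(each)
        f.write(each)

print(file + " minified! Saved " + str(long - short) + " characters!")
```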
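To check this logic without touching real files, the loop body can be pulled out into a pure function over a list of lines. The sketch below is a condensed restatement, not the author's code: `minify_lines` is a hypothetical name, and it merges the `<script>` and `<style>` checks, so it can behave differently on unusual lines that mix the two tags.

```python
def delete(string):
    # Same rule as the article: drop newlines and pairs of spaces
    return string.replace("\n", "").replace("  ", "")

def minify_lines(lines):
    out = []
    keep = False  # True while inside a multi-line <script> or <style> block
    for line in lines:
        opens = "<script" in line or "<style" in line
        closes = "</script>" in line or "</style>" in line
        if opens and closes:      # whole block on one line: leave it alone
            out.append(line)
        elif opens:               # block starts: keep this line as-is
            keep = True
            out.append(line)
        elif closes:              # block ends: minify the closing line
            keep = False
            out.append(delete(line))
        elif keep:                # inside the block: JS/CSS stays untouched
            out.append(line)
        else:                     # ordinary HTML line
            out.append(delete(line))
    return out

lines = ["<div>\n", "  <p>hi</p>\n", "<script>\n",
         "  var a  = 1;\n", "</script>\n", "</div>\n"]
result = "".join(minify_lines(lines))
saved = sum(map(len, lines)) - len(result)
print(repr(result))
print(saved)
```

The script body keeps its newlines and double spaces, while the surrounding HTML loses them; `saved` is computed the same way as `long - short` in the article.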

END

I really racked my brains over this one, hahaha! As minifiers go, it's not bad!
Finally, here is the complete code:

```python
import os

total = 0  # characters saved across all files

def getFiles(dir, suffix):
    res = []
    for root, directory, files in os.walk(dir):
        for filename in files:
            name, suf = os.path.splitext(filename)
            if suf == suffix:
                res.append(os.path.join(root, filename))
    return res

def delete(string):
    # Remove line breaks, and remove spaces only where two occur together
    res = string.replace("\n", "").replace("  ", "")
    return res

for file in getFiles(r"D:\blog\public", '.html'):
    if file == r"D:\blog\public\404.html":
        continue  # leave the 404 page untouched

    path = file.replace("\\", "\\\\")

    long = 0   # character count before minifying
    short = 0  # character count after minifying

    text_list = []
    with open(path, "r", encoding="UTF-8") as f:

        Not_Change = False  # True while inside a multi-line <script> or <style> block
        for each in f:
            long += len(each)

            # <script> or <style> opened and closed on one line: keep it as-is
            if "<script" in each and "</script>" in each:
                text_list.append(each)
                continue

            if "<style" in each and "</style>" in each:
                text_list.append(each)
                continue

            if "<script" in each:
                Not_Change = True
                text_list.append(each)
                continue

            if "</script>" in each:
                Not_Change = False
                text_list.append(delete(each))
                continue

            if "<style" in each:
                Not_Change = True
                text_list.append(each)
                continue

            if "</style>" in each:
                Not_Change = False
                text_list.append(delete(each))
                continue

            # Ordinary line: untouched inside a block, minified outside
            if Not_Change:
                text_list.append(each)
            else:
                text_list.append(delete(each))

    # Overwrite the original file with the processed lines
    with open(path, "w", encoding="UTF-8") as f:
        for each in text_list:
            short += len(each)
            f.write(each)

    print(file + " minified! Saved " + str(long - short) + " characters!")
    total += long - short


print("This run saved %s characters in total" % total)
input("Press Enter to exit!")
```

BUG

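In testing, this minifier readily strips the indentation inside code boxes along with everything else. And indentation is the soul of Python…
Fix: add code-box detection, or give up on removing spaces. To give up on removing spaces, simply delete the second `.replace(...)` call (the one that removes spaces) from the `delete` function.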
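A minimal sketch of the first fix, code-box detection, assuming the code boxes are rendered as `<pre>` blocks (check your theme's generated HTML). It treats `<pre` and `</pre>` the way the script treats `<script` and `<style>`, and it also copies the closing line unchanged so the last line of code keeps its indentation. `PROTECT_OPEN`, `PROTECT_CLOSE`, and `minify_lines` are hypothetical names.

```python
# Hypothetical lists of tags whose contents must not be touched
PROTECT_OPEN = ("<script", "<style", "<pre")
PROTECT_CLOSE = ("</script>", "</style>", "</pre>")

def delete(string):
    # Drop newlines and pairs of spaces, as in the article
    return string.replace("\n", "").replace("  ", "")

def minify_lines(lines):
    out = []
    keep = False  # True while inside a protected block
    for line in lines:
        opens = any(tag in line for tag in PROTECT_OPEN)
        closes = any(tag in line for tag in PROTECT_CLOSE)
        if keep or opens:
            out.append(line)   # protected: copy unchanged, indentation included
            keep = not closes  # stay protected until a closing tag appears
        else:
            out.append(delete(line))
    return out

lines = ["<div>\n", "<pre><code>def f():\n", "    return 1\n",
         "</code></pre>\n", "  <p>x</p>\n"]
print(repr("".join(minify_lines(lines))))
```

Everything between `<pre>` and `</pre>` survives byte for byte, so Python indentation inside code boxes is preserved while the rest of the page is still minified.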

