Linux网页缓存之varnish

[TOC]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
程序的运行具有局部性特征:
时间局部性:一个数据被访问过之后,可能很快会被再次访问
空间局部性:一个数据被访问时,其周边的数据也有可能被访问到

cache:命中

热区:局部性;
时效性:
缓存空间耗尽:LRU
过期:缓存清理

缓存命中率:hit/(hit+miss)
(0,1)
页面命中率:基于页面数量进行衡量
字节命中率:基于页面的体积进行衡量

缓存与否:
私有数据:private,private cache;
公共数据:public, public or private cache;

Cache-related Headers Fields
The most important caching header fields are:

Expires:过期时间;
Expires:Thu, 22 Oct 2026 06:34:30 GMT
Cache-Control

Etag
If-None-Match

Last-Modified
If-Modified-Since

Vary
Age

缓存有效性判断机制:
过期时间:Expires
HTTP/1.0
Expires
HTTP/1.1
Cache-Control: maxage=
Cache-Control: s-maxage=
条件式请求:
Last-Modified/If-Modified-Since
Etag/If-None-Match

Expires:Thu, 13 Aug 2026 02:05:12 GMT
Cache-Control:max-age=315360000
ETag:"1ec5-502264e2ae4c0"
Last-Modified:Wed, 03 Sep 2014 10:00:27 GMT

缓存层级:
私有缓存:用户代理附带的本地缓存机制;
公共缓存:反向代理服务器的缓存功能;

User-Agent <--> private cache <--> public cache <--> public cache 2 <--> Original Server

请求报文用于通知缓存服务如何使用缓存响应请求:
cache-request-directive =
"no-cache",
| "no-store"
| "max-age" "=" delta-seconds
| "max-stale" [ "=" delta-seconds ]
| "min-fresh" "=" delta-seconds
| "no-transform"
| "only-if-cached"
| cache-extension

响应报文用于通知缓存服务器如何存储上级服务器响应的内容:
cache-response-directive =
"public"
| "private" [ "=" <"> 1#field-name <"> ]
| "no-cache" [ "=" <"> 1#field-name <"> ],可缓存,但响应给客户端之前需要revalidation;
| "no-store" ,不允许存储响应内容于缓存中;
| "no-transform"
| "must-revalidate"
| "proxy-revalidate"
| "max-age" "=" delta-seconds
| "s-maxage" "=" delta-seconds
| cache-extension

开源解决方案:
squid:
varnish:

varnish官方站点: http://www.varnish-cache.org/
Community
Enterprise

This is Varnish Cache, a high-performance HTTP accelerator.

程序架构:
Manager进程
Cacher进程,包含多种类型的线程:
accept, worker, expiry, ...
shared memory log:
统计数据:计数器;
日志区域:日志记录;
varnishlog, varnishncsa, varnishstat...

配置接口:VCL
Varnish Configuration Language,
vcl complier --> c complier --> shared object


varnish的程序环境:
/etc/varnish/varnish.params: 配置varnish服务进程的工作特性,例如监听的地址和端口,缓存机制;
/etc/varnish/default.vcl:配置各Child/Cache线程的缓存工作属性;
主程序:
/usr/sbin/varnishd
CLI interface:
/usr/bin/varnishadm
Shared Memory Log交互工具:
/usr/bin/varnishhist
/usr/bin/varnishlog
/usr/bin/varnishncsa
/usr/bin/varnishstat
/usr/bin/varnishtop
测试工具程序:
/usr/bin/varnishtest
VCL配置文件重载程序:
/usr/sbin/varnish_reload_vcl
Systemd Unit File:
/usr/lib/systemd/system/varnish.service
varnish服务
/usr/lib/systemd/system/varnishlog.service
/usr/lib/systemd/system/varnishncsa.service
日志持久的服务;

varnish的缓存存储机制( Storage Types):
-s [name=]type[,options]

・ malloc[,size]
内存存储,[,size]用于定义空间大小;重启后所有缓存项失效;
・ file[,path[,size[,granularity]]]
文件存储,黑盒;重启后所有缓存项失效;
・ persistent,path,size
文件存储,黑盒;重启后所有缓存项有效;实验;

varnish程序的选项:
程序选项:/etc/varnish/varnish.params文件
-a address[:port][,address[:port][...],默认为6081端口;
-T address[:port],默认为6082端口;
-s [name=]type[,options],定义缓存存储机制;
-u user
-g group
-f config:VCL配置文件;
-F:运行于前台;
...
运行时参数:/etc/varnish/varnish.params文件, DEAMON_OPTS
DAEMON_OPTS="-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"

-p param=value:设定运行参数及其值; 可重复使用多次;
-r param[,param...]: 设定指定的参数为只读状态;

重载vcl配置文件:
~ ]# varnish_reload_vcl

varnishadm
-S /etc/varnish/secret -T [ADDRESS:]PORT

help [<command>]
ping [<timestamp>]
auth <response>
quit
banner
status
start
stop
vcl.load <configname> <filename>
vcl.inline <configname> <quoted_VCLstring>
vcl.use <configname>
vcl.discard <configname>
vcl.list
param.show [-l] [<param>]
param.set <param> <value>
panic.show
panic.clear
storage.list
vcl.show [-v] <configname>
backend.list [<backend_expression>]
backend.set_health <backend_expression> <state>
ban <field> <operator> <arg> [&& <field> <oper> <arg>]...
ban.list

配置文件相关:
vcl.list
vcl.load:装载,加载并编译;
vcl.use:激活;
vcl.discard:删除;
vcl.show [-v] <configname>:查看指定的配置文件的详细信息;

运行时参数:
param.show -l:显示列表;
param.show <PARAM>
param.set <PARAM> <VALUE>

缓存存储:
storage.list

后端服务器:
backend.list

VCL:
”域“专有类型的配置语言;

state engine:状态引擎;

VCL有多个状态引擎,状态之间存在相关性,但状态引擎彼此间互相隔离;每个状态引擎可使用return(x)指明关联至哪个下一级引擎;每个状态引擎对应于vcl文件中的一个配置段,即为subroutine

vcl_hash --> return(hit) --> vcl_hit

Client Side:
vcl_recv, vcl_pass, vcl_hit, vcl_miss, vcl_pipe, vcl_purge, vcl_synth, vcl_deliver

vcl_recv:
hash:vcl_hash
pass: vcl_pass
pipe: vcl_pipe
synth: vcl_synth
purge: vcl_hash --> vcl_purge

vcl_hash:
lookup:
hit: vcl_hit
miss: vcl_miss
pass, hit_for_pass: vcl_pass
purge: vcl_purge

Backend Side:
vcl_backend_fetch, vcl_backend_response, vcl_backend_error

两个特殊的引擎:
vcl_init:在处理任何请求之前要执行的vcl代码:主要用于初始化VMODs;
vcl_fini:所有的请求都已经结束,在vcl配置被丢弃时调用;主要用于清理VMODs;

vcl的语法格式:
(1) VCL files start with vcl 4.0;
(2) //, # and /* foo */ for comments;
(3) Subroutines are declared with the sub keyword; 例如sub vcl_recv { ...};
(4) No loops, state-limited variables(受限于引擎的内建变量);
(5) Terminating statements with a keyword for next action as argument of the return() function, i.e.: return(action);用于实现状态引擎转换;
(6) Domain-specific;

The VCL Finite State Machine
(1) Each request is processed separately;
(2) Each request is independent from others at any given time;
(3) States are related, but isolated;
(4) return(action); exits one state and instructs Varnish to proceed to the next state;
(5) Built-in VCL code is always present and appended below your own VCL;

三类主要语法:
sub subroutine {
...
}

if CONDITION {
...
} else {
...
}

return(), hash_data()

VCL Built-in Functions and Keywords
函数:
regsub(str, regex, sub)
regsuball(str, regex, sub)
ban(boolean expression)
hash_data(input)
synthetic(str)

Keywords:
call subroutine, return(action),new,set,unset

操作符:
==, !=, ~, >, >=, <, <=
逻辑操作符:&&, ||, !
变量赋值:=

举例:obj.hits
if (obj.hits>0) {
set resp.http.X-Cache = "HIT via " + server.ip;
} else {
set resp.http.X-Cache = "MISS via " + server.ip;
}


变量类型:
内建变量:
req.*:request,表示由客户端发来的请求报文相关;
req.http.*
req.http.User-Agent, req.http.Referer, ...
bereq.*:由varnish发往BE主机的httpd请求相关;
bereq.http.*
beresp.*:由BE主机响应给varnish的响应报文相关;
beresp.http.*
resp.*:由varnish响应给client相关;
obj.*:存储在缓存空间中的缓存对象的属性;只读;

常用变量:
bereq.*, req.*:
bereq.http.HEADERS
bereq.request:请求方法;
bereq.url:请求的url;
bereq.proto:请求的协议版本;
bereq.backend:指明要调用的后端主机;

req.http.Cookie:客户端的请求报文中Cookie首部的值;
req.http.User-Agent ~ "chrome"


beresp.*, resp.*:
beresp.http.HEADERS
beresp.status:响应的状态码;
reresp.proto:协议版本;
beresp.backend.name:BE主机的主机名;
beresp.ttl:BE主机响应的内容的余下的可缓存时长;

obj.*
obj.hits:此对象从缓存中命中的次数;
obj.ttl:对象的ttl值

server.*
server.ip
server.hostname
client.*
client.ip

用户自定义:
set
unset

示例1:强制对某类资源的请求不检查缓存:
vcl_recv {
if (req.url ~ "(?i)^/(login|admin)") {
return(pass);
}
}

示例2:对于特定类型的资源,例如公开的图片等,取消其私有标识,并强行设定其可以由varnish缓存的时长;
if (beresp.http.cache-control !~ "s-maxage") {
if (bereq.url ~ "(?i)\.(jpg|jpeg|png|gif|css|js)$") {
unset beresp.http.Set-Cookie;
set beresp.ttl = 3600s;
}
}

缓存对象的修剪:purge, ban
(1) 能执行purge操作
sub vcl_purge {
return (synth(200,"Purged"));
}

(2) 何时执行purge操作
sub vcl_recv {
if (req.method == "PURGE") {
return(purge);
}
...
}

添加此类请求的访问控制法则:
acl purgers {
"127.0.0.0"/8;
"10.1.0.0"/16;
}

sub vcl_recv {
if (req.method == "PURGE") {
if (!client.ip ~ purgers) {
return(synth(405,"Purging not allowed for " + client.ip));
}
return(purge);
}
...
}

如何设定使用多个后端主机:
backend default {
.host = "172.16.100.6";
.port = "80";
}

backend appsrv {
.host = "172.16.100.7";
.port = "80";
}

sub vcl_recv {
if (req.url ~ "(?i)\.php$") {
set req.backend_hint = appsrv;
} else {
set req.backend_hint = default;
}

...
}

Director:
varnish module;
使用前需要导入:
import director;

示例:
import directors; # load the directors

backend server1 {
.host =
.port =
}
backend server2 {
.host =
.port =
}

sub vcl_init {
new GROUP_NAME = directors.round_robin();
GROUP_NAME.add_backend(server1);
GROUP_NAME.add_backend(server2);
}

sub vcl_recv {
# send all traffic to the bar director:
set req.backend_hint = GROUP_NAME.backend();
}

BE Health Check:
backend BE_NAME {
.host =
.port =
.probe = {
.url=
.timeout=
.interval=
.window=
.threshhold=
}
}

.probe:定义健康状态检测方法;
.url:检测时请求的URL,默认为”/";
.request:发出的具体请求;
.request =
"GET /.healthtest.html HTTP/1.1"
"Host: www.magedu.com"
"Connection: close"
.window:基于最近的多少次检查来判断其健康状态;
.threshhold:最近.window中定义的这么次检查中至有.threshhold定义的次数是成功的;
.interval:检测频度;
.timeout:超时时长;
.expected_response:期望的响应码,默认为200;

健康状态检测的配置方式:
(1) probe PB_NAME = { }
backend NAME = {
.probe = PB_NAME;
...
}

(2) backend NAME {
.probe = {
...
}
}

示例:
probe check {
.url = "/.healthcheck.html";
.window = 5;
.threshold = 4;
.interval = 2s;
.timeout = 1s;
}

backend default {
.host = "10.1.0.68";
.port = "80";
.probe = check;
}

backend appsrv {
.host = "10.1.0.69";
.port = "80";
.probe = check;
}


varnish的运行时参数:
线程模型:
cache-worker
cache-main
ban lurker
acceptor:
epoll/kqueue:
...

线程相关的参数:
在线程池内部,其每一个请求由一个线程来处理; 其worker线程的最大数决定了varnish的并发响应能力;

thread_pools:Number of worker thread pools. 最好小于或等于CPU核心数量;
thread_pool_max:The maximum number of worker threads in each pool. 每线程池的最大线程数;
thread_pool_min:The minimum number of worker threads in each pool. 额外意义为“最大空闲线程数”;

最大并发连接数=thread_pools * thread_pool_max

thread_pool_timeout:Thread idle threshold. Threads in excess of thread_pool_min, which have been idle for at least this long, will be destroyed.
thread_pool_add_delay:Wait at least this long after creating a thread.
thread_pool_destroy_delay:Wait this long after destroying a thread.

设置方式:
vcl.param
param.set

永久有效的方法:
varnish.params
DEAMON_OPTS="-p PARAM1=VALUE -p PARAM2=VALUE"

varnish日志区域:
shared memory log
计数器
日志信息

1、varnishstat - Varnish Cache statistics
-1
-1 -f FILED_NAME
-l:可用于-f选项指定的字段名称列表;

MAIN.cache_hit
MAIN.cache_miss

# varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss

2、varnishtop - Varnish log entry ranking
-1 Instead of a continously updated display, print the statistics once and exit.
-i taglist,可以同时使用多个-i选项,也可以一个选项跟上多个标签;
-I <[taglist:]regex>
-x taglist:排除列表
-X <[taglist:]regex>

3、varnishlog - Display Varnish logs

4、 varnishncsa - Display Varnish logs in Apache / NCSA combined log format

内建函数:
hash_data():指明哈希计算的数据;减少差异,以提升命中率;
regsub(str,regex,sub):把str中被regex第一次匹配到字符串替换为sub;主要用于URL Rewrite
regsuball(str,regex,sub):把str中被regex每一次匹配到字符串均替换为sub;
return():
ban(expression)
ban_url(regex):Bans所有的其URL可以被此处的regex匹配到的缓存对象;
synth(status,"STRING"):purge操作;


总结:
varnish: state engine, vcl
varnish 4.0:
vcl_init
vcl_recv
vcl_hash
vcl_hit
vcl_pass
vcl_miss
vcl_pipe
vcl_waiting
vcl_purge
vcl_purge
vcl_deliver
vcl_synth
vcl_fini

vcl_backend_fetch
vcl_backend_response
vcl_backend_error

sub VCL_STATE_ENGINE
backend BE_NAME {}
probe PB_NAME {}
acl ACL_NAME {}

博客作业:以上所有内容;
课外实践:(1) zabbix监控varnish业务指标;
(2) ansible实现varnish快速部署;
(3) 两个lamp部署wordpress,用Nginx反代,做压测;nginx后部署varnish缓存,调整vcl,多次压测;

ab, http_load, webbench, seige, jmeter, loadrunner,...