Nginx

A small problem with nginx alias configuration

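I previously set up a dedicated health-check page. The configurations were as follows:

    location ~* HEALTH_CHECK {
        alias /home/app/HEALTH_CHECK;
        try_files /lbck =410;
    }

    location ~* /HEALTH_CHECK/ {
        alias /home/app/HEALTH_CHECK;
        access_log off;
    }

    location ~* /HEALTH_CHECK/ {
        root /home/app/;
        access_log off;
    }

Later I tried the simplest possible form, and it worked:

    location /HEALTH_CHECK/ {
        alias /home/app/HEALTH_CHECK/;
        access_log off;
    }

I asked a colleague on the development side, and the cause lies mainly in how alias is implemented. In any case, test thoroughly before relying on directives like these.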
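The difference between the variants can be sketched as follows. The nginx documentation states that when alias is used inside a regular-expression location, the regex should contain captures and alias should refer to them; in a prefix location, alias simply replaces the matched prefix. The capture-based block below is only an illustration under that rule, not the configuration used on the server, and a real server block would use one of the two options, not both.

```nginx
# Option 1 (the form that worked): prefix location.
# /HEALTH_CHECK/status is served from /home/app/HEALTH_CHECK/status
location /HEALTH_CHECK/ {
    alias /home/app/HEALTH_CHECK/;
    access_log off;
}

# Option 2 (illustrative): case-insensitive regex location.
# The part of the URI after /HEALTH_CHECK/ is captured and appended
# to the alias path through $1.
location ~* ^/HEALTH_CHECK/(.*)$ {
    alias /home/app/HEALTH_CHECK/$1;
    access_log off;
}
```

With a prefix location, `root /home/app;` would give the same mapping as option 1, because root appends the full URI to the given directory.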

nginx and the DNS TTL

Today I ran into a problem related to DNS TTL. One of our production nginx servers proxies some external resources, turning external HTTP resources into HTTPS so that our own HTTPS pages can use them. Today the error log was full of entries showing that connections to the upstream host had failed. I checked the configuration file, and requesting the configured URL directly from the nginx server worked fine. I then resolved the hostname on the nginx server and found that the IP differed from the one in the error log. Apparently the external DNS had switched to a new IP while nginx kept talking to the old, dead one.

I read the nginx wiki and also asked the tengine developers. The wiki says nginx honors the DNS TTL, but that is not what actually happened, so I ran a simple test.

Test environment:

1. One Linux server with nginx-1.2.8 installed.
2. One Linux server running dnsmasq, with the TTL set and logging enabled. I also installed wireshark on it to capture packets.

The configuration file:

    worker_processes 1;
    error_log logs/error.log;
    events {
        worker_connections 1024;
    }
    http {
        include mime.types;
        default_type application/octet-stream;
        sendfile on;
        keepalive_timeout 65;
        server {
            listen 8888;
            server_name localhost;
            charset utf-8;
        }
    }

nginx made 4 DNS queries at startup, but never queried nginx.test.org again no matter how long I waited, even though wireshark showed that the TTL really was set to 10s:

    Domain Name System (response)
        [Request In: 7]
        [Time: 0.000074000 seconds]
        Transaction ID: 0x925f
        Flags: 0x8580 (Standard query response, No error)
            1... .... .... .... = Response: Message is a response
            .000 0... .... .... = Opcode: Standard query (0)
            .... .1.. .... .... = Authoritative: Server is an authority for domain
            .... ..0. .... .... = Truncated: Message is not truncated
            .... ...1 .... .... = Recursion desired: Do query recursively
            .... .... 1... .... = Recursion available: Server can do recursive queries
            .... .... .0.. .... = Z: reserved (0)
            .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
            .... .... .... 0000 = Reply code: No error (0)
        Questions: 1
        Answer RRs: 1
        Authority RRs: 0
        Additional RRs: 0
        Queries
            nginx.test.org: type A, class IN
                Name: nginx.test.org
                Type: A (Host address)
                Class: IN (0x0001)
        Answers
            nginx.test.org: type A, class IN, addr 220.xx.xx.xx
                Name: nginx.test.org
                Type: A (Host address)
                Class: IN (0x0001)
                Time to live: 10 seconds
                Data length: 4
                Addr: 220.xx.xx.xx

...
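One commonly used way to make nginx look the name up again is sketched below. It relies on two documented behaviors: the resolver directive (whose valid parameter overrides the record's TTL) and the fact that a proxy_pass target containing a variable is resolved at request time rather than once at startup. This is an assumption about a possible fix, not something taken from the test above; the resolver address 192.0.2.53 and the variable name $backend are placeholders.

```nginx
http {
    # Placeholder resolver address; cache answers for at most 10 seconds.
    resolver 192.0.2.53 valid=10s;

    server {
        listen 8888;
        server_name localhost;

        location / {
            # Because the target is a variable, nginx resolves the
            # hostname through the resolver above while handling the
            # request, instead of only when the configuration is loaded.
            set $backend "http://nginx.test.org";
            proxy_pass $backend;
        }
    }
}
```

The trade-off is that a hostname resolved this way is not part of an upstream block, so upstream-level settings do not apply to it.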

A problem after enabling caching on the blog

Last week I enabled caching of PHP output on the blog, mainly to guard against a hypothetical burst of requests that php-fpm could not keep up with. Today, however, I noticed something odd: the blog home page came up blank. Then I remembered that this might be related to the special configuration I have for DNSPod monitoring. Because DNSPod hits the site quite frequently, I had configured nginx to return 200 to it directly, to avoid wasting the machine's resources.

    location / {

        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ /index.php;
        # Uncomment to enable naxsi on this location
        # include /etc/nginx/naxsi.rules
        if ($http_user_agent ~ monitor ) {
            return 200;
            access_log off;
        }
        fastcgi_cache blog;
    }

When I set this up, I only refreshed the page a few times to check that it was being cached. In normal operation, though, most requests come from the DNSPod monitor and get the bare 200. If that response is the one that gets cached, I see nothing when I open the page myself.

One fix is to change the monitored page, for example by pointing the monitor at /favicon.ico. That still leaves a hole: if someone manually sets the User-Agent to monitor and hammers /, the cached home page stays empty. So I came up with another approach: map the User-Agent to a separate variable and add that variable to the cache key.

    map $http_user_agent $agent {
        default 'normal';
        ~monitor 'dnspod';
    }

A request with the monitor User-Agent:

    curl -I -A "monitor" http://blog.gnuers.org
    HTTP/1.1 200 OK
    Server: nginx/1.2.7
    Date: Wed, 03 Apr 2013 12:11:55 GMT
    Content-Type: application/octet-stream
    Content-Length: 0
    Connection: keep-alive

A sample access log line:

    - - ::ffff:101.226.68.137:35145 - - [03/Apr/2013:13:46:19 +0000] blog.gnuers.org "HEAD / HTTP/1.1" 200 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; QQDownload 713; .NET CLR 2.0.50727; InfoPath.2)" normal "-" "unix:/var/run/php5-fpm.sock" "- -" 0.214 0.214

The log format:

    log_format main '$http_orig_client_ip - $remote_addr:$remote_port - $remote_user [$time_local] $host "$request" $status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent - $agent - $hitstatus" "$http_x_forwarded_for" "$upstream_addr" "$ssl_protocol $ssl_cipher" $request_time $upstream_response_time';

...
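A different way to keep monitor traffic out of the cache, shown here only as an alternative sketch and not as the approach taken in this post, is to tell the cache to skip those requests entirely with the fastcgi_cache_bypass and fastcgi_no_cache directives. The variable name $skip_cache is made up for the example.

```nginx
# http context: 1 for the monitor's User-Agent, 0 for everyone else
map $http_user_agent $skip_cache {
    default  0;
    ~monitor 1;
}

# inside the location that has fastcgi_pass and fastcgi_cache:
fastcgi_cache_bypass $skip_cache;  # never answer the monitor from the cache
fastcgi_no_cache     $skip_cache;  # never store a response made for the monitor
```

Compared with adding the variable to the cache key, this keeps a single cache entry per page, at the cost of sending every monitor request to the backend unless it is answered earlier by the return 200 rule.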

Blog cache settings

Dynamic content usually performs poorly, so with the usual nginx + FastCGI setup the bottleneck is almost always the CGI backend. On a small-memory VPS, some settings also make MySQL very slow, which makes the situation even more obvious. The figure above is from a casual load test: QPS is very low, yet php-fpm uses up all the CPU.

So it is worth caching directly at the FastCGI layer. Then, if some post gets a lot of traffic (my blog has no such traffic yet), nginx can serve the cached file directly and performance stops being a problem; nginx on a small VPS copes easily with ordinary traffic. The configuration is as follows:

    map $upstream_addr $hitstatus {
        default 'cache';
        ~unix 'nocache';
    }
    map $http_user_agent $agent {
        default 'normal';
        ~monitor 'dnspod';
    }
    fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=blog:10m inactive=2m max_size=50m;
    server {
        listen [::]:443 ssl so_keepalive=on;
        listen [::]:80 so_keepalive=on;
        root /home/www/blog;
        index index.html index.htm index.php;
        server_name localhost;
        ssl_certificate cert/server.crt;
        ssl_certificate_key cert/server.key;
        ssl_session_timeout 5m;
        ssl_session_cache shared:sslcache:1m;
        ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;

        location / {
            try_files $uri $uri/ /index.html;
            # Uncomment to enable naxsi on this location
            # include /etc/nginx/naxsi.rules
            if ($http_user_agent ~ monitor ) {
                return 200;
                access_log off;
            }
            fastcgi_cache blog;
            fastcgi_cache_valid 200 302 10m;
            fastcgi_cache_valid 404 1m;
            fastcgi_cache_min_uses 2;
            fastcgi_cache_methods GET HEAD;
            fastcgi_cache_key "$scheme$host$agent$request_uri$server_protocol$request_method";
            add_header hit $hitstatus;
            expires modified +1h;
        }

        location ~ (wp-.*.php|xmlrpc.php){
            fastcgi_split_path_info ^(.+.php)(/.+)$;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_index index.php;
            include fastcgi_params;
            fastcgi_intercept_errors on;
            fastcgi_buffers 1024 4k;
            fastcgi_buffer_size 64k;
            fastcgi_busy_buffers_size 128k;
            fastcgi_send_timeout 60;
            fastcgi_read_timeout 60;
            fastcgi_connect_timeout 60;
            add_header hit $hitstatus;
        }

...

Slow page loads caused by limit_req

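For security reasons I had set up a simple limit_req rule, but I made it far too strict: 3 requests per second. Today it suddenly dawned on me that this was probably the culprit behind the blog's earlier slowness. A quick look showed that a single page needs close to 20 requests to load all its elements. So I hurried to change the original setting:

    limit_req_zone $binary_remote_addr zone=gnuers:10m rate=3r/s;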
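A looser limit could look roughly like the sketch below. The zone name gnuers comes from the original directive; the rate and burst numbers are illustrative guesses sized for a page that needs about 20 requests, not the values actually deployed.

```nginx
# http context: same zone, higher steady rate (illustrative value)
limit_req_zone $binary_remote_addr zone=gnuers:10m rate=10r/s;

server {
    location / {
        # Allow a short burst above the steady rate so that one page
        # load with ~20 sub-requests passes; nodelay serves the burst
        # immediately instead of spacing the requests out.
        limit_req zone=gnuers burst=30 nodelay;
    }
}
```

Without nodelay, requests inside the burst are queued and released at the configured rate, which is one way a strict limit shows up as a slow page rather than as errors.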

Using nginx map

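In an earlier post I described how the blog now caches pages, but I also wanted to see directly in the browser whether a response was a cache hit. This is fairly easy to do with map. When a response comes from the cache, no upstream is contacted and $upstream_addr is empty, so the default value applies; when the request goes to php-fpm, $upstream_addr contains the unix socket path. Add the following to the http block:

    map $upstream_addr $hitstatus {
        default 'cache';
        ~unix 'nocache';
    }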
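nginx also exposes the cache result directly in the built-in $upstream_cache_status variable (values such as HIT, MISS, EXPIRED and BYPASS), so a header can be emitted without a map at all. This is an alternative sketch rather than the method used in this post, and the header name X-Cache-Status is an arbitrary choice.

```nginx
# inside the location that uses fastcgi_cache
add_header X-Cache-Status $upstream_cache_status;
```

The header can then be inspected with `curl -I` or in the browser's network panel, in the same way as the hit header produced by the map.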

Serving both HTTP and HTTPS from a single nginx server block

Ever since I got an SSL certificate for the blog, I had kept HTTP and HTTPS in two separate server blocks. The drawback is that changing the configuration is tedious: sometimes I update the HTTP block and forget the HTTPS one. The two can in fact be merged. The method is simple: have one server block listen on both 443 and 80, add the ssl parameter to the listen 443 line, and remove the old ssl on directive (that style is no longer the recommended one anyway; see the nginx [WIKI](http://wiki.nginx.org/HttpCoreModule#listen) for details). The configuration is as follows:

```
server {
    listen [::]:443 ssl so_keepalive=on;
    listen [::]:80 so_keepalive=on;
    root /home/www/blog;
    index index.html index.htm index.php;
    server_name localhost;
    ssl_certificate cert/server.crt;
    ssl_certificate_key cert/server.key;
    ssl_session_timeout 5m;
    ssl_session_cache shared:sslcache:1m;
    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    location / {……}
}
```

When I first set this up, I listened on both 80 and 443 but still had ssl on enabled. That produced the error 400 Bad Request: The plain HTTP request was sent to HTTPS port.

Things to watch out for with custom HTTP headers

HTTP header names may contain letters ([A-Za-z]), digits ([0-9]) and hyphens (-), and they may also contain underscores (_). With nginx, however, you should avoid headers that contain underscores, for two main reasons:

1. By default, nginx cannot reference header variables for headers whose names contain underscores. The only way around this is to set underscores_in_headers on.
2. By default, headers with underscores are ignored. To change that you need to set ignore_invalid_headers off.

nginx itself imposes no such restriction when it sets headers, so you can set a header with underscores directly. It is still best not to.

When chaining several nginx proxies, also take care not to set certain headers more than once. For example, the header that records the client IP should be set only on the outermost nginx; the instances behind it must not set it again, or they will overwrite it. Here is a simple test of how headers are handled across multiple nginx proxies. For convenience, I ran several servers inside a single nginx instance:

    worker_processes 1;
    events {
        worker_connections 1024;
    }
    http {
        include mime.types;
        default_type application/octet-stream;
        log_format main '$http_orig_client_ip - $remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" "$upstream_addr" ';
        sendfile on;
        underscores_in_headers on;
        ignore_invalid_headers off;
        keepalive_timeout 65;
        upstream test2081 {
            server 10.209.128.28:2081;
        }
        upstream test2082 {
            server 10.209.128.28:2082;
        }
        upstream test2083 {
            server 10.209.128.28:80;
        }
        server {
            listen 2080;
            server_name localhost;
            access_log logs/access80.log main;
            location / {
                root html;
                proxy_set_header ORIG_CLIENT_IP $remote_addr;
                proxy_set_header Host $http_host;
                proxy_set_header X-Forwarded-By $server_addr:$server_port;
                proxy_set_header X-Forwarded-For $http_x_forwarded_for;
                proxy_pass http://test2081;
            }
        }
        server {
            listen 2081;
            server_name localhost;
            access_log logs/access81.log main;
            location / {
                root html;
                proxy_set_header Host $http_host;
                proxy_set_header X-Forwarded-By $server_addr:$server_port;
                proxy_set_header X-Forwarded-For $http_x_forwarded_for;
                proxy_pass http://test2082;
            }
        }
        server {
            listen 2082;
            server_name localhost;
            access_log logs/access82.log main;
            location / {
                root html;
                proxy_set_header Host $http_host;
                proxy_set_header X-Forwarded-By $server_addr:$server_port;
                proxy_set_header X-Forwarded-For $http_x_forwarded_for;
                proxy_pass http://test2083;
            }
        }
    }

Server 2080 sets a non-standard HTTP header on the request it receives, and two more servers are chained behind it. After sending a request, the logs look like this:

    ==> logs/access80.log <==
    - - 10.210.208.47 - - [29/Mar/2013:20:18:43 +0800] "GET / HTTP/1.1" 200 52873 "-" "curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5" "-" "10.209.128.28:2081"

...
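For comparison, the same chain can be sketched with a hyphenated header name, which needs neither underscores_in_headers nor ignore_invalid_headers to be changed. The header name X-Orig-Client-IP is a made-up example; the upstream names are the ones from the test configuration above. Using $proxy_add_x_forwarded_for for X-Forwarded-For is also an assumption on my part, since the test passes $http_x_forwarded_for through unchanged.

```nginx
# Outermost proxy: the only place where the client address is recorded.
server {
    listen 2080;
    location / {
        proxy_set_header X-Orig-Client-IP $remote_addr;  # hypothetical header name
        proxy_set_header Host $http_host;
        # Appends $remote_addr to any X-Forwarded-For value already present.
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://test2081;
    }
}

# Inner proxy: does not set X-Orig-Client-IP again, so the value from
# the outermost proxy is forwarded untouched with the other request headers.
server {
    listen 2081;
    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://test2082;
    }
}
```

In log_format the header would then be read as $http_x_orig_client_ip, since nginx lowercases the name and turns hyphens into underscores when building the variable.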

An LVS + nginx load-balancing architecture

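As open-source technology matures and commercial appliances keep getting more expensive, large companies increasingly want to replace the commercial devices they used to rely on with open-source solutions. F5 and A10, once widely used, are now gradually being replaced by LVS. A traditional single LVS box cannot match a commercial appliance in performance, and it tends to be somewhat less stable as well. Last year Taobao open-sourced FULLNAT, its addition to LVS, and described the architecture it uses in detail in public slides. The basic idea is to group multiple LVS machines into an OSPF cluster, so that the cluster can far outperform a single traditional commercial appliance (of course, F5 and the like can also be clustered in this way to scale horizontally). Because LVS cannot do layer-7 operations or SSL offloading, putting nginx or haproxy underneath it yields a global load balancer. The key, though, is to have a matching management platform.

Because the state of the connections to the LVS machines is checked through OSPF, and that check cannot target multiple ports, it is best to use only one port per VIP. Using several ports on one VIP causes problems. For example, when one LVS cannot reach the backend nginx because of a fault on its own network link, that LVS can remove the VIP bound to it so that external users are not affected. But if multiple ports are bound to the VIP, such a policy is hard to get right. Suppose each backend app host runs several unrelated programs (on public clouds many people do exactly this; those used to working at big companies may not understand it, but a small business saves every server it can). If one of those ports goes down on all backend hosts, it is hard to decide whether the LVS in front should remove the VIP. The only option is a special policy: withdraw the VIP only when all ports are down.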

How to make a cluster traffic video

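Last year I dealt with a production problem that was hard to analyze, with a very large number of machines involved. The symptom was a backend avalanche caused by unbalanced load across the proxy servers. Making a traffic video of the cluster and replaying it later turned out to be very helpful in finding the cause.

Here is the idea in brief. Say there are 10 proxy servers with 1000 backend servers behind them, and you want to know the per-second request volume of every backend server over some period. Gather all the proxy logs in one place and use a script to work out each server's request count for every second. With the machine index on the x-axis and the per-machine requests per second on the y-axis, draw one plot for each second in time order and write the plots to a folder (gnuplot is very convenient inside the script; you can, for example, put the time in the title). Then use ffmpeg to combine all the images into a video file; an flv file is convenient for demos. Put the file on an nginx server, write an HTML file in the document root that references the video, and you can present it straight from a web page.

    ffmpeg -f image2 -i %d.png -s 1366x768 xxx.flv

For embedding the video in the HTML page I followed an online reference.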