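Today I ran into a problem related to DNS TTL. One of our production nginx servers proxies some external resources, turning external HTTP resources into HTTPS ones that our own HTTPS pages can use. Today I saw a large number of error log entries saying that connections to the upstream machines were failing. I checked the configuration file, and accessing the configured URL directly from the nginx server worked fine. I then resolved the hostname on the nginx server and found that the IP differed from the one in the error log. Apparently the external DNS had switched to a new IP, while nginx kept connecting to the old, dead one.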
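To make this failure mode concrete, here is a toy Python model (not nginx's actual code) that contrasts a client resolving a hostname once at startup with one that re-resolves after the TTL expires. As far as I know, nginx-1.2.8 resolves a hostname written literally in proxy_pass once, when the configuration is loaded, while its resolver directive is used for names only known at request time, such as a proxy_pass built from variables. The class names `ResolveOnce` and `TtlCache`, the `simulate` helper, the simulated clock and the 192.0.2.x documentation addresses are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup function returns (ip, ttl_seconds) for a hostname.
Lookup = Callable[[str], Tuple[str, int]]


class ResolveOnce:
    """Resolves the name once at construction and reuses that IP forever."""

    def __init__(self, name: str, lookup: Lookup) -> None:
        self.ip, _ttl = lookup(name)

    def get(self, now: float) -> str:
        return self.ip  # never re-resolves, regardless of the TTL


class TtlCache:
    """Caches one answer and re-resolves once its TTL has expired."""

    def __init__(self, name: str, lookup: Lookup) -> None:
        self.name = name
        self.lookup = lookup
        self.ip: Optional[str] = None
        self.expires = float("-inf")  # forces a lookup on first use

    def get(self, now: float) -> str:
        if now >= self.expires:
            self.ip, ttl = self.lookup(self.name)
            self.expires = now + ttl
        return self.ip


def simulate() -> List[Tuple[str, str]]:
    """Return (resolve_once_ip, ttl_cache_ip) at t=0, 5 and 11 seconds."""
    # Hypothetical DNS data: 192.0.2.x are documentation addresses, TTL 10 s.
    table: Dict[str, Tuple[str, int]] = {"nginx.test.org": ("192.0.2.1", 10)}
    lookup: Lookup = lambda name: table[name]

    static = ResolveOnce("nginx.test.org", lookup)
    cached = TtlCache("nginx.test.org", lookup)
    results = [(static.get(0), cached.get(0))]

    table["nginx.test.org"] = ("192.0.2.2", 10)  # the external DNS switches IP
    results.append((static.get(5), cached.get(5)))    # TTL not expired yet
    results.append((static.get(11), cached.get(11)))  # TTL expired: re-resolve
    return results


if __name__ == "__main__":
    for t, row in zip((0, 5, 11), simulate()):
        print(t, row)
```

Only the TTL-aware cache ever picks up 192.0.2.2, and only once its 10-second TTL has run out; the resolve-once client keeps returning the dead address, which matches what the error logs showed.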

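I read the nginx wiki and also asked a colleague on the Tengine development team. The nginx wiki says nginx honors the DNS TTL, but that is not what actually happened. So I ran a simple test. Test environment:

1. One Linux server with nginx-1.2.8 installed.
2. One Linux server running dnsmasq, with the TTL configured and logging enabled; Wireshark is also installed on it for packet capture.

The nginx configuration is as follows:

    worker_processes  1;
    error_log  logs/error.log;
    events {
        worker_connections  1024;
    }
    http {
        include       mime.types;
        default_type  application/octet-stream;
        sendfile      on;
        keepalive_timeout  65;
        server {
            listen       8888;
            server_name  localhost;
            charset      utf-8;

        }
    }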

At startup nginx made 4 DNS queries, but after that it never queried nginx.test.org again, no matter how long I waited, even though Wireshark showed that the TTL was indeed set to 10 s:

    Domain Name System (response)
        [Request In: 7]
        [Time: 0.000074000 seconds]
        Transaction ID: 0x925f
        Flags: 0x8580 (Standard query response, No error)
            1... .... .... .... = Response: Message is a response
            .000 0... .... .... = Opcode: Standard query (0)
            .... .1.. .... .... = Authoritative: Server is an authority for domain
            .... ..0. .... .... = Truncated: Message is not truncated
            .... ...1 .... .... = Recursion desired: Do query recursively
            .... .... 1... .... = Recursion available: Server can do recursive queries
            .... .... .0.. .... = Z: reserved (0)
            .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
            .... .... .... 0000 = Reply code: No error (0)
        Questions: 1
        Answer RRs: 1
        Authority RRs: 0
        Additional RRs: 0
        Queries
            nginx.test.org: type A, class IN
                Name: nginx.test.org
                Type: A (Host address)
                Class: IN (0x0001)
        Answers
            nginx.test.org: type A, class IN, addr 220.xx.xx.xx
                Name: nginx.test.org
                Type: A (Host address)
                Class: IN (0x0001)
                Time to live: 10 seconds
                Data length: 4
                Addr: 220.xx.xx.xx
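The TTL Wireshark reports is a 32-bit field inside each answer record (RFC 1035). As an illustration of where it sits on the wire, the sketch below builds a response with the same transaction ID, flags, counts and question as the capture, then parses it back with Python's `struct` module and decodes the flag bits shown above. The answer address 192.0.2.1 is a hypothetical stand-in for the elided 220.xx.xx.xx, and `build_response` / `parse_response` are illustrative names, not part of any library.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_response() -> bytes:
    # Header from the capture: ID 0x925f, flags 0x8580, QD=1, AN=1, NS=0, AR=0.
    header = struct.pack("!6H", 0x925F, 0x8580, 1, 1, 0, 0)
    question = encode_name("nginx.test.org") + struct.pack("!2H", 1, 1)  # A, IN
    # Answer: name pointer 0xC00C (offset 12), type A, class IN, TTL 10, RDLENGTH 4.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 10, 4)
    answer += socket.inet_aton("192.0.2.1")  # hypothetical placeholder address
    return header + question + answer


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return off + 2
        off += 1 + length


def parse_response(msg: bytes) -> dict:
    txid, flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!6H", msg, 0)
    off = 12
    for _ in range(qdcount):
        off = skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    answers = []
    for _ in range(ancount):
        off = skip_name(msg, off)
        rtype, _rclass, ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        off += 10
        rdata = msg[off:off + rdlen]
        off += rdlen
        if rtype == 1 and rdlen == 4:  # A record
            answers.append((socket.inet_ntoa(rdata), ttl))
    return {
        "id": txid,
        "qr": (flags >> 15) & 1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,
        "tc": (flags >> 9) & 1,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "rcode": flags & 0xF,
        "answers": answers,  # list of (ip, ttl_seconds)
    }


if __name__ == "__main__":
    print(parse_response(build_response()))
```

The decoded bits (QR=1, AA=1, RD=1, RA=1, RCODE=0) line up with Wireshark's breakdown of 0x8580, and the answer carries the 10-second TTL that nginx ignored in this test.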


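My own tests showed that nginx only re-resolves the name periodically, instead of sticking with the address it got at startup, once a resolver directive is added: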

    resolver 220.xx.xx.xx valid=15s;
    ...
    server {
        listen       9002;
        server_name  localhost;
        charset      utf-8;
        location / {
            proxy_pass http://$http_host:8888;
        }
    }

    Apr  8 14:22:45 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
    Apr  8 14:22:45 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
    Apr  8 14:23:01 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
    Apr  8 14:23:01 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
    Apr  8 14:23:17 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
    Apr  8 14:23:17 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
    Apr  8 14:23:33 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
    Apr  8 14:23:33 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx

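Only with this change did I get the behavior shown above: nginx re-queries the name roughly every 15 s. Note that the interval follows valid=15s rather than the 10 s TTL in the DNS response, because the valid parameter overrides the TTL.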
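The re-query interval can be checked directly against the dnsmasq log above. The Python sketch below pulls out the `query[A]` lines for nginx.test.org and computes the gaps between them; `cache_lifetime` encodes the documented resolver rule that `valid=` overrides the response TTL. The function names, the regular expression and the placeholder year (syslog lines carry none) are assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import List, Optional

# The dnsmasq log lines from the test above (addresses elided as in the original).
LOG = """\
Apr  8 14:22:45 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
Apr  8 14:22:45 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
Apr  8 14:23:01 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
Apr  8 14:23:01 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
Apr  8 14:23:17 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
Apr  8 14:23:17 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
Apr  8 14:23:33 dnsmasq[6970]: query[A] nginx.test.org from 220.xx.xx.xx
Apr  8 14:23:33 dnsmasq[6970]: /home/admin/dnsmasq/dnsmasq.hosts nginx.test.org is 220.xx.xx.xx
"""

# Matches "<Mon> <day> <HH:MM:SS> dnsmasq[<pid>]: query[A] <name> from ..."
QUERY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"dnsmasq\[\d+\]: query\[A\] (?P<name>\S+) from "
)

# Syslog timestamps have no year; this placeholder only makes parsing unambiguous.
PLACEHOLDER_YEAR = 2000


def query_intervals(log: str, name: str) -> List[int]:
    """Seconds between consecutive A queries for `name` in a dnsmasq log."""
    times = []
    for line in log.splitlines():
        m = QUERY_RE.match(line)
        if m and m.group("name") == name:
            ts = " ".join(m.group("ts").split())  # "Apr  8 ..." -> "Apr 8 ..."
            times.append(
                datetime.strptime(f"{PLACEHOLDER_YEAR} {ts}", "%Y %b %d %H:%M:%S")
            )
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def cache_lifetime(ttl: int, valid: Optional[int] = None) -> int:
    """How long the resolver keeps an answer: `valid=` overrides the TTL if set."""
    return valid if valid is not None else ttl


if __name__ == "__main__":
    print("gaps between queries:", query_intervals(LOG, "nginx.test.org"))
    print("expected cache lifetime:", cache_lifetime(ttl=10, valid=15))
```

On the four queries above this gives gaps of 16 s each: close to valid=15s and clearly not the 10 s TTL. The extra second is plausibly because a cached answer is reused until a request arrives strictly after it has expired, though this is a guess rather than something checked against the nginx source.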