twemproxy recommendation (translation)

Original: https://github.com/twitter/twemproxy/blob/master/notes/recommendation.md

Translation: http://www.tianjiaguo.com/system-architecture/loadbalance/twemproxy-recommendation-document/

 
If you are deploying nutcracker in a production environment, here are a few recommendations worth considering.

Log Level

 
By default, debug logging is disabled in nutcracker. It is nevertheless worthwhile to run nutcracker with debug logging enabled and the verbosity level set to LOG_INFO (-v 6 or --verbosity=6). This adds little overhead in practice, because the only cost is evaluating one if condition for every log statement encountered at run time.

 
At the LOG_INFO level, nutcracker logs the life cycle of every client and server connection, along with important events such as a server being ejected from the hash ring. For example:

 
    [Thu Aug  2 00:03:09 2012] nc_proxy.c:336 accepted c 7 on p 6 from '127.0.0.1:54009'
    [Thu Aug  2 00:03:09 2012] nc_server.c:528 connected on s 8 to server '127.0.0.1:11211:1'
    [Thu Aug  2 00:03:09 2012] nc_core.c:270 req 1 on s 8 timedout
    [Thu Aug  2 00:03:09 2012] nc_core.c:207 close s 8 '127.0.0.1:11211' on event 0004 eof 0 done 0 rb 0 sb 20: Connection timed out
    [Thu Aug  2 00:03:09 2012] nc_server.c:406 close s 8 schedule error for req 1 len 20 type 5 from c 7: Connection timed out
    [Thu Aug  2 00:03:09 2012] nc_server.c:281 update pool 0 'alpha' to delete server '127.0.0.1:11211:1' for next 2 secs
    [Thu Aug  2 00:03:10 2012] nc_connection.c:314 recv on sd 7 eof rb 20 sb 35
    [Thu Aug  2 00:03:10 2012] nc_request.c:334 c 7 is done
    [Thu Aug  2 00:03:10 2012] nc_core.c:207 close c 7 '127.0.0.1:54009' on event 0001 eof 1 done 1 rb 20 sb 35
    [Thu Aug  2 00:03:11 2012] nc_proxy.c:336 accepted c 7 on p 6 from '127.0.0.1:54011'
    [Thu Aug  2 00:03:11 2012] nc_server.c:528 connected on s 8 to server '127.0.0.1:11212:1'
    [Thu Aug  2 00:03:12 2012] nc_connection.c:314 recv on sd 7 eof rb 20 sb 8
    [Thu Aug  2 00:03:12 2012] nc_request.c:334 c 7 is done
    [Thu Aug  2 00:03:12 2012] nc_core.c:207 close c 7 '127.0.0.1:54011' on event 0001 eof 1 done 1 rb 20 sb 8
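
One way to put these log lines to use is to scan them for server ejection events. The following is a minimal sketch, assuming the log layout matches the excerpt above; the regular expression and the function name parse_ejection are illustrative and not part of nutcracker, and the exact wording of the message may differ between versions.

```python
import re

# Pattern modeled on the "update pool ... to delete server ..." line in the
# excerpt above (an assumption about the log format, not a documented contract).
EJECT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \S+ update pool (?P<idx>\d+) '(?P<pool>[^']+)' "
    r"to delete server '(?P<server>[^']+)' for next (?P<secs>\d+) secs"
)

def parse_ejection(line):
    """Return a dict describing a server ejection, or None for any other line."""
    m = EJECT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": m.group("ts"),
        "pool": m.group("pool"),
        "server": m.group("server"),
        "retry_in_secs": int(m.group("secs")),
    }

sample = ("[Thu Aug  2 00:03:09 2012] nc_server.c:281 update pool 0 'alpha' "
          "to delete server '127.0.0.1:11211:1' for next 2 secs")
event = parse_ejection(sample)
print(event)
```

A monitoring script could feed each log line through such a function and raise an alert when the same server is ejected repeatedly.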

 
To enable debug logging, you have to compile nutcracker with logging enabled, using the --enable-debug=log configure option.

Liveness

 
Failures are a fact of life, especially in distributed systems. To be resilient against failures, it is recommended that you configure the following keys for every server pool. For example:

 
    resilient_pool:
      auto_eject_hosts: true
      server_retry_timeout: 30000
      server_failure_limit: 3
       
Enabling auto_eject_hosts: ensures that a dead server is ejected from the hash ring after server_failure_limit: consecutive failures have been encountered on that server. A non-zero server_retry_timeout: ensures that a server is not marked dead forever, especially when the failures were only transient. Together, server_retry_timeout: and server_failure_limit: control the tradeoff between resiliency to permanent failures and resiliency to transient failures.

 
To ensure that requests always succeed in the face of server ejections (that is, when auto_eject_hosts: is enabled), some form of retry must be implemented at the client layer, since nutcracker itself does not retry a request. The client-side retry count must be greater than the server_failure_limit: value, which ensures that the original request has a chance to reach a live server.
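
The retry rule above can be sketched on the client side as follows. This is only an illustration under stated assumptions: TransientError, call_with_retry and flaky_send are hypothetical names, and a real client would wrap its own memcached or redis library call instead of the fake send function used here.

```python
class TransientError(Exception):
    """Raised by the (hypothetical) client when the proxy reports a failure."""

def call_with_retry(send, server_failure_limit=3):
    """Call send() until it succeeds, using a retry count that is greater
    than server_failure_limit so a later attempt can reach a live server."""
    retries = server_failure_limit + 1        # retry count > server_failure_limit
    last_error = None
    for _ in range(1 + retries):              # the first try plus the retries
        try:
            return send()
        except TransientError as exc:
            last_error = exc
    raise last_error

# Fake backend: fails three times (as if hitting a dead server), then succeeds.
calls = {"n": 0}

def flaky_send():
    calls["n"] += 1
    if calls["n"] <= 3:
        raise TransientError("SERVER_ERROR Connection timed out")
    return "STORED"

result = call_with_retry(flaky_send, server_failure_limit=3)
print(result, "after", calls["n"], "attempts")
```

In production code it is usually sensible to add a short delay between attempts, which this sketch leaves out for brevity.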

Timeout

 
It is always a good idea to configure the nutcracker timeout: key for every server pool, rather than relying purely on client-side timeouts. For example:

 
    resilient_pool_with_timeout:
      auto_eject_hosts: true
      server_retry_timeout: 30000
      server_failure_limit: 3
      timeout: 400

 
Relying only on client-side timeouts has an adverse effect: the original request may have timed out on the client-to-proxy connection while it is still pending and outstanding on the proxy-to-server connection. The problem gets worse when the client retries the original request.

 
By default, nutcracker waits indefinitely for a response to any request sent to the server. However, when the timeout: key is configured, a request for which no response is received from the server within timeout: msec is timed out, and the error response SERVER_ERROR Connection timed out\r\n is sent back to the client.

Error Response

 
Whenever a request encounters a failure on a server, we usually send the client a response of the general form SERVER_ERROR, followed by an error description and \r\n (memcached), or -ERR, followed by an error description (redis).

For example, when a memcache server is down, this error response is usually:

SERVER_ERROR Connection refused\r\n or,
SERVER_ERROR Connection reset by peer\r\n

When the request times out, the response is usually:

SERVER_ERROR Connection timed out\r\n

 
A client should treat a SERVER_ERROR or -ERR response as a transient failure, which makes the original request an ideal candidate for a retry.
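
A client can apply this rule with a simple prefix check on the raw response. The sketch below follows the rule literally; the function name is_transient_failure is hypothetical. Note that a redis backend can itself answer with -ERR for a malformed command, so a real client may want to look at the error description before retrying.

```python
def is_transient_failure(response: bytes) -> bool:
    """Return True when a raw response has the error form described above."""
    return response.startswith(b"SERVER_ERROR") or response.startswith(b"-ERR")

# Example responses taken from the text, plus two ordinary replies.
for raw in (b"SERVER_ERROR Connection refused\r\n",
            b"SERVER_ERROR Connection timed out\r\n",
            b"STORED\r\n",
            b"+OK\r\n"):
    print(raw, "->", "retry" if is_transient_failure(raw) else "ok")
```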

read, writev and mbuf

 
All memory for incoming requests and outgoing responses is allocated in mbufs. Mbufs enable zero copy for requests and responses flowing through the proxy. By default an mbuf is 16K bytes in size, and this value can be tuned between 512 and 65K bytes using the -m or --mbuf-size=N argument. Every connection has at least one mbuf allocated to it, which means that the number of concurrent connections nutcracker can support depends on the mbuf size. A small mbuf allows us to handle more connections, while a large mbuf allows us to read and write more data to and from kernel socket buffers.

 
If nutcracker is meant to handle a large number of concurrent client connections, you should set the mbuf size to 512 or 1K bytes.
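
The effect of the mbuf size on memory can be estimated with simple arithmetic. The sketch below only computes a lower bound, based on the statement that every connection holds at least one mbuf; the function name min_mbuf_bytes and the figure of 10000 connections are illustrative assumptions, and actual usage will be higher when connections hold several mbufs.

```python
def min_mbuf_bytes(connections, mbuf_size=16 * 1024):
    """Lower bound on mbuf memory: one mbuf per connection."""
    return connections * mbuf_size

# Compare the default 16K mbuf with the two small sizes recommended above.
for size in (512, 1024, 16 * 1024):
    total = min_mbuf_bytes(10_000, size)
    print(f"mbuf={size:>6} B -> at least {total / 2**20:.1f} MiB for 10000 connections")
```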

Maximum Key Length

 
The memcache ASCII protocol specification limits the maximum length of a key to 250 characters. The key must not include whitespace or the '\r' or '\n' characters. For redis, there is no such limitation. However, nutcracker requires the key to be stored in a contiguous memory region. Since all requests and responses in nutcracker are stored in mbufs, the maximum length of a redis key is limited by the maximum space available for data in an mbuf (mbuf_data_size()). This means that if you want your redis instances to handle large keys, you might want to choose a large mbuf size, set using the -m or --mbuf-size=N command-line argument.
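
The memcache key rules above are easy to check on the client before a request is sent. This is a minimal sketch: the function name is_valid_memcache_key is hypothetical, the 250-character limit is applied to bytes as an assumption, and only space, tab, '\r' and '\n' are rejected as whitespace.

```python
def is_valid_memcache_key(key: bytes) -> bool:
    """Check a key against the memcache ASCII protocol limits described above."""
    if not 0 < len(key) <= 250:
        return False
    # Iterating over bytes yields integers; membership in a bytes literal works.
    return not any(ch in b" \t\r\n" for ch in key)

for candidate in (b"user:1001:profile", b"bad key", b"x" * 251):
    print(candidate[:20], is_valid_memcache_key(candidate))
```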

Node Names for Consistent Hashing

 
The server cluster in twemproxy can be specified as a list of strings in either 'host:port:weight' or 'host:port:weight name' format.

 
    servers:
     - 127.0.0.1:6379:1
     - 127.0.0.1:6380:1
     - 127.0.0.1:6381:1
     - 127.0.0.1:6382:1

Or,

    servers:
     - 127.0.0.1:6379:1 server1
     - 127.0.0.1:6380:1 server2
     - 127.0.0.1:6381:1 server3
     - 127.0.0.1:6382:1 server4

 
In the former configuration, keys are mapped directly to the 'host:port:weight' triplet. In the latter, they are mapped to node names, which are in turn mapped to nodes, i.e. host:port pairs. The latter configuration gives us the freedom to relocate nodes to a different server without disturbing the hash ring, which makes it ideal when auto_eject_hosts is set to false. See issue 25 for details.

 
Note that when node names are used for consistent hashing, twemproxy ignores the weight value in the 'host:port:weight name' format string.
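
The difference between the two formats can be illustrated by looking at which string identifies a server on the hash ring. The sketch below follows the description in the text rather than the twemproxy source; parse_server and ring_identity are hypothetical helpers, and 10.0.0.5 is a made-up address standing in for a relocated node.

```python
def parse_server(entry):
    """Split 'host:port:weight' or 'host:port:weight name' into its parts."""
    address, _, name = entry.strip().partition(" ")
    host, port, weight = address.rsplit(":", 2)
    return {"host": host, "port": int(port), "weight": int(weight),
            "name": name.strip() or None}

def ring_identity(entry):
    """String that keys are mapped to: the node name when one is given,
    otherwise the 'host:port:weight' triplet."""
    server = parse_server(entry)
    if server["name"]:
        return server["name"]
    return f'{server["host"]}:{server["port"]}:{server["weight"]}'

# Relocating a named node keeps its ring identity; an unnamed node changes it.
print(ring_identity("127.0.0.1:6379:1 server1"), ring_identity("10.0.0.5:6379:1 server1"))
print(ring_identity("127.0.0.1:6379:1"), ring_identity("10.0.0.5:6379:1"))
```

Because the identity of a named node does not change when its address does, the keys that hashed to it keep hashing to it after the move.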

Hash Tags

 
Hash tags enable you to use part of the key for calculating the hash. When a hash tag is present, the part of the key within the tag is used as the key for consistent hashing. Otherwise, the full key is used as is. Hash tags therefore let you map different keys to the same server, as long as the part of the key within the tag is the same.

 
For example, the configuration of server pool beta, also shown below, specifies a two-character hash_tag string, "{}". This means that the keys "user:{user1}:ids" and "user:{user1}:tweets" map to the same server, because the hash is computed on "user1". For a key like "user:user1:ids", the entire string "user:user1:ids" is used to compute the hash, and the key may map to a different server.

 
    beta:
      listen: 127.0.0.1:22122
      hash: fnv1a_64
      hash_tag: "{}"
      distribution: ketama
      auto_eject_hosts: false
      timeout: 400
      redis: true
      servers:
       - 127.0.0.1:6380:1 server1
       - 127.0.0.1:6381:1 server2
       - 127.0.0.1:6382:1 server3
       - 127.0.0.1:6383:1 server4
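
The hash tag rule described above can be sketched as a small function that picks the part of the key to hash. This is an illustration rather than twemproxy's own code: hash_input is a hypothetical name, and falling back to the full key when the tag is empty (as in "user:{}:ids") is an assumption.

```python
def hash_input(key: str, hash_tag: str = "{}") -> str:
    """Return the part of the key that would be fed to the hash function."""
    if len(hash_tag) != 2:
        return key                          # no usable tag configured
    start = key.find(hash_tag[0])
    if start == -1:
        return key                          # no opening tag character
    end = key.find(hash_tag[1], start + 1)
    if end == -1 or end == start + 1:
        return key                          # no closing character, or empty tag
    return key[start + 1:end]

for k in ("user:{user1}:ids", "user:{user1}:tweets", "user:user1:ids"):
    print(k, "->", hash_input(k))
```

The first two keys yield the same hash input, so they land on the same server, while the third is hashed in full.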

   
This article is reposted from the UltraSQL blog on 51CTO. Original link: http://blog.51cto.com/ultrasql/1639569. To repost it, please contact the original author.
