Yay for clustering in Tomcat!

July 25, 2005

Well, it only took a week to find one little crucial line missing in my Tomcat configuration. I've been spending obscene amounts of time trying to get a simple Tomcat cluster up and running for failover testing, and tonight it actually worked. It's pretty slick how I can just drop one of the servers and my Tapestry sessions still work. Bring the dead server back up and *poof* it starts processing requests again.

It was quite frustrating trying to wade through the thousands of posts on how to cluster (98% of which didn’t pertain to what I was trying to do). I also find it amazing that if I followed what most of these sites said to do, I’d still be scratching my head trying to figure things out.

My goal: failover.


Things to note when trying to achieve failover:
1) Don't use sticky sessions. In mod_jk, sticky_session defaults to true, which is good for general load balancing but bad for failover clustering.
2) Make sure your web application's web.xml file has a <distributable /> tag. That's it. That's all you need in your web.xml. I was expecting some really complicated settings, or a ton of attributes on the tag. Nothing. The tag just has to be there for things to work (well, plus all the other junk I'm about to post). I almost put the tag in my global web.xml file (under tomcat/conf). Then I realized that I didn't want session replication for the admin console, so I kept it at the individual application level.
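For reference, here is a sketch of a minimal web.xml for a distributable application. The Servlet 2.4 header is an assumption, so keep whatever header your app already declares. The only clustering-specific line is `<distributable/>`. Once an application is distributable, every object you put in the session has to implement java.io.Serializable, or Tomcat can't replicate it.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch: Servlet 2.4 header assumed; keep your app's existing header and elements -->
<web-app xmlns="http://java.sun.com/xml/ns/j2ee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd"
         version="2.4">

  <!-- No attributes, no children: its presence alone marks the app's sessions as replicable -->
  <distributable/>

  <!-- The app's existing servlet and servlet-mapping elements stay as they are -->

</web-app>
```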

I still have a lot to learn about the plethora of settings to set when doing the clustering thing, but just so I don’t lose my current setup, here it is:

My httpd.conf:
JkWorkersFile conf/workers.properties
JkLogFile logs/mod_jk.log
JkLogLevel info
 
 
JkMount /web/* loadbalancer
JkUnMount /web/images/* loadbalancer

Note: Notice how I'm unmounting the images folder? Requests for that folder never reach Tomcat. Apache serves them itself, from the directory set up in my VirtualHost section (defined earlier in httpd.conf). There are better ways of doing this, just so you know (something like JkAutoAlias).

Also note that this points to “conf/workers.properties”. Here's what I set up:
#
# workers.properties
#
#
# In Unix, we use forward slashes:
ps=/
#
# list the workers by name
#
worker.list=loadbalancer
#
# ------------------------
# First tomcat server
# ------------------------
worker.tomcat1.type=ajp13
worker.tomcat1.host=localhost
worker.tomcat1.port=8009
worker.tomcat1.lbfactor=50
worker.tomcat1.cachesize=10
worker.tomcat1.cache_timeout=600
worker.tomcat1.socket_keepalive=1
worker.tomcat1.recycle_timeout=300
worker.tomcat1.domain=cluster1
#
# ------------------------
# Second tomcat server
# ------------------------
worker.tomcat2.type=ajp13
worker.tomcat2.host=localhost
worker.tomcat2.port=9009
worker.tomcat2.lbfactor=50
worker.tomcat2.cachesize=10
worker.tomcat2.cache_timeout=600
worker.tomcat2.socket_keepalive=1
worker.tomcat2.recycle_timeout=300
worker.tomcat2.domain=cluster1
#
# ------------------------
# Load Balancer worker
# ------------------------
#
# The loadbalancer (type lb) worker performs weighted round-robin
# load balancing. Sticky sessions are turned off below (sticky_session=0).
# Note:
# ----> If a worker dies, the load balancer will check its state
# once in a while. Until then all work is redirected to peer
# worker.
worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tomcat1, tomcat2
worker.loadbalancer.sticky_session=0
#
# END workers.properties
#

Note: When I set up Tomcat, I created two (nearly) identical folders on my system. That's why I have two main worker nodes here (tomcat1 and tomcat2). This file went through several iterations before it ended up this way. Things I'm not sure about… that “domain” setting on the workers. Do I need it? And that worker.list variable: some examples on the net list all the workers (tomcat1, tomcat2, AND loadbalancer), but one example only had the loadbalancer. The latter made more sense to me. loadbalancer is the only worker Apache mounts, and it reaches tomcat1 and tomcat2 through balanced_workers. And hey, it works! :)

Now, for Tomcat. I set up one instance of Tomcat on my box under /usr/local/tomcat. I got it all set up nice and neat (with my web app already installed) and then copied the entire folder to /usr/local/tomcat2. Then I edited the server.xml file so that the two Tomcat instances could work in harmony on the same machine. Specifically, I had to change the ports that Tomcat listens on for the shutdown service and for its connectors. That includes the AJP port that workers.properties points to: 8009 for tomcat1 and 9009 for tomcat2. No biggie here - TONS of examples on the net. But the “cluster” node was the tough guy.
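Here is a sketch of how the port-related parts of the second instance's server.xml could look. Only the AJP port (9009) comes from the workers.properties above. The shutdown port 9005 and the HTTP port 9080 are example values chosen for illustration. Any free ports will do, as long as they don't collide with Tomcat's defaults (8005 and 8080), which the first instance presumably keeps.

```xml
<!-- Sketch of /usr/local/tomcat2/conf/server.xml, port-related elements only -->
<!-- Shutdown port: 9005 is an example value (the default 8005 is taken by the first instance) -->
<Server port="9005" shutdown="SHUTDOWN">
  <Service name="Catalina">

    <!-- HTTP connector: 9080 is an example value (default is 8080) -->
    <Connector port="9080" />

    <!-- AJP connector: must match worker.tomcat2.port=9009 in workers.properties -->
    <Connector port="9009" protocol="AJP/1.3" />

    <!-- The Engine, Host, and Cluster elements go here -->

  </Service>
</Server>
```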

The “Cluster” node goes inside the “Host” node (which should already be there). Here's my version of the “Cluster” node. The setting to watch is the Receiver's tcpListenPort:
<Cluster
   className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
   managerClassName="org.apache.catalina.cluster.session.DeltaManager"
   expireSessionsOnShutdown="false"
   useDirtyFlag="true">
   <Membership
      className="org.apache.catalina.cluster.mcast.McastService"
      mcastAddr="228.0.0.4"
      mcastPort="45564"
      mcastFrequency="500"
      mcastDropTime="3000"/>
   <Receiver
      className="org.apache.catalina.cluster.tcp.ReplicationListener"
      tcpListenAddress="127.0.0.1"
      tcpListenPort="4002"
      tcpSelectorTimeout="100"
      tcpThreadCount="6"/>
   <Sender
      className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
      replicationMode="synchronous"/>
   <Valve
      className="org.apache.catalina.cluster.tcp.ReplicationValve"
      filter=".*\.gif;.*\.js;.*\.jpg;.*\.htm;.*\.html;.*\.txt;"/>
</Cluster>

Note: In the above example, the Receiver's tcpListenPort (4002 here) must be different for each instance of Tomcat, since both of mine listen on 127.0.0.1. There's nothing too terribly magical about the value itself. The instances learn each other's ports on their own through the multicast membership, so you don't have to list them anywhere.
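To make the uniqueness requirement concrete, here is a sketch of the Receiver elements from the two instances side by side. 4002 is the value from the config above. 4001 for the other instance is only an example, and any free port works. Both instances can share tcpListenAddress because they run on the same machine. Only the port has to differ.

```xml
<!-- Sketch: Receiver in one instance's server.xml (4002, as in the config above) -->
<Receiver
   className="org.apache.catalina.cluster.tcp.ReplicationListener"
   tcpListenAddress="127.0.0.1"
   tcpListenPort="4002"
   tcpSelectorTimeout="100"
   tcpThreadCount="6"/>

<!-- Receiver in the other instance's server.xml: 4001 is an example value -->
<!-- It only has to differ from 4002 because both instances share 127.0.0.1 -->
<Receiver
   className="org.apache.catalina.cluster.tcp.ReplicationListener"
   tcpListenAddress="127.0.0.1"
   tcpListenPort="4001"
   tcpSelectorTimeout="100"
   tcpThreadCount="6"/>
```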

Also, I just noticed that I have “expireSessionsOnShutdown” set to “false”. This could be bad. I’ll have to check that one out more. It was set to true (as per almost every example on the web), but I changed it as part of my testing. I may have to change it back.

Also, there’s an attribute you must add to the “Engine” node. It’s the name of the tomcat instance. The attribute is called “jvmRoute”:

<Engine
   jvmRoute="tomcat1"
   defaultHost="localhost"
   name="Catalina">

The value of this attribute MUST MATCH the worker name in the workers.properties file (in my case, I called the Tomcat instances “tomcat1” and “tomcat2”).
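In the second instance's server.xml, then, the Engine element would carry jvmRoute="tomcat2", as in the sketch below. Tomcat appends the jvmRoute value to every session ID it generates, so a session created by this instance ends in ".tomcat2". mod_jk uses that suffix to route requests when sticky sessions are enabled.

```xml
<!-- Sketch: /usr/local/tomcat2/conf/server.xml -->
<!-- jvmRoute must equal the mod_jk worker name (worker.tomcat2.* in workers.properties) -->
<Engine
   jvmRoute="tomcat2"
   defaultHost="localhost"
   name="Catalina">
   <!-- Host element (containing the Cluster element) goes here -->
</Engine>
```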

Hey, that's about it. If you'll notice, I have a replicationMode setting in my server.xml files that is set to “synchronous”. That's my next task: figure out what it does. In most of the examples I found it was set to “pooled”, but in my quest for something that worked, I changed it to synchronous. I think it's a performance thing, but I have to verify that.
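For comparison, here is a sketch of the “pooled” setup that most examples use. It changes only the one attribute on the Sender element. Which mode is the better fit is exactly the open question above, so treat this as something to test rather than a recommendation.

```xml
<!-- Sketch: the same Sender using the "pooled" mode found in most examples -->
<!-- Only replicationMode differs from the synchronous version in the Cluster config -->
<Sender
   className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
   replicationMode="pooled"/>
```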

Now on to some sweet, sweet slumber…. aaaaaahhhh….

Posted in Technobabble
