I just finished configuring RHQ 3.0.0 to monitor a JBoss cluster on Amazon EC2 (Elastic Compute Cloud), and I want to share some RHQ limitations in the hope of helping to improve this helpful tool. First I'll describe my setup: I have a cluster that grows from 2 to 165 nodes within a few minutes, depending on the number of users expected on the application at a given time of day. This is done completely automatically, without manual intervention. The issues I had to deal with are:
1) The Agent registers itself with the Server under an "Agent Name". I would like to register the agent names for my full-capacity cluster of 165 nodes in advance and monitor the instances as they come online (for example: jboss1, jboss2, ..., jbossn). But there is one problem: the agent name is bound to the IP address of the monitored instance. The machines have dynamic IP addresses, so I have to generate the name at agent startup based on the assigned IP address. That approach leaves me with lots of garbage resources and name conflicts, and I always have to do manual maintenance on the RHQ inventory and committed resources.
2) It is not clear how the Agent communicates with the Server. Basically, I have to give the agent the Server host and port (7080), without authentication. But port 7080 is also used for HTTP access. If I open this port on this machine to the internet, anyone could register new agents in my inventory, which is not desirable. I will put Apache in front of it, but I'll have to find out which URL paths to block to prevent agent registration. It would be nice if the RHQ documentation clarified this security issue.
For now these are my issues, and I think it is very important to address deploying RHQ to monitor elastic cloud services. I am available to help, if desired.
Fábio Lima Santos
By default the agent uses the host name (or IP address) for its name; however, you are free to choose whatever you want. The name should, though, be unique and consistent. And of course in EC2, as you pointed out, host names and IP addresses can and will change. Instead of using the host name, use the machine's instance ID, which should be both unique and consistent across machine restarts for the lifetime of the machine. You can obtain the instance ID as follows:
$ curl 169.254.169.254/1.0/meta-data/instance-id
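To make that concrete, here is a minimal sketch of a startup helper that picks the agent name from the instance ID. The function names, the METADATA_URL override, the 2-second timeout, and the fallback to the host name (via `uname -n`) are my own assumptions for illustration, not something RHQ ships.

```shell
#!/bin/sh
# Sketch only: derive a stable agent name from the EC2 instance ID.
# METADATA_URL can be overridden; the default is the URL shown above.
METADATA_URL="${METADATA_URL:-http://169.254.169.254/1.0/meta-data/instance-id}"

# Hypothetical helper: ask the metadata service for the instance ID.
# -s silences progress output, -f makes HTTP errors return a failure,
# --max-time keeps the call from hanging when run outside EC2.
fetch_instance_id() {
    curl -sf --max-time 2 "$METADATA_URL" 2>/dev/null
}

# Hypothetical helper: print the instance ID if one was returned,
# otherwise fall back to the host name (the agent's usual default).
agent_name() {
    id="$(fetch_instance_id)"
    if [ -n "$id" ]; then
        printf '%s\n' "$id"
    else
        uname -n
    fi
}
```

Calling `agent_name` from the instance's boot script gives you a value you can type at the setup prompt or feed to the agent in whatever way you already configure its name.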
As for your second issue, consider using only the private IP addresses for agent/server communication.
Hi John, thanks for your time.
The problem is that RHQ maintains a relation between the agent name and the IP address, and rejects the agent connection even if the old, unavailable agent is not committed.
The problem is that I have to publish the RHQ HTTP interface to the internet. This means making access to the RHQ host on port 7080 public. Another problem is that on Amazon I cannot define the IP range that I will use, can I?
RHQ does not explicitly maintain a relationship between the agent name and the IP address. That relationship exists only in so far as the IP address is used as the default agent name if nothing is specified. Let's say you start an agent as follows,
$ rhq-agent.sh --cleanconfig
Agent Name [127.0.0.1] : # i-35b75b5 (the machine's instance id)
Agent Hostname or IP Address [!*] : # leave this blank!
The agent name gets stored (most importantly) in the RHQ inventory, whereas the value specified/used for the agent's host name/IP address is stored and used by the agent/server communications layer. During your agent setup, did you specify the IP address for it to use? If so, this will result in connectivity problems with the RHQ server when the agent machine restarts. By not specifying a value during setup, the agent will use the machine's current address, which is what you want, particularly in an environment like EC2.
If you want to use an IP range, you would need to use Amazon's VPC service. If you go that route, you are going to be dealing with something much more like an intranet/vpn deployment. I'd love to hear about it if you do go that route.
There is possibly another way to do this. It's not used much, but the capability is there - you can change the port the server listens on for agent traffic. This is documented here (in the context of setting up secure SSL comm from agent to server, but I think the same concept can be used if you just want the agents to talk to a different port than the one browsers send HTTP traffic over):
This talks about using the "sslsocket" transport. Since you talk about using port 7080 (which is the unsecured port), you could even use the "socket" transport if you don't care about encryption/authentication. But then again, you are talking about exposing this over a public endpoint, so I highly recommend you read the entire wiki page on Securing Communications.
> "problem is that RHQ maintains a relation between the agent name and the IP address"
As to your first point here about agent name and hostname/IP address - RHQ does maintain a relation between the agent name and IP address for communication purposes, but it can change and is meant to change. Yes, the RHQ_AGENT table does relate the agent name to the agent's comm endpoint address and port, but as long as your agent maintains its security token, that relationship can change during the next agent registration (more on this below). Yes, the *default* value that the agent will use for its name is the hostname - but that's only in the case where you don't explicitly give the agent a name. The agent has to have a name, and if you don't give it one, the agent has to come up with one itself - the best thing the agent can use is the hostname as its built-in default. If you don't like the default, set the agent's name explicitly yourself (either at the startup setup prompt, by setting the preference rhq.agent.name in agent-configuration.xml before you start the agent the first time, or by passing -Drhq.agent.name=your-name-here as a command line argument to the agent or in RHQ_AGENT_ADDITIONAL_JAVA_OPTS or RHQ_AGENT_CMDLINE_OPTS - you have several ways to configure this).
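As a rough illustration of the environment-variable route, a wrapper script could build the option and export it before launching the agent. The helper name `agent_name_opt` and the AGENT_NAME variable are made up for this sketch; only rhq.agent.name and RHQ_AGENT_ADDITIONAL_JAVA_OPTS come from the description above.

```shell
#!/bin/sh
# Sketch only: build the JVM option that names the agent explicitly.

# Hypothetical helper: print "-Drhq.agent.name=<name>", or fail if the
# name is empty so we never start an agent with a blank name.
agent_name_opt() {
    if [ -z "$1" ]; then
        echo "agent name must not be empty" >&2
        return 1
    fi
    printf '%s%s\n' '-Drhq.agent.name=' "$1"
}

# AGENT_NAME is an assumption of this sketch; supply it however you like
# (for example from the EC2 instance ID). The default is a placeholder.
AGENT_NAME="${AGENT_NAME:-i-35b75b5}"

RHQ_AGENT_ADDITIONAL_JAVA_OPTS="$(agent_name_opt "$AGENT_NAME")"
export RHQ_AGENT_ADDITIONAL_JAVA_OPTS

# The agent would then be started from this same environment, e.g.:
#   rhq-agent.sh
```

If you already set RHQ_AGENT_ADDITIONAL_JAVA_OPTS elsewhere, you would want to append to it rather than overwrite it as this sketch does.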
The RHQ agent name can be anything (it was designed purposefully to NOT HAVE TO BE the same as the IP address or hostname for this very reason you bring up - that is, what happens if the hostname/IP changes under the covers. In that case, we wanted the agent identity to remain the same but its endpoint address to be able to change). Read this wiki page for how the agent name is used, and what happens if IP/hostnames change during registration:
Read that page. You may be interested in the paragraph that starts, "If, however, the agent wants to change its IP or port but doesn't have its security token, the server will reject that registration request because it will look like some other agent on this other IP/port is trying to "hijack" an existing agent registration". So in short, you can do what you want; you just have to keep the old agent's persisted configuration around, specifically its security token. If you start the agent with --cleanconfig or otherwise delete the agent's Java Preferences while at the same time changing the agent's hostname or IP (along with changing the name), you will get security errors and comm will be rejected - that wiki page above explains why.
> "even if the old and unavailable agent is not committed."
BTW: committing an agent to inventory (or not) has nothing to do with whether or not an agent is registered and can talk to the server. Agent registration and communication is purposefully separated from the fact that an agent is committed to inventory or not. Remember, the Agent resource that you commit to inventory is nothing more than a resource that is created via the RHQ Agent plugin (we have a plugin, just like all the rest, that is for managing the agent itself). You could technically not deploy the rhq-agent plugin (in case you want to minimize agent footprint and you don't want to manage the agent itself through RHQ) and everything will work fine - because the agent registration/comm is built into the core agent - it has nothing to do with the agent plugin or the agent resource that you commit to inventory. So when talking about agent registration and communication, remember that has nothing to do with inventory or the agent resource or the agent plugin.
Very clarifying, but the problem persists precisely because I use just one image (Amazon AMI) to start all my instances, and I assign the agent name with a centralized script (this script distributes the agent names according to the active instances registered on my Gossip Routers). In this case the security token is simply gone, because at the end of the high-load period the instances are terminated. In general it is cheaper to start new instances than to persist their contents on EBS.
I really don't want to import and discard each resource every day. I could try to do it with the client API, but the problem is still annoying. For example, yesterday I had a problem because an instance was started (coincidentally) with the same IP address that another instance had some time ago (2 or 3 days ago). The agent simply can't register with the server. Another problem is the large number of resources that appear in the auto-inventory. The old ones are still there, and I can't find, even in the client API, a way to remove these entries from the auto-inventory.
To reiterate, you cannot reliably use IP addresses for agent names in EC2 and likely other cloud providers for that matter. Using a recycled IP address in the comm layer shouldn't be an issue though. Using instance IDs should be much more reliable, and I encourage you to explore that option.
The other issue you bring up, the inventory growing out of control with resources that no longer exist, has been discussed some, and we are considering how best to solve this. Ultimately, we would like to come up with a solution that would be reusable across different cloud providers. For now, I can offer the following suggestion. You could write a server plugin that runs as a scheduled, recurring job. The job could take care of auto-importing machines as well as purging terminated machines from inventory.
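Whatever form the job takes, the purge step is essentially a set difference: names that are in inventory but no longer running. Here is a small sketch of just that comparison, assuming you can dump both lists to text files with one name per line. The function name and the file arguments are hypothetical, and how you obtain the lists (from RHQ and from EC2) and how you then remove the stale entries is deliberately left out.

```shell
#!/bin/sh
# Sketch only: list agent names present in inventory but not running.
# Usage: stale_agents INVENTORY_FILE RUNNING_FILE
# Both files hold one name (e.g. an instance ID) per line.
stale_agents() {
    inv_sorted="$(mktemp)"
    run_sorted="$(mktemp)"
    # comm needs sorted input; -u also drops duplicate names.
    sort -u "$1" > "$inv_sorted"
    sort -u "$2" > "$run_sorted"
    # -23 keeps only the lines unique to the first file (the inventory).
    comm -23 "$inv_sorted" "$run_sorted"
    rm -f "$inv_sorted" "$run_sorted"
}
```

The output is the list of candidates to purge. Running it on a schedule and feeding the result to whatever removal mechanism you settle on would cover the cleanup half of the job.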
If you are not familiar with server plugins, they were introduced in RHQ 3.0.0 (and in JON 2.4.0 if you are using JON). You can read more about server plugins here. Additionally, you can look at the Groovy Server Script plugin if you are interested in a more lightweight approach that allows you to implement your plugin functionality with Groovy scripts. This would be a great area to contribute to the project if that is something you are interested in.