I've just started playing around with jopr, so I wouldn't be surprised if this is a rookie mistake, but I've got jopr server 2.1.0.GA installed on an os x test machine, and I've got a jopr agent 2.1.0.GA installed on that same machine as well. The agent properly registers, I can add it to my inventory and I can add the JBoss AS instance also running on that box to the inventory, so it seems that server and agent are communicating fine. The problem is that while I can see data collection on traits like architecture, host name, etc, all the graphs show "No Metric Data Available". The configure tab under monitor shows that collection is enabled for CPU and memory statistics, but they aren't showing up, and I see the same behavior with JBoss AS performance data as well. Am I missing something?
The basic answer would be "did you wait long enough for the collection interval?"
Each metric is collected on a custom-defined "collection interval". Many of the defaults are 10 minutes or even 20 minutes (you can change these if you want faster collections - the minimum allowed is 30 seconds).
If you did wait long enough, and you still see "No Metric Data Available", the next most common cause is your "Metric Display Range" is set to some time in the past. Look at the time frame specified at the bottom (the default is something like "the last 8 hours"). If you changed this to some time frame in the past, before measurements were collected, you will see empty data graphs.
Lastly, if your server and agent are on different boxes, you must ensure your machines have their clocks synced (via ntp or some other sync mechanism). Since you say you are all running on the same MacOS box, this isn't your error.
If you still can't see any data after checking all of these, look in your agent and server logs for any error messages - post any error messages you see so we can try to figure out the problem.
Thanks for the quick reply. I think in part you nailed it with your basic answer. I hadn't waited long enough for the JBoss AS stats to be collected, and when the CPU and other machine-level info wasn't collected I assumed that was happening throughout. It seems the application server info is getting collected now, however, just not the system information. I think it's possible that this is to blame:
sending> native
The native system is NOT available on this agent platform.
The native system is currently enabled.
The native system is NOT initialized.
Can I take that to mean that native collection is not supported on os x?
That is quite possible. A lot of the platform-level stats (CPU stats, for example) are collected by the native layer. If the native layer is disabled for whatever reason, those stats are not collected.
"The native system is NOT available on this agent platform." tells me that, yes, your platform is not supported.
What hardware are you using? Heiko uses a Mac and has no problems with the native layer - but I don't know what hardware he has. I know other folks have used MacOS in the past. The JNI library (named SIGAR) definitely might not be supported on your Mac hardware, though - so let us know what you have and we can confirm.
It's a pre-unibody Macbook Pro, Intel Core 2 duo running os x 10.5.6. It's not a major issue for me if there isn't much support for Macs as I intend to use jopr to monitor Linux boxes; it just means I've got to set up a different test environment. I've got to say, so far I'm impressed. After spending way too much time messing around with OpenNMS recently, I've been very happy with the features jopr provides and how easy it has been to set up, small roadblocks aside.
Here it is. The one modification I needed to make is that sigar.jar was sigar-184.108.40.206.jar in the lib folder. It does seem to be returning native information though.
garfunkel:lib dave$ ls -la libsigar*mac*
-rwxr-xr-x@ 1 dave dave 1069464 Apr 16 2008 libsigar-universal-macosx.dylib
garfunkel:lib dave$ java -jar sigar-220.127.116.11.jar
2 total CPUs..
--More-- (Page 1 of 2)
Sigar version.......java=18.104.22.168, native=22.214.171.124
Build date..........java=04/16/2008 09:44 AM, native=02/23/2008 12:20 AM
OS description......Mac OS X Leopard
OS patch level......unknown
OS vendor version...10.5
OS code name........Leopard
OS data model.......32
OS cpu endian.......little
Java vm version.....1.5.0_16-133
Java vm vendor......Apple Inc.
10:07 AM up 4 days, 18:14, load average: 1.10, 1.29, 1.39
File Systems.........[/, /dev, /dev]
Network Interfaces...[lo0, en0, en1, en2, en3, en4]
System resource limits:
Class Not Found: junit/framework/TestCase
Unable to locate: junit.jar
Interesting, as this is exactly the same as what I see (except for the Model string, Mhz and cache size), so it should be good.
Did you by any chance disable native in conf/agent-configuration.xml:
<entry key="rhq.agent.disable-native-system" value="false"/>
Do you see strange things in the agent.log - relatively shortly after startup?
What is the result of doing
native --enable at the agent prompt?
No problem Heiko. Thanks for helping out with this.
I just stopped and then restarted the agent and didn't see any interesting log messages. I also don't see any when disabling and re-enabling native.
I did not explicitly disable native in the conf file, although when I initially started the agent native was disabled. I enabled it from the console which is when I got the status messages I had posted before. Here is what I get when I disable and then re-enable it:
sending>
sending> native --disable
Native system has been disabled.
sending> native
The native system is NOT available on this agent platform.
The native system is currently disabled.
The native system is NOT initialized.
sending> native --enable
Native system has been enabled.
sending> native
The native system is NOT available on this agent platform.
The native system is currently enabled.
The native system is NOT initialized.
The native system is NOT available on this agent platform.
This is the important message. This tells us the SIGAR jar cannot find the native shared library (the .dylib library).
Let's look at the code that loads SIGAR:
Look at the static block and the initialize() method. There should be WARN messages in the agent log if bad things happen here. Double check the agent logs and see if you see messages from this SystemInfoFactory object.
BTW: here is the code for the "native" prompt command:
You can see it dumps the "NOT available" message when false is returned by SystemInfoFactory.isNativeSystemInfoAvailable().
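To make the pattern concrete without the real source in front of you, here is a rough sketch of the general idea: a static block tries to load the native library once, remembers whether that worked, and the prompt command just reads the flag. This is not the RHQ code. The class name, the library name and every method except isNativeSystemInfoAvailable() are made up for illustration.

```java
// Simplified illustration only. Apart from the method name
// isNativeSystemInfoAvailable(), everything here is hypothetical
// and is NOT the actual RHQ SystemInfoFactory source.
final class NativeAvailabilitySketch {
    private static final boolean NATIVE_AVAILABLE;
    private static final Throwable LOAD_ERROR;

    static {
        boolean ok;
        Throwable err = null;
        try {
            // Hypothetical library name; the JVM searches java.library.path for it.
            System.loadLibrary("some-native-lib-that-may-not-exist");
            ok = true;
        } catch (Throwable t) {
            // Typically an UnsatisfiedLinkError when the library cannot be found or loaded.
            ok = false;
            err = t;
        }
        NATIVE_AVAILABLE = ok;
        LOAD_ERROR = err;
    }

    /** True only if the static block managed to load the native library. */
    static boolean isNativeSystemInfoAvailable() {
        return NATIVE_AVAILABLE;
    }

    /** The error captured during loading, or null if loading succeeded. */
    static Throwable getLoadError() {
        return LOAD_ERROR;
    }

    /** Builds the same kind of status line the "native" prompt command prints. */
    static String describe() {
        return "The native system is " + (NATIVE_AVAILABLE ? "" : "NOT ")
                + "available on this agent platform.";
    }

    public static void main(String[] args) {
        System.out.println(describe());
        if (LOAD_ERROR != null) {
            System.out.println("Cause: " + LOAD_ERROR);
        }
    }
}
```

The point of the sketch is that the flag is decided once, at class load time, so toggling "native --enable" afterwards cannot change the "NOT available" line; only the captured load error explains why.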
Edit your agent's conf/log4j.xml so this is in there:
<category name="org.rhq.core.system">
   <priority value="TRACE"/>
</category>
and make sure the FILE appender allows TRACE:
<appender name="FILE" class="org.apache.log4j.RollingFileAppender">
   <param name="Threshold" value="TRACE"/>
   <param name="File" value="logs/agent.log"/>
   ....
Run the agent again. See if the stack trace comes out from the static block of that factory class.
Now we're getting somewhere. I don't know how I missed that message before, but with tracing turned on I get:
2009-02-18 11:44:53,273 WARN [main] (org.rhq.core.system.SystemInfoFactory)- System
info API not accessible on this platform (native shared library not found in java.lib
2009-02-18 11:44:53,286 TRACE [main] (org.rhq.core.system.SystemInfoFactory)- Stack t
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Caused by: org.hyperic.sigar.SigarException: /Applications/jopr/jopr-agent-2.1.0.GA/l
... 15 more
So it looks like there is some problem loading the sigar library, even though it does exist in that location and has r-x permissions for everyone.
I don't know what kind of tracing SIGAR outputs. Turn on TRACE for the "com" and "org" categories in log4j.xml and restart the agent. Be prepared for a deluge of log messages in your log file :) You might have to increase the filesize/filecount settings for the file appender so it captures the messages up until the time SIGAR is trying to load the library.
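If the logs don't narrow it down, another option is to take the agent out of the picture and ask the JVM directly why it won't load the file. The sketch below is just that, a sketch: the class name NativeLoadCheck is made up, and it assumes that loading the .dylib with plain System.load() fails for the same reason SIGAR's own loader does, which may not hold. Note that sun.arch.data.model is a vendor-specific property and can print null on some JVMs.

```java
import java.io.File;

// Hypothetical standalone check, not part of jopr/RHQ or SIGAR.
final class NativeLoadCheck {

    /** Tries to load the library at the given path and returns a one-line result. */
    static String tryLoad(String path) {
        File f = new File(path);
        if (!f.isFile()) {
            return "MISSING: " + f.getAbsolutePath();
        }
        try {
            // System.load() requires an absolute path to the library file.
            System.load(f.getAbsolutePath());
            return "LOADED: " + f.getAbsolutePath();
        } catch (UnsatisfiedLinkError e) {
            // The message usually says why the file was rejected.
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Print the JVM properties that most often explain a failed load.
        System.out.println("os.name             = " + System.getProperty("os.name"));
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        System.out.println("java.library.path   = " + System.getProperty("java.library.path"));
        for (int i = 0; i < args.length; i++) {
            System.out.println(tryLoad(args[i]));
        }
    }
}
```

Compile it and run it from the agent's lib directory with the same java executable the agent uses, passing the library as an argument, e.g. "java NativeLoadCheck libsigar-universal-macosx.dylib". If it prints FAILED, the message after it is the thing to post here.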