Can we please ask for some advice on how to get HornetQ to start up more quickly? Currently it can take several hours for our server to start.
We have a HornetQ server with a single topic, a typical message size of around 3-5 KB, and the following address settings:
The Java -Xmx setting is 10240m, which we thought would be ample to run HornetQ with a 4096 MiB topic.
When we start HornetQ, it very quickly slows down as it reads each file in the journal. The slowdown appears to be in the checkDeleteSize() anonymous class in JournalImpl.load().
For most of the journal files, free memory is not less than 20% of the maximum memory, so this method does nothing. But when loading gets about 80% of the way through, it slows right down. We have seen total start-up times of several hours.
Some more observations:
- most of the files have a deleteCount of 20001, exactly one more than the threshold.
- the amount of memory saved by this loop does not make any significant difference to the heap usage shown in the JMX console (which also shows Eden space and Old Gen space as full)
- this method loops over all messages in memory for each journal file past a certain threshold, so it is O(M×N), where M is the number of messages and N the number of journal files.
- this method is single-threaded, which means we are limited by the speed of a single CPU core; a multi-threaded approach (even just to find the records) would give us a significant speed boost.
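To make the cost concrete, here is a minimal sketch of the pattern described above (the identifiers are ours, not copied from JournalImpl): once a file's delete count crosses the threshold, every record still held in memory is scanned, so N files over the threshold cost O(M×N) in total.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the load-time pattern; names are assumptions,
// not HornetQ's actual code.
public class DeleteFlushSketch {
    static final int DELETE_THRESHOLD = 20_000;

    // Called once per journal file whose delete count exceeds the threshold.
    // Note the full scan over *all* in-memory records each time, which is
    // what makes the overall load O(M * N).
    static int flushDeletes(Map<Long, byte[]> recordsInMemory, Set<Long> deletedIds) {
        int freed = 0;
        for (Iterator<Map.Entry<Long, byte[]>> it =
                 recordsInMemory.entrySet().iterator(); it.hasNext();) {
            if (deletedIds.contains(it.next().getKey())) {
                it.remove();
                freed++;
            }
        }
        return freed;
    }
}
```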
The main problem this causes is that we need to artificially increase the -Xmx heap setting in order to start HornetQ in a sensible time-frame. But there is no way to reduce this setting without restarting HornetQ, which means waiting for the journal directory to become small. (We think we need between 12 and 14 GB to start up while avoiding this loop, for a single 4 GB address.)
Would it be better to have fewer, larger journal files, or would this cause a performance problem during normal use?
Would it be better to have more, smaller journal files, to try and avoid having 20000 deletes in each file?
Is there any way we can "compact" the journal (i.e. remove deleted records) while HornetQ is stopped?
Many thanks for any help or advice.
We are working on a test case, and think we've reproduced something very similar to the issue that we saw on the live server.
It looks like part of the problem is the garbage-collection configuration: when we run our unit tests with the default JVM settings, everything works fine, but with the defaults from run.sh (-XX:+UseParallelGC -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:ParallelGCThreads=6) the system slows right down during journal load. We haven't yet seen checkDeleteSize() take progressively longer in the test, though.
We will hopefully be in a position to create a JIRA issue tomorrow.
The data was generated by TRUNK version #9716, and read back by the same version. We're still working on a unit test to replicate the problem; we think we know how to do it but the amount of data involved is significant enough that it's taking longer than we expected.
We did find one issue, though: the default run.sh contains a setting for -Xmx but not -Xms, so the total (committed) heap will likely be smaller than the maximum heap at the point where the journal loads. JournalImpl line 1480 tests whether memory is critical by comparing freeMemory with maxMemory, but freeMemory is the amount of *total* memory that is free, even when total memory is well below max memory. So the code can end up flushing deletes when memory is not actually critical, merely because only a small amount of the committed heap is currently free.
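To illustrate the discrepancy (this is a standalone sketch, not the HornetQ source): Runtime.freeMemory() reports free space within the *currently committed* heap (totalMemory), which can be far below maxMemory when -Xms < -Xmx, so a freeMemory-vs-maxMemory comparison can flag "critical" while plenty of uncommitted headroom remains.

```java
// Standalone illustration; thresholds and method names are our assumptions.
public class MemoryCheckDemo {
    // Roughly the shape of the check described above: compare free memory
    // against a fraction of max memory.
    static boolean criticalAsWritten(long free, long max) {
        return free < max * 0.2; // flags "critical" even if the heap can still grow
    }

    // Accounting for the uncommitted headroom (max - total) avoids the
    // false positive:
    static boolean criticalCorrected(long free, long total, long max) {
        long available = free + (max - total);
        return available < max * 0.2;
    }

    public static void main(String[] args) {
        // Example: -Xmx 10240m, but the heap has only expanded to 1 GiB so far.
        long max = 10_240L << 20;
        long total = 1_024L << 20;
        long free = 512L << 20; // half of the committed heap is free

        System.out.println(criticalAsWritten(free, max));        // prints "true"
        System.out.println(criticalCorrected(free, total, max)); // prints "false"
    }
}
```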
Unless you are trying to limit heap expansion during the journal load for some reason? But heap-expansion behaviour is controlled by -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio, so it would seem strange to attempt that here.
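For what it's worth, a workaround we are considering for the start-up case: setting -Xms equal to -Xmx commits the full heap at JVM start, so freeMemory is measured against the same ceiling as maxMemory and the check behaves as intended. A sketch of the relevant run.sh line (the variable name and exact flag set are assumptions; check your own run.sh):

```shell
# Assumed run.sh-style variable; adjust to match your script.
# -Xms equal to -Xmx commits the whole heap up front, so
# totalMemory() == maxMemory() during journal load.
JAVA_ARGS="-Xms10240m -Xmx10240m -XX:+UseParallelGC -XX:ParallelGCThreads=6"
```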