Over the last few years, I have frequently worked on end-user application performance monitoring for the Documentum platform, and on Documentum monitoring in general. This work resulted in the implementation of a monitoring platform based on two solutions:
- a system for running script-based platform sanity checks, active interface monitoring, DQL query monitoring, etc.
- a system based on the centralization of logs produced by the major components of the platform
I will certainly post about the first, "back-end" system in the future, but the aim of my very next posts will be to give details about the log centralization system, as I consider it a quick-win solution for anyone who would like to implement a non-intrusive yet very powerful real-time monitoring system:
- Part 1 (this very post) will present the logic and methodology which led to the design of the solution
- Part 2 will focus on the practical installation steps to be performed to instrument the Webtop and JMS components.
- Part 3 will broaden the scope and give guidelines for the instrumentation of other Documentum components
The solution proposed in this article is certainly far from perfect. My aim is simply to present a proven solution which may help some people improve the way they monitor their Documentum platform, at very little cost and effort. I wish I had had such a simple yet powerful logging system on the different Documentum platforms I worked on in my previous experiences.
Note that I have no link to, or interest in, any of the (commercial or free) packages presented in this article.
The solution was implemented on a 6.7 platform but can be used on any 6.x platform. The only exception is the information related to index server logs, which would need to be adapted for FAST.
The setup was performed on a Windows infrastructure but the different components of the solution support both Windows and Linux/Unix systems.
Solution selection criteria
The overall logic of the solution is the following: all components of the Documentum platform should forward their own logs (without having to really care about their format) to a central location in order to be able to:
- monitor their activity
- detect errors and alert if necessary
- trace user transactions
- identify bottlenecks in transactions
The main requirements for the logs centralization solution were the following:
- it should be possible to view logs in real-time in an interface
- it should be possible to easily instrument the following components: Webtop, ACS, JMS, Content server processes, docbrokers, CTS, indexing server
- the performance impact of the components' instrumentation should be unnoticeable for end users. A 2% impact in terms of execution time was acceptable, but not more.
- it should be possible to report on historical data, ideally in the same interface used for real-time log viewing
- ideally, it should be possible to trace an end-user transaction even when it spans different components (for example: a promotion started in Webtop but traced on the JMS when some lifecycle code is involved).
Some very interesting tools out there
I tested several solutions when trying to find an acceptable compromise in terms of price, performance and quality. I will not list all the solutions I studied but will present the major ones. The study of each single solution was an excellent way to get to a more precise definition of what the “ideal” solution should be.
At the very beginning, I had a look at whether I could find a decent solution based on JMS (the Java Message Service, not to be confused with Documentum's Java Method Server). There are many JMS implementations (ActiveMQ, JBoss Messaging, OpenJMS, …), but I found it difficult to find a nice end-user GUI suited for JMS-based log centralization. On the other hand, the more I read about JMS message persistence, the more I worried about the overall persistence rate: most implementations rely on standard relational databases, which do not perform well when it comes to potentially thousands of inserts per second. Bringing a NoSQL database to a JMS environment is possible, but I was nearly sure this would require more time.
-> In the ideal solution, the back-end database should not be the bottleneck for reaching up to thousands of events per second.
Splunk, which is really a fantastic piece of software, was found to be too complex for end users, mainly because of the learning curve of its search language. Splunk was also overkill in this context, and too expensive for its benefits compared to the chosen solution.
-> The ideal solution should have a straightforward and fast search interface based on simple forms using simple criteria.
Another very trendy solution for event log centralization and analysis is the combination of ElasticSearch (based on Lucene), Logstash and Kibana, aka the ELK stack. I would definitely work on such a solution if I had the time. I took a shortcut past this type of solution, as it would really require more time to set up and maintain.
-> Logstash seems to be a very good candidate for non-trivial logs forwarding
-> The ideal solution should not be too intrusive; otherwise it will require changes when upgrading to newer versions or when customizing the Webtop interface.
And here comes a (VERY) good compromise: Logfaces
In short, Logfaces is a logging solution composed of:
- a logging server which receives logs and makes them available for real-time viewing or for offline data mining (the logging server persists logs to a database). Note that although the logging server can be connected to standard RDBMS databases, it is highly recommended to connect it to MongoDB in high-throughput environments. Very important point: the logfaces server is compatible with syslog.
- a viewer interface, a thick client based on Eclipse RCP, which permits both live access to and reporting on logs
- a set of appenders which permit the asynchronous sending of logs from applications to the logfaces server. The different appenders are compatible with the most popular logging frameworks: log4j, logback, log4php, NLog, log4net, log4cpp, slf4j,…
If we consider a simple Documentum platform having Webtop as end-user interface, we can distinguish two types of components:
- recent components which run on a JVM, are developed in Java and use log4j as their logging solution: Webtop, DA, the Java Method Server, ACS, BOCS, CTS, xPlore
- “legacy” components which only produce logs in flat files: the content server process, the docbroker process.
Forwarding log4j-enabled components' logs
Forwarding logs to Logfaces from components using log4j is as simple as defining the logfaces asynchronous appender in the log4j configuration:
log4j.appender.LFS = com.moonlit.logfaces.appenders.AsyncSocketAppender
log4j.appender.LFS.application = APP-1
log4j.appender.LFS.remoteHost = host1,host2,host3
log4j.appender.LFS.port = 55200
log4j.appender.LFS.locationInfo = true
log4j.appender.LFS.threshold = ALL
log4j.appender.LFS.reconnectionDelay = 5000
log4j.appender.LFS.offerTimeout = 0
log4j.appender.LFS.queueSize = 100
log4j.appender.LFS.backupFile = c:/lfs-backup.log
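Note that defining the appender on its own is not enough: it must also be attached to a logger category. A minimal illustration (in a real Documentum log4j.properties file, LFS would simply be appended to the existing rootLogger line next to the existing appenders; the FILE name here is illustrative):

```properties
log4j.rootLogger = INFO, FILE, LFS
```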
Although INFO-level logs forwarding is sufficient for most of the “advanced” components, Java Method Server and Webtop components do not communicate much about their activity in INFO-level logs.
- By default, the only messages traced in the JMS and Webtop are ERROR-level messages. Enabling DEBUG-level messages would generate too many logs, and those would actually not be very useful.
- Modifying the JMS or Webtop code to insert custom logging code was impossible.
- Using some AOP method to inject custom logging code was considered too intrusive and dangerous in terms of potential conflicts with the out-of-the-box BOF AOP mechanism.
- Using Webtop WDK tracing logs would imply using code that is not primarily meant for production use, and would not provide logging for custom components/actions.
A custom servlet filter
The biggest part of the logfaces integration was indeed the development of a very small yet powerful servlet filter which helps the JMS and Webtop components produce useful event logs. We will also see in Part 2 that the implemented servlet filter uses log4j's MDC.
The same servlet filter is used for tracking both JMS and Webtop activities. For each call, the execution time (the time the response is sent minus the time the request was received) is traced, and some of the parameters sent in the request are added to the log4j event forwarded to the logfaces server, permitting the "contextualization" of logs (adding the username when possible, for example).
Part 2 will give the details of this servlet filter.
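To give a rough idea of the approach in the meantime, here is a minimal sketch of such a filter. This is not the actual implementation (class, attribute and MDC key names are illustrative); it only shows the two mechanisms described above: timing each request and using log4j's MDC to contextualize every event emitted while the request is being handled.

```java
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import org.apache.log4j.Logger;
import org.apache.log4j.MDC;

public class RequestLoggingFilter implements Filter {
    private static final Logger log = Logger.getLogger(RequestLoggingFilter.class);

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) req;
        long start = System.currentTimeMillis();
        try {
            // Contextualize all log events emitted while handling this request:
            // the MDC value is carried on every event forwarded to the logfaces server
            if (httpReq.getRemoteUser() != null) {
                MDC.put("userName", httpReq.getRemoteUser());
            }
            chain.doFilter(req, resp);
        } finally {
            // Trace the execution time of the call (response sent minus request received)
            long elapsedMs = System.currentTimeMillis() - start;
            log.info("uri=" + httpReq.getRequestURI() + " elapsedMs=" + elapsedMs);
            MDC.remove("userName");
        }
    }
}
```

The filter would then be declared in the web application's web.xml so that it wraps every request; Part 2 details the real filter and its mapping.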
Forwarding legacy components' logs
Several tools were tested for forwarding legacy components' log files to the logfaces server. In Linux or Unix environments, there are several ways to forward flat-file changes to a syslog server: using the logger utility is probably the easiest. On Windows, finding a decent, reliable and free (why pay for such a service?) program for such needs is a little more complicated. Logstash was found to be the best and most reliable tool to monitor Documentum legacy components' logs, such as the content server logs and the docbroker logs. As said before, Logstash is one of the most famous open-source tools for collecting, parsing and transforming event logs and forwarding them to other systems. Although Logstash has lately been integrated into the ElasticSearch stack, it is fully usable on its own. Logstash can also be used in Unix/Linux environments.
Part 3 will give the details of the logstash configuration used.
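As a rough preview, a logstash configuration for this use case could look like the following minimal sketch: tail a legacy log file and forward each new line to the syslog-compatible logfaces server. File path, host name and port are illustrative, and the exact plugin options depend on the logstash version in use.

```
input {
  file {
    # tail the content server log; new lines become individual events
    path => "C:/Documentum/dba/log/docbase.log"
    type => "content_server"
  }
}
output {
  syslog {
    # the logfaces server accepts syslog input (see above)
    host => "logfaces-host"
    port => 514
  }
}
```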
Production: figures and news from the front
The logs centralization solution has been used for 6 months now and here are the major figures:
- Performance: load tests performed in an environment comparable to our production environment show that, at standard production load (about 200 active Documentum users), the performance impact cannot be detected. We actually had some runs where the system was faster with logfaces enabled… Conclusion: no measurable performance impact. Highest log inflow rate observed in production: 960 events/sec
- Scale: we keep one month of logs in the live production database. This represents about 50 GB of data
- Stability: no problem with the logfaces server, which has been up for several months. One MongoDB database crash is under investigation; it seems to be related to the amount of allocated memory. Not a single problem with Logstash.
Production: other takeaways
In the middle of the implementation of the Logfaces solution, it became clear that the production system would not be the only environment to benefit from it. As the solution enables easy viewing of real-time logs coming from different components, the development team started using it as their main debugging/tracing tool. Having logs from both Webtop and the JMS appear in real time in the exact same interface is just priceless. This usage for development activities alone justifies its price. Having used it for several months now, I think the developers could not work without it anymore…