OK, I'm biased. I think Java is the best language on the market today. But why should you use it when LotusScript achieves the same purpose? The answer lies in the language's many advanced features. One of these, which I will discuss in this article, is multi-threading.
A multi-threaded application is one that the programmer divides into many smaller parts (or threads) when writing the code. Each thread typically performs its own logical task within the program. The Lotus Domino server is an example of a multi-threaded application: each server task (e.g. HTTP, Agent Manager) runs in its own thread, distinct from the main server's code. The main benefit of this approach is that if one thread is waiting for a resource to become available, the other threads are not held up and can carry on working.
A project I was working on recently involved uploading a vast quantity of records from a Notes database to a relational database. The first draft of the scheduled agent was written in LotusScript using ODBC. Normally this approach would work fine, but in this scenario the entire upload had to happen overnight. We were expecting approximately 100,000 records, and at an upload speed of one record per second that equated to a total elapsed time of around 28 hours. Much too long. The task was a simple push to the relational database, so it lent itself nicely to a multi-threaded Java solution.
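The arithmetic behind the 28-hour figure, and what concurrency could buy, fits in a few lines of Java. This is a back-of-envelope sketch that assumes, optimistically, that throughput scales linearly with the number of worker threads; in practice the relational database or the network will cap it well before that. The class name UploadEstimate and the ten-thread figure used below are purely illustrative.

```java
// Back-of-envelope estimate of elapsed upload time.
// Assumption: throughput scales linearly with the number of worker threads,
// which only holds while the database and network are not the bottleneck.
class UploadEstimate {
    /** Elapsed hours for `records` rows at `secondsPerRecord`, spread over `threads` workers. */
    static double hours(long records, double secondsPerRecord, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        double totalSeconds = records * secondsPerRecord / threads;
        return totalSeconds / 3600.0;   // seconds -> hours
    }
}
```

With one thread, 100,000 × 1 s = 100,000 s, or about 27.8 hours; ten workers running flat out would bring the ideal figure down to about 2.8 hours.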
A multi-threaded program works best when each logical unit of work does not rely on the completion of another. There are three important things to consider in a multi-threaded program:
- A mechanism to limit the total number of threads
- Inter-thread communications
- Serialising access to resources
The total number of threads should be limited; otherwise the server will very quickly fill up with thousands of threads. Each thread requires some system resources and processor time. Multiply this by thousands and the performance gained by multi-threading begins to diminish.
The second factor to consider is inter-thread communications. A typical multi-threaded agent consists of a main program loop with many child threads performing the work. At the most basic level, the child threads need a mechanism to tell the parent thread that they have completed. Often the child threads will also want to access properties of the parent.
The third, and the most difficult to code, is serialising access to objects and resources. To maintain data integrity you must ensure that only one thread at a time can access a shared object or resource. Java makes this easy through the 'synchronized' keyword, which tells the JVM to allow only one thread at a time to execute the marked method or block (strictly, one thread per lock object). It is difficult to code because synchronisation carries a small performance overhead, so you need to synchronise only the critical code sections. In a large program these may be difficult to identify.
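To illustrate synchronising only the critical section, here is a minimal sketch that uses a synchronized block on a dedicated lock object rather than locking a whole method. The per-record work runs outside the lock, and only the update of the shared tally is serialised. The class and method names are hypothetical, and plain java.lang.Thread is used so that it runs outside Notes.

```java
import java.util.ArrayList;
import java.util.List;

// Only the read-modify-write of shared state is synchronised;
// the (stand-in) per-record work runs concurrently outside the lock.
class UploadTally {
    private final Object lock = new Object(); // dedicated lock object
    private int uploaded = 0;                 // shared state, guarded by `lock`

    void recordUploaded() {
        synchronized (lock) {                 // critical section
            uploaded++;
        }
    }

    int getUploaded() {
        synchronized (lock) {                 // read under the same lock for visibility
            return uploaded;
        }
    }

    /** Runs `threads` workers, each "uploading" `perThread` records; returns the final tally. */
    static int run(int threads, int perThread) {
        UploadTally tally = new UploadTally();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String row = "record-" + i; // stand-in for slow per-record work, outside the lock
                    tally.recordUploaded();     // only the shared update is serialised
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();                       // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return tally.getUploaded();
    }
}
```

Because every increment happens inside the lock, run(4, 10000) returns 40000. Without the synchronized block the total can come up short, since uploaded++ is a read-modify-write that two threads can interleave.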
Java makes creating new threads a snap. Simply create a new class that extends NotesThread, then write the code your thread should execute in its runNotes() method. Starting the thread is as simple as calling start() on a new instance of that class from the parent thread. The thread dies once runNotes() returns, and the thread object may then be garbage collected once nothing else refers to it.
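As a minimal sketch of that life cycle, the class below extends plain java.lang.Thread and overrides run() so that it compiles and runs without Notes.jar; in an agent you would extend lotus.domino.NotesThread and put the same body in runNotes(). The class name WorkerThread and its recordId field are illustrative only.

```java
// Thread life cycle sketch. java.lang.Thread stands in for lotus.domino.NotesThread;
// with NotesThread, the body below would go in runNotes() instead of run().
class WorkerThread extends Thread {
    private final int recordId;            // hypothetical: which record this thread handles
    private volatile boolean done = false;

    WorkerThread(int recordId) {
        this.recordId = recordId;
    }

    @Override
    public void run() {                    // NotesThread equivalent: runNotes()
        // ... process record `recordId` here ...
        done = true;                       // the thread dies when this method returns
    }

    boolean isDone() {
        return done;
    }

    /** Starts a worker, waits for it to die, and reports whether it ran to completion. */
    static boolean startAndWait(int recordId) {
        WorkerThread t = new WorkerThread(recordId);
        t.start();                         // executes run() on a new thread
        try {
            t.join();                      // block until the thread has died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return t.isDone() && !t.isAlive();
    }
}
```

Note that start(), not run(), is what creates the new thread; calling run() directly would simply execute the body on the caller's thread.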
And now, on to the code:
The approach I have taken is one of simplicity, partly because I'm not very bright and get confused easily, and partly because this code will be maintained by someone else who may not have intimate knowledge of the application. There are only two classes (excluding the JDBC and library-specific classes). The first is the main controlling class, which spawns the worker threads. The second is the worker thread, of which there are many instances at run-time. Keep in mind that big chunks have been removed from the code examples to improve readability. In addition, m_wcSystem and m_err are classes from a library; just ignore references to them, as they have no real bearing on the exercise. Note also that the DB2 JDBC driver jar file is attached to the agent using the "Edit Project" button. As we can't show the full source here, the following describes how the three techniques mentioned earlier are implemented. For your reference, a full copy of the source code is attached to this document.
1. A mechanism to limit the total number of threads
To limit the total number of threads running I have implemented a simple while loop. While there are more threads than the threshold (specified by the constant MAX_THREADS), the code stays "blocked" in a loop. When the worker thread completes, it decrements m_iThreadCount, effectively releasing the loop.
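A minimal sketch of this throttling loop follows. It assumes the parent increments m_iThreadCount just before starting each worker (the article only states that workers decrement it) and uses plain Thread in place of NotesThread. The short sleep inside the waiting loop is an addition in this sketch so the parent does not burn CPU while blocked; the value MAX_THREADS = 5 and the peak-tracking field are illustrative.

```java
// Throttling sketch: the parent never lets more than MAX_THREADS workers run at once.
// Plain java.lang.Thread stands in for NotesThread.
class UploadAgent {
    static final int MAX_THREADS = 5;    // hypothetical threshold
    private int m_iThreadCount = 0;      // running workers, guarded by `this`
    private int m_iPeakThreads = 0;      // illustration only: highest count observed

    synchronized int getThreadCount() { return m_iThreadCount; }

    synchronized void threadStarted() {  // called by the parent
        m_iThreadCount++;
        m_iPeakThreads = Math.max(m_iPeakThreads, m_iThreadCount);
    }

    synchronized void threadFinished() { m_iThreadCount--; }  // called by each worker

    synchronized int getPeakThreads() { return m_iPeakThreads; }

    /** Sleeps briefly; returns false if the thread was interrupted. */
    private static boolean pause(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Spawns one worker per record, blocking while MAX_THREADS are already running. */
    void uploadAll(int records) {
        for (int i = 0; i < records; i++) {
            while (getThreadCount() >= MAX_THREADS) {  // "blocked" until a worker finishes
                if (!pause(10)) return;
            }
            threadStarted();                           // count the worker before it starts
            new Thread(() -> {
                try {
                    pause(5);                          // stand-in for the per-record upload
                } finally {
                    threadFinished();                  // releases the parent's loop
                }
            }).start();
        }
        while (getThreadCount() > 0) {                 // wait for the last workers
            if (!pause(10)) return;
        }
    }
}
```

Because only the parent thread ever increments the counter, checking it and then incrementing cannot overshoot MAX_THREADS; the workers can only make room. All counter access goes through synchronized methods so that a decrement made by a worker is visible to the parent.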
2. Inter-thread communications
The worker thread's constructor takes a reference to the parent object as a parameter. This means the worker thread can access any of the public methods or properties of the parent object.
When the thread has completed (i.e. as the last line in the runNotes() method completes) we decrement the thread counter in the parent. This allows the main loop in the parent thread to create a new worker thread and start it.
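The constructor-and-callback pattern might look like the sketch below. ParentAgent, UploadWorker, getTargetTable() and the counter method names are hypothetical stand-ins for the article's classes, and java.lang.Thread again stands in for NotesThread. Placing the decrement in a finally block is a small hardening step in this sketch: the parent is told the worker has finished even if the upload throws an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Parent object that spawns workers and tracks how many are still running.
class ParentAgent {
    private int m_iThreadCount = 0;
    private int m_iCompleted = 0;

    synchronized void incThreadCount() { m_iThreadCount++; }
    synchronized void decThreadCount() { m_iThreadCount--; m_iCompleted++; }
    synchronized int getThreadCount() { return m_iThreadCount; }
    synchronized int getCompleted() { return m_iCompleted; }

    String getTargetTable() { return "UPLOAD_TARGET"; }  // a "property" workers can read

    /** Starts n workers and waits for them; returns how many reported completion. */
    int runBatch(int n) {
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            incThreadCount();
            UploadWorker w = new UploadWorker(this, i);  // pass a reference to the parent
            started.add(w);
            w.start();
        }
        for (Thread w : started) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return getCompleted();
    }
}

// Worker thread: keeps a back-reference to the object that created it.
class UploadWorker extends Thread {
    private final ParentAgent parent;
    private final int recordNo;

    UploadWorker(ParentAgent parent, int recordNo) {
        this.parent = parent;
        this.recordNo = recordNo;
    }

    @Override
    public void run() {                               // runNotes() in a NotesThread subclass
        try {
            String table = parent.getTargetTable();   // read a property of the parent
            // ... insert record `recordNo` into `table` here ...
        } finally {
            parent.decThreadCount();                  // last action: tell the parent we are done
        }
    }
}
```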
3. Serialising access to resources
Instructing the JVM to allow only one thread at a time to execute a method is as easy as adding the synchronized keyword to the method declaration. Remember that there is a slight performance penalty incurred by adding this, so ensure you only synchronise the methods and objects that require it.
A few known issues:
There are a few known issues with this code. They include:
1. The while loop does not trap thread timeout problems: if a worker thread hangs and never decrements the counter, the loop will never exit and the agent will hang. You could record a timestamp before entering the loop and exit the loop once a certain period has elapsed.
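One way to implement that timestamp check, as a minimal sketch: compute a deadline before entering the loop and give up once it has passed. AtomicInteger stands in for the synchronized counter purely to keep the example short, and the method name waitForSlot and its parameters are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Throttling wait with a timeout, so a hung worker cannot stall the parent forever.
class ThrottleWait {
    /**
     * Waits until threadCount drops below maxThreads.
     * Returns true if a slot became free, false if the timeout elapsed or the thread was interrupted.
     */
    static boolean waitForSlot(AtomicInteger threadCount, int maxThreads, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;  // timestamp taken on entry
        while (threadCount.get() >= maxThreads) {
            if (System.currentTimeMillis() >= deadline) {
                return false;                    // give up instead of looping forever
            }
            try {
                Thread.sleep(10);                // avoid a busy spin while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

If waitForSlot returns false, the main loop can log the problem and stop spawning workers rather than hanging indefinitely.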
2. Creating a new JDBC connection is an expensive operation. Connections can be reused through a connection pool, but that is well beyond the scope of this article.
3. Creating new threads is also an expensive operation. Again, there are techniques for thread pooling, but these are also beyond the scope of this article.
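For readers on a JVM that includes java.util.concurrent (Java 5 and later; the lambda below needs Java 8), a fixed thread pool addresses both this issue and the thread cap at once. This is a swapped-in technique, not the approach used in the article's code, and the class name PooledUpload is hypothetical. It also assumes that any task touching Domino objects would first call NotesThread.sinitThread() and finish with NotesThread.stermThread(), since pool threads are not NotesThread instances; check this against your Domino version before relying on it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Thread pooling sketch: a fixed pool caps concurrency (replacing the MAX_THREADS loop)
// and reuses its threads instead of creating one per record.
class PooledUpload {
    static int uploadAll(int records, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger uploaded = new AtomicInteger();
        for (int i = 0; i < records; i++) {
            final int recordNo = i;
            pool.execute(() -> {
                // Assumption: Domino work here would be wrapped in
                // NotesThread.sinitThread() ... NotesThread.stermThread().
                // ... upload record `recordNo` over JDBC ...
                uploaded.incrementAndGet();
            });
        }
        pool.shutdown();                                  // accept no new tasks
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();                       // timed out: cancel stragglers
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return uploaded.get();
    }
}
```

The pool size plays the role of MAX_THREADS: at most poolSize records are uploaded concurrently, and queued records wait for a free thread rather than a new one being created.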
4. The use of getNthDocument() while looping through very large collections (>2000 entries) is horrendously inefficient, and will cause the agent to go slower the longer it runs. It is better to use getFirstDocument() and then repeated calls to getNextDocument().
Now, with this new-found knowledge, have a look at the full source and see how it all fits together. Remember: don't write your code in Java just because it is a 'new' language; always select the right tool for the job. There are two distinct advantages to using Java in Lotus Notes. The first is the ability to use JDBC, which has much better functionality than the basic LotusScript ODBC LSX (and it's multi-platform!). The second is the performance benefit of multi-threaded agents.
Jake's Final Thoughts
I'm sure you'll all join me in thanking Brendon for this article. At last the idea of multi-threading an agent has been demystified. I for one am very grateful.
Some of you may be disappointed that this article is not like most on this site, where you get something you can go away and "Plug 'n' Play" into your database. However, I hope you see the benefits to be gained by learning and, more importantly, using this technique.
About the Author:
Brendon Upson is a freelance consultant based in Sydney, Australia, specializing in C, Lotus Notes and Java. Likes: Hoegaarden, Guinness and sailboarding. Dislikes: Fax machines, acronyms (especially anything beginning with X) and Biere d'Alsace.