by Nick Entin
In this article I will describe the critical performance factors of the Polarion platform, scalability pitfalls and limitations, and recommendations related to capacity-planning your production environment, all based on several scenarios that are representative of our install base.
System Configuration Landscape
Polarion is a web-based application. Clients interface with it through a supported web browser. Consequently, Polarion can be accessed via LAN, WAN (e.g. company-internal inter-office networks), Internet or VPN.
Basic Polarion System Architecture
Polarion is built on top of a number of open source frameworks and APIs. All of these components have their own performance and scalability characteristics, only some of which are relevant to Polarion, due to the way these components are used by the Polarion platform. Some factors may not even come into play except in very large, or very heavily stressed environments.
Polarion Application Architecture
Apache HTTP Server
Apache is the de facto web server solution, rivaled only by IIS on Windows platforms. It is extremely robust, and powers some of the largest web sites on the Internet.
Scalability and Performance
There are no known issues related to Apache as a performance bottleneck for Polarion.
The Polarion installation procedure installs and configures Apache for you. To avoid unnecessary performance degradation or system downtime, we recommend that you do not deviate from this configuration unless you have a very specific reason to, and then only after consulting your Polarion Support Team.
Polarion also interprets folder permissions in Subversion to determine whether a particular user has read and write access to a document or work item. The more complex the access file is, the more time and memory Polarion may need to process it. If your access file is very complex (thousands of rules) mainly because it manages permissions to source code, and those rules are of little relevance to Polarion, you may switch off parsing of the file by Polarion.
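To judge whether an access file is in the "thousands of rules" range, its rules can simply be counted. The following is a minimal sketch, assuming the usual Subversion path-based authorization layout (section headers in square brackets, one "name = access" rule per line, "#" comments). The function name and the sample content are made up for illustration, and line continuations are not handled.

```python
SAMPLE_AUTHZ = """\
# hypothetical access file content
[groups]
devs = alice, bob

[/]
* = r

[repo:/project/src]
@devs = rw
carol =
"""


def count_access_rules(authz_text: str) -> int:
    """Count permission rules, skipping comments and name definitions."""
    rules = 0
    section = None
    for raw_line in authz_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        # [groups] and [aliases] only define names; they grant no access.
        if section is not None and section not in ("groups", "aliases") and "=" in line:
            rules += 1
    return rules


if __name__ == "__main__":
    print(count_access_rules(SAMPLE_AUTHZ))
```

Run against a real access file, the count gives a rough indication of how much work the parsing involves.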
Since its initial launch in 2004, Subversion has quickly become the leading version control solution. It is now widely used in organizations of all sizes worldwide. Polarion uses Subversion to store almost all of its configuration and data. (Exceptions include trend and reporting data, as well as log files.)
Scalability and Performance
One of Subversion’s key strengths is also its primary performance bottleneck. Uploading large change sets to the repository consumes network bandwidth, and data will be queued if there are many requests.
That said, Polarion change sets are typically in the range of kilobytes, which is far from being heavy lifting for Subversion.
Subversion has no proven upper limit as far as repository size is concerned. The Apache Software Foundation uses a single repository for all of its projects (including Subversion itself, as well as the Apache HTTP Server project), which consists of well over 1,300,000 revisions (as of June 2012 - see http://svn.apache.org/repos/asf/).
Apache has released Subversion 1.7, in which performance was substantially improved by changing the working copy format and using a simpler HTTP protocol (sometimes referred to as HTTPv2).
Since Polarion doesn’t use working copies and the connection between Polarion and SVN is practically permanent, those improvements don’t have a significant impact on Polarion (tests show a 1-5% improvement). Nevertheless, if you also use Polarion’s SVN repository to host your source code, and your developers frequently access it, then it makes sense to use SVN 1.7.x. This version is now bundled with Polarion.
External Repositories – SVN and Git
Polarion 2012 offers the possibility to attach multiple source code repositories to one Polarion project. Supported systems include Subversion and Git. Those repositories are scanned for new revisions that should be linked to corresponding Work Items for traceability purposes. When a link is recognized, a user may open the corresponding commit via an external SVN or Git browser. A very good connection between Polarion and those repositories is not mandatory: the slower or less reliable the connection between those servers, the longer it takes before linked revisions are shown in the Polarion UI. However, you can be sure that nothing gets lost.
The key factor for optimal SVN performance is to limit the amount of processing that is done in the various commit hooks on the Subversion repository itself in order to boost the operations’ performance. Any non-trivial automation should be implemented to run out-of-process of the commit itself whenever and wherever possible.
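One way to keep a hook cheap is to let it only record the new revision and return, leaving the real work to a separate process. The sketch below is an illustration under assumptions, not a hook shipped with Polarion: the spool directory name "hook-spool", the job file format and the function name are hypothetical. It relies only on the fact that Subversion passes the repository path and the revision number to a post-commit hook.

```python
#!/usr/bin/env python3
"""post-commit hook sketch: queue the revision and return immediately."""
import os
import sys
import tempfile


def enqueue_revision(spool_dir: str, repos: str, rev: str) -> str:
    """Write a small job file for an out-of-process worker; return its path."""
    os.makedirs(spool_dir, exist_ok=True)
    # Write to a temporary file first, then rename it, so that a worker
    # polling the directory never picks up a half-written job.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{repos}\n{rev}\n")
    job_path = os.path.join(spool_dir, f"r{rev}.job")
    os.replace(tmp_path, job_path)
    return job_path


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Subversion calls post-commit with the repository path and revision.
    repos_path, revision = sys.argv[1], sys.argv[2]
    enqueue_revision(os.path.join(repos_path, "hook-spool"), repos_path, revision)
```

A separate worker (a cron job or a small daemon) would then pick up the ".job" files and run the expensive automation, so the commit itself never waits for it.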
Subversion requires ultra-fast access to its file system. Anything other than physically attached storage is strongly discouraged. In other words, the best performance can only be achieved with the SVN repository stored on the same server as Polarion or on a comparably fast NAS solution.
Be aware that Polarion and SVN use a lot of small files, so performance of the disk subsystem is crucial. Also note that operating systems cache the file system, and you need to reserve enough RAM for the OS caches used by file operations. As a rule of thumb, if you have more than 8GB of RAM available on the server and work with big projects, we recommend reserving half of the server RAM for Polarion and leaving the rest for OS caches.
Polarion stores all of its (XML) data, and its configuration in Subversion. This means we have to bridge the gap between not having a database back end, and being able to query and quickly retrieve and filter data. Apache Lucene is a powerful indexing framework, and Polarion uses it to implement what we affectionately refer to as "The Index", which is what the Polarion platform uses for all of its read operations.
Scalability and Performance
Lucene has built-in mechanisms to balance fast in-memory storage with on-disk overflow to limit the memory footprint of larger data sets. Beyond that, there is no known upper limit to what Lucene can handle. It scales in an almost perfectly linear fashion when increasing the number of concurrent requests.
The critical performance factor for Lucene is the number of returned results. To mitigate this, Polarion makes heavy use of lazy loading whenever possible.
Encourage all users to narrow the scope of their queries to extract more relevant (and therefore a smaller number of) results.
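The effect of lazy loading on large result sets can be illustrated with a generator that fetches results page by page, and only when the caller actually asks for them. This is a generic sketch of the technique, not Polarion's implementation: the fetch_page callable, its (offset, limit) signature and the page size are assumptions made for the example.

```python
from itertools import islice
from typing import Callable, Iterator, List


def lazy_results(fetch_page: Callable[[int, int], List[str]],
                 page_size: int = 50) -> Iterator[str]:
    """Yield results one by one, fetching the next page only on demand."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += len(page)


def first_n(results: Iterator[str], n: int) -> List[str]:
    """Take the first n results without consuming the rest."""
    return list(islice(results, n))
```

If a query matches 1,000 items but the user only looks at the first screen, only the first page is ever fetched. A narrower query reduces the cost further, because fewer pages exist in the first place.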
Polarion 2012 introduces a new indexing mechanism, which allows more sophisticated querying, including joined queries and historical search. This mechanism is natively combined with Lucene in such a way that simple queries might be run against Lucene while more complex ones run against the SQL database.
Scalability and Performance
Polarion uses an embedded H2 Database, which the open source community has proven to be a fast and reliable SQL store. It doesn’t add a significant footprint to Polarion’s memory consumption, but it does add significantly to storage requirements: approximately 5-10 times more than Lucene takes.
During the initial reindex of the repository after updating an older version to Polarion 2012, database population may take significant time (as much as 1-2 days), depending on the size of the repository and how many revisions of Polarion documents and work items already exist in it. This process can be safely interrupted; it will continue automatically after server restart. Before the historical index is fully populated, access to baselines and historical searches may be limited.
Review any Traceability matrices and reports written for Polarion 2011 or previous versions. These can possibly be optimized for better performance using the new SQL support in macro queries on work items, replacing Lucene queries that perform poorly.
Since their inception in version 2011, Polarion's artifact-aware LiveDoc online documents ("Documents") have gained significant attention in the fields of Requirements Management and Quality Assurance & Testing. They combine the familiarity and ease of use of office documents with the ability to contain granular artifacts like requirements and test cases ("Work Items") that are tracked and managed with automated process workflow.
Scalability and Performance
Work items contained in a LiveDoc document are stored separately from other work items in a project, and are treated differently because of the added constraints of the LiveDoc concept. As a result, Documents containing fewer than 5,000 work items are well supported, while exceeding that number might degrade performance significantly.
Polarion 2012 SR1 addresses several cases better than in previous releases. Performance of server and client in particular is optimized to support up to 1000 comments per document without significant degradation of performance. The release also adds optimized functionality for merging of concurrent modifications to one document, allowing structural changes to be done in different sections of the document without raising an overwrite conflict.
No single Document should contain more than 5,000 Work Items.
Overall Performance and Scalability
This topic needs to be looked at from two separate yet related perspectives: load and volume of data.
When looking at Polarion as a whole, you will notice it scales almost perfectly linearly on dual-core CPU platforms. When moving to 8 CPU cores, the application scales, for all practical purposes, without limit.
The graph shows that, with 100 concurrent users actively working with the server, 80% of save operations are served in less than 4 seconds on a single Intel i7 CPU platform.*
* If 100 users each change work items every 5 minutes, there is a statistically computed 80% probability that fewer than 5 save operations run in parallel.
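For your own capacity planning, a similar estimate can be made with a simple binomial model. The article does not describe the model behind the 80% figure, so the sketch below is an assumption and will not necessarily reproduce that number: it treats users as independent, and the duration of one save is a parameter you have to supply yourself.

```python
from math import comb


def prob_fewer_than(k: int, users: int, save_seconds: float,
                    interval_seconds: float) -> float:
    """P(fewer than k saves in progress at a random moment).

    Assumed model: each user is independently "saving" for save_seconds
    out of every interval_seconds, so the number of parallel saves is
    Binomial(users, save_seconds / interval_seconds).
    """
    p_busy = min(1.0, save_seconds / interval_seconds)
    return sum(
        comb(users, i) * p_busy ** i * (1.0 - p_busy) ** (users - i)
        for i in range(min(k, users + 1))
    )


if __name__ == "__main__":
    # 100 users, one change every 5 minutes, assumed 4-second saves.
    print(prob_fewer_than(5, 100, 4.0, 300.0))
```

Varying save_seconds shows how quickly the chance of parallel saves grows when individual saves become slower.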
As long as Polarion’s data remains within the parameters set out earlier in this document, scalability of volume is limited to memory (RAM) consumption, and manifests itself as a largely linear relationship between number of projects in one repository and Polarion’s overall memory footprint.
The number of work items in the repository has low impact on Polarion's performance, and scalability is not affected unless you often query all work items at once.
The number of open projects has a high impact on memory consumption, because project configurations, permissions, roles and other information are kept in memory for performance reasons. We recommend:
Run no more than 500 projects on one instance of Polarion.
Close inactive and completed projects to recoup resources.
If you need more than 500 active projects, consider using a multi-instance installation of Polarion to split projects across several Polarion servers. Hosting multiple instances on one server may be helpful, but the preferred way is to distribute the load across several physical machines.
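The arithmetic behind such a split is straightforward. The sketch below assumes the 500-project limit recommended above and spreads projects evenly in round-robin fashion; the function names are made up for the example, and a real split would also take project size and team location into account.

```python
import math
from typing import List

MAX_PROJECTS_PER_INSTANCE = 500  # recommendation from this article


def instances_needed(active_projects: int,
                     limit: int = MAX_PROJECTS_PER_INSTANCE) -> int:
    """Smallest number of instances that keeps each one within the limit."""
    if active_projects < 0 or limit <= 0:
        raise ValueError("active_projects must be >= 0 and limit > 0")
    return max(1, math.ceil(active_projects / limit))


def split_projects(projects: List[str],
                   limit: int = MAX_PROJECTS_PER_INSTANCE) -> List[List[str]]:
    """Distribute projects evenly across the required number of instances."""
    count = instances_needed(len(projects), limit)
    return [projects[i::count] for i in range(count)]
```

For example, 1,200 active projects would need three instances with 400 projects each, rather than two full instances and one nearly empty one.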
The number of registered users has medium impact. User management administration may be affected (slow rendering of the pages) if there are more than 1000 users registered in one instance of Polarion.
The number of active users has high impact. Processing is parallelized, but once the number of threads exceeds the number of available CPUs, a processing queue forms and response times grow accordingly.
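The queuing effect can be illustrated with a deliberately simplified model: a burst of equal, purely CPU-bound requests arriving at the same moment, each occupying one CPU for a fixed time. This is an assumption for illustration only; real requests differ in cost and also wait on disk and network.

```python
import math


def worst_case_response_s(concurrent_requests: int, cpus: int,
                          service_s: float) -> float:
    """Time until the last request of a simultaneous burst completes."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    if concurrent_requests <= 0:
        return 0.0
    # Requests are processed in "waves" of at most `cpus` at a time.
    waves = math.ceil(concurrent_requests / cpus)
    return waves * service_s
```

With 8 CPUs and requests that take half a second each, 8 parallel requests finish in 0.5 seconds, while a ninth request has to wait for a free CPU and finishes after 1 second.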
Hosting by Polarion Software
Polarion Software offers customers hosting for their Polarion installation(s). The Polarion edition(s) you license will be installed for you on Amazon Cloud. We’ll take care of optimizing the server for your needs, plus install regular updates to future releases and handle all technical administration of the server.
Reference Customer Installations
The following are examples from our customer base showing a range of customer installations. Examples are for the purpose of illustrating what kinds of numbers you may need to be thinking of when planning your own installation’s scalability.
15,000 work items
300,000 work items
150,000 work items
External Factors and Recommendations
Like most other rich web-based applications, Polarion caches a lot of dynamic content in the browser. As a result, memory consumption of the browser process can balloon over time. Polarion recommends that you close your browser after using Polarion, to keep this from becoming a problem. This problem mostly affects users of Internet Explorer; Firefox and Chrome have better-optimized memory management.
Polarion can be heavily dependent on disk operations, especially as the server scales to where a growing portion of the index is serialized to the file system.
Performance of the index can be sensitive to disk fragmentation. No operating system is immune to this. Most Linux distributions do give the option of using the ext3 file system, which has features to prevent meaningful fragmentation altogether. In all other cases, regularly scheduled defragmentation is highly recommended.
Subversion requires local or as-fast-as-local file system access to the repository. We strongly recommend either an internal drive, or attached storage (fiber-optic connection). If network-attached storage (NAS) must be used, the length, speed and stability of the network path between server and storage are absolutely critical.
Use of Solid-State Disks (SSD) should be considered carefully. Relatively low-priced devices suffer from degrading write performance over time. Since Polarion and Subversion make a lot of small writes to the disk, we recommend using only performance-proven SSDs for Polarion.
Real-time virus scanning can cripple file system performance like nothing else. We recommend you exclude the Subversion repository file structure and all on-disk Polarion data (c:\polarion\data or /opt/polarion/data in default install) from being scanned, and schedule any scans you feel are needed to run overnight or other off-hours.
Windows is generally slower than Linux in all relevant areas (up to 5% slower in file system operations). Beyond ease of installation, there is no recommendation as to a particular flavor of Linux.
Selecting a 64-bit rather than a 32-bit operating system allows more memory to be assigned to the server process. Even if this is not an immediate concern, it makes for much easier scalability, as replacing a 32-bit with a 64-bit operating system down the road is going to be an intrusive exercise. A 32-bit operating system is fine for evaluation purposes.
For production environments, we recommend a 64-bit Linux with at least 8GB of RAM.
Make sure that the OS has enough file handles available for Polarion. Since Windows has a reasonably high default, this is mostly a Linux-specific issue. The Polarion process should have access to 32K file handles for stable performance.
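On Linux, the limit that applies to a process can be checked with Python's standard resource module. The sketch below reads "32K" as 32 x 1024 handles, which is an interpretation of the figure above, and it reports the limit of the process it runs in, so it should be run as the same user and from the same environment that starts Polarion.

```python
import resource

REQUIRED_HANDLES = 32 * 1024  # "32K file handles", read as 32768 (assumption)


def open_file_limit_ok(required: int = REQUIRED_HANDLES) -> bool:
    """True if the soft open-file limit of this process is high enough."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} ok={open_file_limit_ok()}")
```

If the check fails, the limit has to be raised in the operating system configuration for the account that runs the server; how this is done depends on the distribution.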
The Polarion client interface relies heavily on quick, short bursts of communication with the server. Network latency is a major factor in client performance degradation. For this reason, network round-trip (ping) between client and server should ideally be no worse than 150 milliseconds.
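Where ICMP ping is blocked, the time needed to open a TCP connection gives a rough approximation of one network round trip. The following sketch measures it from a client machine and compares it with the 150 millisecond guideline above. The function names are made up for the example, and the host and port are whatever your Polarion server actually listens on.

```python
import socket
import statistics
import time

MAX_ROUND_TRIP_MS = 150.0  # guideline from this article


def tcp_connect_rtt_ms(host: str, port: int, samples: int = 5,
                       timeout: float = 2.0) -> float:
    """Median time in milliseconds to open a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it right away
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def within_latency_budget(rtt_ms: float,
                          budget_ms: float = MAX_ROUND_TRIP_MS) -> bool:
    """True if the measured round trip meets the guideline."""
    return rtt_ms <= budget_ms
```

The median is used instead of the average so that a single slow sample does not distort the result.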
Virtual environments come at a performance cost, since hardware components such as memory, network, graphics and even storage, are emulated by software. As a result, any application that runs in a virtual machine (VM) will perform worse compared with running the same application on dedicated physical hardware with the same specification.
Please be aware that our hardware recommendations for the physical server environment are applicable to any specific virtual machine on which Polarion runs, and not just to the physical host server machine, where many VMs may run in parallel.
Polarion Software recommends that you only run Polarion on Linux virtual machines. This is largely due to Windows being a slower, larger-footprint operating system to begin with, a fact that is only amplified by adding virtualization to the mix.
Example Hardware Configurations
32 or 64-bit
GB RAM (dedicated to Polarion)
Storage for Polarion
1TB+ (SCSI or similar)
1TB+ (RAID 10, NAS, SAN)
1TB+ (RAID 10, NAS, SAN)
Make sure that there is enough RAM available to the OS for file caching. If SVN is hosted on a different machine, more memory can be allocated to the Polarion process.
About the Author:
Nick Entin is Vice President for Research & Development at Polarion Software and a Certified Scrum Master.