«Cambridge CB3 0FD, U.K. IBM T.J. Watson Research Center P.O. Box 704, Yorktown Heights, NY 10598, U.S.A. Abstract. There are many algorithms for ...»
After step 5 we still have a working deletion algorithm, but we would like to obtain more of the properties of Yuasa, namely, bounded re-tracing of newly allocated objects triggered by roots rescanning. To do that, we perform an additional transformation where if a newly allocated pointer is stored into the heap, the object is marked as reachable for this collection cycle. This simply means that if a newly allocated pointer is stored into the heap, we always increase the SRC, ignoring what the color of the destination object is. This is clearly a trivial over-approximation transformation on Shade of the destination object as indicated in step 6. With this, the collector now only needs to trace from existing objects and not from newly allocated objects. Newly allocated objects are essentially allocated white and colored black either during roots rescanning or during a pointer store in the heap.
The Hybrid collector is particularly suited for hard real-time applications, where it is desirable to achieve a bound on the roots rescanning work while reducing the ﬂoating garbage.
The skeleton code for the algorithm is illustrated in Fig. 5. M arkRootsDirect is the procedure that performs the one-level deep rescanning procedure for the roots partition while the isAllocatedInT hisCycle bit is used to differentiate between newly allocated and existing objects.
5 Experimental Evaluation We have implemented a concurrent collector framework in IBM’s J9 virtual machine.
The collector supports both standard work-based collection (for every a units of allocation the collector performs ka units of collection work) as well as time-based collection (the collector runs for c out of q time units). This collector has been built as a secondgeneration Metronome real-time collector .
However, in this paper we will concentrate on work-based collection because its use is more common in more widely used soft real-time systems, and is likely to provide a better basis for comparison with other work. Isolated experiments have shown that the trends we report for work-based collection generally hold for time-based collection as well.
Our collector is implemented in a J2ME-based system that places a premium on space in the virtual machine. Therefore, we use the microJIT rather than the much 1.5
more resource-intensive optimizing compiler. The microJIT is a high-quality singlepass compiler, producing code roughly a factor of 2 slower than the optimizing JIT.
The system runs on Linux/x86, Windows/x86, and Linux/ARM. The measurements presented here were performed on a Windows/x86 machine with a Pentium 4 3GHz CPU and 500MB of RAM.
The measurements presented all use a collector to mutator work ratio of 1.5, that is, for every 6K allocated by the mutator, the collector processes 9K. Collection is triggered when heap usage reaches 10MB.
We have measured the SPECjvm98 benchmarks, which exhibit a fairly wide range of allocation behavior (with the exception of compress, which performs very little allocation).
Figures 6 and 7 summarize the performance of the four collector algorithms by showing the maximum heap size and total execution time respectively. Both graphs are normalized to the Hybrid algorithm, and shorter bars represent better performance (less heap usage or shorter execution times). A geometric mean is also shown. These graphs summarize more detailed performance data which can be found in the Appendix.
5.1 Space Consumption As expected, the incremental update collectors (Dijkstra and Steele) often require less memory than the snapshot collector (Yuasa). This is because the incremental update collectors allocate white (unmarked) and only consider live those objects which are 1.5
added to the graph. However, there is no appreciable difference on 2 of the ﬁve benchmarks (jess and jack), which conﬁrms that the space savings from incremental update collectors are quite program-dependent.
The use of Steele’s write barrier instead of Dijkstra’s theoretically produces less ﬂoating garbage at the expense of more re-scanning, since it marks the source rather than the target object of a pointer update. This means that if there are multiple updates to the same object, only the most recently installed pointer will be re-scanned.
However, the Steele barrier only leads to signiﬁcant improvement in one of the benchmarks (db). This is because db spends much of its time performing sort operations. These operations permute the pointers in an array, and each update triggers a write barrier. With a Steele barrier, the array is tagged for re-scanning. But with a Dijkstra barrier, each object pointed to by the array is tagged for re-scanning. As a result, there is a great deal more ﬂoating garbage because the contents of the array are being changed over time.
Finally, the hybrid collector which we introduced, a snapshot collector that allocates white (unmarked), signiﬁcantly reduces the space overhead of snapshot collection: the space overhead over the best collector is at worst 13% (for javac), which is quite reasonable.
5.2 Execution Time While the incremental update collectors are generally assumed to have an advantage in space, their potential time cost is not well understood. Incremental update collectors may have to repeatedly re-scan portions of the heap that changed during tracing.
Termination could be difﬁcult if the heap is being mutated very quickly.
Our measurements show that incremental update collectors do indeed suffer time penalties for their tighter space bounds. The Dijkstra barrier causes signiﬁcant slowdown in db, javac, mtrt, and jack. The Steele barrier is less prone to slowdown – only suffering on javac – but it does suffer the worst slowdown, about 12%. These measurements are total application run-time, so the slow-down of the collector is very large – this represents about a factor of 2 slowdown in collection time.
Once again, our hybrid collector performs very well – it usually takes time very close to the fastest algorithm. Thus the hybrid collector appears to be a very good compromise between snapshot and incremental update collectors.
Because its only rescanning is of the stack, it suffers no reduction in incrementality from a standard Yuasa-style collector, which must already scan the stack atomically.
Its advantage over a standard snapshot collector is that it signiﬁcantly reduces ﬂoating garbage by giving newly allocated objects time to die. But because it never rescans the heap, it avoids the termination problems of incremental update collectors and is still suitable for real-time applications.
As shown by the more detailed graphs in the appendix, the primary reason why the Yuasa and Hybrid algorithms are quicker is that the Dijkstra and Steele collectors both scan signiﬁcantly more data during barrier buffer processing.
The benchmark with the most unusual behavior is jack, for which the Yuasa snapshot collector uses the least memory, while the Steele algorithm uses the least time. We are still in the process of investigating this behavior.
collector will represent a good compromise between time and space efﬁciency, and has the notable advantage of snapshot collectors in terms of predictable termination.
We hope this work will spur further systematic study of algorithms for concurrent collection and further quantitative evaluation of those algorithms.
Appendix: Detailed Performance Data
References  APPEL, A. W., ELLIS, J. R., AND LI, K. Real-time concurrent collection on stock multiprocessors. In Proceedings of the SIGPLAN’88 Conference on Programming Language Design and Implementation (Atlanta, Georgia, June 1988). SIGPLAN Notices, 23, 7 (July), 11–20.
 AZATCHI, H., LEVANONI, Y., PAZ, H., AND PETRANK, E. An on-the-ﬂy mark and sweep garbage collector based on sliding views. In Proceedings of the 18th ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications (Oct 2003), ACM Press, pp. 269–281.
 BACON, D. F., ATTANASIO, C. R., LEE, H. B., RAJAN, V. T., AND SMITH, S. Java without the coffee breaks: A nonintrusive multiprocessor garbage collector. In Proc. of the SIGPLAN Conference on Programming Language Design and Implementation (Snowbird, Utah, June 2001). SIGPLAN Notices, 36, 5 (May), 92–103.
 BACON, D. F., CHENG, P., AND RAJAN, V. T. A real-time garbage collector with low overhead and consistent utilization. In Proceedings of the 30th Annual ACM SIGPLANSIGACT Symposium on Principles of Programming Languages (New Orleans, Louisiana, Jan. 2003). SIGPLAN Notices, 38, 1, 285–298.
 BACON, D. F., CHENG, P., AND RAJAN, V. T. A uniﬁed theory of garbage collection. In Proceedings of the ACM Conference on Object-Oriented Systems, Languages, and Applications (Vancouver, British Columbia, Oct. 2004), pp. 50–68.
 BAKER, H. G. List processing in real-time on a serial computer. Commun. ACM 21, 4 (Apr. 1978), 280–294.
 BAKER, H. G. The Treadmill, real-time garbage collection without motion sickness. SIGPLAN Notices 27, 3 (Mar. 1992), 66–70.
 BARTH, J. M. Shifting garbage collection overhead to compile time. Commun. ACM 20, 7 (July 1977), 513–518.
 BEN-ARI, M. Algorithms for on-the-ﬂy garbage collection. ACM Trans. Program. Lang.
Syst. 6, 3 (1984), 333–344.
 BOEHM, H.-J., DEMERS, A. J., AND SHENKER, S. Mostly parallel garbage collection. In PLDI ’91: Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation (1991), ACM Press, pp. 157–164.
 BROOKS, R. A. Trading data space for reduced time and code space in real-time garbage collection on stock hardware. In Conference Record of the 1984 ACM Symposium on Lisp and Functional Programming (Austin, Texas, Aug. 1984), G. L. Steele, Ed., pp. 256–262.
 CHEADLE, A. M., FIELD, A. J., MARLOW, S., PEYTON JONES, S. L., AND WHILE, R. L. Non-stop Haskell. In Proc. of the Fifth International Conference on Functional Programming (Montreal, Quebec, Sept. 2000). SIGPLAN Notices, 35, 9, 257–267.
 CHENG, P., AND BLELLOCH, G. E. A parallel, real-time garbage collector. In Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation (Jun 2001), ACM Press, pp. 125–136.
 DIJKSTRA, E. W., LAMPORT, L., MARTIN, A. J., SCHOLTEN, C. S., AND STEFFENS, E. F. M. On-the-ﬂy garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (1978), 966–975.
 DOMANI, T., KOLODNER, E. K., LEWIS, E., SALANT, E. E., BARABASH, K., LAHAN, I., LEVANONI, Y., PETRANK, E., AND YANORER, I. Implementing an on-the-ﬂy garbage collector for java. In Proceedings of the second international symposium on Memory management (Oct 2000), ACM Press, pp. 155–166.
 DOMANI, T., KOLODNER, E. K., AND PETRANK, E. A generational on-the-ﬂy garbage collector for Java. In Proc. of the SIGPLAN Conference on Programming Language Design and Implementation (June 2000). SIGPLAN Notices, 35, 6, 274–284.
 HENRIKSSON, R. Scheduling Garbage Collection in Embedded Systems. PhD thesis, Lund Institute of Technology, July 1998.
 HUDSON, R. L., AND MOSS, E. B. Incremental garbage collection for mature objects.
In Proc. of the International Workshop on Memory Management (St. Malo, France, Sept.
1992), Y. Bekkers and J. Cohen, Eds., vol. 637 of Lecture Notes in Computer Science.
 JOHNSTONE, M. S. Non-Compacting Memory Allocation and Real-Time Garbage Collection. PhD thesis, University of Texas at Austin, Dec. 1997.
 LAMPORT, L. Garbage collection with multiple processes: an exercise in parallelism. In Proc. of the 1976 International Conference on Parallel Processing (1976), pp. 50–54.
 LAROSE, M., AND FEELEY, M. A compacting incremental collector and its performance in a production quality compiler. In Proc. of the First International Symposium on Memory Management (Vancouver, B.C., Oct. 1998). SIGPLAN Notices, 34, 3 (Mar., 1999), 1–9.
 LEVANONI, Y., AND PETRANK, E. An on-the-ﬂy reference counting garbage collector for java. In Proceedings of the 16th ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications (Oct 2001), ACM Press, pp. 367–380.
 NETTLES, S., AND O’TOOLE, J. Real-time garbage collection. In Proc. of the SIGPLAN Conference on Programming Language Design and Implementation (June 1993).
SIGPLAN Notices, 28, 6, 217–226.
 NORTH, S. C., AND REPPY, J. H. Concurrent garbage collection on stock hardware. In Functional Programming Languages and Computer Architecture (Portland, Oregon, Sept.
1987), G. Kahn, Ed., vol. 274 of Lecture Notes in Computer Science, pp. 113–133.
 PIXLEY, C. An incremental garbage collection algorithm for multi-mutator systems. Distributed Computing 3, 1 6, 3 (Dec. 1988), 41–49.
 ROTH, D. J., AND WISE, D. S. One-bit counts between unique and sticky. 49–56.
 STEELE, G. L. Multiprocessing compactifying garbage collection. Commun. ACM 18, 9 (Sept. 1975), 495–508.
 STEELE, G. L. Corrigendum: Multiprocessing compactifying garbage collection. Commun. ACM 19, 6 (June 1976), 354.
 YUASA, T. Real-time garbage collection on general-purpose machines. Journal of Systems and Software 11, 3 (Mar. 1990), 181–198.