Competition Results

Here is the entire collection of data produced by all the planners participating in IPC3 (nearly 6MB). At the conference, we had an opportunity to identify those planners whose performance we considered to be outstanding according to the criteria identified below. Now we invite the community to draw its own conclusions based on the entire data set.

The data sets are organised under the subdirectory "IPCResults/PLANS". The data extracted from these plans can be found in the subdirectory "IPCResults/Collected". The data sets are simple files containing one line for each problem solved in a given problem set. Each line contains five values: the problem number, the plan quality, a second plan quality value, the number of steps in the plan and, finally, the time taken to produce the plan. Only plans that validated are entered.

The two plan quality measures are determined by the problem instance metric where the problem stipulates one; otherwise the metric defaults to plan length. Where plan length is used, or where a non-temporal domain uses "total-time" as its metric (which is considered equivalent to plan length for non-temporal domains), plan length is measured either as the number of steps or as the number of distinct points in the plan at which activity occurs (equivalent to "Graphplan length"). The first plan quality measure is the one derived using the number of steps and the second is the one derived using Graphplan length. In most cases the two values are identical.
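As an illustration of the file layout described above, the following Python sketch reads one collected data set into records. It is only a sketch under stated assumptions: the five values are assumed to be separated by whitespace, the problem number and step count are assumed to be integers, and the names PlanRecord and parse_results, as well as the sample lines, are invented for this example and are not taken from the IPC3 data.

```python
from typing import List, NamedTuple


class PlanRecord(NamedTuple):
    problem: int              # problem number
    quality_steps: float      # first quality value (derived using number of steps)
    quality_graphplan: float  # second quality value (derived using Graphplan length)
    steps: int                # number of steps in the plan
    time: float               # time taken to produce the plan


def parse_results(text: str) -> List[PlanRecord]:
    """Parse lines of five whitespace-separated values (assumed layout)."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        problem, q1, q2, steps, time = fields
        records.append(
            PlanRecord(int(problem), float(q1), float(q2), int(steps), float(time))
        )
    return records


# Invented sample lines, NOT real competition data.
sample = "1 9 9 9 0.02\n2 17 15 17 0.05\n"
records = parse_results(sample)

# Problems where the two quality measures disagree.
differing = [r.problem for r in records if r.quality_steps != r.quality_graphplan]
```

Since the two quality values are identical in most cases, a list such as differing offers a quick way to find the problems where the choice of plan length measure matters.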

An extra set of results for LPG, generated after the competition (due to a bug resolved after the event), is available here. It should be unpacked in the IPCResults directory and provides the Satellite-Complex set of results for both quality and speed settings.

We made a qualitative judgement based on the coverage (how many problems were tackled), the ratio of successful plans to problems tackled and the quality of the solutions generated. We also considered the speed of the planners, but believe that a difference of an order of magnitude is easily accounted for by details of implementation. We favoured high coverage and a high ratio of success in combination, making a qualitative judgement on the boundary between high coverage combined with a moderate ratio and good coverage combined with a high ratio. Because the competition was concerned with pushing the frontier of temporal and metric planning, we felt that coverage was a very important factor in judging the performance of the planners. Of course, coverage is not necessarily indicative of quality. Therefore we considered ratio an equally important criterion.

Planner (link to description)	Problems solved	Problems attempted	Success ratio	Capabilities
FF	237 (+70)	284 (+76)	83% (85%)	(Strips, Numeric, HardNumeric)
LPG	372	428	87%	(Strips, Numeric, HardNumeric, SimpleTime, Time)
MIPS	331	508	65%	(Strips, Numeric, HardNumeric, SimpleTime, Time, Complex)
SHOP2	899	904	99%	(Strips, Numeric, HardNumeric, SimpleTime, Time, Complex)
Sapa	80	122	66%	(Time, Complex)
SemSyn	11	144	8%	(Strips, Numeric)
Simplanner	91	122	75%	(Strips)
Stella	50	102	49%	(Strips)
TALPlanner	610	610	100%	(Strips, SimpleTime, Time)
TLPlan	894	894	100%	(Strips, Numeric, HardNumeric, SimpleTime, Time, Complex)
TP4	26	204	13%	(Numeric, SimpleTime, Time, Complex)
TPSYS	14	120	12%	(SimpleTime, Time)
VHPOP	122	224	54%	(Strips, SimpleTime)

Note that FF attempted 76 additional problems intended for the handcoded planners and solved 70 of them successfully. IxTeT solved 9 problems with plans accepted by the validator and attempted a further 10 problems producing plans that could not be validated. IxTeT requires recoding of the domain and problem instances from PDDL and this must be carried out by hand.
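The success ratios in the table can be recomputed from the solved and attempted counts. The Python sketch below assumes that the published figure is the ratio rounded to the nearest whole percent; the original does not state the rounding rule, so this is an assumption, although it is consistent with every row. The name success_ratio is illustrative only.

```python
# Counts copied from the results table: planner -> (solved, attempted, published %).
TABLE = {
    "FF": (237, 284, 83),
    "LPG": (372, 428, 87),
    "MIPS": (331, 508, 65),
    "SHOP2": (899, 904, 99),
    "Sapa": (80, 122, 66),
    "SemSyn": (11, 144, 8),
    "Simplanner": (91, 122, 75),
    "Stella": (50, 102, 49),
    "TALPlanner": (610, 610, 100),
    "TLPlan": (894, 894, 100),
    "TP4": (26, 204, 13),
    "TPSYS": (14, 120, 12),
    "VHPOP": (122, 224, 54),
}


def success_ratio(solved: int, attempted: int) -> int:
    """Percentage of attempted problems solved, to the nearest whole percent (assumed rule)."""
    return round(100 * solved / attempted)


# Planners whose recomputed ratio differs from the published one.
mismatches = [
    name for name, (solved, attempted, published) in TABLE.items()
    if success_ratio(solved, attempted) != published
]

# FF including the additional problems intended for the handcoded planners.
ff_combined = success_ratio(237 + 70, 284 + 76)
```

Under this rounding assumption the bracketed FF figure corresponds to 307 solved out of 360 attempted.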

On these criteria we identified one fully automated and one hand coded planner as demonstrating distinguished performance of the first order. These were:

Fully automated: LPG (Alfonso Gerevini, Ivan Serina and Team, University of Brescia, Italy)
Hand coded: TLPlan (Fahiem Bacchus and Michael Ady, University of Toronto, Canada)

We also identified one fully automated and one hand coded planner as demonstrating distinguished performance. These were:

Fully automated: MIPS (Stefan Edelkamp, Freiburg University, Germany)
Hand coded: SHOP2 (Dana Nau and Team, University of Maryland, USA)

Finally, we thought it appropriate to identify a best newcomer in order to encourage student and individual participation in the competition. As everyone knows, the work involved in successfully participating in a competition is huge and this is particularly difficult for individuals without the support of a team. In addition, entering the competition as a newcomer participating alongside well-established systems and teams is a daunting undertaking and we wanted to begin a tradition of rewarding that effort. For us that was easy, as one of the newcomers produced a high level of performance across a wide range of the competition domains. We awarded the best newcomer prize to:

VHPOP (Håkan Younes, Carnegie Mellon University, USA)

Identifying winners is difficult because it seems to undervalue the efforts of the other participants, many of whom also performed extremely well. In particular, certain planners achieved outstanding performance in particular tracks even though they did not display broad coverage of the entire data set. For example, FF out-performed its competitors in many of the Numeric and Strips problems, but it didn't compete in the temporal domains, giving it lower overall coverage. Similarly, TALPlanner exhibited extremely good performance in many of the temporal domains, but didn't participate in the numeric domains, lowering its overall coverage. Our decision to use coverage and ratio as the criteria for identifying conference prize-winners is not intended to devalue excellent performance in a smaller subset of the domains.