There is a lot of literature and debate on how to rank candidates under
preferential voting
systems. Two of the methods used to determine winners are those based on some
form of Borda count and those
based on some form of Condorcet
method. Many students of
politics and voting systems prefer Condorcet methods to Borda ones for its
stronger theoretical qualities. However, while Borda counts, especially in
their most recognizable form—average rank, is easy to calculate, Condorecet
winners are more difficult, as it requires pairwise comparisons between all
candidates. Moreover, the possibility of having a Condorcet
cycle
grows as the number of candidates grows.
One of the more common methods of solving a Condorcet paradox is the Schulze
method. This not only has some
strong theoretical qualities, but it also has a relatively simple
implementation. However,
it is slow. Pairwise ranking, in its straightforward form, is of O(n²) and
the Schulze method is O(n³) where n is the number of candidates.
In this post, Rcpp is used to significantly speed up the vote ranking process.
Here is a sample ballot with eight voters ranking five candiates:
Candidates Vote A Vote B Vote C Vote D Vote E Vote F Vote G Vote H
1 Albert 1 2 1 3 2 1 3 4
2 Bruce 2 4 5 4 5 4 1 2
3 Charles 3 1 3 1 1 2 4 5
4 David 4 5 2 5 4 5 2 1
5 Edward 5 3 4 2 3 3 5 3
Here is some simple code to calculate the average rank of the candidates:
The above ballot would result in the following Borda-based ranking:
Names Average Rank Position
1 Albert 2.125 1
3 Charles 2.500 2
2 Bruce 3.375 3
4 David 3.500 4
5 Edward 3.500 5
Here is some simplified code to calculate Condorcet and Schulze winners. The ballots have been created to ensure that there is always a unique Schulze winner. In reality, there often is not, and some further form of tiebreaking routine will be necessary:
Name Rank Method
1 Albert 1 Condorcet
2 Charles 2 Condorcet
3 Edward 3 Schulze
4 Bruce 4 Condorcet
5 David 5 Condorcet
When profiling this code using the actual ballot of 30+ people with multiple Condorcet paradoxes, 81% of the time was spent in the Schulze algorithm, another 12% was spent in the PairCount algorithm, and the remaining 7% was spent on everything else (the actual ranking had multiple steps to handle cases when there was no Schulze winner). To speed up the procedure, I ported the Schulze and PairCount functions to C++:
It is also interesting to compare these results with those obtained from byte-compiling the functions:
First, we need to check that the functions return the same values:
While byte-compiling the PairCount function gives an impressive speedup of
ariund 40%, porting it to C++ makes it over 25 times faster (an over 2400%
speedup). Results with the Schulze algorithm is even more
striking. Byte-compilation gives an increase in speed of between 3 to 3.5
times without any change to the R code, but porting it to C++ is around 100
times as fast (and was over 120 times as fast on a different machine)!
Moreover, the PairCount algorithm reads more logically in C++, as the way R
handles vectors and matrices, when subtracting the current rank, the
resulting matrix ended up transposed with the candidates across the columns.
So with easy-to-read code that results in speed gains of multiple orders of
magnitude, what’s not to like?!