What's a rating system?
A rating system is a method for assessing a player's strength in games of skill. We believe fencing, and by extension HEMA, is a game of skill.
The rating is based purely on match results: it is calculated from the ratings of your opponents and how well you did against them. The rating system has no clue about how close matches were or how well you fenced.
We use the Glicko-2 algorithm, created by Harvard statistics professor Mark Glickman. You can read more about the algorithm on Glickman's website.
Why do you do this?
We would like for this to exist.
What tournaments do you accept into the system?
Our criteria aren't very strict, but it needs to be a serious HEMA tournament.
We've established some guidelines:
- It needs to be HEMA.
- There must be members from at least two different clubs competing.
- The event should be open to all eligible fighters. We obviously allow beginner tournaments and invitational tournaments, but we will exclude tournaments that are only open to fencers from a given club, federation, etc. One goal of HEMA Ratings is to integrate the community, not to create "islands".
- It needs to be judged/reffed. A judged competition that permits some self-calling is allowed, but tournaments that rely exclusively on the "Honour System" for scoring do not meet our criteria.
- The outcome of the individual matches must be distillable into a win/loss/draw format.
- Each match needs to be symmetrical and stand on its own. This means that points can't "carry over" between matches, neither fighter can start with more points than the other, one fighter can't be disadvantaged by a different scoring system, etc.
- For rating purposes the weapons need to be symmetrical. No dagger vs. pollaxe or similar. That being said, we can still import mixed-weapon tournaments to include them in fighters' records.
The criteria are a work in progress, but it should be possible to get the general intention based on the above. If your tournament meets those loose criteria and you can provide us with the data, we'll be happy to include it in the system!
We have a template for data entry that we're happy to send you, so get in touch and we'll send you everything you need.
What can you use this rating data for?
Mostly bragging rights and because it's fun. It's also a useful tool for seeding pools in tournaments. It's certainly not a be-all-end-all. A number can obviously never capture everything.
My name is wrong / my club is wrong / you are wrong
Please get in touch with us via our contact page.
You should add [awesome feature]
Thanks for the suggestion! This system is currently in beta and it's been thrown together pretty quickly in order to put it out there for the community. As we get closer to launching the Real Deal, we will implement features including (but not limited to):
- Filters in all tables
- Top 10 history per division
- Fighter profile pictures
I want to help!
Great! Get in touch with us at our contact page. We're particularly looking to work with contributors who can add more matches from more tournaments. We have a template for data entry that we're happy to send you, so get in touch and we'll send you everything you need.
What data do you use?
We have a list of matches pulled from various tournaments, and we use the Glicko-2 algorithm to calculate the ratings based on this. For a complete list of events, see the "Events" page.
Our goal is to have as many matches as possible in our database, spanning as many years as possible.
Currently (February 2017) our database covers over 10 000 matches, and we're constantly on the lookout for more.
Okay. I get what a rating is. How does this work?
The key assumptions at work here are the following:
The performance of each player in each match is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, we assume that the mean value of the performances of any given player changes only slowly over time.
Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.
Suppose two players, both rated 1700, played a tournament game with the first player defeating the second. Suppose that the first player had just returned to tournament play after many years, while the second player plays every weekend. In this situation, the first player’s rating of 1700 is not a very reliable measure of his strength, while the second player’s rating of 1700 is much more trustworthy.
Our intuition tells us that:
- the first player’s rating should increase by a large amount (more than 16 points), because his rating of 1700 is not believable in the first place, and defeating a player with a fairly precise rating of 1700 is reasonable evidence that his strength is probably much higher than 1700;
- the second player’s rating should decrease by a small amount because his rating is already precisely measured to be near 1700, and that he loses to a player whose rating cannot be trusted, so that very little information about his own playing strength has been learned.
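This intuition can be checked numerically. HEMA Ratings uses Glicko-2, but the simpler original Glicko update is enough to see the effect; the sketch below uses it, with the rating deviations of 300 (the returning player) and 50 (the weekly competitor) as illustrative assumptions.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    # Dampens the impact of an opponent whose own rating is uncertain.
    return 1 / math.sqrt(1 + 3 * (Q * rd) ** 2 / math.pi ** 2)


def glicko_update(r, rd, r_opp, rd_opp, score):
    # Single-match Glicko (step 1) update; returns (new_rating, new_rd).
    # score is 1 for a win, 0.5 for a draw, 0 for a loss.
    e = 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))
    d2 = 1 / (Q ** 2 * g(rd_opp) ** 2 * e * (1 - e))
    denom = 1 / rd ** 2 + 1 / d2
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


# Returning player: rating 1700, but with a high deviation of 300.
a_new, a_rd = glicko_update(1700, 300, 1700, 50, 1)  # wins, jumps by ~150
# Weekly competitor: rating 1700, well established (deviation 50).
b_new, b_rd = glicko_update(1700, 50, 1700, 300, 0)  # loses only a few points
```

The winner's large deviation lets his rating jump well over 100 points, while the loser's well-established rating barely moves, exactly as the intuition above predicts.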
What does this mean in practice?
If you win fights, you gain points. If you lose fights, you lose points. If you perform better than expected (win against higher rated opponents), you will gain a lot of points. If you perform worse than expected, you may lose a lot of points.
As you compete, your rating deviation gets smaller because the rating system has a more accurate estimate of your rating. If you don't compete, your rating deviation will grow.
Why does my rating go down when I don't compete?
First, it's important to understand the fundamentals of the rating system we use.
In the Glicko rating system your rating actually consists of two numbers:
- Score - This is the system saying "I think you're this good ..."
- Rating Deviation (RD) - This is the system saying "... and this is how confident I am that I'm right". One RD is equivalent to one standard deviation.
The lower the RD, the less uncertain the system is about your performance. As you compete, your RD will normally go down, unless you perform very unevenly, e.g. losing to lower-rated fighters and defeating higher-rated fighters. When you don't compete, however, your RD will rise slightly every month because the system is becoming increasingly uncertain about where you actually belong.
Since most HEMA practitioners don't compete that much, everyone has a relatively high deviation, meaning that the system is never very sure about your performance. In order to smooth out some of the error that comes with this, we've decided to implement a "weighted rating", which is the score - 2 * rating deviation. This is essentially the same as saying "I'm 97.5% confident that your score should be at least this high".
Since the deviation is part of this weighted rating, the monthly increase in deviation will translate to a slight drop in rating for the months you don't compete.
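The two moving parts can be sketched in a few lines. The RD growth function follows the standard Glicko form; the constants `c` and the 350 cap are illustrative assumptions, not HEMA Ratings' actual values.

```python
import math


def inflate_rd(rd, months_inactive, c=10.0, rd_max=350.0):
    # Glicko-style deviation growth per inactive period, capped at rd_max.
    # c controls how fast uncertainty returns; its value here is illustrative.
    return min(math.sqrt(rd ** 2 + c ** 2 * months_inactive), rd_max)


def weighted_rating(score, rd):
    # "I'm 97.5% confident that your score should be at least this high."
    return score - 2 * rd


before = weighted_rating(1600, 80)                   # 1600 - 160 = 1440
after = weighted_rating(1600, inflate_rd(80, 6))     # lower: RD has grown
```

The score itself never changed, but six inactive months inflated the RD, so the weighted rating drifted down. That drift is the "drop" you see when you don't compete.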
Why do you have separate lists for different weapons? Why do you have separate lists for Men and Women?
The rating system assumes all fighters in a list could face each other in any given tournament, and that past performance is predictive of future performance. We don't believe past performance in a Rapier tournament is a strong indicator of how well someone would do in a Longsword tournament.
Where does your data come from?
We graciously use data provided to us by HEMA CM, the premier name in competition management software. The initial dump of a few thousand matches from HEMA CM is what got us started down this road in the first place.
Furthermore, we have had awesome help from many volunteers who have organized events, recorded results from videos, provided us with paper records from old tournaments, written tournament management software, and more.
What is "Island Effect"?
"Island Effect" is what happens when you have a division with little or no overlap between subgroups.
For example: imagine that there's a large group of active sabreurs in Norway, South Africa and Australia. All three scenes organize multiple tournaments over many years, but never travel abroad to compete with the other two nations. Each scene has a fighter who stands out as the best, beating everyone else in their country and ending up with a weighted rating of 2000.
The question now is: who's better, the Norwegian, the Australian or the South African sabre champion? The truth is that without "cross-pollination" between the scenes it's impossible to know, because the three scenes are essentially "islands" in the sea of sabre with independent ratings. It's possible that they're equally good, but it's just as likely that one scene is way ahead of the others, and you can't know which before there's crossover between the islands.
How can you combat Island Effect?
The good thing about the algorithm we use for rating is that not everyone needs to fight everyone in order for the results to have an effect. If, for example, a few of the top rated fighters from an island travel to another island, they will either come back with a reduced (if their island was worse) or an increased (if their island was better) rating, which will in turn affect the fighters from their own scene. If the top three fighters from an island come back home 200 points lower after having taken a solid beating, but still beat everyone else on their island, everyone else on that island will also drop.
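A toy simulation shows the propagation. For simplicity this uses plain Elo updates rather than Glicko-2 (the propagation works the same way); the ratings, the five-loss trip, and the K-factor of 32 are all illustrative assumptions.

```python
def expected(ra, rb):
    # Elo expected score of player a against player b.
    return 1 / (1 + 10 ** ((rb - ra) / 400))


def update(ra, rb, score_a, k=32):
    # One Elo update; score_a is 1 if a wins, 0 if a loses.
    ea = expected(ra, rb)
    return ra + k * (score_a - ea), rb + k * (ea - score_a)


champ, rival, foreigner = 2000.0, 1800.0, 2000.0

# How much a home rival drops for a loss to the champion *before* any travel:
_, rival_before = update(champ, rival, 1)
drop_before = rival - rival_before

# The champion travels and takes a solid beating abroad (five straight losses).
for _ in range(5):
    champ, foreigner = update(champ, foreigner, 0)

# Back home, the same loss now costs the rival more, because losing to the
# (now lower-rated) champion is a worse result than it used to be.
_, rival_after = update(champ, rival, 1)
drop_after = rival - rival_after
```

The champion comes home with a lower rating, and from then on every home fighter he beats drops further than they would have before the trip, so the whole island's ratings adjust without anyone else having travelled.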