A Simple Object Detection Example: A Chess Detector

Melissa Montero
May 14, 2023
8 min read

Updated: Aug 1, 2024

A robotic arm at the black-pieces side of a chess board in the initial position

Artificial intelligence has numerous practical applications, including process automation in industries such as automotive assembly or fruit collection on farms. The core challenge in such applications lies in the detection and classification of various objects for subsequent manipulation. For example, in an automated engine assembly line, the ability to identify pistons, rings, and valves is necessary for robotic arms to select and arrange them correctly. Chess piece detection using a deep neural network provides a simplified toy example for experimentation that can generalize to more complex use cases such as automated assembly systems.

This article explores the various stages of a typical deep learning problem, using chess detection as a baseline problem. The successful detection of chess pieces is crucial for developing more sophisticated use cases, given that the same challenges are inherent in both scenarios.

Problem Description

To begin with, it is essential to establish the problem that our network aims to solve, including the inputs and expected output. As illustrated in Figure 1, we consider a screenshot of a chess game played on a mobile device. The objective of this project is to develop a deep-learning pipeline that can process chessboard images similar to the one provided and automatically detect and classify each piece on the board. The output of the pipeline is an annotated image as shown in Figure 2, where each chess piece is enclosed within a bounding box indicating its type.

A screenshot of chess application mid game. — Figure 1. A sample chess game from a cellphone application. The full frame containing the chessboard is fed to the deep learning pipeline for the detection of the pieces.

A crop of the chessboard from the previous screenshot with colored bounding boxes around pieces, each with a label identifying a piece. — Figure 2. Resulting image after detection. Every piece on the chessboard is bounded by a box and labeled with the corresponding inferred class.

The reason behind feeding the pipeline with an uncropped image of the game is to simulate scenarios where the surrounding environment contains parts whose position can vary, and there is a need for detecting the working area.

Object Detection Processing Pipeline

In many deep learning applications, additional processing stages may be required to either adequate the data before the inference or post-process it and fine-tune the results. This pre- and post-processing do not necessarily involve a new deep learning process, but it can comprise some typical computer vision tasks and classical machine learning algorithms, among others.

The overall pipeline used in our example is depicted in Figure 3. The raw image from the game is fed to the system and processed by the chessboard detection algorithm, which is implemented using classical contour extraction algorithms. Once the board is extracted, the next stage involves the deep learning algorithm for detecting and classifying the chess pieces in the board. The output is a labeled version of the input image with corresponding bounding boxes and labels for each piece.

A diagram of the processing pipeline: the full screenshot goes through a chessboard detecion which extracts the board, which is then pushed to the piece detection system. The resulting image has the pieces located and identified. — Figure 3. Block diagram of the proposed solution. The game snapshot is fed to the pipeline in order to generate the predictions of the different pieces.

Dataset

The dataset is composed of 64 training images and 54 validation samples, all of them taken from different cellphone chess applications, varying the background theme and the piece's styles. Table 1 shows a list of the applications used for generating the dataset images:

Table 1. Mobile applications used for generating the dataset

Application Name

Chess.com

Lichess

Chess24

Fide Online Arena

In the case of the training images, they contain only 1 piece of every type per player, for a total of 12 pieces in the chessboard. All pieces are located in random positions. Figure 4 shows a sample training image.

A screenshot of a mobile chess app. The screenshot shows a single entity of each piece in each color. — Figure 4. Sample training image. Every training image contains only one piece type per player.

The even distribution of piece representations within the dataset promotes balance and helps mitigate potential biases that may arise from the over-representation of common pieces, such as pawns. Additionally, a heatmap analysis helped to understand the most common locations of each piece and its distribution along the game board such as the one shown for the black knight in Figure 5.

A heatmap of the knight location in the training images. The heatmap shows the knight has been generally everywhere except borders, mainly. — Figure 5. Heatmap of the black knight in the training dataset. It shows the most common locations of the piece on the board.

In the case of the validation set, every image contains a chess game with a random number of pieces as shown in Figure 6.

A screen shot of another mobile chess app. This one is of a natural game, there's no restriction in the number of pieces. — Figure 6. Sample validation image. Every validation image shows a real chess game with a random number of pieces.

Every piece in the data set is labeled with the corresponding bounding box, type, and player color. Note that we have an interest not only in recognizing the piece type but also in whether it is the black or white player.

Using different chess applications and themes allows the inclusion of a variety of pieces’ styles, leading to a better generalization of the detector. Figure 7 shows an example of different knight representations within the training dataset.

A collection of cropped out knight pieces. Not only they are different, but they are looking in different orientations and have different background colors too. — Figure 7. Pieces styles. Every piece in the data set contains different style representations depending on the source application and theme. This setup allows for the improvement of the generalization of the model.

An interesting note regarding the dataset is its size. Given that transfer learning will be used for training the model, it is not required to keep a huge set of images to get a reasonable performance.

Results and discussion

Board Detection

The classic board detection algorithm works pretty well by thresholding the image and extracting the image contours, obtaining a high IOU (Intersection Over Union) with respect to the labeled boards. The results for all the datasets are shown in Table 2, the IOUs are mostly over 0.9.

Table 2. Board detection IOU for the training, validation, and test datasets (higher is better).

Dataset	Mean IOU	MIN IOU	MAX IOU
Train	0.988	0.948	1.0
Valid	0.993	0.935	0.999
Test	0.903	0.839	0.999

Some board detection examples were taken for each of the datasets, Figure 8 shows a sample of these images for the training dataset. Each figure contains the detection with the minimum and maximum IOU, plus a couple of random images. The detected board is highlighted in the figures with a blue square. After some analysis we noted that the minimum IOU is obtained when an extra border -generally above or below the board- is added to the detection. This failure is caused by low contrast between the board and the background or highlights in the background causing confusion in the contour detection.

A mosaic of 2 by 2 random mobile chess mobile apps. In each one, a blue bounding box outlines the board. They are pretty accurate, in general. — Figure 8. Board detection examples for training dataset

Pieces Detection

With a small dataset of only 141 images, we decided to use transfer learning to train the YOLOv5 network. It is expected that transfer learning would allow us to get better results than training from scratch since we can take advantage of the shared domain. We initialized our model with the weights provided by Ultralytic’s YOLO repository, which were trained using the COCO object dataset. Specifically, we decided to test two different networks: YOLOv5s and YOLOv5x, corresponding to one of the smallest and the largest of the series, respectively. The objective was to determine if a small network would be sufficient for this problem and to see how much improvement can be gained with a larger model.

We started training using the default YOLOv5 hyperparameters, summarized in Table 3. As you can see, it includes a low augmentation that would help our dataset to have more variability.

Table 3. YOLOv5 default hyperparameters

Hyperparameter	Value
Initial learning rate (lr0)	0.01
Final learning rate (lrf)	0.01
Momentum	0.937
Weight decay	0.0005
Warmup epochs	3.0
Warmup momentum	0.8
Warmup initial bias lr	0.1
Box loss gain	0.05
Classification loss gain	0.5
Classification BCELoss positive_weight	1.0
Objectness loss gain	1.0
Objectness BCELoss positive_weight	1.0
IoU training threshold	0.20
Anchor-multiple threshold	4.0
Focal loss gama	0.0
Image HSV hue augmentation	0.015
Image HSV saturation augmentation	0.7
Image HSV value augmentation	0.4
Image rotation	0.0
Image translation	0.1
Image scale	0.5
Image shear	0.0
Image perspective	0.0
Image flip up-down(probability)	0.0
Image flip left-right (probability)	0.0
Image mosaic (probability)	0.0
Image mixup (probability)	0.5

Other parameters to take into consideration are the number of epochs, batch size, the amount of frozen layers, and the optimizer. Table 4 shows the values used for training:

Table 4. Mobile chess detection training parameters

Parameter	Values
Number of epochs	500
Batch size	8
Freeze layers	0.10
Optimizer	SGD

YOLOv5s

Initially, we tested different values for the epochs, starting with 100, but noticed that higher epochs resulted in better performances, being around 300 where the losses started stabilizing. This can be seen in Figure 9. We noticed that 500 epochs were enough for training, but we selected the best model for inference detection that was not necessarily the last epoch model.

On the other hand, the model is composed of two parts: the backbone and the head layers. Since the backbone layers work as feature extractors and the head layers as output predictors, we decided to test freezing backbone network layers (avoid re-training them) to determine if it helped with the final results. The backbone consists of 10 layers, so we ran the experiments without freezing any layer and freezing the 10 layers of the backbone.

The selected optimizer was the YOLOv5 default Stochastic Gradient Descent (SGD) with a batch size of 8.

A 3 by 3 mosaic of learning curves for Yolov5s. The loss at the top plateaus at around 100epocs. The accuracy at the bottom plateaus almost at one around 100 epocs. — Figure 9. Result metrics over 700 epochs for YOLOv5s

In Figure 10 you can see a comparison of the results between freezing and not freezing backbone layers for the YOLOv5s model. As you can see, the classifier loss takes more epochs to stabilize for the experiment that freezes the backbone, however, the final results are similar. Using the best model, the experiment without freeze obtained an F1 score of 0.987 for the validation and 0.809 for the test, while the case with backbone freeze obtained an F1 score of 0.966 for validation and 0.801 for the test. The main difference is a 0.02 in the F1 score for the validation dataset and both perform worse in the test dataset.

A 3 by 3 mosaic of learning curves for Yolov5s. The plots show curves for the experiments with and without weight freezing. There's not much difference between them. The loss at the top plateaus at around 100epocs. The accuracy at the bottom plateaus almost at one around 100 epocs. — Figure 10. Result metrics comparison between freeze 0 and freeze 10 for training over 500 epochs for YOLOv5s

On the other hand, Figures 11 and 12 show the confusion matrix for validation and test datasets, respectively, for the experiment with no freezing. Both datasets accomplish a clear diagonal, meaning that most of the detections are correct, as we can expect with the F1 scores presented, the validation results have few outliers while the test results have more.

A confusion matrix with a very clear diagonal. There are a few sparse outliers, but in general the diagonal is very noticeable. — Figure 11. Validation confusion matrix for YOLOv5s freeze 0

A confusion matrix where the diagonal still pops out, but there are a lot of false detections around. — Figure 12. Test confusion matrix for YOLOv5s freeze 0

Taking a deeper look at the validation outliers we discovered that the detection is confusing the bishop piece with the knight pieces for the style shown in Figure 13. You can see that it doesn’t have the usual rounded shape and has a slot bigger than a usual bishop. It can be seen that, with some imagination, it can be confused with one of the knights that are looking up.

An odd looking white bishop. The piece has straith corners an edges, and the dent in the head is very wide and may be confused with an open horse mouth.

An odd looking black bishop. The piece has straith corners an edges, and the dent in the head is very wide and may be confused with an open horse mouth. — Figure 13. Bishop style confused with knight

The model is also confusing the white king with the black king and the white pawn with the black pawn for the style shown in Figure 14, where the shapes have a thick black line.

A cartoonish white pawn. The piece has a very thick black border.

A cartoonish white king. The piece has a very thick black border. — Figure 14. King and pawn style where colors are confused due to the thick piece outline.

On the other hand, inspecting the detection and confusion matrix for the test dataset, it is clear that it is confusing a lot more pieces. Specifically, the model is having trouble with the piece style shown in Figure 15. This figure shows the detections obtained from the inference using the best model of YOLOv5s without freezing. The bounding boxes indicate the piece's detection is correct, but the classification of the piece doesn’t make sense in most of the pieces. To understand what was going on with this style we created a couple of test images, using the same piece style but removing the gray circular background as shown in Figure 16. Our hypothesis was that the circular background could be confusing the network. Applying the inference to these images we got the results shown in Figure 16, where the pieces were detected correctly, except for the king which was confused with the rook.

A chessboard with very odd looking pieces where almost all the pieces have been wrongly identified. The pieces are embedded in a silver coin. — Figure 15. Test image detection with classification problems, YOLOv5s, freeze=0

A chessboard with the same piece set but the coin from the pieces have been removed. Piece identification has improved a lot.

A chessboard with the same piece set but the coin has been removed from the pieces. Piece identification has improved. — Figure 16. Inference results of composed images

Another common mistake in the test dataset results is to confuse a bishop with a pawn in piece styles like the one shown in 17.

An odd looking pawn. It consists of a rounded body in a platform with a red circle in the top. Easily confused with a bishop. — Figure 17. Bishop style confused with pawn

Finally, Figure 18 shows some sample-labeled images from validation and test datasets, so you can have a qualitative sense of the chess detector's performance.

A 3 by 4 mosaic of several chessboards where pieces have been detected. In general, they all seem correct. — Figure 18. Sample detection results, YOLOv5s, freeze 0

YOLOv5x

Figure 19 shows a comparison of the results between backbone freeze layers and no freezing layers for the YOLOv5x model. As you can see, the general behavior is pretty similar, obtaining a small improvement without freezing but at the cost of more memory usage and larger training times. The best model in this scenario obtained an F1 score of 0.96943 for validation and 0.83202 for testing when freezing the first 10 layers and 0.99258 for validation and 0.8609 for testing when all the layers were trained.

Figures 20 and 21 show the confusion matrix for testing and validation when the freeze is set to 0. As it is expected, we can see a clear diagonal in both with some outliers for the validation dataset and more outliers for the test dataset. The false positives are mostly caused by the same pieces exposed for YOLOv5s.

A confusion matrix for the training set. It has a very clear diagonal with a couple of false detections. — Figure 20. Validation confusion matrix for YOLOv5x, freeze 0

A confusion matrix for the test set. The diagonal is clear, but there are a lot more of sparse false detections — Figure 21. Test confusion matrix for YOLOv5x, freeze 0

Going beyond digital!

The goal for every model training is to achieve enough generalization to react properly to unexpected scenarios. In order to stress our model and determine how well it performs in unexpected situations (in an empiric manner), we used a picture of a game from a chess book, as shown in Figure 22. The result of the detection is shown in Figure 23. It is possible to note that the model was able to detect and classify every piece in the chessboard correctly, demonstrating a good level of generalization.

A picture taken from a chess book. It is a black and white book and there is text in spanish around. — Figure 22. Picture of a chess book exercise used for testing the model’s generalization.

A picture taken from a chess book. It is a black and white book and there is text in spanish around. Color bounding boxes are tightly placed around pieces and labels correctly identify them. — Figure 23. The output of the detection model using a chess book exercise. All pieces are detected and classified correctly.

What is Next?

Although we used a chess game as the subject of this project, it is worth mentioning that the same process and challenges can be extrapolated to more complex scenarios in the industry such as assembly lines, where the environment is not necessarily static and the location of the pieces can vary. We have demonstrated that using transfer learning and a model with a shared domain, allows us to speed up the training and achieve acceptable results in a short time, requiring fewer resources.

A Simple Object Detection Example: A Chess Detector

Problem Description

Object Detection Processing Pipeline

Dataset