Sudoku solving using CV and AI with an Image of Sudoku as Input!

Project Motivation

We have been solving Sudoku since primitive times using the conventional approach. Now it’s time to solve it using the technologies that we have been learning in our curriculum. During this project, we are following one slogan that we learn from our experienced mentors/faculties that is,

“Learning By Doing”

Objective of Project

Partnership Collaboration Stock Illustrations – 28,485 Partnership Collaboration Stock Illustrations, Vectors & Clipart - Dreamstime

Figure 1.1

Today whole world is moving to technology with good speed so We aim to achieve the fastest Sudoku solver using minimum space and time complexity with using the legacy of AI (Artificial Intelligence). We believe that by creating this project our logic will be expanded with the guts of AI (Artificial Intelligence) knowledge.

Milestones Achieved

Figure 1.2

As per our definition, we decide to divide our whole project into 3 parts for better understanding and accurate output.

1.First we take an Image from the user as input!
--Through image processing, we identify the digits.
2.Applied our sudoku logic on identified digits!
3.Get accurate output!
So, First of all, we need to extract the grid from the image provided in the input. Then we will need to identify digits using the K-Nearest Neighbor algorithm and create a solution. Lastly, we will reconstruct the grid by substituting the values that we got through the backtracking algorithm.

Code Snippet

Followed Code Snippet, is a K-Nearest Neighbours algorithm with k=3. This model is trained on the MNIST(Modified National Institute of Standards and Technology) handwritten digits dataset with 50,000 images for the training set and 10,000 for testing. Then the model is saved using Pickle as a knn.sav file.

This function makes a dataset of size "size" and returns that datasets images and targets. This is used to make the dataset that will be stored by a model and used in experimenting with different stored dataset sizes.

k : number of neighbors to use in classification

Test_Data : The data/targets used to test the classifier

Stored_Data : The data/targets used to classify the test_data

Figure 1.3

Figure 1.4

Followed Code Snippet, class takes an image path as input, performs preprocessing, identifies the grid, crops the grid, corrects perspective, writes all these stages to the StagesImages folder, and finally slices the grid into 81 cells and returns the 2D array of 81 cell images. About preprocessing you will learn in this blog itself.

This function blurs the image, applies thresholding, inverts it, and dilates the image.

Figure 1.5

This function finds the grid (the biggest blob), uses Hough transform to find lines, fuses related lines, finds the borderlines of the grid, warps the board to correct perspective, and stores the board in self.extractedgrid.

Figure 1.6

Working

Figure 1.7 & 1.8

Image Pre-Processing

Gaussian Blurring & Adaptive Gaussian Thresholding - In image processing, a Gaussian blur (also known as Gaussian smoothing) is the result of blurring an image by a Gaussian function (named after mathematician and scientist Carl Friedrich Gauss). It is a widely used effect in graphics software, typically to reduce image noise and reduce detail.

In a Gaussian blur, the pixels nearest the center of the kernel are given more weight than those far away from the center. This averaging is done on a channel-by-channel basis, and the average channel values become the new value for the filtered pixel.

Adaptive thresholding with a Gaussian Function to account for different illuminations in different parts of the image.

Figure 1.9 & 2.0

Inverting & Dilation - Black and white image inversion refers to an image processing technique where light areas are mapped to dark, and dark areas are mapped to light. In other words, after image inversion black becomes white, and the white becomes black. An inverted black and white image can be thought of as a digital negative of the original image.

Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels on object boundaries. The number of pixels added or removed from the objects in an image depends on the size and shape of the structuring element used to process the image. With a plus-shaped 3X3 Kernel to fill out any cracks in the board lines and thicken the board lines.

Figure 2.1 & 2.2

Flood Filling & The Largest Blob -Since the board will probably be the largest blob a.k.a connected component with the largest area, flood filling from different seed points and finding all connected components followed by finding the largest flood filled area region will give the board.

The largest blob a.k.a the board is found after the previous step. Let's call this the outer box.

Figure 2.3 & 2.4

Eroding & Hough Line Transform - Eroding the grid a bit to undo the effects of the dilation on the outer box that we did earlier.

Erosion is one of two fundamental operations in morphological image processing from which all other morphological operations are based. It was originally defined for binary images, later being extended to grayscale images, and subsequently to complete lattices.

The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. Hough Line Transform - to find all the lines in the detected outer box.

Figure 2.5 & 2.6

Merging - related lines. The lines found by the Hough Transform that are close to each other are fused together.

Finding the Extreme lines - We find the border lines by choosing the nearest line from the top with slope almost 0 as the upper edge, the nearest line from the bottom with slope almost 0 as the lower edge, the nearest line from the left with slope almost infinity as the left edge and the nearest line from the right with slope almost infinity as the right edge.

Finding the four intersection points - The four intersection points of these lines are found and plotted along with the lines.

Figure 2.7 & 2.8

Warping Perspective - We find the perspective matrix using the endpoints, correct the perspective, and crop the board out of the original image.

Thresholding and Inverting the grid - The cropped image from the previous step is adaptive thresholded and inverted. We find the perspective matrix using the endpoints, correct the perspective, and crop the board out of the original image.

Figure 2.9 & 3.0

Slicing - the grid into 81 slices to get images of each cell of the Sudoku board.

Blackfilling and centering the number - Any white patches other than the number are removed by flood filling with black from the outer layer points as seeds. Then the approximate bounding box of the number is found and centered in the image.

Figure 3.1 & 3.2

Output

Figure 3.3 & 3.4

Grid constructed from the image provided in the input. Grid reconstructed by getting values from the K-Nearest Neighbour algorithm. The accuracy found in this case was approximate 95 to 96%. If an incorrect digit founds in the grid then we can also edit the grid to correct the value and generate an accurate solution. As per our research, it is found that using a convolution neural network (CNN) we can get digits more accurately. Hence the solution will generate accurate results.

For Demonstration of project you can refer our youtube video :

Written By: Manharsinh Jadeja & Kaushal Doshi

Respect By Fremox Dribbble Martial Arts GIF - LowGif

Thank You!

Search This Blog

Something New

Sudoku solving using CV and AI with an Image of Sudoku as Input!

Comments

Post a Comment

Popular posts from this blog

Industry Report

DBMS Query

We'll Make AatmanirbharBharat!