Computer Vision Department: What do we do?

Hello 👋🏻

My name is Andrija Urosevic, and I am a Computer Vision Team Lead @ Blinking.

Here at the Computer Vision Department, we help make it easier for everyone around the world to create their digital identity, using a government-issued document and their face as proof of presence. Having a digital identity allows us to do things at a much faster pace, and it’s usually a much easier way of doing things. To read more about digital identity, please take a look at this wonderful explanation from my colleague Ivana Jovičić: Meet Your Digital Twin.


Let’s get back to our topic. So what does a Computer Vision Engineer do at Blinking? A Computer Vision Engineer at Blinking is equipped to work on both the research and the development side of the process.

The research side involves reading papers, collecting the necessary data, wrangling it, and finally training and retraining state-of-the-art Machine Learning models to find the best fit for our needs.

On the development side, we make the models available as services, so that other components of the Blinking machinery can use them and present the results to the end user. Another crucial part of the development process is building the logic for scanning personal documents from all over the world using only a camera.

We all share an obsession with perfecting every segment of our work. We also put an emphasis on writing clean code, keeping the system sustainable, and helping newcomers not to feel overwhelmed.

To give you a closer look at the practical side of things, let me break down the work from three perspectives:

  • OCR
  • Biometric
  • Fraud Detection

Each domain plays a crucial role in extracting real-world user data and creating a reliable and secure digital version of it. By extracting and verifying user data, we are building a platform with endless possibilities, both for users and for the businesses that users allow to use their data.

OCR

We use optical character recognition (OCR) to detect and extract text data from the personal document the user provided. But before doing so, there are a few things we need to get right.

When users take a photo of their personal document, the image usually needs some work before we go straight into the OCR process. Let’s take a look at Picture 1.


Picture 1: Possible input image from the user

In practice, when users take a picture of their document, the document may end up skewed along all three axes. The picture can even contain something in the background that could interfere with our process if not handled properly.

So here is what we do: we find the document in the given picture, deskew it, and eliminate the background so we can work only with the personal document (Picture 2).


Picture 2: Preparing the input image for OCR
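The deskewing step above is commonly done with a perspective (homography) transform: once the four document corners are known, a 3x3 matrix maps them onto an upright rectangle. The sketch below shows that math in plain Python, assuming the corners were already found by some detector; the corner coordinates, output size, and helper names (`order_corners`, `perspective_transform`, `map_point`) are hypothetical illustrations, not Blinking’s pipeline. In OpenCV the same result comes from `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`, which also resamples every pixel.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def order_corners(pts: List[Point]) -> List[Point]:
    """Order four corners as top-left, top-right, bottom-right, bottom-left."""
    # Top-left has the smallest x + y, bottom-right the largest;
    # top-right has the smallest y - x, bottom-left the largest.
    by_sum = sorted(pts, key=lambda p: p[0] + p[1])
    by_diff = sorted(pts, key=lambda p: p[1] - p[0])
    return [by_sum[0], by_diff[0], by_sum[-1], by_diff[-1]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a . x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def perspective_transform(src: List[Point], dst: List[Point]) -> List[List[float]]:
    """3x3 homography that maps four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), same form for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def map_point(hm: List[List[float]], p: Point) -> Point:
    """Apply the homography to one point (with the perspective divide)."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Hypothetical detector output (unordered) and hypothetical output size.
detected = [(412, 95), (40, 60), (30, 300), (420, 330)]
W, H = 856, 540
corners = order_corners(detected)
hm = perspective_transform(corners, [(0, 0), (W - 1, 0), (W - 1, H - 1), (0, H - 1)])
```

Ordering the corners first matters: the homography is only meaningful if each detected corner is paired with the matching corner of the target rectangle.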

Now that we have a better representation of the given personal document, we can go to the first stage of OCR: Let’s find the text fields.


Picture 3: Finding text fields on personal document

On the left side of Picture 3, you can see the text fields we found on the personal document. Next, we perform a cleaning process in which we keep only the text fields we are interested in. This step is broadly similar for all personal documents in the world, but each document has its own specific characteristics that need to be addressed individually.
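One simple way to picture the cleaning step is to compare every detected text box against a per-document layout of expected fields and keep only the boxes that overlap a field well. The sketch below does this with intersection-over-union (IoU); the field names, template coordinates, and the 0.5 threshold are hypothetical, and this is an illustration of the idea rather than how Blinking actually filters fields.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]

# Hypothetical layout for one document type; every real document has its own.
FIELD_TEMPLATE: Dict[str, Box] = {
    "surname": (0.30, 0.20, 0.70, 0.28),
    "given_names": (0.30, 0.32, 0.70, 0.40),
    "date_of_birth": (0.30, 0.44, 0.55, 0.52),
}


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def keep_fields(detections: List[Box],
                template: Dict[str, Box] = FIELD_TEMPLATE,
                min_iou: float = 0.5) -> Dict[str, Box]:
    """Assign each template field its best-overlapping detection; drop the rest."""
    kept: Dict[str, Box] = {}
    for name, region in template.items():
        best = max(detections, key=lambda d: iou(d, region), default=None)
        if best is not None and iou(best, region) >= min_iou:
            kept[name] = best
    return kept
```

Boxes that match no expected field (logos, background text, security patterns) simply fall away. A production version would also need to stop one box from being assigned to two fields.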

Once we are left with only the desired regions of the document, the next step is image preprocessing (see Picture 4). To find out more about image preprocessing, please take a look at the article written by my teammate Pavle Milošević: When theory meets practice: A computer vision case.


Picture 4: Preprocessing and reading
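A common preprocessing step before reading text is binarization, and Otsu’s method is a classic way to pick the threshold automatically: it chooses the gray level that maximizes the variance between the dark (text) and light (background) classes. The pure-Python sketch below assumes an 8-bit grayscale image stored as a list of rows; it is one illustrative technique, not necessarily what our pipeline uses. OpenCV provides the same thing through `cv2.threshold` with the `cv2.THRESH_BINARY + cv2.THRESH_OTSU` flags.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale, values 0..255


def otsu_threshold(gray: Image) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Image, t: int) -> Image:
    """Pixels above t become white (255), the rest black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


# Tiny synthetic example: dark "ink" pixels mixed with a light background.
gray = [[30, 35, 200, 210], [32, 205, 40, 220]]
t = otsu_threshold(gray)
binary = binarize(gray, t)
```

Because the threshold is computed per image, the same code copes with documents photographed under brighter or dimmer lighting, which is why an adaptive choice like this is usually preferred over a fixed cut-off.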

The final stage of the OCR process is reading the text from the preprocessed image.

If you would like to try to replicate the process yourself, you can use open source tools such as OpenCV for image preprocessing and Tesseract for reading the textual data from the image.

After we read all the personal data from the provided document, there is one more piece of data that represents you which we need to collect, but it is not in textual form.

Biometric

Biometric data plays an important role in verifying a person’s identity. For now, here at Blinking we perform face and fingerprint biometric verification. In this article I’ll focus only on the face.

Let’s continue with our process of collecting all the necessary data from the personal document.

The last thing we need to have a complete digital representation of a user is their face from the provided personal document. To find and extract the face from the document, we use a face detection model; take a look at Picture 5.

It is neither common nor recommended to work directly with a picture of a face. Instead, we take the face image returned by the face detector, extract a vector of face features that uniquely represents each user, and work with that data. This way, the vector of face features cannot be used to link back to the user outside the Blinking system, making it a safe solution for comparison during the active session. Once the session expires, all face data connected to the user is safely deleted. In some cases, we also encrypt the vector of face features before working with it.


Picture 5: Detecting face on the personal document

If you would like to try this step yourself, here are some open source options: OpenCV’s Haar cascade detector as a lightweight solution, or the DNN face detector, also from OpenCV. We use an in-house proprietary solution for face detection, but these two are good enough if you are looking for a place to start.
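The session-scoped handling of face feature vectors described above (keep the vector only while the session is active, delete it once the session expires) can be sketched as a small store with a time-to-live. The class name, the TTL value, and the injectable clock are hypothetical; the sketch leaves out encryption, and deleting a Python object does not securely wipe memory, so a real system needs more than this.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class SessionFaceStore:
    """Hypothetical in-memory store that keeps face vectors only for a session's lifetime."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._data: Dict[str, Tuple[float, List[float]]] = {}

    def put(self, session_id: str, vector: List[float]) -> None:
        """Store a copy of the face vector, expiring ttl_seconds from now."""
        self._purge()
        self._data[session_id] = (self._clock() + self._ttl, list(vector))

    def get(self, session_id: str) -> Optional[List[float]]:
        """Return the vector for an active session, or None once it has expired."""
        self._purge()
        entry = self._data.get(session_id)
        return None if entry is None else entry[1]

    def _purge(self) -> None:
        """Drop every entry whose expiry time has passed."""
        now = self._clock()
        for sid in [s for s, (expires, _) in self._data.items() if expires <= now]:
            del self._data[sid]
```

Purging on every access keeps the sketch free of background threads; a deployed service would more likely rely on the expiry features of its session storage.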

Fraud Detection

We have our user onboarded. Yay! But wait, how can we be sure that our user is who they claim to be in real life?

To resolve this dilemma, we introduce fraud detection. Before approving a digital identity for our user, we perform various types of fraud detection, both while the user is taking a picture of the personal document and afterwards, when they are asked to confirm their presence.

Document Fraud Detection


Picture 6: Fraud detection on front side of the personal document

To increase the probability of catching fraud attempts during the document scanning process, we pick several regions on the document and send them as input to a fraud detection model. Each document has unique regions that we detect and send for inspection; see the example in Picture 6. This process is repeated for each side of the document, if it has more than one.
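The region-based inspection above boils down to: for every side of the document, crop a set of document-specific regions, score each crop with a fraud model, and reject the document if any crop looks suspicious. The sketch below shows that control flow; the region coordinates, the 0.5 threshold, and the `score` callable (a stand-in for the real fraud detection model) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Region = Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)
Image = List[List[int]]  # grayscale rows

# Hypothetical inspection regions per document side.
INSPECTION_REGIONS: Dict[str, List[Region]] = {
    "front": [(0.05, 0.10, 0.30, 0.60), (0.70, 0.05, 0.95, 0.25)],
    "back": [(0.05, 0.70, 0.95, 0.95)],
}


def crop(img: Image, r: Region) -> Image:
    """Cut a normalized region out of the image (at least one pixel in each direction)."""
    h, w = len(img), len(img[0])
    x0, y0 = int(r[0] * w), int(r[1] * h)
    x1, y1 = max(x0 + 1, int(r[2] * w)), max(y0 + 1, int(r[3] * h))
    return [row[x0:x1] for row in img[y0:y1]]


def inspect_document(sides: Dict[str, Image],
                     score: Callable[[Image], float],
                     threshold: float = 0.5) -> bool:
    """True if every inspected region on every side scores below the fraud threshold."""
    for side, img in sides.items():
        for region in INSPECTION_REGIONS.get(side, []):
            if score(crop(img, region)) >= threshold:
                return False  # one suspicious region is enough to reject
    return True


def mean_brightness(patch: Image) -> float:
    """Toy stand-in for a fraud model: mean intensity scaled to [0, 1]."""
    return sum(map(sum, patch)) / (255 * sum(len(row) for row in patch))
```

For example, `inspect_document({"front": front_img, "back": back_img}, score=my_model)` runs the same checks for both sides; a single-sided document would just pass one entry.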

In addition to inspecting specific regions of the document, we also inspect the image as a whole to make sure that it is not a home-printed document or a picture of a document taken from a screen.

Face Fraud Detection

We have made sure that the document was government-issued. Now we want to make sure that the person taking a picture of the document is present and that the document belongs to that person.

We ask users to take a picture of themselves, and we analyze it similarly to how we analyzed the picture of the personal document. This method is known as the static approach to fraud detection. In some use cases, we want the user to perform specific actions in short video intervals. We first present an action to the user, such as smiling, opening their mouth, or moving their head in a requested direction, and then expect the user to repeat it within the given timeframe. This method is the dynamic approach to face fraud detection.
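The dynamic approach is essentially a challenge-response check: pick random actions, show them to the user, and accept only if the requested actions are observed in the requested order before the time limit. The sketch assumes some per-frame classifier (not shown) already turns video frames into time-stamped action labels sorted by time; the label strings, the two-action challenge, and the function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical labels for the kinds of actions mentioned above.
ACTIONS = ("smile", "open_mouth", "turn_left", "turn_right")


@dataclass
class Observation:
    action: str       # label from a (hypothetical) per-frame action classifier
    timestamp: float  # seconds since the challenge was shown


def make_challenge(n: int = 2, rng: Optional[random.Random] = None) -> List[str]:
    """Pick n distinct random actions so a pre-recorded video is unlikely to match."""
    rng = rng or random.Random()
    return rng.sample(ACTIONS, n)


def passes_challenge(challenge: Sequence[str],
                     observed: Sequence[Observation],
                     time_limit: float) -> bool:
    """True if the challenge actions appear in order within the time limit."""
    i = 0
    for obs in observed:  # assumed sorted by timestamp
        if obs.timestamp > time_limit:
            break
        if i < len(challenge) and obs.action == challenge[i]:
            i += 1
    return i == len(challenge)
```

Randomizing the challenge is what makes this approach harder to fool than a static selfie: an attacker cannot know in advance which actions a replayed video would need to contain.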

Now that we are sure our user was not presenting a fake or non-present document and face, we need to check one more thing: is the person taking the pictures really the person the document belongs to? We check that during face verification. This belongs to the Biometric part of the article, but I wanted to leave it for the end to simulate the complete user journey of the Know Your Customer (KYC) procedure.

Remember how we saved the face from the user’s document in the form of a vector of face features? Now we compare that representation to the representation of the face image we asked the user to take. If the face verification model reports a match, we have successfully completed the process and can proceed to creating the user’s digital identity.
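A common way to compare two face feature vectors is cosine similarity with a decision threshold. The sketch below assumes both vectors come from the same embedding model; the 0.6 threshold is a hypothetical placeholder (in practice it is calibrated on labeled pairs to hit a target false match rate), and Blinking’s verification model may use a different metric altogether.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def is_same_person(document_vec: Sequence[float],
                   selfie_vec: Sequence[float],
                   threshold: float = 0.6) -> bool:
    """Hypothetical decision rule: match when similarity reaches the threshold."""
    return cosine_similarity(document_vec, selfie_vec) >= threshold
```

Cosine similarity ignores vector length and looks only at direction, which suits embeddings whose magnitude can vary with image quality while the direction encodes identity.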


There are a few layers of knowledge you can expect to gain as a Computer Vision Engineer at Blinking.

The first is learning how to do research: reading and understanding papers, searching for the necessary data, improving the data you have acquired, and creating state-of-the-art solutions when implementing Machine Learning models.

The second set is software development skills. Here you’ll learn how to make your Machine Learning models available to others, how to write code that others can easily understand and improve, and how to combine all the tools you have into a product.

This was an overview of what you can expect to do as part of the Computer Vision team at Blinking. Stay tuned and follow our blog for a more detailed, technical view of each of the topics I’ve covered in this article.


Andrija ✌🏻