Analysis of the battle between CAPTCHAs and CAPTCHA farms

As we all know, CAPTCHAs are used to distinguish humans from machines. However, with the development of science and technology, some have found a lucrative opportunity in defeating them, leading to a protracted battle between attack and defence.

As CAPTCHAs become more complicated, new ways to attack them keep cropping up.

This paper examines the confrontation between CAPTCHA platforms and CAPTCHA farms.

What is a CAPTCHA farm?

CAPTCHA farms came on the scene with the advent of artificial intelligence. They are designed to crack CAPTCHAs using artificial intelligence technology.

It works as follows:

In the past, if someone wanted to obtain data illegally, they would first send a request to the data page. If the data page had no protection mechanism, the data could be obtained easily. If the data page had a CAPTCHA defense, it would not return the data directly; instead, a CAPTCHA would pop up as a human-machine verification step, and the data would be returned only after the CAPTCHA was passed. Actors engaged in illegal and semi-illegal behavior generally have no means of dealing with the CAPTCHA, so they cannot obtain the data.
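The gated request flow can be sketched as a toy server in Python. The `DataPage` class, its `request` and `submit` methods, and the random number standing in for the CAPTCHA answer are hypothetical, chosen only to show that the data is released after a correct answer and that a client unable to produce one stops at the challenge.

```python
import secrets


class DataPage:
    """Toy data page; the 'CAPTCHA' is a hypothetical stand-in.

    The server remembers an expected answer per challenge ID and only
    releases the data when the client submits that answer.
    """

    def __init__(self, data, protected=True):
        self.data = data
        self.protected = protected
        self._pending = {}  # challenge_id -> expected answer

    def request(self):
        """First request: return the data directly, or a challenge if protected."""
        if not self.protected:
            return {"data": self.data}
        challenge_id = secrets.token_hex(8)
        # Hypothetical answer, e.g. the x-offset of a slider gap in pixels.
        self._pending[challenge_id] = secrets.randbelow(300)
        return {"challenge": challenge_id}

    def submit(self, challenge_id, answer):
        """Second request: release the data only if the answer matches."""
        # pop() makes each challenge single-use, so an answer cannot be replayed.
        expected = self._pending.pop(challenge_id, None)
        if expected is not None and answer == expected:
            return {"data": self.data}
        return {"error": "verification failed"}


# An unprotected page hands out the data; a protected one demands a CAPTCHA.
print(DataPage("flight list", protected=False).request())  # {'data': 'flight list'}
page = DataPage("flight list")
reply = page.request()
print(page.submit(reply["challenge"], -1))  # {'error': 'verification failed'}
```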

That's where the CAPTCHA farm comes in. The cracker submits the CAPTCHA information to the CAPTCHA farm, which cracks the CAPTCHA and returns the result. This greatly increases the risk of the CAPTCHA being cracked.

To sum up, the original request process involved two-way communication only between the cracker and the data page. With the CAPTCHA farm added, it has evolved into communication among four parties: the cracker, the data page, the CAPTCHA, and the CAPTCHA farm.

How does a CAPTCHA farm work?

So how does a CAPTCHA farm manage to crack all types of CAPTCHA?

It has two major advantages:

Timeliness:

As CAPTCHA farms find ways to crack CAPTCHAs, CAPTCHA platforms constantly roll out new forms of CAPTCHA or increase the difficulty of existing ones. When the difficulty of cracking increases, a CAPTCHA farm that reacts quickly and finds a new way to crack it greatly shortens the time spent being blocked by the CAPTCHA.

Efficiency:

After obtaining the relevant CAPTCHA data, the cracker submits it to the CAPTCHA farm for cracking, then returns to the CAPTCHA page once it has the solution. If the CAPTCHA farm takes a long time to crack a CAPTCHA, the whole process becomes inefficient. For example, if it takes 1 ms to solve a slider CAPTCHA and 10 ms to solve a click CAPTCHA, efficiency on click CAPTCHAs drops by a factor of 10. Relatively speaking, slider CAPTCHAs are solved efficiently, while click CAPTCHAs are not solved efficiently enough.

Consider crawling airline flight data, where flight status is queried continuously. If the data page uses a click CAPTCHA, the crawler incurs more delay than it would with a slider CAPTCHA.
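The factor-of-10 figure is plain throughput arithmetic. A minimal sketch, assuming each query is gated by exactly one CAPTCHA and, by default, ignoring the time of the data request itself (`fetch_ms`); the function names and the 10,000-query crawl are illustrative:

```python
def crawl_time_ms(queries, solve_ms, fetch_ms=0.0):
    """Total time for a crawl in which every query is gated by one CAPTCHA.

    solve_ms: time the farm needs to crack one CAPTCHA.
    fetch_ms: time of the data request itself (assumed constant).
    """
    return queries * (solve_ms + fetch_ms)


def solves_per_second(solve_ms):
    """Throughput of one solver cracking CAPTCHAs back to back."""
    return 1000.0 / solve_ms


# Figures from the text: 1 ms for a slider, 10 ms for a click CAPTCHA.
slider, click = 1.0, 10.0
print(solves_per_second(slider), solves_per_second(click))  # 1000.0 100.0
print(crawl_time_ms(10_000, slider), crawl_time_ms(10_000, click))  # 10000.0 100000.0
```

Once a nonzero `fetch_ms` is included, the ratio between the two totals shrinks, so the factor of 10 is an upper bound on the slowdown the crawler actually sees.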

Two cracking methods used by CAPTCHA farms

With these two points in mind, CAPTCHA farms today mainly crack CAPTCHAs in two ways: automatic (machine) cracking and manual cracking.

1. Automatic cracking:

The advantages of this method are fast recognition and low cost. The disadvantages are that it takes a lot of time to understand how each CAPTCHA works, the up-front workload is relatively large, and the recognition accuracy is relatively low.

At present, the main forms of CAPTCHA are image-based, such as slider and picture-click CAPTCHAs. Automatic cracking therefore mainly works by identifying the elements in the picture, such as locating the gap in a slider CAPTCHA or the text and digit elements to be clicked in a click CAPTCHA, using image processing, image binarization, and sliding-trajectory simulation. An example of the key elements supported by a CAPTCHA farm is shown below:

As CAPTCHA platforms keep improving their image-processing techniques, it becomes more difficult for CAPTCHA farms to analyze the key elements, as shown in the following figure:

In response, CAPTCHA farms are also improving the capabilities of their machines, for example by introducing machine learning and neural networks to identify elements in pictures intelligently and improve the accuracy of identifying verification elements. An example of the cracking process with artificial intelligence is shown below:

With artificial intelligence, recognition efficiency increases dramatically, which makes defence even harder for CAPTCHA platforms. The battle between CAPTCHA platforms and CAPTCHA farms has reached a new level.
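Binarization, one of the image-processing steps mentioned above, can be illustrated on a toy grayscale grid: a fixed global threshold turns the picture into a black-and-white mask, and an element can then be located by the bounding box of its dark pixels. The 2D-list image, the threshold of 128, and the helper names are assumptions for illustration; real pipelines operate on actual image files and often use adaptive thresholds.

```python
def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 for dark 'ink' pixels, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


# A 4x6 toy image: light background (200) with a dark 2x2 element (30).
image = [
    [200, 200, 200, 200, 200, 200],
    [200, 200, 30, 30, 200, 200],
    [200, 200, 30, 30, 200, 200],
    [200, 200, 200, 200, 200, 200],
]
print(bounding_box(binarize(image)))  # (1, 2, 2, 3)
```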

2. Manual cracking:

The shortcomings of automatic cracking are a large up-front workload, a high technical threshold, and low recognition accuracy. Manual cracking can make up for these.

It works as follows: the CAPTCHA farm runs a task platform. The cracking party packages the CAPTCHA information it has obtained into tasks and submits them to the platform. Acting as an intermediary, the CAPTCHA farm assigns the tasks to markers (people who solve the CAPTCHAs). Once a task is solved, the result is returned to the cracking party, which uses it on the data page.

The CAPTCHA farm also keeps a copy of each result, so if the same CAPTCHA comes up again, the stored result is returned directly. If a CAPTCHA platform does not update its verification pictures for a long time, eventually all of them will have been labeled. At that point, no manual labeling is needed and the CAPTCHAs are solved automatically.
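How quickly a static picture pool gets "used up" can be estimated with a short calculation. Assuming, hypothetically, that each challenge draws one picture uniformly at random from a fixed pool of N pictures and that the farm caches the answer to every picture it has seen, the expected fraction of the pool already labeled after k challenges is 1 - (1 - 1/N)^k. A Python sketch (the pool size of 1,000 is illustrative):

```python
import math


def expected_coverage(pool_size, challenges_seen):
    """Expected fraction of a static picture pool the farm has already labeled.

    Assumes each challenge draws one picture uniformly at random and the
    farm caches the answer for every picture it has seen.
    """
    return 1.0 - (1.0 - 1.0 / pool_size) ** challenges_seen


def challenges_for_coverage(pool_size, target):
    """Smallest number of challenges whose expected coverage reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / pool_size))


print(round(expected_coverage(1_000, 1_000), 3))  # 0.632
print(challenges_for_coverage(1_000, 0.99))
```

Under these assumptions, a farm has seen about 63% of a 1,000-picture pool after only 1,000 challenges, and refreshing the pictures resets that coverage.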

An example of manual cracking is shown below:

However, the biggest shortcoming of manual cracking is that each request takes a long time, because it depends heavily on how fast the marker works. Any person needs at least a few seconds to solve a puzzle, which is very slow for an HTTP request (in the same time, a machine could perform hundreds of automatic solves). Manual cracking has therefore evolved into the following workflow:

As shown above, the CAPTCHA farm used to receive verification requests passively and crack them. Now it takes the initiative: it requests the CAPTCHA service itself, completes the verification, and stores the legitimate token issued by the CAPTCHA service. When the cracking party accesses the CAPTCHA platform, the farm hands it a stored token to bypass the CAPTCHA.

What's the best measure?

So how can CAPTCHA platforms fight CAPTCHA farms to protect data?

AISecurius believes that to improve the efficiency of defense, CAPTCHA platforms must target the characteristics of CAPTCHA farms, namely the efficiency and timeliness discussed above.

How does this work exactly?

Against automatic cracking, there are generally the following directions:

1. Increase the frequency of CAPTCHA updates: a CAPTCHA platform can continuously introduce new CAPTCHA forms to make automatic cracking harder. If someone tries to bypass the CAPTCHA automatically and a form they have never seen before pops up, they will be blocked and cannot access the data. This undoubtedly makes things harder for CAPTCHA farms.

2. Increase the difficulty of element identification: a CAPTCHA platform can make the verification elements of an existing form harder to identify. For example, the core verification element of a text-click CAPTCHA is the text; if the text elements in the picture are recognized, the correct answer can be returned. The text elements can be made harder to recognize, as shown below:

The three characters on the left are unprocessed, while the first character on the right has been colored and hollowed out. Processed text is harder to recognize automatically, and recognition can be made harder still by replacing, rotating, overlapping, or twisting the text and by adding interference items, as shown below:

The same applies to other types of CAPTCHA; for example, shadows can be added to a slider CAPTCHA. Converting picture elements from 2D to 3D also makes elements harder to identify.
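A toy example shows why such processing hurts automatic recognition. Suppose, hypothetically, a naive solver that matches a character against a stored binary template pixel by pixel; rotating the glyph and flipping a few pixels as interference drops the match score sharply. The 5x5 glyph, the helper names, and the pixel-agreement score are illustrative assumptions. Practical solvers (such as the neural networks mentioned earlier) are far more robust, so this only shows the direction of the effect.

```python
import random

# Toy 5x5 binary glyph (1 = ink) standing in for one character, here an "E".
GLYPH = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]


def rotate90(mask):
    """Rotate a mask 90 degrees clockwise."""
    return [list(row) for row in zip(*mask[::-1])]


def add_interference(mask, count, rng):
    """Flip `count` distinct random pixels to mimic interference items."""
    noisy = [row[:] for row in mask]
    cells = [(r, c) for r in range(len(noisy)) for c in range(len(noisy[0]))]
    for r, c in rng.sample(cells, count):
        noisy[r][c] ^= 1
    return noisy


def match_score(a, b):
    """Fraction of pixels on which two equal-sized masks agree."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


rng = random.Random(0)
rotated = rotate90(GLYPH)
distorted = add_interference(rotated, 5, rng)
print(match_score(GLYPH, GLYPH))    # 1.0
print(match_score(GLYPH, rotated))  # 0.36
# True: 5 flips can repair at most 5 of the 16 mismatching pixels.
print(match_score(GLYPH, distorted) <= 14 / 25)
```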

3. Introduce artificial intelligence:

AISecurius offers 7 types of CAPTCHA, which are constantly updated.

It has also introduced artificial intelligence. For example, using NLP techniques, it composes sentences from given keywords or from words in a thesaurus, cuts the sentences into fragments, and finally asks users to restore the original sentence, which greatly increases the difficulty of cracking.
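The cut-and-restore step can be sketched without the NLP part. In this hypothetical version, a fixed sentence stands in for one generated from keywords, the sentence is cut only at word boundaries, and the user's answer is an ordering of the shuffled fragments; this is an illustration, not AISecurius's actual design.

```python
import random


def make_challenge(sentence, n_fragments, rng):
    """Cut a sentence into word-boundary fragments and shuffle them.

    Returns (shuffled, answer): answer[k] is the index in `shuffled` of the
    k-th fragment of the original sentence.
    """
    words = sentence.split()
    if not 2 <= n_fragments <= len(words):
        raise ValueError("n_fragments must be between 2 and the word count")
    # Distinct cut points between words give non-empty, contiguous fragments.
    cuts = sorted(rng.sample(range(1, len(words)), n_fragments - 1))
    bounds = [0, *cuts, len(words)]
    fragments = [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]
    order = list(range(n_fragments))
    rng.shuffle(order)  # note: may occasionally leave the order unchanged
    shuffled = [fragments[i] for i in order]
    answer = [order.index(k) for k in range(n_fragments)]
    return shuffled, answer


def verify(shuffled, submitted, sentence):
    """Accept a permutation of fragment indices that rebuilds the sentence."""
    if sorted(submitted) != list(range(len(shuffled))):
        return False
    return " ".join(shuffled[i] for i in submitted) == " ".join(sentence.split())


rng = random.Random(7)
text = "the quick brown fox jumps over the lazy dog"
shuffled, answer = make_challenge(text, 4, rng)
print(shuffled)
print(verify(shuffled, answer, text))  # True
```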

Against manual cracking, the defense may focus on:

1. Ensure that the images used for CAPTCHAs are constantly updated.

As mentioned above, manual cracking relies on markers labeling the CAPTCHA images and on the farm reusing those labels, so the images themselves are the key point to focus on.

AISecurius updates its CAPTCHA pictures frequently, which stops a complete library of labeled pictures from building up in the first place. Markers must keep labeling new images, which greatly increases the cost of identification (and thus the cost to the cracking party) and protects users.
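One simple way to keep pictures fresh is to retire each picture after a fixed number of uses, which caps how often a cached label can be reused. A minimal sketch; the round-robin serving order, the `max_serves` policy, and the `new_image` factory are hypothetical, and a real deployment would pick pictures at random and source new ones however it sees fit:

```python
import itertools


class ImagePool:
    """Serve CAPTCHA pictures, retiring each one after `max_serves` uses.

    `new_image` is a hypothetical factory standing in for however fresh
    pictures are produced.
    """

    def __init__(self, size, max_serves, new_image):
        self.max_serves = max_serves
        self.new_image = new_image
        self.pool = [new_image() for _ in range(size)]
        self.serves = [0] * size
        self.next_slot = 0

    def serve(self):
        """Return the next picture (round-robin), replacing it once used up."""
        slot = self.next_slot
        self.next_slot = (slot + 1) % len(self.pool)
        image = self.pool[slot]
        self.serves[slot] += 1
        if self.serves[slot] >= self.max_serves:
            # Retire the picture so any label a farm cached for it goes stale.
            self.pool[slot] = self.new_image()
            self.serves[slot] = 0
        return image


counter = itertools.count()
pool = ImagePool(size=3, max_serves=2, new_image=lambda: f"img-{next(counter)}")
print([pool.serve() for _ in range(9)])
# ['img-0', 'img-1', 'img-2', 'img-0', 'img-1', 'img-2', 'img-3', 'img-4', 'img-5']
```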

2. Collect verification environment information and determine whether it is malicious.

Based on the manual cracking workflow described above, AISecurius can identify manual CAPTCHA farms by collecting verification environment information, for example by checking whether the environment in which the verification was completed matches the environment in which the token is later reported. Requests relayed through a manual CAPTCHA farm behave differently from ordinary ones, so recognizable environment information can be collected during verification and reported together with the verification result. The CAPTCHA platform then applies configured rules and policies to this information and screens out requests that may come from illegal or semi-illegal behavior for secondary verification or interception, improving security.
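The consistency check can be sketched as a pass token that is bound to a fingerprint of the environment that solved the CAPTCHA and signed with HMAC, so a token harvested by a marker fails when the cracker presents it from a different environment. This is a minimal sketch under stated assumptions: which attributes go into the fingerprint, the 120-second lifetime, and all names are hypothetical, and a production token would also be single-use.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side signing key


def env_fingerprint(env):
    """Hash a dict of environment attributes into a stable fingerprint."""
    canonical = "|".join(f"{k}={env[k]}" for k in sorted(env))
    return hashlib.sha256(canonical.encode()).hexdigest()


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(env, ttl_s=120, now=None):
    """Issue a pass token bound to the environment that solved the CAPTCHA."""
    now = time.time() if now is None else now
    payload = f"{env_fingerprint(env)}:{int(now) + ttl_s}"
    return f"{payload}:{_sign(payload)}"


def check_token(token, env, now=None):
    """Return 'ok', 'bad_signature', 'expired' or 'env_mismatch'."""
    now = time.time() if now is None else now
    try:
        fp, expires, sig = token.split(":")
    except ValueError:
        return "bad_signature"
    if not hmac.compare_digest(sig, _sign(f"{fp}:{expires}")):
        return "bad_signature"
    if now > int(expires):
        return "expired"
    # The environment presenting the token must match the one that solved it.
    if not hmac.compare_digest(fp, env_fingerprint(env)):
        return "env_mismatch"
    return "ok"


marker_env = {"ua": "Mozilla/5.0", "ip": "203.0.113.7", "screen": "1920x1080"}
cracker_env = {"ua": "python-requests/2.31", "ip": "198.51.100.4", "screen": "none"}
tok = issue_token(marker_env, now=1_000)
print(check_token(tok, marker_env, now=1_010))   # ok
print(check_token(tok, cracker_env, now=1_010))  # env_mismatch
print(check_token(tok, marker_env, now=2_000))   # expired
```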

At present, AISecurius relies on device fingerprinting and a risk engine. During verification, environment information is collected and reported to a real-time risk control engine, where a rule engine evaluates the risk factors and finally determines whether the request is risky.
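The rule-engine step can be pictured as weighted rules over the reported environment. The rules, weights, and thresholds below are invented for illustration, since the source does not describe AISecurius's actual rules; the sketch only shows how matched risk factors add up to a decision of allow, secondary verification, or block, mirroring the outcomes described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    weight: int
    matches: Callable[[dict], bool]


# Hypothetical rules over the reported environment; weights are illustrative.
RULES = [
    Rule("env_mismatch", 60, lambda e: e.get("solve_fp") != e.get("use_fp")),
    Rule("headless_browser", 30, lambda e: e.get("webdriver", False)),
    Rule("too_fast", 20, lambda e: e.get("solve_ms", 10_000) < 300),
    Rule("ip_seen_many_tokens", 25, lambda e: e.get("tokens_from_ip", 0) > 50),
]


def assess(event, rules=RULES, challenge_at=40, block_at=80):
    """Score an event and map the score to allow / secondary check / block."""
    matched = [r for r in rules if r.matches(event)]
    score = sum(r.weight for r in matched)
    if score >= block_at:
        decision = "block"
    elif score >= challenge_at:
        decision = "secondary_verification"
    else:
        decision = "allow"
    return decision, score, [r.name for r in matched]


normal = {"solve_fp": "a1", "use_fp": "a1", "solve_ms": 2400, "tokens_from_ip": 1}
relayed = {"solve_fp": "a1", "use_fp": "b7", "solve_ms": 2400, "tokens_from_ip": 120}
print(assess(normal))   # ('allow', 0, [])
print(assess(relayed))  # ('block', 85, ['env_mismatch', 'ip_seen_many_tokens'])
```

An additive score keeps each rule independent and easy to tune; the two thresholds are where a platform trades false positives against missed farm traffic.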

As mentioned above, the battle between CAPTCHA platforms and CAPTCHA farms is a protracted one, and the contest of attack and defense between them will remain an important topic for all CAPTCHA platforms for the foreseeable future.