OUR EXPERT David Bolton used to create PHP websites a long time ago. Luckily, they let him off for good (programming) behaviour and he promised never to create another one again.
QUICK TIP If you are passing important parameters between pages, always use POST rather than GET, and include a method of obscuring them, such as hashing, so they can’t be easily changed.
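As a rough illustration of that tip, here is a minimal sketch of one way to do it. It uses a keyed hash (HMAC), which is our reading of "hashing" rather than anything the tip prescribes. The names SECRET_KEY, signValue() and isUntampered() are made up for this example, while hash_hmac() and hash_equals() are standard PHP functions.

```php
<?php
// Hypothetical secret; in practice keep it out of the web root and source control.
const SECRET_KEY = 'change-me-to-a-long-random-string';

/** Return a keyed hash (HMAC) of the value, to send alongside it in the form. */
function signValue(string $value): string
{
    return hash_hmac('sha256', $value, SECRET_KEY);
}

/** Check that a value posted back still matches the signature issued with it. */
function isUntampered(string $value, string $signature): bool
{
    // hash_equals() compares in constant time, avoiding timing leaks.
    return hash_equals(signValue($value), $signature);
}

$signature = signValue('42');
var_dump(isUntampered('42', $signature)); // the value as issued
var_dump(isUntampered('43', $signature)); // an altered value
```

The idea is that the page sends both the value and its signature in hidden POST fields. Anyone who edits the value can't produce a matching signature without knowing the secret.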
Creating a captcha is something of a challenge. You need to balance two factors: it must be too difficult for a computer program to solve, but not so hard that a human can’t pass its test.
Most of us have probably experienced difficult captchas with bizarre-looking fonts that take two or three attempts to solve. Having upper- and lower-case letters that are almost indistinguishable is particularly evil. That breaks the second factor: a human has to be able to pass the test.
The captcha in this tutorial was developed because an online contact page was getting lots of spam messages. Almost all spam comes from scripts, so anything that can slow them down is going to help the issue. The big problem is coming up with something that is easy to solve for a human but hard for a computer program, and we thought that asking a simple question might work.
So, our captcha is going to output a question in text form, but what sort of question should it be? It has to be relatively easy to answer, which rules out trivia questions. Any trivia we could think of would be too specific; writing a trivia question that anyone can answer is quite difficult. It might, for instance, involve British royalty, but as visitors to the website could come from anywhere in the world, they might know nothing about that. So, let’s forget trivia.
After some thought, we decided on simple mathematical operations, such as 4 + 5 = ? But in that simple form, it would be a little too easy. Instead, let’s write the numbers as words, like this:
What is four plus five?
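To make that concrete, here is a minimal sketch of how such a question could be generated. The constant NUMBER_WORDS and the function makeQuestion() are hypothetical names chosen for this example, and it assumes single-digit operands and addition only.

```php
<?php
// Hypothetical helper names; a sketch limited to single digits and addition.

// Words for the numbers 0 to 9, indexed by value.
const NUMBER_WORDS = ['zero', 'one', 'two', 'three', 'four',
                      'five', 'six', 'seven', 'eight', 'nine'];

/** Build a question such as "What is four plus five?" and its numeric answer. */
function makeQuestion(int $a, int $b): array
{
    return [
        'question' => sprintf('What is %s plus %s?', NUMBER_WORDS[$a], NUMBER_WORDS[$b]),
        'answer'   => $a + $b,
    ];
}

// Pick two random single-digit operands for each page load.
$captcha = makeQuestion(random_int(1, 9), random_int(1, 9));
echo $captcha['question'], PHP_EOL;
```

The answer would need to be kept on the server side, for instance in the session, so that it can be compared with whatever the visitor types in.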
Solving that requires something that can extract the text, make sense of the question and then come up with the correct answer. The likes of ChatGPT might well be able to do that, but this is only meant to be an example; it’s a speed bump, not a complete road block.
Let’s make it a bit harder
What if the question is output as a graphic, rather than text? Normal text on a web page is typically output as characters between <p> and </p> tags, like this: <p>What is four plus five?</p> This is very easy for a web-scraping bot to read. However, if the question is output as a graphic instead, the web scraper needs to perform optical character recognition (OCR) to extract the text. That’s something we humans do naturally, but it takes a lot more effort for a web-scraping tool.
A PHP captcha in action.
And to make it a little harder, we can draw lines through the text to try to obscure it somewhat. So, we end up with something that looks like the screenshot (above) on the web page.
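The drawing step might look something like the following sketch. It assumes PHP’s GD extension is installed, and the function name renderCaptcha(), the image size, the colours and the number of lines are all illustrative choices of ours.

```php
<?php
// Requires PHP's GD extension. Function name, sizes and colours are illustrative.

/** Render the question as a PNG with a few random lines drawn across it. */
function renderCaptcha(string $text, int $lineCount = 5): string
{
    $font   = 5; // the largest of GD's built-in fonts
    $width  = imagefontwidth($font) * strlen($text) + 20;
    $height = imagefontheight($font) + 20;

    $image = imagecreatetruecolor($width, $height);
    $white = imagecolorallocate($image, 255, 255, 255);
    $black = imagecolorallocate($image, 0, 0, 0);
    $grey  = imagecolorallocate($image, 128, 128, 128);

    // White background, then the question text with a 10-pixel margin.
    imagefilledrectangle($image, 0, 0, $width - 1, $height - 1, $white);
    imagestring($image, $font, 10, 10, $text, $black);

    // Draw lines from the left edge to the right edge at random heights.
    for ($i = 0; $i < $lineCount; $i++) {
        imageline($image, 0, random_int(0, $height - 1),
                  $width - 1, random_int(0, $height - 1), $grey);
    }

    // Capture the PNG bytes instead of sending them straight to the browser.
    ob_start();
    imagepng($image);
    $png = ob_get_clean();
    return $png;
}

// In a script that serves the image, you might send it like this:
// header('Content-Type: image/png');
// echo renderCaptcha('What is four plus five?');
```

Returning the image as a string rather than printing it directly keeps the function easy to reuse, whether the bytes end up in an HTTP response or a file.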