7 MIN READ TIME

PERL

How to use Mojolicious for web scraping

Mark Gardner reveals how you can retrieve and parse HTML and XML from websites with a few lines of Perl and the Mojolicious framework.

Part One

Don’t miss next issue! Subscribe on page 16

OUR EXPERT

Mark Gardner is a software developer and blogger with over 25 years of IT experience. You can reach him at www.phoenixtrap. com and @markjgardner.

So much of the modern web is driven by services and front-end interfaces talking to APIs that it’s easy to lose sight of the fact that everything is ultimately presented in a soup of HTML markup. In the absence of a well-structured interface or format, sometimes the code you’re writing needs to scrape the ingredients of that soup apart and parse out meaningful data. Perl’s Mojolicious web framework includes a set of components that make this task easier.

Although most Linux distros come with a version of Perl, it helps to have your own installation separate from the system so you’re not tied to a possibly older version that’s required to support operating system tools and other packages. This separate installation can live in your $HOME directory (or wherever you specify) with its own modules that neither require sudo to install nor interfere with those handled by the package manager.

The most popular tool for managing separate Perl installations is called Perlbrew. Installation instructions are at https://perlbrew.pl/Installation.html. You can install it with either of the following shell commands, depending on what you already have installed:

Unlock this article and much more with
You can enjoy:
Enjoy this edition in full
Instant access to 600+ titles
Thousands of back issues
No contract or commitment
Try for 99c
SUBSCRIBE NOW
30 day trial, then just $9.99 / month. Cancel anytime. New subscribers only.


Learn more
Pocketmags Plus
Pocketmags Plus

This article is from...


View Issues
Linux Format
April 2022
VIEW IN STORE

Other Articles in this Issue


WELCOME
MEET THE TEAM
This issue we’re compiling the Linux kernel from source. We wondered if our experts compile anything from source, or does even the thought of this cause them to run screaming into the night?
Kernel of truth
Compiling the kernel from source was always a
REGULARS AT A GLANCE
Linux malware grows by 35%
Malware aimed directly at Linux systems increased drastically last year
Major Linux exploit found
Almost every distro is affected by a major vulnerability – make sure that your systems are patched
US government wants your messages
And the UK government wants to confirm IDs online
IT’S COOL
Michael Meeks is general manager at Collabora Productivity
OLD AND UNTRUSTED
Keith Edmunds is MD of Tiger Computing Ltd,
Framework open sources its firmware
Users can change how the modular laptop manages its hardware
FreeCAD Project Association formed
New legal non-profit association set up in Belgium
LibreOffice 7.3 Community now out
New features in the latest version of the office suite
Distro watch
What’s down the side of the free software sofa?
TRUST ME?
Matt Yonkovit Head of open source strategy and
MEASURE THE PULSE
Jon Masters has been involved with Linux for
KERNEL WATCH
Jon Masters summarises the latest happenings in the Linux kernel, because someone has to…
Answers
Got a burning question about open source or the kernel? Whatever your level, email it to lxf.answers@futurenet.com
Mailserver
WRITE TO US Do you have a burning
Hot Picks
THE BEST NEW OPEN SOURCE SOFTWARE ON THE PLANET
REVIEWS
Intel Core i5 12400
It’s the new Q6600, Dave James’ greatest compliment ever!
ArchLabs 2022.02.12
Mayank Sharma has a soft spot for new Arch-based distros, and this time around he’s found one that tickles his fancy
UBOS Linux
Mechanisms for simplifying the installation of popular web tools are like snake oil. Or so thought Mayank Sharma, until he came across UBOS…
Slackware 15.0
He’s no masochist… until there’s a new Slackware release, when Mayank Sharma puts himself through all kinds of pains to relive the good ol’ days
Valve Steam Deck
Our PCGamer friends got a hands-on preview hardware of the Steam Deck before the big launch at the end of February. Here’s what they learnt…
Ubuntu alternatives
WE COMPARE TONS OF STUFF SO YOU DON’T HAVE TO!
The Verdict
Ubuntu alternatives
BUILD THE KERNEL
The kernel is what makes Linux tick. Jonni Bidwell is happy to get his hands dirty and help you tune up those ticks…
Grasp the kernel basics
Just what is a kernel and why is it telling my computer what to do?
Compiling a kernel
Get straight to business and build your own Ubuntu-esque kernel
Kernel minification
Perfection is reached not when there’s nothing left to add, but rather when there’s nothing left to take away…
Popular patches
Forget trawling through configs – use a pre-rolled patchset to set the rules
Pi USER
The maker’s Coolest Projects goes Global!
The once US-centric event is opening its doors to makers from around the world. Yes, that means you!
FLIRC Pi Zero Case
Les Pounder keeps his cool with a case for his beloved Raspberry Pi Zero 2 W. Will it beat the heat of a benchmark?
Assemble a Micro Dot pHAT news ticker
Les Pounder builds his own homage to the famous Times Square news ticker, this time scaled for ants!
Create your own Chromecast device
Mats Tage Axelsson takes you on a tour of the Linux tools that will enable you to stream media between your laptop and other devices
IN-DEPTH
LINUX FROM SCRATCH
Like to get your hands dirty? Aaron Peters is your man, as we get deep into building Linux
TUTORIALS
Rapid fuzzy finder
Shashank Sharma isn’t one for magic, but he’s not averse to using the Accio spell or its computing incarnation, fuzzy finder, to find things quickly
Store and search your research notes
Always on the look-out for organisational tools, Nick Peers reveals how to bring all your research materials and notes into a single convenient space
May the forth be with the Jupiter Ace
Les Pounder nips back to the early 80s and pays homage to a home computer that sold less than 6,000 units, despite its go-faster stripes!
Make your home as smart as possible
Matt Holder, who’s a bit of a clever-clogs himself, investigates the usage of Home Assistant to make your home as smart as it can be
Publishing your own slick ebooks
Seeking to chalk up a best seller, Michael Reed goes further into the intricacies of publishing a book from his Ubuntu desktop
Create your own virtual classroom
David Rutland moodles along with classroom management software on the LXF virtual private server and starts his own online course
TOP OF THE FOSS
CHARITABLE CHARACTERS
The Emmabuntüs collective enlightens Jonni Bidwell as to its kind and open source efforts
CODING ACADEMY
Working with binary tree data structures
Mihalis Tsoukalos explains how to construct and use binary trees for faster searches and easier relationships (of the data sort, mind)
Interact with your 3D game environment
Enhance your gaming world by adding collision detection and custom objects, as Andrew Smith throws barrels at you
Chat
X
Pocketmags Support