What is a Webhook?
Webhooks are an incredibly useful and resource-light way to implement event reactions. They provide a mechanism whereby a server-side application can notify a client-side application when a new event (that the client-side application might be interested in) has occurred on the server.
Webhooks are also sometimes referred to as “Reverse APIs”. With an API, the client-side application calls (consumes) the server-side application. With a webhook the direction is reversed: the server-side application calls (consumes) the webhook, which is the endpoint URL provided by the client-side application.
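To make the "reverse" direction concrete, here is a minimal sketch of what the sending side might do when an event occurs. The function names build_webhook_options() and notify_subscriber(), the event fields and the subscriber URL are all made up for illustration; real providers will differ in payload shape, headers and retry behaviour.

```php
<?php
// Hypothetical sender: function names, event shape and URL are illustrative only.
function build_webhook_options(array $event): array
{
    return ["http" => [
        "method" => "POST",
        "header" => ["Content-Type: application/json"],
        "content" => json_encode($event),
        "timeout" => 5, // don't let a slow subscriber hold the sender up
    ]];
}

// POST the event to the endpoint URL that the client-side application registered.
function notify_subscriber(string $url, array $event): bool
{
    $context = stream_context_create(build_webhook_options($event));
    // @ suppresses the warning on failure; false means the call did not succeed
    return @file_get_contents($url, false, $context) !== false;
}

$event = ["event" => "commit.added", "branch" => "main"];
$options = build_webhook_options($event);
// notify_subscriber("https://client.example/hooks/incoming", $event) would send it
```

The point of the sketch is only the direction of the call: the server holds the client's URL and makes the HTTP request itself when something happens.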
Handling Incoming Webhooks in PHP
An increasing number of applications now offer webhooks as an integration, often in addition to an API. The classic example, familiar to most developers, is GitHub's webhooks, which can notify your other systems, such as CI tooling, that a new commit has been added to a branch. If you imagine how many repositories exist on GitHub, and how many other systems react to changes on each repository ... there's a reason they are excellent with webhooks! Whether it's your source control, updates from your IoT sensors, or an event coming from another component in your application, I have some Opinions (TM) about handling webhooks, so I thought I'd write them down and include some code as well, since I think this is an area that many applications will need to work with.
Receive and Respond
The majority of problems I've seen or created when working with incoming webhooks come from trying to do too much in a synchronous way, doing all the processing as the hook arrives. This leads to issues for two reasons:
- the incoming web connection stays open while the processing is taking place. There are a limited number of web connections, so once we run out, the next connection has to wait, making the system slower ... you get the idea. This sort of thing is what makes the "hockey stick" graph shapes we see on the web, where things get slower and then make everything else slower and it all snowballs
- if something goes wrong in the middle, you have no way of retrying that piece of data
So my advice is to immediately store and then acknowledge incoming data, then process it asynchronously. The best solution here is to use a queue, but if it's not straightforward to add new dependencies to your application then you can absolutely start off with a simple database. Store a record for each incoming webhook, with some sort of unique identifier, a timestamp of when it arrived, probably some status field to say if it's been processed, and the whole webhook data payload as you received it. It's probably also helpful to put some of the key fields from the incoming payload, such as account number or event type, into their own columns, depending on what sort of data you're handling.
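As a rough sketch of the "simple database" option, the record described above might look something like the following. The table name incoming_webhooks, the column names, the store_webhook() helper and the event_type payload key are all assumptions made for illustration, and SQLite via PDO (which needs the pdo_sqlite extension) is used only so the example runs without a database server; the same idea should carry over to MySQL.

```php
<?php
// Hypothetical schema: names are illustrative, not from any particular project.
$db = new PDO("sqlite::memory:");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE incoming_webhooks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique identifier
    received_at INTEGER NOT NULL,          -- unix timestamp of arrival
    status TEXT NOT NULL DEFAULT 'new',    -- new / processing / done
    event_type TEXT,                       -- key field copied out of the payload
    payload TEXT NOT NULL                  -- the whole body exactly as received
)");

// Store one webhook and return the identifier the database assigned to it.
function store_webhook(PDO $db, string $rawBody): int
{
    $decoded = json_decode($rawBody, true);
    // Assumes the sender puts an "event_type" key in its JSON; adjust to your payload.
    $eventType = is_array($decoded) ? ($decoded["event_type"] ?? null) : null;

    $stmt = $db->prepare(
        "INSERT INTO incoming_webhooks (received_at, event_type, payload)
         VALUES (:received_at, :event_type, :payload)"
    );
    $stmt->execute([
        ":received_at" => time(),
        ":event_type" => $eventType,
        ":payload" => $rawBody,
    ]);
    return (int) $db->lastInsertId();
}

$id = store_webhook($db, '{"event_type": "push", "branch": "main"}');
```

Keeping the raw body in its own column means nothing is lost if the payload format changes, while the extracted event_type column makes it cheap to filter records later.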
Quick Code Example
Here's a quick piece of code I use in one of my talks on this topic, using PHP to receive an incoming webhook and store it to CouchDB. Adapt as required if you're not using CouchDB; this would work perfectly well with MySQL too, it's just that the example comes from a project that uses CouchDB.
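<?php
// Buffer the debug output so the status code can still be changed later
ob_start();
// Use the JSON body if there is one, otherwise fall back to form fields
if ($json = json_decode(file_get_contents("php://input"), true)) {
    print_r($json);
    $data = $json;
} else {
    print_r($_POST);
    $data = $_POST;
}
echo "Saving data ...\n";
$url = "http://localhost:5984/incoming";
// Extra fields stored alongside the payload
$meta = [
    "received" => time(),
    "status" => "new",
    "agent" => $_SERVER['HTTP_USER_AGENT'] ?? null, // header may be absent
];
$options = ["http" => [
    "method" => "POST",
    "header" => ["Content-Type: application/json"],
    "content" => json_encode(["data" => $data, "meta" => $meta]),
]];
$context = stream_context_create($options);
$response = file_get_contents($url, false, $context);
// If storing failed, report an error rather than acknowledging the webhook
if ($response === false) {
    http_response_code(500);
}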
This script starts by trying to guess if we have incoming JSON data or an ordinary form post - and either way creates a $data array which is the incoming payload of the webhook. It also outputs this for debugging purposes, which helps to see what arrived. If there is any uncertainty about the reliability of the data format, or if you are integrating with a third-party system, you might also want to store the actual contents of file_get_contents("php://input") verbatim in case they are needed for debugging or debate about who broke what!
With the data in hand, this script sets up a $meta variable as well, with the additional fields to store (in this case, a received timestamp, a status and the user agent). The database itself will give our record a unique identifier. Finally, the POST request that is set up in the $options array is how we insert the data into our database.
It isn't called out explicitly here, but when a PHP script completes successfully, it will return a 200 OK response. Note that there are no additional steps here: no validation or checking of fields, and no fetching of extra data. Just accept, and once it's successfully stored, return a "Thanks!" (or rather, a 200 OK status).
Planning for Processing
With this data in place, you can then process the webhooks asynchronously. If you used a queue rather than the database, then you'll set up a few workers to process the incoming data. With a solution like the one above, I'd recommend a cron job to pick up unprocessed jobs and actually process the data. You can always webhook back when they are finished if you need to offer notifications of whether the data was successfully received and processed. One more word of advice here: put a limit on how many unprocessed jobs are picked up, and mark them as "being processed". If the system is under a lot of load then you'll want to run more than one of these processes, and they can only share the work safely if each one picks up a few waiting jobs and leaves the rest for the others!
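The "limit and mark" advice could be sketched roughly as below. This is only an illustration under assumptions: the incoming_webhooks table, its status values and the claim_jobs() helper are hypothetical, and an in-memory SQLite database stands in for the real one so the example runs on its own.

```php
<?php
// Hypothetical worker, meant to be run from cron; table and column names are illustrative.
function claim_jobs(PDO $db, int $limit): array
{
    $db->beginTransaction();
    // Pick up a bounded number of the oldest unprocessed records.
    $select = $db->prepare(
        "SELECT id, payload FROM incoming_webhooks
         WHERE status = 'new' ORDER BY id LIMIT :limit"
    );
    $select->bindValue(":limit", $limit, PDO::PARAM_INT);
    $select->execute();
    $jobs = $select->fetchAll(PDO::FETCH_ASSOC);

    // Mark each one as being processed; the status check means a record
    // that another worker grabbed first is skipped here.
    $mark = $db->prepare(
        "UPDATE incoming_webhooks SET status = 'processing'
         WHERE id = :id AND status = 'new'"
    );
    $claimed = [];
    foreach ($jobs as $job) {
        $mark->execute([":id" => $job["id"]]);
        if ($mark->rowCount() === 1) {
            $claimed[] = $job;
        }
    }
    $db->commit();
    return $claimed;
}

// Demo data so the sketch runs on its own.
$db = new PDO("sqlite::memory:");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE incoming_webhooks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    status TEXT NOT NULL DEFAULT 'new',
    payload TEXT NOT NULL
)");
$insert = $db->prepare("INSERT INTO incoming_webhooks (payload) VALUES (:payload)");
for ($i = 1; $i <= 5; $i++) {
    $insert->execute([":payload" => json_encode(["n" => $i])]);
}

$done = $db->prepare("UPDATE incoming_webhooks SET status = 'done' WHERE id = :id");
foreach (claim_jobs($db, 3) as $job) {
    // ... real processing of json_decode($job["payload"], true) would go here ...
    $done->execute([":id" => $job["id"]]);
}
```

With five waiting records and a limit of three, this run handles three and leaves two for the next run or for another worker. On a database with row-level locking you would probably also want the select to lock the rows it reads; that detail is left out of the sketch.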
Hopefully the example here helps to illustrate the point I tried to make about incoming webhooks. For a scalable system, each part of the system should be as independent as possible, and the tactics outlined here have worked well for me in the past - hopefully they're useful to you too.