pfy.ch

I recently learned about the W3C spec for Webmentions!

The W3C describes the spec as:

[…] a simple way to notify any URL when you mention it on your site. From the receiver’s perspective, it’s a way to request notifications when other sites mention it.

I’ve always wanted to write an implementation for a spec and this one did not seem extremely complex, plus I had a long weekend coming up, so I thought it’d make for a fun challenge!

The Plan

  1. Write a serverless implementation of webmentions.
  2. It should use AWS CloudFormation to stand everything up automatically
  3. It should be cheap to run indefinitely
  4. It should be testable locally

Breaking a receiver down, there are really only 3 endpoints required:

  1. Submit a webmention
  2. Get the status of a webmention
  3. Query webmentions

The basic flow of our function to receive webmentions is as follows:

sequenceDiagram
    participant User
    participant Lambda
    participant DynamoDB
    
    User ->> Lambda: Send webmention request
    Lambda ->> Lambda: Confirm valid request
    Lambda ->> DynamoDB: Create status object
    DynamoDB ->> Lambda: Return status object identifier
    Lambda ->> User: Return 201 & correct headers
    
    Lambda ->> Lambda: Confirm mention exists
    Lambda ->> DynamoDB: Create webmention object
    Lambda ->> DynamoDB: Update status object
    
    User ->> Lambda: Query webmention
    Lambda ->> DynamoDB: Request webmention
    DynamoDB ->> Lambda: Return webmention
    Lambda ->> User: Return webmention
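
As a rough sketch of the "Confirm valid request" step in the flow above: the Webmention spec asks receivers to check that source and target are valid URLs with a supported scheme, and to reject requests where the two are identical. The helper name and return shape below are my own assumptions for illustration, not code from the project.

```typescript
// Hypothetical helper: name and result shape are assumptions for illustration.
export type ValidationResult =
  | { ok: true; source: URL; target: URL }
  | { ok: false; reason: string };

// Only http and https are treated as supported schemes in this sketch.
const isHttp = (url: URL): boolean =>
  url.protocol === 'http:' || url.protocol === 'https:';

export const validateWebmention = (
  source: unknown,
  target: unknown,
): ValidationResult => {
  if (typeof source !== 'string' || typeof target !== 'string') {
    return { ok: false, reason: 'source and target are required' };
  }

  let sourceUrl: URL;
  let targetUrl: URL;
  try {
    // The URL constructor throws on anything that is not an absolute URL.
    sourceUrl = new URL(source);
    targetUrl = new URL(target);
  } catch {
    return { ok: false, reason: 'source and target must be valid URLs' };
  }

  if (!isHttp(sourceUrl) || !isHttp(targetUrl)) {
    return { ok: false, reason: 'only http and https URLs are supported' };
  }

  // A page cannot send a webmention to itself.
  if (sourceUrl.href === targetUrl.href) {
    return { ok: false, reason: 'source and target must differ' };
  }

  return { ok: true, source: sourceUrl, target: targetUrl };
};
```

A real receiver would probably also check that target is a page on its own site before accepting the request.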

Cloud?! Lambda?! What’s the cost?!

I’ve chosen to use DynamoDB & Lambda on AWS since, when used correctly, they’re extremely cheap and efficient. Both are charged based on usage, so if you’re running a small site like mine you can expect the bill to come out at basically nothing. Let’s do some quick math.

DynamoDB

DynamoDB charges based on read & write units, with queries against an index costing 1 read unit and writes costing 1 unit per index on the table. The downside of Dynamo is that if you expect your data structure to be complex or have intricate relationships, it can end up costing much, much more than a standard database.

Amazon currently has the cost of read and write units priced at the following:

|            | Cost              |
| ---------- | ----------------- |
| Write Unit | $1.4232 / Million |
| Read Unit  | $0.2846 / Million |

Then if we break down how many units we use per function in our application:

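| Function                 | Read Units Used | Write Units Used |
| ------------------------ | --------------- | ---------------- |
| Initial Request          | 0               | 1                |
| Query Request Status     | 1               | 0                |
| Create Webmention        | 0               | 2 [1]            |
| Query URL for webmention | 1               | 0                |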

We can calculate the cost as the following:

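| Scenario          | Read units used | Write units used | Cost       |
| ----------------- | --------------- | ---------------- | ---------- |
| 1000 Mentions [2] | 1000            | 3000             | $0.0045542 |
| 1000 Page-views   | 1000            | 0                | $0.0002846 |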
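
To make the arithmetic behind that table explicit, here is a small sketch that multiplies unit counts by the per-unit prices quoted above. The function name is hypothetical; the prices and unit counts are the ones from this post.

```typescript
// Price per single unit, derived from the per-million prices quoted above.
const WRITE_UNIT_COST = 1.4232 / 1e6;
const READ_UNIT_COST = 0.2846 / 1e6;

// Hypothetical helper: cost in USD for a given number of read and write units.
export const dynamoCost = (readUnits: number, writeUnits: number): number =>
  readUnits * READ_UNIT_COST + writeUnits * WRITE_UNIT_COST;

// 1000 mentions: one status read and three writes each.
console.log(dynamoCost(1000, 3000).toFixed(7)); // 0.0045542

// 1000 page views: one read each.
console.log(dynamoCost(1000, 0).toFixed(7)); // 0.0002846
```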

Lambda

AWS Lambda is a serverless compute platform where you can just run code. Amazon will provision servers globally for you, and these servers will only run while your function is active.

We run our Lambda with 512 MB of memory [3] and a timeout of 15 seconds. AWS charges per million requests, and per second that the function runs, at a rate based on the memory allocated.

|         | Cost                   |
| ------- | ---------------------- |
| Request | $0.20 / Million        |
| Memory  | $0.0000000083 / second |

Assuming the worst case, where our function runs for the full 15 seconds, we’re paying $0.0000003245 per execution. In reality the cost is lower than this, especially when querying for webmentions & status objects, since those requests finish in about a second or less.

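|                        | Requests made | Worst case exec | Cost        |
| ---------------------- | ------------- | --------------- | ----------- |
| Creating 1000 Mentions | 1000          | 15 seconds      | $0.00032449 |
| Querying 1000 Mentions | 1000          | 1 second        | $0.0002083  |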

Total stack cost

Combining both our DynamoDB & Lambda costs, we can figure out at what point our bill will become “excessive” (more than a dollar).

|                        | Dynamo     | Lambda      | Total       |
| ---------------------- | ---------- | ----------- | ----------- |
| Creating 1000 Mentions | $0.0045542 | $0.00032449 | $0.00487869 |
| Querying 1000 Mentions | $0.0002846 | $0.0002083  | $0.0004929  |

This means that it’ll take around 204,973 webmention creation requests before our bill exceeds a dollar! We can use AWS Budgets & API Gateway rate limiting to stop execution before we hit this, which I’d highly recommend. Nothing is worse than an unexpected bill at the end of the month.
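
The same break-even arithmetic can be applied to the query row. A minimal sketch, with a hypothetical helper name, that divides a budget by the per-request cost taken from the total column above:

```typescript
// Hypothetical helper: how many requests fit into a budget (in USD),
// given the combined cost of 1000 requests.
export const requestsUntilBudget = (
  costPerThousand: number,
  budget = 1,
): number => Math.floor(budget / (costPerThousand / 1000));

// Querying 1000 mentions costs $0.0004929 in total.
console.log(requestsUntilBudget(0.0004929)); // 2028809
```

So roughly two million queries fit into a dollar, under the same worst-case assumptions as the table.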

Implementation

Our final stack should consist of the following:

Tables

The status table is extremely simple, since it only really needs to store basic information temporarily. It would be possible to add a TTL to records, however for now I plan to manually clean this out over time.

The table consists of a single index, id, which will be used by other services to query the status of webmentions.

statusTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: webmentions-status-table 
    AttributeDefinitions:
      - AttributeName: id
        AttributeType: S
    KeySchema:
      - AttributeName: id
        KeyType: HASH
    BillingMode: PAY_PER_REQUEST
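
If I did want to use a TTL instead of cleaning up manually, DynamoDB expects a Number attribute holding a Unix timestamp in seconds, and the table would also need a TimeToLiveSpecification pointing at that attribute. Below is a rough sketch of building the put parameters for a status record. Only id and the table name come from the template above; every other attribute name, the status values and the 30 day default are assumptions for illustration.

```typescript
// Hypothetical item shape: only `id` is defined by the table above.
export interface StatusItem {
  id: string;
  source: string;
  target: string;
  status: 'pending' | 'accepted' | 'rejected';
  expiresAt: number; // Unix epoch seconds, the format DynamoDB TTL expects
}

const SECONDS_PER_DAY = 24 * 60 * 60;

// Builds parameters shaped for a DynamoDB document client `put` call.
export const buildStatusPut = (
  id: string,
  source: string,
  target: string,
  now: Date,
  ttlDays = 30, // assumed retention period
): { TableName: string; Item: StatusItem } => ({
  TableName: 'webmentions-status-table',
  Item: {
    id,
    source,
    target,
    status: 'pending',
    expiresAt: Math.floor(now.getTime() / 1000) + ttlDays * SECONDS_PER_DAY,
  },
});
```

Passing the current time in as an argument keeps the function easy to test locally, which was one of the goals of the plan.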

The mention table has a few extra definitions, most importantly its target global secondary index (GSI).

We use id as the main unique index, however we do not query it. Instead, our functions query based on the webmention target. By indexing this field, we can fetch all the Webmentions for a page with a single query instead of scanning the whole table.

mentionTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: webmentions-mention-table 
    AttributeDefinitions:
      - AttributeName: id
        AttributeType: S
      - AttributeName: target
        AttributeType: S
    KeySchema:
      - AttributeName: id
        KeyType: HASH
    GlobalSecondaryIndexes:
      - IndexName: target-index
        KeySchema:
          - AttributeName: target
            KeyType: HASH
        Projection:
          ProjectionType: ALL
    BillingMode: PAY_PER_REQUEST
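
To illustrate how the GSI gets used, here is a sketch of the query parameters for fetching every webmention for a page. The table and index names come from the template above; the helper name is hypothetical. The object is shaped for a DynamoDB document client `query` call.

```typescript
// Hypothetical helper: builds query parameters for the target-index GSI.
export const buildMentionsQuery = (target: string) => ({
  TableName: 'webmentions-mention-table',
  IndexName: 'target-index',
  // Placeholders are used for both the attribute name and its value,
  // which sidesteps any clash with DynamoDB reserved words.
  KeyConditionExpression: '#target = :target',
  ExpressionAttributeNames: { '#target': 'target' },
  ExpressionAttributeValues: { ':target': target },
});
```

One thing to keep in mind is that reads from a global secondary index are eventually consistent, so a freshly created webmention may take a moment to show up in query results.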

Lambda

We then define our Lambda & its corresponding API Gateway. This is pretty chunky since we also need to give our function permission to access the tables we’ve just defined.

  ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref Stage
      Cors: "'*'"
      MethodSettings:
        - ResourcePath: "/*"
          HttpMethod: "*"
          ThrottlingRateLimit: 100
          ThrottlingBurstLimit: 500

  Webmention:
    Type: AWS::Serverless::Function
    Properties:
      MemorySize: 512
      Timeout: 15
      CodeUri: .build
      Handler: handler.handler
      Runtime: nodejs14.x
      Architectures:
        - x86_64
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - dynamodb:Query
                - dynamodb:Scan
                - dynamodb:GetItem
                - dynamodb:PutItem
                - dynamodb:UpdateItem
                - dynamodb:DeleteItem
                - dynamodb:BatchGetItem
              Resource:
                - !Sub ${statusTable.Arn}
                - !Sub ${statusTable.Arn}/index/*
                - !Sub ${mentionTable.Arn}
                - !Sub ${mentionTable.Arn}/index/*
      Events:
        Root:
          Type: Api
          Properties:
            Path: /web-mentions
            Method: any
            RestApiId:
              Ref: ApiGatewayApi
        Sub:
          Type: Api
          Properties:
            Path: /web-mentions/{any+}
            Method: any
            RestApiId:
              Ref: ApiGatewayApi

A minor complaint with AWS SAM is that its built-in esbuild bundler for functions does not behave as expected. To get around this, we bundle manually ourselves.

Luckily esbuild works pretty much on its own and doesn’t require very much external config like webpack or other bundlers.

esbuild src/handler.ts \
  --target=es2020 \
  --platform=node \
  --external:aws-sdk \
  --sourcemap=linked \
  --outfile=.build/handler.js \
  --bundle

Express

Getting express to work on a Lambda is super easy once you know what you’re doing [4].

We’ve set up our API Gateway to forward requests at /web-mentions or /web-mentions/{any+} through to our Lambda.

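import express, { Express, NextFunction, Request, Response } from 'express';
import cors from 'cors';
import {
  APIGatewayEventRequestContext,
  APIGatewayProxyEvent,
} from 'aws-lambda';
import serverless from 'serverless-http';

// Express request extended with the API Gateway request context.
export type RequestContext = Request & {
  context: APIGatewayEventRequestContext;
};

// Assumed CORS settings (not shown in the original snippet); this mirrors
// the `Cors: "'*'"` setting on the API Gateway above.
const corsOptions: cors.CorsOptions = { origin: '*' };

export const createApp = (): Express => {
  const app = express();
  // Webmention requests arrive form-encoded; JSON is accepted as well.
  app.use(express.urlencoded({ extended: true }));
  app.use(express.json());
  app.use(cors(corsOptions));
  app.options('*', cors(corsOptions));
  // Log every incoming request.
  app.use((req: RequestContext, res: Response, next: NextFunction) => {
    console.log(`Request: ${req.method} ${req.originalUrl}`);
    next();
  });

  return app;
};

// Wrap the express app so it can be invoked as a Lambda handler, copying the
// API Gateway request context onto each express request.
export const createHandler = (app: Express) =>
  serverless(app, {
    request(request: RequestContext, event: APIGatewayProxyEvent) {
      request.context = event.requestContext;
    },
  });

const app = createApp();
export const handler = createHandler(app);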

Once we’ve configured the app we can now use express as we normally would!

app.post(
  '/web-mentions',
  async (request: RequestContext, response: Response) => {},
);

app.get(
  '/web-mentions/status/:id',
  async (request: RequestContext, response: Response) => {},
);

app.post(
  '/web-mentions/query',
  async (request: RequestContext, response: Response) => {},
);
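
The handlers above are left empty on purpose. As one illustration of what the POST handler's follow-up work might involve, here is a rough sketch of the "Confirm mention exists" step from the flow diagram: once the source page has been fetched, check whether its HTML links to the target. The helper name is hypothetical, and a regex scan is only an approximation; a real implementation would likely use a proper HTML parser.

```typescript
// Hypothetical helper: returns true when the HTML contains an href or src
// attribute whose value exactly matches the target URL.
export const sourceMentionsTarget = (html: string, target: string): boolean => {
  // The pattern is created per call so the global regex state is never shared.
  const attributePattern = /\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;

  let match: RegExpExecArray | null;
  while ((match = attributePattern.exec(html)) !== null) {
    // Group 1 holds double-quoted values, group 2 single-quoted ones.
    const value = match[1] !== undefined ? match[1] : match[2];
    if (value === target) {
      return true;
    }
  }

  return false;
};
```

This sketch does not decode HTML entities or handle unquoted attributes, so a URL written with `&amp;` in the markup would not match.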

Deploying

AWS SAM provides a single-command deploy. I’ve wrapped both build and deploy up into scripts in the project’s package.json.

export REGION='ap-southeast-2' # Our target AWS region
export PROFILE='pfych-aws' # Our target AWS profile as configured in AWS-CLI
npm run build
npm run deploy:dev

This process takes a few minutes on the first deploy, but subsequent deploys should take under a minute as long as you don’t change the SAM template.

In closing

It was a fun challenge to implement Webmentions in a serverless manner. It wasn’t extremely difficult since the spec is well-defined. I did skip over webmention updates & deletions, but I’ll implement them at another time. Serverless definitely has warts & almost all of them are in its tooling & documentation. I hope that with time this can improve.

My next step is to implement sending Webmentions from my CMS when I make new posts, but I haven’t had too much time to work on personal projects recently.

The project is available on GitHub. It is currently deployed with the Serverless Framework, but there is a pending PR to use AWS SAM instead since it has fewer third-party dependencies.


  1. We have 2 indexes, so two write units are consumed

  2. Assuming the request status is queried once

  3. We could potentially run this with much less memory if we wanted to

  4. Classic case of nothing being documented well 🥲


© 2024 Pfych 🏳️‍⚧️