Categories
Uncategorized

WordPress CMS : Developer Notes

Working presently to migrate a large (2000+ pages, 10gb images) travel blog from a bespoke PHP CMS to WordPress.

Over last 15 years task of developing, updating and maintaining the handwritten CMS has become burdensome and time consuming.

There is a dependency on me as developer being only person who knows architecture / codebase.

Migrating to WordPress makes sense for a number of reasons – developers, administrators and content creators worldwide are familiar with the platform.

There is a healthy marketplace for commercial and freely available plugins, themes and tooling along with a choice of hosting providers.

WordPress is the internet’s most widely deployed CMS with estimated 65% market share

A few developer notes on implementation / architecture –

Posts / Pages

Aren’t these the same thing – content? Isn’t WordPress now a generalised content management platform rather than being specific to blogs?

Posts are displayed chronologically having URLs in form /<year>/<month>/<day>/<post-name> but otherwise posts and pages have the same a data model, right?

Shouldn’t tables post and page be merged into a new table “wp_content” – with content-type field to distinguish different page types?

Content Tree / URL Index

WordPress organises content hierarchically using parent – child database relation with each tier representing URL segments /<tier-1>/<tier-2>/…

This makes searching for content by URL relatively inefficient. Large sites might also encounter wp_posts.page_name namespace collisions

Rendering a content tree requires a recursive SQL or application code algorithm to “walk the tree”.

Why not decouple wp_posts.page_name and store as fully qualified URL in an index table?

                      Table "public.url_index"
   Column    |          Type          |  
-------------+------------------------+-----------+-----
 content_id  | integer                | 
 url         | character varying(256) | 

-- SQL to retrieve all pages under /blog/2023
select
 c.id,
 c.title,
 c.content,
 i.url
 from wp_content c, wp_url_index i
where c.id = i.content_id
and i.url like '/blog/2023%'
order by i.url asc, c.id asc

Parent / child relation is no longer required – by publishing a page under a specific URL parent hierarchy is implicit and can be determined simply be traversing URL segments.

Having a URL index table makes retrieving one or groups of pages very low cost and efficient as a b-tree index can be added to url_index.url

Database Optimisation

To my mind WordPress core requires only three tables and fields could be simplified significantly:

wp_content: <id>,<title>,<content>,<author-id>,<created-date>,<last-updated>, <content-type>, <version-id>
wp_metadata: <content-id>,<metadata-key>, <metadata-value>, <metadata-type>
wp_url_index: <content-id>,<url>

Metadata is a generalised key/value store linked to content by id.

Any structured metadata relating to a page can be maintained – Plain text (integers, floats (albeit as text), varchar) JSON, serialised PHP objects even JPEG encoded images.

Ok, lets consider a query for posts having extended metadata representing price from – to field:

select 
 mk.content_id,
 mvf.metadata_value,
 mvc.metadata_value,
from 
wp_metadata_key mk,
wp_metadata_value_float mvf,
wp_metadata_value_varchar mvc,
where mk.metadata-type = "price"
and mk.id = mvf.metadata-key-id 
and mk.id = mvc.metadata-key-id 
and mvf.metadata-value BETWEEN "30" AND "50"

Metadata in this scenario has container for varchar() but also one for float field enabling numerical queries.

Version ID

By maintaining version id field alongside content or metadata application can automatically determine appropriate implementation to retrieve, render or update.

Version ID beyond describing WordPress version number could also specify this metadata is stored as format serialised JSON or this is a serialised PHP 5 object – over time as application evolves different implementations can be supported even concurrently.

Conclusion

We appreciate making changes to any codebase having grown organically through time is not trivial – especially when a large market of 3rd party plugins / tools are based on current implementation and would potentially need to mirror changes in their own new versions.

Planning a smooth migration path is not an easy undertaking. But on the other side, optimisations at architecture level bring significant benefit increasing application performance, lowering computational resource costs, make application code easier to work with and allow the platform to scale efficiently.