Derick Rethans - Xdebug Update: August 2024 (3.9.2024, 13:35 UTC)

Xdebug Update: August 2024

Tuesday, September 3rd 2024, 14:35 BST

London, UK

In this monthly update I explain what happened with Xdebug development.

GitHub and Pro/Business supporters will get it earlier, around the first of each month.

On GitHub Sponsors, I am currently 32% towards my $2,500 per month goal, which is set to allow continued maintenance of Xdebug. This is again less than last month.

If you are leading a team or company, then it is also possible to support Xdebug through a subscription.

In the last month, I spent around 8 hours on Xdebug, with 20 hours funded. I also spent 8.5 hours on the Native Path Mapping project.

PHP 8.4

The large change for the exit-as-a-function proposal has been merged into PHP, which means I had to spend some time finishing and tidying up a Pull Request that Tim Düsterhus had provided.

I also improved Xdebug's CI so that it can run as a separate user.

Right now, there don't seem to be any outstanding issues with PHP 8.4, so I will soon create another alpha (or beta) release so that you can try out Xdebug with the latest beta versions of PHP 8.4.

Native Xdebug Path Mapping

I continued with the Native Xdebug Path Mapping project.

This is a separately funded project, so I don't classify the hours that I worked on this as "Xdebug Hours". I also publish a separate report on the project page.

Again, I am including it here:

I continued with the parser to parse the path mapping files. The parsing is now finished, although I do still need to modify how the parsed information is stored. Right now, the data structures are not optimised to be able to do line mapping as well.

Because the parser parses user input, it is important that the parser is robust. It should be able to handle correctly formatted files, but also files with errors.

It is not always possible to come up with all the failure situations by thinking, and therefore a common technique is to use a fuzzer. For PHP there is Infection PHP for mutation testing for example. For C, and C++, a commonly used tool is AFL++. This provides a compiler wrapper and a run-time to fuzz the input to your application. You first provide a template, which it then modifies to try to break your code.

The template that I used in this case was a minimal map file:

remote_prefix: /usr/local/www
local_prefix: /home/derick/project
/projects/example.php:5-17 = /example.php:8
/example.php:5-17 = /example.php:8-20
/example.php:17 = /example.php:20
/projects/php-web/ = /php-web/

In addition to the template you also need to provide a shim — the program that in my case takes the argument given to it and then parses that as a file. The AFL++ tool's compiler wrapper adds some magic to it to be able to catch errors.

When you then run the fuzzer, such as with:

AFL_SKIP_CPUFREQ=1 afl-fuzz -b 7 -i fuzz-seeds -o fuzz-output -- ./afl-test @@

It then runs your program with your template files (in fuzz-seeds in my example), and then also with loads of variants.

The fuzzer found a few errors in my parser, which ended up crashing it. One such example that I hadn't thought of is a line starting with a =:

remote_prefix: /usr/local/www
local_prefix: /home/derick/project
=/example.php:42-5 = /example.php

You can find the other cases it found in a dedicated test file.
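
As a rough illustration of what "robust" means here, the following Python sketch parses lines shaped like the template above and turns anything malformed into one well-defined error instead of a crash. The grammar, the MapSyntaxError name, and the helper functions are assumptions made up for this illustration. Xdebug's real parser is written in C, and AFL++ is coverage-guided rather than the blind random mutation loop used here.

```python
import random
import re


class MapSyntaxError(ValueError):
    """Hypothetical error type for a line that does not fit the assumed grammar."""


# Assumed shape of one side of a rule: /path, /path:N or /path:N-M
_SIDE = re.compile(r"^(?P<path>/[^:=\s]*)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")
_ALPHABET = "=:-/ .0123456789abcxyz\t"


def parse_side(text):
    """Parse one side of a rule into (path, first_line, last_line)."""
    match = _SIDE.match(text.strip())
    if match is None:
        raise MapSyntaxError(f"cannot parse path spec: {text!r}")
    start = int(match["start"]) if match["start"] else None
    end = int(match["end"]) if match["end"] else start
    return match["path"], start, end


def parse_map_line(line):
    """Return a ("prefix", ...) or ("rule", ...) tuple, or raise MapSyntaxError."""
    line = line.strip()
    for key in ("remote_prefix", "local_prefix"):
        if line.startswith(key + ":"):
            return ("prefix", key, line[len(key) + 1:].strip())
    remote, separator, local = line.partition("=")
    if not separator:
        raise MapSyntaxError(f"missing '=' in rule: {line!r}")
    return ("rule", parse_side(remote), parse_side(local))


def mutate(line, rng):
    """Apply one to three random character edits to a seed line."""
    chars = list(line)
    for _ in range(rng.randint(1, 3)):
        pos = rng.randrange(len(chars) + 1)
        op = rng.choice(("insert", "delete", "replace"))
        if op == "insert" or pos == len(chars):
            chars.insert(pos, rng.choice(_ALPHABET))
        elif op == "delete":
            del chars[pos]
        else:
            chars[pos] = rng.choice(_ALPHABET)
    return "".join(chars)


def fuzz(seed_lines, rounds=2000, rng_seed=1):
    """Feed mutated seeds to the parser and count the rejected inputs.

    Any exception other than MapSyntaxError escapes, which is the
    equivalent of the crash a fuzzer is looking for.
    """
    rng = random.Random(rng_seed)
    rejected = 0
    for _ in range(rounds):
        try:
            parse_map_line(mutate(rng.choice(seed_lines), rng))
        except MapSyntaxError:
            rejected += 1
    return rejected
```

With this sketch, the line starting with a = is rejected with a MapSyntaxError because the left-hand side is empty, while the six lines of the template parse without error.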

In September I hope to finish the parser (by storing things more efficiently internally) as well as creating APIs to do the mapping.

Xdebug Videos

I have created no new videos since last month, but you can see all

Truncated by Planet PHP, read more at the original (another 1990 bytes)


Rob Allen - Prevent the Docker container from taking 10 seconds to stop (3.9.2024, 10:00 UTC)

For one project that I'm working on, the PHP-FPM-based Docker container is built from an Ubuntu container with PHP installed into it.

A little like this:

FROM ubuntu:22.04

RUN apt-get update && apt-get upgrade -y && apt-get install -y gnupg curl

# Register the Ondrej package repo for PHP
RUN mkdir -p /etc/apt/keyrings \
    && curl -sS 'https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x14aa40ec0831756756d7f66c4f4ea0aae5267a6c' | gpg --dearmor | tee /etc/apt/keyrings/ppa_ondrej_php.gpg > /dev/null \
    && echo "deb [signed-by=/etc/apt/keyrings/ppa_ondrej_php.gpg] https://ppa.launchpadcontent.net/ondrej/php/ubuntu jammy main" > /etc/apt/sources.list.d/ppa_ondrej_php.list \
    && apt-get update

# Install PHP
RUN apt-get install -y php8.2-cli php8.2-fpm php8.2-dev \
       php8.2-sqlite3 php8.2-gd php8.2-intl php8.2-imagick \
       php8.2-mbstring php8.2-xml php8.2-zip php8.2-bcmath \
       php8.2-curl php8.2-pdo php8.2-opcache php8.2-gettext \
       php-pear


# other stuff, such as copying over settings files


# Run php-fpm
WORKDIR /var/www/html
EXPOSE 9000
CMD /usr/sbin/php-fpm8.2 -F -R

This works fine. However, when stopping the containers with docker compose down, I noticed that it takes 10 seconds for the PHP container to stop:

$ docker compose down
[+] Running 5/5
 ✔ Container portal-web-1   Removed                            0.1s
 ✔ Container portal-db-1    Removed                            0.3s
 ✔ Container portal-mail-1  Removed                            0.1s
 ✔ Container portal-php-1   Removed                           10.1s
 ✔ Network portal_default   Removed                            0.0s

Googling around, it seems that 10 seconds is a Docker timeout, so it seems that the PHP container isn't shutting down itself, but is being killed by Docker.

Further googling and many tests later, I found that the solution is to use exec form for the CMD:

CMD ["/usr/sbin/php-fpm8.2", "-F", "-R"]

Once I had made this change, stopping the PHP container takes 0.1s as I would hope:

$ docker compose down
[+] Running 5/5
 ✔ Container portal-web-1   Removed                            0.2s
 ✔ Container portal-mail-1  Removed                            0.1s
 ✔ Container portal-db-1    Removed                            0.3s
 ✔ Container portal-php-1   Removed                            0.1s
 ✔ Network portal_default   Removed                            0.0s

Much better!

Aside: why does this work?

The underlying reason is the way Unix-like operating systems handle termination of processes. When a process needs to terminate via, say, a SIGTERM signal, it will terminate and become one of those "zombie" processes that you sometimes see when you type ps. Its parent process then "waits on" (or "reaps") it for the exit code and then it is really gone. If that process that's been terminated has children, then they now no longer have a parent and so the init process (PID 1) takes them over ("adopts" them) and all is well with the system.

With Docker however, we usually run our process as PID 1 and we need this process to correctly reap child processes and adopt orphans (i.e. act like init). Specifically, CMD /usr/sbin/php-fpm8.2 -F -R is shell form, which Docker runs via /bin/sh -c, so a shell starts as PID 1 with php-fpm as its child. This seems to be fine as the shell can reap and adopt processes, except that it doesn't pass signals on to its children!

On shutdown, Docker sends SIGTERM to the shell. However, the shell does not send SIGTERM to its child processes (php-fpm in this case) and so it sits there waiting for php-fpm to terminate. As php-fpm doesn't know that it needs to terminate, we have to wait 10 seconds until Docker gets bored and kills the container via SIGKILL.

Knowing this, the solution becomes obvious: we want php-fpm to be PID 1 as it knows how to handle its children. This happens when you specify the CMD using exec form: no shell is created, which leaves php-fpm as PID 1, and the SIGTERM is handled correctly.

This leads naturally on to another way to solve it: use the shell's exec command:

CMD exec /usr/sbin/php-fpm8.2 -F -R

This has the same effect, as the shell's exec replaces the current shell with the command being executed.
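
The "replace the current process" behaviour of exec can be observed outside Docker too. The following Python sketch is only an illustration, and it assumes a Unix-like system: a child process prints its PID, calls os.execv() to replace itself with a fresh interpreter, and that interpreter prints its PID again. The helper name pids_before_and_after_exec is made up for this example.

```python
import subprocess
import sys

# Program run in a child process: print the PID, then replace the process
# image with a new Python interpreter that prints its own PID.
CHILD_PROGRAM = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    "os.execv(sys.executable, "
    "[sys.executable, '-c', 'import os; print(os.getpid())'])\n"
)


def pids_before_and_after_exec():
    """Return the PID seen before and after the exec call in the child."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = completed.stdout.split()
    return int(before), int(after)


if __name__ == "__main__":
    before, after = pids_before_and_after_exec()
    print(f"PID before exec: {before}, PID after exec: {after}")
```

Both numbers are the same on a Unix-like system, because exec swaps the program without creating a new process. That is why the command started with exec ends up holding the shell's PID.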

Another solution is to run a real init process as PID 1. There are many options available out there.


PHP: Hypertext Preprocessor - PHP 8.4.0 Beta 4 now available for testing (29.8.2024, 00:00 UTC)

The PHP team is pleased to announce the release of PHP 8.4.0, Beta 4. This is the second beta release, continuing the PHP 8.4 release cycle, the rough outline of which is specified in the PHP Wiki. For source downloads of PHP 8.4.0, Beta 4 please visit the download page. Please carefully test this version and report any issues found in the bug reporting system. Please DO NOT use this version in production, it is an early test version. For more information on the new features and other changes, you can read the NEWS file or the UPGRADING file for a complete list of upgrading notes. These files can also be found in the release archive. The next release will be Beta 5, planned for 12 September 2024. The signatures for the release can be found in the manifest or on the QA site. Thank you for helping us make PHP better.


Matthew Weier O'Phinney - Configuring PHP.INI settings in a PHP-FPM pool (27.8.2024, 22:37 UTC)

I consume PHP via Docker primarily, and to keep it manageable, I generally use a PHP-FPM container, with a web server sitting in front of it. I learned something new about PHP configuration recently that (a) made my day, and (b) kept me humble, as I should have known this all along.

What was it? Quite simply, the php_admin_value struct can be used to configure php.ini settings for the pool. This is a great alternative to also adding PHP configuration settings via php.ini (or an include file for php.ini), as it allows you to keep the settings specific to that pool. That way, if you must have multiple pools (e.g., to serve multiple applications from the same machine and/or same PHP version), you can still have separate configuration for each.

How does it work? In your pool configuration, add values to that struct:

php_admin_value[memory_limit] = 32M
php_admin_value[error_reporting] = E_ALL & ~E_NOTICE & ~E_DEPRECATED
php_admin_flag[track_errors] = Off
; etc

With ZendPHP, we just launched some Ansible tooling, which operates on the assumption that you are deploying PHP-FPM — and as part of its operation, it creates a template for the FPM pool configuration, but not for the PHP SAPI. And this is fine! Because you can use the php_admin_value settings to configure the pool for the application you're deploying!

Looking forward to simplifying a few of my deployments with this!

Configuring PHP.INI settings in a PHP-FPM pool was originally published 27 August 2024 on https://mwop.net by Matthew Weier O'Phinney.


Christopher Jones - Pipelined database operations with python-oracledb 2.4 (20.8.2024, 22:08 UTC)

Python-oracledb 2.4 introduces pipelining functionality to improve the performance and scalability of your Python applications. Pipelining is simple in python-oracledb: you add operations such as INSERT and SELECT statements to a “pipeline” object, send that object to the database which processes the statements, and finally all the results will be returned to the application. Since the API is asynchronous, your application can submit the pipeline and continue with other local tasks while the database is doing its processing. This lets the database server and the application be kept busy, giving efficiencies and letting them work concurrently instead of waiting on each other for statements and results to be submitted/fetched.

Figure: Multiple SQL statements are sent in a single round-trip. Your application continues doing non-database work (i.e. do_local_stuff()) while the SQL statements are being executed. Results are returned from the database when it has executed all statements.

Pipelining Overview

Pipelining is useful when many small database operations need to be performed in rapid succession. It is supported by various drivers, including JDBC, Oracle Call Interface, ODP.NET, and Python.

The benefits of Oracle Database 23ai Pipelining:

Your app can do local work at the same time the database is processing statements.
Your app doesn’t have to wait on one database response before sending a second statement.
Your app is kept busy.
After the database finishes one statement, it doesn’t have to wait for your app to fetch results and send a second statement.
The database is kept busy.
Fewer round-trips: reduced Oracle network listener wake-ups. Reduced interrupts. More efficient network usage.
Better overall system scalability.

The reduction in round-trips is a significant contributor to pipelining’s performance improvement in comparison to executing the equivalent SQL statements individually. But, even with high-speed networks, where the performance benefit of pipelining may be lower, the database and network efficiencies of pipelining can still help system scalability.

Pipelining in Python is available via python-oracledb’s Async classes: this means you must use the default Thin mode of python-oracledb, and use it in an asynchronous programming style. Pipelining works with Oracle Database 23ai. (Although you can actually use the new python-oracledb API when connected to older database versions, you won’t get the internal pipelining behavior and benefits — this is recommended only for migration or compatibility reasons). You can get Oracle Database 23ai from Oracle Database Software Downloads.

You can use the following python-oracledb calls to add operations to a pipeline:

add_callfunc() - calling a stored PL/SQL function
add_callproc() - calling a stored PL/SQL procedure
add_commit() - committing the current transaction on the connection
add_execute() - executing one SQL statement
add_executemany() - executing one SQL statement with many bind values
add_fetchall() - executing a query and fetching all the results
add_fetchmany() - executing a query and fetching a set of the results
add_fetchone() - executing a query and fetching one row

Note that the database processes the pipelined statements sequentially. The concurrency gain is between your application’s local work and the database doing its work.

Query results or OUT binds from one operation cannot be passed to subsequent operations in the same python-oracledb pipeline. If you need to use results from a pipeline operation in a subsequent database step, you can use multiple pipelines.
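
The overlap between local work and database work can be sketched with nothing but asyncio. The example below does not use python-oracledb at all: fake_database_pipeline() is a made-up stand-in that "executes" statements one after another with a short sleep each, and do_local_stuff() is a stand-in for non-database work. All names and timings are assumptions for illustration only.

```python
import asyncio
import time


async def fake_database_pipeline(statements, seconds_per_statement=0.05):
    """Stand-in for a server that runs pipelined statements sequentially."""
    results = []
    for sql in statements:
        await asyncio.sleep(seconds_per_statement)  # pretend the server is busy
        results.append(f"result of: {sql}")
    return results


async def do_local_stuff(seconds=0.15):
    """Stand-in for application work that needs no database access."""
    await asyncio.sleep(seconds)
    return "local work done"


async def main():
    statements = ["select 1", "select 2", "select 3"]
    started = time.perf_counter()
    # Submit the whole batch, then keep working locally while it is processed.
    db_results, local_result = await asyncio.gather(
        fake_database_pipeline(statements),
        do_local_stuff(),
    )
    elapsed = time.perf_counter() - started
    return db_results, local_result, elapsed


if __name__ == "__main__":
    db_results, local_result, elapsed = asyncio.run(main())
    print(db_results, local_result, f"{elapsed:.2f}s")
```

The three simulated statements still complete in order, but the total elapsed time is close to the longer of the two tasks rather than their sum.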

Social Network Example

An example use case is a social networking site. After you log in, your home page needs to show information gathered from various sources. It might show how many of your friends are also logged in. A news feed might show the top news items of the day. Current and forecast temperatures could be shown. Some of this data could be in a database, but require several distinct queries to fetch. Finding the current temperature might require Python calling out to a service to return that data. This is a great use case for Pipelining: the distinct queries can be sent in a pipeline for processing, while the remote temperature sensor data is gathered at the same time.

The full code for this simple example web app is in

Truncated by Planet PHP, read more at the original (another 11198 bytes)


Christopher Jones - Python-oracledb 2.4 has been released (20.8.2024, 22:07 UTC)

Python-oracledb 2.4, the extremely popular Oracle Database interface for Python, is now on PyPI.

Python-oracledb is an open source package for the Python Database API specification with many additions to support advanced Oracle Database features. By default, it is a ‘Thin’ driver that is immediately usable without needing any additional install, e.g. no Instant Client is required. Python-oracledb is the new name for the cx_Oracle driver.

To get started quickly, use samples/sample_container to create a container image containing Oracle Database and python-oracledb.

Top Features in python-oracledb 2.4

Support for Oracle Database 23ai Pipelining. This is a great feature. For details, see my companion blog post Pipelined database operations with python-oracledb 2.4.
A refactored connection string parser to improve support for various connection string syntaxes.
Added packages for Python 3.13 and dropped support for Python 3.7.

Other enhancements and bug fixes also landed. Check out the release notes for details.

Installing or Upgrading python-oracledb

You can install or upgrade python-oracledb by running:

python -m pip install oracledb --upgrade

The pip options --proxy and --user may be useful in some environments. See python-oracledb Installation for details.

Python-oracledb References

Home page: oracle.github.io/python-oracledb/index.html

Installation instructions: python-oracledb.readthedocs.io/en/latest/installation.html

Documentation: python-oracledb.readthedocs.io/en/latest/index.html

Release Notes: python-oracledb.readthedocs.io/en/latest/release_notes.html

Discussions: github.com/oracle/python-oracledb/discussions

Issues: github.com/oracle/python-oracledb/issues

Source Code Repository: github.com/oracle/python-oracledb


Christopher Jones - Query result caching for fast database applications (15.8.2024, 07:39 UTC)

Oracle Database’s built-in “Client Result Cache” is an efficient, integrated, managed cache that can dramatically improve query performance and significantly reduce database load when repeatedly querying mostly-static tables, such as postal codes or part numbers. No application changes are needed. No separate mid-tier cache needs to be installed. CRC is available to any “Thick” client that uses Oracle Client libraries, such as drivers for Python, Node.js, Go, PHP, Rust, Ruby, and Oracle’s C API. It is also available in JDBC. This blog post demos an example in Python.


Benefits of Client Result Caching

Can be used without needing to modify application code
Improved query response time
Statements are not sent to the database to be executed
Better performance by eliminating server round-trips
Improved database server scalability by saving server resources
Automatically managed cache invalidation, keeping the cache consistent with database changes
No mid-tier cache server required
Developers don’t need to build, or use, a custom cache

What is Client Result Caching?

The Client Result Cache (CRC) is a memory area inside a client process (i.e. the application process) that caches SELECT query results. Its presence is invisible to application code, which just executes queries and gets results as normal. The cache is internal to the Oracle client libraries used by the application. The client libraries know whether a query’s result set is already cached. If so, the SELECT statement gets the results from the cache immediately without involving the database. If the results aren’t in the cache, then the SELECT is sent to the database for execution — which has obvious network and database costs, and contributes to overall system load. Magically the client libraries know whether the cache is outdated and needs to be refreshed. This makes CRC very attractive because application logic doesn’t need to be changed to give performance benefits.

Figure: Connections in the process share the Client Result Cache. Repeated queries get data from the client cache without interacting with the database.

Oracle Database has supported CRC for many versions. When we first released it back in the Oracle Database 11g timeframe, we ran the ‘Niles benchmark’ and saw these material improvements:

Figure: Benefits of Oracle Database Client Result Caching in the Niles benchmark

How does Client Result Caching work?

Oracle Client libraries manage a result cache for each client process. It is shared by all sessions (i.e. connections) inside that process. The cache can be enabled, and its size specified, with a database initialization parameter.

Client result caching stores the results of the outermost query, which are the columns defined by the application.

Oracle Database transparently keeps the client result cache consistent with session state or database changes that affect it. When any database transaction changes the data or metadata of database objects used to build the cached result, the database sends an invalidation flag to the client as part of its response to the application’s next round-trip. (A round-trip is defined as the travel of a message from the application to the database and back. Calling each driver function, or accessing a driver attribute, will require zero or more round-trips.)

If the application is idle and hasn’t initiated a round-trip to the database in a certain amount of time, then cached values are assumed to be invalid. This invalidation time is configurable.
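
The invalidation behaviour described above can be pictured with a toy model. The Python sketch below is not Oracle's implementation and uses no Oracle API: ToyServer, ToyClient, the version counter standing in for the invalidation flag, and the max_idle_seconds setting are all hypothetical names chosen for this illustration.

```python
import time


class ToyServer:
    """Pretend database: holds rows and a version that changes on every update."""

    def __init__(self):
        self.rows = {"postal_codes": ["1000", "2000"]}
        self.version = 0
        self.round_trips = 0

    def update(self, table, rows):
        self.rows[table] = list(rows)
        self.version += 1

    def query(self, table):
        self.round_trips += 1
        return list(self.rows[table]), self.version

    def ping(self):
        self.round_trips += 1
        return self.version


class ToyClient:
    """Pretend client library with a result cache shared by its callers."""

    def __init__(self, server, max_idle_seconds=3.0, clock=time.monotonic):
        self._server = server
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._cache = {}
        self._seen_version = None
        self._last_round_trip = None

    def _note_round_trip(self, version):
        # Every response carries the server's version; a change invalidates the cache.
        if version != self._seen_version:
            self._cache.clear()
            self._seen_version = version
        self._last_round_trip = self._clock()

    def select(self, table):
        idle_too_long = (
            self._last_round_trip is None
            or self._clock() - self._last_round_trip > self._max_idle
        )
        if table in self._cache and not idle_too_long:
            return self._cache[table]  # served locally, no round-trip
        rows, version = self._server.query(table)
        self._note_round_trip(version)
        self._cache[table] = rows
        return rows

    def other_round_trip(self):
        """Any unrelated call to the server also delivers invalidation news."""
        self._note_round_trip(self._server.ping())
```

In this model a repeated select() costs no round-trip, a change on the server is noticed on the client's next round-trip of any kind, and a client that has been idle for longer than max_idle_seconds stops trusting its cached rows.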

Oracle recommends using CRC for queries from small, read-only or read-mostly tables; however, some customers have used it for relatively large tables.

Enabling Client Result Caching in Oracle Database

Caching is disabled by default. It can be enabled by setting the database parameter CLIENT_RESULT_CACHE_SIZE to 32K or greater, and optionally tuning the cache entry invalidation time parameter

Truncated by Planet PHP, read more at the original (another 11640 bytes)


PHP: Hypertext Preprocessor - PHP 8.4.0 Beta 3 now available for testing (15.8.2024, 00:00 UTC)

The PHP team is pleased to announce the release of PHP 8.4.0, Beta 3. This is the first beta release, continuing the PHP 8.4 release cycle, the rough outline of which is specified in the PHP Wiki. For source downloads of PHP 8.4.0, Beta 3 please visit the download page. Please carefully test this version and report any issues found in the bug reporting system. Please DO NOT use this version in production, it is an early test version. For more information on the new features and other changes, you can read the NEWS file or the UPGRADING file for a complete list of upgrading notes. These files can also be found in the release archive. The next release will be Beta 4, planned for 29 August 2024. The signatures for the release can be found in the manifest or on the QA site. Thank you for helping us make PHP better.


Derick Rethans - PHP Internals News: Episode 93: Never For Parameter Types (8.8.2024, 18:01 UTC)


Derick Rethans - PHP Internals News: Episode 93: Never For Parameter Types (7.8.2024, 20:02 UTC)
