Indie Heaven
Porting PHP to Node.js


Adventures in Node.js - Part 1

We've become big fans of Node.js.

Why Node?

The problem that Node addresses is that web sites are fundamentally single-threaded. You can't just connect an in-office business app to a web site. Well, you "can", but it won't behave the same way it does in the office. A web server's purpose in life is to serve up web pages, and the transaction for serving up a page should be short: milliseconds rather than seconds. The problem arises when a web page has to do a lot of internal work, like sifting through a large volume of database records. If the web server is asked to do that while a web page is being displayed, nothing else happens in the meantime: no other web pages are served, and the browser dutifully waits until the thread is available. That could take several minutes if you're running a monthly financial rollup or streaming some high-resolution video.


Adventures with Node

We recently ported a large mission-critical business app from PHP to Node.js. In PHP, everything works pretty much as expected, from a sequential and procedural point of view. However, the business app does things like mass mailings, and in some cases the mailing lists are thousands of entries long, which takes a while. The monthly rollups are another good example: they typically take several minutes in real time.

If you're trying something similar (porting a business app to a web site), here is what we can tell you up front: the bulk of the work is in the screens. PHP screens are ugly, with all kinds of "echo" statements that need to be removed. Node is capable of serving up ordinary HTML pages (with the assistance of a rendering engine), but that's also ugly and cumbersome. Later we'll show you some nifty tricks for delivering the HTML stream to the web browser. But let's continue talking about the "on-line transaction processing" (OLTP) aspects of this scenario.

Asynchronous but still Sequential


If this is your first time with a large project in Node.js, let's review some of the basics.

PHP is sequential code, like PL/I, COBOL, or Fortran. You give the computer a linear list of things to do, one after the other. Usually in the transaction processing world, a screen consists of some database records gathered up front, followed by a display, followed by some action(s) afterwards.

In Node.js, things are different. Node.js is asynchronous and event-driven. So, if you've spent 30 years coding C, you're going to be a bit mystified at first. "How do I make sure things happen in order, if everything just returns right away?"

Let's review some of the basic programming patterns that will help us port sequential code to the asynchronous event-driven world of Node.js.

First, it is important to be clear on what "asynchronous" really means. With sequential code, a subroutine doesn't return till it's finished. In Node.js, a subroutine can be asynchronous, meaning that it returns right away, before its work is finished. The idea with Node.js is, you kick off a job and it lets you know when the job is done.
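To make "returns right away" concrete, here is a minimal sketch. The startJob routine and the events list are hypothetical names invented for this illustration, and setTimeout stands in for slow work such as a database query.

```javascript
// Records the order in which things happen, so we can inspect it afterwards.
const events = [];

// Hypothetical "job": it schedules its work and returns immediately.
function startJob(name, onDone)
{
        events.push('kicked off ' + name);

        // setTimeout stands in for slow work such as a database query
        setTimeout(() =>
        {
                events.push('finished ' + name);
                onDone();
        }, 10);
}

// The callback is how the job lets us know it's done.
startJob('rollup', () => console.log(events));

// This line runs before the job has finished.
events.push('startJob has returned');
```

When the callback finally prints the list, "startJob has returned" appears before "finished rollup", even though the job was kicked off first.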

How does it let you know? There are several ways. One of the most useful ways is with a "Promise". That means, when the subroutine returns (right away, before doing work), it returns with a Promise, which is a data structure you can use to "wait on completion" of the work being done.

For example, the purpose of a screen is to display something to the user. It usually doesn't matter whether the display is presented "right away"; some small delay is tolerable. So you can kick off the display as an "asynchronous subroutine", and then wait for the Promise indicating that it has completed.
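As a rough sketch of that idea, the routine below returns a Promise immediately and the caller waits on it with await. The names paintScreen and showProfile are made up for this example, and the timer stands in for the real display work.

```javascript
// Hypothetical asynchronous subroutine: returns a Promise right away,
// and the Promise is resolved later, when the "display" work is done.
function paintScreen(title)
{
        return new Promise((resolve) =>
        {
                setTimeout(() => resolve('painted ' + title), 10);
        });
}

async function showProfile()
{
        const pending = paintScreen('profile');         // returns immediately
        const isPromise = pending instanceof Promise;   // what we got back is a Promise, not a result

        const outcome = await pending;                  // wait on completion
        return { isPromise, outcome };
}

const shown = showProfile();
shown.then((r) => console.log(r));
```

Note that showProfile is itself declared async, so it also returns a Promise to whoever calls it.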

Promises are minimal synchronization mechanisms. They're not in the same class as semaphores, but they do bear some resemblance to message-passing patterns. In a Node.js app, it's best to assume that "everything" is asynchronous. There is rarely if ever a need for truly synchronous operation, and such cases require special handling. The best advice we can offer is: let Node do its thing. Our experience with Node.js so far is that it's very robust. It doesn't necessarily have all the high-power debugging capability that the C language has (especially when used in or with Docker containers), so be ready to put some "console.log" statements in your JavaScript.


Porting a Transaction Stream


Let's consider a concrete example. Let's say we're trying to update a User Profile. On the screen we have a number of fields containing the user's name, address, and so on. Usually we got to this screen from a URL: the user either clicked a button on some other screen to get here, or typed in the URL directly. So, when we write the routine that handles this screen in Node.js, we write the whole thing as an asynchronous "job". We're going to kick off the job, and then it's going to let us know when the results are ready.

How we do this is:

exports.routine_name = async function(args)
{
        // function contents
        //
        //   1. gather database records
        //   2. check for errors
        //   3. paint the screen
        //
        // etc
}


The whole screen-handling operation is kicked off as an asynchronous "job". The goal of the job is to display the screen, and we only really care about the timing or results if there are errors that render the screen undisplayable.

Node.js gives us two objects we can use, called Request and Response, that its HTTP server passes to our code any time a route is invoked. (Route means URL, and we'll talk about that in detail in a subsequent blog entry). We will pass these two objects into our asynchronous routine, and the reason will become clear in a moment.
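For readers who haven't seen Request and Response before, here is a small self-contained sketch using only Node's built-in http module. It is not the routing setup used in our app (that comes in a later entry); the profileScreen handler and the one-request demo around it are assumptions made for illustration.

```javascript
const http = require('node:http');

// Hypothetical screen handler, written as an async job that receives both objects.
async function profileScreen(req, res)
{
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end('<p>You asked for ' + req.url + '</p>');
}

// The HTTP server calls this function with (req, res) for every request.
const server = http.createServer((req, res) =>
{
        profileScreen(req, res).catch((error) => console.log(error));
});

// For the demo: listen on any free port, request one page from ourselves,
// then shut the server down again.
const demo = new Promise((resolve, reject) =>
{
        server.listen(0, '127.0.0.1', () =>
        {
                const options = { host: '127.0.0.1', port: server.address().port,
                                  path: '/profile', agent: false };

                http.get(options, (reply) =>
                {
                        let body = '';
                        reply.on('data', (chunk) => { body += chunk; });
                        reply.on('end', () => { server.close(); resolve(body); });
                }).on('error', reject);
        });
});

demo.then((body) => console.log(body));
```

The request object tells the handler what was asked for (here, req.url), and the response object is how the handler sends the page back.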

So our routine now looks like this:

exports.routine_name = async function(req, res, other_args)
{
        //
        // work_to_be_done
        //
        // part 1: retrieve database information
        //
        // getDatabaseInfo(req, res, other_args);   // this won't work: it returns a Promise before the data is ready

        // we have to await the Promise instead (which is why the function must be declared async)
        const result2 = await getDatabaseInfo(req, res, other_args);
}


Let's consider the profile screen in this context. The first thing we have to do is gather the existing profile information from the database. And we can't display anything until that information is available.

So we will put the following code into our asynchronous handler, instead of calling the database routine directly:

        const result2 = await getDatabaseInfo(req, res, other_args);

And then we will write the getDatabaseInfo routine as follows:

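// dbpool is our database connection pool module, set up elsewhere in the app
const getDatabaseInfo = (req, res, args) =>
{
        const qstr = "SELECT * FROM some_table";

        // wrap the callback-style query in a Promise so the caller can await it
        return new Promise((resolve, reject) =>
        {
                dbpool.pool.query(qstr, (error, results) =>
                {
                        if(error)
                        {
                                console.log(error);
                                // return reject(error);     // the usual way to report a failure
                                results = "";                // PHP-style: hand back an empty (falsy) result instead
                        }
                        else
                        {
                                console.log('Promise: Got USER results ');
                        }

                        return resolve(results);
                });
        });
};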

The coding style takes a little getting used to. We're writing functions using the "arrow syntax". We could have done it differently (in the classical style), but the arrow syntax gives us a convenient way to compress our code and make it a little more readable. (It's actually very friendly when you get used to it).

You'll notice that we've done something a little unusual here. Since we're porting from PHP, we would like our database routine to behave like PHP: in other words, return either a set of records, or FALSE if there isn't one. You'll notice the line that's commented out, that says return reject(error); . This would be the normal way of handling Promise errors in Node.js, but here we do something a little different. We simply log the error and, instead of rejecting the Promise, we resolve it with an empty string. In JavaScript an empty string is falsy, so this generates behavior very similar to PHP: the caller gets either the records or something that tests as false.
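Here is a small self-contained sketch of how that looks from the caller's side. The fakePool object, the missing_table query and the queryOrEmpty helper are all stand-ins invented for this example; only the "log it and resolve with an empty string" shape is taken from getDatabaseInfo.

```javascript
// Stand-in for the real connection pool: query() calls back with an error
// for one particular table and with rows for anything else.
const fakePool =
{
        query: (qstr, callback) =>
        {
                setTimeout(() =>
                {
                        if(qstr.includes('missing_table'))
                                callback(new Error('no such table'), undefined);
                        else
                                callback(null, [{ userid: 7, name: 'Pat' }]);
                }, 5);
        }
};

// Same shape as getDatabaseInfo: log the error, resolve with "" instead of rejecting.
const queryOrEmpty = (qstr) =>
{
        return new Promise((resolve) =>
        {
                fakePool.query(qstr, (error, results) =>
                {
                        if(error)
                        {
                                console.log(error.message);
                                results = "";
                        }

                        resolve(results);
                });
        });
};

async function demoLookup()
{
        const good = await queryOrEmpty('SELECT * FROM some_table');
        const bad = await queryOrEmpty('SELECT * FROM missing_table');

        // the caller tests the result the way PHP code would test for FALSE
        return { goodFound: !!good, badFound: !!bad, firstName: good[0].name };
}

const lookup = demoLookup();
lookup.then((r) => console.log(r));
```

The trade-off is that the caller can no longer tell a failed query from an empty answer without reading the log, so this style suits a port where the calling code already checks for FALSE everywhere.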


The caller of our database routine will not proceed until the data is available. The query() method itself is asynchronous: it returns right away and invokes our callback when the request is completed. Our routine wraps that callback in a Promise, which is only resolved once the callback fires. And, in our upper layer routine, we wait for completion by "awaiting" the Promise.
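As an aside, Node's standard library can do this kind of wrapping for us. The sketch below uses util.promisify on a made-up callback-style function (slowQuery is hypothetical, standing in for a database driver). Be aware that the promisified version rejects on error, which is the normal Node.js behavior rather than the PHP-style empty result we chose above.

```javascript
const util = require('node:util');

// Hypothetical callback-style query function, standing in for a database driver.
function slowQuery(qstr, callback)
{
        setTimeout(() => callback(null, [{ id: 1, query: qstr }]), 5);
}

// util.promisify turns an error-first callback function into one that returns a Promise.
const slowQueryPromise = util.promisify(slowQuery);

async function runQuery()
{
        // execution of runQuery pauses here until the callback has fired
        const rows = await slowQueryPromise('SELECT * FROM some_table');
        return rows;
}

const queryDone = runQuery();
queryDone.then((rows) => console.log(rows));
```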

How do we guarantee that multiple asynchronous operations can be performed in order?

Simple. We just wait for the appropriate Promises in the right order. For example, here is a routine that makes two database requests. If we simply fired them both off, they would run simultaneously. In this case we need them performed in order, because the second request depends on the result of the first, so we await each Promise before starting the next request.

// this fragment lives inside an async function, which is what allows "await"
try
{
        const userData = await getUserRecordPromise(req, res, username, password);

        console.log("get_data: GOT USER RESULTS : ******");
        console.log(userData[0]);
        console.log('******');

        const userid = userData[0].userid;

        // the Account lookup needs the userid, so it can't start until the User lookup is done
        const accountData = await getAccountRecordPromise(req, res, userid);

        console.log("get_data: GOT ACCOUNT RESULTS : ******");
        console.log(accountData[0]);
        console.log('******');

        console.log('*********');
        console.log('get_data(): FINISHED, PROCEEDING TO DISPLAY');
        console.log('*********');

        exports.do_display(req, res, userData, accountData);
}
catch(error)
{
        console.log('Catch error - get_data');
        console.log(error);
        errorHandler.paintAnErrorScreen(error);
}


This accomplishes the goal of having things done in order, without having to care about how long each step takes. The above code will wait for the User results before retrieving the Account records.
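To see the difference between waiting in order and firing everything at once, here is a small sketch. The lookupRecord routine and the record names are hypothetical, and the timer stands in for a database round trip. Promise.all is the standard way to wait for several independent Promises together.

```javascript
// Records when each lookup starts and finishes.
const log = [];

// Hypothetical lookup that takes about 20 ms.
function lookupRecord(name)
{
        log.push('start ' + name);

        return new Promise((resolve) =>
        {
                setTimeout(() =>
                {
                        log.push('finish ' + name);
                        resolve(name + ' record');
                }, 20);
        });
}

async function inOrder()
{
        // the second request is not even started until the first has finished
        const user = await lookupRecord('user');
        const account = await lookupRecord('account');
        return [user, account];
}

async function allAtOnce()
{
        // both requests are started immediately; we wait for both to finish
        return Promise.all([lookupRecord('settings'), lookupRecord('messages')]);
}

const ordered = inOrder();
const together = ordered.then(() => allAtOnce());
together.then((r) => console.log(log, r));
```

In the printed log, the user lookup finishes before the account lookup starts, while the settings and messages lookups both start before either one finishes. Use the second form only when the requests don't depend on each other.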

In real life we would check the error conditions more thoroughly. For example, if the User record can't be found we might not call for Account records at all. userData[0] is the first User record, which doesn't exist if no record can be found. So we first check for the empty string that our database routine returns on error, and then check that userData.length is greater than 0 before touching userData[0].
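Those checks might look something like the sketch below. The hasRecords helper, the fakeUsers table and the getUser / getAccount stand-ins are all hypothetical, written only so the example runs on its own.

```javascript
// Hypothetical helper: true only when the lookup produced at least one record.
// Covers both the empty string from the error path and an empty array.
function hasRecords(result)
{
        return Array.isArray(result) && result.length > 0;
}

// Hypothetical stand-ins for the two database lookups.
const fakeUsers = { alice: [{ userid: 42 }] };
const getUser = async (username) => fakeUsers[username] || [];
const getAccount = async (userid) => [{ userid: userid, balance: 0 }];

async function getData(username)
{
        const userData = await getUser(username);

        if(!hasRecords(userData))
                return null;            // no User record: skip the Account lookup entirely

        const accountData = await getAccount(userData[0].userid);

        if(!hasRecords(accountData))
                return null;

        return { user: userData[0], account: accountData[0] };
}

const found = getData('alice');
const notFound = getData('bob');
Promise.all([found, notFound]).then((r) => console.log(r));
```

Here a missing user simply produces null, and the caller can decide whether to paint an error screen or send the user somewhere else.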



(c) 2024 Indie Heaven LLC
All Rights Reserved
webmaster@indieheaven.io