Official pgLOGd Home
Download the Current Version: 2.3 Release
Latest News:
April 29, 2006 pgLOGd End of Life
I have decided that further development of pgLOGd is not something I have time for, mostly because any time I could spend on pgLOGd is now devoted to its successor, dbWebLog.
March 17, 2006 pgLOGd does not work with Postgres 8.x
I have received reports that pgLOGd does not work properly against Postgres 8.x. This is not really any big surprise to me, since it has been over 2 years since I actively worked on it. Sorry, sometimes that's the way it goes with a one-man Open Source project, mostly because needing to eat (i.e. make real money) takes priority, as does having had two children in the last 3 years. I'm sure the problems are due to the Postgres C-API changing between major releases, but I have not had time to confirm this or update pgLOGd.
Description
pgLOGd, simply put, is a program that takes web server (Apache) log entries and sends them to a database. It is called pgLOGd because of the database it was designed to work with: PostgreSQL. PostgreSQL is sometimes abbreviated pg, the program LOGs entries, and it runs as a daemon (hence the d).
Who should use pgLOGd?
Almost anyone who runs a web server can use pgLOGd (see Requirements); however, sites that need 24x7 up-time on their web servers, or that run many virtual hosts, will benefit the most from pgLOGd.
What does it cost?
Nothing, it's Open Source (basically free). The code is released under the BSD Open Source License.
Requirements
In this version (2.3), the requirements are as follows:
A PostgreSQL database installation.
A web server capable of writing its log entries to a file and which has a customizable log entry format. pgLOGd was developed with Apache in mind and that is the recommended web server.
A C compiler. After all, you get the source! The GNU gcc compiler is recommended (comes standard with most U*IX OSes.)
A multi-tasking OS; FreeBSD is my preference and the development platform.
Why PostgreSQL?
Three primary reasons:
The reason that most of us use the things we do: because we like them! I like PostgreSQL, so I use it. If you don't like PostgreSQL, I would dare to wager that you have never used it...
PostgreSQL is fast. Yes, I said fast! Like Ford™ says:
Have you driven PostgreSQL lately?
If not, I highly recommend you check it out at PostgreSQL's homepage.
It provides the asynchronous connection and query processing needed to make pgLOGd robust and fast. See the Features and README for details.
Where do I get pgLOGd?
Download the Source.
Features
Here I will list the prominent features of pgLOGd and briefly describe each one. For more detail about these features and how they came to be, see the pgLOGd README.
Database logging. The primary feature that gave pgLOGd its name. Instead of writing log entries to a file, pgLOGd writes them to a database. The advantages of this method over logging to a file are explained in the README.
Fast. pgLOGd was designed to be as fast as possible, just as if the web server was logging to a file.
Robust. pgLOGd is smart and will attempt to recover from errors, network failures, and database problems when possible.
Fall-Back-Logging. This is part of the robustness of pgLOGd. If the connection to the database fails for any reason, pgLOGd will begin to write the log entries to a temporary overflow file. pgLOGd will then attempt to re-establish the database connection every 30 seconds (configurable) until it restores the database connection. At that point pgLOGd will begin processing the overflow file entries to the database during idle moments. The overflow file is also used when entries are coming in from the web server faster than they can be sent to the database, which is how pgLOGd achieves its "as fast as writing to a file" speed. Then, during slow periods, the entries are read from the overflow file and sent to the database.
Non-blocking. This is also what allows pgLOGd to be extremely responsive and fast: it never waits for anything to finish. Simply put, pgLOGd will ask the database to store an entry, then come back later to see if it was successful, instead of waiting around for the database to finish. This allows pgLOGd to continue processing entries from the web server while previous entries are being written to the database. The non-blocking functionality is also one of the reasons why PostgreSQL was chosen: its C-API provides excellent asynchronous connection and query processing (see the sketch after this list).
Small system overhead and resource usage. pgLOGd runs as a single daemon process and typically uses less than 128K of memory.
Flexible. pgLOGd was designed to be configurable and as system independent as possible.
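To make the non-blocking behavior concrete, here is a minimal sketch of the asynchronous libpq pattern described above. The function names (log_entry_async, poll_result) are illustrative only, not pgLOGd's actual code:

#include <stdio.h>
#include <libpq-fe.h>

/* Dispatch an INSERT without waiting for it to complete. */
int log_entry_async(PGconn *conn, const char *sql)
{
    return PQsendQuery(conn, sql) ? 0 : -1;   /* returns immediately */
}

/* Called later, when select() says the database socket is readable. */
int poll_result(PGconn *conn)
{
    if (!PQconsumeInput(conn))     /* pull any pending data off the socket */
        return -1;
    if (PQisBusy(conn))            /* result not ready; go do other work */
        return 0;
    PGresult *res;
    while ((res = PQgetResult(conn)) != NULL) {
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }
    return 1;                      /* query finished */
}

The daemon can keep reading FIFO data between the dispatch and the poll, which is exactly what makes it as responsive as a plain file write.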
Installation and Configuration
Setting up and operating pgLOGd is very straightforward; however, there are some areas that may leave you asking "why?" Please see the README for complete details on the reasoning behind pgLOGd.
Assumptions
Never assume! ;-) But, I will assume:
You know how to unTAR the source.
You know how to use a text editor.
You know how to use a C compiler.
You are installing on a U*IX OS.
Your Apache web server is up and running.
Your PostgreSQL database is up and running and is version 7.1 or greater. DO NOT run pgLOGd on a PostgreSQL install less than 7.1! You have been warned.
You have root access on the system to run pgLOGd.
Precompile
Edit the Makefile if your PostgreSQL installation is not in /usr/local/pgsql.
PostgreSQL Backend
Create an entry in the /path/to/postgres/data/pg_hba.conf file to allow connections from whatever machine your web server is running on. If your postmaster and web server are on the same box, this step can be skipped.
Make sure your postmaster is running with the -i option; this is only needed if your web server is not running on the same box as the postmaster.
Create a database (default name pglogd, but call it what you want) and create the required tables within it. An SQL script, pglogd_tables.sql, is included with the source to do this:
# su - postgres
$ createdb pglogd
$ psql pglogd < pglogd_tables.sql
$ exit
Compiling and Running pgLOGd
To build the pglogd and pglogctl binaries:
# make
Solaris users:
# make -DSOLARIS
OpenBSD users:
# make -DOPENBSD
Copy them wherever you wish:
# cp pglogd /usr/local/sbin/
# cp pglogctl /usr/local/bin/
Edit the pglogd.conf configuration file and copy it wherever you wish.
pgLOGd default options are as follows:
The database name will be: [pglogd]
The database user is: [postgres]
The database connection is: [local, trusted host]
PostgreSQL is installed at: [/usr/local/pgsql]
The log file will be written to: [/var/log/pglogd.log]
The FIFO file will be located at: [/path/to/apache/logs/pglogd_fifo]
The overflow file will be located at: [/path/to/apache/logs/pglogd_overflow]
Log overflow warnings every (n) entries: [100]
Wake up at least every (n) seconds and check the database connection: [30]
Strip virtual hosts back to top level domain only (1=true;0=false): [1]
The name of the configuration file is not important; however, make sure you pass the same name to pglogd with the -c parameter.
# cp pglogd.conf /usr/local/etc/pglogd.conf
Start pglogd:
# /path/to/binary/pglogd -c /path/to/config/file/pglogd.conf
If there are any problems you will get an error message. Once the daemon has detached there is no controlling terminal, and errors will be written to the error log, typically /var/log/pglogd.log unless you changed the location.
Make sure that pgLOGd is always running before Apache starts, and make sure it is shut down after Apache!
Apache Configuration
Add these lines to your httpd.conf file:
LogFormat "%t %T %>s %b %m %v %h \"%U\" \"%{Referer}i\" \"%{User-agent}i\" \"%r\" %l %u" pglogd
Add this entry for each site you want pgLOGd to record log entries for:
CustomLog "/path/to/apache/logs/pglogd_fifo" pglogd
If you changed the location and name of the FIFO file, make the adjustment here as well.
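Putting the two directives together, a virtual host entry might look something like this (the server name is an example only; the LogFormat line itself is defined once, globally):

<VirtualHost *>
    ServerName www.example.com
    CustomLog "/path/to/apache/logs/pglogd_fifo" pglogd
</VirtualHost>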
Restart Apache, usually done with:
# /path/to/apache_binary/apachectl graceful
Contributed Notes
Linux RedHat-7.1 notes provided by Calvin Dodge
NOTE: The addition of the -s flag in 2.1beta should solve the startup problem described below.
*** advice for Red Hat users (assuming they installed PostgreSQL from RPMs) ***
Be sure to have postgresql-devel installed.
In the Makefile, change the following lines:
"PGDIR=/usr/local/pgsql" to "PGDIR=/usr"
"CFLAGS = -I${PGDIR}/include" to "CFLAGS = -I/usr/include/pgsql"
Pglogd needs to be running before Apache starts, but after PostgreSQL starts.
Unfortunately, the RedHat scripts start up Apache before PostgreSQL (they have the same start number (85), and "httpd" sorts before "postgresql").
My solution was to:
Edit /etc/init.d/postgresql, changing "chkconfig 85 15" to "chkconfig 80 20" (so it will start before, and shut down after, Apache).
chkconfig --del postgresql
chkconfig --add postgresql
chkconfig --level 345 postgresql on
chkconfig --level 0126 postgresql off
Create a pglogd script for the /etc/init.d directory. I copied the apache script and edited it to remove irrelevant information and to provide for "start", "stop", and "restart". Its chkconfig line has the numbers "83 17", so it will start after PostgreSQL and before Apache.
chkconfig --add pglogd
chkconfig --level 345 pglogd on
chkconfig --level 0126 pglogd off
Using pgLOGd
Syntax:
pglogd [-s] [-c <config file>]
| Option | Description |
|---|---|
| -s | Start pgLOGd without attempting to make a database connection. This can be useful when you need to start pgLOGd at start-up, before the database is online. |
| -c <config file> | Specify the full path to the configuration file. |
To start:
# /path/to/binary/pglogd [-s] [-c <configuration file>]
To stop:
# kill -TERM [pid]
NOTE: The overflow file only contains log entries that have already passed parsing, and they are not checked again prior to processing. Adding entries to the overflow file manually is not recommended! If you decide you just have to do this, the format requires one pgLOGd-style entry per line, each line not more than 16,768 bytes in length and terminated with a single new-line character (character 10). A better way to get log entries into pgLOGd manually is to simply dump the file to the FIFO:
# cat [some_file_in_pgLOGd_format] > /path/to/apache/logs/pglogd_fifo
You can do this any time pgLOGd is running; however, I would not recommend doing this with really large files during peak traffic times.
pgLOGd will write errors and other information to its log file; it should not be too noisy as far as logging goes.
Using pglogctl
pglogctl can be used to generate log files from the pgLOGd database. The output format is the standard Combined Log Format.
Description:
Moves records from the log entries table into the temp entries table which has indexes. Also facilitates creation of Combined Log Format log files from the temp table.
Syntax:
pglogctl [-o | -m | -d | -p] domain startdate days
| Option | Description |
|---|---|
| -o | Output Combined Log Format from the temp entries table |
| -m | Move records from the log entries table to the temp entries table |
| -d | Delete records from the temp entries table |
| -p | Print the values that will be used, based on the command line options |
| domain | Domain to use (required) |
| startdate | Format: mm/dd/yyyy or mm.dd.yyyy. Start time will be 00:00:00 and end time will be 23:59:59 |
| days | Number of days, including the start date, to process. Value can be negative. |
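For example, to pull a week of one domain's records into the temp table and then produce a Combined Log Format file from it (the domain and date are made up, and redirecting the -o output to a file is an assumption on my part):

# pglogctl -m www.example.com 04/01/2002 7
# pglogctl -o www.example.com 04/01/2002 7 > www.example.com-access_log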
Performance
The initial tests here were done prior to the implementation of overflow logging and asynchronous non-blocking operation. pgLOGd was capable of keeping up then; it should not have any problems now. Note that this test is very old.
Test Results
Tested on a Dual P2-333, 128MB RAM, 9GB SCSI (Seagate Barracuda)
A quick insert test indicated that PostgreSQL can do about 11 to 28 inserts per second, depending on record size and number of indexes. Even at 11 per second, that is nearly a million hits per day (11 × 86,400 ≈ 950,000) on a web server, so there should not be too much trouble with pglogd keeping up even on a heavily loaded server. Also, since the entries table does not have any indexes, the high end of the performance curve is realized, which means pglogd can easily keep up with a very heavy traffic site.
If 28 inserts per second is not enough, then start your postmaster with these two options:
-o -F
That will shut off fsync, and the insert rate jumps to about 800 per second! What you lose by turning off fsync is the ACID reliability of PostgreSQL. But unless you are getting over 2 million hits in a 24 hour period (roughly what 28 inserts per second works out to), you should not have to do that.
README
Here I will attempt to explain why pgLOGd exists and the reasoning and theories behind its madness. If you agree, disagree, or have insight to share, by all means please don't hesitate to Email me.
Contents
- The Problem
- Configuration and Rotation
- Solutions
- Review
- The Design of Something Better
- A New Log Format
- Theory of Operation
The Problem
pgLOGd was, like many things, developed to resolve a problem for which there did not seem to be a complete solution. The problem is with the routine maintenance and configuration of the web server logs, particularly:
The configuration of logs for many virtual sites on a web server.
Regular log file rotation, archiving, and making them available to clients or statistics programs.
There are also several smaller problems which pgLOGd currently addresses or will address in the near future:
Generating log files in various formats.
Generating multiple log files for each virtual site.
Creating log files that are broken into certain date periods.
Generating useful statistics based on the logs.
Providing timely or real-time statistics based on the logs.
Each of these points undoubtedly has a solution available, in one form or another, but it usually requires several command line utilities, CRON jobs, and administrator time to accomplish the tasks. Not to mention the frequency (daily, weekly, monthly) with which the tasks recur.
Configuration and Rotation
Configuration is not too bad if a system has already been devised, put in place, and is consistent. For example: names and locations of log files have already been decided, policy for rotation based on allowed disk space and bandwidth has been determined, and access to log files established. This can easily become quite an administrative chore (or nightmare) as the site count increases.
Log file rotation was the primary motivator for the creation of pgLOGd. Some sites will have more traffic than others, so when do you rotate? On a busy site you might have to rotate the logs hourly, every 12 hours, or daily. On a smaller site, maybe once a week or once a month is sufficient. Also, when do your customers expect the logs to be available? Up to the minute (not possible with standard logs), twice a day, daily, weekly? So now a policy and schedule have to be set up for each site based on the site's traffic and customer expectations.
Solutions
The Apache Group does not provide any built-in solution, but they do provide all kinds of hooks and options (modules, external programs, excellent server configuration, etc.). One solution to the problem is to use the Apache option that lets you pipe the log entries to an external program, like cronolog. Cronolog is a nice little program that automatically writes each log entry to a file based on a date scheme that you configure. This looked like the savior; cronolog seemed to solve all the problems:
No stopping or restarting of the web server was necessary.
The logs were automatically rotated into files based on dates.
Each log file for each site could be configured independently.
Does not require recompiling of Apache.
But, Cronolog has some drawbacks:
If the web server receives no traffic for a site then no log entries are generated, and cronolog may miss a file in a sequence. Example: say you need log files every 8 hours because your stats program fires up and looks for a file to run against. Now say it is between midnight and 8am Sunday morning and your web site did not get a single hit. In this case cronolog will not have generated a log file for that time period, so if you didn't tell your stats program that a file for any given period may be missing, the stats program will probably get mad and fire off a lot of errors, or mess up your stats for that period. It could be worked around, but every solution was a bigger pain than the previous one.
The other issue that I could not accept was that for each web server child process that was spawned, two additional system processes were created as well: one for the shell that Apache starts to run the external program (cronolog in this case), and one for the program itself. This is not specific to cronolog, but applies to any program you pipe the log entries out to. I already have enough processes to keep track of, and I didn't need another 30 to 250 processes (depending on traffic) floating around, not to mention taking up CPU time, file descriptors, memory, etc.
UPDATE about this point. I have since learned that if you pipe all your logs, virtual or not, to the same file, then Apache only spawns two processes for all the child processes. While this is good for limiting the process count, you are left with a single log file containing all the entries for all the sites running on the web server.
Review
A quick review of where we are and how we got here:
The primary motivation for writing pgLOGd was a need to rotate Apache log files without stopping the server. Several other solutions were looked at and decided against:
HUPing the server after moving the log files. Not too bad, but can be an administrative chore if you have many virtual sites to maintain.
rotatelogs. A program that comes with Apache which looked like it was going to be a solution. It uses Apache's CustomLog pipe feature to send the log entries to some other program for processing. The downside of the pipe feature is that for every Apache child process (of which there can be many) you end up with two new system processes on your web server: one for the shell that Apache starts to run the external program, and one for the program itself. On a production server that can lead to a lot of additional processes and used resources (memory, file descriptors, CPU time).
cronolog. This was a little better than rotatelogs but still suffers from the same side effects (two new processes for each Apache child process). It also suffers from missing log files due to periods of no activity.
Aside from the two additional processes per Apache child, there is the time to start up those additional processes; albeit a rather small amount of time, on a busy web server every clock tick counts. Nothing seemed acceptable.
The Design of Something Better
There had to be a better way to rotate log files. A way that was similar to writing directly to files, just as fast as writing to files, but that didn't add a bunch of system processes and overhead into the mix. Well, I couldn't find one, so I wrote one. Enter pgLOGd.
I needed something with the following principles:
Be fast, like logging to a file.
Be robust and try to recover from errors.
Small system overhead and resource usage.
Single process.
Non-blocking.
Write to a database for ease of entry processing.
Provide fall-back logging in case of database failure.
Be flexible, work with any system configuration.
pgLOGd was designed and written to adhere to each of these principles, which basically became its feature list. With the implementation of fall-back logging, pgLOGd can handle log entries as fast as the web server can send them, just as if it were logging to a file, even if the database connection cannot keep up or goes down completely!
A New Log Format
One of the first things you will undoubtedly notice when setting up pgLOGd is the requirement of a custom log format instead of the generally accepted Common Log Format. Why is this? Well, first take a look at the Common Log Format:
Common Log Format (CLF):
"%h %l %u %t \"%r\" %>s %b"
Common Log Format with Virtual Host:
"%v %h %l %u %t \"%r\" %>s %b"
NCSA extended/combined log format:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
Now take a look at all the parameters available for customizing a log entry (these tokens are for the Apache web server; other web servers' formatting will most likely be different):
| Token | Description |
|---|---|
| %...a: | Remote IP-address |
| %...A: | Local IP-address |
| %...B: | Bytes sent, excluding HTTP headers. |
| %...b: | Bytes sent, excluding HTTP headers. In CLF format, i.e. a '-' rather than a 0 when no bytes are sent. |
| %...c: | Connection status when response is completed. 'X' = connection aborted before the response completed. '+' = connection may be kept alive after the response is sent. '-' = connection will be closed after the response is sent. |
| %...{FOOBAR}e: | The contents of the environment variable FOOBAR |
| %...f: | Filename |
| %...h: | Remote host |
| %...H: | The request protocol |
| %...{Foobar}i: | The contents of Foobar: header line(s) in the request sent to the server. |
| %...l: | Remote logname (from identd, if supplied) |
| %...m: | The request method |
| %...{Foobar}n: | The contents of note "Foobar" from another module. |
| %...{Foobar}o: | The contents of Foobar: header line(s) in the reply. |
| %...p: | The canonical Port of the server serving the request |
| %...P: | The process ID of the child that serviced the request. |
| %...q: | The query string (prepended with a ? if a query string exists, otherwise an empty string) |
| %...r: | First line of request |
| %...s: | Status. For requests that got internally redirected, this is the status of the *original* request --- %...>s for the last. |
| %...t: | Time, in common log format time format (standard English format) |
| %...{format}t: | The time, in the form given by format, which should be in strftime(3) format. (potentially localized) |
| %...T: | The time taken to serve the request, in seconds. |
| %...u: | Remote user (from auth; may be bogus if return status (%s) is 401) |
| %...U: | The URL path requested, not including any query string. |
| %...v: | The canonical ServerName of the server serving the request. |
| %...V: | The server name according to the UseCanonicalName setting. |
There is quite a bit more useful information available that is not included in the Common Log Format. The first question that comes to mind is: why so little information in the Common Log Format? One can only speculate, but it was most likely designed way back when log files did not grow very fast or very big, and when they were probably read by humans. The current state of the Internet makes reading raw log files almost unheard of, and in any case unnecessary, given the availability of many free and commercial log analyzer programs.
So why a new format?
So why deviate from the norm, the Common Log Format? The primary reason has to do with parsing the log entry; other reasons include the need to record some of the other useful information that is not part of the Common Log Format. First, the parsing problem: take another look at the Common Log Format, followed by a typical line from a log file (shown here with the combined format's Referer and User-agent fields appended):
"%h %l %u %t \"%r\" %>s %b"
10.0.0.1 - - [04/Sep/2001:19:34:59 -0400] "GET /index.html?userdata=badstuff HTTP/1.1" 200 9206 "http://10.0.0.1/index.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
The first parameter is the remote host. Not too bad for parsing, especially if HostnameLookups is off (which it should be for any production or high volume web server); an IP address is very easy to parse.
The second parameter is the remote logname. This is practically useless since visitors to your web site will probably not even be running an ident server. But worse than being practically useless is the fact that if a person chooses to set up their own ident server they can, and that means they can control the data that would be logged. This is bad. What if I configured my ident server to supply an ident name of:
- - [04/Sep/200
Then the log entry would look something like this:
10.0.0.1 - - [04/Sep/200 - [04/Sep/2001:19:34:59 -0400] "GET /index.html?userdata=badstuff HTTP/1.1" 200 9206 "http://10.0.0.1/index.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
Typically an ident entry is limited to about 15 characters, but that is plenty to do damage with. Unless a log parser is very intelligent it will choke right here and most likely skip the line as invalid. This is because the Common Log Format uses space characters as delimiters between the remote host, remote logname, and remote user, but space characters can be part of the ident (and remote user) data. What harm does this do? Well, if I don't want a company tracking me on their web site, I could configure my ident server to report a logname similar to the one above, and their log analyzer will most likely /dev/null the log entries generated by my activity on their web server. Unless someone digs into the logs by hand, I could bang away at their web site without being noticed, and maybe I'm a cracker trying to gain illegal access...
The third entry, remote user, is just as bad. Again, user-supplied data right up front in the log entry. Remember, this data is not checked anywhere; it is passed straight from the remote client, so it could contain control characters as well.
There may well be an excellent log analyzer out there that can overcome these problems, but the price of that protection is speed. Such an analyzer, if it is possible at all, would be very slow, and that is not an option with huge log files on high traffic web servers. There is a better way...
The pgLOGd Log Format Solution
Here is the format required by pgLOGd:
%t %T %>s %b %m %v %h \"%U\" \"%{Referer}i\" \"%{User-agent}i\" \"%r\" %l %u
Notice where the remote logname and remote user are placed. The format is designed to be parsed quickly by a computer, and the log information runs from the most trusted data to the least trusted (data from the remote client). The format also includes useful data that cannot be obtained using the Common Log Format, such as the time it took the server to service the request. This can be very useful to the system admin, who then knows when it is time to invest in faster CPUs, more memory, or faster disk arrays.
Don't worry though, pgLOGd's log format is a superset of the Common Log Format, so a Common Log Format log file can be produced to feed into your favorite log analyzer.
Theory of Operation
For those who are interested in how pgLOGd is intended to work, read on.
The problem seemed impossible, and the solution only came after some digging through man pages and some consulting of "Advanced Programming in the UNIX Environment" by W. Richard Stevens (RIP). It turns out that U*IX, since nearly the beginning of time (well, at least BSD 4.2), has had these nice little things called FIFOs, or Named Pipes, that allow communication between unrelated processes.
A FIFO, once created, looks and smells just like a file and is handled as if it were a file (using the standard open(), read(), write(), and close() file operations). The only caveats to the FIFO are:
The file size never grows, it is always zero bytes.
The FIFO should be opened for reading (usually by the process expecting data via the FIFO) before any other process tries to write to it; otherwise the writing process will block.
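For the curious, here is a minimal sketch of the reader's side of a FIFO (the path and permissions are illustrative, and error handling is abbreviated). Opening with O_NONBLOCK is one way the reading process can avoid stalling when no writer is present yet:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/tmp/demo_fifo";      /* hypothetical location */
    mkfifo(path, 0666);                       /* create it if it is not there */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);    /* 0 means no writer yet */
    printf("read returned %zd bytes\n", n);
    close(fd);
    unlink(path);
    return 0;
}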
So it seemed pretty straightforward: make a daemon that listens on a known FIFO and have Apache log all entries to that FIFO. The FIFO looks and acts like a file, so it is fast, and Apache treats it just like a regular log file. Almost there! The only other problem was that all entries were being written to the same log file (the FIFO), so for web servers hosting many virtual sites there needed to be a way to determine which entries belonged to which sites. That solution came with Apache's ability to use custom log file formats, and particularly with the %v parameter. That about solved all the problems, and it was off to code the daemon.
Initially error log entries were to be logged as well, but since error logs are not configurable and can be written to by any number of different mods, there is no fixed format. We will have to wait and see if The Apache Group changes this in the near future. Until then, error logs will have to be dealt with in the usual ways, but they should not grow too much unless a site is having serious problems.
pgLOGd Operation
This is a verbal description of what goes on inside pgLOGd. For implementation details, consult the source code.
pgLOGd begins by making sure it can access or create each of the fundamental components it requires:
The FIFO. If it exists pgLOGd assumes it is already running and exits with an error. If it does not exist, pgLOGd creates it.
A database. pgLOGd attempts to connect to the back-end database and exits with an error if a connection cannot be established.
The overflow file. If it does not exist, it is created. If it does exist, pgLOGd counts the number of entries in the file and sets an internal counter to indicate that n overflow entries are waiting to be written.
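Counting those pending entries is simple because the overflow format is one new-line-terminated entry per line. A sketch (not pgLOGd's actual code):

#include <stdio.h>

/* Count pending overflow entries: one per new-line character. */
long count_overflow_entries(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;                  /* no file means nothing pending */
    long count = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            count++;
    fclose(f);
    return count;
}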
Currently pgLOGd logs messages and errors directly to a file (instead of to a logging facility like syslog). No error checking of any kind is done on this log file. Eventually pgLOGd will include options to take advantage of system logging facilities such as syslog.
With these checks complete, pgLOGd next calls fork() to begin the transition to a daemon process. The parent exits, and the child process makes itself the session leader. Next, all inherited file descriptors are closed and three signals are captured:
SIGHUP.
SIGINT.
SIGTERM.
These signals currently all do the same thing: cause pgLOGd to shut down. In the future, SIGHUP will cause pgLOGd to reread its configuration file, and SIGINT may perform other tasks such as re-establishing the database connection, writing the current state to the log file, etc.
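A rough sketch of that start-up sequence, with all three signals routed to a single handler as described (the 64-descriptor loop is an arbitrary illustration):

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

static volatile sig_atomic_t shutdown_requested = 0;

static void on_signal(int sig)
{
    (void)sig;
    shutdown_requested = 1;        /* all three currently mean "shut down" */
}

static void become_daemon(void)
{
    pid_t pid = fork();
    if (pid < 0)
        exit(1);                   /* fork failed */
    if (pid > 0)
        exit(0);                   /* parent exits */
    setsid();                      /* child becomes session leader */
    for (int fd = 0; fd < 64; fd++)
        close(fd);                 /* close inherited descriptors */
    signal(SIGHUP, on_signal);
    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);
}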
At this point pgLOGd enters a select() and waits for any of several things to happen:
The FIFO is ready to be read. This usually indicates that the web server is sending a log entry. This action takes precedence over all other actions, as it is critical that the web server does not have to wait on pgLOGd.
The database connection is ready for reading or writing. This could be due to a query result being ready to process, or to the process of establishing the database connection (if it went down).
The overflow file has entries that are waiting to be written. This is the lowest priority and will only happen if neither the FIFO nor the database connection needs processing.
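In code, that wait might look something like the following sketch. Here fifo_fd and db_sock are assumed descriptors; with libpq, db_sock would come from PQsocket() on the connection:

#include <sys/select.h>
#include <sys/time.h>

/* One pass of the event loop (sketch, not pgLOGd's actual code). */
void wait_for_work(int fifo_fd, int db_sock)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fifo_fd, &rfds);
    int maxfd = fifo_fd;
    if (db_sock >= 0) {
        FD_SET(db_sock, &rfds);
        if (db_sock > maxfd)
            maxfd = db_sock;
    }
    struct timeval tv = { 30, 0 };     /* configurable wake-up interval */
    int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return;                        /* unrecoverable: log state and exit */
    if (n == 0)
        return;                        /* timeout: check the DB connection,
                                          then process a pending overflow entry */
    if (FD_ISSET(fifo_fd, &rfds)) {
        /* web server sent a log entry: highest priority */
    } else if (db_sock >= 0 && FD_ISSET(db_sock, &rfds)) {
        /* query result ready, or async connect progressing */
    }
}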
Once an action is detected by select(), pgLOGd enters the state logic. A state machine, as it is typically known, is basically a set of logical states that a program can be in at any given time. Not all states are wired to all other states, so depending on certain conditions, only certain actions are possible. For pgLOGd, one of those states might be "database connection down", and from that state pgLOGd cannot get to the "write entry to database" state.
pgLOGd's states are pretty straightforward, and the primary ones are described here:
Nothing happening. select() will block for a predetermined amount of time (configurable), then wake up, check the database connection, and go back to waiting. If the database connection is down, pgLOGd enters the "database connection establish" state which will cause any new log entries to be written to the overflow file.
Nothing happening, overflow entries pending processing. If overflow entries are waiting to be processed, and pgLOGd is not in the "database connection establish" state, pgLOGd will read an overflow entry and enter the "entry ready" state.
FIFO data ready. pgLOGd enters the "read FIFO" state and begins consuming data. If an entry is found, the "read FIFO" state is exited and the "entry ready" state is entered. Otherwise pgLOGd enters the "eat FIFO data" state, which basically causes pgLOGd to consume and ignore all FIFO data until a valid entry is found.
The "entry ready" state will determine where to write the entry based on the database connection and whether or not the entry came from the overflow file. If the database connection is up, pgLOGd will enter the "write to database" state. If the database connection is down and the entry was read from the FIFO, the "write to overflow file" state is entered. If the database connection is down and the entry was read from the overflow file, the "abort overflow entry" state is entered and the overflow entry is re-queued for processing.
The "database connection establish" state blocks all database operation until the state is exited (only possible with a good connection.)
If at any time an unrecoverable error is encountered, pgLOGd will write its current state to the log file and exit. Examples of unrecoverable errors are system function call failures such as: malloc(), read(), write(), and select(). Encountering an error caused by any of the aforementioned functions failing is currently not something pgLOGd can recover from.
pgLOGd stays in the select() loop until one of several events happens. Any of these events will cause pgLOGd to shut down:
One of the captured signals is received.
An unrecoverable system error is encountered.
A signal will cause pgLOGd to perform a "graceful" shut down, meaning it will close all its connections and shut down properly. An unrecoverable error may or may not allow pgLOGd to perform a "graceful" shut down, but it will certainly try.