
Interbase and how we started to use it


    Bullet Contents


    Bullet Introduction
    Bullet Why and how we decided to go the Interbase way
    Bullet What our first application was about
    Bullet Reducing database size and other non standard tricks
    Bullet Balancing the load of the system and using the computational power of client PC's
    Bullet Advantages and limits of agnosticism
    Bullet Examples of our typical applications running with Interbase
    Bullet Final remarks, bugs found and some tricks


    Bullet Introduction


    Bullet This page was made simply to record some of the experience with Interbase we gained through about three years of intensive use in rather critical applications. We had no time to polish the English ... This paper is far from complete and not even fully consistent. Anyway, it could be of interest to those who are making their decision about what their future database engine will be. As you will find below, we often use the terms "agnostic", "agnosticism" and the like. Apologies to those for whom these terms have some spiritual value; we simply took them as suitable for our discussion.


    Bullet We want to stress that we have no general recipe and our experience is only a limited one. Thus our conclusions need to be judged with this fact in mind. On the other hand, we describe real projects, real failures and real successes. We really struggled with Interbase, ORACLE, WNT and Linux, as described later on. Even though the corresponding companies sell technical support, the local representatives in our country were not able to solve any non-trivial problem we met. This also influenced our decision to use open source products wherever possible. Even with Linux not everything was nice, but this system evolved so quickly that several of our problems disappeared spontaneously as we installed newer versions.


    Bullet At the very beginning we were not sure which way to go. We hesitated between proprietary products with declared technical support and open source products with no formal support at all. Also, our customers were and still are rather conservative, and often our products are the only exception to an otherwise totally Microsoft Windows oriented environment in the companies we are working for.


    Bullet Any decision about the right database engine is, unfortunately, influenced by marketing as well as by non-IT people. It is important to know that a database engine is only a tool and that the ideas and theory around which these engines are constructed are mostly rather simple and well known. Moreover, the notion of a relational database is an old one (about 25 years or so). Thus most of the engines offer basically the same functionality. The bigger, broader or older the product is, the more problems it has with backwards compatibility and the greater the petrification of old-fashioned ideas. That indicates that sometimes switching to a newer, leaner product can be a good idea. We found Interbase to be a rather good tool with an extremely small footprint and good functionality.


    Bullet Now a few words about our company. We are a rather small group creating original software in various branches - our profile can be found on our home pages. Our aim is to implement even huge and critical projects using a minimum number of people. We do not believe that a good project can be done reasonably when tens or hundreds of programmers are working together. Their productivity decreases rather quickly to zero as their number surpasses some constant varying from 4 to 10. That is why we concentrate on maximizing productivity and the re-usability of whatever we have ever done. Well, we are not alone in wondering what big armies of programmers in big companies are really doing. As far as we know, many important projects originated as one-man or few-man shows. Have a look at how unix, C, troff or other important things were built.


    Bullet Our method can be described as "distilled" abstraction and interpretation. The applications we create are strictly database engine agnostic and even agnostic of the application in question. We always want the application to be able to do more than requested, always using more abstraction than actually needed. That seems strange or even contradictory, but it really is not, as we explain below. Agnostic means here "knowing nothing about" or "unaware of".


    Bullet First we explain the agnosticism with respect to a database engine. We try to write applications so that no particular database engine is assumed. The engine can even be changed transparently and the application in question will not notice. We are actually using Interbase, ORACLE and Paradox simultaneously, and except for slight differences (ORACLE does not support multiple BLOBs within one record while Interbase does) the same application can be run with any of these databases without any change. Specifically, we are using a kind of abstract interface that translates every piece of database oriented code using sql constructs only. We will extend the list of supported databases in the future, but once Interbase was released as a free product, our motivation to cover other engines decreased significantly. Thus most of our applications can be run without any tie to a particular database engine. We even support setups where one part of the system is using Interbase while another one is pumping data from ORACLE. It all looks as if a single, unified database engine were running.


    Bullet Let us turn to the more intrinsic agnosticism with respect to the application in question. The majority of the jobs we do use Windows based PC's as workstations, while the database and other parts of the application run under unix or Linux on a dedicated server. Typically these are hospital laboratories and small or even huge companies - in short, the nature of their business varies widely. Having a limited number of employees, we decided to unify all of the applications into one "super-application" that interprets a particular application's specification. Thus we are ALWAYS using the same front-end binary, which is fed at start-up with the detailed specification of what it will be servicing and how. That means words like "account", "patient", "microbe", "surname" or "product" are NEVER mentioned in our binary. It is agnostic of any particular data model; it knows no particular working scenario beforehand. Everything, including the type of database that is used, is written in simple text files as a specification, and the binary interprets or just-in-time compiles these specifications when needed. It is understood that such a program is able to handle the majority of the most ordinary working scenarios, and it has a sophisticated statistical subsystem as well as graphics. That is: browsing through data, creating and arranging different views, scanning forms, chaining elementary actions into more sophisticated ones, etc. More complicated than trivial scenarios are constructed through a powerful scripting language. The system calmly handles any inconsistency between the current data model and the existing data and data dictionary within the application. Thus you can use a not-yet-stable data model whose tables and items differ from the ones your forms expect to use. The system handles such situations doing its best. E.g. you can perform operations like "Here is a set of named values; let every table in the system take the values it wants and perform an insert ...". This is practical namely during the development phase of the project in question.


    Bullet It is important to stress that the binary in question is generally the only front-end (except for web presentation) visible to our customers. Generally we supply our products as rather autonomous sub-systems aimed at some specialized job. Typically we supply:


    Bullet Totally database engine and application agnostic front-end written in Delphi.
    Bullet Totally particular hence totally gnostic application specification.
    Bullet Adequate database model and configured database server - the database in question is generated through the above front-end, which translates the specifications into the particular back-end database in question, including the CREATE clauses.
    Bullet General and again rather agnostic web based presentation interface.
    Bullet Two-way Java and Interclient based interface to and from our system / database.
    Bullet Adequately configured hardware and operating system with automated administering cron jobs.


    Bullet Our approach is always a compromise. It replaces coding in a programming language by another kind of coding in a specification language. The difference is that the specification language is very powerful provided the requested application is coherent with the general model we accepted. In such a case the "code" of the application is extremely compact and short. Some of the strong points of this method are:


    Bullet The binary in question is very stable and almost never generates run-time errors. We change it rather seldom, only when new inner functions are needed or some hard bugs are discovered. To give you an idea: one of our customers has been running the program in question for 3 years, 24h/24h, on many PC's across the whole country. There are various versions of the binary running, including the oldest ones. During this time the data model was totally changed, many new features were implemented and even the database changed from ORACLE to Interbase. The users did not notice. Only some of those using the oldest version of the binary were complaining that they could not invoke some newly implemented features...
    Bullet We hold equally in our hands the "hardware" (the binary) and the "software" (the application in question). Thus potential bugs in the binary can be worked around in the scripting language and, conversely, a repeatedly requested and complicated function can be "hardwired" within the binary. This is to some extent similar to the approach Transmeta is using in its new Crusoe chips. Continuing the analogy, our binary is a kind of morphing software that translates the application into a native code and then executes it.
    Bullet Because the basic style of working with the system is always the same, the expenses for documentation and training are reduced. In hospitals where we are implementing various particular applications, the personnel can easily migrate from one department to another. Basically they are all using the same software, tuned for even extremely different purposes (from microbiology through neurosurgery).
    Bullet Even rather complicated applications can be built quickly. Construction of a fully functional first iteration of a non-trivial system can be a question of one or two weeks. The system is then improved step by step but is ROCK-SOLID FROM THE VERY BEGINNING. This is important for those customers that are experiencing frequent changes of their needs or for those that are simply unable to specify their needs. With technology evolving this is very often the case, as you will see later on when some of our applications are described in detail. Actually there are many applications that cannot be defined - they simply change faster than anyone can compile a good definition. Thus they can be defined only by themselves, using self-reference. This is the kind of application our approach suits best.
    Bullet Technical support of such software is particularly easy and can be done by persons with no training in programming or computer science. This is a very important point, because many customers need to make slight changes to their software without hiring programmers.


    Bullet Let us mention also some weak points:


    Bullet The binary in question is always the same. Even if it is highly configurable, it cannot cover all types of applications. Our experience is that about 90 % of the cases can be handled by our approach with good results, while the other 10 % cannot be handled regardless of the effort we apply. These 10 % are simply "hardwired", that is to say coded separately in C, Java, Delphi or whatever environment is appropriate. The result can be a totally independent application or an external plug-in to our binary.
    Bullet Some customers have very strict ideas of how their future software will behave, what kind of coloured buttons they will press ... Many of them want to reinvent the wheel and it is a very difficult job to break their prejudices down. Building an application from scratch every time is a waste of time, but customers sometimes request exactly this approach. What is worse, they are able to accept some limitations provided they are caused by, say, ORACLE or another company's development tools, but they do not accept so easily that we, too, have some limitations from the very beginning.
    Bullet On some old machines the interpretation approach can be slow resulting in bad performance. This argument is now very marginal - most of even mediocre PC's are INCREDIBLY fast and the size of the binary in question is only about 2MB.
    Bullet Practically no customer is very happy to have a system similar to their neighbour's ... They are unable to understand that a system is important because it offers some functions, not because of how exactly the functions are offered - provided, of course, that the way the functionality is offered is generally acceptable ... Paradoxically, what helps here is the fact that the neighbour also was able to accept a similar system.


    Bullet Why and how we decided to go the Interbase way


    Bullet Now we turn to our first experience with Interbase; the description of the applications and the methodology continues further on. We started to use Interbase 5.1 under Linux, having some previous experience with Interbase version 4 under Linux as well as under WNT. The job in question was a complete rewrite of old-fashioned DOS applications running 24h/24h in a Novell based network. The new target was Windows. The customer insisted that the database, too, must run under Windows. Thus ORACLE was selected first and the rather huge system was rewritten step by step using the above approach (we call it VaxNt for some already unclear reasons ...). Very soon we discovered that the first version of the new system would be used by a limited number of users - simply because the old software worked well, was rock-stable and covered critical business. Thus the new system could be deployed only when 100 % approved. We switched between Paradox and ORACLE, experiencing difficulties with both. While the WNT server was a really good machine (in 1997), having 256MB RAM, a 350 MHz Intel PII and plenty of disks, it was never able to run longer than one month servicing a limited number of users generating only a fraction of the target load.
    Bullet Paradox was unstable in the network environment, ORACLE suffered from the bad stability of WNT. The server ran nothing but ORACLE and IIS and was down too frequently. The customer insisted on WNT and it turned out to be a bad decision. For testing purposes we tuned Paradox, inventing rather sophisticated methods for recovering corrupted database files. The system was fully developed and tested with Paradox and the question of the final database engine was left open.
    Bullet WNT and ORACLE suffered and killed each other on the server and were down sometimes even when they were not in use ... Thus we were looking for another system. We knew that ORACLE on unix systems worked okay. But we preferred Linux, which we already used for file-sharing and serving web pages. As the very first step we replaced the Novell servers with Linux ones (using the same hardware), with good results, and logically we wanted to run the database engine under Linux too. Interbase v4 was installed but the results were not satisfactory. We felt that both Redhat 4.2 and Interbase 4 were guilty and not ripe enough at that time. The situation improved when Interbase 5.1 was released. Some fixes on Redhat were needed to limit problems with IP - these problems disappeared with new kernel versions. The weak point was the rather slow communication between satellite PC's and the server, even over a 100Mbit LAN. With new drivers for Windows and new versions of the Linux kernel the situation improved step by step. Note: when we started, no ORACLE version for Linux was available.
    Bullet Thus we installed WNT+ORACLE and Linux+Interbase on similar hardware. The nature of the application was such that we were able to run both configurations simultaneously with identical functionality. Both machines were Intel PII with 128MB RAM and similar disks. The size of the database was limited - about 100MB only. The performance was roughly the same. The most common action was browsing through a huge table of about 50,000 records and resolving joins of several tables of about 10,000 records each. We also reloaded the databases using the application itself, converting the data to text and loading it back into the tables. Again the speed was similar, and so was the load on the machines.
    Bullet ORACLE as installed performed EXTREMELY poorly. We knew that there was a tuning utility available (but not cheap), so we experimented blindly with various constants in the configuration files, guessing what their meaning could be. With some effort we got ORACLE to run about 10 times faster than it ran with its defaults. As far as we remember, the biggest magic was tuning the memory sizes used for sorting, caching and the like. Really a nasty job! Moreover, ORACLE did not survive some of our experiments and we needed to reinstall it over and over and, worse, WNT too. Anyway, we obtained a running configuration and the battle continued with the sizes of roll-back segments and the like ... Here we concluded that ORACLE is probably still using some very old ideas and is excessively complicated to maintain. On the side of the satellite PC's we used W95 and the ORACLE driver was good. Also the BDE interface behaved well until we switched to version 8 of ORACLE - that was a real catastrophe. To this day we have no fool-proof BDE interface for ORACLE 8 ... The problem really is in BDE and not in ORACLE.
    Bullet Honestly speaking, we were never sure whether the latest patches (service packs) were applied as necessary during the iterative reinstalls of WNT. A very frustrating fact was that after applying some service pack we had to reinstall ORACLE - the service pack simply killed ORACLE ... But the same was true when an imaging software package was installed. So who knows what the problem really was? The whole procedure of installing WNT and ORACLE was so tiring that some of us were no longer able to speak about WNT calmly.
    Bullet By contrast, Interbase behaved well and the only tuning we did was increasing the number of buffers - we used about 1024 buffers and 8KB pages (a sketch of this setup follows at the end of this list). Adding even more buffers turned out to be unproductive. The only problem was with our language (Czech, with really devilish accents ...) and it was not that easy to find a suitable combination of CHARACTER SET (WIN1250), COLLATE (PXW_CSY) and local language driver for the satellite PC's (PdoxAnsiCzech). There were really too many combinations and nearly no documentation. Who on earth will ever guess what PXW_CSY could mean ...
    Bullet Having these two configurations, we martyrized both machines, pressing the accelerator to the floor. The behaviour was similar in all cases except for the fact that WNT went down from time to time. Also, managing ORACLE under WNT was a difficult job. WNT did nothing else, while Linux was our normal server doing all the everyday jobs, including compilation of huge sources, file sharing and the like.
    Bullet Thus we decided to go the Interbase way and dropped the ORACLE one. The major inhibitor of WNT+ORACLE was the instability and difficult management. The instability had some symptoms of memory leaking. When used intensively, both ORACLE and IIS blew up, eating memory in small chunks and never giving it back to WNT. WNT began to swap and that was the beginning of the end. We had to reinstall WNT several times after such a hang-up and we never discovered why.
    Bullet One thing must be clear. We have no general hint or comparison between ORACLE and Interbase. We simply did our best during a difficult decision-making procedure and Interbase proved to be a good solution, at least for our purposes. Our decision was made using experiments of our own. We also tuned the testing system to our application only. In this case, and also in other cases we programmed later on, we found that the difference between Interbase and ORACLE is nearly negligible in performance, and Interbase is the winner as far as simplicity of maintenance is concerned. The best machine we had was a multi-CPU PA-RISC HP box with 2GB of RAM, and ORACLE running on this box was not much better than the Interbase we were running on a dual Intel PII/350 box with 256 MB of RAM.
    Bullet Finally, there remains the question of Interbase under WNT. We really had this configuration, too. We only measured performance and discovered bugs in drivers back when the only Linux Interbase version was 4. The performance was about the same as under Linux. But we were so tired of frequent WNT crashes that we never used Interbase under WNT in production. Here we can say nothing of any value. Maybe W2000 will perform better, but for us it is too late. The combination Linux+Interbase proved to be a good solution and we actually have no motivation to find a better one. Not until we see a heavily loaded W2000 box with an uptime of more than 300 days.
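
    Bullet A minimal sketch of the tuning just described; the database path, credentials and the sample table are made-up examples, not our real ones. The buffer count can be changed later with gfix, the page size is fixed when the database is created or restored, and Czech text columns need the WIN1250 / PXW_CSY combination:

        #!/bin/sh
        # Sketch only - the database path and credentials are hypothetical.
        DB=/data/vaxnt.gdb

        # Default cache of about 1024 buffers for this database
        # (with 8KB pages that is roughly 8MB of cache).
        gfix -user SYSDBA -password masterkey -buffers 1024 $DB

        # The 8KB page size itself has to be chosen when the database is
        # created or restored, e.g.:  gbak -c -page_size 8192 ... vaxnt.gbk $DB

        # Czech text columns: the combination that finally worked for us.
        ( echo "CREATE TABLE PERSONS ("
          echo "  SURNAME   VARCHAR(40) CHARACTER SET WIN1250 COLLATE PXW_CSY,"
          echo "  FIRSTNAME VARCHAR(40) CHARACTER SET WIN1250 COLLATE PXW_CSY);"
          echo "COMMIT;" ) | isql -user SYSDBA -password masterkey $DB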


    Bullet What our first application was about


    Bullet The customer we did the above job for is by far the biggest data operator in our country - Ceske Radiokomunikace. It is also responsible for all TV and high-frequency signal distribution, and about 50% of GSM traffic also passes through this customer's network. Our software is responsible for the overall management of all equipment. These are about 20,000 items covering TV transmitters, satellite antennas, radio-relay networks, ATM networks, enormous routers and the like. The traffic is managed through our software, repairs are planned, and contributions or mobile transmissions are organized as well. The system also covers contracts, billing and many other functions. The system runs on two Compaq machines, each with one PII/450 processor, 256 MB of RAM and plenty of SCSI disks. Both machines are identically configured with Linux Redhat 6; the database engine used to be Interbase 5.1 and is now Interbase 6 - in both cases classic architecture. All of the traffic is run on one server while the other is used as a hot-standby reserve where the data are stored. The low-level control of the devices is certainly not performed by our system - this is solved case by case by specialized hardware from which we obtain filtered outputs only. Some of the equipment has no on-line control. There is another Linux server installed and used to store critical data and logs in the form of flat files. This server is a kind of buffer; it is an ordinary PC with plenty of disks and a DAT tape. This third server is a Redhat 4.2 box running non-stop since it was installed three years ago.
    Bullet The system is outsourced to our company, which is responsible for all repairs, system recovery and, above all, the development of new functions. With new technology and new needs of our customer's customers, we are expected to implement minor changes daily and major ones about every two months. The upgrades must run without a system restart or even any forced restart of the software run on the satellite PC's. That is why our VaxNt approach and the interpretation are important - we can make even deep changes by tuning specifications without changing the binaries. Thus no restart is needed.
    Bullet Our software runs in a large WAN across the whole country and the network is not always 100% stable. The number of client PC's running the software is about 400, but only a fraction of these really connect to the database for a non-trivial amount of time. The majority access the data through an internal web server, with no transient connections. Here is a good point for Interbase - connecting to it is really fast (unlike ORACLE).
    Bullet The Linux server typically runs about 30 simultaneous gds_inet_server processes - the other connections are pooled by the application itself, which disconnects when a connection has not been used for a defined amount of time and connects again when necessary. The server also runs about 100 smbd processes servicing shared disks. Besides this there are several httpd and sshd daemons, and that is all. The whole system is administered automatically using cron jobs and saves usable images of the database every hour, every day of the week and every month of the year (a sketch of such a job follows at the end of this list). Recovery is backed by a proprietary solution - mirroring and other methods proved to be unusable. In short, the system does extensive logging on the application level and in case of destruction it can replay the logs, again on the application level. But this is out of the scope of this paper.
    Bullet The average long-term load of the system is about 20% and drops to 6% when the cron jobs are blocked out. Every hour the load rises to about 99% for several minutes during the administering jobs, but the final users do not notice. These administering cron jobs are niced so that they run in the background.
    Bullet The amount of data stored is rather big. All of the roughly 20,000 devices generate some data daily, all breakdowns are monitored in full detail and all customer-related or billing-related data are collected ... The average inflow is about 50,000 records daily - most of them are inserted in the background on the Linux side without any manual intervention by the operators. There are about 1,000 manual interventions a day (manual input of data) and more than 100,000 sql queries are resolved daily - most of them being automatic refreshes of screens at dispatch centers (7 units equipped with 3-6 PC's each). To sum up, the system is answering non-trivial queries and making inserts, deletes or updates every second. Note: speaking about GSM, it must be clear that we do not track the traffic down to the final users; from our point of view the GSM provider is only one big customer using some of our equipment to distribute its signal. The same is true for data distribution using high speed networks. The peak predictable insert/update rate is about 2,000 per minute during the daily cleanup, and the peak rate detected so far was about 60,000 during a several-minute-long technology breakdown that caused an unexpected flood of messages.
    Bullet The stability seems to be good. Linux itself never came down except when hardware was upgraded or maintained. The typical uptime is 300 days and more. Interbase was a bit less stable, having problems with corrupt indices from time to time. We feel that the problem lies in the unstable network - every time the problem appeared there were also heavy difficulties on routers or the like. Also the W95 IP implementation is guilty - all of the PC's run W95 and go down rather frequently because of WinWord, Excel or IE run-time errors. Thus the problems with Interbase did not bother the customer, simply because he was affected much more by Windows-related problems. When the PC's were not able to run at all, it was no problem that they potentially could not connect to Interbase.
    Bullet The repair time needed to recover from corrupt indices was typically several minutes and the response was always rated as good by the customer. The problem was infrequent, recurring about every 2-3 months.
    Bullet There are some problems with Windows worth mentioning. In large installations you rather often find a PC with corrupt Windows system files. Such a PC can generate false locks, and it often reports the error "Another user changed record" ... Reinstalling Windows restores normal function. Windows 98 proved to be more stable; WNT was used too seldom to report any significant data.
    Bullet With a rather big inflow of data - millions of records - we needed to develop a methodology for storing these data while keeping recovery times low for a potentially corrupt database. It is not an easy task to back up databases holding many GB of data; it is a time and space consuming procedure. If you want to create consistent system snapshots every hour, you must complete a total backup every hour ... How we did it is described in the next section ...
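
    Bullet To give an idea of the administering cron jobs mentioned above, here is a simplified sketch of the hourly image; the paths, credentials and retention scheme are invented examples and the real scripts do more checking:

        #!/bin/sh
        # hourly_backup.sh - run from cron, niced so that the users do not notice.
        # All paths and credentials are made-up examples.
        DB=/data/dispatch.gdb
        OUT=/backup/hourly/dispatch.`date +%H`.gbk    # one slot per hour of the day

        nice -19 gbak -b -v -user SYSDBA -password masterkey $DB $OUT \
            > /var/log/gbak.log 2>&1

        # gbak writes complete files only, so compression comes afterwards.
        nice -19 gzip -f $OUT

        # crontab entry (the daily and monthly images are made the same way):
        # 15 * * * *  /usr/local/sbin/hourly_backup.sh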


    Bullet Reducing database size and other non standard tricks


    Bullet The data collected in technological databases are seldom updated. Most of them are stored once and for ever. Only a kernel, generally the configuration data, is heavily updated. With a radio-relay data channel you have some configuration data (configuration, technical parameters, capacity, location, traffic attached ...) that change rather seldom. Then you have current status data, maybe breakdown parameters. And the device can generate a stream of signals. This stream of signals and events represents the vast majority of data related to one device. These streams are normally filtered so that only the interesting part remains and infinite sequences of the type "YYYYMMDD HHMMSS: operation continues OK" are thrown away.
    Bullet Even frequently changed data (today's traffic and events) become less and less frequently changed as time passes. The data originating last month will nearly never be changed once computed, analyzed and stored.
    Bullet Thus one can break the database into a frequently changing or frequently accessed kernel part and a huge archive. Here one can use, with good results, the fact that Interbase supports external files as if they were database tables. Thus the streams of signals can be stored in database tables first and, when old enough, they can flow out to external files. This "purge" can be done, say, monthly. The GREAT advantage is that you need to backup such data only once a month ... The external files can also be used with extremely good results as buffers catching unexpected floods of records. Appending lines to a text file is an order of magnitude faster than any insert into a database table could be. Thus you can calmly catch the data and let the database chew them at its own speed. The trick is that the flat file appears exactly as a table. Only deleting from such buffer tables is a bit tricky - it must be done at the file system level by a UDF or so. The application pumps the data from the buffer external text file table into a real one as if both were regular tables and is unaware of the peak speed of the incoming data (a sketch follows after this list).
    Bullet In our experience one can access such external tables with good response times even when they have millions of rows. It is important to remember that external tables have no indices ... It is essential to find a good compromise - normally nearly no one accesses one-year-old data concerning the signals of one particular device. And if it is the case, the person in question can simply wait one minute or so.
    Bullet There is another trick for saving database space. In medical applications, results are very often generated as text files describing the exam results or summary in some readable format. A typical microbiology laboratory can generate about 100,000 such results a year, taking a few GB of disk space. You can zip such data before storing them in BLOB items. However, the typical gain is not that good. Using the similarity between such results, you can obtain at least 10 times better compression when compressing, say, 1,000 such results stored together in one BLOB. Better still, you can store them again as an external file and manipulate such zipped archives using UDF functions (a second sketch below illustrates the idea). Using this method we reduced a large medical database about 12-fold. Again we are using the fact that once a result is generated, it is never changed and can be stored in a different way. And again, the current results can be stored in database tables while the old ones (less frequently or nearly never accessed) can be purged into zipped archives. From the application point of view nothing changes - the UDF transparently searches for the data if they were already purged out of the current horizon.
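
    Bullet A minimal sketch of the external-file buffer described above; the table names, the file name and the column layout are invented and the real records are richer. The collectors only append fixed-width lines to the text file, and one sql statement pumps the buffer into the real table:

        #!/bin/sh
        # Sketch only - all names and the layout are hypothetical.
        DB=/data/dispatch.gdb

        # Declared once: rows are fixed width, the last CHAR(1) column holds
        # the newline so that the buffer stays a readable text file.
        ( echo "CREATE TABLE SIGNAL_BUFFER EXTERNAL FILE '/data/buffer/signals.txt' ("
          echo "  STAMP   CHAR(14),"
          echo "  DEVICE  CHAR(10),"
          echo "  MESSAGE CHAR(74),"
          echo "  EOL     CHAR(1));" ) | isql -user SYSDBA -password masterkey $DB

        # Collectors simply append lines to /data/buffer/signals.txt, which is
        # an order of magnitude faster than inserting.  A cron job then pumps
        # the buffer into the real (indexed) table at the database's own pace:
        echo "INSERT INTO SIGNALS (STAMP, DEVICE, MESSAGE) SELECT STAMP, DEVICE, MESSAGE FROM SIGNAL_BUFFER;" | isql -user SYSDBA -password masterkey $DB

        # Emptying the buffer is the tricky part - it is a flat file, so it is
        # done on the file system level (here just truncated; in reality a UDF
        # or a more careful script takes care of it):
        > /data/buffer/signals.txt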
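
    Bullet And a sketch of the second trick; the directory layout and retention are invented, and the real system keeps an index of which results went into which archive so that the UDF can find them again. The point is only that many similar results compressed together give a far better ratio than zipping each one separately:

        #!/bin/sh
        # monthly_purge.sh - run once a month from cron; paths are hypothetical,
        # file names without spaces are assumed.
        RESULTS=/data/results                              # one text file per result
        ARCHIVE=/data/archive/results-`date +%Y%m`.tar.gz

        cd $RESULTS || exit 1

        # Pack results older than about two months together and remove the
        # originals only if the archive was written successfully.
        find . -type f -mtime +60 -print > /tmp/purge.list
        tar czf $ARCHIVE -T /tmp/purge.list && xargs rm -f < /tmp/purge.list
        rm -f /tmp/purge.list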


    Bullet Balancing the load of the system and using the computational power of client PC's


    Bullet Here we have only a few comments. Generally there are two extreme approaches. In a pure client-server model everything is done by the client software and consequently even incredible amounts of data are transferred through the local or even wider computer network. The other way is to program the majority of the computations on the server side, using stored procedures or other programs that run on the same computer as the database engine in question. Here the server can be heavily loaded, because the majority of the computation is done by only one machine.
    Bullet It is not very easy to decide which way is better. On one side you can buy cheap but powerful PC's - no problem getting a 1GHz PC under $1,000. On the other side it is not that easy to buy, for a reasonable price, a server which is about 10 to 20 times as fast.
    Bullet The client-server model is more suitable for "agnostic" style of programming and to some extent the load is better distributed within the network - every machine computes a bit.
    Bullet We decided once again to use a compromise. Our clients generally do not like to spend huge amounts of money on servers. Thus the majority of the load is distributed to the cheaper client PC's, but really computationally intensive tasks are scheduled on the server. The agnosticism imposes that stored procedures be used seldom - their language as well as their functionality differ greatly from one engine to another. Wherever possible we instead generate sql-script files, even huge ones, and these are scheduled and run on the server. It is in some sense a kind of batch processing (a short sketch follows).
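
    Bullet A sketch of this batch style; the spool directory and the job names are invented examples. The application drops plain sql scripts into a spool directory and a niced cron job on the server feeds them to isql during quiet hours:

        #!/bin/sh
        # run_batches.sh - sketch only, paths and credentials are hypothetical.
        DB=/data/dispatch.gdb
        SPOOL=/var/spool/sqlbatch

        for JOB in $SPOOL/*.sql
        do
            [ -f "$JOB" ] || continue
            nice -19 isql -user SYSDBA -password masterkey $DB < $JOB \
                > $JOB.log 2>&1 && mv $JOB $JOB.done
        done

        # crontab entry, e.g.:  30 2 * * *  /usr/local/sbin/run_batches.sh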


    Bullet Advantages and limits of agnosticism


    Bullet Generally speaking, when the application in question becomes stable and well petrified within the customer's company, the need for database-engine agnosticism drops. It is not probable that a well functioning system will be moved just for fun to another database engine. In such a case the usage of database-engine specific code (stored procedures, ...) no longer needs to be limited that strictly. Like every dogmatic idea, hard agnosticism becomes a problem in such a case.
    Bullet We try anyway to limit all decisions that assume a particular engine to an absolute and well-founded minimum.
    Bullet Like every accepted restriction, agnosticism limits how much of the potentially broader functionality your engine offers you can actually use. However, it can turn out to be an advantage. E.g. the migration to newer engine versions is much cheaper - you need to verify only a limited part of your code.
    Bullet Using only or nearly only the standard and common part of the functionality every engine has is a good general practice. Using more exotic features is always risky. You never know what will be your next customer's beloved engine.
    Bullet Writing applications at a more abstract level than really necessary is also a good practice. Surprisingly, you often obtain much shorter and much more compact code.
    Bullet Our approach is, in some ways, an extreme and nearly absurd level of abstraction and interpretation. There are for sure many other groups that are using a similar, and maybe more mature, in-house methodology of their own. Such an approach always requires a rather homogeneous group of people working together. This is a weak point of the method. The higher the abstraction, the more different and well-founded opinions you will have to reconcile. The less affinity for abstraction is required, the more easily you can create coherent and larger groups. One of our objectives was that the customers would be given binaries and simple abstract tools with which they would tune their applications to perfection, while paying us some money for it. This goal was achieved only partially, however.


    Bullet Examples of our typical applications running with Interbase


    Bullet Microbiology application - including sophisticated system for acquired (nosocomial) infections surveillance and other hi-tech components including automated hypothesis formation. Generates several GB of raw data a year.
    Bullet Data operator dispatching system - a very specialized application covering hundreds of database tables and about one thousand different working scenarios. A really complicated system, generating tens of GB of raw data, under really intensive usage.
    Bullet Neurosurgery - complete system including operation preparation, statistical sub-system, billing ...
    Bullet Positron Emission Tomography - again a totally original system covering overall planning, results distribution ...
    Bullet A total of 12 other specialized medical applications ranging from scheduling of rehabilitation procedures up to non-invasive surgery in gynecology.
    Bullet Our company is currently running about 15 different applications with Interbase as the database engine; in all cases the resulting information system is fully outsourced to our company. For all of the customers the system in question is a critical one and 24h/24h operation is required. All of the systems are using Interbase 6 Classic architecture under Linux Redhat 6. All of the systems are supported remotely through modems or through secure channels over the internet, and most of them change functionality rather often.


    Bullet Final remarks, bugs found and some tricks


    Bullet As is always the case in the community of programmers, the documentation was consulted only as the last resort. We mostly jumped into the problem armed only with some experience. We would like to stress that even writing UDF functions under Linux for Interbase turned out to be a simple task, and we started simply from the examples in the distribution.
    Bullet Whenever you use another language than English, you are a bit handicapped. Tools like isql do not expect letters with accents, even within literal constants. Thus writing INSERT INTO X(NAME) VALUES('Řehoříček Blažej') is tiresome if not impossible - speaking about Linux and unix generally. You will even find it difficult to type this statement on an ordinary keyboard at the console. If you have a huge amount of text data that is to be inserted into database tables through a cron job, then the most ordinary way is to generate a huge sql script and pass it to isql to do the job. With such devilish letters inside the data you cannot do it this way. The problem can be solved, again, using external files. You declare an appropriate external table (suitable charset and collate) and insert the data simply by writing them as fixed-column formatted text. Then, using a one-line sql command, you pump the data from one table to the other (a sketch follows at the end of this list). Here you must bear in mind that a hard-core unix programmer or administrator writes specialized C programs only if the problem cannot be solved at the script level, within the shell, by commands such as sort, cut, sed, awk and similar ones chained in a pipe.
    Bullet There was one hopeless moment when one of our really important databases was corrupted in some deterministic way, so that even older backups made all the standard tools (isql, gbak and gfix) dump core. The database was in 24h/24h use. Even the documentation was not clear, but finally we saved ourselves using the -G, -I and -IG switches for gbak. The repair tool gfix is not documented well - now, with the sources available, we will write our own version of gfix and gbak so that even largely corrupt databases can be partially recovered. It is a very frustrating moment when a recovery tool dumps core ...
    Bullet Creating backups of large databases consumes a large amount of disk space and it would be helpful if gbak could read from and write to pipes instead of files only. Then compressed streams could be used and this would normally save about 80% of the disk space. As it is, you can gzip the backup files only after they are fully created (as in the backup sketch earlier in this paper) ...
    Bullet This item relates to the classic architecture. We once found a configuration running two gds_lock_mgr processes, along with an isc_lock1* file blown up to an incredible size. Killing both gds_lock_mgr processes and removing the isc_lock1* file helped, but we do not know what the real cause of the problem was. At the same time one of the PC's was corrupt and needed a total reinstall of Windows. We think that the handshake between the driver on the Windows side and the server is not absolutely perfect. Maybe Linux itself is guilty. The problem was found in one installation, once during 2 years of non-stop operation.
    Bullet Wild gds_inet_server. Due to errors of any kind, you can from time to time find a gds_inet_server consuming 99% of the CPU time while doing nothing of importance. You can simply kill it in the quasi-totality of cases. We think that such a process is a kind of zombie left behind by an abruptly destroyed IP connection. We were using a cron job that analyzed the list of all gds_inet_servers via "ps ax" and killed the wild ones if they looked strange (a watchdog sketch follows at the end of this list). With versions 5 and 6 this problem disappeared.
    Bullet Allowing users to write sql queries of their own can result in totally blocking the server. The users will for sure join multiple tables without imposing any conditions on keys ... In such a situation Interbase creates enormous work files (to be found in /tmp). We solved this by resizing /tmp to at least 1GB and invoking a cron job that purges old temporary files in /tmp (see the same watchdog sketch below). Old means about one hour, but that depends on the nature of the application in question.
    Bullet The docs are often inexact. E.g. we were not able to redirect the temporary sort files - they always ended up in /tmp as if hardwired within the binary. The same was true of the first distribution of version 6 - it installed into one directory while requesting some files (isc4*) in another one. It took only a bit of guessing to fix this problem.
    Bullet Writing external programs or UDFs interacting with Interbase was rather simple compared to what you need to do to compile a C program interacting with ORACLE. Normally the licenses you obtain from ORACLE do not contain the development option; you must request it (and pay) explicitly. Interbase comes as a whole, with all features.
    Bullet There are some absurd bugs in the parser of sql queries. We had examples where bad implicit casting (varchar to integer) generated incomplete answers - some evident rows were missing. The problem was that the parser did not report syntax errors while producing absurd responses. The problem was fixed using explicit casting.
    Bullet Construction of additional indices helped much in improving the performance. Because of the nature of our applications we were not able to guess off-hand what the frequent queries would look like. Thus we constructed the indices only after having accumulated some information about the real traffic in the system.
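
    Bullet A sketch of the external-file loading of accented data described above; the import table, the file layout and the paths are invented (only the target table X with its NAME column is taken from the example above). The cron job writes the incoming data as fixed-column WIN1250 text and one line of sql pumps it into the real table, so the accented literals never pass through isql itself:

        #!/bin/sh
        # load_names.sh - sketch only, all paths are hypothetical.
        DB=/data/register.gdb

        # Declared once: an external table over the text file we will write.
        ( echo "CREATE TABLE X_IMPORT EXTERNAL FILE '/data/import/names.txt' ("
          echo "  NAME CHAR(79) CHARACTER SET WIN1250,"
          echo "  EOL  CHAR(1));" ) | isql -user SYSDBA -password masterkey $DB

        # The incoming data are plain WIN1250 text with accents; pad every line
        # to the fixed width of 79 characters and append it to the external
        # file (the real script also truncates over-long lines).
        awk '{ printf "%-79s\n", $0 }' /data/incoming/names.raw >> /data/import/names.txt

        # One sql statement moves everything into the real table.
        echo "INSERT INTO X (NAME) SELECT NAME FROM X_IMPORT;" | isql -user SYSDBA -password masterkey $DB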
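
    Bullet And finally a sketch of the watchdog cron job mentioned in the two items about /tmp and wild gds_inet_server processes; the thresholds, the log file and the very naive "looks strange" test are examples only, the real job was a bit more careful:

        #!/bin/sh
        # ib_watchdog.sh - sketch only; run every few minutes from cron.

        # Purge temporary (sort) files left in /tmp for more than an hour.
        # In reality the pattern is more restrictive than "any old file".
        find /tmp -type f -mmin +60 -exec rm -f {} \; 2> /dev/null

        # Log gds_inet_server processes that burn CPU while doing nothing of
        # importance (column 3 of "ps aux" is %CPU); whether to kill them was
        # decided by a stricter test than this one.
        date >> /var/log/ib_watchdog.log
        ps aux | grep '[g]ds_inet_server' | awk '$3 > 90' >> /var/log/ib_watchdog.log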


    Bullet This paper was written by Ladislav Janiga in one night, upon a request made by Claudio Valderrama. The author believes that the paper will help more than hurt anyone who was courageous enough to read all the way up to here ...


JLABS Updated 20000930. Comments: Lada.Janiga@jlabs.cz