Fun with Firebird and Python



Pavel Císař, IBPhoenix

Firebird Conference 2012, Luxenbourg

_images/license.png

Problem Domain

Why Python?

Python & Firebird

Example I.

Analyze database content

After some unfortunate event we have two (or more) copies of the same database where one (or all) has some rows missing, and we need to assemble one good database.

Step I. - Get some data

Get primary keys from single table from both copies as sets.

>>> import fdb

>>> con1 = fdb.connect(dsn='db1',user='sysdba',password='pas')
>>> con2 = fdb.connect(dsn='db2',user='sysdba',password='pas')

>>> c1 = con1.cursor()
>>> c2 = con2.cursor()

>>> c1.execute("select pk from tbl")
>>> c2.execute("select pk from tbl")

>>> pks1 = set(c1)
>>> pks2 = set(c2)

Step II. - Analyze both sets

Step III. - Count the damage

It seems that most efficient is to take everything from db1 and add 100 missing rows from db2.

Step IV. - Moving data

# helper function
def read_rows(c, pks):
  for pk in pks:
    c.execute("select * from tbl where pk = ?",pk)
    for out_row in c:
      yield out_row

# Prepare output
con3 = fdb.connect(dsn='db3', user='sysdba', password='pas')
c3 = con3.cursor()
insert = 'insert into tbl values (?,?)'

# Move data from A
c1.execute('select * from tbl')
c3.executemany(insert,c1)
# Move data from B
c3.executemany(insert,read_rows(c2,miss1))

Step V. - reuse

Example II.

Analyze Firebird log

Which is big (3MB for example) and full of entries, most of them repeating, but some are special.

Typical needles in the haystack situation.

Example Firebird log entries

SOMESERVER    Wed Apr 06 16:33:11 2011
   INET/inet_error: read errno = 10054

SOMESERVER    Wed Apr 06 16:33:12 2011
   Unable to complete network request to host "SOMESERVER".
   Error reading data from the connection.

SOMESERVER    Wed Apr 06 16:36:16 2011
   Shutting down the server with 1 active connection(s) to
   1 database(s), 0 active service(s)

SOMESERVER    Mon Nov 21 01:29:13 2011
   SERVER/process_packet: broken port, server exiting

SOMESERVER    Wed Sep 21 14:45:26 2011
   Database: MYDATABASE
   internal gds software consistency check (page in use
   during flush (210), file: cch.cpp line: 3715)

How to process the Firebird log?

Log-processing pipeline

  1. Feed non-empty lines from log
  2. Group lines to "entries"
  3. Parse entries
  4. Convert entries to dictionaries (or objects)
  5. Query for objects of interests

Log-processing pipeline

Log-processing pipeline

Log-processing pipeline

Log-processing pipeline

Some query helpers

Query the log

As we packaged all in functions in single file (fblog.py), we can:

>>> from fblog import *
>>> show(frequency(fblog()))
>>> show(frequency(normalize_keys(fblog())))
>>> show(frequency(normalize_keys(fblog(exclude="INET"))))
>>> show(full(fblog(only="consistency check")))
>>> show(frequency(between(normalize_keys(fblog()),
                   to_time=dt("14.10.11 10:22:02"))))
>>> show(brief(between(fblog(),d("13.10.11"),d("14.10.11"))))

Too much typing and brackets?

There is a way how to make it more simple, but that's for another session (or an article for IBPhoenix/Foundation newsletter).

Example III.

Firebird log once again

But this time we want to be notified about daily events

Daily Report

from fblog import *
now = datetime.now()
day_off = timedelta(days=1,minutes=5)
last24hours = list(between(fblog(),from_time=now-day_off))

brief = '\n'.join(frequency(last24hours))
full = '\n'.join(full(last24hours))

Sent by e-mail

Example IV.

Firebird log once again

But this time we want to monitor it and see entries as they are written

Python version of tail -f

def follow(thefile):
  # Go to the end of the file
  thefile.seek(0,2)
  while True:
    line = thefile.readline()
    if not line:
      # Sleep briefly
      time.sleep(1)
      continue
    yield line

Simple script

from fblog import *
log = follow(open('/opt/firebird/firebird.log'))
try:
  show(full(fblog(log_file=log)))
except KeyboardInterrupt:
  pass
except Exception as e:
  raise e

Example V.

Firebird Trace Service

You want to know which indices are used to optimize queries and which are useless

Battle plan

  1. Use Trace Service to collect execution PLANs.
  2. Parse collected PLANs to get index names.
  3. Compare collected names with index names from database.
  4. Create report.

Using Trace Service

Script to control services

Last Example

The PowerConsole.

Enhanced python interpreter that can host user defined commands.

PowerConsole

Questions?

Pavel Císař
pcisar@ibphoenix.cz
IBPhoenix
www.ibphoenix.com