1. Why Test an RDBMS?
Agile software
developers take a test-first approach to development where they write a test
before you write just enough production code to fulfill that test. The
steps of test first development (TFD) are overviewed in the UML activity
diagram of Figure 2. The first step is to
quickly add a test, basically just enough code to fail. Next you run your
tests, often the complete test suite although for sake of speed you may decide
to run only a subset, to ensure that the new test does in fact fail. You
then update your functional code to make it pass the new tests. The
fourth step is to run your tests again. If they fail you need to update
your functional code and retest. Once the tests pass the next step is to
start over.
Test-driven
development (TDD) is an evolutionary approach to development which
combines test-first development and refactoring. When an agile software
developer goes to implement a new feature, the first question they ask
themselves is "Is this the best design possible which enables me to add
this feature?" If the answer is yes, then they do the work to add
the feature. If the answer is no then they refactor the design to make it
the best possible then they continue with a TFD approach. This strategy
is applicable to developing both your application code and your database
schema, two things that you would work on in parallel.
When you first
start following a TDD approach to development you quickly discover that to make
it successful you need to automate as much of the process as possible? Do
you really want to manually run the same build script(s) and the same testing
script(s) over and over again? Of course not. So, agile developers
have created OSS tools such as ANT, Maven, and Cruise
Control (to name a few) which enable them to automate these
tasks. More importantly, it enables them to automate their database testing script into the build
procedure itself.
Agile developers
realize that testing is so important to their success that it is something they
do every day, not just at the end of the lifecycle, following agile
testing strategies. They test as often and early as possible,
and better yet they test first. As you can see with the agile
system development lifecycle (SDLC) of Figure
3 testing is in fact something that occurs during the development
and release cycles, not just during release. Furthermore, many agile
software developers realize that you can test more than just your code, you can
in fact validate every work product created on a software development project
if you choose to.
Although you want
to keep your database testing efforts as simple as possible, at first you will
discover that you have a fair bit of both learning and set up to do. In this
section I discuss the need for various database sandboxes in which people will test: in short,
if you want to do database testing then you're going to need test databases
(sandboxes) to work in. I then overview how to write a database test and more importantly describe setup strategies for database tests. Finally, I
overview several database testing tools which you may want to consider.
4.1 Database
Sandboxes
Figure 4. Sandboxes.
6. Introducing Database Regression Testing into Your
Organization
8. Best Practices
There are
several reasons why you need to develop a comprehensive testing strategy for
your RDBMS:
1.
Data is an important corporate asset
2.
Mission-critical business functionality is implemented in RDBMSs.
3.
Current approaches aren't sufficient
4.
Testing provides the concrete feedback required to identify defects
5.
Support for evolutionary development.
Here's a few
interesting questions to ask someone who isn't convinced that you need to test
the DB:
1.
If you're implementing code in the DB in the form of stored procedures, triggers,
shouldn't you test that code to the same level that you test your app code?
2.
Think of all the data quality problems you've run into over the years.
Wouldn't it have been nice if someone had originally tested and discovered
those problems before you did?
3.
Wouldn't it be nice to have a test suite to run so that you could
determine how (and if) the DB actually works?
I think that one of the reasons that we don't hear much about database testing is because it is a relatively new idea within the data community. Many
traditional data professionals seem to think that testing is something that
other people do, particularly test/quality assurance professionals, do. This
reflects a penchant for over-specialization and a serial approach towards development by traditionalists, two
ideas which have also been shown to be questionable organizational approaches
at best.
2. What Should We Test?
2. What Should We Test?
Figure 1 indicates what you should consider testing when it
comes to relational databases. The diagram is drawn from the point of
view of a single database; the dashed lines indicate threat boundaries, indicating that you need to consider
threats both within the database (clear box testing) and at the interface to
the database (black box testing). Table 1 lists the issues which you should consider testing for
both internally within the database and at the interface to it.
Figure 1. What to Test.
Table 1. What to Test in RDBMS.
Black-Box Testing at the Interface
|
White/Clear-Box Testing Internally
Within the Database
|
|
|
3. When Should We Test?
Figure 2. The Process of Test First Development (TFD).
Figure 3. The Agile Lifecycle.
4. How to Test?
4.1 Database
Sandboxes
A common best practice on agile teams
is to ensure that developers have their own "sandboxes" to work
in. A sandbox is basically a technical environment whose scope is well
defined and respected. Figure 4 depicts the various
types of sandboxes which your team may choose to work in. In each sandbox
you'll have a copy of the database. In the development sandbox you'll experiment,
implement new functionality, and refactor existing functionality, validate your
changes through testing, and then eventually you'll promote your work once
you're happy with it to the project integration sandbox. In this sandbox
you will rebuild your system and then run all the tests to ensure you haven't
broken anything (if so, then back to the development sandbox).
Occasionally, at least once an iteration/cycle, you'll deploy your work to the
level (demo and pre-production testing), and rerun your test suite (including
database tests) each time that you do so to ensure that your changes integrate
with the changes made by other developers. Every so often (perhaps once
every six to twelve months) into production. The primary advantage of sandboxes
are that they help to reduce the risk of technical errors adversely affecting a
larger group of people than is absolutely necessary at the time.
4.2 Writing Database Tests
There's no magic when it comes to writing a database
test, you write them just like you would any other type of test. Database
tests are typically a three-step process:
1. Setup the test. You need to put your database into a known state
before running tests against it. There are several strategies for
doing so.
2. Run the test. Using a database regression testing tool,
run your database tests just like you would run your application tests.
3. Check the results. You'll need to
be able to do "table dumps" to obtain the current values in the
database so that you can compare them against the results which you expected.
4.3 Setting up
Database Tests
To successfully your database you must first know the
exact state of the database, and the best way to do that is to simply put the
database in a known state before running your test suite. There are two
common strategies for doing this:
1.
Fresh start : A common practice is to rebuild the database,
including both creation of the schema as well as loading of initial test data,
for every major test run (e.g. testing that you do in your project
integration or pre-production test sandboxes).
2.
Data initialization:
For testing in developer sandboxes, something
that you should do every time you rebuild the system, you may want to forgo
dropping and rebuilding the database in favor of simply reinitializing the
source data. You can do this either by erasing all existing data and then
inserting the initial data vales back into the database, or you can simple run
updates to reset the data values. The first approach is less risky and
may even be faster for large amounts of data.
An important part of writing
database tests is the creation of test data. You have several strategies
for doing so:
1.
Have source test data. You can maintain an external definition of
the test data, perhaps in flat files, XML files, or a secondary set of
tables. This data would be loaded in from the external source as needed.
2.
Test data creation scripts. You develop and maintain scripts, perhaps
using data manipulation language (DML) SQL code or simply application source
code (e.g. Java or C#), which does the necessary deletions, insertions, and/or
updates required to create the test data.
3.
Self-contained test cases. Each individual test case puts the database
into a known state required for the test.
These
approaches to creating test data can be used alone or in combination. A
significant advantage of writing creation
scripts and self-contained test cases is that it is much more likely that the
developers of that code will place it under configuration management (CM) control.
Although it is possible to put test data itself under CM control, worst case
you generate an export file that you check in, this isn’t a common practice and
therefore may not occur as frequently as required. Choose an approach
that reflects the culture of your organization.
Where does test data come from? For unit testing, I
prefer to create sample data with known values. This way I can predict
the actual results for the tests that I do write and I know I have the
appropriate data values for those tests. For other forms of testing --
particularly load/stress, system integration, and function testing, I will use
live data so as to better simulate real-world conditions.
4.4 What Testing Tools Are Available?
I believe that there are several critical features which you need to
successfully test RDBMSs. First, as Figure 1 implies you
need two categories of database testing tools, one for interface tests and one
for internal database tests. Second, these testing tools should support
the language that you're developing in. For example, for internal
database testing if you're a Microsoft SQL Server developer, your T-SQL
procedures should likely be tested using some form of T-SQL framework.
Similarly, Oracle DBAs should have a PL-SQL-based unit testing framework.
Third, you need tools which help you to put your database into a known state,
which implies the need not only for test data generation but also for managing
that data (like other critical development assets, test data should be under configuration management control).
To make a long story short, although we're starting to see a glimmer of
hope when it comes to database testing tools, as you can see in Table 2, but we still
have a long way to go. Luckily there are some good tools being developed by the
open source software (OSS) community and there are some commercial tools
available as well. Having said that, IMHO there is still significant
opportunity for tool vendors to improve their database testing offerings.
Table 2. Some Database Testing Tools.
Category
|
Description
|
Examples
|
Data Privacy Tools
|
Data privacy, or more generally
information privacy, is a critical issue for many organizations. Many
organizations must safeguard data by law due to regulatory compliance concerns.
|
|
Testing tools for load testing
|
Tools simulate high usage loads on
your database, enabling you to determine whether your system's architecture
will stand up to your true production needs.
|
|
Test Data Generator
|
Developers need test data against
which to validate their systems. Test data generators can be
particularly useful when you need large amounts of data, perhaps for stress
and load testing.
|
|
Test Data Management
|
Your test data needs to be
managed. It should be defined, either manually or automatically (or
both), and then maintained under version control. You need to define
expected results of tests and then automatically compare that with the actual
results. You may even want to retain the results of previous test runs
(perhaps due to regulatory compliance concerns).
|
|
Unit testing tools
|
Tools which enable you to
regression test your database.
|
5. Who Should Test?
During development cycles, the primary people responsible for doing
database testing are application developers and agile DBAs. They will typically pair together,
and because they are hopefully taking a TDD-approach to development the implication is that
they'll be doing database unit testing on a continuous basis. During the
release cycle your testers, if you have any, will be responsible for the final
system testing efforts and therefore they will also be doing database testing.
The role of your data management (DM) group, or IT management if your
organization has no DM group, should be to support your database testing
efforts. They should promote the concept that database testing is
important, should help people get the requisite training that they require, and
should help obtain database testing tools for your organization. As you
have seen, database testing is something that is done continuously by the
people on development teams, it isn't something that is done by another group
(except of course for system testing efforts). In short, the DM group
needs to support database testing efforts and then get out of the way of the
people who are actually doing the work.
6. Introducing Database Regression Testing into Your
Organization
Database testing is new to many people, and as a result you are likely to
face several challenges:
1.
Insufficient testing skills. This problem can be
overcome through training, through pairing with someone with good testing
skills (pairing a DBA without testing skills and a tester without DBA skills
still works), or simply through trial and error. The important thing is that
you recognize that you need to pick up these skills.
2.
Insufficient unit tests for existing databases. Few organizations
have yet to adopt the practice of database testing, so it is likely that you
will not have a sufficient test suite for your existing database(s). Although
this is unfortunate, there is no better time than the present to start writing
your test suite.
3.
Insufficient database testing tools. As I said
earlier, we still have a way to go with respect to tools.
4.
Reticent DM groups. My experience is
that some data management (DM) groups may see the introduction of database
regression testing, and agile techniques such as test-first development (TFD) and refactoring, as a threat. Or, as my July 2006 "state of data management" survey shows, a large
percentage of organizations are not only not doing any database testing at all
they haven't even discussed it. For many in the data management community
the idea of doing database testing is rather new and it's simply going to take
a while for them to think it through. I'm not so sure that you should wait
to do such obvious process improvement.
7. Database
Testing and Data Inspection
A common quality technique s to use data inspection tools to examine
existing data within a database. You might use something as simple as a
SQL-based query tool such as DB Inspect to select a subset of the data within a
database to visually inspect the results. For example, you may choose to
view the unique values in a column to determine what values are stored in it,
or compare the row count of a table with the count of the resulting rows from
joining the table with another one. If the two counts are the same then
you don't have an RI problem across the join.
As Richard Dallaway points out, the problem with data
inspection is that it is often done manually and on an irregular basis.
When you make changes later, sometimes months or years later, you need to redo
your inspection efforts. This is costly, time consuming, and error prone.
Data inspection is more of a debugging technique than it is a testing
technique. It is clearly an important technique, but it's not something
that will greatly contribute to your efforts to ensure data quality within your
organization.
8. Best Practices
I'd like to conclude this
article by sharing a few database testing "best practices"
with you:
1.
Use an
in-memory database for regression testing. You can dramatically speed up your database tests
by running them, or at least portions of them, against an in-memory database
such as HSQLDB. The
challenge with this approach is that because database methods are implemented
differently across database vendors that any method tests will still need to
run against the actual database server.
2.
Start fresh
each major test run. To
ensure a clean database, a common strategy is that at the beginning of each
test run you drop the database, then rebuild it from scratch taking into
account all database
refactoring's and transformations to that point, then reload the
test data, and then run your tests. Of course, you wouldn't do this to
your production database. ;-)
3.
Take a
continuous approach to regression testing. I can't say this enough, a TDD approach to
development is an incredibly effective way to work.
4.
Train people
in testing. Many developers and
DBAs have not been trained in testing skills, and they almost certainly haven't
been trained in database testing skills. Invest in your people, and give
them the training and
education they need to do their jobs.
5.
Pair with
novices with people that have database testing experience. One of the easiest ways to gain database
testing skills is to pair program with someone who already has them.
No comments:
Post a Comment