What it takes to be a DBA?
Posted On November 6, 2007 by Sneha Philipose filed under Miscellaneous
A database administrator (DBA) is similar to a database developer or designer except that the DBA's key responsibilities are to ensure that a database is available at all points of time to the users and programs that need it. This includes making backups and archiving data. It also includes on-going monitoring to ensure that the database is responding to requests quickly.
Sometimes the DBA has some hardware knowledge so that the appropriate servers can be purchased so that the database's performance is not impacted by hardware bottlenecks.
A DBA is expected to stay abreast of emerging technologies and new design approaches. Typically, a DBA has either a degree in Computer Science and some on-the-job training with a particular database product or more extensive experience with a range of database products.
A DBA is usually expected to have experience with one or more of the major database management products, such as Structured Query Language, SAP, Oracle-based database management software and others. If you are part of a technical team looking for a DBA, do yourself a favor and volunteer. It is a great job and good DBAs are in demand and typically are paid more than developers.
DBA Responsibilities
The DBA should establish and enforce standards for the SQL Server and databases and make sure that everyone sticks to them.
Installing, Upgrading SQL Server
The DBA is responsible for installing SQL Server or upgrading an existing SQL Server. The DBA is also responsible for applying SQL Server service packs.
Monitoring the Database Server
The DBA has to make sure that the following is done:
- The server is running with optimal performance.
- The error log or event log is monitored for database errors.
- Databases have routine maintenance performed on them, and the overall system has periodic maintenance performed by the system administrator.
Storage
SQL Server 2000 enables you to automatically grow the size of your databases and transaction logs, or you can choose to select a fixed size for the database and transaction log. Either way, maintaining the proper use of storage means monitoring space requirements and adding new storage space (disk drives) when required.
Backup and Recovery
Backup and recovery are the DBA's most critical tasks:
- Establishing standards and schedules for database backups
- Developing recovery procedures for each database
- Making sure that the backup schedules meet the recovery requirements
With SQL Server 2000, the DBA works tightly with the Windows NT administrator to add user NT logins to the database. In non-NT domains, the DBA adds user logins. The DBA is also responsible for assigning users to databases and determining the proper security level for each user. Within each database, the DBA is responsible for assigning permissions to the various database objects such as tables, views, and stored procedures.
Transferring and Replicating Data
The DBA is responsible for importing and exporting data to and from the SQL Server. In the current trend to downsize and combine client/server systems with mainframe systems and Web technologies to create Enterprise systems, importing data from the mainframe to SQL Server is a common occurrence that is about to become more common with the SQL Server 2000 Data Transformation Services. Good DTS DBAs will be in hot demand as companies struggle to move and translate legacy system to Enterprise systems.
Managing and setting up replication topologies is a big undertaking for a DBA because of the complexities involved with properly setting up and maintaining replication.
Data Warehousing
Data warehousing provides new and interesting challenges to the DBA and in some companies a new career as a warehouse specialist.
Scheduling Events
The database administrator is responsible for setting up and scheduling various events using Windows NT and SQL Server to aid in performing many tasks such as backups and replication.
Providing 24-Hour Access
The database server must stay up, and the databases must always be protected and online. Be prepared to perform some maintenance and upgrades after hours. If the database server should go down, be ready to get the server up and running. After all, that's your job.
The role also encompasses these special activities:
Performance Optimization
What is the purpose of database optimization statistics? It is to get statistics for the data in the RDBMS so the optimizer can create a query plan to find a fast path to the data. Statistics are created by running 'ANALYZE' in Oracle, 'update statistics' in Informix or 'vacuum' in Postgresql and other utilities in other RDBMS.
A performance problem occurs when the processing to get the statistics takes a significant amount of time in a large application. If processing statistics takes a long time the application performance will be crippled. If table sizes and the distribution of values do not change drastically the easiest way to optimize runtime of the statistics job is to run it fewer times. Instead of five times a week, run it once a week, or once a month or split the job to run a few tables per day. Some RDBMS utilities can get statistics on a sample of the data, or some subset of statistics. The statistics job will run faster. The idea is that some statistics, even if not complete or totally accurate, will help the optimizer enough to get a fast path to the data.
Data Loading and Indexes
Index administration is the key to quickly load tables or run batch jobs that load a set of tables. For example, in an ERP database of different products, the rows are deleted one by one and are reloaded with the indexes still intact. Batch jobs on the same database take 40 hours and are scheduled to run "overnight."
As DBAs and developers know that the RDBMS will truncate or drop and recreate tables that are to be completely refreshed. This gets rid of the lengthy delete phase. But don't forget to handle the indexes. The simple point is, load data or run batch jobs that load tables with the indexes dropped. Re-create the index at the end of the process. The index is building at the same time as the tables are loading. But the index is an ordered set of data and takes longer to create than adding data to the table. It has been shown that loading presorted data in an index order is usually faster than unordered data. How fast is loading tables without indexes compared to loading with indexes updating continuously? Try it yourself and time it. Of course to do this you must be able to access the DDL (Data Definition Language) to get the index definitions.
Constraints
Choose database level constraints where possible. Integrity constraints (foreign keys), check constraints (attribute domains) take work out of the programmers' headspace. It stops data from getting scrambled. For example, great wads of code written to check that an orderline is attached to an order becomes worthless if one error pops up anywhere or any time in any application that touches the data. ERP applications end up costing companies a huge overhead to straighten out the "programmer enforced" constraints. Most of the processing that happens in ERP batch jobs seems to be 'fixing' the problems brought about by bad design, old code and non-RDBMS based architecture which fails to use database level constraints.
Database Reports
The only way to do any reports from an Online Transaction and Processing (OLTP) database is to give the data from an SQL script in raw ASCII format to an analyst. This makes an efficient report on the OLTP server; the analyst who wants the report does the data manipulation. Consider an analysis program where the data is formatted as a spreadsheet or SPSS, SAS, etc. Let the business analyst play, move columns around, summarize, graph, whatever.
Any business analyst that cannot understand the metadata and manipulate the data they work with should be fired. However, competent analysts are thrilled to play with the data instead of waiting for a report to get modified.
This method keeps report changes to a minimum. The reason is, the analyst must also work on any changes to a report as the report comes from the analyst's own tools. And chances are that the base data does not change as much as the analysis of the data done by the analyst.
Metadata
Metadata does not apply just to a data warehouse. It is the guts of any database. You need to know what each attribute and entity mean as a semantic definition, how to create, update, delete and archive the model elements. How elements change over time. This is in addition to the language definition of the model element and its relation to other elements. There is a lot of modeling software on the market to help track metadata. Use it or not, there is no excuse not to have a reasonably complete model. Dangerous projects are interfacing one application to another or migrating a legacy application to an ERP application.
A first step for a DBA is to find and secure the metadata of each model involved. Having metadata means that you can interface applications, migrate legacy models to new models, plan resource allocation and manage the data with no surprises.
Conclusion
It is important for the DBA to work closely with development teams to assist in overall database design, such as creating normalized databases, helping developers tune queries, assigning proper indexes, and aiding developers in the creation of triggers and stored procedures.
DBAs are sometimes content to sit back and watch developers make bad design and SQL Server decisions. Some of them want to be involved but are prevented. Don't be underutilized. Take an active role in new project development. The entire team will benefit from your insight and knowledge!
So while it may be true that a database may better care for itself, there are now a variety of new infrastructure dependencies that must be managed, and it is the DBA who is needed to metamorphose to keep ahead of resolving the new database-driven application complexities. For it is the DBA who is always summoned when the Help Desk gets that call from the user saying, "The application seems to be running slow." How prepared is your organization to handle such a call?
This article has been contributed by Haneef Khan. He can be reached at
khan_haneef@rediffmail.com
