Information may be the most valuable commodity in the modern world. It can take many different forms — accounting and payroll information, information about customers and orders, scientific and statistical data, graphics–to mention just a few. We are virtually swamped with data. And we cannot — or at least we'd like to think about it this way — afford to lose it, but these days we simply have too much data to keep storing it in file cabinets or cardboard boxes. The need to safely store large collections of persistent data, efficiently "slice and dice" it from different angles by multiple users and update it easily when necessary is critical for every enterprise. That need mandates the existence of databases, which accomplish all the tasks listed above, and then some. To put it simply, a database is just an organized collection of information — with emphasis on organized.
A more specific definition often used as a synonym for "database" is database management system (DBMS). That term is wider and, in addition to the stored information, includes some methods to work with data and tools to maintain it.
DBMS can be defined as a collection of interrelated data plus a set of programs to access, modify, and maintain the data. More about DBMS later in this chapter.
There are many ideas about what a database is and what it should do. However, all modern databases should have at least the following characteristics.
A database's primary function is to store large amounts of information. For example, an order management system for a medium-sized company can easily grow into gigabytes of data; the bigger the company, the more data it needs to store and rely upon. A company that wants to keep historical (archival) data will require even more storage space. The need for storage capacity is growing rapidly, and databases provide for structured storage.
As was noted previously, enterprise data is valuable and must be stored safely. That means protection of the stored data not only from malicious or careless human activities, such as unauthorized logins, accidental information deletions/modifications, and so on, but also from hardware failures and natural disasters.
It's also important to note that in order to be useful, the information stored in a database must be accessible to many users simultaneously at different levels of security, and, no matter what, the data must stay consistent. For example, if two users try to change the same piece of information at the same time, the result can be unpredictable (e.g., data corruption), so situations like that have to be handled appropriately by internal database mechanisms. Also, certain groups of users may be allowed to modify several pieces of information, browse other parts of it, and be prevented from even viewing yet another part. (Some company data can be strictly confidential with a very restricted access.)
Users need quick access to the data they want. It is very important not only to be able to store data, but also to have efficient algorithms to work with it. For example, it would be unacceptable for users to have to scroll through each and every record to find just one order among millions stored in the database; the response to someone's querying the database must be fast, preferably instantaneous.
As an analogy, suppose you wanted to find all the occurrences of the word "object" in a book. You could physically browse through the entire book page by page until you reach the end. Or you could use the index and determine that the word is used on pages 245, 246, and 348. This situation is comparable to using bad or good programming algorithms.
Databases must be flexible and easily adaptable to changing business needs. That primarily means that the internal structure of database objects should be easily modified with minimum impact on other objects and processes; for example, to add a field in a legacy database you would have to bring the whole dataset offline, that is, make it inaccessible to users, modify it, change and recompile related programs, and so on. We'll talk more about that in the "Database Legacy" section of this chapter.
Another scalability aspect is that data typically lives longer than the hardware and software used to access and manipulate it, so it would not be very convenient to have to redesign the entire database to accommodate the current "flavor-of-the-month" development environment; for example, in case of a takeover or when company management suddenly decides to switch production environment from Java to C#.
Databases are not just for programmers and technical personnel (some would say not for programmers — period). Nontechnical users constitute the majority of all database users nowadays. Accountants, managers, salespeople, doctors and nurses, librarians, scientists, technicians, customer service representatives — for all these and many more people, interaction with databases is an integral part of their work. That means data must be easy to manipulate. Of course, most users will access it through a graphical user interface with a predefined set of screens and limited functionality, but ad-hoc database queries and reports become more and more popular, especially among sophisticated, computer-literate users.
Consider this. An order management application has a screen to view all orders and another window to browse customers. It can also generate a number of reports, including one to analyze orders grouped by customer. But accountant Jerry is working on a report for his boss and needs to find the ten customers with the highest debt. He can request a new report from the IT department, but it will take days (or even weeks) because of bureaucratic routine, programmers' busyness, or something else. The knowledge of SQL can help Jerry to create his own ad-hoc query, get the data, and finish his report.