Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint
Share this Page URL
Help

Chapter 1. Introduction to Database Syst... > File-Oriented System versus Database...

1.8. File-Oriented System versus Database System

Computer-based data processing systems were initially used for scientific and engineering calculations. With increased complexity of business requirements, gradually they were introduced into the business applications. The manual method of filing systems of an organisation, such as to hold all internal and external correspondence relating to a project or activity, client, task, product, customer or employee, was maintaining different manual folders. These files or folders were labelled and stored in one or more cabinets or almirahs under lock and key for safety and security reasons. As and when required, the concerned person in the organisation used to search for a specific folder or file serially starting from the first entry. Alternatively, files were indexed to help locate the file or folder more quickly. Ideally, the contents of each file folder were logically related. For example, a file folder in a supplier’s office might contain customer data; one file folder for each customer. All data in that folder described only that customer’s transaction. Similarly, a personnel manager might organise personnel data of employees by category of employment (for example, technical, secretarial, sales, administrative, and so on). Therefore, a file folder leveled ‘technical’ would contain data pertaining to only those people whose duties were properly classified as technical.

The manual system worked well as data repository as long as the data collection were relatively small and the organisation’s managers had few reporting requirements. However, as the organisation grew and as the reporting requirements became more complex, it became difficult in keeping track of data in the manual file system. Also, report generation from a manual file system could be slow and cumbersome. Thus, this manual filing system was replaced with a computer-based filing system. File-oriented systems were an early attempt to computerize the manual filing system that we are familiar with. Because these systems performed normal record-keeping functions, they were called data processing (DP) systems. Rather than establish a centralised store for organisation’s operational data, a decentralised approach was taken, where each department, with the assistance of DP department staff, stored and controlled its own data.

Table 1.5 shows an example of file-oriented system of an organisation engaged in product distribution. Each table represents a file in the system, for example, PRODUCT file, CUSTOMER file, SALES file and so on. Each row in these files represents a record in the file. PRODUCT file contains 6 records and each of these records contains data about different products. The individual data items or fields in the PRODUCT file are PRODUCT-ID, PRODUCT-DESC, MANUF-ID and UNIT-COST. CUSTOMER file contains 5 records and each of these records contains data about customer. The individual data items in CUSTOMER file are CUST-ID, CUST-NAME, CUST-ADDRESS, COUNTRY, TEL-NO and BAL–AMT. Similarly, SALES file contains 5 records and each of these records contains data about sales activities. The individual data items in SALES file are SALES-DATE, CUST-ID, PROD-ID, QTY and UNIT-PRICE.

Table 1.5. File-oriented system

PRODUCT
PRODUCT-IDPRODUCT-DESCMANUF-IDUNIT-COST
A12345Steel almirah1004000
B23412Dryer2004500
B44332Freeze2106000
A98765Steel table1053500
A29834Steel chair1104800
C11008Iron moulding4445100


CUSTOMER
CUST-IDCUST-NAMECUST-ADDRESSCOUNTRYTEL-NOBAL-AMT
1001Waterhouse Ltd.Box 41, MumbaiIndia214701545,000
1000KLY System41, 1st Street, ChicagoUSA200089433,550
1005MegapointsC-12, Pataya, GoaIndia222200974,314
1010Concept Shapers32, Main Road, RanchiIndia459873349,444
1006Trinity AgenciesP.O. Box 266, TokyoJapan234567855,542


SALES
SALE-DATECUST-IDPROD-IDQTYUNIT-PRICE
02/12/021001A123451006,700
10/10/021000B234122504,000
12/12/031010B4433212014,000
01/04/041005A987651105,500
30/02/041001A2983430012,999


With the assistance of DP department, the files were used for a number of different applications by the user departments, for example, account receivable program written to generate billing statements for customers. This program used the CUSTOMER and SALES files and these files were both stored in the computer in order by CUST-ID and were merged to create a printed statement. Similarly, sales statement generation program (using PRODUCT and SALES files) was written to generate product-wise sales performance. This type of program, which accomplishes a specific task of practical value in a business situation is called application program or application software. Each application program that is developed is designed to meet the specific needs of the particular requesting department or user group.

Fig. 1.18 illustrates structures in which application programs are written specifically for each user department for accessing their own files. Each set of departmental programs handles data entry, file maintenance and the generation of a fixed set of specific reports. Here, the physical structure and storage of the data files and records are defined in the application program. For example:

Fig. 1.18. File-oriented system


  1. Sales department stores details relating to sales performance, namely SALES(SALE-DATE, CUST-ID, PROD-ID, QTY, UNIT-PRICE).

  2. Customer department stores details relating to customer invoice realization summary, namely CUSTOMER (CUST-ID, CUST-NAME, CUST-ADD, COUNTRY, TEL-NO, BAL-AMT).

  3. Product department stores details relating to product categorization summary, namely PRODUCT (PROD-ID, PROD-DESC, MANUF-ID, UNIT-COST).

It can be seen from the above examples that there is significant amount of duplication of data storage in different departments (for example, CUST-ID and PROD-ID), which is generally true with file-oriented system.

1.8.1. Advantages of Learning File-oriented System

Although the file-oriented system is now largely obsolete, following are the several advantages of learning file-based systems:

  • It provides a useful historical perspective on how we handle data.

  • The characteristics of a file-based system helps in an overall understanding of design complexity of database systems.

  • Understanding the problems and knowledge of limitation inherent in the file-based system helps avoid these same problems when designing database systems and thereby resulting in smooth transition.

1.8.2. Disadvantages of File-oriented System

Conventional file-oriented system has the following disadvantages:

  1. Data redundancy (or duplication): Since a decentralised approach was taken, each department used their own independent application programs and special files of data. This resulted into duplication of same data and information in several files, for example, duplication of PRODUCT-ID data in both PRODUCT and SALES files, and CUST-ID data in both CUSTOMER and SALES files as shown in Table 1.5. This redundancy or duplication of data is wasteful and requires additional or higher storage space, costs extra time and money, and requires increased effort to keep all files up-to- date.

  2. Data inconsistency (or loss of data integrity): Data redundancy also leads to data inconsistency (or loss of data integrity), since either the data formats may be inconsistent or data values (various copies of the same data) may no longer agree or both.

    Fig. 1.19 shows an example of data inconsistency in which a field for product description is being shown by all the three department files, namely SALES, PRODUCT and ACCOUNTS. It can been seen in this example that even though it was always the product description, the related field in all the three department files often had a different name, for example, PROD-DESC, PROD-DES and PRODDESC. Also, the same data field might have different length in the various files, for example, 15 characters in SALES file, 20 characters in PRODUCT file and 10 characters in ACCOUNTS file. Furthermore, suppose a product description was changed from steel cabinet to steel chair. This duplication (or redundancy) of data increased the maintenance overhead and storage costs. As shown in Fig. 1.19, the product description filed might be immediately updated in the SALES file, updated incorrectly next week in the PRODUCT file as well as ACCOUNT file. Over a period of time, such discrepancies can cause serious degradation in the quality of information contained in the data files and can also affect the accuracy of reports.

    Fig. 1.19. Inconsistent product description data

  3. Program-data dependence: As we have seen, file descriptions (physical structure, storage of the data files and records) are defined within each application program that accesses a given file. For example, “Account receivable program” of Fig. 1.18 accesses both CUSTOMER file and SALES file. Therefore, this program contains a detailed file description for both these files. As a consequence, any change for a file structure requires changes to the file description for all programs that access the file. It can also be noticed in Fig. 1.18 that SALES file has been used in both “Account receivable program” and “Sales statement program”. If it is decided to change the CUST-ID field length from 4 characters to 6 characters, the file descriptions in each program that is affected would have to be modified to confirm to the new file structure. It is often difficult to even locate all programs affected by such changes. It could be very time consuming and subject to error when making changes. This characteristic of file-oriented system is known as program-data dependence.

  4. Poor data control: As shown in Fig. 1.19, a file-oriented system being decentralised in nature, there was no centralised control at the data element (field) level. It could be very common for the data field to have multiple names defined by the various departments of an organisation and depending on the file it was in. This could lead to different meanings of a data field in different context, and conversely, same meaning for different fields. This leads to a poor data control, resulting in a big confusion.

  5. Limited data sharing: There is limited data sharing opportunities with the traditional file-oriented system. Each application has its own private files and users have little opportunity to share data outside their own applications. To obtain data from several incompatible files in separate systems will require a major programming effort. In addition, a major management effort may also be required since different organisational units may own these different files.

  6. Inadequate data manipulation capabilities: Since File-oriented systems do not provide strong connections between data in different files and therefore its data manipulation capability is very limited.

  7. Excessive programming effort: There was a very high interdependence between program and data in file-oriented system and therefore an excessive programming effort was required for a new application program to be written. Even though an existing file may contain some of the data needed, the new application often requires a number of other data fields that may not be available in the existing file. As a result, the programmer had to rewrite the code for definitions for needed data fields from the existing file as well as definitions of all new data fields. Therefore, each new application required that the developers (or programmers) essentially start from scratch by designing new file formats and descriptions and then write the file access logic for each new program. Also, both initial and maintenance programming efforts for management information applications were significant.

  8. Security problems: Every user of the database system should not be allowed to access all the data. Each user should be allowed to access the data concerning his area of application only. Since, applications programs are added to the file-oriented system in an ad hoc manner, it was difficult to enforce such security system.

1.8.3. Database Approach

The problems inherent in file-oriented systems make using the database system very desirable. Unlike the file-oriented system, with its many separate and unrelated files, the database system consists of logically related data stored in a single data dictionary. Therefore, the database approach represents the change in the way end user data are stored, accessed and managed. It emphasizes the integration and sharing of data throughout the organisation. Database systems overcome the disadvantages of file-oriented system. They eliminate problems related with data redundancy and data control by supporting an integrated and centralised data structure. Data are controlled via a data dictionary (DD) system which itself is controlled by database administrators (DBAs). Fig. 1.20 illustrates a comparison between file-oriented and database systems.

Fig. 1.20. File-oriented versus database systems

(a) File-oriented system

(b) Database system


 

1.8.4. Database System Environment

A database system refers to an organisation of components that define and regulate the collection, storage, management and use of data within a database environment. It consists of four main parts:

  • Data

  • Hardware

  • Software

  • Users (People)

Data: From the user’s point of view, the most important component of database system is perhaps the data. The term data has been explained in Section 1.2.1. The totality of data in the system is all stored in a single database, as shown in Fig. 1.20 (b). These data in a database are both integrated and shared in a system. Data integration means that the database can be thought of as a function of several otherwise distinct files, with at least partly eliminated redundancy among the files. Whereas in data sharing, individual pieces of data in the database can be shared among different users and each of those users can have access to the same piece of data, possibly for different purposes. Different users can effectively even access the same piece of data concurrently (at the same time). Such concurrent access of data by different users is possibly because of the fact that the database is integrated.

Depending on the size and requirement of an organisation or enterprise, database systems are available on machines ranging from the small personal computers to the large mainframe computers. The requirement could be a single-user system (in which at most one user can access the database at a given time) or multi-user system (in which many users can access the database at the same time).

Hardware: All the physical devices of a computer are termed as hardware. The computer can range from a personal computer (microcomputer), to a minicomputer, to a single mainframe, to a network of computers, depending upon the organisation’s requirement and the size of the database. From the point of view of the database system the hardware can be divided into two components:

  • The processor and associated main memory to support the execution of database system (DBMS) software and

  • The secondary (or external) storage devices (for example, hard disk, magnetic disks, compact disks and so on) that are used to hold the stored data, together with the associated peripherals (for example, input/output devices, device controllers, input/output channels and so on).

A database system requires a minimum amount of main memory and disk space to run. With a large number of users, a very large amount of main memory and disk space is required to maintain and control the huge quantity of data stored in a database. In addition, high-speed computers, networks and peripherals are necessary to execute the large number of data access required to retrieve information in an acceptable amount of time. The advancement in computer hardware technology and development of powerful and less expensive computers, have resulted into increased database technology development and its application.

Software: Software is the basic interface (or layer) between the physical database and the users. It is most commonly known as database management system (DBMS). It comprises the application programs together with the operating system software. All requests from the users to access the database are handled by DBMS. DBMS provides various facilities, such as adding and deleting files, retrieving and updating data in the files and so on. Application software is generally written by company employees to solve a specific common problem.

Application programs are written typically in a third-generation programming language (3GL), such as C, C++, Visual Basic, Java, COBOL, Ada, Pascal, Fortran and so on, or using fourth-generation language (4GL), such as SQL, embedded in a third-generation language. Application programs use the facilities of the DBMS to access and manipulate data in the database, providing reports or documents needed for the information and processing needs of the organisation. The operating system software manages all hardware components and makes it possible for all other software to run on the computers.

Users: The users are the people interacting with the database system in any form. There could be various categories of users. The first category of users is the application programmers who write database application programs in some programming language. The second category of users is the end users who interact with the system from online workstations or terminals and accesses the database via one of the online application programs to get information for carrying out their primary business responsibilities. The third category of users is the database administrators (DBAs), as explained in Section 1.7, who manage the DBMS and its proper functioning. The fourth category of users is the database designers who design the database structure.

1.8.5. Advantages of DBMS

Due to the centralised management and control, the database management system (DBMS) has numerous advantages. Some of these are as follows:

  1. Minimal data redundancy: In a database system, views of different user groups (data files) are integrated during database design into a single, logical, centralised structure. By having a centralised database and centralised control of data by the DBA the unnecessary duplication of data are avoided. Each primary fact is ideally recorded in only one place in the database. The total data storage requirement is effectively reduced. It also eliminates the extra processing to trace the required data in a large volume of data. Incidentally, we do not mean or suggest that all redundancy can or necessarily should be eliminated. Sometimes there are sound business and technical reasons for maintaining multiple copies of the same data, for example, to improve performance, model relationships and so on. In a database system, however, this redundancy can be carefully controlled. That is, the DBMS is aware of it, if it exists and assumes the responsibility for propagating updates and ensuring that the multiple copies are consistent.

  2. Program-data independence: The separation of metadata (data description) from the application programs that use the data is called data independence. In the database environment, it allows for changes at one level of the database without affecting other levels. These changes are absorbed by the mappings between the levels. With the database approach, metadata are stored in a central location called repository. This property of data systems allows an organisation’s data to change and evolve (within limits) without changing the application programs that process the data.

  3. Efficient data access: DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices.

  4. Improved data sharing: Since, database system is a centralised repository of data belonging to the entire organisation (all departments), it can be shared by all authorized users. Existing application programs can share the data in the database. Furthermore, new application programs can be developed on the existing data in the database to share the same data and add only that data that is not currently stored, rather having to define all data requirements again. Therefore, more users and applications can share more of the data.

  5. Improved data consistency: Inconsistency is the corollary to redundancy. As explained in Section 1.8.2 (b) in the file-oriented system, when the data is duplicated and the changes made at one site are not propagated to the other site, it results into inconsistency. Such database supplies incorrect or contradictory information to its users. So, if the redundancy is removed or controlled, chances of having inconsistence data is also removed and controlled. In database system, such inconsistencies are avoided to some extent by making them known to DBMS. DMS ensures that any change made to either of the two entries in the database is automatically applied to the other one as well. This process is known as propagating updates.

  6. Improved data integrity: Data integrity means that the data contained in the database is both accurate and consistent. Integrity is usually expressed in terms of constraints, which are consistency rules that the database system should not violate. For example in Table 1.5, the marriage month (MRG-MTH) in the EMPLOYEE file might be shown as 14 instead of 12. Centralised control of data in the database system ensures that adequate checks are incorporated in the DBMS to avoid such data integrity problem. For example, an integrity check for the data field marriage date (MRG-MTH) can be introduced between the range of 01 and 12. Another integrity check can be incorporated in the database to ensure that if there is reference to a certain object, that object must exit. For example, in the case of bank’s automatic teller machine (ATM), a user is not allowed to transfer fund from a nonexistent saving to a checking account.

  7. Improved security: Database security is the protection of database from unauthorised users. The database administrator (DBA) ensures that proper access procedure is followed, including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data. A DBA can define (which is enforced by DBMS) user names and passwords to identify people authorised to use the database. Different levels of security could be implemented for various types of data and operations. The access of data by authorised user may be restricted for each type of access (for example, retrieve, insert, modify, update, delete and so on) to each piece of information in the database. The enforcement of security could be data-value dependent (for example, a works manager has access to the performance details of employees in his or her department only), as well as data-type dependent (but the manager cannot access the sensitive data such as salary details of any employees, including those in his or her department).

  8. Increased productivity of application development: The DBMS provides many of the standard functions that the application programmer would normally have to write in a file-oriented application. It provides all the low-level file-handling routines that are typical in application programs. The provision of these functions allows the application programmer to concentrate on the specific functionality required by the users without having to worry about low-level implementation details. DBMSs also provide a high-level (4GL) environment consisting of productivity tools, such as forms and report generators, to automate some of the activities of database design and simplify the development of database applications. This results in increased productivity of the programmer and reduced development time and cost.

  9. Enforcement of standards: With central control of the database, a DBA defines and enforces the necessary standards. Applicable standards might include any or all of the following: departmental, installation, organisational, industry, corporate, national or international. Standards can be defined for data formats to facilitate exchange of data between systems, naming conventions, display formats, report structures, terminology, documentation standards, update procedures, access rules and so on. This facilitates communication and cooperation among various departments, projects and users within the organisation. The data repository provides DBAs with a powerful set of tools for developing and enforcing these standards.

  10. Economy of scale: Centralising of all the organisation’s operational data into one database and creating a set of application programs that work on this source of data resulting in drastic cost savings. The DBMS approach permits consolidation of data and applications. Thus reduces the amount of wasteful overlap between activities of data-processing personnel in different projects or departments. This enables the whole organisation to invest in more powerful processors, storage devices or communication gear, rather than having each department purchase its own (low-end) equipment. Thus, a combined low cost budget is required (instead of accumulated large budget that would normally be allocated to each department for file-oriented system) for the maintenance and development of system. This reduces overall costs of operation and management, leading to an economy of scale.

  11. Balance of conflicting requirements: Knowing the overall requirements of the organisation (instead of the requirements of individual users), the DBA resolves the conflicting requirements of various users and applications. A DBA can structure the system to provide an overall service that is best for the organisation. A DBA can chose the best file structure and access methods to get optimal performance for the response-critical operations, while permitting less critical applications to continue to use the database (with a relatively slower response). For example, a physical representation can be chosen for the data in storage that gives fast access for the most important applications.

  12. Improved data accessibility and responsiveness: As a result of integration in database system, data that crosses departmental boundaries is directly accessible to the end-users. This provides a system with potentially much more functionality. Many DBMSs provide query languages or report writers that allow users to ask ad hoc questions and to obtain the required information almost immediately at their terminal, without requiring a programmer to write some software to extract this information from the database. For example (from Table 1.4), a works manager could list from the EMPLOYEE file, all employees belonging to India with a monthly salary greater than INR 5000 by entering the following SQL command at a terminal, as shown in Fig. 1.21.

    Fig. 1.21. SQL for selected data fields

  13. Increased concurrency: DBMSs manage concurrent databases access and prevents the problem of loss of information or loss of integrity.

  14. Reduced program maintenance: The problems of high maintenance effort required in file-oriented system, as explained in Section 1.8.2 (g), are reduced in database system. In a file-oriented environment, the descriptions of data and the logic for accessing data are built into individual application programs. As a result, changes to data formats and access methods inevitably result in the need to modify application programs. In database environment, data are more independent of the application programs.

  15. Improved backup and recovery services: DBMS provides facilities for recovering from hardware or software failures through its back up and recovery subsystem. For example, if the computer system fails in the middle of a complex update program, the recovery subsystem is responsible and makes sure that the database is restored to the state it was in before the program started executing. Alternatively, the recovery subsystem ensures that the program is resumed from the point at which it was interrupted so that its full effect is recorded in the database.

  16. Improved data quality: The database system provides a number of tools and processes to improve data quality.

1.8.6. Disadvantages of DBMS

In spite of the advantages, the database approach entails some additional costs and risks that must be recognized and managed when implementing DBMS. Following are the disadvantages of using DBMS:

  1. Increased complexity: A multi-user DBMS becomes an extremely complex piece of software due to expected functionality from it. It becomes necessary for database designers, developers, database administrators and end-users to understand this functionality to full advantage of it. Failure to understand the system can lead to bad design decisions, which can have serious consequences for an organisation.

  2. Requirement of new and specialized manpower: Because of rapid changes in database technology and organisation’s business needs, the organisation’s need to hire, train or retrain its manpower on regular basis to design and implement databases, provide database administration services and manage a staff of new people. Therefore, an organisation needs to maintain specialized skilled manpower.

  3. Large size of DBMS: The large complexity and wide functionality makes the DBMS an extremely large piece of software. It occupies many gigabytes of storage disk space and requires substantial amounts of main memory to run efficiently.

  4. Increased installation and management cost: The large and complex DBMS software has a high initial cost. It requires trained manpower to install and operate and also has substantial annual maintenance and support costs. Installing such a system also requires upgrades to the hardware, software and data communications systems in the organisation. Substantial training of manpower is required on an ongoing basis to keep up with new releases and upgrades. Additional or more sophisticated and costly database software may be needed to provide security and to ensure proper concurrent updating of shared data.

  5. Additional hardware cost: The cost of DBMS installation varies significantly, depending on the environment and functionality, size of the hardware (for example, micro-computer, mini-computer or main-frame computer) and the recurring annual maintenance cost of hardware and software.

  6. Conversion cost: The cost of conversion (both in terms of money and time) from legacy system (old file-oriented and/or older database technology) to modern DBMS environment is very high. In some situations, the cost of DBMS and extra hardware may be insignificant compared with the cost of conversion. This cost includes the cost of training manpower (staff) to use these new systems and cost of employing specialists manpower to help with the conversion and running of the system.

  7. Need for explicit backup and recovery: For a centralised shared database to be accurate and available all times, a comprehensive procedure is required to be developed and used for providing backup copies of data and for restoring a database when damage occurs. A modern DBMS normally automates many more of the backup and recovery tasks than a file-oriented system.

  8. Organisational conflict: A centralised and shared database (which is the case with DBMS) requires a consensus on data definitions and ownership as well as responsibilities for accurate data maintenance. As per past history and experience, sometimes there are conflicts on data definitions data formats and coding, rights to update shared data, and associated issues, which are frequent and often difficult to resolve. Organisational commitment to the database approach, organisationally astute database administrators and a sound evolutionary approach to database development is required to handle these issues.

  • Safari Books Online
  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint