1. Introduction
1.1 Background
With advances in computer technology, computers and software have come to drive modern business, government and military organisations all over the world. As the year 2000 approaches, many software applications that use only the last two digits to represent the year in a date will either malfunction or fail. This is commonly known as the Y2K (Year 2000) problem. Through its ripple effects, it could disrupt business operations and cause chaos around the world. Millions of software applications need to be checked, rewritten and verified to be Y2K compliant, and the cost of fixing the problem is expected to be the highest in human history.
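The failure mode is easiest to see in code. The sketch below is a minimal illustration, assuming a hypothetical legacy routine that stores years as two digits (1965 as 65, 2000 as 0) and packs dates as YYMMDD integers; the function names are invented for this example. Once the current year wraps from 99 to 00, both subtraction and ordering go wrong.

```cpp
// Hypothetical legacy representation: a year held as two digits,
// so 1965 is stored as 65 and 2000 is stored as 0.
int ageTwoDigit(int birthYY, int currentYY) {
    // Correct until currentYY wraps from 99 to 00; after that the
    // result becomes negative (e.g. 0 - 65 = -65 instead of 35).
    return currentYY - birthYY;
}

// Hypothetical packed date: YYMMDD as an integer, e.g. 991231 for
// 31 Dec 1999 and 101 (written 000101 on a form) for 1 Jan 2000.
bool isLaterYYMMDD(int a, int b) {
    // Plain numeric comparison: 1 Jan 2000 (101) compares as earlier
    // than 31 Dec 1999 (991231), so sorting and expiry checks break.
    return a > b;
}
```

For example, `ageTwoDigit(65, 99)` gives 34 as expected, but `ageTwoDigit(65, 0)` gives -65 on 1 January 2000.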
The cause of the Y2K problem can be traced to the early days of computing in the 1950s and 1960s. At that time, data was stored on punched cards, paper tape, low-density magnetic tape and direct access devices with limited capacity. Because storage was expensive and scarce, using a two-digit date format saved large amounts of money and resources. Storage was so costly that even a modest saving mattered: Dr. Leon Kappelman [LEON97] indicates that, during the era when storage was extremely costly, using four-digit rather than two-digit dates might have raised the total volume of tape and disk space required by perhaps 3%. Programmers were well aware of the two-digit date problem, but they never expected their software to last until the year 2000. Moreover, the clients of the software applications and the executives responsible for the data centers explicitly required two-digit dates as an effective way of saving money [JONES98].
According to the Gartner Group [GAR99], 90% of applications will be affected by the Y2K problem, and systems will crash if the century problem is not corrected before 1999. 20% of business applications will fail due to date computations in the year 1995. Most major corporations are expected to spend about 50 to 100 million US dollars each. The Gartner Group estimates that "A medium size shop with approximately 8000 programs, each program averages 1500 LOC (line of code), and a data reference to LOC ratio of 1:50 will cost in the range of $450/program to $600/program or $3.6-$4.8 million for the entire initiative".
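The Gartner figures can be checked with simple arithmetic. Reading the 1:50 ratio as one data reference per 50 lines of code (an interpretation, not stated explicitly in the quote), the medium-size shop works out as:

$$
\begin{aligned}
\text{total LOC} &= 8000 \times 1500 = 12\,000\,000 \\
\text{data references} &\approx 12\,000\,000 / 50 = 240\,000 \\
\text{low estimate} &= 8000 \times \$450 = \$3\,600\,000 \\
\text{high estimate} &= 8000 \times \$600 = \$4\,800\,000
\end{aligned}
$$

The per-program range therefore corresponds to roughly \$0.30 to \$0.40 per line of code (\$450 / 1500 and \$600 / 1500).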
To make matters worse, the Asian financial crisis now troubling the region is likely to delay the work of bringing software into Y2K compliance. As a result, fewer software applications than expected may be Y2K compliant by the year 2000.
1.1.4 Overcoming the Y2K Problem
To overcome the Y2K problem, millions of software programs must be rewritten and verified to ensure that they are Y2K compliant. For instance, to bring a whole enterprise into Y2K compliance, WRQ [WRQ99], one of the leading information systems companies, with its South East Asian headquarters in Singapore, recommended a three-phase attack on the problem [COM99]. The three phases are discovery, prioritisation and bringing the software into Y2K compliance. Discovery involves finding all the software that might be subject to the Y2K problem. Prioritisation is needed because there is too little time left to check all the software fully; mission-critical applications must therefore be given a higher priority than other applications. Bringing the software into Y2K compliance means checking for and fixing the Y2K bugs in the software.
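The fixing step can take several forms; two widely used remediation techniques are expanding date fields to four digits and "windowing", which keeps the two-digit field but interprets it relative to a pivot year. The sketch below illustrates fixed windowing only. It is not WRQ's method, and the pivot value of 50 and the function names are hypothetical choices for this example.

```cpp
// Hypothetical pivot year: two-digit years below kPivot are read as 20xx,
// the rest as 19xx. A real project would choose the pivot from its data.
constexpr int kPivot = 50;

// Expands a two-digit year (assumed to be in 0..99) to four digits.
int expandYear(int yy) {
    return yy < kPivot ? 2000 + yy : 1900 + yy;
}

// Age computed after windowing, so a 1965 birth year checked in 2000
// gives 35 instead of the -65 produced by raw two-digit subtraction.
int ageWindowed(int birthYY, int currentYY) {
    return expandYear(currentYY) - expandYear(birthYY);
}
```

Windowing avoids changing file layouts, which is why it is attractive under time pressure, but it only postpones the problem: with a pivot of 50, a genuine 1949 date would be misread as 2049.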
1.1.5 The Need for an Automated Documentation System
To fix the Y2K bugs, the source code of the affected applications has to be checked and corrected. However, this code may have little or no documentation, which makes the programmers’ job difficult and time-consuming: they have to read through large amounts of code to understand the original programmer’s work. To make matters worse, the code is often written in older programming languages such as COBOL, with which the programmer may not be familiar. Among the programming languages affected by the Y2K problem, COBOL has the largest exposure. COBOL is an old programming language, dating from the 1960s, and is widely used for business applications; it accounts for 60% of management information systems software [JONES98]. Besides COBOL, spreadsheets and C applications also have a large exposure to the Y2K problem. Table 1 shows the impact of the Y2K problem on a selection of programming languages. Function points, shown in the table, are a metric that measures the size of software based on its inputs, outputs, inquiries, logical files and interfaces.
| Language | Programmers | Applications | Function Points |
|---|---|---|---|
| COBOL | 550,000 | 12,100,000 | 605,000,000 |
| Spreadsheets | 600,000 | 3,600,000 | 54,000,000 |
| C | 200,000 | 2,600,000 | 156,000,000 |
| Basic | 250,000 | 2,250,000 | 45,000,000 |
| Query | 150,000 | 1,400,000 | 105,000,000 |
| Database | 200,000 | 1,600,000 | 120,000,000 |
| C++ | 175,000 | 1,400,000 | 105,000,000 |
| Pascal | 90,000 | 1,080,000 | 54,000,000 |
| Assembly | 50,000 | 750,000 | 93,750,000 |
| Ada83 | 90,000 | 720,000 | 54,000,000 |
| Fortran | 50,000 | 575,000 | 28,750,000 |
| PL/1 | 30,000 | 270,000 | 13,500,000 |
| Jovial | 15,000 | 105,000 | 7,875,000 |
| Other | 1,000,000 | 7,000,000 | 336,000,000 |
| Total | 3,450,000 | 36,000,000 | 1,702,125,000 |
Table 1. Impact of Y2K Problem for Selected Languages
Source: Capers Jones, "Function Points versus Lines of Code Metrics for the Year 2000 Problem", in The Year 2000 Software Problem, 1st edition, Addison-Wesley, 1998, page 49.
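To make the function point metric more concrete, the sketch below computes an unadjusted count from the five components listed in the text. It assumes the IFPUG "average complexity" weights (4 for inputs, 5 for outputs, 4 for inquiries, 10 for internal logical files, 7 for external interface files); a full count rates each item as low, average or high and then applies a value adjustment factor, both of which are omitted here. The struct and function names are hypothetical.

```cpp
// Counts of the five function types named in the text.
struct FunctionCounts {
    int externalInputs;      // inputs
    int externalOutputs;     // outputs
    int externalInquiries;   // inquiries
    int internalFiles;       // logical files maintained by the application
    int externalInterfaces;  // files maintained by other applications
};

// Unadjusted function points using IFPUG average-complexity weights only.
int unadjustedFunctionPoints(const FunctionCounts& c) {
    return 4 * c.externalInputs
         + 5 * c.externalOutputs
         + 4 * c.externalInquiries
         + 10 * c.internalFiles
         + 7 * c.externalInterfaces;
}
```

For example, an application with 10 inputs, 8 outputs, 5 inquiries, 4 logical files and 2 interfaces scores 154 unadjusted function points. Dividing Table 1's function points by its application counts gives an average application size; for COBOL, 605,000,000 / 12,100,000 = 50 function points per application.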
Another aspect of the Y2K problem is that little attention has been paid to applications that use multiple programming languages. Mixed-language applications are common in systems software and in military, information and commercial systems. Fixing a mixed-language application requires programmers to be familiar with more than one language, which makes the job even harder.
What the programmers need is simple but accurate documentation for the source code that is to be checked and fixed. To cater for mixed-language applications, the documentation should not be language-specific. Moreover, the documentation should be easy to generate and should not consume much of the programmers' time and effort.
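One way to keep documentation language-neutral is to separate language-specific front ends from a shared documentation back end, so that every front end emits the same kind of record. The sketch below illustrates that split under stated assumptions: the `CallEdge` record, the `FrontEnd` interface and the toy COBOL front end are all hypothetical, and the COBOL handling is deliberately simplistic (it treats a line holding a single word ending in a period as a paragraph name and records each `PERFORM target` as a call).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Language-neutral record: one routine invoking another.
struct CallEdge {
    std::string caller;
    std::string callee;
};

// Hypothetical interface: one front end per source language, all producing
// the same CallEdge records for a shared documentation back end.
class FrontEnd {
public:
    virtual ~FrontEnd() = default;
    virtual std::vector<CallEdge> extractCalls(const std::string& source) const = 0;
};

// Toy COBOL front end. Real COBOL (fixed columns, THRU, inline PERFORM,
// COPY books, single-word statements such as EXIT.) needs a real parser.
class ToyCobolFrontEnd : public FrontEnd {
public:
    std::vector<CallEdge> extractCalls(const std::string& source) const override {
        std::vector<CallEdge> edges;
        std::istringstream lines(source);
        std::string line;
        std::string current = "(unnamed)";
        while (std::getline(lines, line)) {
            std::istringstream words(line);
            std::vector<std::string> tokens;
            for (std::string w; words >> w;) tokens.push_back(w);
            if (tokens.empty()) continue;
            // A lone word ending in '.' starts a new paragraph.
            if (tokens.size() == 1 && tokens[0].size() > 1 && tokens[0].back() == '.') {
                current = tokens[0].substr(0, tokens[0].size() - 1);
                continue;
            }
            // Each "PERFORM target" becomes a call edge from the current paragraph.
            for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
                if (tokens[i] == "PERFORM") {
                    std::string target = tokens[i + 1];
                    if (target.back() == '.') target.pop_back();
                    edges.push_back({current, target});
                }
            }
        }
        return edges;
    }
};

// Back end: knows nothing about COBOL, only about CallEdge records.
std::string renderCalls(const std::vector<CallEdge>& edges) {
    std::string out;
    for (const CallEdge& e : edges) out += e.caller + " -> " + e.callee + "\n";
    return out;
}
```

Supporting another language would mean adding another `FrontEnd` subclass; `renderCalls` and any HTML generator built on the same records stay unchanged.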
This brought about the need for automated documentation software, which would generate clear and accurate documentation from the source code of the applications of interest. Because documentation generation is automated, little effort is needed from the programmers. This saves them time and effort, and thus reduces the cost of fixing the Y2K bug. The documentation could also serve as a communication medium between programmers.
To aid in uncovering and fixing Y2K bugs in software affected by the year 2000 problem, a tool that automatically generates documentation from source code was developed in this project. The objectives of the project are as follows.
The automated documentation software reads in an application’s source code and generates documentation for it. The generated documentation should give software developers accurate, clear and useful information about the application of interest. It should be easily accessible over the Internet and offer a richer presentation than plain text, for example through the use of images. HTML was chosen as the documentation format because it allows easy navigation of the documentation through links and image maps.
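The following sketch shows how an HTML page can combine a diagram image with a client-side image map, so that clicking a box in the diagram opens the page documenting that routine. It is an illustration under assumptions, not the project's generator: the `MapArea` record, the map name `calls` and the `<routine>.html` page-naming scheme are hypothetical. The `usemap` attribute and the `<map>`/`<area shape="rect" coords="x1,y1,x2,y2">` elements are standard HTML.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: a clickable rectangle in a diagram image that
// links to the page documenting one routine.
struct MapArea {
    std::string routine;   // routine name, also used as the page name
    int x1, y1, x2, y2;    // rectangle corners in image pixels
};

// Minimal escaping for HTML text and attribute values.
std::string escapeHtml(const std::string& s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += ch;
        }
    }
    return out;
}

// Emits a page whose diagram uses a client-side image map; each area
// links to "<routine>.html" (an assumed naming scheme).
std::string renderDiagramPage(const std::string& title,
                              const std::string& imageFile,
                              const std::vector<MapArea>& areas) {
    std::ostringstream html;
    html << "<html><head><title>" << escapeHtml(title) << "</title></head>\n<body>\n"
         << "<h1>" << escapeHtml(title) << "</h1>\n"
         << "<img src=\"" << escapeHtml(imageFile)
         << "\" usemap=\"#calls\" alt=\"Call diagram\">\n"
         << "<map name=\"calls\">\n";
    for (const MapArea& a : areas) {
        html << "  <area shape=\"rect\" coords=\"" << a.x1 << ',' << a.y1 << ','
             << a.x2 << ',' << a.y2 << "\" href=\"" << escapeHtml(a.routine)
             << ".html\" alt=\"" << escapeHtml(a.routine) << "\">\n";
    }
    html << "</map>\n</body></html>\n";
    return html.str();
}
```

Because the links live in plain HTML, any web browser can navigate the generated documentation without additional software, which fits the requirement that it be accessible over the Internet.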
A project schedule was drawn up to ensure that the project was carried out systematically. The schedule had three phases: research, design, and implementation and testing. In the research phase, topics such as encoding GIF files and building a parser were studied. In the design phase, the various design issues of the documentation system were addressed. In the implementation and testing phase, the system was coded in C++ and Visual Basic, then tested and enhanced.
The project schedule is included in the appendices.
Chapter Two covers the system analysis of the documentation system: two case studies are presented and the requirements are analysed. Chapter Three covers the design and implementation of the system, addressing the various problems faced and the different approaches taken to overcome them. Chapter Four briefly discusses future development and enhancement of the system. The report ends with a conclusion summarising what has been achieved in the project.