Direct Access File Processing

Direct access files, also known as random access files, provide the facility to access a record directly without having to go through every other record to reach it. This speeds up the process of file access considerably and makes file processing more efficient.

The heart of direct access to records in a file is the hash key, which is simply a number that supplies the address of the record in the file. If there are 20 records in a file then the key gives the number of any record, for example 3 or 16.

The hash key value might be taken directly from a record value, for example record_number=1, 2, 3, etc. In this case there is a direct correspondence between the value in the record and the location of the record in the file. This is a neat and simple solution but it requires the number of records in a file to be known in advance and it does not cater for additional records or for most of the realities of data processing.

A more commonly used system is to calculate the hash key value from a field in a record and to use this as the location of the record in the file. One way to do this might be to take the numeric values of some characters in a field such as a name, add them and take the remainder modulo n to provide the record number in the file. The first three letters of a name such as 'Davidson' would thus yield 68 ('D'), 97 ('a') and 118 ('v'); these could be added and the remainder found modulo 20: (68+97+118) mod 20 = 283 mod 20 = 3. For a larger file we simply increase the modulus.
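The calculation can be tried out in a small console program. This is only a sketch: the function name NameHash and its slots parameter are made up for illustration, and the {$MODE DELPHI} line is there only so that the program also compiles under Free Pascal.

```pascal
program HashDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Adds the character codes of the first three characters of a name and
// reduces the total modulo the number of slots in the file.
// Names shorter than three characters are padded with spaces (code 32),
// the same convention used by FindFilePosition later in this tutorial.
function NameHash(const name: string; slots: integer): integer;
var
  i, total: integer;
begin
  total := 0;
  for i := 1 to 3 do
    if i <= Length(name) then
      total := total + Ord(name[i])
    else
      total := total + 32;
  Result := total mod slots;
end;

begin
  WriteLn(NameHash('Davidson', 20));  // 68+97+118 = 283, 283 mod 20 = 3
  WriteLn(NameHash('Davidson', 100)); // larger file: 283 mod 100 = 83
end.
```

Passing the number of slots as a parameter makes it easy to see how the same name lands in a different position when the file is made larger.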

One problem with the calculation approach to finding a hash value is that different combinations of letters can produce the same result, a phenomenon known as a 'synonym' (different letters, same meaning or value) or a clash. An obvious synonym or clash with the method just described would be 'Davidson' and 'Davies'. Where synonyms or clashes occur action must be taken to avoid writing over the existing data, for example by moving the file pointer to the next available record in the file. This introduces an element of sequential file processing into direct access methods but the amount of sequential processing should be small in a well balanced file.

Synonyms cannot be avoided altogether, but their cost can be kept down: direct access files that rely on hash keys calculated from field values should be larger than the total amount of data, probably by a factor of 2. Thus if a direct access file is thought to have a likely size of 50 records the physical file should be 100 records so that there is plenty of space for synonyms to be located near their calculated positions. If two keys point to the same location of position 13 then it will be best if one of the records can go in location 14 or 15, or within a small number of records; this direct access file will not be very efficient if the synonym has to be stored in position 1000!

To begin we set up a constant and some types:

const
 filesize=20;

type
 TPerson=record
  flag:char;
  name:string[20];
  address:string[20];
  dateofbirth:TDateTime;
 end;
 TPersonfile=file of TPerson;

Now we declare some variables:

var
 Form1: TForm1;
 person,fileperson:TPerson;
 personfile:TPersonfile;
 fileposition,filepositioncopy, a,b,c:integer;

To set up a file we use a basic routine:

procedure TForm1.CreateNewButtonClick(Sender: TObject);
var i:integer;
begin
 if MessageDlg('Do you want to overwrite the file?', mtConfirmation, [mbYes, mbNo], 0) = mrYes
 then
 begin
  assignfile(personfile,'directpersons.dat'); 
//change name & location to suit
  rewrite(personfile);
  with person do
  begin
   flag:='e';
//all records initially empty
   name:=' ';
   address:=' ';
   dateofbirth:=datetimepicker1.Date;
  end;
  for i:= 0 to filesize - 1 do
  begin
   person.name:=inttostr(i);
   seek(personfile,i);
   write(personfile,person);
  end;
  closefile(personfile);
 end;
end;

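This routine creates filesize records (20 here), one on each pass of the for loop. The content of the records is established in the with construct, where a flag is set to indicate whether a record is empty ('e') or occupied ('o'); inside the loop the name field of each empty record is set to its slot number so that the slots can be told apart when the file is displayed. We can make use of the flag later when we want to delete records: all we need to do is change the 'o' to 'e'.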

We can now display our file, though there is not much to see:

procedure TForm1.displaylist;
var
 lbstring:string;
 i:integer;
begin
 listbox1.Clear;
 assignfile(personfile,'directpersons.dat'); 
//change name & location to suit
 reset(personfile);
 i:=0;
 while not eof (personfile) do
 begin
  seek(personfile,i);
  read(personfile,person);
  lbstring:=person.flag + ' ' + person.name + ' ' + person.address + ' ' + datetostr(person.dateofbirth);
  listbox1.Items.Add(lbstring);
  inc(i);
 end;
 closefile(personfile);
//close the file so that other routines can reopen it
end;

procedure TForm1.DisplayButtonClick(Sender: TObject);
begin
 displaylist;
end;

The code for displaying a list is needed later on so it has been put inside its own procedure here so it can be called from elsewhere in the program. Note that the declaration of displaylist in the interface section does not require the TForm1 part:

procedure displaylist;

The code to add a new record is broken into three procedures. The first deals with opening the file and then calls a procedure to calculate the hash key value (see below). When the hash key has been calculated the file position is accessed to see if the record slot there is occupied or empty. If the slot is empty the new record is copied into it, otherwise a procedure is called to find the next empty slot and write the new record there. If no empty slot is found a message is returned saying that the file is full.

procedure TForm1.AddButtonClick(Sender: TObject);
begin
 assignfile(personfile,'directpersons.dat'); 
//change name & location to suit
 reset(personfile);
 FindFilePosition;
 seek(personfile,fileposition);
 read(personfile,fileperson);
 if fileperson.flag='e' then
 begin
  //slot is empty: write the new record here
  seek(personfile,fileposition);
//have to seek filepos again
  write(personfile,person);
 end
 else
  FindEmptySlot;
//slot occupied: search sequentially for the next empty slot
 closefile(personfile);
end;

Notice that the seek procedure has to be called before reading from the file and again before writing to it. Reading a record moves the file pointer on to the next record, so without the second seek the new record would be written into the slot after the one that was just checked. (The fileposition variable itself does not change.)
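The effect of a read on the file pointer can be checked with the standard FilePos function. The following console sketch is separate from the tutorial project; it assumes a scratch file called seekdemo.dat (a made-up name) and uses a file of integer to keep things short. The {$MODE DELPHI} line is only needed for Free Pascal.

```pascal
program SeekDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  f: file of integer;
  value, i: integer;

begin
  AssignFile(f, 'seekdemo.dat'); // hypothetical scratch file name
  Rewrite(f);
  for i := 0 to 4 do
    Write(f, i);                 // records 0..4 hold the values 0..4
  CloseFile(f);

  Reset(f);                      // reopen for reading and writing
  Seek(f, 2);
  WriteLn('before read: ', FilePos(f)); // 2
  Read(f, value);
  WriteLn('after read:  ', FilePos(f)); // 3: the read moved the pointer on

  Seek(f, 2);                    // go back before overwriting record 2
  value := 99;
  Write(f, value);

  CloseFile(f);
  Erase(f);                      // remove the scratch file
end.
```

Without the second Seek the value 99 would be written into record 3 rather than record 2.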

procedure TForm1.FindFilePosition;
//hash function using numeric value of first three letters of name
begin
 with person do
 begin
  flag:='o';
  name:=edit1.Text;
  address:=edit2.text;
  dateofbirth:=datetimepicker1.Date;
 end;
 if length(person.name)=1 then
//very short name e.g. 'C', 'M', etc.
 begin
  a:=ord(person.name[1]);
  b:=32;
  c:=32;
 end
 else if length(person.name)=2 then
//short name e.g. Ho, Mo, Ng, etc.
 begin
  a:=ord(person.name[1]);
  b:=ord(person.name[2]);
  c:=32
 end
 else
 begin
  a:=ord(person.name[1]);
  b:=ord(person.name[2]);
  c:=ord(person.name[3]);
 end;
 fileposition:=(a+b+c) mod filesize;
//hash function returns value in range 0 to filesize-1 (0-19 here)
 filepositioncopy:=fileposition;
end;

Once again the declaration of FindFilePosition in the interface section does not need TForm1:

procedure FindFilePosition;

The purpose of the filepositioncopy variable is to hold the first calculated hash key value and to compare this with the fileposition variable as it moves through the file. If fileposition reaches the same value as filepositioncopy then the whole file has been scanned and if an empty slot has not been found then the file is full.

procedure TForm1.FindEmptySlot;
var slotfound:Boolean;
begin
 slotfound:=false;
 fileposition:=(fileposition+1) mod filesize;
//look at next slot; after the last record wrap to the first
 while (fileposition<>filepositioncopy) and (not slotfound) do
 begin
  seek(personfile,fileposition);
  read(personfile,fileperson);
  if fileperson.flag='e' then
//slot empty
  begin
   slotfound:=true;
   seek(personfile,fileposition);
//have to seek filepos again
   write(personfile,person);
  end
  else
   fileposition:=(fileposition+1) mod filesize;
//wrapping here means the search always gets back to filepositioncopy,
//even when the hash key is 0 and the file is full
 end;
 if not slotfound then showmessage('file is full');
end;

Once again the declaration of FindEmptySlot in the interface section does not need TForm1:

procedure FindEmptySlot;
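The wrap-around search can also be tried out in memory, without a file. The sketch below is not part of the tutorial project: the flags array, the FindSlot function and the tiny table size of 5 are all assumptions made for illustration, and unlike FindEmptySlot the function also tests the starting slot itself. The {$MODE DELPHI} line is only needed for Free Pascal.

```pascal
program ProbeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  filesize = 5; // deliberately tiny so the wrap-around is easy to follow

var
  flags: array[0..filesize - 1] of char; // 'e' = empty, 'o' = occupied

// Returns the first empty slot at or after start, wrapping from the last
// slot back to slot 0. Returns -1 if every slot is occupied.
function FindSlot(start: integer): integer;
var
  position: integer;
begin
  Result := -1;
  position := start;
  repeat
    if flags[position] = 'e' then
    begin
      Result := position;
      Exit;
    end;
    position := (position + 1) mod filesize;
  until position = start; // back at the starting slot: whole table scanned
end;

var
  i: integer;

begin
  for i := 0 to filesize - 1 do
    flags[i] := 'o';
  flags[1] := 'e';
  WriteLn(FindSlot(3)); // probes 3, 4, 0 and then finds slot 1
  flags[1] := 'o';
  WriteLn(FindSlot(3)); // -1: every slot is occupied
end.
```

The test position = start plays the same role as the comparison of fileposition with filepositioncopy: once the search is back where it began, the table is known to be full.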

As an example, suppose that 'bob' is about to be inserted into the file. The hash key will be (98 ('b') + 111 ('o') + 98 ('b')) mod 20 = 307 mod 20 = 7. If records 7, 8 and 9 are already occupied then 'bob' goes into slot 10.

Continue adding records up to the limit of the file (currently 20). Note that the names 'fred' (102+114+101=317) and 'ann' (97+110+110=317) will produce a clash, as both hash to 317 mod 20 = 17; the capitalised forms 'Fred' (70+114+101=285) and 'Ann' (65+110+110=285) also clash, both hashing to 285 mod 20 = 5. Can you identify any other clashes in the names you add? (You could write a simple program to return the numeric values of the first three letters of a name, mod 20, to give the location in the file of each name; this would be quicker than adding them manually each time.)
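One possible version of such a program is sketched below. It is a console program separate from the form project; the list of names is sample data chosen for illustration, and the helper name NameHash is made up. The {$MODE DELPHI} line is only needed for Free Pascal.

```pascal
program ClashFinder;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  filesize = 20;
  namecount = 5;
  // sample names, for illustration only
  names: array[1..namecount] of string =
    ('bob', 'fred', 'ann', 'Davidson', 'Davies');

// Same calculation as FindFilePosition: character codes of the first
// three letters (short names padded with spaces, code 32), mod filesize.
function NameHash(const name: string): integer;
var
  i, total: integer;
begin
  total := 0;
  for i := 1 to 3 do
    if i <= Length(name) then
      total := total + Ord(name[i])
    else
      total := total + 32;
  Result := total mod filesize;
end;

var
  i, j: integer;

begin
  for i := 1 to namecount do
  begin
    Write(names[i], ' -> slot ', NameHash(names[i]));
    // compare with every earlier name to report clashes
    for j := 1 to i - 1 do
      if NameHash(names[j]) = NameHash(names[i]) then
        Write(' (clashes with ', names[j], ')');
    WriteLn;
  end;
end.
```

With the sample list this reports slot 7 for 'bob', slot 17 for both 'fred' and 'ann', and slot 3 for both 'Davidson' and 'Davies'.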

Deleting a Record

We would now like to add a routine for deleting a record. To do this we need to enter a name as the key for the record to be deleted and then search through the file until it is found. When the record with the matching name is found we clear the name and address fields and set the flag to 'e' for empty; this returns the record to the free records in the file, available for new data.

To make it easier for the user to choose the name of the record to be deleted we will add a combobox holding a list of the names in the file. Alternative methods for getting the name might be to click on items in the listbox (this would require searching for the name in the string occupying each line of the listbox) or to type it into an editbox from the keyboard. Reading the names from the file into a combobox provides a useful alternative to these methods as it removes the need to search for the name in a longer string and the user can see what names are available.

We need to populate the combobox when the form is opened:

procedure TForm1.comboboxpopulate;
begin
 combobox1.Clear;
 assignfile(personfile,'directpersons.dat'); 
//change name & location to suit
 reset(personfile);
 while not eof (personfile) do
 begin
  read (personfile, person);
  if person.flag='o' then combobox1.Items.Add(person.name);
 end;
 closefile(personfile);
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
 comboboxpopulate;
end;

Once again the declaration of ComboboxPopulate in the interface section does not need TForm1:

procedure ComboboxPopulate;

We will need to run this same procedure when we delete an item so that the name no longer appears in the combobox:

procedure TForm1.DeleteButtonClick(Sender: TObject);
var
 nametodelete:string[20];
 found:Boolean;
 i:integer;
begin
 found:=false;
 if not(combobox1.Text='Delete Names') then
 begin
  nametodelete:=combobox1.Text;
  assignfile(personfile,'directpersons.dat'); 
//change name & location to suit
  reset(personfile);
  for i:= 0 to filesize - 1 do
  begin
   seek(personfile,i);
   read(personfile,person);
   if (person.flag='o') and (person.name=nametodelete) then
//only occupied records can be deleted
   begin
    found:=true;
    with person do
    begin
     flag:='e';
     name:=' ';
     address:=' ';
    end;
    seek(personfile,i);
//seek again because the read moved the file pointer on
    write(personfile,person);
   end;
  end;
  closefile(personfile);
 end;
 if not found then showmessage('Name not found, not deleted');
 comboboxpopulate; 
//update list of names for deletion
 displaylist;  
//update display of file
end;

The DeleteButton code takes the name to be deleted from the combobox and then searches the file for a match in a record. When a match is found it writes over the data in that record and sets the flag to 'e', thus making it available for new data. The procedure updates the display of the file contents in the list box and of the names that can be deleted in the combobox. The situation where a name is not found should not arise because the names are chosen from the combobox.

  
