Process Identifiers

Every process has a unique process ID, a non-negative integer. Process ID 0 is usually the scheduler process and is often known as the swapper. No program on disk corresponds to this process, which is part of the kernel and is known as a system process. Process ID 1 is usually the init process and is invoked by the kernel at the end of the bootstrap procedure. The program file for this process was /etc/init in older versions of the UNIX System and is /sbin/init in newer versions. This process is responsible for bringing up a UNIX system after the kernel has been bootstrapped. init usually reads the system-dependent initialization files - the /etc/rc* files or /etc/inittab and the files in /etc/init.d - and brings the system to a certain state, such as multiuser. The init process never dies. It is a normal user process, not a system process within the kernel like the swapper, although it runs with superuser privileges.

fork function

An existing process can create a new one by calling the fork function.

#include <unistd.h>
//Returns: 0 in child, process ID of child in parent, -1 on error
pid_t fork(void);

This function is called once but returns twice. The reason the child’s process ID is returned to the parent is that a process can have more than one child, and there is no function that allows a process to obtain the process IDs of its children. The reason fork returns 0 to the child is that a process can have only a single parent, and the child can always call getppid to obtain the process ID of its parent. (Process ID 0 is reserved for use by the kernel, so it’s not possible for 0 to be the process ID of a child.)
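The two return values can be seen in a minimal sketch like the one below. The helper name fork_and_report and the exit code 7 are made up for illustration; waitpid is used only so the parent can collect the child's exit status.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once and print which branch each process takes.
 * Returns the child's exit status in the parent, or -1 on error. */
int fork_and_report(void)
{
    fflush(stdout);               /* keep buffered output from being duplicated */

    pid_t pid = fork();
    if (pid < 0) {                /* -1: no child was created */
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork returned 0; getppid() identifies the parent. */
        printf("child:  pid=%ld ppid=%ld\n", (long)getpid(), (long)getppid());
        fflush(stdout);
        _exit(7);                 /* arbitrary exit code for the demo */
    }

    /* Parent: fork returned the child's process ID. */
    printf("parent: pid=%ld child=%ld\n", (long)getpid(), (long)pid);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_report() from main prints one line from each process; the order of the two lines is not guaranteed.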

Both the child and the parent continue executing with the instruction that follows the call to fork. The child is a copy of the parent. For example, the child gets a copy of the parent’s data space, heap, and stack. Note that this is a copy for the child; the parent and the child do not share these portions of memory. The parent and the child do share the text segment.
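A small sketch of this separation, assuming nothing beyond fork and waitpid: the child changes a global and a stack variable, and the parent checks that its own copies are untouched. The names counter, local, and counter_after_child_update are illustrative only.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;           /* lives in the data segment */

/* Hypothetical helper: let the child modify its copies of the variables and
 * return counter + local as seen by the parent afterwards (-1 on error). */
int counter_after_child_update(void)
{
    int local = 10;               /* lives on the stack */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter += 100;           /* changes only the child's copy */
        local += 100;
        _exit(counter == 101 && local == 110 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return counter + local;       /* parent's values are unchanged: 1 + 10 */
}
```

In the parent the function returns 11, since the child's writes went to its own copies.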

Modern implementations don’t perform a complete copy of the parent’s data, stack, and heap, since a fork is often followed by an exec. Instead, a technique called copy-on-write (COW) is used. These regions are shared by the parent and the child and have their protection changed by the kernel to read-only. If either process tries to modify these regions, the kernel then makes a copy of that piece of memory only, typically a “page” in a virtual memory system.

All file descriptors that are open in the parent are duplicated in the child. The parent and the child share a file table entry for every open descriptor. There are two normal cases for handling the descriptors after a fork.

The parent waits for the child to complete. In this case, the parent does not need to do anything with its descriptors. When the child terminates, any of the shared descriptors that the child read from or wrote to will have their file offsets updated accordingly.
Both the parent and the child go their own ways. Here, after the fork, the parent closes the descriptors that it doesn’t need, and the child does the same thing. This way, neither interferes with the other’s open descriptors. This scenario is often found with network servers.
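The first case can be sketched as follows: because the file table entry is shared, a write in the child moves the offset the parent sees. This is only an illustration; the helper name, the temporary file under /tmp, and the 6-byte message are assumptions, not part of the text.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp when compiling with strict -std=c11 */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes 6 bytes through a descriptor inherited
 * from the parent; return the file offset the parent sees afterwards. */
long offset_after_child_write(void)
{
    char path[] = "/tmp/fork_offset_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file disappears once fd is closed */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: writes through the shared file table entry. */
        ssize_t n = write(fd, "child\n", 6);
        _exit(n == 6 ? 0 : 1);
    }

    int status;
    long offset = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        offset = (long)lseek(fd, 0, SEEK_CUR);  /* current offset in the parent */
    close(fd);
    return offset;
}
```

The parent never writes to the file, yet its offset ends up at 6, which is what lets a parent and child append to the same output without overwriting each other.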

Properties the child inherits from the parent:

Open files
Real user ID, real group ID, effective user ID, and effective group ID
Supplementary group IDs
Process group ID
Session ID
Controlling terminal
The set-user-ID and set-group-ID flags
Current working directory
Root directory
File mode creation mask
Signal mask and dispositions
The close-on-exec flag for any open file descriptors
Environment
Attached shared memory segments
Memory mappings
Resource limits

The differences between the parent and child:

Return value from fork
The process IDs
The two processes have different parent process IDs
The child’s tms_utime, tms_stime, tms_cutime and tms_cstime values are set to 0
File locks set by the parent are not inherited by the child
Pending alarms are cleared for the child
The set of pending signals for the child is set to the empty set

There are two uses for fork:

When a process wants to duplicate itself so that the parent and the child can each execute different sections of code at the same time. This is common for network servers - the parent waits for a service request from a client. When the request arrives, the parent calls fork and lets the child handle the request. The parent goes back to waiting for the next service request to arrive.
When a process wants to execute a different program. This is common for shells. In this case, the child does an exec right after it returns from the fork.
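The second use, the way a shell runs a command, might look roughly like the sketch below. run_program is a hypothetical helper, and execv is just one member of the exec family chosen for brevity; a real shell also handles PATH lookup, redirection, and signals.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0] in a child process
 * and return its exit status, or -1 on error. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execv(argv[0], argv);
        _exit(127);               /* reached only if execv failed */
    }

    /* Parent: wait for the child to finish. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, assuming /bin/sh exists, passing the vector { "/bin/sh", "-c", "exit 3", NULL } should make run_program return 3.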

exit

If the parent terminates before the child, the init process becomes the parent of the child. We say that the process has been inherited by init. What normally happens is that whenever a process terminates, the kernel goes through all active processes to see whether the terminating process is the parent of any process that still exists. If so, the parent process ID of the surviving process is changed to be 1 (the process ID of init).

The kernel keeps a small amount of information for every terminating process, so that the information is available when the parent of the terminating process calls wait or waitpid. Minimally, this information consists of the process ID, the termination status of the process, and the amount of CPU time taken by the process. The kernel can discard all the memory used by the process and close its open files. A process that has terminated, but whose parent has not yet waited for it, is called a zombie.
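A rough way to see this in action: the sketch below lets a child exit, leaves it unreaped for a second, and then collects its status. The helper name, the exit code 42, and the one-second sleep are arbitrary choices, and the code assumes SIGCHLD has its default disposition.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a short-lived zombie, then reap it.
 * Returns the child's exit status, or -1 on error. */
int reap_zombie(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(42);                /* child terminates right away */

    /* The child has almost certainly exited by now, but nobody has waited
     * for it yet, so the kernel still holds its termination status. */
    sleep(1);

    int status;
    if (waitpid(pid, &status, 0) != pid)   /* fetch the status: zombie is gone */
        return -1;

    /* A second wait fails with ECHILD: the kernel has discarded the entry. */
    if (waitpid(pid, NULL, 0) != -1 || errno != ECHILD)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On Linux, running ps during the sleep would typically show the child in state Z (defunct).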

Process init is written so that whenever one of its children terminates, init calls one of the wait functions to fetch the termination status. By doing this, init prevents the system from being clogged by zombies.

wait, waitpid

The wait function can block the caller until a child process terminates, whereas waitpid has an option that prevents it from blocking.

The waitpid function doesn’t wait for the child that terminates first; it has a number of options that control which process it waits for.

If a child has already terminated and is a zombie, wait returns immediately with that child’s status. Otherwise, it blocks the caller until a child terminates. If the caller blocks and has multiple children, wait returns when one terminates.
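The blocking and non-blocking behaviours can be contrasted in a sketch like this one, which uses the WNOHANG option of waitpid. The helper name and the pipe used to keep the child alive are illustrative devices, not something the text prescribes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: poll a running child with WNOHANG, then block until it
 * exits. Returns the child's exit status, or -1 on error. */
int poll_then_wait(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        close(fds[1]);
        /* Child: block until the parent closes its write end (EOF). */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(5);
    }

    close(fds[0]);
    int status;
    /* Child is still blocked in read, so this returns 0 without waiting. */
    pid_t polled = waitpid(pid, &status, WNOHANG);

    close(fds[1]);                /* release the child */
    /* Now block until this specific child terminates. */
    pid_t reaped = waitpid(pid, &status, 0);

    if (polled != 0 || reaped != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing a specific pid, as done here, is also how waitpid selects which child to wait for, instead of taking whichever one terminates first.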

exec

When a process calls one of the exec functions, that process is completely replaced by the new program, and the new program starts executing at its main function. The process ID does not change across an exec, because a new process is not created; exec merely replaces the current process - its text, data, heap, and stack segments - with a brand-new program from disk.
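One way to check that the process ID survives an exec is sketched below: the child execs a shell that prints its own process ID ($$), and the parent compares that with the value fork returned. The helper name is made up, and the code assumes /bin/sh is available.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the program started by exec reports the
 * same process ID that fork gave the parent, 0 if not, -1 on error. */
int pid_survives_exec(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout into the pipe, then become /bin/sh. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);               /* reached only if exec failed */
    }

    /* Parent: read the process ID printed by the exec'ed shell. */
    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid || n <= 0)
        return -1;
    return strtol(buf, NULL, 10) == (long)pid;
}
```

The shell is a different program from the one that called fork, yet it runs under the same process ID, so the function is expected to return 1.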

Reference

Advanced Programming in the UNIX Environment
