report/chapter3.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86

\section{Architectural considerations}

When using fixed priority scheduling with deferred preemption, one must take some extra architectural considerations into account. This chapter deals with a number of aspects such as the choice of processor, use of the memory and interrupt handling.

\subsection{Processors}

In this section, two types of processors are being compared. These two are pipelined processors and general purpose processors. Pipelining, a standard feature in RISC\footnote[1]{RISC, or Reduced Instruction Set Computer, is a type of microprocessor architecture that utilizes a small, highly-optimized set of instructions.} processors, is much like an assembly line. Because the processor works on different steps of the instruction at the same time, more instructions can be executed in a shorter period of time. General purpose processors on the other hand are mainly used to achieve a descent average response time (although most architectures nowadays contain a pipeline feature). \\

The use of a pipelined processor can result in significant performance improvements. However, dependencies between instructions can cause pipeline hazards that may delay the completion of instructions. Christopher A. Healy et. al. posed a method to determine the worst case execution times of pipelined processors with cache in \cite{hwh-itapic-95}. This approach reads all kinds of information about each type of instruction from a machine-dependent data file. This way the performance of a sequence of instructions in a pipeline can be analyzed. \\

In the paper of Healy et. al. they assume a system with a non-preemptive scheduling paradigm. In the case of deferred preemption scheduling, a task is split up in one or more sub-tasks which are all non-preemptive, thus making the approach applicable to FPDS. This means the worst case execution times can be calculated and thus the system can be analyzed if it's schedulable under FPDS. \\

It's clear that the use of pipelined processors have a huge advantage over general purpose processors when it comes down to performance. Yet, determining the worst case execution times is much more elaborate and complex. Both aspects need to be taken into account when choosing the right architecture for a specific real-time application.

\subsection{Memory}

\subsubsection{Cache}

When cache memory is to be used in real-time systems, special attention must be paid since cache memory introduces unpredictability to the system. This unpredictable behavior is due to cache reloads upon preemption of a task. When preemptions are frequent, the sum of such cache reloading delays takes a significant portion of task execution time. The cache reloading costs due to preemptions have largely been ignored in real-time scheduling. \\

Buttazzo states in \cite{but-hrtcs-05} that it would be more efficient to have processors without cache or with cache disable. Although this would obviously get rid of the problem, it is not desirable. Another approach is to allows preemptions only at predetermined execution points with small cache reloading costs. A scheduling scheme was given in \cite{scmsc-lpsecmrts-98} and is called Limited Preemptible Scheduling (LPS). The scheduling scheme only allows preemptions only at predetermined execution points with small cache reloading costs. This means that the method can be applied to FPDS. \\

In \cite{hwh-itapic-95} a method to determine the worst case execution times was given for multiprocessors with cache. In that paper a cache simulation is used to categorize a cache operation. Using the outcome of such a simulation the blocking time can be determined very precisely. It was already stated that this approach is applicable to FPDS in paragraph 3.1. \\

\subsubsection{Local and global memory}

In any multiprocessing system cooperating processes share data via shared data objects. A typical abstraction of a shared memory multiprocessor real-time system configuration is depicted in figure 1. Each node of the system contains a processor together with its local memory. All nodes are connected to the shared memory via an interconnection network.

\begin {center}
  \includegraphics[width=80mm]{figs/shared_memory_multiprocessor.png} \\
  Figure 1. Shared memory multiprocessor system structure
\end {center}

However, the literature doesn't state much about the use of local and global memory when applying FPDS. Therefore it is difficult to make a statement about this subject. Most likely the use of local memory shouldn't have much impact on the schedulability of the system under FPDS. Global memory on the other hand can indeed bring nondeterminism. When guarding a critical section with semaphores for instance, a lower priority job can block a higher priority job. This can be overcome with various protocols like PIP, PCP and SRP (although the implementations of PIP and PCP recently have been proven flawed).

\subsection{Interrupt handling}

Real time systems usually have at least two groups of tasks. They are the application tasks and the interrupts. Both classes of task are repeatedly fired due to certain events. The difference between the two classes is that application tasks are usually periodic and they start executing due to events that are generated internally. In contrast, interrupts execute in response to external events. \\
	
Interrupts generally pre-empt a task in the same way a higher priority task would pre-empt a lower priority one. There are two main ways an interrupt can be handled. Interrupts can be handled using a Unified interrupt architecture where system services can be accessed from Interrupt Service Routines (ISR) or using a Segmented Interrupt Architecture wherein systems services may not be accessed from ISR. \\

In using the Unified interrupt architecture, interrupts are served immediately after they are invoked, and all interrupts must be disabled during the time the initial interrupt is served because the ISR can access the system services directly and there is no way to ascertain which ISRs make which kernel calls. Hence, there exists a possibility that interrupts may be disabled for too long and an interrupt may be missed. \\

In contrast, when using the Segmented Interrupt architecture, ISRs cannot call kernel services. Instead, the ISRs invoke a Link Service Routine (LSR), which are then scheduled by a LSR scheduler to run at a later time. LSRs can run only after all ISRs have been completed. They then call kernel services which schedule the LSR with respect to all other tasks. The kernel services schedules the LSR so that it only starts running if and when the appropriate resources are available. This means that it incurs a lower task switching overhead. Using this method of serving interrupts also helps to smooth peak interrupt overloads. When a burst of ISRs are invoked in rapid succession, the LSR scheduler helps to ensure that temporal integrity is maintained and will allow the interrupts to run in order of the way they were invoked.  Additionally, LSRs run with interrupts fully enabled, which prevent missing of any interrupts during the execution of LSRs. \\

In conclusion, we find that using the segmented interrupt architectures have the benefit of lower task switching overhead, smoothing peak interrupts overloads and prevent the missing of interrupts that occur while LSRs are being served. Thus, Segmented Interrupt architecture is superior compared to the Unified interrupt architecture.

\subsection{Multi-processor systems}

Gai et. al. \cite{gab-mdssa-02} Describe scheduling of tasks in asymmetric multiprocessor systems consisting of a general purpose CPU and DSP acting as a co-processor to the GPP master. The DSP is designed to execute algorithms on a set of data without interruption, hence the schedule for the DSP is non-preemptive. Gai et. al. treat the DSP scheduling as a special case of scheduling with shared resources in a multiprocessor distributed system, using a variant of the \emph{cooperative scheduling} method presented in \cite{sr-csmr-99} by Seawong and Rajkumar. Cooperative scheduling is appropriate in situations where a task can be decomposed into multiple phases, such that each phase requires a different resource. The basic idea of cooperative scheduling as described by Seawong and Rajkumar is to associate suitable deadlines to each phase of a job in order to meet the job deadline.

In order to apply this to the GPP + DSP multiprocessor architecture, Gai et. al. define their real-time model to consist of periodic and sporadic tasks subdivided into regular tasks and DSP tasks. The regular tasks (application tasks, interrupt service routines, ...) are executed entirely on the master CPU for $ C_{i} $ units of time. The DSP tasks execute for $ C_{i} $ units of time on the master CPU and an additional $ C^{DSP}_{i} $ units of time on the DSP. It is assumed that each DSP job performs at most one DSP request after $ C^{pre}_{i} $ units of time, and then executes for another $ C^{post}_{i} $ units of time, such that $ C_{i} = C^{pre}_{i} + C^{post}_{i} $ as depicted in Figure \ref{fig:dsptaskstruct}.

\begin{figure}[ht]
	\includegraphics{figs/dsptaskstruct.pdf}
	\label{fig:dsptaskstruct}
	\caption{Structure of a DSP task}
\end{figure}

When executing a DSP task, a hole within each job is generated in the schedule of the master processor. Gai et. al. show that the earliest deadline first (EDF) and rate monotonic (RM) scheduling algorithms do not always yield a feasible schedule, while one exists. Figure \ref{fig:infeasible} shows an infeasible task set when scheduled by RM or EDF. Figure \ref{fig:feasible} shows the same task set scheduled using a fixed priority assignment where $ \tau_2 \prec \tau_1 $, such that $ \tau_2 $ executes in the holes of the master CPU's schedule.

\begin{figure}[ht]
	\includegraphics{figs/infeasible.pdf}
	\label{fig:infeasible}
	\caption{A task set cannot be scheduled by RM and EDF ( $\tau_2$ misses all deadlines).}
\end{figure}

\begin{figure}[ht]
	\includegraphics{figs/feasible.pdf}
	\label{fig:feasible}
	\caption{A feasible schedule with fixed priority assignment  $ \tau_2 \prec \tau_1 $.}
\end{figure}

The main idea is to modify the scheduler to exploit these holes to schedule some other task on the master processor to improve the schedulability bound of the system. This is achieved by modeling the DSP request of a task $ \tau_i $ as a remote procedure call that blocks $ \tau_i $ for $ B_i $ units of time, waiting for its completion. The scheduling algorithm uses a fixed priority assignment. In order to determine the next task to be executed, the scheduler enqueues regular tasks and DSP tasks in two separate queues that are ordered by priority, as shown in Figure \ref{fig:dspscheduler}.

\begin{figure}[ht]
	\includegraphics{figs/dspscheduler.pdf}
	\label{fig:dspscheduler}
	\caption{Structure of a DSP task}
\end{figure}

When the DSP is idle the scheduler selects the task with the highest priority between those at the head of the two queues. When the DSP is active, the scheduler select the highest priority task from the head of the regular queue only. In this way, a task using the DSP blocks all other tasks requiring the DSP, but not the regular tasks, which can freely execute on the master processor in the holes created by DSP activities.

Because the DSP tasks executing on the DSP cannot be preempted, it can happen that a lower priority DSP task $ \tau_i $ is blocking a higher priority task that was released during the execution of $ \tau_i $ on the DSP. The blocking of high priority DSP tasks by lower priority DSP task has to be accounted for in the schedulability test. This test is presented \cite{gab-mdssa-02} using the hyperbolic bound.

The article furthermore present results of a simulation of the described algorithm and compares them to schedules generated by the distributed priority ceiling protocol (DPCP). However they do not mention the method of priority assignment used in their simulation.