Friday, December 25, 2009

Unit 1

Computer architecture is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements (especially speeds and interconnections) and design implementations for the various parts of a computer — focusing largely on the way by which the central processing unit (CPU) performs internally and accesses addresses in memory.


 

It may also be defined as the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.


 

Computer hardware is the physical part of a computer, including the digital circuitry, as distinguished from the computer software that executes within the hardware.

A typical personal computer consists of a case or chassis (in desktop or tower form) and the following parts:


 


 

Internals of a typical personal computer; typical motherboard found in a computer


 

Motherboard
or system board with slots for expansion cards and holding parts

Central processing unit (CPU), and a computer fan used to cool down the CPU

Random Access Memory (RAM) - for program execution and short term data storage, so the computer does not have to take the time to access the hard drive to find the file(s) it requires. More RAM will normally contribute to a faster PC. RAM is almost always removable as it sits in slots in the motherboard, attached with small clips. The RAM slots are normally located next to the CPU socket.

Firmware
usually Basic Input-Output System (BIOS) based or in newer systems Extensible Firmware Interface (EFI) compliant


 

Buses: carry data and instructions from one place to another (for example, from the processor to main memory)

PCI

PCI-E

USB

HyperTransport

CSI (released in 2008 as Intel QuickPath Interconnect)

AGP (being phased out)

VLB (outdated)

ISA (outdated)

EISA (outdated)

MCA (outdated)


 

Power supply
- a case that holds a transformer, voltage control, and (usually) a cooling fan

Storage controllers of IDE, SATA, SCSI or other type, that control hard disk, floppy disk, CD-ROM and other drives; the controllers sit directly on the motherboard (on-board) or on expansion cards


 

Video display controller that produces the output for the computer display. This will either be built into the motherboard or attached in its own separate slot (PCI, PCI-E or AGP), requiring a Graphics Card.


 

Computer bus controllers (parallel, serial, USB, FireWire) to connect the computer to external peripheral devices such as printers or scanners

Some type of a removable media writer:


 

CD - the most common type of removable media, cheap but fragile.

CD-ROM Drive

CD Writer

DVD

DVD-ROM Drive

DVD Writer

DVD-RAM Drive

BD

BD-ROM Drive

BD Writer

Floppy disk

Zip drive

USB flash drive (also known as a pen drive or memory stick)

Tape drive - mainly for backup and long-term storage

Internal storage - keeps data inside the computer for later use.

Hard disk - for medium-term storage of data.

Disk array controller

Sound card - translates signals from the system board into analog voltage levels, and has terminals to plug in speakers.

Networking - to connect the computer to the Internet and/or other computers

Modem - for dial-up connections

Network card - for DSL/Cable internet, and/or connecting to other computers.

Other peripherals

In addition, hardware can include external components of a computer system. The following are either standard or very common.


 

Wheel Mouse


 

Input or Input devices

Text input devices

Keyboard

Pointing devices

Mouse

Trackball

Gaming devices

Joystick

Gamepad

Game controller

Image, Video input devices

Image scanner

Webcam

Audio input devices

Microphone


 

Output or Output devices

Image, Video output devices

Printer
Peripheral device that produces a hard copy. (Inkjet, Laser)

Monitor
Device that takes signals and displays them. (CRT, LCD)

Audio
output devices

Speakers
A device that converts analog audio signals into the equivalent air vibrations in order to make audible sound.

Headset - a device similar in function to computer speakers, used mainly so as not to disturb others nearby.


 

Computer software, consisting of programs, enables a computer to perform specific tasks, as opposed to its physical components (hardware) which can only do the tasks they are mechanically designed for. The term includes application software such as word processors which perform productive tasks for users, system software such as operating systems, which interface with hardware to run the necessary services for user-interfaces and applications, and middleware which controls and co-ordinates distributed systems.


 

A Central processing unit (CPU), or sometimes simply processor, is the component in a digital computer capable of executing a program.(Knott 1974) It interprets computer program instructions and processes data. CPUs provide the fundamental digital computer trait of programmability, and are one of the necessary components found in computers of any era, along with primary storage and input/output facilities. A CPU that is manufactured as a single integrated circuit is usually known as a microprocessor.


 

CPU operation : The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. Discussed here are devices that conform to the common von Neumann architecture. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all von Neumann CPUs use in their operation: fetch, decode, execute, and writeback.


 


 

Diagram showing how one MIPS32 instruction is decoded. (MIPS Technologies 2005)

The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by the program counter (PC).


 

Processor registers:

IR – instruction register, which holds the current instruction

PC - Program counter register which stores a number that identifies the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units. Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned.


 

The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (ISA). Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.


 


 

Block diagram of a simple CPU


 

FETCH → DECODE → EXECUTE


 

After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set (see the discussion of integer range below).


 

The final step, writeback, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs. Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.


 

After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously.
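
The four steps can be traced in a short Python sketch. The 8-bit instruction format (a 4-bit opcode and a 4-bit operand) and the opcode values below are hypothetical, invented only to illustrate the fetch-decode-execute-writeback cycle, not taken from any real instruction set.

# Minimal sketch of the von Neumann instruction cycle: fetch, decode, execute, writeback.
# The 8-bit instruction format (high nibble = opcode, low nibble = operand) is hypothetical.

LOAD, ADD, JUMP, HALT = 0x1, 0x2, 0x3, 0xF          # illustrative opcodes

def run(memory):
    pc, acc = 0, 0                                   # program counter and accumulator
    while True:
        instruction = memory[pc]                     # FETCH: read the word addressed by PC
        pc += 1                                      # increment PC by one instruction word
        opcode, operand = instruction >> 4, instruction & 0x0F   # DECODE: split the fields
        if opcode == LOAD:                           # EXECUTE + WRITEBACK
            acc = memory[operand]                    # load a word from memory into ACC
        elif opcode == ADD:
            acc = (acc + memory[operand]) & 0xFF     # ALU addition, result written back to ACC
        elif opcode == JUMP:
            pc = operand                             # jumps modify the PC instead of data
        elif opcode == HALT:
            return acc

# Program: LOAD mem[6]; ADD mem[7]; HALT.  Data: mem[6]=2, mem[7]=3.
print(run([0x16, 0x27, 0xF0, 0, 0, 0, 2, 3]))        # prints 5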

------------------------------

Control unit:


 

A control unit is the part of a CPU or other device that directs its operation. The outputs of the unit control the activity of the rest of the device. A control unit can be thought of as a finite state machine.


 

Operations:

  1. Issue control signals
  2. Control and regulate other devices
  3. Control data transfers
  4. Provide timing signals


 

Now they are often implemented as a microprogram that is stored in a control store. Words of the microprogram are selected by a microsequencer and the bits from those words directly control the different parts of the device, including the registers, arithmetic and logic units, instruction registers, buses, and off-chip input/output. In modern computers, each of these subsystems may have its own subsidiary controller, with the control unit acting as a supervisor.
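
A minimal sketch of that idea, with made-up control-signal names and a three-word microroutine, might look like this:

# Sketch of a microprogrammed control unit: a microsequencer steps through a control
# store; each word's bits drive individual control lines (names are hypothetical).

SIGNALS = {0b100: "mem_read", 0b010: "ir_load", 0b001: "pc_increment"}

control_store = [0b100, 0b010, 0b001]     # a 3-word microroutine that fetches an instruction

for micro_address, word in enumerate(control_store):      # the microsequencer selects each word
    asserted = [name for mask, name in SIGNALS.items() if word & mask]
    print(f"micro-address {micro_address}: {word:03b} asserts {asserted}")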


 

The control unit is the circuitry that controls the flow of information through the processor, and coordinates the activities of the other units within it. In a way, it is the "brain within the brain", as it controls what happens inside the processor, which in turn controls the rest of the PC.

The functions performed by the control unit vary greatly by the internal architecture of the CPU, since the control unit really implements this architecture. On a regular processor that executes x86 instructions natively, the control unit performs the tasks of fetching, decoding, managing execution and then storing results.

-----------------------------------

Arithmetic logic unit


 

A typical schematic symbol for an ALU: A & B are operands; R is the output; F is the input from the Control Unit; D is an output status


 

The arithmetic logic unit (ALU) is a digital circuit that calculates an arithmetic operation (addition, subtraction, etc.) and logic operations (Exclusive Or, AND, etc.) between two numbers. The ALU is a fundamental building block of the central processing unit of a computer.

Many types of electronic circuits need to perform some type of arithmetic operation, so even the circuit inside a digital watch will have a tiny ALU that keeps adding 1 to the current time, and keeps checking if it should beep the timer, etc...
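
Along the lines of the schematic above, a toy ALU can be sketched as a function of the operands A and B and the function-select input F, returning the result R and a status output D; the function encoding used here is arbitrary.

# Minimal ALU sketch: A and B are operands, F selects the function, R is the result
# and D is a status (carry/overflow) output.  Function codes 0-3 are chosen arbitrarily.

def alu(a, b, f, width=8):
    mask = (1 << width) - 1
    if f == 0:   raw = a + b          # arithmetic addition
    elif f == 1: raw = a - b          # subtraction
    elif f == 2: raw = a & b          # bitwise AND
    elif f == 3: raw = a ^ b          # bitwise exclusive-OR
    else: raise ValueError("unknown function code")
    r = raw & mask                    # result truncated to the register width
    d = int(raw != r or raw < 0)      # status flag: set on carry-out / borrow (overflow)
    return r, d

print(alu(200, 100, 0))               # (44, 1) -- 300 overflows an 8-bit result
print(alu(0b1100, 0b1010, 2))         # (8, 0)  -- bitwise AND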


 

Memory (storage devices)


 

Primary storage is directly connected to the central processing unit of the computer. It must be present for the CPU to function correctly. As shown in the diagram, primary storage typically consists of three kinds of storage:

Processor registers are internal to the central processing unit. Registers contain information that the arithmetic and logic unit needs to carry out the current instruction. They are technically the fastest of all forms of computer storage, being switching transistors integrated on the CPU's silicon chip, and functioning as electronic "flip-flops".

Cache memory is a special type of internal memory used by many central processing units to increase their performance or "throughput". Some of the information in the main memory is duplicated in the cache memory, which is slightly slower but of much greater capacity than the processor registers, and faster but much smaller than main memory. Multi-level cache memory is also commonly used—"primary cache" being smallest, fastest and closest to the processing device; "secondary cache" being larger and slower, but still faster and much smaller than main memory.

Main memory contains the programs that are currently being run and the data the programs are operating on. In modern computers, the main memory is the electronic solid-state random access memory. It is directly connected to the CPU via a "memory bus" (shown in the diagram) and a "data bus". The arithmetic and logic unit can very quickly transfer information between a processor register and locations in main storage, also known as "memory addresses". The memory bus is also called an address bus or front side bus and both buses are high-speed digital "superhighways". Access methods and speed are two of the fundamental technical differences between memory and mass storage devices. (Note that all memory sizes and storage capacities shown in the diagram will inevitably be exceeded with advances in technology over time.)

Secondary and off-line storage


 

ROM (read-only memory): bootstrap loader – a program stored in ROM for starting the computer software operation when power is turned on.


 

Secondary storage requires the computer to use its input/output channels to access the information, and is used for long-term storage of persistent information. However most computer operating systems also use secondary storage devices as virtual memory - to artificially increase the apparent amount of main memory in the computer. Secondary storage is also known as "mass storage", as shown in the diagram above. Secondary or mass storage is typically of much greater capacity than primary storage (main memory), but it is also much slower. In modern computers, hard disks are usually used for mass storage. The time taken to access a given byte of information stored on a hard disk is typically a few thousandths of a second, or milliseconds. By contrast, the time taken to access a given byte of information stored in random access memory is measured in thousand-millionths of a second, or nanoseconds. This illustrates the very significant speed difference which distinguishes solid-state memory from rotating magnetic storage devices: hard disks are typically about a million times slower than memory. Rotating optical storage devices, such as CD and DVD drives, are typically even slower than hard disks, although their access speeds are likely to improve with advances in technology. Therefore, the use of virtual memory, which is millions of times slower than "real" memory, significantly degrades the performance of any computer. Virtual memory is implemented by many operating systems using terms like swap file or "cache file". The main historical advantage of virtual memory was that it was much less expensive than real memory. That advantage is less relevant today, yet surprisingly most operating systems continue to implement it, despite the significant performance penalties.

Off-line storage is a system where the storage medium can be easily removed from the storage device. Off-line storage is used for data transfer and archival purposes. In modern computers, CDs, DVDs, memory cards, flash memory devices including "USB drives", floppy disks, Zip disks and magnetic tapes are commonly used for off-line mass storage purposes. "Hot-pluggable" USB hard disks are also available. Off-line storage devices used in the past include punched cards, microforms, and removable Winchester disk drums.


 

Computer memory architecture


 


 

Magnetic storage uses different patterns of magnetization on a magnetically coated surface to store information. Magnetic storage is non-volatile. The information is accessed using one or more read/write heads. Since the read/write head only covers a part of the surface, magnetic storage is sequential access and must seek, cycle or both. In modern computers, the magnetic surface will take these forms:

Magnetic disk

Floppy disk, used for off-line storage

Hard disk, used for secondary storage

Magnetic tape data storage, used for tertiary and off-line storage

In early computers, magnetic storage was also used for primary storage in a form of magnetic drum, or core memory, core rope memory, thin film memory, twistor memory or bubble memory. Also unlike today, magnetic tape was often used for secondary storage.

Semiconductor memory uses semiconductor-based integrated circuits to store information. A semiconductor memory chip may contain millions of tiny transistors or capacitors. Both volatile and non-volatile forms of semiconductor memory exist. In modern computers, primary storage almost exclusively consists of dynamic volatile semiconductor memory or dynamic random access memory. Since the turn of the century, a type of non-volatile semiconductor memory known as flash memory has steadily gained share as off-line storage for home computers. Non-volatile semiconductor memory is also used for secondary storage in various advanced electronic devices and specialized computers.

Optical disc storage

Optical disc storage uses tiny pits etched on the surface of a circular disc to store information, and reads this information by illuminating the surface with a laser diode and observing the reflection. Optical disc storage is non-volatile and sequential access. The following forms are currently in common use:

CD, CD-ROM, DVD: Read only storage, used for mass distribution of digital information (music, video, computer programs)

CD-R, DVD-R, DVD+R: Write once storage, used for tertiary and off-line storage

CD-RW, DVD-RW, DVD+RW, DVD-RAM


 

Memory Hierarchy


 


 

Memory system characteristics

Location: register (processor), internal (main memory), external (auxiliary memory)

Capacity: word size (e.g., 40 bits), number of words

Unit of transfer: word (a collection of cells, each holding a 0 or 1), block, number of words

Access methods: sequential, random, direct, associative

Performance: access time, cycle time, transfer rate

Physical type: semiconductor, magnetic, optical

Physical characteristics: volatile / non-volatile, erasable / non-erasable

Throughput: the rate at which information can be read from or written to the storage. In computer storage, throughput is usually expressed in megabytes per second (MB/s), though bit rate may also be used. As with latency, read rate and write rate may need to be differentiated.


 

Von Neumann architecture

Key points :

  • Stored program (assembly, compiler program)
  • Instruction & data both are stored in memory
  • Arithmetic operation – Arithmetic Unit
  • Logic operation- Logical Unit
  • 1000 storage locations, called words, of 40 bits each; each word can hold two 20-bit instructions or one 40-bit number.
  • Includes 21 instructions.


 


 


 


 


 

A computer that makes use of the von Neumann architecture has five components: control circuitry, an arithmetic-logic unit, an input/output device, a memory, and a bus that provides a data path connecting these components.


 

Design of the Von Neumann architecture


 

The "von Neumann" in von Neumann architecture refers to Hungarian-American mathematician John von Neumann (1903-1957). Von Neumann was initially interested in access to the fastest computers available (of which there were few) during World War II in order to perform complex computations for a variety of war-related problems. In 1944, Von Neumann became a consultant to the ENIAC (Electronic Numerical Integrator and Computer) project, which upon its completion in 1945 became the world's first general purpose, electronic computer.


 

The von Neumann architecture is a computer design model that uses a processing unit and a single separate storage structure to hold both instructions and data. It is named after the mathematician and early computer scientist John von Neumann. Such a computer implements a universal Turing machine, and it is the common "referential model" for specifying sequential architectures, in contrast with parallel architectures. The term "stored-program computer" is generally used to mean a computer of this design.


 

The von Neumann model, as used in a desktop computer, executes instructions sequentially.

Von Neumann computations are a class of computer programs ideally suited to sequential processing.


 

The earliest computing machines had fixed programs. Some very simple computers still use this design, either for simplicity or training purposes. For example, a desk calculator (in principle) is a fixed program computer. It can do basic mathematics, but it cannot be used as a word processor or to run video games. To change the program of such a machine, you have to re-wire, re-structure, or even re-design the machine. Indeed, the earliest computers were not so much "programmed" as they were "designed". "Reprogramming", when it was possible at all, was a very manual process, starting with flow charts and paper notes, followed by detailed engineering designs, and then the often-arduous process of implementing the physical changes.


 

The idea of the stored-program computer changed all that. By creating an instruction set architecture and detailing the computation as a series of instructions (the program), the machine becomes much more flexible. By treating those instructions in the same way as data, a stored-program machine can easily change the program, and can do so under program control.


 

The terms "von Neumann architecture" and "stored-program computer" are generally used interchangeably, and that usage is followed in this article. However, the Harvard architecture concept should be mentioned as a design which stores the program in an easily modifiable form, but not using the same storage as for general data.


 

A stored-program design also lets programs modify themselves while running. One early motivation for such a facility was the need for a program to increment or otherwise modify the address portion of instructions, which had to be done manually in early designs. This became less important when index registers and indirect addressing became customary features of machine architecture. Self-modifying code is deprecated today since it is hard to understand and debug, and modern processor pipelining and caching schemes make it inefficient.


 

There are drawbacks to the von Neumann design. Aside from the von Neumann bottleneck described below, program modifications can be quite harmful, either by accident or design. In some simple stored-program computer designs, a malfunctioning program can damage itself, other programs, or the operating system, possibly leading to a crash. A buffer overflow is one very common example of such a malfunction. Memory protection and other forms of access control can help protect against both accidental and malicious program modification.


 

The term "von Neumann architecture" arose from mathematician John von Neumann's paper, First Draft of a Report on the EDVAC.[2] Dated June 30, 1945, it was an early written account of a general purpose stored-program computing machine (the EDVAC). However, while von Neumann's work was pioneering, the term von Neumann architecture does somewhat of an injustice to von


 

The idea of a stored-program computer existed at the Moore School of Electrical Engineering at the University of Pennsylvania before von Neumann even knew of the ENIAC's existence. The exact person who originated the idea there is unknown.


 

When the ENIAC was being designed, it was clear that reading instructions from punched cards or paper tape would not be fast enough, since the ENIAC was designed to execute instructions at a much higher rate. The ENIAC's program was thus wired into the design, and it had to be rewired for each new problem. It was clear that a better system was needed. The initial report on the proposed EDVAC was written during the time the ENIAC was being built, and contained the idea of the stored program, where instructions were stored in high-speed memory, so they could be quickly accessed for execution.


 

The separation between the CPU and memory leads to the von Neumann bottleneck, the limited throughput (data transfer rate) between the CPU and memory compared to the amount of memory. In modern machines, throughput is much smaller than the rate at which the CPU can work. This seriously limits the effective processing speed when the CPU is required to perform minimal processing on large amounts of data. The CPU is continuously forced to wait for vital data to be transferred to or from memory. As CPU speed and memory size have increased much faster than the throughput between them, the bottleneck has become more of a problem.


 

The performance problem is reduced by a cache between the CPU and main memory, and by the development of branch prediction algorithms. It is less clear whether the intellectual bottleneck that John Backus criticized in his 1977 Turing Award lecture has changed much since then. Backus's proposed solution has not had a major influence. Modern functional programming and object-oriented programming are much less geared towards pushing vast numbers of words back and forth than earlier languages like Fortran, but internally, that is still what computers spend much of their time doing.


 


 

Sample instructions:


 

Instruction type        Opcode      Symbolic representation    Description

Data transfer           00001010    LOAD MQ                    Transfer contents of one register to another
Unconditional branch    00001101    JUMP M(X,0:19)             Take next instruction from left half of M(X)
Conditional branch      00001111    JUMP +M(X,0:19)            Take next instruction from left half of M(X), subject to a condition
Arithmetic              00000101    ADD M(X)                   Arithmetic addition
Address modify          00010010    STOR M(X,8:19)             Replace left address field of M(X)
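
Since each 40-bit word holds two 20-bit instructions, and each instruction is an 8-bit opcode followed by a 12-bit address, decoding a fetched word can be sketched as below; the packed example word uses the LOAD MQ and ADD M(X) opcodes from the table with arbitrary addresses.

# Sketch of splitting one 40-bit word into its left and right 20-bit instructions,
# each consisting of an 8-bit opcode and a 12-bit address (as in the table above).

def decode_word(word40):
    left, right = (word40 >> 20) & 0xFFFFF, word40 & 0xFFFFF      # two 20-bit halves
    def split(instr20):
        return (instr20 >> 12) & 0xFF, instr20 & 0xFFF            # (opcode, address)
    return split(left), split(right)

# Pack LOAD MQ (opcode 00001010) with address 0o0123 on the left and
# ADD M(X) (opcode 00000101) with address 0o0456 on the right; addresses are arbitrary.
word = ((0b00001010 << 12 | 0o0123) << 20) | (0b00000101 << 12 | 0o0456)
print(decode_word(word))    # ((10, 83), (5, 302))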


 


 

Instruction fetch and execution


 


 


 


 


 


 


 

    

Unit 2

Reduced Instruction Set Computer (RISC)


 

An important aspect of computer architecture is the design of the instruction set for the processor. The instruction set chosen for a particular computer determines the way that machine language programs are constructed. As digital hardware became cheaper with the advent of integrated circuits, computer instructions tended to increase both in number and complexity. The trend toward computer hardware complexity was influenced by various factors, such as upgrading existing models to provide more customer applications, adding instructions that facilitate the translation from high-level language into machine language programs, and striving to develop machines that move functions from software implementation into hardware implementation.

A computer with a large number of instructions is classified as complex instruction set computer, abbreviated CISC. In the early 1980s, a number of computer designers recommended that computers use fewer instructions with simple constructs so they can be executed much faster within the CPU without having to use memory as often. This type of computer is classified as a reduced instruction set computer or RISC.


 

CISC Characteristics:

The design of an instruction set for a computer must take into consideration not only machine language constructs, but also the requirement imposed on the use of high-level programming languages. The translation from high-level to machine language programs is done by means of a compiler program.

Another characteristic of CISC architecture is the incorporation of variable-length instruction formats. Instructions that require register operands may be only two bytes in length, but instructions that need two memory addresses may need five bytes to include the entire instruction code.

The major characteristics of CISC architecture are:

  1. A large number of instructions – typically from 100 to 250 instructions.
  2. Some instructions that perform specialized tasks and are used infrequently.
  3. A large variety of addressing modes – typically from 5 to 20 different modes.
  4. Variable-length instruction formats.
  5. Instructions that manipulate operands in memory.


 

RISC Characteristics:

The small set of instructions of a typical RISC processor consists mostly of register-to-register operations, with only simple load and store operations for memory access. Thus each operand is brought into a processor register with a load instruction.

A characteristic of RISC processors is their ability to execute one instruction per clock cycle. This is done by overlapping the fetch, decode and execute phases of two or three instructions by using a procedure referred to as pipelining. A load or store instruction may require two clock cycles because access to memory takes more time than register operations.


 

The major characteristics of RISC architecture are:

  1. Relatively few instructions and addressing modes.
  2. Memory access limited to load and store instructions.
  3. All operations done within the registers of the CPU.
  4. Fixed-length, easily decoded instruction format.
  5. Single-cycle instruction execution.
  6. Hardwired rather than microprogrammed control.
  7. A relatively large number of registers in the processor unit.
  8. Use of overlapped register windows to speed up procedure call and return.
  9. Efficient instruction pipeline.
  10. Compiler support for efficient translation of high-level language programs into machine language programs.


 

PIPELINING:

Pipelining is a technique of decomposing a sequential process into sub operations, with each sub process being executed in a special dedicated segment that operates concurrently with all other segments. A pipeline can be visualized as a collection of processing segments through which binary information flows. Each segment performs partial processing dictated by the way the task is partitioned. The result obtained from the computations in each segment is transferred to the next segment in the pipeline. The name "pipeline" implies a flow of information analogous to an industrial assembly line.

The simplest way of viewing the pipeline structure is to imagine that each segment consists of an input register followed by a combinational circuit. The register holds the data and the combinational circuit performs the sub operation in the particular segment. The output of the combinational circuit in the given segment is applied to the input register of the next segment.

Example:

The pipeline organization will be demonstrated by means of a simple example. Suppose that we want to perform the combined multiply and add operations with a stream of numbers.

Ai * Bi + Ci    for i = 1, 2, 3, …, 7

Each sub operation is to be implemented in a segment within a pipeline. Each segment has one or two registers and a combinational circuit, as shown below.

Diagram to be placed


 

R1 through R5 are registers that receive new data with every clock pulse. The multiplier and adder are combinational circuits. The sub operations performed in each segment of the pipeline are as follows:


 

R1 ← Ai, R2 ← Bi              Input Ai and Bi

R3 ← R1 * R2, R4 ← Ci         Multiply and input Ci

R5 ← R3 + R4                  Add Ci to product


 

The five registers are loaded with new data every clock pulse. The effect of each clock is shown below:

Diagram to be placed


 

The first clock pulse transfers A1 and B1 into R1 and R2. The second clock pulse transfers the product of R1 and R2 into R3 and C1 into R4. The same clock pulse transfers A2 and B2 into R1 and R2. The third clock pulse operates on all three segments simultaneously. It places A3 and B3 into R1 and R2, transfers the product of R1 and R2 into R3, transfers C2 into R4, and places the sum of R3 and R4 into R5. It takes three clock pulses to fill up the pipe and retrieve the first output from R5. From there on each clock produces a new output and moves the data one step down the pipeline. This happens as long as new input data flow into the system. When no more input data are available, the clock must continue until the last output emerges out of the pipeline.
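
The clock-by-clock behaviour just described can be reproduced with a small simulation; the register names follow the text and the data values are arbitrary.

# Sketch of the three-segment pipeline for Ai*Bi + Ci: on every clock pulse each
# segment's registers are loaded from the previous segment's outputs.

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

r1 = r2 = r3 = r4 = r5 = None
results = []
for clock in range(len(A) + 3):                       # extra pulses flush the pipeline
    # Register transfers happen "simultaneously", so old values are read before updating.
    new_r5 = r3 + r4 if r3 is not None else None      # segment 3: R5 <- R3 + R4
    new_r3 = r1 * r2 if r1 is not None else None      # segment 2: R3 <- R1 * R2
    new_r4 = C[clock - 1] if 1 <= clock <= len(C) else None   # segment 2: R4 <- Ci
    new_r1 = A[clock] if clock < len(A) else None     # segment 1: R1 <- Ai
    new_r2 = B[clock] if clock < len(B) else None     # segment 1: R2 <- Bi
    r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
    if r5 is not None:
        results.append(r5)

print(results)    # [8, 13, 16, 17, 16, 13, 8] == Ai*Bi + Ci for each i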


 

Arithmetic Pipeline:

Pipeline arithmetic units are usually found in very high speed computers. They are used to implement floating-point operations, multiplication of fixed-point numbers and similar computations encountered in scientific problems. A pipeline multiplier is essentially an array multiplier as described below, with special adders designed to minimize the carry propagation time through the partial products.

The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers.

X = A * 2^a

Y = B * 2^b

A and B are two fractions that represent the mantissas, and a and b are the exponents. Floating-point addition and subtraction can be performed in the four segments given below.


 


 

Diagram to be placed


 


 


 

The registers labeled R are placed between the segments to store intermediate results. The sub operations that are performed in four segments are:

  1. Compare the exponents.
  2. Align the mantissas.
  3. Add or subtract the mantissas.
  4. Normalize the result.


 

Example:

Consider the two normalized floating-point numbers:

X = 0.9504 * 10^3

Y = 0.8200 * 10^2

The two exponents are subtracted in the first segment to obtain 3 - 2 = 1. The larger exponent, 3, is chosen as the exponent of the result. The next segment shifts the mantissa of Y to the right to obtain

X = 0.9504 * 10^3

Y = 0.0820 * 10^3

This aligns the two mantissas under the same exponent. The addition of the two mantissas in segment 3 produces the sum

Z = 1.0324 * 10^3

The sum is adjusted by normalizing the result so that it has a fraction with a nonzero first digit. This is done by shifting the mantissa once to the right and incrementing the exponent by one to obtain the normalized sum

Z = 0.10324 * 10^4

The comparator, shifter, adder-subtractor, incrementer, and decrementer in the floating-point pipeline are implemented with combinational circuits.
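
The four segments can also be traced in code on the same example. The (mantissa, exponent) representation below simply mirrors the decimal worked example rather than a real binary floating-point format.

# Sketch of the four floating-point addition segments, traced on the example above.
# Numbers are held as (mantissa, exponent) pairs meaning mantissa * 10**exponent.

def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare the exponents; the larger one becomes the result exponent.
    exp = max(ea, eb)
    # Segment 2: align the mantissas by shifting the smaller-exponent operand right.
    ma, mb = ma / 10 ** (exp - ea), mb / 10 ** (exp - eb)
    # Segment 3: add (or subtract) the mantissas; only addition is shown here.
    m = ma + mb
    # Segment 4: normalize the result so the fraction has a nonzero first digit
    # (only the right-shift case needed by this example is handled).
    while m >= 1.0:
        m, exp = m / 10, exp + 1
    return m, exp

print(fp_add((0.9504, 3), (0.8200, 2)))   # approximately (0.10324, 4), i.e. Z = 0.10324 * 10^4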


 

Instruction Pipeline

Pipeline processing can occur not only in the data stream but in the instruction stream as well. An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments. This causes the instruction fetch and execute phases to overlap and perform simultaneous operations.

Consider a computer with an instruction fetch unit and an instruction execution unit designed to provide a two-segment pipeline. The instruction fetch segment can be implemented by means of a first in, first out (FIFO) buffer. This is a type of unit that forms a queue rather than a stack. Whenever the execution unit is not using memory the control increments the program counter and uses its address value to read consecutive instructions from memory. The instructions are inserted into the FIFO buffer so that they can be executed on a first-in, first-out basis. Thus an instruction stream can be placed in a queue, waiting for decoding and processing by the execution segment. The instruction stream queuing mechanism provides an efficient way for reducing the average access time to memory for reading instructions. Whenever there is space in the FIFO buffer, the control unit initiates the next instruction fetch phase. The buffer acts as a queue from which control then extracts the instructions for the execution unit.
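
A minimal sketch of this two-segment arrangement, using a deque as the FIFO instruction buffer and placeholder strings as instructions:

# Sketch of a two-segment instruction pipeline: a fetch unit fills a FIFO buffer
# whenever there is space, and the execution unit drains it first-in, first-out.

from collections import deque

program = [f"instr_{i}" for i in range(6)]      # placeholder instruction stream
fifo = deque(maxlen=3)                          # the fetch segment's FIFO buffer
pc = 0

while pc < len(program) or fifo:
    # Fetch segment: keep reading consecutive instructions while the buffer has space.
    while pc < len(program) and len(fifo) < fifo.maxlen:
        fifo.append(program[pc])
        pc += 1
    # Execute segment: take the oldest queued instruction and process it.
    print("executing", fifo.popleft())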

Computers with complex instructions require other phases in addition to fetch and execute to process an instruction completely. In the most general case, the computer needs to process each instruction with the following sequence of steps.

  1. Fetch the instruction from memory.
  2. Decode the instruction.
  3. Calculate the effective address.
  4. Fetch the operands from memory.
  5. Execute the instruction.
  6. Store the result in the proper place.


 

There are certain difficulties that will prevent the instruction pipeline from operating at its maximum rate. Different segments may take different times to operate on the incoming information. Some segments are skipped for certain operations.


 

Example:

A register-mode instruction does not need an effective address calculation. Two or more segments may require access to memory at the same time, causing one segment to wait until another is finished with the memory. Memory access conflicts are sometimes resolved by using two memory buses for accessing instructions and data in separate modules. In this way, an instruction word and a data word can be read simultaneously from two different modules.

The design of an instruction pipeline will be most efficient if the instruction cycle is divided into segments of equal duration. The time that each step takes to fulfill its function depends on the instruction and the way it is executed.

Unit 3

Memory Organization

  • Memory Hierarchy
  • Main Memory
  • Auxiliary Memory
  • Associative Memory
  • Cache Memory
  • Virtual Memory


     

Memory Hierarchy


 

The memory unit is an essential component in any digital computer since it is needed for storing programs and data. A very small computer with a limited application may be able to fulfill its intended task without the need of additional storage capacity. Most general purpose computers would run more efficiently if they were equipped with additional storage beyond the capacity of the main memory.


 

Auxiliary Memory:

The memory unit that communicates directly with the CPU is called the main memory. Devices that provide backup storage are called auxiliary memory. The most common auxiliary memory devices used in computer systems are magnetic disks and tapes. They are used for storing system programs, large data files, and other backup information. Only programs and data currently needed by the processor reside in main memory. All other information is stored in auxiliary memory and transferred to main memory when needed.

The total memory capacity of a computer can be visualized as being a hierarchy of components. The memory hierarchy system consists of all storage devices employed in a computer system from the slow but high-capacity auxiliary memory to a relatively faster main memory to an even smaller and faster cache memory accessible to high-speed processing logic.


 

Memory hierarchy in a computer system:


 


 


 

At the bottom of the hierarchy are the relatively slow magnetic tapes used to store removable files. Next are the magnetic disks, which are used for backup storage. The main memory occupies a central position by being able to communicate directly with the CPU and with auxiliary memory devices through an I/O processor. When the CPU needs programs not residing in main memory, they are brought in from auxiliary memory. Programs not currently needed in main memory are transferred into auxiliary memory to provide space for currently used programs and data.


 

Cache Memory:

A special very-high speed memory called a cache is sometimes used to increase the speed of processing by making current programs and data available to the CPU at a rapid rate. The cache is used for storing segments of programs currently being executed in the CPU and temporary data frequently needed in the present calculations.

While the I/O processor manages data transfers between auxiliary memory and main memory, the cache organization is concerned with the transfer of information between main memory and CPU. Thus each is involved with a different level in the memory hierarchy system.


 

Reasons for having two or three levels of memory hierarchy:

As the storage capacity of the memory increases, the cost per bit for storing binary information decreases and the access time of the memory becomes longer. The auxiliary memory has a large storage capacity, is relatively inexpensive, but has low access speed compared to main memory. The cache memory is very small, relatively expensive, and has very high access speed. Thus as the memory access speed increases, so does its relative cost. The overall goal of using a memory hierarchy is to obtain the highest-possible average access speed while minimizing the total cost of the entire memory system.


 

MAIN MEMORY:

The main memory is the central storage unit in a computer system. It is a relatively large and fast memory used to store programs and data during the computer operation.


 

RAM (random – access memory):

The principal technology used for the main memory is based on semiconductor integrated circuits. Integrated circuit RAM chips are available in two possible operating modes , static and dynamic. The static RAM consists essentially of internal flip-flops that store the binary information. The stored information remains valid as long as power is applied to the unit. The dynamic RAM stores the binary information in the form of electric charges that are applied to capacitors.

The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory chip. The static RAM is easier to use and has shorter read and write cycles.


 

ROM (read-only memory)

Most of the main memory in a general-purpose computer is made up of RAM integrated circuit chips, but a portion of the memory may be constructed with ROM chips. RAM was originally used to refer to a random-access memory, but now it is used to designate a read/write memory, to distinguish it from a read-only memory, although ROM is also random access. ROM is used for storing programs that are permanently resident in the computer and for tables of constants that do not change in value once the production of the computer is completed.


 

Bootstrap loader:

The ROM portion of main memory is needed for storing an initial program called a bootstrap loader. The bootstrap loader is a program whose function is to start the computer software operation when power is turned on. Since RAM is volatile, its contents are destroyed when power is turned off. The contents of ROM remain unchanged after power is turned off and on again.


 

RAM and ROM Chips:

A RAM chip is better suited for communication with the CPU if it has one or more control inputs that select the chip only when needed. Another common feature is a bidirectional data bus that allows the transfer of data either from memory to CPU during a read operation or from CPU to memory during a write operation.

A bidirectional bus can be constructed with three-state buffers. A three-state buffer output can be placed in one of three possible states: a signal equivalent to logic 1 , a signal equivalent to logic 0 , or a high impedance state. The logic 1 and 0 are normal digital signals. The high impedance state behaves like an open circuit, which means that the output does not carry a signal and has no logic significance.


 


 


 

The capacity of the RAM memory is 128 words of eight bits (one byte) per word. This requires a 7-bit address and an 8-bit data bus. The read and write inputs specify the memory operation, and the two chip select (CS) control inputs are for enabling the chip only when it is selected by the microprocessor. The availability of more than one control input facilitates the decoding of the address lines when multiple chips are used in the microcomputer. The read and write inputs are sometimes combined into one line labeled R/W; when the chip is selected, the two binary states of this line specify the two operations of read or write.


 

Operations of the function table:


 


 

CS1  CS2  RD  WR      Memory function      State of data bus

 0    0    *   *      Inhibit              High-impedance
 0    1    *   *      Inhibit              High-impedance
 1    0    0   0      Inhibit              High-impedance
 1    0    0   1      Write                Input data to RAM
 1    0    1   *      Read                 Output data from RAM
 1    1    *   *      Inhibit              High-impedance


 

The above function table specifies the operation of the RAM chip. The unit is in operation only when CS1 = 1 and CS2 = 0; the bar on top of the second select variable indicates that this input is enabled when it is equal to 0. If the chip select inputs are not enabled, or if they are enabled but the read and write inputs are not enabled, the memory is inhibited and its data bus is in a high-impedance state. When the WR input is enabled, the memory stores a byte from the data bus into a location specified by the address input lines. When the RD input is enabled, the content of the selected byte is placed onto the data bus. The RD and WR signals control the memory operation as well as the bus buffers associated with the bidirectional data bus. For the corresponding ROM chip, the two chip select inputs must likewise be CS1 = 1 and CS2 = 0 for the unit to operate; otherwise the data bus is in a high-impedance state, and there is no need for a read or write control because the unit can only read.
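
The function table can be turned directly into a small behavioural model of the chip; the 128-byte capacity follows the text, while the addresses and data used in the example calls are arbitrary.

# Sketch of the RAM chip's control behaviour from the function table above:
# the chip responds only when CS1 = 1 and CS2 = 0; RD and WR select the operation.

class RamChip:
    def __init__(self, words=128):
        self.cells = [0] * words                      # 128 x 8 RAM

    def access(self, cs1, cs2, rd, wr, address, data_bus=None):
        if not (cs1 == 1 and cs2 == 0):               # chip not selected: inhibit
            return "high-impedance"
        if rd == 1:                                   # read: put the selected byte on the bus
            return self.cells[address]
        if wr == 1:                                   # write: store the byte from the data bus
            self.cells[address] = data_bus & 0xFF
            return "written"
        return "high-impedance"                       # selected but neither RD nor WR: inhibit

chip = RamChip()
print(chip.access(1, 0, 0, 1, address=0x15, data_bus=0xAB))   # write 0xAB
print(chip.access(1, 0, 1, 0, address=0x15))                  # read -> 171 (0xAB)
print(chip.access(0, 1, 1, 0, address=0x15))                  # not selected -> high-impedance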


 

Memory Address Map:

The designer of a computer system must calculate the amount of memory required for the particular application and assign it to either RAM or ROM. The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip. The table, called a memory address map, is a pictorial representation of assigned address space for each chip in the system.


 


 

Component    Hexadecimal address    Address bus lines
                                    10  9  8  7  6  5  4  3  2  1

RAM 1        0000-007F               0  0  0  *  *  *  *  *  *  *
RAM 2        0080-00FF               0  0  1  *  *  *  *  *  *  *
RAM 3        0100-017F               0  1  0  *  *  *  *  *  *  *
RAM 4        0180-01FF               0  1  1  *  *  *  *  *  *  *
ROM          0200-03FF               1  *  *  *  *  *  *  *  *  *


 


 

In the above table the hexadecimal address column assigns a range of hexadecimal equivalent addresses to each chip. The address bus lines are listed in the third column. Although there are 16 lines in the address bus, the table shows only 10 lines because the other 6 are not used in this example and are assumed to be zero. The small *'s under the address bus lines designate those lines that must be connected to the address inputs of each chip. The RAM chips have 128 bytes and need seven address lines. The ROM chip has 512 bytes and needs 9 address lines. The *'s are always assigned to the low-order bus lines: lines 1 through 7 for the RAM and lines 1 through 9 for the ROM. It is now necessary to distinguish between the four RAM chips by assigning to each a different address. For this particular example we choose bus lines 8 and 9 to represent four distinct binary combinations. The distinction between a RAM and a ROM address is done with another bus line; here we choose line 10 for this purpose. When line 10 is 0, the CPU selects a RAM, and when this line is equal to 1, it selects the ROM.


 

The address bus lines are subdivided into groups of four bits each so that each group can be represented with a hexadecimal digit. The first hexadecimal digit represents lines 13 to 16 and is always 0. The next hexadecimal digit represents lines 9 to 12, but lines 11 and 12 are always 0. The range of hexadecimal addresses for each component is determined from the *'s associated with it. These *'s represent a binary number that can range from an all-0's value to an all-1's value.
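
The decoding rule (line 10 selects RAM or ROM, lines 9 and 8 select one of the four RAM chips, and the low-order lines address a byte within the chip) can be sketched as follows:

# Sketch of the address decoding implied by the memory address map: bus line 10
# distinguishes RAM from ROM, and lines 9 and 8 select one of the four RAM chips.

def select_chip(address):
    line10 = (address >> 9) & 1          # bus line 10 (lines are numbered from 1)
    if line10 == 1:
        return "ROM", address & 0x1FF    # lines 1-9 address one of 512 ROM bytes
    bank = (address >> 7) & 0b11         # bus lines 9 and 8 pick RAM 1..4
    return f"RAM {bank + 1}", address & 0x7F    # lines 1-7 address one of 128 RAM bytes

for a in (0x0000, 0x0080, 0x0100, 0x0180, 0x0200, 0x03FF):
    print(f"{a:04X} -> {select_chip(a)}")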


 

AUXILIARY MEMORY

The most common auxiliary memory devices used in computer systems are magnetic disks and tapes. Other components used, but not as frequently, are magnetic drums, magnetic bubble memory and optical disks. The important characteristics of any device are its access mode, access time, transfer rate, capacity and cost.

The average time required to reach a storage location in memory and obtain its contents is called the access time. In electromechanical devices with moving parts, such as disks and tapes, the access time consists of a seek time required to position the read-write head at a location and a transfer time required to transfer data to or from the device. Because the seek time is usually much longer than the transfer time, auxiliary storage is organized in records or blocks. The transfer rate is the number of characters or words that the device can transfer per second, after it has been positioned at the beginning of the record.


 

Magnetic Disks


 


 


 


 


 


 

A magnetic disk is a circular plate constructed of metal or plastic coated with magnetized material. Often both sides of the disk are used and several disks may be stacked on one spindle with read/write heads available on each surface. All disks rotate together at high speed and are not stopped or started for access purposes. Bits are stored in the magnetized surface in spots along concentric circles called tracks. The tracks are commonly divided into sections called sectors.

A disk system is addressed by bits that specify the disk number, the disk surface, the sector number, and the track within the sector. After the read/write heads are positioned on the specified track, the system has to wait until the rotating disk brings the specified sector under the read/write head. Information transfer is very fast once the beginning of a sector has been reached.

A track in a given sector near the circumference is longer than a track near the center of the disk. If bits are recorded with equal density, some tracks will contain more recorded bits than others.

Disks that are permanently attached to the unit assembly and cannot be removed by the occasional user are called hard disks. Removable disks used in drives designed for them are called floppy disks; they are extensively used in personal computers as a medium for distributing software to computer users.


 

Magnetic Tape:


 


 


 


 

Magnetic tape is a strip of plastic coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along several tracks. Usually 7 or 8 bits are recorded simultaneously to form a character. Magnetic tape units can be stopped, started to move forward or in reverse, or rewound; however, they cannot be started or stopped fast enough between individual characters. By reading the bit pattern at the end of a record, the control recognizes the beginning of a gap. A tape unit is addressed by specifying the record number and the number of characters in the record. Records may be of fixed or variable length.


 

ASSOCIATIVE MEMORY

Many data-processing applications require the search of items in a table stored in memory. An assembler program, for example, searches its address symbol table in order to extract a symbol's binary equivalent. The search procedure is a strategy for choosing a sequence of addresses, reading the content of memory at each address, and comparing the information read with the item being searched for until a match occurs. The number of accesses to memory depends on the location of the item and the efficiency of the search algorithm.

The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content addressable memory (CAM).


 

Example:

When a word is written in an Associative memory, no address is given. The memory is capable of finding an empty unused location to store the word. When a word is to be read from an Associative memory, the content of the word or part of the word is specified.

Hardware Organization:


 


 


 


 


 

The above diagram explains the associative memory. It consists of a memory array and logic for m words with n bits per word. The argument register A and key register K each have n bits, one for each bit of a word. The match register M has m bits, one for each memory word. Each word in memory is compared in parallel with the content of the argument register. The words that match the bits of the argument register set a corresponding bit in the match register. After the matching process, those bits in the match register that have been set indicate that their corresponding words have been matched. Reading is accomplished by a sequential access to memory for those words whose corresponding bit in the match register has been set.


 

The key register provides a mask for choosing a particular field or key in the argument word. The entire argument is compared with each memory word if the key register contains all 1's. Otherwise, only those bits in the argument that have 1's in their corresponding positions of the key register are compared. Thus the key provides a mask for identifying a piece of information, which specifies how the reference to memory is made.


 


 

Example


 

A       101 111000

K       111 000000

Word1   100 111100    (no match)

Word2   101 000001    (match)


 

Let us consider that the argument register A and the key register K have the bit configuration shown above. Only the three leftmost bits of A are compared with memory words because K has 1's in these positions.

Word2 matches the unmasked argument field because the three leftmost bits of the argument and the word are equal.
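
The masked comparison can be expressed directly in code, using the bit patterns of the example above:

# Sketch of an associative (content-addressable) search: each memory word is compared
# with the argument A only in the bit positions where the key register K holds 1.

A = 0b101111000          # argument register
K = 0b111000000          # key register: only the three leftmost bits take part
words = [0b100111100,    # word 1
         0b101000001]    # word 2

match_register = [int((word & K) == (A & K)) for word in words]
print(match_register)    # [0, 1] -> only word 2 matches the unmasked field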

Read Operation:

If more than one word in memory matches the unmasked argument field, all the matched words will have 1's in the corresponding bit positions of the match register. The matched words are read in sequence by applying a read signal to each word line whose corresponding Mi bit is 1. If we exclude words having a zero content, an all-zero output will indicate that no match occurred and that the searched item is not available in memory.


 

Write Operation:

The associative memory must have a write capability for storing the information to be searched. Writing in an associative memory can take different forms, depending on the application. If the entire memory is loaded with new information at once prior to a search operation, then the writing can be done by addressing each location in sequence.


 

Tag Register:

If unwanted words are to be deleted and new words inserted one at a time, a special register is needed to distinguish between active and inactive words. This register is called the tag register. In the tag register, active words are denoted by 1 and inactive words by 0.


 

CACHE MEMORY

The fundamental idea of cache organization is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time will approach the access time of the cache. Although the cache is only a small fraction of the size of main memory, a large fraction of memory requests will be found in the fast cache memory because of the locality of reference property of programs.


 

The basic operations of cache:

When the CPU needs to access memory, the cache is examined. If the word is found in the cache, it is read from the fast memory. If the word addressed by the CPU is not found in the cache, the main memory is accessed to read the word. The performance of cache memory is frequently measured in terms of a quantity called the hit ratio. When the CPU refers to memory and finds the word in the cache, it is said to produce a hit; if the word is not found in the cache and must be fetched from main memory, it counts as a miss.

The basic characteristic of cache memory is its fast access time. Therefore, very little or no time must be wasted when searching for words in the cache.
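
A rough worked example of how the hit ratio determines the average access time; the access times and the 0.9 hit ratio below are assumed figures chosen only to show the arithmetic.

# Sketch of how the hit ratio determines the average memory access time.
# The timings (100 ns cache, 1000 ns main memory) and the 0.9 hit ratio are assumed values.

t_cache, t_main = 100, 1000        # access times in nanoseconds
hit_ratio = 0.9

t_average = hit_ratio * t_cache + (1 - hit_ratio) * t_main
print(t_average)                   # 190.0 ns -- much closer to the cache than to main memory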


 

Mapping:

The transformation of data from main memory to cache memory is referred to as a mapping process. There are three types of mapping procedures. They are as follows:

  1. Associative mapping
  2. Direct mapping
  3. Set-associative mapping


     

Example of Cache Memory:


 


 

In the above figure the main memory can store 32K words of 12 bits each. The cache is capable of storing 512 of these words at any given time. For every word stored in cache, there is a duplicate copy in main memory. The CPU communicates with both memories. It first sends a 15-bit address to cache. If there is a hit, the CPU accepts the 12-bit data from cache. If there is a miss, the CPU reads the word from main memory and the word is then transferred to cache.


 

Associative Mapping:

The fastest and most flexible cache organization uses an associative memory. The associative memory stores both the address and content (data) of the memory word. This permits any location in cache to store any word from main memory.

Associative mapping cache (all numbers in octal)


 


 

The above diagram shows three words presently stored in the cache. The address value of 15 bits is shown as a five-digit octal number and its corresponding 12-bit word is shown as a four-digit octal number. A CPU address of 15 bits is placed in the argument register and the associative memory is searched for a matching address. If the address is found, the corresponding 12-bit data is read and sent to the CPU. If no match occurs, the main memory is accessed for the word. The address-data pair is then transferred to the associative-memory cache. If the cache is full, an address-data pair must be displaced to make room for a pair that is needed and not presently in the cache.

The decision as to which pair is replaced is determined by the replacement algorithm that the designer chooses for the cache. A simple procedure is to replace cells of the cache in round-robin order whenever a new word is requested from main memory. This constitutes a first-in first-out (FIFO) replacement policy.
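
A sketch of associative mapping with FIFO replacement is shown below; the 512-word capacity follows the example, while the memory contents are arbitrary.

# Sketch of an associative-mapped cache: the full 15-bit address is stored as the key,
# and the oldest entry is displaced first-in first-out when the cache is full.

from collections import OrderedDict

class AssociativeCache:
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.entries = OrderedDict()          # address -> data, kept in insertion order

    def read(self, address, main_memory):
        if address in self.entries:           # hit: the matching address is in the cache
            return self.entries[address]
        data = main_memory[address]           # miss: fetch the word from main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # FIFO: displace the oldest address-data pair
        self.entries[address] = data
        return data

main_memory = {0o00000: 0o1220, 0o02777: 0o0670, 0o22235: 0o3450}   # arbitrary octal contents
cache = AssociativeCache()
print(cache.read(0o02777, main_memory))      # miss: fetched from main memory and stored
print(cache.read(0o02777, main_memory))      # hit: returned from the cache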


 

Direct Mapping:

Associative memories are expensive compared to random-access memories because of the added logic associated with each cell.

Addressing relationships between main and cache memories


 


 


 

The CPU address of 15 bits is divided into two fields. The nine least significant bits constitute the index field and the remaining six bits form the tag field. The figure shows that main memory needs an address that includes both the tag and the index bits. The number of bits in the index field is equal to the number of address bits required to access the cache memory.

Direct Mapping:


 


 

The word at address zero is presently stored in the cache (index = 000, tag = 00, data = 1220). Suppose that the CPU now wants to access the word at address 02000. The index address is 000, so it is used to access the cache. The two tags are then compared. The cache tag is 00 but the address tag is 02, which does not produce a match. Therefore, the main memory is accessed and the data word 5670 is transferred to the CPU. The cache word at index address 000 is then replaced with a tag of 02 and data of 5670.


 

The disadvantage of direct mapping is that the hit ratio can drop considerably if two or more words whose addresses have the same index but different tags are accessed repeatedly.
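
A sketch of direct mapping, splitting the 15-bit address into a 9-bit index and a 6-bit tag and reproducing the access pattern of the example (octal addresses 00000 and 02000 share index 000):

# Sketch of a direct-mapped cache: the 9 low-order bits of the 15-bit address form the
# index and the remaining 6 bits form the tag; only one tag can live at each index.

cache = {}                                    # index -> (tag, data); at most 512 entries

def read(address, main_memory):
    index, tag = address & 0o777, address >> 9         # 9-bit index, 6-bit tag
    if index in cache and cache[index][0] == tag:
        return cache[index][1], "hit"
    data = main_memory[address]                         # miss: fetch from main memory
    cache[index] = (tag, data)                          # replace whatever tag was at this index
    return data, "miss"

main_memory = {0o00000: 0o1220, 0o02000: 0o5670}        # octal values from the example above

for addr in (0o00000, 0o02000, 0o02000):
    data, status = read(addr, main_memory)
    print(oct(addr), oct(data), status)
# 0o0    0o1220 miss   -> index 000 now holds tag 00
# 0o2000 0o5670 miss   -> same index 000, tag 02 replaces tag 00
# 0o2000 0o5670 hit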


 


 


 


 


 

Set-Associative Mapping:


 

It was mentioned that the disadvantage of direct mapping is that two words with the same index in their address but with different tag values cannot reside in the cache memory at the same time.


 

A third type of cache organization, called set-associative mapping, is an improvement over the direct-mapping organization in that each word of cache can store two or more words of memory under the same index address. Each data word is stored together with its tag and the number of tag-data items in one word of cache is said to form a set.


 


 


 


 


 


 


 


 


 


 


 


 


 

When the CPU generates a memory request, the index value of the address is used to access the cache. The tag field of the CPU address is then compared with both tags in the cache to determine if a match occurs. The comparison logic is done by an associative search of the tags in the set, similar to an associative memory search: thus the name "set-associative". The hit ratio will improve as the set size increases because more words with the same index but different tags can reside in the cache. However, an increase in the set size increases the number of bits in the words of the cache and requires more complex comparison logic.
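
A two-way set-associative sketch, in which each set holds up to two tag-data pairs, shows how the two conflicting addresses from the direct-mapping example can now coexist; FIFO replacement within a set is just one possible choice.

# Sketch of two-way set-associative mapping: each set (selected by the 9-bit index)
# stores up to two (tag, data) pairs, searched associatively on every access.

SET_SIZE = 2
sets = {}                                     # index -> list of (tag, data) pairs

def read(address, main_memory):
    index, tag = address & 0o777, address >> 9
    ways = sets.setdefault(index, [])
    for stored_tag, data in ways:             # associative search within the set
        if stored_tag == tag:
            return data, "hit"
    data = main_memory[address]               # miss: fetch and place in the set
    if len(ways) >= SET_SIZE:
        ways.pop(0)                           # replace the oldest pair (FIFO within the set)
    ways.append((tag, data))
    return data, "miss"

main_memory = {0o00000: 0o1220, 0o02000: 0o5670}
for addr in (0o00000, 0o02000, 0o00000):      # both words share index 000 but now coexist
    print(oct(addr), read(addr, main_memory)[1])   # miss, miss, hit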

VIRTUAL MEMORY

Virtual memory is a concept used in some large computer systems that permits the user to construct programs as though a large memory space were available, equal to the totality of auxiliary memory. Each address that is referenced by the CPU goes through an address mapping from the virtual address to a physical address in main memory. A virtual memory system provides a mechanism for translating program-generated addresses into correct main memory locations.


 

Address and memory space:

  • An address used by a programmer is called a virtual address.


 

  • The set of such addresses is called the address space.


 

  • An address in main memory is called a location or physical address.


 

In a multiprogramming computer system, programs and data are transferred to and from auxiliary memory and main memory based on demands imposed by the CPU. Suppose that program 1 is currently being executed in the CPU. Program 1 and a portion of its associated data are moved from auxiliary memory into main memory, as shown in the figure below.


 


 


 


 


 

Memory table for mapping a virtual address

    
 

Virtual address


 


 

In a virtual memory system, programmers are told that they have the total address space at their disposal. Moreover, the address field of the instruction code has a sufficient number of bits to specify all virtual addresses. In the example, the address of an instruction code will consist of 20 bits, but physical memory addresses must be specified with only 15 bits. The CPU will reference instructions and data with a 20-bit address, but the information at this address must be taken from physical memory, because access to auxiliary storage for individual words would be prohibitively long.


 

A table is needed, as shown above, to map a virtual address of 20 bits to a physical address of 15 bits. The mapping is a dynamic operation, which means that every address is translated immediately as the CPU references a word.


 

As shown in the diagram above, the mapping table may be stored in a separate memory or in main memory. In the first case, an additional memory unit is required, as well as one extra memory access time. In the second case, the table takes space from main memory, and two accesses to memory are required, with the program running at half speed. A third alternative is to use an associative memory.
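
As a sketch of the mapping itself: assuming a page size of 1K words (an assumption, since the text does not state one), the 20-bit virtual address splits into a 10-bit page number and a 10-bit line, and the 15-bit physical address into a 5-bit block number and the same line. The page-table entries below are invented for illustration.

# Sketch of virtual-to-physical address mapping for a 20-bit virtual address and a
# 15-bit physical address, assuming 1K-word pages (an assumption, not stated in the text).
# Virtual address = 10-bit page number + 10-bit line; physical = 5-bit block + same line.

page_table = {0: 3, 1: 7, 4: 0}        # page number -> block number (invented entries)

def map_address(virtual_address):
    page, line = virtual_address >> 10, virtual_address & 0x3FF
    if page not in page_table:
        raise LookupError("page fault: page must be brought in from auxiliary memory")
    return (page_table[page] << 10) | line

print(map_address(0b0000000001_0000000101))   # page 1, line 5 -> block 7, line 5 = 7173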