Information Technology: 2008-09-28

Sunday, September 28, 2008

Random Acces Memory (RAM)

Random-access memory (usually known by its acronym, RAM) is a type of computer data storage. Today it takes the form of integrated circuits that allow the stored data to be accessed in any order, i.e. at random. The word random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its physical location and whether or not it is related to the previous piece of data.^[1]
This contrasts with storage mechanisms such as tapes, magnetic discs and optical discs, which rely on the physical movement of the recording medium or a reading head. In these devices, the movement takes longer than the data transfer, and the retrieval time varies depending on the physical location of the next item.
The word RAM is mostly associated with volatile types of memory (such as DRAM memory modules), where the information is lost after the power is switched off. However, many other types of memory are RAM as well (i.e. Random Access Memory), including most types of ROM and a kind of flash memory called NOR-Flash.

History

An early type of widespread writable random access memory was the magnetic core memory, developed in 1949-1951, and subsequently used in most computers up until the development of the static and dynamic integrated RAM circuits in the late 1960s and early 1970s. Before this, computers used relays, delay lines or various kinds of vacuum tube arrangements to implement "main" memory functions (i.e. hundreds or thousands of bits), some of which were random access, some not. Latches built out of vacuum tube triodes, and later, out of discrete transistors, were used for smaller and faster memories such as registers and (random access) register banks. Prior to the development of integrated ROM circuits, permanent (or read-only) random access memory was often constructed using semiconductor diode matrixes driven by address decoders.
Overview

Types of RAM

Modern types of writable RAM generally store a bit of data in either the state of a flip-flop, as in SRAM (static RAM), or as a charge in a capacitor (or transistor gate), as in DRAM (dynamic RAM), EPROM, EEPROM and Flash. Some types have circuitry to detect and/or correct random faults called memory errors in the stored data, using parity bits or error correction codes. RAM of the read-only type, ROM, instead uses a metal mask to permanently enable/disable selected transistors, instead of storing a charge in them.
As both SRAM and DRAM are volatile, other forms of computer storage, such as disks and magnetic tapes, have been used as "permanent" storage in traditional computers. Many newer products instead rely on flash memory to maintain data between sessions of use: examples include PDAs, small music players, mobile phones, synthesizers, advanced calculators, industrial instrumentation and robotics, and many other types of products; even certain categories of personal computers, such as the OLPC XO-1, Asus Eee PC, and others, have begun replacing magnetic disk with so called flash drives (similar to fast memory cards equipped with an IDE or SATA interface).
There are two basic types of flash memory: the NOR type, which is capable of true random access, and the NAND type, which is not; the former is therefore often used in place of ROM, while the latter is used in most memory cards and solid-state drives, due to a lower price.

Memory hierarchy

Many computer systems have a memory hierarchy consisting of CPU registers, on-die SRAM caches, external caches, DRAM, paging systems, and virtual memory or swap space on a hard drive. This entire pool of memory may be referred to as "RAM" by many developers, even though the various subsystems can have very different access times, violating the original concept behind the random access term in RAM. Even within a hierarchy level such as DRAM, the specific row, column, bank, rank, channel, or interleave organization of the components make the access time variable, although not to the extent that rotating storage media or a tape is variable. (Generally, the memory hierarchy follows the access time with the fast CPU registers at the top and the slow hard drive at the bottom.)
In many modern personal computers, the RAM comes in an easily upgraded form of modules called memory modules or DRAM modules about the size of a few sticks of chewing gum. These can quickly be replaced should they become damaged or too small for current purposes. As suggested above, smaller amounts of RAM (mostly SRAM) are also integrated in the CPU and other ICs on the motherboard, as well as in hard-drives, CD-ROMs, and several other parts of the computer system. The overall goal of using a memory hierarchy is to obtain the higher possible average access speed while minimizing the total cost of entire memory system.

Swapping

If a computer becomes low on RAM during intensive application cycles, the computer can perform an operation known as "swapping". When this occurs, the computer temporarily uses hard drive space as additional memory. Constantly relying on this type of backup memory is called thrashing, which is generally undesirable because it lowers overall system performance. In order to reduce the dependency on swapping, more RAM can be installed.

Other uses of the "RAM" term

Other physical devices with read/write capability can have "RAM" in their names: for example, DVD-RAM. "Random access" is also the name of an indexing method: hence, disk storage is often called "random access" because the reading head can move relatively quickly from one piece of data to another, and does not have to read all the data in between. However the final "M" is crucial: "RAM" (provided there is no additional term as in "DVD-RAM") always refers to a solid-state device.

RAM disks

Software can "partition" a portion of a computer's RAM, allowing it to act as a much faster hard drive that is called a RAM disk. Unless the memory used is non-volatile, a RAM disk loses the stored data when the computer is shut down. However, volatile memory can retain its data when the computer is shut down if it has a separate power source, usually a battery.

Shadow RAM

Sometimes, the contents of a ROM chip is copied to SRAM or DRAM to allow for shorter access times (as ROM may be slower). The ROM chip is then disabled while the initialized memory locations are switched in on the same block of addresses (often write-protected). This process, sometimed called shadowing, is fairly common in both computers and embedded systems.
As a common example, the BIOS in typical personal computers often have an option called “use shadow BIOS” or similar. When enabled, functions relying on data from the BIOS’s ROM will instead use DRAM locations (most can also toggle shadowing of video card ROM or other ROM sections). Depending on the system, this may or may not give a performance boost. On some systems the benefit may be hypothetical because the BIOS is not used after booting in favour of direct hardware hardware access. Of course, somewhat less free memory is available when shadowing is enabled.^[2]

Recent developments

Several new types of non-volatile RAM, which will preserve data while powered down, are under development. The technologies used include carbon nanotubes and the magnetic tunnel effect. In summer 2003, a 128 KB magnetic RAM chip manufactured with 0.18 µm technology was introduced. The core technology of MRAM is based on the magnetic tunnel effect. In June 2004, Infineon Technologies unveiled a 16 MB^[3] prototype again based on 0.18 µm technology. Nantero built a functioning carbon nanotube memory prototype 10 GB^[3] array in 2004. Whether some of these technologies will be able to eventually take a significant market share from either DRAM, SRAM, or flash-memory technology, however, remains to be seen.
Since 2006, "Solid-state drives" (based on flash memory) with capacities exceeding 150 gigabytes and speeds far exceeding traditional disks have become available. This development has started to blur the definition between traditional random access memory and "disks", dramatically reducing the difference in performance.

Memory wall

The "memory wall" is the growing disparity of speed between CPU and memory outside the CPU chip. An important reason for this disparity is the limited communication bandwidth beyond chip boundaries. From 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed only improved at 10%. Given these trends, it was expected that memory latency would become an overwhelming bottleneck in computer performance. ^[4]
Currently, CPU speed improvements have slowed significantly partly due to major physical barriers and partly because current CPU designs have already hit the memory wall in some sense. Intel summarized these causes in their Platform 2015 documentation (PDF)

“First of all, as chip geometries shrink and clock frequencies rise, the transistor leakage current increases, leading to excess power consumption and heat (more on power consumption below). Secondly, the advantages of higher clock speeds are in part negated by memory latency, since memory access times have not been able to keep pace with increasing clock frequencies. Third, for certain applications, traditional serial architectures are becoming less efficient as processors get faster (due to the so-called Von Neumann bottleneck), further undercutting any gains that frequency increases might otherwise buy. In addition, partly due to limitations in the means of producing inductance within solid state devices, resistance-capacitance (RC) delays in signal transmission are growing as feature sizes shrink, imposing an additional bottleneck that frequency increases don't address.”

The RC delays in signal transmission were also noted in Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures which projects a maximum of 12.5% average annual CPU performance improvement between 2000 and 2014. The data on Intel Processors clearly shows a slowdown in performance improvements in recent processors. However, Intel's new processors, Core 2 Duo (codenamed Conroe) show a significant improvement over previous Pentium 4 processors; due to a more efficient architecture, performance increased while clock rate actually decreased.

Central Processing Unit (CPU)

"CPU" redirects here. For other uses, see CPU (disambiguation).

Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging.

A Central Processing Unit (CPU), is a description of a class of logic machines that can execute computer programs. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage. The term itself and its initialism have been in use in the computer industry at least since the early 1960s (Weik 1961). The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation has remained much the same
Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this costly method of designing custom CPUs for a particular application has largely given way to the development of mass-produced processors that are suited for one or many purposes. This standardization trend generally began in the era of discrete transistor mainframes and minicomputers and has rapidly accelerated with the popularization of the integrated circuit (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of nanometers. Both the miniaturization and standardization of CPUs have increased the presence of these digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from automobiles to cell phones to children's toys.

History of CPUs

Main article: History of general purpose CPUs

EDVAC, one of the first electronic stored program computers.

Prior to the advent of machines that resemble today's CPUs, computers such as the ENIAC had to be physically rewired in order to perform different tasks. These machines are often referred to as "fixed-program computers," since they had to be physically reconfigured in order to run a different program. Since the term "CPU" is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.
The idea of a stored-program computer was already present during ENIAC's design, but was initially omitted so the machine could be finished sooner. On June 30, 1945, before ENIAC was even completed, mathematician John von Neumann distributed the paper entitled "First Draft of a Report on the EDVAC." It outlined the design of a stored-program computer that would eventually be completed in August 1949 (von Neumann 1945). EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed computer memory rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the large amount of time and effort it took to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the computer's memory. ^[1]
While von Neumann is most often credited with the design of the stored-program computer because of his design of EDVAC, others before him such as Konrad Zuse had suggested similar ideas. Additionally, the so-called Harvard architecture of the Harvard Mark I, which was completed before EDVAC, also utilized a stored-program design using punched paper tape rather than electronic memory. The key difference between the von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but elements of the Harvard architecture are commonly seen as well.
Being digital devices, all CPUs deal with discrete states and therefore require some kind of switching elements to differentiate between and change these states. Prior to commercial acceptance of the transistor, electrical relays and vacuum tubes (thermionic valves) were commonly used as switching elements. Although these had distinct speed advantages over earlier, purely mechanical designs, they were unreliable for various reasons. For example, building direct current sequential logic circuits out of relays requires additional hardware to cope with the problem of contact bounce. While vacuum tubes do not suffer from contact bounce, they must heat up before becoming fully operational and eventually stop functioning altogether.^[2] Usually, when a tube failed, the CPU would have to be diagnosed to locate the failing component so it could be replaced. Therefore, early electronic (vacuum tube based) computers were generally faster but less reliable than electromechanical (relay based) computers.
Tube computers like EDVAC tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) Harvard Mark I failed very rarely (Weik 1961:238). In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low clock rates compared to modern microelectronic designs (see below for a discussion of clock rate). Clock signal frequencies ranging from 100 kHz to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with.

Discrete transistor and IC CPUs

CPU, core memory, and external bus interface of a DEC PDP-8/I. made of medium-scale integrated circuits

The design complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like vacuum tubes and electrical relays. With this improvement more complex and reliable CPUs were built onto one or several printed circuit boards containing discrete (individual) components.
During this period, a method of manufacturing many transistors in a compact space gained popularity.The integrated circuit (IC) allowed a large number of transistors to be manufactured on a single semiconductor-based die, or "chip." At first only very basic non-specialized digital circuits such as NOR gates were miniaturized into ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo guidance computer, usually contained transistor counts numbering in multiples of ten. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, thus decreasing the quantity of individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands.
In 1964 IBM introduced its System/360 computer architecture which was used in a series of computers that could run the same programs with different speed and performance. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a microprogram (often called "microcode"), which still sees widespread usage in modern CPUs (Amdahl et al. 1964). The System/360 architecture was so popular that it dominated the mainframe computer market for the decades and left a legacy that is still continued by similar modern computers like the IBM zSeries. In the same year (1964), Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8. DEC would later introduce the extremely popular PDP-11 line that originally was built with SSI ICs but was eventually implemented with LSI components once these became practical. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits (Digital Equipment Corporation 1975).
Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability as well as the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally, while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like SIMD (Single Instruction Multiple Data) vector processors began to appear. These early experimental designs later gave rise to the era of specialized supercomputers like those made by Cray Inc.

Microprocessors

Main article: Microprocessor

The integrated circuit from an Intel 8742, an 8-bit microcontroller that includes a CPU running at 12 MHz, 128 bytes of RAM, 2048 bytes of EPROM, and I/O in the same chip.

Intel 80486DX2 microprocessor in a ceramic PGA package.

The introduction of the microprocessor in the 1970s significantly affected the design and implementation of CPUs. Since the introduction of the first microprocessor (the Intel 4004) in 1970 and the first widely used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architectures, and eventually produced instruction set compatible microprocessors that were backward-compatible with their older hardware and software. Combined with the advent and eventual vast success of the now ubiquitous personal computer, the term "CPU" is now applied almost exclusively to microprocessors.
Previous generations of CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic capacitance. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU has increased dramatically. This widely observed trend is described by Moore's law, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date.
While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As the aforementioned Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer, as well as to expand the usage of parallelism and other methods that extend the usefulness of the classical von Neumann model.

CPU operation

The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program.The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all von Neumann CPUs use in their operation: fetch, decode, execute, and writeback.

The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter (PC), which stores a number that identifies the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units.^[3] Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).
The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture(ISA).^[4] Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.
After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set .
The final step, writeback, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs.^[5] Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.
After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "Classic RISC pipeline," which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontroller). It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline.

Design and implementation

Main article: CPU design

Prerequisites
Computer architecture
Digital circuits

Integer range

The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common decimal (base ten) numeral system to represent numbers internally. A few other computers have used more exotic numeral systems like ternary (base three). Nearly all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" voltage.^[6]

MOS 6502 microprocessor in a dual in-line package, an extremely popular 8-bit design.

Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a bit refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called "word size", "bit width", "data path width", or "integer precision" when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an 8-bit CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 2⁸ or 256 discrete numbers. In effect, integer size sets a hardware limit on the range of integers the software run by the CPU can utilize.^[7]
Integer range can also affect the number of locations in memory the CPU can address (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet (8 bits), the maximum quantity of memory that CPU can address is 2³² octets, or 4 GiB. This is a very simple view of CPU address space, and many designs use more complex addressing methods like paging in order to locate more memory than their integer range would allow with a flat address space.
Higher levels of integer range require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and general expense. It is not at all uncommon, therefore, to see 4- or 8-bit microcontrollers used in modern applications, even though CPUs with much higher range (such as 16, 32, 64, even 128-bit) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra range (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit lengths, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM System/370 used a CPU that was primarily 32 bit, but it used 128-bit precision inside its floating point units to facilitate greater accuracy and range in floating point numbers (Amdahl et al. 1964). Many later CPU designs use similar mixed bit width, especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability is required.

Clock rate

Main article: Clock rate

Most CPUs, and indeed most sequential logic devices, are synchronous in nature.^[8] That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals can move in various branches of a CPU's many circuits, the designers can select an appropriate period for the clock signal.
This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. In setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).
However architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.
One method of dealing with the switching of unneeded components is called clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs.^[9] Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire CPUs have been built without utilizing a global clock signal. Two notable examples of this are the ARM compliant AMULET and the MIPS R3000 compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous ALUs in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers (Garside et al. 1999).

Parallelism

Main article: Parallel computing

Model of a subscalar CPU. Notice that it takes fifteen cycles to complete three instructions.

The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take. This type of CPU, usually referred to as subscalar, operates on and executes one instruction on one or two pieces of data at a time.
This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next instruction. As a result the subscalar CPU gets "hung up" on instructions which take more than one clock cycle to complete execution. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach scalar performance (one instruction per clock). However, the performance is nearly always subscalar (less than one instruction per cycle).
Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. Instruction level parallelism (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), and thread level parallelism (TLP) purposes to increase the number of threads (effectively individual programs) that a CPU can execute simultaneously. Each methodology differs both in the ways in which they are implemented, as well as the relative effectiveness they afford in increasing the CPU's performance for an application.^[10]

Instruction level parallelism

Main articles: Instruction pipelining and Superscalar

Basic five-stage pipeline. In the best case scenario, this pipeline can sustain a completion rate of one instruction per cycle.

One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding before the prior instruction finishes executing. This is the simplest form of a technique known as instruction pipelining, and is utilized in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is made more complete at each stage until it exits the execution pipeline and is retired.
Pipelining does, however, introduce the possibility for a situation where the result of the previous operation is needed to complete the next operation; a condition often termed data dependency conflict. To cope with this, additional care must be taken to check for these sorts of conditions and delay a portion of the instruction pipeline if this occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than subscalar ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in a stage).

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.

Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be superscalar include a long instruction pipeline and multiple identical execution units. ^{[Huynh 2003]} In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If so they are dispatched to available execution units, resulting in the ability for several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.
Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of CPU cache. It also makes hazard-avoiding techniques like branch prediction, speculative execution, and out-of-order execution crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may or may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies.
In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The original Intel Pentium (P5) had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the Pentium architecture, P6, added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.
Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions at rates surpassing one instruction per cycle (IPC).^[11] Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The strategy of the very long instruction word (VLIW) causes some ILP to become implied directly by the software, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.

Thread level parallelism

Another strategy of achieving performance is to execute multiple programs or threads in parallel. This area of research is known as parallel computing. In Flynn's taxonomy, this strategy is known as Multiple Instructions-Multiple Data or MIMD.
One technology used for this purpose was multiprocessing (MP). The initial flavor of this technology is known as symmetric multiprocessing (SMP), where a small number of CPUs share a coherent view of their memory system. In this scheme, each CPU has additional hardware to maintain a constantly up-to-date view of memory. By avoiding stale views of memory, the CPUs can cooperate on the same program and programs can migrate from one CPU to another. To increase the number of cooperating CPUs beyond a handful, schemes such as non-uniform memory access (NUMA) and directory-based coherence protocols were introduced in the 1990s. SMP systems are limited to a small number of CPUs while NUMA systems have been built with thousands of processors. Initially, multiprocessing was built using multiple discrete CPUs and boards to implement the interconnect between the processors. When the processors and their interconnect are all implemented on a single silicon chip, the technology is known as a multi-core microprocessor.
It was later recognized that finer-grain parallelism existed with a single program. A single program might have several threads (or functions) that could be executed separately or in parallel. Some of earliest examples of this technology implemented input/output processing such as direct memory access as a separate thread from the computation thread. A more general approach to this technology was introduced in the 1970s when systems were designed to run multiple computation threads in parallel. This technology is known as multi-threading (MT). This approach is considered more cost-effective than multiprocessing, as only a small number of components within a CPU is replicated in order to support MT as opposed to the entire CPU in the case of MP. In MT, the execution units and the memory system including the caches are shared among multiple threads. The downside of MT is that the hardware support for multithreading is more visible to software than that of MP and thus supervisor software like operating systems have to undergo larger changes to support MT. One type of MT that was implemented is known as block multithreading, where one thread is executed until it is stalled waiting for data to return from external memory. In this scheme, the CPU would then quickly switch to another thread which is ready to run, the switch often done in one CPU clock cycle. Another type of MT is known as simultaneous multithreading, where instructions of multiple threads are executed in parallel within one CPU clock cycle.
For several decades from the 1970s to early 2000s, the focus in designing high performance general purpose CPUs was largely on achieving high ILP through technologies such as pipelining, caches, superscalar execution, Out-of-order execution, etc. This trend culminated in large, power-hungry CPUs such as the Intel Pentium 4. By the early 2000s, CPU designers were thwarted from achieving higher performance from ILP techniques due to the growing disparity between CPU operating frequencies and main memory operating frequencies as well as escalating CPU power dissipation owing to more esoteric ILP techniques.
CPU designers then borrowed ideas from commercial computing markets such as transaction processing, where the aggregate performance of multiple programs, also known as throughput computing, was more important than the performance of a single thread or program.
This reversal of emphasis is evidenced by the proliferation of dual and multiple core CMP (chip-level multiprocessing) designs and notably, Intel's newer designs resembling its less superscalar P6 architecture. Late designs in several processor families exhibit CMP, including the x86-64 Opteron and Athlon 64 X2, the SPARC UltraSPARC T1, IBM POWER4 and POWER5, as well as several video game console CPUs like the Xbox 360's triple-core PowerPC design, and the PS3's 8-core Cell microprocessor.

Data parallelism

Main articles: Vector processor and SIMD

A less common but increasingly important paradigm of CPUs (and indeed, computing in general) deals with data parallelism. The processors discussed earlier are all referred to as some type of scalar device.^[12] As the name implies, vector processors deal with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of data for every instruction. Using Flynn's taxonomy, these two schemes of dealing with data are generally referred to as SISD (single instruction, single data) and SIMD (single instruction, multiple data), respectively. The great utility in creating CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation (for example, a sum or a dot product) to be performed on a large set of data. Some classic examples of these types of tasks are multimedia applications (images, video, and sound), as well as many types of scientific and engineering tasks. Whereas a scalar CPU must complete the entire process of fetching, decoding, and executing each instruction and value in a set of data, a vector CPU can perform a single operation on a comparatively large set of data with one instruction. Of course, this is only possible when the application tends to require many steps which apply one operation to a large set of data.
Most early vector CPUs, such as the Cray-1, were associated almost exclusively with scientific research and cryptography applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after floating point execution units started to become commonplace to include in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications like Intel's MMX were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with floating point numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's SSE and the PowerPC-related AltiVec (also known as VMX)

Motherboard

A motherboard is the central or primary printed circuit board (PCB) making up a complex electronic system, such as a modern computer or laptop. It is also known as a mainboard, baseboard, system board, planar board, or, on Apple computers, a logic board, and is sometimes abbreviated casually as mobo.^[1]
Most motherboards produced today are designed for so-called IBM-compatible computers, which held over 96% of the global personal computer market in 2005.^[2] Motherboards for IBM-compatible computers are specifically covered in the PC motherboard article.
A motherboard, like a backplane, provides the electrical connections by which the other components of the system communicate, but unlike a backplane also contains the central processing unit and other subsystems such as real time clock, and some peripheral interfaces.
A typical desktop computer is built with the microprocessor, main memory, and other essential components on the motherboard. Other components such as external storage, controllers for video display and sound, and peripheral devices are typically attached to the motherboard via edge connectors and cables, although in modern computers it is increasingly common to integrate these "peripherals" into the motherboard.
All of the basic circuitry and components required for a computer to function are onboard the motherboard or are connected with a cable. The most important component on a motherboard is the chipset. It often consists of two components or chips known as the Northbridge and Southbridge, though they may also be integrated into a single component. These chips determine, to an extent, the features and capabilities of the motherboard.

The motherboard of a typical desktop consists of a large printed circuit board. It holds electronic components and interconnects, as well as physical connectors (sockets, slots, and headers) into which other computer components may be inserted or attached.
Most motherboards include, at a minimum:

sockets (or slots) in which one or more microprocessors (CPUs) are installed^[3]
slots into which the system's main memory is installed (typically in the form of DIMM modules containing DRAM chips)
a chipset which forms an interface between the CPU's front-side bus, main memory, and peripheral buses
non-volatile memory chips (usually Flash ROM in modern motherboards) containing the system's firmware or BIOS
a clock generator which produces the system clock signal to synchronize the various components
slots for expansion cards (these interface to the system via the buses supported by the chipset)
power connectors flickers, which receive electrical power from the computer power supply and distribute it to the CPU, chipset, main memory, and expansion cards.^[4]

The Octek Jaguar V motherboard from 1993.^[5] This board has 6 ISA slots but few onboard peripherals, as evidenced by the lack of external connectors.

Additionally, nearly all motherboards include logic and connectors to support commonly-used input devices, such as PS/2 connectors for a mouse and keyboard. Early personal computers such as the Apple II or IBM PC included only this minimal peripheral support on the motherboard. Occasionally video interface hardware was also integrated into the motherboard; for example on the Apple II, and rarely on IBM-compatible computers such as the IBM PC Jr. Additional peripherals such as disk controllers and serial ports were provided as expansion cards.
Given the high thermal design power of high-speed computer CPUs and components, modern motherboards nearly always include heatsinks and mounting points for fans to dissipate excess heat.

Integrated peripherals

Diagram of a modern motherboard, which supports many on-board peripheral functions as well as several expansion slots.

With the steadily declining costs and size of integrated circuits, it is now possible to include support for many peripherals on the motherboard. By combining many functions on one PCB, the physical size and total cost of the system may be reduced; highly-integrated motherboards are thus especially popular in small form factor and budget computers.
For example, the ECS RS485M-M,^[6] a typical modern budget motherboard for computers based on AMD processors, has on-board support for a very large range of peripherals:

disk controllers for a floppy disk drive, up to 2 PATA drives, and up to 6 SATA drives (including RAID 0/1 support)
integrated ATI Radeon graphics controller supporting 2D and 3D graphics, with VGA and TV output
integrated sound card supporting 8-channel (7.1) audio and S/PDIF output
fast Ethernet network controller for 10/100 Mbit networking
USB 2.0 controller supporting up to 12 USB ports
IrDA controller for infrared data communication (e.g. with an IrDA enabled Cellular Phone or Printer)
temperature, voltage, and fan-speed sensors that allow software to monitor the health of computer components

Expansion cards to support all of these functions would have cost hundreds of dollars even a decade ago, however as of April 2007 such highly-integrated motherboards are available for as little as $30 in the USA.

Peripheral card slots

A typical motherboard of 2007 will have a different number of connections depending on its standard. A standard ATX motherboard will typically have 1x PCI-E 16x connection for a graphics card, 2x PCI slots for various expansion cards and 1x PCI-E 1x which will eventually supersede PCI.
A standard Super ATX motherboard will have 1x PCI-E 16x connection for a graphics card. It will also have a varying number of PCI and PCI-E 1x slots. It can sometimes also have a PCI-E 4x slot. This varies between brands and models.
Some motherboards have 2x PCI-E 16x slots to allow more than 2 monitors without special hardware or to allow use of a special graphics technology called SLI (for Nvidia) and Crossfire (for ATI). These allow 2 graphics cards to be linked together to allow better performance in intensive graphical computing tasks such as gaming and video editing.
As of 2007, virtually all motherboards come with at least 4x USB ports on the rear with at least 2 connections on the board internally for wiring additional front ports that are built into the computers case. Ethernet is also included now. This is a standard networking cable for connecting the computer to a network or a modem. A sound chip is always included on the motherboard to allow sound to be output without the need for any extra components. This allows computers to be far more multimedia based than before. Cheaper machines now often have their graphics chip built into the motherboard rather than a separate card.

[edit] Temperature and reliability

Motherboards are generally air cooled with heat sinks often mounted on larger chips, such as the northbridge, in modern motherboards. If the motherboard is not cooled properly, then this can cause the motherboard to crash. Passive cooling, or a single fan mounted on the power supply, was sufficient for many desktop computer CPUs until the late 1990s; since then, most have required CPU fans mounted on their heatsinks, due to rising clock speeds and power consumption. Most motherboards have connectors for additional case fans as well. Newer motherboards have integrated temperature sensors to detect motherboard and CPU temperatures, and controllable fan connectors which the BIOS or operating system can use to regulate fan speed.
Some small form factor computers and home theater PCs designed for quiet and energy-efficient operation boast fan-less designs. This typically requires the use of a low-power CPU, as well as careful layout of the motherboard and other components to allow for heat sink placement.
A 2003 study^[7] found that some spurious computer crashes and general reliability issues, ranging from screen image distortions to I/O read/write errors, can be attributed not to software or peripheral hardware but to aging capacitors on PC motherboards. Ultimately this was shown to be the result of a faulty electrolyte formulation.^[8]

For more information on premature capacitor failure on PC motherboards, see capacitor plague.

Motherboards use electrolytic capacitors to filter the DC power distributed around the board. These capacitors age at a temperature-dependent rate, as their water based electrolytes slowly evaporate. This can lead to loss of capacitance and subsequent motherboard malfunctions due to voltage instabilities. While most capacitors are rated for 2000 hours of operation at 105 °C,^[9] their expected design life roughly doubles for every 10 °C below this. At 45 °C a lifetime of 15 years can be expected. This appears reasonable for a computer motherboard, however many manufacturers have delivered substandard capacitors,^{[citation needed]} which significantly reduce life expectancy. Inadequate case cooling and elevated temperatures easily exacerbate this problem. It is possible, but tedious and time-consuming, to find and replace failed capacitors on PC motherboards; it is less expensive to buy a new motherboard than to pay for such a repair.^{[citation needed]}

Form factor

Main article: Comparison of computer form factors

Motherboards are produced in a variety of sizes and shapes ("form factors"), some of which are specific to individual computer manufacturers. However, the motherboards used in IBM-compatible commodity computers have been standardized to fit various case sizes. As of 2007, most desktop computer motherboards use one of these standard form factors—even those found in Macintosh and Sun computers which have not traditionally been built from commodity components.
Laptop computers generally use highly integrated, miniaturized, and customized motherboards. This is one of the reasons that laptop computers are difficult to upgrade and expensive to repair. Often the failure of one laptop component requires the replacement of the entire motherboard, which is usually more expensive than a desktop motherboard due to the large number of integrated components.

Nvidia SLI and ATI Crossfire

Nvidia SLI and ATI Crossfire technology allows 2 or more of the same series graphics cards to be linked together to allow a faster graphics experience. Almost all medium to high end Nvidia cards and most high end ATI cards support the technology.
They both require compatible motherboards. There is an obvious need for 2x PCI-E 16x slots to allow 2 cards to be inserted into the computer. The same function can be acheived in 650i motherboards by NVIDIA, with a pair of x8 slots. Originally, tri-Crossfire was acheived at 8x speeds with 2 16x slots and 1 8x slot albeit at a slower speed. ATI opened the technology up to Intel in 2006 and such all new Intel chipsets support Crossfire.
SLI is a little more proprietary in its needs. It requires a motherboard with Nvidia's own NForce chipset series to allow it to run.
Although this might seem like a great idea, it is important to note that SLI and Crossfire only offer up to 1.5x the performance of a single card when using a dual setup. They also do not double the effective amount of Vram.

History

Prior to the advent of the microprocessor, a computer was usually built in a card-cage case or mainframe with components connected by a backplane consisting of a set of slots themselves connected with wires; in very old designs the wires were discrete connections between card connector pins, but printed-circuit boards soon became the standard practice. The central processing unit, memory and peripherals were housed on individual printed circuit boards which plugged into the backplane.
During the late 1980s and 1990s, it became economical to move an increasing number of peripheral functions onto the motherboard (see above). In the late 1980s, motherboards began to include single ICs (called Super I/O chips) capable of supporting a set of low-speed peripherals: keyboard, mouse, floppy disk drive, serial ports, and parallel ports. As of the late 1990s, many personal computer motherboards support a full range of audio, video, storage, and networking functions without the need for any expansion cards at all; higher-end systems for 3D gaming and computer graphics typically retain only the graphics card as a separate component.
The early pioneers of motherboard manufacturing were Micronics, Mylex, AMI, DTK, Hauppauge, Orchid Technology, Elitegroup, DFI, and a number of Taiwan-based manufacturers.
Popular personal computers such as the Apple II and IBM PC had published schematic diagrams and other documentation which permitted rapid reverse-engineering and third-party replacement motherboards. Usually intended for building new computers compatible with the exemplars, many motherboards offered additional performance or other features and were used to upgrade the manufacturer's original equipment.

Bootstrapping using the BIOS

Main article: booting

Motherboards contain some non-volatile memory to initialize the system and load an operating system from some external peripheral device. Microcomputers such as the Apple II and IBM PC used read-only memory chips, mounted in sockets on the motherboard. At power up the central processor would load its program counter with the address of the boot ROM and start executing ROM instructions displaying system information on the screen and running memory checks, which would in turn start loading memory from an external or peripheral device (disk drive) if one isn't available then the computer can perform tasks from other memory stores or displays an error message depending on the model and design of the computer and version of the bios.
Most modern motherboard designs use a BIOS, stored in a EEPROM chip soldered to the motherboard, to bootstrap the motherboard. (Socketed BIOS chips are widely used, also.) By booting the motherboard, the memory, circuitry, and peripherals are tested and configured. This process is known as a Power On Self Test or POST. Errors during POST result in POST error codes, ranging from simple audible beeps from the speaker to complex diagnostic messages displayed on the video monitor.
The BIOS often requires configuration settings to be stored on the motherboard. Since configuration settings must be easily edited, these settings are often stored in non-volatile RAM (NVRAM) rather than in some sort of read-only memory (ROM). When a user makes configuration changes or alters the date and time of the computer, this small NVRAM circuit stores the data. Typically, a small, long-lasting battery (e.g. a lithium coin cell CR2032) is used to keep the NVRAM "refreshed" for many years. Therefore, a failing battery on a motherboard will produce the symptoms of a computer that cannot determine the correct date and time, nor remember what hardware configuration the user has selected. The BIOS itself is unaffected by the status of the battery.
When IBM first introduced the PC in the 1980s, imitations were quite common. (The physical parts which made up the motherboard were trivial to acquire.) However, the imitations were never successful until the IBM ROM BIOS was legally copied.^[10] To understand why copying the BIOS was an important step, consider that the BIOS contained vital instructions which interacted with peripherals. Without these software instructions in the BIOS, a PC would not function properly. (In most modern computer operating systems, the BIOS is bypassed for most hardware functions, but in the 1980s, the BIOS served many vital low-level functions.)
So when Compaq Computer Corp. spent US$1 million to clone the IBM BIOS using reverse engineering, they became an elite computer manufacturer of IBM PC Clones. Phoenix Technology soon matched their feat and began reselling BIOSes to other clone makers.^[11] It has been noted that Microsoft was more than happy to license the operating system (DOS), and IBM was more than happy to sue companies^[12] that violated the copyright of their BIOS. But by documenting and publicizing the reverse engineering of the BIOS, Compaq and Phoenix were legally competing with IBM using their own copyrighted BIOS.
Once the bootstrapping of the computer's peripherals are complete, the BIOS will normally pass control to another set of instructions stored on a bootable device.
Devices which are normally used to boot a computer:

floppy drive
network controller
CD-ROM drive
DVD-ROM drive
SCSI hard drive
IDE, EIDE, or SATA hard drive
External USB memory storage device

Information Technology

Sunday, September 28, 2008

Random Acces Memory (RAM)

History

Types of RAM

Memory hierarchy

Swapping

Other uses of the "RAM" term

RAM disks

Shadow RAM

Recent developments

Memory wall

Central Processing Unit (CPU)

History of CPUs

Discrete transistor and IC CPUs

Microprocessors

CPU operation

Design and implementation

Integer range

Clock rate

Parallelism

Instruction level parallelism

Thread level parallelism

Data parallelism

Motherboard

Integrated peripherals

Peripheral card slots

[edit] Temperature and reliability

Form factor

Nvidia SLI and ATI Crossfire

History

Bootstrapping using the BIOS

Pengalaman Mengikuti Program COOP Telkom 2011

Search This Blog