Jim Blinn's book [1] provides a vivid account of the graphics pipeline. In particular, read chapters 13 to 18. Each of these chapters was originally published in IEEE Computer Graphics and Applications.
The pipeline can be interpreted as a series of coordinate spaces in which the points defining objects live. There is little agreement on the names of these spaces, and, more importantly, architectures and algorithms may eliminate or add spaces, or change when and how certain steps are performed. We will distinguish the following spaces:
Other terms for these spaces are: model and object for master; universe for world (big minds, perhaps); eye or camera for view; normalized device or screen for normalized; and pixel or raster for device.
Now a word about notation. We'll use x, y, z, and w to represent coordinates in some space (if you do not understand what w is, please be patient, or look it up; if you do not understand what x, y, and z are, please speak with your instructor). When it seems necessary to explicitly mention the space a point is in, we will use a subscript M, W, V, P, C, or D for master, world, view, perspective, clip, or device coordinates. Got that? Most often we are interested in a sequence of points, so they will also be subscripted by integers starting at 0. Points in a space are written as rows: (x, y, z, w).
Always, the map from one space (coordinate system) to the next involves multiplying a point by a matrix, the one answer to nearly every question in graphics. All matrices will be denoted by capital, math-italic letters, usually M, T, S, R, or P, denoting a general matrix, translation, scale, rotation, or projection.
Almost always a matrix (transformation or map) will have four rows and four columns. Point-matrix multiplication transforms a point in space A into a point in another space B; with row points this is written

(xB, yB, zB, wB) = (xA, yA, zA, wA) M.
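The row-vector convention above can be sketched in a few lines of code. This is a minimal illustration, not from the text; the helper names and the choice of a translation matrix are my own. Note that with row points the translation terms land in the bottom row of T, and the point sits on the left of the product.

```python
# Sketch (assumed helpers): a row point (x, y, z, w) times a 4x4 matrix.

def transform(point, matrix):
    """Multiply a row point by a 4x4 matrix: result_j = sum_i point_i * M[i][j]."""
    return tuple(sum(point[i] * matrix[i][j] for i in range(4)) for j in range(4))

def translation(tx, ty, tz):
    """A translation matrix T in the row-vector convention:
    the translation terms sit in the bottom row, not the right column."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [tx, ty, tz, 1]]

p = (1.0, 2.0, 3.0, 1.0)
print(transform(p, translation(10, 0, 0)))  # -> (11.0, 2.0, 3.0, 1.0)
```

Concatenating maps is then just matrix multiplication: transforming by T and then S is the same as transforming once by the product TS.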
Now let's go into some of these spaces and see what they are all about. We'll start with where we want to get: device coordinates or pixels.
In most graphics systems the image is stored in memory, often called a framebuffer, as a collection of integers that specify the color of a pixel on a display device. The framebuffer represents an image of the entire display surface and may also include off-screen regions. There may be two framebuffers to support double buffering, where one buffer is displayed while the other is written to. Other buffers, such as depth, shadow, and accumulation buffers, may exist in some graphics systems.
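A framebuffer with double buffering can be sketched as two flat arrays of packed color integers. This is a hypothetical illustration, not a real API: the class and method names are invented, and real systems swap buffers in hardware rather than by exchanging references.

```python
# Hypothetical sketch: a framebuffer as two flat arrays of packed RGB ints.

class FrameBuffer:
    def __init__(self, width, height):
        self.width, self.height = width, height
        # two buffers: one displayed (front), one being drawn into (back)
        self.front = [0] * (width * height)
        self.back = [0] * (width * height)

    def set_pixel(self, x, y, rgb):
        # draw into the back buffer while the front buffer is on screen
        self.back[y * self.width + x] = rgb

    def swap(self):
        # present the finished frame; the old front becomes the new back
        self.front, self.back = self.back, self.front

fb = FrameBuffer(640, 480)
fb.set_pixel(10, 20, 0xFF0000)  # red
fb.swap()
print(fb.front[20 * 640 + 10])  # the red pixel is now in the displayed buffer
```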
For concreteness, we will assume a display surface with integer grid 0 ≤ xD ≤ w − 1 and 0 ≤ yD ≤ h − 1, for some fixed width w and height h, dating this discussion to a particular instance in time.
Now let's back up a couple of spaces to clip space. This is the place where we crop off any portion of the world to just what we see through our window on the world.
Clipping to the unit cube, 0 ≤ x, y, z ≤ 1, can be implemented efficiently.
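One classic source of that efficiency is the outcode test: classify each endpoint against the six faces of the clip volume with one bit per face, then trivially accept or reject most segments without computing any intersections. The sketch below is my own illustration, assuming clip bounds 0 ≤ x, y, z ≤ 1; the names and bit assignments are not from the text.

```python
# Sketch: 6-bit outcodes against an assumed clip volume 0 <= x, y, z <= 1.

LEFT, RIGHT, BOTTOM, TOP, NEAR, FAR = 1, 2, 4, 8, 16, 32

def outcode(x, y, z):
    code = 0
    if x < 0: code |= LEFT
    if x > 1: code |= RIGHT
    if y < 0: code |= BOTTOM
    if y > 1: code |= TOP
    if z < 0: code |= NEAR
    if z > 1: code |= FAR
    return code

def classify(p0, p1):
    """Trivial accept when both codes are 0; trivial reject when the
    codes share a set bit (both endpoints outside the same face)."""
    c0, c1 = outcode(*p0), outcode(*p1)
    if c0 == 0 and c1 == 0:
        return "accept"
    if c0 & c1:
        return "reject"
    return "clip"  # must actually intersect the segment with the faces

print(classify((0.2, 0.5, 0.5), (0.8, 0.5, 0.5)))  # accept
print(classify((-1, 0.5, 0.5), (-2, 0.5, 0.5)))    # reject (both left)
```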
Although clearly not the only choice for clip space, this is how we will
define it.
Normalized coordinates, more commonly called normalized device coordinates, are used in a graphics-system-dependent way to eliminate geometric distortion between our specified screen and what we actually see. They are arrived at by a geometric translation and scale that removes distortion that might occur in the map from our envisioned world to the particular display device. Think of the projection of the world we see onto a rectangular window, which is mapped, without distortion, to an on-screen viewport.
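The distortion-free window-to-viewport map can be sketched as a scale and a translation that preserve the window's aspect ratio, centering the image when the viewport's shape does not match. This is an illustrative sketch under assumptions of my own: normalized coordinates in [0, 1] × [0, 1], and letterboxing as the policy for mismatched shapes.

```python
# Sketch: map normalized coordinates (nx, ny) in [0, 1] x [0, 1] to a
# viewport at (vx, vy) with size (vw, vh), without stretching the image.

def window_to_viewport(nx, ny, vx, vy, vw, vh, aspect):
    """aspect = width / height of the envisioned window on the world."""
    # shrink one dimension so the mapped region keeps the window's aspect
    if vw / vh > aspect:
        w, h = vh * aspect, vh      # viewport too wide: pillarbox
    else:
        w, h = vw, vw / aspect      # viewport too tall: letterbox
    # center the undistorted region inside the viewport (the translation)
    ox = vx + (vw - w) / 2
    oy = vy + (vh - h) / 2
    # the scale then carries normalized coordinates to the screen
    return ox + nx * w, oy + ny * h

# a square window shown in a 200 x 100 viewport uses the centered 100 x 100 region
print(window_to_viewport(0.5, 0.5, 0, 0, 200, 100, 1.0))  # -> (100.0, 50.0)
```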
Now (again dating the discussion), a window system typically manages the framebuffer and allocates a viewport where our graphics will be displayed. Of course the window system will (usually) allow us to move and resize the viewport as we wish. But the point is that we want to record colors in a portion of the framebuffer defined by an offset (xo, yo) and a width and height (w, h), where each of these quantities is an integer that can be translated into a framebuffer memory address.
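That translation from pixel to memory address is a one-line computation. The sketch below is an assumption-laden illustration: it takes the framebuffer to be row-major with a stride equal to the full framebuffer width, so a pixel (x, y) inside the viewport lands at a flat offset.

```python
# Sketch: a viewport pixel (x, y), with viewport offset (xo, yo), as a
# flat address in a row-major framebuffer of width fb_width words.

def pixel_address(x, y, xo, yo, fb_width):
    # each display row occupies fb_width consecutive words of memory
    return (yo + y) * fb_width + (xo + x)

print(pixel_address(3, 2, 100, 50, 1280))  # (50 + 2) * 1280 + (100 + 3) = 66663
```

Real systems may pad each row to an alignment boundary, in which case the stride is larger than the visible width; the formula is otherwise unchanged.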