- Floating-point geometry processing: transformations and clipping
- Integer pixel processing: Scan conversion
- Frame buffer bandwidth: Reads/writes
To begin to overcome these barriers, we'll review the standard graphics
pipeline for Gouraud- or Phong-shaded polygons.
- Front-end subsystem
  - Display traversal
  - Modeling transformation
  - Trivial accept/reject test
  - Lighting
  - Viewing transform
  - Clipping
  - Division by w and mapping to viewport
- Back-end subsystem
  - Rasterization
- Display traversal is needed because the model may change between frames
- Feeds all primitives and context information (color,
current transformations) into the remainder of the pipeline
- Two types of traversal
- Immediate mode
- No record of primitives and attributes
- Application regenerates the scene when model changes
- Flexible - display model does not need to conform to a
standard structure
- The main CPU performs immediate mode traversal, rebuilding
the structure from scratch
- Supports a wider range of applications
- Retained mode
- Model stored in a central structure store
- A separate processor can traverse the model
- The main CPU only edits, rather than rebuilds, the display list
- Processing requirements for display traversal depend on
- Traversal method
- The model itself
- At least one read/write for each word of data displayed
- If the structure hierarchy is deep or contains many modeling
transforms, the processing requirements can be large
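A minimal sketch of retained-mode display traversal, assuming a hypothetical structure store built from `Node` objects (each holding a local 4 x 4 modeling transform, a list of primitive names, and child nodes). Traversal concatenates transforms down the hierarchy and feeds each primitive, with its current transform as context, to the rest of the pipeline (here just a list). Between frames the application edits one node instead of rebuilding the model.

```python
from dataclasses import dataclass, field

Matrix = list[list[float]]

def identity() -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translate(tx: float, ty: float, tz: float) -> Matrix:
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

@dataclass
class Node:
    """Hypothetical structure-store node."""
    transform: Matrix
    primitives: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)

def traverse(node: Node, parent: Matrix, out: list) -> None:
    """Depth-first traversal: concatenate transforms, emit (primitive, transform)."""
    current = mat_mul(parent, node.transform)
    for prim in node.primitives:
        out.append((prim, current))
    for child in node.children:
        traverse(child, current, out)

wheel = Node(translate(1.0, 0.0, 0.0), ["wheel"])
car = Node(translate(10.0, 0.0, 0.0), ["body"], [wheel])

frame1: list = []
traverse(car, identity(), frame1)
wheel.transform = translate(2.0, 0.0, 0.0)   # retained mode: edit one node only
frame2: list = []
traverse(car, identity(), frame2)
```

In immediate mode there would be no `car` tree to keep; the application would regenerate every primitive and transform each frame.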
- Modeling transformation: primitives are transformed from object to world coordinates
- A single transformation, the concatenation of the individual modeling transforms, is applied
- One or more surface normals may be transformed as well
(object-space surface normal multiplied by transpose
of the inverse of modeling transform)
- Floating-point calculations for a single vertex:
- Homogeneous point by 4 x 4 matrix = 16 multiplies, 12 adds
- Vertex normal by matrix = 9 multiplies, 6 adds
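The per-vertex costs above can be checked with a small sketch (plain Python lists as matrices; the function names are illustrative). `transform_point` does 16 multiplies and 12 adds. `normal_matrix` computes the transpose of the inverse of the upper-left 3 x 3 once per modeling transform, after which each normal costs 9 multiplies and 6 adds. The example uses a non-uniform scale, where transforming the normal by the modeling matrix itself would give the wrong direction.

```python
Matrix = list[list[float]]

def transform_point(m: Matrix, p: list[float]) -> list[float]:
    # 4 rows x (4 multiplies + 3 adds) = 16 multiplies, 12 adds
    return [m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3] * p[3]
            for i in range(4)]

def normal_matrix(m: Matrix) -> Matrix:
    """Transpose of the inverse of the upper-left 3x3 of m.
    inverse = adj / det and adj = cofactor^T, so inverse^T = cofactor / det."""
    a = [row[:3] for row in m[:3]]
    cof = [[a[(i + 1) % 3][(j + 1) % 3] * a[(i + 2) % 3][(j + 2) % 3]
            - a[(i + 1) % 3][(j + 2) % 3] * a[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    det = sum(a[0][j] * cof[0][j] for j in range(3))
    return [[cof[i][j] / det for j in range(3)] for i in range(3)]

def transform_normal(nm: Matrix, n: list[float]) -> list[float]:
    # 3 rows x (3 multiplies + 2 adds) = 9 multiplies, 6 adds
    return [nm[i][0] * n[0] + nm[i][1] * n[1] + nm[i][2] * n[2] for i in range(3)]

# Non-uniform scale: x is doubled
scale = [[2.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
point = transform_point(scale, [1.0, 2.0, 3.0, 1.0])               # [2.0, 2.0, 3.0, 1.0]
normal = transform_normal(normal_matrix(scale), [1.0, 1.0, 0.0])   # [0.5, 1.0, 0.0]
tangent = transform_point(scale, [1.0, -1.0, 0.0, 0.0])[:3]        # [2.0, -1.0, 0.0]
# The transformed normal stays perpendicular to the transformed tangent
dot = sum(n * t for n, t in zip(normal, tangent))                 # 0.0
```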
- Trivial accept/reject: are primitives wholly inside or wholly outside the view volume?
- Test each world vertex against the 6 bounding planes
- Requires 4 multiplies, 3 adds per plane (24 multiplies, 18 adds per vertex)
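A sketch of the trivial accept/reject test using outcodes, assuming the view volume is given as six homogeneous plane coefficient vectors `(a, b, c, d)`, with a vertex inside a plane when `a*x + b*y + c*z + d*w >= 0` (4 multiplies, 3 adds per plane). The `CUBE` planes, bounding -w <= x, y, z <= w, are an illustrative choice, not part of these notes. A primitive is trivially accepted when every vertex is inside every plane, trivially rejected when all of its vertices lie outside the same plane, and otherwise passed on to clipping.

```python
Plane = tuple[float, float, float, float]
Vertex = tuple[float, float, float, float]

# Illustrative view volume: -w <= x, y, z <= w
CUBE: list[Plane] = [
    (1, 0, 0, 1), (-1, 0, 0, 1),   # x >= -w, x <= w
    (0, 1, 0, 1), (0, -1, 0, 1),   # y >= -w, y <= w
    (0, 0, 1, 1), (0, 0, -1, 1),   # z >= -w, z <= w
]

def outcode(v: Vertex, planes: list[Plane]) -> int:
    """One bit set for each plane the vertex lies outside of."""
    code = 0
    for bit, (a, b, c, d) in enumerate(planes):
        # 4 multiplies, 3 adds per plane: 24 and 18 for six planes
        if a * v[0] + b * v[1] + c * v[2] + d * v[3] < 0:
            code |= 1 << bit
    return code

def classify(vertices: list[Vertex], planes: list[Plane]) -> str:
    codes = [outcode(v, planes) for v in vertices]
    if all(code == 0 for code in codes):
        return "accept"            # wholly inside
    common = codes[0]
    for code in codes[1:]:
        common &= code
    if common:
        return "reject"            # every vertex outside one shared plane
    return "clip"                  # must go through the clipping stage

inside = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
outside = [(2.0, 0.0, 0.0, 1.0), (3.0, 0.0, 0.0, 1.0), (2.0, 1.0, 0.0, 1.0)]
straddling = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 2.0, 0.0, 1.0)]
```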
- Lighting: depending on the shading algorithm, the illumination model
must be evaluated once per polygon (constant), once per vertex (Gouraud), or once per pixel (Phong)
- Constant shading
- Inner product of light vector and surface normal = 3 multiplies,
2 adds
- Attenuation factor = 1 multiply
- For each of R, G, B we multiply diffuse reflectivity by light
intensity by dot product (2 multiplies), multiply ambient
reflectivity by ambient intensity (1 multiply), and add the
results (1 add)
- Total of 3 + 3(2+1) = 12 multiplies (13 if the attenuation multiply
is counted) and 2 + 3(1) = 5 additions
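- Gouraud shading evaluates the model at each vertex: 12 multiplies and 5 adds per vertex
- Phong shading evaluates the model at each pixel, which takes more
multiplies and adds, especially when the specular term is included, but this
work occurs during the rasterization stage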
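A sketch of per-vertex ambient/diffuse evaluation for one light source, as Gouraud shading would use it. The parameter names (`f_att`, `i_p`, `k_d`, and so on) are illustrative; both vectors are assumed to be unit length, and a negative dot product is clamped to zero, a common choice these notes do not discuss. Counting the operations as written gives 3 multiplies and 2 adds for the dot product, 1 multiply for attenuation, and 3 multiplies plus 1 add per color channel.

```python
Vec3 = tuple[float, float, float]
RGB = tuple[float, float, float]

def ambient_diffuse(n: Vec3, l: Vec3, f_att: float,
                    i_p: RGB, i_a: RGB, k_d: RGB, k_a: RGB) -> RGB:
    """I = I_a*k_a + f_att*I_p*k_d*(N . L), evaluated per channel."""
    n_dot_l = n[0] * l[0] + n[1] * l[1] + n[2] * l[2]   # 3 multiplies, 2 adds
    n_dot_l = max(n_dot_l, 0.0)                        # light behind surface: no diffuse
    scaled = f_att * n_dot_l                           # attenuation: 1 multiply
    # Per channel: ambient (1 multiply) + diffuse (2 multiplies) + 1 add
    return tuple(ia * ka + kd * ip * scaled
                 for ia, ka, kd, ip in zip(i_a, k_a, k_d, i_p))

white = (1.0, 1.0, 1.0)
grey = (0.5, 0.5, 0.5)
dim = (0.2, 0.2, 0.2)
facing = ambient_diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 1.0, white, dim, grey, white)
away = ambient_diffuse((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 1.0, white, dim, grey, white)
```

For a surface facing the light, each channel is 0.2 (ambient) + 0.5 (diffuse); for one facing away, only the ambient 0.2 remains.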
- Viewing transformation: world coordinates are transformed to view coordinates
- Uses a single 4 x 4 matrix and requires 16 multiplies
and 12 adds per vertex
- Some terms in the view transform are always zero; taking
advantage of this may reduce the work by 25 per cent
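One common source of the zero terms: when the view matrix is affine, its bottom row is (0, 0, 0, 1), so the w row need not be computed. The sketch below (function names illustrative) skips that row, cutting the cost from 16 multiplies and 12 adds to 12 multiplies and 9 adds, a 25 per cent saving. Matrices with more known zeros allow further savings.

```python
Matrix = list[list[float]]

def transform_full(m: Matrix, p: list[float]) -> list[float]:
    # 4 rows x (4 multiplies + 3 adds) = 16 multiplies, 12 adds
    return [m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3] * p[3]
            for i in range(4)]

def transform_affine(m: Matrix, p: list[float]) -> list[float]:
    """Assumes m[3] == [0, 0, 0, 1], so the output w equals the input w.
    3 rows x (4 multiplies + 3 adds) = 12 multiplies, 9 adds."""
    x, y, z, w = p
    return [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] * w
            for i in range(3)] + [w]

# Illustrative view matrix: a rotation about the y axis, then translate z by -5
view = [[0.0, 0.0, -1.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, -5.0],
        [0.0, 0.0, 0.0, 1.0]]
p = [1.0, 2.0, 3.0, 1.0]
```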
- Clipping: lit primitives are clipped to the view volume to prevent
one screen window from interfering with another
and to prevent underflow or overflow from primitives
passing behind the eye or at a great distance
- Exact clipping is only practical for simple primitives such as
lines and polygons
- Scissoring is used for complex primitives, processing them at the
rasterization stage (a source of inefficiency, since effort is
expended on pixels outside the view volume)
- Clipping is performed in homogeneous coordinates so that, for z clipping,
the w value can be used to recognize vertices behind the eye
- A common assumption is 10 per cent of primitives need clipping
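A sketch of clipping a line segment against one plane in homogeneous coordinates, before the division by w. The plane convention is an assumption: with an OpenGL-style perspective matrix the near plane is z >= -w, i.e. coefficients (0, 0, 1, 1), and a vertex behind the eye (w < 0) falls outside it. The sample points come from a hypothetical projection with near = 1 and far = 3. Clipping here, rather than after dividing, avoids dividing by a w that is zero or negative.

```python
Vec4 = tuple[float, float, float, float]

NEAR: Vec4 = (0.0, 0.0, 1.0, 1.0)   # inside when z + w >= 0 (assumed convention)

def clip_segment(v0: Vec4, v1: Vec4, plane: Vec4):
    """Return the part of segment v0-v1 inside the plane, or None."""
    d0 = sum(p * c for p, c in zip(plane, v0))   # unnormalized signed distances
    d1 = sum(p * c for p, c in zip(plane, v1))
    if d0 >= 0 and d1 >= 0:
        return v0, v1                            # wholly inside
    if d0 < 0 and d1 < 0:
        return None                              # wholly outside
    t = d0 / (d0 - d1)                           # parameter where distance is zero
    hit = tuple(a + t * (b - a) for a, b in zip(v0, v1))
    return (v0, hit) if d0 >= 0 else (hit, v1)

# Clip-space points for eye-space z = -2 (in front) and z = +1 (behind the eye)
front = (0.0, 0.0, 1.0, 2.0)
behind = (0.0, 0.0, -5.0, -1.0)
result = clip_segment(front, behind, NEAR)   # keeps front, new vertex near (0, 0, -1, 1)
```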
- Division by w: divide x, y and z by w = 3 divides per vertex
- Vertex x and y coordinates must be scaled and translated
to 3D viewport = 2 multiplies and 2 adds
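A sketch of the final front-end step, assuming normalized device coordinates run from -1 to 1 and a viewport given by its lower-left corner, width, and height (a hypothetical `Viewport` type). The scale and offset are computed once per viewport, so each vertex costs the 3 divides, 2 multiplies, and 2 adds counted above; z is left in normalized form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Viewport:
    x: float
    y: float
    width: float
    height: float

    def mapping(self) -> tuple[float, float, float, float]:
        """Per-viewport setup: (scale_x, offset_x, scale_y, offset_y)."""
        sx, sy = self.width / 2, self.height / 2
        return sx, self.x + sx, sy, self.y + sy

def to_screen(v: tuple[float, float, float, float],
              mapping: tuple[float, float, float, float]) -> tuple[float, float, float]:
    x, y, z, w = v
    nx, ny, nz = x / w, y / w, z / w           # 3 divides
    sx, ox, sy, oy = mapping
    return nx * sx + ox, ny * sy + oy, nz      # 2 multiplies, 2 adds

m = Viewport(0, 0, 1280, 1024).mapping()
screen = to_screen((1.0, -1.0, 0.5, 2.0), m)   # (960.0, 256.0, 0.25)
```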
- Rasterization (back-end subsystem): transformation of primitives to pixels
- Three subtasks
- Scan conversion
- Visible-surface determination
- Shading
- Scan conversion can be performed in two orders
- Primitive (object) order (leads to z-buffer algorithm)
      for (each primitive P)
          for (each pixel q within P)
              update frame buffer based on visibility of q at P
- Pixel (image) order (leads to scan line algorithm)
      for (each pixel q)
          for (each primitive P covering q)
              update frame buffer based on P's contribution to q
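The two loop orders can be compared on a toy scene. This sketch assumes hypothetical primitives that are axis-aligned rectangles with constant depth (smaller z is nearer) and a single color, which keeps coverage trivial; real scan conversion of polygons is far more involved. Object order keeps a z-buffer; image order finds, for each pixel, the nearest covering primitive. Both produce the same image.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x0: int          # covers x0 <= x < x1 and y0 <= y < y1
    y0: int
    x1: int
    y1: int
    z: float         # constant depth; smaller is nearer
    color: str

    def covers(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

WIDTH, HEIGHT, BACKGROUND = 8, 6, "."

def object_order(prims: list[Rect]) -> list[list[str]]:
    frame = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    zbuf = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    for p in prims:                                          # each primitive P
        for y in range(max(p.y0, 0), min(p.y1, HEIGHT)):     # each pixel q in P
            for x in range(max(p.x0, 0), min(p.x1, WIDTH)):
                if p.z < zbuf[y][x]:                         # is q visible at P?
                    zbuf[y][x] = p.z
                    frame[y][x] = p.color
    return frame

def image_order(prims: list[Rect]) -> list[list[str]]:
    frame = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    for y in range(HEIGHT):
        for x in range(WIDTH):                               # each pixel q
            covering = [p for p in prims if p.covers(x, y)]  # each P covering q
            if covering:
                frame[y][x] = min(covering, key=lambda p: p.z).color
    return frame

scene = [Rect(0, 0, 5, 4, 2.0, "a"), Rect(3, 2, 8, 6, 1.0, "b")]
zbuffered = object_order(scene)
scanned = image_order(scene)
```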
- It is difficult to count the calculations in scan conversion
- Shading
- Constant shading requires no additional calculations
- Gouraud shading requires bilinear interpolation of R, G, B values
(using incremental methods)
- Phong shading requires
- Bilinear interpolation of vertex normals (and rescaling to unit
length: 3 multiplies, 2 adds, and 1 square root for the length, plus 3 divides to rescale)
- Evaluation of Phong illumination model = ambient/diffuse/specular
terms, calculation of reflected vector, exponentiation
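A sketch of the incremental methods behind these shading costs: along one span, Gouraud shading adds a precomputed per-pixel delta to each of R, G, B (3 adds per pixel after setup), while Phong shading must renormalize the interpolated normal at every pixel. Only the span (inner) step of the bilinear interpolation is shown; the edge step works the same way. Function names are illustrative.

```python
import math

Color = tuple[float, float, float]

def gouraud_span(left: Color, right: Color, n: int) -> list[Color]:
    """Colors for n pixels from left to right, inclusive."""
    if n == 1:
        return [left]
    delta = [(r - l) / (n - 1) for l, r in zip(left, right)]      # per-span setup
    c = list(left)
    out = []
    for _ in range(n):
        out.append((c[0], c[1], c[2]))
        c = [c[0] + delta[0], c[1] + delta[1], c[2] + delta[2]]   # 3 adds per pixel
    return out

def renormalize(nx: float, ny: float, nz: float) -> tuple[float, float, float]:
    """Length: 3 multiplies, 2 adds, 1 square root; rescaling: 3 divides."""
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return nx / length, ny / length, nz / length

span = gouraud_span((0.0, 0.0, 1.0), (1.0, 0.5, 1.0), 5)
# Halfway between the unit normals (1, 0, 0) and (0, 1, 0) the interpolated
# normal (0.5, 0.5, 0) is too short and must be rescaled
halfway = renormalize(0.5, 0.5, 0.0)
```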
- Visible-surface determination (z-buffer): for each covered pixel, z is calculated
(1 add using increments)
- For each covered pixel, a z value is read (1 frame buffer cycle)
- Current z is compared to stored z (1 subtract)
- Newly visible pixels require updated RGB values (3 adds)
- For newly visible pixels, z and RGB values are written (2 cycles)
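The per-pixel z-buffer costs listed above can be tallied with a small sketch. It charges operations according to the cost model in these notes (1 add for the incremental z, 1 read, 1 subtract for the compare, then 3 adds for RGB and 2 writes only when the pixel is newly visible) rather than measuring real hardware; `Cost` and `zbuffer_write` are hypothetical names, and the RGB value is a packed 32-bit integer for illustration.

```python
from dataclasses import dataclass

@dataclass
class Cost:
    int_adds: int = 0      # integer adds and subtracts
    fb_cycles: int = 0     # frame-buffer / z-buffer memory accesses

def zbuffer_write(x: int, y: int, z: int, rgb: int,
                  zbuf: list[list[int]], frame: list[list[int]], cost: Cost) -> bool:
    """Process one covered pixel; return True if it is newly visible."""
    cost.int_adds += 1                 # incremental z for this pixel (charged per the model)
    stored = zbuf[y][x]
    cost.fb_cycles += 1                # read stored z
    cost.int_adds += 1                 # compare: current z - stored z
    if z - stored < 0:                 # nearer than what is stored
        cost.int_adds += 3             # RGB increments (charged per the model)
        zbuf[y][x] = z
        frame[y][x] = rgb
        cost.fb_cycles += 2            # write z and RGB (32 bits each)
        return True
    return False

FAR = 2**32 - 1
zbuf = [[FAR] * 4 for _ in range(4)]
frame = [[0] * 4 for _ in range(4)]
cost = Cost()
first = zbuffer_write(1, 1, 100, 0xFF0000, zbuf, frame, cost)    # visible: 5 adds, 3 cycles
second = zbuffer_write(1, 1, 200, 0x00FF00, zbuf, frame, cost)   # obscured: 2 adds, 1 cycle
```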
- Performance requirements of a sample application: assume a database of
10,000 triangles, each covering 100 pixels on average
- For simplicity, assume no primitives need clipping and 1/2
of the pixels of each triangle are obscured by another triangle
- Assume ambient/diffuse illumination and Gouraud shading
- Assume 1280 x 1024 display screen updated 30 frames per second
- For each frame, must process 3(10,000) = 30,000 vertices and
normal vectors
- Modeling stage requires: 25(30,000) = 750,000 multiplies
and 18(30,000) = 540,000 adds
- Trivial accept/reject stage requires:
24(30,000) = 720,000 multiplies and 18(30,000) = 540,000 adds
- Lighting stage requires:
12(30,000) = 360,000 multiplies and 5(30,000) = 150,000 adds
- View transform stage requires:
8(30,000) = 240,000 multiplies and 6(30,000) = 180,000 adds
- Clipping stage is ignored (no primitives need clipping)
- Division by w and mapping to viewport requires:
3(30,000) = 90,000 divisions, 2(30,000) = 60,000 multiplies
and 2(30,000) = 60,000 adds
- Total of 2,220,000 multiplies/divides, 1,470,000 adds per frame
- Or 66.6 million multiplies/divides, 44.1 million adds per second
(or 110.7 MFLOPS)
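The front-end totals can be reproduced with a few lines of arithmetic; the per-vertex counts are taken from the stage list above (the divide-by-w entry counts 3 divides plus 2 multiplies).

```python
VERTICES = 3 * 10_000          # independent triangles: 3 vertices each
FPS = 30

# (multiplies or divides, adds) per vertex, from the stage list
STAGES = {
    "modeling": (25, 18),
    "trivial accept/reject": (24, 18),
    "lighting": (12, 5),
    "view transform": (8, 6),
    "divide by w, viewport": (3 + 2, 2),
}

mul_div_per_frame = sum(m for m, _ in STAGES.values()) * VERTICES
adds_per_frame = sum(a for _, a in STAGES.values()) * VERTICES
flops_per_second = (mul_div_per_frame + adds_per_frame) * FPS

print(mul_div_per_frame, adds_per_frame)   # 2220000 1470000
print(flops_per_second)                    # 110700000
```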
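- Assume z values and RGB triples each occupy 32 bits (one word of
z-buffer or frame buffer memory), so each read or write is one access
- Assume 3/4 of pixels are initially visible (consistent with half of each
triangle being obscured: if a pixel lies under two triangles drawn in random
order, the first is visible when drawn and the second is visible half the time)
(3/4 x 100 x 10,000 = 750,000 visible pixels and
1/4 x 100 x 10,000 = 250,000 invisible pixels)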
- To display the frame requires
5(750,000) + 2(250,000) = 4.25 million integer adds and
3(750,000) + 1(250,000) = 2.5 million frame buffer accesses
- To initialize the frame and z buffer another
1280 x 1024 x 2 = 2.6 million frame buffer accesses
- If 30 frames are generated per second, 127.5 million integer adds and
153 million frame buffer cycles are required per second
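The same check for the back end. Without rounding, 1280 x 1024 x 2 is 2,621,440, so the frame-buffer total comes to about 153.6 million cycles per second.

```python
TRIANGLES, PIXELS_PER_TRIANGLE, FPS = 10_000, 100, 30
WIDTH, HEIGHT = 1280, 1024

covered = TRIANGLES * PIXELS_PER_TRIANGLE   # pixels touched per frame
visible = covered * 3 // 4                  # newly visible when processed
obscured = covered - visible

int_adds = 5 * visible + 2 * obscured       # per frame
fb_draw = 3 * visible + 1 * obscured        # per frame
fb_clear = WIDTH * HEIGHT * 2               # initialize z-buffer and frame buffer

print(int_adds * FPS)                  # 127500000
print((fb_draw + fb_clear) * FPS)      # 153643200
```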
- In 1996 the fastest floating-point processors compute about 100 MFLOPS
- In 1996 the fastest integer processors compute about 300 MIPS
- In 1996 the fastest DRAMs have cycle times of about 50 nanoseconds,
or 20 million cycles per second
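- Thus, for this simple database (with many simplifying assumptions),
we are just at the capability of a high-performance machine: the
floating-point load slightly exceeds one 100-MFLOPS processor, the
integer load fits within one 300-MIPS processor, and the frame buffer
traffic is roughly 8 times what a single 20-million-cycle-per-second DRAM can supply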
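Dividing the requirements by the 1996 figures quoted above makes the comparison concrete. This sketch uses the unrounded 153.6 million frame-buffer cycles; the bank count assumes the memory traffic can be spread evenly over parallel DRAMs (for example, by interleaving), which is an illustration rather than a design from these notes.

```python
import math

required = {"floating point (MFLOPS)": 110.7,
            "integer (MIPS)": 127.5,
            "frame buffer (M cycles/s)": 153.6432}
available = {"floating point (MFLOPS)": 100.0,
             "integer (MIPS)": 300.0,
             "frame buffer (M cycles/s)": 20.0}

ratios = {name: required[name] / available[name] for name in required}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} x one 1996 part")

# Parallel DRAMs needed if accesses can be spread evenly across them
dram_banks = math.ceil(ratios["frame buffer (M cycles/s)"])
```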
Florida Tech Computer Science
William D. Shoaff
Comments to author: wds@cs.fit.edu
All contents copyright ©, William D. Shoaff
Revised: Tue Sep 24 13:26:40 EST 1996