Last Goat Standing - Part 2

April 02, 2021

Last Goat Standing is an effect driven demo - this is an overview of said effects.

Introduction

As a coder I really do like to write effects. I also don’t see the point of showing the same effect in 10 different variations. That’s why most of our demos (at least the ones I write) only have one version of any given effect.

Side note: My target for all effects was 25fps. If an effect ran at around 10-15fps after fiddling with parameters and a decent round of C optimization, I decided it was worth keeping and optimizing in assembler.

Effects (order of appearance)

  • The Fluids text dissolver (intro) uses 16 steps of directional vector data, changing every 4th frame. The directional data is calculated offline.
  • The font writer uses a 16x16 pixel “block” font; the drawing routine implements individual character settings to achieve simple kerning and other effects. Used throughout the demo.
  • The Credits use the contour from a TTF font (Arial?) divided into several segments
  • Landscape with scaling pillars. Scaling the pillars is achieved by looking for special markers in the heightmap and replacing each marker with the height value from a function.
  • The picture-in-picture zoomer uses filtering to avoid graining when a pixel becomes an image. The blending is achieved with a linear gradient of 256 grey steps and no lookup table. Each pixel could be mapped to a different picture, but we lacked the graphics (without recycling).
  • Twist-scroller is texture mapped with a mask texture
  • The particles have 7 bits of alpha and can have 16 shades; the picture has 32 colors. Blur and blending are done on the fly in one pass using a 128 x 16 x 32 blending matrix
  • Big up-scrolling picture has 4 levels of sub-pixel detail, each level has its own palette
  • Raycast tunnel is using 4 light cones and the ‘holes’ are made by traversing a hitmap as well as the texture (so dual texturing + light interpolation)
  • The Corona effect (greets) is a simple fire effect drawn with a simplified blob lookup table
  • Rotating voxel cylinder is textured and lit (Gouraud shading)
  • Voxel blob with 3 axis rotation is lit but not textured. A normal map is used to achieve the light effect - it is by no means correct but looks quite ok.
  • End-part is not voxel based, it is using 32 textured and lit spans per scanline with Z-Buffer
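
The marker replacement used for the scaling pillars can be sketched roughly like this. It is a minimal sketch and not the demo's code: the marker value `PILLAR_MARKER`, the function `pillar_height()` and its triangle-wave animation are all hypothetical, chosen only to illustrate the idea of reserving one heightmap value and substituting it at runtime.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical marker: a height value reserved to mean "a pillar goes here". */
#define PILLAR_MARKER 255

/* Hypothetical height function: a triangle wave over time, range 0..127. */
static uint8_t pillar_height(int frame) {
    int t = frame & 255;                      /* 0..255 */
    return (uint8_t)(t < 128 ? t : 255 - t);
}

/* Copy the heightmap, substituting every marker with the animated height. */
static void resolve_pillars(uint8_t *dst, const uint8_t *src, size_t n, int frame) {
    uint8_t h = pillar_height(frame);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (src[i] == PILLAR_MARKER) ? h : src[i];
    }
}
```

The cost is one reserved height value, which the landscape itself can then no longer use.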

Stuff/Tricks worth mentioning

The voxel stuff is described separately but here are some tricks used in the various parts.

Raycast Tunnel

A long time ago (mid 90’s) Krikkit was playing with raytracing and implemented an effect I have always liked. In order to create complex objects he used a hitmap to describe whether there was a hit or not. I thought this could be a good way to enhance the regular raycast tunnel effect seen in plenty of demos. I also thought it would be neat if you could see the “outside” of the tunnel through the holes. But this never looked that good, and I also didn’t achieve the framerate I wished for. I simply couldn’t optimize the dual-texturing + lightsource interpolation well enough on the Amiga.

There are actually four light sources, each originating from the same point and going off in a different direction (the light vectors are rotated). Each light source is described as a cone, and the light cone is properly calculated.

The full light calculation is here:

// intersection along Z-Axis
static float intersect(float *ray, float r) {
/*
    NOTE: Assuming camera is fixed at '0,0,0' this eliminates a few things

    Solving the 2nd order equation using:
    t^2 + pt + q = 0;       t = -p/2 +/- sqrt((p/2)^2 - q)    <some cool formula>
                                                                         | discriminant  |
    ....gives...
*/
    float a = (ray[0]*ray[0] + ray[1] * ray[1]); // a = dx^2 + dy^2
    float q = (r*r)/a;                       // Just for clarity right now

    // This is costly...
    float dist = sqrt(q);
    return dist;
}

static __inline float calc_light(float *hit, float ht, float r, float *lpos, float *dir) {
    // calculate light as a cone
    static const float cone_r = 80;

    float tmp[3],tmp2[3];
    float cone_dist = vDot(vSub(tmp, hit, lpos), dir);
    if (cone_dist < 0) {
        return 0;
    }
    float cr = (cone_dist / r) * cone_r;
    float orth_dist = vAbs(vSub(tmp, tmp,vMul(tmp2, dir, cone_dist)));

    if (orth_dist < cr) {
        float cdfac = 1.0f;
        float dist_fac = orth_dist / cr;
        return ((LIGHT_STEPS-1) * cdfac * (1.0 - dist_fac));
    }
    return 0;
}

static __inline float check_one_ray(float *hit, float *ray, float r, float *light, float lfac) {
    static float mid[3];
    static float lv[3];
    static float n[3];
    static float l;

    float ht = intersect(ray,r);
    vMul(hit, ray, ht/8);

    if (ht > 48) {
        return 0;
    }

    l = LIGHT_AMBIENT * (1.0 - ht/48.0);

    for(int i=0;i<NUM_LIGHT_DIR-1;i++) {
        l += calc_light(hit, ht, r, light, &light_dir_dst[3 * i]);
    }
 
    if (l < 0) {
        l = 0;
    }

    l += lfac;

    if (l > (LIGHT_STEPS-1)) {
        l = LIGHT_STEPS-1;
    }

    return l;
}

Due to the fairly costly light calculations the tunnel is drawn in 16x16 blocks. It uses dual-texture mapping (hitmap + texture) and shading. In the end only 3 values are interpolated, as the hitmap/texture values are aligned.
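
The block approach can be sketched as follows: the expensive ray and light calculation runs only at the corners of each 16x16 block, and everything in between is linearly interpolated. This is a minimal sketch and not the demo's routine. The function name `interp_block`, the 16.16 fixed-point format and the output layout are assumptions, and the same function would be called once per interpolated value (my reading is that the three values are U, V and light).

```c
#include <stdint.h>

#define BLOCK 16

/* Fill one BLOCK x BLOCK tile of `out` (row stride `stride`, in elements) by
   bilinear interpolation of four corner values given in 16.16 fixed point. */
static void interp_block(int32_t *out, int stride,
                         int32_t tl, int32_t tr, int32_t bl, int32_t br) {
    int32_t left = tl;
    int32_t right = tr;
    int32_t dleft = (bl - tl) / BLOCK;     /* step down the left edge  */
    int32_t dright = (br - tr) / BLOCK;    /* step down the right edge */

    for (int y = 0; y < BLOCK; y++) {
        int32_t v = left;
        int32_t dv = (right - left) / BLOCK;   /* step along this row */
        for (int x = 0; x < BLOCK; x++) {
            out[y * stride + x] = v;
            v += dv;
        }
        left += dleft;
        right += dright;
    }
}
```

Per pixel this leaves one addition per interpolated value, which is why keeping the number of interpolated values low matters.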

Picture in Picture

This is an old routine. It started out a long time ago when Zyrax/Obscure sent me an effect preview and I thought it was really cool. I implemented it at the beginning of 2000, ported it to HW acceleration and used it in our Remedy 2003 demo ‘Schism’ (https://www.pouet.net/prod.php?which=10179).

Basically this is a bilinear scaler which uses the fractional part as U/V offsets into the same (or another) picture. However, that is not quite enough as you will get horrible graining. Instead you must x-fade between using the fractional parts and the integer parts in your scaling routine, based on the scaling factor. And this is where it becomes costly.

Actually, I never got around to converting this to assembler. It would probably have benefitted quite a bit. There is even a comment in the code saying I should - but in the end there were just too many ‘todos’.

The scanline code looks like this:

// TODO: Convert to Amiga ASM
static void dozoom_scanline_fix(uint8_t *scanline, int width, GOA_PIXMAP8 *src, GOA_PIXMAP8 *inner, int blendfactor, int invblend, int u_fix, int dx_fix, int vi, int vs) {
    int ui;
    int us;
    uint8_t cs, cf, ci;

    // do scanline
    for(int x = 0; x < width; x++) {

        // this should fit
        ui = ((u_fix & 65535) * inner->width) >> 16;
        ci = inner->image[ui + vi];

        us = u_fix >> 16;
        cs = src->image[vs + us];

        cf = ((ci *invblend) + (cs * blendfactor))>>8;

        scanline[x] = cf;

        u_fix+=dx_fix;
    }
}
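
For context, here is a rough sketch of how such a scanline routine might be driven for a whole frame. None of this is from the demo: the `GOA_PIXMAP8` layout (including the `height` field), the `dozoom_frame` name and the way `vi`/`vs` are derived are assumptions. The scanline routine is repeated in compact form only so the sketch compiles on its own. How `blendfactor` follows from the zoom factor is left to the caller.

```c
#include <stdint.h>

/* Hypothetical minimal pixmap; the real GOA_PIXMAP8 is not shown in the post. */
typedef struct {
    int width;
    int height;
    uint8_t *image;   /* width * height 8-bit pixels, row by row */
} GOA_PIXMAP8;

/* The scanline routine from the listing above, in compact form. */
static void dozoom_scanline_fix(uint8_t *scanline, int width, GOA_PIXMAP8 *src,
                                GOA_PIXMAP8 *inner, int blendfactor, int invblend,
                                int u_fix, int dx_fix, int vi, int vs) {
    for (int x = 0; x < width; x++) {
        int ui = ((u_fix & 65535) * inner->width) >> 16;  /* fraction -> inner U */
        int us = u_fix >> 16;                             /* integer -> source U */
        uint8_t ci = inner->image[ui + vi];
        uint8_t cs = src->image[vs + us];
        scanline[x] = (uint8_t)(((ci * invblend) + (cs * blendfactor)) >> 8);
        u_fix += dx_fix;
    }
}

/* Hypothetical frame driver: V is split the same way as U, once per row.
   blendfactor is 0..256, where 256 shows only the source picture. */
static void dozoom_frame(uint8_t *dst, int width, int height,
                         GOA_PIXMAP8 *src, GOA_PIXMAP8 *inner,
                         int blendfactor, int u0_fix, int v0_fix, int d_fix) {
    int invblend = 256 - blendfactor;
    int v_fix = v0_fix;
    for (int y = 0; y < height; y++) {
        /* integer part selects the source row, fraction the inner row */
        int vs = (v_fix >> 16) * src->width;
        int vi = (((v_fix & 65535) * inner->height) >> 16) * inner->width;
        dozoom_scanline_fix(dst + y * width, width, src, inner,
                            blendfactor, invblend, u0_fix, d_fix, vi, vs);
        v_fix += d_fix;
    }
}
```

The caller has to keep `u0_fix`, `v0_fix` and `d_fix` within the source picture, since the sketch does no clipping.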

End Part

Based on a height map. Each scanline is drawn using 32 spans. The spans are stored in a lookup table (32 * 256 rotational steps). Each vertex grabs a value from the height map to be used as the radius of the cylinder. The span is then drawn using a 32-bit scanline-only Z-buffer. To avoid clearing the Z-buffer a Z-offset is calculated per scanline. The radius range goes from -32..+32 in the Z-direction. For each scanline 64 is added to the Z-value. This gives a range of 0..2895102 values per frame. Given that a 32-bit unsigned integer can hold roughly 4.29 billion distinct values, this leaves enough show-time to last about a minute at 25fps.
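
The no-clear Z-buffer idea can be sketched like this. It is a minimal sketch under assumptions, not the demo's code: the scanline width, the function names and the "larger value is closer" convention are all hypothetical. Only the range of -32..+32 and the step of 64 per scanline come from the description above.

```c
#include <stdint.h>

#define WIDTH   320   /* hypothetical scanline width */
#define Z_RANGE 64    /* depth spans -32..+32, so each scanline advances by 64 */

static uint32_t zbuf[WIDTH];   /* a single scanline, zeroed once and never cleared */
static uint32_t z_base = 0;    /* grows by Z_RANGE for every scanline drawn */

/* Call once before drawing each scanline. */
static void begin_scanline(void) {
    z_base += Z_RANGE;
}

/* z is in -32..+32, larger meaning closer to the viewer (assumed convention).
   Returns 1 if the pixel is visible and records its depth, 0 if it is hidden. */
static int z_test_and_set(int x, int z) {
    uint32_t zv = z_base + (uint32_t)(z + 32);
    /* Values left over from the previous scanline are at most z_base,
       so they can never hide a pixel of the current scanline. */
    if (zv >= zbuf[x]) {
        zbuf[x] = zv;
        return 1;
    }
    return 0;
}
```

The comparison is `>=` rather than `>` so that the closest possible pixel of one scanline (+32) does not block the farthest possible pixel (-32) of the next.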

Lighting is calculated on a per-vertex level using the vertex as a normal (it is a directional vector from the center of the cylinder). Since the light source is placed directly in front of the cylinder this is a single multiplication.

The texture mapping only traverses the U direction (the V direction is increased on a per-scanline basis), which means the inner loop could use the 5 instr. texturing trick (it didn’t - I was well above 25fps anyway so I stopped).
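
Why a single multiplication is enough can be sketched as follows. With the light straight in front, the vector towards the light is (0, 0, 1) in an assumed coordinate system, so the dot product with the normal collapses to the normal's Z component. The function name, the 8-bit fixed-point format and `LIGHT_MAX` are hypothetical and not taken from the demo.

```c
/* Hypothetical brightest shade level. */
#define LIGHT_MAX 63

/* nz_fix: Z component of the unit vertex normal in 8-bit fixed point
   (256 == 1.0), positive when the vertex faces the light. */
static int vertex_light(int nz_fix) {
    if (nz_fix <= 0) {
        return 0;                        /* facing away from the light */
    }
    return (nz_fix * LIGHT_MAX) >> 8;    /* n . (0,0,1) scaled to shade levels */
}
```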


Written by Fredrik Kling. I live and work in Switzerland.