Last Goat Standing - Part 2

April 02, 2021

Last Goat Standing is an effect-driven demo; this is an overview of those effects.

Introduction

As a coder I really do like to write effects. I also don’t see the point in showing the same effect in 10 different variations. That’s why most of our demos (at least the ones I write) only have one version of a certain effect.

Side note: my target for all effects was 25fps. If an effect ran at around 10-15fps after fiddling with parameters and some decent C-level optimization, I decided it was worth keeping and optimized it in assembler.

Effects (order of appearance)

The Fluids text dissolver (intro) uses 16 steps of directional vector data, changing every 4th frame. The directional data is calculated offline.
The font writer uses a 16x16-pixel “block” font; the drawing routine supports per-character settings to achieve simple kerning and other effects. Used throughout the demo.
The credits use the contour of a TTF font (Arial?) divided into several segments.
Landscape with scaling pillars. The pillars are scaled by looking for special markers in the heightmap and replacing each marker with the height value from a function.
The picture-in-picture zoomer uses filtering to avoid graining when a pixel becomes an image. The blending is achieved by using a linear gradient of 256 grey steps, with no lookup table. Each pixel could be mapped to a different picture, but we lacked the graphics (without recycling).
The twist-scroller is texture-mapped with a mask texture.
The particles have 7 bits of alpha and 16 shades; the picture has 32 colors. Blur and blending are done on the fly in one pass using a 128 x 16 x 32 blending matrix.
The big up-scrolling picture has 4 levels of sub-pixel detail, each level with its own palette.
The raycast tunnel uses 4 light cones, and the ‘holes’ are made by traversing a hitmap as well as the texture (so dual texturing + light interpolation).
The Corona effect (greets) is a simple fire effect drawn with a simplified blob lookup table.
The rotating voxel cylinder is textured and lit (Gouraud shading).
The voxel blob with 3-axis rotation is lit but not textured. A normal map is used to achieve the light effect; it is by no means correct but looks quite OK.
The end part is not voxel-based; it uses 32 textured and lit spans per scanline with a Z-buffer.
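The pillar trick from the landscape can be sketched as follows. This is a hypothetical sketch, not the demo's code: the marker range (`PILLAR_MARKER_BASE` = 0xF0, low nibble = pillar index) and the triangle-wave `pillar_height()` are made up for illustration. Markers are resolved at lookup time here; patching a copy of the heightmap once per frame would be an alternative.

```c
#include <stdint.h>

/* Hypothetical convention: 0x00..0xEF are terrain heights, 0xF0..0xFF are
   pillar markers whose low nibble selects one of 16 pillars. */
#define PILLAR_MARKER_BASE 0xF0

/* Hypothetical animation function: a triangle wave per pillar, each pillar
   with its own phase. Always returns 0x20..0x7D, so never a marker value. */
static uint8_t pillar_height(int pillar, int frame) {
    int t = (frame * 2 + pillar * 16) & 63;   /* 0..63 */
    int tri = t < 32 ? t : 63 - t;            /* 0 up to 31 and back down */
    return (uint8_t)(0x20 + tri * 3);
}

/* Height used when rendering map cell 'index': terrain values pass through,
   marker values are replaced by the animated pillar height. */
static uint8_t sample_height(const uint8_t *heightmap, int index, int frame) {
    uint8_t h = heightmap[index];
    if (h >= PILLAR_MARKER_BASE) {
        return pillar_height(h - PILLAR_MARKER_BASE, frame);
    }
    return h;
}
```

The renderer then calls `sample_height()` wherever it would otherwise read the heightmap directly, so the landscape code itself does not need to know about pillars.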

Stuff/Tricks worth mentioning

The voxel stuff is described separately but here are some tricks used in the various parts.

Raycast Tunnel

A long time ago (in the mid-90s) Krikkit was playing with raytracing and implemented an effect I have always liked. To create complex objects he used a hitmap to describe whether there was a hit or not. I thought this could be a good way to enhance the regular raycast tunnel effect seen in plenty of demos. I also thought it would be neat if you could see the “outside” of the tunnel through the holes, but this never looked that good. I also didn’t achieve the framerate I wished for: I simply couldn’t optimize the dual-texturing + light-source interpolation enough on the Amiga.

There are actually four light sources, each originating from the same point and going off in a different direction (the light vectors are rotated). Each light source is described as a cone, and the light cone is calculated properly.

The full light calculation is here:

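#include <math.h>

// Intersection of a ray from the origin with a cylinder of radius r around the Z-axis.
// Returns t, the ray parameter at the hit (the distance along the ray if the ray is normalized).
static float intersect(float *ray, float r) {
/*
    NOTE: Assuming the camera is fixed at (0,0,0), a few terms vanish.

    Ray:       P(t) = t * (dx, dy, dz)
    Cylinder:  x^2 + y^2 = r^2

    Substituting gives t^2 * (dx^2 + dy^2) - r^2 = 0, i.e. the 2nd order equation
        t^2 + p*t + q = 0     with p = 0 and q = -r^2 / (dx^2 + dy^2)
    Solving it using:
        t = -p/2 +/- sqrt((p/2)^2 - q)      where (p/2)^2 - q is the discriminant
    ....gives (taking the positive root)...
        t = sqrt(r^2 / (dx^2 + dy^2))
*/
    float a = (ray[0]*ray[0] + ray[1]*ray[1]);   // a = dx^2 + dy^2
    float q = (r*r)/a;                           // the discriminant (p/2)^2 - q, since p = 0

    // This is costly...
    float dist = sqrt(q);
    return dist;
}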

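static __inline float calc_light(float *hit, float ht, float r, float *lpos, float *dir) {
    // calculate light as a cone with its apex at lpos, pointing along dir (assumed unit length)
    static const float cone_r = 80;   // cone radius at distance r from the apex

    float tmp[3],tmp2[3];
    // distance of the hit point along the cone axis
    float cone_dist = vDot(vSub(tmp, hit, lpos), dir);
    if (cone_dist < 0) {
        return 0;                     // behind the light
    }
    // radius of the cone at that distance (the cone widens linearly)
    float cr = (cone_dist / r) * cone_r;
    // distance from the hit point to the cone axis
    float orth_dist = vAbs(vSub(tmp, tmp,vMul(tmp2, dir, cone_dist)));

    if (orth_dist < cr) {
        float cdfac = 1.0f;               // constant 1.0: no distance falloff is applied
        float dist_fac = orth_dist / cr;  // 0 on the axis, 1 at the cone edge
        return ((LIGHT_STEPS-1) * cdfac * (1.0 - dist_fac));
    }
    return 0;                         // outside the cone
}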
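// Cast one ray through the tunnel: find the hit point and sum up its light.
static __inline float check_one_ray(float *hit, float *ray, float r, float *light, float lfac) {
    static float mid[3];   // unused
    static float lv[3];    // unused
    static float n[3];     // unused
    static float l;

    float ht = intersect(ray,r);    // ray parameter at the tunnel wall
    vMul(hit, ray, ht/8);           // hit point, scaled down by 8

    if (ht > 48) {
        return 0;                   // too far away: no light
    }

    // ambient light that fades out with distance
    l = LIGHT_AMBIENT * (1.0 - ht/48.0);

    // add the contribution of each rotated light cone
    for(int i=0;i<NUM_LIGHT_DIR-1;i++) {
        l += calc_light(hit, ht, r, light, &light_dir_dst[3 * i]);
    }

    if (l < 0) {
        l = 0;
    }

    l += lfac;                      // caller-supplied light offset

    // clamp to the available shade levels
    if (l > (LIGHT_STEPS-1)) {
        l = LIGHT_STEPS-1;
    }

    return l;
}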

Because the light calculations are fairly costly, the tunnel is drawn in 16x16 blocks. It uses dual-texture mapping (hitmap + texture) and shading. In the end only 3 values need to be interpolated, because the hitmap and texture coordinates are aligned.
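The block scheme can be sketched as follows. This is a hypothetical sketch, not the demo's code: it assumes rays are cast only at the block corners and that their three values (U, V, light) are stored in 8.8 fixed point, plus a 256x256 hitmap and texture, 16 shade levels and a made-up palette layout of 16 shades x 16 texture colours. `Corner`, `lerp_fix()`, `draw_block()` and `HOLE_COLOR` are illustrative names. Because the hitmap and texture share U/V, each pixel does both lookups at the same index, and a zero in the hitmap shows a hole.

```c
#include <stdint.h>

#define BLOCK       16    /* block size used by the tunnel */
#define TEX_SIZE    256   /* hypothetical: 256x256 hitmap and texture */
#define LIGHT_STEPS 16    /* hypothetical: number of shade levels */
#define HOLE_COLOR  0     /* hypothetical: palette index shown through a hole */

/* The three values raycast at each block corner, in 8.8 fixed point. */
typedef struct {
    int32_t u, v, light;
} Corner;

/* Value at step i of n between a and b. */
static int32_t lerp_fix(int32_t a, int32_t b, int i, int n) {
    return a + (b - a) * i / n;
}

/* Fill one BLOCK x BLOCK block of dst from its four corners
   (tl = top-left, tr = top-right, bl = bottom-left, br = bottom-right). */
static void draw_block(uint8_t *dst, int pitch,
                       Corner tl, Corner tr, Corner bl, Corner br,
                       const uint8_t *hitmap, const uint8_t *texture) {
    for (int y = 0; y < BLOCK; y++) {
        /* Left and right edge values for this row. */
        int32_t lu = lerp_fix(tl.u, bl.u, y, BLOCK), ru = lerp_fix(tr.u, br.u, y, BLOCK);
        int32_t lv = lerp_fix(tl.v, bl.v, y, BLOCK), rv = lerp_fix(tr.v, br.v, y, BLOCK);
        int32_t ll = lerp_fix(tl.light, bl.light, y, BLOCK);
        int32_t rl = lerp_fix(tr.light, br.light, y, BLOCK);

        /* Step across the row: only u, v and light are interpolated. */
        int32_t du = (ru - lu) / BLOCK, dv = (rv - lv) / BLOCK, dl = (rl - ll) / BLOCK;
        int32_t u = lu, v = lv, l = ll;

        for (int x = 0; x < BLOCK; x++) {
            /* Hitmap and texture share the same (wrapped) texel index. */
            int idx = ((v >> 8) & (TEX_SIZE - 1)) * TEX_SIZE + ((u >> 8) & (TEX_SIZE - 1));
            uint8_t out = HOLE_COLOR;

            if (hitmap[idx]) {
                int shade = l >> 8;
                if (shade < 0) shade = 0;
                if (shade > LIGHT_STEPS - 1) shade = LIGHT_STEPS - 1;
                /* Hypothetical palette: 16 shades x 16 texture colours. */
                out = (uint8_t)(shade * 16 + (texture[idx] & 15));
            }
            dst[y * pitch + x] = out;
            u += du;
            v += dv;
            l += dl;
        }
    }
}
```

Only the corners go through the expensive `check_one_ray()`-style calculation; everything inside a block costs a few additions and two table reads per pixel.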

Picture in Picture

This is an old routine. It started out a long time ago when Zyrax/Obscure sent me an effect preview and I thought it was really cool. I implemented it at the beginning of 2000, ported it to hardware acceleration and used it in our Remedy 2003 demo ‘Schism’ (https://www.pouet.net/prod.php?which=10179).

Basically this is a bilinear scaler that uses the fractional part as U/V offsets into the same (or another) picture. However, that alone is not enough, as you get horrible graining. Instead you must cross-fade between the picture sampled with the fractional parts and the one sampled with the integer parts in your scaling routine, based on the scaling factor. And this is where it becomes costly.

Actually, I never got around to converting this to assembler. It would probably have benefited quite a bit. There is even a comment in the code saying I should, but in the end there were just too many ‘todos’.

The scanline code looks like this:

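// TODO: Convert to Amiga ASM
// Draws one scanline of the picture-in-picture zoomer.
//   u_fix, dx_fix : U position in src and its step per pixel, 16.16 fixed point
//   vs            : row offset into src (row * width), from the integer part of V
//   vi            : row offset into inner (by analogy with ui, from the fractional part of V)
//   blendfactor   : weight of the src pixel; invblend: weight of the inner pixel
//                   (the result is shifted right by 8, so they should add up to 256)
static void dozoom_scanline_fix(uint8_t *scanline, int width, GOA_PIXMAP8 *src, GOA_PIXMAP8 *inner, int blendfactor, int invblend, int u_fix, int dx_fix, int vi, int vs) {
    int ui;
    int us;
    uint8_t cs, cf, ci;

    // do scanline
    for(int x = 0; x < width; x++) {

        // fractional part of U picks the column in the inner picture;
        // (u_fix & 65535) * width fits in a 32-bit int for widths below 32768
        ui = ((u_fix & 65535) * inner->width) >> 16;
        ci = inner->image[ui + vi];

        // integer part of U picks the pixel in the source picture
        us = u_fix >> 16;
        cs = src->image[vs + us];

        // cross-fade the two; valid because the palette is a linear grey ramp
        cf = ((ci *invblend) + (cs * blendfactor))>>8;

        scanline[x] = cf;

        u_fix+=dx_fix;
    }
}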
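The scanline routine leaves out the per-frame setup. The sketch below fills that in under stated assumptions: `GOA_PIXMAP8` is a hypothetical stand-in for the demo's pixmap type, `dozoom_frame()` and the `FADE_START`/`FADE_END` thresholds are invented for illustration, and the blend factor is a simple linear ramp on how many screen pixels one source pixel covers. It relies on the linear 256-step grey gradient from the effects list, which is what lets pixel values be blended arithmetically without a lookup table.

```c
#include <stdint.h>

/* Hypothetical stand-in for the demo's GOA_PIXMAP8: an 8-bit image whose
   values index a linear 256-step grey palette. */
typedef struct {
    int width, height;
    uint8_t *image;
} GOA_PIXMAP8;

/* Condensed copy of dozoom_scanline_fix() so this block compiles on its own. */
static void dozoom_scanline_fix(uint8_t *scanline, int width, GOA_PIXMAP8 *src,
                                GOA_PIXMAP8 *inner, int blendfactor, int invblend,
                                int u_fix, int dx_fix, int vi, int vs) {
    for (int x = 0; x < width; x++) {
        int ui = ((u_fix & 65535) * inner->width) >> 16;   /* fractional U -> inner column */
        uint8_t ci = inner->image[ui + vi];
        uint8_t cs = src->image[vs + (u_fix >> 16)];       /* integer U -> source pixel */
        scanline[x] = (uint8_t)((ci * invblend + cs * blendfactor) >> 8);
        u_fix += dx_fix;
    }
}

/* Hypothetical fade thresholds, in screen pixels per source pixel. */
#define FADE_START 4    /* at or below: source picture only */
#define FADE_END   36   /* at or above: inner pictures only */

/* Hypothetical frame driver. u0_fix/v0_fix is the top-left position in src
   and step_fix the source step per screen pixel, all 16.16 fixed point.
   step_fix must be > 0 and the visible area must stay inside src. */
static void dozoom_frame(uint8_t *screen, int width, int height,
                         GOA_PIXMAP8 *src, GOA_PIXMAP8 *inner,
                         int u0_fix, int v0_fix, int step_fix) {
    /* How many screen pixels one source pixel covers. */
    int pixel_size = 65536 / step_fix;

    /* Linear ramp from the source picture (256) to the inner pictures (0). */
    int blend;
    if (pixel_size <= FADE_START) {
        blend = 256;
    } else if (pixel_size >= FADE_END) {
        blend = 0;
    } else {
        blend = 256 * (FADE_END - pixel_size) / (FADE_END - FADE_START);
    }

    int v_fix = v0_fix;
    for (int y = 0; y < height; y++) {
        int vs = (v_fix >> 16) * src->width;                                /* source row */
        int vi = (((v_fix & 65535) * inner->height) >> 16) * inner->width;  /* inner row */
        dozoom_scanline_fix(screen + y * width, width, src, inner,
                            blend, 256 - blend, u0_fix, step_fix, vi, vs);
        v_fix += step_fix;
    }
}
```

The blend factor is constant for the whole frame here, so the cross-fade costs one multiply-add pair per pixel; the expensive part the text mentions is that both pictures must be sampled for every pixel.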

End Part

The end part is based on a height map. Each scanline is drawn using 32 spans. The spans are stored in a lookup table (32 * 256 rotational steps). Each vertex grabs a value from the height map to use as the radius of the cylinder at that point. The span is then drawn using a 32-bit scanline-only Z-buffer. To avoid clearing the Z-buffer, a Z-offset is calculated per scanline: the radius range goes from -32..+32 in the Z-direction, and for each scanline 64 is added to the Z-value. This gives a range of 0..2895102 values per frame. Since a 32-bit unsigned integer can hold 2^32 (about 4.29 billion) distinct values, this is enough for about 1,480 frames, or roughly a minute of show-time at 25fps.
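A minimal sketch of the never-cleared, scanline-only Z-buffer, assuming that larger values are closer to the viewer and a hypothetical 320-pixel line; `zline`, `z_offset`, `zline_next_scanline()` and `zline_test()` are made-up names. Because the offset grows by 64 while the depths of one scanline only span -32..+32, every value left over from an earlier scanline (or frame) is at most the smallest value the current scanline can produce, so a `>=` test lets new spans overwrite stale data without any clearing.

```c
#include <stdint.h>

#define ZLINE_W  320   /* hypothetical screen width */
#define Z_RANGE  32    /* depths of one scanline lie in -32..+32 */
#define Z_STEP   64    /* offset added for every new scanline */

/* One scanline of 32-bit depth values. It is never cleared. */
static uint32_t zline[ZLINE_W];

/* Running offset; starts at Z_RANGE so offset + z never goes below 0. */
static uint32_t z_offset = Z_RANGE;

/* Call before drawing each scanline. Afterwards every depth the new
   scanline can produce is >= every value left in zline from earlier
   scanlines or frames. */
static void zline_next_scanline(void) {
    z_offset += Z_STEP;
}

/* Depth test for one pixel, z in -Z_RANGE..+Z_RANGE. Larger is treated as
   closer to the viewer (an assumption). Returns 1 if the pixel is visible. */
static int zline_test(int x, int z) {
    uint32_t depth = z_offset + (uint32_t)z;   /* unsigned wrap-around handles z < 0 */
    if (depth >= zline[x]) {                   /* >= so a tie with stale data goes to the new pixel */
        zline[x] = depth;
        return 1;
    }
    return 0;
}
```

The scheme holds until `z_offset` itself wraps around, which is the show-time limit worked out above.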

Lighting is calculated per vertex, using the vertex as the normal (it is a direction vector from the center of the cylinder). Since the light source is placed directly in front of the cylinder, this is a single multiplication.
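The "single multiplication" follows from the dot product: with the light direction fixed at (0, 0, 1), n·l collapses to the normal's z component. A small sketch, assuming a plain diffuse (Lambert) term, unit direction vectors in 1.14 fixed point and a hypothetical 64-level shade range; `Dir`, `shade_general()` and `shade_front()` are illustrative names. `shade_general()` is the full dot product and `shade_front()` the specialised version the text describes.

```c
#include <stdint.h>

#define LIGHT_MAX 63   /* hypothetical brightest shade */

/* A unit direction from the cylinder centre in 1.14 fixed point
   (16384 == 1.0); z points towards the viewer (assumption). */
typedef struct { int32_t x, y, z; } Dir;

/* General diffuse term: max(0, n . l) scaled to 0..LIGHT_MAX. */
static int shade_general(Dir n, Dir l) {
    int64_t d = (int64_t)n.x * l.x + (int64_t)n.y * l.y + (int64_t)n.z * l.z;
    if (d < 0) return 0;
    return (int)((d * LIGHT_MAX) >> 28);   /* 1.14 * 1.14 = 2.28 fixed point */
}

/* Light straight in front, l = (0, 0, 1): n . l == n.z, so the whole
   calculation is one multiplication plus a clamp for back-facing vertices. */
static int shade_front(Dir n) {
    if (n.z < 0) return 0;
    return (n.z * LIGHT_MAX) >> 14;
}
```

Since the vertex directions come from the rotational lookup table anyway, the z component is already at hand and no normalisation or extra dot-product terms are needed.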

The texture mapping only traverses the U direction (V is increased once per scanline), which means the inner loop could have used the 5-instruction texturing trick. It didn’t: I was already well above 25fps, so I stopped there.
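A plain-C sketch of such a U-only inner loop, under stated assumptions: a hypothetical 256-texel-wide texture, 16.16 fixed-point U, and the caller selecting the texture row from V once per scanline. `draw_span_u()` and `TEX_W` are illustrative names; lighting and the Z-test are left out, and this is not the 68k inner loop the text refers to.

```c
#include <stdint.h>

#define TEX_W 256   /* hypothetical texture width (power of two) */

/* Draw one textured span. V is constant for the whole scanline, so the
   caller passes a pointer to the texture row and only U is stepped. */
static void draw_span_u(uint8_t *dst, int count, const uint8_t *tex_row,
                        uint32_t u_fix, uint32_t du_fix) {
    for (int i = 0; i < count; i++) {
        dst[i] = tex_row[(u_fix >> 16) & (TEX_W - 1)];   /* wrap around the cylinder */
        u_fix += du_fix;
    }
}
```

Per scanline, the caller would pick the row once (for example `texture + ((v_fix >> 16) & 255) * TEX_W` for a 256-row texture), advance `v_fix`, and call `draw_span_u()` for each visible span, so the inner loop touches only one coordinate.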