This article originally appeared at the D-Bug forum.

Recommended reading

A good summary of the STE scrolling is in Paranoid's authoritative STE FAQ, which is well worth a read! To be honest, I recommend reading it in parallel with this text, since it lists the memory addresses of the hardware registers and explains them in good detail.

Also, Mikro's Videl in practice will explain all the registers of the Falcon's Videl if you want a better explanation of the Falcon bit of this text (I only explain it in principle!)

I'll be taking a different approach here, explaining a bit how the hardware works so you guys will have a better understanding of the hardware instead of blindly following instructions here :-)

Warm up (or why the hell we're in this mess)

Before I begin talking about the STE, I'll write a few words on the STM/STF/STFM shifter, as I think it'll be beneficial later - I don't want to dive head first into the STE shifter right away :-)

As we all know, a low resolution ST screen is composed of a 320x200 image, with borders on top, bottom, left and right. The screen is composed (as in: how the shifter chip sends the data to the monitor) as follows:

  1. Assuming that the monitor's electron gun is at the top left of the screen, the shifter sends the background colour to the screen, up to the point where it has to draw the first line
  2. Then for each line the shifter sends a few cycles of background colour until the electron gun reaches a visible point, then it sends the line's screen data, and then the background colour again until we reach the end of the scanline. (Actually, while the electron gun repositions itself from the right border to the left it outputs black to avoid interfering with the rendered picture, hence "horizontal blank"!)
  3. And finally the shifter sends the background colour until it reaches the bottom right corner

I assume you're used to the fact the ST screen is composed of interleaved bitplanes in memory i.e. to describe 16 pixels we use one word for plane 0, next for plane 1, next for plane 2, and another one for plane 3. I'll briefly explain the ST shifter pixel generating pipeline now to show you why this is practical.

Firstly the chip fetches the screen address pointer and loads up the first 4 words (bitplanes) into 4 16-bit registers. Those registers have the following data on them:

  Register   Pixel
  Plane 0    0123456789ABCDEF
  Plane 1    0123456789ABCDEF
  Plane 2    0123456789ABCDEF
  Plane 3    0123456789ABCDEF

So, to determine pixel 0's palette index, the hardware has to grab the leftmost bit from each register and combine them into a 4-bit value. This is done by shifting left in hardware (so now you might get an idea where the term "shifter chip" comes from :-)). Then, this value is fed to a look up table that holds the R,G,B intensity values for each palette index ($ffff8240.w anyone?), which produces 9 or 12 bits (depending on whether we're on STF or STE). These are fed to a pack of resistors, which produce the intensity of each R,G,B component, and the result is sent to the monitor. The shifting is efficient because once pixel 0 is determined, the hardware is ready to calculate pixel 1 with another set of shifts.

So, schematically, the pipeline per scanline is:

Fetch 4 bitplanes->Shift left 16 times and feed the data to the palette index->Increase screen address pointer->Loop back until all the scanline's pixels have been drawn
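
If you'd rather see the pipeline above as code, here is a small Python model of it. It's only a sketch of the idea, not a description of the real silicon: the function names decode_group and decode_scanline are mine, and the palette look up is left out, so the output is just a list of palette indices.

```python
def decode_group(p0, p1, p2, p3):
    """Turn one group of four 16-bit bitplane words into 16 palette indices."""
    regs = [p0, p1, p2, p3]
    pixels = []
    for _ in range(16):
        index = 0
        for plane, value in enumerate(regs):
            # The leftmost (most significant) bit of each register
            # belongs to the pixel currently being generated.
            index |= ((value >> 15) & 1) << plane
        pixels.append(index)
        # Shift every register left by one so the next pixel reaches bit 15.
        regs = [(value << 1) & 0xFFFF for value in regs]
    return pixels


def decode_scanline(words):
    """Decode a scanline given as a flat list of interleaved bitplane words."""
    pixels = []
    for i in range(0, len(words), 4):
        pixels.extend(decode_group(*words[i:i + 4]))
    return pixels
```

Feeding decode_scanline 80 words (20 groups of 4 bitplanes) gives the 320 palette indices of one low resolution line.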

On the Amiga it's more or less the same logic, but more complex. One of the key differences there is that you can assign a pointer to each bitplane separately, so the Amiga's bitplanes aren't interleaved.

One step beyond (AKA the STE)

So, for the STE, Atari wanted to implement fine scrolling horizontally and vertically without hurting backwards compatibility. The vertical bit is easy - they just enabled the screen pointer to be given any even address. So, by adding 160 (or the line width) to the screen pointer, you can scroll upwards without doing anything else. You can do cheap horizontal scrolling in 16-pixel steps by adding 8 to the screen pointer. But what about fine scrolling? Hmmm. That's trickier with that damn planar mode :-). Well, here's how Atari got out of this situation.

It's easy to figure out how to do a vertical scroll - just reserve a bigger memory area and fill it in the same fashion as one would fill a normal screen. But think about horizontally - a single scanline must consist of more bytes than a normal one! So, we need a way to tell the shifter how many bytes our scanline consists of, so it will know how many words to skip until it reaches the start of the next one, otherwise we'd have to feed it manually each scanline. That's where the new hardware register, linewid, comes into play and it simply does what's described above. So, this value must be loaded with the number of extra words our virtual screen will have. So, we have to modify the above pipeline as follows:

Fetch 4 bitplanes->Shift left 16 times and feed the data to the palette index->Increase screen address pointer->Loop back until all the scanline's pixels have been drawn->Add 2*linewid to screen address
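
To make the effect of linewid a bit more tangible, here's a rough Python sketch that computes where each scanline starts in memory. It assumes a low resolution line (80 words fetched per line) and no fine scrolling yet; line_addresses is a made-up helper name, not anything from the hardware documentation.

```python
VISIBLE_WORDS = 80  # a 320-pixel, 4-bitplane line is 160 bytes = 80 words


def line_addresses(screen_base, linewid, lines=200):
    """Return the byte address at which each scanline starts."""
    addresses = []
    pointer = screen_base
    for _ in range(lines):
        addresses.append(pointer)
        pointer += VISIBLE_WORDS * 2  # advanced while the line is drawn
        pointer += linewid * 2        # extra words skipped at the end of the line
    return addresses
```

With linewid set to 0 the lines are 160 bytes apart, exactly like a plain ST screen. With linewid set to 80 they are 320 bytes apart, which is what a 640-pixel-wide virtual screen needs.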

That's all fine and dandy and now we have a way to set a virtual screen bigger than the native resolution, and the shifter will render that properly. But we have forgotten fine pixel scrolling! D'oh!

Well, obviously we'll need another register for this, to say how many pixels into the bitplane group we need to start, and sure enough hscroll does exactly this. But how do we implement this in hardware? Well, remember I mentioned that the shifter shifts data all the time? What would happen if we shifted out the bits we don't want to show from the first bitplane pack and then kept shifting the rest? Wouldn't that solve our problem? Hooray! So we can change the pipeline as follows:

Fetch first 4 bitplanes->Shift left as many times as hscroll tells us->Shift left the rest of the words' pixels and feed the data to the palette index->Fetch 4 bitplanes->Shift left 16 times and feed the data to the palette index->Increase screen address pointer->Loop back until all the scanline's pixels have been drawn->Add 2*linewid to screen address

Well, don't open the champagne yet! Sure enough it won't show the pixels we don't want it to, but it will produce 2 major problems! Firstly, because the hardware doesn't have infinite time to perform this preshifting, the screen will be positioned wrongly, so it'll show like it's dancing left and right. Secondly, if we are to shift out some bits at the start, don't we have to fetch some extra data at the end of the scanline and shift some in order to finish the scanline?

The first problem is solved this way: when hscroll is non-zero (because of course when it is zero you don't have these nasty side effects), the shifter begins rendering the screen 16 pixels to the left, waits 16-hscroll cycles (sending the background colour to the monitor), performs the preshift as many times as the value of hscroll (still sending the background colour to the monitor), and finally executes the rest of the pipeline. So we adjust the pipeline yet again:

Fetch first 4 bitplanes->Wait 16-hscroll pixel render cycles, sending the background colour->Shift left as many times as hscroll tells us, sending the background colour->Shift left the rest of the words' pixels and feed the data to the palette index->Fetch 4 bitplanes->Shift left 16 times and feed the data to the palette index->Increase screen address pointer->Loop back until all the scanline's pixels have been drawn->Add 2*linewid to screen address

Okay, we're nearly done now, apart from the second problem! Let's get back to it again. Basically what happens is that when hscroll is non-zero the shifter needs to fetch an extra group of bitplanes to compensate for the preshifted pixels at the start of the scanline. Now, if you recall, linewid doesn't say how many words a scanline consists of, but how many words to skip to get to the next scanline. So, the shifter fetches an extra 8 bytes in these cases. How do we solve this? Well, in the end Atari decided to be lazy and leave this in the programmer's hands. That's why you need to adjust linewid per frame when you're doing fine scrolling on the STE.
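
Here's the same bookkeeping as a small Python model, in case the arithmetic helps. It's a sketch built only on what's described above (80 words per visible line, one extra group of 4 words when hscroll is non-zero); the function names are my own invention.

```python
VISIBLE_WORDS = 80  # words needed for 320 visible pixels in 4 bitplanes
GROUP_WORDS = 4     # one group of 4 bitplane words = 16 pixels = 8 bytes


def words_fetched(hscroll):
    """Words the shifter reads for one line, including the extra group."""
    extra = GROUP_WORDS if hscroll != 0 else 0
    return VISIBLE_WORDS + extra


def next_line_pointer(pointer, linewid, hscroll):
    """Byte address the screen pointer holds at the start of the next line."""
    return pointer + 2 * words_fetched(hscroll) + 2 * linewid


def visible_window(decoded_line, hscroll, width=320):
    """Pixels that survive the preshift: drop hscroll pixels, show `width`."""
    return decoded_line[hscroll:hscroll + width]
```

For a virtual screen that is 160 words wide, next_line_pointer(0, 80, 0) gives 320, as expected. With hscroll at 5 and linewid left at 80 it gives 328, so every line starts 8 bytes too late and the picture skews. Dropping linewid to 76 brings it back to 320.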

The tl;dr version (AKA what you need to do to get 8-way directional scrolling on STE!)

  1. Decide beforehand the virtual screen window you want to set up
  2. If you want horizontal scroll, determine the scanline's length in bytes. Make sure it's in multiples of 8. Also, you can't have a smaller window than 160 bytes (320 pixels).
  3. Multiply the bytes per scanline with total number of scanlines to determine the virtual screen's size.
  4. Fill the screen as you would fill a normal screen, but taking into account the new line width.
  5. Calculate linewid by subtracting 80 from the number of words of your scanline.
  6. Set hscroll to 0.
  7. Set the screen address pointer to the virtual screen.
  8. To scroll vertically, add or subtract the line width from the screen address.
  9. To scroll horizontally, first adjust the screen pointer to position it at the proper bitplane group within the scanline (basically (x/16)*8 will do the trick). Then load hscroll with the value of (x AND 15). Finally, if hscroll is zero, then linewid is the value you calculated above. Otherwise, subtract 4 from it, since the shifter has already consumed an extra 4 words (8 bytes) for that line. Then load linewid again.
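
The steps above boil down to a few lines of arithmetic per frame. Here's a hedged Python sketch that turns a scroll position into the three values you'd write to the hardware. The function name and argument order are hypothetical, the actual register writes are left out, and the linewid adjustment follows the reasoning from the previous section (a non-zero hscroll makes the shifter consume 4 extra words, so 4 fewer words are left to skip).

```python
def ste_scroll_registers(base, line_bytes, x, y):
    """Return (screen address, hscroll, linewid) for scroll position (x, y).

    base       -- address of the virtual screen (must be even)
    line_bytes -- bytes per virtual scanline (multiple of 8, at least 160)
    x, y       -- top-left pixel of the visible window in the virtual screen
    """
    if line_bytes % 8 != 0 or line_bytes < 160:
        raise ValueError("line length must be a multiple of 8 and >= 160 bytes")
    # 4 bitplanes: 8 bytes hold 16 pixels, so 2 pixels per byte.
    if not 0 <= x <= line_bytes * 2 - 320 or y < 0:
        raise ValueError("window does not fit inside the virtual screen")

    address = base + y * line_bytes + (x // 16) * 8
    hscroll = x & 15
    linewid = line_bytes // 2 - 80   # words to skip when hscroll is 0
    if hscroll != 0:
        linewid -= 4                 # the extra bitplane group was already fetched
    if linewid > 255:
        raise ValueError("linewid is a byte on the STE")
    return address, hscroll, linewid
```

For example, with a 640-pixel-wide virtual screen (320 bytes per line) at $40000, position (21, 2) gives an address of $40000+648, hscroll 5 and linewid 76.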

Falcon time!

Enter the Falcon with its shiny new VIDEL instead of the shifter! Basically most of the things mentioned above work as expected even in 8-plane mode (256 colours), with a few minor exceptions. Most notably, linewid is now a word value instead of the byte the STE has. So you can now have a much bigger horizontal window, hooray! Also, we get a new register that needs to be set, called VWRAP, which is the visible line's width in words (which means you can set up resolutions narrower than, for example, 320 pixels!). But there's a teeny weeny restriction: the screen address pointer can no longer be just any multiple of 2; it has to be a multiple of 4. So in planar mode you need to align your screen buffer on a 4-byte boundary.

In addition to the planar modes, the Falcon also offers a sexy 16-bit mode, dubbed True Colour by Atari. In this mode, each pixel consists of 5 bits for red, 6 bits for green and 5 for blue. Let's see how many shouted "Oh Fxxk" before I mention the problem! Basically, you're apparently screwed for horizontal scrolling per pixel!

Actually, things are much simpler in this mode - you don't even have to touch hscroll/linewid, just update the screen address pointer. But since a pixel is 16 bits AKA 2 bytes, scrolling horizontally per pixel seems impossible if you can only place the screen pointer in a 4 byte boundary!

Initially I thought I'd solve this problem using 2 buffers. One buffer would have the initial screen, and the second would have the same screen, but one pixel to the left. So, if I wanted to scroll per pixel, I'd show the first buffer, then the second, then the first buffer+4 bytes, then the second+4 bytes, etc. Of course this works, but it uses an extra screen, plus it makes updating of the 2 buffers a bit weird (well not too much, but still :-)).
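
As a quick illustration of the two-buffer trick, here's a tiny Python sketch that picks the buffer and offset for a given horizontal position. It assumes both buffers are aligned to 4 bytes and that the second one holds the same picture shifted one pixel to the left; the function name is made up.

```python
def truecolour_two_buffer(buffer_a, buffer_b, x):
    """Return the screen address to show for horizontal scroll position x.

    buffer_a -- the original screen
    buffer_b -- the same screen, pre-shifted one pixel to the left
    """
    offset = (x // 2) * 4                   # two 16-bit pixels per 4-byte step
    base = buffer_b if x & 1 else buffer_a  # odd positions use the shifted copy
    return base + offset
```

Stepping x through 0, 1, 2, 3 gives the first buffer, the second buffer, the first buffer+4 and the second buffer+4, which is exactly the sequence described above.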

Then I remembered a post by Patrice Mandin that claimed that you can fine scroll by tweaking the border values of the Videl. See, the Videl is a much more capable IC than the STE's shifter is, and can compose many more screens than the 3 the STE shifter was capable of. So, many more variables are user definable now. What Patrice actually described is the same logic we applied to describe the STE shifter's rendering pipeline: start rendering the screen a bit earlier if you want to scroll by 1 pixel (not aligned to 4 byte boundary) and increase the border size, so it won't be shown!

To get somewhat practical here (since this was going to be the heart of the original text), Patrice gave me the "magic" values you have to enter to perform this. Note that these values assume a 320x240 VGA screen. RGB or other resolutions will need some fiddling around, and for now I'll leave it as an exercise for the reader! It's not too difficult to guess and experiment given the values below though!

The values you'll need to change per VBL are HBB - Horizontal Border Begin, HBE - Horizontal Border End, HDB - Horizontal Display Begin, HDE - Horizontal Display End.

When one uses XBIOS to set a 320x240xTC screen, hdb,hde,hbb and hbe are set to $2ac,$91,$8d,$15 respectively. Then, when your screen pointer is aligned to 4 bytes, set hdb,hde,hbb and hbe to $2ac,$91,$8b,$17 and when you want to scroll 1 pixel left, set hdb,hde,hbb and hbe to $2ab,$91,$8b,$17.
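
To tie the values together, here's a rough Python sketch of how one might pick the screen address and the border values for a given horizontal position. Big caveat: pairing odd positions with the "1 pixel left" set and rounding the address down to the previous 4-byte boundary is my reading of the description above, not something confirmed on real hardware, and the names are made up. The register values are the 320x240 VGA ones quoted above.

```python
# Values for hdb, hde, hbb, hbe as quoted in the text (320x240 VGA, True Colour).
ALIGNED = {"hdb": 0x2AC, "hde": 0x91, "hbb": 0x8B, "hbe": 0x17}
SHIFTED = {"hdb": 0x2AB, "hde": 0x91, "hbb": 0x8B, "hbe": 0x17}


def videl_finescroll(base, x):
    """Return (screen address, register values) for horizontal position x.

    base -- screen buffer address, assumed to be aligned to 4 bytes
    x    -- horizontal scroll position in pixels (2 bytes per pixel)
    """
    if base % 4 != 0:
        raise ValueError("screen buffer must be aligned to 4 bytes")
    # Assumption: round down to an even pixel so the address stays aligned...
    address = base + (x & ~1) * 2
    # ...and let the Videl start the display one pixel early for odd positions.
    regs = SHIFTED if x & 1 else ALIGNED
    return address, dict(regs)
```

You'd call this once per VBL and write the returned values to the screen address and the four Videl registers.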


And that's the end for now. Hope I made myself clear most of the time. Again, don't hesitate to ask for clarifications if something seems wrong or odd!