ATI Radeon programming with XBIOS

Author: Bus Error admin - Published 21-06-2011 14:54

The following article explains how to access ATI Radeon hardware-accelerated functions via XBIOS calls and create double-buffered screen output, as in the little demonstration I posted some time ago. It also exposes problems with the current versions of the drivers. To run this code you need an Atari Falcon030 with a CT6x accelerator and a CTPCI bridge with an ATI Radeon card.

How to draw something?

Loading textures

Before anything else, I load some bitmaps for display, stored in PNG format: one for the background and one for the logo, which is displayed at a varying screen height. Let me say a few words about the texture loading process.

The loader I have implemented can load PNG image data into ST-RAM, TT-RAM or Radeon video RAM. The ST-RAM option is not very useful. The most interesting option is loading the image into Radeon video RAM, because once the image data are there, we can use hardware-accelerated functions to copy them in different ways, blazingly fast. The only CPU cost is passing the correct parameters to the hardware-accelerated function. After loading, the image data are converted manually to the current screen format (xRGB).
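The manual conversion step can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: it assumes the PNG decoder delivers 8-bit RGBA pixels in byte order R, G, B, A, and the helper name rgba_to_xrgb is hypothetical. The alpha byte is simply dropped, since xRGB has no alpha channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert 8-bit-per-channel RGBA pixels (byte order
 * R, G, B, A, as assumed from the PNG decoder) into 32-bit xRGB words
 * of the form 0x00RRGGBB. The alpha byte is discarded. */
static void rgba_to_xrgb(const uint8_t *src, uint32_t *dst, size_t pixel_count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixel_count; i++) {
        const uint8_t *p = src + i * 4;      /* 4 bytes per RGBA pixel */
        dst[i] = ((uint32_t)p[0] << 16)      /* red   */
               | ((uint32_t)p[1] << 8)       /* green */
               |  (uint32_t)p[2];            /* blue; top byte stays 0 */
    }
}
```

On the big-endian 68030, a 0x00RRGGBB word stored in memory ends up as the bytes 00 RR GG BB, which is the xRGB layout.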

Information about each texture is stored in a custom structure:

typedef struct stexture {
  int size;	   //size of the structure
  char filename[FILENAME_MAX];
  int width,height;
  int format;	     // pixel format xRGB,RGBA,ARGB etc.
  int memflag; 	     // indicates if the image is loaded into Radeon VRAM or in TT Ram
  int x_offset;	     // precalculated x offset in radeon memory 
                     // (to avoid calculating it each frame) not applicable for
                     // conventional RAM
  int y_offset;	     // precalculated y offset in radeon memory 
                     // (to avoid calculating it each frame) not applicable for 
                     // conventional RAM
  void *pImageData;  //image data itself
} sTexture;

Hardware-accelerated functions do not take pointers (as I mentioned earlier). Instead, we need to calculate the x,y coordinates of the texture block in the Radeon buffer. But how do we do that?

Here is a helper function that does everything during texture load time:

#include <assert.h>

#define RADEON_BASE 0x40000000UL   /* start of Radeon video RAM */

/* Converts a pointer into Radeon VRAM into x,y coordinates of the
   virtual screen, as expected by the accelerated Vsetscreen() calls. */
void calculateVideoBufferOffset(SCREENINFO *scrnInfo,void *p,int *x_offset,int *y_offset){
  long offset;
  int bpp;

  //maybe use get info instead of passing scrnInfo?

  //function works only for Radeon Video RAM;
  //if the assertion fails, the pointer doesn't point to Radeon memory
  assert((unsigned long)p>=RADEON_BASE);

  offset=(long)((unsigned long)p-RADEON_BASE);
  bpp=scrnInfo->scrPlanes/8;   //bytes per pixel

  *y_offset= offset / (scrnInfo->virtWidth*bpp);
  *x_offset= (offset % (scrnInfo->virtWidth*bpp))/bpp;

  logd("Texture VRAM offset x:%d, y:%d\n",*x_offset,*y_offset);
}
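To make the arithmetic concrete, here is a portable sketch of the same conversion plus its inverse, so the math can be checked without Falcon hardware. Like the function above, it assumes each line in VRAM is exactly virtWidth * bytes-per-pixel bytes long (no extra padding). The names offset_to_xy and xy_to_offset are hypothetical. For example, on a 640-pixel-wide 32-bit virtual screen, a line is 2560 bytes, so a texture that starts 1229200 bytes into VRAM (480 full lines plus 400 bytes) lands at x=100, y=480.

```c
#include <assert.h>

/* Turn a byte offset from the start of VRAM into (x, y) coordinates,
 * given the virtual screen width in pixels and the bytes per pixel. */
static void offset_to_xy(unsigned long offset, int virt_width, int bytes_pp,
                         int *x, int *y)
{
    unsigned long pitch; /* bytes per line */

    assert(virt_width > 0 && bytes_pp > 0);
    pitch = (unsigned long)virt_width * (unsigned long)bytes_pp;
    *y = (int)(offset / pitch);
    *x = (int)((offset % pitch) / (unsigned long)bytes_pp);
}

/* Inverse mapping: (x, y) back to a byte offset, handy for checking. */
static unsigned long xy_to_offset(int x, int y, int virt_width, int bytes_pp)
{
    return ((unsigned long)y * (unsigned long)virt_width + (unsigned long)x)
           * (unsigned long)bytes_pp;
}
```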

The returned coordinates are stored inside the struct, so we don't need to recalculate them every time we pass a texture to a hardware-accelerated function. We also need to record in the memflag field which type of memory holds the texture data. This is crucial when freeing the memory: deallocations in ST/TT-RAM have to be done with the standard Mfree(), while deallocations in video memory have to be done with the new function ct60_vmalloc(), which is used for both allocations and deallocations in video RAM.
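The dispatch on memflag can be sketched like this. It is a minimal sketch under stated assumptions: the flag values TEX_MEM_TTRAM/TEX_MEM_VRAM, the FreeHooks struct and free_texture_data() are all hypothetical. On the Falcon, the two hooks would wrap Mfree() and ct60_vmalloc(). Here they are plain function pointers, so the logic can run on any machine, with the C library's free() standing in for both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h> /* free(), used by the host-side stand-in hooks */

/* Hypothetical memory-location tags for the memflag field. */
enum { TEX_MEM_TTRAM = 0, TEX_MEM_VRAM = 1 };

/* Hypothetical deallocation hooks (Mfree() / ct60_vmalloc() wrappers
 * on the real machine). */
typedef struct {
    void (*free_system)(void *p); /* ST/TT-RAM deallocation */
    void (*free_vram)(void *p);   /* Radeon VRAM deallocation */
} FreeHooks;

/* Only the two texture fields this sketch needs. */
typedef struct {
    int memflag;
    void *pImageData;
} TextureMem;

/* Host-side stand-in so the dispatch can be exercised without a Falcon. */
static const FreeHooks host_test_hooks = { free, free };

static void free_texture_data(TextureMem *t, const FreeHooks *hooks)
{
    assert(t != NULL && hooks != NULL);
    if (t->pImageData == NULL)
        return;                        /* nothing to release */
    if (t->memflag == TEX_MEM_VRAM)
        hooks->free_vram(t->pImageData);
    else
        hooks->free_system(t->pImageData);
    t->pImageData = NULL;              /* guard against a double free */
}
```

Clearing the pointer after release makes a second call harmless. This matters if textures are freed both on error paths and at program exit.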

Example draw function overview

OK, then... Now we can explore what the draw function of our example program,

void draw(SCREENINFO *pScrnInfo);

is actually doing.
I pass a pointer to the current screen info (pScrnInfo) to draw() every frame. This avoids storing it in a global variable and avoids calling XBIOS (CMD_GETINFO) every frame.
The code I have written is also resolution independent (the data used in the demo are not ;)).
The first thing I do is check whether the current logical screen (the one not currently displayed) is the first or the second one in memory:

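//which buffer isn't displayed now?
if((unsigned long)Physbase()==RADEON_BASE){
  screenNb=1;   //first screen is shown, draw into the second
}else{
  screenNb=0;   //second screen is shown, draw into the first
}

y_offset=screenNb*y_second;   //y_second: first line of the second screen buffer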

This tells me which y offset to use when operating on video memory. We don't want to draw to the currently displayed screen, do we? From then on, this offset is the reference starting point of the non-visible buffer we intend to draw to.
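The buffer selection boils down to a tiny pure function, sketched here so it can be tested off-target. It assumes the two screens sit one above the other in VRAM, the first starting at the VRAM base and the second starting y_second lines further down (for example y_second = 480 on a 640x480 screen). The function name back_buffer_y_offset is hypothetical, and the Physbase() result is passed in as a number.

```c
#include <assert.h>

/* Return the y offset of the buffer that is NOT currently displayed.
 * physbase:  address of the screen currently shown
 * vram_base: start of Radeon VRAM (where the first screen begins)
 * y_second:  first line of the second screen buffer (assumed layout) */
static int back_buffer_y_offset(unsigned long physbase, unsigned long vram_base,
                                int y_second)
{
    int screen_nb;

    assert(y_second > 0);
    screen_nb = (physbase == vram_base) ? 1 : 0; /* first shown -> draw to second */
    return screen_nb * y_second;
}
```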

Which operations are hardware accelerated?

After this short lecture, you will know everything about the strengths and current limitations of the Radeon drivers.

In the example source code I have tried to implement several basic functions:

drawing horizontal and vertical lines
copying memory blocks within video RAM and from TT-RAM to video RAM
putting individual pixels
drawing arbitrary lines with a basic, unoptimised Bresenham line routine

Drawing horizontal and vertical lines

No problem here; this can be done very fast. I draw the line across the whole screen width with a function that can also fill whole blocks of memory, which is also how the screen is cleared.

For a horizontal line, the code looks like this:

dst.Xpos=0;
dst.Ypos=y_offset+line;
dst.width=scrnWidth;
dst.height=1;
dst.block_op=BLK_COPY;

//draw horizontal line, fast
hwFillScreenRadeon(0x00ff0000,&dst);

We set only the x coordinate, the y coordinate (including the y offset calculated before), the full screen width and height=1.
The variable line changes the y coordinate every frame, so the line is drawn at a varying height.

For a vertical line, the code could look like this:

dst.Xpos=10;
dst.Ypos=y_offset;        //top of the back buffer
dst.width=1;              //one pixel wide
dst.height=scrnHeight;    //rows 0..scrnHeight-1
dst.block_op=BLK_COPY;

//draw vertical line, fast
hwFillScreenRadeon(0x00ff0000,&dst);

This draws our line from (10,0) to (10,scrnHeight-1) in the back buffer.
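Both kinds of line are just degenerate rectangles passed to the same fill call. The helpers below are a hypothetical sketch of how the destination rectangles could be built so the two cases cannot be mixed up. Rect is a minimal stand-in for the article's sRect_t, with only the position and size fields.

```c
#include <assert.h>

/* Minimal stand-in for sRect_t (only the fields used here). */
typedef struct {
    int Xpos, Ypos, width, height;
} Rect;

/* Full-width horizontal line at row `line` of the back buffer. */
static Rect hline_rect(int y_offset, int line, int scrn_width)
{
    assert(scrn_width > 0);
    Rect r = { 0, y_offset + line, scrn_width, 1 };
    return r;
}

/* Full-height vertical line at column `x` of the back buffer:
 * it covers rows 0..scrn_height-1, so the height is scrn_height. */
static Rect vline_rect(int y_offset, int x, int scrn_height)
{
    assert(scrn_height > 0);
    Rect r = { x, y_offset, 1, scrn_height };
    return r;
}
```

A common pitfall is an off-by-one in the height. A line ending at row scrnHeight-1 needs a height of scrnHeight, not scrnHeight-1.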

The implementation of hwFillScreenRadeon() looks like this:

//function fills video buffer with given fillColor value 
// and with given block operation type
static inline void hwFillScreenRadeon(long fillColor,sRect_t *destRect){
  static SCRFILLMEMBLK fill;
  
  fill.size = sizeof(SCRFILLMEMBLK);
  fill.blk_status = 0;
  fill.blk_op = destRect->block_op;     /* mode operation */
  fill.blk_color = fillColor; 		/* background fill color */
  fill.blk_x = destRect->Xpos; 		/* x pos in total screen */
  fill.blk_y = destRect->Ypos; 		/* y pos in total screen */
  fill.blk_w = destRect->width; 	/* width  */
  fill.blk_h = destRect->height; 	/* height */
  
  Vsetscreen(-1,&fill,VN_MAGIC,CMD_FILLMEM);
}

As you can see, it only fills in the proper struct (SCRFILLMEMBLK) and passes it to Vsetscreen() with the CMD_FILLMEM flag. You can also change the block operation type, which is used when writing pixels to the destination. Easy!
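To get a feel for the work the accelerated fill takes off the CPU, here is a software reference for a plain-copy rectangle fill on a 32-bit xRGB buffer. It is a sketch only. It models just the overwrite (BLK_COPY-like) case, and other block operations are not modelled. The name sw_fill_rect is hypothetical, and pitch is the number of pixels per line (the virtual screen width).

```c
#include <assert.h>
#include <stdint.h>

/* Overwrite every pixel inside the rectangle (x, y, w, h) with `color`
 * in a 32-bit frame buffer that has `pitch` pixels per line. */
static void sw_fill_rect(uint32_t *fb, int pitch, int x, int y, int w, int h,
                         uint32_t color)
{
    assert(fb != 0 && x >= 0 && y >= 0 && w >= 0 && h >= 0 && x + w <= pitch);
    for (int row = y; row < y + h; row++) {
        uint32_t *line = fb + (long)row * pitch + x; /* first pixel of this row */
        for (int col = 0; col < w; col++)
            line[col] = color;
    }
}
```

On the Falcon, every one of those stores would cross the CTPCI bus if fb pointed into VRAM, which is why a single CMD_FILLMEM call is so much cheaper.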

Copying memory blocks within video RAM and from TT-RAM to video RAM

If you are copying a memory block without transparency from, let's say, a buffer inside video RAM to the current screen buffer or to another video RAM region, everything is fine and fast as hell.
In the demonstration, I do all of the following in one frame with no flickering: clear the whole 640x480 screen in 32-bit mode, copy a memory block holding a screen-sized image, draw a smaller logo that has the screen width, and draw horizontal lines. Here is how the background is copied:

//draw bg from image buffer
src.Xpos=0;
src.Ypos=image->y_offset;
src.width=640;
src.height=480;

dst.Xpos=0;
dst.Ypos=y_offset;
dst.block_op=blokOp;

hwCopyBlockRadeon(&src,&dst);

The listing below is hwCopyTextureRadeon(), the texture-copying counterpart of hwCopyBlockRadeon(), which uses the CMD_TEXTUREMEM flag:

//function copies texture block to video buffer with given blit operation
static inline void hwCopyTextureRadeon(void *srcTexBuf,sRect_t *srcRect,sRect_t *dstRect)
{
 static SCRTEXTUREMEMBLK src;
 src.size=sizeof(SCRTEXTUREMEMBLK); //boring stuff
 src.blk_status=0;
 
 //more interesting stuff
 src.blk_src_tex=srcTexBuf;
 
 //set source
 src.blk_src_x=srcRect->Xpos;
 src.blk_src_y=srcRect->Ypos;
 src.blk_w_tex=srcRect->width;
 src.blk_h_tex=srcRect->height;
 
 //set destination
 src.blk_dst_x=dstRect->Xpos;
 src.blk_dst_y=dstRect->Ypos;
 src.blk_w=dstRect->width;
 src.blk_h=dstRect->height;
 src.blk_op=dstRect->block_op;
 
 //go!
 Vsetscreen(-1,&src,VN_MAGIC,CMD_TEXTUREMEM);
 if(src.blk_status!=BLK_OK) logd("Error during copyblock(CMD_TEXTUREMEM) operation\n");
}
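The block copy itself can be modelled in software, which is handy for checking source and destination rectangles on a host machine before running on the Falcon. This is a sketch and not how the Radeon performs the copy. The name sw_copy_rect is hypothetical, and both rectangles are assumed to lie inside one 32-bit buffer with pitch pixels per line. Rows are copied bottom-up when the destination lies below the source, so overlapping rectangles are handled correctly. memmove() takes care of horizontal overlap within a row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void sw_copy_rect(uint32_t *fb, int pitch,
                         int src_x, int src_y, int dst_x, int dst_y,
                         int w, int h)
{
    size_t row_bytes = (size_t)w * sizeof(uint32_t);

    assert(fb != 0 && w >= 0 && h >= 0);
    if (dst_y > src_y) {
        /* destination below source: copy last row first */
        for (int r = h - 1; r >= 0; r--)
            memmove(fb + (long)(dst_y + r) * pitch + dst_x,
                    fb + (long)(src_y + r) * pitch + src_x, row_bytes);
    } else {
        /* destination above (or level with) source: copy first row first */
        for (int r = 0; r < h; r++)
            memmove(fb + (long)(dst_y + r) * pitch + dst_x,
                    fb + (long)(src_y + r) * pitch + src_x, row_bytes);
    }
}
```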

The problem starts when you try to blit, let's say, a 256x256 2D sprite with transparency onto a background that is also stored in video RAM.

You have to:

Fetch each sprite from video RAM into a TT-RAM buffer.
Do another fetch from video RAM to get the background the sprite will be put on.
Mask the sprite in TT-RAM somehow (this part, at least, should be fast).
Combine it with the background.
Put the result back somewhere in the screen buffer.
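The masking and combining steps could look roughly like this on the CPU, once sprite and background have been fetched into TT-RAM. Because the xRGB screen format carries no alpha, this sketch assumes a colour key: sprite pixels equal to `key` are treated as transparent. The name composite_color_key and the colour-key approach itself are assumptions for illustration, not the demo's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a sprite with a background copy, both 32-bit xRGB in TT-RAM.
 * Sprite pixels matching `key` (ignoring the unused top byte) leave the
 * background visible; all other sprite pixels overwrite it. */
static void composite_color_key(const uint32_t *sprite, uint32_t *background,
                                size_t pixel_count, uint32_t key)
{
    const uint32_t rgb_key = key & 0x00FFFFFFu;

    assert(sprite != NULL && background != NULL);
    for (size_t i = 0; i < pixel_count; i++) {
        if ((sprite[i] & 0x00FFFFFFu) != rgb_key)
            background[i] = sprite[i];
    }
}
```

The loop itself is cheap. The expensive part is the two VRAM-to-TT-RAM fetches before it and the write-back after it, all over the CTPCI bus.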

So currently you have to do all the masking and blitting with the CPU, which, as you will see in the case of pixels, will choke your machine to death.

The CTPCI TT-RAM bus is slow. There should be a burst mode for one-directional transfers from TT-RAM to the CTPCI bus, which could speed things up a bit, but it is not fixed yet.
Anyway, operations of this kind should be done on the Radeon side. The lack of RGBA pixel-format screen modes complicates things further: with such modes, we could keep a texture with an alpha channel and blit a sprite to an arbitrary video RAM buffer with a hardware-accelerated function and transparency support.

Apparently there is a function, hwCopyTextureRadeon() (commented out in the sources, using the CMD_TEXTUREMEM flag), which copies a hicolor (16-bit) texture with transparency. It is used in the new TOS boot-up screen, so what is wrong with it? It's inefficient, because each time you call it, it loads the texture from ST/TT-RAM, which chokes the bus. Using it with larger textures is a no-go.
Textures should be loaded into video memory once and referenced by handles afterwards. To perform an operation on a texture, we would then pass the type of operation, the assigned texture handle and the destination block for the result. Everything would be done on the Radeon side, without TT-RAM to CTPCI transfers. OpenGL uses a system of handles for loading textures, and not without reason: transferring data between RAM and video RAM is expensive even on modern PCs.
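A handle system of this kind could be sketched as a small table that maps integers to texture locations already in VRAM. This is similar in spirit to OpenGL texture names (glGenTextures/glBindTexture). Everything here is hypothetical: no such driver call exists in the current XBIOS. The sketch only shows that, once a texture is registered, later requests need to carry just a handle, an operation and a destination rectangle.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEXTURES 64 /* hypothetical table size */

/* Hypothetical driver-side record of a texture already uploaded to VRAM. */
typedef struct {
    int in_use;
    int x, y;          /* VRAM position, computed once at load time */
    int width, height;
} TexSlot;

static TexSlot tex_table[MAX_TEXTURES]; /* zero-initialised: all free */

/* Register a texture after its single upload; returns a handle or -1. */
static int tex_register(int x, int y, int width, int height)
{
    for (int h = 0; h < MAX_TEXTURES; h++) {
        if (!tex_table[h].in_use) {
            TexSlot s = { 1, x, y, width, height };
            tex_table[h] = s;
            return h;
        }
    }
    return -1; /* table full */
}

/* Resolve a handle; NULL if it is invalid or already released. */
static const TexSlot *tex_lookup(int handle)
{
    if (handle < 0 || handle >= MAX_TEXTURES || !tex_table[handle].in_use)
        return NULL;
    return &tex_table[handle];
}

static void tex_release(int handle)
{
    assert(handle >= 0 && handle < MAX_TEXTURES);
    tex_table[handle].in_use = 0;
}
```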

Putting individual pixels and drawing arbitrary lines with a basic, unoptimised Bresenham line routine

And here comes the bad news. Putting pixels is currently overkill. Which is the only hardware-accelerated function that can put a pixel? Guess! Yes, the one used for filling whole memory blocks and drawing horizontal/vertical lines, called with a 1x1 block:

//puts pixels at given coordinates with given block operation and color
static inline void hwPutPixelRadeon(long fillColor,sRect_t *destRect){
  static SCRFILLMEMBLK put;
  
  put.size = sizeof(SCRFILLMEMBLK);
  put.blk_status = 0;
  put.blk_op = destRect->block_op;     /* mode operation */
  put.blk_color = fillColor; 		/* background fill color */
  put.blk_x = destRect->Xpos; 		/* x pos in total screen */
  put.blk_y = destRect->Ypos; 		/* y pos in total screen */
  put.blk_w = 1; 	/* width  */
  put.blk_h = 1; 	/* height */
  
  Vsetscreen(-1,&put,VN_MAGIC,CMD_FILLMEM);
}

The second option is putting pixels with the CPU, by accessing video memory directly. The drawback is that it's damn slow, and this solution defeats the whole idea of hardware acceleration.
This is plainly visible when you try to draw a line from one screen corner to the other with an unoptimised Bresenham line routine. The code is present in the sources, but is commented out. You can uncomment it and recompile the whole program to see what I am talking about.
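For reference, a basic, unoptimised Bresenham routine that writes pixels directly with the CPU could look like the sketch below. This is the standard all-octant integer form, not the code from the demo sources, and the name cpu_line is hypothetical. fb is assumed to be a 32-bit frame buffer with pitch pixels per line. On the Falcon, it would point into Radeon VRAM, so every store crosses the CTPCI bus. The function returns the number of pixels written, which is also how many Vsetscreen() calls hwPutPixelRadeon() would need for the same line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h> /* abs() */

static int cpu_line(uint32_t *fb, int pitch, int x0, int y0, int x1, int y1,
                    uint32_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy; /* combined error term for both axes */
    int count = 0;

    assert(fb != 0 && pitch > 0);
    for (;;) {
        fb[(long)y0 * pitch + x0] = color; /* direct pixel store */
        count++;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; } /* step in y */
    }
    return count;
}
```

A corner-to-corner line on a 640x480 screen touches max(639, 479) + 1 = 640 pixels. Using the 1x1 fill, that means 640 separate Vsetscreen() calls for a single line.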

Conclusion

After testing the current state of the drivers, it is plain to see that they need a lot of work to be actually usable. There should be dedicated, hardware-accelerated functions for basic primitives (points, lines, triangles, quads) with texturing support, and variants that load whole batches of these primitives in one go. Sounds familiar?
Yes, it's what the OpenGL API has been providing for a long time, and not only for 2D. So maybe focusing on hardware-accelerated OpenGL 1.4 support would be a better option than adding and implementing new Vsetscreen() flags?
