Thursday, October 26, 2006

Scrolling routine

I thought I would discuss the scrolling routine so that everyone can see what I'm doing. The Spectrum has no scrolling hardware, so everything has to be done by the processor, and because the scrolling area is constantly moving I have to re-render the whole of it every frame. In pseudo code the routine is:

            Initialise
outer_loop: read jump address, graphic address and screen address from table
            if jump address high byte is 0 then jump to exit
inner_loop: pop graphics into registers
            jump to jump address
return_from_jump: advance to the next line of the screen
            if more lines remain in this character row then jump to inner_loop
            jump to outer_loop

exit:       restore
            return

jump_address: push registers to screen
            jump to return_from_jump


As I have said before, the fastest way to access memory on the Z80 is to use the stack: it takes 10 T-states to pop 16 bits from memory, whereas it takes 16 T-states to read 16 bits normally with a "ld hl,(NN)" instruction (and 20 with the other register pairs). The problem with using the stack pointer is that we cannot do anything else that would use the stack (i.e. call sub-routines or allow any interrupts), so initialisation looks like this:
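To put rough numbers on that, here is a small sketch in Python (not Z80) comparing the ways of reading 16-bit words. The T-state counts are the documented Z80 figures (pop 10, ld hl,(NN) 16, the ED-prefixed ld rr,(NN) forms 20); contended memory on the Spectrum is ignored, and the helper name is made up for illustration.

```python
# T-states per 16-bit read, from the documented Z80 instruction timings.
# Contended-memory delays on the Spectrum are ignored (assumption).
T_STATES = {
    "pop rr": 10,
    "ld hl,(NN)": 16,
    "ld de,(NN)": 20,   # ED-prefixed form, also used for bc and sp
}

def read_cost(instruction, words):
    """Total T-states to read `words` 16-bit values (hypothetical helper)."""
    return T_STATES[instruction] * words

# Cost of loading the four graphic words needed for one screen line.
for name in T_STATES:
    print(f"{name:12s} 4 words = {read_cost(name, 4)} T-states")
```

The pop route also advances the pointer for free, which a fixed-address load does not.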

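Scroll_Render:  ld   de,SR_renderData     ; address of the render table
                di                        ; no interrupts while sp is borrowed

                ld   (SR_sp_restore+1),sp ; save the real stack pointer for the exit code
                ld   (SR_outer_loop+1),de ; point the outer loop at the start of the table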


So this little bit of code disables the interrupts, saves the sp register into some self-modifying code that will restore the stack pointer on exit, and then saves the address of the table (SR_renderData) into another bit of self-modifying code. The Z80 has very limited instructions for setting the value of sp, only:

                ld   sp,NN
                ld   sp,(NN)
                ld   sp,hl
                ld   sp,ix
                ld   sp,iy


So to set sp to a specific value I use the ld sp,NN instruction and set the value of NN by writing to the operand bytes of that instruction (the address of the opcode +1).
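As an illustration of what that patching does to memory, here is a minimal Python model. The three-byte buffer standing in for RAM and the patch_operand helper are assumptions made for the sketch; the only Z80 facts relied on are that ld sp,NN is the opcode byte #31 followed by NN stored low byte first.

```python
LD_SP_NN = 0x31  # opcode byte for "ld sp,NN"

def patch_operand(memory, opcode_addr, value):
    """Write a 16-bit value into the two bytes after the opcode (little-endian)."""
    memory[opcode_addr + 1] = value & 0xFF         # low byte first
    memory[opcode_addr + 2] = (value >> 8) & 0xFF  # then high byte

# A tiny stand-in for RAM holding "ld sp,#ffff" at offset 0 (assumed layout).
memory = bytearray([LD_SP_NN, 0xFF, 0xFF])
patch_operand(memory, 0, 0x8000)   # what "ld (label+1),de" does with de = #8000
print(memory.hex())  # 310080
```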

The outer loop then has to read the values from the table

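SR_outer_loop:  ld   sp,#ffff             ; operand patched: current position in the table
                pop  hl                   ; jump address
                ld   a,h
                and  a
                jr   z,SR_exit            ; high byte of 0 marks the end of the table
                ld   (SR_prepare_jump+1),hl

                pop  ix                   ; graphic address
                pop  iy                   ; screen address
                ld   (SR_outer_loop+1),sp ; remember the table position for next time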


So the first ld sp,NN is initially modified with the start address of the table. The values are then popped off the stack (advancing sp at the same time), and the last instruction stores the new value of sp back into the instruction that loads sp, ready for the next time around the loop. Oh, and a jump instruction (coming up in the next section) is also modified so that the jump goes to the correct place.
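The table format implied by those three pops can be sketched in Python as below. This is only a guess at the layout from the order of the pops (jump address, graphic address, screen address, each a little-endian word, ended by a word whose high byte is zero); the function name and the addresses are made up.

```python
import struct

def build_render_table(entries):
    """Pack (jump, graphics, screen) address triples as little-endian words.

    A final zero word acts as the terminator: the outer loop exits when the
    high byte of the jump address is 0.
    """
    table = bytearray()
    for jump, graphics, screen in entries:
        if jump >> 8 == 0:
            raise ValueError("jump address high byte must be non-zero")
        table += struct.pack("<HHH", jump, graphics, screen)
    table += struct.pack("<H", 0)  # terminator entry
    return bytes(table)

# Hypothetical addresses, for illustration only.
table = build_render_table([(0x9000, 0xA000, 0x4020)])
print(table.hex())  # 009000a020400000
```

One consequence of the terminator test is that no line routine can live in the first 256 bytes of memory, which is ROM on the Spectrum anyway.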

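SR_inner_loop:  ld   sp,ix                ; sp -> graphics for this line
                pop  hl                   ; load the four tile patterns
                pop  de
                pop  bc
                pop  af
                ld   sp,iy                ; sp -> screen address for this line
SR_prepare_jump:jp   #ffff                ; operand patched by the outer loop
SR_jump_return: ld   de,8                 ; 4 words of graphics per line
                add  ix,de
                inc  iyh                  ; next pixel line down
                ld   a,iyh
                and  7
                jr   nz,SR_inner_loop     ; loop until the low 3 bits wrap to 0
                jr   SR_outer_loop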


In this section we prepare the registers with the graphics for this line of the screen. The IX register holds the base of the graphics and is used to set sp, the graphics are popped into the registers, and sp is then set to the screen address (IY). We then jump to the correct routine (self-modified earlier). The return address from the routine is just below the jump: it advances the graphic address and increments the screen address to the next line down. Once the low three bits of the screen address high byte wrap to zero we go back to the outer loop for the next table entry, and we keep going until the end of the table.
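The reason a plain inc iyh moves down one pixel line comes from the way the Spectrum display file is laid out. A small Python sketch of the standard address calculation (the function name is just for illustration) shows it:

```python
SCREEN_BASE = 0x4000  # start of the Spectrum display file

def screen_address(y, x_byte=0):
    """Address of the byte at pixel line y (0-191), byte column x_byte (0-31)."""
    return (SCREEN_BASE
            | ((y & 0xC0) << 5)   # which third of the screen
            | ((y & 0x07) << 8)   # pixel line inside the character row
            | ((y & 0x38) << 2)   # character row inside the third
            | x_byte)

# Stepping down one pixel line inside a character row adds #0100,
# which is exactly what "inc iyh" does.
for y in range(3):
    print(y, hex(screen_address(y)))
```

After eight lines the simple increment no longer lands on the next line (line 8 is at #4020, not #4800), which is presumably why the table supplies a fresh screen address for every group of eight lines.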


SR_exit:
SR_sp_restore:  ld   sp,#ffff             ; operand patched with the saved stack pointer
                ei
                ret


This section just restores sp to its original value, re-enables the interrupts and then returns to the calling routine (in this case the main loop).

Now the routines that are called to render each line will actually be constructed from the map section that we want to render, so they will change all the time (and I'll be writing some code later to create them), but they look like this:

SR_line_00:     push hl
                push de
                push bc
                push af
                push hl
                push de
                push bc
                push af
                push hl
                push bc
                push de
                push af
                push hl
                push bc
SR_line_00ret:  jp   SR_jump_return


This code is relatively simple, and since each opcode is only 1 byte it will be relatively simple to construct. The jp always returns to the same place for every one of these routines, so that part is even easier.
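To show how little there is to generate, here is a sketch in Python of building one of these routines as raw bytes. It is not the generator for the game (that will be written in Z80); the function name and the return address are invented, and only the opcode bytes (push bc/de/hl/af = #c5/#d5/#e5/#f5, jp NN = #c3) are standard Z80 encodings.

```python
# Opcode bytes for the four 16-bit pushes and for "jp NN".
PUSH_OPCODES = {"bc": 0xC5, "de": 0xD5, "hl": 0xE5, "af": 0xF5}
JP_NN = 0xC3

def build_line_routine(registers, return_addr):
    """Emit one push per register name, then a jp back to return_addr."""
    code = bytearray(PUSH_OPCODES[r] for r in registers)
    code.append(JP_NN)
    code += return_addr.to_bytes(2, "little")
    return bytes(code)

# Register order of SR_line_00; 0x8123 is a made-up SR_jump_return address.
line_00 = ["hl", "de", "bc", "af"] * 2 + ["hl", "bc", "de", "af", "hl", "bc"]
routine = build_line_routine(line_00, 0x8123)
print(len(routine), routine.hex())
```

Each routine comes out at 17 bytes: 14 pushes for 28 bytes of screen, plus the 3-byte jp.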

Now those who are still awake and have reached this section will have noticed that I have avoided using the exchange register set in this routine completely. This is because I am currently only using 4 different graphic tiles per line for the map; if I used the exchange registers I could get a total of 8 different tiles on each line. But I have to trade that off against the amount of time it will take to set up, i.e. 8 pops rather than the 4 it takes just now, and the line rendering code also becomes more complicated as it will have to swap the register sets as well as do its pushes, so rendering will slow down. I do have the option of moving the line setup code into the rendering code as well. This would be relatively simple and would mean that only the registers that are used would be popped (so if only one is used then only that one is popped), which would give a lot more flexibility.
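A back-of-the-envelope estimate of that trade-off, sketched in Python. It adds up the documented T-states for one pass of SR_inner_loop plus a 14-push line routine, and assumes a 48K Spectrum frame of 69888 T-states and a 128-line scrolling area. Memory contention, the outer loop overhead and the cost of the exx / ex af,af' swaps are all left out, so treat the result as a lower bound rather than a measurement.

```python
# Documented Z80 T-states; Spectrum memory contention is ignored (assumption).
T = {"pop": 10, "push": 11, "ld sp,ix": 10, "jp": 10, "ld de,NN": 10,
     "add ix,de": 15, "inc iyh": 8, "ld a,iyh": 8, "and n": 7, "jr taken": 12}

def line_cost(pops=4, pushes=14):
    """T-states for one pass of SR_inner_loop plus one line routine."""
    setup = T["ld sp,ix"] + pops * T["pop"] + T["ld sp,ix"]  # ld sp,iy costs the same
    render = T["jp"] + pushes * T["push"] + T["jp"]          # jump in, pushes, jump back
    advance = (T["ld de,NN"] + T["add ix,de"] + T["inc iyh"]
               + T["ld a,iyh"] + T["and n"] + T["jr taken"])
    return setup + render + advance

FRAME = 69888  # T-states per frame on a 48K Spectrum
for pops in (4, 8):
    total = line_cost(pops) * 128          # 128 pixel lines of scrolling area
    print(pops, line_cost(pops), total, f"{total / FRAME:.0%}")
```

On these assumptions the four extra pops cost 40 T-states per line, a little over 5000 T-states across the whole scrolling area.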

Anyway that's all for just now, back to the assembler...

All change

Mike has been educating me on the wonders of Cascading Style Sheets so it's all change today on the blog.

Last night I optimised the rendering code quite a bit and managed to shave off a lot of time (around 40 scan lines!!!). It was all down to reorganising things and using tables rather than calculating.

The picture attached shows how much time I've saved in the timing bars (cyan is the render time for the scrolling area, red is the sprite rendering time) so you can see the change from the last post.

So I have now moved some of the calculation to tables and I've made better use of the available registers. Tonight I'll post the code for everyone to look at (and criticise :) )

Monday, October 23, 2006

Pictures!!!

Mike has been pestering me to put up an in-progress picture, and I have to emphasise that this is very much in-progress (including programmer art).

Basically this shows the scrolling area being redrawn every frame with a sprite being rendered on top (actually it is currently 4 sprites in the same position).

The border colours are showing how long is being taken for the 2 routines (Cyan is the scroll render and red is the sprite drawing). So the scroll render is taking around 2/3rds of a full frame and we are only rendering the top 2/3rds of the screen (128 pixels).

I've been talking tactics with Mike and I've had to rethink some things over the weekend as the Z80 was just not fast enough for what I was thinking... it really takes some time to readjust to how little can really be done by these processors, we really are spoiled by current machines!

I still have to write the processing part of the scrolling which will create the code to render the screen and rotate the tiles of the actual screen, but we have already decided that the game will run in 2 frames (i.e. it will update at 25fps, the same as the Plus/4 and the C64) so I have plenty of time.

Mike also pointed out that the 128K Spectrum would allow me to double buffer the screen (which would mean I would not need to use the floating bus technique), currently I'm a purist aiming at the 48K Spectrum but I will be doing a 128K version as well so I'll keep that knowledge in my back pocket.

I need to get back to this processing stuff now so that I can get it all moving...

Wednesday, October 18, 2006

Old and New

So I've spent a while now implementing scrolling on the Spectrum. It's the first Z80 code of any size that I've written for a long time, and I'd forgotten how much processors have moved on over the years. I've been frustrated by little things that you take for granted on x86, MIPS etc.

1. No instruction to add a constant value to a 16-bit register, only register to register.
2. Rotate instructions only rotate by one bit at a time... oh to be able to rotate by a variable amount.
3. Lots of registers but a lack of addressing modes. A lot of instructions only allow register to register interactions, which means you need to shuffle values around all the time (and there is a lack of shuffling instructions, at least for 16-bit registers).

Some things are good though, such as the ability to use the flags register to return state (mainly because a lot of instructions do not set the flags); this makes things much easier in many ways.

More modern processors make it much easier and what is natural these days is far too slow for the old processors, I need to think much more about minimising the amount of work that needs to be done by the Z80, a readjustment in my thinking is needed...

Saturday, October 14, 2006

Waiting for the right moment!

The Spectrum is an interesting beast because it basically has no hardware and you need to do everything in software. When rendering, the best way to do things is to render behind the raster that is sending everything to the TV screen.

The only synchronisation that is available is with the vertical sync of the raster, but this leaves a lot of time (in terms of machine cycles) before the raster reaches the start of the screen, which is where we want to start rendering so that we draw just behind it. If we render beforehand then we will see flicker; if we render just behind, the display will be rock solid (which is what we are aiming for).

Fortunately there is a little-known way to do this on the Spectrum, once again a technique which was used in Cobra (a technical masterpiece): it uses the floating bus (see http://www.ramsoft.bbk.org/floatingbus.html for an explanation). The code is relatively simple:


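SyncToScreen:   ld   bc,#3f28             ; b = compare value, c = high byte of the port address
STS_lp:         ld   a,c
                in   a,(#ff)              ; read the floating bus (port #28ff)
                cp   b
                jr   nc,STS_lp            ; keep waiting while the value read is >= #3f
                ret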


This routine basically waits until an attribute is being output by the ULA of the Spectrum. If it is started during the border region of the display then it will wait for the first line of the screen... exactly what we need!
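The exit condition of that loop can be modelled in a few lines of Python. This is only a model of the comparison logic, not of the ULA timing: the sample values are made up, on the assumption that the port reads #ff while nothing is being fetched and a smaller value once attribute data appears.

```python
def sync_to_screen(bus_values, threshold=0x3F):
    """Count how many reads happen before one falls below `threshold`.

    Mirrors the loop: "cp b" followed by "jr nc" keeps looping while the
    value read is >= b, and drops out on the first value below it.
    """
    for reads, value in enumerate(bus_values):
        if value < threshold:
            return reads
    return None  # never synchronised in this sample

# Made-up samples: #ff while the ULA is idle in the border, then an attribute byte.
print(sync_to_screen([0xFF, 0xFF, 0xFF, 0x38]))  # 3
```

Note that the test only triggers on values below #3f, so the screen data at the top of the display has to be chosen with that in mind.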

So now our armoury of techniques is filling up I really need to get on and write some real code...

Wednesday, October 11, 2006

First Post

Well here we are, first post on a journey back into time.

It's been around 20 years since I last touched a real Spectrum to program it, but here I am back again!!!

Mike has given me this chance to put together a Spectrum version of Xenon TriOxide, so I've been re-acquainting myself with an old friend.

The game is a scrolling shoot em up, and the best scrolling method on the Spectrum was used in Cobra, where the screen is redrawn every frame by "push"ing the map onto the screen.

The technique is fairly simple: the stack pointer is set to a screen address and then code is executed which draws the screen. This code is created procedurally from the game map and is a simple tower of push instructions. In pseudo code it is reasonably simple

This is the fastest way to redraw the screen.
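A tiny Python model of what a tower of pushes does to a screen line may help. The buffer standing in for one 32-byte line and the tile patterns are made up; the push behaviour itself (high byte to sp-1, low byte to sp-2, sp reduced by 2) is how the Z80 works.

```python
def push(memory, sp, value):
    """Model of a Z80 push: high byte to sp-1, low byte to sp-2, sp drops by 2."""
    memory[sp - 1] = (value >> 8) & 0xFF
    memory[sp - 2] = value & 0xFF
    return sp - 2

# One 32-byte screen line, modelled as a plain buffer (assumption).
line = bytearray(32)
sp = len(line)               # start just past the right-hand edge
for word in (0xAAAA, 0x5555, 0xFFFF):   # made-up tile patterns
    sp = push(line, sp, word)

print(sp, line[sp:].hex())   # 26 ffff5555aaaa
```

Because the stack grows downwards the line is filled from right to left, so the stack pointer has to start at the right-hand end of the area being drawn.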

Tomorrow I'll talk about how to synchronise the screen so that we don't get any flicker.
