Thursday, October 26, 2006
Scrolling routine
So I thought I would discuss the scrolling routine to let everyone see what I'm doing. The Spectrum has no scrolling hardware, so everything has to be done with the processor, and because the scrolling area is constantly moving I have to re-render the whole of it every time. In pseudo code the routine is:
                  Initialise
outer_loop:       Read jump address, graphic address and screen address from table
                  if jump address high byte is 0 then jump to exit
inner_loop:       Pop graphics into registers
                  jump to jump_address
return_from_jump: advance to the next line of the screen
                  if still in the same character row then jump to inner_loop
                  jump to outer_loop
exit:             restore
                  return.
jump_address:     push registers to screen
                  jump to return_from_jump
As I have said before, the fastest way to access memory on the Z80 is to use the stack: it takes 10 T-states to pop 16 bits from memory (and 11 to push them), whereas reading 16 bits with a "ld hl,(NN)" instruction takes 16 T-states (20 for the other register pairs). The problem with borrowing the stack pointer is that we cannot do anything else that would use the stack (i.e. call sub-routines or allow any interrupts), so initialisation looks like this:
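Scroll_Render:   ld de,SR_renderData       ; de = start of the render table
                 di                        ; no interrupts while sp is borrowed
                 ld (SR_sp_restore+1),sp   ; patch the "ld sp,NN" at SR_sp_restore
                 ld (SR_outer_loop+1),de   ; patch the "ld sp,NN" at SR_outer_loop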
So this little bit of code disables the interrupts, saves the sp register into some self-modifying code (so that the current stack pointer value can be restored at the end), and then saves the address of the table (SR_renderData) into another bit of self-modifying code. The Z80 has very limited instructions for setting the value of sp, only:
ld sp,NN
ld sp,(NN)
ld sp,hl
ld sp,ix
ld sp,iy
So to set sp to a specific value I use the ld sp,NN instruction and set the value of NN by writing to the address of its operand (the address of the opcode + 1).
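To make the patching concrete, here is a minimal Python sketch that models memory as a byte array and overwrites the operand of an "ld sp,NN" instruction. The opcode value (0x31, operand stored low byte first) is the standard Z80 encoding; the address 0x8000 for SR_outer_loop and the helper names are assumptions made up for the illustration.

```python
LD_SP_NN = 0x31  # Z80 opcode for "ld sp,NN"; the 16-bit operand follows, low byte first


def patch_operand(memory, instr_addr, value):
    """Overwrite the 16-bit operand of the 3-byte instruction at instr_addr."""
    memory[instr_addr + 1] = value & 0xFF         # low byte lives at opcode + 1
    memory[instr_addr + 2] = (value >> 8) & 0xFF  # high byte lives at opcode + 2


def read_operand(memory, instr_addr):
    """Read back the 16-bit operand of the instruction at instr_addr."""
    return memory[instr_addr + 1] | (memory[instr_addr + 2] << 8)


memory = bytearray(0x10000)   # 64K address space
SR_OUTER_LOOP = 0x8000        # hypothetical address of the "ld sp,#ffff"
memory[SR_OUTER_LOOP:SR_OUTER_LOOP + 3] = bytes([LD_SP_NN, 0xFF, 0xFF])

# Equivalent in spirit to "ld (SR_outer_loop+1),de" with de = 0x9000
patch_operand(memory, SR_OUTER_LOOP, 0x9000)
```

The same idea applies to the jp at SR_prepare_jump: only the two operand bytes change, the opcode byte stays put.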
The outer loop then has to read the values from the table:
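SR_outer_loop:   ld sp,#ffff               ; operand patched with the current table position
                 pop hl                    ; hl = jump address
                 ld a,h
                 and a                     ; a zero high byte marks the end of the table
                 jr z,SR_exit
                 ld (SR_prepare_jump+1),hl ; patch the jp in the inner loop
                 pop ix                    ; ix = graphic address
                 pop iy                    ; iy = screen address
                 ld (SR_outer_loop+1),sp   ; remember the table position for next time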
So the first ld sp,NN is initially modified with the start address of the table. The values are then popped off the stack (advancing sp at the same time), and the last instruction stores the new value of sp back into the instruction that loads sp, ready for the next time around the loop. Oh, and a jump instruction (coming up in the next section) is also modified so that the jump goes to the correct place.
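The table walk can be modelled in a few lines of Python. This is only a sketch under assumptions: the three-word entry order (jump address, graphic address, screen address) and the zero-high-byte terminator come from the listing above, while the table address 0x9000 and all the entry values are invented for the example.

```python
import struct


def pop_word(memory, sp):
    """Model a Z80 pop: read a little-endian word at sp and return (word, sp + 2)."""
    (word,) = struct.unpack_from("<H", memory, sp)
    return word, sp + 2


def read_entries(memory, table_addr):
    """Walk the render table until an entry whose jump address has a zero high byte."""
    entries = []
    sp = table_addr
    while True:
        jump, sp = pop_word(memory, sp)
        if (jump >> 8) == 0:          # mirrors "ld a,h / and a / jr z,SR_exit"
            break
        gfx, sp = pop_word(memory, sp)     # pop ix
        screen, sp = pop_word(memory, sp)  # pop iy
        entries.append((jump, gfx, screen))
    return entries


# Hypothetical two-entry table followed by the terminator word.
table = struct.pack("<7H", 0xC000, 0xA000, 0x4020,
                           0xC020, 0xA040, 0x4040,
                           0x0000)
memory = bytearray(0x10000)
memory[0x9000:0x9000 + len(table)] = table

entries = read_entries(memory, 0x9000)
```

One consequence of testing only the high byte is that a line routine can never live below address 0x0100, which is no loss on the Spectrum since that area is ROM.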
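SR_inner_loop:   ld sp,ix                  ; sp = graphics for this line
                 pop hl                    ; load the four tile patterns
                 pop de
                 pop bc
                 pop af
                 ld sp,iy                  ; sp = screen address for this line
SR_prepare_jump: jp #ffff                  ; operand patched by the outer loop
SR_jump_return:  ld de,8                   ; 4 register pairs = 8 bytes of graphics
                 add ix,de                 ; advance to the next line of graphics
                 inc iyh                   ; next pixel line (undocumented instruction)
                 ld a,iyh
                 and 7                     ; still inside the same character row?
                 jr nz,SR_inner_loop
                 jr SR_outer_loop          ; no: fetch the next table entry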
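In this section we prepare the registers with the graphics for this line of the screen. The IX register holds the base of the graphics, which is used to set sp; the graphics are popped into the registers and then sp is set to the screen address (IY). We then jump to the correct routine (self-modified earlier). The routine returns to the address just below, where the graphic address is advanced by 8 bytes (the four register pairs just popped) and the screen address is moved down one pixel line by incrementing its high byte. While the low three bits of that high byte are non-zero we are still inside the same character row, so we go around the inner loop again; otherwise we go back to the outer loop for the next table entry, and we keep going until the end of the table.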
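The "inc iyh / and 7" test relies on the Spectrum's screen layout, where the low three bits of the address high byte hold the pixel line within a character row. A small Python sketch of that layout follows; the address formula is the standard 48K screen mapping, and the function names are hypothetical.

```python
def screen_address(x_byte, y):
    """Address of byte column x_byte (0-31) on pixel line y (0-191) of the display file."""
    return (0x4000
            | ((y & 0xC0) << 5)   # screen third -> address bits 11-12
            | ((y & 0x07) << 8)   # pixel line within the character row -> bits 8-10
            | ((y & 0x38) << 2)   # character row within the third -> bits 5-7
            | x_byte)


def walk_character_row(start):
    """Mirror 'inc iyh / ld a,iyh / and 7 / jr nz': bump the high byte until its low 3 bits wrap."""
    addrs = []
    addr = start
    while True:
        addrs.append(addr)
        addr = (addr + 0x100) & 0xFFFF   # inc iyh
        if ((addr >> 8) & 7) == 0:       # and 7 -> zero means the row is finished
            break
    return addrs


# Start at the top pixel line of the second character row (y = 8).
row = walk_character_row(screen_address(0, 8))
```

Starting from the top of a character row, the walk visits eight addresses that match pixel lines 8 to 15. The next character row needs a fresh address, which is why the outer loop fetches a new table entry at that point.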
SR_exit:
SR_sp_restore:   ld sp,#ffff               ; operand patched with the caller's sp
                 ei
                 ret
This section just restores sp to its original address, re-enables the interrupts and then returns to the calling routine (in this case the main loop).
Now, the routines that are called to render each line will actually be constructed from the map section that we want to render, so they will change all the time (and I'll be writing some code later to create them), but they look like this:
SR_line_00:      push hl
                 push de
                 push bc
                 push af
                 push hl
                 push de
                 push bc
                 push af
                 push hl
                 push bc
                 push de
                 push af
                 push hl
                 push bc
SR_line_00ret:   jp SR_jump_return
This code is relatively simple, and since each push opcode is only 1 byte it will be relatively simple to construct. The jp at the end always returns to the same place for every one of these routines, so that part is even easier.
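As a rough idea of what constructing such a routine could look like, here is a Python sketch that emits the bytes for a line routine from a list of register-pair names. The opcode values are the standard Z80 encodings for push and jp; the generator function, the input format and the address used for SR_jump_return are assumptions for the example, not the generator described in the post.

```python
# Standard Z80 one-byte opcodes for the four pushes used by the line routines.
PUSH_OPCODES = {"bc": 0xC5, "de": 0xD5, "hl": 0xE5, "af": 0xF5}
JP_NN = 0xC3  # "jp NN", followed by the target address, low byte first


def build_line_routine(pairs, return_addr):
    """Emit one push per register pair, then a jp back to return_addr."""
    code = bytearray(PUSH_OPCODES[p] for p in pairs)
    code += bytes([JP_NN, return_addr & 0xFF, return_addr >> 8])
    return bytes(code)


# The same sequence as SR_line_00 in the listing above.
SR_line_00 = ["hl", "de", "bc", "af", "hl", "de", "bc", "af",
              "hl", "bc", "de", "af", "hl", "bc"]
SR_JUMP_RETURN = 0x8030  # hypothetical address of SR_jump_return

routine = build_line_routine(SR_line_00, SR_JUMP_RETURN)
```

With 14 pushes the routine is 17 bytes long and writes 28 bytes of screen data. Because push moves sp downwards, the first push in the list lands at the highest address of the line.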
Now, those who are still awake and have reached this section will have noticed that I have avoided using the exchange register set in this routine completely. This is because I am currently only using 4 different graphic tiles per line for the map; if I used the exchange registers I could get a total of 8 different tiles on each line. But I have to trade that off against the amount of time it will take to set up, i.e. 8 pops and not 4 as it is just now. It would also complicate the line rendering code, as it would have to swap the register sets as well as push, and all of this will slow down rendering. I do have the option, though, of moving the line setup code into the rendering code as well. This would be relatively simple and would mean that only the registers that are used would be popped (so if only one is used then only that one is popped), which would give a lot more flexibility.
Anyway, that's all for just now, back to the assembler...
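A back-of-the-envelope Python sketch of that trade-off, using the documented Z80 timings (pop 10 T-states, exx 4, ex af,af' 4). It assumes the second set is reached with one exx and one ex af,af' during setup, and it ignores the extra swaps that the rendering code itself would need, so the 8-tile figure is a lower bound rather than a measurement.

```python
T_POP = 10    # pop rr
T_EXX = 4     # exx (swaps bc, de, hl with the alternate set)
T_EX_AF = 4   # ex af,af'


def setup_cost(register_pairs, use_alternate_set=False):
    """T-states spent loading tile graphics into registers for one line."""
    cost = register_pairs * T_POP
    if use_alternate_set:
        cost += T_EXX + T_EX_AF   # one swap of each to reach the second set
    return cost


four_tiles = setup_cost(4)                            # current scheme
eight_tiles = setup_cost(8, use_alternate_set=True)   # with the exchange registers
extra_per_line = eight_tiles - four_tiles
```

Under these assumptions the setup for a line goes from 40 to at least 88 T-states, i.e. at least 48 extra T-states on every rendered line.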
Comments:
So that's the trick, you cache all the patterns on a line. In a monochrome shooter you probably want a discrete background anyway.
I assume you'll still do smooth scrolling, or will that be too limiting? Also, aside from the interrupt flag, can you use the flag register freely?
Another thing. What speed does the Z80's memory bus run at, and how is it shared with the Spectrum's graphics chip?
/doynax
Well, the line cache is for the tiles that are used on a single line, so it is more like a character bank (with 16 pixel wide characters), and the code that is constructed (with all the pushes in it) is like the actual character map (except with opcodes rather than character references).
For the actual scrolling I need to construct a routine that rotates the character bank (the line cache), or if I have the memory I may store the tiles pre-rotated.
The Z80 in the Spectrum runs at 3.5 MHz and it shares a block of RAM (16K running from $4000 to $7fff) which is contended with the graphics chip.
So now you have 4 double-wide chars - let's call them A, B, D & H.
How do you rotate them so that a character map like AAABADAH works? To me it looks like you need four different As already. Is that taken care of by the map layout in Spectrum games?
(Being a Gollop brothers fan I haven't played any Spectrum games which scroll, so feel free to label my question with "Doh!" if appropriate :)
--
TNT
True, I just meant that four pairs of patterns per line will be severely limiting, especially when you have to account for horizontal scrolling. Of course it's still far from intuitively obvious what kind of results we can expect. I guess vertical scrolling would've been easier though.
I'm aware that the Z80 itself is clocked at 3.5 MHz. But at what speed does it access memory, and is it ever blocked by the graphics chip as on the C64?
From looking at the instruction timings I'd guess that it runs at quarter speed and that the discrepancies can be accounted for by some sort of prefetch cache.
Of course nothing's to say that Zilog didn't design the Z80 to access memory at an arbitrary rate, it's just that the timings seem suspiciously close to 4 T-states per byte to me.
I guess what I'm asking is can you count cycles on a Z80 simply by adding up the T-states, or will code like your push loop in reality execute at 12 T-states per instruction?
/doynax
OK, the Spectrum has 224 T-states in a scan line (96 of which are in the border area), and you can cycle count in the same or a similar way to the C64 to time effects; see Starion or Uridium for that sort of thing, getting more colours on screen or extended border effects.
As to the map, the layout is limiting (one of the reasons that I want to extend my possible tile set to 8 tiles) and you really need to think in terms of pairs of tiles (as we will be scrolling the next one into this tile), so in your example AAABADAH would actually be the tile pairs
AA
AB
BA
AD
DA
AH
so you would need 6 tiles to display that map layout.
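The pair counting above can be reproduced with a short Python sketch. It assumes each map row is given as a string with one letter per tile, which is just a convenient stand-in for whatever format the map editor ends up using.

```python
def tile_pairs(row):
    """Return the distinct adjacent tile pairs in a map row, in order of first appearance."""
    seen = []
    for left, right in zip(row, row[1:]):
        pair = left + right
        if pair not in seen:
            seen.append(pair)
    return seen


pairs = tile_pairs("AAABADAH")
```

For AAABADAH this gives AA, AB, BA, AD, DA and AH, so six entries in the line's tile set even though the row only uses four different tiles.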
Mike and I have been discussing this with relevance to the Map editor and we are looking at ways to make this as transparent as possible.
I can put extra detail on the maps using sprites placed on the map (as I would not need to mask in the same way as game sprites).
Great, that makes timing a lot easier. A lot easier than on the C64 in fact, since you don't have to worry about badlines or interrupts.
Have you thought about schemes to replace the "graphics sets" at certain points in the map?
For example, one way to make use of the secondary register set would be to dynamically place an EXX instruction somewhere on the line, thus allowing you to replace the tileset once for every horizontal screen.
How do other games using this technique handle these kinds of problems?
/doynax
That is exactly what I'm planning to do with the code generation routines as I can just generate the correct code to render that line.
In fact you've just made me think: I could actually have multiple loads if necessary to display the map, which would lift a few restrictions. The main issue here is timing, as loading the registers takes up extra time, and on a 48K Spectrum I need to beat the raster to ensure that everything is displayed smoothly.
Actually, thinking about it, I need to advance the graphics bank register each line (to get to the next line of the graphics bank), so I am limited to 8 tiles per scan line... still, you can do a lot with that. I also plan to make it more dynamic so that we are not stuck with graphics zones on the map, so new tiles can be added as the scrolling moves through the map.
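To put a number on "beating the raster", this Python sketch adds up the documented Z80 T-states for one pass of the inner loop from the post, using a line routine with 14 pushes like SR_line_00. It deliberately ignores memory contention and the outer loop overhead, so treat the result as an optimistic estimate only.

```python
# Documented Z80 T-state counts for the inner loop instructions in the post.
INNER_LOOP = [
    ("ld sp,ix", 10),
    ("pop hl", 10), ("pop de", 10), ("pop bc", 10), ("pop af", 10),
    ("ld sp,iy", 10),
    ("jp NN", 10),             # SR_prepare_jump
    ("ld de,8", 10),
    ("add ix,de", 15),
    ("inc iyh", 8),
    ("ld a,iyh", 8),
    ("and 7", 7),
    ("jr nz (taken)", 12),
]
PUSHES_PER_LINE = 14      # as in SR_line_00
T_PUSH = 11               # push rr
T_JP = 10                 # the jp back to SR_jump_return
T_PER_SCANLINE = 224      # figure quoted in the thread


def line_cost():
    """T-states for one rendered line, ignoring contention."""
    return sum(t for _, t in INNER_LOOP) + PUSHES_PER_LINE * T_PUSH + T_JP


cost = line_cost()
scanlines = cost / T_PER_SCANLINE
```

On these figures one rendered line costs 294 T-states, roughly 1.3 scan lines' worth of time, which is why every extra register load matters.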