Okay well apart from it returning completely the wrong values, this tile renderer works quite well.
-
Oh hey, if you give the tile renderer the correct VRAM address to render from, the quality of results improves somewhat.
-
Hot damn, tiler might be working again! Now with less bufferbloat, and also the weird partial renders and flipped tiles stuff might be working, though I still need to write tests for that.
-
Oh but I did reintroduce the latency bubble where the renderer is forced into idle state for a cycle before the next request can proceed. That should still be easy to eliminate, though maybe I'll hold off on tweaking the timing until the functionality tests are done.
-
OTOH the stalling works perfectly: if a consumer's slow, everything just slows down but keeps pipelining as much as possible, without tripping over itself.
Now I suppose ideally all the as yet untested features would also work...
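For anyone wondering what "slows down but keeps pipelining" means in practice, here's a toy cycle-by-cycle model. It is a sketch under assumptions, not the actual RTL: the `simulate` function, the three-stage depth and the `ready_pattern` list are all made up for illustration. Each stage moves forward whenever the slot ahead of it is free, so a stalled consumer holds data in place without dropping or reordering anything.

```python
def simulate(items, ready_pattern, depth=3):
    """Toy model of a stallable pipeline (hypothetical, not the real design).

    items         -- values fed in at the front, in order
    ready_pattern -- one bool per cycle: is the consumer ready this cycle?
    depth         -- number of pipeline stages (assumed value)
    """
    stages = [None] * depth
    source = list(items)
    out = []
    for ready in ready_pattern:
        # The consumer takes the last stage's item only when it is ready.
        if ready and stages[-1] is not None:
            out.append(stages[-1])
            stages[-1] = None
        # Walk back to front so a slot freed this cycle is refilled this cycle.
        for i in range(depth - 1, 0, -1):
            if stages[i] is None and stages[i - 1] is not None:
                stages[i], stages[i - 1] = stages[i - 1], None
        # Accept a new item whenever the first stage is free.
        if stages[0] is None and source:
            stages[0] = source.pop(0)
    return out


fast = simulate(range(5), [True] * 10)
slow = simulate(range(5), [True, False, False, True] * 10)
print(fast)
print(slow)
```

Both runs deliver the same items in the same order; the slow consumer only changes how many cycles it takes.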
-
Wow horizontal scroll offsets... work? I can start a render partially into a tile, and stop a render short of the tile's end, and the output is correct in all cases, regardless of the number of bytes that need to be read, skipped over, partially consumed, ...
Only horizontal flip left to test; if that works I may cry a little.
-
Flipped tiles... also work? This might just be fully working now?...
Well aside from that pipeline bubble that's easy to solve once I lock in tests, but... it works? I think?
-
Hahaha, even flipping tiles combined with complicated partial rendering works! Given a 16-pixel tile, I can tell it to flip the tile, then render pixels 2 through 7.
Although, hmm, I might be doing the flipping and the indexing in the wrong order: if I tell it I want columns 2 to 7, should that be pre- or post-flip?...
-
... yeah I'm doing that bit backwards, slicing first then flipping the slice. Hopefully that's an easy fix, presumably I just need to swap the initial and final bit drops around, I think...
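The two orderings are easy to compare with plain list slicing. This is only an illustration in Python, with pixel values standing in for their original column index; the function names are made up and none of this is the hardware implementation.

```python
def flip_then_slice(row, first, last):
    """Flip the whole row, then keep columns first..last (inclusive)."""
    return row[::-1][first:last + 1]


def slice_then_flip(row, first, last):
    """Keep columns first..last (inclusive), then flip just that slice."""
    return row[first:last + 1][::-1]


row = list(range(16))  # a 16-pixel tile row; each value is its source column
print(flip_then_slice(row, 2, 7))  # [13, 12, 11, 10, 9, 8]
print(slice_then_flip(row, 2, 7))  # [7, 6, 5, 4, 3, 2]
```

Same request, completely different source pixels, so the order really does matter.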
-
Yup easy fix, I was swapping the start and end skips when flipping the tile, so effectively undoing the flip in terms of which pixels to output. Now flipping and slicing works properly!
So... I think this module is done and works? Given the RAM address of a row of tile pixels, I can render out those pixels at 1/2/4/8bpp, accounting for pixel-wise horizontal scroll and optional horizontal flipping.
Now I just need to chain those together into a scanline and... profit??
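As a rough software reference for what the module does, something like the sketch below could be used to generate expected values for tests. Big caveats: the packed pixel layout (leftmost pixel in the most significant bits of each byte), the function names and the inclusive `first`/`last` column arguments are all assumptions for illustration, not the real memory format or interface.

```python
def unpack_row(data, bpp, width):
    """Unpack `width` pixels of `bpp` bits each from packed bytes.

    Assumption: the leftmost pixel sits in the most significant bits
    of each byte. The real tile format may differ.
    """
    assert bpp in (1, 2, 4, 8)
    per_byte = 8 // bpp
    mask = (1 << bpp) - 1
    pixels = []
    for i in range(width):
        byte = data[i // per_byte]
        shift = 8 - bpp * (i % per_byte + 1)
        pixels.append((byte >> shift) & mask)
    return pixels


def render_row(data, bpp, width, first, last, flip=False):
    """Return columns first..last (inclusive) of one tile row.

    The flip is applied to the whole row before slicing.
    """
    pixels = unpack_row(data, bpp, width)
    if flip:
        pixels = pixels[::-1]
    return pixels[first:last + 1]


# A 16-pixel row at 4bpp whose pixel values equal their column index.
data = bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF])
print(render_row(data, 4, 16, 2, 7))             # [2, 3, 4, 5, 6, 7]
print(render_row(data, 4, 16, 2, 7, flip=True))  # [13, 12, 11, 10, 9, 8]
```

A model like this is deliberately naive: it unpacks the whole row and slices afterwards, which is exactly what the hardware avoids doing, but that makes it a handy oracle to compare against.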
-
I am dreading a little bit what this is going to look like after synthesis, because this feels like an absolute clowncar of barrel shifters, adders, muxes and other cursed shit. But I optimized the code for being understandable, a fair bit of the source text should still evaporate during static elaboration, and hopefully Yosys can do further intelligent things to it...
I guess in fairness the Verilog module I'm reimplementing is also messy af on the inside, and that works apparently so...
-
Oh right, wait, there's also that pipeline bubble I need to squeeze out. Not a huge deal tbh, I think everything would still run plenty fast enough, but it offends me that I can't _quite_ stream out a pixel per cycle when it's definitely possible.