# Shrub Renderer
The shrub renderer is part of the background system. Each level probably has 1 or 0 `drawable-tree-instance-shrub`s, containing all of the shrubs in that level (if it has any shrubs).
Because the shrub renderer is part of the background system, actual DMA generation happens in `finish-background`.
## Original Design
In `shrub`, there are prototypes and instances. Each "prototype" defines a model (like a bush, tree, etc). Each "instance" is a particular placement of a prototype in the world.
Each "prototype" has 4 different geometries. Some of the geometries can be missing:
- prototype-generic-shrub
- prototype-shrubbery
- prototype-trans-shrubbery
- billboard
The first two are believed to have the same data, but if the shrub is very close to the player and partially off-screen, it must be scissored, and only the `generic` renderer supports scissoring.
The `prototype-trans-shrubbery` allows shrubs to fade away. It's likely that the format is extremely similar, or even the exact same.
The `billboard` is a single quad.
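Below is a minimal C++ sketch of how an instance might choose among these geometries. Everything in it is hypothetical: the type and function names, the per-instance flags (these notes do not cover how the game computes them), and in particular two assumptions. The first is that the billboard is the long-range option. The second is that a missing geometry falls back to the regular shrubbery. The sketch encodes only what the text states: generic is needed for scissoring, trans-shrubbery handles fading, and any geometry can be missing.

```cpp
#include <cassert>

// Hypothetical names for the four per-prototype geometries.
enum class ShrubGeom { GenericShrub, Shrubbery, TransShrubbery, Billboard, None };

// Which geometries this prototype actually has (any of them can be missing).
struct PrototypeGeoms {
  bool has_generic;
  bool has_shrubbery;
  bool has_trans;
  bool has_billboard;
};

// Hypothetical per-instance state; computing these flags is out of scope here.
struct InstanceState {
  bool needs_scissor;  // very close to the camera and partially off-screen
  bool fading;         // in the fade-out range
  bool far_away;       // ASSUMPTION: far enough that a single quad is acceptable
};

ShrubGeom pick_geometry(const PrototypeGeoms& p, const InstanceState& s) {
  if (s.far_away && p.has_billboard) {
    return ShrubGeom::Billboard;  // assumed long-range use of the single quad
  }
  if (s.needs_scissor && p.has_generic) {
    return ShrubGeom::GenericShrub;  // only the generic renderer supports scissoring
  }
  if (s.fading && p.has_trans) {
    return ShrubGeom::TransShrubbery;  // the variant that can fade away
  }
  // ASSUMPTION: fall back to the regular geometry when the preferred one is missing.
  return p.has_shrubbery ? ShrubGeom::Shrubbery : ShrubGeom::None;
}
```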
Effects:
- Time of Day lighting. It looks like each `drawable-tree-instance-shrub` has a time-of-day color palette that is adjusted based on the time of day.
- Per-instance time of day lighting. Each instance may use different colors.
- Wind effect. This applies an additional transformation matrix per instance.
## Our Design
We will ignore the prototype-generic-shrub - OpenGL will take care of scissoring for us.
Like with tfrag/tie, we will do the time of day interpolation in C++.
The shrubs without wind effect will be converted into a single giant mesh. Doing it as a single mesh reduces the number of draw calls, and the entire mesh can be left in GPU memory the whole time.
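Merging the non-wind shrubs into one mesh means applying each instance's matrix on the CPU. Here is a minimal C++ sketch. It assumes the packed matrix layout read by the `draw-inline-array-instance-shrub` disassembly further down: four rows of four signed 16-bit values at offset 32 of the instance. In that layout, rows 0-2 are converted with `vitof12` (12 fractional bits). Row 3 is shifted left by 6, converted with `vitof0`, and then added to the bsphere center. The following are assumptions, not confirmed details: the row-vector convention (`x*row0 + y*row1 + z*row2 + row3`), dropping the w lanes, and all type and function names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Vec3 = std::array<float, 3>;
using InstMatrix = std::array<Vec3, 4>;  // rows 0-2: basis, row 3: translation (w lanes dropped)

// Decode the packed instance matrix: 16 signed 16-bit values, 4 per row.
InstMatrix decode_instance_matrix(const int16_t packed[16], const Vec3& bsphere_center) {
  InstMatrix m{};
  for (int row = 0; row < 3; row++) {
    for (int col = 0; col < 3; col++) {
      // rows 0-2: sign-extended, then vitof12 (12 fractional bits) -> divide by 4096
      m[row][col] = packed[row * 4 + col] / 4096.0f;
    }
  }
  for (int col = 0; col < 3; col++) {
    // row 3: (x << 16) >> 10 is x * 64; vitof0 converts it, then the bsphere center is added
    m[3][col] = packed[12 + col] * 64.0f + bsphere_center[col];
  }
  return m;
}

// ASSUMPTION: world = x*row0 + y*row1 + z*row2 + row3.
Vec3 transform_point(const InstMatrix& m, const Vec3& v) {
  Vec3 out{};
  for (int i = 0; i < 3; i++) {
    out[i] = v[0] * m[0][i] + v[1] * m[1][i] + v[2] * m[2][i] + m[3][i];
  }
  return out;
}

// Hypothetical merged mesh: all non-wind instances baked into one vertex/index buffer.
struct MergedMesh {
  std::vector<Vec3> positions;
  std::vector<uint32_t> indices;
};

// Bake one instance of a prototype into the merged mesh.
void append_instance(MergedMesh& mesh, const std::vector<Vec3>& proto_verts,
                     const std::vector<uint32_t>& proto_indices, const InstMatrix& m) {
  const auto base = static_cast<uint32_t>(mesh.positions.size());
  for (const Vec3& v : proto_verts) {
    mesh.positions.push_back(transform_point(m, v));
  }
  for (uint32_t idx : proto_indices) {
    assert(idx < proto_verts.size());
    mesh.indices.push_back(base + idx);  // re-base indices onto the merged buffer
  }
}
```

Because every vertex is already in world space after this step, the whole mesh can stay on the GPU and be drawn with one call.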
The shrubs with wind effect will be drawn as individual instances, as different shrubs need different wind matrices. It's likely going to be similar to `render_tree_wind`.
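For the wind instances, the wind section of the disassembly (block B47 onward) points to two pieces a C++ port would need. First, each instance picks one of 64 wind-work entries with `(wind-time + wind-index) & 63`. Second, the resulting wind vector is applied as a shear to matrix rows 0-2. A `vmulay.xz` followed by a `vmadd` with `vf1` adds `wind.x * row.y` to `row.x` and `wind.z * row.y` to `row.z`; `vf1` is all ones, as the `vmulaw` comment at the top of the node-culling block implies. The sketch below covers only these two steps. It leaves out how the wind vector itself is integrated and faded, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // rows 0-2: basis, row 3: translation

// Which of the 64 wind-work entries an instance reads this frame.
uint32_t wind_slot(uint32_t wind_time, uint16_t wind_index) {
  return (wind_time + wind_index) & 63u;
}

// Shear rows 0-2: x and z move in proportion to each row's y component.
// Row 3 (the translation) is not touched.
Mat4 apply_wind_shear(Mat4 m, float wind_x, float wind_z) {
  for (int row = 0; row < 3; row++) {
    const float y = m[row][1];
    m[row][0] += wind_x * y;  // vmulay.xz + vmadd, x lane
    m[row][2] += wind_z * y;  // vmulay.xz + vmadd, z lane
  }
  return m;
}
```

Assume vertices are transformed as `x*row0 + y*row1 + z*row2 + row3`. Then the horizontal offset works out to `wind.x * (world.y - origin.y)`. Vertices level with the instance origin stay put, and higher ones sway further.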
The time-of-day effect will be done like in tfrag/tie. We will create a new time of day texture on each frame, based on the current time, and each vertex will index into a single large texture. This approach is nice because the interpolation/upload can be done in a single large batch.
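Here is a minimal C++ sketch of the per-frame interpolation. It assumes, purely for illustration and not as the confirmed format, that each palette entry stores 8 RGBA keyframe colors and that the game supplies 8 blend weights for the current time. The output is one RGBA8 texel per palette entry, ready to upload as the time-of-day texture that vertices index into. The names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: one palette entry = 8 RGBA keyframe colors.
struct TodPaletteEntry {
  std::array<std::array<uint8_t, 4>, 8> rgba;
};

// Blend every palette entry with this frame's 8 weights -> one RGBA8 texel per entry.
std::vector<uint8_t> interp_tod_colors(const std::vector<TodPaletteEntry>& palette,
                                       const std::array<float, 8>& weights) {
  std::vector<uint8_t> texels(palette.size() * 4);
  for (std::size_t i = 0; i < palette.size(); i++) {
    for (int c = 0; c < 4; c++) {
      float sum = 0.0f;
      for (int k = 0; k < 8; k++) {
        sum += weights[k] * palette[i].rgba[k][c];
      }
      // clamp before narrowing to 8 bits
      if (sum < 0.0f) sum = 0.0f;
      if (sum > 255.0f) sum = 255.0f;
      texels[i * 4 + c] = static_cast<uint8_t>(sum);
    }
  }
  return texels;
}
```

Doing every entry in one tight loop and then uploading a single texture is what makes the "one large batch" approach cheap.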
## Setup Before (in `background.gc`)
The shrub system doesn't use the precomputed visibility strings, so we can ignore this.
- The `background-upload-vu0` function loads `vf16-vf31` with various math camera values.
- The `background-upload-vu0` function loads the `background-vu0-block` program to VU0 and runs the subroutine at 0.
- The current level index (0 or 1) is stored in the scratchpad (as a `terrain-context`)
- The time of day colors are calculated with `time-of-day-interp-colors`. The colors are stored in `*instance-tie-work*`. We can move this to C++ and do it faster.
After setup, the main function to generate DMA is `draw-drawable-tree-instance-shrub`. This function will be removed in the PC port. Instead, we will send the C++ code some data:
- camera matrix
- name of the level
## `draw-drawable-tree-instance-shrub`
Basic outline:
- Reset the `instance-shrub-work`
- Check if renderer is enabled
- Call `draw-inline-array-instance-shrub`. Each prototype has a "bucket" containing a linked list of instances. This function adds the instances to the buckets.
- Call `draw-prototype-inline-array-shrub`. This builds the final DMA list from the buckets.
- Various performance counter things that we can ignore.
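The bucket step can be pictured as an intrusive linked list per prototype. This C++ sketch is hypothetical: the real `prototype-bucket-shrub` also tracks DMA chain pointers and counts (the lasts/nexts/counts copied in the disassembly). It only shows the two-pass idea. First, visible instances are pushed onto their prototype's list. Then the lists are walked so each prototype's instances are emitted together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t kNoInstance = -1;

// Hypothetical per-prototype bucket: head of an intrusive singly linked list.
struct PrototypeBucket {
  int32_t first = kNoInstance;
  int32_t count = 0;
};

class ShrubBuckets {
 public:
  ShrubBuckets(std::size_t num_prototypes, std::size_t num_instances)
      : buckets_(num_prototypes), next_(num_instances, kNoInstance) {}

  // Pass 1: called once per visible instance.
  void add(int32_t instance, int32_t prototype) {
    assert(prototype >= 0 && static_cast<std::size_t>(prototype) < buckets_.size());
    assert(instance >= 0 && static_cast<std::size_t>(instance) < next_.size());
    next_[instance] = buckets_[prototype].first;  // push at the head
    buckets_[prototype].first = instance;
    buckets_[prototype].count++;
  }

  // Pass 2: walk each bucket so one prototype's instances are emitted together.
  template <typename Fn>
  void for_each(Fn&& emit) const {
    for (std::size_t p = 0; p < buckets_.size(); p++) {
      for (int32_t i = buckets_[p].first; i != kNoInstance; i = next_[i]) {
        emit(p, i);
      }
    }
  }

  int32_t count(std::size_t prototype) const { return buckets_[prototype].count; }

 private:
  std::vector<PrototypeBucket> buckets_;
  std::vector<int32_t> next_;  // next_[i] = next instance in the same bucket
};
```

Pushing at the head keeps `add` O(1). As a result, each list comes out in reverse insertion order, which is fine as long as draw order within a prototype does not matter.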
## `draw-inline-array-instance-shrub`
Args:
- `a0` dma buffer
- `a1` inline array of `draw-node` (a usual draw-node BVH with child type `instance-shrubbery`)
- `a2` length of this array
- `a3` inline array of `prototype-bucket-shrub`
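The comments in the listing below describe the traversal: an explicit stack of node/length pairs, a max depth of 6, and sphere-vs-frustum culling. It can be sketched in C++ as follows. This is a simplified illustration with hypothetical types. It returns every visible instance index instead of batching up to 7 at a time for the scratchpad DMA, and it leaves out the per-node distance check. The plane test follows the disassembly's comments: signed distance plus radius, rejected if negative for any of the four planes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Sphere { float x, y, z, r; };
// Keep a sphere if nx*x + ny*y + nz*z - d + r >= 0 for every plane.
struct Plane { float nx, ny, nz, d; };

// Hypothetical flattened draw-node.
struct DrawNode {
  Sphere bsphere;
  uint32_t first_child;  // index into nodes, or into instances if children_are_instances
  uint32_t child_count;
  bool children_are_instances;
};

bool sphere_visible(const Sphere& s, const std::array<Plane, 4>& planes) {
  for (const Plane& p : planes) {
    if (p.nx * s.x + p.ny * s.y + p.nz * s.z - p.d + s.r < 0.0f) {
      return false;  // entirely on the outside of this plane
    }
  }
  return true;
}

constexpr int kMaxDepth = 6;

// One stack frame: the next node to look at and how many remain at this level.
struct StackFrame {
  uint32_t next;
  uint32_t remaining;
};

std::vector<uint32_t> find_visible_instances(const std::vector<DrawNode>& nodes,
                                             const std::vector<Sphere>& instances,
                                             uint32_t root_first, uint32_t root_count,
                                             const std::array<Plane, 4>& planes) {
  std::vector<uint32_t> visible;
  std::array<StackFrame, kMaxDepth + 1> stack{};
  int depth = 0;
  stack[0] = StackFrame{root_first, root_count};
  while (depth >= 0) {
    StackFrame& frame = stack[depth];
    if (frame.remaining == 0) {
      depth--;  // level finished: "return" to the parent
      continue;
    }
    const DrawNode& node = nodes[frame.next];
    frame.next++;
    frame.remaining--;
    if (!sphere_visible(node.bsphere, planes)) {
      continue;  // cull the node and everything under it
    }
    if (node.children_are_instances) {
      for (uint32_t i = node.first_child; i < node.first_child + node.child_count; i++) {
        if (sphere_visible(instances[i], planes)) visible.push_back(i);
      }
    } else {
      assert(depth < kMaxDepth);
      depth++;
      stack[depth] = StackFrame{node.first_child, node.child_count};  // descend
    }
  }
  return visible;
}
```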
```lisp
B0: ;; block 0: one-time setup
L57:
;; Function prologue
daddiu sp, sp, -32
sd ra, 0(sp)
sq gp, 16(sp)
lui t3, 28672 ;; t3 = 0x70000000, the scratchpad
lw v1, 4(a0) ;; v1 = (-> dma-buf base). we'll be writing DMA data here.
lui t2, 4096 ;; t2 = 0x10000000 (used later)
lui t1, 4096 ;; t1 = 0x10000000 (used later)
;; this does some data cache stuff. we don't have to worry about it.
sync.l
cache dxwbin v1, 0
sync.l
cache dxwbin v1, 1
sync.l
lw t0, *instance-shrub-work*(s7) ;; t0 = instance-shrub-work. This stores many temporary variables.
ori t5, t2, 54272 ;; t5 = 0x1000D400 (DMA SPR_TO register)
sw a0, 6524(t0) ;; stash dma-buf argument in instance-shrub-work.dma-buffer
ori a0, t1, 53248 ;; a0 = 0x1000D000 (DMA SPR_FROM register)
lw t2, *wind-work*(s7) ;; t2 = *wind-work*
;; note on crazy scratchpad stuff.
;; to get faster speed, it is useful to have both the input (instances) and output (DMA data) stored
;; in the scratchpad. However, the scratchpad is not big enough to store everything.
;; they divide the scratchpad in 4:
;; 0-5200 is one "instance" buffer
;; 5200-10400 is the other "instance" buffer
;; 10400-12448 is one "out" buffer
;; 12448-end is the other "out" buffer.
;; This code reads instance data from one instance buffer and writes DMA data to one out buffer.
;; while this is happening, the SPR_TO/SPR_FROM channels will be copying the next instances to
;; the other instance buffer, and copying the output dma back into the dma-buf.
;; Once they are done, the buffers will be swapped. So there is continuous copying and processing.
;; I will use notation like spad.instance-buf and spad.out-buf to indicate the scratchpad buffers.
;; There are two instance buffers, and we don't have to really care which one they are using -
;; we can assume that they implemented double buffering properly.
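;; worked offsets (t3 = 0x70000000 at this point):
;;   instance buffers: ori t6, t3, 16    -> 0x70000010; the later xori 5232 (0x1470) flips it to 0x70001460 (5216)
;;   out buffers:      ori t1, t3, 10416 -> 0x700028B0; the later xori 6144 (0x1800) flips it to 0x700030B0 (12464)
;; so each buffer actually starts 16 bytes into the ranges listed above, and the low out buffer is
;; 12464 - 10416 = 2048 bytes (128 qw).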
ori t1, t3, 10416 ;; t1 = spad.out-buf (high buffer)
sw r0, 6544(t0) ;; instance-work.chains = 0
;; Note on "stack"
;; this draw-node tree is... a tree.
;; this drawing function traverses the tree.
;; in order to traverse a tree, you need something like a stack.
;; the tree has a fixed max depth of 6
;; The node/length fields of the instance-shrub-work are this stack.
;; t4 is the "stack pointer". It points to instance-shrub-work + 4*depth.
;; Then you can access at the normal offsets of node/length to access the correct
;; slot for your stack frame.
or t4, t0, r0 ;; t4 = instance-work: the stack pointer starts at the base (depth 0)
lqc2 vf3, 6064(t0) ;; vf3 = instance-work.constants (128, 1.0, 0.0, fog0)
sw t5, 6412(t0) ;; instance-work.to-spr = 0x1000D400 (just stashing this here for later)
ori t6, t3, 16 ;; t6 = spad.instance-buf (low buffer)
addiu t7, r0, 720 ;; t7 = 720
sw a3, 6476(t0) ;; instance-work.prototypes = the input inline array of prototypes
addiu t3, r0, 0 ;; t3 = 0
sw a3, 6404(t0) ;; instance-work.bucket-ptr = the input inline array of prototypes
addiu a3, r0, 0 ;; a3 = 0
sw a1, 6428(t4) ;; instance-work.node = the input draw node. (note, we're using t4 here)
or t3, t1, r0 ;; t3 = spad.out-buf
sw a2, 6452(t4) ;; instance-work.length = the input length (num draw nodes at this level)
addiu a1, r0, -1 ;; a1 = -1
sw t7, 6516(t0) ;; instance-work.current-shrub-near-packet = 720 (?)
daddiu t7, t0, 48 ;; t7 = instance-work.chaina
sw t6, 6408(t0) ;; instance-work.src-ptr = spad.instance-buf
daddiu a2, t0, 176 ;; a2 = instance-work.chainb
sw t6, 6388(t0) ;; instance-work.instance-ptr = spad.instance-buf
daddiu t6, r0, -64 ;; t6 = -64
sw t5, 6412(t0) ;; instance-work.to-spr = 0x1000D400 (oops, did it twice)
;; note on alignment.
;; the instance-shrub-work object is only 16-byte aligned.
;; but, for some reason, they want these chaina/chainb things to be 64 byte aligned.
;; they put a 48 byte "dummy" field before them, and AND the addresses with -64 to round them down to a 64-byte boundary.
;; I'll call these aligned versions chaina-aligned/chainb-aligned
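;; worked example (hypothetical address): if instance-shrub-work is at 0x00123420, chaina is at
;; 0x00123450 and chaina-aligned = 0x00123450 & -64 = 0x00123440, 32 bytes into the object (inside
;; the dummy). chainb is 176 - 48 = 128 bytes after chaina, and 128 is a multiple of 64, so
;; chainb-aligned is always chaina-aligned + 128. That leaves exactly 128 bytes = 8 qw for the
;; aligned chaina: room for up to 7 instance refs (the group size) plus the end tag.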
and t5, t7, t6 ;; t5 = chaina-aligned
sw a0, 6416(t0) ;; instance-work.from-spr = 0x1000D000
and a2, a2, t6 ;; a2 = chainb-aligned
sw t5, 6392(t0) ;; instance-work.chain-ptr = chaina-aligned
addiu t5, r0, -1 ;; t5 = -1
sw a2, 6396(t0) ;; instance-work.chain-ptr-next = chainb-aligned
sll r0, r0, 0 ;; nop
sw t4, 6400(t0) ;; instance-work.stack-ptr = t4 (right now, at base)
sll r0, r0, 0 ;; nop
sw t5, 6540(t0) ;; instance-work.last-shrubs = -1
sll r0, r0, 0 ;; nop
sw r0, 6548(t0) ;; instance-work.flags = 0
sll r0, r0, 0 ;; nop
sw r0, 6560(t0) ;; instance-work.inst-count = 0
sll r0, r0, 0 ;; nop
sw r0, 6556(t0) ;; instance-work.node-count = 0
;; Note on vcallms 17. this is a tiny program that loads vf's
;; plane is the culling planes (in normal world coordinates)
;; vf24-vf27 use the camera-rot matrix. This confusingly also includes the
;; translation, but does not include the projection matrix.
;; each vector is just the z component of that camera vector repeated 4 times
;; (it's computed in the vcallms 0 of background-upload-vu0)
;; lq.xyzw vf16, 0(vi00) | nop ;; plane0
;; lq.xyzw vf17, 1(vi00) | nop ;; plane1
;; lq.xyzw vf18, 2(vi00) | nop ;; plane2
;; lq.xyzw vf19, 3(vi00) | nop ;; plane3
;; lq.xyzw vf24, 12(vi00) | nop ;; [cam-rot0.z cam-rot0.z cam-rot0.z cam-rot0.z]
;; lq.xyzw vf25, 13(vi00) | nop ;; same but cam-rot1
;; lq.xyzw vf26, 14(vi00) | nop :e
;; lq.xyzw vf27, 15(vi00) | nop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
B1:
L58: ;; LOOP TOP. We reach here when we want to explore a new draw node.
vcallms 17 ;; set up vf registers
lw t4, 6400(t0) ;; t4 = instance-work.stack-ptr
addiu t5, r0, 7 ;; t5 = 7 (remaining instances in group. we find up to 7 visible instances)
lw a2, 6392(t0) ;; a2 = instance-work.chain-ptr
sll r0, r0, 0 ;; nops, I guess to wait for the vu program?
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
;; starting here, we're looking for a node that we can draw.
;; this is doing "sphere in view frustum" culling through the BVH tree
;; it will exit once it's found the next visible thing to draw.
;; the details here are:
;; - normal "can we see the sphere?" check
;; - also a distance from the camera check. If we fail that, skip.
;; - this builds DMA, but not drawing DMA. It builds DMA to upload the thing to the scratchpad.
;; once we find it, go to L63.
B2:
L59:
dsubu t7, t4, t0 ;; t7 = 4 * depth (0 at the root level). Negative only after the root level is exhausted (the daddiu t4, t4, -4 in the delay slots of B10/L62 pops below the base).
lw t6, 6452(t4) ;; t6 = length at current stack frame
bltz t7, L63 ;; negative means the whole tree has been walked: go DMA/draw whatever we collected.
lw t8, 6428(t4) ;; t8 = node (branch delay slot, runs either way)
;; we get here at every depth, and we don't know yet whether this node is visible.
beq t6, r0, L62 ;; if no nodes, skip!
lqc2 vf2, 12(t8) ;; vf2 = bsphere of the node
;; note that this code assumes we're deep enough to find instance-shrubs.
;; and sets up DMA to DMA them to the scratchpad for later processing.
;; but, we might have only found draw-nodes.
;; this is okay. The DMA we set up here will only be used if we actually find instance-shrubs.
;; we also set up the stack for more draw nodes. Again, it's okay because we'll only actually increment
;; the stack pointer if we find out that there are more levels.
;; the bsphere culling code for draw nodes/instances are identical, so that part
;; can be used in either case.
B4:
sll r0, r0, 0 ;; nop
lqc2 vf6, -4(t8) ;; vf6.w = distance of the node. (other stuff is junk I think)
vmulax.xyzw acc, vf16, vf2 ;; sphere in view frustum (will eventually put result in vf4)
lbu t6, 3(t8) ;; t6 = node flags
vmadday.xyzw acc, vf17, vf2 ;; sphere in view frustum
lw t7, 4(t8) ;; t7 = node child
vmaddaz.xyzw acc, vf18, vf2 ;; sphere in view frustum
lbu t8, 2(t8) ;; t8 = node child count
vmsubaw.xyzw acc, vf19, vf0 ;; sphere in view frustum
lq t9, 6016(t0) ;; t9 = instance-work.dma-ref
vmaddw.xyzw vf4, vf1, vf2 ;; sphere in view frustum (done!, vf4 now has signed distance from planes)
sw t7, 6432(t4) ;; place child on stack
vmulaw.xyzw acc, vf1, vf6 ;; acc = [dist, dist, dist, dist]
sw t8, 6456(t4) ;; place child's length on stack
vmsubax.xyzw acc, vf24, vf2 ;; dist calc (note, just for computing z)
sq t9, 0(a2) ;; store dma-ref in chain-ptr
vmsubay.xyzw acc, vf25, vf2 ;; more dist calc
daddiu t9, t7, -4 ;; t9 = node minus type tag
vmsubaz.xyzw acc, vf26, vf2 ;; more dist calc
sll t7, t8, 2 ;; t7 = num children * 4
qmfc2.i ra, vf4 ;; ra = sphere/plane signed distances
addu t7, t7, t8 ;; t7 = num children * 5
vmsubaw.xyzw acc, vf27, vf0 ;; more dist calc
sw t9, 4(a2) ;; store address of draw nodes in the dma tag
vmaddw.xyzw vf7, vf1, vf2 ;; finish dist calc
sw t8, 8(a2) ;; stash the child count after the dma tag (space unused)
pcgtw t8, r0, ra ;; check signed distance to planes
lw t9, 6452(t4) ;; t9 = current stack length
ppach ra, r0, t8 ;; pack so signed distance compares are in lower 64
lw t8, 6428(t4) ;; t8 = node
bne ra, r0, L61 ;; branch on reject
sb t7, 0(a2) ;; store qwc in chain
;; if we reach here, we passed the sphere in view check
B5:
sll r0, r0, 0
sll r0, r0, 0
daddiu t7, t9, -1 ;; t7 = stack length - 1
qmfc2.i t9, vf7 ;; t9 = dist check result
daddiu t8, t8, 32 ;; advance to next node (assuming draw nodes)
sll r0, r0, 0
bltz t9, L61 ;; branch if failed dist check
sll r0, r0, 0
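;; the two checks above, written out (c = bsphere center, r = radius, vf1 = [1 1 1 1] as the
;; "acc = [dist, dist, dist, dist]" comment implies). vmulax multiplies all four lanes of vf16 by
;; vf2.x, so vf16..vf19 must hold the x, y, z and w parts of the four planes, one plane per lane:
;;   vf4[i] = vf16[i]*cx + vf17[i]*cy + vf18[i]*cz - vf19[i] + r    reject if any lane < 0
;;   vf7    = dist - (vf24.x*cx + vf25.x*cy + vf26.x*cz + vf27.x) + r
;;            reject if < 0. vf24..vf27 repeat one value in every lane, so all lanes of vf7 agree.
;; if the vf24..vf27 term is the camera-space depth of the center, the second test rejects
;; nodes whose nearest point is farther away than the node's dist.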
B6:
beq t6, r0, L60 ;; check if we actually reached the instances (0 = instances).
sll r0, r0, 0 ;;
B7:
beq r0, r0, L59 ;; didn't reach instances. need to go deeper in tree!
daddiu t4, t4, 4 ;; increase stack depth. branch will find visible things.
;; if we reach here:
;; - we've reached leaves (instances)
;; - the instance is visible
;; - we have a chain set up to DMA it to the scratchpad.
B8:
L60:
daddiu a2, a2, 16 ;; advance dma building pointer (looks like we have room for up to 8)
sw t7, 6452(t4) ;; decrement stack length (we're done with this one)
daddiu t5, t5, -1 ;; decrement instance count (counts down from 7, we can only do 7 in a group)
sw t8, 6428(t4) ;; increment node in stack
blez t5, L63 ;; goto L63 if we're full for this group
dsubu t6, t4, t0 ;; check if we're at the root still
B9:
bgtz t7, L59 ;; not full, more at this level.
sll r0, r0, 0
B10:
blez t6, L63 ;; if we're at the root of the tree and the length is zero, we're done, draw what we have.
daddiu t4, t4, -4 ;; "return" and decrement sp (go up a level, we finished exploring this one)
;; common "advance to next based on stack"
;; we might have to return multiple levels, and this loop here does this.
B11:
L61:
sll r0, r0, 0
lw t7, 6452(t4) ;; t7 = length
sll r0, r0, 0
lw t6, 6428(t4) ;; t6 = node
daddiu t7, t7, -1 ;; dec
dsubu t8, t4, t0 ;; depth check
daddiu t6, t6, 32 ;; inc node
sw t7, 6452(t4) ;; store len
bgtz t7, L59 ;; keep going if not done (break out of returning loop)
sw t6, 6428(t4) ;; store node
B12:
blez t8, L63 ;; draw if we're at the end.
sll r0, r0, 0
B13:
L62:
beq r0, r0, L61 ;; reloop in the return loop
daddiu t4, t4, -4 ;; ascend one level
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; DMA TO SPR
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; if we reach here, we've got a chain set up that will send visible instances to the SPR.
B14:
L63:
sll r0, r0, 0 ;; nop
sw t4, 6400(t0) ;; store draw node stack pointer in instance-shrub-work
sll r0, r0, 0 ;; nop
lw t5, 6392(t0) ;; t5 = instance-work.chain-ptr (the start of the visible instance chain we just made)
sll r0, r0, 0 ;; nop
lw t4, 6412(t0) ;; t4 = instance-work.to-spr (EE DMA control register address)
beq t5, a2, L66 ;; will be equal if we didn't have any DMA
lq t5, 6032(t0) ;; dma-end (an 'end' packet)
;; if we get here, we actually have data to send
;; these two blocks just wait until any in-progress to-sprs finish.
;; every iteration of the loop increments the "wait-to-spr" counter
;; (they likely tuned this code to reduce waits by moving stuff around)
B15:
L64:
lw t6, 0(t4)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
andi t6, t6, 256
sll r0, r0, 0
beq t6, r0, L65
sll r0, r0, 0
B16:
sll r0, r0, 0
lw t6, 6568(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
daddiu t6, t6, 1
sll r0, r0, 0
sw t6, 6568(t0)
beq r0, r0, L64
sll r0, r0, 0
;; when we get here, there is no in-progress spr-to transfer
B17:
L65:
sll r0, r0, 0 ;; nop
lw t6, 6544(t0) ;; t6 = instance-work.chains (just a counter of how many spad uploads we do)
sll r0, r0, 0 ;; nop
sq t5, 0(a2) ;; store the end DMA tag (must go at the end of the DMA transfer)
lw t5, 6392(t0) ;; t5 = instance-work.chain-ptr (start of the DMA chain)
addiu a2, r0, 324 ;; a2 = 324 (constant to start DMA)
lw t7, 6396(t0) ;; t7 = instance-work.chain-ptr-next (to-spr chain dma mem is double buffered)
ori t8, r0, 65535 ;; t8 = 65535
sw t5, 6396(t0) ;; instance-work.chain-ptr-next = chain-ptr (swap!)
daddiu t6, t6, 1 ;; increment chain count
sw t7, 6392(t0) ;; instance-work.chain-ptr = chain-ptr-next (swap!)
or t7, t5, r0 ;; t7 = chain for next time
sll r0, r0, 0 ;; nop
sw t6, 6544(t0) ;; write back incremented chain count
sll r0, r0, 0 ;; nop
lw t6, 6388(t0) ;; t6 = instance-work.instance-ptr (the scratchpad destination for the instance)
sync.l
cache dxwbin t7, 0 ;; write back the data (required before DMAing, EE DMA bypasses CPU caches)
sync.l
cache dxwbin t7, 1
sync.l
daddiu t7, t7, 64
sync.l
cache dxwbin t7, 0
sync.l
cache dxwbin t7, 1
sync.l
sw t6, 128(t4) ;; set up destination addr in DMA register
sw t5, 48(t4) ;; set up source addr
xori t5, t6, 5232 ;; toggle destination pointer (scratchpad destinations are double buffered)
sw r0, 32(t4) ;; set qwc = 0 (I think it's ignored in chain mode)
sync.l
sw a2, 0(t4) ;; start transfer!
sync.l
sll r0, r0, 0
sw t5, 6408(t0) ;; store instance-work.src-ptr
beq r0, r0, L68 ;; always go to L68!
sw t5, 6388(t0) ;; store instance-work.instance-ptr (starting a new block, so equal to src-ptr)
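;; decoding the register writes above (EE DMAC channel registers, t4 = to-spr channel base):
;;   0(t4) = CHCR, 32(t4) = QWC, 48(t4) = TADR, 128(t4) = SADR (scratchpad address)
;;   324 = 0x144 = STR (bit 8) | TTE (bit 6) | MOD = 1 (chain mode, bits 2-3)
;; the "andi ..., 256" in the wait loops tests that same STR bit, which stays set while the
;; channel is busy. The from-spr start further down writes 256 (STR only, normal mode),
;; with 16(a0) = MADR as the main-memory destination.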
;; if we reach here, it's because we didn't have any more visible instances.
;; we have two cases:
;; 1). we have stuff in scratchpad (the other buffer) waiting to be drawn.
;; 2). nothing was visible, so we have nothing in scratchpad.
;; we can tell these two cases from the sign of the a1 flag.
B18:
L66:
bltz a1, L98 ;; goto end (L98) if the flag is negative
lw a2, 6388(t0) ;; a2 = instance-work.instance-ptr.
B19:
sll r0, r0, 0
sw r0, 6540(t0) ;; instance-work.last-shrubs = 0
sll r0, r0, 0
xori a2, a2, 5232 ;; flip spad buffer (the last group isn't double buffered)
sll r0, r0, 0
sw a2, 6408(t0) ;; store src-ptr
sll r0, r0, 0
sw a2, 6388(t0) ;; store instance-ptr
;; dma sync - make sure the last to-spr is done.
B20:
L67:
lw a2, 0(t4)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
andi a2, a2, 256
sll r0, r0, 0
beq a2, r0, L68
sll r0, r0, 0
B21:
sll r0, r0, 0
lw a2, 6568(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
daddiu a2, a2, 1
sll r0, r0, 0
sw a2, 6568(t0)
beq r0, r0, L67
sll r0, r0, 0
;; the details of the from-spr are unknown, but it seems like setting a1 flag > 0 is used to indicate
;; that we have some pending stuff in spad that we have to copy back.
B22:
L68:
bgez a1, L93 ;; if we have stuff, go to some later spad dma code
lw a2, 6408(t0) ;; a2 = instance-work.src-ptr
B23:
beq r0, r0, L58 ;; nope, we're done, go to loop top
addiu a1, r0, 10000 ;; but, remember we just did a dma sync for to. So we do have more work to do.
;; ideally we'll find more visible stuff and add to what we have now.
;; but if we don't, we set this flag to >0 to indicate that we have
;; stuff that we still need to process.
;; we reach here once we have visible instances in the scratchpad.
;; but, before we can process them, we have to make sure the output buffer
;; in the scratchpad has enough room.
;; If not, we do a DMA transfer back to RAM (to the dma-buf passed in)
;; this is copying completed VU1 DMA data.
B24:
L69:
daddiu t4, a3, -106 ;; t4 = a3 - 106. a3 counts quadwords in the current out buffer (not instances)
lqc2 vf2, 16(a2) ;; vf2 = bsphere of the first instance (they start prepping for the instance loop here...)
blez t4, L72 ;; goto L72 if we have enough room in spr
lbu t4, 6(a2) ;; t4 = instance.bucket-index (loaded as a u8, maybe only up to 255 buckets/tree?)
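;; why 106: a3 counts quadwords, not instances (it is converted to bytes with sll 4 and written
;; to the from-spr QWC register below). The out buffers sit at 10416 and 12464 (ori 10416, then
;; xori 6144), so the low one is 2048 bytes = 128 qw. One instance adds at most
;; 14 (billboard) + 6 (matrix packet) + 1 (tag) = 21 qw, per the daddiu a3 increments in the
;; billboard, L76 and L77/L79 code. Flushing whenever a3 > 106 means an instance starts with at
;; most 106 qw, and 106 + 21 = 127 <= 128.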
;; next three blocks wait for from-spr to finish. Need to do this before
;; starting the next from-spr transfer
B25:
sll r0, r0, 0
lw a0, 6416(t0)
sll r0, r0, 0
sll r0, r0, 0
B26:
L70:
lw t3, 0(a0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
andi t3, t3, 256
sll r0, r0, 0
beq t3, r0, L71
sll r0, r0, 0
B27:
sll r0, r0, 0
lw t3, 6564(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
daddiu t3, t3, 1
sll r0, r0, 0
sw t3, 6564(t0)
beq r0, r0, L70
sll r0, r0, 0
;; start from-spr and swap output data buffers
B28:
L71:
sw t1, 128(a0)
xori t1, t1, 6144 ;; swap buffer
sw v1, 16(a0)
sll t3, a3, 4 ;; t3 = a3 * 16: the out buffer's qword count converted to bytes
addu v1, v1, t3 ;; v1 is the next dma-buf output address (maybe needed for refs in upcoming DMA build)
or t3, t1, r0
sw a3, 32(a0)
addiu a3, r0, 256
sw a3, 0(a0) ;; start!
addiu a3, r0, 0 ;; reset count
;; if we reach here, we're finally ready to process the instance.
;; one cool trick they do here is to build
B29:
L72:
vcallms 33 ;; see backround-vu0-result.txt. This program does the sphere in view and distance checks.
;; the result is stored in vf04/vf06 and vi02
lw t5, 6548(t0) ;; t5 = instance-work.flags (was initialized to 0)
beq a1, t4, L74 ;; if we're using the same prototype as last time, skip ahead a bit.
daddiu t6, a1, -10000
B30:
beq t6, r0, L73
lw a1, 6404(t0)
B31: ;; I think this only runs on the very first run.
sll r0, r0, 0 ;; it copies the last/next/counts of instance-work to the first thing in the proto bucket array
lq t5, 6336(t0)
sll r0, r0, 0
lq t6, 6352(t0)
sll r0, r0, 0
lq t7, 6368(t0)
sll r0, r0, 0
sq t5, 92(a1)
sll r0, r0, 0
sq t6, 60(a1)
sll r0, r0, 0
sq t7, 76(a1)
B32:
L73:
or a1, t4, r0 ;; a1 = current prototype idx (remember it for next time)
lw t5, 6476(t0) ;; t5 = prototypes array
addiu t6, r0, 112 ;; t6 = 112
sq r0, 6336(t0) ;; work.lasts = 0
multu3 t4, t4, t6 ;; multiply for array access
sq r0, 6352(t0) ;; work.nexts = 0
daddu t4, t5, t4 ;; t4 = ptr to bucket
sq r0, 6368(t0) ;; work.counts = 0
sll r0, r0, 0 ;; nop
sw t4, 6404(t0) ;; store bucket in work.bucket-ptr
sll r0, r0, 0 ;; nop
lw t5, 4(t4) ;; t5 = bucket flags
sll r0, r0, 0 ;; nop
lqc2 vf15, 44(t4) ;; vf15 = lengths
andi t5, t5, 1 ;; t5 = flag & 1
lqc2 vf14, 28(t4) ;; vf14 = near/mid/far plane
vmul.xyz vf15, vf15, vf3 ;; vf15 = lengths * some constants?
sw t5, 6548(t0) ;; store flags in instance-work.flags
;; from here on, it looks like we jump to L92 if we reject the instance
;; NOTE: starting here is the matrix stuff.
;; we'll need to understand this to "de-instance" the non-wind instances
;; and to implement wind in C++
B33:
L74:
bne t5, r0, L92 ;; check flags & 1. This flag is only set from the debug menu (see dm-enable-instance-func)
;; and it's just used to disable a specific prototype for debugging.
ld t5, 56(a2) ;; loading the origin matrix (4x 16-bit integers/row) (this is the last row)
B34:
sll r0, r0, 0
ld t4, 32(a2) ;; t4 = row 0
pextlh t5, t5, r0 ;; unpack row 3 to u32's (effectively shifts left 16)
ld t6, 40(a2) ;; t6 = row 1
psraw t7, t5, 10 ;; t7 = shift row 3 right by 10 (two shifts equivalent to shift left by 6 and sign extend)
ld t5, 48(a2) ;; t5 = row 2
pextlh t8, t4, r0 ;; t8 = row 0 to u32's
lhu t4, 8(a2) ;; t4 = instance.color-indices (I think an offset in the tree's palette, different from TIE)
psraw t8, t8, 16 ;; t8 = shift row 0 right by 16 (two shifts equivalent to just sign extending)
lq t9, 64(a2) ;; t9 = instance.flat-normal
pextlh t6, t6, r0 ;; t6 = row 1 unpacked
qmtc2.ni vf13, t7 ;; vf13 = row 3
psraw t6, t6, 16 ;; t6 = row 1 shifted
qmtc2.ni vf18, t9 ;; vf18 = instance.flat-normal
pextlh t5, t5, r0 ;; t5 = row 2 unpacked
qmtc2.ni vf10, t8 ;; vf10 = row 0
psraw t5, t5, 16 ;; t5 = row 2 shifted
qmtc2.ni vf11, t6 ;; vf11 = row 1
daddu t4, t4, t0 ;; t4 = color data - 304
qmtc2.ni vf12, t5 ;; vf12 = row 2
sll r0, r0, 0
cfc2.i t5, vi1 ;; t5 = vis result.
vitof0.xyzw vf13, vf13 ;; vf13 = row 3, as floats
lw t6, 304(t4) ;; t6 = rgba for this instance (8888 format)
bne t5, r0, L92 ;; possibly reject this instance.
lq t4, 6080(t0) ;; t4 = color constants (some hacky int to float stuff here)
B35:
pextlb t5, r0, t6 ;; t5 = unpacked rgba to u16's
lqc2 vf4, 6096(t0) ;; vf4 = hmge-d
pextlh t5, r0, t5 ;; t5 = unpacked rgba to u32's
lqc2 vf25, 6176(t0) ;; vf25 = min-dist (interesting...)
vsub.xyzw vf9, vf6, vf14 ;; vf6 is the "dist" of the draw node?
sll r0, r0, 0
psllw t6, t5, 8 ;; t6 = multiply colors by 256
mfc1 r0, f31
paddw t4, t6, t4 ;; t4 = colors + color constants
mfc1 r0, f31
vmula.xyzw acc, vf1, vf3 ;;
sll r0, r0, 0
vmsub.xyzw vf9, vf9, vf15
sq t5, 6160(t0) ;; stash bb color
vadd.xyz vf13, vf13, vf2 ;; same bsphere origin trick as tie
sq t4, 6144(t0) ;; store floating point color
vsubw.xyzw vf8, vf6, vf2 ;; distance compensate for bsphere radius
sll r0, r0, 0
vitof12.xyzw vf10, vf10 ;; row 0 as floats
sll r0, r0, 0
vmini.xyzw vf9, vf9, vf3 ;; dist crap
lw t4, 6404(t0) ;; t4 = bucket-ptr
vadd.xyz vf18, vf18, vf13 ;; flat-normal + real-origin
sll r0, r0, 0
vmulax.xyzw acc, vf28, vf13 ;;
lw t4, 24(t4) ;; geom3
vmadday.xyzw acc, vf29, vf13
sll r0, r0, 0
vmaxx.xyzw vf9, vf9, vf0
sll r0, r0, 0
vmaddaz.xyzw acc, vf30, vf13
sll r0, r0, 0
vmaddw.xyzw vf5, vf31, vf0 ;; vf5.w is inverse distance from camera, I think
sll r0, r0, 0
vitof12.xyzw vf11, vf11 ;; vf11 = row 1 floats
sll r0, r0, 0
vftoi0.xyzw vf19, vf9 ;; distance stuff
sll r0, r0, 0
vmini.xyzw vf25, vf8, vf25 ;; apply min dist
sll r0, r0, 0
vsubz.xyzw vf4, vf8, vf4 ;; apply hmge
addiu t5, r0, 128 ;; ?? t5 = 128
vitof12.xyzw vf12, vf12 ;; vf12 = row 2 float
addiu t6, r0, 255 ;; ?? t6 = 255
vmulw.y vf9, vf9, vf15 ;; multiply by lengths
sll r0, r0, 0
sll r0, r0, 0
qmfc2.i t7, vf19 ;; integer dist compare
vdiv Q, vf3.w, vf5.w ;; compute Q here, I guess
sll r0, r0, 0
and t6, t7, t6
sll r0, r0, 0
dsubu t7, t5, t6
sw t6, 6156(t0) ;; adjusted color for fade out.
beq t5, t6, L80 ;; branch if don't try billboard, I think?
sqc2 vf25, 6176(t0)
B36:
beq t4, r0, L75 ;; don't do billboard if we don't have it
sw t7, 6172(t0)
B37:
;;;;;;;;;;;;;;;
;; BILLBOARD
;;;;;;;;;;;;;;;
vmulax.xyzw acc, vf28, vf18
lq t4, 5104(t0)
vmadday.xyzw acc, vf29, vf18
lq t5, 5120(t0)
vmaddaz.xyzw acc, vf30, vf18
lw t6, 6348(t0)
vmaddw.xyzw vf18, vf31, vf0
lw t7, 6364(t0)
sll t8, a3, 4
lqc2 vf8, 6112(t0)
addu t8, t8, v1
lqc2 vf7, 64(a2)
vmulaq.xyz acc, vf5, Q
lq a2, 6160(t0)
vmulaw.w acc, vf5, vf0
movz t6, t8, t6
vmadd.xyzw vf5, vf1, vf8
lhu t9, 6374(t0)
vmulq.w vf19, vf7, Q
sll r0, r0, 0
daddiu t9, t9, 1
lqc2 vf6, 5136(t0)
vmulq.xyzw vf26, vf1, Q
sw t6, 6348(t0)
vmulq.xyzw vf27, vf1, Q
sw t8, 6364(t0)
vnop
sll r0, r0, 0
vmaxz.w vf5, vf5, vf6
sh t9, 6374(t0)
vdiv Q, vf3.w, vf18.w
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf10
sq t4, 0(t3)
vaddx.x vf26, vf0, vf0
sq t5, 16(t3)
vminiw.w vf5, vf5, vf6
sq a2, 48(t3)
vmadday.xyzw acc, vf21, vf10
sq a2, 96(t3)
vmaddz.xyzw vf10, vf22, vf10
sq a2, 144(t3)
vmulaw.w acc, vf18, vf0
sq a2, 192(t3)
vmulaq.xyz acc, vf18, Q
sw t7, 4(t3)
vmadd.xyzw vf18, vf1, vf8
sll r0, r0, 0
vmulq.w vf8, vf7, Q
sll r0, r0, 0
vmulq.xyzw vf24, vf1, Q
sll r0, r0, 0
vmulq.xyzw vf25, vf1, Q
sll r0, r0, 0
vmaxz.w vf18, vf18, vf6
sll r0, r0, 0
vadd.xy vf24, vf0, vf0
sll r0, r0, 0
vaddy.y vf25, vf0, vf0
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf11
sll r0, r0, 0
vminiw.w vf18, vf18, vf6
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf11
sll r0, r0, 0
vmaddz.xyzw vf11, vf22, vf11
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf12
sll r0, r0, 0
vsub.xyzw vf16, vf18, vf5
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf12
sll r0, r0, 0
vmaddz.xyzw vf12, vf22, vf12
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf13
sll r0, r0, 0
vaddy.y vf16, vf16, vf16
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf13
sll r0, r0, 0
vmaddaz.xyzw acc, vf22, vf13
sll r0, r0, 0
vmaddw.xyzw vf13, vf23, vf0
sll r0, r0, 0
vmul.xy vf17, vf16, vf16
sll r0, r0, 0
sll r0, r0, 0
sqc2 vf24, 32(t3)
sll r0, r0, 0
sqc2 vf25, 80(t3)
sll r0, r0, 0
sqc2 vf26, 128(t3)
vaddy.x vf17, vf17, vf17
sll r0, r0, 0
sll r0, r0, 0
sqc2 vf27, 176(t3)
vmulw.xyzw vf2, vf18, vf0
sll r0, r0, 0
vmulw.xyzw vf4, vf18, vf0
sll r0, r0, 0
vrsqrt Q, vf0.w, vf17.x
sll r0, r0, 0
sll r0, r0, 0
vwaitq
vmulq.xy vf17, vf16, Q
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
vsuby.x vf16, vf0, vf17
sll r0, r0, 0
vaddx.y vf16, vf0, vf17
sll r0, r0, 0
sll r0, r0, 0
sqc2 vf10, 240(t3)
sll r0, r0, 0
sqc2 vf11, 256(t3)
vmulw.xy vf8, vf16, vf8
sll r0, r0, 0
vmulw.xy vf19, vf16, vf19
sll r0, r0, 0
sll r0, r0, 0
lq a2, 6144(t0)
sll r0, r0, 0
sll r0, r0, 0
vmul.xy vf8, vf8, vf6
sll r0, r0, 0
vmul.xy vf19, vf19, vf6
sll r0, r0, 0
vmulw.xyzw vf6, vf5, vf0
sll r0, r0, 0
vmulw.xyzw vf7, vf5, vf0
sq a2, 304(t3)
vadd.xy vf2, vf18, vf8
sll r0, r0, 0
vsub.xy vf4, vf18, vf8
sll r0, r0, 0
vadd.xy vf6, vf5, vf19
sll r0, r0, 0
vsub.xy vf7, vf5, vf19
sll r0, r0, 0
vftoi4.xyzw vf2, vf2
sll r0, r0, 0
vftoi4.xyzw vf4, vf4
daddiu t3, t3, 224
vftoi4.xyzw vf6, vf6
daddiu a3, a3, 14
vftoi4.xyzw vf7, vf7
lw a2, 6156(t0)
sll r0, r0, 0
sqc2 vf2, -160(t3)
sll r0, r0, 0
sqc2 vf4, -112(t3)
sll r0, r0, 0
sqc2 vf6, -64(t3)
beq a2, r0, L92
sqc2 vf7, -16(t3)
B38:
beq r0, r0, L76
sll r0, r0, 0
B39:
L75:
beq t6, r0, L92
vmulax.xyzw acc, vf20, vf10
B40:
vmadday.xyzw acc, vf21, vf10
lq a2, 6144(t0)
vmaddz.xyzw vf10, vf22, vf10
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf11
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf11
sll r0, r0, 0
vmaddz.xyzw vf11, vf22, vf11
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf12
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf12
sll r0, r0, 0
vmaddz.xyzw vf12, vf22, vf12
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf13
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf13
sll r0, r0, 0
vmaddaz.xyzw acc, vf22, vf13
sll r0, r0, 0
vmaddw.xyzw vf13, vf23, vf0
sq a2, 80(t3)
sll r0, r0, 0
sqc2 vf10, 16(t3)
sll r0, r0, 0
sqc2 vf11, 32(t3)
B41:
L76:
sll a2, a3, 4
lhu t4, 6380(t0)
addu t5, a2, v1
lhu t7, 6372(t0)
sll t6, t4, 4
lw a2, 6360(t0)
daddu t8, t6, t0
lw t6, 6344(t0)
daddiu t7, t7, 1
lq t8, 4400(t8)
daddiu a3, a3, 6
sh t7, 6372(t0)
daddiu t7, t4, 1
sq t8, 0(t3)
daddiu t8, t7, -20
sqc2 vf12, 48(t3)
movz t7, r0, t8
sqc2 vf13, 64(t3)
daddiu t8, t4, -10
sh t7, 6380(t0)
daddiu t3, t3, 96
sw a2, -92(t3)
beq t4, r0, L77
sw t5, 6360(t0)
B42:
bne t8, r0, L78
sll r0, r0, 0
B43:
L77:
sll r0, r0, 0
lq t4, 5040(t0)
sll r0, r0, 0
lq t7, 5056(t0)
sll r0, r0, 0
sw t5, 6344(t0)
sll r0, r0, 0
movz t4, t7, t6
daddiu a3, a3, 1
sq t4, 0(t3)
sll r0, r0, 0
sw a2, 4(t3)
beq r0, r0, L92
daddiu t3, t3, 16
B44:
L78:
daddiu t5, t4, -9
sll r0, r0, 0
beq t5, r0, L79
daddiu t4, t4, -19
B45:
bne t4, r0, L92
sll r0, r0, 0
B46:
L79:
sll r0, r0, 0
sll t4, t7, 4
sll r0, r0, 0
daddu t4, t4, t0
daddiu a3, a3, 1
lq t4, 4720(t4)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sq t4, 0(t3)
sll r0, r0, 0
sw a2, 4(t3)
beq r0, r0, L92
daddiu t3, t3, 16
;; I think the end of billboard.
B47:
L80:
sll r0, r0, 0
lw t4, 1324(t2) ;; t4 = wind time (from global wind work)
sll r0, r0, 0
lhu t5, 62(a2) ;; t5 = wind-index of the instance
sll r0, r0, 0
lw a2, 6384(t0) ;; a2 = wind-vectors
dsll t6, t5, 4 ;; t6 = t5 * 16
lqc2 vf19, 6048(t0) ;; vf19 = wind-const
daddu a2, a2, t6 ;; a2 = wind-vector + (wind-index * 16)
daddu t4, t5, t4 ;; t4 = wind-time + wind-index
andi t5, t4, 63 ;; t5 = (wind-time + wind-index) & 63
ld t4, 8(a2) ;; t4 = winds
sll t6, t5, 4 ;; t6 = ((wind-time + wind-index) & 63) * 16
ld t5, 0(a2) ;; t5 = winds
addu t7, t6, t2
qmfc2.i t6, vf4
pextlw t4, r0, t4
lqc2 vf16, 12(t7)
pextlw t5, r0, t5
qmtc2.i vf18, t4
sll r0, r0, 0
qmtc2.i vf17, t5
vmula.xyzw acc, vf16, vf1
sll r0, r0, 0
vmsubax.xyzw acc, vf18, vf19
sll r0, r0, 0
vmsuby.xyzw vf16, vf17, vf19
sll r0, r0, 0
pcgtw t5, r0, t6
mfc1 r0, f31
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
lqc2 vf24, 6208(t0)
vmulaz.xyzw acc, vf16, vf19
sll r0, r0, 0
vmadd.xyzw vf18, vf1, vf18
sll r0, r0, 0
sll r0, r0, 0
lqc2 vf25, 6224(t0)
sll r0, r0, 0
lqc2 vf26, 6240(t0)
sll r0, r0, 0
lqc2 vf27, 6256(t0)
vmulaz.xyzw acc, vf18, vf19
sll r0, r0, 0
vmadd.xyzw vf17, vf17, vf1
sll r0, r0, 0
vmulax.xyzw acc, vf24, vf2
sll r0, r0, 0
vmadday.xyzw acc, vf25, vf2
sll r0, r0, 0
vmaddaz.xyzw acc, vf26, vf2
sll r0, r0, 0
vminiw.xyzw vf17, vf17, vf0
sll r0, r0, 0
vmsubaw.xyzw acc, vf27, vf0
sll r0, r0, 0
vmsubw.xyzw vf24, vf1, vf2
sll r0, r0, 0
sll r0, r0, 0
qmfc2.i t4, vf18
vmaxw.xyzw vf27, vf17, vf19
sll r0, r0, 0
ppacw t4, r0, t4
mfc1 r0, f31
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
qmfc2.i t6, vf24
vmuly.xyzw vf27, vf27, vf9
sll r0, r0, 0
pcgtw t6, r0, t6
mfc1 r0, f31
ppach t6, r0, t6
mfc1 r0, f31
vmulax.yw acc, vf0, vf0
sll r0, r0, 0
vmulay.xz acc, vf27, vf10
sll r0, r0, 0
vmadd.xyzw vf10, vf1, vf10
sll r0, r0, 0
or t5, t6, t5
qmfc2.i t6, vf27
vmulax.yw acc, vf0, vf0
lw t7, 6552(t0)
vmulay.xz acc, vf27, vf11
sll r0, r0, 0
vmadd.xyzw vf11, vf1, vf11
sll r0, r0, 0
bne t7, s7, L81
ppacw t6, r0, t6
B48:
vmulax.yw acc, vf0, vf0
sd t4, 8(a2)
vmulay.xz acc, vf27, vf12
sd t6, 0(a2)
bne t5, r0, L86 ;; any lane flagged -> generic (scissoring) path
vmadd.xyzw vf12, vf1, vf12
B49:
beq r0, r0, L82
sll r0, r0, 0
B50:
L81:
vmulax.yw acc, vf0, vf0
sll r0, r0, 0
vmulay.xz acc, vf27, vf12
sll r0, r0, 0
bne t5, r0, L86 ;; any lane flagged -> generic (scissoring) path
vmadd.xyzw vf12, vf1, vf12
B51:
L82:
vmulax.xyzw acc, vf20, vf10
lq a2, 6144(t0)
vmadday.xyzw acc, vf21, vf10
sll r0, r0, 0
vmaddz.xyzw vf10, vf22, vf10
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf11
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf11
sll r0, r0, 0
vmaddz.xyzw vf11, vf22, vf11
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf12
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf12
sll r0, r0, 0
vmaddz.xyzw vf12, vf22, vf12
sll r0, r0, 0
vmulax.xyzw acc, vf20, vf13
sll r0, r0, 0
vmadday.xyzw acc, vf21, vf13
sll r0, r0, 0
vmaddaz.xyzw acc, vf22, vf13
sll r0, r0, 0
vmaddw.xyzw vf13, vf23, vf0
sq a2, 80(t3)
sll r0, r0, 0
sqc2 vf10, 16(t3)
sll r0, r0, 0
sqc2 vf11, 32(t3)
sll a2, a3, 4
lhu t4, 6378(t0)
addu t5, a2, v1
lhu t7, 6370(t0)
sll t6, t4, 4
lw a2, 6356(t0)
daddu t8, t6, t0
lw t6, 6340(t0)
daddiu t7, t7, 1
lq t8, 4400(t8)
daddiu a3, a3, 6
sh t7, 6370(t0)
daddiu t7, t4, 1
sq t8, 0(t3)
daddiu t8, t7, -20
sqc2 vf12, 48(t3)
movz t7, r0, t8
sqc2 vf13, 64(t3)
daddiu t8, t4, -10
sh t7, 6378(t0)
daddiu t3, t3, 96
sw a2, -92(t3)
beq t4, r0, L83
sw t5, 6356(t0)
B52:
bne t8, r0, L84
sll r0, r0, 0
B53:
L83:
sll r0, r0, 0
lq t4, 5040(t0)
sll r0, r0, 0
lq t7, 5056(t0)
sll r0, r0, 0
sw t5, 6340(t0)
sll r0, r0, 0
movz t4, t7, t6
daddiu a3, a3, 1
sq t4, 0(t3)
sll r0, r0, 0
sw a2, 4(t3)
beq r0, r0, L92
daddiu t3, t3, 16
B54:
L84:
daddiu t5, t4, -9
sll r0, r0, 0
beq t5, r0, L85
daddiu t4, t4, -19
B55:
bne t4, r0, L92
sll r0, r0, 0
B56:
L85:
sll r0, r0, 0
sll t4, t7, 4
sll r0, r0, 0
daddu t4, t4, t0
daddiu a3, a3, 1
lq t4, 4720(t4)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sq t4, 0(t3)
sll r0, r0, 0
sw a2, 4(t3)
beq r0, r0, L92
daddiu t3, t3, 16
B57:
L86:
vmulax.xyzw acc, vf28, vf10
lqc2 vf24, 6160(t0)
vmadday.xyzw acc, vf29, vf10
sll r0, r0, 0
vmaddz.xyzw vf10, vf30, vf10
sll r0, r0, 0
vmulax.xyzw acc, vf28, vf11
sll r0, r0, 0
vmadday.xyzw acc, vf29, vf11
lhu t4, 6536(t0)
vmaddz.xyzw vf11, vf30, vf11
lw a2, 6404(t0)
vmulax.xyzw acc, vf28, vf12
daddiu t8, t4, 1
vmadday.xyzw acc, vf29, vf12
sh t8, 6536(t0)
vmaddz.xyzw vf12, vf30, vf12
lw t4, 12(a2) ;; t4 = generic geometry? (frag count at 2(t4), frag pointers from 28(t4))
vmulax.xyzw acc, vf28, vf13
lw t5, 6532(t0)
vmadday.xyzw acc, vf29, vf13
lh t6, 2(t4) ;; t6 = generic frag count
vmaddaz.xyzw acc, vf30, vf13
lw a2, 6528(t0)
vmaddw.xyzw vf13, vf31, vf0
lw t7, 6516(t0)
vitof0.xyz vf24, vf24
sh t8, 6368(t0)
B58: ;; generic loop
L87:
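daddiu t8, a3, -115 ;; more than 115 qw queued in the scratchpad buffer?
sll r0, r0, 0
blez t8, L90 ;; no: skip the flush
lw t8, 28(t4) ;; load the frag
B59: ;; dma: flush the scratchpad buffer (only reached when a3 > 115)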
L88:
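lw t3, 0(a0) ;; t3 = channel CHCR
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
andi t3, t3, 256 ;; STR bit (0x100): previous transfer still running?
sll r0, r0, 0
beq t3, r0, L89 ;; idle: start the next transfer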
sll r0, r0, 0
B60:
sll r0, r0, 0
lw t3, 6564(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
daddiu t3, t3, 1
sll r0, r0, 0
sw t3, 6564(t0)
beq r0, r0, L88
sll r0, r0, 0
B61:
L89:
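sw t1, 128(a0) ;; SADR = scratchpad buffer that was just filled
xori t1, t1, 6144 ;; toggle to the other scratchpad buffer (xor 0x1800)
sw v1, 16(a0) ;; MADR = main-memory destination
sll t3, a3, 4
addu v1, v1, t3 ;; advance the destination by qwc * 16
or t3, t1, r0 ;; t3 = write pointer into the new buffer
sw a3, 32(a0) ;; QWC
addiu a3, r0, 256
sw a3, 0(a0) ;; CHCR = 0x100 (STR): start the transfer
addiu a3, r0, 0 ;; reset the queued qwc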
B62:
L90:
daddu t9, t7, t0
addiu t7, t7, -144
daddiu t4, t4, 4
daddiu t9, t9, 5152
bgez t7, L91
lq ra, 0(t9)
B63:
sll r0, r0, 0
addiu t7, r0, 720
B64:
L91:
sll r0, r0, 0
sw t5, 84(t9)
sll t5, a3, 4
sq ra, 0(t3)
addu t5, t5, v1
sqc2 vf10, 16(t3)
movz a2, t5, a2
sqc2 vf11, 32(t3)
daddiu a3, a3, 12
sqc2 vf12, 48(t3)
sll r0, r0, 0
lw ra, 4(t8) ;; ra = vtx-cnt
sll r0, r0, 0
sqc2 vf13, 64(t3)
sll r0, r0, 0
sqc2 vf24, 80(t3)
sll r0, r0, 0
sw ra, 96(t3)
sll r0, r0, 0
lw ra, 12(t8) ;; ra = cnt
sll r0, r0, 0
lbu gp, 8(t8) ;; gp = cnt-qwc
sll r0, r0, 0
sw ra, 20(t9)
sll r0, r0, 0
sb gp, 16(t9)
sll r0, r0, 0
sb gp, 30(t9)
sll r0, r0, 0
lw ra, 24(t8) ;; ra = stq
sll r0, r0, 0
lbu gp, 11(t8) ;; gp = stq-qwc
sll r0, r0, 0
sw ra, 36(t9)
sll r0, r0, 0
sb gp, 32(t9)
sll r0, r0, 0
lw ra, 20(t8) ;; ra = col
sll r0, r0, 0
lbu gp, 10(t8) ;; gp = col-qwc
sll r0, r0, 0
sw ra, 52(t9)
sll r0, r0, 0
sb gp, 48(t9)
sll r0, r0, 0
lw ra, 16(t8) ;; ra = vtx
sll r0, r0, 0
lbu gp, 9(t8) ;; gp = vtx-qwc
sll r0, r0, 0
sw ra, 68(t9)
sll r0, r0, 0
sb gp, 64(t9)
sll r0, r0, 0
lw t8, 4(t8)
sll r0, r0, 0
lq ra, 16(t9)
sll r0, r0, 0
sb t8, 46(t9)
sll r0, r0, 0
sb t8, 62(t9)
sll r0, r0, 0
sb t8, 78(t9)
sll r0, r0, 0
sq ra, 112(t3)
sll r0, r0, 0
lq t8, 32(t9)
sll r0, r0, 0
lq ra, 48(t9)
sll r0, r0, 0
sq t8, 128(t3)
sll r0, r0, 0
sq ra, 144(t3)
sll r0, r0, 0
lq t8, 64(t9)
sll r0, r0, 0
lq t9, 80(t9)
sll r0, r0, 0
sq t8, 160(t3)
daddiu t3, t3, 192
sq t9, -16(t3)
daddiu t6, t6, -1
sll r0, r0, 0
bgtz t6, L87
sll r0, r0, 0
B65:
sll r0, r0, 0
sw t7, 6516(t0)
lui t4, 4096
sw t5, 6532(t0)
ori t4, t4, 54272
sw a2, 6528(t0)
sll r0, r0, 0
sll r0, r0, 0
B66:
L92:
vcallms 25
lw a2, 6408(t0)
sll r0, r0, 0
lw t4, 6420(t0)
daddiu a2, a2, 80
sll r0, r0, 0
daddiu t4, t4, -1
sw a2, 6408(t0)
bgtz t4, L69
sw t4, 6420(t0)
B67:
L93:
sll r0, r0, 0
lw t4, 8(a2)
daddiu a2, a2, 16
lw t5, 6540(t0)
sll r0, r0, 0
sw a2, 6408(t0)
bne t4, r0, L69
sw t4, 6420(t0)
B68:
bne t5, r0, L58
sll r0, r0, 0
B69:
sll r0, r0, 0
lw a1, 6404(t0)
sll r0, r0, 0
lq a2, 6336(t0)
sll r0, r0, 0
lq t2, 6352(t0)
sll r0, r0, 0
lq t3, 6368(t0)
sll r0, r0, 0
sq a2, 92(a1)
sll r0, r0, 0
sq t2, 60(a1)
sll r0, r0, 0
sq t3, 76(a1)
beq a3, r0, L96
sll r0, r0, 0
B70:
sll r0, r0, 0
lw a0, 6416(t0)
sll r0, r0, 0
sll r0, r0, 0
B71:
L94:
lw a1, 0(a0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
andi a1, a1, 256
sll r0, r0, 0
beq a1, r0, L95
sll r0, r0, 0
B72:
sll r0, r0, 0
lw a1, 6564(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
daddiu a1, a1, 1
sll r0, r0, 0
sw a1, 6564(t0)
beq r0, r0, L94
sll r0, r0, 0
B73:
L95:
sw v1, 16(a0)
sll a1, a3, 4
sw t1, 128(a0)
xori a2, t1, 6144
addu v1, v1, a1
or a1, a2, r0
sw a3, 32(a0)
addiu a1, r0, 256
sw a1, 0(a0)
addiu a1, r0, 0
B74:
L96:
lw a1, 0(a0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
andi a1, a1, 256
sll r0, r0, 0
beq a1, r0, L97
sll r0, r0, 0
B75:
sll r0, r0, 0
lw a1, 6564(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
daddiu a1, a1, 1
sll r0, r0, 0
sw a1, 6564(t0)
beq r0, r0, L96
sll r0, r0, 0
B76:
L97:
lw a0, 6524(t0)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
sw v1, 4(a0)
sll r0, r0, 0
B77:
L98:
or v0, r0, r0
ld ra, 0(sp)
lq gp, 16(sp)
jr ra
daddiu sp, sp, 32
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
```
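
The wind block at L80/L81 reads like a damped spring that is integrated once per instance and then used to shear the instance matrix. The C++ below is our reading of that block. It is a sketch and not code from the game.

It relies on these assumptions:

- vf1 holds (1, 1, 1, 1). This is the only reading under which `vmadd vf10, vf1, vf10` makes sense.
- The two 8-byte halves of a wind-vector are consumed as x/z pairs, which is what `pextlw` with `r0` produces.
- The lane names given to `wind-const` (vf19) are guesses.
- `scale` stands in for vf9.y, and `write_back` stands in for the `#f` check on 6552(t0).

All type and function names (`WindVector`, `WindConst`, `Row`, `wind_slot`, `wind_step`, `shear_rows`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical names throughout; lane meanings are our reading of L80/L81.
struct Vec2 { float x, z; };               // only the x and z lanes are used
struct WindVector { Vec2 pos; Vec2 vel; }; // ld 0(a2) -> pos, ld 8(a2) -> vel
struct WindConst {                         // vf19 ("wind-const"), lane names assumed
  float damping;    // .x
  float stiffness;  // .y
  float step;       // .z
  float min_pos;    // .w, lower clamp
};
struct Row { float x, y, z; };             // one of vf10..vf12 (w lane ignored)

// andi t5, t4, 63: which of the 64 wind table entries this instance reads.
inline unsigned wind_slot(unsigned wind_time, unsigned wind_index) {
  return (wind_time + wind_index) & 63u;
}

// Damped spring driven by `wind`: one semi-implicit Euler step, then clamp and scale.
// `scale` stands for vf9.y; `write_back` stands for the #f check on 6552(t0).
inline Vec2 wind_step(WindVector& wv, Vec2 wind, const WindConst& k, float scale,
                      bool write_back) {
  Vec2 vel = wv.vel;
  Vec2 pos = wv.pos;
  // force = wind - damping * vel - stiffness * pos   (vmula, vmsubax, vmsuby)
  const float fx = wind.x - k.damping * vel.x - k.stiffness * pos.x;
  const float fz = wind.z - k.damping * vel.z - k.stiffness * pos.z;
  // velocity first (vmulaz + vmadd into vf18) ...
  vel.x += k.step * fx;
  vel.z += k.step * fz;
  // ... then position using the updated velocity (vmulaz + vmadd into vf17)
  pos.x += k.step * vel.x;
  pos.z += k.step * vel.z;
  // vminiw against vf0.w (1.0), vmaxw against wind-const.w, then vmuly by vf9
  pos.x = std::max(std::min(pos.x, 1.0f), k.min_pos) * scale;
  pos.z = std::max(std::min(pos.z, 1.0f), k.min_pos) * scale;
  if (write_back) {
    wv.vel = vel;
    wv.pos = pos;  // the asm stores the clamped, scaled value (t6 comes from vf27)
  }
  return pos;
}

// vmulax.yw + vmulay.xz + vmadd: x and z of each row grow by offset * row.y.
inline void shear_rows(Row rows[3], Vec2 offset) {
  for (int i = 0; i < 3; ++i) {
    rows[i].x += offset.x * rows[i].y;
    rows[i].z += offset.z * rows[i].y;
  }
}
```

Consider an upright instance, where row 1 is (0, 1, 0). The shear moves each vertex by its model-space height times the offset. The top of the shrub therefore sways while its base stays in place.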
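
Before the matrix is built, L80 also decides whether an instance must take the generic path at L86. This fits the design note that only the generic renderer can scissor.

The four quadwords loaded from 6208..6256(t0) are combined with vf2 in a structure-of-arrays dot product. Our reading of that product is:

- vf24, vf25 and vf26 hold the x, y and z normal components of four planes.
- vf27 holds the plane offsets.
- vf2 is the instance bounding sphere.
- Each lane ends up as `n . c - d - r`, and a set sign bit (`pcgtw` against `r0`) flags that lane.

Those register roles are assumptions, as are the names `PlanesSoA`, `Sphere`, `sphere_crosses_planes` and `needs_generic`. The sign mask taken from vf4 is computed before this excerpt, so the sketch treats it as an opaque input flag.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical names; register roles are our reading of L80.
struct Sphere { float x, y, z, r; };  // vf2, assumed to be the bounding sphere
struct PlanesSoA {                    // vf24..vf27 from 6208..6256(t0)
  std::array<float, 4> nx, ny, nz, d; // lane i describes plane i (layout assumed)
};

// vmulax/vmadday/vmaddaz build n . c for four planes at once,
// vmsubaw subtracts d, vmsubw subtracts the radius (vf1 assumed to be all ones).
inline bool sphere_crosses_planes(const PlanesSoA& p, const Sphere& s) {
  for (int i = 0; i < 4; ++i) {
    const float dist = p.nx[i] * s.x + p.ny[i] * s.y + p.nz[i] * s.z - p.d[i] - s.r;
    // pcgtw against r0 tests the raw sign bit, so -0.0 also counts as flagged.
    if (std::signbit(dist)) return true;
  }
  return false;
}

// or t5, t6, t5 / bne t5, r0, L86: either test sends the instance to the generic path.
inline bool needs_generic(const PlanesSoA& p, const Sphere& s, bool earlier_vf4_test) {
  return earlier_vf4_test || sphere_crosses_planes(p, s);
}
```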
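
L82 combines the instance matrix (vf10..vf13) with another 4x4 matrix (vf20..vf23) using the usual VU0 `vmulax`/`vmadday`/`vmaddz` chain. L86 does the same for the generic path, using vf28..vf31. Rows 0..2 never read their w lane. Row 3 picks up vf23 through `vmaddw ..., vf0`, and vf0.w is always 1.0. In effect, rows 0..2 act as directions and row 3 acts as a position. What vf20..vf23 and vf28..vf31 contain is not visible in this excerpt; some camera matrix is likely.

A row-vector C++ equivalent might look like this. `Mat4` and `compose` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row i corresponds to vf10+i (instance) or vf20+i

// out[i] = inst[i].x * m[0] + inst[i].y * m[1] + inst[i].z * m[2]  (+ m[3] for row 3).
// inst[i][3] is never read, matching the asm, which uses only .x/.y/.z of vf10..vf13.
inline Mat4 compose(const Mat4& inst, const Mat4& m) {
  Mat4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    for (std::size_t c = 0; c < 4; ++c) {
      float v = inst[i][0] * m[0][c] + inst[i][1] * m[1][c] + inst[i][2] * m[2][c];
      if (i == 3) v += m[3][c];  // vmaddw vf13, vf23, vf0: vf0.w is 1.0
      out[i][c] = v;
    }
  }
  return out;
}
```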
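
After the matrix stores, L82 keeps a halfword counter at 6378(t0). The quadword written ahead of each instance's matrix comes from a 20-entry table at 4400(t0), indexed by that counter. The counter then advances and wraps from 19 back to 0.

One extra quadword is emitted at two points in the cycle:

- When the counter was 0 or 10, L83 picks it from 5040(t0) or 5056(t0), depending on whether 6340(t0) is zero.
- When the counter was 9 or 19, L85 loads it from 4720(t0) plus the next counter value times 16.

What those quadwords mean is not clear from this excerpt. The sketch captures only the branch structure, and the names `SlotAction`, `SlotResult` and `advance_slot` are hypothetical.

```cpp
#include <cassert>

// Hypothetical names; only the branch structure comes from the asm (L82..L85).
enum class SlotAction {
  None,        // ordinary instance: falls through to L92
  GroupStart,  // slot 0 or 10: L83, one quadword from 5040(t0) or 5056(t0)
  GroupEnd,    // slot 9 or 19: L85, one quadword from 4720(t0) + next * 16
};

struct SlotResult {
  unsigned next;      // value stored back to the halfword at 6378(t0)
  SlotAction action;  // which extra quadword (if any) follows the instance
};

inline SlotResult advance_slot(unsigned slot) {
  unsigned next = slot + 1;
  if (next == 20) next = 0;  // daddiu t8, t7, -20 / movz t7, r0, t8
  SlotAction action = SlotAction::None;
  if (slot == 0 || slot == 10) {
    action = SlotAction::GroupStart;
  } else if (slot == 9 || slot == 19) {
    action = SlotAction::GroupEnd;
  }
  return {next, action};
}
```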
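
There are two flush sites: L88/L89 inside the generic loop, and L94..L97 at the end. Both follow the same pattern:

1. Poll word 0 of the channel at a0 until bit 8 clears, bumping the word at 6564(t0) on each busy poll.
2. Write offsets 128, 16 and 32.
3. Store 0x100 to word 0.

Those offsets and that bit match the PS2 DMAC channel registers: SADR at 0x80, MADR at 0x10, QWC at 0x20, and the STR bit in CHCR. This looks like a scratchpad-to-main-memory transfer. The scratchpad side is double-buffered by `xori t1, t1, 6144`, and v1 advances as a main-memory pointer. That direction is our inference.

The C++ below models the flush with a plain struct standing in for the hardware registers. `DmaChannel`, `FlushState` and `flush` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the channel registers touched through a0.
struct DmaChannel {
  std::uint32_t chcr;  // +0   bit 8 (0x100) = STR, set while a transfer runs
  std::uint32_t madr;  // +16  main-memory address
  std::uint32_t qwc;   // +32  quadword count
  std::uint32_t sadr;  // +128 scratchpad address
};

struct FlushState {
  std::uint32_t spr_buf;      // t1: scratchpad buffer that was just filled
  std::uint32_t mem_ptr;      // v1: next free address in the main-memory buffer
  std::uint32_t qwc;          // a3: quadwords queued in spr_buf
  std::uint32_t stall_count;  // word at 6564(t0), bumped once per busy poll
};

// Mirrors L88/L89. Returns the scratchpad address to write into next (t3).
inline std::uint32_t flush(volatile DmaChannel& ch, FlushState& s) {
  while (ch.chcr & 0x100u) ++s.stall_count;  // wait for the previous transfer
  ch.sadr = s.spr_buf;                       // sw t1, 128(a0)
  ch.madr = s.mem_ptr;                       // sw v1, 16(a0)
  ch.qwc = s.qwc;                            // sw a3, 32(a0)
  ch.chcr = 0x100u;                          // sw a3, 0(a0) with a3 = 256: start
  s.mem_ptr += s.qwc * 16u;                  // addu v1, v1, qwc * 16
  s.spr_buf ^= 0x1800u;                      // xori t1, t1, 6144: other buffer
  s.qwc = 0;                                 // addiu a3, r0, 0
  return s.spr_buf;
}
```

Calling `flush` twice against this plain struct would spin forever, because nothing clears STR. On hardware, the DMAC clears it when the transfer completes.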
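
The generic loop (L87..L91) works as follows:

- The fragment count comes from `lh t6, 2(t4)`.
- Fragment pointers are read from 28(t4) onward, with t4 advancing by 4 per fragment.
- Per-fragment fields are read at the offsets annotated in the asm.
- Each fragment consumes one slot from a ring of six 144-byte templates at 5152(t0), presumably DMA tag templates.
- Each fragment appends 12 quadwords, with a flush first whenever more than 115 are queued.

The struct below records only those pointer-relative offsets, so they can be checked with `static_assert`. It is not the game's type definition, and the word at +0 is not read here. `GenericFragView`, `next_template_offset` and `must_flush_before_frag` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pointer-relative offsets read through t8 in L91 (names from the asm comments).
// Hypothetical view, not the game's type; the word at +0 is not read in this excerpt.
struct GenericFragView {
  std::uint32_t unread0;  // +0
  std::uint32_t vtx_cnt;  // +4   lw ra, 4(t8)
  std::uint8_t cnt_qwc;   // +8   lbu gp, 8(t8)
  std::uint8_t vtx_qwc;   // +9   lbu gp, 9(t8)
  std::uint8_t col_qwc;   // +10  lbu gp, 10(t8)
  std::uint8_t stq_qwc;   // +11  lbu gp, 11(t8)
  std::uint32_t cnt;      // +12  32-bit PS2 address
  std::uint32_t vtx;      // +16
  std::uint32_t col;      // +20
  std::uint32_t stq;      // +24
};
static_assert(offsetof(GenericFragView, vtx_cnt) == 4, "vtx-cnt");
static_assert(offsetof(GenericFragView, cnt_qwc) == 8, "cnt-qwc");
static_assert(offsetof(GenericFragView, stq_qwc) == 11, "stq-qwc");
static_assert(offsetof(GenericFragView, cnt) == 12, "cnt");
static_assert(offsetof(GenericFragView, stq) == 24, "stq");

// L90/L91: six template slots 144 bytes apart, used from offset 720 down to 0.
inline int next_template_offset(int offset) {
  offset -= 144;                      // addiu t7, t7, -144
  return offset >= 0 ? offset : 720;  // bgez t7, L91 / addiu t7, r0, 720
}

// L87: flush before adding a fragment's 12 quadwords if more than 115 are queued.
inline bool must_flush_before_frag(std::uint32_t queued_qwc) { return queued_qwc > 115; }
```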