jak-project/docs/progress-notes/joint-decompressor.md
water111 79d14af0b5
Decompile joint, collide-func, clean up joint decompression code for all games (#3369)
I finally read through all the joint code and wrote up some
documentation. I think this will be really helpful when we try to
understand all the functions in `process-drawable`, or if somebody ever
wants to import/export animations.

This switches all three games to using a new faster GOAL joint
decompressor. It is on by default, but you can go back to the old
version by setting `*use-new-decompressor*` to false.

Also fix the log-related crash, fix the clock speed used in timer math.
2024-02-11 09:50:07 -05:00

1684 lines
50 KiB
Markdown

This is an explanation of reverse engineering and porting the Jak joint decompressor from PS2 assembly to higher-level OpenGOAL code. It's the same from all three games. I made this document as an example of how to approach this type of reverse engineering.
# What is `joint.gc`?
The "joint" system is used to play back animations.
At a very high level, the process to animate a character:
- the gameplay code uses the `ja` macros to set up joint animations.
- the process-drawable system updates the animation metadata in joint-control. This is responsible for
producing an array of joint-control-channel. The channels are arranged into a blend tree, and each
channel has a frame number (possibly in between frames) and interpolation weight for the blend tree.
- This file, joint.gc looks at those animations, frame numbers, blend weights, and tree structure, then
produces the relative transform between bones, called "joint transforms"
- The process-drawable system uses the joint transforms to compute bone transforms.
- The gameplay code and collision code use the bone transforms to determine world-space positions/rotations.
- The "bones.gc" system builds rendering matrices for the foreground renderer
- The merc, mercneric, and shadow renderers consume these matrices for their skinning calculations.
The focus of this documentation is the function that computes joint transforms.
There are three:
1. `create-interpolated-joint-animation-frame`, which is the most standard one. It takes a `joint-control`, a number of joints (since low lod versions might not need them all), and produces a `joint-anim-frame` - a list of joint transforms.
2. `create-interpolated2-joint-animation-frame`, which is the same, but uses the "interp2" system described later.
3. `cspace<-matrix-no-push-joint!` which computes a single joint transform. The joint must be one of the first two, and it uses a special `no-push` mode. More on this later.
Reading the top of `joint.gc` has a lot more details, if you are curious about the details of the blending, etc. That was written before touching the actually decompressor assembly. Figuring all this out first can be really helpful.
# Background information
The end result of this routine is to compute a `joint-anim-frame`:
```lisp
(deftype joint-anim-frame (structure)
((matrices matrix 2 :inline)
(data transformq :inline :dynamic)
)
(:methods
(new (symbol type int) _type_)
)
)
```
which is all the joint transforms for a interpolated frame. Note that there is interpolation in both time (in between frames) and between different animations (blending).
This is the code that is used to generate a `joint-anim-frame`:
```lisp
(defun create-interpolated-joint-animation-frame ((dst joint-anim-frame) (num-joints int) (jc joint-control))
(flatten-joint-control-to-spr jc)
(make-joint-jump-tables)
(calc-animation-from-spr dst num-joints)
0
)
```
# Flattening the Joint Control
The `flatten-joint-control-to-spr` function is written in plain GOAL, so I've decompiled/annotated it here.
The first part of the function is iterating through `joint-control-channel`s, and computing blend weights from a blend tree. This computation acts like a stack machine. There is a `push` operation to push a new stack frame, with all channels zero except for the specified animation. There is a `stack` operation which takes the two most top things on the stack, then adds them together with the given weights (effectively `pop`ing back 1 stack frame in the end). Finally, there is a `blend` operation, which modifies the top of the animation stack to be a blend of multiple animations. I imagine this can stay almost the same in the new decompressor, other than scratchpad use.
The second part builds a list of "uploads". This is a list of data that should be uploaded to the scratchpad. Each animation has some "fixed" data, which is a chunk of common data needed to produce any frame.
There is also "frame" data, one for each frame in the animation. If we need to interpolate between two frames, we upload 2 frames worth of data. Otherwise, if we want an exact frame, we just upload the one.
```lisp
(defun flatten-joint-control-to-spr ((jc joint-control))
"Walk the blend tree and compute interpolation weights, prepare animation upload info."
(rlet ((vf1 :class vf)
(vf10 :class vf)
(vf11 :class vf)
(vf12 :class vf)
(vf13 :class vf)
(vf14 :class vf)
(vf2 :class vf)
(vf3 :class vf)
(vf4 :class vf)
(vf5 :class vf)
(vf6 :class vf)
(vf7 :class vf)
(vf8 :class vf)
(vf9 :class vf)
)
;; We assume a maximum of 4 * 6 = 24 channels.
(let ((chan-count (+ (-> jc active-channels) (-> jc float-channels))))
(let ((one 1.0) ;; constant
(chan-float-offset 0) ;; which channel's weight to adjust.
(chan-vector-ptr (the-as (inline-array vector) #x70000960)) ;; stack pointer
(interp2-selected-idx (-> jc active-frame-interp)) ;; interp2 0 or 1 selector.
)
;; loop over channels
(dotimes (chan-idx (the-as int chan-count))
(let ((chan (-> jc channel chan-idx)))
(case (-> chan command)
(((joint-control-command push))
;; push a new stack frame with this anim set to 1.
(let ((flt1 (the-as (pointer float) (+ (the-as int chan-vector-ptr) chan-float-offset))))
;; initialize all channels in this frame to 0
(set! (-> chan-vector-ptr 0 quad) (the-as uint128 0))
(set! (-> chan-vector-ptr 1 quad) (the-as uint128 0))
(set! (-> chan-vector-ptr 2 quad) (the-as uint128 0))
(set! (-> chan-vector-ptr 3 quad) (the-as uint128 0))
(set! (-> chan-vector-ptr 4 quad) (the-as uint128 0))
(set! (-> chan-vector-ptr 5 quad) (the-as uint128 0))
;; then, set the weight of this animation to 1.
(set! (-> flt1 0) one)
)
;; advance stack pointer.
(set! chan-vector-ptr (the-as (inline-array vector) (-> chan-vector-ptr 6)))
)
(((joint-control-command blend) (joint-control-command push1) (joint-control-command float))
;; determine the blend factor for this animation.
(let ((interp2-selected-weight1 (-> chan frame-interp interp2-selected-idx)))
;; one - blend_factor gives us the blend weight for the previous thing in the frame
(let ((a3-5 (- one interp2-selected-weight1)))
(.mov vf1 a3-5)
)
;; modify the previous thing in the stack to reduce weight:
(let ((prev-chan-ptr (the-as (inline-array vector) (-> chan-vector-ptr -6))))
(.lvf vf2 (&-> prev-chan-ptr 0 quad))
(let ((a3-6 (&+ (the-as pointer prev-chan-ptr) chan-float-offset)))
(.lvf vf3 (&-> prev-chan-ptr 1 quad))
(.lvf vf4 (&-> prev-chan-ptr 2 quad))
(.lvf vf5 (&-> prev-chan-ptr 3 quad))
(.lvf vf6 (&-> prev-chan-ptr 4 quad))
(.lvf vf7 (&-> prev-chan-ptr 5 quad))
(.mul.x.vf vf2 vf2 vf1) ;; multiply all weights by (1 - push_blend)
(.mul.x.vf vf3 vf3 vf1)
(.mul.x.vf vf4 vf4 vf1)
(.mul.x.vf vf5 vf5 vf1)
(.mul.x.vf vf6 vf6 vf1)
(.mul.x.vf vf7 vf7 vf1)
(.svf (&-> prev-chan-ptr 0 quad) vf2)
(.svf (&-> prev-chan-ptr 1 quad) vf3)
(.svf (&-> prev-chan-ptr 2 quad) vf4)
(.svf (&-> prev-chan-ptr 3 quad) vf5)
(.svf (&-> prev-chan-ptr 4 quad) vf6)
(.svf (&-> prev-chan-ptr 5 quad) vf7)
;; but, modify our channel to add in the push_blend
(+! (-> (the-as (pointer float) a3-6) 0) interp2-selected-weight1)
)
(set! chan-vector-ptr (the-as (inline-array vector) (-> prev-chan-ptr 6)))
)
)
)
(((joint-control-command stack))
;; add together the last two stack frames, using the given weight.
(let* ((interp2-selected-weight2 (-> chan frame-interp interp2-selected-idx))
(one-minus-interp2 (- one interp2-selected-weight2))
;; back up 2 stack frames, to add them.
(chans-to-stack (the-as (inline-array vector) (-> chan-vector-ptr -12)))
)
(let ((a3-8 interp2-selected-weight2))
(.mov vf1 a3-8)
)
(let ((a3-9 one-minus-interp2))
(.mov vf2 a3-9)
)
;; load first stack frame
(.lvf vf3 (&-> chans-to-stack 0 quad))
(.lvf vf4 (&-> chans-to-stack 1 quad))
(.lvf vf5 (&-> chans-to-stack 2 quad))
(.lvf vf6 (&-> chans-to-stack 3 quad))
(.lvf vf7 (&-> chans-to-stack 4 quad))
(.lvf vf8 (&-> chans-to-stack 5 quad))
;; multiply by blend weight
(.mul.x.vf vf3 vf3 vf2)
(.mul.x.vf vf4 vf4 vf2)
(.mul.x.vf vf5 vf5 vf2)
(.mul.x.vf vf6 vf6 vf2)
(.mul.x.vf vf7 vf7 vf2)
(.mul.x.vf vf8 vf8 vf2)
;; load second stack frame
(.lvf vf9 (&-> chans-to-stack 6 quad))
(.lvf vf10 (&-> chans-to-stack 7 quad))
(.lvf vf11 (&-> chans-to-stack 8 quad))
(.lvf vf12 (&-> chans-to-stack 9 quad))
(.lvf vf13 (&-> chans-to-stack 10 quad))
(.lvf vf14 (&-> chans-to-stack 11 quad))
;; multiply by blend weight
(.mul.x.vf vf9 vf9 vf1)
(.mul.x.vf vf10 vf10 vf1)
(.mul.x.vf vf11 vf11 vf1)
(.mul.x.vf vf12 vf12 vf1)
(.mul.x.vf vf13 vf13 vf1)
(.mul.x.vf vf14 vf14 vf1)
;; the add!
(.add.vf vf3 vf3 vf9)
(.add.vf vf4 vf4 vf10)
(.add.vf vf5 vf5 vf11)
(.add.vf vf6 vf6 vf12)
(.add.vf vf7 vf7 vf13)
(.add.vf vf8 vf8 vf14)
;; overwrite the first
(.svf (&-> chans-to-stack 0 quad) vf3)
(.svf (&-> chans-to-stack 1 quad) vf4)
(.svf (&-> chans-to-stack 2 quad) vf5)
(.svf (&-> chans-to-stack 3 quad) vf6)
(.svf (&-> chans-to-stack 4 quad) vf7)
(.svf (&-> chans-to-stack 5 quad) vf8)
;; this ends up moving the stack pointer back 1 stack frame (went back 2, then fwd 1)
(set! chan-vector-ptr (the-as (inline-array vector) (&+ (the-as pointer chans-to-stack) 96)))
)
)
)
)
;; advance channel
(+! chan-float-offset 4)
)
)
;; now we have figured out all the weights for each channel - we need to figure out which animations need decompressing.
(let ((upload-count 0))
(dotimes (upload-chan-idx (the-as int chan-count))
;; only upload if the weight is nonzero.
(when (< 0.001 (-> (the-as terrain-context #x70000000) work foreground joint-work flatten-array upload-chan-idx))
;; determine integer frame we need
(let* ((upload-chan (-> jc channel upload-chan-idx))
(anim (-> upload-chan frame-group frames))
(frame-num (-> upload-chan frame-num))
(int-frame-num (the int frame-num))
(frame-frac (- frame-num (the float int-frame-num)))
)
(let ((last-frame (+ (-> anim num-frames) -1)))
(if (not (-> upload-chan frame-group))
(format 0 "Channel ~D skel ~A frame-group is #f!!!~%" upload-chan-idx jc)
)
;; if we're past the end, clamp
(when (>= int-frame-num (the-as int last-frame))
(set! frame-frac 0.0)
(set! int-frame-num (the-as int last-frame))
)
)
;; set up the upload:
(let ((upload (-> (the-as terrain-context #x70000000) work foreground joint-work uploads upload-count)))
(set! (-> upload fixed) (-> anim fixed))
(set! (-> upload fixed-qwc) (the-as int (-> anim fixed-qwc)))
(set! (-> upload frame) (-> anim data int-frame-num))
;; if we are a fractional frame, upload 2 frames so we can interpolate from the next.
(set! (-> upload frame-qwc) (the-as int (if (= frame-frac 0.0)
(-> anim frame-qwc)
(* (-> anim frame-qwc) 2)
)
)
)
(set! (-> upload amount)
(-> (the-as terrain-context #x70000000) work foreground joint-work flatten-array upload-chan-idx)
)
(set! (-> upload interp) frame-frac)
)
)
(+! upload-count 1)
)
)
(set! (-> (the-as terrain-context #x70000000) work foreground joint-work num-uploads) upload-count)
)
;; record amounts in the channel so we can print it for debug.
(dotimes (v1-26 (the-as int chan-count))
(set! (-> jc channel v1-26 inspector-amount)
(the-as
uint
(the int (* 255.0 (-> (the-as terrain-context #x70000000) work foreground joint-work flatten-array v1-26)))
)
)
)
)
0
)
)
```
# Make Joint Jump Tables
This function is building a number of jump tables like this:
```
lw v1, decompress-fixed-data-to-accumulator(s7) ;; get address of function
addiu a0, r0, 301 ;; offset into instruction (in instructions)
dsll a0, a0, 2 ;; convert offset to bytes
daddu v1, v1, a0 ;; get pointer to inside of function
lui a0, 28672 ;; store into the jump table (in the scratchpad)
sw v1, 1632(a0)
```
This is likely some strange macro, as I think the GOAL compiler would normally have constant propagated the `addiu`/`dsll` pair.
We can see the layout of the scratchpad here:
```lisp
(deftype joint-work (structure)
;; used in cspace<-matrix-no-push-joint!, which is not asm
((temp-mtx matrix :inline :offset-assert 0)
(joint-stack matrix-stack :inline :offset-assert 64)
;; the jump tables. seems like there's 3 different modes: fix (fixed), frm (frame), pair (2x frame)
(fix-jmp-table (function none) 16 :offset-assert 1616) ;; guessed by decompiler
(frm-jmp-table (function none) 16 :offset-assert 1680) ;; guessed by decompiler
(pair-jmp-table (function none) 16 :offset-assert 1744) ;; guessed by decompiler
;; upload records generated in the `flatten-joint-control-to-spr` function above
(uploads channel-upload-info 24 :inline :offset-assert 1808) ;; guessed by decompiler
(num-uploads int32 :offset-assert 2384)
;; "accumulators", which I believe will eventually contain the full transforms
(mtx-acc matrix 2 :inline :offset-assert 2400) ;; guessed by decompiler
(tq-acc transformq 100 :inline :offset-assert 2528) ;; guessed by decompiler
;; ?? likely destination for uploads
(jacp-hdr joint-anim-compressed-hdr :inline :offset-assert 7328)
(fixed-data joint-anim-compressed-fixed :inline :offset-assert 7392)
(frame-data joint-anim-compressed-frame 2 :inline :offset-assert 9600) ;; guessed by decompiler
;; used during the flatten-joint-control-to-spr function
(flatten-array float 576 :offset 2400) ;; guessed by decompiler
(flattened vector 24 :inline :offset 2400) ;; guessed by decompiler
)
```
and the upload type:
```
(deftype channel-upload-info (structure)
"Information about an upload of animation data to a single joint channel."
((fixed joint-anim-compressed-fixed :offset-assert 0)
(fixed-qwc int32 :offset-assert 4)
(frame joint-anim-compressed-frame :offset-assert 8)
(frame-qwc int32 :offset-assert 12)
(amount float :offset-assert 16)
(interp float :offset-assert 20)
)
```
I did not learn much from staring at this yet, so I am going to move on to `calc-animation-from-spr`, and revisit once I find the actual jumps.
# `calc-animation-from-spr`
This function is entirely assembly. I approach these by adding a comment to each line:
```python
# a0 = dst-joint-anim-frame
# a1 = num-joints
or v1, a1, r0 # v1 = num-joint
# backup registers
daddiu sp, sp, -192
sq s0, 0(sp)
sq s1, 16(sp)
sq s2, 32(sp)
sq s3, 48(sp)
sq s4, 64(sp)
sq s5, 80(sp)
sq s6, 96(sp)
sq t8, 112(sp)
sq t9, 128(sp)
sq gp, 144(sp)
sq fp, 160(sp)
sq ra, 176(sp)
daddiu sp, sp, -16 # allocate another 16 bytes on the stack
qmtc2.i vf15, r0 # vf15 = 0
sw a1, 0(sp) # 0(sp) = num-joints
lui v1, 28672
lw s1, 2384(v1) # s1 = num-uploads
daddiu t7, v1, 1808 # t7 = uploads
lui s0, 4096 # generating some constant here, likely DMA register
daddiu t1, v1, 7328 # t1 = jacp-hdr
beq s1, r0, L53 # early return if we don't have any uploads
ori s0, s0, 54272 # more DMA register constant stuff
B1:
lw t2, 0(t7) # t2 = fixed
addiu t3, r0, 7392 # t3 = fixed-data
lw t4, 4(t7) # t4 = fixed-qwc
addiu v1, r0, 256 # DMA constant?
sw t2, 16(s0) # DMA setup
vaddw.xyzw vf14, vf15, vf0 # vf14 = 1, 1, 1, 1
sw t3, 128(s0) # DMA setup
sll r0, r0, 0 #
sw t4, 32(s0) # DMA setup QWC
sync.l
sw v1, 0(s0) # DMA GO!!
sync.l
lw t9, clear-frame-accumulator(s7) # t9 = this function
vadd.yz vf14, vf14, vf14 # vf14 = [1, 2, 2, 1]
lw s2, 0(sp) # s2 = num-joint
sll r0, r0, 0
jalr ra, t9 # call!
vadd.yz vf14, vf14, vf14 # vf14 = [1, 4, 4, 1]
B2:
L49:
lw v1, 0(s0) # wait on DMA
sll r0, r0, 0
andi v1, v1, 256
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
bne v1, r0, L49
sll r0, r0, 0
B3:
lw t2, 8(t7) # t2 = frame
addiu t3, r0, 9600 # t3 = frame-data
lw t4, 12(t7) # t4 = frame-qwc
addiu v1, r0, 256 # dma constant
sw t2, 16(s0) # dma (src addr)
sll r0, r0, 0
sw t3, 128(s0) # dma (dst addr)
sll r0, r0, 0
sw t4, 32(s0) # dma (qwc)
sync.l
sw v1, 0(s0) # dma (start)
sync.l
lw a2, 16(t7)
lui a1, 28672
lw t9, decompress-fixed-data-to-accumulator(s7)
daddiu a1, a1, 7392 # a1 = fixed-data
jalr ra, t9
daddiu s1, s1, -1 # decrement upload counter
B4:
L50:
lw v1, 0(s0) # wait on frame data dma
sll r0, r0, 0
andi v1, v1, 256
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
bne v1, r0, L50
sll r0, r0, 0
B5:
beq s1, r0, L51 # jump over next dma start if there's nothing next
sll r0, r0, 0
B6:
lw t2, 24(t7) # dma the next upload
addiu t3, r0, 7392
lw t4, 28(t7)
addiu v1, r0, 256
sw t2, 16(s0)
sll r0, r0, 0
sw t3, 128(s0)
sll r0, r0, 0
sw t4, 32(s0)
sync.l
sw v1, 0(s0)
sync.l
B7:
L51:
lw t0, 20(t7) # t0 = interp (between consecutive frames)
lui a1, 28672
lw a2, 16(t7) # a2 = amount (weight)
daddiu a1, a1, 9600 # a1 = frame-data
beq t0, r0, L52 # if interp is 0, take the non-pair case, just one frame
sll r0, r0, 0
B8:
lw a3, 12(t7) # a3 = frame-qwc
sll r0, r0, 0
lw t9, decompress-frame-data-pair-to-accumulator(s7)
sll r0, r0, 0
jalr ra, t9
sll a3, a3, 3
bne s1, r0, L49
daddiu t7, t7, 24
B9:
lw t9, normalize-frame-quaternions(s7)
sll r0, r0, 0
lw s2, 0(sp)
sll r0, r0, 0
jalr ra, t9
sll r0, r0, 0
daddiu sp, sp, 16
sll r0, r0, 0
lq s0, 0(sp)
lq s1, 16(sp)
lq s2, 32(sp)
lq s3, 48(sp)
lq s4, 64(sp)
lq s5, 80(sp)
lq s6, 96(sp)
lq t8, 112(sp)
lq t9, 128(sp)
lq gp, 144(sp)
lq ra, 176(sp)
lq fp, 160(sp)
jr ra
daddiu sp, sp, 192
B10:
L52:
lw t9, decompress-frame-data-to-accumulator(s7)
sll r0, r0, 0
jalr ra, t9
sll r0, r0, 0
bne s1, r0, L49
daddiu t7, t7, 24
B11:
lw t9, normalize-frame-quaternions(s7)
sll r0, r0, 0
lw s2, 0(sp)
sll r0, r0, 0
jalr ra, t9
sll r0, r0, 0
B12:
L53:
daddiu sp, sp, 16
sll r0, r0, 0
lq s0, 0(sp)
lq s1, 16(sp)
lq s2, 32(sp)
lq s3, 48(sp)
lq s4, 64(sp)
lq s5, 80(sp)
lq s6, 96(sp)
lq t8, 112(sp)
lq t9, 128(sp)
lq gp, 144(sp)
lq ra, 176(sp)
lq fp, 160(sp)
jr ra
daddiu sp, sp, 192
jr ra
daddu sp, sp, r0
```
The basic operation is:
- call `clear-frame-accumulator`, to zero out the tq and matrix accumulators
- for each upload
- DMA the fixed data, then `decompress-fixed-data-to-accumulator`
- DMA the fixed data
- Call `decompress-frame-data-pair-to-accumulator` or `decompress-frame-data-to-accumulator`. Use the `pair` one if interp is nonzero, meaning we need to combine two consecutive frames
- Call `normalize-frame-quaternions`
# `clear-frame-accumulator`
Just sets the accumulator to 0. The first two joints are matrices and zeroed entirely. The remaining ones are `transformq` (position, quaternion, scale). The pos/quat are zeroed, but not the scale.
# `normalize-frame-quaternion`
This function simply sets `trans.w = 1`, `scale.w = 1`, and normalizes the quaternions.
The quaternions may be slightly off due to the blending or decompression.
```python
# s2 = num-joints
L91:
daddiu sp, sp, -16
B0:
daddiu s2, s2, -2 # subtract off first two matrix joints
sw a0, 0(sp)
daddiu a0, a0, 128 # seek past matrices in the accumulator
B1:
L92: # per transformq loop
lqc2 vf4, 16(a0) # vf4 = quat
lqc2 vf1, 0(a0) # vf1 = trans
lqc2 vf7, 32(a0) # vf7 = scale
vmul.xyzw vf10, vf4, vf4 # vf10 = x^2, y^2, z^2, w^2
vmove.w vf1, vf0 # vf1.w = 1
vmove.w vf7, vf0 # vf7.w = 1
vmulaw.xyzw acc, vf0, vf10 # sum up x^2, y^2, z^2, w^2
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
sqc2 vf1, 0(a0) # store fixed trans
sqc2 vf7, 32(a0) # store fixed scale
daddiu a0, a0, 48 # advance tq ptr
vrsqrt Q, vf0.w, vf10.w # get length
vwaitq
vmulq.xyzw vf4, vf4, Q # multiply by inverse length to normalize
daddiu s2, s2, -1 # dec count
bne s2, r0, L92
sqc2 vf4, -32(a0) # store fixed quat.
B2:
lw a0, 0(sp)
sll r0, r0, 0
jr ra
daddiu sp, sp, 16
jr ra
daddu sp, sp, r0
```
# `decompress-fixed-data-to-accumulator`
Useful types
```lisp
(deftype joint-anim-compressed-hdr (structure)
"Header for the compressed joint animation format."
((control-bits uint32 14 :offset-assert 0) ;; guessed by decompiler
(num-joints uint32 :offset-assert 56)
(matrix-bits uint32 :offset-assert 60)
)
:method-count-assert 9
:size-assert #x40
:flag-assert #x900000040
)
(deftype joint-anim-compressed-fixed (structure)
((hdr joint-anim-compressed-hdr :inline :offset-assert 0)
(offset-64 uint32 :offset-assert 64)
(offset-32 uint32 :offset-assert 68)
(offset-16 uint32 :offset-assert 72)
(reserved uint32 :offset-assert 76)
(data vector 133 :inline :offset-assert 80) ;; guessed by decompiler
)
```
```python
# a0 = accumulator
# a1 = fixed data (joint-anim-compressed-fixed)
# a2 = amount
# t1 = jacp-header (unset when we get here)
# s2 = num-joints
# vf14 = [1, 4, 4, 1]
L79:
# copy the header to the jacp-header (i think, so we can continue using in frame functions
# even when dma'ing the next upload's fixed to here)
lq t4, 0(a1) # t4 = fixed.hdr.quad[0]
daddiu sp, sp, -16
lq t5, 16(a1) # t4 = fixed.hdr.quad[1]
sll r0, r0, 0
sq t4, 0(t1) # move to jacp header
sll r0, r0, 0
sq t5, 16(t1) # move to jacp header
sll r0, r0, 0
lq t4, 32(a1) # quad2
sll r0, r0, 0
lq t5, 48(a1) # quad3
sll r0, r0, 0
sq t4, 32(t1)
sll r0, r0, 0
sq t5, 48(t1)
sll r0, r0, 0
sq a0, 0(sp) # backup acc pointer
sll r0, r0, 0
qmtc2.i vf13, a2 # vf13.x = amount
lui t2, 28672 # scratchpad addr
lw t4, 64(a1) # t4 = fixed.offset-64
daddiu v1, a1, 80 # v1 = fixed.data
lw t5, 68(a1) # t4 = fixed.offset-32
daddu s5, t1, r0 # s5 = control-bits-ptr
lw t6, 72(a1) # t6 = fixed.offset-16
daddu t4, t4, v1 # t4 = fixed.data + fixed.offset-64
vmulx.xyzw vf13, vf14, vf13 # vf13 = [amt, 4*amt, 4*amt, amt]
daddiu t2, t2, 1616 # t2 = fix-jmp-table
lw s2, 56(t1) # hdr.num_joints = num-joints
daddu t5, t5, v1 # t5 = fixed.data + fixed.offset-32
lw s4, 60(t1) # s4 = matrix-bits
daddu t6, t6, v1 # t6 = fixed.data + fixed.offset-16
addiu s3, r0, 8 # s3 = 8
daddiu s5, s5, 4 # s5 (control-bits-ptr) += 4 bytes
andi t3, s4, 1 # t3 = matrix-bits & 1
sll r0, r0, 0
bne t3, r0, L80
sll r0, r0, 0
B1: # if (matrix-bits & 1) == 0
lqc2 vf1, 0(t4) # load matrix from offset-64 data (1, 2, 4, 4)
lqc2 vf2, 16(t4)
lqc2 vf3, 32(t4)
lqc2 vf4, 48(t4)
lqc2 vf9, 0(a0) # load matrix from accumulator
daddiu t4, t4, 64 # increment data-64 ptr
lqc2 vf10, 16(a0)
sll r0, r0, 0
lqc2 vf11, 32(a0)
lqc2 vf12, 48(a0)
vmulaw.xyzw acc, vf9, vf0 # acc = existing
vmaddx.xyzw vf9, vf1, vf13 # existing += amount * new
vmulaw.xyzw acc, vf10, vf0
vmaddx.xyzw vf10, vf2, vf13
vmulaw.xyzw acc, vf11, vf0
vmaddx.xyzw vf11, vf3, vf13
vmulaw.xyzw acc, vf12, vf0
vmaddx.xyzw vf12, vf4, vf13
sqc2 vf9, 0(a0) # store back modified matrix.
sqc2 vf10, 16(a0)
sqc2 vf11, 32(a0)
sqc2 vf12, 48(a0)
B2:
L80: # endif
andi t3, s4, 2 # check matrix bit
daddiu a0, a0, 64 # increment acc ptr to second matrix
bne t3, r0, L81
sll r0, r0, 0
B3: # if (matrix-bits & 2) == 0
# same stuff as for the first matrix.
lqc2 vf1, 0(t4)
lqc2 vf2, 16(t4)
lqc2 vf3, 32(t4)
lqc2 vf4, 48(t4)
lqc2 vf9, 0(a0)
daddiu t4, t4, 64
lqc2 vf10, 16(a0)
sll r0, r0, 0
lqc2 vf11, 32(a0)
lqc2 vf12, 48(a0)
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf1, vf13
vmulaw.xyzw acc, vf10, vf0
vmaddx.xyzw vf10, vf2, vf13
vmulaw.xyzw acc, vf11, vf0
vmaddx.xyzw vf11, vf3, vf13
vmulaw.xyzw acc, vf12, vf0
vmaddx.xyzw vf12, vf4, vf13
sqc2 vf9, 0(a0)
sqc2 vf10, 16(a0)
sqc2 vf11, 32(a0)
sqc2 vf12, 48(a0)
B4:
L81:
lw s4, -4(s5) # load control-bits-ptr[-1]
daddiu a0, a0, 64 # increment to first TQ
B5:
L82:
andi t3, s4, 15 # grab lower 4 bits of the control bits
sra s4, s4, 4 # shift in next 4
sll t3, t3, 2 # multiply by 4 (ah - this tells us which case!)
daddiu s3, s3, -1 # count down if we need to load the next u32 control bits or not.
daddu t3, t3, t2 # look up in the jump table
daddiu s2, s2, -1 # count down joints
lw t3, 0(t3) # load from jump table!
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
jr t3
sll r0, r0, 0
B6: # JUMP TABLE 7 and 15, also used to advance
L83:
beq s2, r0, L90 # if out of joints, end.
daddiu a0, a0, 48 # increment acc ptr.
B7:
bne s3, r0, L82 # see if we need to load a new control word or not.
sll r0, r0, 0
B8:
lw s4, 0(s5) # load up new control word
daddiu s5, s5, 4
beq r0, r0, L82
addiu s3, r0, 8 # back to the beginning!
B9: # JUMP TABLE 0
ld t9, 0(t4) # load one uint64 from data64
daddiu t4, t4, 8
lqc2 vf6, 16(a0) # vf6 = acc.quat
pextlh t9, t9, r0 # expand to uint32's (upper)
psraw t9, t9, 16 # treat as signed s16
qmtc2.i vf4, t9 # vf4 = new_data (sext)
lw s6, 0(t5) # load one uint32 from data32
daddiu t5, t5, 4
lh gp, 0(t6) # load one uint16 from data16
daddiu t6, t6, 2
lqc2 vf3, 0(a0) # vf3 = acc.trans
vitof15.xyzw vf4, vf4 # quat int -> float
pextlw s6, gp, s6 # s6 = [XX, XX, u16, u32]
pextlh s6, s6, r0 # s6 = [XX, u16, u32[1], u32[0]]
psraw s6, s6, 16 # sext
vmul.xyzw vf10, vf4, vf6 # elementwise quat product
qmtc2.i vf1, s6 # vf1 = the first triple
lw t8, 0(t5) # load another uint32, uint16
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
vmulaw.xyzw acc, vf10, vf0 # add up quaternions (dot product)
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vitof0.xyzw vf1, vf1 # first triple to float (trans)
lqc2 vf9, 32(a0) # acc scale
pextlw t8, fp, t8 # triple stuff
qmfc2.i t9, vf10 # quat dot to grp
pextlh t8, t8, r0 # triple stuff
psraw t8, t8, 16 # triple sext
qmtc2.i vf7, t8 # second triple
pcpyud t9, t9, r0 # check sign of quaternion
bltzl t9, L84
B10:
vsub.xyzw vf4, vf15, vf4 # flip quaternion if dot product is negative.
B11:
L84:
vitof12.xyzw vf7, vf7 # second triple (scale)
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13 # vf6 = acc.quat + amount * decompressed_quat
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13 # vf3 = acc.trans + amount * 4 * decompressed_trans
vmulaw.xyzw acc, vf9, vf0 # vf9 = acc.scale + amount * decompressed_scale
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L83
sll r0, r0, 0
B12: # JUMP TABLE 8
ld t9, 8(t4) # one ahead data64 load??
daddiu t4, t4, 8
lqc2 vf6, 16(a0) # vf6 = acc.quat
pextlh t9, t9, r0 #
psraw t9, t9, 16
qmtc2.i vf4, t9 # quat expand
ld s6, -8(t4) # now load the previous data64
daddiu t4, t4, 8
lw gp, 0(t5) # load one uint32 from data32
daddiu t5, t5, 4
lqc2 vf3, 0(a0) # vf3 = acc.trans
vitof15.xyzw vf4, vf4 # quat int->float
pcpyld s6, gp, s6 # trans combingin ints
vmul.xyzw vf10, vf4, vf6 # quat dot
qmtc2.i vf1, s6 # trans int
lw t8, 0(t5) # scale load
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
vmulaw.xyzw acc, vf10, vf0 # quat dot add
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
lqc2 vf9, 32(a0) # existing scale
pextlw t8, fp, t8
qmfc2.i t9, vf10
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
pcpyud t9, t9, r0
bltzl t9, L85
B13:
vsub.xyzw vf4, vf15, vf4
B14:
L85:
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L83
sll r0, r0, 0
B15: # JUMP 1, 9
ld t9, 0(t4) # 64 load for quat
daddiu t4, t4, 8
lqc2 vf6, 16(a0) # vf6 = acc.quat
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9 # vf4 = quat expand int
lw t8, 0(t5) # load a 32-16
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0) # vf9 = acc.scale
vitof15.xyzw vf4, vf4
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16 # scale expand
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf7, t8 # vf7 = scale
vmulaw.xyzw acc, vf10, vf0
vitof12.xyzw vf7, vf7
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L86
B16:
vsub.xyzw vf4, vf15, vf4
B17:
L86:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf6, 16(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L83
sll r0, r0, 0
B18: # JUMP 2
lw s6, 0(t5) # load 32, 16
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0) # old trans
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
qmtc2.i vf1, s6
lw t8, 0(t5) # load 32, 16
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
vitof0.xyzw vf1, vf1
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L83
sll r0, r0, 0
B19: # JUMP 10
ld s6, 0(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
pcpyld s6, gp, s6
qmtc2.i vf1, s6
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L83
sll r0, r0, 0
B20: # JUMP 3, 11
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf9, 32(a0)
beq r0, r0, L83
sll r0, r0, 0
B21: # JUMP 4
ld t9, 0(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
lw s6, 0(t5)
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0)
vitof15.xyzw vf4, vf4
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf1, s6
vmulaw.xyzw acc, vf10, vf0
vitof0.xyzw vf1, vf1
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L87
B22:
vsub.xyzw vf4, vf15, vf4
B23:
L87:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
beq r0, r0, L83
sll r0, r0, 0
B24: # JUMP 12
ld t9, 8(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
ld s6, -8(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
vitof15.xyzw vf4, vf4
pcpyld s6, gp, s6
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf1, s6
vmulaw.xyzw acc, vf10, vf0
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L88
B25:
vsub.xyzw vf4, vf15, vf4
B26:
L88:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
beq r0, r0, L83
sll r0, r0, 0
B27: # JUMP 5, 13
ld t9, 0(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
vitof15.xyzw vf4, vf4
vmul.xyzw vf10, vf4, vf6
vmulaw.xyzw acc, vf10, vf0
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L89
B28:
vsub.xyzw vf4, vf15, vf4
B29:
L89:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf6, 16(a0)
beq r0, r0, L83
sll r0, r0, 0
B30: # JUMP 6
lw s6, 0(t5)
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0)
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
qmtc2.i vf1, s6
vitof0.xyzw vf1, vf1
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
sqc2 vf3, 0(a0)
beq r0, r0, L83
sll r0, r0, 0
B31:
ld s6, 0(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
pcpyld s6, gp, s6
qmtc2.i vf1, s6
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
sqc2 vf3, 0(a0)
beq r0, r0, L83
sll r0, r0, 0
B32:
L90:
lq a0, 0(sp)
sll r0, r0, 0
jr ra
daddiu sp, sp, 16
jr ra
daddu sp, sp, r0
```
Explanation:
```python
# set up pointers
data64 = data + offset_64
data32 = data + offset_32
data16 = data + offset_16
# do matrices
if (matrix_bits & 1) == 0:
acc.matrix[0] += amount * Matrix(data64)
data64 += sizeof(Matrix)
if (matrix_bits & 2) == 0:
acc.matrix[1] += amount * Matrix(data64)
data64 += sizeof(Matrix)
# decomp helpers
def unpack_quat_s16s_from_d64():
data = S16x4(data64)
data64++
quat_ints = [data[0], data[1], data[2], data[3]]
# the control words have 4-bits per joint
for joint_idx in range(num_joints):
control = control_words.extract_4bits(joint_idx)
if control == 0:
# quat is packed as 4x s16's in data64, scale is 32768
# dot product is checked before lerping.
# trans is packed as 2x s16's in data32, 1x s16 in data16, scale is 4
# scale is packed as 2x s16's in data32, 1x s16 in data16, scale is 4096
if control == 7:
pass # nothing to do!
if control == 8:
# trans is packed as 2x s32's in data64, 1x s32 in data32, scale is 1
# quat is packed as 4x s16's in data64, scale is 32768
# dot product is checked before lerping.
# scale is packed as 2x s16's in data32, 1x s16 in data16, scale is 4096
if control == 15:
pass # nothing to do!
```
Control Modes:
0. FIXED has `trans-32-16-4`, `quat-64-32768`, `scale-32-16-4096`, FRAME has nothing
1. FIXED has `trans-0`, `quat-64-32768`, `scale-32-16-4096`, FRAME has trans (small)
2. FIXED has `trans-32-16-4`, `quat-0`, `scale-32-16-4096`, FRAME has quat
3. FIXED has `trans-0`, `quat-0`, `scale-32-16-4096`
4. FIXED has `trans-32-16-4`, `quat-64-32768`, `scale-0`
5. FIXED has `trans-0`, `quat-64-32768`, `scale-0`
6. FIXED has `trans-32-16-4`, `quat-0`, `scale-0`
7. FIXED 0
8. FIXED as `trans-64-32-1`, `quat-64-32768`, `scale-32-16-4096`, FRAME has nothing
9. FIXED has `trans-0`, `quat-64-32768`, `scale-32-16-4096`, FRAME has trans (large)
10. FIXED has `trans-64-32-1`, `quat-0`, `scale-32-16-4096`
11. FIXED has `trans-0`, `quat-0`, `scale-32-16-4096`, FRAME HAS QUAT
12. FIXED has `trans-64-32-1`, `quat-64-32768`, `scale-0`
13. FIXED has `trans-0`, `quat-64-32768`, `scale-0`
14. FIXED has `trans-64-32-1`, `quat-0`, `scale-0`
15. FIXED 0
So bits are:
control:
0. frame-trans
1. frame-quat
2. frame-scale
4. big-trans
matrix:
0. matrix 0 from frame
1. matrix 1 from frame
# `decompress-frame-data-to-accumulator`
```lisp
(deftype joint-anim-compressed-frame (structure)
((offset-64 uint32 :offset-assert 0)
(offset-32 uint32 :offset-assert 4)
(offset-16 uint32 :offset-assert 8)
(reserved uint32 :offset-assert 12)
(data vector 133 :inline :offset-assert 16) ;; guessed by decompiler
)
:method-count-assert 9
:size-assert #x860
:flag-assert #x900000860
)
```
```python
# a2 = amount (weight)
# a1 = frame-data
qmtc2.i vf13, a2 # vf13 = amount
sq a0, 0(sp) # back up accumulator
lui t2, 28672 # t2 = spad
lw t4, 0(a1) # t4 = offset64
daddiu v1, a1, 16 # v1 = data
lw t5, 4(a1) # offset32
daddu s5, t1, r0 # control bits ptr
lw t6, 8(a1) # offset16
daddu t4, t4, v1 # t4 = data64
vmulx.xyzw vf13, vf14, vf13 # usual [amt, 4*amt, 4*amt, amt] stuff
daddiu t2, t2, 1680 # t2 = jump table
lw s2, 56(t1) # s2 = num-joints
daddu t5, t5, v1 # t5 = data32
lw s4, 60(t1) # s4 = matrix-bits
daddu t6, t6, v1 # t6 = data16
addiu s3, r0, 8 # load next u32 word counter
daddiu s5, s5, 4 # control word ptr
andi t3, s4, 1 # check matrix bit
sll r0, r0, 0
beq t3, r0, L68
sll r0, r0, 0
B1: # if matrix bit is set:
lqc2 vf1, 0(t4)
lqc2 vf2, 16(t4)
lqc2 vf3, 32(t4)
lqc2 vf4, 48(t4)
lqc2 vf9, 0(a0)
daddiu t4, t4, 64
lqc2 vf10, 16(a0)
sll r0, r0, 0
lqc2 vf11, 32(a0)
lqc2 vf12, 48(a0)
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf1, vf13
vmulaw.xyzw acc, vf10, vf0
vmaddx.xyzw vf10, vf2, vf13
vmulaw.xyzw acc, vf11, vf0
vmaddx.xyzw vf11, vf3, vf13
vmulaw.xyzw acc, vf12, vf0
vmaddx.xyzw vf12, vf4, vf13
sqc2 vf9, 0(a0)
sqc2 vf10, 16(a0)
sqc2 vf11, 32(a0)
sqc2 vf12, 48(a0)
B2:
L68:
andi t3, s4, 2
daddiu a0, a0, 64
beq t3, r0, L69
sll r0, r0, 0
B3:
lqc2 vf1, 0(t4)
lqc2 vf2, 16(t4)
lqc2 vf3, 32(t4)
lqc2 vf4, 48(t4)
lqc2 vf9, 0(a0)
daddiu t4, t4, 64
lqc2 vf10, 16(a0)
sll r0, r0, 0
lqc2 vf11, 32(a0)
lqc2 vf12, 48(a0)
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf1, vf13
vmulaw.xyzw acc, vf10, vf0
vmaddx.xyzw vf10, vf2, vf13
vmulaw.xyzw acc, vf11, vf0
vmaddx.xyzw vf11, vf3, vf13
vmulaw.xyzw acc, vf12, vf0
vmaddx.xyzw vf12, vf4, vf13
sqc2 vf9, 0(a0)
sqc2 vf10, 16(a0)
sqc2 vf11, 32(a0)
sqc2 vf12, 48(a0)
B4:
L69:
lw s4, -4(s5)
daddiu a0, a0, 64
B5:
L70:
andi t3, s4, 15
sra s4, s4, 4
sll t3, t3, 2
daddiu s3, s3, -1
daddu t3, t3, t2
daddiu s2, s2, -1
lw t3, 0(t3)
sll r0, r0, 0
sll r0, r0, 0
sll r0, r0, 0
jr t3
sll r0, r0, 0
B6: # JUMP 0, 8
L71:
beq s2, r0, L78
daddiu a0, a0, 48
B7:
bne s3, r0, L70
sll r0, r0, 0
B8:
lw s4, 0(s5)
daddiu s5, s5, 4
beq r0, r0, L70
addiu s3, r0, 8
B9: # JUMP 1
lw s6, 0(t5)
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0)
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
qmtc2.i vf1, s6
vitof0.xyzw vf1, vf1
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
sqc2 vf3, 0(a0)
beq r0, r0, L71
sll r0, r0, 0
B10: # JUMP 9
ld s6, 0(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
pcpyld s6, gp, s6
qmtc2.i vf1, s6
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
sqc2 vf3, 0(a0)
beq r0, r0, L71
sll r0, r0, 0
B11:
ld t9, 0(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
vitof15.xyzw vf4, vf4
vmul.xyzw vf10, vf4, vf6
vmulaw.xyzw acc, vf10, vf0
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L72
B12:
vsub.xyzw vf4, vf15, vf4
B13:
L72:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf6, 16(a0)
beq r0, r0, L71
sll r0, r0, 0
B14:
ld t9, 0(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
lw s6, 0(t5)
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0)
vitof15.xyzw vf4, vf4
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf1, s6
vmulaw.xyzw acc, vf10, vf0
vitof0.xyzw vf1, vf1
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L73
B15:
vsub.xyzw vf4, vf15, vf4
B16:
L73:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
beq r0, r0, L71
sll r0, r0, 0
B17:
ld t9, 8(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
ld s6, -8(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
vitof15.xyzw vf4, vf4
pcpyld s6, gp, s6
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf1, s6
vmulaw.xyzw acc, vf10, vf0
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L74
B18:
vsub.xyzw vf4, vf15, vf4
B19:
L74:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
beq r0, r0, L71
sll r0, r0, 0
B20:
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf9, 32(a0)
beq r0, r0, L71
sll r0, r0, 0
B21:
lw s6, 0(t5)
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0)
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
qmtc2.i vf1, s6
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
vitof0.xyzw vf1, vf1
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L71
sll r0, r0, 0
B22:
ld s6, 0(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
pcpyld s6, gp, s6
qmtc2.i vf1, s6
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L71
sll r0, r0, 0
B23:
ld t9, 0(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
lqc2 vf9, 32(a0)
vitof15.xyzw vf4, vf4
pextlw t8, fp, t8
pextlh t8, t8, r0
psraw t8, t8, 16
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf7, t8
vmulaw.xyzw acc, vf10, vf0
vitof12.xyzw vf7, vf7
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
qmfc2.i t9, vf10
pcpyud t9, t9, r0
bltzl t9, L75
B24:
vsub.xyzw vf4, vf15, vf4
B25:
L75:
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
sqc2 vf6, 16(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L71
sll r0, r0, 0
B26:
ld t9, 0(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
lw s6, 0(t5)
daddiu t5, t5, 4
lh gp, 0(t6)
daddiu t6, t6, 2
lqc2 vf3, 0(a0)
vitof15.xyzw vf4, vf4
pextlw s6, gp, s6
pextlh s6, s6, r0
psraw s6, s6, 16
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf1, s6
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
vmulaw.xyzw acc, vf10, vf0
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
vitof0.xyzw vf1, vf1
lqc2 vf9, 32(a0)
pextlw t8, fp, t8
qmfc2.i t9, vf10
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
pcpyud t9, t9, r0
bltzl t9, L76
B27:
vsub.xyzw vf4, vf15, vf4
B28:
L76:
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
vmulaw.xyzw acc, vf3, vf0
vmaddy.xyzw vf3, vf1, vf13
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L71
sll r0, r0, 0
B29:
ld t9, 8(t4)
daddiu t4, t4, 8
lqc2 vf6, 16(a0)
pextlh t9, t9, r0
psraw t9, t9, 16
qmtc2.i vf4, t9
ld s6, -8(t4)
daddiu t4, t4, 8
lw gp, 0(t5)
daddiu t5, t5, 4
lqc2 vf3, 0(a0)
vitof15.xyzw vf4, vf4
pcpyld s6, gp, s6
vmul.xyzw vf10, vf4, vf6
qmtc2.i vf1, s6
lw t8, 0(t5)
daddiu t5, t5, 4
lh fp, 0(t6)
daddiu t6, t6, 2
vmulaw.xyzw acc, vf10, vf0
vmaddaz.xyzw acc, vf0, vf10
vmadday.xyzw acc, vf0, vf10
vmaddx.xyzw vf10, vf0, vf10
lqc2 vf9, 32(a0)
pextlw t8, fp, t8
qmfc2.i t9, vf10
pextlh t8, t8, r0
psraw t8, t8, 16
qmtc2.i vf7, t8
pcpyud t9, t9, r0
bltzl t9, L77
B30:
vsub.xyzw vf4, vf15, vf4
B31:
L77:
vitof12.xyzw vf7, vf7
vmulaw.xyzw acc, vf6, vf0
vmaddx.xyzw vf6, vf4, vf13
vmulaw.xyzw acc, vf3, vf0
vmaddx.xyzw vf3, vf1, vf13
vmulaw.xyzw acc, vf9, vf0
vmaddx.xyzw vf9, vf7, vf13
sqc2 vf3, 0(a0)
sqc2 vf6, 16(a0)
sqc2 vf9, 32(a0)
beq r0, r0, L71
sll r0, r0, 0
B32:
L78:
lq a0, 0(sp)
sll r0, r0, 0
jr ra
daddiu sp, sp, 16
jr ra
daddu sp, sp, r0
```