mirror of
https://github.com/open-goal/jak-project.git
synced 2024-10-20 11:26:18 -04:00
79d14af0b5
I finally read through all the joint code and wrote up some documentation. I think this will be really helpful when we try to understand all the functions in `process-drawable`, or if somebody ever wants to import/export animations. This switches all three games to using a new faster GOAL joint decompressor. It is on by default, but you can go back to the old version by setting `*use-new-decompressor*` to false. Also fix the log-related crash, fix the clock speed used in timer math.
1684 lines
50 KiB
Markdown
1684 lines
50 KiB
Markdown
This is an explanation of reverse engineering and porting the Jak joint decompressor from PS2 assembly to higher-level OpenGOAL code. It's the same from all three games. I made this document as an example of how to approach this type of reverse engineering.
|
|
|
|
# What is `joint.gc`?
|
|
|
|
The "joint" system is used to play back animations.
|
|
|
|
At a very high level, the process to animate a character:
|
|
- the gameplay code uses the `ja` macros to set up joint animations.
|
|
- the process-drawable system updates the animation metadata in joint-control. This is responsible for
|
|
producing an array of joint-control-channel. The channels are arranged into a blend tree, and each
|
|
channel has a frame number (possibly in between frames) and interpolation weight for the blend tree.
|
|
- This file, joint.gc looks at those animations, frame numbers, blend weights, and tree structure, then
|
|
produces the relative transform between bones, called "joint transforms"
|
|
- The process-drawable system uses the joint transforms to compute bone transforms.
|
|
- The gameplay code and collision code use the bone transforms to determine world-space positions/rotations.
|
|
- The "bones.gc" system builds rendering matrices for the foreground renderer
|
|
- The merc, mercneric, and shadow renderers consume these matrices for their skinning calculations.
|
|
|
|
The focus of this documentation is the function that computes joint transforms.
|
|
|
|
There are three:
|
|
|
|
1. `create-interpolated-joint-animation-frame`, which is the most standard one. It takes a `joint-control`, a number of joints (since low lod versions might not need them all), and produces a `joint-anim-frame` - a list of joint transforms.
|
|
2. `create-interpolated2-joint-animation-frame`, which is the same, but uses the "interp2" system described later.
|
|
3. `cspace<-matrix-no-push-joint!` which computes a single joint transform. The joint must be one of the first two, and it uses a special `no-push` mode. More on this later.
|
|
|
|
Reading the top of `joint.gc` has a lot more details, if you are curious about the details of the blending, etc. That was written before touching the actually decompressor assembly. Figuring all this out first can be really helpful.
|
|
|
|
# Background information
|
|
|
|
The end result of this routine is to compute a `joint-anim-frame`:
|
|
```lisp
|
|
(deftype joint-anim-frame (structure)
|
|
((matrices matrix 2 :inline)
|
|
(data transformq :inline :dynamic)
|
|
)
|
|
(:methods
|
|
(new (symbol type int) _type_)
|
|
)
|
|
)
|
|
```
|
|
which is all the joint transforms for a interpolated frame. Note that there is interpolation in both time (in between frames) and between different animations (blending).
|
|
|
|
This is the code that is used to generate a `joint-anim-frame`:
|
|
```lisp
|
|
(defun create-interpolated-joint-animation-frame ((dst joint-anim-frame) (num-joints int) (jc joint-control))
|
|
(flatten-joint-control-to-spr jc)
|
|
(make-joint-jump-tables)
|
|
(calc-animation-from-spr dst num-joints)
|
|
0
|
|
)
|
|
```
|
|
|
|
# Flattening the Joint Control
|
|
|
|
The `flatten-joint-control-to-spr` function is written in plain GOAL, so I've decompiled/annotated it here.
|
|
The first part of the function is iterating through `joint-control-channel`s, and computing blend weights from a blend tree. This computation acts like a stack machine. There is a `push` operation to push a new stack frame, with all channels zero except for the specified animation. There is a `stack` operation which takes the two most top things on the stack, then adds them together with the given weights (effectively `pop`ing back 1 stack frame in the end). Finally, there is a `blend` operation, which modifies the top of the animation stack to be a blend of multiple animations. I imagine this can stay almost the same in the new decompressor, other than scratchpad use.
|
|
|
|
The second part builds a list of "uploads". This is a list of data that should be uploaded to the scratchpad. Each animation has some "fixed" data, which is a chunk of common data needed to produce any frame.
|
|
|
|
There is also "frame" data, one for each frame in the animation. If we need to interpolate between two frames, we upload 2 frames worth of data. Otherwise, if we want an exact frame, we just upload the one.
|
|
|
|
```lisp
|
|
|
|
(defun flatten-joint-control-to-spr ((jc joint-control))
|
|
"Walk the blend tree and compute interpolation weights, prepare animation upload info."
|
|
(rlet ((vf1 :class vf)
|
|
(vf10 :class vf)
|
|
(vf11 :class vf)
|
|
(vf12 :class vf)
|
|
(vf13 :class vf)
|
|
(vf14 :class vf)
|
|
(vf2 :class vf)
|
|
(vf3 :class vf)
|
|
(vf4 :class vf)
|
|
(vf5 :class vf)
|
|
(vf6 :class vf)
|
|
(vf7 :class vf)
|
|
(vf8 :class vf)
|
|
(vf9 :class vf)
|
|
)
|
|
;; We assume a maximum of 4 * 6 = 24 channels.
|
|
(let ((chan-count (+ (-> jc active-channels) (-> jc float-channels))))
|
|
(let ((one 1.0) ;; constant
|
|
(chan-float-offset 0) ;; which channel's weight to adjust.
|
|
(chan-vector-ptr (the-as (inline-array vector) #x70000960)) ;; stack pointer
|
|
(interp2-selected-idx (-> jc active-frame-interp)) ;; interp2 0 or 1 selector.
|
|
)
|
|
;; loop over channels
|
|
(dotimes (chan-idx (the-as int chan-count))
|
|
(let ((chan (-> jc channel chan-idx)))
|
|
(case (-> chan command)
|
|
(((joint-control-command push))
|
|
;; push a new stack frame with this anim set to 1.
|
|
(let ((flt1 (the-as (pointer float) (+ (the-as int chan-vector-ptr) chan-float-offset))))
|
|
;; initialize all channels in this frame to 0
|
|
(set! (-> chan-vector-ptr 0 quad) (the-as uint128 0))
|
|
(set! (-> chan-vector-ptr 1 quad) (the-as uint128 0))
|
|
(set! (-> chan-vector-ptr 2 quad) (the-as uint128 0))
|
|
(set! (-> chan-vector-ptr 3 quad) (the-as uint128 0))
|
|
(set! (-> chan-vector-ptr 4 quad) (the-as uint128 0))
|
|
(set! (-> chan-vector-ptr 5 quad) (the-as uint128 0))
|
|
;; then, set the weight of this animation to 1.
|
|
(set! (-> flt1 0) one)
|
|
)
|
|
;; advance stack pointer.
|
|
(set! chan-vector-ptr (the-as (inline-array vector) (-> chan-vector-ptr 6)))
|
|
)
|
|
(((joint-control-command blend) (joint-control-command push1) (joint-control-command float))
|
|
;; determine the blend factor for this animation.
|
|
(let ((interp2-selected-weight1 (-> chan frame-interp interp2-selected-idx)))
|
|
;; one - blend_factor gives us the blend weight for the previous thing in the frame
|
|
(let ((a3-5 (- one interp2-selected-weight1)))
|
|
(.mov vf1 a3-5)
|
|
)
|
|
;; modify the previous thing in the stack to reduce weight:
|
|
(let ((prev-chan-ptr (the-as (inline-array vector) (-> chan-vector-ptr -6))))
|
|
(.lvf vf2 (&-> prev-chan-ptr 0 quad))
|
|
(let ((a3-6 (&+ (the-as pointer prev-chan-ptr) chan-float-offset)))
|
|
(.lvf vf3 (&-> prev-chan-ptr 1 quad))
|
|
(.lvf vf4 (&-> prev-chan-ptr 2 quad))
|
|
(.lvf vf5 (&-> prev-chan-ptr 3 quad))
|
|
(.lvf vf6 (&-> prev-chan-ptr 4 quad))
|
|
(.lvf vf7 (&-> prev-chan-ptr 5 quad))
|
|
(.mul.x.vf vf2 vf2 vf1) ;; multiply all weights by (1 - push_blend)
|
|
(.mul.x.vf vf3 vf3 vf1)
|
|
(.mul.x.vf vf4 vf4 vf1)
|
|
(.mul.x.vf vf5 vf5 vf1)
|
|
(.mul.x.vf vf6 vf6 vf1)
|
|
(.mul.x.vf vf7 vf7 vf1)
|
|
(.svf (&-> prev-chan-ptr 0 quad) vf2)
|
|
(.svf (&-> prev-chan-ptr 1 quad) vf3)
|
|
(.svf (&-> prev-chan-ptr 2 quad) vf4)
|
|
(.svf (&-> prev-chan-ptr 3 quad) vf5)
|
|
(.svf (&-> prev-chan-ptr 4 quad) vf6)
|
|
(.svf (&-> prev-chan-ptr 5 quad) vf7)
|
|
;; but, modify our channel to add in the push_blend
|
|
(+! (-> (the-as (pointer float) a3-6) 0) interp2-selected-weight1)
|
|
)
|
|
(set! chan-vector-ptr (the-as (inline-array vector) (-> prev-chan-ptr 6)))
|
|
)
|
|
)
|
|
)
|
|
(((joint-control-command stack))
|
|
;; add together the last two stack frames, using the given weight.
|
|
(let* ((interp2-selected-weight2 (-> chan frame-interp interp2-selected-idx))
|
|
(one-minus-interp2 (- one interp2-selected-weight2))
|
|
;; back up 2 stack frames, to add them.
|
|
(chans-to-stack (the-as (inline-array vector) (-> chan-vector-ptr -12)))
|
|
)
|
|
(let ((a3-8 interp2-selected-weight2))
|
|
(.mov vf1 a3-8)
|
|
)
|
|
(let ((a3-9 one-minus-interp2))
|
|
(.mov vf2 a3-9)
|
|
)
|
|
;; load first stack frame
|
|
(.lvf vf3 (&-> chans-to-stack 0 quad))
|
|
(.lvf vf4 (&-> chans-to-stack 1 quad))
|
|
(.lvf vf5 (&-> chans-to-stack 2 quad))
|
|
(.lvf vf6 (&-> chans-to-stack 3 quad))
|
|
(.lvf vf7 (&-> chans-to-stack 4 quad))
|
|
(.lvf vf8 (&-> chans-to-stack 5 quad))
|
|
;; multiply by blend weight
|
|
(.mul.x.vf vf3 vf3 vf2)
|
|
(.mul.x.vf vf4 vf4 vf2)
|
|
(.mul.x.vf vf5 vf5 vf2)
|
|
(.mul.x.vf vf6 vf6 vf2)
|
|
(.mul.x.vf vf7 vf7 vf2)
|
|
(.mul.x.vf vf8 vf8 vf2)
|
|
;; load second stack frame
|
|
(.lvf vf9 (&-> chans-to-stack 6 quad))
|
|
(.lvf vf10 (&-> chans-to-stack 7 quad))
|
|
(.lvf vf11 (&-> chans-to-stack 8 quad))
|
|
(.lvf vf12 (&-> chans-to-stack 9 quad))
|
|
(.lvf vf13 (&-> chans-to-stack 10 quad))
|
|
(.lvf vf14 (&-> chans-to-stack 11 quad))
|
|
;; multiply by blend weight
|
|
(.mul.x.vf vf9 vf9 vf1)
|
|
(.mul.x.vf vf10 vf10 vf1)
|
|
(.mul.x.vf vf11 vf11 vf1)
|
|
(.mul.x.vf vf12 vf12 vf1)
|
|
(.mul.x.vf vf13 vf13 vf1)
|
|
(.mul.x.vf vf14 vf14 vf1)
|
|
;; the add!
|
|
(.add.vf vf3 vf3 vf9)
|
|
(.add.vf vf4 vf4 vf10)
|
|
(.add.vf vf5 vf5 vf11)
|
|
(.add.vf vf6 vf6 vf12)
|
|
(.add.vf vf7 vf7 vf13)
|
|
(.add.vf vf8 vf8 vf14)
|
|
;; overwrite the first
|
|
(.svf (&-> chans-to-stack 0 quad) vf3)
|
|
(.svf (&-> chans-to-stack 1 quad) vf4)
|
|
(.svf (&-> chans-to-stack 2 quad) vf5)
|
|
(.svf (&-> chans-to-stack 3 quad) vf6)
|
|
(.svf (&-> chans-to-stack 4 quad) vf7)
|
|
(.svf (&-> chans-to-stack 5 quad) vf8)
|
|
;; this ends up moving the stack pointer back 1 stack frame (went back 2, then fwd 1)
|
|
(set! chan-vector-ptr (the-as (inline-array vector) (&+ (the-as pointer chans-to-stack) 96)))
|
|
)
|
|
)
|
|
)
|
|
)
|
|
;; advance channel
|
|
(+! chan-float-offset 4)
|
|
)
|
|
)
|
|
|
|
;; now we have figured out all the weights for each channel - we need to figure out which animations need decompressing.
|
|
|
|
(let ((upload-count 0))
|
|
(dotimes (upload-chan-idx (the-as int chan-count))
|
|
;; only upload if the weight is nonzero.
|
|
(when (< 0.001 (-> (the-as terrain-context #x70000000) work foreground joint-work flatten-array upload-chan-idx))
|
|
|
|
;; determine integer frame we need
|
|
(let* ((upload-chan (-> jc channel upload-chan-idx))
|
|
(anim (-> upload-chan frame-group frames))
|
|
(frame-num (-> upload-chan frame-num))
|
|
(int-frame-num (the int frame-num))
|
|
(frame-frac (- frame-num (the float int-frame-num)))
|
|
)
|
|
(let ((last-frame (+ (-> anim num-frames) -1)))
|
|
(if (not (-> upload-chan frame-group))
|
|
(format 0 "Channel ~D skel ~A frame-group is #f!!!~%" upload-chan-idx jc)
|
|
)
|
|
;; if we're past the end, clamp
|
|
(when (>= int-frame-num (the-as int last-frame))
|
|
(set! frame-frac 0.0)
|
|
(set! int-frame-num (the-as int last-frame))
|
|
)
|
|
)
|
|
;; set up the upload:
|
|
(let ((upload (-> (the-as terrain-context #x70000000) work foreground joint-work uploads upload-count)))
|
|
(set! (-> upload fixed) (-> anim fixed))
|
|
(set! (-> upload fixed-qwc) (the-as int (-> anim fixed-qwc)))
|
|
(set! (-> upload frame) (-> anim data int-frame-num))
|
|
;; if we are a fractional frame, upload 2 frames so we can interpolate from the next.
|
|
(set! (-> upload frame-qwc) (the-as int (if (= frame-frac 0.0)
|
|
(-> anim frame-qwc)
|
|
(* (-> anim frame-qwc) 2)
|
|
)
|
|
)
|
|
)
|
|
(set! (-> upload amount)
|
|
(-> (the-as terrain-context #x70000000) work foreground joint-work flatten-array upload-chan-idx)
|
|
)
|
|
(set! (-> upload interp) frame-frac)
|
|
)
|
|
)
|
|
(+! upload-count 1)
|
|
)
|
|
)
|
|
(set! (-> (the-as terrain-context #x70000000) work foreground joint-work num-uploads) upload-count)
|
|
)
|
|
;; record amounts in the channel so we can print it for debug.
|
|
(dotimes (v1-26 (the-as int chan-count))
|
|
(set! (-> jc channel v1-26 inspector-amount)
|
|
(the-as
|
|
uint
|
|
(the int (* 255.0 (-> (the-as terrain-context #x70000000) work foreground joint-work flatten-array v1-26)))
|
|
)
|
|
)
|
|
)
|
|
)
|
|
0
|
|
)
|
|
)
|
|
|
|
```
|
|
|
|
|
|
# Make Joint Jump Tables
|
|
|
|
This function is building a number of jump tables like this:
|
|
```
|
|
lw v1, decompress-fixed-data-to-accumulator(s7) ;; get address of function
|
|
addiu a0, r0, 301 ;; offset into instruction (in instructions)
|
|
dsll a0, a0, 2 ;; convert offset to bytes
|
|
daddu v1, v1, a0 ;; get pointer to inside of function
|
|
lui a0, 28672 ;; store into the jump table (in the scratchpad)
|
|
sw v1, 1632(a0)
|
|
```
|
|
This is likely some strange macro, as I think the GOAL compiler would normally have constant propagated the `addiu`/`dsll` pair.
|
|
|
|
We can see the layout of the scratchpad here:
|
|
```lisp
|
|
(deftype joint-work (structure)
|
|
;; used in cspace<-matrix-no-push-joint!, which is not asm
|
|
((temp-mtx matrix :inline :offset-assert 0)
|
|
(joint-stack matrix-stack :inline :offset-assert 64)
|
|
|
|
;; the jump tables. seems like there's 3 different modes: fix (fixed), frm (frame), pair (2x frame)
|
|
(fix-jmp-table (function none) 16 :offset-assert 1616) ;; guessed by decompiler
|
|
(frm-jmp-table (function none) 16 :offset-assert 1680) ;; guessed by decompiler
|
|
(pair-jmp-table (function none) 16 :offset-assert 1744) ;; guessed by decompiler
|
|
|
|
;; upload records generated in the `flatten-joint-control-to-spr` function above
|
|
(uploads channel-upload-info 24 :inline :offset-assert 1808) ;; guessed by decompiler
|
|
(num-uploads int32 :offset-assert 2384)
|
|
|
|
;; "accumulators", which I believe will eventually contain the full transforms
|
|
(mtx-acc matrix 2 :inline :offset-assert 2400) ;; guessed by decompiler
|
|
(tq-acc transformq 100 :inline :offset-assert 2528) ;; guessed by decompiler
|
|
|
|
;; ?? likely destination for uploads
|
|
(jacp-hdr joint-anim-compressed-hdr :inline :offset-assert 7328)
|
|
(fixed-data joint-anim-compressed-fixed :inline :offset-assert 7392)
|
|
(frame-data joint-anim-compressed-frame 2 :inline :offset-assert 9600) ;; guessed by decompiler
|
|
|
|
;; used during the flatten-joint-control-to-spr function
|
|
(flatten-array float 576 :offset 2400) ;; guessed by decompiler
|
|
(flattened vector 24 :inline :offset 2400) ;; guessed by decompiler
|
|
)
|
|
```
|
|
|
|
and the upload type:
|
|
```
|
|
(deftype channel-upload-info (structure)
|
|
"Information about an upload of animation data to a single joint channel."
|
|
((fixed joint-anim-compressed-fixed :offset-assert 0)
|
|
(fixed-qwc int32 :offset-assert 4)
|
|
(frame joint-anim-compressed-frame :offset-assert 8)
|
|
(frame-qwc int32 :offset-assert 12)
|
|
(amount float :offset-assert 16)
|
|
(interp float :offset-assert 20)
|
|
)
|
|
```
|
|
|
|
I did not learn much from staring at this yet, so I am going to move on to `calc-animation-from-spr`, and revisit once I find the actual jumps.
|
|
|
|
# `calc-animation-from-spr`
|
|
|
|
|
|
This function is entirely assembly. I approach these by adding a comment to each line:
|
|
```python
|
|
# a0 = dst-joint-anim-frame
|
|
# a1 = num-joints
|
|
or v1, a1, r0 # v1 = num-joint
|
|
|
|
# backup registers
|
|
daddiu sp, sp, -192
|
|
sq s0, 0(sp)
|
|
sq s1, 16(sp)
|
|
sq s2, 32(sp)
|
|
sq s3, 48(sp)
|
|
sq s4, 64(sp)
|
|
sq s5, 80(sp)
|
|
sq s6, 96(sp)
|
|
sq t8, 112(sp)
|
|
sq t9, 128(sp)
|
|
sq gp, 144(sp)
|
|
sq fp, 160(sp)
|
|
sq ra, 176(sp)
|
|
|
|
daddiu sp, sp, -16 # allocate another 16 bytes on the stack
|
|
|
|
qmtc2.i vf15, r0 # vf15 = 0
|
|
sw a1, 0(sp) # 0(sp) = num-joints
|
|
lui v1, 28672
|
|
lw s1, 2384(v1) # s1 = num-uploads
|
|
daddiu t7, v1, 1808 # t7 = uploads
|
|
lui s0, 4096 # generating some constant here, likely DMA register
|
|
daddiu t1, v1, 7328 # t1 = jacp-hdr
|
|
beq s1, r0, L53 # early return if we don't have any uploads
|
|
ori s0, s0, 54272 # more DMA register constant stuff
|
|
|
|
B1:
|
|
lw t2, 0(t7) # t2 = fixed
|
|
addiu t3, r0, 7392 # t3 = fixed-data
|
|
lw t4, 4(t7) # t4 = fixed-qwc
|
|
addiu v1, r0, 256 # DMA constant?
|
|
sw t2, 16(s0) # DMA setup
|
|
vaddw.xyzw vf14, vf15, vf0 # vf14 = 1, 1, 1, 1
|
|
sw t3, 128(s0) # DMA setup
|
|
sll r0, r0, 0 #
|
|
sw t4, 32(s0) # DMA setup QWC
|
|
sync.l
|
|
sw v1, 0(s0) # DMA GO!!
|
|
sync.l
|
|
lw t9, clear-frame-accumulator(s7) # t9 = this function
|
|
vadd.yz vf14, vf14, vf14 # vf14 = [1, 2, 2, 1]
|
|
lw s2, 0(sp) # s2 = num-joint
|
|
sll r0, r0, 0
|
|
jalr ra, t9 # call!
|
|
vadd.yz vf14, vf14, vf14 # vf14 = [1, 4, 4, 1]
|
|
|
|
B2:
|
|
L49:
|
|
lw v1, 0(s0) # wait on DMA
|
|
sll r0, r0, 0
|
|
andi v1, v1, 256
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
bne v1, r0, L49
|
|
sll r0, r0, 0
|
|
|
|
B3:
|
|
lw t2, 8(t7) # t2 = frame
|
|
addiu t3, r0, 9600 # t3 = frame-data
|
|
lw t4, 12(t7) # t4 = frame-qwc
|
|
addiu v1, r0, 256 # dma constant
|
|
sw t2, 16(s0) # dma (src addr)
|
|
sll r0, r0, 0
|
|
sw t3, 128(s0) # dma (dst addr)
|
|
sll r0, r0, 0
|
|
sw t4, 32(s0) # dma (qwc)
|
|
sync.l
|
|
sw v1, 0(s0) # dma (start)
|
|
sync.l
|
|
lw a2, 16(t7)
|
|
lui a1, 28672
|
|
lw t9, decompress-fixed-data-to-accumulator(s7)
|
|
daddiu a1, a1, 7392 # a1 = fixed-data
|
|
jalr ra, t9
|
|
daddiu s1, s1, -1 # decrement upload counter
|
|
|
|
B4:
|
|
L50:
|
|
lw v1, 0(s0) # wait on frame data dma
|
|
sll r0, r0, 0
|
|
andi v1, v1, 256
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
bne v1, r0, L50
|
|
sll r0, r0, 0
|
|
|
|
B5:
|
|
beq s1, r0, L51 # jump over next dma start if there's nothing next
|
|
sll r0, r0, 0
|
|
|
|
B6:
|
|
lw t2, 24(t7) # dma the next upload
|
|
addiu t3, r0, 7392
|
|
lw t4, 28(t7)
|
|
addiu v1, r0, 256
|
|
sw t2, 16(s0)
|
|
sll r0, r0, 0
|
|
sw t3, 128(s0)
|
|
sll r0, r0, 0
|
|
sw t4, 32(s0)
|
|
sync.l
|
|
sw v1, 0(s0)
|
|
sync.l
|
|
B7:
|
|
L51:
|
|
lw t0, 20(t7) # t0 = interp (between consecutive frames)
|
|
lui a1, 28672
|
|
lw a2, 16(t7) # a2 = amount (weight)
|
|
daddiu a1, a1, 9600 # a1 = frame-data
|
|
beq t0, r0, L52 # if interp is 0, take the non-pair case, just one frame
|
|
sll r0, r0, 0
|
|
|
|
B8:
|
|
lw a3, 12(t7) # a3 = frame-qwc
|
|
sll r0, r0, 0
|
|
lw t9, decompress-frame-data-pair-to-accumulator(s7)
|
|
sll r0, r0, 0
|
|
jalr ra, t9
|
|
sll a3, a3, 3
|
|
|
|
bne s1, r0, L49
|
|
daddiu t7, t7, 24
|
|
|
|
B9:
|
|
lw t9, normalize-frame-quaternions(s7)
|
|
sll r0, r0, 0
|
|
lw s2, 0(sp)
|
|
sll r0, r0, 0
|
|
jalr ra, t9
|
|
sll r0, r0, 0
|
|
|
|
daddiu sp, sp, 16
|
|
sll r0, r0, 0
|
|
lq s0, 0(sp)
|
|
lq s1, 16(sp)
|
|
lq s2, 32(sp)
|
|
lq s3, 48(sp)
|
|
lq s4, 64(sp)
|
|
lq s5, 80(sp)
|
|
lq s6, 96(sp)
|
|
lq t8, 112(sp)
|
|
lq t9, 128(sp)
|
|
lq gp, 144(sp)
|
|
lq ra, 176(sp)
|
|
lq fp, 160(sp)
|
|
jr ra
|
|
daddiu sp, sp, 192
|
|
|
|
B10:
|
|
L52:
|
|
lw t9, decompress-frame-data-to-accumulator(s7)
|
|
sll r0, r0, 0
|
|
jalr ra, t9
|
|
sll r0, r0, 0
|
|
|
|
bne s1, r0, L49
|
|
daddiu t7, t7, 24
|
|
|
|
B11:
|
|
lw t9, normalize-frame-quaternions(s7)
|
|
sll r0, r0, 0
|
|
lw s2, 0(sp)
|
|
sll r0, r0, 0
|
|
jalr ra, t9
|
|
sll r0, r0, 0
|
|
|
|
B12:
|
|
L53:
|
|
daddiu sp, sp, 16
|
|
sll r0, r0, 0
|
|
lq s0, 0(sp)
|
|
lq s1, 16(sp)
|
|
lq s2, 32(sp)
|
|
lq s3, 48(sp)
|
|
lq s4, 64(sp)
|
|
lq s5, 80(sp)
|
|
lq s6, 96(sp)
|
|
lq t8, 112(sp)
|
|
lq t9, 128(sp)
|
|
lq gp, 144(sp)
|
|
lq ra, 176(sp)
|
|
lq fp, 160(sp)
|
|
jr ra
|
|
daddiu sp, sp, 192
|
|
|
|
jr ra
|
|
daddu sp, sp, r0
|
|
```
|
|
|
|
The basic operation is:
|
|
- call `clear-frame-accumulator`, to zero out the tq and matrix accumulators
|
|
- for each upload
|
|
- DMA the fixed data, then `decompress-fixed-data-to-accumulator`
|
|
- DMA the fixed data
|
|
- Call `decompress-frame-data-pair-to-accumulator` or `decompress-frame-data-to-accumulator`. Use the `pair` one if interp is nonzero, meaning we need to combine two consecutive frames
|
|
- Call `normalize-frame-quaternions`
|
|
|
|
|
|
# `clear-frame-accumulator`
|
|
Just sets the accumulator to 0. The first two joints are matrices and zeroed entirely. The remaining ones are `transformq` (position, quaternion, scale). The pos/quat are zeroed, but not the scale.
|
|
|
|
# `normalize-frame-quaternion`
|
|
This function simply sets `trans.w = 1`, `scale.w = 1`, and normalizes the quaternions.
|
|
The quaternions may be slightly off due to the blending or decompression.
|
|
|
|
```python
|
|
# s2 = num-joints
|
|
L91:
|
|
daddiu sp, sp, -16
|
|
B0:
|
|
daddiu s2, s2, -2 # subtract off first two matrix joints
|
|
sw a0, 0(sp)
|
|
daddiu a0, a0, 128 # seek past matrices in the accumulator
|
|
B1:
|
|
L92: # per transformq loop
|
|
lqc2 vf4, 16(a0) # vf4 = quat
|
|
lqc2 vf1, 0(a0) # vf1 = trans
|
|
lqc2 vf7, 32(a0) # vf7 = scale
|
|
vmul.xyzw vf10, vf4, vf4 # vf10 = x^2, y^2, z^2, w^2
|
|
vmove.w vf1, vf0 # vf1.w = 1
|
|
vmove.w vf7, vf0 # vf7.w = 1
|
|
vmulaw.xyzw acc, vf0, vf10 # sum up x^2, y^2, z^2, w^2
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
sqc2 vf1, 0(a0) # store fixed trans
|
|
sqc2 vf7, 32(a0) # store fixed scale
|
|
daddiu a0, a0, 48 # advance tq ptr
|
|
vrsqrt Q, vf0.w, vf10.w # get length
|
|
vwaitq
|
|
vmulq.xyzw vf4, vf4, Q # multiply by inverse length to normalize
|
|
daddiu s2, s2, -1 # dec count
|
|
bne s2, r0, L92
|
|
sqc2 vf4, -32(a0) # store fixed quat.
|
|
|
|
B2:
|
|
lw a0, 0(sp)
|
|
sll r0, r0, 0
|
|
jr ra
|
|
daddiu sp, sp, 16
|
|
|
|
jr ra
|
|
daddu sp, sp, r0
|
|
```
|
|
|
|
# `decompress-fixed-data-to-accumulator`
|
|
|
|
Useful types
|
|
```lisp
|
|
(deftype joint-anim-compressed-hdr (structure)
|
|
"Header for the compressed joint animation format."
|
|
((control-bits uint32 14 :offset-assert 0) ;; guessed by decompiler
|
|
(num-joints uint32 :offset-assert 56)
|
|
(matrix-bits uint32 :offset-assert 60)
|
|
)
|
|
:method-count-assert 9
|
|
:size-assert #x40
|
|
:flag-assert #x900000040
|
|
)
|
|
|
|
(deftype joint-anim-compressed-fixed (structure)
|
|
((hdr joint-anim-compressed-hdr :inline :offset-assert 0)
|
|
(offset-64 uint32 :offset-assert 64)
|
|
(offset-32 uint32 :offset-assert 68)
|
|
(offset-16 uint32 :offset-assert 72)
|
|
(reserved uint32 :offset-assert 76)
|
|
(data vector 133 :inline :offset-assert 80) ;; guessed by decompiler
|
|
)
|
|
```
|
|
|
|
```python
|
|
# a0 = accumulator
|
|
# a1 = fixed data (joint-anim-compressed-fixed)
|
|
# a2 = amount
|
|
# t1 = jacp-header (unset when we get here)
|
|
# s2 = num-joints
|
|
# vf14 = [1, 4, 4, 1]
|
|
|
|
L79:
|
|
# copy the header to the jacp-header (i think, so we can continue using in frame functions
|
|
# even when dma'ing the next upload's fixed to here)
|
|
lq t4, 0(a1) # t4 = fixed.hdr.quad[0]
|
|
daddiu sp, sp, -16
|
|
lq t5, 16(a1) # t4 = fixed.hdr.quad[1]
|
|
sll r0, r0, 0
|
|
sq t4, 0(t1) # move to jacp header
|
|
sll r0, r0, 0
|
|
sq t5, 16(t1) # move to jacp header
|
|
sll r0, r0, 0
|
|
lq t4, 32(a1) # quad2
|
|
sll r0, r0, 0
|
|
lq t5, 48(a1) # quad3
|
|
sll r0, r0, 0
|
|
sq t4, 32(t1)
|
|
sll r0, r0, 0
|
|
sq t5, 48(t1)
|
|
|
|
sll r0, r0, 0
|
|
sq a0, 0(sp) # backup acc pointer
|
|
sll r0, r0, 0
|
|
qmtc2.i vf13, a2 # vf13.x = amount
|
|
lui t2, 28672 # scratchpad addr
|
|
lw t4, 64(a1) # t4 = fixed.offset-64
|
|
daddiu v1, a1, 80 # v1 = fixed.data
|
|
lw t5, 68(a1) # t4 = fixed.offset-32
|
|
daddu s5, t1, r0 # s5 = control-bits-ptr
|
|
lw t6, 72(a1) # t6 = fixed.offset-16
|
|
daddu t4, t4, v1 # t4 = fixed.data + fixed.offset-64
|
|
vmulx.xyzw vf13, vf14, vf13 # vf13 = [amt, 4*amt, 4*amt, amt]
|
|
daddiu t2, t2, 1616 # t2 = fix-jmp-table
|
|
lw s2, 56(t1) # hdr.num_joints = num-joints
|
|
daddu t5, t5, v1 # t5 = fixed.data + fixed.offset-32
|
|
lw s4, 60(t1) # s4 = matrix-bits
|
|
daddu t6, t6, v1 # t6 = fixed.data + fixed.offset-16
|
|
addiu s3, r0, 8 # s3 = 8
|
|
daddiu s5, s5, 4 # s5 (control-bits-ptr) += 4 bytes
|
|
andi t3, s4, 1 # t3 = matrix-bits & 1
|
|
sll r0, r0, 0
|
|
bne t3, r0, L80
|
|
sll r0, r0, 0
|
|
|
|
B1: # if (matrix-bits & 1) == 0
|
|
lqc2 vf1, 0(t4) # load matrix from offset-64 data (1, 2, 4, 4)
|
|
lqc2 vf2, 16(t4)
|
|
lqc2 vf3, 32(t4)
|
|
lqc2 vf4, 48(t4)
|
|
lqc2 vf9, 0(a0) # load matrix from accumulator
|
|
daddiu t4, t4, 64 # increment data-64 ptr
|
|
lqc2 vf10, 16(a0)
|
|
sll r0, r0, 0
|
|
lqc2 vf11, 32(a0)
|
|
lqc2 vf12, 48(a0)
|
|
vmulaw.xyzw acc, vf9, vf0 # acc = existing
|
|
vmaddx.xyzw vf9, vf1, vf13 # existing += amount * new
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddx.xyzw vf10, vf2, vf13
|
|
vmulaw.xyzw acc, vf11, vf0
|
|
vmaddx.xyzw vf11, vf3, vf13
|
|
vmulaw.xyzw acc, vf12, vf0
|
|
vmaddx.xyzw vf12, vf4, vf13
|
|
sqc2 vf9, 0(a0) # store back modified matrix.
|
|
sqc2 vf10, 16(a0)
|
|
sqc2 vf11, 32(a0)
|
|
sqc2 vf12, 48(a0)
|
|
B2:
|
|
L80: # endif
|
|
andi t3, s4, 2 # check matrix bit
|
|
daddiu a0, a0, 64 # increment acc ptr to second matrix
|
|
bne t3, r0, L81
|
|
sll r0, r0, 0
|
|
|
|
B3: # if (matrix-bits & 2) == 0
|
|
# same stuff as for the first matrix.
|
|
lqc2 vf1, 0(t4)
|
|
lqc2 vf2, 16(t4)
|
|
lqc2 vf3, 32(t4)
|
|
lqc2 vf4, 48(t4)
|
|
lqc2 vf9, 0(a0)
|
|
daddiu t4, t4, 64
|
|
lqc2 vf10, 16(a0)
|
|
sll r0, r0, 0
|
|
lqc2 vf11, 32(a0)
|
|
lqc2 vf12, 48(a0)
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf1, vf13
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddx.xyzw vf10, vf2, vf13
|
|
vmulaw.xyzw acc, vf11, vf0
|
|
vmaddx.xyzw vf11, vf3, vf13
|
|
vmulaw.xyzw acc, vf12, vf0
|
|
vmaddx.xyzw vf12, vf4, vf13
|
|
sqc2 vf9, 0(a0)
|
|
sqc2 vf10, 16(a0)
|
|
sqc2 vf11, 32(a0)
|
|
sqc2 vf12, 48(a0)
|
|
B4:
|
|
L81:
|
|
lw s4, -4(s5) # load control-bits-ptr[-1]
|
|
daddiu a0, a0, 64 # increment to first TQ
|
|
B5:
|
|
L82:
|
|
andi t3, s4, 15 # grab lower 4 bits of the control bits
|
|
sra s4, s4, 4 # shift in next 4
|
|
sll t3, t3, 2 # multiply by 4 (ah - this tells us which case!)
|
|
daddiu s3, s3, -1 # count down if we need to load the next u32 control bits or not.
|
|
daddu t3, t3, t2 # look up in the jump table
|
|
daddiu s2, s2, -1 # count down joints
|
|
lw t3, 0(t3) # load from jump table!
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
jr t3
|
|
sll r0, r0, 0
|
|
|
|
B6: # JUMP TABLE 7 and 15, also used to advance
|
|
L83:
|
|
beq s2, r0, L90 # if out of joints, end.
|
|
daddiu a0, a0, 48 # increment acc ptr.
|
|
|
|
B7:
|
|
bne s3, r0, L82 # see if we need to load a new control word or not.
|
|
sll r0, r0, 0
|
|
|
|
B8:
|
|
lw s4, 0(s5) # load up new control word
|
|
daddiu s5, s5, 4
|
|
beq r0, r0, L82
|
|
addiu s3, r0, 8 # back to the beginning!
|
|
|
|
B9: # JUMP TABLE 0
|
|
ld t9, 0(t4) # load one uint64 from data64
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0) # vf6 = acc.quat
|
|
pextlh t9, t9, r0 # expand to uint32's (upper)
|
|
psraw t9, t9, 16 # treat as signed s16
|
|
qmtc2.i vf4, t9 # vf4 = new_data (sext)
|
|
lw s6, 0(t5) # load one uint32 from data32
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6) # load one uint16 from data16
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0) # vf3 = acc.trans
|
|
vitof15.xyzw vf4, vf4 # quat int -> float
|
|
pextlw s6, gp, s6 # s6 = [XX, XX, u16, u32]
|
|
pextlh s6, s6, r0 # s6 = [XX, u16, u32[1], u32[0]]
|
|
psraw s6, s6, 16 # sext
|
|
vmul.xyzw vf10, vf4, vf6 # elementwise quat product
|
|
qmtc2.i vf1, s6 # vf1 = the first triple
|
|
lw t8, 0(t5) # load another uint32, uint16
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
vmulaw.xyzw acc, vf10, vf0 # add up quaternions (dot product)
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
|
|
vitof0.xyzw vf1, vf1 # first triple to float (trans)
|
|
lqc2 vf9, 32(a0) # acc scale
|
|
pextlw t8, fp, t8 # triple stuff
|
|
qmfc2.i t9, vf10 # quat dot to grp
|
|
pextlh t8, t8, r0 # triple stuff
|
|
psraw t8, t8, 16 # triple sext
|
|
qmtc2.i vf7, t8 # second triple
|
|
pcpyud t9, t9, r0 # check sign of quaternion
|
|
bltzl t9, L84
|
|
B10:
|
|
vsub.xyzw vf4, vf15, vf4 # flip quaternion if dot product is negative.
|
|
|
|
B11:
|
|
L84:
|
|
vitof12.xyzw vf7, vf7 # second triple (scale)
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13 # vf6 = acc.quat + amount * decompressed_quat
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13 # vf3 = acc.trans + amount * 4 * decompressed_trans
|
|
vmulaw.xyzw acc, vf9, vf0 # vf9 = acc.scale + amount * decompressed_scale
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B12: # JUMP TABLE 8
|
|
ld t9, 8(t4) # one ahead data64 load??
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0) # vf6 = acc.quat
|
|
pextlh t9, t9, r0 #
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9 # quat expand
|
|
ld s6, -8(t4) # now load the previous data64
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5) # load one uint32 from data32
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0) # vf3 = acc.trans
|
|
vitof15.xyzw vf4, vf4 # quat int->float
|
|
pcpyld s6, gp, s6 # trans combingin ints
|
|
vmul.xyzw vf10, vf4, vf6 # quat dot
|
|
qmtc2.i vf1, s6 # trans int
|
|
lw t8, 0(t5) # scale load
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
vmulaw.xyzw acc, vf10, vf0 # quat dot add
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
lqc2 vf9, 32(a0) # existing scale
|
|
pextlw t8, fp, t8
|
|
qmfc2.i t9, vf10
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L85
|
|
B13:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B14:
|
|
L85:
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B15: # JUMP 1, 9
|
|
ld t9, 0(t4) # 64 load for quat
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0) # vf6 = acc.quat
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9 # vf4 = quat expand int
|
|
lw t8, 0(t5) # load a 32-16
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0) # vf9 = acc.scale
|
|
vitof15.xyzw vf4, vf4
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16 # scale expand
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf7, t8 # vf7 = scale
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vitof12.xyzw vf7, vf7
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L86
|
|
B16:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B17:
|
|
L86:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf6, 16(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B18: # JUMP 2
|
|
lw s6, 0(t5) # load 32, 16
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0) # old trans
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
qmtc2.i vf1, s6
|
|
lw t8, 0(t5) # load 32, 16
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
vitof0.xyzw vf1, vf1
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B19: # JUMP 10
|
|
ld s6, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
pcpyld s6, gp, s6
|
|
qmtc2.i vf1, s6
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B20: # JUMP 3, 11
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B21: # JUMP 4
|
|
ld t9, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
lw s6, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf1, s6
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vitof0.xyzw vf1, vf1
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L87
|
|
B22:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B23:
|
|
L87:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B24: # JUMP 12
|
|
ld t9, 8(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
ld s6, -8(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pcpyld s6, gp, s6
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf1, s6
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L88
|
|
B25:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B26:
|
|
L88:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B27: # JUMP 5, 13
|
|
ld t9, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
vitof15.xyzw vf4, vf4
|
|
vmul.xyzw vf10, vf4, vf6
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L89
|
|
B28:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B29:
|
|
L89:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf6, 16(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B30: # JUMP 6
|
|
lw s6, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0)
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
qmtc2.i vf1, s6
|
|
vitof0.xyzw vf1, vf1
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
sqc2 vf3, 0(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B31:
|
|
ld s6, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
pcpyld s6, gp, s6
|
|
qmtc2.i vf1, s6
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
sqc2 vf3, 0(a0)
|
|
beq r0, r0, L83
|
|
sll r0, r0, 0
|
|
|
|
B32:
|
|
L90:
|
|
lq a0, 0(sp)
|
|
sll r0, r0, 0
|
|
jr ra
|
|
daddiu sp, sp, 16
|
|
|
|
jr ra
|
|
daddu sp, sp, r0
|
|
```
|
|
|
|
Explanation:
|
|
```python
|
|
# set up pointers
|
|
data64 = data + offset_64
|
|
data32 = data + offset_32
|
|
data16 = data + offset_16
|
|
|
|
# do matrices
|
|
if (matrix_bits & 1) == 0:
|
|
acc.matrix[0] += amount * Matrix(data64)
|
|
data64 += sizeof(Matrix)
|
|
|
|
if (matrix_bits & 2) == 0:
|
|
acc.matrix[1] += amount * Matrix(data64)
|
|
data64 += sizeof(Matrix)
|
|
|
|
# decomp helpers
|
|
def unpack_quat_s16s_from_d64():
|
|
data = S16x4(data64)
|
|
data64++
|
|
quat_ints = [data[0], data[1], data[2], data[3]]
|
|
|
|
# the control words have 4-bits per joint
|
|
|
|
for joint_idx in range(num_joints):
|
|
control = control_words.extract_4bits(joint_idx)
|
|
if control == 0:
|
|
# quat is packed as 4x s16's in data64, scale is 32768
|
|
# dot product is checked before lerping.
|
|
# trans is packed as 2x s16's in data32, 1x s16 in data16, scale is 4
|
|
# scale is packed as 2x s16's in data32, 1x s16 in data16, scale is 4096
|
|
|
|
if control == 7:
|
|
pass # nothing to do!
|
|
if control == 8:
|
|
# trans is packed as 2x s32's in data64, 1x s32 in data32, scale is 1
|
|
# quat is packed as 4x s16's in data64, scale is 32768
|
|
# dot product is checked before lerping.
|
|
# scale is packed as 2x s16's in data32, 1x s16 in data16, scale is 4096
|
|
if control == 15:
|
|
pass # nothing to do!
|
|
|
|
```
|
|
|
|
Control Modes:
|
|
|
|
0. FIXED has `trans-32-16-4`, `quat-64-32768`, `scale-32-16-4096`, FRAME has nothing
|
|
1. FIXED has `trans-0`, `quat-64-32768`, `scale-32-16-4096`, FRAME has trans (small)
|
|
2. FIXED has `trans-32-16-4`, `quat-0`, `scale-32-16-4096`, FRAME has quat
|
|
3. FIXED has `trans-0`, `quat-0`, `scale-32-16-4096`
|
|
4. FIXED has `trans-32-16-4`, `quat-64-32768`, `scale-0`
|
|
5. FIXED has `trans-0`, `quat-64-32768`, `scale-0`
|
|
6. FIXED has `trans-32-16-4`, `quat-0`, `scale-0`
|
|
7. FIXED 0
|
|
8. FIXED as `trans-64-32-1`, `quat-64-32768`, `scale-32-16-4096`, FRAME has nothing
|
|
9. FIXED has `trans-0`, `quat-64-32768`, `scale-32-16-4096`, FRAME has trans (large)
|
|
10. FIXED has `trans-64-32-1`, `quat-0`, `scale-32-16-4096`
|
|
11. FIXED has `trans-0`, `quat-0`, `scale-32-16-4096`, FRAME HAS QUAT
|
|
12. FIXED has `trans-64-32-1`, `quat-64-32768`, `scale-0`
|
|
13. FIXED has `trans-0`, `quat-64-32768`, `scale-0`
|
|
14. FIXED has `trans-64-32-1`, `quat-0`, `scale-0`
|
|
15. FIXED 0
|
|
|
|
So bits are:
|
|
control:
|
|
|
|
0. frame-trans
|
|
1. frame-quat
|
|
2. frame-scale
|
|
4. big-trans
|
|
|
|
matrix:
|
|
|
|
0. matrix 0 from frame
|
|
1. matrix 1 from frame
|
|
|
|
# `decompress-frame-data-to-accumulator`
|
|
|
|
```lisp
|
|
(deftype joint-anim-compressed-frame (structure)
|
|
((offset-64 uint32 :offset-assert 0)
|
|
(offset-32 uint32 :offset-assert 4)
|
|
(offset-16 uint32 :offset-assert 8)
|
|
(reserved uint32 :offset-assert 12)
|
|
(data vector 133 :inline :offset-assert 16) ;; guessed by decompiler
|
|
)
|
|
:method-count-assert 9
|
|
:size-assert #x860
|
|
:flag-assert #x900000860
|
|
)
|
|
```
|
|
|
|
```python
|
|
# a2 = amount (weight)
|
|
# a1 = frame-data
|
|
|
|
qmtc2.i vf13, a2 # vf13 = amount
|
|
sq a0, 0(sp) # back up accumulator
|
|
lui t2, 28672 # t2 = spad
|
|
lw t4, 0(a1) # t4 = offset64
|
|
daddiu v1, a1, 16 # v1 = data
|
|
lw t5, 4(a1) # offset32
|
|
daddu s5, t1, r0 # control bits ptr
|
|
lw t6, 8(a1) # offset16
|
|
daddu t4, t4, v1 # t4 = data64
|
|
vmulx.xyzw vf13, vf14, vf13 # usual [amt, 4*amt, 4*amt, amt] stuff
|
|
daddiu t2, t2, 1680 # t2 = jump table
|
|
lw s2, 56(t1) # s2 = num-joints
|
|
daddu t5, t5, v1 # t5 = data32
|
|
lw s4, 60(t1) # s4 = matrix-bits
|
|
daddu t6, t6, v1 # t6 = data16
|
|
addiu s3, r0, 8 # load next u32 word counter
|
|
daddiu s5, s5, 4 # control word ptr
|
|
andi t3, s4, 1 # check matrix bit
|
|
sll r0, r0, 0
|
|
beq t3, r0, L68
|
|
sll r0, r0, 0
|
|
|
|
B1: # if matrix bit is set:
|
|
lqc2 vf1, 0(t4)
|
|
lqc2 vf2, 16(t4)
|
|
lqc2 vf3, 32(t4)
|
|
lqc2 vf4, 48(t4)
|
|
lqc2 vf9, 0(a0)
|
|
daddiu t4, t4, 64
|
|
lqc2 vf10, 16(a0)
|
|
sll r0, r0, 0
|
|
lqc2 vf11, 32(a0)
|
|
lqc2 vf12, 48(a0)
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf1, vf13
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddx.xyzw vf10, vf2, vf13
|
|
vmulaw.xyzw acc, vf11, vf0
|
|
vmaddx.xyzw vf11, vf3, vf13
|
|
vmulaw.xyzw acc, vf12, vf0
|
|
vmaddx.xyzw vf12, vf4, vf13
|
|
sqc2 vf9, 0(a0)
|
|
sqc2 vf10, 16(a0)
|
|
sqc2 vf11, 32(a0)
|
|
sqc2 vf12, 48(a0)
|
|
B2:
|
|
L68:
|
|
andi t3, s4, 2
|
|
daddiu a0, a0, 64
|
|
beq t3, r0, L69
|
|
sll r0, r0, 0
|
|
|
|
B3:
|
|
lqc2 vf1, 0(t4)
|
|
lqc2 vf2, 16(t4)
|
|
lqc2 vf3, 32(t4)
|
|
lqc2 vf4, 48(t4)
|
|
lqc2 vf9, 0(a0)
|
|
daddiu t4, t4, 64
|
|
lqc2 vf10, 16(a0)
|
|
sll r0, r0, 0
|
|
lqc2 vf11, 32(a0)
|
|
lqc2 vf12, 48(a0)
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf1, vf13
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddx.xyzw vf10, vf2, vf13
|
|
vmulaw.xyzw acc, vf11, vf0
|
|
vmaddx.xyzw vf11, vf3, vf13
|
|
vmulaw.xyzw acc, vf12, vf0
|
|
vmaddx.xyzw vf12, vf4, vf13
|
|
sqc2 vf9, 0(a0)
|
|
sqc2 vf10, 16(a0)
|
|
sqc2 vf11, 32(a0)
|
|
sqc2 vf12, 48(a0)
|
|
B4:
|
|
L69:
|
|
lw s4, -4(s5)
|
|
daddiu a0, a0, 64
|
|
B5:
|
|
L70:
|
|
andi t3, s4, 15
|
|
sra s4, s4, 4
|
|
sll t3, t3, 2
|
|
daddiu s3, s3, -1
|
|
daddu t3, t3, t2
|
|
daddiu s2, s2, -1
|
|
lw t3, 0(t3)
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
sll r0, r0, 0
|
|
jr t3
|
|
sll r0, r0, 0
|
|
|
|
B6: # JUMP 0, 8
|
|
L71:
|
|
beq s2, r0, L78
|
|
daddiu a0, a0, 48
|
|
|
|
B7:
|
|
bne s3, r0, L70
|
|
sll r0, r0, 0
|
|
|
|
B8:
|
|
lw s4, 0(s5)
|
|
daddiu s5, s5, 4
|
|
beq r0, r0, L70
|
|
addiu s3, r0, 8
|
|
|
|
B9: # JUMP 1
|
|
lw s6, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0)
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
qmtc2.i vf1, s6
|
|
vitof0.xyzw vf1, vf1
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
sqc2 vf3, 0(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B10: # JUMP 9
|
|
ld s6, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
pcpyld s6, gp, s6
|
|
qmtc2.i vf1, s6
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
sqc2 vf3, 0(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B11:
|
|
ld t9, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
vitof15.xyzw vf4, vf4
|
|
vmul.xyzw vf10, vf4, vf6
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L72
|
|
B12:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B13:
|
|
L72:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf6, 16(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B14:
|
|
ld t9, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
lw s6, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf1, s6
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vitof0.xyzw vf1, vf1
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L73
|
|
B15:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B16:
|
|
L73:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B17:
|
|
ld t9, 8(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
ld s6, -8(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pcpyld s6, gp, s6
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf1, s6
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L74
|
|
B18:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B19:
|
|
L74:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B20:
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B21:
|
|
lw s6, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0)
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
qmtc2.i vf1, s6
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
vitof0.xyzw vf1, vf1
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B22:
|
|
ld s6, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
pcpyld s6, gp, s6
|
|
qmtc2.i vf1, s6
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B23:
|
|
ld t9, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf9, 32(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pextlw t8, fp, t8
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf7, t8
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vitof12.xyzw vf7, vf7
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
qmfc2.i t9, vf10
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L75
|
|
B24:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B25:
|
|
L75:
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
sqc2 vf6, 16(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B26:
|
|
ld t9, 0(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
lw s6, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh gp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
lqc2 vf3, 0(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pextlw s6, gp, s6
|
|
pextlh s6, s6, r0
|
|
psraw s6, s6, 16
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf1, s6
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
vitof0.xyzw vf1, vf1
|
|
lqc2 vf9, 32(a0)
|
|
pextlw t8, fp, t8
|
|
qmfc2.i t9, vf10
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L76
|
|
B27:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B28:
|
|
L76:
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddy.xyzw vf3, vf1, vf13
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B29:
|
|
ld t9, 8(t4)
|
|
daddiu t4, t4, 8
|
|
lqc2 vf6, 16(a0)
|
|
pextlh t9, t9, r0
|
|
psraw t9, t9, 16
|
|
qmtc2.i vf4, t9
|
|
ld s6, -8(t4)
|
|
daddiu t4, t4, 8
|
|
lw gp, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lqc2 vf3, 0(a0)
|
|
vitof15.xyzw vf4, vf4
|
|
pcpyld s6, gp, s6
|
|
vmul.xyzw vf10, vf4, vf6
|
|
qmtc2.i vf1, s6
|
|
lw t8, 0(t5)
|
|
daddiu t5, t5, 4
|
|
lh fp, 0(t6)
|
|
daddiu t6, t6, 2
|
|
vmulaw.xyzw acc, vf10, vf0
|
|
vmaddaz.xyzw acc, vf0, vf10
|
|
vmadday.xyzw acc, vf0, vf10
|
|
vmaddx.xyzw vf10, vf0, vf10
|
|
lqc2 vf9, 32(a0)
|
|
pextlw t8, fp, t8
|
|
qmfc2.i t9, vf10
|
|
pextlh t8, t8, r0
|
|
psraw t8, t8, 16
|
|
qmtc2.i vf7, t8
|
|
pcpyud t9, t9, r0
|
|
bltzl t9, L77
|
|
B30:
|
|
vsub.xyzw vf4, vf15, vf4
|
|
|
|
B31:
|
|
L77:
|
|
vitof12.xyzw vf7, vf7
|
|
vmulaw.xyzw acc, vf6, vf0
|
|
vmaddx.xyzw vf6, vf4, vf13
|
|
vmulaw.xyzw acc, vf3, vf0
|
|
vmaddx.xyzw vf3, vf1, vf13
|
|
vmulaw.xyzw acc, vf9, vf0
|
|
vmaddx.xyzw vf9, vf7, vf13
|
|
sqc2 vf3, 0(a0)
|
|
sqc2 vf6, 16(a0)
|
|
sqc2 vf9, 32(a0)
|
|
beq r0, r0, L71
|
|
sll r0, r0, 0
|
|
|
|
B32:
|
|
L78:
|
|
lq a0, 0(sp)
|
|
sll r0, r0, 0
|
|
jr ra
|
|
daddiu sp, sp, 16
|
|
|
|
jr ra
|
|
daddu sp, sp, r0
|
|
``` |