This adds environment mapping support to `Merc2`, and turns it on for Jak 1 and Jak 2.
- The performance is much better
- Jak 1 can be toggled back to the old behavior with `(set! *emerc-hack* #f)`. The new environment mapping is identical to the old one everywhere I checked.
- Jak 1 still falls back to generic for ripple/texscroll/blerc/eyes - there's still no dynamic texture or vertex updating support. The eye detection stuff will sometimes flag stuff as eyes which is not eyes, which is fine, but means that generic will be used in some places where emerc could be used. For example, the shiny plates on jak's arm will be drawn with generic because jak has eyes.
- Jak 2 hasn't been checked super carefully against PCSX2 yet.
- Jak 2 still isn't technically using emerc, but instead putting emerc models in the merc bucket.
- The interface to merc is a lot different now and totally custom OpenGOAL DMA code. The original merc drawing asm doesn't run anymore.
- The FR3 format changed
- Something funky going on with foreground lighting in escape, but doesn't seem to be related to this change?

Performance comparison, jak 1, in likely the most generic-merc heavy spot:
![image](https://user-images.githubusercontent.com/48171810/213882718-feb2ab59-95a9-44a2-b0e5-95fba860c7b0.png)
![image](https://user-images.githubusercontent.com/48171810/213882736-8dbbf4c9-6bbf-4d0b-96ce-78d63274660c.png)
Emerc
Outline
It's one of two renderers used for foreground + environment mapping. There's also a generic + merc (mercneric) renderer.
As far as I know, the supported effects are:
- skinning, with up to 3 bones influencing each vertex, and per-vertex specification of bone weights
- up to 3 directional lights, plus an ambient light
- vertex colors
- texturing
- texture-based environment mapping (done per vertex, not fragment)
Our hope is to port the emerc renderer to PC, then use it for all rendering for envmapped foreground objects. I believe that `emerc` will be easier to understand than `mercneric`. The hope is that either `emerc` can be used for all models, or once we understand `emerc`, it will be straightforward to convert `mercneric`-only models to work with PC `emerc`.
The mercneric renderer handles partially offscreen stuff, and is believed to be slower than emerc. However, mercneric may use less VU1 time, in exchange for more EE time.
As far as I can tell, the game uses emerc only if all three of these conditions are true:
- the `emerc` effect bit is set in the model, indicating it can use `emerc`.
- we're an actor spawned by scene-player
- we're not in a frame range specified by `scissor-frame` in the scene info
The `emerc` bit is only there on high-resolution cutscene models.
Most of the time, there are no frames specified in `scissor-frame`. This makes sense, usually the actors are onscreen during cutscenes, and `emerc` seems quite tolerant of partially offscreen characters. (similar story in jak 1 - they were aggressive at letting merc draw offscreen instead of clipping triangles, likely because the clipping pipeline is so much slower).
In very rare cases, they manually specified a frame range for a character who is mostly offscreen (like daxter's feet are visible in frame 2324 of `city-krew-collection-intro`), and then the character is rendered with `mercneric`.
My guess is that they just used emerc by default everywhere. If a cutscene character is partially offscreen/behind the camera in a bad way that causes GS coordinates to overflow, this would draw garbage triangles, and they would manually annotate the frame range where this happened.
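The selection rule above can be sketched roughly like this. It's only a sketch of my current understanding: the struct and field names (`EffectInfo`, `SceneActor`, `scissor_frames`) are made up for illustration, and treating a frame range as inclusive on both ends is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types, not the game's real layout.
struct FrameRange {
  int first;  // assumed inclusive
  int last;   // assumed inclusive
};

struct EffectInfo {
  bool has_emerc_bit;  // the emerc effect bit from the model
};

struct SceneActor {
  bool spawned_by_scene_player;
  std::vector<FrameRange> scissor_frames;  // usually empty
};

// True if the current cutscene frame was manually annotated as "needs scissoring".
bool in_scissor_range(const SceneActor& actor, int frame) {
  for (const FrameRange& r : actor.scissor_frames) {
    assert(r.first <= r.last);
    if (frame >= r.first && frame <= r.last) {
      return true;
    }
  }
  return false;
}

// All three conditions must hold for emerc to be picked.
bool use_emerc(const EffectInfo& effect, const SceneActor& actor, int frame) {
  return effect.has_emerc_bit && actor.spawned_by_scene_player &&
         !in_scissor_range(actor, frame);
}
```

With an empty `scissor_frames` list (the common case), this reduces to "emerc bit set and spawned by scene-player".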
Review of how all this gets called
Setup
- A level containing `entity-actor`s is loaded
- The `level-update` method (called once per frame) in `entity.gc` calls `birth!` on `entity-actor`s that are visible and eligible to be spawned
- The newly created actor process is initialized by calling `init-from-entity!`, which is a method that all objects must implement.
- This method will eventually call `initialize-skeleton`, a method of the parent `process-drawable` class.
- This method creates a `draw-control` with `skeleton-group->draw-control`
- This method calls `setup-cspace-and-add`
- This method adds the process drawable to `*foreground-draw-engine*`, a list of processes to be drawn.
- The connection uses function `add-process-drawable`, which just calls the `dma-add-func` of the `draw-control`, which is `dma-add-process-drawable` by default
Per-Frame Draw
- Game-objects are responsible for calling `ja-post`, or adding themselves to the matrix-engine list, or somehow coming up with `joint` transforms.
- main loop in `main.gc` calls `(*draw-hook*)`, which points to `real-main-draw-hook`. This function generates all DMA data for drawing.
- `foreground-engine-execute`
  - `foreground-init` (doesn't do anything emerc-related)
  - calls `execute-connections` on the engine, the `dma-add-process-drawable` for each object
    - various stuff for shadows/picking lights
    - generates `vu-lights` (light values in VU-friendly format)
    - picks LOD based on distances
    - sets texture masks to indicate to texture system which LODs of which textures will be used
    - determines if `close-to-screen` culling is needed.
    - call `foreground-draw`
      - add an entry to the `*bone-calculation-list*` to tell it to compute skinning matrices.
      - rotate lights to camera frame (note that merc only gets a perspective transform, transforming to camera frame is done in skinning calc to avoid a full affine transform on VU1)
      - there's some confusing logic for the renderer selection, but in the end it populates `merc-effect-bucket-info` including a color and a few flags.
      - calls `foreground-emerc`, which generates DMA data for `emerc` (asm func)
- `foreground-execute-cpu-vu0-engines`
  - runs bones, modifying the above DMA data to contain skinning matrices computed from joints.
- `display-frame-finish` called after all drawing
  - Calls `emerc-vu1-init-buffers`, which adds some init data to all used `emerc` buckets.
Emerc DMA Generation
The call in GOAL:
(set! dma-ptr (foreground-emerc dc (-> (scratchpad-object foreground-work) regs mtxs) dma-ptr 29 19))
The arguments are:
- `draw-control`, which contains settings for drawing, and the actual merc geometry (called `geo`)
- a pointer to the "matrix area", which will contain skinning matrices computed by `bones`
- `dma-ptr`, a pointer to the DMA buffer to write data to
- 29, 19, likely addresses in the VU1 microprogram to start execution. Typically there is one program for the first run of the renderer, which initializes some VU1 registers/data memory, and then a slightly shorter program that skips the init step.
Before the asm, the rough breakdown is:
- a `draw-control` stores 4 geos, one for each lod (some may be unpopulated)
- Each `geo` is a `merc-ctrl`, which is an entire model
- Each `merc-ctrl` is made up of `merc-effect`s
- Each `merc-effect` is made up of "fragment"s. Each fragment has a `frag-geo` (actual data needed in VU1) and `frag-ctrl` (metadata describing how to upload data to VU1)
- Each fragment has a few types of data:
  - `unsigned-four`: containing weights (u8), rgba (u8), addresses for crosscopy/samecopy. Unpacked [u8x4] to [u32x4] by VIF on upload to VU1.
  - `lump-four`: containing vertex data. Unpacked [u8x4] to [u32x4 + some_magic_constant] by VIF on upload to VU1. This unpack magically converts integers to floats.
  - `fp` data: containing a header, and "shaders" (giftags for setting up textures/settings). Copied directly by VIF.
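Here's a small sketch of why the row add "magically" produces floats. The two constants are the ROW z and w values from the STROW in the DMA generation code in this document, and `-65537` is the value that shows up in the z component of `vf17`/`vf18`/`vf19` in the VU1 programs. The helper names (`bits_to_float`, `vif_row_add`) are made up, and reading the z component as a normal in roughly [-1, 1) is my interpretation, not something I've confirmed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 float, which is effectively
// what happens when VU1 reads a word that VIF wrote as an integer.
float bits_to_float(uint32_t bits) {
  static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// One component of a row-add unpack: a plain integer add, no int->float conversion.
float vif_row_add(uint32_t value, uint32_t row) {
  assert(value <= 0xff);  // source data is u8
  return bits_to_float(row + value);
}

// 0x47800000 is 65536.0f. At that magnitude one mantissa step is 1/128,
// so adding a u8 `n` to the bit pattern gives 65536 + n/128.
constexpr uint32_t kRowZ = 0x47800000;

// 0x4b010000 is 8454144.0f (2^23 + 2^16). One mantissa step is 1.0,
// so adding a u8 `n` to the bit pattern gives 8454144 + n.
constexpr uint32_t kRowW = 0x4b010000;
```

For example, `vif_row_add(128, kRowZ) + (-65537.0f)` is `0.0f`, and the full u8 range maps to -1.0 up to 0.9921875. The w constant works the same way with a step of 1.0, and the VU1 program then adds an offset to bring it back to a useful range.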
The calling function `foreground-draw` sets flags (per effect) in the `merc-bucket-info` array. All emerc stuff gets `merc-path` set to 1.
High-level description of what it does. Note that this is simplified from the assembly version, which combines some DMA transfers shown here. Also, this does not actually run any DMA or microprograms, it just generates a DMA chain that will do this later. On the next game frame, the giant DMA chain generated by all renderers will be submitted, and all these will run.
// get the merc control for our level of detail (selected in drawable.gc)
MercControl& mc = draw_control.lod_set[draw_control.cur_lod].geo;
// loop over each "effect" in the merc control.
// The "effect" is the grouping for what can be sent to one renderer or another
for (int effect_idx = 0; effect_idx < mc.header.effect_count; effect_idx++) {
MercEffect& merc_effect = mc.effect[effect_idx]; // merc data in the art group
MercBucketInfo& merc_effect_info = gForeground.merc_bucket_info[effect_idx]; // settings generated by foreground-draw
if (merc_effect_info.disable_draw) {
continue; // skip if disabled
}
if (merc_effect_info.merc_path != 1) {
continue; // skip if not emerc (1 means emerc here)
}
// where we started writing dma for this effect
u8* effect_dma_start = dma_ptr;
// the source data (stored in the art group) that we'll be sending.
u8* source_ptr = merc_effect.frag_geo;
// loop over fragments
for (int frag_idx = 0; frag_idx < merc_effect.frag_count; frag_idx++) {
MercFragmentControl& frag_ctrl = merc_effect.frag_ctrl[frag_idx];
// set the ROW register of the VIF.
// when kRowAdd flag is given, the VIF will add these 4 values to each component of each quadword it writes out.
// This is used as part of the process to go from u8's to floats
// (they do some cool magic where they don't actually do int->float, they just add integers with VIF and
// do float math on VU1 and it works out somehow)
dma_ptr = generate_vif_strow(dma_ptr, mc.header.st_vif_add, mc.header.st_vif_add, 0x47800000, 0x4b010000);
// number of quadwords (16-byte words) in EE memory of unsigned_four data to send
// unsigned_four data is stored as [u8, u8, u8, u8] and unpacked to [u32, u32, u32, u32].
// the count variable is in units of 4 values. (4 bytes in EE memory, 16 bytes in VU1 memory)
int u4_qwc_in_ee_mem = (frag_ctrl.unsigned_four_count + 3) / 4;
int dest_addr_qw = 140;
dma_ptr = generate_vif_unpack(dma_ptr,
kUnpackV4_8, // unpack [u8, u8, u8, u8] to [u32, u32, u32, u32]
kUnsigned, // zero extend when unpacking
dest_addr_qw, // VU1 data address (in quadwords)
kUseTop, // add value of TOP register to destination (VU1 program controls destination)
source_ptr, // source pointer
u4_qwc_in_ee_mem, // number of QW to transfer from EE memory
frag_ctrl.unsigned_four_count, // number of QW written to VU1 memory
kNoRow // do not add row
);
// note: to write 7 QW of data, they would have this in EE memory:
// [v0, v1, v2, v3] (16 bytes, each v is 4 bytes)
// [v4, v5, v6, XX] (16 bytes, XX is 4 bytes of padding)
// they would transfer 2 QW to vif (including the 4 padding bytes)
// but you can tell VIF to unpack only 7 QW, and it will discard the padding.
// advance source pointer to the next data (lump data)
source_ptr += u4_qwc_in_ee_mem * 16;
// advance dest pointer.
dest_addr_qw += frag_ctrl.unsigned_four_count;
// lump 4 is unpacked from [u8, u8, u8, u8] to [u32 + rx, u32 + ry, u32 + rz, u32 + rw]
// where [rx, ry, rz, rw] are specified in ROW set above.
int l4_qwc_in_ee_mem = (frag_ctrl.lump_four_count + 3) / 4;
dma_ptr = generate_vif_unpack(dma_ptr,
kUnpackV4_8, // unpack [u8, u8, u8, u8] to [u32, u32, u32, u32]
kUnsigned, // zero extend when unpacking
dest_addr_qw, // VU1 data address (in quadwords)
kUseTop, // add value of TOP register to destination (VU1 program controls destination)
source_ptr, // source pointer
l4_qwc_in_ee_mem, // number of QW to transfer from EE memory
frag_ctrl.lump_four_count, // number of QW written to VU1 memory
kAddRow // add the row value
);
// advance source pointer to the next data (fp data)
source_ptr += l4_qwc_in_ee_mem * 16;
// advance dest pointer.
dest_addr_qw += frag_ctrl.lump_four_count;
// send fp data.
dma_ptr = generate_vif_unpack(dma_ptr,
kUnpackV4_32, // just plain memcpy to VU1 memory
kSigned, // no effect? they set it explicitly always, not sure why.
dest_addr_qw, // VU1 data address (in quadwords)
kUseTop, // add value of TOP register to destination (VU1 program controls destination)
source_ptr, // source pointer
frag_ctrl.fp_qwc, // number of QW to transfer from EE memory
frag_ctrl.fp_qwc, // number of QW written to VU1 memory
kNoRow // don't add the row value
);
// advance source pointer
source_ptr += frag_ctrl.fp_qwc * 16;
// there's some special data shared between all fragments. We put this DMA after the DMA
// for the first fragment as an optimization. We can write the first fragment of this effect
// to VU1 data memory while VU1 is processing the last fragment of the previous effect.
// This is ok because the per-fragment data is double buffered (controlled with the TOP register)
// However, the shared data is not double buffered, and we must wait for the previous effect
// to be fully done before transferring. We want to delay this as long as possible, so we
// transfer the first per-fragment data of this effect before this part.
if (frag_idx == 0) {
// sneak some more data in lights
auto lights = gForeground.merc_bucket_info.lights;
// the asm computes (0x3f85026b - ignore_alpha) and stores it in word 3 of the first lights qw
lights.qws[0].w = merc_effect_info.ignore_alpha ? 0x3f85026a : 0x3f85026b;
// copy the 7 qw of lights to the dma buffer now, setting up a transfer for them to go
// to address 132 in VU1 (no TOP).
// the previous code sets up these lights in VU format (vu-lights).
dma_ptr = dma_memcpy_to_buffer_then_vu1(dma_ptr, 132, &lights, 7);
// copy these 4 values to address 139 (copying them to the dma-buffer now)
dma_ptr = dma_copy_to_buffer_then_vu1(dma_ptr, 139, mc.header.xyz_scale, mc.header.st_magic, mc.header.st_out_a, mc.header.st_out_b);
// emerc new transfer - copying 1 qw color_fade (u8's unpacked to u32)
dma_ptr = dma_copy_to_buffer_then_vu1(dma_ptr, 118, unpack_u8_to_u32(merc_effect_info.color_fade));
AdgifShader* envmap_shader = DefaultEnvmapShader;
if (merc_effect.extra_info && merc_effect.extra_info->shader_offset) { // nonzero check
// shader_offset is in quadwords, relative to the start of the extra info
envmap_shader = (AdgifShader*)(((u8*)merc_effect.extra_info) + 16 * merc_effect.extra_info->shader_offset);
}
// 5 qw envmap shader
dma_ptr = dma_copy_to_buffer_then_vu1(dma_ptr, 119, envmap_shader, 5 * 16);
}
// fragments will (most of the time) need new matrix data.
// there are some cases where they can reuse some matrix data from previous fragments in the same
// effect, so it's possible for there to be no matrices to transfer. But usually there are some
for (int mat_xfer = 0; mat_xfer < frag_ctrl.mat_xfer_count; mat_xfer++) {
auto& info = frag_ctrl.mat_dest_data[mat_xfer];
dma_ptr = dma_transfer_matrix(dma_ptr, info.matrix_dest, matrix_mem + sizeof(MercMatrix) * info.matrix_number);
}
// finally, call program.
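// in the asm, the first fragment gets program_addr_2 (19, effect init, which runs on into the
// per-fragment program) and later fragments get program_addr_1 (29, process frag).
dma_ptr = dma_mscal(frag_idx == 0 ? program_addr_2 : program_addr_1);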
}
// a bunch of bucket patching crap
}
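To make the pointer bookkeeping in the pseudocode above easier to check, here's a sketch that just computes where each of the three pieces of one fragment lives, both in EE memory (source, in bytes from the start of the fragment) and in VU1 data memory (destination, in quadwords, before TOP is added). The struct names are made up. It assumes the fragment's data starts at destination 140 as in the pseudocode, and that the next fragment's source data follows immediately after the fp data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the first three bytes of a frag-ctrl.
struct FragCounts {
  uint8_t unsigned_four_count;  // number of [u8x4] values
  uint8_t lump_four_count;      // number of [u8x4] values
  uint8_t fp_qwc;               // number of quadwords
};

struct FragLayout {
  uint32_t u4_src_offset, l4_src_offset, fp_src_offset;  // bytes in EE memory
  uint32_t next_frag_src_offset;                         // bytes in EE memory
  uint32_t u4_dest_qw, l4_dest_qw, fp_dest_qw;           // VU1 quadword addresses
};

constexpr uint32_t kFragDataStartQw = 140;
constexpr uint32_t kVu1DataQwc = 1024;  // VU1 data memory is 16 KB

// Four [u8x4] values fit in one 16-byte quadword; the last one may be padded.
uint32_t packed_qwc(uint32_t count) {
  return (count + 3) / 4;
}

FragLayout layout_fragment(const FragCounts& c) {
  FragLayout out{};
  out.u4_src_offset = 0;
  out.l4_src_offset = out.u4_src_offset + packed_qwc(c.unsigned_four_count) * 16;
  out.fp_src_offset = out.l4_src_offset + packed_qwc(c.lump_four_count) * 16;
  out.next_frag_src_offset = out.fp_src_offset + c.fp_qwc * 16u;

  // On the VU1 side every [u8x4] expands to a full quadword, so there is no padding.
  out.u4_dest_qw = kFragDataStartQw;
  out.l4_dest_qw = out.u4_dest_qw + c.unsigned_four_count;
  out.fp_dest_qw = out.l4_dest_qw + c.lump_four_count;
  assert(out.fp_dest_qw + c.fp_qwc <= kVu1DataQwc);
  return out;
}
```

The thing to notice is that the source side advances in padded quadwords while the destination side advances by the raw counts.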
The actual asm:
L101: ;; function prologue
daddiu sp, sp, -128
sd ra, 0(sp)
sq s0, 16(sp)
sq s1, 32(sp)
sq s2, 48(sp)
sq s3, 64(sp)
sq s4, 80(sp)
sq s5, 96(sp)
sq gp, 112(sp)
;; one-time setup for this "merc-control". A merc-control is a model (at a particular lod)
;; for a process-drawable.
;; using dc as the input draw-control (a constant)
;; using mc as (-> dc lod-set (-> dc cur-lod) geo), the merc-control we're drawing (a constant)
;; using t8 = mep as (-> mc effect <n>), one of the merc-effects in the merc-control (variable)
;; using t7 = mec (merc-effect counter), the number of remaining merc-effects
;; using t9 = mebp as (-> *foreground* merc-bucket-info effect <n>), one of the merc-bucket-info's filled out by
;; the calling function, containing per-effect settings.
B0:
or t7, a3, r0 ;; t7 = program-addr-1
or v1, t0, r0 ;; v1 = program-addr-2
lui t0, 4096 ;; t0 = 0x10000000
lui t1, 18304 ;; t1 = 0x47800000
daddiu t0, t0, 1 ;; t0 = 0x10000001
dsll32 t1, t1, 0 ;; t1 = 0x47800000'00000000
lui a3, 12288 ;; a3 = 0x30000000
lui t8, 19201 ;; t8 = 0x4b010000
pcpyld t0, a3, t0 ;; t0 = 0x00000000'30000000'00000000'10000001 (STROW)
lbu a3, 78(a0) ;; a3 = (-> dc cur-lod)
pcpyld t1, t8, t1 ;; t1 = 0x00000000'4b010000'47800000'00000000
lui t2, 28160 ;; t2 = 0x6e000000
addiu t8, r0, 8 ;; t8 = 8
multu3 a3, a3, t8 ;; a3 = (* 8 (-> dc cur-lod))
lui t3, 1280 ;; t3 = 0x05000000
lui t4, 27648 ;; t4 = 0x6c000000
dsll32 t2, t2, 0 ;; t2 = 0x6e000000'00000000
dsll32 t4, t4, 0 ;; t4 = 0x6c000000'00000000
daddu t4, t4, t3 ;; t4 = 0x6c000000'05000000
daddu t3, t2, t3 ;; t3 = 0x6e000000'05000000
daddiu t3, t3, 1 ;; t3 = 0x6e000000'05000001
daddu a0, a3, a0 ;; a0 = (+ dc (* 8 (-> dc cur-lod)))
pcpyld t2, t2, r0 ;; t2 = 0x6e000000'00000000'00000000'00000000 (unpack-v4-8, no change to row)
lw a0, 28(a0) ;; a0 = (-> dc lod-set (-> dc cur-lod) geo) ;; a merc-ctrl
pcpyld t3, t3, r0 ;; t3 = 0x6e000000'05000001'00000000'00000000 (unpack-v4-8, row add)
pcpyld t4, t4, r0 ;; t4 = 0x6c000000'05000000'00000000'00000000 (unpack-v4-32, disable row add)
lui t5, 12288 ;; t5 = 0x30000000
lui t6, 4096 ;; t6 = 0x10000000
daddiu t5, t5, 7 ;; t5 = 0x30000007
lui t8, 5120 ;; t8 = 0x14000000
lui a3, 27655 ;; a3 = 0x6c070000
daddu t7, t8, t7 ;; t7 = 0x14000000 + program-addr-1
dsll32 a3, a3, 0 ;; a3 = 0x6c070000'00000000
dsll32 t8, t7, 0 ;; t8 = (0x14000000 + program-addr-1) << 32
pcpyld t5, a3, t5 ;; t5 = 0x6c070000'00000000'00000000'30000007
lwu t7, 52(a0) ;; t7 = (-> mc effect-count)
pcpyld t6, t8, t6 ;; t6 = ((0x14000000 + program-addr-1) << 32) << 64 + 0x00000000'10000000
daddiu t8, a0, 156 ;; t8 = (-> mc effect 0) = mep "merc effect pointer"
beq t7, r0, L109 ;; branch if there's no effects (I think this is buggy and jumps to the wrong spot)
lw a3, *foreground*(s7) ;; a3 = *foreground*
B1:
daddiu t9, a3, 2508 ;; t9 = (-> *foreground* merc-bucket-info effect 0)
B2:
;; TOP of per-effect loop
;; (I've marked lines with stats if they are just for computing statistics)
L102:
lbu a3, 6(t9) ;; a3 = (-> mebp disable-draw)
or ra, a2, r0 ;; ra = start-of-dma-for-this-effect
lbu gp, 4(t9) ;; gp = (-> mebp merc-path)
bne a3, r0, L109 ;; jump to next effect if this is disabled.
lw a3, *merc-global-stats*(s7) ;; a3 = mgs
B3:
daddiu a3, a3, 16 ;; a3 = (-> *merc-global-stats* emerc)
daddiu gp, gp, -1 ;; check if `merc-path` is 1, skip this effect if it's something else
sll r0, r0, 0
bne gp, r0, L109
lhu s4, 2(a3) ;; stats.fragments
B4:
lhu s3, 18(t8) ;; s3 = (-> mep frag-count)
lwu gp, 4(a3) ;; stats
lhu s5, 22(t8) ;; s5 = (-> mep tri-count)
daddu s4, s4, s3 ;; stats
lwu s3, 8(a3) ;; stats
lhu s2, 24(t8) ;; s2 = (-> mep dvert-count)
daddu gp, gp, s5 ;; stats
sh s4, 2(a3) ;; stats
sw gp, 4(a3) ;; stats
daddu s5, s3, s2 ;; stats
lwu t2, 0(t8) ;; t2 = (-> mep frag-geo)
lwu gp, 4(t8) ;; gp = (-> mep frag-ctrl)
lui s4, 12288 ;; 0x30000000
dsll32 t2, t2, 0 ;; (-> mep frag-geo) << 32
sw s5, 8(a3) ;; stats
or t2, t2, s4 ;; t2 = ((-> mep frag-geo) << 32) + 0x30000000 (upper 64-bits still have dma tmpl)
lhu s5, 18(t8) ;; s5 = (-> mep frag-count)
addiu s4, r0, 0 ;; s4 = 0
beq s5, r0, L109 ;; skip to next effect if no frags in this effect.
sll r0, r0, 0
B5:
sll r0, r0, 0
;; top of per-fragment loop.
;; s4 = current-frag-idx
;; s5 = num-frags
;; a2 = dma-ptr
;; DMA memory layout
;; lower-bits higher bits
;; 0 [dmatag-lower, dmatag-upper, strow-viftag, ROW_X ] ;; transfer 1 qw, immediately after this
;; 1 [ROW_Y , ROW_Z , ROW_W , nop-viftag ] ;; the qw transferred by 0
;; 2 [dmatag-lower, dmatag-upper, nop , unpack-v4-8] ;; (unsigned4's)
;; 3 [dmatag-lower, dmatag-upper, strow 1 , unpack-v4-8] ;; lumps
;; 4 [dmatag-lower, dmatag-upper, strow 0 , unpack-v4-32]
B6:
L103:
lbu s0, 0(gp) ;; s0 = frag-ctrl.unsigned-four-count (number of 4xu8's in memory)
sll r0, r0, 0
lbu s2, 1(gp) ;; s2 = frag-ctrl.lump-four-count
xori s1, r0, 49292 ;; s1 = 0xc08c
lbu s3, 2(gp) ;; s3 = frag-ctrl.fp-qwc
daddiu v0, s0, 3 ;; v0 = u4count + 3
lw a3, 44(a0) ;; a3 = header.st-vif-add
srl v0, v0, 2 ;; v0 = (u4count + 3) / 4
sq t0, 0(a2) ;; set DMA qw 0 (dmatag-strow only)
xor t2, t2, v0 ;; set dma qwc
sq t2, 32(a2) ;; store dma line 2.
xor t2, t2, v0 ;; unset dma qwc
sh s1, 44(a2) ;; set addr for unpack (tops + unsigned bits)
daddu s1, s1, s0 ;; update dest addr for next unpack
sb s0, 46(a2) ;; set qwc for unpack
dsll32 s0, v0, 4 ;; s0 = (u4-ee-qwc << 36)
daddu t3, t2, s0 ;; t3 = dma-tag templ
daddiu s0, s2, 3 ;; s0 = l4c + 3
sw a3, 12(a2) ;; ROW_X = header.st-vif-add
srl s0, s0, 2 ;; s0 /= 4
sq t1, 16(a2) ;; ROW_Z, W
xor t3, t3, s0 ;; set dma qwc
sq t3, 48(a2) ;; store dma templ 3
xor t3, t3, s0 ;; unset dma qwc
sh s1, 60(a2) ;; set vif unpack
daddu s1, s1, s2 ;; next dest
sb s2, 62(a2) ;; store.
dsll32 s2, s0, 4 ;; s2 = dma-src-inc shifted
sw a3, 16(a2) ;; ROW Y
daddu t4, t3, s2 ;; unpack-v4-32 tmpl
xor t4, t4, s3 ;; set qwc in dma tmpl
xori a3, s1, 16384 ;; clear the unsigned bit (0x4000) for this unpack (shouldn't matter for v4-32)
sq t4, 64(a2) ;; store dma 4
xor t4, t4, s3 ;; unset qwc
sb s3, 78(a2) ;; set qwc in unpack
dsll32 s3, s3, 4 ;; qwc -> bytes
sh a3, 76(a2) ;; set unpack
daddu t2, t4, s3 ;; ?? (maybe reset t2 tmpl)
lbu s3, 3(gp) ;; s3 = mat-xfer-count
daddiu gp, gp, 4 ;; next fragment control
bne s4, r0, L105 ;; do B7, B8, B9, B10 only on first fragment
daddiu a2, a2, 80 ;; advance DMA ptr.
B7:
sd t6, 0(a2) ;; weirdo dma generation code (somebody had too much fun here)
addiu s2, r0, 8 ;; transfer 8 qw
sd t6, 8(a2) ;; more weird crap
lui a3, 27656 ;; 0x6c08
sb s2, 0(a2) ;; transfer 8 qw
daddiu a3, a3, 132 ;; to 132 (8 qw, ending just before 140)
lw s2, *foreground*(s7) ;; fg
daddiu s2, s2, 2384 ;; s2 = merc-bucket-info array
sw a3, 12(a2) ;; unpack to 132
lq a3, 0(s2) ;; a3 = lights 0
lq s1, 16(s2) ;; s1 = lights 1
lq s0, 32(s2) ;; s0 = lights 2
lq v0, 48(s2) ;; v0 = lights 3
sq a3, 16(a2) ;; store lights
sq s1, 32(a2)
sq s0, 48(a2)
sq v0, 64(a2)
lq a3, 64(s2) ;; lights again
lq s1, 80(s2)
lq s0, 96(s2) ;; lights 6
lui v0, 16261
lq s2, 28(a0)
daddiu v0, v0, 619 ;; v0 = 0x3f85026b
sq a3, 80(a2) ;; light store
lbu a3, 5(t9) ;; a3 = ignore-alpha
sq s1, 96(a2) ;; lights
sq s0, 112(a2) ;; last lights
dsubu a3, v0, a3 ;; compute ignore alpha
sq s2, 128(a2) ;; header
sw a3, 28(a2) ;; word 3 (w) of the first lights qw
daddiu a2, a2, 144 ;; inc dma
sd t6, 0(a2)
addiu s2, r0, 6
sd t6, 8(a2)
lui a3, 27654 ;; 0x6C06
sb s2, 0(a2)
daddiu a3, a3, 118
sw a3, 12(a2) ;; to 118 (6 qw, ending just before 124)
lw a3, 0(t9) ;; a3 = color fade
pextlb a3, r0, a3 ;; unpack u8 to u32's
pextlh a3, r0, a3
sq a3, 16(a2) ;; store color fade
lw a3, *default-envmap-shader*(s7) ;; envmap ptr.
lw s2, 28(t8) ;; merc-extra-info
beq s2, r0, L104
sll r0, r0, 0
B8:
lbu s1, 1(s2)
beq s1, r0, L104
sll r0, r0, 0
B9:
sll a3, s1, 4
addu a3, s2, a3
B10:
L104:
lq s2, 0(a3) ;; copy shader to dma buff
lq s1, 16(a3)
lq s0, 32(a3)
lq v0, 48(a3)
lq a3, 64(a3)
sq s2, 32(a2)
sq s1, 48(a2)
sq s0, 64(a2)
sq v0, 80(a2)
sq a3, 96(a2)
daddiu a2, a2, 112
;; after first time per-effect stuff
B11:
L105:
beq s3, r0, L107
addiu s2, r0, 128 ;; s2 = 128 (matrix size)
B12:
lbu a3, 0(gp) ;; get mat number
sll r0, r0, 0
B13:
L106:
multu3 s1, a3, s2 ;; s1 = matrix offset in ee world
sq t5, 0(a2) ;; mat transfer tmplate
lbu s0, 1(gp) ;; mat dest
daddiu gp, gp, 2 ;; gp = next mat transfer
lbu a3, 0(gp) ;; a3 = next matrix offset
daddiu s3, s3, -1 ;; dec remaining
sb s0, 12(a2) ;; store dest
daddiu a2, a2, 16 ;; inc dma
daddu s1, s1, a1 ;; compute matrix pointer
sll r0, r0, 0
bne s3, r0, L106
sw s1, -12(a2)
B14:
L107:
sq t6, 0(a2)
daddiu a2, a2, 16
bne s4, r0, L108
daddiu s4, s4, 1
B15:
or a3, v1, r0 ;; first fragment only: patch the mscal to program-addr-2 (later ones keep program-addr-1)
sb a3, -4(a2)
B16:
L108:
bne s4, s5, L103 ;; loop frag
sll r0, r0, 0
B17: ;; patching crap, based on texture index now. should document eventually...
lui s5, 28672
lbu a3, 26(t8)
addiu gp, r0, 48
lw s5, 52(s5)
mult3 a3, a3, gp
sll r0, r0, 0
daddu a3, s5, a3
sll r0, r0, 0
lw gp, 12(a3)
sll r0, r0, 0
lw s5, 16(a3)
lui s4, 8192
sq r0, 0(a2)
movz gp, ra, gp
sw s4, 0(a2)
or s4, a2, r0
sw gp, 12(a3)
daddiu a2, a2, 16
beq s5, r0, L109
sw s4, 16(a3)
B18:
sll r0, r0, 0
sw ra, 4(s5)
B19:
L109:
daddiu t8, t8, 32
daddiu t9, t9, 8
daddiu t7, t7, -1
bne t7, r0, L102 ;; loop effect
sll r0, r0, 0
B20:
or v0, a2, r0
ld ra, 0(sp)
lq gp, 112(sp)
lq s5, 96(sp)
lq s4, 80(sp)
lq s3, 64(sp)
lq s2, 48(sp)
lq s1, 32(sp)
lq s0, 16(sp)
jr ra
daddiu sp, sp, 128
sll r0, r0, 0
sll r0, r0, 0
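The magic numbers in the asm comments (`0x6e000000`, `0x6c070000`, `0x14000000`, `0xc08c`, ...) are VIF codes. As a rough decoder ring, here's a sketch that builds them, assuming the usual VIFcode layout: CMD in bits 24-31, NUM in bits 16-23, and a 16-bit immediate, where for UNPACK the immediate holds the destination address in the low 10 bits, the unsigned flag at bit 14, and the "add TOPS" flag at bit 15. The function and constant names are made up.

```cpp
#include <cassert>
#include <cstdint>

// VIF command bytes that appear in the templates built at the top of the function.
constexpr uint32_t kVifStmod = 0x05;        // set the add mode (1 = add row)
constexpr uint32_t kVifMscal = 0x14;        // call a VU1 microprogram
constexpr uint32_t kVifStrow = 0x30;        // set the ROW registers
constexpr uint32_t kVifUnpackV4_32 = 0x6c;  // unpack 4 x 32-bit (plain copy)
constexpr uint32_t kVifUnpackV4_8 = 0x6e;   // unpack 4 x 8-bit to 4 x 32-bit

uint32_t vif_code(uint32_t cmd, uint32_t num, uint32_t imm) {
  assert(cmd <= 0xff && num <= 0xff && imm <= 0xffff);
  return (cmd << 24) | (num << 16) | imm;
}

// num is the number of quadwords written to VU1 memory.
uint32_t vif_unpack(uint32_t cmd, uint32_t num, uint32_t addr_qw, bool is_unsigned, bool add_tops) {
  assert(addr_qw < 1024);
  uint32_t imm = addr_qw;
  if (is_unsigned) {
    imm |= 1u << 14;
  }
  if (add_tops) {
    imm |= 1u << 15;
  }
  return vif_code(cmd, num, imm);
}
```

For example, the unsigned-four unpack for a fragment with 7 values comes out as `vif_unpack(kVifUnpackV4_8, 7, 140, true, true)`, which is `0x6e07c08c`: the `0x6e` template, the count stored with `sb`, and the `0xc08c` address halfword stored with `sh`.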
Summary of above
Overall, it's very similar to merc. There's some extra data transferred:
- the "low memory" stuff setup in `emerc.gc` is 1 QW longer (an extra "`unperspect` QW")
- the `rgba color-fade` is transferred to non-double-buffered memory (like lights) (1 QW, unpack to u32's)
- 5 QW shader for envmapping (either `*default-envmap-shader*` or one provided in merc extra info)
emerc data appears backward compatible with merc, which makes sense:
- emerc falls back to merc if it's too far away to envmap
- we put emerc stuff through merc code (blending and stuff is wrong, but the geometry comes out right)
The promising thing is that we don't seem to need much extra information to do environment mapping. I kinda thought we'd need another set of texture coordinates, but I don't see where that enters yet.
If all we need is the shader, plus tint values, it would be easy to do this for any model that succeeds with merc.
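For reference, the shared (not double-buffered) uploads that the DMA generation code sets up per effect can be written down as a small table. This is a sketch built from the addresses in the pseudocode and asm above. The `Vu1Region` struct and `find_shared_region` helper are made up, and the table only lists regions I've seen written by `foreground-emerc`.

```cpp
#include <cassert>

struct Vu1Region {
  const char* name;
  int start_qw;  // VU1 data address, in quadwords
  int size_qw;
};

// Per-effect shared data. Anything at 140 and above is per-fragment data (offset by TOP).
const Vu1Region kSharedRegions[] = {
    {"color-fade", 118, 1},
    {"envmap shader", 119, 5},
    {"vu-lights", 132, 7},
    {"merc-ctrl header values", 139, 1},
};
const int kFragDataStartQw = 140;

// Returns the shared region containing an address, or nullptr if there is none.
const Vu1Region* find_shared_region(int qw_addr) {
  assert(qw_addr >= 0 && qw_addr < 1024);  // VU1 data memory is 1024 quadwords
  for (const Vu1Region& r : kSharedRegions) {
    if (qw_addr >= r.start_qw && qw_addr < r.start_qw + r.size_qw) {
      return &r;
    }
  }
  return nullptr;
}
```

The two emerc-only entries are the first two: compared to merc, only 6 extra quadwords get sent per effect.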
EMERC VU1 constants
Triangle strip giftag - same as normal merc exactly (in the normal no-alpha case)
(set! (-> s5-0 tri-strip-gif tag)
(new 'static 'gif-tag64
:pre #x1
:prim (new 'static 'gs-prim :prim (gs-prim-type tri-strip) :iip #x1 :tme #x1 :fge #x1)
:nreg #x3
)
)
(set! (-> s5-0 tri-strip-gif regs)
(new 'static 'gif-tag-regs :regs0 (gif-reg-id st) :regs1 (gif-reg-id rgbaq) :regs2 (gif-reg-id xyzf2))
)
;; word 3 gets set to #x303e4000
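As a sanity check on the tag above, here's a sketch that packs the same fields by hand. It assumes the standard GIF tag layout (NLOOP in bits 0-14, EOP at 15, PRE at 46, PRIM in 47-57, FLG in 58-59, NREG in 60-63) and the standard GS PRIM bits (type in bits 0-2, then IIP, TME, FGE, ABE at bits 3 to 6). The function names are made up. One thing that falls out of the arithmetic: the upper 32 bits of this tag are `0x301e4000`, and the same tag with `abe` set would be `0x303e4000`, which is the value stored in word 3. My guess is that word 3 is a stashed alpha-blend variant of the tag's upper word, but I haven't confirmed how it's used.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPrimTriStrip = 4;  // GS primitive type for triangle strips

// Pack the GS PRIM fields used by this tag.
uint32_t pack_gs_prim(uint32_t prim_type, bool iip, bool tme, bool fge, bool abe) {
  assert(prim_type <= 7);
  return prim_type | (uint32_t(iip) << 3) | (uint32_t(tme) << 4) | (uint32_t(fge) << 5) |
         (uint32_t(abe) << 6);
}

// Pack the lower 64 bits of a GIF tag (the part called gif-tag64 above).
uint64_t pack_gif_tag64(uint32_t nloop, bool eop, bool pre, uint32_t prim, uint32_t flg,
                        uint32_t nreg) {
  assert(nloop <= 0x7fff && prim <= 0x7ff && flg <= 3 && nreg <= 15);
  return uint64_t(nloop) | (uint64_t(eop) << 15) | (uint64_t(pre) << 46) |
         (uint64_t(prim) << 47) | (uint64_t(flg) << 58) | (uint64_t(nreg) << 60);
}
```

NLOOP and EOP are left at zero here because the VU1 program fills them in per strip.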
Program list
- `0`: per-frame init
- `19`: effect init
- `29`: process frag
Memory map
All in 16-byte quadword addresses.
Low memory: after DMA
[0 ] : tri-strip-gif (st, rgbaq, xyzf2), no abe, 0x303e4000 in word 3, same as merc.
[1 ] : adgif-shader giftag (giftag for 5 a+d's)
[2 ] : hvdf-offset
[3 - 6] : perspective matrix (only perspective project, no rotation/translation). 3 gets set to persp_vector
[7 ] : fog (pfog0, fog-min, fog-max, 0.0)
[8 ] : unperspect (1/P(0, 0), 1/P(1, 1), 0.5, 1/P(2, 3))
Low memory: after inits (both frame and effect)
[0 ] : tri-strip-gif (st, rgbaq, xyzf2), no abe, 0x303e4000 in word 3, same as merc.
[1 ] : adgif-shader giftag (giftag for 5 a+d's)
[2 ] : hvdf-offset
[3 ] : P_mult = [low.P(0, 0), low.P(1, 1), low.P(2, 2), low.P(2, 3)]
[4 ] : P_add = [low.P(3, 0), low.P(3, 1), low.P(3, 2), low.P(3, 3)]
[5 ] : P_mult_scale = P_mult * header.xyz-scale
[7 ] : fog (pfog0, fog-min, fog-max, 0.0)
[8 ] : unperspect (1/P(0, 0), 1/P(1, 1), 0.5, 1/P(2, 3))
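The change between the two memory maps is small enough to write out. This sketch shows the rearrangement the init programs do, using `P[row][col]` for the matrix uploaded at addresses 3 to 6, matching the `P(row, col)` notation above. The type and function names are made up, and the assert on the scale is just a sanity check I added, not something the game does.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // x, y, z, w
using Mat4 = std::array<Vec4, 4>;   // P[row][col], one row per quadword

struct SplitPerspective {
  Vec4 p_mult;  // stored at address 3 by the per-frame init
  Vec4 p_add;   // stored at address 4 by the per-frame init
};

// Per-frame init: keep only the entries a perspective-only matrix actually uses.
SplitPerspective split_perspective(const Mat4& P) {
  SplitPerspective out;
  out.p_mult = Vec4{{P[0][0], P[1][1], P[2][2], P[2][3]}};
  out.p_add = P[3];
  return out;
}

// Per-effect init: fold the model's xyz-scale into P_mult (stored at address 5).
Vec4 scale_p_mult(const Vec4& p_mult, float xyz_scale) {
  assert(xyz_scale != 0.0f);  // a zero scale would collapse the model
  Vec4 out;
  for (int i = 0; i < 4; i++) {
    out[i] = p_mult[i] * xyz_scale;
  }
  return out;
}
```

Since `P_mult_scale` depends on the model's header, it has to be recomputed by the effect init, while `P_mult` and `P_add` only change per frame.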
Summary of math
The "transformed vertex" refers to the vertex before perspective divide, and pfog0 multiply. The "transformed normal" is the rotated normal, after normalization.
vf08 = transformed
vf23 = unperspect
vf14 = rgba-fade
vf24 = normal st
mul.xyzw vf09, vf08, vf23 ;; do unperspect
subw.z vf10, vf10, vf00 ;; subtract 1 from z
addw.z vf09, vf00, vf09 ;; xyww the unperspected thing
mul.xyz vf15, vf09, vf10 ;;
adday.xyzw vf15, vf15
maddz.x vf15, vf21, vf15
div Q, vf15.x, vf10.z
mulaw.xyzw ACC, vf09, vf00
mul.xyzw vf09, vf08, vf23
madd.xyzw vf10, vf10, Q
eleng.xyz P, vf10
mfp.w vf10, P
div Q, vf23.z, vf10.w
addaz.xyzw vf00, vf23
madd.xyzw vf10, vf10, Q
mulz.xy vf24, vf10, vf24 ;; mul tex by q
;; new rgba
sq.xyzw vf14, 443(vi10)
;;
vf24
VU1 Program: init (per frame)
lq.xyzw vf01, 7(vi00) | nop
lq.xyzw vf25, 3(vi00) | nop
lq.xyzw vf26, 4(vi00) | nop
lq.xyzw vf27, 5(vi00) | nop
lq.xyzw vf28, 6(vi00) | nop
lq.xyzw vf08, 8(vi00) | nop
mr32.xyzw vf01, vf01 | nop
move.y vf25, vf26 | nop
move.zw vf25, vf27 | nop
sq.xyzw vf25, 3(vi00) | nop
sq.xyzw vf08, 124(vi00) | nop
2048.0 | nop :i
255.0 | maxi.x vf17, vf00, I :i
-65537.0 | maxi.y vf17, vf00, I :i
mr32.xyzw vf02, vf01 | minii.z vf17, vf00, I
lq.xyzw vf22, 2(vi00) | minii.z vf18, vf00, I
0.003921569 | minii.z vf19, vf00, I :i
sq.xyzw vf28, 4(vi00) | minii.w vf29, vf00, I :e
mr32.xyzw vf03, vf02 | nop
Simplified code (`??`'s are either garbage, or some value that isn't important later on). Leaving out stores to low memory documented in the Memory Map section.
vf01 = [??, ??, ??, low.pfog0]
vf02 = [??, ??, ??, low.fog_min]
vf03 = [??, ??, ??, low.fog_max]
vf17 = [2048., 255., -65537., ??]
vf22 = low_in.hvdf_offset
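The three fog values end up in the w components because of the chain of `mr32` instructions, which rotates a vector by one component (x gets y, y gets z, z gets w, w gets x). A quick sketch of that, with made-up names and the fog quadword laid out as in the memory map:

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// VU mr32: dest.x = src.y, dest.y = src.z, dest.z = src.w, dest.w = src.x
Vec4 mr32(const Vec4& v) {
  return Vec4{{v[1], v[2], v[3], v[0]}};
}

struct FogRegs {
  float vf01_w;  // pfog0
  float vf02_w;  // fog-min
  float vf03_w;  // fog-max
};

// low_fog is the quadword at address 7: (pfog0, fog-min, fog-max, 0.0)
FogRegs unpack_fog(const Vec4& low_fog) {
  Vec4 vf01 = mr32(low_fog);
  Vec4 vf02 = mr32(vf01);
  Vec4 vf03 = mr32(vf02);
  assert(mr32(vf03) == low_fog);  // four rotations bring the vector back
  return FogRegs{vf01[3], vf02[3], vf03[3]};
}
```

Each rotation moves the next fog value into w, which is why the xyz parts of `vf01` to `vf03` are free to be overwritten with light directions in the per-effect init.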
VU1 Program: init (per effect)
Note that this continues directly into the per-frag program, to match the note in frag == 0 case in the dma generation part.
lq.xyzw vf25, 139(vi00) | nop
lq.xyzw vf26, 3(vi00) | nop
lq.xyz vf01, 132(vi00) | nop
lq.xyz vf02, 133(vi00) | nop
lq.xyz vf03, 134(vi00) | addy.xy vf19, vf00, vf25
lq.xyzw vf04, 135(vi00) | mulx.xyzw vf26, vf26, vf25
lq.xyzw vf05, 136(vi00) | nop
lq.xyzw vf06, 137(vi00) | nop
lq.xyzw vf07, 138(vi00) | nop
sq.xyzw vf26, 5(vi00) | nop ;; P_mult_scale store.
Simplified code (note: some of this stuff set later)
vf25 = [xyz-scale, st-magic, st-out-a, st-out-b];
vf26 = low.P_mult * xyz-scale;
vf01 = [lt0.xyz, pfog0]
vf02 = [lt1.xyz, fog-min]
vf03 = [lt2.xyz, fog-max]
vf19 = [st-magic, st-magic, -65537, xyz-add.z];
vf04 = lt0_color;
vf05 = lt1_color;
vf06 = lt2_color;
VU1 Program: per-fragment, pre-looping init
;; reg setup stuff
lq.xyzw vf28, 139(vi00) | minix.xyzw vf15, vf00, vf00 ;; vf28 = merc-ctrl-header, vf15 = [0, 0, 0, 0]
xtop vi15 | nop ;; vi15 = 0 (output buffer)
iaddiu vi12, vi15, 0x8c | nop ;; vi12 = xtop + 140 (merc-byte-header, u4)
nop | nop ;; in merc was a branch for st-a/st-b select.
ilwr.w vi03, vi12 | maxz.xy vf18, vf00, vf28 ;; set vf18.xy = [st-out-a, st-out-a] (for a buffer)
iaddiu vi15, vi00, 0x173 | nop ;; vi15 = 371 (base is vi00, so xtop is not added)
lq.xyzw vf14, 0(vi00) | nop ;; vf14 = tri-strip-gif-tag
nop | nop ;; in merc was fadeout
iadd vi03, vi03, vi12 | nop ;; st-output location = st-out-a + xtop + 140
ilwr.w vi09, vi03 | nop ;; vi09 = fp-header u8's [shader-cnt, kick-off, kick-step, hword-cnt]
lqi.xyzw vf27, vi03 | nop ;; vf27 = xyz-add
ilw.x vi04, 1(vi12) | nop ;; vi04 = mat1-cnt
iaddiu vi05, vi00, 0x7f | addw.xyz vf15, vf15, vf00 ;; vf15 = [1, 1, 1, 0], vi05 = 0x7f
iand vi09, vi09, vi05 | nop ;; mask to get vi09 = shader-cnt
ilw.y vi06, 1(vi12) | miniz.w vf19, vf00, vf27 ;; setup vf19, vi06 = mat2-cnt
nop | miniy.w vf18, vf00, vf27 ;; setup vf18, merc had branch for no strips.
ilwr.z vi01, vi12 | minix.w vf17, vf00, vf27 ;; vi01 = lump-off
;; vf17 = [2048, 255, -65537, xyz-add.x]
;; vf18 = [st-out-X, st-out-X, -65537, xyz-add.y] (X = a if xtop = 0, X = b otherwise)
;; vf19 = [st-magic, st-magic, -65537, xyz-add.z]
;; shader setup (not envmap)
lq.xyzw vf13, 1(vi00) | nop ;; vf13 = adgif gif tag.
ilwr.w vi02, vi03 | nop ;; vi02 = shader control word 0 (dest offset)
lqi.xyzw vf08, vi03 | nop ;; load shader data
lqi.xyzw vf09, vi03 | nop
lqi.xyzw vf10, vi03 | nop
lqi.xyzw vf11, vi03 | nop
lqi.xyzw vf12, vi03 | nop
iadd vi02, vi02, vi15 | nop ;; compute destination
mtir vi08, vf09.w | nop ;; eop stuff (not sure this makes sense in 1-shader emerc)
sqi.xyzw vf13, vi02 | nop ;; store adgif gif tag
sqi.xyzw vf08, vi02 | nop ;; shader store 1
sqi.xyzw vf09, vi02 | nop ;; shader store 2
mfir.x vf14, vi08 | nop ;; set eop bit in giftag template
sqi.xyzw vf10, vi02 | nop ;; shader store 3
sqi.xyzw vf11, vi02 | nop ;; shader store 4
sqi.xyzw vf12, vi02 | nop ;; shader store 5
sq.xyzw vf14, 0(vi02) | nop ;; store end giftag
;; matrix warmup
lq.xyzw vf28, 3(vi00) | nop ;; vf28 = persp-diag
ilw.y vi08, 3(vi12) | nop ;; vi08 = mat-slot.0
lq.xyzw vf16, 5(vi00) | nop ;; vf16 = scaled-persp-diag
lq.xyzw vf20, 4(vi00) | nop ;; vf20 = persp-off
ilw.z vi09, 3(vi12) | mul.xyzw vf27, vf28, vf15 ;; vf27 = [pdx, pdy, pdz, 0], vi09 = mat-slot.1
ior vi11, vi08, vi00 | mul.xyzw vf28, vf28, vf00 ;; vf28 = [0, 0, 0, pdw], vi11 = vi08 = mat-slot.0
ibeq vi00, vi08, L2 | mul.xyzw vf15, vf16, vf15 ;; vf15 = [spdx, spdy, spdz, 0], skip if slot = 0
iaddi vi13, vi12, 0x3 | mul.xyzw vf16, vf16, vf00 ;; vi13 = mat-slot-ptr, vf16 = [0, 0, 0, spdw]
- mostly same as merc
- always picks `st-a`, merc had a branch here based on state of `xtop`.
- no fade out flag stuff
Matrix multiply loop
Premultiplies uploaded matrices by perspective. Only does matrices that were uploaded this time. Same as merc, so skipping.
The rest of it
- Transformed vertex (before perspective divide and pfog0 multiply) is stored back over `lump[2]`
- Transformed normal is stored over `rgba`
L2: (L14 in og merc)
;; Pipelining Start for vertex transform
ilw.x vi02, 3(vi12) | nop ;; vi02 = perc-off
ibeq vi00, vi04, L13 | nop ;; goto L13 if mat1 count is 0
iadd vi01, vi01, vi12 | nop ;; vi01 = lump.
;; Pipelining start for matrix 1's
ilwr.x vi08, vi01 | nop ;; vi08 = lump[0].x = mat-0?
lqi.xyzw vf08, vi01 | nop
lqi.xyzw vf11, vi01 | nop
lqi.xyzw vf14, vi01 | nop ;; vf14 = lump[2] = [texs, text, nrmz, posz]
lq.xyz vf29, 4(vi08) | nop
lq.xyz vf30, 5(vi08) | add.zw vf08, vf08, vf17
lq.xyzw vf31, 6(vi08) | add.xyzw vf11, vf11, vf18
iaddi vi04, vi04, -0x1 | add.xyzw vf14, vf14, vf19
iadd vi02, vi02, vi12 | nop
lqi.xyzw vf24, vi02 | mulaz.xyzw ACC, vf29, vf08
mtir vi10, vf11.x | maddaz.xyzw ACC, vf30, vf11
mtir vi13, vf11.y | maddz.xyz vf11, vf31, vf14
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf26, 1(vi08) | itof0.xyzw vf24, vf24
lq.xyzw vf27, 2(vi08) | nop
erleng.xyz P, vf11 | nop
lq.xyzw vf28, 3(vi08) | mulaw.xyzw ACC, vf25, vf08
nop | maddaw.xyzw ACC, vf26, vf11 ;; modified from merc, no mercprime crap
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
lqi.xyzw vf09, vi01 | nop
ilwr.y vi03, vi12 | nop
ilw.z vi07, 1(vi12) | nop
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | nop
mtir vi08, vf09.x | nop ;; mercprime stuff in og.
;; CHANGE: transformed vf08 (pre perspective divide, pfog mult)
;; is stored back over lump[2] (texs, text, nrmz, posz)
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
iadd vi03, vi03, vi12 | nop
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
iadd vi04, vi04, vi03 | add.xyzw vf12, vf12, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf15, vf15, vf19
lq.xyz vf30, 5(vi08) | nop
iadd vi06, vi06, vi04 | nop
lq.xyzw vf31, 6(vi08) | nop
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf26, 1(vi08) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | nop
lq.xyzw vf27, 2(vi08) | nop
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22 ;; load rgba, hvdf offset
iadd vi07, vi07, vi06 | mulaz.xyzw ACC, vf29, vf09
lq.xyzw vf28, 3(vi08) | maddaz.xyzw ACC, vf30, vf12
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
nop | nop
1024.0 | miniw.w vf08, vf08, vf03 :i
nop | mulaw.xyzw ACC, vf25, vf09 ;; modified, no mercprime branch
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20 ;; scale normal by 1/length (vf20.w = P from erleng)
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I ;; like mercprime path (L82 in og merc)
3072.0 | nop :i
nop | minii.xy vf08, vf08, I
;; CHANGE: store normal back over RGBA.
sq.xyzw vf11, -1(vi03) | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
lqi.xyzw vf10, vi01 | mulax.xyzw ACC, vf01, vf11
ibne vi04, vi03, L4 | madday.xyzw ACC, vf02, vf11 ;; branch to L4, pipelined mat 1
nop | maddz.xyzw vf11, vf03, vf11
ibne vi06, vi03, L17 | nop
nop | nop
b L52 | nop
nop | nop
;; pipelined mat 1 loop start
L3: (L16 in og)
sq.xyzw vf11, -1(vi03) | nop ;; normal store back
3072.0 | mulax.xyzw ACC, vf01, vf11 :i ;; mercprime crap
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
;; pipelined mat 1 entry point
L4: (L17 in og)
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi08, vf10.x | itof0.xyzw vf23, vf23
ilw.y vi09, -9(vi01) | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf16, vf16, vf19
lq.xyz vf30, 5(vi08) | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L5 | madday.xyzw ACC, vf05, vf11
lq.xyzw vf31, 6(vi08) | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L5: (L18 in og)
lq.xyzw vf25, 0(vi08) | maddw.xyzw vf11, vf07, vf00
lq.xyzw vf26, 1(vi08) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
lq.xyzw vf27, 2(vi08) | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
ibne vi00, vi09, L6 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L6: (L19 in og)
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
lq.xyzw vf28, 3(vi08) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibne vi04, vi03, L7 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
ibne vi06, vi03, L22 | nop
ilw.y vi09, -6(vi01) | nop
ibne vi07, vi03, L57 | nop
nop | nop
b L67 | nop
nop | nop
L7: (L20 in og)
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi08, vf08.x | itof0.xyzw vf23, vf23
ilw.y vi09, -9(vi01) | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf14, vf14, vf19
lq.xyz vf30, 5(vi08) | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L8 | madday.xyzw ACC, vf05, vf12
lq.xyzw vf31, 6(vi08) | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L8: (L21 in og)
lq.xyzw vf25, 0(vi08) | maddw.xyzw vf12, vf07, vf00
lq.xyzw vf26, 1(vi08) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
lq.xyzw vf27, 2(vi08) | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
ibne vi00, vi09, L9 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L9: (L22 in og)
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
lq.xyzw vf28, 3(vi08) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibne vi04, vi03, L10 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
ibne vi06, vi03, L27 | nop
ilw.y vi09, -6(vi01) | nop
ibne vi07, vi03, L62 | nop
nop | nop
b L72 | nop
nop | nop
L10: (L23 in og)
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi08, vf09.x | itof0.xyzw vf23, vf23
ilw.y vi09, -9(vi01) | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf15, vf15, vf19
lq.xyz vf30, 5(vi08) | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L11 | madday.xyzw ACC, vf05, vf13
lq.xyzw vf31, 6(vi08) | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L11: (L24 in og)
lq.xyzw vf25, 0(vi08) | maddw.xyzw vf13, vf07, vf00
lq.xyzw vf26, 1(vi08) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
lq.xyzw vf27, 2(vi08) | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
ibne vi00, vi09, L12 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L12: (L25 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
lq.xyzw vf28, 3(vi08) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi04, vi03, L3 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
ibne vi06, vi03, L16 | nop
ilw.y vi09, -6(vi01) | nop
ibne vi07, vi03, L51 | nop
nop | nop
b L77 | nop
nop | nop
L13: (L26 in og merc)
;; pipeline startup for mat 2's (assuming you have no mat1's)
ibeq vi00, vi06, L47 | nop
iadd vi02, vi02, vi12 | nop
lqi.xyzw vf08, vi01 | nop
lqi.xyzw vf24, vi02 | nop
lqi.xyzw vf11, vi01 | nop
lqi.xyzw vf14, vi01 | nop
mtir vi10, vf08.x | nop
mtir vi13, vf08.y | itof0.xyzw vf24, vf24
iaddi vi06, vi06, -0x1 | add.zw vf08, vf08, vf17
nop | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
nop | mulw.xyzw vf24, vf24, vf29
iand vi13, vi13, vi05 | nop
lq.xyzw vf20, 0(vi10) | nop
lq.xyzw vf25, 0(vi13) | nop
lq.xyzw vf23, 1(vi10) | nop
lq.xyzw vf26, 1(vi13) | nop
lq.xyzw vf20, 2(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi13) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi13) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi13) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi13) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi13) | maddy.xyz vf29, vf29, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf23, vf24
mtir vi13, vf11.y | maddy.xyz vf30, vf30, vf24
nop | mulax.xyzw ACC, vf20, vf24
nop | maddy.xyzw vf31, vf31, vf24
nop | mulaz.xyzw ACC, vf29, vf08
nop | maddaz.xyzw ACC, vf30, vf11
nop | maddz.xyz vf11, vf31, vf14
nop | nop
nop | nop
nop | mulaw.xyzw ACC, vf25, vf08
nop | nop
erleng.xyz P, vf11 | nop
nop | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
lqi.xyzw vf09, vi01 | nop
ilwr.y vi03, vi12 | nop
ilw.z vi07, 1(vi12) | nop
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | nop
mtir vi11, vf09.x | nop
mtir vi14, vf09.y | nop
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
iadd vi03, vi03, vi12 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
iadd vi06, vi06, vi03 | nop
iadd vi07, vi07, vi06 | nop
iand vi14, vi14, vi05 | nop
ibne vi05, vi11, L14 | nop
iaddiu vi08, vi00, 0x23a | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | nop
b L15 | nop
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L14: (L28 in og)
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
lq.xyzw vf25, 0(vi14) | nop
lq.xyzw vf23, 1(vi11) | nop
lq.xyzw vf26, 1(vi14) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 2(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi14) | maddw.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi14) | maddw.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi14) | maddw.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi14) | maddw.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi14) | maddw.xyz vf29, vf29, vf24
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
mtir vi11, vf12.x | maddw.xyz vf30, vf30, vf24
mtir vi14, vf12.y | mulaz.xyzw ACC, vf20, vf24
iaddiu vi08, vi00, 0x18c | maddw.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L15: (L29 in og)
nop | mulaz.xyzw ACC, vf29, vf09
nop | maddaz.xyzw ACC, vf30, vf12
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
nop | nop
1024.0 | miniw.w vf08, vf08, vf03 :i
nop | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
3072.0 | nop :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
ibeq vi06, vi03, L50 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
lqi.xyzw vf10, vi01 | mulax.xyzw ACC, vf01, vf11
jr vi08 | madday.xyzw ACC, vf02, vf11
nop | maddz.xyzw vf11, vf03, vf11
L16: (L30 in og)
sq.xyzw vf11, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
L17: (L31 in og)
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi12, vf10.x | itof0.xyzw vf23, vf23
mtir vi15, vf10.y | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
iand vi12, vi12, vi05 | add.xyzw vf16, vf16, vf19
nop | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L18 | madday.xyzw ACC, vf05, vf11
iand vi15, vi15, vi05 | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L18: (L32 in og)
ibne vi05, vi12, L19 | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
b L20 | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
L19: (L33 in og)
lq.xyzw vf20, 0(vi12) | mul.xyzw vf15, vf15, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf25, 0(vi15) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi12) | mul.xyzw vf11, vf11, vf23
lq.xyzw vf26, 1(vi15) | add.xyzw vf09, vf09, vf22
lq.xyzw vf20, 2(vi12) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi15) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi12) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi15) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi12) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi15) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi12) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi15) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi12) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi15) | maddy.xyz vf29, vf29, vf24
mtir vi12, vf13.x | mulax.xyzw ACC, vf23, vf24
mtir vi15, vf13.y | maddy.xyz vf30, vf30, vf24
b L35 | mulax.xyzw ACC, vf20, vf24
lqi.xyzw vf23, vi03 | maddy.xyzw vf31, vf31, vf24
L20: (L34 in og)
ibgez vi09, L21 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L21: (L35 in og)
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
ilw.y vi09, -6(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibne vi06, vi03, L22 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
ibne vi07, vi03, L57 | nop
nop | nop
b L67 | nop
nop | nop
L22: (L36 in og)
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi10, vf08.x | itof0.xyzw vf23, vf23
mtir vi13, vf08.y | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
nop | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L23 | madday.xyzw ACC, vf05, vf12
iand vi13, vi13, vi05 | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L23: (L37 in og)
ibne vi05, vi10, L24 | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
b L25 | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
L24: (L38 in og)
lq.xyzw vf20, 0(vi10) | mul.xyzw vf16, vf16, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf25, 0(vi13) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi10) | mul.xyzw vf12, vf12, vf23
lq.xyzw vf26, 1(vi13) | add.xyzw vf10, vf10, vf22
lq.xyzw vf20, 2(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi13) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi13) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi13) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi13) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi13) | maddy.xyz vf29, vf29, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf23, vf24
mtir vi13, vf11.y | maddy.xyz vf30, vf30, vf24
b L40 | mulax.xyzw ACC, vf20, vf24
lqi.xyzw vf23, vi03 | maddy.xyzw vf31, vf31, vf24
L25: (L39 in og)
ibgez vi09, L26 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L26: (L40 in og)
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
ilw.y vi09, -6(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibne vi06, vi03, L27 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
ibne vi07, vi03, L62 | nop
nop | nop
b L72 | nop
nop | nop
L27: (L41 in og)
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi11, vf09.x | itof0.xyzw vf23, vf23
mtir vi14, vf09.y | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
nop | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L28 | madday.xyzw ACC, vf05, vf13
iand vi14, vi14, vi05 | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L28: (L42 in og)
ibne vi05, vi11, L29 | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
b L30 | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L29: (L43 in og)
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf25, 0(vi14) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi11) | mul.xyzw vf13, vf13, vf23
lq.xyzw vf26, 1(vi14) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 2(vi11) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi14) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi11) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi14) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi11) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi14) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi11) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi14) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi11) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi14) | maddy.xyz vf29, vf29, vf24
mtir vi11, vf12.x | mulax.xyzw ACC, vf23, vf24
mtir vi14, vf12.y | maddy.xyz vf30, vf30, vf24
b L45 | mulax.xyzw ACC, vf20, vf24
lqi.xyzw vf23, vi03 | maddy.xyzw vf31, vf31, vf24
L30: (L44 in og)
ibgez vi09, L31 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L31: (L45 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi06, vi03, L16 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
ibne vi07, vi03, L51 | nop
nop | nop
b L77 | nop
nop | nop
L32: (L46 in og)
sq.xyzw vf11, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi12, vf10.x | itof0.xyzw vf23, vf23
mtir vi15, vf10.y | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
iand vi12, vi12, vi05 | add.xyzw vf16, vf16, vf19
nop | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L33 | madday.xyzw ACC, vf05, vf11
iand vi15, vi15, vi05 | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L33: (L47 in og)
ibne vi05, vi12, L34 | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
b L35 | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
L34: (L48 in og)
lq.xyzw vf20, 0(vi12) | mul.xyzw vf15, vf15, Q
lq.xyzw vf25, 0(vi15) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi12) | mul.xyzw vf11, vf11, vf23
lq.xyzw vf26, 1(vi15) | add.xyzw vf09, vf09, vf22
lq.xyzw vf20, 2(vi12) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi15) | maddw.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi12) | mulaz.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi15) | maddw.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi12) | mulaz.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi15) | maddw.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi12) | mulaz.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi15) | maddw.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi12) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi15) | maddw.xyz vf29, vf29, vf24
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
mtir vi12, vf13.x | maddw.xyz vf30, vf30, vf24
mtir vi15, vf13.y | mulaz.xyzw ACC, vf20, vf24
b L20 | maddw.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L35: (L49 in og)
ibgez vi09, L36 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L36: (L50 in og)
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
ilw.y vi09, -6(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibne vi06, vi03, L37 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
ibne vi07, vi03, L57 | nop
nop | nop
b L67 | nop
nop | nop
L37: (L51 in og)
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi10, vf08.x | itof0.xyzw vf23, vf23
mtir vi13, vf08.y | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
nop | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L38 | madday.xyzw ACC, vf05, vf12
iand vi13, vi13, vi05 | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L38: (L52 in og)
ibne vi05, vi10, L39 | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
b L40 | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
L39: (L53 in og)
lq.xyzw vf20, 0(vi10) | mul.xyzw vf16, vf16, Q
lq.xyzw vf25, 0(vi13) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi10) | mul.xyzw vf12, vf12, vf23
lq.xyzw vf26, 1(vi13) | add.xyzw vf10, vf10, vf22
lq.xyzw vf20, 2(vi10) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi13) | maddw.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi10) | mulaz.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi13) | maddw.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi10) | mulaz.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi13) | maddw.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi10) | mulaz.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi13) | maddw.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi10) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi13) | maddw.xyz vf29, vf29, vf24
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
mtir vi10, vf11.x | maddw.xyz vf30, vf30, vf24
mtir vi13, vf11.y | mulaz.xyzw ACC, vf20, vf24
b L25 | maddw.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L40: (L54 in og)
ibgez vi09, L41 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L41: (L55 in og)
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
ilw.y vi09, -6(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibne vi06, vi03, L42 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
ibne vi07, vi03, L62 | nop
nop | nop
b L72 | nop
nop | nop
L42: (L56 in og)
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi11, vf09.x | itof0.xyzw vf23, vf23
mtir vi14, vf09.y | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
nop | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L43 | madday.xyzw ACC, vf05, vf13
iand vi14, vi14, vi05 | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L43: (L57 in og)
ibne vi05, vi11, L44 | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
b L45 | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L44: (L58 in og)
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
lq.xyzw vf25, 0(vi14) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi11) | mul.xyzw vf13, vf13, vf23
lq.xyzw vf26, 1(vi14) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 2(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi14) | maddw.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi14) | maddw.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi14) | maddw.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi14) | maddw.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi14) | maddw.xyz vf29, vf29, vf24
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
mtir vi11, vf12.x | maddw.xyz vf30, vf30, vf24
mtir vi14, vf12.y | mulaz.xyzw ACC, vf20, vf24
b L30 | maddw.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L45: (L59 in og)
ibgez vi09, L46 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L46: (L60 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi06, vi03, L32 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
ibne vi07, vi03, L57 | nop
nop | nop
b L77 | nop
nop | nop
;; mat 3
L47:
lqi.xyzw vf08, vi01 | nop
lqi.xyzw vf24, vi02 | nop
lqi.xyzw vf11, vi01 | nop
lqi.xyzw vf14, vi01 | nop
mtir vi10, vf08.x | nop
mtir vi13, vf08.y | itof0.xyzw vf24, vf24
nop | add.zw vf08, vf08, vf17
nop | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
ilw.w vi08, -1(vi02) | mulw.xyzw vf24, vf24, vf29
iand vi13, vi13, vi05 | nop
lq.xyzw vf20, 0(vi10) | nop
lq.xyzw vf31, 0(vi13) | nop
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf23, 1(vi10) | nop
lq.xyzw vf20, 1(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi10) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi10) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi10) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi10) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi10) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf20, vf24
mtir vi13, vf11.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
nop | itof0.xyzw vf24, vf23
nop | mulaz.xyzw ACC, vf29, vf08
nop | maddaz.xyzw ACC, vf30, vf11
nop | maddz.xyz vf11, vf31, vf14
nop | nop
nop | nop
nop | mulaw.xyzw ACC, vf25, vf08
nop | nop
erleng.xyz P, vf11 | nop
nop | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
lqi.xyzw vf09, vi01 | nop
ilwr.y vi03, vi12 | nop
ilw.z vi07, 1(vi12) | nop
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | nop
mtir vi11, vf09.x | nop
mtir vi14, vf09.y | nop
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
iadd vi03, vi03, vi12 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
ilw.w vi08, -1(vi02) | nop
iadd vi07, vi07, vi03 | nop
iand vi14, vi14, vi05 | nop
ibne vi05, vi11, L48 | nop
iaddi vi07, vi07, -0x1 | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | nop
b L49 | nop
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L48:
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi14) | nop
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf23, 1(vi11) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 1(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi11) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi11) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi11) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi11) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi11) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi11, vf12.x | mulax.xyzw ACC, vf20, vf24
mtir vi14, vf12.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L49:
nop | mulaz.xyzw ACC, vf29, vf09
nop | maddaz.xyzw ACC, vf30, vf12
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
nop | nop
1024.0 | miniw.w vf08, vf08, vf03 :i
nop | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
3072.0 | nop :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
nop | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
L50:
lqi.xyzw vf10, vi01 | mulax.xyzw ACC, vf01, vf11
b L52 | madday.xyzw ACC, vf02, vf11
nop | maddz.xyzw vf11, vf03, vf11
L51:
sq.xyzw vf11, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
L52:
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi12, vf10.x | itof0.xyzw vf23, vf23
mtir vi15, vf10.y | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
iand vi12, vi12, vi05 | add.xyzw vf16, vf16, vf19
ilw.w vi08, -1(vi02) | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L53 | madday.xyzw ACC, vf05, vf11
iand vi15, vi15, vi05 | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L53:
ibne vi05, vi12, L54 | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
b L55 | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
L54:
lq.xyzw vf20, 0(vi12) | mul.xyzw vf15, vf15, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi15) | ftoi4.xyzw vf21, vf21
lq.xyzw vf25, 0(vi08) | mul.xyzw vf11, vf11, vf23
lq.xyzw vf23, 1(vi12) | add.xyzw vf09, vf09, vf22
lq.xyzw vf20, 1(vi15) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi12) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi15) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi12) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi15) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi12) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi15) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi12) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi15) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi12) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi15) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi12, vf13.x | mulax.xyzw ACC, vf20, vf24
mtir vi15, vf13.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L55: ;; (L70 in og)
ibgez vi09, L56 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L56:
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
ilw.y vi09, -6(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibeq vi07, vi03, L67 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
L57:
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi10, vf08.x | itof0.xyzw vf23, vf23
mtir vi13, vf08.y | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
ilw.w vi08, -1(vi02) | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L58 | madday.xyzw ACC, vf05, vf12
iand vi13, vi13, vi05 | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L58:
ibne vi05, vi10, L59 | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
b L60 | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
L59:
lq.xyzw vf20, 0(vi10) | mul.xyzw vf16, vf16, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi13) | ftoi4.xyzw vf21, vf21
lq.xyzw vf25, 0(vi08) | mul.xyzw vf12, vf12, vf23
lq.xyzw vf23, 1(vi10) | add.xyzw vf10, vf10, vf22
lq.xyzw vf20, 1(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi10) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi10) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi10) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi10) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi10) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf20, vf24
mtir vi13, vf11.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L60:
ibgez vi09, L61 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L61:
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
ilw.y vi09, -6(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibeq vi07, vi03, L72 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
L62:
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi11, vf09.x | itof0.xyzw vf23, vf23
mtir vi14, vf09.y | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
ilw.w vi08, -1(vi02) | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L63 | madday.xyzw ACC, vf05, vf13
iand vi14, vi14, vi05 | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L63:
ibne vi05, vi11, L64 | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
b L65 | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L64:
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi14) | ftoi4.xyzw vf21, vf21
lq.xyzw vf25, 0(vi08) | mul.xyzw vf13, vf13, vf23
lq.xyzw vf23, 1(vi11) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 1(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi11) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi11) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi11) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi11) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi11) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi11, vf12.x | mulax.xyzw ACC, vf20, vf24
mtir vi14, vf12.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L65:
ibgez vi09, L66 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L66: ;; (L80 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi07, vi03, L51 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
b L77 | nop
nop | nop
;;;;;;;;;;; OG merc has a bunch of merc prime alternate paths here.
;;;; next we have 3x pipeline exits.
;;
L67:
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
sq.xyzw vf12, -1(vi03) | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
iaddiu vi05, vi00, 0x173 | add.xyzw vf10, vf10, vf28
lq.xyzw vf26, 1(vi00) | maxw.w vf09, vf09, vf02
iaddi vi08, vi00, 0x1 | itof0.xyzw vf23, vf23
isw.x vi08, -2(vi05) | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -1(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | nop
move.xyzw vf21, vf09 | nop
iaddiu vi08, vi00, 0x42 | nop
isw.z vi08, -1(vi05) | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L68 | madday.xyzw ACC, vf05, vf12
isw.x vi00, -1(vi05) | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L68:
sq.yzw vf26, -2(vi05) | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -6(vi01) | mul.xyz vf10, vf10, Q
iaddiu vi08, vi00, 0x171 | mul.xyzw vf16, vf16, Q
nop | ftoi4.xyzw vf21, vf21
nop | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
ibgez vi09, L69 | nop
sq.xyzw vf21, 2(vi11) | nop
nop | ftoi4.xyzw vf21, vf09
L69:
mfp.w vf20, P | nop
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | nop
ilw.y vi09, -3(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
nop | maxi.xy vf10, vf10, I
nop | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
sq.xyzw vf13, -1(vi03) | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
nop | nop
nop | maxw.w vf10, vf10, vf02
nop | itof0.xyzw vf23, vf23
nop | maxx.xyzw vf13, vf13, vf00
nop | nop
move.xyzw vf21, vf10 | nop
nop | nop
nop | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L70 | madday.xyzw ACC, vf05, vf13
nop | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L70:
nop | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -3(vi01) | nop
xtop vi05 | nop
iaddiu vi05, vi05, 0x8c | ftoi4.xyzw vf21, vf21
ilwr.z vi01, vi05 | mul.xyzw vf13, vf13, vf23
ilwr.y vi03, vi05 | nop
ibgez vi09, L71 | nop
sq.xyzw vf21, 2(vi12) | nop
nop | ftoi4.xyzw vf21, vf10
L71:
nop | nop
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | nop
sq.xyzw vf21, 2(vi15) | nop
nop | nop
nop | ftoi0.xyzw vf13, vf13
lq.xyzw vf23, 124(vi00) | nop
iadd vi01, vi01, vi05 | nop
iadd vi03, vi03, vi05 | nop
sq.xyzw vf13, 1(vi12) | nop
b L82 | nop
sq.xyzw vf13, 1(vi15) | nop
L72:
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
sq.xyzw vf13, -1(vi03) | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
iaddiu vi05, vi00, 0x173 | add.xyzw vf08, vf08, vf28
lq.xyzw vf26, 1(vi00) | maxw.w vf10, vf10, vf02
iaddi vi08, vi00, 0x1 | itof0.xyzw vf23, vf23
isw.x vi08, -2(vi05) | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -1(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | nop
move.xyzw vf21, vf10 | nop
iaddiu vi08, vi00, 0x42 | nop
isw.z vi08, -1(vi05) | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L73 | madday.xyzw ACC, vf05, vf13
isw.x vi00, -1(vi05) | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L73:
sq.yzw vf26, -2(vi05) | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -6(vi01) | mul.xyz vf08, vf08, Q
iaddiu vi08, vi00, 0x171 | mul.xyzw vf14, vf14, Q
nop | ftoi4.xyzw vf21, vf21
nop | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
ibgez vi09, L74 | nop
sq.xyzw vf21, 2(vi12) | nop
nop | ftoi4.xyzw vf21, vf10
L74:
mfp.w vf20, P | nop
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | nop
ilw.y vi09, -3(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
nop | maxi.xy vf08, vf08, I
nop | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
nop | nop
nop | maxw.w vf08, vf08, vf02
nop | itof0.xyzw vf23, vf23
nop | maxx.xyzw vf11, vf11, vf00
nop | nop
move.xyzw vf21, vf08 | nop
nop | nop
nop | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L75 | madday.xyzw ACC, vf05, vf11
nop | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L75:
nop | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -3(vi01) | nop
xtop vi05 | nop
iaddiu vi05, vi05, 0x8c | ftoi4.xyzw vf21, vf21
ilwr.z vi01, vi05 | mul.xyzw vf11, vf11, vf23
ilwr.y vi03, vi05 | nop
ibgez vi09, L76 | nop
sq.xyzw vf21, 2(vi10) | nop
nop | ftoi4.xyzw vf21, vf08
L76:
nop | nop
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | nop
sq.xyzw vf21, 2(vi13) | nop
nop | nop
nop | ftoi0.xyzw vf11, vf11
lq.xyzw vf23, 124(vi00) | nop
iadd vi01, vi01, vi05 | nop
iadd vi03, vi03, vi05 | nop
sq.xyzw vf11, 1(vi10) | nop
b L82 | nop
sq.xyzw vf11, 1(vi13) | nop
L77:
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
iaddiu vi05, vi00, 0x173 | add.xyzw vf09, vf09, vf28
lq.xyzw vf26, 1(vi00) | maxw.w vf08, vf08, vf02
iaddi vi08, vi00, 0x1 | itof0.xyzw vf23, vf23
isw.x vi08, -2(vi05) | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -1(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | nop
move.xyzw vf21, vf08 | nop
iaddiu vi08, vi00, 0x42 | nop
isw.z vi08, -1(vi05) | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L78 | madday.xyzw ACC, vf05, vf11
isw.x vi00, -1(vi05) | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L78:
sq.yzw vf26, -2(vi05) | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -6(vi01) | mul.xyz vf09, vf09, Q
iaddiu vi08, vi00, 0x171 | mul.xyzw vf15, vf15, Q ;; vi08 = 0x171: output location (fixed?)
nop | ftoi4.xyzw vf21, vf21
nop | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
ibgez vi09, L79 | nop
sq.xyzw vf21, 2(vi10) | nop
nop | ftoi4.xyzw vf21, vf08
L79:
mfp.w vf20, P | nop
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | nop
ilw.y vi09, -3(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
nop | maxi.xy vf09, vf09, I
nop | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
sq.xyzw vf12, -1(vi03) | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
nop | nop
nop | maxw.w vf09, vf09, vf02
nop | itof0.xyzw vf23, vf23
nop | maxx.xyzw vf12, vf12, vf00
nop | nop
move.xyzw vf21, vf09 | nop
nop | nop
nop | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L80 | madday.xyzw ACC, vf05, vf12
nop | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L80:
nop | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -3(vi01) | nop
xtop vi05 | nop
iaddiu vi05, vi05, 0x8c | ftoi4.xyzw vf21, vf21 ;; vi05 = byte-header
ilwr.z vi01, vi05 | mul.xyzw vf12, vf12, vf23 ;; vi01 = lump
ilwr.y vi03, vi05 | nop ;; vi03 = rgba
ibgez vi09, L81 | nop
sq.xyzw vf21, 2(vi11) | nop
nop | ftoi4.xyzw vf21, vf09
L81:
nop | nop
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | nop
sq.xyzw vf21, 2(vi14) | nop
nop | nop
nop | ftoi0.xyzw vf12, vf12
lq.xyzw vf23, 124(vi00) | nop ;; unperspect
iadd vi01, vi01, vi05 | nop ;; lump
iadd vi03, vi03, vi05 | nop ;; rgba
sq.xyzw vf12, 1(vi11) | nop
sq.xyzw vf12, 1(vi14) | nop
;; COMMON finish part
L82:
xgkick vi08 | nop ;; normal draw?
;; pipeline startup for envmap math
lq.xyzw vf08, 2(vi01) | nop ;; vf08 = transformed vert
lqi.xyzw vf10, vi03 | nop ;; vf10 = transformed normal
ilw.x vi04, 1(vi05) | nop ;; vi04 = mat1-cnt
ilw.y vi06, 1(vi05) | nop ;; vi06 = mat2-cnt
ilw.z vi07, 1(vi05) | mul.xyzw vf09, vf08, vf23 ;; vi07 = mat3-cnt, unperspect the vert
iadd vi04, vi04, vi06 | subw.z vf10, vf10, vf00 ;; vi04 = mat1-cnt + mat2-cnt, refl1
iaddi vi01, vi01, 0x3 | nop ;; step lump
iadd vi04, vi04, vi07 | nop ;; vi04 = mat1 + mat2 + mat3 counts
iadd vi02, vi03, vi04 | addw.z vf09, vf00, vf09 ;; vi02 = end rgba, vert1
iaddi vi02, vi02, 0x2 | nop ;; end rgba more
lq.xyzw vf14, 118(vi00) | maxw.xyzw vf21, vf00, vf00 ;; vf14 = rgba-fade, vf21 = [1, 1, 1, 1]
lq.xyzw vf26, 371(vi00) | nop ;; vf26 = the giftag
nop | mul.xyz vf15, vf09, vf10 ;; multiply
lq.xyzw vf27, 119(vi00) | nop ;; vf27 = e-adgif0
nop | nop
lq.xyzw vf28, 120(vi00) | nop ;; vf28 = e-adgif1
nop | adday.xyzw vf15, vf15
lq.xyzw vf31, 121(vi00) | maddz.x vf15, vf21, vf15 ;; vf31 = e-adgif2
nop | nop
sq.xyzw vf26, 813(vi00) | nop ;; store giftag
lq.xyzw vf08, 2(vi01) | nop ;; pipe
lqi.xyzw vf11, vi03 | nop ;; pipe
div Q, vf15.x, vf10.z | nop ;; div
sq.xyzw vf27, 814(vi00) | mulaw.xyzw ACC, vf09, vf00 ;; store e-ad0, mul
nop | mul.xyzw vf09, vf08, vf23 ;; pipe
sq.xyzw vf28, 815(vi00) | subw.z vf11, vf11, vf00 ;; store e-ad1, pipe
iaddi vi01, vi01, 0x3 | nop ;; pipe
sq.xyzw vf31, 816(vi00) | nop ;; store e-ad2
nop | addw.z vf09, vf00, vf09 ;; pipe
lq.xyzw vf26, 0(vi00) | madd.xyzw vf10, vf10, Q ;; vf26 = tristrip giftag, madd
nop | nop
lq.xyzw vf27, 122(vi00) | nop ;; vf27 = e-ad3
nop | mul.xyz vf15, vf09, vf11 ;; pipe
eleng.xyz P, vf10 | nop ;; len
lq.xyzw vf28, 123(vi00) | nop ;; vf28 = e-ad4
nop | nop
lq.xyzw vf31, 377(vi00) | adday.xyzw vf15, vf15 ;; vf31 = old tristrip???
nop | maddz.x vf15, vf21, vf15 ;; pipe
mr32.xyzw vf26, vf26 | nop ;; rotate tristrip template
nop | nop
lq.xyzw vf08, 2(vi01) | nop ;; pipe
lqi.xyzw vf12, vi03 | nop ;; pipe
div Q, vf15.x, vf11.z | nop ;; pipe
mr32.xyzw vf26, vf26 | mulaw.xyzw ACC, vf09, vf00 ;; rotate | pipe
sq.xyzw vf27, 817(vi00) | mul.xyzw vf09, vf08, vf23 ;; store adgif3 | pipe
lq.xyzw vf25, -5(vi01) | subw.z vf12, vf12, vf00 ;; vf25 = lump[1] | pipe
iaddi vi01, vi01, 0x3 | nop ;; pipe
sq.xyzw vf28, 818(vi00) | nop ;; e-ad4 store
nop | addw.z vf09, vf00, vf09 ;; pipe
sq.xyzw vf31, 819(vi00) | madd.xyzw vf11, vf11, Q ;; tristrip store | pipe
nop | nop
mfp.w vf10, P | nop
sq.y vf26, 819(vi00) | mul.xyz vf15, vf09, vf12 ;; set abe | pipe
eleng.xyz P, vf11 | nop
nop | nop
div Q, vf23.z, vf10.w | nop ;; NOT PIPE (!)
nop | adday.xyzw vf15, vf15 ;; pipe
nop | maddz.x vf15, vf21, vf15 ;; pipe
nop | nop
nop | add.xyzw vf25, vf25, vf18 ;; lump dest stuff
L83:
lq.xyzw vf08, 2(vi01) | nop ;; pipe
lqi.xyzw vf13, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf12.z | madd.xyzw vf10, vf10, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf13, vf13, vf00
;;
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf12, vf12, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf11, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf13
eleng.xyz P, vf12 | mulz.xy vf24, vf10, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf11.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibeq vi02, vi03, L84 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
lq.xyzw vf08, 2(vi01) | nop
lqi.xyzw vf10, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf13.z | madd.xyzw vf11, vf11, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf10, vf10, vf00
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf13, vf13, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf12, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf10
eleng.xyz P, vf13 | mulz.xy vf24, vf11, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf12.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibeq vi02, vi03, L84 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
lq.xyzw vf08, 2(vi01) | nop
lqi.xyzw vf11, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf10.z | madd.xyzw vf12, vf12, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf11, vf11, vf00
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf10, vf10, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf13, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf11
eleng.xyz P, vf10 | mulz.xy vf24, vf12, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf13.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibeq vi02, vi03, L84 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
lq.xyzw vf08, 2(vi01) | nop
lqi.xyzw vf12, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf11.z | madd.xyzw vf13, vf13, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf12, vf12, vf00
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf11, vf11, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf10, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf12
eleng.xyz P, vf11 | mulz.xy vf24, vf13, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf10.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibne vi02, vi03, L83 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
L84:
iaddiu vi08, vi00, 0x32d | nop
xgkick vi08 | nop
nop | nop :e
nop | nop