Lines Matching full:cyc
177 // if so adds 2 cyc to latency, 1 uop, 1 res cycle for A57UnitB
254 // +2cyc for branch forms
385 // For "Load, register offset, minus" we need +1cyc, +1I
419 // Load, immed pre-indexed (4 cyc for load result, 1 cyc for Base update)
423 // Load, register pre-indexed (4 cyc for load result, 2 cyc for Base update)
424 // (5 cyc load result for not-lsl2 scaled)
441 // LDRD pre-indexed: 5(2) cyc for reg, 4(1) cyc for imm.
489 // LDRD post-indexed: 4(2) cyc for reg, 4(1) cyc for imm.
501 // 5cyc "I0/I1,L" for minus reg or scaled not plus lsl2
502 // otherwise 4cyc "L"
591 // TODO: no writeback latency defined in documentation (implemented as 1 cyc)
604 // For minus or for not plus lsl2 scaled we need 3cyc "I0/I1, S",
605 // otherwise 1cyc S.
613 // STRH,STRD: 3cyc "I0/I1, S" for minus reg, 1cyc S for imm or for plus reg.
625 // Store, immed pre-indexed (1cyc "S, I0/I1", 1cyc writeback)
723 // fp compare - 3cyc F1 for unconditional, 6cyc "F0/F1, F1" for conditional
764 // FP multiply accumulate, FZ: 9cyc "F0/F1" or 4 cyc for sequenced accumulate
768 // VFMA takes 9 cyc for common case and 4 cyc for VFMA->VFMA chain (5 read adv.)
769 // VMUL takes 5 cyc for common case and 1 cyc for VMUL->VFMA chain (4 read adv.)
794 // VMOV: 3cyc "F0/F1" for imm/reg
800 // 5cyc L for FP transfer, vfp to core reg,
801 // 5cyc L for FP transfer, core reg to vfp
806 // 8cyc "L,F0/F1" for FP transfer, core reg to upper or lower half of vfp D-reg
970 // ASIMD absolute diff, 3cyc F0/F1 for integer VABD
988 // ASIMD absolute diff long: 3cyc F0/F1 for VABDL
1013 // ASIMD multiply, D-form: 5cyc F0 for r0px, 4cyc F0 for r1p0 and later
1023 // ASIMD multiply, Q-form: 6cyc F0 for r0px, 5cyc F0 for r1p0 and later
1032 // 5cyc F0 for r0px, 4cyc F0 for r1p0 and later, 1cyc for accumulate sequence
1045 // 6cyc F0 for r0px, 5cyc F0 for r1p0 and later, 2cyc for accumulate sequence
1058 // 5cyc F0 for r0px, 4cyc F0 for r1p0 and later, 1cyc for accumulate sequence
1071 // 5cyc F0 for r0px, 4cyc F0 for r1p0 and later, 2cyc for accumulate sequence
1089 // 5cyc F0 for r0px, 4cyc F0 for r1p0 and later
1097 // 4cyc F1, 1cyc for accumulate sequence (3cyc ReadAdvance)
1103 // 4cyc F1, 1cyc for accumulate sequence (3cyc ReadAdvance)
1164 // ASIMD FP convert, half-precision: 8cyc F0/F1
1179 // ASIMD FP multiply accumulate: 9cyc F0/F1, 4cyc for accumulate sequence
1201 // ASIMD duplicate, core reg: 8cyc "L, F0/F1"
1204 // ASIMD duplicate, scalar: 3cyc "F0/F1"
1233 // ASIMD transfer, scalar to core reg: 6cyc "L, I0/I1"
1236 // ASIMD transfer, core reg to scalar: 8cyc "L, F0/F1"
1262 // 1-2 reg: 5cyc L, +I for writeback, 1 cyc wb latency
1267 // 3-4 reg: 6cyc L, +I for writeback, 1 cyc wb latency
1274 // ASIMD load, 1 element, one lane and all lanes: 8cyc "L, F0/F1"
1280 // ASIMD load, 2 element, multiple, 2 reg: 8cyc "L, F0/F1"
1286 // ASIMD load, 2 element, multiple, 4 reg: 9cyc "L, F0/F1"
1291 // ASIMD load, 2 element, one lane and all lanes: 8cyc "L, F0/F1"
1303 // ASIMD load, 3 element, multiple, 3 reg: 9cyc "L, F0/F1"
1318 // ASIMD load, 3 element, one lane, size 32: 8cyc "L, F0/F1"
1328 // ASIMD load, 3 element, one lane, size 8/16: 9cyc "L, F0/F1"
1338 // ASIMD load, 3 element, all lanes: 8cyc "L, F0/F1"
1348 // ASIMD load, 4 element, multiple, 4 reg: 9cyc "L, F0/F1"
1360 // ASIMD load, 4 element, one lane, size 32: 8cyc "L, F0/F1"
1372 // ASIMD load, 4 element, one lane, size 8/16: 9cyc "L, F0/F1"
1384 // ASIMD load, 4 element, all lanes: 8cyc "L, F0/F1"
1398 // ASIMD store, 1 element, multiple, 1 reg: 1cyc S
1402 // ASIMD store, 1 element, multiple, 2 reg: 2cyc S
1406 // ASIMD store, 1 element, multiple, 3 reg: 3cyc S
1411 // ASIMD store, 1 element, multiple, 4 reg: 4cyc S
1416 // ASIMD store, 1 element, one lane: 3cyc "F0/F1, S"
1421 // ASIMD store, 2 element, multiple, 2 reg: 3cyc "F0/F1, S"
1426 // ASIMD store, 2 element, multiple, 4 reg: 4cyc "F0/F1, S"
1431 // ASIMD store, 2 element, one lane: 3cyc "F0/F1, S"
1464 // AESD, AESE, AESIMC, AESMC: 3cyc F0
1466 // Crypto polynomial (64x64) multiply long (VMULL.P64): 3cyc F0
1468 // Crypto SHA1 xor ops: 6cyc F0/F1
1470 // Crypto SHA1 fast ops: 3cyc F0
1472 // Crypto SHA1 slow ops: 6cyc F0
1474 // Crypto SHA256 fast ops: 3cyc F0
1476 // Crypto SHA256 slow ops: 6cyc F0