[QST] How to make_tiled_copy for C layout after permutation? #3147

@zhang110112110

How can I use make_tiled_copy to copy from registers to global memory for this layout? M = 32, N = 64, 128 threads.
(row 0) t0v0...v7, t1v0...v7, t2v0...v7, t3v0...v7, t64v0...v7, t65v0...v7, t66v0...v7, t67v0...v7 (fp16)
t4v0...v7, t5v0...v7, t6v0...v7, t7v0...v7, t68v0...v7, t69v0...v7, t70v0...v7, t71v0...v7
...
(row 8) t0v8...v15, t1v8...v15, t2v8...v15, t3v8...v15, t64v8...v15, t65v8...v15, t66v8...v15, t67v8...v15
t4v8...v15, t5v8...v15, t6v8...v15, t7v8...v15, t68v8...v15, t69v8...v15, t70v8...v15, t71v8...v15
...
(row 16) t32v0...v7, t33v0...v7, t34v0...v7, t35v0...v7, t96v0...v7, t97v0...v7, t98v0...v7, t99v0...v7 (fp16)
...
(row 31) t60v8...v15, t61v8...v15, t62v8...v15, t63v8...v15, t124v8...v15, t125v8...v15, t126v8...v15, t127v8...v15
Permutation layout: [image]

I have already transposed v2v3, so each thread now holds 128 bits (8 fp16 values) of contiguous register data. How can I use make_tiled_copy to write these contiguous 128-bit chunks to global memory for this layout?
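
To make the layout above concrete, here is a minimal sketch in plain C++ (no CuTe dependency) of the thread/value to (row, col) mapping that the listed rows appear to imply. The mapping is an assumption: it is inferred from the rows shown (0, 1, 8, 9, 16, 31), and the elided rows are assumed to follow the same pattern. The names `tv_to_coord`, `Coord`, and `covers_tile_exactly_once` are hypothetical helpers for illustration, not part of CUTLASS or CuTe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Tile and partition sizes taken from the question.
constexpr int kM = 32;              // rows
constexpr int kN = 64;              // columns
constexpr int kThreads = 128;
constexpr int kValsPerThread = 16;  // fp16 values per thread

struct Coord {
    int row;
    int col;
};

// ASSUMED mapping, inferred from the rows listed in the question:
//   t % 4        selects one of 4 column blocks of 8 values      (col += 8 * ...)
//   (t / 4) % 8  selects the row inside an 8-row group           (row += ...)
//   (t / 32) % 2 selects the upper or lower 16-row half          (row += 16 * ...)
//   t / 64       selects the left or right 32-column half        (col += 32 * ...)
//   v % 8        is the column inside the 8-value chunk          (col += ...)
//   v / 8        selects the second 8-row group (v8...v15)       (row += 8 * ...)
constexpr Coord tv_to_coord(int t, int v) {
    return Coord{(t / 4) % 8 + 8 * (v / 8) + 16 * ((t / 32) % 2),
                 (v % 8) + 8 * (t % 4) + 32 * (t / 64)};
}

// Spot checks against entries that are written out in the question.
static_assert(tv_to_coord(64, 0).row == 0 && tv_to_coord(64, 0).col == 32, "row 0, t64");
static_assert(tv_to_coord(4, 0).row == 1 && tv_to_coord(4, 0).col == 0, "second listed row, t4");
static_assert(tv_to_coord(0, 8).row == 8 && tv_to_coord(0, 8).col == 0, "row 8, t0 v8");
static_assert(tv_to_coord(32, 0).row == 16 && tv_to_coord(32, 0).col == 0, "row 16, t32");
static_assert(tv_to_coord(127, 15).row == 31 && tv_to_coord(127, 15).col == 63, "row 31, t127 v15");

// Returns true if every element of the 32x64 tile is owned by exactly one
// (thread, value) pair under the assumed mapping.
inline bool covers_tile_exactly_once() {
    std::array<int, kM * kN> hits{};
    for (int t = 0; t < kThreads; ++t) {
        for (int v = 0; v < kValsPerThread; ++v) {
            const Coord c = tv_to_coord(t, v);
            assert(c.row >= 0 && c.row < kM && c.col >= 0 && c.col < kN);
            ++hits[static_cast<std::size_t>(c.row * kN + c.col)];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

Under this assumed mapping, v0...v7 (and likewise v8...v15) of a thread land in 8 consecutive columns of one row, i.e. 16 bytes of fp16, which is what one 128-bit store per chunk would need. Expressed as a row-major offset `row * 64 + col`, the same mapping would read as thread shape (4,8,2,2) with strides (8,64,1024,32) and value shape (8,2) with strides (1,512). Whether and how this can be passed to make_tiled_copy is not settled by this sketch.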
