DragGAN

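DragGAN is a new AI model created by Google, the Massachusetts Institute of Technology, and the Max Planck Institute.

Simple interactions such as clicking and dragging are enough to change a subject's pose, shape, and expression.

DragGAN changes the traditional Photoshop workflow: the user drags from a start point to an end point, and the AI automatically generates and completes the image to match the change.

DragGAN handles a wide variety of image types. Whether the task is adjusting a human expression or altering a natural landscape, the edit completes almost instantly.

The full DragGAN pipeline consists of a Generator-based forward pass and a backpropagation process. This article covers the complete process of adapting the forward pass of the DragGAN model to TPU-MLIR.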

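Model porting

Locating the inference code and exporting the model

The port uses the model code from XingangPan/DragGAN: Official Code for DragGAN (SIGGRAPH 2023) (github.com). The model entry point is at DragGAN/viz/renderer.py:357. The gen_shell tool provided by TPU-MLIR can be imported right there to trace the model and generate the workspace folder, the onnx/pt models, and a default conversion script: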

from utils.gen_shell import generate
generate(
    "DragGan",
    G,
    dict(
        ws=ws,
        c=label
    ),
    "../draggan_workspace",
)

Run the script given in the source README.md, python visualizer_drag_gradio.py. After it runs successfully, the following directory structure appears at the same directory level:

draggan_workspace
├── cali_data
│   └── data.npz
├── convert.sh
├── DragGan.onnx
├── DragGan.pt
├── data.npz
└── cali_data

Analyzing and resolving errors during porting

RuntimeError: Op not support:{'RandomNormalLike'}

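During the model_transform stage, an unsupported operator, RandomNormalLike, is reported.

RandomNormalLike (a random-number operator) cannot be supported on the 1684x, so the operator has to be avoided in the original model. Tracing it back to the model code shows that it supplies a noise tensor for downstream use. The source provides three noise modes: random (random noise), const (constant noise), and none (no noise). Setting noise_mode = const therefore avoids the operator.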
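The effect of the three modes can be sketched in plain Python. This is a minimal illustration, not the DragGAN source: the helper name make_noise, its parameters, and the use of flat lists are all assumptions made for the example. The point is that only the random branch draws fresh samples at run time, while const reads a buffer that was fixed beforehand and can be exported as an ordinary constant.

```python
import random


def make_noise(size, noise_mode="const", const_noise=None, seed=None):
    """Hypothetical helper: return the noise values for one synthesis layer."""
    if noise_mode == "random":
        # Fresh samples on every call; a traced graph needs a random op for this.
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(size)]
    if noise_mode == "const":
        # Values fixed ahead of time; a traced graph sees only a constant.
        if const_noise is None or len(const_noise) != size:
            raise ValueError("const mode needs a precomputed buffer of matching size")
        return list(const_noise)
    if noise_mode == "none":
        # No noise contribution at all.
        return [0.0] * size
    raise ValueError(f"unknown noise_mode: {noise_mode}")


fixed = make_noise(3, "const", const_noise=[0.1, 0.2, 0.3])
silent = make_noise(3, "none")
```

With const, repeated calls return the same values, which is also what makes the exported model's outputs reproducible during the later accuracy checks.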

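Supporting Conv/DeConv with a dynamic-input filter

In the DragGan model structure, some Conv and DeConv ops take a fixed weight as their input, while the filter operand is a dynamic input computed from upstream ops. This case had not been considered before, so support has to be added, which involves code changes in several places. The following works through the analysis, localization, and fix step by step, starting from the actual error messages.

model_transform stage

In the tpu-mlir Converter, weights and dynamic inputs are stored in different variables: a weight is obtained with getWeightOp(name), and an input with getOperand(name). When it is unclear whether an operand is a dynamic input or a weight, getOp(name) can be used instead. Running model_transform.py on DragGan raises the following error: KeyError: '/synthesis/b8/conv0/Transpose_output_0'

Checking against the model structure shows that the DeConv input /synthesis/b8/conv0/Transpose_output_0 is being fetched as a weight, although it is actually computed upstream.

Changing the lookup of filter_opd in ConvTranspose to getOp resolves this.

A second KeyError has the same cause: the DeConv filter comes from a dynamic input, so the logic that fetches the filter node is likewise changed to getOp.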
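The failure and the fix can be reproduced with a toy lookup table. The class below is a simplified stand-in written for this illustration, not the tpu-mlir Converter; only the three method names are taken from the text, and their bodies are assumptions.

```python
class MiniConverter:
    """Toy stand-in for a converter that keeps weights and dynamic values apart."""

    def __init__(self):
        self.weights = {}   # name -> constant tensor
        self.operands = {}  # name -> value produced by an upstream op

    def getWeightOp(self, name):
        return self.weights[name]    # KeyError when `name` is not a constant

    def getOperand(self, name):
        return self.operands[name]   # KeyError when `name` is not dynamic

    def getOp(self, name):
        # Accept either kind, so the caller need not know where the filter lives.
        if name in self.weights:
            return self.weights[name]
        return self.operands[name]


conv = MiniConverter()
filter_name = "/synthesis/b8/conv0/Transpose_output_0"
conv.operands[filter_name] = "dynamic filter value"

try:
    conv.getWeightOp(filter_name)    # the original lookup
    lookup_failed = False
except KeyError:
    lookup_failed = True             # same symptom as the reported error

filter_opd = conv.getOp(filter_name)  # the lookup after the fix
```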

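In the model_transform stage, the model is first converted to DragGAN_origin.mlir and then, through passes such as --shape-infer and --canonicalize, into an mlir file described in the Top Dialect on which model_runner.py can run inference. When the Top layer was validated by inference, the DragGan model reported an accuracy of zero. The error output shows that accuracy breaks down after a DeConv layer, and only when the DeConv filter is a dynamic input.

A DeConv whose filter is a dynamic input was built as a unit test, and it reproduced the error: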

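import torch.nn as nn
import torch.nn.functional as F

# `weight` is a tensor defined by the enclosing test case.
class DeConvCase(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.deconv = nn.ConvTranspose2d(4, 4, [2, 2], stride=[1, 1], bias=False)
        self.deconv.weight.data = weight

    def forward(self, x, y):
        output_padding = self.deconv._output_padding(
            x,
            None,
            [2, 2],
            [0, 0],
            [2, 2],
            1,
            [1, 1],
        )

        # y is the dynamic filter, passed in as a regular input.
        out = F.conv_transpose2d(x, y, None, [1, 1], 0, output_padding, 1, 1)

        # The second output uses the static weight for comparison.
        return out, self.deconv(x)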

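Debugging with breakpoints revealed two causes:

During inference for the correctness check, the weight is captured in init(), but at that point the dynamic weight has not been set yet
No weight reordering (WeightReorder) is performed for dynamic inputs

tpu-mlir applies several conversion steps and optimizations when adapting a model. To guarantee correctness after conversion, it performs three correctness checks, targeting the Top Dialect, the Tpu Dialect, and the bmodel. The core code for the Top and Tpu checks is in ModuleInterpreter.[h/cpp]. Starting from the inputs, it allocates space for every Op, initializes it (init), runs inference (inference) once initialization has finished, and finally tears down each Op (deinit). One of the DeConv accuracy errors comes from this separation of init and inference.

During init, DeConv constructs a Dnnl instance and copies the weight into it directly. Because the filter is a dynamic input, its value has not been supplied at init time, so the filter passed in is effectively all zeros, which produces the error at inference. Once located, the fix is straightforward: move the setup of the Dnnl instance from init to inference. Conv has the same problem and is fixed the same way.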
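The timing problem can be shown with a small sketch. The two classes below are illustrative stand-ins, not tpu-mlir or Dnnl code, and a dot product takes the place of the real deconvolution; what matters is only when the filter buffer is read.

```python
class EagerDeconv:
    """Captures the filter in init(), mirroring the failing flow."""

    def init(self, filter_buf):
        self.filter = list(filter_buf)  # snapshot taken before upstream ops have run

    def inference(self, x, filter_buf):
        return sum(a * b for a, b in zip(x, self.filter))


class LazyDeconv:
    """Defers filter setup to inference(), mirroring the fix."""

    def init(self, filter_buf):
        pass  # nothing is captured here

    def inference(self, x, filter_buf):
        current = list(filter_buf)  # read the buffer once it holds real values
        return sum(a * b for a, b in zip(x, current))


def run(op):
    filter_buf = [0, 0, 0]       # dynamic filter buffer, still zero at init time
    op.init(filter_buf)
    filter_buf[:] = [1, 2, 3]    # an upstream op fills in the filter afterwards
    return op.inference([1, 1, 1], filter_buf)


eager_result = run(EagerDeconv())  # computed against the all-zero snapshot
lazy_result = run(LazyDeconv())    # computed against the real filter
```

Deferring the setup costs a rebuild per inference call in this sketch, which is acceptable for a correctness-checking interpreter but would be a poor choice for a hot path.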

In an onnx model, the DeConv filter is stored input channel first (shape [ic, oc, kh, kw]), whereas most backend computations require output channel first ([oc, ic, kh, kw]). For this reason, OnnxConverter already applies a transpose to static DeConv weights.

A dynamic weight cannot be transposed at conversion time. A graph optimization is therefore needed: when the DeConv filter is dynamic, insert a Permute in front of it that swaps [oc, ic]. The preconditions for inserting this Permute need care: it must apply only to a dynamic DeConv weight, and it must never be inserted twice. To that end, a bool parameter dynweight_reorderd is added to the DeConv Operation. When the filter is not a top.WeightOp (a dynamic weight is in use) and dynweight_reorderd is false (no Permute has been added for the dynamic weight yet), the Permute is inserted and dynweight_reorderd is set to true.

After adding the dynweight_reorderd parameter to DeConv in TopOps.td, the graph-optimization logic for the dynamic DeConv weight is as follows:

struct ReorderDynWeight : public OpRewritePattern<DeconvOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(DeconvOp op,
                                PatternRewriter &rewriter) const override {

    // The filter layout may be oc-first or ic-first at this point.
    auto filter_shape = module::getShape(op.getFilter());

    // Static weights are already transposed by the converter.
    if (module::isWeight(op.getOperand(1))) {
      return failure();
    }
    // Skip if the dynamic filter has already been reordered.
    bool dyn_weight_reorderd = op.getDynweightReorderd();
    if (dyn_weight_reorderd) {
      return failure();
    }

    if (isa<top::PermuteOp>(op.getOperand(1).getDefiningOp())) {
      auto permute_op =
          dyn_cast<top::PermuteOp>(op.getOperand(1).getDefiningOp());

      // erase if already have this permute but from original graph
      std::vector<int64_t> ps = {1, 0, 2, 3};
      auto order = module::getI64Array(permute_op.getOrder());
      if (*order == ps) {
        permute_op.replaceAllUsesWith(permute_op.getInput());
        rewriter.eraseOp(permute_op);
        op.setDynweightReorderd(true);
        return success();
      }
    }

    rewriter.setInsertionPointAfterValue(op.getFilter());
    std::string name = module::getName(op.getOutput()).str();
    auto loc =
        NameLoc::get(rewriter.getStringAttr(name + "_reorder_permute"));

    // Swap the first two dims and keep the remaining ones in place.
    std::vector<int64_t> order = {1, 0};
    auto filter_dim = filter_shape.size();
    for (int i = 2; i < filter_dim; i++) {
      order.push_back(i);
    }

    auto p_type =
        UnrankedTensorType::get(module::getElementType(op.getFilter()));
    std::vector<NamedAttribute> attrs;
    attrs.emplace_back(
        rewriter.getNamedAttr("order", rewriter.getI64ArrayAttr(order)));

    auto new_permute_op = rewriter.create<top::PermuteOp>(
        loc, p_type, ValueRange{op.getFilter()}, attrs);

    new_permute_op.shape_inference();
    op.setOperand(1, new_permute_op.getOutput());
    op.setDynweightReorderd(true);
    return success();
  }
};

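An extra check is made here. When the DeConv filter operand is already a Permute whose order equals that of the Permute to be added (1,0,2,3), the two Permutes cancel each other out, so the existing Permute is simply removed and the pattern returns. In every other case an additional Permute is inserted. The Conv layer also has to support weight reordering for dynamic weights, so an equivalent graph optimization is added for it.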
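The decision logic of this rewrite can be restated as a short Python sketch over a toy graph. The dict-based node representation and the function name are assumptions made for the illustration; only the flag name dynweight_reorderd and the order (1,0,2,3) come from the text.

```python
def reorder_dyn_weight(op):
    """Apply the rewrite to a toy DeConv node; return True if the graph changed."""
    f = op["filter"]
    if f["kind"] == "weight" or op["dynweight_reorderd"]:
        return False  # static weight, or already handled: pattern does not apply

    if f["kind"] == "permute" and f["order"] == [1, 0, 2, 3]:
        # An identical swap already feeds the filter; two swaps cancel,
        # so bypass the existing Permute instead of stacking a second one.
        op["filter"] = f["input"]
    else:
        order = [1, 0] + list(range(2, len(f["shape"])))
        op["filter"] = {
            "kind": "permute",
            "order": order,
            "input": f,
            "shape": [f["shape"][i] for i in order],
        }
    op["dynweight_reorderd"] = True  # guards against a second application
    return True


dyn_filter = {"kind": "other", "shape": [4, 8, 2, 2]}  # [ic, oc, kh, kw]
deconv = {"filter": dyn_filter, "dynweight_reorderd": False}

first = reorder_dyn_weight(deconv)   # inserts the Permute
second = reorder_dyn_weight(deconv)  # no-op thanks to the flag
```

The flag matters because a pattern driver keeps re-running patterns until none applies; without it, the freshly inserted Permute would itself match the cancel branch on the next pass and be removed again.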

In addition, shape-infer in the Top layer runs before graph optimization, so during shape-infer the shape of a dynamic weight is still input channel first. dim[1] of the DeConv output_shape must therefore be derived from filter_shape[1]. The corresponding change is in lib/Dialect/Top/Interfaces/Deconv.cpp.
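A minimal sketch of the resulting shape rule, assuming group = 1, symmetric padding, and a 4-D NCHW input; the function name and the filter_ic_first switch are hypothetical and do not reflect the code in Deconv.cpp. The spatial sizes use the standard transposed-convolution formula.

```python
def deconv_output_shape(in_shape, filter_shape, stride, pad,
                        output_padding=(0, 0), dilation=(1, 1),
                        filter_ic_first=True):
    """Infer [n, oc, oh, ow] for a 2-D transposed convolution (group = 1)."""
    n, _, ih, iw = in_shape
    kh, kw = filter_shape[2], filter_shape[3]
    # ic-first filter is [ic, oc, kh, kw]; oc-first filter is [oc, ic, kh, kw].
    oc = filter_shape[1] if filter_ic_first else filter_shape[0]
    oh = (ih - 1) * stride[0] - 2 * pad[0] + dilation[0] * (kh - 1) + output_padding[0] + 1
    ow = (iw - 1) * stride[1] - 2 * pad[1] + dilation[1] * (kw - 1) + output_padding[1] + 1
    return [n, oc, oh, ow]


# Dynamic filter as seen during shape-infer: still [ic=4, oc=8, kh=2, kw=2].
shape_dynamic = deconv_output_shape([1, 4, 3, 3], [4, 8, 2, 2], (1, 1), (0, 0))

# Reading dim 0 instead would report the input channel count as the output's.
shape_wrong = deconv_output_shape([1, 4, 3, 3], [4, 8, 2, 2], (1, 1), (0, 0),
                                  filter_ic_first=False)
```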

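bmodel runtime error: ASSERT /workspace/nntoolchain/TPU1686/bm1684x/cmodel/src/cmodel_common.cpp: gather_data: 207: dst_offset < (1<<18)

Locating this error inside the full model is difficult, so mlir_cut.py was used to narrow the scope step by step until a minimal reproducible mlir was obtained: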

mlir_cut.py --mlir *tpu.mlir --output_names /synthesis/b64/conv0/Conv_output_0_Conv --input_names /synthesis/b32/conv1/Mul_3_output_0_Mul,/synthesis/b64/conv0/Reshape_3_output_0_Reshape

tpuc-opt DragGan_bm1684x_f32_final.mlir --codegen="model_file=DragGan_f32.bmodel embed_debug_info=true" -o /dev/null
model_runner.py --input fake_data.npz --model DragGan_f32.bmodel --output DragGan_bm1684x_f32_model_outputs.npz

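A unit test that reproduces the error was then constructed.

Varying one factor at a time gave the following observations:

With layer-group disabled, the model runs without error: this indicates the problem lies in tpu-mlir rather than in the backend operators
Changing the DeConv filter in that unit test from dynamic to static makes the model run correctly: the problem is again caused by the dynamic weight
A standalone DeConv operator runs correctly with either a static or a dynamic filter. Compared with the unit test above, the difference is that a single DeConv operator does not go through LayerGroup: this narrows the problem down to the LayerGroup code in tpu-mlir

Comparing the working and failing final.mlir files further shows that the slice attributes of the dynamic weight and the ordinary weight differ.

top.Weight is handled specially in layer-group. A top.Weight stays in local memory for the whole layer-group (hold_in_lmem = true). A weight also must not be split into slices, because every slice needs the complete filter; slicing the dynamic weight is what produced the wrong result.

Dynamic weights therefore need dedicated handling: set their lifetime (hold_in_lmem = true), and set each slice to a list of length 1 whose element is the size of the corresponding dimension of the shape. This can be done in the backward_update_slice method in lib/Dialect/Tpu/Transforms/LayerGroup/LayerGroupUtil.cpp.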
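As a rough sketch of what that slice information looks like, the snippet below builds one full-length slice per dimension. The data layout (a list of (start, length) pairs per dimension), the start offset of 0, and both function names are assumptions for illustration only and are not taken from LayerGroupUtil.cpp.

```python
def full_dim_slices(shape):
    """One slice per dimension, each covering the whole dimension."""
    # (0, dim) means: start at offset 0 and span the full extent `dim`.
    return [[(0, dim)] for dim in shape]


def mark_dynamic_weight(info, shape):
    """Return a copy of `info` treated like an ordinary weight by layer-group."""
    marked = dict(info)
    marked["hold_in_lmem"] = True            # keep resident for the whole group
    marked["slices"] = full_dim_slices(shape)  # never split the filter
    return marked


filter_info = mark_dynamic_weight({"name": "dyn_filter"}, [8, 4, 2, 2])
```

An activation in the same group may still be split along a dimension, for example into two halves along h; the filter keeps exactly one slice per dimension so that every activation slice sees the complete filter.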

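After this change, comparing the final.mlir files of the two test cases again shows that the slice information of the dynamic weight is now identical to that of an ordinary weight.

F16 and int8 accuracy issues

After the F32 bugs were fixed, the Tpu-layer mlir for F16 and int8 still had accuracy problems. The initial suspicion was the F16 adaptation of DeConv. Running every layer with correct input values through mlir_debugger (inspecting the output npz files and the npz_tool comparison results works as well) showed that the failing structure is Active -> Mul, where the Active is a ReduceSum operation: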