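Module Overview
A convolutional neural network (CNN) is built from convolution layers, pooling layers, activation layers, and fully connected layers. This article implements the window generator used in the CNN convolution layer.
The most complex part of the convolution process is the convolution operation itself: the filter is multiplied element by element with the image (the input), and the products are then summed.

My approach is to split the convolution into three steps: windowing, weight loading, and the convolution operation. Each step gets its own module, and this article implements the windowing module. Its main job is to extract data from the input image and generate the corresponding windows. As shown in the figure above, when windows are extracted from the image x[:,:,0], the first window (top-left corner) is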
$$\begin{bmatrix}0&0&0\\0&0&1\\0&0&1\end{bmatrix}$$
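The windowing step can be checked against a small software model. The sketch below is only an illustration in Python, not the article's hardware: `pad_image` and `window_at` are hypothetical helper names, and the 3x3 `image` is made up so that its top-left window reproduces the matrix above. It pads the image with zeros and then slices a KERNEL_SIZE x KERNEL_SIZE block out of the padded copy.

```python
def pad_image(image, padding):
    """Return a copy of image surrounded by `padding` rows/columns of zeros."""
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded += [[0] * width for _ in range(padding)]
    return padded


def window_at(padded, row, col, kernel_size):
    """Slice the kernel_size x kernel_size window whose top-left corner is (row, col)."""
    return [line[col:col + kernel_size] for line in padded[row:row + kernel_size]]


# Hypothetical input: only the top-left 2x2 corner matters for the first window.
image = [[0, 1, 2],
         [0, 1, 0],
         [2, 0, 1]]

padded = pad_image(image, padding=1)
first_window = window_at(padded, 0, 0, kernel_size=3)
for line in first_window:
    print(line)
```

With padding 1 and a 3x3 kernel, the window whose top-left corner is (0, 0) in the padded image is the window centred on pixel (0, 0) of the original image.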
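Code
Configurable parameters, inputs, and outputs
STRIDE is the step size of the sliding window, KERNEL_SIZE is the size of the convolution kernel, and PADDING is the width of the padding.
pixel_in is the input image data, frame_start flags the start of an incoming image, and pixel_valid flags valid input data.
window_out is the window data flattened into one dimension.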
module window #(
parameter DATA_WIDTH = 16, // Width of each pixel data
parameter IMG_WIDTH = 32, // Width of input image
parameter IMG_HEIGHT = 32, // Height of input image
parameter KERNEL_SIZE = 3, // Size of convolution window (square)
parameter STRIDE = 1, // Stride of convolution
parameter PADDING = (KERNEL_SIZE - 1) / 2 // Padding size calculated for SAME mode
)(
input wire clk, // Clock signal
input wire rst_n, // Active low reset
input wire [DATA_WIDTH-1:0] pixel_in, // Input pixel data
input wire pixel_valid, // Input pixel valid signal
input wire frame_start, // Start of new frame signal
output reg [KERNEL_SIZE*KERNEL_SIZE*DATA_WIDTH-1:0] window_out, // Flattened window output
output reg window_valid // Window data valid
);
Internal signal definitions
The image data arrives one pixel at a time; x_pos and y_pos record the position of the current pixel within the image.
The window slides across the image; x_window and y_window track the window's current position.
line_buffer caches the incoming data and applies the padding, while window_buffer slides over line_buffer to form each window.
The state machine has three states, IDLE, LOAD, and PROCESS: idle, loading (image data starts arriving), and processing (forming windows).
// Internal signals
reg [5:0] x_pos, y_pos; // Current input pixel position
reg [5:0] x_window, y_window; // Window center position
reg [DATA_WIDTH-1:0] line_buffer [0:KERNEL_SIZE][0:IMG_WIDTH+2*PADDING-1]; // Line buffer
reg [DATA_WIDTH-1:0] window_buffer [0:KERNEL_SIZE-1][0:KERNEL_SIZE-1]; // Window buffer
reg signed [6:0] src_y, src_x; // Temporary variables for coordinate calculation
// State machine
reg [1:0] current_state, next_state;
localparam IDLE = 2'b00, LOAD = 2'b01, PROCESS = 2'b10;
// Loop variables
integer i, j, k;
State assignment and transitions
When frame_start is received (an image starts arriving), the state moves from IDLE to LOAD.
Once enough image data has been buffered to produce stable output windows, the state moves to PROCESS.
Once the sliding window has extracted every window of the frame, the state returns to IDLE.
Note: by the time y_pos has run from 0 to KERNEL_SIZE-1, KERNEL_SIZE rows of data are available (the last of them still arriving), so window extraction can begin. It could in fact begin earlier because of the padding: y_pos = KERNEL_SIZE-PADDING-1 is already enough.
// FSM state transitions
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
current_state <= IDLE;
else
current_state <= next_state;
end
always @(*) begin
case (current_state)
IDLE: next_state = frame_start ? LOAD : IDLE;
LOAD: next_state = (y_pos >= KERNEL_SIZE-1) ? PROCESS : LOAD;
PROCESS: next_state = (y_window >= IMG_HEIGHT && x_window == 0) ? IDLE : PROCESS;
default: next_state = IDLE;
endcase
end
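The transition rules above can be restated as a pure function, which is handy for checking a simulation trace. This is a minimal Python sketch rather than part of the design: the function name `next_state` and its keyword defaults are assumptions, while the conditions are copied from the combinational block above.

```python
IDLE, LOAD, PROCESS = 0, 1, 2


def next_state(state, frame_start, y_pos, x_window, y_window,
               kernel_size=3, img_height=32):
    """Mirror of the combinational next-state logic in the Verilog listing."""
    if state == IDLE:
        return LOAD if frame_start else IDLE
    if state == LOAD:
        # Enough rows have started arriving to build the first window.
        return PROCESS if y_pos >= kernel_size - 1 else LOAD
    if state == PROCESS:
        # The window has wrapped past the last image row.
        finished = y_window >= img_height and x_window == 0
        return IDLE if finished else PROCESS
    return IDLE  # Unused encoding 2'b11 falls back to IDLE


print(next_state(IDLE, True, 0, 0, 0))
print(next_state(LOAD, False, 2, 0, 0))
print(next_state(PROCESS, False, 32, 0, 32))
```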
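State actions
I recommend splitting the work up: instead of one large always block that handles every state, use several smaller always blocks.
a. Tracking the input pixel position
In IDLE, when an image is about to arrive, the position counters are reset.
Outside IDLE, whenever the input is valid, the coordinates advance accordingly.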
// Input pixel position tracking
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
x_pos <= 0;
y_pos <= 0;
end else if (current_state == IDLE && frame_start) begin
x_pos <= 0;
y_pos <= 0;
end else if (pixel_valid && current_state != IDLE) begin
if (x_pos == IMG_WIDTH-1) begin
x_pos <= 0;
y_pos <= y_pos + 1;
end else begin
x_pos <= x_pos + 1;
end
end
end
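The counter behaviour is a plain raster scan: x_pos wraps at the end of a row and y_pos increments on the wrap. A short Python model makes the expected values easy to compare with a waveform. The helper names `advance_position` and `position_after` are hypothetical, and the model assumes pixel_valid is high on every step.

```python
def advance_position(x_pos, y_pos, img_width):
    """One valid pixel: step x_pos, wrapping to the next row at the end of a line."""
    if x_pos == img_width - 1:
        return 0, y_pos + 1
    return x_pos + 1, y_pos


def position_after(num_pixels, img_width):
    """Counter values after num_pixels valid pixels, starting from (0, 0)."""
    x_pos, y_pos = 0, 0
    for _ in range(num_pixels):
        x_pos, y_pos = advance_position(x_pos, y_pos, img_width)
    return x_pos, y_pos


# After a full 5-wide, 4-high frame the counters sit at the start of row 4.
print(position_after(20, img_width=5))
```

Note that y_pos ends one past the last image row after a complete frame, which the window generator relies on when it checks whether the rows below the window centre have arrived.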
b. Line_Buffer buffering
At the start of each new image row, the Line_Buffer row that is about to be written is cleared.
The incoming pixel is then written to its position in that row, offset by PADDING.
// Line buffer management
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (i = 0; i <= KERNEL_SIZE; i = i + 1)
for (j = 0; j < IMG_WIDTH + 2*PADDING; j = j + 1)
line_buffer[i][j] <= 0;
end else if (pixel_valid && current_state != IDLE) begin
if (x_pos == 0) begin
// Clear the line buffer row at the start of each new line
for (k = 0; k < IMG_WIDTH + 2*PADDING; k = k + 1)
line_buffer[y_pos % (KERNEL_SIZE + 1)][k] <= 0;
end
line_buffer[y_pos % (KERNEL_SIZE + 1)][x_pos + PADDING] <= pixel_in;
end
end
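line_buffer has KERNEL_SIZE + 1 physical rows and is addressed with y_pos % (KERNEL_SIZE + 1), so it behaves as a circular buffer. The Python sketch below (the name `buffer_row` is an assumption) shows the mapping and checks the property that makes the extra row useful: the row currently being written never aliases any of the KERNEL_SIZE rows received just before it.

```python
def buffer_row(image_row, kernel_size=3):
    """Physical line_buffer row that holds a given image row."""
    return image_row % (kernel_size + 1)


kernel_size = 3
for y_pos in range(6):
    # Image rows received immediately before the row being written.
    previous_rows = [r for r in range(y_pos - kernel_size, y_pos) if r >= 0]
    print(y_pos, buffer_row(y_pos), [buffer_row(r) for r in previous_rows])
```

With exactly KERNEL_SIZE rows, writing a new image row would overwrite a row that a window one line behind might still need.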
c. Window position tracking
On reset, at the start of a frame, or when the state is about to enter PROCESS, the window position is reset.
In PROCESS, while the window has not passed the image height, the window position advances by STRIDE.
// Window position tracking
always @(posedge clk or negedge rst_n) begin
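// Asynchronous reset kept in its own branch; the other clears are synchronous
if (!rst_n) begin
x_window <= 0;
y_window <= 0;
end else if (frame_start || (current_state == LOAD && next_state == PROCESS)) begin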
x_window <= 0;
y_window <= 0;
end else if (current_state == PROCESS && y_window < IMG_HEIGHT) begin
if (x_window + STRIDE >= IMG_WIDTH) begin
x_window <= 0;
y_window <= y_window + STRIDE;
end else begin
x_window <= x_window + STRIDE;
end
end
end
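The order in which window centres are visited follows directly from the update rule above. The following Python sketch lists that order for a given image size and stride. `window_positions` is a hypothetical helper, and the model ignores timing: it assumes a window is produced at every position.

```python
def window_positions(img_width, img_height, stride=1):
    """Window centres in the order produced by the x_window / y_window update rule."""
    x_window, y_window = 0, 0
    positions = []
    while y_window < img_height:
        positions.append((x_window, y_window))
        if x_window + stride >= img_width:
            x_window, y_window = 0, y_window + stride
        else:
            x_window += stride
    return positions


print(len(window_positions(5, 4, stride=1)))
print(window_positions(5, 4, stride=2))
```

For STRIDE = 1 the count equals IMG_WIDTH * IMG_HEIGHT, which is the figure the testbench compares against.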
d. window_buffer handling
// Window generation and output
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
window_valid <= 0;
for (i = 0; i < KERNEL_SIZE; i = i + 1)
for (j = 0; j < KERNEL_SIZE; j = j + 1)
window_buffer[i][j] <= 0;
end else begin
window_valid <= 0; // Default
if (current_state == PROCESS &&
x_window < IMG_WIDTH &&
y_window < IMG_HEIGHT &&
y_window + (KERNEL_SIZE>>1) <= y_pos) begin
// Generate window
for (i = 0; i < KERNEL_SIZE; i = i + 1) begin
for (j = 0; j < KERNEL_SIZE; j = j + 1) begin
src_y = y_window + i - (KERNEL_SIZE>>1);
src_x = x_window + j - (KERNEL_SIZE>>1);
if (src_y >= 0 && src_y < IMG_HEIGHT &&
src_x >= 0 && src_x < IMG_WIDTH) begin
window_buffer[i][j] <= line_buffer[src_y % (KERNEL_SIZE + 1)][src_x + PADDING];
end else begin
window_buffer[i][j] <= 0; // Padding
end
end
end
window_valid <= 1;
end
end
end
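A window is generated only while the window coordinates lie inside the image and enough rows have arrived to build it. KERNEL_SIZE>>1 is equivalent to KERNEL_SIZE/2 (integer division) and gives the offset of the centre position.
Subtracting this offset converts the kernel index into coordinates relative to the window centre. Those coordinates are then used to detect out-of-bounds accesses, which are filled with padding.
For example, with KERNEL_SIZE = 3 and the window centred on (0, 0):
| Kernel position | src coordinate calculation | Result | Value |
|---|---|---|---|
| 0,0 | src_y=0+0-1=-1 | out of bounds | padding |
| 0,1 | src_y=0+0-1=-1 | out of bounds | padding |
| 0,2 | src_y=0+0-1=-1 | out of bounds | padding |
| 1,0 | src_x=0+0-1=-1 | out of bounds | padding |
| 1,1 | src_y=0,src_x=0 | valid | image[0,0] |
| 1,2 | src_y=0,src_x=1 | valid | image[0,1] |
| 2,0 | src_x=0+0-1=-1 | out of bounds | padding |
| 2,1 | src_y=1,src_x=0 | valid | image[1,0] |
| 2,2 | src_y=1,src_x=1 | valid | image[1,1] |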
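The table can be regenerated mechanically from the two src expressions in the Verilog. The Python sketch below does so. `source_coords` is a hypothetical name, and the 32x32 image size is taken from the module's default parameters.

```python
def source_coords(y_window, x_window, kernel_size, img_height, img_width):
    """For every kernel position (i, j), return the source pixel and whether it is inside the image."""
    half = kernel_size >> 1
    rows = []
    for i in range(kernel_size):
        for j in range(kernel_size):
            src_y = y_window + i - half
            src_x = x_window + j - half
            valid = 0 <= src_y < img_height and 0 <= src_x < img_width
            rows.append(((i, j), (src_y, src_x), valid))
    return rows


table = source_coords(0, 0, kernel_size=3, img_height=32, img_width=32)
for position, source, valid in table:
    print(position, source, "image" if valid else "padding")
```

The same function can be called with other window centres, for example the bottom-right corner, to see which kernel positions fall into the padding there.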
e. Flattening the data window
// Flatten window buffer for output
always @(*) begin
for (i = 0; i < KERNEL_SIZE; i = i + 1) begin
for (j = 0; j < KERNEL_SIZE; j = j + 1) begin
window_out[(KERNEL_SIZE*KERNEL_SIZE-(i*KERNEL_SIZE+j))*DATA_WIDTH-1 -: DATA_WIDTH] = window_buffer[i][j];
end
end
end
endmodule
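The two-dimensional window (KERNEL_SIZE wide, KERNEL_SIZE high, DATA_WIDTH bits per element) is packed into a one-dimensional vector, with the lowest-indexed element, window_buffer[0][0], placed in the most significant bits.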
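The bit positions used by the indexed part-select can be reproduced with integer arithmetic. This Python sketch, with the hypothetical helper `flatten_window`, packs a window with the same index expression as the Verilog. Element values are assumed to fit in DATA_WIDTH bits and are masked just in case.

```python
def flatten_window(window, data_width):
    """Pack a square window into one integer, element [0][0] in the top bits."""
    k = len(window)
    mask = (1 << data_width) - 1
    packed = 0
    for i in range(k):
        for j in range(k):
            # Same expression as the Verilog: top bit of the slice, then "-: DATA_WIDTH".
            top_bit = (k * k - (i * k + j)) * data_width - 1
            low_bit = top_bit - data_width + 1
            packed |= (window[i][j] & mask) << low_bit
    return packed


window = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
packed = flatten_window(window, data_width=8)
print(hex(packed))
```

With DATA_WIDTH = 8 and KERNEL_SIZE = 3 this puts window_buffer[0][0] in bits [71:64] and window_buffer[2][2] in bits [7:0], which matches the slices the testbench prints.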
Testing
`timescale 1ns / 1ps
module window_tb();
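// Test parameters: a small image keeps the output easy to inspect.
// 5x4 matches the input data listed under Results; the pixel values fit in
// 8 bits and the run finishes before the 2000 ns timeout.
parameter DATA_WIDTH = 8;
parameter IMG_WIDTH = 5;
parameter IMG_HEIGHT = 4;
parameter KERNEL_SIZE = 3;
parameter STRIDE = 1;
parameter PADDING = (KERNEL_SIZE - 1) / 2;
// Test signals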
reg clk;
reg rst_n;
reg [DATA_WIDTH-1:0] pixel_in;
reg pixel_valid;
reg frame_start;
wire [KERNEL_SIZE*KERNEL_SIZE*DATA_WIDTH-1:0] window_out;
wire window_valid;
// Instantiate the module under test
window #(
.DATA_WIDTH(DATA_WIDTH),
.IMG_WIDTH(IMG_WIDTH),
.IMG_HEIGHT(IMG_HEIGHT),
.KERNEL_SIZE(KERNEL_SIZE),
.STRIDE(STRIDE),
.PADDING(PADDING)
) dut (
.clk(clk),
.rst_n(rst_n),
.pixel_in(pixel_in),
.pixel_valid(pixel_valid),
.frame_start(frame_start),
.window_out(window_out),
.window_valid(window_valid)
);
// Clock generation (10 ns period)
initial begin
clk = 0;
forever #5 clk = ~clk;
end
// Test data: IMG_HEIGHT x IMG_WIDTH image
reg [DATA_WIDTH-1:0] test_image [0:IMG_HEIGHT-1][0:IMG_WIDTH-1];
// Window counter
integer window_count = 0;
// Clear and initialise the test image
task reset_test_image;
integer i, j;
begin
for(i = 0; i < IMG_HEIGHT; i = i + 1) begin
for(j = 0; j < IMG_WIDTH; j = j + 1) begin
test_image[i][j] =0;
end
end
end
endtask
task init_test_image;
integer i, j;
begin
for(i = 0; i < IMG_HEIGHT; i = i + 1) begin
for(j = 0; j < IMG_WIDTH; j = j + 1) begin
test_image[i][j] = i * IMG_WIDTH + j + 1;
end
end
end
endtask
// Display the test image
task display_test_image;
integer i, j;
begin
$display("=== %0dx%0d Test Image ===", IMG_WIDTH, IMG_HEIGHT);
for(i = 0; i < IMG_HEIGHT; i = i + 1) begin
$write("Row %0d: ", i);
for(j = 0; j < IMG_WIDTH; j = j + 1) begin
$write("%3d ", test_image[i][j]);
end
$display("");
end
$display("====================== ");
end
endtask
// Send one frame of image data
task send_frame;
integer i, j;
begin
$display("Sending %0dx%0d frame...", IMG_WIDTH, IMG_HEIGHT);
init_test_image;
display_test_image;
// Assert frame_start for one clock cycle
@(posedge clk);
frame_start = 1;
@(posedge clk);
frame_start = 0;
// Send the data pixel by pixel
for(i = 0; i < IMG_HEIGHT; i = i + 1) begin
for(j = 0; j < IMG_WIDTH; j = j + 1) begin
@(posedge clk);
pixel_in = test_image[i][j];
pixel_valid = 1;
$display("Sending pixel[%0d][%0d] = %0d at time %0t", i, j, pixel_in, $time);
end
end
@(posedge clk);
pixel_valid = 0;
$display("All pixels sent at time %0t", $time);
end
endtask
// Main test sequence
initial begin
$display("========================================");
$display("Window Test - Focus on Last Window");
$display("IMG_SIZE: %0dx%0d, KERNEL: %0dx%0d", IMG_WIDTH, IMG_HEIGHT, KERNEL_SIZE, KERNEL_SIZE);
$display("Expected windows: %0d", IMG_WIDTH * IMG_HEIGHT);
$display("========================================");
// Initialise signals
rst_n = 0;
pixel_in = 0;
pixel_valid = 0;
frame_start = 0;
reset_test_image;
// Reset sequence
repeat(5) @(posedge clk);
rst_n = 1;
repeat(3) @(posedge clk);
// Send the test frame
send_frame;
// Wait for all windows to be output
repeat(50) @(posedge clk);
$display("========================================");
$display("Test Summary:");
$display("Total Windows Generated: %0d", window_count);
$display("Expected Windows: %0d", IMG_WIDTH * IMG_HEIGHT);
if(window_count == IMG_WIDTH * IMG_HEIGHT) begin
$display("SUCCESS: All windows generated!");
end else begin
$display("FAILURE: Missing windows!");
end
$display("========================================");
$finish;
end
// Window monitor
always @(posedge clk) begin
if(window_valid) begin
window_count = window_count + 1;
$display("Window %0d: pos(%0d,%0d) at time %0t",
window_count, dut.x_window, dut.y_window, $time);
// Display the window contents (slices assume DATA_WIDTH = 8, KERNEL_SIZE = 3)
$write("Window content: ");
$write("[%0d %0d %0d] ",
window_out[71:64], window_out[63:56], window_out[55:48]);
$write("[%0d %0d %0d] ",
window_out[47:40], window_out[39:32], window_out[31:24]);
$write("[%0d %0d %0d]",
window_out[23:16], window_out[15:8], window_out[7:0]);
$display("");
end
end
// State machine monitor
reg [1:0] prev_state = 2'b00;
always @(posedge clk) begin
if(dut.current_state != prev_state) begin
case(dut.current_state)
2'b00: $display("Time %0t: State -> IDLE", $time);
2'b01: $display("Time %0t: State -> LOAD", $time);
2'b10: $display("Time %0t: State -> PROCESS", $time);
default: $display("Time %0t: State -> UNKNOWN(%0d)", $time, dut.current_state);
endcase
prev_state = dut.current_state;
end
end
// Waveform dump
initial begin
$dumpfile("window_tb.vcd");
$dumpvars(0, window_tb);
// Limit the simulation time
#2000;
$display("ERROR: Simulation timeout!");
$finish;
end
endmodule
Results
Input data

Row 0: 1 2 3 4 5
Row 1: 6 7 8 9 10
Row 2: 11 12 13 14 15
Row 3: 16 17 18 19 20
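For checking the simulation log, the expected windows can be computed from this input. The Python sketch below assumes IMG_WIDTH = 5 and IMG_HEIGHT = 4, as implied by the four rows of five pixels listed above, with KERNEL_SIZE = 3 and zero padding. `golden_window` is a hypothetical helper and not part of the testbench.

```python
IMG_WIDTH, IMG_HEIGHT, KERNEL_SIZE = 5, 4, 3  # Assumed from the listed input data

# Same fill rule as init_test_image: pixel value = row * IMG_WIDTH + col + 1
image = [[r * IMG_WIDTH + c + 1 for c in range(IMG_WIDTH)] for r in range(IMG_HEIGHT)]


def golden_window(row, col):
    """Zero-padded KERNEL_SIZE x KERNEL_SIZE window centred on (row, col)."""
    half = KERNEL_SIZE // 2
    return [[image[y][x] if 0 <= y < IMG_HEIGHT and 0 <= x < IMG_WIDTH else 0
             for x in range(col - half, col + half + 1)]
            for y in range(row - half, row + half + 1)]


first = golden_window(0, 0)
last = golden_window(IMG_HEIGHT - 1, IMG_WIDTH - 1)
print("first:", first)
print("last: ", last)
```

Comparing the first and last windows against the "Window content" lines printed by the testbench is a quick way to confirm that the padding and the final rows are handled as intended.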
Line_Buffer contents
Window_Buffer output data

When valid is high, window_buffer starts extracting data from line_buffer, and the flattened window_out is output at the same time.

Once window_buffer has finished extracting, valid goes low.
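Original title: Implementing a CNN Convolution Layer on an FPGA: Design and Verification of an Efficient Window Generation Module
Source: WeChat ID gh_9d70b445f494, WeChat official account FPGA设计论坛 (FPGA Design Forum). Please credit the source when reposting.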