Avafly's repos on GitHub
C++ · 8 followers
optimize-gemm
My GEMM optimization on RPi (ARM) achieves a 170x speedup, running faster than Eigen and close to OpenBLAS.
C · 4 followers
tiny-cnn
A tiny CNN that is extremely fast and lightweight, beating ONNXRuntime and ncnn.
Python · 3 followers
Handcrafted-CNN
A handcrafted convolutional neural network. Uses matrix multiplication instead of loop operations, which makes it very fast.
C · 1 follower
LeetCodeSolution
My LeetCode solutions in C and C++. Directly executable, and all dynamically allocated memory can be freed.
C · 0 followers
conv-layout-study
A simple study of the performance impact of NCHW and NHWC layouts on convolution.
Python · 0 followers
D-FINE
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
C++ · 0 followers
YOLOX-TensorRT10
A TensorRT 10 C++ implementation of YOLOX with dynamic-shape support.