Weifeng Liu



  Papers sorted by year.

2019

  • [c11] Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun. "IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication". 33rd ACM International Conference on Supercomputing (ICS '19).
    [PDF] [Slides] [DOI] [BibTeX]

  • [j7] Feng Zhang, Weifeng Liu, Ningxuan Feng, Jidong Zhai, Xiaoyong Du. "Performance Evaluation and Analysis of Sparse Matrix and Graph Kernels on Heterogeneous Processors". CCF Transactions on High Performance Computing (THPC).
    [PDF] [DOI] [BibTeX]

2018

  • [j6] Junhong Liu, Xin He, Weifeng Liu, Guangming Tan. "Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication". International Journal of Parallel Programming (IJPP).
    [PDF] [DOI] [BibTeX]

  • [j5] Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Canqun Yang. "clMF: A Fine-Grained and Portable Alternating Least Squares Algorithm for Parallel Matrix Factorization". Future Generation Computer Systems (FGCS). (This is the extended paper of the Parlearning '17 work).
    [PDF] [DOI] [BibTeX] [Source code (opencl)]

  • [c10] Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, Shuaiwen Leon Song. "Warp-Consolidation: A Novel Execution Model for GPUs". 32nd ACM International Conference on Supercomputing (ICS '18).
    [PDF] [Slides] [DOI] [BibTeX]

  • [c9] Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu. "swSpTRSV: A Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures". 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18).
    [PDF] [Slides] [DOI] [BibTeX] [Source code (athread)]

  • [p1] Junhong Liu, Xin He, Weifeng Liu, Guangming Tan. "Register-based Implementation of the Sparse General Matrix-matrix Multiplication on GPUs". 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18).
    [PDF] [Poster] [DOI] [BibTeX]

  • [j4] Huamin Ren, Nattiya Kanhabua, Andreas Møgelmose, Weifeng Liu, Kaustubh Kulkarni, Sergio Escalera, Xavier Baró, Thomas B. Moeslund. "Back-Dropout Transfer Learning for Action Recognition". IET Computer Vision (IET CV).
    [PDF] [DOI] [BibTeX]

2017

  • [c8] Ang Li, Weifeng Liu, Mads R. B. Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song. "Exploring and Analyzing the Real Impact of Modern On-Package Memory on HPC Scientific Kernels". 2017 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17). Nominated for best paper.
    [PDF] [Slides] [DOI] [BibTeX]

  • [j3] Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter. "Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides". Concurrency and Computation: Practice and Experience (CCPE). (This is the extended paper of the Euro-Par '16 work).
    [PDF] [DOI] [BibTeX] [Source code (cuda, opencl-amd)]
    [This Sync-free algorithm is incorporated in the MAGMA main branch.]

  • [c7] Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng. "Fast Segmented Sort on GPUs". 31st ACM International Conference on Supercomputing (ICS '17).
    [PDF] [Slides] [DOI] [BibTeX] [Source code (cuda)]

  • [c6] Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal. "Locality-Aware CTA Clustering for Modern GPUs". 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). Received a HiPEAC Paper Award.
    [PDF] [Slides] [DOI] [BibTeX]

  • [w2] Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen, Canqun Yang. "Efficient and Portable ALS Matrix Factorization for Recommender Systems". 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics (held with IPDPS '17) (Parlearning '17).
    [PDF] [Slides] [DOI] [BibTeX]

2016

  • [c5] Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter. "A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves". 22nd International European Conference on Parallel and Distributed Computing (Euro-Par '16).
    [PDF] [Slides] [DOI] [BibTeX] [Source code (cuda, opencl-amd)]

  • [c4] Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng. "Parallel Transposition of Sparse Data Structures". 30th ACM International Conference on Supercomputing (ICS '16).
    [PDF] [Slides] [DOI] [BibTeX] [Source code (avx2, knc)]

2015

  • [c3] Weifeng Liu, Brian Vinter. "CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication". 29th ACM International Conference on Supercomputing (ICS '15).
    [PDF] [Slides] [DOI] [BibTeX] [Source code (avx2, avx512, knc, cuda, opencl-amd, opencl-nvidia)]
    [The CSR5 format is incorporated in the MAGMA main branch from version 2.2.0.]

  • [j2] Weifeng Liu, Brian Vinter. "Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors". Parallel Computing (PARCO). Volume 49, November 2015.
    [PDF] [DOI] [BibTeX] [Source code (cuda, opencl-amd, opencl-intel)]

  • [j1] Weifeng Liu, Brian Vinter. "A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors". Journal of Parallel and Distributed Computing (JPDC). Volume 85, November 2015. (This is the extended paper of the IPDPS '14 work).
    [PDF] [Slides (LA '15)] [DOI] [BibTeX] [Source code (cuda, opencl-amd)]
    [This SpGEMM framework is incorporated in the clSPARSE main branch from version Beta 2.]

  • [c2] Huamin Ren, Weifeng Liu, Søren Ingvor Olsen, Sergio Escalera, Thomas B. Moeslund. "Unsupervised Behavior-Specific Dictionary Learning for Abnormal Event Detection". 26th British Machine Vision Conference (BMVC '15).
    [PDF] [DOI] [BibTeX]

2014

  • [c1] Weifeng Liu, Brian Vinter. "An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data". 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS '14).
    [PDF] [Slides] [DOI] [BibTeX] [Source code (cuda, opencl-amd)]

  • [w1] Weifeng Liu, Brian Vinter. "Ad-heap: An Efficient Heap Data Structure for Asymmetric Multicore Processors". 7th Workshop on General Purpose Processing Using GPUs (held with ASPLOS '14) (GPGPU-7).
    [PDF] [Slides] [DOI] [BibTeX]