Kaize Ding

(丁凯泽)

Data Mining and Machine Learning Laboratory
School of Computing and Augmented Intelligence
Arizona State University

About me: I am a PhD student in Computer Science and Engineering at Arizona State University. I work as a research assistant at the Data Mining and Machine Learning Laboratory, advised by Professor Huan Liu. Before that, I obtained my master's and bachelor's degrees from Beijing University of Posts and Telecommunications (BUPT).

Research Interest: My research interests lie broadly in data mining and machine learning. Recently, I have been focusing on graph neural networks and their applications, such as anomaly detection and recommendation.


News

10/2021
One paper was accepted to WSDM 2022.
08/2021
One paper was accepted to EMNLP 2021.
08/2021
Two papers were accepted to CIKM 2021.
08/2021
Invited to serve as a PC member for WSDM 2022.
01/2021
One paper was accepted to WWW 2021.
12/2020
One paper was accepted to SDM 2021.
12/2020
One paper was accepted to AAAI 2021.
12/2020
Invited to serve as a PC member for NAACL 2021 and ACL 2021.
09/2017
I enrolled as a PhD student at ASU.

Selected Papers

[Google Scholar] [Full List]

Learning to Selectively Learn for Weakly-supervised Paraphrase Generation
Kaize Ding, Dingcheng Li, Alexander Hanbo Li, Xing Fan, Chenlei Guo, Yang Liu, and Huan Liu
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021.
@InProceedings{ding2021learning,
  title     = {Learning to Selectively Learn for Weakly-supervised Paraphrase Generation},
  author    = {Ding, Kaize and Li, Dingcheng and Li, Alexander Hanbo and Fan, Xing and Guo, Chenlei and Liu, Yang and Liu, Huan},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year     = {2021},
}

Paraphrase generation is a longstanding NLP task that has diverse applications for downstream NLP tasks. However, the effectiveness of existing efforts predominantly relies on large amounts of golden labeled data. Though unsupervised endeavors have been proposed to address this issue, they may fail to generate meaningful paraphrases due to the lack of supervision signals. In this work, we go beyond the existing paradigms and propose a novel approach to generate high-quality paraphrases with weak supervision data. Specifically, we tackle the weakly-supervised paraphrase generation problem by: (1) obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion; and (2) developing a meta-learning framework to progressively select valuable samples for fine-tuning a pre-trained language model, i.e., BART, on the sentential paraphrasing task. We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-art methods.

Few-shot Network Anomaly Detection via Cross-network Meta-learning
Kaize Ding*, Qinghai Zhou*, Hanghang Tong, and Huan Liu (*equal contribution)
The Web Conference (formerly WWW) 2021.
@InProceedings{ding2021few,
  title     = {Few-shot Network Anomaly Detection via Cross-network Meta-learning},
  author    = {Ding, Kaize and Zhou, Qinghai and Tong, Hanghang and Liu, Huan},
  booktitle = {Proceedings of the Web Conference 2021},
  year      = {2021}
}

In general, graph neural networks (GNNs) adopt the message-passing scheme to capture the information of a node (i.e., nodal attributes and local graph structure) by iteratively transforming and aggregating the features of its neighbors. Nonetheless, recent studies show that the performance of GNNs can be easily hampered by the existence of abnormal or malicious nodes due to the vulnerability of neighborhood aggregation. Thus, it is necessary to learn anomaly-resistant GNNs without prior knowledge of ground-truth anomalies, given that labeling anomalies is costly and requires intensive domain knowledge. To preserve the effectiveness of GNNs on anomaly-contaminated graphs, in this paper, we propose a new framework named RARE-GNN (Reinforced Anomaly-Resistant Graph Neural Networks), which can detect anomalies in the input graph and learn anomaly-resistant GNNs simultaneously. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed framework.

Be More with Less: Hypergraph Attention Networks for Inductive Text Classification
Kaize Ding, Jianling Wang, Jundong Li, Dingcheng Li, and Huan Liu
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020.
@InProceedings{ding2020more,
  title     = {Be more with less: Hypergraph attention networks for inductive text classification},
  author    = {Ding, Kaize and Wang, Jianling and Li, Jundong and Li, Dingcheng and Liu, Huan},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year     = {2020},
}

Text classification is a critical research topic with broad applications in natural language processing. Recently, graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on this canonical task. Despite this success, their performance could be largely jeopardized in practice since they are: (1) unable to capture high-order interactions between words; and (2) inefficient at handling large datasets and new documents. To address those issues, in this paper, we propose a principled model -- hypergraph attention networks (HyperGAT), which can obtain more expressive power with less computational consumption for text representation learning. Extensive experiments on various benchmark datasets demonstrate the efficacy of the proposed approach on the text classification task.

Graph Prototypical Networks for Few-shot Learning on Attributed Networks
Kaize Ding, Jianling Wang, Jundong Li, Kai Shu, Chenghao Liu, and Huan Liu
ACM International Conference on Information and Knowledge Management (CIKM) 2020.
@InProceedings{ding2020graph,
  title     = {Graph prototypical networks for few-shot learning on attributed networks},
  author    = {Ding, Kaize and Wang, Jianling and Li, Jundong and Shu, Kai and Liu, Chenghao and Liu, Huan},
  booktitle = {Proceedings of the 29th ACM International Conference on Information \& Knowledge Management},
  year     = {2020},
}

Attributed networks nowadays are ubiquitous in a myriad of high-impact applications, such as social network analysis, financial fraud detection, and drug discovery. As a central analytical task on attributed networks, node classification has received much attention in the research community. In real-world attributed networks, a large portion of node classes only contain limited labeled instances, rendering a long-tail node class distribution. Existing node classification algorithms are unequipped to handle the few-shot node classes. As a remedy, few-shot learning has attracted a surge of attention in the research community. Yet, few-shot node classification remains a challenging problem as we need to address the following questions: (i) How to extract meta-knowledge from an attributed network for few-shot node classification? (ii) How to identify the informativeness of each labeled instance for building a robust and effective model? To answer these questions, in this paper, we propose a graph meta-learning framework -- Graph Prototypical Networks (GPN). By constructing a pool of semi-supervised node classification tasks to mimic the real test environment, GPN is able to perform meta-learning on an attributed network and derive a highly generalizable model for handling the target classification task. Extensive experiments demonstrate the superior capability of GPN in few-shot node classification.
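The prototype idea behind few-shot node classification can be illustrated with a minimal sketch. It is not the GPN implementation: it assumes node embeddings have already been computed by some encoder, and the function names, the optional per-node `weights` (standing in for the informativeness of each labeled instance), and the use of Euclidean distance are all assumptions made for illustration.

```python
from math import dist


def class_prototypes(support, weights=None):
    """Build one prototype per class as a (weighted) mean of support embeddings.

    support: dict mapping label -> list of embedding vectors (lists of floats).
    weights: optional dict mapping label -> list of non-negative scores, a
             hypothetical stand-in for per-node informativeness.
    """
    prototypes = {}
    for label, vectors in support.items():
        w = weights[label] if weights else [1.0] * len(vectors)
        total = sum(w)
        dim = len(vectors[0])
        prototypes[label] = [
            sum(wi * v[d] for wi, v in zip(w, vectors)) / total
            for d in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Two classes with two labeled (support) nodes each.
support = {"a": [[0.0, 0.0], [0.0, 2.0]], "b": [[4.0, 4.0], [6.0, 4.0]]}
protos = class_prototypes(support)
label_near_a = classify([1.0, 1.0], protos)
label_near_b = classify([4.0, 3.0], protos)
```

With only a handful of labeled nodes per class, averaging them into a single prototype keeps the classifier simple, and the weights let more informative nodes contribute more to that average.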

Deep Anomaly Detection on Attributed Networks
Kaize Ding, Jundong Li, Rohit Bhanushali, and Huan Liu
SIAM International Conference on Data Mining (SDM) 2019.
@InProceedings{ding2019deep,
  title     = {Deep anomaly detection on attributed networks},
  author    = {Ding, Kaize and Li, Jundong and Bhanushali, Rohit and Liu, Huan},
  booktitle = {Proceedings of the 2019 SIAM International Conference on Data Mining},
  year     = {2019},
}

Attributed networks are ubiquitous and form a critical component of modern information infrastructure, where additional node attributes complement the raw network structure in knowledge discovery. Recently, detecting anomalous nodes on attributed networks has attracted an increasing amount of research attention, with broad applications in various high-impact domains, such as cybersecurity, finance, and healthcare. Most of the existing attempts, however, tackle the problem with shallow learning mechanisms by ego-network or community analysis, or through subspace selection. Undoubtedly, these models cannot fully address the computational challenges on attributed networks. For example, they often suffer from network sparsity and data nonlinearity issues, and fail to capture the complex interactions between different information modalities, which negatively impacts the performance of anomaly detection. To tackle the aforementioned problems, in this paper, we study the anomaly detection problem on attributed networks by developing a novel deep model. In particular, our proposed deep model: (1) explicitly models the topological structure and nodal attributes seamlessly for node embedding learning with the prevalent graph convolutional network (GCN); and (2) is customized to address the anomaly detection problem by virtue of a deep autoencoder that leverages the learned embeddings to reconstruct the original data. The synergy between GCN and autoencoder enables us to spot anomalies by measuring the reconstruction errors of nodes from both the structure and the attribute perspectives. Extensive experiments on real-world attributed network datasets demonstrate the efficacy of our proposed algorithm.
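The scoring step described above, ranking nodes by how poorly their structure and attributes are reconstructed, can be sketched as follows. This covers only the scoring, not the GCN encoder or the autoencoder training: the reconstructions `adj_hat` and `attrs_hat` are assumed to come from some already-trained model, and the balance parameter `alpha` and both function names are assumptions for illustration.

```python
from math import dist


def anomaly_scores(adj, adj_hat, attrs, attrs_hat, alpha=0.5):
    """Score each node by a weighted sum of its two reconstruction errors.

    adj / adj_hat:     original and reconstructed adjacency rows, one per node.
    attrs / attrs_hat: original and reconstructed attribute vectors, one per node.
    alpha:             assumed trade-off between attribute and structure error.
    """
    return [
        alpha * dist(x, x_hat) + (1.0 - alpha) * dist(a, a_hat)
        for a, a_hat, x, x_hat in zip(adj, adj_hat, attrs, attrs_hat)
    ]


def rank_nodes(scores):
    """Return node indices ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 3-node path graph; node 2 is reconstructed badly in both views.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
adj_hat = [[0, 1, 0], [1, 0, 1], [1, 0, 0]]
attrs = [[1.0], [1.0], [5.0]]
attrs_hat = [[1.0], [1.0], [1.0]]

scores = anomaly_scores(adj, adj_hat, attrs, attrs_hat)
ranking = rank_nodes(scores)
```

Nodes that the model reconstructs well receive scores near zero, so the top of the ranking contains the nodes whose structure or attributes deviate from the patterns the model has learned.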

Interactive Anomaly Detection on Attributed Networks
Kaize Ding, Jundong Li, and Huan Liu
ACM International Conference on Web Search and Data Mining (WSDM) 2019.
@InProceedings{ding2019interactive,
  title     = {Interactive anomaly detection on attributed networks},
  author    = {Ding, Kaize and Li, Jundong and Liu, Huan},
  booktitle = {Proceedings of the twelfth ACM international conference on web search and data mining},
  year     = {2019},
}

Anomaly detection on attributed networks is concerned with finding nodes whose patterns or behaviors deviate significantly from the majority of reference nodes. It has been applied successfully in many real-world applications such as network intrusion detection, opinion spam detection, and system fault diagnosis, to name a few. Despite their empirical success, the vast majority of existing efforts are performed in an unsupervised scenario due to the expensive labeling costs of ground truth anomalies. In fact, in many scenarios, a small amount of prior human knowledge of the data is often effortless to obtain, and involving it in the learning process has been shown to be effective in advancing many important learning tasks. Additionally, since new types of anomalies may constantly arise over time, especially in an adversarial environment, the interests of the human expert could also change accordingly regarding the detected anomaly types. This brings further challenges to conventional anomaly detection algorithms, as they are often applied in a batch setting and are incapable of interacting with the environment. To tackle the above issues, in this paper, we investigate the problem of anomaly detection on attributed networks in an interactive setting by allowing the system to proactively communicate with the human expert in making a limited number of queries about ground truth anomalies. Our objective is to maximize the true anomalies presented to the human expert after a given budget is used up. Along this line, we formulate the problem through the principled multi-armed bandit framework and develop a novel collaborative contextual bandit algorithm, named GraphUCB. In particular, our developed algorithm: (1) explicitly models the nodal attributes and node dependencies seamlessly in a joint framework; and (2) handles the exploration-exploitation dilemma when querying anomalies of different types. Extensive experiments on real-world datasets show the improvement of the proposed algorithm over the state-of-the-art algorithms.
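The exploration-exploitation trade-off under a fixed query budget can be illustrated with the classic UCB1 rule. This is a deliberately simpler, non-contextual stand-in, not GraphUCB itself: it ignores nodal attributes and node dependencies. The `pull` callback (which would ask the expert about a node from the chosen group and return 1.0 for a true anomaly), the exploration constant `c`, and the toy setup are assumptions for illustration.

```python
from math import log, sqrt


def ucb1(n_arms, budget, pull, c=2.0):
    """Spend `budget` queries across `n_arms` using the UCB1 selection rule.

    pull(arm) -> reward in [0, 1]; here a hypothetical expert-feedback callback.
    Returns per-arm pull counts, per-arm total rewards, and the query history.
    """
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    history = []
    for t in range(1, budget + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before using the UCB index
        else:
            # Mean reward plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + sqrt(c * log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
        history.append((arm, reward))
    return counts, totals, history


# Toy setup: only arm 1 ever yields a true anomaly.
counts, totals, history = ucb1(
    n_arms=3, budget=30, pull=lambda arm: 1.0 if arm == 1 else 0.0
)
```

The bonus term makes rarely queried arms look attractive until enough feedback accumulates, so most of the budget ends up on the arm that keeps returning true anomalies while the others are still revisited occasionally.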


Experiences

  • Microsoft Research, Research Intern, 2021
  • Amazon Alexa AI, Applied Scientist Intern, 2020
  • Microsoft Research Asia, Research Intern, 2016, 2017
  • Chinese University of Hong Kong, Research Assistant, 2017
  • Meituan, Intern, 2015
  • Sogou, Intern, 2014

Services

Program Committee

CIKM'21, ECML-PKDD'21, EMNLP'21, ACL'21, IJCAI'21, IJCAI'20, ECML-PKDD'20

External Reviewer

KDD'19, WWW'19, SIGIR'18, ASONAM'18


Honors and Awards

  • ASU CIDSE Doctoral Fellowship, 2021
  • ASU Engineering Graduate Fellowship, 2019, 2020
  • ASU GPSA Travel Award, 2019
  • SDM Student Travel Award, 2019
  • Stars of Tomorrow (Award of Excellent Intern), Microsoft Research Asia, 2017

  Last update 2021