
Call for Papers and Submission Guidelines

The irreversible dependence on computing technology has paved the way for cybersecurity's rapid emergence as one of modern society's grand challenges. To combat the ever-evolving, highly dynamic threat landscape, numerous academics and industry professionals are systematically searching through billions of log files, social media platforms (e.g., Dark Web), malware files, and other data sources to preemptively identify, mitigate, and remediate emerging threats and key threat actors. Artificial Intelligence (AI)-enabled analytics has started to play a pivotal role in sifting through large quantities of these heterogeneous cybersecurity data to execute fundamental cybersecurity tasks such as asset management, vulnerability prioritization, threat forecasting, and control allocations. However, the volume, velocity, variety, and veracity of cybersecurity data sharply contrast with those of conventional data sources. Furthermore, significant challenges need to be addressed before an AI-based system can be deployed and operated in practice as a critical component of cyber defense. Major challenges include the scale of the problem, adaptability, inference speed and efficiency, adversarial resilience, the urgent demand for explainability, and the need to integrate humans in the loop. Finally, industry and academic AI-enabled cybersecurity analytics are often siloed, which has slowed progress on addressing these challenges. To these ends, this workshop aims to convene academics and practitioners (from industry and government) to share, disseminate, and communicate completed research papers, work in progress, and review articles about AI-enabled cybersecurity analytics and deployable AI-based security defenses. Areas of interest include, but are not limited to:

Each manuscript must clearly articulate its data (e.g., key metadata, statistical properties, etc.), analytical procedures (e.g., representations, algorithm details, etc.), and evaluation setup and results (e.g., performance metrics, statistical tests, case studies, etc.). Providing these details will help reviewers better assess the novelty, technical quality, and potential impact. Making data, code, and processes publicly available to facilitate scientific reproducibility is not required. However, it is strongly encouraged, as it can help foster a culture of data/code sharing in this quickly developing discipline.

All submissions must be in PDF format and formatted according to the new Standard ACM Conference Proceedings Template. Initial submissions are limited to 4 pages, excluding references and supplementary materials. Upon acceptance, authors may include one additional page (5 pages total) in the camera-ready version to address reviewer comments. Authors should use supplementary material only for minor details that do not fit within the 4 pages but enhance the scientific reproducibility of the work (e.g., model parameters). All reviews are double-blind, so author names and affiliations should NOT be listed. For accepted papers, at least one author must attend the workshop to present the work. Based on the reviews received, accepted papers will be designated as contributed talks (four total, 15 minutes each) or as posters. All accepted papers will be posted on the workshop website (they will not appear in the proceedings, per ACM KDD workshop regulations).


This workshop will be held on 8/15 from 1 PM to 5 PM Eastern Time. The agenda for the workshop is as follows:

1:00 - 1:05 pm ET Opening Remarks by Workshop Co-Chairs
1:05 - 2:00 pm ET Keynote 1
ML-based Security for web3 world of crypto
(​​Rajarshi Gupta)
2:00 - 3:20 pm ET Full Paper Presentations (20 min each paper, including Q&A)

2:00 - 2:20 pm ET
Bolstering Binary Datasets for Malware Detection Through Programmatic Data Augmentation; Michael D. Wong, Edward Raff, James Holt, and Ravi Netravali. (Paper)

2:20 - 2:40 pm ET
Firenze: Model Evaluation Using Weak Signals; Bhavna Soman, Ali Torkamani, Michael J. Morais, Jeffrey Bickford, and Baris Coskun. (Paper)

2:40 - 3:00 pm ET
Ensemble of Deep Learning Models for Detecting DGA Malware; Kuan Tung, Kristina Hardi, Iris Safaka, and Theus Hossmann. (Paper)

3:00 - 3:20 pm ET
Sourcing Language Models and Text Information for Inferring Cyber Threat, Vulnerability and Mitigation Relationships; Erik Hemberg, Ashwin Srinivasan, Nick Rutar, and Una-May O’Reilly. (Paper)

3:20 - 3:30 pm ET Break
3:30 - 4:30 pm ET Keynote 2
Why I Hate Parsers and You Should Too
(Edward Raff)
4:30 - 4:55 pm ET Lightning talks (7 min each paper, including Q&A)

Using Synthetic Data to Reduce Model Convergence Time in Federated Learning
Audio Deepfake Detection: Do we have to use complex feature sets for success?
UDAW: Unsupervised Detection of Anomalies using Word2vec
4:55 - 5:00 pm ET Closing Remarks

Keynote Speakers

Keynote Title: Why I Hate Parsers and You Should Too

Abstract: Cybersecurity at large, and malware analysis in particular, often spans multiple file formats and technical details. The research community has invested extraordinary effort in dynamic analysis, sandboxing, kernel hooks, and other technologies that delve deep into parsing all of these formats to obtain the most accurate system possible. However, the research community often neglects the runtime constraints of real-world solutions. This talk will cover why avoiding parsing is good for AI research and better for practical cybersecurity, and how we’ve pushed forward in this direction.
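As a rough illustration of the parsing-free idea the abstract alludes to (this sketch is our own, not from the talk): a file can be featurized as raw bytes, with no format-specific parser at all, e.g., via a byte-value histogram plus an entropy estimate.

```python
# Illustrative sketch (not the speaker's method): featurize any file as raw
# bytes, with no PE/ELF/PDF parsing, using a byte histogram and entropy.
import math
from collections import Counter

def byte_features(data: bytes) -> list:
    """Return a 257-dim vector: normalized byte histogram + Shannon entropy."""
    counts = Counter(data)
    n = len(data) or 1
    hist = [counts.get(b, 0) / n for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)  # bits per byte
    return hist + [entropy]

# Works identically on any file format, since nothing is parsed.
feats = byte_features(b"MZ\x90\x00" + bytes(16))
print(len(feats))  # 257
```

Because the feature extractor never interprets the format, it cannot be derailed by malformed headers, and its runtime is linear in file size.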

BIO: Dr. Edward Raff is a Chief Scientist at Booz Allen Hamilton, a visiting professor at the University of Maryland, Baltimore County, and chair of the Conference on Applied Machine Learning and Information Systems. His research covers broad areas of basic and applied machine learning, including reproducibility, adversarial attacks/defenses, and high-performance computing, all of which merge at the intersection of machine learning and malware detection/analysis. Dr. Raff’s work, spanning academia, government, and industry, has won five best paper awards.

Keynote Title: ML-based Security for web3 world of crypto

Abstract: The world of web3 encompasses billions of dollars in assets across DeFi, DAOs, NFTs, and cryptocurrencies. This also makes it a lucrative target for bad actors. Machine Learning is one of the most powerful tools for securing the web3 ecosystem. We use anomaly detection algorithms to identify fraud and malicious activity as it affects our users. However, the details of those algorithms differ due to the unique nature of web3, where all transactions are public but all information is pseudonymized. In this talk, I will touch upon the wide applications of ML at Coinbase to secure the web3 ecosystem: the novel challenges of managing crypto data, areas where we have unique insights, and research areas with unsolved challenges still to tackle.
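To make the anomaly-detection idea concrete (a hypothetical sketch of our own, not Coinbase's system): one simple baseline flags transaction amounts whose robust (MAD-based) z-score deviates sharply from an address's history.

```python
# Hypothetical illustration (not the speaker's algorithm): flag transaction
# amounts far from an address's historical median, using the median absolute
# deviation (MAD) so a single huge outlier cannot mask itself.
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return indices whose modified z-score exceeds threshold."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread: nothing can be called anomalous
    # 0.6745 scales MAD to approximate a standard deviation for normal data.
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > threshold]

history = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.0, 250.0]
print(flag_anomalies(history))  # [8]
```

Median/MAD statistics are preferred over mean/standard deviation here because a large fraudulent transfer inflates the mean and standard deviation enough to hide itself from an ordinary z-score test.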

BIO: Rajarshi Gupta is the Head of Machine Learning at Coinbase, leading ML science, engineering, and ML platforms. Prior to this, Rajarshi was GM of ML Services at AWS, building a new service that uses AI to solve real-world enterprise challenges. Earlier, as VP and Head of AI at Avast Software, he was responsible for AI products serving Avast’s 450M consumers. Rajarshi also worked for many years at Qualcomm Research, where he created ‘Smart Protect’, the first product to achieve on-device machine learning for security, shipped in over 1 billion Snapdragon chipsets. Rajarshi has a PhD in EECS from UC Berkeley and has built a unique expertise at the intersection of Artificial Intelligence and Cybersecurity. Rajarshi has authored 200+ U.S. patents and is featured on the Wikipedia page of ‘most prolific inventors’ in history.

Key Dates

Submission Site

Submission Site: EasyChair Submission

Workshop Co-Chairs

Dr. Sagar Samtani
Indiana University

Dr. Gang Wang
University of Illinois

Dr. Ali Ahmadzadeh
Blue Hexagon

Dr. Jay Yang
Rochester Institute of Technology

Dr. Hsinchun Chen
University of Arizona

Program Committee (listed alphabetically based on last name)