Jananijana2712/CyberSecurity_Classification

CyberSecurity Classification

Overview

This repository contains an experimental machine learning study conducted on a cybersecurity-related text classification dataset.

The dataset was provided by junior students as part of their undergraduate project work. I explored several feature extraction techniques and classification models to compare different approaches to text classification and predictive analysis.

The objective was to experiment with multiple machine learning and deep learning techniques and compare their effectiveness on the given dataset.

Dataset

The study uses labeled text datasets drawn from discussions about Trump and the U.S. presidential election.

Files included:

  • LabelledDataset.csv
  • LabelledDataset.xls
  • UpdatedDataset.csv
  • cleaned_datasetChecked.csv

Data Preprocessing

The dataset was cleaned and prepared through:

  • Data cleaning
  • Duplicate removal
  • Missing value handling
  • Text normalization
  • Feature preparation
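
The cleaning steps listed above could look roughly like the following pandas sketch. The column names `text` and `label`, the helper names, and the exact normalization rules (lowercasing, URL and punctuation removal) are assumptions for illustration; the notebooks in this repository may handle these steps differently.

```python
import re

import pandas as pd


def normalize_text(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(df: pd.DataFrame, text_col: str = "text", label_col: str = "label") -> pd.DataFrame:
    """Hypothetical helper: handle missing values, normalize text, remove duplicates."""
    # Missing value handling: drop rows without text or without a label.
    df = df.dropna(subset=[text_col, label_col]).copy()
    # Text normalization.
    df[text_col] = df[text_col].astype(str).map(normalize_text)
    # Drop rows that became empty after normalization.
    df = df[df[text_col] != ""]
    # Duplicate removal, keeping the first occurrence of each text.
    df = df.drop_duplicates(subset=[text_col])
    return df.reset_index(drop=True)


# Toy rows standing in for the real CSV files (assumed schema).
raw = pd.DataFrame({
    "text": ["Vote NOW! https://example.com", "vote now", None, "Election results are in."],
    "label": [1, 1, 0, 0],
})
clean = preprocess(raw)
```

Normalizing before deduplicating means that rows differing only in case, punctuation, or links are treated as duplicates.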

Feature Extraction Techniques

Word Embedding

  • GloVe (Global Vectors for Word Representation)
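
As a rough illustration of how GloVe vectors are typically turned into model input, the sketch below parses GloVe's plain-text format (a token followed by its vector components on each line) and builds an embedding matrix indexed by word id. The function names, the toy vectors, and the `word_index` mapping are hypothetical; a real run would read one of the published GloVe files instead of the in-memory sample.

```python
import io

import numpy as np


def load_glove(handle):
    """Parse GloVe text format: each line is a token followed by its vector components."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with id i; row 0 is reserved for padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec  # out-of-vocabulary words keep the zero vector
    return matrix


# Tiny stand-in for a GloVe file (assumed values, 3 dimensions).
toy_glove = io.StringIO("vote 0.1 0.2 0.3\nelection 0.4 0.5 0.6\n")
vectors = load_glove(toy_glove)

# Hypothetical vocabulary; "ballot" is deliberately missing from the toy vectors.
word_index = {"vote": 1, "election": 2, "ballot": 3}
embedding_matrix = build_embedding_matrix(word_index, vectors, dim=3)
```

A matrix like this is commonly passed as the initial weights of an embedding layer, often frozen during training.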

Graph-Based Features

  • Graph Neural Network (GNN)

Probabilistic Features

  • Probabilistic Neural Network (PNN)

Models Implemented

Deep Learning Models

LSTM with GloVe Embeddings

  • Sequence modeling
  • Context-aware text representation
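
To make the sequence-modeling idea concrete, here is a minimal NumPy sketch of the forward pass of a single LSTM layer over a sequence of word vectors. It is not the Keras model used in the notebooks; the gate ordering, weight shapes, and random inputs are assumptions chosen only to show how the hidden state carries context from earlier tokens to later ones.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(inputs, W, U, b):
    """Run one LSTM layer over a sequence and return the final hidden state.

    inputs: (timesteps, input_dim)
    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    units = U.shape[0]
    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    for x_t in inputs:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                # input gate
        f = sigmoid(z[units:2 * units])       # forget gate
        g = np.tanh(z[2 * units:3 * units])   # candidate cell values
        o = sigmoid(z[3 * units:])            # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(0)
timesteps, embed_dim, units = 5, 3, 4

# Random vectors standing in for a GloVe-embedded sentence.
sequence = rng.normal(size=(timesteps, embed_dim))
W = rng.normal(scale=0.1, size=(embed_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

final_state = lstm_forward(sequence, W, U, b)
```

In a classifier, the final hidden state would usually feed a dense output layer that produces class scores.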

Graph Convolutional Network (GCN)

  • Graph-based classification
  • Node relationship learning
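
Node relationship learning in a GCN comes down to mixing each node's features with those of its neighbors. The NumPy sketch below shows one graph convolution layer using the commonly used symmetric normalization with self-loops. The toy graph, features, and weights are assumptions; how the text data was turned into a graph in this study is not described here.

```python
import numpy as np


def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Scale entry (i, j) by 1 / sqrt(deg_i * deg_j).
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)


# Toy path graph with three nodes: 0 - 1 - 2.
adjacency = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
features = np.eye(3)         # one-hot node features
weights = np.ones((3, 2))    # assumed weights; learned in practice

hidden = gcn_layer(adjacency, features, weights)
```

Stacking two such layers lets each node aggregate information from neighbors up to two hops away.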

CNN-Based Classification

  • Convolutional Neural Network for text classification
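
A convolutional text classifier usually slides filters over windows of consecutive word vectors and keeps the strongest response per filter. The following NumPy sketch shows that core operation (1D convolution, ReLU, max-over-time pooling). The shapes, filter count, and random inputs are illustrative assumptions, not the architecture from the notebooks.

```python
import numpy as np


def conv1d_max_pool(embedded, filters, bias):
    """Apply 1D convolution with ReLU, then max-over-time pooling.

    embedded: (seq_len, embed_dim) word vectors for one text
    filters:  (num_filters, window, embed_dim)
    bias:     (num_filters,)
    Returns one pooled feature per filter. Requires seq_len >= window.
    """
    seq_len = embedded.shape[0]
    num_filters, window, _ = filters.shape
    positions = seq_len - window + 1
    feature_maps = np.empty((positions, num_filters))
    for start in range(positions):
        patch = embedded[start:start + window]
        # Dot product of every filter with the current window of word vectors.
        feature_maps[start] = np.tensordot(filters, patch, axes=([1, 2], [0, 1])) + bias
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    return feature_maps.max(axis=0)               # strongest response per filter


rng = np.random.default_rng(1)
embedded = rng.normal(size=(7, 4))    # 7 tokens, 4-dimensional embeddings
filters = rng.normal(size=(5, 3, 4))  # 5 filters spanning 3 tokens each
bias = np.zeros(5)

pooled = conv1d_max_pool(embedded, filters, bias)
```

The pooled vector has a fixed length regardless of text length, which is what allows a dense classification layer to follow it.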

Probabilistic Models

Probabilistic Neural Network (PNN)

  • Probabilistic classification approach
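
For readers unfamiliar with PNNs, the sketch below implements the classic formulation: a Gaussian kernel is placed on every training sample, the kernel responses are averaged per class, and the class with the highest average wins. The class name, the `sigma` smoothing value, and the toy data are assumptions; the notebooks may use a different implementation or library.

```python
import numpy as np


class ProbabilisticNeuralNetwork:
    """Minimal Parzen-window PNN with a Gaussian kernel (illustrative only)."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma  # kernel width; assumed value, normally tuned

    def fit(self, X, y):
        # A PNN has no iterative training: it simply stores the samples.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Pattern layer: squared distance from each query to each stored sample.
        sq_dist = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        kernel = np.exp(-sq_dist / (2.0 * self.sigma ** 2))
        # Summation layer: average kernel response per class.
        scores = np.stack(
            [kernel[:, self.y_ == c].mean(axis=1) for c in self.classes_], axis=1
        )
        # Normalize to class probabilities, guarding against all-zero rows.
        totals = np.maximum(scores.sum(axis=1, keepdims=True), 1e-300)
        return scores / totals

    def predict(self, X):
        # Decision layer: pick the class with the largest score.
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


# Toy 2-D features standing in for text feature vectors.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_train = [0, 0, 1, 1]

pnn = ProbabilisticNeuralNetwork(sigma=0.5).fit(X_train, y_train)
predictions = pnn.predict([[0.2, 0.3], [5.1, 5.4]])
```

Because every prediction compares the query against all stored samples, memory and inference cost grow with the size of the training set.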

PNN + Random Forest

  • Hybrid classification model

PNN + Voting Classifier

  • Ensemble learning approach
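
The voting idea behind the ensemble variants can be sketched in plain NumPy. Hard voting takes the majority of predicted labels, while soft voting averages class probabilities before choosing a label. The three "models" below are hand-written probability tables used purely as stand-ins; which base models and which voting mode the notebooks use is not stated here.

```python
import numpy as np


def soft_vote(probabilities, weights=None):
    """Average class probabilities across models, then pick the top class."""
    stacked = np.stack(probabilities)  # (n_models, n_samples, n_classes)
    averaged = np.average(stacked, axis=0, weights=weights)
    return averaged.argmax(axis=1)


def hard_vote(predictions):
    """Majority vote over integer class labels (ties go to the lowest label)."""
    stacked = np.stack(predictions)  # (n_models, n_samples)
    return np.array([np.bincount(column).argmax() for column in stacked.T])


# Assumed outputs of three classifiers on three samples, two classes.
model_a = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
model_b = np.array([[0.6, 0.4], [0.7, 0.3], [0.3, 0.7]])
model_c = np.array([[0.8, 0.2], [0.45, 0.55], [0.6, 0.4]])
models = [model_a, model_b, model_c]

soft_labels = soft_vote(models)
hard_labels = hard_vote([m.argmax(axis=1) for m in models])
```

On the second sample the two schemes disagree: two models narrowly prefer class 1, but the third is confident enough in class 0 to tip the averaged probabilities.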

GloVe + Voting Classifier

  • Ensemble learning using word embeddings

Probabilistic CNN

  • Combination of probabilistic and convolutional techniques

Technologies Used

  • Python
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Scikit-Learn
  • TensorFlow
  • Keras

Repository Contents

  • Dataset Files
  • Data Preprocessing
  • Feature Extraction
  • Classification Models
  • Experimental Results
  • Jupyter Notebook Implementations

Purpose

This repository represents an independent exploration of machine learning techniques applied to a provided dataset. The work was performed to gain practical experience with:

  • Text Classification
  • Deep Learning Models
  • Graph-Based Learning
  • Ensemble Methods
  • Feature Engineering

Learning Outcomes

Through this study, I gained experience in:

  • Applying multiple classification algorithms
  • Comparing traditional and deep learning approaches
  • Working with GloVe embeddings
  • Understanding graph-based learning concepts
  • Evaluating ensemble classification techniques

Disclaimer

This was an experimental learning exercise conducted on a dataset provided by undergraduate students. The repository serves as a demonstration of machine learning experimentation and comparative model evaluation rather than a formal research project.

About

Experimental study on cybersecurity-related text classification using GloVe embeddings, GNN, PNN, CNN, Random Forest, and ensemble learning techniques.
