Anujmishra2005/Malware-Detection-in-Software-System-Using-Machine-Learning

Malware Detection Using Machine Learning

Project Overview

This project detects malware using several machine learning techniques: Logistic Regression, K-Nearest Neighbors (KNN), Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Random Forest. The goal is to analyze a dataset (Malware_dataset.csv) and build models that predict, from its features, whether a given software sample is malicious.

Technologies Used

  • Programming Languages: Python
  • Libraries:
    • numpy
    • pandas
    • matplotlib
    • seaborn
    • scikit-learn
    • tensorflow (for ANN and CNN)
    • keras
  • Tools:
    • Jupyter Notebook for interactive development
    • GitHub for version control

Dataset

The dataset used in this project is Malware_dataset.csv, which contains attributes of software samples, such as byte sequences and file characteristics, along with labels indicating whether each sample is benign or malicious.

  • Dataset Source: Kaggle (or specify your dataset source here).
  • Features:
    • Features may include file size, byte-level data, execution time, and others.
    • The label indicates whether a sample is benign or malicious.

Models Implemented

Logistic Regression

Logistic Regression is a basic model that is often used for binary classification problems. It outputs probabilities to predict the class label.
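
To make the "outputs probabilities" step concrete, here is a minimal sketch of how a trained logistic regression model turns features into a probability and then a label. It is written in plain Python for illustration and is not the project's code; the weights, bias, and two-feature input are hypothetical values, not taken from Malware_dataset.csv.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class (assumed here to mean 'malicious')."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a class label by thresholding."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters for two made-up features.
weights = [2.0, -1.0]
bias = -0.5

print(predict_proba([2.0, 0.5], weights, bias))
print(predict_label([2.0, 0.5], weights, bias))
```

The threshold of 0.5 is a common default; lowering it trades more false alarms for fewer missed malicious samples.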

K-Nearest Neighbors (KNN)

KNN is a simple algorithm that classifies a data point by the majority class among its k nearest neighbors in feature space.
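
The neighbor vote can be sketched in a few lines of plain Python. This is an illustrative sketch using Euclidean distance, not the project's implementation; the training points and labels below are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Invented two-feature samples: 0 = benign, 1 = malicious.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1.1, 1.0]))
print(knn_predict(train_X, train_y, [5.1, 5.0]))
```

Because the vote depends on raw distances, features on very different scales should be scaled first; an odd k avoids ties in binary classification.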

Artificial Neural Network (ANN)

An ANN is a model built from layers of connected units, loosely inspired by the structure of the human brain. It can learn non-linear relationships in high-dimensional input, including feature vectors, images, and sequence data.
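
As a rough illustration of what such a network computes, the sketch below runs the forward pass of a tiny network with one hidden layer (ReLU) and a sigmoid output. It is plain Python rather than the TensorFlow/Keras code the project uses, and every weight in `params` is a hypothetical value chosen for the example, not a trained parameter.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]

def forward(features, params):
    """Return the predicted probability of the positive class."""
    hidden = relu(dense(features, params["W1"], params["b1"]))
    logit = dense(hidden, params["W2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
params = {
    "W1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, -1.0],
    "W2": [[1.0, -2.0]],
    "b2": [0.0],
}

print(forward([3.0, 0.0], params))
```

Training adjusts the weights by gradient descent on a loss function; a framework such as Keras handles that step.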

Convolutional Neural Network (CNN)

CNNs are a class of deep learning models commonly used for image classification, but they are also applied to sequence data such as time series or the byte sequences used in malware detection.
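
The core operation of a CNN on sequence data is a one-dimensional convolution: a small kernel slides along the sequence and responds where a local pattern appears. The sketch below shows that operation in plain Python, followed by ReLU and global max pooling. It is illustrative only; the kernel and the input sequence are made-up values, and a real model would learn many kernels with a framework such as Keras.

```python
def conv1d(sequence, kernel):
    """Slide `kernel` over `sequence` (no padding, stride 1).

    As in deep learning libraries, the kernel is not flipped.
    """
    k = len(kernel)
    return [
        sum(kernel[j] * sequence[i + j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]

def relu(values):
    """Keep positive responses, zero out the rest."""
    return [max(0, v) for v in values]

def global_max_pool(values):
    """Summarize a feature map by its strongest response."""
    return max(values)

# Made-up sequence with one sharp rise, and a kernel that detects rises.
sequence = [0, 0, 5, 5, 0]
kernel = [-1, 1]

feature_map = relu(conv1d(sequence, kernel))
print(feature_map)                   # [0, 5, 0, 0]
print(global_max_pool(feature_map))  # 5
```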

Random Forest

Random Forest is an ensemble learning technique that combines the predictions of multiple decision trees to improve accuracy and reduce overfitting.
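
The ensemble idea can be sketched with a deliberately simplified stand-in: each "tree" below is a one-split decision stump trained on a bootstrap sample and a randomly chosen feature, and the forest predicts by majority vote. This is not a full Random Forest (real trees are deeper and choose features per split), and it is not the project's code; in practice scikit-learn's `RandomForestClassifier` would be used. The data is invented for the example.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for above in (0, 1):  # label assigned to values greater than t
            preds = [above if row[feature] > t else 1 - above for row in X]
            errors = sum(p != label for p, label in zip(preds, y))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, t, above = best
    return feature, t, above

def stump_predict(stump, row):
    feature, t, above = stump
    return above if row[feature] > t else 1 - above

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def forest_predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Invented two-feature samples: 0 = benign, 1 = malicious.
X = [[1.0, 0.20], [1.2, 0.10], [0.9, 0.30], [1.1, 0.25],
     [5.0, 0.90], [5.5, 0.80], [4.8, 0.95], [5.2, 0.85]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(forest_predict(forest, [0.5, 0.05]))
print(forest_predict(forest, [6.0, 0.99]))
```

Averaging many trees that each saw slightly different data is what reduces the variance of a single overfit tree.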

Feature Extraction

Feature extraction transforms raw data into a format the machine learning models can use. Common techniques include scaling, normalization, and encoding of categorical values.
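
The three techniques named above can be sketched in plain Python as follows. This is an illustration, not the project's preprocessing code; the sample values and the "benign"/"malicious" label strings are assumptions made for the example.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order."""
    mapping = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping

print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(standardize([1, 2, 3]))
print(encode_labels(["benign", "malicious", "benign"]))
```

To avoid leaking information, the scaling statistics (min, max, mean, standard deviation) should be computed on the training split only and then reused on the test split.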

Model Evaluation

Model performance is evaluated using metrics such as:

  • Accuracy: Measures the percentage of correctly classified samples.
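  • Precision and Recall: Treating malicious as the positive class, precision is the fraction of samples predicted malicious that really are malicious, and recall is the fraction of malicious samples the classifier detects.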
  • F1-Score: The harmonic mean of precision and recall.
  • Confusion Matrix: Provides a detailed breakdown of the true and false predictions made by the classifier.
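
The metrics above all derive from four counts: true positives, false positives, true negatives, and false negatives. The sketch below computes them in plain Python, assuming label 1 means malicious; the label lists are invented for the example, and the function names are illustrative rather than taken from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def evaluate(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        # Rows are actual classes, columns are predicted classes.
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(evaluate(y_true, y_pred))
```

For malware detection, recall often matters more than accuracy, since a missed malicious sample (false negative) is usually costlier than a false alarm.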

About

This project uses machine learning to detect malware in software systems, using techniques such as ANN, CNN, KNN, PCA, Random Forest, and Logistic Regression. By analyzing data patterns, it automatically classifies software as legitimate or malicious. It is an AI-driven solution that strengthens cybersecurity and combats emerging…
