Malware Detection Using Machine Learning

Project Overview

This project detects malware using several machine learning techniques: Logistic Regression, K-Nearest Neighbors (KNN), an Artificial Neural Network (ANN), a Convolutional Neural Network (CNN), and Random Forest. The goal is to analyze a dataset (Malware_dataset.csv) and build models that predict, from its features, whether a given software sample is malicious or benign.

Technologies Used

Programming Languages: Python
Libraries:
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
- tensorflow (for ANN and CNN)
- keras
Tools:
- Jupyter Notebook for interactive development
- GitHub for version control

Dataset

The dataset used in this project is Malware_dataset.csv, which contains attributes of software samples, such as byte sequences and file characteristics, together with a label indicating whether each sample is benign or malicious.

Dataset Source: Kaggle (or specify your dataset source here).
Features:
- Features may include file size, byte-level data, and execution time, among others.
- The label indicates whether a software sample is benign or malicious.
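As a rough sketch of how such a dataset could be loaded and split into features and label with pandas: the column names (`file_size`, `entropy`, `label`) and the label values below are assumptions for illustration, not the actual schema of Malware_dataset.csv, and the three rows are made up.

```python
import io

import pandas as pd


def split_features_and_label(df, label_col="label"):
    """Separate the feature columns from the label column."""
    X = df.drop(columns=[label_col])
    y = df[label_col]
    return X, y


# Stand-in for Malware_dataset.csv: three made-up rows with hypothetical columns.
sample_csv = io.StringIO(
    "file_size,entropy,label\n"
    "1024,3.2,benign\n"
    "20480,7.8,malicious\n"
    "4096,4.1,benign\n"
)

# For the real file this would be pd.read_csv("Malware_dataset.csv").
df = pd.read_csv(sample_csv)
X, y = split_features_and_label(df)

# Encode the label as 1 = malicious, 0 = benign (assumed label values).
y_binary = (y == "malicious").astype(int)
print(X.shape, y_binary.tolist())
```

If the real label column has a different name or is already numeric, only `label_col` and the encoding line would need to change.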

Models Implemented

Logistic Regression

Logistic Regression is a linear model commonly used for binary classification. It outputs a probability for the positive class, which is then thresholded to predict the class label.
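A minimal sketch of this with scikit-learn is shown here. It uses synthetic data from `make_classification` as a stand-in for the malware features, and the 0.5 threshold, split ratio, and scaling step are illustrative choices rather than the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the malware feature matrix (not the real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features, then fit the linear classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Probability of the positive ("malicious") class for each test sample.
proba = clf.predict_proba(X_test)[:, 1]

# Threshold the probabilities to obtain class labels.
pred = (proba >= 0.5).astype(int)
print("test accuracy:", (pred == y_test).mean())
```

Lowering the threshold below 0.5 would flag more samples as malicious, trading precision for recall.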

K-Nearest Neighbors (KNN)

KNN is a simple algorithm that classifies a data point by the majority class among its k nearest neighbors in feature space.
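The majority vote can be seen on a tiny example. The six 2-D points below are made up for illustration, and k = 3 is an arbitrary choice.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D data (made up): class 0 clusters near the origin, class 1 near (5, 5).
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

# Each query point is assigned the majority class of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

predictions = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(predictions)  # [0 1]
```

Because KNN relies on distances, features on very different scales (for example file size in bytes next to a ratio between 0 and 1) are usually scaled first.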

Artificial Neural Network (ANN)

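An ANN is a model made of layers of interconnected units, loosely inspired by the structure of the human brain. It can learn non-linear relationships between input features and is applicable to tabular feature vectors as well as higher-dimensional inputs such as images or sequence data.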
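A small feed-forward network of this kind might look as follows in Keras. This is only a sketch: the layer sizes, epoch count, and the synthetic 20-feature data are assumptions, not the architecture or data used in the project.

```python
from sklearn.datasets import make_classification
from tensorflow import keras

# Synthetic stand-in for the malware feature matrix (not the real dataset).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X = X.astype("float32")

# Two hidden layers and a sigmoid output for binary classification.
ann = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hold out 20% of the rows for validation while training.
ann.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)

# Each output is the predicted probability of the positive class.
ann_proba = ann.predict(X[:5], verbose=0)
print(ann_proba.shape)
```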

Convolutional Neural Network (CNN)

CNNs are a class of deep learning models commonly used for image classification. They can also be applied to sequence data, such as time series or the byte sequences of software samples in malware detection.
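As an illustration, a 1-D CNN can treat each sample's feature vector as a short sequence with a single channel. The sketch below assumes synthetic data and arbitrary layer sizes; convolving over feature vectors only makes sense when neighboring positions are related (as with raw byte sequences), which is an assumption here.

```python
import numpy as np
from sklearn.datasets import make_classification
from tensorflow import keras

# Synthetic stand-in data, reshaped to (samples, sequence_length, channels).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_seq = X.astype("float32")[..., np.newaxis]  # shape: (400, 20, 1)

# One convolution + pooling stage, followed by a small dense classifier head.
cnn = keras.Sequential([
    keras.Input(shape=(20, 1)),
    keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling1D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
cnn.fit(X_seq, y, epochs=3, batch_size=32, verbose=0)

# Each output is the predicted probability of the positive class.
cnn_proba = cnn.predict(X_seq[:5], verbose=0)
print(cnn_proba.shape)
```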

Random Forest

Random Forest is an ensemble learning technique that combines the predictions of multiple decision trees to improve accuracy and reduce overfitting.
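A sketch with scikit-learn, again on synthetic stand-in data; the number of trees and the split ratio are illustrative defaults, not the project's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the malware feature matrix (not the real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 100 trees, each trained on a bootstrap sample of the training rows.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))

# Impurity-based importance of each feature; the values sum to 1.
importances = rf.feature_importances_
print(importances.round(3))
```

The feature importances give a rough indication of which attributes the forest relies on most, which can help when deciding which features to keep.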

Feature Extraction

Feature extraction transforms raw data into a numeric format that the machine learning models can use. Common preprocessing techniques applied at this stage include scaling, normalization, and encoding of categorical values.
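The three techniques mentioned above can be combined in a single scikit-learn `ColumnTransformer`. The column names and values in this sketch are hypothetical; the real dataset's columns may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Made-up rows with hypothetical column names.
df = pd.DataFrame({
    "file_size": [1024, 20480, 4096, 8192],
    "entropy": [3.2, 7.8, 4.1, 6.5],
    "file_type": ["exe", "dll", "exe", "sys"],
})

preprocess = ColumnTransformer(
    [
        # Scaling: zero mean and unit variance.
        ("standardize", StandardScaler(), ["file_size"]),
        # Normalization: rescale to the [0, 1] range.
        ("minmax", MinMaxScaler(), ["entropy"]),
        # Encoding: one binary column per category.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["file_type"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

features = preprocess.fit_transform(df)
print(features.shape)  # (4, 5)
```

Fitting the transformer on the training split only, then applying it to the test split, avoids leaking test-set statistics into training.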

Model Evaluation

Model performance is evaluated using various metrics such as:

Accuracy: Measures the percentage of correctly classified samples.
Precision and Recall: Precision is the fraction of samples predicted as malicious that are actually malicious; recall is the fraction of actually malicious samples that the classifier detects.
F1-Score: The harmonic mean of precision and recall.
Confusion Matrix: Provides a detailed breakdown of the true and false predictions made by the classifier.
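These metrics can be computed with scikit-learn. The labels and predictions below are made up to keep the arithmetic easy to check by hand (1 = malicious, 0 = benign): 3 true positives, 1 false negative, 2 false positives, and 4 true negatives.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up ground truth and predictions for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # (3 + 4) / 10 = 0.7
precision = precision_score(y_true, y_pred)  # 3 / (3 + 2) = 0.6
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)                # 2 * 0.6 * 0.75 / (0.6 + 0.75)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(accuracy, precision, recall, round(f1, 3))
print(cm.tolist())  # [[4, 2], [1, 3]]
```

In malware detection a false negative (a missed malicious sample) is often costlier than a false positive, so recall is usually worth reporting alongside accuracy.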

