Data Mining and Data Warehousing Lab

This repository contains practical implementations of data mining and data warehousing tasks. The projects apply machine learning models, data preprocessing techniques, and clustering algorithms to datasets covering medical records, fuel consumption, customer segmentation, and academic performance.

1. Diabetes Dataset

Read diabetes.csv. The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on. Complete the following tasks in Python:

Tasks:

a) Show the patient counts using a pie chart.
b) Handle missing values (if any) using the mean for one column, the median for another, and the mode for a third.
c) Plot the boxplot of the pre-processed dataset.
d) Compare the performance of ML models such as LR, SVM, and DT.
e) Show the confusion matrix of your results.
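
As a rough illustration of task (b), the sketch below fills missing values with the mean, median, and mode on a small made-up frame. The rows are invented stand-ins for diabetes.csv, and the choice of which column gets which statistic is an assumption, not the notebook's actual choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for diabetes.csv; the values are made up.
df = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0],
    "BMI": [33.6, np.nan, 23.3, 28.1, 43.1],
    "Pregnancies": [6.0, 1.0, 1.0, np.nan, 0.0],
    "Outcome": [1, 0, 1, 0, 1],
})

# Mean for one column, median for another, mode for a third.
df["Glucose"] = df["Glucose"].fillna(df["Glucose"].mean())
df["BMI"] = df["BMI"].fillna(df["BMI"].median())
df["Pregnancies"] = df["Pregnancies"].fillna(df["Pregnancies"].mode()[0])

# Class counts: the numbers a pie chart of patients per Outcome would show.
counts = df["Outcome"].value_counts()

print(df)
print(counts)
```

If the file records missing measurements as 0 rather than NaN (an assumption to check against the data), those zeros would need to be replaced with NaN before imputing.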

View the Jupyter Notebook for this task

2. Petrol Consumption Dataset

Read petrol_consumption.csv and complete the following tasks in Python:

Tasks:

a) Predict the fuel consumption using multiple linear regression.
b) Show and compare the results using 70:30 and 80:20 train-test splits.
c) Show the actual and predicted values in a scatter plot for the 80:20 split.
d) Find the Mean Absolute Error.
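
The tasks above can be sketched as follows. This is a minimal example on synthetic data, not the notebook's code: the feature matrix, coefficients, and `random_state` values are assumptions chosen so the block runs without petrol_consumption.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for petrol_consumption.csv: 4 made-up features, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + 10.0 + rng.normal(scale=0.1, size=200)

results = {}
for test_size in (0.30, 0.20):  # 70:30 and 80:20 splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[test_size] = mean_absolute_error(y_test, y_pred)

for test_size, mae in results.items():
    print(f"test_size={test_size:.2f}  MAE={mae:.4f}")
```

After the loop, `y_test` and `y_pred` hold the values from the 80:20 split, so they are the pair a scatter plot of actual versus predicted values would use.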

View the Jupyter Notebook for this task

3. Mall Customers Dataset

Load the Mall_Customers.csv

Tasks:

a) Visualize male and female customer spending scores.
b) Find the ideal number of k using the elbow method.
c) Apply k-means clustering using 4 clusters and 5 clusters.
d) Draw the graph.
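
A minimal sketch of the elbow method and the two k-means runs, assuming synthetic two-dimensional points from `make_blobs` in place of Mall_Customers.csv. The range of k values (1 to 10) and `n_init=10` are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points standing in for two customer features.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=0)

# Elbow method: record the within-cluster sum of squares (inertia) for each k.
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Cluster assignments for the two settings named in the task.
labels = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    for k in (4, 5)
}

for k, inertia in enumerate(inertias, start=1):
    print(f"k={k:2d}  inertia={inertia:.1f}")
```

Plotting `inertias` against k gives the elbow curve; the "ideal" k is usually read off where the curve stops dropping sharply.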

View the Jupyter Notebook for this task

4. Marks Dataset

Load the Marks.csv file. Then do the following:

Tasks:

a) Write the statement to display the first and third quartiles of all subjects
b) Find the standard deviation and variance of each subject
c) Find the summary of the data
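
These three tasks map onto standard pandas calls. The sketch below uses a made-up frame; the subject names and marks are hypothetical and do not come from Marks.csv.

```python
import pandas as pd

# Hypothetical marks; the real Marks.csv columns may differ.
marks = pd.DataFrame({
    "Math": [50, 60, 70, 80, 90],
    "Physics": [40, 55, 65, 75, 95],
})

# a) First and third quartiles of every subject.
quartiles = marks.quantile([0.25, 0.75])

# b) Sample standard deviation and variance (pandas defaults to ddof=1).
std = marks.std()
var = marks.var()

# c) Summary statistics: count, mean, std, min, quartiles, max.
summary = marks.describe()

print(quartiles)
print(std)
print(var)
print(summary)
```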

View the Jupyter Notebook for this task

5. Label Encoding

This section covers the Label Encoding technique, which transforms categorical data into a numerical format.

Tasks:

Apply Label Encoding to categorical variables in datasets.
Visualize the transformations.
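
A minimal sketch of Label Encoding with scikit-learn's `LabelEncoder`, assuming a hypothetical `Gender` column. The notebook may use a different dataset or column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column.
df = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# LabelEncoder assigns integer codes to the sorted unique labels.
encoder = LabelEncoder()
df["Gender_encoded"] = encoder.fit_transform(df["Gender"])

# Recover the label -> code mapping to inspect the transformation.
mapping = {label: code for code, label in enumerate(encoder.classes_)}

print(df)
print(mapping)
```

Because the codes are ordered integers, some models may read them as an ordering between categories, which is one reason One Hot Encoding is often preferred for nominal features.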

View the Label Encoding Notebook

6. One Hot Encoding

This section focuses on One Hot Encoding for converting categorical data into a format suitable for machine learning algorithms.

Tasks:

Apply One Hot Encoding to transform categorical variables.
Show how to handle categorical features in machine learning pipelines.
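
One way to handle a categorical feature inside a machine learning pipeline is sketched below, assuming scikit-learn's `ColumnTransformer` and `OneHotEncoder`. The column names, the `HighSpender` target, and the choice of `LogisticRegression` are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical feature, one numeric feature, one target.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 35, 42, 50, 63],
    "HighSpender": [0, 1, 1, 0, 1, 0],
})
features = df[["Gender", "Age"]]

# One-hot encode Gender; pass Age through unchanged.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender"])],
    remainder="passthrough",
)

# The encoder is fitted as part of the pipeline, so the same encoding
# is applied at training and prediction time.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(features, df["HighSpender"])

encoded = pipeline.named_steps["preprocess"].transform(features)
predictions = pipeline.predict(features)

print(encoded.shape)  # two one-hot columns for Gender plus Age
print(predictions)
```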

View the One Hot Encoding Notebook

7. LR_SVM_DT_KNN_MLP_RF_GB_LGB

This section focuses on the performance comparison of multiple classifiers such as Logistic Regression (LR), SVM, Decision Trees, KNN, MLP, Random Forest (RF), Gradient Boosting (GB), and LightGBM (LGB).

Tasks:

Train multiple classifiers on the diabetes dataset.
Compare the performance using accuracy, confusion matrix, and F1 score.
Plot the results for visualization.
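
The comparison loop might look like the sketch below. It is not the notebook's code: it uses synthetic data from `make_classification` instead of the diabetes dataset, and it leaves out MLP and LightGBM to stay short and to depend only on scikit-learn. The hyperparameters are library defaults apart from the fixed seeds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the diabetes dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale-sensitive models get a StandardScaler; tree-based models do not need one.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, s in scores.items():
    print(f"{name:4s} accuracy={s['accuracy']:.3f}  f1={s['f1']:.3f}")
```

The `scores` dictionary holds everything a bar chart of accuracy or F1 per model would need.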

View the Classifier Comparison Notebook

8. Assignment

This assignment focuses on applying data preprocessing techniques to a dataset.

Tasks:

Implement Label Encoding and One Hot Encoding to handle categorical data.
Plot correlation heatmaps to visualize relationships between variables.
Apply standardization to scale features for model training.
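
A brief sketch of the correlation and standardization steps, assuming a made-up numeric frame. The column names and distributions are hypothetical, and the assignment's actual dataset is not specified here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height_cm": rng.normal(170, 10, 100),
    "weight_kg": rng.normal(70, 15, 100),
})

# Pairwise Pearson correlations: the matrix a heatmap would display.
corr = df.corr()

# Standardization: subtract each column's mean and divide by its
# population standard deviation, giving mean 0 and unit variance.
scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

print(corr)
print(scaled.describe().loc[["mean", "std"]])
```

In a modeling workflow the scaler would normally be fitted on the training split only and then applied to the test split, to avoid leaking test statistics into training.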

View the Assignment Notebook

Getting Started

Clone the repository:

git clone https://github.com/nishatrhythm/Data-Mining-and-Data-Warehousing-Lab.git

Prerequisites

Ensure that you have Python installed, along with all necessary dependencies. You can install the dependencies using the requirements.txt file:

pip install -r requirements.txt

Usage

Navigate to the respective dataset directory and run the corresponding Python scripts or open the Jupyter notebooks to experiment with the code.

License

This project is licensed under the MIT License - see the LICENSE file for details.
