EDA on sparse data for classification problems

Project description

Sparse profile - EDA on sparse data

Module to perform EDA tasks for a classification problem with sparse data
Currently accepts only numeric values

Sample usage

import pandas as pd
import numpy as np
from sparse_profile import sparse_profile

df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})
sProfile = sparse_profile(df, 'target')
print(sProfile.top_gain)

Output: the maximum gain obtained from each column

col_2    0.422810
col_1    0.074882
dtype: float64
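For intuition about what a "maximum gain" figure summarizes, here is a minimal sketch of one common definition: the relative information gain (entropy reduction divided by the target's entropy) of a binary split `x <= cutoff`, evaluated at the nine decile cutoffs. The helper names `entropy` and `relative_gain_at_deciles` are hypothetical and are not part of sparse_profile. The package's exact formula is not documented here, so the values this sketch produces need not match the output above.

```python
import numpy as np
import pandas as pd


def entropy(y):
    """Shannon entropy (in bits) of a binary 0/1 label array."""
    y = np.asarray(y)
    if y.size == 0:
        return 0.0
    p = y.mean()
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))


def relative_gain_at_deciles(x, y):
    """Hypothetical helper: relative gain of the split x <= cutoff per decile."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    base = entropy(y)
    gains = {}
    for q in range(1, 10):
        cutoff = float(np.quantile(x, q / 10))
        left = x <= cutoff
        n_left = int(left.sum())
        if n_left == 0 or n_left == len(x):
            continue  # degenerate split: every row falls on one side
        # Entropy of the target after the split, weighted by side size.
        cond = (n_left * entropy(y[left])
                + (len(x) - n_left) * entropy(y[~left])) / len(x)
        gains[q * 10] = (base - cond) / base if base > 0 else 0.0
    return pd.Series(gains, name="relative_gain")


df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})
gains = relative_gain_at_deciles(df['col_2'], df['target'])
print(gains.idxmax(), gains.max())
```

Under this definition the best split for col_2 on the sample data falls at the 40th percentile. The series is indexed by decile (10, 20, ..., 90), so its maximum is the per-column figure a ranking like top_gain would sort on.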

print(sProfile.report_sparsity)

Output: the fraction of zeros in each column (0.8 means 80% of the values are zero)

col_1  0.8
col_2  0.1
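These figures can be cross-checked with plain pandas. The following is a minimal sketch, assuming the sparsity report is the share of entries equal to zero in each feature column. It does not call sparse_profile and is not the package's implementation.

```python
import pandas as pd

df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})

# Drop the target so only feature columns are profiled.
features = df.drop(columns='target')

# Comparing against 0 gives a boolean frame; the column mean of
# booleans is the fraction of zeros in that column.
zero_fraction = (features == 0).mean()
print(zero_fraction)
```

col_1 has eight zeros in ten rows and col_2 has one, which agrees with the 0.8 and 0.1 reported above.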

The reports are exposed as attributes of the sparse_profile object. All available attributes:

  • report_sparsity: pandas dataframe, fraction of zeros in each column
  • report_distinct: pandas dataframe, count of distinct non-zero values in each column
  • report_overall: pandas dataframe, overall summary of each column (similar to pandas describe())
  • report_non_zero: pandas dataframe, summary of each column after removing zeros
  • gain_df: pandas dataframe, relative information gain at decile cutoffs for each column with respect to the target column
  • auc_df: pandas dataframe, AUC of each column with respect to the target column
  • top_gain: pandas dataframe, columns sorted by the maximum gain obtained from gain_df
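Several of the quantities in the list above can be approximated with plain pandas and NumPy, which helps when sanity-checking a report. The sketch below is an assumption about what these reports measure, not the package's code. The helper `rank_auc` is hypothetical and computes AUC from the Mann-Whitney U statistic with average ranks for ties.

```python
import numpy as np
import pandas as pd


def rank_auc(scores, labels):
    """Hypothetical helper: AUC of a raw column against a 0/1 target."""
    ranks = pd.Series(np.asarray(scores, dtype=float)).rank(method='average').to_numpy()
    labels = np.asarray(labels)
    pos = labels == 1
    n_pos = int(pos.sum())
    n_neg = int((~pos).sum())
    if n_pos == 0 or n_neg == 0:
        return float('nan')  # AUC is undefined with a single class
    # Mann-Whitney U of the positive class, normalised by the pair count.
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})
features = df.drop(columns='target')

# Distinct non-zero values per column.
distinct_non_zero = features.apply(lambda s: s[s != 0].nunique())

# Summary after removing zeros: masked entries become NaN,
# and describe() ignores NaN.
non_zero_summary = features[features != 0].describe()

# AUC of each raw column against the target.
auc = features.apply(lambda s: rank_auc(s, df['target']))

print(distinct_non_zero)
print(non_zero_summary)
print(auc)
```

An AUC below 0.5, as col_2 gets here, means larger values are associated with the negative class. Whether auc_df reports the raw value or a direction-adjusted one is not stated in this description.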

Download files

Source distribution: sparse_profile-0.1.1.tar.gz (4.9 kB)

Built distribution: sparse_profile-0.1.1-py3-none-any.whl (5.0 kB, Python 3)