sql-data-profiler

sql-data-profiler is a utility module that generates sql code to profile data sets (eg. tables) in Redshift.

Installation

npm install sql-data-profiler --save

Quick start

var sql_data_profiler = require('sql-data-profiler');

Table profiler

Generates a sql statement that provides basic stats on the data in the "contacts" table.

var data_profiler = sql_data_profiler.data_profiler;
 
var options = {
    target_table: 'contacts',
    target_columns: ['email', 'a_industry']
};
var sql_code = data_profiler(options);

which accepts the following options

target_table
target_columns
results_table
calculate_frequency
use_perm_table
truncate_table

Distribution analysis

** What does this do?**

var distribution_analyzer = sql_data_profiler.distribution_analyzer;
 
var sql_code = distribution_analyzer({
    target_table: 'contacts',
    target_variable: {
        data_set: 'SELECT DISTINCT contact_id FROM events WHERE meta_event = \'conversion\'',
        join: {
            source_column: 'contact_id',
            target_column: 'contact_id'
        }
    },
    target_columns: [
        {
            name: 'a_name',
            max_number_of_values: 10,
        },
        {
            name: 'a_gender',
            max_number_of_values: 10
        },
        {
            name: 'a_address',
            max_number_of_values: 10,
            transformation: 'has_any_value'
        }
    ]
});

Table stats

The following stats are calculated for each column.

Stats	Description
count_total	number of records in the table
count_not_null	number of records where the value for the specified column is not null
fill_rate	number of non-null values divided by number of records
count_distinct	number of distinct values
dupe_rate	number of distinct values divided by number of records where the value for the specified column is not null
maximum_value
minimum_value
most_frequent_value_1
most_frequent_value_1_frequency
most_frequent_value_2
most_frequent_value_2
most_frequent_value_3
most_frequent_value_3

TO DO

handle different data type (eg. boolean)
performance improvement

sql-data-profiler

sql-data-profiler

Installation

Quick start

Table profiler

Distribution analysis

Table stats

TO DO

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

Weekly Downloads

Version

License

Last publish

Collaborators

sql-data-profiler

sql-data-profiler

Installation

Quick start

Table profiler

Distribution analysis

Table stats

TO DO

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

DownloadsWeekly Downloads

Version

License

Last publish

Collaborators

Weekly Downloads