
Welcome to Suraj Datalab Documentation

Suraj Datalab is a Python package for streamlining common data analysis tasks. It provides tools for cleaning data, analyzing features, and preparing datasets for machine learning models. Whether you are working with categorical or numerical data, or need to create folds for cross-validation, Suraj Datalab has you covered.

Key Features

Data Analysis

  • Categorical Feature Analysis: Analyze the distribution of categorical features with respect to a target variable.
  • Numerical Feature Analysis: Explore the distribution of numerical features, including outlier detection and visualization.
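This page does not list the package's own function names or signatures, so the following is only a hypothetical sketch of both analyses in plain pandas. `categorical_target_summary` and `iqr_outliers` are illustrative names, not the Suraj Datalab API, and the outlier check assumes Tukey's 1.5 × IQR rule (plotting is omitted).

```python
import pandas as pd


def categorical_target_summary(df: pd.DataFrame, feature: str, target: str) -> pd.DataFrame:
    """Hypothetical helper: per-category target shares plus category counts."""
    # Rows are categories of `feature`; columns are target classes as row proportions
    shares = pd.crosstab(df[feature], df[target], normalize="index")
    counts = df[feature].value_counts().rename("count")
    return shares.join(counts)


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: boolean mask of values outside Tukey's fences."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "income": [30, 32, 31, 29, 33, 250],
    "churn": [0, 1, 0, 0, 1, 1],
})
print(categorical_target_summary(df, "city", "churn"))
print(df[iqr_outliers(df["income"])])  # only the income of 250 is flagged
```

Normalizing the crosstab by row makes categories of very different sizes comparable; the count column is kept so small categories are not over-interpreted.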

Data Cleaning

  • Missing Values Summary: Quickly generate a summary of missing values in your dataset.
  • Rare Category Replacement: Automatically identify and replace rare categories in your data to help make models more robust.
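As a rough illustration of these two cleaning steps, here is a sketch in plain pandas. The helper names, the `min_freq` threshold, and the `"RARE"` replacement label are all hypothetical choices for this example, not the package's actual parameters.

```python
import pandas as pd


def missing_values_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
    # Most incomplete columns first
    return summary.sort_values("missing", ascending=False)


def replace_rare_categories(series: pd.Series, min_freq: float = 0.05, label: str = "RARE") -> pd.Series:
    """Hypothetical helper: map categories whose share is below `min_freq` to `label`."""
    # value_counts ignores NaN, so missing values are left untouched
    freqs = series.value_counts(normalize=True)
    rare = freqs[freqs < min_freq].index
    return series.where(~series.isin(rare), label)


df = pd.DataFrame({
    "color": ["red"] * 6 + ["blue"] * 3 + ["teal"],
    "size": [1.0, None, 3.0, None, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})
print(missing_values_summary(df))
print(replace_rare_categories(df["color"], min_freq=0.15).value_counts())
```

In practice the frequency threshold should be computed on the training data only and then applied unchanged to validation and test data.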

Cross-Validation Preparation

  • K-Folds Creation: Easily create standard K-Folds for cross-validation.
  • Stratified K-Folds for Classification: Ensure balanced folds in classification tasks by using stratified K-Folds.
  • Stratified K-Folds for Regression: Create stratified K-Folds for regression tasks using various binning methods.
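A minimal sketch of the three fold strategies using scikit-learn's `KFold` and `StratifiedKFold`. It assumes, hypothetically, that fold ids are stored in a `kfold` column and that the regression target is split into equal-width bins whose count follows Sturges' rule. This is only one of the possible binning methods (quantile bins via `pd.qcut` are a common alternative), and none of these helper names come from the package itself.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, StratifiedKFold


def assign_folds(df: pd.DataFrame, splitter, labels=None) -> pd.DataFrame:
    """Hypothetical helper: copy of df with a 'kfold' column holding each row's fold id."""
    out = df.copy()
    out["kfold"] = -1
    col = out.columns.get_loc("kfold")
    # Every row lands in exactly one validation fold
    for fold, (_, valid_idx) in enumerate(splitter.split(out, labels)):
        out.iloc[valid_idx, col] = fold
    return out


def kfold(df, n_splits=5, seed=42):
    return assign_folds(df, KFold(n_splits=n_splits, shuffle=True, random_state=seed))


def stratified_kfold_classification(df, target, n_splits=5, seed=42):
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return assign_folds(df, splitter, df[target])


def stratified_kfold_regression(df, target, n_splits=5, seed=42):
    # Assumption: equal-width bins, count chosen by Sturges' rule (1 + log2 n)
    n_bins = int(np.floor(1 + np.log2(len(df))))
    bins = pd.cut(df[target], bins=n_bins, labels=False)
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return assign_folds(df, splitter, bins)


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=100),
    "y_cls": rng.integers(0, 2, size=100),
    "y_reg": rng.uniform(size=100),
})
print(kfold(df)["kfold"].value_counts().sort_index())
print(stratified_kfold_classification(df, "y_cls").groupby("kfold")["y_cls"].mean())
print(stratified_kfold_regression(df, "y_reg").groupby("kfold")["y_reg"].mean())
```

Binning turns a continuous target into pseudo-classes so that `StratifiedKFold` can keep the target distribution similar across folds; bins with fewer rows than `n_splits` trigger a scikit-learn warning.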

Getting Started

To get started with Suraj Datalab, check out the Usage Guide, which provides detailed examples and instructions on how to use each function in the package.

Installation

You can install Suraj Datalab by cloning the repository and installing the required dependencies:

git clone https://github.com/yourusername/suraj_datalab.git
cd suraj_datalab
pip install -r requirements.txt

Documentation

For more detailed information about each function and how to use it, refer to the API Reference.

Learn More

For detailed usage instructions, please visit the Usage Guide.

For more information about my work, other projects, or to get in touch, visit my personal website.


If you have any feedback or suggestions, feel free to open an issue on GitHub.