Welcome to Suraj Datalab Documentation
Suraj Datalab is a Python package designed to streamline data analysis tasks. It provides tools for data cleaning, feature analysis, and preparing datasets for machine learning models. Whether you are analyzing categorical or numerical features or need to create folds for cross-validation, Suraj Datalab has you covered.
Key Features
Data Analysis
- Categorical Feature Analysis: Analyze the distribution of categorical features with respect to a target variable.
- Numerical Feature Analysis: Explore the distribution of numerical features, including outlier detection and visualization.
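To make these two analyses concrete, here is a minimal plain-Python sketch. It assumes a categorical column is summarized by the row count and mean target per category, and that outliers are flagged with Tukey's IQR fences. The helper names `categorical_target_summary` and `iqr_outliers` are hypothetical and are not Suraj Datalab's actual API. The package itself may work on pandas DataFrames and use different rules.

```python
from collections import defaultdict
from statistics import quantiles


def categorical_target_summary(categories, target):
    """Return {category: (row_count, mean_target)} for each category value.

    Hypothetical helper illustrating categorical-vs-target analysis;
    not the package's real function.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    return {cat: (counts[cat], sums[cat] / counts[cat]) for cat in counts}


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; the package may use a different outlier rule.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Usage: mean of a binary target (e.g. churn) per city, plus IQR outliers.
city = ["A", "B", "A", "C", "B", "A"]
churn = [1, 0, 0, 1, 1, 1]
print(categorical_target_summary(city, churn))
print(iqr_outliers([10, 12, 11, 13, 12, 95]))  # [95]
```

A category with few rows, such as "C" above, produces a mean target that is much noisier than the means for larger categories. For that reason, the row count is reported alongside the mean.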
Data Cleaning
- Missing Values Summary: Quickly generate a summary of missing values in your dataset.
- Rare Category Replacement: Automatically identify and replace rare categories in your data to make models more robust.
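The following plain-Python sketch illustrates both cleaning steps. It assumes that rows are dicts with `None` marking a missing value, and that a category counts as "rare" when its relative frequency falls below a threshold. The names `missing_values_summary` and `replace_rare_categories`, the `min_freq` threshold, and the `"RARE"` placeholder are hypothetical and are not taken from the package.

```python
from collections import Counter


def missing_values_summary(rows, columns):
    """Return {column: (missing_count, missing_percent)}.

    Hypothetical helper: rows are dicts and None marks a missing value
    (a pandas-based version would check for NaN instead).
    """
    total = len(rows)
    summary = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        percent = 100.0 * missing / total if total else 0.0
        summary[col] = (missing, percent)
    return summary


def replace_rare_categories(values, min_freq=0.05, placeholder="RARE"):
    """Replace categories whose relative frequency is below min_freq.

    Hypothetical helper; min_freq and placeholder are illustrative defaults.
    """
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_freq else placeholder for v in values]


rows = [
    {"age": 34, "city": "A"},
    {"age": None, "city": "B"},
    {"age": 29, "city": None},
    {"age": None, "city": "A"},
]
print(missing_values_summary(rows, ["age", "city"]))
# {'age': (2, 50.0), 'city': (1, 25.0)}

colors = ["red"] * 5 + ["blue"] * 4 + ["teal"]
print(replace_rare_categories(colors, min_freq=0.2))
# "teal" (10% of rows) is replaced by "RARE"
```

Grouping rare values under a single label gives the model enough examples of that label to learn from. It also means that categories which appear only at prediction time can be mapped to the same placeholder.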
Cross-Validation Preparation
- K-Folds Creation: Easily create standard K-Folds for cross-validation.
- Stratified K-Folds for Classification: Preserve the class distribution in every fold by using stratified K-Folds.
- Stratified K-Folds for Regression: Create stratified K-Folds for regression tasks using various binning methods.
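The three fold strategies can be sketched in plain Python, with each function returning a fold number for every row. This is a minimal illustration, not the package's implementation, and all function names are hypothetical. For regression, it assumes one particular binning method: the continuous target is split into equal-count quantile bins, and the bins are then stratified as if they were classes. When no bin count is given, Sturges' rule (1 + log2(n)) chooses it. The package's own binning options may differ.

```python
import math
import random
from collections import defaultdict


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Shuffle row indices, then assign fold numbers round-robin."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [0] * n_samples
    for position, i in enumerate(order):
        folds[i] = position % n_splits
    return folds


def stratified_kfold_indices(labels, n_splits=5, seed=42):
    """Deal each class's rows round-robin across folds to keep class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [0] * len(labels)
    offset = 0  # carry the round-robin position across classes to balance fold sizes
    for members in by_class.values():
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[i] = (offset + position) % n_splits
        offset += len(members)
    return folds


def quantile_bins(targets, n_bins=None):
    """Assign each continuous target to one of n_bins equal-count bins.

    If n_bins is None, Sturges' rule (1 + log2(n)) picks the bin count.
    """
    n = len(targets)
    if n_bins is None:
        n_bins = int(1 + math.log2(n))
    order = sorted(range(n), key=lambda i: targets[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins


def stratified_regression_kfold_indices(targets, n_splits=5, n_bins=None, seed=42):
    """Stratify a regression target by stratifying on its quantile bins."""
    return stratified_kfold_indices(quantile_bins(targets, n_bins), n_splits, seed)


labels = [0] * 8 + [1] * 2
print(stratified_kfold_indices(labels, n_splits=2))  # each fold: 4 zeros, 1 one
```

With a plain shuffled K-Fold, the two minority-class rows above could both end up in the same fold. That fold's validation score would then say nothing about the minority class, which is the problem stratification prevents.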
Getting Started
To get started with Suraj Datalab, check out the Usage Guide, which provides detailed examples and instructions on how to use each function in the package.
Installation
You can install Suraj Datalab by cloning the repository and installing the required dependencies:
git clone https://github.com/yourusername/suraj_datalab.git
cd suraj_datalab
pip install -r requirements.txt
Documentation
For more detailed information about each function and how to use it, refer to the API Reference.
Learn More
For detailed usage instructions, please visit the Usage Guide.
For more information about my work and other projects, or to get in touch, visit my personal website.
If you have any feedback or suggestions, feel free to open an issue on GitHub.