Indox Retrieval Augmentation

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Software Development :: Build Tools

Project description

inDox
Advanced Search and Retrieval Augmentation Generative

Official Website • Documentation • Discord

Indox Retrieval Augmentation is an innovative application designed to streamline information extraction from a wide range of document types, including text files, PDF, HTML, Markdown, and LaTeX. Whether structured or unstructured, Indox provides users with a powerful toolset to efficiently extract relevant data. One of its key features is the ability to intelligently cluster primary chunks to form more robust groupings, enhancing the quality and relevance of the extracted information. With a focus on adaptability and user-centric design, Indox aims to deliver future-ready functionality with more features planned for upcoming releases. Join us in exploring how Indox can revolutionize your document processing workflow, bringing clarity and organization to your data retrieval needs.

Dependency Requirements

Before running this project, ensure that you have the following installed:

Python 3.8+: Required for running the Python backend.
PostgreSQL: Needed if you wish to store your data in a PostgreSQL database.
OpenAI API Key: Necessary if you are using the OpenAI embedding model.
Hugging Face API Key: Necessary if you are using Hugging Face LLMs.

Ensure your system also meets these requirements:

Access to environment variables for handling sensitive information like API keys.
Suitable hardware capable of supporting intensive computational tasks.

Installation

Getting Started

The following command installs the latest stable release of inDox:

pip install Indox

To install the latest development version, you may run

pip install git+https://github.com/osllmai/inDox@main

To configure the CLI, run

indox configure

Clone the repository and navigate to the directory:

git clone https://github.com/osllmai/inDox.git
cd inDox

Install the required Python packages:

pip install -r requirements.txt

Configuration

Environment Variables

Set your OPENAI_API_KEY or HF_API_KEY in your environment variables for secure access.
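
A minimal sketch of reading a key from the environment in Python, so that it never appears in source code. The helper name `get_api_key` is hypothetical and not part of Indox; only the variable names OPENAI_API_KEY and HF_API_KEY come from the text above.

```python
import os


def get_api_key(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; it is not part of the Indox API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

For example, `OPENAI_API_KEY = get_api_key("OPENAI_API_KEY")` fails early with a readable message if the key is missing, instead of failing later inside an API call.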

Database Setup

Ensure your PostgreSQL database is up and running, and accessible from your application. This is necessary if you plan to use pgvector as your vector store.

Alternatively, you can use Chroma or Faiss as your vector store. Make sure to specify your choice and the necessary configurations in the config.yaml file.

Usage

Preparing Your Data

Define the File Path: Specify the path to your text or PDF file.
Load Embedding Models: Initialize your embedding model from OpenAI's selection of pre-trained models.
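
As a rough illustration of the first step, the file path can be checked before it is handed to a loader. The helper `resolve_input_file` and the set of accepted suffixes are assumptions made for this sketch (they mirror the "text or PDF" wording above) and are not part of Indox.

```python
from pathlib import Path

# Assumption for this sketch: only plain-text and PDF inputs are accepted.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def resolve_input_file(path_str: str) -> str:
    """Validate a document path and return it as a string.

    Hypothetical helper for illustration; it is not part of the Indox API.
    """
    path = Path(path_str).expanduser()
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return str(path)
```

The returned string could then be used as the `file_path` value in the later loading step.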

Quick Start

Import Indox Package

Import the necessary classes from the Indox package.

from Indox import IndoxRetrievalAugmentation

Initialize Indox

Create an instance of IndoxRetrievalAugmentation.

Indox = IndoxRetrievalAugmentation()

Initial Configuration

Configuration File: Ensure you locate and modify the Indox.config YAML file according to your needs before starting the application.

Dynamic Configuration Changes

For changes that need to be applied after the initial setup or during runtime:

Modifying Configurations: Use the following Python snippet to update your settings dynamically:
```
Indox.config["your_setting_that_needs_to_change"] = "new_setting"
Indox.update_config()
```

Configuration Details

Here's a breakdown of the config dictionary and its properties:

PostgreSQL

conn_string: The connection string for your PostgreSQL database, including your credentials.

Summary Model

max_tokens: Maximum token count the summary model can generate.
min_len: Minimum token count the summary model generates.
model_name: Default is gpt-3.5-turbo-0125, but it can be replaced with any Hugging Face model supporting the summarization pipeline.
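
To make these properties concrete, here is a sketch of how such settings could look as a Python dictionary, together with a small sanity check. The section names `postgres` and `summary_model`, the numeric values, the connection string, and the function `check_summary_settings` are all assumptions for illustration; only the property names `conn_string`, `max_tokens`, `min_len`, and `model_name` come from the list above, and the real layout of `Indox.config` may differ.

```python
# Illustrative values only; the section names are hypothetical.
example_config = {
    "postgres": {
        # Placeholder connection string; replace with your own credentials.
        "conn_string": "postgresql://user:password@localhost:5432/dbname",
    },
    "summary_model": {
        "max_tokens": 100,
        "min_len": 30,
        "model_name": "gpt-3.5-turbo-0125",
    },
}


def check_summary_settings(summary: dict) -> None:
    """Raise ValueError if the summary length bounds are inconsistent."""
    if summary["min_len"] < 0:
        raise ValueError("min_len must not be negative")
    if summary["min_len"] > summary["max_tokens"]:
        raise ValueError("min_len must not exceed max_tokens")


check_summary_settings(example_config["summary_model"])
```

Checking the bounds before applying a change catches a `min_len` that is larger than `max_tokens`, which no summary could satisfy.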

PostgreSQL Setup with pgvector

If you want to use PostgreSQL for vector storage, you should perform the following steps:

Install pgvector: To install pgvector on your PostgreSQL server, follow the installation instructions in the official pgvector GitHub repository.

Add Vector Extension: Connect to your PostgreSQL database and execute the following SQL command to create the pgvector extension:

# In your shell: connect to your database
# (replace username and database_name with your own PostgreSQL details)
psql -U username -d database_name

-- Inside the psql terminal: create the extension
CREATE EXTENSION vector;

If you prefer another vector database, you can use Chroma or Faiss instead. These offer alternative approaches to vector storage and retrieval that may better suit specific use cases or performance requirements.

Importing QA and Embedding Models

import os

from Indox.QaModels import OpenAiQA
from Indox.Embeddings import OpenAiEmbedding

# Read the API key from the environment rather than hard-coding it
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

openai_qa = OpenAiQA(api_key=OPENAI_API_KEY,model="gpt-3.5-turbo-0125")
openai_embeddings = OpenAiEmbedding(model="text-embedding-3-small",openai_api_key=OPENAI_API_KEY)

Modifying Configuration Settings

To change a configuration setting, you can directly modify the Indox.config dictionary. Here is an example of how you can update a configuration setting:

# Example of modifying a configuration setting
Indox.config["old_config"] = "new_config"

# Applying the updated configuration
Indox.update_config()

We use the unstructured library to load documents and split them into chunks by title. This organizes the document into manageable sections for further processing.

from Indox.DataLoaderSplitter import UnstructuredLoadAndSplit

docs_unstructured = UnstructuredLoadAndSplit(file_path=file_path)

Starting processing...
End Chunking process.
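
The idea of splitting by title can be illustrated with a simplified, standalone sketch. This is not the unstructured library's implementation and not what `UnstructuredLoadAndSplit` does internally; the function `chunk_by_title` and the `(kind, text)` element format are assumptions chosen to show the grouping logic only.

```python
from typing import List, Tuple


def chunk_by_title(elements: List[Tuple[str, str]]) -> List[str]:
    """Group (kind, text) elements into chunks that each start at a title."""
    chunks: List[List[str]] = []
    for kind, text in elements:
        # A title opens a new chunk; so does the very first element.
        if kind == "Title" or not chunks:
            chunks.append([])
        chunks[-1].append(text)
    return ["\n".join(parts) for parts in chunks]


# Toy input standing in for the elements parsed from a document.
elements = [
    ("Title", "Introduction"),
    ("NarrativeText", "Indox extracts information from documents."),
    ("Title", "Installation"),
    ("NarrativeText", "Install the package with pip."),
]
chunks = chunk_by_title(elements)
```

Here `chunks` holds two strings, one per titled section, so each chunk keeps a heading together with the text that follows it.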

Storing document chunks in a vector store is crucial for enabling efficient retrieval and search operations. By converting text data into vector representations and storing them in a vector store, you can perform rapid similarity searches and other vector-based operations.

Indox.connect_to_vectorstore(collection_name="sample",embeddings=openai_embeddings)
Indox.store_in_vectorstore(chunks=docs_unstructured)
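
What a similarity search over stored vectors involves can be shown with a small, dependency-free sketch. It uses cosine similarity over a toy in-memory dictionary; this is an illustration of the general technique, not how Indox, pgvector, Chroma, or Faiss implement it, and the names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], store: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored chunks most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec))
              for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy three-dimensional "embeddings"; real embedding vectors are much longer.
toy_store = {
    "chunk about installation": [1.0, 0.0, 0.0],
    "chunk about configuration": [0.0, 1.0, 0.0],
    "chunk about querying": [0.0, 0.0, 1.0],
}
results = top_k([0.9, 0.1, 0.0], toy_store, k=2)
```

With this toy data, the installation chunk ranks first and the configuration chunk second, because the query vector points mostly along the first axis. Dedicated vector stores replace the linear scan with indexes that scale to large collections.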

Querying

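query = "your query!!??"

response_openai = Indox.answer_question(query=query,qa_model=openai_qa)

# The first element of the response is the generated answer
answer = response_openai[0]

# The second element unpacks into the retrieved context and its score
context, score = response_openai[1]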

Release history

0.1.5 (this version): May 28, 2024
0.1.4: May 21, 2024
0.1.3: May 20, 2024
0.1.1: May 8, 2024
0.1: Apr 19, 2024

Download files

Source Distribution

Indox-0.1.5.tar.gz (32.6 kB)

Uploaded May 28, 2024 Source

Built Distribution

Indox-0.1.5-py3-none-any.whl (42.2 kB)

Uploaded May 28, 2024 Python 3

Hashes for Indox-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`a14c0da88180478fee99c69dd73ad2959468321e44863519602a0746a9e7cf50`
MD5	`ae06444f3d54bd8aba0d44a5c240cf63`
BLAKE2b-256	`6a825b209c56fc2706ac61c5e04146e824884cf68a85bfcd99da7681add2d96e`

Hashes for Indox-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`74be93b17948866295e471247fa08fbfb469a76b3357eec8142c0260adc412f0`
MD5	`243e5c0bedc97e13080ec27ce1b258ff`
BLAKE2b-256	`c43b9237a74213d18ddae7bde562d2fffad75d85901cd0ed08dcdc89a98266e1`
