An easy to use semantic (soft) querying on pandas dataframes.
Project description
SoftPandas - Semantic Querying for Pandas
Example Usage:
- Let's say we want to get all red and black swim shorts that cost less than 600$
python test.py
Or play with this code:
import pandas as pd
from core.data_types import InputDataType
from core.soft_dataframe import SoftDataFrame
from embedders.clip_embedder import OpenClipEmbedder
from embedders.sentence_transformer_embedder import SentenceTransformerEmbedder
from sklearn.metrics.pairwise import cosine_similarity
lang_model = SentenceTransformerEmbedder('thenlper/gte-small',
metric=cosine_similarity, threshold=0.82, device="cpu")
vision_model = OpenClipEmbedder('ViT-B-32-256', metric=cosine_similarity,
threshold=0.25, pretrained="datacomp_s34b_b86k")
df = pd.read_csv("sample_data/men-swimwear.csv")
df = SoftDataFrame(df, soft_columns={'NAME': InputDataType.text,
'DESCRIPTION & COLOR': InputDataType.text,
'FABRIC': InputDataType.text},
models={InputDataType.text: lang_model, InputDataType.image: vision_model}
)
relevant_price_items = df.query("PRICE < 600")
df_filtered_desc = relevant_price_items.soft_query("'DESCRIPTION & COLOR' ~= 'red and black swim shorts'")
df = df_filtered_desc.add_soft_columns({'IMAGE': InputDataType.image}, inplace=False)
df_filtered_image = df.soft_query("'IMAGE' ~= 'red and black swim shorts'")
print(df_filtered_image)
- Saving and loading:
relevant_price_items.to_pickle("relevant_items.p")
a = pd.read_pickle("relevant_items.p")
TODO:
Add saving methods for SoftDataFrameMethod for adding new columns- Add dealing with Nans
if a column is Nan, just ignore it- If value isn't there, it shouldn't pass condition - similar to normal querying
- Batching of initial encoding -
- don't do it one by one
use device (cuda, mps, tpu, etc.)
- make into a package
- requirements file
- Add image
Long Term Goals:
- Add automatic feature extraction from images into new columns
- allows hard querying using visual data!
- Add ability to soft query based on Image
- Expand to more modalities