Easy-to-use semantic (soft) querying on pandas DataFrames.

SoftPandas - Semantic Querying for Pandas

Example Usage:

  1. Say we want all red and black swim shorts that cost less than $600. Run the example script:

python test.py

Or play with this code:

import pandas as pd

from core.data_types import InputDataType
from core.soft_dataframe import SoftDataFrame
from embedders.clip_embedder import OpenClipEmbedder
from embedders.sentence_transformer_embedder import SentenceTransformerEmbedder
from sklearn.metrics.pairwise import cosine_similarity

# Text model, used for soft queries on text columns.
lang_model = SentenceTransformerEmbedder('thenlper/gte-small', metric=cosine_similarity,
                                         threshold=0.82, device="cpu")

# Vision model, used for soft queries on image columns.
vision_model = OpenClipEmbedder('ViT-B-32-256', metric=cosine_similarity,
                                threshold=0.25, pretrained="datacomp_s34b_b86k")

df = pd.read_csv("sample_data/men-swimwear.csv")
df = SoftDataFrame(df, soft_columns={'NAME': InputDataType.text,
                                     'DESCRIPTION & COLOR': InputDataType.text,
                                     'FABRIC': InputDataType.text},
                   models={InputDataType.text: lang_model, InputDataType.image: vision_model}
                   )

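# Hard filter first: a regular pandas query on the numeric PRICE column.
relevant_price_items = df.query("PRICE < 600")
# Soft filter on text: compare each description against the query phrase.
df_filtered_desc = relevant_price_items.soft_query("'DESCRIPTION & COLOR' ~= 'red and black swim shorts'")
# Register IMAGE as a soft column, then soft-filter the images with the same phrase.
df = df_filtered_desc.add_soft_columns({'IMAGE': InputDataType.image}, inplace=False)

df_filtered_image = df.soft_query("'IMAGE' ~= 'red and black swim shorts'")
print(df_filtered_image)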
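
To make the role of `metric` and `threshold` in the example above more concrete, here is a minimal sketch of a thresholded similarity filter. It is not SoftPandas' actual implementation: the helper names `cosine` and `soft_match`, the toy 3-d vectors, and the `>=` comparison are all assumptions made for illustration. It uses only the standard library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_match(row_vectors, query_vector, threshold):
    """Return one boolean per row: True when similarity reaches the threshold."""
    return [cosine(vec, query_vector) >= threshold for vec in row_vectors]


# Toy vectors standing in for real model embeddings (hypothetical values).
rows = {
    "red and black swim shorts": [0.9, 0.4, 0.1],
    "navy swim trunks": [0.3, 0.9, 0.2],
    "white linen shirt": [0.1, 0.2, 0.9],
}
query = [1.0, 0.5, 0.0]

mask = soft_match(rows.values(), query, threshold=0.82)
kept = [name for name, keep in zip(rows, mask) if keep]
print(kept)
```

With these toy values only the first row clears the 0.82 threshold. A lower threshold, such as the 0.25 used for the vision model above, would let more loosely related rows through.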

  2. Saving and loading:

relevant_price_items.to_pickle("relevant_items.p")
a = pd.read_pickle("relevant_items.p")

TODO:

  1. Add saving methods for SoftDataFrame
  2. Method for adding new columns
  3. Add handling of NaNs
    • If a column's value is NaN, ignore it
    • If a value is missing, the row should not pass the condition, as in normal querying
  4. Batch the initial encoding
    • Don't encode rows one by one
    • Use the device (cuda, mps, tpu, etc.)
  5. Make into a package
    • Requirements file
    • Add image
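
TODO items 3 and 4 could look roughly like the sketch below: missing cells are skipped and the remaining values are embedded in batches rather than one at a time. This is only one possible approach, not code from SoftPandas. `is_missing`, `batches`, `encode_column`, and the `fake_encode_batch` stand-in embedder are hypothetical names, and device placement is left out.

```python
import math


def is_missing(value):
    """True for None and float NaN, the usual markers of an empty cell."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def encode_column(values, encode_batch, batch_size=32):
    """Embed a column in batches; missing cells get None instead of a vector."""
    present = [(i, v) for i, v in enumerate(values) if not is_missing(v)]
    vectors = [None] * len(values)
    for chunk in batches(present, batch_size):
        encoded = encode_batch([v for _, v in chunk])  # one model call per batch
        for (i, _), vec in zip(chunk, encoded):
            vectors[i] = vec
    return vectors


# Hypothetical stand-in for a real embedder: records each batch size and
# returns the text length as a 1-d "embedding".
calls = []


def fake_encode_batch(texts):
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


column = ["swim shorts", float("nan"), "trunks", None, "board shorts"]
vectors = encode_column(column, fake_encode_batch, batch_size=2)

# Rows without a vector can never satisfy a soft condition.
can_match = [vec is not None for vec in vectors]
print(calls, can_match)
```

Keeping `None` in place of a vector preserves row alignment with the original frame, so a later soft condition can treat those rows as non-matching, much like a normal pandas comparison against NaN.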

Long Term Goals:

  1. Add automatic feature extraction from images into new columns
    • allows hard querying using visual data!
  2. Add ability to soft query based on Image
  3. Expand to more modalities
