Skip to main content

A versatile Python package for data extraction from JSON-like structures with user-defined format keys, enhanced with synonym retrieval capabilities.

Project description

Complex Parser

Complex Parser is a powerful Python package designed to streamline the process of data extraction from JSON-like structures while also enriching the extracted data with synonym retrieval capabilities. Whether you're working with complex nested JSON data or simple dictionaries, this package provides a flexible and intuitive solution for extracting specific data elements based on user-defined format keys, all while expanding the semantic richness of your data through synonym retrieval.

Features

Data Extraction

  • Structured Data Extraction: Extract specific data elements from nested JSON-like structures based on user-specified format keys.
  • Customizable Format Keys: Define format keys to precisely target the data elements you need, making it adaptable to a wide range of data structures.
  • Efficient Data Parsing: Utilizes efficient algorithms to parse through the data and extract relevant information with minimal computational overhead.
  • Thread Based chuncking: Utilises threads to quickly sort through larger data sets.

Synonym Retrieval

  • Semantic Enrichment: Enhance the semantic richness of your data by retrieving synonyms for key terms using both WordNet and custom synonym lists.
  • Flexible Synonym Loading: Load additional synonyms from custom lists to expand the synonym pool for specific terms, allowing for fine-tuned control over synonym retrieval.

Ease of Use

  • Simple Integration: Integrate seamlessly into your Python projects with an intuitive interface and straightforward usage.
  • Comprehensive Documentation: Detailed documentation and examples provided for easy reference and quick integration into your projects.

Installation

You can install Complex Parser via pip:

pip install complex-parser

Usage:

Here's a simple example demonstrating how to use the package

from complex_parser import extract_data

# Example data
data = {
    "people":[
        {
            "name": "John",
            "age": 30,
            "address": {
                "road": "123 Main St",
                "city": "Anytown"
            }
        }, 
        {
            "name": "Joshua",
            "age": 3100,
            "address": {
                "road": "657 Loud St",
                "city": "Basictown",
                "landmark": "Town Square"
            }
        }, 
        {
            "name": "John",
            "age": 30,
            "location": {
                "road": "8474 Main St",
                "city": "None"
            }
        }, 
        {
            "fullname": "Job",
            "age": 27,
            "destination": {
                "road": "8474 John's St",
                "city": "London"
            }
        }, 
        {
            "unknown": "Job",
            "age": 27,
            "destination": {
                "road": "8474 John's St",
                "city": "London"
            }
        }
    ]
}
format_keys = ["name", "address"]
load_lists= {
    "address":[
        "location"
    ], 
    "name": [
        "fullname"
    ]
}
# Extract data with specified format keys
extracted_data = extract_data(data=data, format_keys=format_keys,load_lists=load_lists)
print(extracted_data)

results:

[{'name': 'John', 'age': 30, 'address': {'road': '123 Main St', 'city': 'Anytown'}}, {'name': 'Joshua', 'age': 3100, 'address': {'road': '657 Loud St', 'city': 'Basictown', 'landmark': 'Town Square'}}, {'name': 'John', 'age': 30, 'location': {'road': '8474 Main St', 'city': 'None'}}, {'fullname': 'Job', 'age': 27, 'destination': {'road': "8474 John's St", 'city': 'London'}}]

License:

This project is licensed under the Mozilla Public License Version 2.0 - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit bug reports, feature requests, or pull requests on the GitHub repository.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

complex_parser-0.0.2.tar.gz (4.6 kB view hashes)

Uploaded Source

Built Distribution

complex_parser-0.0.2-py3-none-any.whl (4.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page