
Project description

Final Year Project on EDU Segmentation:

This project aims to improve EDU segmentation performance using Segbot. Since Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with generative pre-trained models such as BART and T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g. 100 training examples) rather than with the full dataset.

Segbot:
http://138.197.118.157:8000/segbot/
https://www.ijcai.org/proceedings/2018/0579.pdf
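
As a rough sketch of the encoder-swap idea described above (an illustration only, not Segbot's or this package's actual implementation; the checkpoint name and usage are assumptions), a pretrained BART encoder from Hugging Face Transformers could produce the contextual token representations that would stand in for the bidirectional GRU outputs:

import torch
from transformers import BartModel, BartTokenizer

# Load a pretrained BART encoder (assumed checkpoint).
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
bart_encoder = BartModel.from_pretrained("facebook/bart-base").get_encoder()

sentence = "The food is good, but the service is bad."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    # Contextual token representations, shape (1, seq_len, 768) for bart-base.
    hidden_states = bart_encoder(**inputs).last_hidden_state

# In a Segbot-style model, these hidden states would replace the bidirectional
# GRU outputs consumed by the pointer-network decoder, which predicts EDU
# boundary positions.
print(hidden_states.shape)

In the few-shot setting described above, only a small subset of the RST training data (e.g. 100 examples) would then be used to fine-tune such a model.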


Installation
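
The package appears to be published on PyPI as edu_segmentation (see the distribution files below), so it can presumably be installed with:

pip install edu_segmentation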

To use the EDUSegmentation module, follow these steps:

  1. Import the download module to download all models:
from edu_segmentation.download import download_models
download_models()
  2. Import the edu_segmentation module and its related classes:
from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel

Usage

The edu_segmentation module provides an easy-to-use interface to perform EDU segmentation using different strategies and models. Follow these steps to use it:

  1. Create a segmentation strategy:

    You can choose between the default segmentation strategy and a conjunction-based segmentation strategy.

    Conjunction-based segmentation strategy: After the text has been EDU-segmented, any conjunction at the start or end of a segment is isolated as its own segment (a sketch of this post-processing follows these steps).

    Default segmentation strategy: No post-processing occurs after the text has been EDU-segmented.

from edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation
  2. Create a model using the ModelFactory.

    Choose from BERT Uncased, BERT Cased, or BART models.
model_type = "bert_uncased"  # or "bert_cased", "bart"
model = ModelFactory.create_model(model_type)
  3. Create an instance of EDUSegmentation using the chosen model:
edu_segmenter = EDUSegmentation(model)
  4. Segment the text using the chosen strategy:
text = "Your input text here."
granularity = "conjunction_words"  # or "default"
conjunctions = ["and", "but", "however"]  # Customize conjunctions if needed
device = 'cpu'  # Choose your device, e.g., 'cuda:0'

segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
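
For concreteness, here is a minimal sketch of the kind of post-processing the conjunction-based strategy performs, assuming segments are plain strings; it illustrates the behaviour described in step 1 and is not the library's actual code:

def isolate_conjunctions(segments, conjunctions):
    """Split a leading or trailing conjunction off into its own segment."""
    result = []
    for seg in segments:
        words = seg.split()
        if words and words[0].lower().strip(",.") in conjunctions:
            result.append(words[0])           # leading conjunction becomes its own segment
            result.append(" ".join(words[1:]))
        elif words and words[-1].lower().strip(",.") in conjunctions:
            result.append(" ".join(words[:-1]))
            result.append(words[-1])          # trailing conjunction becomes its own segment
        else:
            result.append(seg)
    return result

print(isolate_conjunctions(
    ["The food is good,", "but the service is bad."],
    ["and", "but", "however"],
))
# ['The food is good,', 'but', 'the service is bad.']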

Example

Here's a simple example demonstrating how to use the edu_segmentation module:

from edu_segmentation.download import download_models
from edu_segmentation.main import ModelFactory, EDUSegmentation

download_models()

# Create a BART model
model = ModelFactory.create_model("bart")  # or "bert_cased" or "bert_uncased"

# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)

# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words"  # or "default"
conjunctions = ["and", "but", "however"]  # customise as needed
device = 'cpu'  # or 'cuda:0'

segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edu_segmentation-0.0.115.tar.gz (317.0 kB)

Built Distribution

edu_segmentation-0.0.115-py3-none-any.whl (327.2 kB)
