sentry-nodestore-elastic

Sentry nodestore Elasticsearch backend

Supports Sentry 24.x and Elasticsearch 8.x

Use an Elasticsearch cluster to store node objects from Sentry

By default, self-hosted Sentry uses a PostgreSQL database for both settings and nodestore. Under high load it becomes a bottleneck: the database grows quickly and slows down the entire system

Switching nodestore to a dedicated Elasticsearch cluster improves scalability:

- An Elasticsearch cluster can be scaled horizontally by adding more data nodes (PostgreSQL cannot)
- Data in Elasticsearch can be sharded and replicated across data nodes, which increases throughput
- Elasticsearch rebalances automatically when new data nodes are added
- Scheduled Sentry cleanup is much faster and more stable with the Elasticsearch nodestore, because it simply deletes old indices (cleaning up a terabyte-size nodestore in PostgreSQL is a huge pain)

Installation

Rebuild the Sentry Docker image with the nodestore package installed

FROM getsentry/sentry:24.4.1
RUN pip install sentry-nodestore-elastic
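
After rebuilding, it can be worth confirming that the package actually landed in the image. A minimal sketch, meant to be run with the Python interpreter inside the container; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None
print(installed_version("sentry-nodestore-elastic"))
```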

Configuration

Set SENTRY_NODESTORE in your sentry.conf.py

# Already at the top of the default sentry.conf.py; it must stay above the
# settings below, otherwise the star import resets them to Sentry's defaults
from sentry.conf.server import *
from elasticsearch import Elasticsearch

es = Elasticsearch(
    ['https://username:password@elasticsearch:9200'],
    http_compress=True,
    request_timeout=60,
    max_retries=3,
    retry_on_timeout=True,
    # ❯ openssl s_client -connect elasticsearch:9200 < /dev/null 2>/dev/null | openssl x509 -fingerprint -noout -in /dev/stdin
    ssl_assert_fingerprint=(
        "PUT_FINGERPRINT_HERE"
    ),
)
SENTRY_NODESTORE = 'sentry_nodestore_elastic.ElasticNodeStorage'
SENTRY_NODESTORE_OPTIONS = {
    'es': es,
    'refresh': False,  # ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
    # other ES related options
}

# Register the nodestore app with Sentry
INSTALLED_APPS = list(INSTALLED_APPS)
INSTALLED_APPS.append('sentry_nodestore_elastic')
INSTALLED_APPS = tuple(INSTALLED_APPS)
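
The openssl command in the comment above prints a labelled line rather than a bare fingerprint. The sketch below pulls out the value to paste into ssl_assert_fingerprint and reports how many hex digits it has (40 for SHA-1, 64 for SHA-256). It assumes the output looks like `<ALGO> Fingerprint=AA:BB:...`; the helper name parse_openssl_fingerprint is hypothetical and the sample fingerprint is made up.

```python
def parse_openssl_fingerprint(output):
    """Extract the fingerprint value from `openssl x509 -fingerprint -noout` output.

    Returns (fingerprint, hex_digit_count).
    """
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("empty openssl output")
    label, sep, value = lines[0].partition("=")
    if not sep or "fingerprint" not in label.lower():
        raise ValueError(f"unexpected openssl output: {lines[0]!r}")
    value = value.strip()
    hex_digits = value.replace(":", "")
    int(hex_digits, 16)  # raises ValueError if the value is not hexadecimal
    return value, len(hex_digits)


# Made-up sample output, for illustration only
sample = "SHA256 Fingerprint=" + ":".join(["AB"] * 32)
fingerprint, digits = parse_openssl_fingerprint(sample)
print(digits)
# 64
```

The Elasticsearch client documentation describes this option in terms of a SHA-256 fingerprint, which openssl produces when -sha256 is added to the x509 command. Whether a shorter digest is also accepted depends on the client version, so that is worth verifying against your installation.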

Usage

Set up the Elasticsearch index template

Elasticsearch should be up and running before this step. The command below creates the index template in Elasticsearch

sentry upgrade --with-nodestore

Alternatively, you can create the index template manually with the JSON below. It may be customized for your needs, but the template name must be sentry because the nodestore init script checks for it

{
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "0",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        }
      }
    },
    "mappings": {
      "dynamic": "false",
      "dynamic_templates": [],
      "properties": {
        "data": {
          "type": "text",
          "index": false,
          "store": true
        },
        "timestamp": {
          "type": "date",
          "store": true
        }
      }
    },
    "aliases": {
      "sentry": {}
    }
  }
}
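
One way to apply a template like this by hand is through the Python client's es.indices.put_index_template(). The sketch below is not how the package itself does it. The index pattern sentry-* is an assumption (it is not part of the JSON above and is inferred from the daily sentry-YYYY-MM-DD index names used by the migration script), and the helper names are hypothetical.

```python
def sentry_template_kwargs(shards=3, replicas=0):
    """Build keyword arguments for es.indices.put_index_template()."""
    return {
        "name": "sentry",  # the name the nodestore init script checks for
        "index_patterns": ["sentry-*"],  # ASSUMPTION: not given in the source JSON
        "template": {
            "settings": {
                "index": {
                    "number_of_shards": shards,
                    "number_of_replicas": replicas,
                    "routing": {
                        "allocation": {"include": {"_tier_preference": "data_content"}}
                    },
                }
            },
            "mappings": {
                "dynamic": "false",
                "dynamic_templates": [],
                "properties": {
                    # Stored but not indexed: the payload is only fetched by id
                    "data": {"type": "text", "index": False, "store": True},
                    "timestamp": {"type": "date", "store": True},
                },
            },
            "aliases": {"sentry": {}},
        },
    }


def apply_template(es, **overrides):
    """Create or update the template using an Elasticsearch 8.x client instance."""
    return es.indices.put_index_template(**sentry_template_kwargs(**overrides))
```

With the es client from the configuration section, apply_template(es, replicas=1) would create the template with one replica per shard.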

Migrate data from the default Postgres nodestore to Elasticsearch

Postgres and Elasticsearch must both be reachable from the machine where you run this code

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, BulkIndexError
import psycopg2

es = Elasticsearch(
    ['https://username:password@elasticsearch:9200'],
    http_compress=True,
    request_timeout=60,
    max_retries=3,
    retry_on_timeout=True,
    # ❯ openssl s_client -connect elasticsearch:9200 < /dev/null 2>/dev/null | openssl x509 -fingerprint -noout -in /dev/stdin
    ssl_assert_fingerprint=(
        "PUT_FINGERPRINT_HERE"
    ),
)

name = 'sentry'

conn = psycopg2.connect(dbname="sentry", user="sentry", password="password", host="hostname", port="5432")

cur = conn.cursor()
cur.execute("SELECT reltuples AS estimate FROM pg_class where relname = 'nodestore_node'")
result = cur.fetchone()
count = int(result[0])
print(f"Estimated rows: {count}")
cur.close()

cursor = conn.cursor(name='fetch_nodes')
cursor.execute("SELECT * FROM nodestore_node ORDER BY timestamp ASC")

while True:
    records = cursor.fetchmany(size=2000)

    if not records:
        break

    bulk_data = []

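    for r in records:
        # Columns of nodestore_node, in order: id, data, timestamp
        node_id, data, timestamp = r[0], r[1], r[2]
        # One index per day, e.g. sentry-2024-04-25
        index = f"sentry-{timestamp.strftime('%Y-%m-%d')}"

        doc = {
            'data': data,
            'timestamp': timestamp.isoformat(),
        }

        action = {
            "_index": index,
            "_id": node_id,
            "_source": doc,
        }

        bulk_data.append(action)

    try:
        bulk(es, bulk_data)
    except BulkIndexError as exc:
        # Show the first few failed documents before aborting
        print(exc.errors[:5])
        raise

    # Subtract the rows actually fetched; the last batch may hold fewer than 2000
    count = max(count - len(records), 0)
    print(f"Remaining rows (estimate): {count}")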

cursor.close()
conn.close()
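
As a possible variation on the script above, the row-to-action mapping can be written as a generator, so that the bulk helper pulls rows lazily instead of the script building each batch by hand. This is only a sketch: it assumes each row has exactly the three columns id, data, timestamp, and the name iter_actions and the sample row are hypothetical.

```python
from datetime import datetime, timezone


def iter_actions(rows):
    """Lazily turn (id, data, timestamp) rows into Elasticsearch bulk actions."""
    for node_id, data, timestamp in rows:
        yield {
            "_index": f"sentry-{timestamp:%Y-%m-%d}",  # one index per day
            "_id": node_id,
            "_source": {"data": data, "timestamp": timestamp.isoformat()},
        }


# Made-up sample row, for illustration only
sample_rows = [("abc123", "payload", datetime(2024, 4, 25, 12, 0, tzinfo=timezone.utc))]
for action in iter_actions(sample_rows):
    print(action["_index"], action["_id"])
# sentry-2024-04-25 abc123
```

In the migration script this could replace the while loop with bulk(es, iter_actions(cursor), chunk_size=2000): iterating a named psycopg2 cursor fetches rows from the server in batches, and elasticsearch.helpers.bulk splits the action stream into chunks itself.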

Project details

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Operating System
Programming Language
- Python

Release history

1.0.1 (Apr 25, 2024)

Download files

Source distribution: sentry_nodestore_elastic-1.0.1.tar.gz (9.4 kB), uploaded Apr 25, 2024
Built distribution: sentry_nodestore_elastic-1.0.1-py3-none-any.whl (9.8 kB), uploaded Apr 25, 2024, Python 3

Hashes for sentry_nodestore_elastic-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`b75ac9563cc5d444bfe807ab4ebc5a2148718270e3d38fb680c58f5f74f90755`
MD5	`7678d5328630d9c503e22ae8b237b6b5`
BLAKE2b-256	`9095a147423ab2a18b7399c050d80a79232b5e92c53170efe493d3b2a98f3272`

Hashes for sentry_nodestore_elastic-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`50ecc6c7640e3c3cbf3f077bbbe7e113a89be63d9bed4b014c1078c8a704160b`
MD5	`b15ba765e10b99ebf16a896ba99698c1`
BLAKE2b-256	`9ef7f2fea8f1924101ec29a127224ff9c2129b01e079eb34099102056655e578`