
chDB is an in-process SQL OLAP Engine powered by ClickHouse



chDB


chDB is an embedded SQL OLAP Engine powered by ClickHouse [^1]. For more details, see The birth of chDB.

Features

  • In-process SQL OLAP Engine, powered by ClickHouse
  • No need to install ClickHouse
  • Minimized data copy from C++ to Python with Python memoryview
  • Input and output support for Parquet, CSV, JSON, Arrow, ORC and 60+ more formats (see samples)
  • Supports Python DB API 2.0 (see example)
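The memoryview point refers to a general Python mechanism: a memoryview exposes an existing buffer without copying it. The stdlib-only sketch below shows that behavior in isolation; the buffer contents are made up, and this is not chDB's internal code.

```python
# Stand-in for a result buffer filled by native code (hypothetical contents).
buf = bytearray(b"a,1\nb,3\nc,2\n")

# A memoryview wraps the buffer without copying its bytes.
view = memoryview(buf)

# Slicing a memoryview yields another view over the same memory, still no copy.
first_row = view[0:3]

# Mutating the underlying buffer is visible through the slice,
# which shows that both objects share one block of memory.
buf[0] = ord("z")
print(first_row.tobytes())  # b'z,1'
```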


Get Started

Get started with chDB using the installation and usage examples below.


Installation

Currently, chDB supports Python 3.8+ on macOS and Linux (x86_64 and ARM64).

pip install chdb

Usage

Run from the command line

python3 -m chdb SQL [OutputFormat]

python3 -m chdb "SELECT 1,'abc'" Pretty
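To call the same command-line entry point from another Python program, one option is to launch it with subprocess and pass the SQL as a separate argv element, which avoids shell quoting problems with strings such as 'abc'. The helper names `chdb_cli_args` and `run_chdb_cli` are hypothetical; only the `python3 -m chdb SQL [OutputFormat]` form comes from the usage line above.

```python
import subprocess
import sys


def chdb_cli_args(sql, output_format=None):
    """Build argv for `python3 -m chdb SQL [OutputFormat]` (hypothetical helper)."""
    args = [sys.executable, "-m", "chdb", sql]
    if output_format is not None:
        args.append(output_format)
    return args


def run_chdb_cli(sql, output_format=None):
    """Run the chDB CLI in a child process and return its stdout (requires chdb installed)."""
    completed = subprocess.run(
        chdb_cli_args(sql, output_format),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return completed.stdout
```

With chdb installed, `run_chdb_cli("SELECT 1,'abc'", "Pretty")` should print the same table as the shell command above.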

Data Input

The following methods are available to access on-disk and in-memory data formats:

🗂️ Query On File

(Parquet, CSV, JSON, Arrow, ORC and 60+ more formats)

You can execute SQL and return the results in the desired output format.

import chdb
res = chdb.query('select version()', 'Pretty')
print(res)

Work with Parquet or CSV

import chdb

# See tests/format_output.py for more output formats
res = chdb.query('select * from file("data.parquet", Parquet)', 'JSON')
print(res)
res = chdb.query('select * from file("data.csv", CSV)', 'CSV')
print(res)
print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, elapsed {res.elapsed()} seconds")
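When the file path comes from user input or may contain quotes, it can help to build the `file(...)` table-function call programmatically instead of pasting strings. A minimal sketch with hypothetical helpers `sql_string_literal` and `file_table_sql`: the path is rendered as a single-quoted string literal with backslashes and single quotes escaped, and the format name is passed unquoted as in the examples above, after a basic sanity check.

```python
def sql_string_literal(value):
    """Render a Python str as a single-quoted ClickHouse string literal (hypothetical helper)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def file_table_sql(path, fmt):
    """Build `SELECT * FROM file('<path>', <Format>)` for chdb.query (hypothetical helper)."""
    # Format names such as CSV, Parquet or JSONEachRow are plain identifiers.
    if not fmt.isidentifier():
        raise ValueError(f"unexpected format name: {fmt!r}")
    return f"SELECT * FROM file({sql_string_literal(path)}, {fmt})"


sql = file_table_sql("data.csv", "CSV")
print(sql)  # SELECT * FROM file('data.csv', CSV)
# With chdb installed, the string can be passed on: chdb.query(sql, 'CSV')
```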

Pandas DataFrame output

# See more in https://clickhouse.com/docs/en/interfaces/formats
chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')

🗂️ Query On Table

(Pandas DataFrame, Parquet file/bytes, Arrow bytes)

Query On Pandas DataFrame

import chdb.dataframe as cdf
import pandas as pd
# Join 2 DataFrames
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ["one", "two", "three"]})
df2 = pd.DataFrame({'c': [1, 2, 3], 'd': ["①", "②", "③"]})
ret_tbl = cdf.query(sql="select * from __tbl1__ t1 join __tbl2__ t2 on t1.a = t2.c",
                  tbl1=df1, tbl2=df2)
print(ret_tbl)
# Query on the DataFrame Table
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))

🗂️ Query with Stateful Session

from chdb import session as chs

# Create a database, table, and view in a temporary session; they are cleaned up automatically when the session is deleted.
sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table_xxx (x String, y Int) ENGINE = Log;")
sess.query("INSERT INTO db_xxx.log_table_xxx VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5);")
sess.query(
    "CREATE VIEW db_xxx.view_xxx AS SELECT * FROM db_xxx.log_table_xxx LIMIT 4;"
)
print("Select from view:\n")
print(sess.query("SELECT * FROM db_xxx.view_xxx", "Pretty"))

See also: test_stateful.py.
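The session example hard-codes its INSERT rows. If the rows live in Python tuples, one option is to render the VALUES clause from them. This is a hypothetical helper sketch that handles only str and int values (the two column types of log_table_xxx); strings use the same backslash and single-quote escaping as ClickHouse string literals.

```python
def render_value(value):
    """Render one Python value as a SQL literal; str and int only (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it explicitly instead of emitting True/False.
        raise TypeError("bool is not handled by this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def values_clause(rows):
    """Render an iterable of tuples as `('a', 1), ('b', 3), ...`."""
    return ", ".join("(" + ", ".join(render_value(v) for v in row) + ")" for row in rows)


rows = [("a", 1), ("b", 3), ("c", 2), ("d", 5)]
sql = "INSERT INTO db_xxx.log_table_xxx VALUES " + values_clause(rows) + ";"
print(sql)  # INSERT INTO db_xxx.log_table_xxx VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5);
# With chdb installed and a session open: sess.query(sql)
```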

🗂️ Query with Python DB-API 2.0

import chdb.dbapi as dbapi
print("chdb driver version: {0}".format(dbapi.get_client_info()))

conn1 = dbapi.connect()
cur1 = conn1.cursor()
cur1.execute('select version()')
print("description: ", cur1.description)
print("data: ", cur1.fetchone())
cur1.close()
conn1.close()
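Because chdb.dbapi targets the Python DB API 2.0 (PEP 249) interface, helper code written against that interface does not need to know which driver it talks to. The sketch below uses only PEP 249 features (`cursor()`, `execute()`, `description`, `fetchall()`, `close()`) and is exercised with the standard library's sqlite3 driver as a stand-in; handing it a connection from `dbapi.connect()` instead is an assumption that has not been checked against every chDB column type. The helper name `fetch_dicts` is hypothetical.

```python
import sqlite3
from contextlib import closing


def fetch_dicts(conn, sql):
    """Run a query on any PEP 249 connection and return rows as dicts keyed by column name."""
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        # description is a sequence of 7-item sequences; item 0 is the column name.
        names = [col[0] for col in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]


# sqlite3 stands in for chdb.dbapi here; both expose the same PEP 249 methods.
with closing(sqlite3.connect(":memory:")) as conn:
    rows = fetch_dicts(conn, "SELECT 1 AS x, 'abc' AS y")
print(rows)  # [{'x': 1, 'y': 'abc'}]
```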

🗂️ Query with UDF (User Defined Functions)

from chdb.udf import chdb_udf
from chdb import query

@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

print(query("select sum_udf(12,22)"))

See also: test_udf.py.
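The `int()` calls in sum_udf suggest that chDB hands UDF arguments to Python as strings. Since a UDF body is ordinary Python, one approach is to write and unit-test it as a plain function first and only then apply `@chdb_udf()`. The function `clamp_udf` is a hypothetical example tested without chDB; the decorator lines are left commented out so the sketch runs on its own.

```python
# from chdb.udf import chdb_udf   # uncomment when chdb is installed


# @chdb_udf()   # register with chDB once the plain function is tested
def clamp_udf(value, low, high):
    """Clamp `value` into [low, high]; arguments may arrive as strings, so convert first."""
    v, lo, hi = int(value), int(low), int(high)
    if lo > hi:
        raise ValueError("low must not exceed high")
    return max(lo, min(v, hi))


# Plain-Python checks that mimic string arguments.
print(clamp_udf("15", "0", "10"))  # 10
print(clamp_udf("-3", "0", "10"))  # 0
```

Once registered, it would be called like the original example, e.g. `query("select clamp_udf(15, 0, 10)")`, assuming chDB accepts a three-argument UDF the same way it accepts sum_udf.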

For more examples, see examples and tests.


Demos and Examples

Benchmark

Documentation

Events

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. Here are some ways you can help:

  • Help test and report bugs
  • Help improve documentation
  • Help improve code quality and performance

Bindings

We welcome bindings for other languages. Please refer to bindings for more details.

License

Apache 2.0, see LICENSE for more information.

Acknowledgments

chDB is mainly based on ClickHouse [^1]. For trademark and other reasons, I named it chDB.

Contact


[^1]: ClickHouse® is a trademark of ClickHouse Inc. All trademarks, service marks, and logos mentioned or depicted are the property of their respective owners. The use of any third-party trademarks, brand names, product names, and company names does not imply endorsement, affiliation, or association with the respective owners.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

  • chdb-1.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (99.6 MB): CPython 3.12, manylinux glibc 2.17+ ARM64
  • chdb-1.3.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126.3 MB): CPython 3.12, manylinux glibc 2.17+ and glibc 2.5+ x86-64
  • chdb-1.3.0-cp312-cp312-macosx_11_0_arm64.whl (74.8 MB): CPython 3.12, macOS 11.0+ ARM64
  • chdb-1.3.0-cp312-cp312-macosx_10_15_x86_64.whl (91.8 MB): CPython 3.12, macOS 10.15+ x86-64
  • chdb-1.3.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (99.6 MB): CPython 3.11, manylinux glibc 2.17+ ARM64
  • chdb-1.3.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126.3 MB): CPython 3.11, manylinux glibc 2.17+ and glibc 2.5+ x86-64
  • chdb-1.3.0-cp311-cp311-macosx_11_0_arm64.whl (74.8 MB): CPython 3.11, macOS 11.0+ ARM64
  • chdb-1.3.0-cp311-cp311-macosx_10_15_x86_64.whl (91.8 MB): CPython 3.11, macOS 10.15+ x86-64
  • chdb-1.3.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (99.6 MB): CPython 3.10, manylinux glibc 2.17+ ARM64
  • chdb-1.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126.3 MB): CPython 3.10, manylinux glibc 2.17+ and glibc 2.5+ x86-64
  • chdb-1.3.0-cp310-cp310-macosx_11_0_arm64.whl (74.8 MB): CPython 3.10, macOS 11.0+ ARM64
  • chdb-1.3.0-cp310-cp310-macosx_10_15_x86_64.whl (91.8 MB): CPython 3.10, macOS 10.15+ x86-64
  • chdb-1.3.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (99.6 MB): CPython 3.9, manylinux glibc 2.17+ ARM64
  • chdb-1.3.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126.3 MB): CPython 3.9, manylinux glibc 2.17+ and glibc 2.5+ x86-64
  • chdb-1.3.0-cp39-cp39-macosx_11_0_arm64.whl (74.8 MB): CPython 3.9, macOS 11.0+ ARM64
  • chdb-1.3.0-cp39-cp39-macosx_10_15_x86_64.whl (91.8 MB): CPython 3.9, macOS 10.15+ x86-64
  • chdb-1.3.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (99.6 MB): CPython 3.8, manylinux glibc 2.17+ ARM64
  • chdb-1.3.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126.3 MB): CPython 3.8, manylinux glibc 2.17+ and glibc 2.5+ x86-64
  • chdb-1.3.0-cp38-cp38-macosx_10_15_x86_64.whl (91.8 MB): CPython 3.8, macOS 10.15+ x86-64
