LanceDB supports Polars, a fast DataFrame library for Python written in Rust. As with Pandas, the Polars integration is enabled by PyArrow under the hood. A deeper integration between Lance Tables and Polars DataFrames is in progress; at the moment, you can load a Polars DataFrame into LanceDB and output the search results from a query to a Polars DataFrame.
Create & Query LanceDB Table
From Polars DataFrame
First, we connect to a LanceDB database.
import lancedb
Let’s also import Polars:
import polars as pl
Sync API
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
We can then load a Polars DataFrame to LanceDB directly.
data = pl.DataFrame(
    {
        "vector": [[3.1, 4.1], [5.9, 26.5]],
        "item": ["foo", "bar"],
        "price": [10.0, 20.0],
    }
)
table = db.create_table("pl_table", data=data)
You can now perform similarity search via the LanceDB Python API.
query = [3.0, 4.0]
result = table.search(query).limit(1).to_polars()
print(result)
print(type(result))
In addition to the selected columns, LanceDB returns the vector column and a _distance column, which holds the distance between the query vector and the returned vector.
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector        ┆ item ┆ price ┆ _distance │
│ ---           ┆ ---  ┆ ---   ┆ ---       │
│ array[f32, 2] ┆ str  ┆ f64   ┆ f32       │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  ┆ 0.02      │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
Note that the result of a table search is a Polars DataFrame.
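To show where a _distance value can come from, the sketch below recomputes the distance for the sample rows in plain Python. It assumes the table uses the L2 metric and that the reported value is the squared Euclidean distance; the helper `squared_l2` and the `rows` dictionary are hypothetical names introduced for illustration, not part of the LanceDB API.

```python
def squared_l2(a, b):
    """Sum of squared element-wise differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [3.0, 4.0]

# The two rows loaded into the table above, keyed by their "item" value.
rows = {"foo": [3.1, 4.1], "bar": [5.9, 26.5]}

# Assumption: squared Euclidean distance, as with an L2 metric.
distances = {item: squared_l2(query, vec) for item, vec in rows.items()}

# limit(1) keeps only the row with the smallest distance.
nearest = min(distances, key=distances.get)
print(nearest, round(distances[nearest], 4))
```

Under that assumption, "foo" is the nearest row because both of its coordinates differ from the query by only 0.1.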
Async API
Let’s look at the same workflow, this time using LanceDB’s async Python API.
uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri)
We can then load a Polars DataFrame to LanceDB directly.
data = pl.DataFrame(
    {
        "vector": [[3.1, 4.1], [5.9, 26.5]],
        "item": ["foo", "bar"],
        "price": [10.0, 20.0],
    }
)
# Use the async connection created above (async_db), not the sync db.
table = await async_db.create_table("pl_table_async", data=data)
You can now perform similarity search via the LanceDB Python API.
query = [3.0, 4.0]
result = await (await table.search(query)).limit(1).to_polars()
print(result)
print(type(result))
In addition to the selected columns, LanceDB returns the vector column and a _distance column, which holds the distance between the query vector and the returned vector.
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector        ┆ item ┆ price ┆ _distance │
│ ---           ┆ ---  ┆ ---   ┆ ---       │
│ array[f32, 2] ┆ str  ┆ f64   ┆ f32       │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  ┆ 0.02      │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
Note that the result of a table search is a Polars DataFrame.
From Pydantic Models
Alternatively, we can create an empty LanceDB Table using a Pydantic schema and populate it with a Polars DataFrame.
Let’s first import Polars:
import polars as pl
And then the necessary models from Pydantic:
from lancedb.pydantic import Vector, LanceModel
First, let’s define a Pydantic model:
class Item(LanceModel):
    vector: Vector(2)
    item: str
    price: float
We can then create the table from the Pydantic model and add the Polars DataFrame to the Lance table as follows:
table = db.create_table("pydantic_table", schema=Item)
df = pl.DataFrame(data)
# Add Polars DataFrame to table
table.add(df)
The table can now be queried as usual.
query = [3.0, 4.0]
result = table.search(query).limit(1).to_polars()
print(result)
print(type(result))
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector        ┆ item ┆ price ┆ _distance │
│ ---           ┆ ---  ┆ ---   ┆ ---       │
│ array[f32, 2] ┆ str  ┆ f64   ┆ f32       │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  ┆ 0.02      │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
As in the previous examples, the search returns a Polars DataFrame.
Dump Table to LazyFrame
As you iterate on your application, you’ll often need to work with the whole table’s data. For this, Polars provides the LazyFrame, a lazily evaluated alternative to a DataFrame.
LanceDB tables can be converted directly into a Polars LazyFrame for further processing.
ldf = table.to_polars()
print(type(ldf))
Unlike the search result from a query, we can see that the type of the result is a LazyFrame.
<class 'polars.lazyframe.frame.LazyFrame'>
We can now work with the LazyFrame as we would in Polars, and collect the first result.
print(ldf.first().collect())
shape: (1, 3)
┌───────────────┬──────┬───────┐
│ vector        ┆ item ┆ price │
│ ---           ┆ ---  ┆ ---   │
│ array[f32, 2] ┆ str  ┆ f64   │
╞═══════════════╪══════╪═══════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  │
└───────────────┴──────┴───────┘
Converting the LanceDB Table to a LazyFrame rather than a regular Polars DataFrame is beneficial because the table can be much larger than memory. A Polars LazyFrame lets us work with such larger-than-memory datasets without loading them into memory all at once.