Milvus

Milvus is a database that stores, indexes, and manages massive embedding vectors generated by deep neural networks and other machine learning (ML) models.

This notebook shows how to use functionality related to the Milvus vector database.

Setup

You'll need to install langchain-milvus with pip install -qU langchain-milvus to use this integration.

%pip install -qU langchain-milvus
Note: you may need to restart the kernel to use updated packages.

Credentials

No credentials are needed to use the Milvus vector store.

Initialization

%pip install -qU langchain-openai
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Milvus Lite

The easiest way to prototype is with Milvus Lite, which stores everything in a local vector database file. Milvus Lite supports only the FLAT index type.

from langchain_milvus import Milvus

URI = "./milvus_example.db"

vector_store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": URI},
    index_params={"index_type": "FLAT", "metric_type": "L2"},
)
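
To make the FLAT index choice concrete: a flat index keeps the raw vectors and compares the query against every one of them. The following is a minimal pure-Python sketch of that idea with the L2 metric, not Milvus's implementation. The names `squared_l2` and `flat_search` and the toy vectors are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks vectors the same way as true L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Compare the query against every stored vector and keep the k closest."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Toy 2-dimensional "embeddings" (hypothetical data for illustration only).
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
top2 = flat_search([0.9, 1.2], stored, k=2)
print([index for index, _ in top2])
```

Because every vector is scanned, the results are exact, but query time grows linearly with the size of the collection.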

Milvus Standalone

If you have a large amount of data (e.g., more than a million vectors), we recommend setting up a more performant Milvus server on Docker or Kubernetes.

Milvus Standalone also supports additional index types, which you can use to tune retrieval performance.

To launch the Docker container, run:

!curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh

!bash standalone_embed.sh start
Password:

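Here we create a Milvus database named milvus_demo. If a database with that name already exists, the code instead drops its collections and deletes the database, so run it again to recreate it: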

from pymilvus import MilvusException, connections, db, Collection, utility

conn = connections.connect(host="127.0.0.1", port=19530)

# Check if the database exists
db_name = "milvus_demo"
try:
    existing_databases = db.list_database()
    if db_name in existing_databases:
        print(f"Database '{db_name}' already exists.")

        # Use the database context
        db.using_database(db_name)

        # Drop all collections in the database
        collections = utility.list_collections()
        for collection_name in collections:
            collection = Collection(name=collection_name)
            collection.drop()
            print(f"Collection '{collection_name}' has been dropped.")

        db.drop_database(db_name)
        print(f"Database '{db_name}' has been deleted.")
    else:
        print(f"Database '{db_name}' does not exist.")
        database = db.create_database(db_name)
        print(f"Database '{db_name}' created successfully.")
except MilvusException as e:
    print(f"An error occurred: {e}")
Database 'milvus_demo' does not exist.
Database 'milvus_demo' created successfully.

Note the change in the URI below. Once the instance is initialized, navigate to http://127.0.0.1:9091/webui to view the local web UI.

Here is an example of how you would use a dense embedding + the Milvus BM25 built-in function to assemble a hybrid retrieval vector store instance:

from langchain_milvus import BM25BuiltInFunction, Milvus

dense_index_param = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
}
sparse_index_param = {
    "metric_type": "BM25",
    "index_type": "AUTOINDEX",
}

URI = "http://localhost:19530"

vectorstore = Milvus(
    embedding_function=embeddings,
    builtin_function=BM25BuiltInFunction(output_field_names="sparse"),
    index_params=[dense_index_param, sparse_index_param],
    vector_field=["dense", "sparse"],
    connection_args={"uri": URI, "token": "root:Milvus", "db_name": "milvus_demo"},
    consistency_level="Strong",
    drop_old=False,  # set to True if seeking to drop the collection with that name if it exists
)
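
A hybrid store like the one above yields two ranked result lists per query, one from the dense field and one from the sparse BM25 field, which then have to be merged. One widely used merging scheme is reciprocal rank fusion (RRF). The sketch below shows the idea in plain Python; it is an illustration only and makes no claim about which ranker langchain-milvus or Milvus applies by default. The function name, the document IDs, and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense (embedding) ranking
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical sparse (BM25) ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one of the two searches are still kept.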

Compartmentalize the data with Milvus Collections

You can store unrelated documents in different collections within the same Milvus instance.

Here's how you can create a new collection:

from langchain_core.documents import Document

vector_store_saved = Milvus.from_documents(
    [Document(page_content="foo!")],
    embeddings,
    collection_name="langchain_example",
    connection_args={"uri": URI},
)

And here is how you retrieve that stored collection:

vector_store_loaded = Milvus(
    embeddings,
    connection_args={"uri": URI},
    collection_name="langchain_example",
)

Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting items.

Add items to vector store

We can add items to our vector store by using the add_documents function.

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

Delete items from vector store

vector_store.delete(ids=[uuids[-1]])
(insert count: 0, delete count: 1, upsert count: 0, timestamp: 0, success count: 0, err count: 0, cost: 0)

Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

Performing a simple similarity search with filtering on metadata can be done as follows:

results = vector_store.similarity_search(
"LangChain provides abstractions to make working with LLMs easy",
k=2,
expr='source == "tweet"',
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
* Building an exciting new project with LangChain - come check it out! [{'pk': '9905001c-a4a3-455e-ab94-72d0ed11b476', 'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'pk': '1206d237-ee3a-484f-baf2-b5ac38eeb314', 'source': 'tweet'}]

Similarity search with score

You can also search with score:

results = vector_store.similarity_search_with_score(
"Will it be hot tomorrow?", k=1, expr='source == "news"'
)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")
* [SIM=21192.628906] bar [{'pk': '2', 'source': 'https://example.com'}]

For a full list of all the search options available when using the Milvus vector store, you can visit the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
[Document(metadata={'pk': 'eacc7256-d7fa-4036-b1f7-83d7a4bee0c5', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
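
The `search_type="mmr"` setting above requests maximal marginal relevance, which trades relevance to the query against redundancy among the results already chosen. The sketch below shows the greedy selection idea in plain Python. It is an illustration under assumed names (`cosine`, `mmr_select`, `lambda_mult`) and toy vectors, not the code that LangChain or Milvus runs.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices, penalizing similarity to earlier picks."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical vectors: candidates 0 and 1 are near-duplicates, candidate 2 differs.
query_vec = [1.0, 1.0]
candidate_vecs = [[1.0, 0.8], [1.0, 0.79], [0.7, 1.0]]
picked = mmr_select(query_vec, candidate_vecs, k=2)
print(picked)
```

A plain top-2 similarity search on these vectors would return the two near-duplicates, whereas the MMR sketch keeps the best match and then prefers the candidate that adds something new.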

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

Per-User Retrieval

When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user, but for many different users, and they should not be able to see each other's data.

Milvus recommends using partition_key to implement multi-tenancy. Here is an example:

The partition key feature is not available in Milvus Lite. To use it, you need to start a Milvus server, as described above.

from langchain_core.documents import Document

docs = [
    Document(page_content="i worked at kensho", metadata={"namespace": "harrison"}),
    Document(page_content="i worked at facebook", metadata={"namespace": "ankush"}),
]
vectorstore = Milvus.from_documents(
    docs,
    embeddings,
    connection_args={"uri": URI},
    drop_old=True,
    partition_key_field="namespace",  # Use the "namespace" field as the partition key
)

To conduct a search using the partition key, you should include either of the following in the boolean expression of the search request:

search_kwargs={"expr": '<partition_key> == "xxxx"'}

search_kwargs={"expr": '<partition_key> in ["xxx", "xxx"]'}

Replace <partition_key> with the name of the field that is designated as the partition key.

Milvus switches to the partition that corresponds to the specified partition key, filters entities by that key, and searches among the filtered entities.

# This will only get documents for Ankush
vectorstore.as_retriever(search_kwargs={"expr": 'namespace == "ankush"'}).invoke(
    "where did i work?"
)
[Document(page_content='i worked at facebook', metadata={'namespace': 'ankush'})]
# This will only get documents for Harrison
vectorstore.as_retriever(search_kwargs={"expr": 'namespace == "harrison"'}).invoke(
    "where did i work?"
)
[Document(page_content='i worked at kensho', metadata={'namespace': 'harrison'})]
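
When the tenant name comes from a variable rather than a literal, the filter expressions shown above can be assembled programmatically. The helpers below are a minimal sketch; `eq_expr` and `in_expr` are hypothetical names, and the sketch assumes that JSON-style double-quoted strings with backslash escapes are accepted in Milvus filter expressions.

```python
import json
from typing import Iterable


def eq_expr(field: str, value: str) -> str:
    """Build `field == "value"`, escaping any quotes inside the value."""
    return f"{field} == {json.dumps(value, ensure_ascii=False)}"


def in_expr(field: str, values: Iterable[str]) -> str:
    """Build `field in ["a", "b"]` to match any of several values."""
    return f"{field} in {json.dumps(list(values), ensure_ascii=False)}"


single_tenant = eq_expr("namespace", "ankush")
multi_tenant = in_expr("namespace", ["ankush", "harrison"])
print(single_tenant)  # namespace == "ankush"
print(multi_tenant)   # namespace in ["ankush", "harrison"]
```

Either string can then be passed as `search_kwargs={"expr": ...}` when creating the retriever, in the same way as the literal expressions above.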

API reference

For detailed documentation of all Milvus vector store features and configurations, head to the API reference: https://python.langchain.com/api_reference/milvus/vectorstores/langchain_milvus.vectorstores.milvus.Milvus.html

