milvus: add array data type for collection create #23219

Merged
merged 41 commits into from
Aug 28, 2024
Changes from 37 commits
Commits
b1b11de
add array data type for milvus vector store collection create
Jun 20, 2024
275ea24
add array data type for milvus vector store collection create
Jun 20, 2024
a8e2d08
add array data type for milvus vector store collection create
Jun 20, 2024
c7fe30c
Merge branch 'master' into array_data_type
rgupta2508 Jul 2, 2024
3ab4b6c
add array data type for milvus vector store collection create
Jul 2, 2024
91635fa
add array data type for milvus vector store collection create
Jul 2, 2024
d3012fd
add array data type for milvus vector store collection create
Jul 2, 2024
bc33253
add array data type for milvus vector store collection create
Jul 2, 2024
c77b03a
Merge branch 'master' into array_data_type
rgupta2508 Jul 2, 2024
a4ce1a9
add array data type for milvus vector store collection create
Jul 2, 2024
9f70ca7
add array data type for milvus vector store collection create
Jul 2, 2024
b8b94bd
add array data type for milvus vector store collection create
Jul 2, 2024
2b5d0cf
add array data type for milvus vector store collection create
Jul 2, 2024
cecd707
add array data type for milvus vector store collection create
Jul 2, 2024
d6b90f7
add array data type for milvus vector store collection create
Jul 2, 2024
8ea4186
Merge branch 'master' into array_data_type
rgupta2508 Jul 3, 2024
fb3ed0f
add array data type for milvus vector store collection create
Jul 3, 2024
7719a0d
add array data type for milvus vector store collection create
Jul 3, 2024
0e22cff
add array data type for milvus vector store collection create
Jul 3, 2024
afd0140
Merge branch 'master' into array_data_type
rgupta2508 Jul 3, 2024
3601c2d
add array data type for milvus vector store collection create
Jul 3, 2024
ae234cb
update
Jul 4, 2024
8fa2512
Merge branch 'master' into array_data_type
rgupta2508 Jul 4, 2024
2b9d310
lint error
Jul 5, 2024
c25cd52
lint error
Jul 5, 2024
99f8db8
lint error
Jul 5, 2024
8585008
lint error
Jul 5, 2024
410d616
lint error
Jul 5, 2024
f94a687
Merge branch 'master' into array_data_type
rgupta2508 Jul 16, 2024
9ae4191
Merge branch 'master' into array_data_type
rgupta2508 Jul 16, 2024
3195bca
resolve comments
Jul 16, 2024
ce58e66
incorporate review comments
Jul 17, 2024
66101e7
incorporate review comments
Jul 17, 2024
d14cd87
incorporate review comments
Jul 17, 2024
4f8f1c4
incorporate review comments
Jul 17, 2024
e31a606
incorporate review comments
Jul 17, 2024
6ec9b86
incorporate review comments
Jul 17, 2024
c171cd5
Merge branch 'master' into array_data_type
rgupta2508 Aug 13, 2024
63387b1
partners[milvus]: refine milvus array dtype
zc277584121 Aug 27, 2024
798dcb8
Merge branch 'master' into array_data_type
zc277584121 Aug 28, 2024
f82668f
Merge branch 'master' into array_data_type
Aug 28, 2024
75 changes: 60 additions & 15 deletions libs/partners/milvus/langchain_milvus/vectorstores/milvus.py
@@ -138,6 +138,30 @@ class Milvus(VectorStore):
metadata_field (str): Name of the metadata field. Defaults to None.
When metadata_field is specified,
the document's metadata will be stored as JSON.
metadata_schema (Optional[dict]): The data type of each metadata field.
Default is VARCHAR. Example of a field schema dict:
{
"column1": {
"dtype": "DataType.ARRAY",
"kwargs": {
"element_type": "DataType.VARCHAR",
"max_capacity": 20,
"max_length": 1000
}
},
"column2": {
"dtype": "DataType.ARRAY",
"kwargs": {
"element_type": "DataType.INT64",
"max_capacity": 50
}
},
"column3": {
"dtype": "DataType.INT64"
}
}



The connection args used for this class comes in the form of a dict,
here are a few of the options:
@@ -208,6 +232,7 @@ def __init__(
replica_number: int = 1,
timeout: Optional[float] = None,
num_shards: Optional[int] = None,
metadata_schema: Optional[dict[str, Any]] = None,
):
"""Initialize the Milvus vector store."""
try:
@@ -267,6 +292,7 @@ def __init__(
self.replica_number = replica_number
self.timeout = timeout
self.num_shards = num_shards
self.metadata_schema = metadata_schema

# Create the connection to the server
if connection_args is None:
@@ -397,24 +423,43 @@ def _create_collection(
# Create FieldSchema for each entry in metadata.
for key, value in metadatas[0].items():
# Infer the corresponding datatype of the metadata
dtype = infer_dtype_bydata(value)
# Datatype isn't compatible
if dtype == DataType.UNKNOWN or dtype == DataType.NONE:
logger.error(
(
"Failure to create collection, "
"unrecognized dtype for key: %s"
),
key,
)
raise ValueError(f"Unrecognized datatype for {key}.")
# Datatype is a string/varchar equivalent
elif dtype == DataType.VARCHAR:
field_type = "dtype"
if (
key in self.metadata_schema # type: ignore
and field_type in self.metadata_schema[key] # type: ignore
):
kwargs = self.metadata_schema[key]["kwargs"] # type: ignore
fields.append(
FieldSchema(key, DataType.VARCHAR, max_length=65_535)
FieldSchema(
name=key,
dtype=self.metadata_schema[key][field_type], # type: ignore
**kwargs,
)
)
else:
fields.append(FieldSchema(key, dtype))
dtype = infer_dtype_bydata(value)
# Datatype isn't compatible
if dtype == DataType.UNKNOWN or dtype == DataType.NONE:
logger.error(
(
"Failure to create collection, "
"unrecognized dtype for key: %s"
),
key,
)
raise ValueError(f"Unrecognized datatype for {key}.")
# Datatype is a string/varchar equivalent
elif dtype == DataType.VARCHAR:
fields.append(
FieldSchema(key, DataType.VARCHAR, max_length=65_535)
)
elif dtype == DataType.ARRAY:
Contributor

This still seems to depend on the pymilvus issue being resolved. pymilvus needs to support returning DataType.ARRAY and the related information before this line of code can take effect: milvus-io/pymilvus#2144

Contributor

In addition, could you please add a corresponding unit test to safeguard the quality of possible future PRs?
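
A test in the spirit of this request might look like the sketch below. It is only an illustration: `DataType` is a stand-in enum rather than `pymilvus.DataType`, and `declared_fields` is a hypothetical helper that mirrors the schema lookup in `_create_collection`, so the test runs without pymilvus or a Milvus server.

```python
import unittest
from enum import Enum, auto
from typing import Any


class DataType(Enum):
    """Stand-in for pymilvus.DataType (assumption: only the members used here)."""

    INT64 = auto()
    VARCHAR = auto()
    ARRAY = auto()


def declared_fields(
    metadata: dict[str, Any], schema: dict[str, Any]
) -> dict[str, tuple[DataType, dict[str, Any]]]:
    """Hypothetical helper: map each metadata key to the (dtype, kwargs) it declares.

    Keys without a schema entry are skipped; the real code would infer their type.
    """
    declared = {}
    for key in metadata:
        entry = schema.get(key, {})
        if "dtype" in entry:
            declared[key] = (entry["dtype"], entry.get("kwargs", {}))
    return declared


# Same shape as the docstring example, but with enum members instead of strings.
SCHEMA = {
    "tags": {
        "dtype": DataType.ARRAY,
        "kwargs": {
            "element_type": DataType.VARCHAR,
            "max_capacity": 20,
            "max_length": 1000,
        },
    },
    "year": {"dtype": DataType.INT64},
}


class MetadataSchemaTest(unittest.TestCase):
    def test_array_field_keeps_declared_kwargs(self) -> None:
        fields = declared_fields({"tags": ["a", "b"]}, SCHEMA)
        dtype, kwargs = fields["tags"]
        self.assertIs(dtype, DataType.ARRAY)
        self.assertEqual(kwargs["max_capacity"], 20)

    def test_undeclared_key_is_left_to_inference(self) -> None:
        fields = declared_fields({"source": "wiki"}, SCHEMA)
        self.assertNotIn("source", fields)
```

Saved as a module, this can be run with `python -m unittest`. A test against the actual `Milvus` class would additionally need pymilvus and either a running server or mocks.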

kwargs = self.metadata_schema[key]["kwargs"] # type: ignore
fields.append(
FieldSchema(name=key, dtype=DataType.ARRAY, **kwargs)
)
else:
fields.append(FieldSchema(key, dtype))
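
As written, the schema branch above evaluates `key in self.metadata_schema` even when `metadata_schema` is left at its default of `None`, and it reads `["kwargs"]` unconditionally, although the docstring example for `column3` has no `kwargs` entry. A more defensive lookup could be sketched as follows. This is not the PR's code: `DataType` and `FieldSchema` are stand-ins for the pymilvus classes, and `field_from_schema` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Optional


class DataType(Enum):
    """Stand-in for pymilvus.DataType (assumption: only the members used here)."""

    INT64 = auto()
    VARCHAR = auto()
    ARRAY = auto()


@dataclass
class FieldSchema:
    """Stand-in for pymilvus.FieldSchema: records name, dtype and extra params."""

    name: str
    dtype: DataType
    params: dict[str, Any] = field(default_factory=dict)


def field_from_schema(
    key: str, metadata_schema: Optional[dict[str, Any]]
) -> Optional[FieldSchema]:
    """Return a field for `key` if the schema declares one, else None."""
    # Treat a missing schema as an empty one so the lookup never hits None.
    entry = (metadata_schema or {}).get(key)
    if not entry or "dtype" not in entry:
        return None  # caller falls back to dtype inference
    # "kwargs" is optional, e.g. {"dtype": DataType.INT64} carries none.
    return FieldSchema(key, entry["dtype"], dict(entry.get("kwargs", {})))


schema = {
    "column1": {
        "dtype": DataType.ARRAY,
        "kwargs": {"element_type": DataType.VARCHAR, "max_capacity": 20},
    },
    "column3": {"dtype": DataType.INT64},
}
array_field = field_from_schema("column1", schema)
plain_field = field_from_schema("column3", schema)
missing = field_from_schema("column1", None)
```

Returning `None` for undeclared keys keeps the existing inference path as the single fallback, so callers only need one `if` around the call.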

# Create the text field
fields.append(