MongoDBAtlasVectorSearch.from_documents encounters an _asyncio.Future issue

zewei_chen · September 5, 2024, 11:34am

Why does MongoDBAtlasVectorSearch.from_documents work without issues when I run the Python program on its own, but raise AttributeError: '_asyncio.Future' object has no attribute 'inserted_ids' when I call it inside a Sanic API function?

Michael_Lynn · September 6, 2024, 8:57am

I think it’s happening because of the way Sanic manages the event loop. Sanic runs handlers asynchronously on its own event loop, so if MongoDBAtlasVectorSearch.from_documents ends up calling async MongoDB operations (like an insert_many), the result is an asyncio.Future object, which needs to be awaited. If it isn’t awaited, the code tries to read 'inserted_ids' from the Future itself rather than from the insert result, and that’s when you see the error.
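To make that concrete, here is a minimal standard-library sketch of the failure mode. FakeAsyncCollection, FakeInsertResult and sync_store are hypothetical stand-ins I made up for illustration, not the real driver or LangChain classes; the only point is that reading an attribute off an un-awaited Future fails, while awaiting it first works.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeInsertResult:
    """Hypothetical stand-in for a driver's insert result."""
    inserted_ids: list


class FakeAsyncCollection:
    """Hypothetical async collection: insert_many returns a Future, not a result."""

    def insert_many(self, docs):
        future = asyncio.get_running_loop().create_future()
        future.set_result(FakeInsertResult(inserted_ids=list(range(len(docs)))))
        return future


def sync_store(collection, docs):
    """Synchronous helper that assumes insert_many returns the result directly."""
    return collection.insert_many(docs).inserted_ids


async def demo():
    collection = FakeAsyncCollection()
    docs = [{"text": "a"}, {"text": "b"}]
    try:
        # Synchronous code reads the attribute from the Future itself and fails.
        sync_store(collection, docs)
        error = None
    except AttributeError as exc:
        error = str(exc)
    # Awaiting the Future first yields the actual insert result.
    result = await collection.insert_many(docs)
    return error, result.inserted_ids


error, ids = asyncio.run(demo())
print(error)
print(ids)
```

If this matches what is happening in your app, the thing to look for is synchronous code being handed an async collection object.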

My first suggestion is to make sure any async MongoDB operations inside that method are awaited, especially if you’re using things like insert_many. Here’s a quick example of what it could look like:

from sanic import Sanic
from sanic.response import json
from langchain_mongodb import MongoDBAtlasVectorSearch  # assumed import path

app = Sanic("vector_search_app")

@app.route("/vector_search", methods=["POST"])
async def vector_search(request):
    try:
        # Only await this if the call actually returns an awaitable
        result = await MongoDBAtlasVectorSearch.from_documents(request.json)
        return json({"success": True, "inserted_ids": result.inserted_ids})
    except AttributeError as e:
        return json({"success": False, "error": str(e)}, status=500)
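If the call turns out to be synchronous and blocking rather than awaitable, another option is to push it onto a worker thread so it doesn’t stall the event loop. This is only a rough sketch under that assumption: blocking_from_documents is a hypothetical stand-in for the real call, and asyncio.to_thread needs Python 3.9 or newer.

```python
import asyncio
import time


def blocking_from_documents(docs):
    """Hypothetical stand-in for a synchronous, blocking library call."""
    time.sleep(0.01)  # simulate blocking network I/O
    return [f"id-{i}" for i, _ in enumerate(docs)]


async def handler(docs):
    """Async handler that offloads the blocking call to a worker thread."""
    return await asyncio.to_thread(blocking_from_documents, docs)


inserted = asyncio.run(handler(["chunk one", "chunk two"]))
print(inserted)
```

The handler stays async, but the blocking work runs in a thread and its plain return value comes back through the await.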

Also, I’m not sure about the rest of your app, but are you using Motor, or have you considered it?

Motor is a full-featured, non-blocking MongoDB driver for Python asyncio and Tornado applications. Motor presents a coroutine-based API for non-blocking access to MongoDB.

Hope this helps… let us know how you make out.

zewei_chen · September 6, 2024, 10:31am

I am using Motor, and the code is similar to yours.

@bp.route('/knowledge/add', methods=['POST'])
async def knowledge_add(request):
    data = request.json
    user_code = data.get('user_code')
    file_path = data.get("file_path")
    loader = PyPDFLoader(file_path)
    data = loader.load()
    # Call the vectorization method
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
    docs = text_splitter.split_documents(data)

    embed_model = OpenAIEmbeddings()

    vector_store = MongoDBAtlasVectorSearch.from_documents(
        documents=docs,
        embedding=embed_model,
        collection=collection,
        index_name="vector_index"
    )

    return create_response(msg=get_message(request, "knowledge_success"), data=vector_store)