    Jonas Röger authored 88f03fc7

    Database-Server

    The Database-Server package provides the server-side implementation of the Database-API. It is a collection of wrapped retrieval methods.

    database-server Package

    At the moment, the database-server package contains only a placeholder Flask application, database_app.

    Deploying

    Use the wheel:

    pip install database-server --index-url https://gitlab.fachschaften.org/api/v4/projects/2724/packages/pypi/simple

    Or install the package directly from the repository root:

    python3 -m pip install .

    This currently installs two executables into the default Python bin directory:

    • database-server: the server application
    • database-server-opensearch: a tool for interfacing with OpenSearch directly
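    As a quick post-install sanity check, the following Python sketch reports whether both executables listed above are on your PATH. It is not part of the package; find_executables is a hypothetical helper name.

```python
import shutil

# Executable names as listed in this README.
EXECUTABLES = ("database-server", "database-server-opensearch")


def find_executables(names=EXECUTABLES):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_executables().items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

    If an entry prints NOT FOUND, the bin directory of the Python environment you installed into is probably not on your PATH.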

    Project setup

    Services

    All services used by database-server are defined in the docker-compose.yml file. Before starting them, add your Hugging Face API token to .env, since the jinaai models require authorization. Start all services together with:

    cd database-server-services
    docker compose --profile cpu up -d # or --profile cuda

    NOTE: This loads some large models into memory via TEI (Hugging Face Text Embeddings Inference).

    To start only selected services, run, for example:

    cd database-server-services
    docker compose up -d tei-jinaai-jina-embedding-v2-small-en-cpu opensearch-node1
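    Containers can report as started before the services inside them accept connections. The minimal Python sketch below polls a TCP port until it opens. It assumes OpenSearch is published on its default port 9200 on localhost (check the port mappings in docker-compose.yml); wait_for_port is a hypothetical helper, not part of this project.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(interval, remaining))
```

    For example, wait_for_port("localhost", 9200) returns True once the port accepts TCP connections. An open port only means something is listening; it does not guarantee that the OpenSearch cluster or the TEI model is ready to serve requests.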

    Testing the API manually

    Start the server by running:

    poetry run python database_server/database_app.py

    A tool like Postman makes it easy to send requests by hand. Send requests with a JSON body (content type application/json) to http://localhost:8080/retrieval.
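    If you prefer scripting over Postman, the sketch below uses only the Python standard library. It assumes the endpoint accepts POST and responds with JSON (this README only gives the URL and content type); build_request and send are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint from this README; adjust host/port if you changed them.
RETRIEVAL_URL = "http://localhost:8080/retrieval"


def build_request(body, url=RETRIEVAL_URL):
    """Build a JSON request for the retrieval endpoint.

    Assumption: the endpoint accepts POST.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(body, url=RETRIEVAL_URL, timeout=30):
    """Send the request and return the decoded response.

    Assumption: the server answers with a JSON body.
    """
    with urllib.request.urlopen(build_request(body, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

    With the server running, pass one of the example bodies below as a dict, e.g. send({"query": "My Query", "retrieverType": {"name": "Generic"}, "maxLength": 50}).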

    Example body for the Generic retriever:

    {
        "query": "My Query",
        "retrieverType": {
            "name": "Generic"
        },
        "maxLength": 50
    }
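    The same body can be built programmatically. This hypothetical helper simply mirrors the JSON example above; what maxLength limits is not specified in this README, so the default of 50 is just the example value.

```python
import json


def generic_body(query, max_length=50):
    """Request body for the Generic retriever, mirroring the JSON example."""
    return {
        "query": query,
        "retrieverType": {"name": "Generic"},
        "maxLength": max_length,
    }


print(json.dumps(generic_body("My Query"), indent=4))
```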

    Example body for the OpenSearchTerm retriever (requires existing indices index1 and index2):

    {
        "query": "Query",
        "retrieverType": {
            "name": "OpenSearchTerm",
            "args": {
                "operator": "or",
                "indices": [
                    "index1",
                    "index2"
                ],
                "minimum_should_match": 1
            }
        },
        "maxLength": 50
    }
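    Likewise, a hypothetical helper that mirrors the OpenSearchTerm example. The argument names are copied from the JSON above; which values the server accepts for operator beyond "or" is not documented here.

```python
import json


def opensearch_term_body(query, indices, operator="or",
                         minimum_should_match=1, max_length=50):
    """Request body for the OpenSearchTerm retriever, mirroring the JSON example.

    The listed indices must already exist in OpenSearch.
    """
    return {
        "query": query,
        "retrieverType": {
            "name": "OpenSearchTerm",
            "args": {
                "operator": operator,
                "indices": list(indices),  # accept any iterable, emit a JSON array
                "minimum_should_match": minimum_should_match,
            },
        },
        "maxLength": max_length,
    }


print(json.dumps(opensearch_term_body("Query", ["index1", "index2"]), indent=4))
```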

    Dependencies

    Dependencies can be managed with the lean specification in requirements.txt. Optionally, Poetry can be used instead; it pins explicit torch+cpu versions, which appears to be a workaround for systems without CUDA.

    .venv

    Set up and activate a venv:

    python -m venv .venv
    source .venv/bin/activate

    Install the dependencies and the project:

    pip install -r requirements.txt
    pip install -r requirements-dev.txt # optional
    pip install -e . # install the project in editable mode for local development

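    This will prompt you for your GitLab credentials. If you use 2FA, generate a personal access token to use instead of your password. You can store your credentials in your .pypirc file (read by upload tools such as twine; pip itself does not read .pypirc and takes credentials from the index URL, ~/.netrc, or a keyring instead):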

    [gitlab]
    repository = https://gitlab.fachschaften.org/api/v4/projects/2744/packages/pypi
    username = __token__
    password = <your personal access token>

    Poetry

    Either use the devcontainer or install Poetry yourself.

    Then add credentials for the private Database-API package registry:

    poetry config http-basic.database_api <your-user-name> <your-access-token-or-password>

    Create the poetry env:

    poetry install --with=dev

    Create a shell for the environment:

    poetry shell

    pre-commit

    Install the pre-commit hooks, including the commit-msg hook:

    poetry run pre-commit install
    poetry run pre-commit install --hook-type commit-msg

    Known Issues

    Error | Fix
    segmentation fault (core dumped) when loading pix2text | Disable the PyTorch JIT by setting the environment variable PYTORCH_JIT=0.
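    The fix can also be applied from Python. PyTorch reads PYTORCH_JIT when torch is first imported, so the variable must be set before importing torch or anything that imports it, such as pix2text. A minimal sketch; disable_torch_jit is a hypothetical helper, not part of this project.

```python
import os
import sys
import warnings


def disable_torch_jit():
    """Set PYTORCH_JIT=0 for this process (and any subprocesses it starts).

    PyTorch reads the variable at import time, so call this before importing
    torch or anything that imports it (e.g. pix2text).
    """
    if "torch" in sys.modules:
        warnings.warn("torch is already imported; PYTORCH_JIT=0 may have no effect")
    os.environ["PYTORCH_JIT"] = "0"


disable_torch_jit()
# import pix2text  # import only after the variable is set
```

    From a shell, prefixing the command works as well, e.g. PYTORCH_JIT=0 database-server.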