# Database-Server

The Database-Server package provides the server-side implementation of the [Database-API](https://gitlab.fachschaften.org/PG-Facts4Chat/datacollection/database-api). It is a collection of wrapped retrieval methods.

## `database-server` Package

The `database-server` package currently contains only a placeholder `database_app` Flask application.

## Deploying

### Use the wheel:

```bash
pip install database-server --index-url https://gitlab.fachschaften.org/api/v4/projects/2724/packages/pypi/simple
```

### Or install the package directly:

```bash
python3 -m pip install .
```

This currently installs two executables in the default Python `bin` directory:

- `database-server` is the server application
- `database-server-opensearch` is for direct OpenSearch interfacing

## Project setup

## Services

All services used by the database-server are defined in the [`docker-compose.yml`](database-server-services/docker-compose.yml) file.

Make sure to edit [`.env`](database-server-services/.env) so that it contains your [HuggingFace](https://huggingface.co/) API token, since the `jinaai` models require authorization.

You can start all services together with:

```bash
cd database-server-services
docker compose --profile cpu up -d # or --profile cuda
```

**NOTE: This will load some large models via TEI into memory.**

To start only selected services, run something like:

```bash
cd database-server-services
docker compose up tei-jinaai-jina-embedding-v2-small-en-cpu opensearch-node1 -d
```

## Testing the API manually

Start the server by running:

```bash
poetry run python database_server/database_app.py
```

Use a tool like [Postman](https://www.postman.com/) to send requests without much effort. Send requests to `http://localhost:8080/retrieval` with content type `application/json`.
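If you prefer a script over Postman, the request can also be built with Python's standard library. The following is a minimal sketch, not part of the package: the helper names `build_retrieval_request` and `send_retrieval_request` are hypothetical, the HTTP method is assumed to be `POST`, and the response is assumed to be JSON. The payload mirrors the example bodies given in this section.

```python
import json
import urllib.request

# Endpoint taken from this README; adjust if your server runs elsewhere.
RETRIEVAL_URL = "http://localhost:8080/retrieval"


def build_retrieval_request(query, retriever_name="Generic", max_length=50,
                            args=None, url=RETRIEVAL_URL):
    """Build a request with a JSON body (hypothetical helper, POST is assumed)."""
    retriever_type = {"name": retriever_name}
    if args is not None:
        # Retriever-specific arguments, e.g. for the OpenSearchTerm retriever.
        retriever_type["args"] = args
    payload = {
        "query": query,
        "retrieverType": retriever_type,
        "maxLength": max_length,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_retrieval_request(request, timeout=10.0):
    """Send the request; assumes a running server that answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server, so it is safe to inspect it.
request = build_retrieval_request("My Query")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the server running, `send_retrieval_request(request)` would perform the actual call. Keeping request construction separate from sending makes the payload easy to check before any network traffic happens.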
Example body for the Generic retriever:

```json
{
  "query": "My Query",
  "retrieverType": {
    "name": "Generic"
  },
  "maxLength": 50
}
```

Example body for the OpenSearchTerm retriever (requires the indices `index1` and `index2` to exist):

```json
{
  "query": "Query",
  "retrieverType": {
    "name": "OpenSearchTerm",
    "args": {
      "operator": "or",
      "indices": [
        "index1",
        "index2"
      ],
      "minimum_should_match": 1
    }
  },
  "maxLength": 50
}
```

### Dependencies

The dependencies can be managed with the lean specification in `requirements.txt`. Optionally, [poetry](https://python-poetry.org/) can be used to select explicit `torch+cpu` versions (seems to be a workaround for non-cuda systems).

#### .venv

Set up and activate a venv:

```bash
python -m venv .venv
source .venv/bin/activate
```

Install the project/dependencies:

```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt # optional
pip install -e . # Set up project for local usage
```

This will prompt you to enter your GitLab credentials. If you use 2FA, generate a Personal Access Token.
You can store your credentials in your `.pypirc` file:

```ini
[gitlab]
repository = https://gitlab.fachschaften.org/api/v4/projects/2744/packages/pypi
username = __token__
password = <your personal access token>
```

#### Poetry

You can simply use the devcontainer or install poetry yourself:

- [Official Installer](https://python-poetry.org/docs/#installation) or
- Local pip installation: `pip install poetry`

Add credentials for the private [Database-API](https://gitlab.fachschaften.org/PG-Facts4Chat/datacollection/database-api) package registry:

```bash
poetry config http-basic.database_api <your-user-name> <your-access-token-or-password>
```

Create the poetry env:

```bash
poetry install --with=dev
```

Create a shell for the environment:

```bash
poetry shell
```

### `pre-commit`

```bash
poetry run pre-commit install
poetry run pre-commit install --hook-type commit-msg
```

## Known Issues

| Error                                                    | Fix                                                                      |
| -------------------------------------------------------- | ------------------------------------------------------------------------ |
| `segmentation fault (core dumped)` when loading pix2text | Disable pytorch JIT by setting the `PYTORCH_JIT=0` environment variable. |
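If exporting the variable in the shell is inconvenient, it can also be set from Python at the very start of the process. This is a minimal sketch under the assumption that PyTorch reads `PYTORCH_JIT` when it is first imported, so the variable has to be set before `torch` or pix2text is loaded; the helper name `disable_pytorch_jit` is hypothetical and not part of this package.

```python
import os
import sys


def disable_pytorch_jit():
    """Set PYTORCH_JIT=0 for this process (hypothetical helper).

    Returns True if `torch` was not imported yet, i.e. the setting is
    expected to take effect, and False if it was set too late.
    """
    set_in_time = "torch" not in sys.modules
    os.environ["PYTORCH_JIT"] = "0"
    return set_in_time


# Call this before importing torch or pix2text.
if not disable_pytorch_jit():
    print("Warning: torch was already imported; PYTORCH_JIT=0 may be ignored.")
```

Child processes started afterwards inherit the variable as well, since it is written to the process environment.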