# Database-Server

The Database-Server package provides the server-side implementation of [Database-API](https://gitlab.fachschaften.org/PG-Facts4Chat/datacollection/database-api).
It is a collection of wrapped retrieval methods.

## `database-server` Package

The `database-server` package currently contains only a placeholder `database_app` Flask application.

## Deploying

### Use the wheel:

```bash
pip install database-server --index-url https://gitlab.fachschaften.org/api/v4/projects/2724/packages/pypi/simple
```

### Or install the package directly:

```bash
python3 -m pip install .
```

This currently installs two executables in the default Python `bin` directory:
- `database-server`: the server application
- `database-server-opensearch`: direct OpenSearch interfacing
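
To confirm that both executables ended up on your `PATH` after installation, a small check along these lines may help. This is a minimal sketch; the helper name `find_executables` is hypothetical and not part of the package.

```python
import shutil

# Executable names as listed above.
EXECUTABLES = ("database-server", "database-server-opensearch")


def find_executables(names=EXECUTABLES):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_executables().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If an entry is reported as not found, the environment you installed into is probably not the active one.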

## Project setup

### Services

All services used by the database-server are defined in the [`docker-compose.yml`](database-server-services/docker-compose.yml) file.
Before starting them, edit [`.env`](database-server-services/.env) so that it contains your
[Hugging Face](https://huggingface.co/) API token, since the `jinaai` models
require authorization.
Start all services together with:

```bash
cd database-server-services
docker compose --profile cpu up -d # or --profile cuda
```

**NOTE: This will load some large models via TEI into memory.**

To start only selected services, pass their names explicitly, for example:
```bash
cd database-server-services
docker compose up tei-jinaai-jina-embedding-v2-small-en-cpu opensearch-node1 -d
```
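
Because the containers start in the background and the models take a while to load, it can be useful to wait until a service accepts connections before starting the server. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the port you pass depends on the mappings in `docker-compose.yml`.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open,
            # not that the service behind it has finished loading.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 9200)` would wait for OpenSearch, assuming its default HTTP port 9200 is published unchanged; check the compose file for the actual value.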

## Testing the API manually

Start the server by running:
```bash
poetry run python database_server/database_app.py
```

A tool such as [Postman](https://www.postman.com/) makes it easy to send requests.
Send requests to `http://localhost:8080/retrieval` with content type `application/json`.

Example body for the `Generic` retriever:

```json
{
    "query": "My Query",
    "retrieverType": {
        "name": "Generic"
    },
    "maxLength": 50
}
```

Example body for the `OpenSearchTerm` retriever (requires the indices `index1` and `index2` to exist):

```json
{
    "query": "Query",
    "retrieverType": {
        "name": "OpenSearchTerm",
        "args": {
            "operator": "or",
            "indices": [
                "index1",
                "index2"
            ],
            "minimum_should_match": 1
        }
    },
    "maxLength": 50
}
```
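
The same requests can be sent from a script instead of Postman. The sketch below uses only the standard library and builds the bodies shown above; the helper names `build_retrieval_request` and `send` are hypothetical, and using `POST` is an assumption, since the HTTP method is not stated here.

```python
import json
import urllib.request

# Endpoint taken from the text above; adjust host and port if your server differs.
RETRIEVAL_URL = "http://localhost:8080/retrieval"


def build_retrieval_request(query, retriever_name, args=None, max_length=50):
    """Build a JSON request matching the example bodies above."""
    retriever_type = {"name": retriever_name}
    if args is not None:
        retriever_type["args"] = args
    body = {
        "query": query,
        "retrieverType": retriever_type,
        "maxLength": max_length,
    }
    return urllib.request.Request(
        RETRIEVAL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint accepts POST
    )


def send(request, timeout=30):
    """Send the request and return the decoded response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_retrieval_request("My Query", "Generic"))` issues the first example. For the second, pass `"OpenSearchTerm"` together with `args={"operator": "or", "indices": ["index1", "index2"], "minimum_should_match": 1}`.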

### Dependencies

Dependencies can be managed with the lean specification in `requirements.txt`.
Optionally, [Poetry](https://python-poetry.org/) can be used to pin explicit `torch+cpu` versions, which appears to serve as a workaround for systems without CUDA.

#### .venv

Set up and activate a venv:

```bash
python -m venv .venv
source .venv/bin/activate
```

Install the project/dependencies:

```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt # optional
pip install -e . # set up the project for local use
```

This will prompt you to enter your GitLab credentials. If you use 2FA, generate
a Personal Access Token and use it in place of your password. You can store your credentials in your `.pypirc` file:
```ini
[gitlab]
repository = https://gitlab.fachschaften.org/api/v4/projects/2744/packages/pypi
username = __token__
password = <your personal access token>
```

#### Poetry

Either use the devcontainer or install Poetry yourself:
- [Official Installer](https://python-poetry.org/docs/#installation) or
- Local pip installation: `pip install poetry`

Add credentials for the private [Database-API](https://gitlab.fachschaften.org/PG-Facts4Chat/datacollection/database-api) package registry:

```bash
poetry config http-basic.database_api <your-user-name> <your-access-token-or-password>
```

Create the Poetry environment:
```bash
poetry install --with=dev
```

Open a shell inside the environment:
```bash
poetry shell
```


### `pre-commit`

Install the Git hooks, including the `commit-msg` hook:
```bash
poetry run pre-commit install
poetry run pre-commit install --hook-type commit-msg
```

## Known Issues

| Error                                                    | Fix                                                                      |
| -------------------------------------------------------- | ------------------------------------------------------------------------ |
| `segmentation fault (core dumped)` when loading pix2text | Disable PyTorch JIT by setting the `PYTORCH_JIT=0` environment variable. |
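
If exporting the variable in the shell is inconvenient, it can also be set from Python. This is a minimal sketch and assumes it runs before `torch` (and therefore pix2text) is imported for the first time in the process, since PyTorch appears to read the variable at import time.

```python
import os

# Set this before anything imports torch; once torch is loaded, changing it has no effect.
os.environ["PYTORCH_JIT"] = "0"

# Only now import libraries that pull in torch, for example:
# import pix2text
```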