Paperless-ngx

From darkrealm Wiki
Edited by Chris
Current revision as of 17 August 2025, 22:36

Installation

  • sudo lxc-create -n paperless-ngx -B zfs -t download -- --dist debian --release bookworm --arch amd64
  • dpkg-reconfigure tzdata
  • dpkg-reconfigure locales
  • apt install python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev mime-support libzbar0 poppler-utils apt-transport-https man-db vim bash-completion openssh-server unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr redis tesseract-ocr-deu git locate curl wget python3.11-venv emacs-nox autoconf libtool libleptonica-dev sudo mariadb-server
  • adduser paperless --system --home /opt/paperless-ngx --group
  • curl -O -L https://github.com/paperless-ngx/paperless-ngx/releases/download/v2.1.3/paperless-ngx-v2.1.3.tar.xz
  • tar -xf paperless-ngx-v2.1.3.tar.xz -C/opt/
  • cd /opt/paperless-ngx
  • mkdir media
  • mkdir data
  • mkdir consume
  • chown -R paperless:paperless .
  • chown paperless:paperless media
  • chown paperless:paperless data
  • chown paperless:paperless consume
  • sudo -Hu paperless python3 -m venv venv-paperless-ngx
  • sudo -Hu paperless -s
  • . venv-paperless-ngx/bin/activate
  • pip3 install -r requirements.txt
  • cd src
  • vim /opt/paperless-ngx/paperless.conf
# Have a look at the docs for documentation.
# https://docs.paperless-ngx.com/configuration/

# Debug. Only enable this for development.

#PAPERLESS_DEBUG=false

# Required services

PAPERLESS_REDIS=redis://localhost:6379
PAPERLESS_DBENGINE=mariadb
PAPERLESS_DBHOST=shodan.intern.darkrealm.dyndns.org
PAPERLESS_DBPORT=3306
PAPERLESS_DBNAME=paperless
PAPERLESS_DBUSER=paperless
PAPERLESS_DBPASS=paperless
PAPERLESS_DBSSLMODE=DISABLED

# Paths and folders

#PAPERLESS_CONSUMPTION_DIR=../consume
#PAPERLESS_DATA_DIR=../data
#PAPERLESS_TRASH_DIR=../trash
#PAPERLESS_MEDIA_ROOT=../media
PAPERLESS_CONSUMPTION_DIR=/mnt/storage/dms/consume
PAPERLESS_DATA_DIR=/mnt/storage/dms/data
PAPERLESS_TRASH_DIR=/mnt/storage/dms/trash
PAPERLESS_MEDIA_ROOT=/mnt/storage/dms/media
PAPERLESS_STATICDIR=../static
#PAPERLESS_FILENAME_FORMAT=
PAPERLESS_FILENAME_FORMAT={{owner_username}}/{{correspondent}}/{{document_type}}/{{created_year}}/{{title}}
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=true

# Security and hosting

PAPERLESS_SECRET_KEY=Peem7AhD
PAPERLESS_URL=https://paperless.darkrealm.dyndns.org
#PAPERLESS_CSRF_TRUSTED_ORIGINS=https://example.com # can be set using PAPERLESS_URL
#PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com # can be set using PAPERLESS_URL
#PAPERLESS_CORS_ALLOWED_HOSTS=https://localhost:8080,https://example.com # can be set using PAPERLESS_URL
#PAPERLESS_FORCE_SCRIPT_NAME=
#PAPERLESS_STATIC_URL=/static/
#PAPERLESS_AUTO_LOGIN_USERNAME=
#PAPERLESS_COOKIE_PREFIX=
#PAPERLESS_ENABLE_HTTP_REMOTE_USER=false

# OCR settings

PAPERLESS_OCR_LANGUAGE=eng+deu
#PAPERLESS_OCR_MODE=skip
#PAPERLESS_OCR_SKIP_ARCHIVE_FILE=never
#PAPERLESS_OCR_OUTPUT_TYPE=pdfa
#PAPERLESS_OCR_PAGES=1
#PAPERLESS_OCR_IMAGE_DPI=300
#PAPERLESS_OCR_CLEAN=clean
#PAPERLESS_OCR_DESKEW=true
#PAPERLESS_OCR_ROTATE_PAGES=true
#PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=4
#PAPERLESS_OCR_USER_ARGS={}
#PAPERLESS_CONVERT_MEMORY_LIMIT=0
#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless

PAPERLESS_ENABLE_NLTK=true
#PAPERLESS_NLTK_DIR=../nltk_data
PAPERLESS_NLTK_DIR=/mnt/storage/dms/nltk_data
# Software tweaks

#PAPERLESS_TASK_WORKERS=1
#PAPERLESS_THREADS_PER_WORKER=1
PAPERLESS_TIME_ZONE=Europe/Berlin
#PAPERLESS_CONSUMER_POLLING=10
#PAPERLESS_CONSUMER_DELETE_DUPLICATES=false
PAPERLESS_CONSUMER_RECURSIVE=true
#PAPERLESS_CONSUMER_IGNORE_PATTERNS=[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]
#PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=false
PAPERLESS_CONSUMER_ENABLE_BARCODES=true
#PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT
#PAPERLESS_CONSUMER_BARCODE_UPSCALE=0.0
#PAPERLESS_CONSUMER_BARCODE_DPI=300
#PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=false
#PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=double-sided
#PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT=false
#PAPERLESS_PRE_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh
#PAPERLESS_POST_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh
#PAPERLESS_FILENAME_DATE_ORDER=YMD
#PAPERLESS_FILENAME_PARSE_TRANSFORMS=[]
#PAPERLESS_NUMBER_OF_SUGGESTED_DATES=5
#PAPERLESS_THUMBNAIL_FONT_NAME=
#PAPERLESS_IGNORE_DATES=
#PAPERLESS_ENABLE_UPDATE_CHECK=

# Tika settings

PAPERLESS_TIKA_ENABLED=true
PAPERLESS_TIKA_ENDPOINT=http://localhost:9998
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://localhost:3000

# Binaries

#PAPERLESS_CONVERT_BINARY=/usr/bin/convert
#PAPERLESS_GS_BINARY=/usr/bin/gs
  • Create the database and the database user
  • sudo -Hu paperless python3 manage.py migrate
  • sudo -Hu paperless python3 manage.py createsuperuser
  • cp -av ../scripts/*.service /etc/systemd/system/
  • cp -av ../scripts/*.socket /etc/systemd/system/
  • systemctl daemon-reload
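
The "create the database and the database user" step in the list above is not spelled out on this page. A minimal sketch, assuming the database name, user and password from paperless.conf above (all three are paperless) and assuming a utf8mb4 character set and a '%' host wildcard, neither of which this page states. The function name paperless_db_sql is made up for illustration. It only prints SQL, so nothing changes until the output is piped into the mariadb client on the database host.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL for the Paperless database and user.
# Defaults mirror PAPERLESS_DBNAME/DBUSER/DBPASS from paperless.conf;
# the utf8mb4 charset and the '%' host wildcard are assumptions.
paperless_db_sql() {
    db="${1:-paperless}"
    user="${2:-paperless}"
    pass="${3:-paperless}"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS \`${db}\` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON \`${db}\`.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Review the statements first:
paperless_db_sql
```

Once the statements look right, something like paperless_db_sql | mariadb on the database host would apply them. Restricting '%' to the Paperless container's address is worth considering.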

Adjust the ImageMagick policy in /etc/ImageMagick-6/policy.xml:

  <policy domain="coder" rights="read|write" pattern="PDF" />
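
The policy change can also be scripted. The sketch below assumes the stock file denies PDF with rights="none" on that line, which this page does not state. The function name allow_pdf_policy is hypothetical. It keeps a .bak copy and prints the resulting PDF line so the change can be checked.

```shell
#!/bin/sh
# Hypothetical helper: allow ImageMagick to read and write PDFs.
# Assumes the existing entry is rights="none" pattern="PDF".
allow_pdf_policy() {
    policy="${1:-/etc/ImageMagick-6/policy.xml}"
    # Keep the original next to the edited file.
    cp -p "$policy" "$policy.bak" || return 1
    # Rewrite only the PDF coder entry, reading from the backup copy.
    sed 's#rights="none" pattern="PDF"#rights="read|write" pattern="PDF"#' \
        "$policy.bak" > "$policy"
    # Show the resulting entry for a visual check.
    grep 'pattern="PDF"' "$policy"
}

# Usage (as root): allow_pdf_policy
```

If the printed line still shows rights="none", the stock entry is formatted differently and needs to be edited by hand.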

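Compile JBIG2ENC:

  • git clone https://github.com/agl/jbig2enc
  • cd jbig2enc
  • ./autogen.sh
  • ./configure
  • make
  • make install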

Install NLTK

  • sudo -Hu paperless -s
  • . venv-paperless-ngx/bin/activate
  • python3
  • import nltk
  • nltk.download() (download to /opt/paperless-ngx/nltk_data, or to whatever PAPERLESS_NLTK_DIR points at)
  • download punkt, snowball_data and stopwords
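
After the download it can help to confirm that the three packages landed where Paperless will look. A small sketch, assuming NLTK's usual layout (tokenizers/, corpora/, stemmers/) below the data directory. The function name check_nltk_data is made up, and the default path is the one from the step above.

```shell
#!/bin/sh
# Hypothetical check: report which NLTK packages exist below an nltk_data dir.
# The subdirectory layout is assumed from how NLTK normally unpacks packages.
check_nltk_data() {
    dir="${1:-/opt/paperless-ngx/nltk_data}"
    missing=0
    for pkg in tokenizers/punkt corpora/stopwords stemmers/snowball_data; do
        # Accept either the unpacked directory or the downloaded zip.
        if [ -e "$dir/$pkg" ] || [ -e "$dir/$pkg.zip" ]; then
            echo "ok $pkg"
        else
            echo "missing $pkg"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_nltk_data /mnt/storage/dms/nltk_data
```

The function returns non-zero when anything is missing, so it can gate a later step in a script.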

Install Apache Tika and Gotenberg (currently via Docker)

  • apt install docker-compose
  • docker run -d -p 127.0.0.1:9998:9998 apache/tika
  • docker run -d -p 127.0.0.1:3000:3000 gotenberg/gotenberg:8
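
As written, the two containers are not restarted after a reboot. One possible variant is sketched below. The --name and --restart unless-stopped options are additions not on this page, and the wrapper name start_container and the DOCKER override are hypothetical. Setting DOCKER=echo prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around the two docker run commands above.
# --name and --restart unless-stopped are additions, not from this page.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
start_container() {
    name="$1"
    image="$2"
    port="$3"
    # Publish the port on localhost only, as in the original commands.
    ${DOCKER:-docker} run -d --name "$name" --restart unless-stopped \
        -p "127.0.0.1:${port}:${port}" "$image"
}

# Usage:
#   start_container tika apache/tika 9998
#   start_container gotenberg gotenberg/gotenberg:8 3000
```

The ports match PAPERLESS_TIKA_ENDPOINT and PAPERLESS_TIKA_GOTENBERG_ENDPOINT in the configuration.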

Update

  • wget the latest release tar.xz
  • systemctl stop 'paperless-*'
  • tar -xf <release.tar.xz> -C/opt/
  • cd /opt/paperless-ngx
  • cp -av paperless.conf.keep paperless.conf
  • chown paperless:paperless . -R
  • sudo -Hu paperless -s
  • source /opt/paperless-ngx/venv-paperless-ngx/bin/activate
  • pip install -r requirements.txt
  • cd src
  • python3 manage.py migrate
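
The update steps restore paperless.conf from paperless.conf.keep, presumably because unpacking the release replaces paperless.conf. That only works if the .keep copy exists and is current. A minimal sketch of saving it before the tar step follows. The function name keep_conf is hypothetical, and the default directory is the install path used on this page.

```shell
#!/bin/sh
# Hypothetical helper: save the live config as paperless.conf.keep
# before a new release is unpacked over /opt/paperless-ngx.
keep_conf() {
    dir="${1:-/opt/paperless-ngx}"
    if [ -f "$dir/paperless.conf" ]; then
        # -p preserves mode and timestamps of the live config.
        cp -p "$dir/paperless.conf" "$dir/paperless.conf.keep"
        echo "saved $dir/paperless.conf.keep"
    else
        echo "no paperless.conf in $dir" >&2
        return 1
    fi
}

# Usage (before tar -xf): keep_conf
```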

Fixes

NLTK Fix

sudo -Hu paperless -s
source /opt/paperless-ngx/venv-paperless-ngx/bin/activate
cd src
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "/mnt/storage/dms/nltk_data" punkt_tab

Logs show "possible incompatible database column" when deleting documents

You may see errors when deleting documents like:

Data too long for column 'transaction_id' at row 1

This error can occur in installations that were upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to a backwards-incompatible change in Django 5, the column "documents_document.transaction_id" needs to be re-created, which can be done with a one-time run of the following management command:

sudo -Hu paperless -s
source /opt/paperless-ngx/venv-paperless-ngx/bin/activate
cd src
python3 manage.py convert_mariadb_uuid

Complete reinstallation including full restore

  • lxc init images:debian/12 paperless-ngx
  • lxc start paperless-ngx
  • lxc shell paperless-ngx
  • passwd

Then proceed as described under Installation

  • copy the saved ZIP file of the export to the new Paperless server and unpack it to /temp
  • sudo -Hu paperless -s
  • source /opt/paperless-ngx/venv-paperless-ngx/bin/activate
  • cd src
  • python3 manage.py document_importer /temp/
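
Before starting the import it may be worth checking that the ZIP was unpacked flat into the target directory. The sketch below assumes the export was produced by document_exporter and therefore contains a manifest.json at the top level. The function name check_export_dir is made up for illustration.

```shell
#!/bin/sh
# Hypothetical pre-flight check before running document_importer.
# Assumes the export contains a top-level manifest.json.
check_export_dir() {
    dir="${1:-/temp}"
    if [ ! -f "$dir/manifest.json" ]; then
        echo "no manifest.json in $dir (unpacked into a subdirectory?)" >&2
        return 1
    fi
    echo "found manifest.json in $dir"
}

# Usage: check_export_dir /temp && python3 manage.py document_importer /temp/
```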