Paperless
Paperless is an open source document management system that indexes your scanned documents, makes them easy to search and stores metadata alongside them. This article refers to Paperless-ng, a community-maintained fork of the original Paperless project, which has been abandoned.
Installation
Install the paperless-ng package from the AUR.
Folders
Paperless is installed to /usr/share/paperless. The persistent storage of Paperless is located at /var/lib/paperless and contains the media folder, the data folder (by default containing the SQLite database), the consume folder for document consumption, the temporary uploads folder and the tmp convert folder.
The consume folder is writable by everyone in the paperless group.
Consumption folder permissions
If you wish to allow users (e.g. "http") to put documents in the consumption folder, add them to the paperless user group.
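The group membership can be checked before changing it. The following is a minimal sketch, not part of the package: in_group is a hypothetical helper, and "http" is only the example user mentioned above.

```shell
# in_group USER GROUP: succeed if USER is a member of GROUP.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example user from the text; replace with the account that should drop files.
user=http
if in_group "$user" paperless; then
    echo "$user is already in the paperless group"
else
    # Adding the user requires root privileges.
    echo "run as root: usermod -aG paperless $user"
fi
```

Group changes only take effect for new sessions, so a service running as that user has to be restarted afterwards.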
OCR languages
If you want Paperless to consume documents in a language other than English, you need to install the corresponding tesseract language data package. For German that would be tesseract-data-deu.
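To confirm that tesseract can see the language data after installing it, a check along the following lines may help. This is a sketch that assumes the tesseract binary is in PATH; has_ocr_language is a hypothetical helper name, and "deu" is the German example from above.

```shell
# has_ocr_language CODE: succeed if tesseract lists the language CODE.
# stderr is merged in because some tesseract versions print the list there.
has_ocr_language() {
    tesseract --list-langs 2>&1 | grep -qxF -- "$1"
}

if has_ocr_language deu; then
    echo "German OCR data is available"
else
    echo "German OCR data not found; install tesseract-data-deu"
fi
```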
Reduce the size of generated PDF documents
You might want to install the optional dependency jbig2enc-git (from the AUR) so Paperless can use it to reduce the size of generated PDF documents.
Start
Enable and start the Paperless systemd target:
# systemctl enable --now paperless.target
Your Paperless instance should now be available at port 8000.
Configuration
For details on Paperless configuration, visit its official documentation. The configuration file is located at /etc/paperless.conf.
The package creates a paperless system user and provides a paperless-manage command, which should always be run as the paperless user. See below for an example. The paperless-manage command should be used wherever the official documentation refers to python3 manage.py.
Do not forget to restart paperless.target after changing the configuration.
Adjust the configuration to your needs
Open the configuration file located at /etc/paperless.conf and adjust the parameters to your needs, especially those concerning OCR. For explanations of the individual settings, refer to the official documentation.
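As an illustration, an excerpt with a few commonly adjusted settings might look like the following. The setting names follow the upstream Paperless-ng documentation; the values are assumptions for a German-language setup and should be checked against the documentation for your version.

```shell
# /etc/paperless.conf (excerpt; example values, not recommendations)

# Language most documents are written in; the matching tesseract data must be installed.
PAPERLESS_OCR_LANGUAGE=deu

# Only run OCR on pages that do not already contain text.
PAPERLESS_OCR_MODE=skip

# Time zone used by Paperless.
PAPERLESS_TIME_ZONE=Europe/Berlin
```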
Set a secret key
After initial installation, you should generate and set a secret key. You do not need to remember it, but since it is used for securing signed data, you should keep it secret. To set a secret key, uncomment and modify the following line:
/etc/paperless.conf
#PAPERLESS_SECRET_KEY=change-me
To generate a key and set it in the configuration file, you can simply run the following command:
# sed -i /etc/paperless.conf -e "s|#PAPERLESS_SECRET_KEY=change-me|PAPERLESS_SECRET_KEY=$(date | md5sum | awk '{print $1;}')|"
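A key derived from the current date can be guessed by anyone who knows roughly when the command was run. As a hedged alternative, the sketch below draws the key from the kernel's random source instead. generate_secret_key is a hypothetical helper name, and the commented sed line assumes the configuration file still contains the unmodified placeholder line shown above.

```shell
# Print a 64-character hexadecimal string built from 32 random bytes.
generate_secret_key() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

key=$(generate_secret_key)
printf 'PAPERLESS_SECRET_KEY=%s\n' "$key"

# As root, the key could then be written to the configuration file:
# sed -i "s|^#PAPERLESS_SECRET_KEY=change-me|PAPERLESS_SECRET_KEY=$key|" /etc/paperless.conf
```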
Run database migrations
After initial installation and after updates, you should run the database migrations:
$ sudo -u paperless paperless-manage migrate
Create admin user
After initial installation, you should create an admin user for your Paperless instance:
$ sudo -u paperless paperless-manage createsuperuser
Nginx
Install Nginx and use the following configuration as a starting point for the Paperless virtual host:
/etc/nginx/sites-available/paperless.domain.tld
server {
    server_name paperless.domain.tld;
    listen 80;
    listen [::]:80;

    location / {
        # Adjust host and port as required.
        proxy_pass http://localhost:8000/;

        # These configuration options are required for WebSockets to work.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;
    }
}
Pacman hook
To automatically run migrations for the Paperless database on package updates, you can make use of the included pacman hook:
# mkdir -vp /etc/pacman.d/hooks
# ln -sv /usr/share/paperless/docs/paperless-ng.hook /etc/pacman.d/hooks/
Troubleshooting
BadSignature errors logged when trying to import documents
If you see BadSignature errors when trying to import documents, it is likely that your configuration file located at /etc/paperless.conf is not taken into account because the template configuration file /usr/share/paperless/paperless.conf is given precedence. In that case, remove or rename /usr/share/paperless/paperless.conf and restart paperless.target.
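Renaming the template rather than deleting it keeps a copy for reference. A minimal sketch follows; move_aside is a hypothetical helper, and the .bak suffix is an arbitrary choice.

```shell
# move_aside FILE: rename FILE to FILE.bak if it exists; otherwise do nothing.
move_aside() {
    if [ -e "$1" ]; then
        mv -- "$1" "$1.bak"
    fi
}

# As root, this could be applied to the template and followed by a restart:
# move_aside /usr/share/paperless/paperless.conf
# systemctl restart paperless.target
```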
Warning about misconfigured retry and timeout
If you see a warning about misconfigured retry and timeout, you can safely ignore it and wait for the simple upstream fix in python-django-q. The warning would look like this:
gunicorn[29457]: /usr/lib/python3.9/site-packages/django_q/conf.py:136: UserWarning: Retry and timeout are misconfigured. Set retry larger than timeout,
gunicorn[29457]: failure to do so will cause the tasks to be retriggered before completion.
gunicorn[29457]: See https://django-q.readthedocs.io/en/latest/configure.html#retry for details.
gunicorn[29457]: warn("""Retry and timeout are misconfigured. Set retry larger than timeout,
Thumbnail generation with ImageMagick fails
You have to disable a policy rule in /etc/ImageMagick-7/policy.xml. Add <!-- and --> to comment out the following line:
/etc/ImageMagick-7/policy.xml
<!-- <policy domain="delegate" rights="none" pattern="gs" /> -->
Consider the possible security implications noted at the beginning of the ImageMagick article. Also note that Paperless will fall back to using ghostscript anyway if the ImageMagick policy rule stays active.
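The edit to policy.xml can also be scripted. The following is a minimal sketch, assuming GNU sed and that the rule appears in the file exactly as shown above. comment_out_gs_policy is a hypothetical helper name, and the demonstration runs on a scratch file rather than the real policy file.

```shell
# comment_out_gs_policy FILE: wrap the ghostscript delegate rule in an XML
# comment. Lines that are already commented out do not match and are left alone.
comment_out_gs_policy() {
    sed -i -E 's|^([[:space:]]*)(<policy domain="delegate" rights="none" pattern="gs" */>)[[:space:]]*$|\1<!-- \2 -->|' "$1"
}

# Demonstration on a scratch file with the same rule.
scratch=$(mktemp)
printf '%s\n' \
    '<policymap>' \
    '  <policy domain="delegate" rights="none" pattern="gs" />' \
    '</policymap>' > "$scratch"
comment_out_gs_policy "$scratch"
cat "$scratch"
```

To apply it for real, the function would be run as root against /etc/ImageMagick-7/policy.xml, ideally after making a backup copy of that file.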