---
tags:
  - Writing
  - table-wrap
authors:
  - name: GPT-4.5
    url: https://chatgpt.com
    affiliation:
      - name: OpenAI
        url: https://openai.com
  - name: Nicole Dresselhaus
    affiliation:
      - name: Humboldt-Universität zu Berlin
        url: https://hu-berlin.de
    orcid: 0009-0008-8850-3679
date: 2025-05-05
categories:
  - Article
  - Case-study
  - ML
  - NER
lang: en
citation: true
fileClass: authored
title: "Case Study: Local LLM-Based NER with n8n and Ollama"
abstract: |
  Named Entity Recognition (NER) is a foundational task in text analysis,
  traditionally addressed by training NLP models on annotated data. However, a
  recent study – _“NER4All or Context is All You Need”_ – showed that
  out-of-the-box Large Language Models (LLMs) can **significantly outperform**
  classical NER pipelines (e.g. spaCy, Flair) on historical texts by using clever
  prompting, without any model retraining. This case study demonstrates how to
  implement the paper’s method using entirely local infrastructure: an **n8n**
  automation workflow (for orchestration) and an **Ollama** server running a
  14B-parameter LLM on an NVIDIA A100 GPU. The goal is to enable research
  engineers and tech-savvy historians to **reproduce and apply this method
  easily** on their own data, with a focus on usability and correct outputs rather
  than raw performance.

  We will walk through the end-to-end solution – from accepting a webhook input
  that defines entity types (e.g. Person, Organization, Location) to prompting a
  local LLM to extract those entities from a text. The solution covers setup
  instructions, required infrastructure (GPU, memory, software), model
  configuration, and workflow design in n8n. We also discuss potential limitations
  (like model accuracy and context length) and how to address them. By the end,
  you will have a clear blueprint for a **self-hosted NER pipeline** that
  leverages the knowledge encoded in LLMs (as advocated by the paper) while
  maintaining data privacy and reproducibility.
bibliography:
  - ner4all-case-study.bib
citation-style: springer-humanities-brackets
nocite: |
  @*
image: ../thumbs/writing_ner4all-case-study.png
---

## Background: LLM-Based NER Method Overview

The referenced study introduced a prompt-driven approach to NER, reframing it
“from a purely linguistic task into a humanities-focused task”. Instead of
training a specialized NER model for each corpus, the method leverages the fact
that large pretrained LLMs already contain vast world knowledge and language
understanding. The key idea is to **provide the model with contextual
definitions and instructions** so it can recognize entities in context. Notably,
the authors found that with proper prompts, a commercial LLM (ChatGPT-4) could
achieve **precision and recall on par with or better than** state-of-the-art NER
tools on a 1921 historical travel guide. This was achieved **zero-shot**, i.e.
without any fine-tuning or additional training data beyond the prompt itself.

**Prompt Strategy:** The success of this approach hinges on careful prompt
engineering. The final prompt used in the paper had multiple components:

- **Persona & Context:** A brief introduction framing the LLM as an _expert_
  reading a historical text, possibly including domain context (e.g. “This text
  is an early 20th-century travel guide; language is old-fashioned”). This
  primes the model with relevant background.
- **Task Instructions:** A clear description of the NER task, including the list
  of entity categories and how to mark them in text. For example: _“Identify all
  Person (PER), Location (LOC), and Organization (ORG) names in the text and
  mark each by enclosing it in tags.”_
- **Optional Examples:** A few examples of sentences with correct tagged output
  (few-shot learning) to guide the model. Interestingly, the study found that
  zero-shot prompting often **outperformed few-shot** until \~16 examples were
  provided. Given the cost of preparing examples and limited prompt length, our
  implementation will focus on zero-shot usage for simplicity.
- **Reiteration & Emphasis:** The prompt repeated key instructions in different
  words and emphasized compliance (e.g. _“Make sure you follow the tagging
  format exactly for every example.”_). This redundancy helps the model adhere
  to instructions.
- **Prompt Engineering Tricks:** They included creative cues to improve
  accuracy, such as offering a “monetary reward for each correct classification”
  and the phrase _“Take a deep breath and think step by step.”_. These tricks,
  drawn from prior work, encouraged the model to be thorough and careful.
- **Output Format:** Crucially, the model was asked to **repeat the original
  text exactly** but insert tags around entity mentions. The authors settled on
  a format like `<<PER ... /PER>>` to tag people, `<<LOC ... /LOC>>` for
  locations, etc., covering each full entity span. This inline tagging format
  leveraged the model’s familiarity with XML/HTML syntax (from its training
  data) and largely eliminated problems like unclosed tags or extra spaces. By
  instructing the model _not to alter any other text_, they ensured the output
  could be easily compared to the input and parsed for entities.

**Why Local LLMs?** The original experiments used a proprietary API (ChatGPT-4).
To make the method accessible to all (and avoid data governance issues of cloud
APIs), we implement it with **open-source LLMs running locally**. Recent openly
licensed models are rapidly improving and can handle such extraction tasks given
the right prompt. Running everything locally also aligns with the paper’s goal
of “democratizing access” to NER for diverse, low-resource texts – there are no
API costs or internet needed, and data stays on local hardware for privacy.

## Solution Architecture

Our solution consists of a **workflow in n8n** that orchestrates the NER
process, and a **local Ollama server** that hosts the LLM for text analysis. The
high-level workflow is as follows:

1. **Webhook Trigger (n8n):** A user initiates the process by sending an HTTP
   request to n8n’s webhook with two inputs: (a) a simple text defining the
   entity categories of interest (for example, `"PER, ORG, LOC"`), and (b) the
   text to analyze (either included in the request or accessible via a provided
   file URL). This trigger node captures the input and starts the automation.
2. **Prompt Construction (n8n):** The workflow builds a structured prompt for
   the LLM. Based on the webhook input, it prepares the system instructions
   listing each entity type and guidelines, then appends the user’s text.
   Essentially, n8n will merge the _entity definitions_ into a pre-defined
   prompt template (the one derived from the paper’s method). This can be done
   using a **Function node** or an **LLM Prompt node** in n8n to ensure the text
   and instructions are combined correctly.
3. **LLM Inference (Ollama + LLM):** n8n then passes the prompt to an **Ollama
   Chat Model node**, which communicates with the Ollama server’s API. The
   Ollama daemon hosts the selected 14B model on the local GPU and returns the
   model’s completion. In our case, the completion will be the original text
   with NER tags inserted around the entities (e.g.
   `<<PER John Doe /PER>> went to <<LOC Berlin /LOC>> ...`). This step harnesses
   the A100 GPU to generate results quickly, using the chosen model’s weights
   locally.
4. **Output Processing (n8n):** The tagged text output from the LLM can be
   handled in two ways. The simplest is to **return the tagged text directly**
   as the response to the webhook call – allowing the user to see their original
   text with all entities highlighted by tags. Alternatively, n8n can
   post-process the tags to extract a structured list of entities (e.g. a JSON
   array of `{"entity": "John Doe", "type": "PER"}`{.json} objects). This
   parsing can be done with a Regex or code node, but given our focus on
   correctness, we often trust the model’s tagging format to be consistent (the
   paper reported the format was reliably followed when instructed clearly).
   Finally, an **HTTP Response** node sends the results back to the user (or
   stores them), completing the workflow.

**Workflow Structure:** In n8n’s interface, the workflow might look like a
sequence of connected nodes: **Webhook → Function (build prompt) → AI Model
(Ollama) → Webhook Response**. If using n8n’s new AI Agent feature, some steps
(like prompt templating) can be configured within the AI nodes themselves. The
key is that the Ollama model node is configured to use the local server (usually
at `http://127.0.0.1:11434` by default) and the specific model name. We assume
the base pipeline (available on GitHub) already includes most of this structure
– our task is to **slot in the custom prompt and model configuration** for the
NER use case.

## Setup and Infrastructure Requirements

To reproduce this solution, you will need a machine with an **NVIDIA GPU** and
the following software components installed:

- **n8n (v1.x or later)** – the workflow automation tool. You can install
  n8n via npm, Docker, or use the desktop app. For a server environment, Docker
  is convenient. For example, to run n8n with Docker:

  ```bash
  docker run -it --rm \
    -p 5678:5678 \
    -v ~/.n8n:/home/node/.n8n \
    n8nio/n8n:latest
  ```

  This exposes n8n on `http://localhost:5678` for the web interface. (If you use
  Docker and plan to connect to a host-running Ollama, start the container with
  `--network=host` to allow access to the Ollama API on localhost.)

- **Ollama (v0.x)** – an LLM runtime that serves models via an HTTP API.
  Installing Ollama is straightforward: download the installer for your OS from
  the official site (Linux users can run the one-line script
  `curl -sSL https://ollama.com/install.sh | sh`). After installation, start the
  Ollama server (daemon) by running:

  ```bash
  ollama serve
  ```

  This will launch the service listening on port 11434. You can verify it’s
  running by opening `http://localhost:11434` in a browser – it should respond
  with “Ollama is running”. _Note:_ Ensure your system has recent NVIDIA drivers
  and CUDA support if using GPU. Ollama supports NVIDIA GPUs with compute
  capability ≥5.0 (the A100 is well above this). Use `nvidia-smi` to confirm
  your GPU is recognized. If everything is set up, Ollama will automatically use
  the GPU for model inference (falling back to CPU if none available). A small
  Python check against the Ollama API is sketched right after this list.

- **LLM Model (14B class):** Finally, download at least one large language model
  to use for NER. You have a few options here, and you can “pull” them via
  Ollama’s CLI:

  - _DeepSeek-R1 14B:_ A 14.8B-parameter model distilled from larger reasoning
    models (based on Qwen architecture). It’s optimized for reasoning tasks and
    compares to OpenAI’s models in quality. Pull it with:

    ```bash
    ollama pull deepseek-r1:14b
    ```

    This downloads \~9 GB of data (the quantized weights). If you have a very
    strong GPU (e.g. A100 80GB), you could even try `deepseek-r1:70b` (\~43 GB),
    but 14B is a good balance for our use-case. DeepSeek-R1 is licensed MIT and
    designed to run locally with no restrictions.

  - _Cogito 14B:_ A 14B “hybrid reasoning” model by Deep Cogito, known for
    excellent instruction-following and multilingual capability. Pull it with:

    ```bash
    ollama pull cogito:14b
    ```

    Cogito-14B is also \~9 GB (quantized) and supports an extended context
    window up to **128k tokens** – which is extremely useful if you plan to
    analyze very long documents without chunking. It’s trained in 30+ languages
    and tuned to follow complex instructions, which can help in structured
    output tasks like ours.

  - _Others:_ Ollama offers many models (LLaMA 2 variants, Mistral, etc.). For
    instance, `ollama pull llama2:13b` would get a LLaMA-2 13B model. These can
    work, but for best results in NER with no fine-tuning, we suggest using one
    of the above well-instructed models. If your hardware is limited, you could
    try a 7-8B model (e.g., `deepseek-r1:7b` or `cogito:8b`), which download
    faster and use \~4–5 GB VRAM, at the cost of some accuracy. In CPU-only
    scenarios, even a 1.5B model is available – it will run very slowly and
    likely miss more entities, but it proves the pipeline can work on minimal
    hardware.
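
To confirm the server and a pulled model are actually reachable before wiring up
n8n, a quick request against Ollama’s REST API helps. The following is a minimal
Python sketch using only the standard library; adjust the model name to whatever
you pulled.

```python
import json
import urllib.request

OLLAMA = "http://127.0.0.1:11434"  # default Ollama address

# List the models the server currently has installed.
with urllib.request.urlopen(f"{OLLAMA}/api/tags") as resp:
    print("Models:", [m["name"] for m in json.load(resp)["models"]])

# Run a tiny non-streaming generation to confirm the model loads (onto the GPU, if available).
payload = json.dumps({
    "model": "cogito:14b",  # or "deepseek-r1:14b"
    "prompt": "Reply with the single word: ready",
    "stream": False,
}).encode()
req = urllib.request.Request(f"{OLLAMA}/api/generate", data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```

The first request should list your pulled models; the second should answer
within a few seconds once the model has been loaded into VRAM.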

**Hardware Requirements:** Our case assumes an NVIDIA A100 GPU (40 GB), which
comfortably hosts a 14B model in memory and accelerates inference. In practice,
any modern GPU with ≥10 GB memory can run a 13–14B model in 4-bit quantization.
For example, an RTX 3090 or 4090 (24 GB) could handle it, and even smaller GPUs
(or Apple Silicon with 16+ GB RAM) can run 7B models. Ensure you have sufficient
**system RAM** as well (at least as much as the model size, plus overhead for
n8n – 16 GB RAM is a safe minimum for 14B). Disk space of \~10 GB per model is
needed. If using Docker for n8n, allocate CPU and memory generously to avoid
bottlenecks when the LLM node processes large text.

## Building the n8n Workflow

With the environment ready, we now construct the n8n workflow that ties
everything together. We outline each component with instructions:

### 1. Webhook Input for Entities and Text

Start by creating a **Webhook trigger** node in n8n. This will provide a URL
(endpoint) that you can send a request to. Configure it to accept a POST request
containing the necessary inputs. For example, we expect the request JSON to look
like:

```json
{
  "entities": "PER, ORG, LOC",
  "text": "John Doe visited Berlin in 1921 and met with the Board of Acme Corp."
}
```

Here, `"entities"` is a simple comma-separated string of entity types (you could
also accept an array or a more detailed schema; for simplicity we use the format
used in the paper: PER for person, LOC for location, ORG for organization). The
`"text"` field contains the content to analyze. In a real scenario, the text
could be much longer or might be sent as a file. If it's a file, one approach is
to send it as form-data and use n8n’s **Read Binary File** + **Move Binary
Data** nodes to get it into text form. Alternatively, send a URL in the JSON and
use an HTTP Request node in the workflow to fetch the content. The key is that
by the end of this step, we have the raw text and the list of entity labels
available in the n8n workflow as variables.
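
For a quick test of this step you can call the webhook from any HTTP client. The
sketch below uses Python’s standard library; the URL path (`/webhook/ner`) is a
placeholder and must match whatever path you configure in your Webhook node.

```python
import json
import urllib.request

# Placeholder URL – replace the path with the one shown in your n8n Webhook node.
N8N_WEBHOOK = "http://localhost:5678/webhook/ner"

payload = json.dumps({
    "entities": "PER, ORG, LOC",
    "text": "John Doe visited Berlin in 1921 and met with the Board of Acme Corp.",
}).encode()

req = urllib.request.Request(N8N_WEBHOOK, data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # tagged text (or JSON) returned by the workflow
```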

### 2. Constructing the LLM Prompt

Next, add a node to build the prompt that will be fed to the LLM. You can use a
**Function** node (JavaScript code) or the **“Set” node** to template a prompt
string. We will create two pieces of prompt content: a **system instruction**
(the role played by the system prompt in chat models) and the **user message**
(which will contain the text to be processed).

According to the method, our **system prompt** should incorporate the following:

- **Persona/Context:** e.g. _“You are a historian and archivist analyzing a
  historical document. The language may be old or have archaic spellings. You
  have extensive knowledge of people, places, and organizations relevant to the
  context.”_ This establishes domain expertise in the model.
- **Task Definition:** e.g. _“Your task is to perform Named Entity Recognition.
  Identify all occurrences of the specified entity types in the given text and
  annotate them with the corresponding tags.”_
- **Entity Definitions:** List the entity categories provided by the user, with
  a brief definition if needed. For example: _“The entity types are: PER
  (persons or fictional characters), ORG (organizations, companies,
  institutions), LOC (locations such as cities, countries, landmarks).”_ If the
  user already provided definitions in the webhook, include those; otherwise a
  generic definition as shown is fine.
- **Tagging Instructions:** Clearly explain the tagging format. We adopt the
  format from the paper: each entity should be wrapped in `<<TYPE ... /TYPE>>`.
  So instruct: _“Enclose each entity in double angle brackets with its type
  label. For example: <\<PER John Doe /PER>> for a person named John Doe. Do not
  alter any other text – only insert tags. Ensure every opening tag has a
  closing tag.”_ Also mention that tags can nest or overlap if necessary (though
  that’s rare).
- **Output Expectations:** Emphasize that the output should be the **exact
  original text, verbatim, with tags added** and nothing else. For example:
  _“Repeat the input text exactly, adding the tags around the entities. Do not
  add explanations or remove any content. The output should look like the
  original text with markup.”_ This is crucial to prevent the model from
  omitting or rephrasing text. The paper’s prompt literally had a line: “Repeat
  the given text exactly. Be very careful to ensure that nothing is added or
  removed apart from the annotations.”
- **Compliance & Thoughtfulness:** We can borrow the trick of telling the model
  to take its time and be precise. For instance: _“Before answering, take a deep
  breath and think step by step. Make sure you find **all** entities. You will
  be rewarded for each correct tag.”_ While the notion of reward is
  hypothetical, such phrasing has been observed to sharpen the model’s focus.
  This is optional but can be useful for complex texts.

Once this system prompt is assembled as a single string, it will be sent as the
system role content to the LLM. Now, for the **user prompt**, we simply supply
the text to be analyzed. In many chat-based LLMs, the user message would contain
the text on which the assistant should perform the task. We might prefix it with
something like “Text to analyze:\n” for clarity, or just include the raw text.
(Including a prefix is slightly safer to distinguish it from any instructions,
but since the system prompt already set the task, the user message can be just
the document text.)
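
In n8n this assembly is typically done in a Function node (JavaScript), but the
logic is plain string templating. The sketch below expresses the same steps in
Python so the structure is easy to see; the exact wording is our paraphrase of
the prompt components above, not a verbatim quote from the paper.

```python
def build_system_prompt(entities: str) -> str:
    """Assemble the system prompt from a comma-separated entity string, e.g. "PER, ORG, LOC"."""
    labels = [e.strip() for e in entities.split(",") if e.strip()]
    tag_examples = ", ".join(f"<<{label} ... /{label}>>" for label in labels)
    return (
        "You are a historian and archivist analyzing a historical document. "
        "The language may be old or have archaic spellings.\n\n"
        "Your task is to perform Named Entity Recognition. Identify all occurrences "
        f"of the following entity types in the text: {', '.join(labels)}.\n"
        f"Enclose each entity in double angle brackets with its type label, e.g. {tag_examples}.\n"
        "Repeat the given text exactly; do not add or remove anything apart from the tags, "
        "and make sure every opening tag has a closing tag.\n"
        "Take a deep breath and think step by step. Make sure you find all entities."
    )


def build_user_prompt(text: str) -> str:
    """The user message is simply the document, optionally with a short prefix."""
    return "Text to analyze:\n" + text
```

The output of `build_system_prompt("PER, ORG, LOC")` is what the workflow maps
into the system role; `build_user_prompt(...)` becomes the user message.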

In n8n, if using the **Basic LLM Chain** node, you can configure it to use a
custom system prompt. For example, connect the Function/Set node output into the
LLM node, and in the LLM node’s settings choose “Mode: Complete” or similar,
then under **System Instructions** put an expression that references the
constructed prompt text (e.g., `{{ $json["prompt"] }}` if the prompt was output
to that field). The **User Message** can similarly be fed from the input text
field (e.g., `{{ $json["text"] }}`). Essentially, we map our crafted instruction
into the system role, and the actual content into the user role.

### 3. Configuring the Local LLM (Ollama Model Node)

Now configure the LLM node to use the **Ollama** backend and your downloaded
model. n8n provides an “Ollama Chat Model” integration, which is a sub-node of
the AI Agent system. In the n8n editor, add or open the LLM node (if using the
AI Agent, this might be inside a larger agent node), and look for model
selection. Select **Ollama** as the provider. You’ll need to set up a credential
for Ollama API access – use `http://127.0.0.1:11434` as the host (instead of the
default localhost, to avoid any IPv6 binding issues). No API key is needed since
it’s local. Once connected, you should see a dropdown of available models (all
the ones you pulled). Choose the 14B model you downloaded, e.g.
`deepseek-r1:14b` or `cogito:14b`.

Double-check the **parameters** for generation. By default, Ollama models have
their own preset for max tokens and temperature. For an extraction task, we want
the model to stay **focused and deterministic**. It’s wise to set a relatively
low temperature (e.g. 0.2) to reduce randomness, and a high max tokens so it can
output the entire text with tags (set max tokens to at least the length of your
input in tokens plus 10-20% for tags). If using Cogito with its 128k context,
you can safely feed very long text; with other models (often \~4k context),
ensure your text isn’t longer than the model’s context limit or use a model
variant with extended context. If the model supports **“tools” or functions**,
you won’t need those here – this is a single-shot prompt, not a multi-step agent
requiring tool usage, so just the chat completion mode is sufficient.
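
If you want to sanity-check the same generation settings outside n8n, an
equivalent request against Ollama’s chat endpoint looks roughly like the sketch
below. The option names (`temperature`, `num_ctx`, `num_predict`) are Ollama’s;
the values are illustrative, not the only reasonable choices.

```python
import json
import urllib.request

system_prompt = "You are a historian ..."  # the system prompt assembled in step 2
text = "John Doe visited Berlin in 1921 and met with the Board of Acme Corp."

payload = json.dumps({
    "model": "cogito:14b",
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Text to analyze:\n" + text},
    ],
    "stream": False,
    "options": {
        "temperature": 0.2,  # keep the extraction near-deterministic
        "num_ctx": 8192,     # context window; raise it for long inputs if the model allows
        "num_predict": -1,   # no hard cap, so the full tagged text can be reproduced
    },
}).encode()

req = urllib.request.Request("http://127.0.0.1:11434/api/chat", data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    tagged_text = json.load(resp)["message"]["content"]
print(tagged_text)
```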

At this point, when the workflow runs to this node, n8n will send the system and
user messages to Ollama and wait for the response. The heavy lifting is done by
the LLM on the GPU, which will generate the tagged text. On an A100, a 14B model
can process a few thousand tokens of input and output in just a handful of
seconds (exact time depends on the model and input size).

### 4. Returning the Results

After the LLM node, add a node to handle the output. If you want to present the
**tagged text** directly, you can pass the LLM’s output to the final Webhook
Response node (or if using the built-in n8n chat UI, you would see the answer in
the chat). The tagged text will look something like:

```plain
<<PER John Doe /PER>> visited <<LOC Berlin /LOC>> in 1921 and met with the Board
of <<ORG Acme Corp /ORG>>.
```

This format highlights each identified entity. It is immediately human-readable
with the tags, and trivial to post-process if needed. For example, one could use
a regex like `<<(\w+) (.*?) /\1>>` to extract all `type` and `entity` pairs from
the text. In n8n, a quick approach is to use a **Function** node to find all
matches of that pattern in `item.json["data"]` (assuming the LLM output is in
`data`). Then one could return a JSON array of entities. However, since our
focus is on correctness and ease, you might simply return the marked-up text and
perhaps document how to parse it externally if the user wants structured data.
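
As a minimal sketch of that external post-processing, the following Python
snippet applies the pattern above and emits the structured JSON form described
earlier; the same regex works inside an n8n Function node in JavaScript.

```python
import json
import re

# Pattern from above: <<TYPE entity /TYPE>>, where \1 requires the closing label to match.
TAG_RE = re.compile(r"<<(\w+) (.*?) /\1>>")

def extract_entities(tagged_text: str) -> list[dict]:
    """Collect all tagged spans as {"entity": ..., "type": ...} records."""
    return [{"entity": m.group(2), "type": m.group(1)}
            for m in TAG_RE.finditer(tagged_text)]

tagged = ("<<PER John Doe /PER>> visited <<LOC Berlin /LOC>> in 1921 "
          "and met with the Board of <<ORG Acme Corp /ORG>>.")
print(json.dumps(extract_entities(tagged), ensure_ascii=False, indent=2))
```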

Finally, use an **HTTP Response** node (if the workflow was triggered by a
Webhook) to send back the results. If the workflow was triggered via n8n’s chat
trigger (in the case of interactive usage), you would instead rely on the chat
UI output. For a pure API workflow, the HTTP response will contain either the
tagged text or a JSON of extracted entities, which the user’s script or
application can then use.

**Note:** If you plan to run multiple analyses or have an ongoing service, you
might want to **persist the Ollama server** (don’t shut it down between runs)
and perhaps keep the model loaded in VRAM for performance. Ollama will cache the
model in memory after the first request, so subsequent requests are faster. On
an A100, you could even load two models (if you plan to experiment with which
gives better results) but be mindful of VRAM usage if doing so concurrently.

## Model Selection Considerations

We provided two example 14B models (DeepSeek-R1 and Cogito) to use with this
pipeline. Both are good choices, but here are some considerations and
alternatives:

- **Accuracy vs. Speed:** Larger models (like 14B or 30B) generally produce more
  accurate and coherent results, especially for complex instructions, compared
  to 7B models. Since our aim is correctness of NER output, the A100 allows us
  to use a 14B model which offers a sweet spot. In preliminary tests, these
  models can correctly tag most obvious entities and even handle some tricky
  cases (e.g. person names with titles, organizations that sound like person
  names, etc.) thanks to their pretrained knowledge. If you find the model is
  making mistakes, you could try a bigger model (Cogito 32B or 70B, if resources
  permit). Conversely, if you need faster responses and are willing to trade
  some accuracy, a 7-8B model or running the 14B at a higher quantization (e.g.
  4-bit) on CPU might be acceptable for smaller texts.
- **Domain of the Text:** The paper dealt with historical travel guide text
  (1920s era). These open models have been trained on large internet corpora, so
  they likely have seen a lot of historical names and terms, but their coverage
  might not be as exhaustive as GPT-4. If your text is in a specific domain
  (say, ancient mythology or very obscure local history), the model might miss
  entities that it doesn’t recognize as famous. The prompt’s context can help
  (for example, adding a note like _“Note: Mythological characters should be
  considered PERSON entities.”_ as they did for Greek gods). For extremely
  domain-specific needs, one could fine-tune a model or use a specialized one,
  but that moves beyond the zero-shot philosophy.
- **Language:** If your texts are not in English, ensure the chosen model is
  multilingual. Cogito, for instance, was trained in over 30 languages, so it
  can handle many European languages (the paper also tested German prompts). If
  using a model that’s primarily English (like some LLaMA variants), you might
  get better results by writing the instructions in English but letting it
  output tags in the original text. The study found English prompts initially
  gave better recall even on German text, but with prompt tweaks the gap closed.
  For our pipeline, you can simply provide the definitions in English and the
  text in the foreign language – a capable model will still tag the foreign
  entities. For example, Cogito or DeepSeek should tag a German sentence’s
  _“Herr Schmidt”_ as `<<PER Herr Schmidt /PER>>`. Always test on a small sample
  if in doubt.
- **Extended Context:** If your input text is very long (tens of thousands of
  words), you should chunk it into smaller segments (e.g. paragraph by
  paragraph) and run the model on each, then merge the outputs (a simple
  chunking sketch follows this list). This is because most models (including
  DeepSeek 14B) have a context window of 2048–8192 tokens. However, Cogito’s
  128k context capability is a game-changer – in theory you could feed an
  entire book and get a single output. Keep in mind the time and memory usage
  will grow with very large inputs, and n8n might need increased timeout
  settings for such long runs. For typical use (a few pages of text at a time),
  the standard context is sufficient.
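
As referenced in the last point above, a simple paragraph-based chunker is often
enough when the context window is the bottleneck. The sketch below is a rough
illustration: the word count is only a crude stand-in for a real token count,
and the budget should be chosen well below the model’s context limit.

```python
def chunk_text(text: str, max_words: int = 1500) -> list[str]:
    """Group paragraphs into chunks that stay under a rough word budget."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in text.split("\n\n"):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Each chunk is sent through the LLM separately and the tagged outputs are
# concatenated afterwards; entities rarely span a paragraph boundary, so little
# is lost by splitting at blank lines.
```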

In our implementation, we encourage experimenting with both DeepSeek-R1 and
Cogito models. Both are **open-source and free for commercial use** (Cogito uses
an Apache 2.0 license, DeepSeek MIT). They represent some of the best 14B-class
models as of early 2025. You can cite these models in any academic context if
needed, or even switch to another model with minimal changes to the n8n workflow
(just pull the model and change the model name in the Ollama node).

## Example Run

Let’s run through a hypothetical example to illustrate the output. Suppose a
historian supplies the following via the webhook:

- **Entities:** `PER, ORG, LOC`
- **Text:** _"Baron Münchhausen was born in Bodenwerder and served in the
  Russian military under Empress Anna. Today, the Münchhausen Museum in
  Bodenwerder is operated by the town council."_

When the workflow executes, the LLM receives instructions to tag people (PER),
organizations (ORG), and locations (LOC). With the prompt techniques described,
the model’s output might look like:

```plain
<<PER Baron Münchhausen /PER>> was born in <<LOC Bodenwerder /LOC>> and served
in the Russian military under <<PER Empress Anna /PER>>. Today, the <<ORG
Münchhausen Museum /ORG>> in <<LOC Bodenwerder /LOC>> is operated by the town
council.
```

All person names (Baron Münchhausen, Empress Anna) are enclosed in `<<PER>>`
tags, the museum is marked as an organization, and the town Bodenwerder is
marked as a location (twice). The rest of the sentence remains unchanged. This
output can be returned as-is to the user. They can visually verify it or
programmatically parse out the tagged entities. The correctness of outputs is
high: each tag corresponds to a real entity mention in the text, and there are
no hallucinated tags. If the model were to make an error (say, tagging "Russian"
as LOC erroneously), the user could adjust the prompt (for example, clarify that
national adjectives are not entities) and re-run.

## Limitations and Solutions

While this pipeline makes NER easier to reproduce, it’s important to be aware of
its limitations and how to mitigate them:

- **Model Misclassifications:** A local 14B model may not match GPT-4’s level of
  understanding. It might occasionally tag something incorrectly or miss a
  subtle entity. For instance, in historical texts, titles or honorifics (e.g.
  _“Dr. John Smith”_) might confuse it, or a ship name might be tagged as ORG
  when it’s not in our categories. **Solution:** Refine the prompt with
  additional guidance. You can add a “Note” section in the instructions to
  handle known ambiguities (the paper did this with notes about Greek gods being
  persons, etc.). Also, a quick manual review or spot-check is recommended for
  important outputs. Since the output format is simple, a human or a simple
  script can catch obvious mistakes (e.g., if "Russian" was tagged LOC, a
  post-process could remove it knowing it's likely wrong). Over time, if you
  notice a pattern of mistakes, update the prompt instructions accordingly.

- **Text Reproduction Issues:** We instruct the model to output the original
  text verbatim with tags, but LLMs sometimes can’t resist minor changes. They
  may “correct” spelling or punctuation, or alter spacing. The paper noted this
  tendency and used fuzzy matching when evaluating. In our pipeline, minor
  format changes usually don’t harm the extraction, but if preserving text
  exactly is important (say for downstream alignment), this is a concern.
  **Solution:** Emphasize fidelity in the prompt (we already do). If needed, do
  a diff between the original text and the tagged text and flag differences (a
  small verification sketch follows this list). Usually differences will be
  small (e.g., changing an old spelling to modern). You can then either accept
  them or attempt a more rigid approach (like asking for a JSON list of entity
  offsets – though that introduces other complexities and was intentionally
  avoided by the authors). In practice, we found the tag insertion approach with
  strong instructions yields nearly identical text apart from the tags.

- **Long Inputs and Memory:** Very large documents may exceed the model’s input
  capacity or make the process slow. The A100 GPU can handle a lot, but n8n
  itself might have default timeouts for a single workflow execution.
  **Solution:** For long texts, break the input into smaller chunks (maybe one
  chapter or section at a time). n8n can loop through chunks using the Split In
  Batches node or simply by splitting the text in the Function node and feeding
  the LLM node multiple times. You’d then concatenate the outputs. If chunking,
  be aware that an entity spanning a chunk boundary might be missed – this is
  rare with well-chosen boundaries (paragraph or sentence breaks). Alternatively,
  use Cogito for its extended context to avoid chunking. Make sure to increase
  n8n’s execution timeout if needed (via environment variable
  `N8N_DEFAULT_TIMEOUT`{.bash} or in the workflow settings).

- **Concurrent Usage:** If multiple users or processes hit the webhook
  simultaneously, they would be sharing the single LLM instance. Ollama can
  queue requests, but the GPU will handle them one at a time (unless running
  separate instances with multiple GPUs). For a research setting with one user
  at a time, this is fine. If offering this as a service to others, consider
  queuing requests or scaling out (multiple replicas of this workflow on
  different GPU machines). The stateless design of the prompt makes each run
  independent.

- **n8n Learning Curve:** For historians new to n8n, setting up the workflow
  might be unfamiliar. However, n8n’s no-code interface is fairly intuitive with
  a bit of guidance. This case study provides the logic; one can also import
  pre-built workflows. In fact, the _n8n_ community has template workflows (for
  example, a template for chatting with local LLMs) that could be adapted. We
  assume the base pipeline from the paper’s authors is available on GitHub –
  using that as a starting point, one mostly needs to adjust nodes as described.
  If needed, one can refer to n8n’s official docs or community forum for help on
  creating a webhook or using function nodes. Once set up, running the workflow
  is as easy as sending an HTTP request or clicking “Execute Workflow” in n8n.

- **Output Verification:** Since we prioritize correctness, you may want to
  evaluate how well the model did, especially if you have ground truth
  annotations. While benchmarking is out of scope here, note that you can
  integrate evaluation into the pipeline too. For instance, if you had a small
  test set with known entities, you could compare the model output tags with
  expected tags using a Python script (n8n has an Execute Python node) or use an
  NER evaluation library like _nervaluate_ for precision/recall. This is exactly
  what the authors did to report performance, and you could mimic that to gauge
  your chosen model’s accuracy.
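
To support the points on text fidelity and verification above, here is a small
Python sketch that strips the tags from the model output and diffs the result
against the original input, flagging any unintended edits (standard library
only; the tag pattern matches the `<<TYPE ... /TYPE>>` format used throughout).

```python
import difflib
import re

TAG_RE = re.compile(r"<<(\w+) (.*?) /\1>>")

def strip_tags(tagged_text: str) -> str:
    """Remove the NER tags, keeping only the entity text itself."""
    return TAG_RE.sub(lambda m: m.group(2), tagged_text)

def check_fidelity(original: str, tagged: str) -> None:
    """Report whether the model changed anything besides inserting tags."""
    restored = strip_tags(tagged)
    if restored == original:
        print("OK: output is identical to the input apart from the tags.")
        return
    diff = difflib.unified_diff(original.splitlines(), restored.splitlines(),
                                fromfile="original", tofile="model output", lineterm="")
    print("\n".join(diff))

check_fidelity(
    "John Doe visited Berlin in 1921.",
    "<<PER John Doe /PER>> visited <<LOC Berlin /LOC>> in 1921.",
)
```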

## Conclusion

By following this guide, we implemented the **NER4All** paper’s methodology with
a local, reproducible setup. We used n8n to handle automation and prompt
assembly, and a local LLM (via Ollama) to perform the heavy-duty language
understanding. The result is a flexible NER pipeline that requires **no training
data or API access** – just a well-crafted prompt and a powerful pretrained
model. We demonstrated how a user can specify custom entity types and get their
text annotated in one click or API call. The approach leverages the strengths of
LLMs (vast knowledge and language proficiency) to adapt to historical or niche
texts, aligning with the paper’s finding that a bit of context and expert prompt
design can unlock high NER performance.

Importantly, this setup is **easy to reproduce**: all components are either
open-source or freely available (n8n, Ollama, and the models). A research
engineer or historian can run it on a single machine with sufficient resources,
and it can be shared as a workflow file for others to import. By removing the
need for extensive data preparation or model training, this lowers the barrier
to extracting structured information from large text archives.

Moving forward, users can extend this case study in various ways: adding more
entity types (just update the definitions input), switching to other LLMs as
they become available (perhaps a future 20B model with even better
understanding), or integrating the output with databases or search indexes for
further analysis. With the rapid advancements in local AI models, we anticipate
that such pipelines will become even more accurate and faster over time,
continually democratizing access to advanced NLP for all domains.

**Sources:** This implementation draws on insights from Ahmed et al. (2025) for
the prompt-based NER method, and uses tools like n8n and Ollama as documented in
their official guides. The chosen models (DeepSeek-R1 and Cogito) are described
in their respective releases. All software and models are utilized in accordance
with their licenses for a fully local deployment.

## About LLMs as 'authors' {.appendix}

The initial draft was created using "Deep-Research" from `gpt-4.5 (preview)`.
Final proofreading, content review and layout by Nicole Dresselhaus. Do not fear
that this is some LLM-BS to get views on the homepage. I read everything
multiple times and would have written it with this content - just in worse
words.
|