AI Explained

Artificial Intelligence (AI) has been the stuff of science fiction writers and tech evangelists for decades. Slowly and quietly, the technology evolved as hardware improved and knowledge was gained. But in November 2022, a company called OpenAI released a Large Language Model (LLM) called ChatGPT. Within a month, it was one of the most talked-about technologies in the world, sparking everything from excitement to fear to puzzlement and becoming synonymous with the term "AI". Since then, we have seen a technological arms race to make it better, faster, more powerful and, well… more intelligent. Where this all leads, no one truly knows, but just as fire can be destructive, it can also provide warmth and utility when handled properly. And while many companies rushed to be "first to market" with some kind of integration, our commitment to privacy and the safety of your data dictated our direction and investigations. After countless hours of development and testing, to the drawing board and back, we have created controlled ways for AI to be accessible, useful, and safe in your databases. We appreciate the patience and understanding from all of you as we journeyed down this path.

DEVONthink's AI

To clarify: an AI model is the technology underlying services like ChatGPT. In contrast, DEVONthink's AI predates these LLMs by 20 years and was built by us using an entirely different set of methods, ones that are still as powerful and functional today as they've always been. For example, the See Also, Tags, and Graph inspectors, as well as commands in the Data menu, are still driven by our internal AI. The remainder of this section is about external AI services, like ChatGPT.

AI Basics

If you're new to AI, perhaps having heard it discussed on forums or in the news without knowing much about it, here's a very simplified introduction.

How does chat work?: At a very simplified level, your chat questions are broken down into bits and pieces called "tokens", which are processed by the AI. It examines a token, then consults its huge set of parameters to identify the token most likely to follow it. It puts the best match in place, then moves on to the next token. This process is repeated over and over, mathematically assembling the tokens into words, then sentences, which are returned to you as a response.
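The loop described above can be sketched in a few lines. This is a toy illustration with made-up data: a real LLM scores every token in its vocabulary using billions of learned parameters, while here a tiny hand-written table stands in for that step.

```python
# Toy sketch of next-token prediction (hypothetical scores, vastly simplified).
# For each token, the "most likely" tokens that could follow it, with scores.
likely_next = {
    "sea":     {"turtles": 0.6, "water": 0.3, "shells": 0.1},
    "turtles": {"are": 0.5, "swim": 0.4, ".": 0.1},
    "are":     {"reptiles": 0.7, "slow": 0.3},
}

def generate(token, steps):
    """Repeatedly pick the best-scoring next token and append it."""
    out = [token]
    for _ in range(steps):
        candidates = likely_next.get(out[-1])
        if not candidates:        # no known continuation: stop early
            break
        out.append(max(candidates, key=candidates.get))
    return " ".join(out)

# generate("sea", 3) walks the table: sea -> turtles -> are -> reptiles
```

A real model also injects controlled randomness ("temperature") instead of always taking the single best match, which is why the same prompt can produce different responses.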

Parts of a query: There are three parts to making an AI query. Two are mandatory; the third, optional but useful.

  • AI model: The chat engine you're asking questions of. Obviously, you need to send the question somewhere. The default model you choose will handle those inquiries, though you can also choose others if you have access to them.
  • Prompt: This is your question or command. These can be simple, e.g., "Tell me about sea turtles." or "For the selected document, suggest three possible filenames." However, the more specific the prompt, the more specific the response. You can include directives about how the response should be delivered, what level of explanation to use, etc. It's even possible to request certain types of formatting, like asking for content returned in a Markdown table instead of a list.
  • Role: An optional but sometimes useful component, a role can define a persona to help direct the responses. This could be for yourself, like "I am in an undergraduate calculus class…", or for the AI engine, such as "You are presenting an introductory workshop on woodworking…" The chat responses will be tailored to be appropriate for the role and setting. The role can be added as part of your prompt. If you have a specific role you want to use with automation and templates, you can add a default role in the AI > Chat settings.
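How these three parts fit together can be sketched as the request body most chat services expect. This follows the common OpenAI-style "messages" convention; the model name and wording below are only examples, and DEVONthink assembles this for you behind the scenes.

```python
# Sketch of how model, role, and prompt typically map onto a chat API request
# (OpenAI-style "messages" convention; the model name here is just an example).
def build_request(model, prompt, role=None):
    messages = []
    if role:  # the optional role/persona travels as a "system" message
        messages.append({"role": "system", "content": role})
    messages.append({"role": "user", "content": prompt})  # your prompt
    return {"model": model, "messages": messages}

request = build_request(
    model="gpt-4o-mini",                                   # the AI model
    prompt="Suggest three possible filenames for the selected document.",
    role="You are a meticulous archivist.",                # optional role
)
```

If no role is given, the request simply contains the user prompt alone, which is why the role is optional but the model and prompt are not.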

Example:

You are a biologist providing research data for government reporting. Provide a Markdown table of the last ten years of counted leatherback sea turtle eggs compared to hatchlings. Include columns for the number and percentage of increase or decrease. Include a prologue section to the document with an assessment of last year's numbers. Include an epilogue with a forecast for this year's anticipated numbers.

We have also included some relevant terms in the Glossary > AI section of the Appendix.

Choosing an LLM: Remember that using AI is entirely optional. However, since you're reading this, you are at least curious about it. So the first step is choosing an LLM. If you look at the AI > Chat settings, you will see where to set a default Chat model. These are the currently supported options, with no specific advocacy for any:

  • ChatGPT: OpenAI's breakthrough LLM, ChatGPT is a fast and powerful general-purpose model that can also handle more technical inquiries.
  • Claude: Created by Anthropic, Claude is a privacy-focused LLM that provides excellent conversational responses to a variety of requests. It can also handle more technical inquiries. It does tend to provide more commentary, so you may want to add "No chatter." to your prompt if you need to curb the excess.
  • Gemini: Google's Gemini LLM is a fast, no-nonsense responder, often just producing results with little excess commentary. This can be especially useful in information gathering and document construction.
  • Mistral: Mistral is from a French AI company traditionally focused on open-sourcing many of its models. Its models are broadly useful, from document analysis and code snippets to more general inquiries.
  • Perplexity: A US company that began as a search-based AI engine, Perplexity offers self-branching searches with its Deep Research to gather and assess information before responding. It also offers reasoning models. Deep Research is only intended for use with the Chat assistant.
  • Local LLMs: GPT4All, LM Studio, and Ollama are three applications that allow running a downloaded LLM on your Mac. These tout privacy and offline use but are limited by your hardware and the size of the model you can run.

Thoughts on choosing and using AI

As we've mentioned, adding AI capabilities was no trivial thing to investigate, discuss, and implement. And while we have come to terms with the areas of concern we could control, on a personal (or professional) level, there are still topics that you must consider and decide on for yourself.

Privacy: We are firm believers in data privacy, and we do whatever we can to keep things in your control. However, when dealing with a commercial AI service, your questions, and potentially your documents, go to servers controlled by that service. We don't advocate being paranoid about using such a service, but it's good to know, and be comfortable with, whom you are sharing some of your data with. You may also wish to opt out of having your AI "conversations" used to further train their models.

DEVONthink doesn't give AI uncontrolled access to your data and databases, and it takes several steps to safeguard your privacy. By default, the Chat assistant and any AI-assisted automation use only selected documents. When the database search option is enabled in the settings, only the selected items, items matched by smart rules, or the group selected in the Navigate sidebar, i.e., the current location, are accessible to the AI, further limiting its reach.

DEVONthink also never sends your original document to an AI engine, taking these steps to keep things private:

  • Image files are scaled, recompressed, and sent without the original metadata.
  • For PDF documents without a text layer, a certain number of page thumbnails are sent, dependent on the AI model you're using.
  • Text-based documents send only raw text with no metadata.
  • For audio and video files, any available transcription or still images from the video are sent when chatting about them. Also, when using a remote transcription model like OpenAI's Whisper, the audio track is extracted and recompressed before it's sent for recognition.
  • To improve results and enhance privacy, links in content, including email addresses, are anonymized. This can also reduce token usage.
  • When using commercial AI models supporting tool calls, data is only sent on demand, never in advance.

Lastly, DEVONthink doesn't come with AI access enabled and running. It's up to you to set up and choose the options you want to use. So using external AI is completely optional, not a requirement, and not using it may be exactly the level of privacy you want.

Expenses: "There's no such thing as a free lunch." This applies to using a commercial LLM as well. While providers typically offer a free account of some kind, it's very limited and meant for familiarizing yourself with the process and responses. Once you determine this is something you want to use more often, get out your credit card. You will be purchasing tokens, again "bits of words", either in bulk or by running up a tab as they're used. While tokens are typically inexpensive individually, heavier use of AI will deplete your reserves or increase your bill faster. As a courtesy, we've included property icons in the AI settings denoting whether a particular chat model is known to be expensive.
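As a rough sketch of how token-based billing adds up: providers usually quote separate prices for input and output tokens, per million tokens. The prices below are invented for illustration only; check your provider's current price list.

```python
# Rough per-request cost estimate. All prices are hypothetical examples,
# quoted in USD per one million tokens, as most providers do.
PRICE_PER_MTOK = {"input": 0.50, "output": 1.50}

def estimate_cost(input_tokens, output_tokens):
    """Estimated cost in USD for one request."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000

# Chatting about a long PDF: ~50,000 input tokens, ~1,000 output tokens.
cost = estimate_cost(50_000, 1_000)   # a few cents per request at these rates
```

A few cents per request sounds trivial, but automation that runs over hundreds of documents repeats that cost for every one of them, which is where bills grow quickly.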

Another thing to be aware of is the difference between using a chat agent, e.g., talking to ChatGPT, and using their API. An API is how third-party applications programmatically access services and data provided by a company. To use commercial AI services, you will need to create an API key for the service and enter it in the Chat > API Key settings. Check with your AI provider on how to generate a key and what costs are involved.
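As a sketch of what "entering an API key" amounts to under the hood: most providers expect the key as a bearer token in a request header. The endpoint URL and environment variable below are placeholders, not a real service.

```python
# Sketch of how an API key is typically attached to a request (the common
# bearer-token convention). The URL and variable name are placeholders.
import os

API_URL = "https://api.example-ai.com/v1/chat"  # hypothetical endpoint

def request_headers():
    # Reading the key from the environment keeps it out of your source code.
    key = os.environ.get("AI_API_KEY", "sk-demo")
    return {
        "Authorization": f"Bearer {key}",  # the API key travels here
        "Content-Type": "application/json",
    }
```

This is also why an API key should be treated like a password: anyone who has it can run up charges on your account.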

Here are some general recommendations that may help curb costs:

  • For everyday tasks and automation, use the cheapest model, e.g., Claude Haiku. DEVONthink automatically chooses cheaper models for certain tasks, e.g., chatting with the Help viewer.
  • In the Usage dropdown are three options for broadly adjusting the number of tokens being used. This obviously affects your AI costs. While you can experiment with the settings, Auto strikes a good balance and is one less thing to think about. You can always temporarily change this for certain situations. Chatting about a document will use more tokens the longer the document is. In this case, you could try setting the Usage to Cheapest.
  • Choose more expensive models if a lower-tier model fails to produce a useful answer or you need the features of a higher model, like "thinking". The Chat assistant and AI-based smart actions, like the Chat - Query action, let you temporarily switch models.
  • When generating images, use very specific image prompts to avoid endlessly iterating through pictures that aren't what you're envisioning. While it can be fun to see what the AI comes up with, image generation is usually a more costly use of AI.

Quality of results: As we discussed previously, these chat engines perform incredible computational gymnastics to produce responses but don't have any actual knowledge to draw from. They have no way to verify their answers the way a human can. Early on, they were known for "hallucinating", i.e., returning responses that were complete sentences but made no sense or had little to no relevance to the question asked. Things have certainly improved, and likely will continue to, but it is still possible to receive incomplete or inaccurate responses. So while you may get a reasonable response, be aware the result isn't guaranteed to be accurate. Especially on questions of consequence, you should check the responses.

The limitations of local AI: No one wants to share their data. No one wants to pay for AI. We certainly understand those things. While there is an ever-growing list of "Run the AI of your choice on your Mac now!!" applications, let's take a realistic look at running your own AI:

  • Performance: AI requires a lot of computing resources to run well. Similar to how DEVONthink loads a database's index into memory for lightning-fast searches, classification, etc., an AI engine loads its model into memory. The larger the model, measured in the number of parameters, the more memory is required to run it effectively. However, many Macs have very limited RAM. Even a "good MacBook" with 16 GB of RAM would only be able to run a small model of 6-8 billion parameters, given that other apps and the operating system are also using machine resources. Those are very small models. Running larger models may be possible, but they will be slower, to the point of being inefficient.
  • Quality of responses: With the aforementioned limitations imposed by machine resources, you may feel a small model is just fine for you. While it may actually run, the responses you receive will almost certainly range from less accurate to outright hallucination. Consider the disparity in knowledge between a child and an adult. Ask both, "Why is the sky blue?" A child would have a small number of "parameters" to draw from to answer the question. But the adult, especially one with more specialized knowledge, would have a vastly larger set of parameters to respond from.
  • Context window: The context window is like the short-term memory of an AI, and it is a finite resource. The smaller the context window, the sooner the AI loses track of the "thread of the conversation". For commercial LLMs, this is large enough for ad-hoc inquiries, but the size can be a limiting factor when trying to process longer documents, e.g., a scientific PDF. For locally run AI, the context window is much smaller. And using larger context windows locally consumes more resources.
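The RAM figures above follow from a simple rule of thumb you can sketch yourself. This is an estimate only; actual usage varies with quantization, context size, and runtime overhead.

```python
# Back-of-the-envelope memory estimate for running a local model.
# Rule of thumb: memory (GB) ~ parameters (billions) x bytes per parameter.
def model_memory_gb(params_billion, bytes_per_param=2.0):
    """2 bytes/param ~ 16-bit weights; 0.5 bytes/param ~ 4-bit quantization."""
    return params_billion * bytes_per_param

full_precision = model_memory_gb(8)       # an 8B model at 16-bit: ~16 GB
quantized = model_memory_gb(8, 0.5)       # the same model at 4-bit: ~4 GB
```

This is why quantized (compressed-weight) models are the usual choice on consumer Macs: an 8-billion-parameter model at full 16-bit precision would consume a 16 GB machine's entire RAM before the operating system and other apps are even counted.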

However, if you're inclined and curious — and your hardware can support it — using local AI is certainly something you can explore.

What can't be done with AI

While there are many things that can be done, there are still limits imposed by privacy concerns or technical considerations. For example, AI has access to the current location in a main window, not all your databases. It also isn't going to operate like an automaton, creating databases, constructing its group structures, downloading and filling those groups, then examining and issuing reports on it all. It is an "assistant" in your labors, not your replacement.

Another critical thing to be aware of: AI is not going to "process and connect" years of your documents and information in your database. The way AI is hyped by many makes this sound feasible, but it actually is not. Will you be able to process documents in a useful way? Of course, but on a much more limited scale. So while we understand the hope, actually accomplishing this would be time- and cost-prohibitive for most people and would require sharing all those documents with third parties.

We believe it's important to approach AI in DEVONthink with a good understanding of the possibilities and limitations (and yes, some things may change as technologies evolve). That all being said, we believe you'll find many uses for the extended abilities of AI in DEVONthink. In the next section, you'll find an overview of where AI is integrated, some practical use cases, and an important tip on how to use AI more effectively.