Next Generation Real-Time Analytics

Building Agentic Applications

Aug 02, 2024

Recently, the term "agentic" has come up a lot in AI circles. When we call AI systems "agentic" or "agents," we highlight their capacity to act autonomously and proactively, much like a human agent would. An agentic AI system can make decisions, take actions, and engage with its surroundings independently to accomplish set goals or tasks. This idea of agency marks a transition from passive systems that follow preset instructions to active entities that can think, act, and react independently within defined parameters.

In this post, I'd like to show how AI can change real-time analytics by letting users interact with their real-time data through agents.

Agents

The easiest way for me to learn about agents was to run an example and reverse-engineer it. I started with a basic agent from LlamaIndex.

Agents are given tasks, and to carry them out they need tools. Below, we define two tools, multiply and add, which are just simple Python functions.

from llama_index.core.tools import FunctionTool

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product"""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum"""
    return a + b

# Wrap each function as a tool; its name, signature, and docstring describe it to the LLM
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
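By default, FunctionTool.from_defaults derives a tool's name and description from the wrapped function's name, signature, and docstring, which is why the docstrings above matter: they are what the LLM reads when deciding which tool to call. The sketch below is not LlamaIndex's implementation. It is a hypothetical `describe_tool` helper, built only on the standard library, that shows the kind of function-calling schema a signature and docstring can be turned into.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Map a few Python annotations to JSON-schema type names (illustrative subset)
_JSON_TYPES = {float: "number", int: "integer", str: "string", bool: "boolean"}

def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a simple function-calling style description of fn (hypothetical helper)."""
    hints = get_type_hints(fn)
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string"
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product"""
    return a * b

print(describe_tool(multiply))
```

A vague or missing docstring leaves the LLM with little more than the function name to go on, so it is worth writing docstrings as if they were instructions.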

Agents also need an LLM to reason about which tools to call. Below, we create a ReAct agent and pass in an OpenAI model.

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Make sure the OPENAI_API_KEY environment variable is set, e.g.
# OPENAI_API_KEY=sk-proj-xxxx
llm = OpenAI(model="gpt-4o-mini", temperature=0)

# Create a ReAct agent that can call the two tools
agent = ReActAgent.from_tools(
    tools=[multiply_tool, add_tool],
    llm=llm,
    verbose=True,
)

Now, we can ask it to perform the task below:

response = agent.chat("What is 20+(2*4)? Use a tool to calculate every step.")
print(response)

Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 2, 'b': 4}
Observation: 8
Thought: I need to add 20 to the result of the multiplication.
Action: add
Action Input: {'a': 20, 'b': 8}
Observation: 28
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The result of 20 + (2 * 4) is 28.
The result of 20 + (2 * 4) is 28.

In short, we defined a set of tools that do the work and asked an agent, through chat, to perform a task. We didn't code the task's logic; the LLM worked it out from the user's input and picked the appropriate tools to carry it out.

As long as you give the LLM the right tools, it can assemble the logic and execute the task.
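The verbose log above follows the ReAct pattern: the LLM writes a Thought, an Action naming a tool, and an Action Input; the framework calls that tool and feeds the result back as an Observation. The sketch below shows only the dispatch step, with the LLM replaced by two hard-coded steps copied from the log. `run_step` and its regex-based parsing are hypothetical simplifications, not how LlamaIndex actually parses model output.

```python
import json
import re
from typing import Callable, Dict, List

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product"""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum"""
    return a + b

# Registry of tools the model may name in an "Action:" line
TOOLS: Dict[str, Callable[..., float]] = {"multiply": multiply, "add": add}

def run_step(step: str) -> float:
    """Parse one Action / Action Input pair and call the matching tool."""
    action = re.search(r"Action:\s*(\w+)", step).group(1)
    raw_args = re.search(r"Action Input:\s*(\{.*\})", step).group(1)
    # The log shows Python-style dicts (single quotes); convert to JSON first
    args = json.loads(raw_args.replace("'", '"'))
    return TOOLS[action](**args)

# Hard-coded stand-ins for LLM output, copied from the log above. In a real
# agent, the LLM writes each step after seeing the previous Observation.
scripted_steps: List[str] = [
    "Action: multiply\nAction Input: {'a': 2, 'b': 4}",
    "Action: add\nAction Input: {'a': 20, 'b': 8}",
]

observations = [run_step(step) for step in scripted_steps]
print(observations)  # [8, 28]
```

The only part the framework has to know in advance is the registry of tools; the order in which they are called comes from the model.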

Real-Time Interactive Analytics

Consider a traditional dashboard that provides real-time analytics. A stakeholder can monitor the health of the business through the metrics it shows, but can't interact with the analytics beyond the simple drill-downs and filters prebuilt into the dashboard. What if we gave this person the ability to ask questions that go beyond the scope of the dashboard? Let's build this out.

First, let's assume this stakeholder already has a real-time dashboard and wants answers to further questions. The use case is clickstream data. Below is a sample.

This data is streamed through Kafka into Apache Pinot, a real-time OLAP database.
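The queries later in this post reference user_id, event_type, timestamp, and duration columns on the clickstream table. As a sketch of the producing side, here is a hypothetical ClickEvent record with those fields, serialized as JSON and published to Kafka. The field types, the unit of duration, the topic name, and the choice of confluent_kafka's Producer.produce are all assumptions; any producer object with a compatible produce method works.

```python
import json
import time
from dataclasses import asdict, dataclass

@dataclass
class ClickEvent:
    # Field names inferred from the SQL in this post; types and units are assumptions
    user_id: str
    event_type: str   # e.g. "purchase"
    duration: int     # time spent on the event (unit assumed)
    timestamp: int    # epoch milliseconds, comparable with Pinot's now()

    def to_bytes(self) -> bytes:
        """Serialize the event as a UTF-8 JSON message value."""
        return json.dumps(asdict(self)).encode("utf-8")

def send_event(producer, topic: str, event: ClickEvent) -> None:
    """Publish one event, e.g. with a confluent_kafka.Producer (assumed)."""
    # Keying by user_id keeps each user's events in one partition, in order
    producer.produce(topic, key=event.user_id, value=event.to_bytes())

event = ClickEvent("user_1", "purchase", 42, int(time.time() * 1000))
print(event.to_bytes())
```

With a real Kafka producer, call `producer.flush()` before the program exits so buffered messages are delivered.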

Interactive Analytics

We've written an agent below and equipped it with these tools:

top_three_tool: This tool gets the top 3 purchasers in the application from the past week.
most_time_spent_tool: This tool gets the top 3 users that have spent the most time on the application in the past week.
get_active_users_tool: This tool gets all the active users of the application.
user_info_tool: This tool gets the details about a user.

The clickstream tools query a REALTIME table in Pinot that Kafka keeps populated; user_info_tool looks up contact details in a customers table.

Create a connection to Pinot.

from typing import List
from pinotdb import connect
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool
from llama_index.agent.openai import OpenAIAgent

import nest_asyncio

nest_asyncio.apply()

# THE CONNECTION TO PINOT
conn = connect(host='xxxxx:xxxx@broker.pinot.xxxxx.startree.cloud', 
                port=443, 
                path='/query/sql', 
                scheme='https',
                database="xxxxx")

Next, define the tools.

# DEFINE THE TOOLS
def get_active_users() -> List[str]:
    curs = conn.cursor()
    sql = f"""
SELECT 
    distinct user_id
from clickstream
    """
    top = []
    curs.execute(sql)
    for row in curs:
        print(row)
        top.append(row[0])
    return top

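def get_user_details(user: str):
    curs = conn.cursor()
    # Pass the user ID as a query parameter rather than formatting it into the SQL
    sql = """
SELECT 
    user_id,
    phone,
    address
from customers
where user_id = %(user)s
    """
    curs.execute(sql, {"user": user})
    # fetchone() returns None when the user isn't found
    row = curs.fetchone()
    if row is None:
        return None
    return { "user": row[0], "phone": row[1], "address": row[2] }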


def top_three() -> List[str]:
    curs = conn.cursor()
    sql = f"""
SELECT 
    user_id,
    count(event_type) as purchases
from clickstream
where 
    event_type = 'purchase'
    and timestamp > now() - 604800000 -- 1 week
group by user_id
order by purchases desc
limit 3
    """
    top = []
    curs.execute(sql)
    for row in curs:
        print(row)
        top.append(row[0])
    return top


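def most_time_spent() -> List[str]:
    curs = conn.cursor()
    # Sum duration per user so users are ranked by total time, not by single events
    sql = """
SELECT 
    user_id,
    sum(duration) as total_time
from clickstream
where timestamp > now() - 604800000 -- 1 week
group by user_id
order by total_time desc
limit 3
    """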
    top = []
    curs.execute(sql)
    for row in curs:
        print(row)
        top.append(row[0])
    return top
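To make these queries concrete, here is a pure-Python sketch of what the aggregations compute over a list of event dicts. It is for illustration only; in the application, Pinot does this work. The in-memory event format is an assumption based on the column names above. `top_purchasers` mirrors the top_three() query, while `most_time` sums duration per user, which is what ranking users by time spent requires. The sketch also spells out the one-week constant: 7 * 24 * 60 * 60 * 1000 ms = 604,800,000 ms.

```python
from collections import Counter, defaultdict
from typing import Dict, List

WEEK_MS = 7 * 24 * 60 * 60 * 1000  # 604800000, the window used in the SQL

def top_purchasers(events: List[dict], now_ms: int, k: int = 3) -> List[str]:
    """Count purchase events per user in the last week; return the top k users."""
    counts = Counter(
        e["user_id"] for e in events
        if e["event_type"] == "purchase" and e["timestamp"] > now_ms - WEEK_MS
    )
    return [user for user, _ in counts.most_common(k)]

def most_time(events: List[dict], now_ms: int, k: int = 3) -> List[str]:
    """Sum duration per user in the last week; return the top k users."""
    totals: Dict[str, int] = defaultdict(int)
    for e in events:
        if e["timestamp"] > now_ms - WEEK_MS:
            totals[e["user_id"]] += e["duration"]
    return sorted(totals, key=totals.get, reverse=True)[:k]

# Made-up events for illustration
now = 1_700_000_000_000
events = [
    {"user_id": "user_1", "event_type": "purchase", "timestamp": now - 1_000, "duration": 30},
    {"user_id": "user_1", "event_type": "purchase", "timestamp": now - 2_000, "duration": 20},
    {"user_id": "user_2", "event_type": "view", "timestamp": now - 3_000, "duration": 90},
]
print(top_purchasers(events, now))  # ['user_1']
print(most_time(events, now))       # ['user_2', 'user_1']
```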

Next, wrap each function in a FunctionTool, giving it a name and a description that the LLM uses to decide when to call it.


# PROVIDE TOOL DETAILS
top_three_tool = FunctionTool.from_defaults(
    fn=top_three,
    name="top_three",
    description="""
gets the top three purchasers in the application from the past week
""",
)

most_time_spent_tool = FunctionTool.from_defaults(
    fn=most_time_spent,
    name="most_time",
    description="""
gets the top 3 users that have spent the most time on the application in the past week
"""
)

get_active_users_tool = FunctionTool.from_defaults(
    fn=get_active_users,
    name="get_active_users",
    description="gets all the active users of the application"
)

user_info_tool = FunctionTool.from_defaults(
    fn=get_user_details,
    description="gets the details about a user"
)

Finally, create the agent and loop to chat with it.

# CREATE THE AGENT
agent = OpenAIAgent.from_tools(
    [
        top_three_tool, 
        most_time_spent_tool, 
        get_active_users_tool, 
        user_info_tool
    ],
    llm=OpenAI(temperature=0, model="gpt-4o-mini"),
    verbose=True,
    system_prompt="""
You are a Customer Success or Customer Experience agent \
that is looking to get users to spend more time \
in an application and ultimately provide incentives. \
You help this sales agent \
identify users of the application so that this person can \
personally contact them with additional incentives. \
""",
)


# INTERACT
while True:
    ask = input("ask:")
    if ask == "quit":
        break
    print(agent.chat(ask))

Notice the system_prompt provided to the agent. This is where we give the agent a role, and that role frames the tasks it tries to accomplish. Let's ask the question:

"To whom should I give additional incentives and how?"

Since verbose is set to True, we can see each tool invocation and its output. Also notice the calls to get_user_details, the only tool that takes an argument: the agent calls it once for each distinct user returned by the other two tools.

ask: to whom should I give additional incentives and how 

Added user message to memory: who should I give additional incentives and how 
=== Calling Function ===
Calling function: top_three with args: {}
['user_1', 7]
['user_5', 6]
['user_8', 6]
Got output: ['user_1', 'user_5', 'user_8']
========================

=== Calling Function ===
Calling function: most_time with args: {}
['user_6']
['user_1']
['user_3']
Got output: ['user_6', 'user_1', 'user_3']
========================

=== Calling Function ===
Calling function: get_user_details with args: {"user": "user_1"}
Got output: {'user': 'user_1', 'phone': '904.489.3398x881', 'address': '2925 Michael Key Suite 563\nLisastad, TN 19030'}
========================

=== Calling Function ===
Calling function: get_user_details with args: {"user": "user_5"}
Got output: {'user': 'user_5', 'phone': '3046371418', 'address': '322 Flores Village\nJeremymouth, KS 27564'}
========================

=== Calling Function ===
Calling function: get_user_details with args: {"user": "user_8"}
Got output: {'user': 'user_8', 'phone': '001-400-669-6891', 'address': '8550 Heath Manor\nDoyleview, MP 25852'}
========================

=== Calling Function ===
Calling function: get_user_details with args: {"user": "user_6"}
Got output: {'user': 'user_6', 'phone': '(625)264-4707', 'address': '01166 Michael Road\nWardchester, DC 93739'}
========================

=== Calling Function ===
Calling function: get_user_details with args: {"user": "user_3"}
Got output: {'user': 'user_3', 'phone': '+1-564-922-1177x3623', 'address': '09548 Samantha Island Apt. 910\nPort Lisafurt, FL 48437'}
========================

### LLM Response
To provide additional incentives, you should focus on the following users based on their engagement and spending:

### Top Purchasers:
1. **User 1**
   - Phone: 904.489.3398x881
   - Address: 2925 Michael Key Suite 563, Lisastad, TN 19030

2. **User 5**
   - Phone: 3046371418
   - Address: 322 Flores Village, Jeramymouth, KS 27564

3. **User 8**
   - Phone: 001-400-669-6891
   - Address: 8550 Heath Manor, Doyleview, MP 25852

### Most Time Spent:
1. **User 6**
   - Phone: (625)264-4707
   - Address: 01166 Michael Road, Wardchester, DC 93739

2. **User 1** (also a top purchaser)
   - Phone: 904.489.3398x881
   - Address: 2925 Michael Key Suite 563, Lisastad, TN 19030

3. **User 3**
   - Phone: +1-564-922-1177x3623
   - Address: 09548 Samantha Island Apt. 910, Port Lisafurt, FL 48437

### Recommendations for Incentives:
- **User 1**: Since they are both a top purchaser and have spent a lot of time, consider offering a loyalty program or exclusive discounts on future purchases.
- **User 5**: A special promotion or a gift card could encourage them to engage more with the app.
- **User 8**: A personalized message with a unique offer could entice them to return.
- **User 6**: Since they spend a lot of time, consider offering them a premium feature for free for a limited time.
- **User 3**: A referral bonus for bringing in new users could be effective.

By reaching out personally with tailored incentives, you can enhance user engagement and retention.
ask:which users live in new york
Added user message to memory: which users live in new york
Based on the information provided, none of the users listed (User 1, User 5, User 8, User 6, and User 3) have addresses that indicate they live in New York. Here are their locations again for reference:

1. **User 1**: Lisastad, TN
2. **User 5**: Jeramymouth, KS
3. **User 8**: Doyleview, MP
4. **User 6**: Wardchester, DC
5. **User 3**: Port Lisafurt, FL

If you need to find users specifically in New York, you may need to check a broader user database or specify additional criteria.
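For comparison, this is roughly the logic we would otherwise have to write by hand to answer the first question: merge the two rankings, drop duplicates (user_1 appears in both), and look up contact details once per user. `gather_contacts` is a hypothetical function; it takes the tool functions as parameters so it can be exercised with stubs, shown here returning the user IDs from the log above.

```python
from typing import Callable, Dict, List, Optional

def gather_contacts(
    top_three: Callable[[], List[str]],
    most_time_spent: Callable[[], List[str]],
    get_user_details: Callable[[str], Optional[dict]],
) -> Dict[str, dict]:
    """Hand-coded version of the tool chain the agent chose in the run above."""
    # Merge both rankings, dropping duplicates but keeping first-seen order
    users = list(dict.fromkeys(top_three() + most_time_spent()))
    contacts = {}
    for user in users:
        details = get_user_details(user)
        if details is not None:
            contacts[user] = details
    return contacts

# Stubs standing in for the Pinot-backed tools
contacts = gather_contacts(
    lambda: ["user_1", "user_5", "user_8"],
    lambda: ["user_6", "user_1", "user_3"],
    lambda user: {"user": user},
)
print(list(contacts))  # ['user_1', 'user_5', 'user_8', 'user_6', 'user_3']
```

The agent arrived at this sequence on its own, and it can assemble a different one for the next question without any new code.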

An agent's tools can also combine keyword-based search and similarity search, an approach known as hybrid search.

Hybrid Search

Hybrid search combines traditional keyword-based search with vector search. Another way of saying it is:

Combining structured search with similarity search.

Integrating vector search, which finds items whose embeddings are similar to a query embedding, with metadata filtering lets users retrieve more accurate and relevant results efficiently.
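Here is a minimal in-memory sketch of that idea: an exact metadata filter (the structured part) followed by ranking the remaining items by cosine distance to a query embedding (the similarity part). The product records, the category field, and the 3-dimensional embeddings are made up for illustration; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence

def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def hybrid_search(products: List[Dict], query_vec: Sequence[float],
                  category: str, k: int = 2) -> List[str]:
    # Structured part: keep only products that match the metadata filter
    candidates = [p for p in products if p["category"] == category]
    # Similarity part: smallest distance (most similar) first
    candidates.sort(key=lambda p: cosine_distance(p["embedding"], query_vec))
    return [p["name"] for p in candidates[:k]]

# Toy data for illustration only
products = [
    {"name": "blue t-shirt", "category": "apparel", "embedding": [0.9, 0.1, 0.0]},
    {"name": "red t-shirt", "category": "apparel", "embedding": [0.8, 0.3, 0.1]},
    {"name": "wool hat", "category": "apparel", "embedding": [0.1, 0.9, 0.2]},
    {"name": "t-shirt mug", "category": "kitchen", "embedding": [0.9, 0.1, 0.0]},
]
print(hybrid_search(products, [1.0, 0.0, 0.0], category="apparel"))
```

The kitchen mug has an embedding identical to the blue t-shirt's, yet the metadata filter excludes it; that is the accuracy gain from combining the two kinds of search.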

Below are two additional agent tools that, when called in the right context, perform hybrid searches based on a user's recent purchases. They allow the agent to recommend other products.

get_last_purchases - Gets the last 3 purchases by the user.
get_similar_products - Takes a product description and searches for similar products.


def get_last_purchases(user: str):
    """
    Gets the last 3 purchases by the user.
    """

    curs = conn.cursor()
    # Pass the user ID as a query parameter rather than formatting it into the SQL
    sql = """
SELECT 
    description
from purchases
where user = %(user)s
order by timestamp desc
limit 3
"""
    curs.execute(sql, {"user": user})
    # Only the description column is selected, so return just that
    return [{"description": row[0]} for row in curs]

def get_similar_products(product_description: str):
    """
    Takes a product description and does a search for similar products.
    """
    # get_embedding() isn't shown in this post; it must use the same embedding
    # model (and dimension) that produced products.embedding
    desc_embed = get_embedding(product_description)

    curs = conn.cursor()
    # Cosine distance is smallest for the most similar vectors, so sort ascending
    sql = f"""
SELECT 
    product_id,
    cosine_distance(embedding, ARRAY{desc_embed}) AS dist, 
    url,
    description
from products
where VECTOR_SIMILARITY(embedding, ARRAY{desc_embed}, 10)
order by dist asc
limit 5
"""
    curs.execute(sql)
    prods = []
    for row in curs:
        # Columns: product_id, dist, url, description
        prods.append({ "product_id": row[0], "url": row[2], "description": row[3] })

    return prods
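Two details in get_similar_products are easy to get wrong: cosine distance is smallest for the most similar vectors, so results should be sorted ascending, and the embedding is pasted directly into the SQL text. The hypothetical `build_similarity_sql` helper below isolates the query construction so both can be checked, and it accepts only finite numbers before interpolating them. The query shape follows the one above; the top-k and limit defaults are the same 10 and 5.

```python
import math
from typing import Sequence

def build_similarity_sql(embedding: Sequence[float], top_k: int = 10, limit: int = 5) -> str:
    """Build the vector query used by get_similar_products (hypothetical helper)."""
    # The embedding is interpolated into SQL, so accept only finite numbers
    values = [float(x) for x in embedding]
    if not values or not all(math.isfinite(x) for x in values):
        raise ValueError("embedding must be a non-empty list of finite numbers")
    array_literal = "ARRAY[" + ", ".join(repr(x) for x in values) + "]"
    return f"""
SELECT
    product_id,
    cosine_distance(embedding, {array_literal}) AS dist,
    url,
    description
FROM products
WHERE VECTOR_SIMILARITY(embedding, {array_literal}, {int(top_k)})
ORDER BY dist ASC
LIMIT {int(limit)}
"""

print(build_similarity_sql([0.1, 0.2, 0.3]))
```

Keeping the query builder separate from the tool also makes it easy to unit-test without a Pinot connection.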

Here is the response from the agent to the question:

What product should I offer this user?

For **user_1**, you could offer a product similar to their recent purchase of a StarTree hat, which includes T-shirts with the same company name.

**Product Name**: Blue StarTree T-Shirt
**Description**: A blue T-Shirt that says StarTree on the front.

This aligns with the themes found in their recent purchases and could encourage them to engage more with the application.

Summary

Identifying the right tools is often one of the most challenging parts of building an agentic application. It requires a thorough understanding of your objectives and of your stakeholders' specific needs. Once you've identified the tools, it can pay to combine them to improve performance and reduce the number of API calls, making the system more efficient.
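In the run above, for example, the agent called get_user_details five times, once per user. One way to combine tools is a single batch tool that looks up all users in one query. The sketch below is hypothetical (`get_users_details` is not part of the original code) and assumes a DB-API driver using the pyformat parameter style, as pinotdb appears to; check your driver before relying on it.

```python
from typing import Dict, List

def get_users_details(conn, users: List[str]) -> Dict[str, dict]:
    """Fetch contact details for many users with a single query (hypothetical tool)."""
    if not users:
        return {}
    # One named placeholder per user; the driver escapes each value
    params = {f"u{i}": user for i, user in enumerate(users)}
    placeholders = ", ".join(f"%({name})s" for name in params)
    sql = f"SELECT user_id, phone, address FROM customers WHERE user_id IN ({placeholders})"
    curs = conn.cursor()
    curs.execute(sql, params)
    return {row[0]: {"user": row[0], "phone": row[1], "address": row[2]} for row in curs}
```

Registered as a tool, this lets the agent fetch every contact in one round trip instead of one call per user.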

Without AI and large language models (LLMs), you would likely have to hand-code the logic behind each user request, slicing the data and refining the analytics until you got the desired results. That approach is time-consuming and rarely anticipates future needs. By instead building a set of foundational tools, you get a flexible framework that covers current requirements and can adapt to future use cases as demands change.

Check out the StarTree Cloud Free Tier, which is perfect for development and prototyping, and start running queries in minutes.

SUP! Hubert’s Substack
