How we engineered an Activity Feeds solution at Freshworks
Why Platforms
Freshworks is a multi-product SaaS company offering numerous products in the customer experience, ITSM and CRM space, including Freshdesk, Freshservice, Freshworks CRM and others you can find on our website. As you might imagine, there are capabilities, like notifications and emails, that each of these products must offer, and the technology behind them is often deeply complex. As an engineering organization, we benefit from solving such problems exactly once. This is not a novel approach in most multi-product organizations, but it is exactly how we came to carve out a Platform within Freshworks to work on deep-tech problems, building solutions that not only scale for all our products but also act as plug-and-play systems that integrate into most use cases our products need to solve.
Today we will look at one specific deep-tech problem we have tried to solve as part of our Freshworks Neo platform and share what we have learned in the process. The resulting system is internally code-named Hypertrail.
The Need for Hypertrail
Numerous modules in Freshworks’ products need to capture activities for different entities and use them to display a timeline. This entity could be a Freshdesk or Freshservice support ticket, a Freshdesk Contact, a Freshservice Workflow or even a product account (for admin activities/audit logs). We will shortly look at the use cases we tried to solve, but you can picture this as a service that persists chronological activity trails for any given entity while offering ways to filter the timeline on a set of predefined properties and report on these activities.
An activity is an interaction on an entity. For example, if the entity is a Freshdesk support ticket, its activities can include changes to properties such as status, priority, or assigned agent, as well as attaching a note.
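As a rough illustration, a single activity might be captured as a structured record like the sketch below. The field names here are assumptions made for illustration only, not Hypertrail’s actual schema.

```python
# A hypothetical activity record for a Freshdesk support ticket.
# Field names are illustrative assumptions, not Hypertrail's actual format.
activity = {
    "entity_type": "freshdesk_ticket",      # the kind of entity the activity belongs to
    "entity_id": "12345",                   # the specific ticket
    "actor": {"type": "agent", "id": "987"},
    "action": "property_update",
    "changes": {
        "status": {"from": "Open", "to": "Pending"},
        "priority": {"from": "Low", "to": "High"},
    },
    "occurred_at": "2020-09-01T10:15:30Z",  # used for chronological ordering
}
```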
We envisioned Hypertrail as Freshworks’ in-house solution to solve this for all our products in one fell swoop. With Hypertrail, any product in need of activity timelines can rely on a built-to-scale, highly available, and highly reliable service to integrate with in production.
Hypertrail today processes over 350 million activities every week and also serves over 34 million activity fetches a week. It hosts approximately 3TB of data in our US data center alone.
A closer look
Let’s take a closer look at what Hypertrail does through an example.
Freshdesk needs to display activities that happen within the scope of a Freshdesk account (a typical audit-log use case) while also providing filtering on certain property fields (such as users and agents). To meet this requirement, one could envision a module that extracts the required subset of data from each incoming event, keeping only the necessary information, and transforms it into the required activity format. The backing service also needs an efficient and robust data storage model for fast retrieval, while maintaining the ability to order data chronologically and apply multiple filters.
Hypertrail (aka activity-serv) stores activity timelines and allows fetching activities in paginated chunks, in both chronological and reverse-chronological order. It also lets the requester fetch activities within a given time frame. Audit logs on Freshdesk and Freshservice accounts are powered by Hypertrail, and audit logs can also be exported in bulk as files.
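To make the fetch semantics concrete, here is a hedged sketch of what paginated, time-bounded retrieval could look like against such a service. The endpoint path, parameter names, and response shape are assumptions for illustration and not Hypertrail’s actual API.

```python
import requests  # assumes the `requests` package is available

# Hypothetical base URL -- a stand-in for an internal activity-serv deployment.
BASE_URL = "https://activity-serv.internal.example.com"

def fetch_activities(entity_type, entity_id, order="desc",
                     since=None, until=None, cursor=None):
    """Fetch one page of activities for an entity, optionally bounded by a time frame."""
    params = {
        "order": order,    # "asc" or "desc" (chronological / reverse-chronological)
        "since": since,    # ISO-8601 lower bound, if any
        "until": until,    # ISO-8601 upper bound, if any
        "cursor": cursor,  # opaque pagination token from the previous page
    }
    resp = requests.get(
        f"{BASE_URL}/v1/entities/{entity_type}/{entity_id}/activities",
        params={k: v for k, v in params.items() if v is not None},
        timeout=5,
    )
    resp.raise_for_status()
    page = resp.json()
    return page["activities"], page.get("next_cursor")

# Walk the timeline page by page until the service reports no further pages.
cursor = None
while True:
    activities, cursor = fetch_activities("freshdesk_ticket", "12345", cursor=cursor)
    for activity in activities:
        print(activity["action"], activity["occurred_at"])
    if cursor is None:
        break
```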
Use-cases Solved
Hypertrail can:
- List all activities associated with a particular object, e.g. all activities recorded for a single support ticket.
- Consolidate activities across different objects with support for filters (e.g. dashboard activities for tickets).
- Provide an audit trail to track admin activities across the system with filtering capabilities.
Hypertrail collects and stores activities for any entity and in turn offers APIs for on-the-go usage. It consumes data from Freshworks’ internal Kafka message bus, performs ETL (Extract-Transform-Load) on the activity data, pre-processes the information to create indexes, and stores everything for easy retrieval.
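Conceptually, the ingestion path resembles the consumer loop below, written with the open-source kafka-python client. The topic name, record layout, and the transform/index helpers are placeholders for illustration, not the actual Hypertrail pipeline.

```python
import json
from kafka import KafkaConsumer  # kafka-python, used here for illustration

# Hypothetical topic and brokers -- stand-ins for Freshworks' internal message bus.
consumer = KafkaConsumer(
    "product-activity-events",
    bootstrap_servers=["kafka:9092"],
    group_id="hypertrail-ingest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def transform(event):
    """Keep only the fields the activity format needs (illustrative subset)."""
    return {
        "entity_type": event["entity_type"],
        "entity_id": event["entity_id"],
        "action": event["action"],
        "changes": event.get("changes", {}),
        "occurred_at": event["occurred_at"],
    }

def build_indexes(activity):
    """Derive one index entry per filterable property-value pair (illustrative)."""
    return [
        (activity["entity_type"], prop, str(change["to"]))
        for prop, change in activity.get("changes", {}).items()
    ]

for message in consumer:                  # Extract: raw event from the bus
    activity = transform(message.value)   # Transform: trim to the activity format
    indexes = build_indexes(activity)
    # Load: persist the activity and its indexes -- stubbed out in this sketch.
    print("would store", activity["entity_id"], "with", len(indexes), "index rows")
```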
Platform Design
Let’s delve deeper into how this service is structured.
Hypertrail consists of a subsystem called Feeds, which is responsible for storing and indexing the data.
The Feeds subsystem aims to optimise data access for any use case that involves lists with one or more property filters on them. It pre-processes and organises the ingested data based on pre-registered types and properties, which enables pre-built filters for dashboards and audit-trail data. It uses MySQL (hosted on AWS RDS) and a self-hosted Apache Cassandra cluster as data stores.
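For instance, onboarding a use case could involve declaring the entity type and its filterable properties up front, roughly like the registration sketch below. The structure and names are assumptions for illustration; Hypertrail’s actual registration format is not shown here.

```python
# Hypothetical pre-registration of entity types and their filterable properties.
# This only illustrates the idea that types and properties are declared before
# ingestion begins, so the service knows which indexes to build.
FEED_REGISTRATIONS = {
    "freshdesk_ticket": {
        "filterable_properties": ["status", "priority", "agent_id", "group_id"],
        "ordering_field": "occurred_at",
    },
    "freshdesk_account_audit": {
        "filterable_properties": ["actor_id", "action"],
        "ordering_field": "occurred_at",
    },
}
```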
We store activity data, along with some other mappings that help with efficient retrieval, in Cassandra, while the indexes live on sharded MySQL databases. We have designed the indexes to be row oriented rather than column oriented; they are stored in large tables in an EAV (Entity-Attribute-Value) format, which allows exact-match filtering. The index generated for filtering is unique to each entity’s filter-value pair, so one activity item can have multiple indexes. These indexes all map back to the same activity item and are stored as separate entries in the EAV model. When activities are fetched for such a filter-value pair, we can reverse-calculate the index and efficiently fetch only the desired activities.
The indexes are additionally compressed for efficient storage. We have prioritised making fetches more efficient than ingestion, so the data model is read-optimised rather than write-optimised. Range scans are very efficient in this model, and the InnoDB B+ trees are leveraged to the fullest extent: the B+ tree structure bounds the number of reads needed to access any requested data by the depth of the tree, which grows only logarithmically with the number of index entries.
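Putting the two ideas above together, here is a minimal sketch of the index model, using an in-memory SQLite table as a stand-in for the sharded MySQL store. The table layout, column names, and values are assumptions for illustration, not Hypertrail’s actual schema.

```python
import sqlite3  # in-memory stand-in for the sharded MySQL index store in this sketch

conn = sqlite3.connect(":memory:")

# Hypothetical EAV-style index table: one row per filterable attribute-value pair,
# each pointing back at the raw activity held in Cassandra. Names are illustrative.
conn.execute("""
CREATE TABLE activity_index (
    entity_type TEXT NOT NULL,
    entity_id   TEXT NOT NULL,
    attribute   TEXT NOT NULL,   -- the filterable property, e.g. 'status'
    value       TEXT NOT NULL,   -- that property's value, e.g. 'Pending'
    occurred_at TEXT NOT NULL,   -- preserves timeline ordering within the index
    activity_id TEXT NOT NULL,   -- key of the raw activity record in Cassandra
    PRIMARY KEY (entity_type, entity_id, attribute, value, occurred_at, activity_id)
)
""")

# One activity that changed two properties yields two index entries for the same item.
conn.executemany(
    "INSERT INTO activity_index VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("freshdesk_ticket", "12345", "status",   "Pending", "2020-09-01T10:15:30Z", "act-001"),
        ("freshdesk_ticket", "12345", "priority", "High",    "2020-09-01T10:15:30Z", "act-001"),
    ],
)

# Exact-match filtering: reverse-calculate the filter-value index and read only the
# matching activity keys. Because the composite key keeps rows ordered, a bounded
# time frame becomes a cheap range scan over the underlying B+ tree.
cursor = conn.execute(
    """
    SELECT activity_id FROM activity_index
    WHERE entity_type = ? AND entity_id = ? AND attribute = ? AND value = ?
      AND occurred_at BETWEEN ? AND ?
    ORDER BY occurred_at DESC
    LIMIT 50
    """,
    ("freshdesk_ticket", "12345", "status", "Pending",
     "2020-09-01T00:00:00Z", "2020-09-30T23:59:59Z"),
)
print(cursor.fetchall())  # -> [('act-001',)]
```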
The raw data in Cassandra is stored at the top level as key-value pairs and evenly spread across the cluster using Cassandra virtual nodes (vnodes). This makes fetch latency predictable and reduces the chance of hotspots. Cassandra has a ring architecture, and data is spread across the nodes in the cluster based on token ranges. In clusters with a large number of nodes, there is a risk of imbalance, with some nodes responsible for a disproportionately large share of the data. With vnodes, each node’s token range is split into multiple smaller ranges spread around the ring, so each node becomes responsible for many token ranges instead of only one. Since the data is immutable, write amplification in Cassandra is negligible.
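For the raw activity store, a CQL data model along these lines would match the description above. The sketch uses the open-source Python cassandra-driver; the hosts, keyspace, table, and column names are assumptions for illustration, and vnodes themselves are enabled through each node’s num_tokens setting rather than anything in the schema.

```python
from cassandra.cluster import Cluster  # open-source cassandra-driver, for illustration

# Hypothetical contact points; vnodes are configured per node in cassandra.yaml
# (e.g. num_tokens: 256), so each node owns many small token ranges.
cluster = Cluster(["cassandra-1", "cassandra-2", "cassandra-3"])
session = cluster.connect()

session.execute("""
CREATE KEYSPACE IF NOT EXISTS hypertrail
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# Hypothetical table: the entity key is the partition key, so activity data is
# hashed evenly across the vnode token ranges; clustering on occurred_at keeps
# each entity's timeline ordered on disk for chronological reads.
session.execute("""
CREATE TABLE IF NOT EXISTS hypertrail.activities (
    entity_key  text,        -- e.g. 'freshdesk_ticket:12345'
    occurred_at timestamp,
    activity_id text,
    payload     text,        -- the immutable activity body (e.g. serialized JSON)
    PRIMARY KEY ((entity_key), occurred_at, activity_id)
) WITH CLUSTERING ORDER BY (occurred_at DESC, activity_id DESC)
""")

# Reverse-chronological read of one entity's recent activities.
rows = session.execute(
    "SELECT activity_id, payload FROM hypertrail.activities WHERE entity_key = %s LIMIT 50",
    ["freshdesk_ticket:12345"],
)
for row in rows:
    print(row.activity_id)
```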
At Scale!
From January to September 2020, in our US data center, Hypertrail processed 3.45 billion events with a processing latency of 73.6ms (p95). Hypertrail has also served over 1.02 billion fetches with a latency of 27.8ms (p95).
In our EU data center, Hypertrail has processed 373 million events with a processing latency of 33.6ms (p95) and has served over 122 million fetches with a latency of 20.4ms (p95).
[Charts: activities ingested per week (US) and fetches per week (US)]
Since the start of 2020, Hypertrail has scaled from ingesting 40M activities a week to over 300M, and from serving around 1.3M fetches a week to 34M, over that 12-month period.
In terms of data volume, Hypertrail stores around 3TB in the self-hosted Cassandra cluster in US-East and has room to scale to even bigger loads.
Upcoming Enhancements
In 2021, we expect to scale even further and add another 3TB of data in the US region alone.
While actively ensuring the service continues to scale reliably, we are also planning to invest in a few important areas in the immediate future.
Self-service: As more products adopt our service for newer and deeper activity-tracking use cases, we would like to take our own engineers out of the loop as much as possible during the onboarding process. With the right tooling and automation, engineers from product teams should be able to simply raise a request for a new use case and, with the necessary details at hand, complete the onboarding on their own.
Cassandra Management: It often becomes cumbersome to maintain a Cassandra cluster of significant size. Cassandra repair is a necessary task to maintain data consistency. By evaluating existing tools or building our own automation, we aim to take this burden off our engineers as well.
Alternative ingestion sources: Currently, Hypertrail’s primary source of ingestion is Kafka. We also plan to explore support for other event delivery systems, such as SQS, to expand the scope of use cases Hypertrail can onboard.
Cover Design: Vignesh Rajan