Data Pipeline: Overview

The Data Pipeline provides you with a fast and easy way to build in-house analytics atop flexible, clean, accessible, and real-time user interaction data.


Watch the Data Pipeline overview video

What is user interaction data?

Every time your users, customers, or prospects interact with your business online, they generate user interaction data.

This includes when they visit your website; read your content; use your mobile app; view your ad; sign up for a newsletter; buy a product/service; or do anything else associated with your business from a connected device. But this kind of data has traditionally been difficult to collect and access.

The interactions users have with your business usually center on two things: users (visitors) and content (URLs). These interactions range from the surface-level (such as content views) to the very detailed (such as the precise number of seconds spent reading or watching a piece of content). The Data Pipeline measures all of this, providing a deep and detailed understanding of both visitors and the content on your site or in your apps and products. It also provides robust infrastructure for instrumenting your own user interactions, otherwise known as custom events.
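To make the visitor/content distinction concrete, here is a minimal sketch of how hit-level events could be rolled up per URL. The `Event` fields (`visitor_id`, `url`, `action`, `engaged_seconds`) and the action labels are hypothetical names chosen for illustration; they are not the pipeline's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user interaction. All field names here are hypothetical."""
    visitor_id: str
    url: str
    action: str              # assumed labels, e.g. "pageview" or "heartbeat"
    engaged_seconds: int = 0


def summarize_by_url(events):
    """Roll hit-level events up into per-URL views, visitors, and engaged time."""
    views = defaultdict(int)
    seconds = defaultdict(int)
    visitors = defaultdict(set)
    for e in events:
        visitors[e.url].add(e.visitor_id)
        seconds[e.url] += e.engaged_seconds
        if e.action == "pageview":
            views[e.url] += 1
    return {
        url: {
            "views": views[url],
            "unique_visitors": len(visitors[url]),
            "engaged_seconds": seconds[url],
        }
        for url in visitors
    }


# Illustrative sample data, not real traffic.
sample = [
    Event("v1", "/pricing", "pageview"),
    Event("v1", "/pricing", "heartbeat", engaged_seconds=12),
    Event("v2", "/pricing", "pageview"),
    Event("v2", "/blog/launch", "pageview"),
]
summary = summarize_by_url(sample)
```

Because every row is a single interaction, the same events could just as easily be grouped by visitor instead of by URL.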

Why do I need a Data Pipeline?

Though user interaction data is extremely valuable to businesses and organizations of all kinds, it’s typically locked behind a proprietary, one-size-fits-all analytics dashboard. This means that the only way to access the data is via aggregated (rolled-up) data exports, which lack the level of granularity needed to tie user interactions to real business objectives.

Rather than spending time, money, and effort engineering expensive data collection infrastructure from scratch, you get a cloud analytics pipeline that has already been scaled for hundreds of the web’s largest sites and that already processes billions of monthly interactions.

Using Data Pipeline, you can turn any website or mobile app into a data stream of rich user interaction data — and you can do so in minutes, not months.

What makes the Data Pipeline different?

  • You own the data. The pipeline delivers 100% of your raw, unsampled data. You get a real-time stream carrying every single event from your users, sites, and apps, with end-to-end delivery times measured in seconds. You also get an elastic data store for full historical retention, stored in 15-minute chunks of compressed JSON data. There are no rollups. Every single event is captured, then stored securely and durably.
  • Delivered via standard AWS APIs. The Data Pipeline runs in Amazon Web Services (AWS), and access is exposed to customers using standard AWS APIs: Amazon S3 for historical data and Amazon Kinesis Streams for real-time data. Production-quality client libraries exist for every popular programming language and analysis framework on the market. And you don’t need to be running in AWS to make use of them; many customers access the data from their local development machines, Google Cloud Platform, or other cloud hosting providers.
  • Raw data for ultimate flexibility. Only raw event data (with one event per user interaction, also known as hit-level or row-level data) can enable “sky’s the limit” ad-hoc analysis. With the Data Pipeline, every event is yours, so you can analyze it, transform it, alert on it, join it to other sources, or transfer it elsewhere.
  • Proven production technology. We already built out an amazing data collection stack that powers our real-time and historical content analytics dashboard that is used by thousands of users and hundreds of top websites.
  • Scaled architecture in minutes, not months. Our JavaScript tracker is served from a global CDN and DNS. Our data collection beacons are distributed across geographic regions. We have in-house infrastructure that guarantees data delivery and low latency, built atop the open source Kafka and Storm projects. All that infrastructure is yours at the flip of a switch.
  • A snap to integrate. The data formats we developed are especially easy to work with in Python, R, Spark, Redshift, BigQuery, you name it. We also have standard integration schemas and recipes so that the data can be used in ad-hoc querying and dashboarding tools, such as Looker, Periscope, and Tableau.
  • Cheaper and easier than the alternatives. Because we already run an analytics service at massive scale (over 50 billion monthly events from 475 million unique visitors), we can deeply discount the pipeline service. Several of our customers have found this product to represent an order-of-magnitude cost savings, not to mention a 100% reduction in the risks associated with development.
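As a rough illustration of the "15-minute chunks of compressed JSON" described in the list above, the sketch below reads one chunk and counts page views per URL. It assumes gzip compression, one JSON object per line, and the hypothetical field names `url` and `action`; none of these details are confirmed by this page. In practice the bytes would come from Amazon S3 through an AWS client library; here the chunk is built in memory so the example runs on its own.

```python
import gzip
import json
from collections import Counter
from datetime import datetime, timezone


def chunk_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute chunk."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def read_events(compressed: bytes):
    """Yield one dict per event from a gzip-compressed, newline-delimited JSON chunk.

    The gzip + JSON-lines layout is an assumption made for this sketch.
    """
    text = gzip.decompress(compressed).decode("utf-8")
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


# Build a tiny in-memory chunk (illustrative data and field names).
raw_events = [
    {"url": "/pricing", "action": "pageview"},
    {"url": "/pricing", "action": "heartbeat"},
    {"url": "/pricing", "action": "pageview"},
    {"url": "/blog/launch", "action": "pageview"},
]
chunk = gzip.compress("\n".join(json.dumps(e) for e in raw_events).encode("utf-8"))

# Count page views per URL straight from the raw, unaggregated events.
views = Counter(e["url"] for e in read_events(chunk) if e["action"] == "pageview")

# Work out which 15-minute chunk a given event time would fall into.
window = chunk_start(datetime(2023, 1, 2, 10, 44, 59, tzinfo=timezone.utc))
```

Since nothing is rolled up ahead of time, swapping the `Counter` for a different aggregation, or joining the events to another data source, requires no change to how the chunk is read.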

The fast path to custom analytics

In short, the Data Pipeline gets you to your data insights faster. Don’t deal with the drudgery of building a real-time data collection and delivery pipeline. Don’t bang your head against some legacy vendor’s antiquated schemas and unclean data sources.

Instead, get on the fast path to custom analytics.

Next Steps

Help from our team:

  • Already a customer? Simply open a ticket, and we’ll be in touch with how to get you the raw data you’ve been looking for.
  • Not a customer? No worries! Fill out this form and someone will be in touch shortly.

Last updated: January 02, 2023