One new contestant to bring down the King: Apache Pulsar

Apache Kafka seems to be the standard solution in today's architectures, but we should consider whether it is the right choice for our needs.

Photo by Ross Sokolovski on Unsplash

Nowadays, we're in a new age of Event-Driven Architecture (EDA), and this is not the first time we've lived it. Before microservices and the cloud, EDA was the new normal in enterprise integration. Protocols such as JMS or AMQP were used in broker-based products like TIBCO EMS, ActiveMQ, or IBM WebSphere MQ, so this approach is nothing new.

With the rise of microservice architectures and the API-led approach, it seemed that we had forgotten the importance of messaging systems, and we had to go through the same challenges we saw in the past before arriving at a new messaging solution. So we're coming back to EDA and the pub/sub mechanism to help us decouple consumers and producers, moving from orchestration to choreography. These concepts fit well in today's world, with more and more independent components that need to cooperate and integrate.

During this effort we started to look at new technologies to help us implement that again. But in the new reality we set aside heavy protocols and standards like JMS and started to consider other options. And we have to admit there is a new king in this area, one of those components that seems to appear no matter what in today's architectures: Apache Kafka.

And don't get me wrong: Apache Kafka is fantastic. It has long been proven as a production-ready solution, performant, with impressive replay capabilities and a powerful API that eases integration. But Apache Kafka faces some challenges in this cloud-native world, because it doesn't play so well by some of its rules.

If you have used Apache Kafka for some time, you are aware of its particular challenges. Its architecture dates back to its LinkedIn days in 2011, when Kubernetes, and even Docker and container technologies, were not yet a thing, which makes running Apache Kafka (a purely stateful service) in containers quite complicated. Helm charts and operators ease the journey, but it still doesn't feel like the pieces integrate naturally. Another issue is geo-replication: even with components like MirrorMaker, it is not something that works smoothly or feels integrated.

Other technologies are trying to provide those capabilities, and one of them is another Apache Foundation project, donated by Yahoo!, named Apache Pulsar.

Don't get me wrong; this is not about finding a new truth, that single messaging solution that is perfect for today's architectures: it doesn't exist. In today's world, with so many different requirements and variables across different kinds of applications, one size fits all is no longer true. So stop asking which messaging solution is the best one, and ask instead which one serves your architecture best and fulfills both your technical and business requirements.

We have covered different ways of communicating, with specific solutions for synchronous communication (service mesh technologies and protocols like REST, GraphQL, or gRPC) and different ones for asynchronous communication. We need to go deeper into asynchronous communication to find what works best for you. But first, let's talk a little more about Apache Pulsar.

Apache Pulsar

Apache Pulsar, as mentioned above, was developed internally by Yahoo! and donated to the Apache Foundation. As stated on its official website, there are several key points to mention as we start exploring this option:

  • Pulsar Functions: Easily deploy lightweight compute logic using developer-friendly APIs without needing to run your stream processing engine
  • Proven in production: Apache Pulsar has run in production at Yahoo scale for over three years, with millions of messages per second across millions of topics
  • Horizontally scalable: Seamlessly expand capacity to hundreds of nodes
  • Low latency with durability: Designed for low publish latency (< 5ms) at scale with strong durability guarantees
  • Geo-replication: Designed for configurable replication between data centers across multiple geographic regions
  • Multi-tenancy: Built from the ground up as a multi-tenant system. Supports Isolation, Authentication, Authorization, and Quotas
  • Persistent storage: Persistent message storage based on Apache BookKeeper. Provides IO-level isolation between write and read operations
  • Client libraries: Flexible messaging models with high-level APIs for Java, C++, Python, and Go
  • Operability: REST Admin API for provisioning, administration, tools, and monitoring. Deploy on bare metal or Kubernetes.

So, as we can see, by design Apache Pulsar addresses some of the main weaknesses of Apache Kafka, such as geo-replication and the cloud-native approach.

Apache Pulsar supports the pub/sub pattern, but it also provides enough capabilities to act as a traditional queue-based messaging system, with its concept of exclusive topics where only one of the subscribers receives each message. It also provides interesting concepts and features found in other messaging systems:

  • Dead Letter Topics: For messages that could not be processed by the consumer.
  • Persistent and Non-Persistent Topics: To decide whether your messages are persisted while in transit.
  • Namespaces: A logical grouping of your topics, so applications can be grouped in namespaces (as we do, for example, in Kubernetes) to isolate some applications from others.
  • Failover: Similar to exclusive, but when the attached consumer fails, another one takes over processing the messages.
  • Shared: A round-robin approach similar to a traditional message queue, where all subscribers are attached to the topic but only one receives each message, distributing the load among all of them.
  • Multi-topic subscriptions: The ability to subscribe to several topics using a regexp (similar to the subject approach of TIBCO Rendezvous in the '90s), which has been powerful and popular.
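To make this concrete, here is a minimal sketch of how those subscription modes and the dead-letter feature surface in the Go client (`pulsar-client-go`). It assumes a local standalone broker; the topic name `orders`, subscription name `workers`, and dead-letter topic name are made up for the example:

```go
package main

import (
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Shared subscription: messages are distributed round-robin among all
	// consumers attached to "orders" under the same subscription name.
	consumer, err := client.Subscribe(pulsar.ConsumerOptions{
		Topic:            "orders",
		SubscriptionName: "workers",
		Type:             pulsar.Shared, // or pulsar.Exclusive / pulsar.Failover / pulsar.KeyShared
		// Messages that fail delivery MaxDeliveries times are routed to the
		// dead letter topic instead of being redelivered forever.
		DLQ: &pulsar.DLQPolicy{MaxDeliveries: 3, DeadLetterTopic: "orders-dlq"},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()
}
```

Switching between queue-like and pub/sub behavior is just a matter of changing the `Type` field, which is what makes mixing and matching these models so convenient.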

But if you require Apache Kafka-style features, you will still find similar concepts, such as partitioned topics, key-shared subscriptions, and so on. So you have everything at hand to choose the configuration that works best for your specific use cases, and you even have the option to mix and match.

Apache Pulsar Architecture

Apache Pulsar's architecture is similar to that of other comparable messaging systems today. As you can see in the picture below, from the Apache Pulsar website, these are the main components of the architecture:

  • Brokers: One or more brokers handle incoming messages from producers and dispatch messages to consumers.
  • BookKeeper Cluster: Persistent storage of messages.
  • ZooKeeper Cluster: Cluster metadata and coordination.

So you can see this architecture is again quite similar to Apache Kafka's, with the addition of a new component: the BookKeeper cluster.

Brokers in Apache Pulsar are stateless components that mainly run two pieces:

  • An HTTP server that exposes a REST API for administration and is used by producers and consumers for topic lookup.
  • A TCP server speaking a binary protocol, called the dispatcher, that is used for all data transfers. Messages are usually dispatched out of a managed ledger cache for performance reasons, but if the backlog grows beyond the cache, the broker reads entries from the BookKeeper cluster.
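As an illustration of that REST side, the topic lookup and the admin API can be exercised directly against a local standalone broker with `curl` (the `public/default` namespace is the default one; the topic name `my-topic` is made up):

```shell
# Ask the broker which broker serves the topic (the lookup used by clients)
curl http://localhost:8080/lookup/v2/topic/persistent/public/default/my-topic

# List the persistent topics in the default namespace via the admin REST API
curl http://localhost:8080/admin/v2/persistent/public/default
```

Both commands assume a standalone broker listening on port 8080, as in the "Hello World" setup later in this article.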

To support global replication (geo-replication), brokers manage replicators that tail the entries published in the local region and republish them to the remote regions.

The Apache BookKeeper cluster is used as persistent message storage. Apache BookKeeper is a distributed write-ahead log (WAL) system that manages when messages are persisted. It also supports horizontal scaling based on load, and multiple logs. Not only the messages are persisted, but also the cursors, which track a consumer's position in a topic (similar to the offset in Apache Kafka terminology).

Finally, the ZooKeeper cluster plays the same role as in Apache Kafka: a metadata and configuration store for the whole system.

Hello World using Apache Pulsar

Let's see how we can create a quick "Hello World" using Apache Pulsar. To do that, we're going to implement it in a cloud-native fashion: a single-node Apache Pulsar cluster, a producer application built with Flogo technology, and a consumer application written in Go. Something similar to what you can see in the diagram below:

Diagram about the test case we’re doing

And we're going to keep it simple, so we will just use plain Docker this time. First of all, spin up the Apache Pulsar server with the following command:

docker run -it -p 6650:6650 -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.5.1 bin/pulsar standalone

And we will see an output similar to this one:

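Before wiring up the applications, we can smoke-test the standalone broker using the CLI client that ships inside the image (replace `<container-id>` with whatever `docker ps` reports for the Pulsar container):

```shell
# In one terminal: wait for one message on the "counter" topic
docker exec -it <container-id> bin/pulsar-client consume counter -s test-sub -n 1

# In another terminal: produce a message to the same topic
docker exec -it <container-id> bin/pulsar-client produce counter -m "hello-pulsar"
```

If the consumer prints the message, the broker is up and the 6650/8080 ports are wired correctly.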

Now, we need to create simple applications, and for that, Flogo and Go will be used.

Let’s start with the producer, and in this case, we will use the open-source version to create a quick application.

First of all, we will just use the Web UI (dockerized) to do that. Run the command:

docker run -it -p 3303:3303 flogo/flogo-docker eula-accept

And we install a new contribution to enable the Pulsar publisher activity. To do that, we will click on the "Install new contribution" button and provide the URL of the contribution, which is the same one used by the equivalent CLI command:

flogo install github.com/mmussett/flogo-components/activity/pulsar

And now we will create a simple flow as you can see in the picture below:


We will now build the application using the menu, and that’s it!


To run it, just launch the application binary:

./sample-app_linux_amd64

Now we just need to create the Go consumer. To do that, we first install the Go client package:

go get github.com/apache/pulsar-client-go/pulsar

And now we need to create the following code:

package main

import (
	"fmt"
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	channel := make(chan pulsar.ConsumerMessage, 100)

	options := pulsar.ConsumerOptions{
		Topic:            "counter",
		SubscriptionName: "my-subscription",
		Type:             pulsar.Shared,
	}
	options.MessageChannel = channel

	consumer, err := client.Subscribe(options)
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	// Receive messages from the channel. The channel yields a struct that contains
	// both the message and the consumer it was received from. That is not strictly
	// necessary here, since we have a single consumer, but the channel could be
	// shared across multiple consumers as well.
	for cm := range channel {
		msg := cm.Message
		fmt.Printf("Received message msgId: %v -- content: '%s'\n",
			msg.ID(), string(msg.Payload()))
		consumer.Ack(msg)
	}
}

After running both programs you can see the following output. As you can see, we got both applications communicating in an effortless flow.

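If you would rather not use Flogo, the producer side can also be written with the same Go client. Here is a minimal sketch, assuming the local standalone broker; the topic name matches the consumer above, and the message payloads are made up:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	producer, err := client.CreateProducer(pulsar.ProducerOptions{Topic: "counter"})
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	for i := 0; i < 5; i++ {
		// Send blocks until the broker acknowledges the message.
		if _, err := producer.Send(context.Background(), &pulsar.ProducerMessage{
			Payload: []byte(fmt.Sprintf("hello-%d", i)),
		}); err != nil {
			log.Fatal(err)
		}
	}
}
```

Running this instead of the Flogo application should produce the same kind of output on the consumer side.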

This article is just a starting point, and we will continue talking about how to use Apache Pulsar in your architectures. If you want to take a look at the code we’ve used in this sample, you can find it here:

https://github.com/alexandrev/pulsar-sample-flogo-go

Microservices in SOA World and Enterprise Integrations

For the last two years everyone has been talking about microservices, and they want to apply them everywhere.

It's the same story as with containers and Docker (and before that with the cloud approach, and even earlier with SOA, BPM, EDA...).

Anything that gets enough buzz from the community ends up with all customers (and all kinds of customers) trying to apply the "new fashion" no matter what. Because of that, every system integrator tries to find somewhere it fits (or even where it doesn't) to apply this "new thing," because it is "what we have to do now." It's like the fashion business. What is trendy today? That? OK, let's do that.

Don't get me wrong: this post is not going to be against microservices, because I love the concept, I love the advantages that come with it, and how good it is to move to this kind of model.

But I'd like to talk about some specific aspects that are not part of the common talk and experience with this kind of technology. This pattern, model, or paradigm is great, and it is a proven success.

You can watch any Adrian Cockcroft talk about his experience at Netflix to be sure this is not only a buzzword (#NotOnlyABuzzWord). But can it be used in all cases?

When we watch a talk about microservices, it's always the same story: microservices vs. the monolithic application, especially web applications following some kind of client-server pattern, or an MVC pattern, or something similar. And for that kind of application it is great, simple, and clear.

But what about enterprise applications, where for the last decades we have been following a SOA approach: is it applicable there?

For sure, there are a lot of differences between the microservice approach (the pure one, the one Martin Fowler described in his article) and the SOA paradigm. They don't share the same principles, but in the end they are closer than the usual contestants you see in all the talks (monolithic webapp vs. microservices).

Microservices talk about breaking the monolith, and that's easy for a web application, but what about a SOA architecture? In that case it is not even possible to go down that path.

If you have ever worked in enterprise integration, you have seen some silos, and it is mandatory not to touch them and to keep them that way. It is not open for discussion.

There are different reasons for that decision: it could be because they are so legacy that no one knows how they do what they do, or because they are so critical that no one is going to touch them, or simply because there is no business case to justify replacing them.

So, what about now? Can we go down the microservices path, or should we stick with the SOA approach? Breaking the silos is not all there is to microservices, but it is a very important part, so no, we cannot go the full microservices path for enterprise integrations. But we can take all the other advantages microservices bring and try to apply them to our integration layer (although then we would no longer be talking about a SOA architecture, because most of these advantages go against some of the SOA principles).

The microservices wave is also about Agile and DevOps, about being faster, automated, and better, and about reducing your time to market. It is about cloud (not in the sense of public cloud, but in the sense of not being tied to your infrastructure). It is about all those things too.

So microservices are about so many things that we can apply many of them even if we can't go 100% of the way. There are several names for this approach, like Service-Based Architecture, but I much prefer "micro-services" (with a dash in between: services that are micro), because I think it explains the approach better.

So we'd like to build smaller services to be more flexible and to be able to apply all these DevOps practices, and there we can apply everything else related to the microservices wave.

And that's not something new; it is not something that started now or in the last few years.

It is something I've seen since the beginning of my career (a decade ago), when I started working with TIBCO AMX BusinessWorks, which gives you the chance to decide the scope of your services yourself: depending on your needs, you could create "enterprise applications," or go for "enterprise services" or "small services" that worked together to do the job.

And that path has been followed not only by TIBCO but by other companies as well, with the evolution of the ESB concept to adapt it to the new era, becoming more like a PaaS that allowed you to run your services in a "somewhat" containerized world.

For example, with the TIBCO AMX platform (from 2006) you could develop your services and applications using several kinds of languages and options, like the graphical editor for mediations, or Java, C/C++, .NET, Spring, and so on, using the SCA standard and running on an elastic OSGi-based platform where you could manage all of them in the same way (sounds familiar, right? 🙂).

What about reuse? The SOA paradigm has very high standards to ensure the reuse of services, with an enterprise registry and repository... and microservices were (at the beginning) against reuse: you should duplicate instead of reusing, to be self-contained and more independent. But the latest advances in microservices include an orchestration layer, with things like Conductor going down the path of reuse and orchestration. So we can find a middle ground, where you reuse when possible but don't sacrifice your agility to chase 100% reuse. Time to market is the critical driver for everyone now, and all the "principles" have to adapt to that.

What about DevOps and cloud? No problem; here you can apply the same techniques as before: Infrastructure as Code, containers, continuous integration and deployment, and so on.

What about agile standards like REST/JSON and so on? No problem here either.

In summary, you can adopt and implement most of the flavors and components of the microservices movement, but you will need to compromise on others. You won't be using "pure" microservices; you'll be using something else, and that's not bad. You always have to adapt any paradigm to your specific use case.