Data brokers#

We will use the term “data broker” for a service whose main purpose is to store data so that other services can access it conveniently.

Kafka#

Run the Kafka container using the following command:

docker run -p 9092:9092 -itd --name experimenting_kafka -v ./:/knowledge apache/kafka:4.0.0

To run the notebooks, you need to install Python with some additional packages inside the container:

docker exec -u root experimenting_kafka apk add gcc python3-dev py3-pip musl-dev linux-headers
docker exec experimenting_kafka python3 -m venv /home/appuser/venv
docker exec experimenting_kafka /home/appuser/venv/bin/pip install ipykernel bash_kernel
docker exec experimenting_kafka /home/appuser/venv/bin/python -m bash_kernel.install

Once the setup is complete, the /opt/kafka/bin folder contains the scripts used to manipulate Kafka. They are listed in the following cell:

ls /opt/kafka/bin
connect-distributed.sh		 kafka-jmx.sh
connect-mirror-maker.sh		 kafka-leader-election.sh
connect-plugin-path.sh		 kafka-log-dirs.sh
connect-standalone.sh		 kafka-metadata-quorum.sh
kafka-acls.sh			 kafka-metadata-shell.sh
kafka-broker-api-versions.sh	 kafka-producer-perf-test.sh
kafka-client-metrics.sh		 kafka-reassign-partitions.sh
kafka-cluster.sh		 kafka-replica-verification.sh
kafka-configs.sh		 kafka-run-class.sh
kafka-console-consumer.sh	 kafka-server-start.sh
kafka-console-producer.sh	 kafka-server-stop.sh
kafka-console-share-consumer.sh  kafka-share-groups.sh
kafka-consumer-groups.sh	 kafka-storage.sh
kafka-consumer-perf-test.sh	 kafka-streams-application-reset.sh
kafka-delegation-tokens.sh	 kafka-topics.sh
kafka-delete-records.sh		 kafka-transactions.sh
kafka-dump-log.sh		 kafka-verifiable-consumer.sh
kafka-e2e-latency.sh		 kafka-verifiable-producer.sh
kafka-features.sh		 trogdor.sh
kafka-get-offsets.sh		 windows
kafka-groups.sh

Topics#

A topic is a named log that stores events; you can think of it as a folder that events are appended to. To manage topics, use the ./kafka-topics.sh script. The most important options for the script are:

  • --create to create a new topic.

  • --list to show created topics.

  • --delete to delete the topic.


The following cell illustrates the use of the --create option.

./kafka-topics.sh --create --topic myFirstTopic --bootstrap-server localhost:9092
Created topic myFirstTopic.

After creating a topic, we can list all existing topics:

./kafka-topics.sh --bootstrap-server localhost:9092 --list
myFirstTopic

And --delete removes myFirstTopic:

./kafka-topics.sh --delete --topic myFirstTopic --bootstrap-server localhost:9092
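
The scripts listed earlier also include kafka-console-producer.sh and kafka-console-consumer.sh, which write events to and read events from a topic. The following sketch assumes the broker from the container above is listening on localhost:9092 and reuses the myFirstTopic topic; recreate it first if you already deleted it.

```shell
# Publish two events: each input line becomes one event in the topic
printf 'hello\nworld\n' | ./kafka-console-producer.sh \
    --topic myFirstTopic --bootstrap-server localhost:9092

# Read the events back from the start of the topic, then exit
# after two messages instead of waiting for new ones
./kafka-console-consumer.sh --topic myFirstTopic \
    --from-beginning --max-messages 2 \
    --bootstrap-server localhost:9092
```

Without --from-beginning the consumer only sees events produced after it starts, which is the default streaming behavior.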

Redis#

Redis is a NoSQL database that, in practice, resembles a large hash table. The following list describes its main features:

  1. In-Memory Storage: Stores all data in RAM, enabling extremely fast read and write operations.

  2. Persistence Options: Supports snapshots (RDB) and append-only files (AOF) to persist data to disk.

  3. Data Structures: Provides advanced structures such as strings, lists, sets, sorted sets, hashes, streams, bitmaps, and hyperloglogs.

  4. Pub/Sub Messaging: Enables real-time communication between services through publish/subscribe channels.

  5. Transactions: Groups multiple commands into atomic operations using MULTI, EXEC, and WATCH.

  6. Lua Scripting: Executes server-side scripts for atomic and complex operations.

  7. Replication: Supports master–replica replication for data redundancy and read scalability.

  8. High Availability (Sentinel): Monitors Redis instances and performs automatic failover when needed.

  9. Clustering: Distributes data across multiple nodes for horizontal scaling and fault tolerance.

  10. Streams: Provides a log-based data structure for message queues and event sourcing.

  11. Geospatial Indexes: Stores and queries location data using geohashes and radius queries.

  12. Modules System: Allows extending Redis with custom data types and commands.

  13. Eviction Policies: Manages memory automatically with configurable eviction strategies (e.g., LRU, LFU).

  14. ACID-like Atomicity: Each individual Redis command executes atomically.

  15. Built-in Caching Features: Includes TTLs, expirations, and key invalidation for efficient caching.

See the Redis documentation for more details.
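
A few of the features above can be tried from the redis-cli tool. This is a minimal sketch, assuming a Redis server is already available on the default port 6379 (for example, launched with docker run -p 6379:6379 -d redis):

```shell
# In-memory string with a TTL of 60 seconds (caching feature)
redis-cli SET user:1 "Alice" EX 60
# Remaining lifetime of the key, in seconds
redis-cli TTL user:1

# A list data structure used as a simple queue
redis-cli RPUSH queue a b c
# Read the whole list back (0 to -1 means "from first to last")
redis-cli LRANGE queue 0 -1
```

SET, TTL, RPUSH, and LRANGE illustrate the in-memory storage, caching, and data-structure features from the list; pub/sub, transactions, and clustering each have their own dedicated commands.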

Mongo DB#

MongoDB is a NoSQL database. It stores data in a JSON-like format (BSON).

To communicate with the launched database, use the mongosh CLI tool: MongoDB Shell Download.

To run a command without entering the shell, use the --eval "command" option in mongosh.

Data in MongoDB is organized as follows:

  • Databases.

  • Collection: a group of documents. Collections do not enforce a schema: different documents can have different fields.

  • Document: a single BSON document (a binary representation of a JSON document).


Run the database with the following command:

docker run --name mongodb -p 27017:27017 -d mongo:8.0.15 &> /dev/null

If you launched MongoDB correctly, you should receive a response from it as illustrated in the following cell.

curl localhost:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.

The following cell shows the statistics for the current MongoDB instance.

mongosh --eval "db.stats()"
{
  db: 'test',
  collections: Long('1'),
  views: Long('0'),
  objects: Long('1'),
  avgObjSize: 29,
  dataSize: 29,
  storageSize: 20480,
  indexes: Long('1'),
  indexSize: 20480,
  totalSize: 40960,
  scaleFactor: Long('1'),
  fsUsedSize: 169364910080,
  fsTotalSize: 501809635328,
  ok: 1
}
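
The --eval option can also be used to insert and query documents. This is a sketch against the container launched above; the people collection name is an arbitrary choice, and the collection is created implicitly on the first insert:

```shell
# Insert a document into the "people" collection of the default "test" database
mongosh --quiet --eval 'db.people.insertOne({name: "Alice", age: 30})'

# A second document with different fields: collections do not enforce a schema
mongosh --quiet --eval 'db.people.insertOne({name: "Bob", city: "Berlin"})'

# Query all documents of the collection
mongosh --quiet --eval 'db.people.find().toArray()'
```

The --quiet flag suppresses the startup banner, which is convenient when the command output is consumed by scripts or notebook cells.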