Data brokers#
We will use the term “data broker” for a service whose main purpose is to store data for convenient use by other services.
Kafka#
Run the Kafka container using the following command:
docker run -p 9092:9092 -itd --name experimenting_kafka -v ./:/knowledge apache/kafka:4.0.0
In order to run the notebooks, you need to install Python together with some additional packages:
docker exec -u root experimenting_kafka apk add gcc python3-dev py3-pip musl-dev linux-headers
docker exec experimenting_kafka python3 -m venv /home/appuser/venv
docker exec experimenting_kafka /home/appuser/venv/bin/pip install ipykernel bash_kernel
docker exec experimenting_kafka /home/appuser/venv/bin/python -m bash_kernel.install
Once the setup is complete, the folder /opt/kafka/bin contains the scripts used to manage Kafka. They are listed in the following cell:
ls /opt/kafka/bin
connect-distributed.sh kafka-jmx.sh
connect-mirror-maker.sh kafka-leader-election.sh
connect-plugin-path.sh kafka-log-dirs.sh
connect-standalone.sh kafka-metadata-quorum.sh
kafka-acls.sh kafka-metadata-shell.sh
kafka-broker-api-versions.sh kafka-producer-perf-test.sh
kafka-client-metrics.sh kafka-reassign-partitions.sh
kafka-cluster.sh kafka-replica-verification.sh
kafka-configs.sh kafka-run-class.sh
kafka-console-consumer.sh kafka-server-start.sh
kafka-console-producer.sh kafka-server-stop.sh
kafka-console-share-consumer.sh kafka-share-groups.sh
kafka-consumer-groups.sh kafka-storage.sh
kafka-consumer-perf-test.sh kafka-streams-application-reset.sh
kafka-delegation-tokens.sh kafka-topics.sh
kafka-delete-records.sh kafka-transactions.sh
kafka-dump-log.sh kafka-verifiable-consumer.sh
kafka-e2e-latency.sh kafka-verifiable-producer.sh
kafka-features.sh trogdor.sh
kafka-get-offsets.sh windows
kafka-groups.sh
Topics#
These are folders that contain events. To manipulate topics, use the ./kafka-topics.sh script. The most important options for the script are:
--create to create a new topic.
--list to show the created topics.
--delete to delete a topic.
The following cell illustrates the use of the --create option.
./kafka-topics.sh --create --topic myFirstTopic --bootstrap-server localhost:9092
Created topic myFirstTopic.
After creating a topic, we can list all the topics that exist:
./kafka-topics.sh --bootstrap-server localhost:9092 --list
myFirstTopic
And --delete removes the myFirstTopic topic:
./kafka-topics.sh --delete --topic myFirstTopic --bootstrap-server localhost:9092
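Topics only become useful once events flow through them. The following sketch, assuming the same broker on localhost:9092 and a hypothetical topic named demoTopic, publishes a message with kafka-console-producer.sh and reads it back with kafka-console-consumer.sh:

```shell
# Create a throwaway topic (assumes the broker runs on localhost:9092)
./kafka-topics.sh --create --topic demoTopic --bootstrap-server localhost:9092

# Publish a single event; the producer reads messages from stdin
echo "hello, kafka" | ./kafka-console-producer.sh \
    --topic demoTopic --bootstrap-server localhost:9092

# Read the topic from the beginning; --max-messages 1 exits after one event
./kafka-console-consumer.sh --topic demoTopic --from-beginning \
    --max-messages 1 --bootstrap-server localhost:9092
```

The consumer should print `hello, kafka` and exit; without --max-messages it would keep polling for new events indefinitely.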
Redis#
Redis is a NoSQL database that, in practice, resembles a large hash table. The following list describes its basic features:
In-Memory Storage: Stores all data in RAM, enabling extremely fast read and write operations.
Persistence Options: Supports snapshots (RDB) and append-only files (AOF) to persist data to disk.
Data Structures: Provides advanced structures such as strings, lists, sets, sorted sets, hashes, streams, bitmaps, and hyperloglogs.
Pub/Sub Messaging: Enables real-time communication between services through publish/subscribe channels.
Transactions: Groups multiple commands into atomic operations using MULTI, EXEC, and WATCH.
Lua Scripting: Executes server-side scripts for atomic and complex operations.
Replication: Supports master–replica replication for data redundancy and read scalability.
High Availability (Sentinel): Monitors Redis instances and performs automatic failover when needed.
Clustering: Distributes data across multiple nodes for horizontal scaling and fault tolerance.
Streams: Provides a log-based data structure for message queues and event sourcing.
Geospatial Indexes: Stores and queries location data using geohashes and radius queries.
Modules System: Allows extending Redis with custom data types and commands.
Eviction Policies: Manages memory automatically with configurable eviction strategies (e.g., LRU, LFU).
ACID-like Atomicity: Each individual Redis command executes atomically.
Built-in Caching Features: Includes TTLs, expirations, and key invalidation for efficient caching.
See the Redis page for more details.
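To get a feel for the hash-table nature of Redis, the sketch below starts a Redis container (the container name and the redis:7 tag are arbitrary choices, not part of the setup above) and exercises a few basic commands with the bundled redis-cli:

```shell
# Start a disposable Redis container on the default port
docker run -p 6379:6379 -d --name experimenting_redis redis:7

# Basic key/value operations via redis-cli
docker exec experimenting_redis redis-cli SET greeting "hello"
docker exec experimenting_redis redis-cli GET greeting

# A key that expires after 10 seconds illustrates the built-in caching features
docker exec experimenting_redis redis-cli SET temp "expires soon" EX 10
docker exec experimenting_redis redis-cli TTL temp
```

GET returns the stored value, and TTL reports the remaining lifetime of the key in seconds.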
Mongo DB#
Mongo is a NoSQL database. It keeps data in a JSON-like format.
To communicate with the launched database, use the mongosh CLI tool: Mongo DB shell Download.
To run a command without entering the shell, use the --eval "command" option of mongosh.
Data in MongoDB is organized as follows:
Databases.
Collection: a group of documents. Collections do not enforce a schema: different documents can have different fields.
Document: a single BSON document (a binary representation of a JSON document).
Run the database with the following command:
docker run --name mongodb -p 27017:27017 -d mongo:8.0.15 &> /dev/null
If you launched Mongo correctly, you should receive a response from it as illustrated in the following cell.
curl localhost:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.
The following cell shows the statistics for the current MongoDB.
mongosh --eval "db.stats()"
{
db: 'test',
collections: Long('1'),
views: Long('0'),
objects: Long('1'),
avgObjSize: 29,
dataSize: 29,
storageSize: 20480,
indexes: Long('1'),
indexSize: 20480,
totalSize: 40960,
scaleFactor: Long('1'),
fsUsedSize: 169364910080,
fsTotalSize: 501809635328,
ok: 1
}
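To see the database/collection/document hierarchy in action, the sketch below inserts and queries a document via --eval; the users collection and its fields are made up for illustration:

```shell
# Insert a document; the "users" collection is created implicitly on first write
mongosh --eval 'db.users.insertOne({name: "Alice", age: 30})'

# Query it back; documents in the same collection need not share a schema
mongosh --eval 'db.users.find({name: "Alice"})'

# List the collections in the current database ("test" by default)
mongosh --eval 'db.getCollectionNames()'
```

Because collections are schemaless, a later insertOne with entirely different fields would land in the same users collection without complaint.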