If you want to pull data from a MySQL database into a Kafka topic, or export messages from a Kafka topic to Elasticsearch, you might start writing a custom producer or consumer application. While this works, writing and maintaining custom connection code is tedious and repetitive.

To solve this, Apache Kafka includes **Kafka Connect**, a free, open-source framework designed specifically for streaming data between Kafka and external systems such as databases, search indexes, key-value stores, and cloud storage. When a connector already exists for your system, you configure it instead of writing producer or consumer code.

Kafka Connect Source vs. Sink Connector Architecture
Real-World Analogy: Airport Baggage Conveyor Belts

Imagine managing an airport whose central baggage sorting area is a Kafka topic:

  • A Source Connector is like the check-in luggage belt. It takes suitcases from individual travelers (external database tables) and carries them automatically into the central sorting area (Kafka topic).
  • A Sink Connector is like the arrivals carousel. It takes suitcases out of the central sorting area (Kafka topic) and delivers them so passengers can pick them up and load them into their cars (Elasticsearch, S3, databases).

Source Connectors vs. Sink Connectors

1. Source Connector (Ingestion)

A Source Connector imports data from external systems into Kafka. For example, a **Debezium Source Connector** monitors the transaction log (the binlog) of a MySQL database. Whenever a row is inserted, updated, or deleted, the connector publishes that change as a structured message to a Kafka topic in near real time.
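
To make "structured message" concrete, here is a minimal Python sketch that interprets the payload of a Debezium-style change event. Debezium's envelope carries `before` and `after` row images plus an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read). The `describe_change` helper and the sample row are hypothetical, and real events also carry `source` metadata and, depending on converter settings, a schema wrapper around the payload.

```python
import json

# Debezium operation codes mapped to readable names.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def describe_change(payload: dict) -> str:
    """Summarize one change-event payload (hypothetical helper)."""
    op = OPERATIONS.get(payload.get("op"), "unknown")
    # Deletes have no "after" image, so fall back to the "before" image.
    row = payload.get("after") or payload.get("before") or {}
    return f"{op}: {json.dumps(row, sort_keys=True)}"


# Illustrative payload only; the column names and values are invented.
sample_payload = {
    "before": None,
    "after": {"id": 1001, "email": "a@example.com"},
    "op": "c",
    "ts_ms": 1700000000000,
}

print(describe_change(sample_payload))
```

A consumer downstream of the topic would typically branch on `op` in this way to decide whether to upsert or delete the corresponding record.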

2. Sink Connector (Export)

A Sink Connector exports data from Kafka topics to external systems. For example, an **Elasticsearch Sink Connector** reads new messages from a Kafka topic and indexes them in Elasticsearch, making the data searchable in near real time.
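
As a rough illustration, the sketch below builds a configuration for Confluent's Elasticsearch sink connector as a Python dictionary. The topic name and Elasticsearch URL are assumptions made up for this example, and the `elasticsearch_sink_config` helper is hypothetical; a production setup would likely need more settings (authentication, error handling, and possibly a transform to flatten change events).

```python
import json


def elasticsearch_sink_config(topics: list[str], es_url: str) -> dict:
    """Build a sink connector config; Connect expects all values as strings."""
    return {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": ",".join(topics),   # comma-separated list of topics to read
        "connection.url": es_url,     # where Elasticsearch is reachable
        "key.ignore": "true",         # do not derive document IDs from record keys
        "schema.ignore": "true",      # let Elasticsearch infer the mapping
    }


# Assumed names: adjust the topic and URL to your environment.
config = elasticsearch_sink_config(
    ["dbserver1.inventory.customers"], "http://elasticsearch:9200"
)
print(json.dumps(config, indent=2))
```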

Under the Hood: Workers and Tasks

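Kafka Connect runs as one or more processes called **Workers**, usually deployed on their own servers in distributed mode. When you submit a connector configuration as JSON through the REST API, the connector divides its work into units called **Tasks** (at most `tasks.max` of them), and the workers share those tasks among themselves. If a worker fails, the remaining workers rebalance and take over its tasks, which resume from the offsets Connect tracks in Kafka. The pipeline therefore tolerates the loss of individual workers as long as at least one remains.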
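
The failover idea can be sketched with a toy model in Python. This is not Kafka Connect's actual rebalancing protocol, which is more careful about moving as few tasks as possible; it only shows that every task ends up on some surviving worker. The worker and task names and the `assign_tasks` helper are invented for illustration.

```python
from collections import defaultdict


def assign_tasks(tasks: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Spread tasks over live workers round-robin (toy model only)."""
    if not workers:
        raise ValueError("at least one live worker is required")
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return dict(assignment)


tasks = ["mysql-source-0", "mysql-source-1", "es-sink-0", "es-sink-1"]

# Three healthy workers share the four tasks.
before = assign_tasks(tasks, ["worker-a", "worker-b", "worker-c"])

# worker-b fails: the same tasks are reassigned to the two survivors.
after = assign_tasks(tasks, ["worker-a", "worker-c"])

for label, assignment in (("before", before), ("after", after)):
    print(label, assignment)
```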

Example: REST Configuration JSON

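To run a connector, you do not write Java code. Instead, you send a JSON configuration to the Kafka Connect REST API. The property names in this example follow Debezium 1.x; Debezium 2.x renames `database.server.name` to `topic.prefix` and the `database.history.*` settings to `schema.history.internal.*`.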

// PUT /connectors/mysql-source-connector/config
{
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "tasks.max": "2",
  "database.hostname": "mysql-db-server",
  "database.port": "3306",
  "database.user": "debezium",
  "database.password": "db-pass",
  "database.server.id": "184054",
  "database.server.name": "dbserver1",
  "database.include.list": "inventory",
  "database.history.kafka.bootstrap.servers": "localhost:9092",
  "database.history.kafka.topic": "schema-changes.inventory"
}
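
One way to submit a configuration like this is a small Python script using only the standard library. This is a sketch under assumptions: it presumes a Connect worker is listening on `http://localhost:8083` (the default REST port), and the helper names `build_config_request` and `apply_config` are hypothetical. The example builds the request without sending it, so it can be inspected without a running cluster.

```python
import json
import urllib.parse
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed worker REST address


def build_config_request(name: str, config: dict,
                         base_url: str = CONNECT_URL) -> urllib.request.Request:
    """Create (but do not send) PUT /connectors/{name}/config."""
    safe_name = urllib.parse.quote(name, safe="")
    return urllib.request.Request(
        url=f"{base_url}/connectors/{safe_name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="PUT",
    )


def apply_config(name: str, config: dict, base_url: str = CONNECT_URL) -> dict:
    """Send the request and return the worker's parsed JSON response."""
    request = build_config_request(name, config, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Abbreviated config for illustration; use the full property set in practice.
request = build_config_request(
    "mysql-source-connector",
    {"connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "2"},
)
print(request.get_method(), request.full_url)
```

The `PUT` form creates the connector if it does not exist and updates it otherwise, so the same script can be rerun after a configuration change. Calling `apply_config(...)` performs the actual request and raises `urllib.error.URLError` if no worker is reachable.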

Conclusion

Do not reinvent the wheel. If you need to integrate Kafka with standard databases, file systems, or analytical search indexes, look for pre-built **Kafka Connectors** first. They allow you to build reliable, scalable integration pipelines using simple configuration settings.