42 changes: 42 additions & 0 deletions cs-articles/train-chatgpt-on-own-data/docs/articles.csv
@@ -0,0 +1,42 @@
article_id,title,category,author,publication_date,last_updated,reading_time_minutes,tags,summary
ART001,"Guide to Java Concurrency",Java,"John Smith",2024-10-15,2025-01-10,12,"concurrency, threads, java","A comprehensive guide to Java concurrency concepts including threads, ThreadPoolExecutor, CompletableFuture, and synchronization mechanisms. Covers deadlock prevention, thread safety, and best practices for writing concurrent code in Java applications."
ART002,"Building RESTful APIs with Spring Boot",Spring,"Sarah Johnson",2024-09-20,2025-02-05,15,"spring boot, rest, api","Learn how to create robust RESTful APIs using Spring Boot. This tutorial covers controller design, request mapping, response handling, validation, exception handling, and content negotiation. Includes practical examples following REST best practices."
ART003,"Introduction to Spring Security",Security,"Michael Chen",2024-11-12,2025-03-01,18,"spring security, authentication, authorization","An in-depth introduction to the Spring Security framework. Covers authentication, authorization, security configurations, form login, OAuth2 integration, method-level security, and protection against common web vulnerabilities."
ART004,"Java 21 Features Overview",Java,"Emily Williams",2024-09-15,2025-01-20,10,"java 21, virtual threads, pattern matching","Explore the key features in Java 21 including virtual threads for improved scalability, pattern matching for switch expressions, record patterns, and the new SequencedCollection interface. Practical examples show how to leverage these features in your applications."
ART005,"Spring Data JPA Tutorial",Spring,"David Lee",2024-10-01,2025-02-10,14,"spring data, jpa, hibernate","Master database operations with Spring Data JPA. This article explains repositories, query methods, custom queries with JPQL and native SQL, specifications, auditing features, and performance optimization techniques. All examples use Spring Boot 3.2."
ART006,"Reactive Programming with Spring WebFlux",Spring,"Patricia Rodriguez",2024-11-05,2025-01-15,16,"reactive, webflux, spring","Understand the reactive programming paradigm and how to implement it with Spring WebFlux. Learn about Mono, Flux, reactive operators, error handling, and building non-blocking applications that can handle high concurrency with fewer resources."
ART007,"Microservices with Spring Cloud",Architecture,"Robert Wilson",2024-08-25,2025-02-20,20,"microservices, spring cloud, architecture","Comprehensive guide to building microservices using Spring Cloud. Covers service discovery with Eureka, configuration management with Config Server, circuit breakers with Resilience4j, API gateway with Spring Cloud Gateway, and distributed tracing."
ART008,"Hibernate Performance Tuning",Performance,"Jennifer Taylor",2024-09-10,2025-01-25,15,"hibernate, performance, optimization","Advanced techniques for optimizing Hibernate ORM performance. Learn about fetch strategies, caching mechanisms, batch processing, connection pooling, and query optimization. Includes real-world examples and benchmarks for common scenarios."
ART009,"Testing Spring Boot Applications",Testing,"Andrew Kim",2024-10-20,2025-02-15,12,"testing, junit, mockito, spring boot","Complete guide to testing Spring Boot applications at all levels. Covers unit testing with JUnit 5 and Mockito, integration testing with @SpringBootTest, testing slices with @WebMvcTest and @DataJpaTest, and using TestContainers for integration tests with real databases."
ART010,"Getting Started with Kotlin for Java Developers",Kotlin,"Maria Garcia",2024-11-15,2025-03-10,10,"kotlin, java","A beginner-friendly introduction to Kotlin for Java developers. Learn the key differences between Kotlin and Java, null safety, extension functions, data classes, coroutines, and how to use Kotlin with Spring Boot for more concise and expressive code."
ART011,"Docker for Java Applications",DevOps,"Daniel Brown",2024-08-15,2025-01-05,8,"docker, devops, java","Learn how to containerize Java applications with Docker. This guide covers creating efficient Dockerfiles, multi-stage builds, container orchestration, and best practices specific to Java applications. Includes examples for Spring Boot applications."
ART012,"Spring Cache Abstraction",Spring,"Lisa Thompson",2024-09-05,2025-02-25,6,"cache, spring, performance","A detailed look at Spring's cache abstraction. Learn how to use @Cacheable, @CachePut, and @CacheEvict annotations, configure different cache providers (EhCache, Caffeine, Redis), and implement custom cache resolution for optimal performance."
ART013,"Java Stream API Deep Dive",Java,"James Wilson",2024-10-10,2025-01-30,14,"streams, functional, java","Master the Java Stream API for functional-style operations. Learn about intermediate operations (map, filter, sorted), terminal operations (collect, reduce), parallel streams, custom collectors, and performance considerations with practical examples."
ART014,"Building GraphQL APIs with Spring Boot",Spring,"Sophia Martinez",2024-11-01,2025-03-05,13,"graphql, spring boot, api","Complete tutorial on implementing GraphQL APIs using Spring Boot and GraphQL Java. Covers schema definition, resolvers, queries, mutations, subscriptions, error handling, and integration with Spring Data repositories."
ART015,"Spring Transaction Management",Spring,"Thomas Clark",2024-09-25,2025-02-01,9,"transactions, spring","Understand transaction management in Spring applications. Learn about @Transactional annotation, propagation levels, isolation levels, rollback rules, and advanced scenarios like distributed transactions. Includes best practices and common pitfalls to avoid."
ART016,"Java Collections Framework",Java,"Rachel Lee",2024-08-20,2025-01-12,11,"collections, java, data structures","In-depth guide to the Java Collections Framework. Analyzes different collection types (List, Set, Map, Queue), their implementations, performance characteristics, and usage patterns. Includes guidance on choosing the right collection for different scenarios."
ART017,"Dependency Injection Patterns in Spring",Spring,"Kevin Martin",2024-10-05,2025-02-12,7,"dependency injection, spring, design patterns","Explore various dependency injection patterns in Spring. Covers constructor injection, setter injection, field injection, and method injection. Discusses best practices, pros and cons of each approach, and when to use specific patterns."
ART018,"Building RESTful APIs with Spring HATEOAS",Spring,"Michelle Wong",2024-11-10,2025-03-15,9,"hateoas, rest, api, spring","Learn how to create truly RESTful APIs following HATEOAS principles using Spring HATEOAS. This article explains resource representation, link creation, relation types, and building self-documenting APIs that clients can navigate without prior knowledge."
ART019,"Java Memory Management and Garbage Collection",Java,"Christopher Davis",2024-09-15,2025-01-18,16,"memory management, garbage collection, jvm","Detailed exploration of Java memory management and garbage collection algorithms. Understand heap structure, garbage collectors (Serial, Parallel, G1, ZGC), memory leaks, and tuning parameters. Includes practical advice for performance optimization."
ART020,"Securing RESTful APIs with OAuth 2.0 and JWT",Security,"Amanda Nelson",2024-10-25,2025-02-28,14,"oauth, jwt, security, api","Comprehensive guide to implementing OAuth 2.0 and JWT authentication for RESTful APIs using Spring Security. Covers authorization server setup, resource server configuration, token validation, and best practices for secure API design."
ART021,"Spring Boot Actuator for Monitoring and Management",Operations,"Brian Miller",2024-08-10,2025-01-08,8,"actuator, monitoring, spring boot","Learn how to use Spring Boot Actuator to monitor and manage your applications in production. Covers endpoints for health checks, metrics, environment information, and how to integrate with monitoring systems like Prometheus and Grafana."
ART022,"Hibernate One-to-Many Relationship Mapping",ORM,"Jessica White",2024-09-30,2025-02-08,10,"hibernate, jpa, relationships","Detailed guide to implementing one-to-many relationships with Hibernate and JPA. Covers both unidirectional and bidirectional mappings, cascading operations, fetch strategies, and performance optimization techniques for relationship traversal."
ART023,"Building Reactive Microservices with Spring Boot and RabbitMQ",Messaging,"Jason Zhang",2024-11-20,2025-03-18,15,"reactive, microservices, rabbitmq, messaging","Learn how to build reactive microservices that communicate asynchronously using Spring Boot and RabbitMQ. Covers message producers, consumers, exchanges, queues, routing, and error handling patterns for resilient microservice architecture."
ART024,"Spring MVC vs Spring WebFlux",Spring,"Laura Robinson",2024-10-15,2025-01-22,9,"mvc, webflux, spring","Detailed comparison between traditional Spring MVC and reactive Spring WebFlux. Analyzes programming models, performance characteristics, use cases, and limitations of both approaches. Helps developers choose the right framework for their specific needs."
ART025,"Java UUID Generator",Java,"William Taylor",2024-08-18,2025-02-03,5,"uuid, java","Guide to generating and working with UUIDs in Java applications. Covers the built-in UUID class, different generation strategies, storage considerations in databases, and performance implications. Includes practical examples and common usage patterns."
ART026,"Spring Boot Auto-Configuration",Spring,"Emma Harris",2024-09-12,2025-01-28,8,"auto-configuration, spring boot","Deep dive into Spring Boot's auto-configuration mechanism. Understand how conditional annotations work, how to customize default configurations, create your own auto-configurations, and troubleshoot common issues with detailed examples."
ART027,"Java Design Patterns: Creational Patterns",Design Patterns,"Tyler Green",2024-10-28,2025-02-18,12,"design patterns, creational patterns, java","Comprehensive guide to creational design patterns in Java. Covers Singleton, Factory Method, Abstract Factory, Builder, and Prototype patterns with real-world examples. Learn when and how to apply each pattern to solve common design problems."
ART028,"Spring AOP Tutorial",Spring,"Rebecca Moore",2024-11-25,2025-03-08,10,"aop, aspect oriented programming, spring","Learn Aspect-Oriented Programming with Spring AOP. Covers aspects, pointcuts, advice, and how to use them for cross-cutting concerns like logging, security, and performance monitoring. Includes a comparison with AspectJ and practical examples."
ART029,"Java Exception Handling Best Practices",Java,"Brandon Walker",2024-09-08,2025-01-16,7,"exception handling, java, best practices","Best practices for exception handling in Java applications. Covers checked vs unchecked exceptions, custom exception hierarchies, try-with-resources, multi-catch blocks, and how to design robust error handling strategies for different application layers."
ART030,"Building a REST API Documentation with SpringDoc and OpenAPI",Documentation,"Caroline Adams",2024-10-18,2025-02-22,9,"documentation, openapi, swagger, spring","Learn how to automatically generate comprehensive API documentation using SpringDoc and OpenAPI. Covers configuration, customization, security documentation, and how to enhance documentation with annotations for a better developer experience."
ART031,"Java Records: Immutable Data Classes",Java,"Nathan Scott",2024-11-08,2025-03-12,6,"records, java, immutability","Explore the Java Records feature for creating concise immutable data classes. Learn about compact constructors, customizing accessors, inheritance limitations, and how records compare to traditional classes and Lombok's @Value annotation."
ART032,"Spring Data MongoDB Tutorial",NoSQL,"Olivia Parker",2024-08-22,2025-01-20,12,"mongodb, spring data, nosql","Comprehensive guide to working with MongoDB in Spring applications. Covers MongoRepository, custom queries, aggregations, transactions, and schema design best practices. Includes examples of complex document structures and relationships."
ART033,"Implementing JWT Authentication with Spring Security",Security,"Timothy Foster",2024-09-18,2025-02-15,13,"jwt, authentication, spring security","Step-by-step guide to implementing JWT-based authentication in Spring Security. Covers token generation, validation, refresh mechanisms, and securing endpoints. Includes complete implementation of authentication and authorization flows."
ART034,"Spring Boot Testing Strategies",Testing,"Victoria Lewis",2024-10-22,2025-03-02,11,"testing, spring boot, strategies","Advanced testing strategies for Spring Boot applications. Covers testing pyramid implementation, test slices, mocking strategies, test data management, and continuous integration setup. Focuses on writing maintainable and efficient tests."
ART035,"Java Serialization and Deserialization",Java,"Alexander Bennett",2024-11-03,2025-01-25,8,"serialization, java, io","In-depth guide to Java serialization and deserialization. Covers the Serializable interface, transient fields, versioning with serialVersionUID, security considerations, and alternatives like JSON serialization with Jackson and Protocol Buffers."
ART036,"Caching in Spring Boot Applications",Performance,"Katherine Young",2024-08-28,2025-02-10,9,"caching, spring boot, performance","Detailed guide to implementing caching in Spring Boot applications. Covers Spring's cache abstraction, various cache providers (EhCache, Caffeine, Redis), distributed caching, cache eviction strategies, and monitoring cache performance."
ART037,"Java Reflection API",Java,"Jeremy Collins",2024-09-22,2025-01-15,14,"reflection, java, metaprogramming","Comprehensive guide to the Java Reflection API. Learn how to inspect classes, methods, and fields at runtime, invoke methods dynamically, and access private members. Includes performance considerations and common use cases in frameworks."
ART038,"Building Batch Processing Applications with Spring Batch",Batch Processing,"Stephanie Peterson",2024-10-12,2025-03-05,15,"batch processing, spring batch","Learn how to implement batch processing workflows with Spring Batch. Covers job configuration, chunk processing, readers, processors, writers, job orchestration, error handling, and scaling strategies for processing large volumes of data."
ART039,"Java NIO for High-Performance I/O",Java,"Zachary Morris",2024-11-18,2025-01-30,12,"nio, java, performance, io","Deep dive into Java NIO for high-performance I/O operations. Covers channels, buffers, selectors, memory-mapped files, asynchronous I/O, and the file system API. Includes performance comparisons with traditional I/O and best practices."
ART040,"Spring Cloud Stream with Kafka",Messaging,"Megan Cooper",2024-09-05,2025-02-25,13,"kafka, spring cloud stream, messaging","Comprehensive guide to building message-driven microservices with Spring Cloud Stream and Apache Kafka. Covers binders, function-based programming model, consumer groups, error handling, and partitioning for scalable event processing."
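The script that turns this CSV into the `chunks.pkl` file loaded by chat.py is not part of this diff. A minimal sketch, assuming one chunk per article built from its title and summary, split into fixed-size word windows (a hypothetical strategy), and tagged with the `source` and `article_id` metadata keys that chat.py reads. `build_chunks` and its `max_words` parameter are illustrative names, not code from this PR.

```python
import csv
import io


def build_chunks(csv_text: str, source: str = "articles.csv", max_words: int = 200) -> dict:
    """Turn article rows into the {"texts": [...], "meta": [...]} shape chat.py unpickles.

    Hypothetical helper: one chunk per article (title + summary), split into
    windows of at most `max_words` words so long summaries stay embeddable.
    """
    texts: list[str] = []
    meta: list[dict] = []
    # DictReader handles the quoted fields that contain commas (tags, summary).
    for row in csv.DictReader(io.StringIO(csv_text)):
        words = f"{row['title']}. {row['summary']}".split()
        for start in range(0, len(words), max_words):
            texts.append(" ".join(words[start:start + max_words]))
            # Same keys chat_round() looks up when formatting context.
            meta.append({"source": source, "article_id": row["article_id"]})
    return {"texts": texts, "meta": meta}


sample = (
    "article_id,title,category,author,publication_date,last_updated,"
    "reading_time_minutes,tags,summary\n"
    'ART025,"Java UUID Generator",Java,"William Taylor",2024-08-18,2025-02-03,5,'
    '"uuid, java","Guide to generating and working with UUIDs in Java applications."\n'
)
chunks = build_chunks(sample)
```

Pickling the result with `pickle.dump(chunks, f)` would produce the dict that chat.py unpacks into `all_texts` and `all_meta`; each text would still need to be embedded, with the same model used for queries, and added to the FAISS index in the same order so that row ids line up.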

Binary file not shown.
Binary file not shown.
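chat.py calls `index.search` on a FAISS index whose construction is not shown in this diff (the index and chunk files may be among the binary files above, but their names are not rendered). Assuming the index is an exact `faiss.IndexFlatL2`, its search behaves like the brute-force sketch below: it returns squared L2 distances and row ids, nearest first, and pads missing neighbours with id `-1` when the index holds fewer than `k` vectors, which is why `retrieve_context` skips `-1`. `search_flat_l2` is a hypothetical helper for illustration, and the `inf` padding distance is this sketch's own choice.

```python
def search_flat_l2(vectors: list[list[float]], query: list[float], k: int) -> tuple[list[float], list[int]]:
    """Exact k-nearest-neighbour search by squared L2 distance (illustrative only)."""
    # Score every stored vector; ties are broken by row id via tuple ordering.
    scored = sorted(
        (sum((q - v) ** 2 for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    )[:k]
    distances = [d for d, _ in scored]
    ids = [i for _, i in scored]
    # Pad with id -1 when fewer than k vectors exist, as FAISS does.
    missing = k - len(ids)
    ids += [-1] * missing
    distances += [float("inf")] * missing
    return distances, ids


stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
_, top2 = search_flat_l2(stored, [0.9, 0.0], k=2)
_, top5 = search_flat_l2(stored, [0.9, 0.0], k=5)
```

A flat index trades speed for exactness; for a corpus of a few dozen article chunks like this one, scanning every vector is cheap, so an approximate index would add complexity without a measurable benefit.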
59 changes: 59 additions & 0 deletions cs-articles/train-chatgpt-on-own-data/requirements.txt
@@ -0,0 +1,59 @@
2to3==1.0
annotated-types==0.7.0
anyio==4.9.0
async-timeout==4.0.3
certifi==2022.9.24
charset-normalizer==2.1.1
colorama==0.4.6
distro==1.9.0
et-xmlfile==1.1.0
exceptiongroup==1.3.0
faiss-cpu==1.11.0
filelock==3.12.0
greenlet==1.1.3
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.4
Jinja2==3.1.2
jiter==0.9.0
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.3.25
langchain-core==0.3.60
langchain-text-splitters==0.3.8
langsmith==0.3.42
MarkupSafe==2.1.2
mpmath==1.3.0
names==0.3.0
networkx==3.1
numpy==1.26.4
openai==1.79.0
openpyxl==3.0.10
orjson==3.10.18
packaging==24.2
pandas==1.5.0
passlib==1.7.4
Pillow==9.5.0
pydantic==2.11.4
pydantic_core==2.33.2
PyPDF2==3.0.1
python-dateutil==2.8.2
python-dotenv==1.1.0
pytz==2022.2.1
PyYAML==6.0.2
requests==2.28.1
requests-toolbelt==1.0.0
six==1.16.0
sniffio==1.3.1
SQLAlchemy==1.4.41
sympy==1.12
tenacity==9.1.2
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tqdm==4.67.1
typing-inspection==0.4.0
typing_extensions==4.13.2
urllib3==1.26.12
zstandard==0.23.0
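chat.py imports a `config` module that this diff does not include; it reads `OPENAI_API_KEY`, `VECTOR_DIR`, `EMBEDDING_MODEL`, and `GPT_MODEL` from it. A hypothetical `config.py` using the pinned python-dotenv might look like the following. The environment-variable names and all default values are assumptions; in particular, `EMBEDDING_MODEL` must match whichever model was used to build the index, or query vectors will not be comparable to the stored ones.

```python
import os

try:
    # python-dotenv is pinned in requirements.txt; load a local .env file if present.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    # Fall back to plain environment variables if python-dotenv is missing.
    pass

# Hypothetical variable names and defaults; adjust to the real project setup.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
VECTOR_DIR = os.getenv("VECTOR_DIR", "vectors")  # folder with baeldung.idx and chunks.pkl
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
GPT_MODEL = os.getenv("GPT_MODEL", "gpt-4o-mini")
```

By default `load_dotenv()` does not override variables already set in the environment, so values exported in the shell take precedence over the `.env` file.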
87 changes: 87 additions & 0 deletions cs-articles/train-chatgpt-on-own-data/src/chat.py
@@ -0,0 +1,87 @@
import os
import pickle
from typing import Optional

import numpy as np
import faiss
from openai import OpenAI
import config as cfg

client = OpenAI(api_key=cfg.OPENAI_API_KEY)
print("Loading vector index and chunks...")
index_path = os.path.join(cfg.VECTOR_DIR, "baeldung.idx")
chunks_path = os.path.join(cfg.VECTOR_DIR, "chunks.pkl")
index = faiss.read_index(index_path)

with open(chunks_path, "rb") as f:
    chunk_data = pickle.load(f)
all_texts = chunk_data["texts"]
all_meta = chunk_data["meta"]

print(f"Loaded index with {len(all_texts)} chunks")


def retrieve_context(question: str, k: int = 3) -> list[dict]:
    """Embed the question and return the k nearest chunks with their metadata."""
    q_emb = client.embeddings.create(
        model=cfg.EMBEDDING_MODEL,
        input=question
    ).data[0].embedding

    # FAISS expects a 2-D float32 array of shape (n_queries, dim)
    D, I = index.search(np.array([q_emb], dtype="float32"), k)
    relevant_chunks = []

    for i in I[0]:
        # FAISS returns -1 when the index holds fewer than k vectors
        if i == -1:
            continue
        relevant_chunks.append({"text": all_texts[i], "meta": all_meta[i]})

    return relevant_chunks


def chat_round(user_question: str, history: Optional[list[dict]] = None) -> str:
    """Handle a complete conversation turn with RAG."""
    history = history or []

    # Retrieve relevant knowledge base content
    relevant_chunks = retrieve_context(user_question)

    # Format retrieved content for inclusion in prompt
    context_text = ""
    for chunk in relevant_chunks:
        meta = chunk["meta"]
        source = meta.get("source", "Unknown")
        article_id = meta.get("article_id", "")

        context_text += f"[Source: {source}"
        if article_id:
            context_text += f", Article ID: {article_id}]"
        else:
            context_text += "]"

        context_text += f"\n{chunk['text']}\n\n"

    # Create system message with retrieved context
    system_content = (
        "You are a customer-support agent. Answer questions about "
        "Spring, Java, and web development. Use the following articles "
        "as context for your answer.\n\n"
        f"Knowledge Base Context:\n{context_text}\n\n"
        "If the context doesn't contain relevant information, use your general knowledge "
        "but prioritize the context's perspective."
    )

    system_msg = {"role": "system", "content": system_content}

    # Prepare conversation messages; copy so the caller's history is not mutated
    history_msgs = history.copy()
    history_msgs.insert(0, system_msg)
    history_msgs.append({"role": "user", "content": user_question})

    # Make API call without function definitions
    resp = client.chat.completions.create(
        model=cfg.GPT_MODEL,
        messages=history_msgs,
        temperature=0
    )

    # Return direct response
    return resp.choices[0].message.content
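The prompt-assembly part of `chat_round` can be factored into pure functions that are easy to test without calling the OpenAI API. `format_context` and `build_messages` are hypothetical helper names; they reproduce the `[Source: ..., Article ID: ...]` header format and the system/history/user message order used in `chat_round`.

```python
def format_context(chunks: list[dict]) -> str:
    """Render retrieved chunks as '[Source: ..., Article ID: ...]' headers followed by text."""
    parts = []
    for chunk in chunks:
        meta = chunk["meta"]
        header = f"[Source: {meta.get('source', 'Unknown')}"
        # Only add the article id when it is present and non-empty.
        if meta.get("article_id"):
            header += f", Article ID: {meta['article_id']}"
        parts.append(f"{header}]\n{chunk['text']}\n\n")
    return "".join(parts)


def build_messages(system_content: str, history: list[dict], question: str) -> list[dict]:
    """System prompt first, then prior turns, then the new user question."""
    return [
        {"role": "system", "content": system_content},
        *history,
        {"role": "user", "content": question},
    ]


ctx = format_context([
    {"text": "UUIDs in Java", "meta": {"source": "articles.csv", "article_id": "ART025"}},
    {"text": "No id here", "meta": {}},
])
msgs = build_messages("sys", [{"role": "assistant", "content": "hi"}], "What is a UUID?")
```

Because `chat_round` copies `history` instead of mutating it, a caller that wants multi-turn memory has to append the user question and the returned answer to its own list after each call.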