DZone
Languages

Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.


DZone's Featured Languages Resources

Why You Should Migrate Microservices From Java to Kotlin: Experience and Insights

By Konstantin Glumov
I work at one of the largest private banks in Eastern Europe, developing the backend for a mobile application. Our cluster consists of more than 400 microservices, and peak loads on individual services can reach five-digit values. When we initially started transitioning to a microservices architecture, all our code was written in Java. However, over time, we began actively migrating microservices to Kotlin. Today, all new microservices are created exclusively in Kotlin, and the share of Java code has decreased to less than 20%.

In this article, I will explain why the migration to Kotlin has been so successful and why developers are eager to switch to this language, even with prior experience only in other JVM languages.

Kotlin vs Java

One of Kotlin's main strengths is its full compatibility with Java code. Kotlin seamlessly interacts with Java classes and methods, and vice versa, allowing for a smooth transition to the new language without the need to rewrite existing code. Kotlin compiles to JVM bytecode, ensuring compatibility with all the Java libraries and frameworks that we actively use in our project.

The modular structure of microservices is ideal for the gradual introduction of Kotlin. We started by developing new components in Kotlin, then slowly migrated older ones. Importantly, we could continue using Java code without worrying about conflicts or breakdowns. This approach allowed us to avoid significant risks and ensured the stable operation of the entire cluster.

Kotlin offers a range of modern features that simplify development and make it more efficient. Here are a few examples:

  • Coroutines: Enable easy and efficient writing of asynchronous code, avoiding the complexity of Java's multithreading
  • Extensions: Allow adding new functions to existing classes without modifying them
  • Nullable types: Catch most null-related mistakes at compile time, addressing NullPointerException, a common problem for Java developers

Switching to Kotlin is smooth and natural because its syntax closely resembles Java.
However, unlike Java, Kotlin eliminates much of the boilerplate, making the code more concise and readable. For instance, creating POJO classes in Kotlin can be done in a single line using data class. Similar improvements can be seen in other aspects of the language, such as working with collections, asynchronous programming, and handling nullable types.

Code Examples: Java vs. Kotlin

To illustrate, here are a few comparisons.

POJO Class vs. Data Class

Java

    public class Person {
        private String name;
        private int age;
        /* A constructor, getters and setters for each field,
           plus toString, equals, and hashCode, also go here */
    }

In Java, creating a simple POJO (Plain Old Java Object) requires explicitly defining fields, constructors, getters, setters, and often overriding toString(), equals(), and hashCode() methods. This results in a lot of boilerplate code.

Kotlin

    data class Person(val name: String, val age: Int)

Kotlin's data class automatically generates the constructor, getters (and setters for var properties), toString(), equals(), hashCode(), and copy() methods. This single line of code is equivalent to dozens of lines in Java.

Nullable Types

Java

    String name = null;
    if (name != null) {
        System.out.println(name.length());
    } else {
        System.out.println("Name is null");
    }

In Java, you need to explicitly check for null before accessing an object to avoid NullPointerException. This often leads to verbose null-checking code.

Kotlin

    val name: String? = null
    println(name?.length ?: "Name is null")

Kotlin uses the safe call operator (?.) and the Elvis operator (?:) to handle nullables concisely. This line prints the length if name is not null and prints "Name is null" otherwise.
Nested Object Checks

Java

    if (person != null && person.getAddress() != null && person.getAddress().getCity() != null) {
        System.out.println(person.getAddress().getCity());
    }

In Java, checking nested objects for null requires multiple checks, leading to deeply nested if statements or long boolean expressions.

Kotlin

    person?.address?.city?.let { println(it) }

Kotlin's safe call operator (?.) allows for chaining nullable calls. The let function executes the block only if all the previous calls in the chain are non-null.

Asynchronous Calls

Java

    import reactor.core.publisher.Mono;

    public Mono<String> fetchData() {
        Mono<String> first = fetchFirst();
        Mono<String> second = fetchSecond();
        return Mono.zip(first, second)
                .map(tuple -> tuple.getT1() + " " + tuple.getT2());
    }

This Java code uses Project Reactor's Mono for asynchronous programming. It fetches two pieces of data asynchronously and combines them.

Kotlin

    import kotlinx.coroutines.async
    import kotlinx.coroutines.coroutineScope

    // async must be called in a CoroutineScope; coroutineScope provides one
    suspend fun fetchData() = coroutineScope {
        val first = async { fetchFirst() }
        val second = async { fetchSecond() }
        println("${first.await()} ${second.await()}")
    }

Kotlin's coroutines make asynchronous code read like synchronous code. The async function starts coroutines for fetching data inside the scope created by coroutineScope, and await() suspends execution until the result is available.

Collection Processing

Java

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    List<String> names = Arrays.asList("John", "Jane", "Doe");
    List<String> filteredNames = names.stream()
            .filter(name -> name.startsWith("J"))
            .map(String::toUpperCase)
            .collect(Collectors.toList());

Java uses streams and lambda expressions for collection processing. This code filters names starting with "J" and converts them to uppercase.

Kotlin

    val names = listOf("John", "Jane", "Doe")
    val filteredNames = names.filter { it.startsWith("J") }
        .map { it.uppercase() }

Kotlin provides concise functions for collection processing. The code does the same as the Java version but with a more readable syntax.
If-Else in Java and When in Kotlin

Java

    int number = 3;
    String result;
    if (number == 1) {
        result = "One";
    } else if (number == 2) {
        result = "Two";
    } else {
        result = "Other";
    }

Java uses traditional if-else statements for conditional logic. This can become verbose with multiple conditions.

Kotlin

    val number = 3
    val result = when (number) {
        1 -> "One"
        2 -> "Two"
        else -> "Other"
    }

Kotlin's when expression provides a more concise and powerful replacement for switch statements and complex if-else chains.

Class Extensions

Java

    public class StringUtils {
        public static String reverse(String s) {
            return new StringBuilder(s).reverse().toString();
        }
    }

In Java, adding functionality to existing classes often requires creating utility classes with static methods.

Kotlin

    fun String.reverse(): String = this.reversed()

Kotlin's extension functions allow adding new methods to existing classes without modifying their source code. This one reverses a string and can be called directly on String objects.

Kotlin

    val reversed = "Hello".reverse() // "olleH"

The extension function can be called as if it were a method of the String class, making the code more intuitive and object-oriented.

These examples show how Kotlin simplifies and enhances development compared to Java through concise syntax, built-in features, and powerful tools for asynchronous programming and collection handling.

Conclusion

Speaking of the compatibility of Java and Kotlin within the same project, I want to emphasize that using microservices in both languages does not require creating separate libraries and starters. The only limitation is the Java version, which must not be higher than the one used in the services. Spring also fully supports both languages, and the difference in use lies only in some dependencies that need to be pulled in through the Maven or Gradle build systems.
After several years of using Java and Kotlin simultaneously on our project, I can confidently say that developers do not want to go back to Java. They like Kotlin much more: it is more concise and expressive and provides more opportunities for writing efficient code. Kotlin code is easier to read and understand, especially when regularly reviewing your colleagues' pull requests. For programmers with experience in other JVM languages, the transition to Kotlin is very fast.

Of course, Java is not standing still. Recent versions have added records, which serve as an analog of data classes in Kotlin, and virtual threads (finalized in Java 21), which can replace coroutines in some scenarios. Additionally, work is underway to improve the support for nullable objects.

Nevertheless, most of our developers prefer to work with Kotlin. They appreciate its conciseness, expressiveness, and modern features that significantly improve productivity and code quality. The transition from Java to Kotlin has been simple and natural for us, allowing us to maintain the existing system and gradually introduce the new language.
Top 10 C# Keywords and Features

By Naga Santhosh Reddy Vootukuri, DZone Core
C# ranks as the fifth most popular programming language in a Stack Overflow survey. It is widely used for creating various applications, ranging from desktop to mobile to cloud native. With so many language keywords and features, it can be taxing for developers to keep up to date with new feature releases. This article delves into the top 10 C# keywords and features every C# developer should know.

1. Async and Await

Keywords: async, await

The async and await keywords make asynchronous programming in C# easy to handle. They allow you to write code that performs operations without blocking the main thread. This capability is particularly useful for I/O-bound tasks, and for CPU-intensive work that has been moved off the main thread. By using these keywords, programmers can easily handle long-running operations such as invoking external APIs to get data, or reading from and writing to a network drive. This helps in developing applications that stay responsive and can handle concurrent operations.

Example

C#

    public async Task<string> GetDataAsync()
    {
        using (HttpClient client = new HttpClient())
        {
            string result = await client.GetStringAsync("http://bing.com");
            return result;
        }
    }

2. LINQ

Keywords: from, select, where, group, into, orderby, join

LINQ (Language Integrated Query) provides an easy way to query various data sources, such as databases, collections, and XML, directly within C# without interacting with additional frameworks like ADO.NET. By using a syntax that resembles SQL, LINQ enables developers to write queries in a readable way.

Example

C#

    var query = from student in students
                where student.Age > 18
                orderby student.Name
                select student;

3. Properties

Properties are members that provide a flexible mechanism to read, write, or compute the value of a private field. Generally, we hide the internal private backing fields and expose them via a public property. This lets callers access the data easily without touching the field itself.
In the example below, Name is the property that hides a backing field called name, which is marked as private so that outside callers cannot modify the field directly.

Example

C#

    class Person
    {
        private string name; // backing field

        public string Name // property
        {
            get { return name; }
            set { name = value; }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Person P1 = new Person();
            P1.Name = "Sunny";
            Console.WriteLine(P1.Name);
        }
    }

4. Generics

Syntax: <T> (type parameter)

Generics allow you to write a class or method without specifying the data type(s) it works on: a type parameter such as T acts as a placeholder that the caller fills in. The introduction of generics in C# 2.0 completely changed the landscape of creating modular, reusable code that would otherwise need to be duplicated in multiple places.

Imagine you have a method that adds two int values, and then a requirement arrives to add float or double values. Without generics, you end up duplicating the same code, because the method was already defined with int parameters. Generics make it easy to define placeholders and handle the logic for different data types in one place.

Example

C#

    public class Print
    {
        // Generic method that can take any data type as its parameter
        public void Display<T>(T value)
        {
            Console.WriteLine($"The value is: {value}");
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            Print print = new Print();
            // Call the generic method with different data types
            print.Display<int>(10);
            print.Display<string>("Hello World");
            print.Display<double>(20.5);
        }
    }

5. Delegates and Events

Keywords: delegate, event

A delegate is an object that refers to a method, so you can invoke the method through the delegate instead of calling it directly. Delegates are comparable to function pointers in C++, but they are type-safe. They are mainly used for implementing callback methods and for handling events.
Func<T> and Action<T> are built-in delegates provided out of the box in C#.

Events, on the other hand, enable a class or object to notify other classes or objects when something of interest occurs. For example, think of a scenario where a user clicks a button on your website. It generates an event (in this case, a button click) to be handled by the corresponding event handler code.

Examples

Example code for declaring and instantiating a delegate:

C#

    public delegate void MyDelegate1(string msg); // Declare a delegate

    public class Program
    {
        // This method will be pointed to by the delegate
        public static void PrintMessage(string message)
        {
            Console.WriteLine(message);
        }

        public static void Main(string[] args)
        {
            // Instantiate the delegate
            MyDelegate1 del = PrintMessage;
            // Call the method through the delegate
            del("Hello World");
        }
    }

Example code for raising an event and handling it via an event handler:

C#

    // Declare a delegate
    public delegate void Notify();

    public class ProcessBusinessLogic
    {
        public event Notify ProcessCompleted; // Declare an event

        public void StartProcess()
        {
            Console.WriteLine("Process Started!");
            // Some actual work here..
            OnProcessCompleted();
        }

        // Method to call when the process is completed
        protected virtual void OnProcessCompleted()
        {
            ProcessCompleted?.Invoke();
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            ProcessBusinessLogic bl = new ProcessBusinessLogic();
            bl.ProcessCompleted += bl_ProcessCompleted; // Register event handler
            bl.StartProcess();
        }

        // Event handler
        public static void bl_ProcessCompleted()
        {
            Console.WriteLine("Process Completed!");
        }
    }

6. Lambda Expressions

Operator: =>

Lambda expressions provide an easy way to represent methods inline. They are particularly useful in LINQ queries and for defining short functions. This feature allows developers to write readable code by eliminating the need for traditional method definitions when performing simple operations.
Lambda expressions improve code clarity and conciseness, which makes them an invaluable tool for developers working with C#.

Example

C#

    Func<int, int, int> add = (x, y) => x + y;
    int result = add(3, 4); // result is 7

7. Nullable Types

Syntax: ?

In C#, nullable types allow value types to have a null state, too. This comes in handy when you're working with databases or data sources that might have null values. Adding a ? after a value type helps developers handle cases where data could be missing or not defined, which prevents potential errors at run time. This feature makes applications more reliable by giving a clear and straightforward way to handle optional or missing data.

Example

C#

    int? num = null;
    if (num.HasValue)
    {
        Console.WriteLine($"Number: {num.Value}");
    }
    else
    {
        Console.WriteLine("No value assigned.");
    }

8. Pattern Matching

Keywords: is, switch, case

Pattern matching is another useful feature, introduced in C# 7.0 and improved in successive versions of the language. Pattern matching takes an expression and tests whether it matches certain criteria. Instead of lengthy if-else statements, we can write compact code that is easy to read. In the example below, an object variable is assigned the value 5 (an int), and pattern matching is then used to print which data type it holds.

Example

C#

    object obj = 5;
    if (obj is int i)
    {
        Console.WriteLine($"Integer: {i}");
    }

    switch (obj)
    {
        case int j:
            Console.WriteLine($"Integer: {j}");
            break;
        case string s:
            Console.WriteLine($"String: {s}");
            break;
        default:
            Console.WriteLine("Unknown type.");
            break;
    }

9. Extension Methods

Keyword: this (in method signature)

Extension methods allow developers to add new methods to existing types without changing their original code. These methods are static but work like instance methods of the extended type, offering a smooth way to add new functionality.
Extension methods make code more modular and reusable, giving developers the ability to extend types from outside libraries without touching the original code. Extension methods also support the Open/Closed principle, which means code is open to extension but closed to modification.

Example

C#

    public static class StringExtensions
    {
        public static bool IsNullOrEmpty(this string value)
        {
            return string.IsNullOrEmpty(value);
        }
    }

    // Usage
    string str = null;
    bool result = str.IsNullOrEmpty(); // result is true

10. Tuples

Syntax: (T1, T2)

Tuples let you group multiple values into one single unit. They help when you want to return more than one value from a method without using out parameters or creating a new class only for the purpose of transferring data between objects. With tuples, you can package and return a set of related values, which makes the code easier to read and understand. You can give names to the fields in a tuple or leave them unnamed. Unnamed fields are referred to as Item1, Item2, and so on, as shown below.

Example

C#

    public (int, string) GetPerson()
    {
        return (1, "John Doe");
    }

    // Usage
    var person = GetPerson();
    Console.WriteLine($"ID: {person.Item1}, Name: {person.Item2}");

Conclusion

Together, these features help you write code that is easier to manage and less likely to break:

  • async/await to handle long-running tasks without blocking
  • LINQ to query data
  • Properties to keep data safe
  • Generics to reuse code while keeping types correct
  • Delegates and events for programs that react to events
  • Lambda expressions to write short functions
  • Nullable types to deal with missing information
  • Pattern matching to make code clearer and more expressive
  • Extension methods to add new features to existing types
  • Tuples to organize related data

When you get good at using these features, you'll be able to build responsive, scalable, and top-notch applications. Happy Coding!!!
Go: Unit and Integration Tests
By Suleiman Dibirov, DZone Core
Kotlin Coroutines and OpenTelemetry Tracing
By Nicolas Fränkel, DZone Core
Order in Chaos: Python Configuration Management for Enterprise Applications
By Prince Bose
Default Map Value

In this post, I'll explain how to provide a default value when querying an absent key in a hash map in different programming languages.

Java

Let's start with Java, my first professional programming language. In older versions, retrieving a value from a map required using the get() method:

Java

    Map map = new HashMap();                 //1
    Object value = map.get(new Object());    //2
    if (value == null) {
        value = "default";                   //3
    }

  1. Initialize an empty map.
  2. Attempt to retrieve a non-existent key.
  3. Assign a default value if the key is absent.

With Java 8, the Map interface introduced a more concise way to handle absent keys (the example also uses var, which requires Java 10 or later):

Java

    var map = new HashMap<Object, String>();
    var value = map.getOrDefault(new Object(), "default");    //1

  1. Retrieve the value with a default in one step.

Kotlin

Kotlin provides several approaches to retrieve values from a map:

  • get() and getOrDefault() function just like their Java counterparts.
  • getValue() throws an exception if the key is missing.
  • getOrElse() accepts a lambda to provide a default value lazily.

Kotlin

    val map = mapOf<Any, String>()
    val default = map.getOrDefault("absent", "default")         //1
    val lazyDefault = map.getOrElse("absent") { "default" }     //2

  1. Retrieve the default value.
  2. Lazily evaluate the default value.

Python

Python is less forgiving than Java when handling absent keys: it raises a KeyError.

Python

    map = {}
    value = map['absent']    #1

  1. Raises a KeyError

To avoid this, Python offers the get() method:

Python

    map = {}
    value = map.get('absent', 'default')

Alternatively, Python's collections.defaultdict allows setting a default for all absent keys:

Python

    from collections import defaultdict
    map = defaultdict(lambda: 'default')    #1
    value = map['absent']

  1. Automatically provide a default value for any absent key.

Ruby

Ruby's default behavior returns nil for absent keys:

Ruby

    map = {}
    value = map['absent']

For a default value, use the fetch method:

Ruby

    map = {}
    value = map.fetch('absent', 'default')    #1

  1. Provide a default value for the absent key.
Ruby also supports a more flexible approach with closures:

Ruby

    map = {}
    value = map.fetch('absent') { |key| key }    #1

  1. Return the queried key instead of a constant.

Lua

My experience with Lua is relatively new, having picked it up for Apache APISIX. Let's start with Lua's map syntax:

Lua

    map = {}                       --1
    map["a"] = "A"
    map["b"] = "B"
    map["c"] = "C"
    for k, v in pairs(map) do      --2
        print(k, v)                --3
    end

  1. Initialize a new map.
  2. Iterate over key-value pairs.
  3. Print each key-value pair.

Fun fact: the syntax for arrays is the same as for maps:

Lua

    table = {}                     --1
    table[0] = "zero"
    table[1] = "one"
    table[2] = "two"
    for k,v in ipairs(table) do    --2
        print(k, v)                --3
    end

  1. Initialize a new array.
  2. Loop over the index-value pairs.
  3. Print the following:

    1 one
    2 two

Lua arrays start at index 1, so ipairs skips the entry stored at index 0!

We can mix and match indices and keys. The syntax is similar because there's no difference between an array and a map. Indeed, Lua calls the data structure a table:

Lua

    something = {}
    something["a"] = "A"
    something[1] = "one"
    something["b"] = "B"
    for k,v in pairs(something) do
        print(k, v)
    end

The result is the following (the iteration order of pairs is not guaranteed):

    1 one
    a A
    b B

In Lua, absent keys return nil by default:

Lua

    map = {}
    value = map['absent']

To provide a default, Lua uses metatables and the __index metamethod:

    "Metatables allow us to change the behavior of a table. For instance, using metatables, we can define how Lua computes the expression a+b, where a and b are tables. Whenever Lua tries to add two tables, it checks whether either of them has a metatable and whether that metatable has an __add field. If Lua finds this field, it calls the corresponding value (the so-called metamethod, which should be a function) to compute the sum."
    - Metatables and Metamethods

Each table in Lua may have its own metatable.

    "As I said earlier, when we access an absent field in a table, the result is nil. This is true, but it is not the whole truth. Such access triggers the interpreter to look for an __index metamethod: if there is no such method, as usually happens, then the access results in nil; otherwise, the metamethod will provide the result."
    - The __index Metamethod

Here's how to use it:

Lua

    table = {}                            --1
    mt = {}                               --2
    setmetatable(table, mt)               --3
    mt.__index = function (table, key)    --4
        return key
    end
    default = table['absent']             --5

  1. Create the table.
  2. Create a metatable.
  3. Associate the metatable with the table.
  4. Define the __index function to return the absent key.
  5. The __index function is called because the key is absent.

Summary

This post explored how to provide default values when querying absent keys across various programming languages. Here's a quick summary:

    Programming language | Scope: Per call | Scope: Per map | Value: Static | Value: Lazy
    Java                 | ❎              | ❌             | ❎            | ❌
    Kotlin               | ❎              | ❌             | ❎            | ❎
    Python               | ❎              | ❎             | ❌            | ❎
    Ruby                 | ❎              | ❌             | ❎            | ❎
    Lua                  | ❌              | ❎             | ❎            | ❌
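As a footnote to the Python and Lua sections, the closest Python counterpart to Lua's __index function is the __missing__ hook, which dict calls on subclasses when a key is absent. The sketch below is only an illustration: the KeyEcho class and the variable names are hypothetical and do not come from the post. It returns the queried key, as the Ruby and Lua examples do, and contrasts that with defaultdict.

```python
from collections import defaultdict


class KeyEcho(dict):
    """Hypothetical dict subclass: an absent key evaluates to the key itself."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        return key


echo = KeyEcho(a="A")
present = echo["a"]          # the stored value
fallback = echo["absent"]    # computed by __missing__, not stored in the dict

defaults = defaultdict(lambda: "default")
stored = defaults["absent"]  # defaultdict stores the default it returns

print(present, fallback, "absent" in echo, "absent" in defaults)
```

One design difference worth noticing: defaultdict inserts the default into the map on first access, while a __missing__ method written like the one above leaves the map unchanged. Also, get() bypasses __missing__, so echo.get("absent") still returns None.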

By Nicolas Fränkel, DZone Core
The Chaos of Mismatched Ord and PartialOrd Implementations in Rust's BTreeSet

Rust is known for its robust type system and powerful trait-based abstractions, which allow developers to write safe, efficient, and expressive code. BTreeSet in Rust is a powerful data structure for maintaining a sorted collection of unique elements. It guarantees O(log n) insertion, deletion, and lookup times while keeping the elements in a well-defined order. However, when the Ord and PartialOrd trait implementations for a type differ, it can lead to unpredictable and chaotic behavior. This article explores this subtle pitfall using a practical example.

Understanding Ord and PartialOrd

The Ord Trait

The Ord trait in Rust enforces a total order on elements. It's used by collections like BTreeSet to maintain a consistent ordering. When you implement Ord for a type, you're defining a complete ordering, which ensures that any two elements can be compared, and the ordering will always make sense.

The PartialOrd Trait

PartialOrd allows for partial ordering, meaning that not all pairs of elements need to be comparable. It's less strict than Ord, but in practice, many types that implement PartialOrd also implement Ord. Problems arise when these two implementations do not align, especially in data structures that rely on consistent ordering.
The Chaos Example

To demonstrate the issue, let's consider a custom struct Chaos and implement both Ord and PartialOrd for it, but with different logic:

    use std::collections::BTreeSet;

    #[derive(Debug, Eq, Hash, Copy, Clone)]
    struct Chaos(i32);

    impl PartialOrd for Chaos {
        fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
            Some(self.0.cmp(&other.0).reverse()) // Reverse order for PartialOrd
        }
    }

    impl Ord for Chaos {
        fn cmp(&self, other: &Self) -> std::cmp::Ordering {
            self.0.cmp(&other.0) // Normal order for Ord
        }
    }

    impl PartialEq for Chaos {
        fn eq(&self, other: &Self) -> bool {
            self.0 == other.0
        }
    }

    fn main() {
        let mut set = BTreeSet::from([Chaos(1), Chaos(2), Chaos(3), Chaos(4)]);
        println!("Before insertion {:?}", set);
        set.insert(Chaos(0));
        set.insert(Chaos(5));
        println!("After insertion {:?}", set);
    }

In this code, the Chaos struct has a simple integer as its sole field. However, the PartialOrd and Ord implementations are deliberately different:

  • PartialOrd sorts the elements in descending order (reversed).
  • Ord sorts the elements in ascending order (normal).

Analyzing the Output

When running the above code, the output is as follows:

    ❯ cargo run .
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
         Running `target/debug/chaos .`
    Before insertion {Chaos(4), Chaos(3), Chaos(2), Chaos(1)}
    After insertion {Chaos(0), Chaos(4), Chaos(3), Chaos(2), Chaos(1), Chaos(5)}

Initial State

Before inserting any new elements, the set is initialized with the elements {Chaos(1), Chaos(2), Chaos(3), Chaos(4)}. Because the initialization uses PartialOrd, the elements are sorted in descending order:

    {Chaos(4), Chaos(3), Chaos(2), Chaos(1)}

After Insertion

When new elements (Chaos(0) and Chaos(5)) are inserted, the BTreeSet uses the Ord trait to maintain the order.
Since Ord sorts in ascending order, the set is now partially sorted in descending order (from initialization) and partially in ascending order (from insertion):

    {Chaos(0), Chaos(4), Chaos(3), Chaos(2), Chaos(1), Chaos(5)}

This is clearly chaotic and defies the expectations one might have for the behavior of a BTreeSet.

Why This Matters: Real-World Implications

In a real-world scenario, this mismatch between Ord and PartialOrd can lead to bugs that are hard to diagnose. For example, if your type's sorting logic is critical for the correctness of your program, this inconsistency can lead to subtle errors that are only discovered much later, perhaps even in production.

Best Practices

When implementing Ord and PartialOrd for a type in Rust, it's essential to ensure consistency and avoid unnecessary complexity. By following these best practices, you can reduce the risk of bugs and keep the code clean and maintainable.

1. DRY: Reuse Logic to Ensure Consistency

To avoid duplicating logic and ensure consistency between Ord and PartialOrd, implement cmp using the partial_cmp method. This approach not only adheres to the DRY principle but also guarantees that both traits share the same underlying comparison logic.

    impl PartialOrd for Chaos {
        fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
            Some(self.0.cmp(&other.0).reverse()) // Reverse order, now shared by both traits
        }
    }

    impl Ord for Chaos {
        fn cmp(&self, other: &Self) -> std::cmp::Ordering {
            // Delegate to partial_cmp so there is a single source of comparison logic
            match self.partial_cmp(other) {
                Some(v) => v,
                None => std::cmp::Ordering::Greater,
            }
        }
    }

By centralizing the comparison logic, you reduce the likelihood of introducing discrepancies between Ord and PartialOrd, leading to more predictable and reliable behavior. Note that the standard library documentation recommends delegating in the opposite direction, implementing partial_cmp as Some(self.cmp(other)); either way, the point is to keep the comparison logic in one place.

2. Test for Consistency

After implementing Ord and PartialOrd, thoroughly test your type to ensure that it behaves consistently in all contexts.
Write tests that specifically check whether the ordering is maintained correctly when using both traits in data structures like BTreeSet.

Conclusion

The interplay between Ord and PartialOrd is a subtle aspect of Rust's type system, but one that can have significant consequences when not handled correctly. By understanding the potential pitfalls and following best practices, you can avoid the chaos that mismatched implementations can cause. Always ensure your ordering logic is consistent, and you'll be able to harness the full power of Rust's sorted collections without fear.

By Dursun Koç, DZone Core
REST and HTTP Semantics

Roy Fielding created REST as his doctorate dissertation. After reading it, I would boil it down to three basic elements: A document that describes object stateA transport mechanism to transmit the object state back and forth between systemsA set of operations to perform on the state While Roy was focused solely on HTTP, I don't see why another transport could not be used. Here are some examples: Mount a WebDAV share (WebDAV is an HTTP extension, so is still using HTTP). Copy a spreadsheet (.xls, .xlsx, .csv, .ods) into the mounted folder, where each row is the new/updated state. The act of copying into the share indicates the operation of upserting, the name of the file indicates the type of data, and the columns are the fields. The server responds with (document name)-status.(document suffix), which provides a key for each row, a status, and possibly an error message. In this case, it does not really make sense to request data.Use gRPC. The object transmitted is the document, HTTP is the transport, and the name of the remote method is the operation. Data can be both provided and requested.Use FTP. Similar to WebDAV, it is file-based. The PUT command is upserting, and the GET command is requesting. GET only provides a filename, so it generally provides all data of the specified type. It is possible to allow for special filenames that indicate a hard-coded filter to GET a subset of data. Whenever I see REST implementations in the wild, they often do not follow basic HTTP semantics, and I have never seen any explanation given for this, just a bunch of varying opinions. None of those I found referenced the RFC. Most seem to figure that: POST = CreatePUT = Update the whole documentPATCH = Update a portion of a documentGET = Retrieve the whole document This is counter to what HTTP states regarding POST and PUT: PUT is "create" or "update". GET generally returns whatever was last PUT. If PUT creates, it MUST return 201 Created. 
    If PUT updates, it MUST return 200 OK or 204 No Content. The RFC suggests the content for 200 OK of a PUT should be the status of the action. I think it would be OK in the case of SQL to return the new row from a SELECT statement. This has the advantage that any generated columns are returned to the caller without having to perform a separate GET.
  • POST processes a resource according to its own semantics. Older RFCs said POST is for subordinates of a resource. All versions give the example of posting an article to a mailing list; all versions say that if a resource is created, 201 Created SHOULD be returned.

I would argue that effectively what POST really means is:

  • Any data manipulation except create, full/partial update, or delete
  • Any operation that is not data manipulation, such as:
      • Perform a full-text search for rows that match a phrase.
      • Generate a GIS object to display on a map.

The word MUST means your implementation is only HTTP compliant if you do what is stated. Using PUT only for updates won't break anything just because it isn't RFC compliant. If you provide clients that handle all the details of sending/receiving data, then which verbs get used won't matter much to the user of the client.

I'm the kind of guy who wants a reason for not following the RFC. I have never understood the importance of separating create from update in REST APIs, any more than in web apps. Think about cell phone apps like calendar appointments, notes, contacts, etc.:

  • "Create" is hitting the plus icon, which displays a new form with empty or default values.
  • "Update" is selecting an object and hitting the pencil icon, which displays an entry form with current values.
  • Once the entry form appears, it works exactly the same in terms of field validations.

So why should REST APIs and web front ends be any different from cell phone apps? If it is helpful for phone users to get the same data entry form for "create" and "update," wouldn't it be just as helpful to API and web users?
If you decide to use PUT as "create" or "update", and you're using SQL as a store, most vendors have an upsert query of some sort. Unfortunately, that does not help to decide when to return 200 OK or 201 Created. You'd have to look at the information your driver provides when a DML query executes to find a way to distinguish insert from update for an upsert, or use another query strategy.

A simple example would be to perform an UPDATE ... SET ... WHERE pk column = pk value. If one row was affected, then the row exists and was updated; otherwise, the row does not exist and an insert is needed.

On Postgres, you can take advantage of the RETURNING clause, which can actually return anything, not just row data, as follows:

SQL
    INSERT INTO <table> VALUES (...)
    ON CONFLICT(<pk column>) DO UPDATE SET (...)
    RETURNING (SELECT COUNT(<pk column>) FROM <table> WHERE <pk column> = <pk value>) exists

The genius of this is that:

  • The subselect in the RETURNING clause is executed first, so it determines if the row exists before the INSERT ON CONFLICT UPDATE query executes. The result of the query is one column named "exists", which is 1 if the row existed before the query executed, 0 if it did not.
  • The RETURNING clause can also return the columns of the row, including anything generated that was not provided.

You only have to figure out once how to tell whether an insert or an update is needed, and then write a simple abstraction that all your PUTs can call and that handles 200 OK or 201 Created.

One nice benefit of using PUT as intended is that as soon as you see a POST you know for certain it is not retrieval or persistence, and conversely, you know to search for POST to find the code for any operation that is not retrieval or persistence.

I think the benefits of using PUT and POST as described in the RFC outweigh whatever reasons people have for using them in a way that is not RFC-compliant.
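The update-then-insert strategy mentioned earlier can be sketched in a few lines. This is only a minimal illustration in Python using the standard-library sqlite3 module rather than Postgres; the contacts table, its columns, and the put_row helper are hypothetical names invented for the example, and a real handler would also need transaction handling and validation.

```python
import sqlite3


def put_row(conn, pk, name):
    """Upsert one row and return the HTTP status a PUT handler could send.

    Hypothetical helper: tries an UPDATE first and falls back to INSERT.
    """
    cur = conn.execute("UPDATE contacts SET name = ? WHERE id = ?", (name, pk))
    if cur.rowcount == 1:
        return 200  # the row existed and was updated
    conn.execute("INSERT INTO contacts (id, name) VALUES (?, ?)", (pk, name))
    return 201  # the row did not exist and was created


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")

first = put_row(conn, 1, "Ada")            # no row yet, so it is created
second = put_row(conn, 1, "Ada Lovelace")  # row exists, so it is updated
stored = conn.execute("SELECT name FROM contacts WHERE id = 1").fetchone()[0]
print(first, second, stored)
```

Every PUT endpoint could call a helper of this shape, so the 200-versus-201 decision lives in one place.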

By Greg Hall
Understanding Concurrency Patterns in Go

Go, also known as Golang, has become a popular language for developing concurrent systems due to its simple yet powerful concurrency model. Concurrency is a first-class citizen in Go, making it easier to write programs that efficiently use multicore processors. This article explores essential concurrency patterns in Go, demonstrating how to leverage goroutines and channels to build efficient and maintainable concurrent applications.

The Basics of Concurrency in Go

Goroutines

A goroutine is a lightweight thread managed by the Go runtime. Goroutines are cheap to create and have a small memory footprint, allowing you to run thousands of them concurrently.

Go
    package main

    import (
        "fmt"
        "time"
    )

    func sayHello() {
        fmt.Println("Hello, Go!")
    }

    func main() {
        go sayHello()               // Start a new goroutine
        time.Sleep(1 * time.Second) // Crude wait so the goroutine can run before main exits
    }

Channels

Channels are Go's way of allowing goroutines to communicate with each other and synchronize their execution. You can send values from one goroutine to another through channels.

Go
    package main

    import "fmt"

    func main() {
        ch := make(chan string)

        go func() {
            ch <- "Hello from goroutine"
        }()

        msg := <-ch
        fmt.Println(msg)
    }

Don't communicate by sharing memory; share memory by communicating. (R. Pike)

Common Concurrency Patterns

Worker Pool

Purpose

To manage a fixed number of worker units (goroutines) that handle a potentially large number of tasks, optimizing resource usage and processing efficiency.

Use Cases

  • Task processing: Handling a large number of tasks (e.g., file processing, web requests) with a controlled number of workers to avoid overwhelming the system.
  • Concurrency management: Limiting the number of concurrent operations to prevent excessive resource consumption.
  • Job scheduling: Distributing and balancing workloads across a set of workers to maintain efficient processing.
Example

Go
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // Worker function processes jobs from the jobs channel and sends results to the results channel
    func worker(id int, jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
        defer wg.Done()
        for job := range jobs {
            // Simulate processing the job
            fmt.Printf("Worker %d processing job %d\n", id, job)
            time.Sleep(time.Second) // Simulate a time-consuming task
            results <- job * 2
        }
    }

    func main() {
        const numJobs = 15
        const numWorkers = 3

        jobs := make(chan int, numJobs)
        results := make(chan int, numJobs)
        var wg sync.WaitGroup

        // Start workers
        for w := 1; w <= numWorkers; w++ {
            wg.Add(1)
            go worker(w, jobs, results, &wg)
        }

        // Send jobs to the jobs channel
        for j := 1; j <= numJobs; j++ {
            jobs <- j
        }
        close(jobs)

        // Close the results channel once all workers have finished
        go func() {
            wg.Wait()
            close(results)
        }()

        // Collect and print results
        for result := range results {
            fmt.Println("Result:", result)
        }
    }

Fan-In

Purpose

To merge multiple input channels or data streams into a single output channel, consolidating results from various sources.

Use Cases

  • Log aggregation: Combining log entries from multiple sources into a single logging system for centralized analysis.
  • Data merging: Aggregating data from various producers into a single stream for further processing or analysis.
  • Event collection: Collecting events from multiple sources into one channel for unified handling.
Example

Go
    package main

    import (
        "fmt"
        "sync" // needed for sync.WaitGroup in merge
    )

    // Function to merge multiple channels into one
    func merge(channels ...<-chan int) <-chan int {
        var wg sync.WaitGroup
        merged := make(chan int)

        // Forward every value from one input channel to the merged channel
        output := func(c <-chan int) {
            defer wg.Done()
            for n := range c {
                merged <- n
            }
        }

        wg.Add(len(channels))
        for _, c := range channels {
            go output(c)
        }

        // Close the merged channel once all inputs are drained
        go func() {
            wg.Wait()
            close(merged)
        }()

        return merged
    }

    func worker(id int, jobs <-chan int) <-chan int {
        results := make(chan int)
        go func() {
            defer close(results)
            for job := range jobs {
                // Simulate processing
                fmt.Printf("Worker %d processing job %d\n", id, job)
                results <- job * 2
            }
        }()
        return results
    }

    func main() {
        const numJobs = 5
        jobs := make(chan int, numJobs)

        // Start workers and collect their result channels
        workerChannels := make([]<-chan int, 0, 3)
        for w := 1; w <= 3; w++ {
            workerChannels = append(workerChannels, worker(w, jobs))
        }

        // Send jobs
        for j := 1; j <= numJobs; j++ {
            jobs <- j
        }
        close(jobs)

        // Merge results
        results := merge(workerChannels...)

        // Collect and print results
        for result := range results {
            fmt.Println("Result:", result)
        }
    }

Fan-Out

Purpose

To distribute data or messages from a single source to multiple consumers, allowing each consumer to process the same data independently.

Use Cases

  • Broadcasting notifications: Sending notifications or updates to multiple subscribers or services simultaneously.
  • Data distribution: Delivering data to multiple components or services that each need to process or act upon the same information.
  • Event handling: Emitting events to various handlers that perform different actions based on the event.
Example

Go
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // Subscriber function simulates a subscriber receiving a notification
    func subscriber(id int, notification string, wg *sync.WaitGroup) {
        defer wg.Done()
        // Simulate processing the notification
        time.Sleep(time.Millisecond * 100) // Simulate some delay
        fmt.Printf("Subscriber %d received notification: %s\n", id, notification)
    }

    func main() {
        // List of subscribers (represented by IDs)
        subscribers := []int{1, 2, 3, 4, 5}
        notification := "Important update available!"

        var wg sync.WaitGroup

        // Broadcast notification to all subscribers concurrently
        for _, sub := range subscribers {
            wg.Add(1)
            go subscriber(sub, notification, &wg)
        }

        // Wait for all subscribers to receive the notification
        wg.Wait()
        fmt.Println("All subscribers have received the notification.")
    }

Generator

Purpose

To produce a sequence of data or events that can be consumed by other parts of a system.

Use Cases

  • Data streams: Generating a stream of data items, such as log entries or sensor readings, that are processed by other components.
  • Event emission: Emitting a series of events or notifications to be handled by event listeners or subscribers.
  • Data simulation: Creating simulated data for testing or demonstration purposes.

Example

Go
    package main

    import (
        "fmt"
    )

    // Generator function that produces integers
    func generator(start, end int) <-chan int {
        out := make(chan int)
        go func() {
            for i := start; i <= end; i++ {
                out <- i
            }
            close(out)
        }()
        return out
    }

    func main() {
        // Start the generator
        gen := generator(1, 10)

        // Consume the generated values
        for value := range gen {
            fmt.Println("Received:", value)
        }
    }

Pipeline

Purpose

To process data through a series of stages, where each stage transforms or processes the data before passing it to the next stage.
Use Cases

  • Data transformation: Applying a sequence of transformations to data, such as filtering, mapping, and reducing.
  • Stream processing: Handling data streams in a step-by-step manner, where each step performs a specific operation on the data.
  • Complex processing workflows: Breaking down complex processing tasks into manageable stages, such as data ingestion, transformation, and output.

Example

Go
    package main

    import (
        "fmt"
    )

    // generator emits the given numbers on a channel, then closes it
    func generator(nums ...int) <-chan int {
        out := make(chan int)
        go func() {
            for _, n := range nums {
                out <- n
            }
            close(out)
        }()
        return out
    }

    // sq reads numbers from in and emits their squares
    func sq(in <-chan int) <-chan int {
        out := make(chan int)
        go func() {
            for n := range in {
                out <- n * n
            }
            close(out)
        }()
        return out
    }

    func main() {
        c := generator(2, 3, 4)
        out := sq(c)

        for n := range out {
            fmt.Println(n)
        }
    }

Conclusion

Understanding and utilizing concurrency patterns in Go can significantly enhance the performance and efficiency of your applications. The language's built-in support for goroutines and channels simplifies the process of managing concurrent execution, making it an excellent choice for developing high-performance systems. By mastering these patterns, you can fully utilize Go's concurrency model to build robust, scalable applications.

By Suleiman Dibirov
Applying the Pareto Principle To Learn a New Programming Language

In this article, I will discuss how you can apply the Pareto principle to quickly learn a new programming language and start solving real-world problems while you develop a deeper understanding of the language.

What Is the Pareto Principle?

The Pareto principle, also known as the 80/20 rule, states that for many outcomes, roughly 80% of consequences come from 20% of causes. Applied at a personal level, 80% of your work-related output could come from only 20% of your time. I first came to know about this principle after reading the book "The 80/20 Principle: The Secret to Achieving More with Less" by Richard Koch.

How to Apply the Pareto Principle to Quickly Learn a New Programming Language

When I first started to learn programming, I used inefficient methods. I watched hours and hours of video courses and read books, trying to master every concept in the language before attempting to solve any real-world problems. By doing this, I was losing motivation to continue learning. Over time, I realized that this is not an efficient way to learn a new skill. Learning about the 80/20 rule made me realize that by learning around 20% of the concepts in a programming language, I could solve 80% of the problems.

Twice, I needed to learn a new programming language in a short period of time. The first time, the language I was using at work was not well suited to technical interviews, and I wanted to switch to a different language for solving interview problems. The second time, I joined a team that used a language I had never used before.

I used the following four-step approach, which made it efficient to learn the new language while keeping me motivated to increase my skill level:

  • Step 1: Identify key concepts of the programming language.
    Identify key concepts such as data structures, flow control statements, functions, classes, etc.
  • Step 2: Spend 20% of your effort to learn these key concepts. Pick up a book or a course, and focus on learning only the key concepts identified in Step 1.
  • Step 3: Solve some real-life problems using these concepts. Depending on the purpose of learning, pick some real-life problems and try to solve them using the concepts that you learned in the two steps above. For example, if you are planning to do technical interviews, try to solve some problems from websites like LeetCode or HackerRank.
  • Step 4: Learn additional concepts as you encounter them. If you are stuck on a problem, search for how to solve it and learn the additional advanced concepts as you encounter them.

What Are Some Important Programming Concepts?

As an example, let's look at some of the core concepts of Python that can be quickly learned before attempting to solve some problems using Python:

  • Data structures: Review important available data structures such as strings, lists, tuples, dictionaries, and sets.
  • Loops: Python offers two types of loops: the "for" loop and the "while" loop. Also, understand how to use continue and break statements within the loops.
  • Conditional statements: Understand how to use conditional statements such as if, else, and elif.
  • Logical operators: Learn logical operators such as and, or, not, etc.
  • Functions: Learn how to define functions, pass arguments to the functions, and return values from the functions.
  • Classes: Learn how to create and use classes.
  • Important built-in functions: Try to learn important built-in functions such as range(), format(), max(), min(), len(), type(), sorted(), print(), round(), etc.
  • Other concepts: Lambdas, list comprehensions

Conclusion

Learning a new programming language may look daunting, but the Pareto principle makes it easier. Spend roughly 20% of the time mastering the important concepts, such as data structures, loops, conditional statements, functions, and classes, and then apply that knowledge to solve 80% of real-life problems.
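To make the list of core Python concepts a little more concrete, here is a small sketch that touches most of them in one place: a list and a dictionary, a for loop with continue, a conditional, a class with methods, a lambda, a list comprehension, and a few built-ins. The WordStats class and its sample sentence are hypothetical, chosen only for illustration.

```python
class WordStats:
    """Hypothetical example class: counts the words in a piece of text."""

    def __init__(self, text):
        # List comprehension: normalize each word (strip punctuation, lowercase)
        self.words = [w.strip(".,!?").lower() for w in text.split()]
        # Dictionary mapping each word to how often it occurs
        self.counts = {}
        for word in self.words:      # for loop
            if not word:             # conditional with a logical operator
                continue             # skip empty strings
            self.counts[word] = self.counts.get(word, 0) + 1

    def top(self, n=3):
        """Return the n most frequent words (ties broken alphabetically)."""
        # sorted() with a lambda as the sort key
        return sorted(self.counts, key=lambda w: (-self.counts[w], w))[:n]


stats = WordStats("The cat and the hat. The end!")
print(stats.top(2))                                   # ['the', 'and']
print(len(stats.words), max(stats.counts.values()))   # 7 3
```

Writing one small program like this and then extending it is usually enough to show which additional concepts are worth learning next.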

By Krishna Vinnakota
Using Zero-Width Assertions in Regular Expressions

Anchors
^ $ \b \A \Z

Anchors in regular expressions allow you to specify the context in a string where your pattern should be matched. There are several types of anchors:

  • ^ matches the start of a line (in multiline mode) or the start of the string (by default).
  • $ matches the end of a line (in multiline mode) or the end of the string (by default).
  • \A matches the start of the string.
  • \Z or \z matches the end of the string.
  • \b matches a word boundary (before the first letter of a word or after the last letter of a word).
  • \B matches a position that is not a word boundary (between two letters or between two non-letter characters).

These anchors are supported in Java, PHP, Python, Ruby, C#, and Go. In JavaScript, \A and \Z are not supported, but you can use ^ and $ instead of them; just remember to keep the multiline mode disabled.

For example, the regular expression ^abc will match the start of a string that contains the letters "abc". In multiline mode, the same regex will match these letters at the beginning of a line. You can use anchors in combination with other regular expression elements to create more complex matches. For example, ^From: (.*) matches a line starting with From:.

The difference between \Z and \z is that \Z matches at the end of the string but also skips a possible newline character at the end. In contrast, \z is more strict and matches only at the end of the string.

If you have read the previous article, you may wonder if the anchors add any additional capabilities that are not supported by the three primitives (alternation, parentheses, and the star for repetition). The answer is that they do not, but they change what is captured by the regular expression. You can match a line starting with abc by explicitly adding the newline character: \nabc, but in this case, you will also match the newline character itself. When you use ^abc, the newline character is not consumed.

In a similar way, ing\b matches all words ending with ing.
You can replace the anchor with a character class containing non-letter characters (such as spaces or punctuation): ing\W, but in this case, the regular expression will also consume the space or punctuation character.

If the regular expression starts with ^ so that it only matches at the start of the string, it's called anchored. In some programming languages, you can do an anchored match instead of a non-anchored search without using ^. For example, in PHP (PCRE), you can use the A modifier.

So the anchors don't add any new capabilities to the regular expressions, but they allow you to manage which characters will be included in the match or to match only at the beginning or end of the string. The matched language is still regular.

Zero-Width Assertions
(?= ) (?! ) (?<= ) (?<! )

Zero-width assertions (also called lookahead and lookbehind assertions) allow you to check that a pattern occurs in the subject string without capturing any of the characters. This can be useful when you want to check for a pattern without moving the match pointer forward.

There are four types of lookaround assertions:

  • (?=abc)   The next characters are “abc” (a positive lookahead)
  • (?!abc)   The next characters are not “abc” (a negative lookahead)
  • (?<=abc)  The previous characters are “abc” (a positive lookbehind)
  • (?<!abc)  The previous characters are not “abc” (a negative lookbehind)

Zero-width assertions are generalized anchors. Just like anchors, they don't consume any character from the input string. Unlike anchors, they allow you to check anything, not only line boundaries or word boundaries. So you can replace an anchor with a zero-width assertion, but not vice versa. For example, ing\b could be rewritten as ing(?=\W|$).

Zero-width lookahead and lookbehind are supported in PHP, JavaScript, Python, Java, and Ruby. Unfortunately, they are not supported in Go.
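The difference between an anchor, a consuming character class, and a lookahead is easy to see by running all three on the same input. The following is a small sketch using Python's standard re module; the sample string is made up for illustration.

```python
import re

text = "sing, ringing"

# \b is zero-width: only "ing" is part of each match.
with_anchor = re.findall(r"ing\b", text)

# \W consumes the character after "ing", so the comma ends up in the match,
# and the final "ing" is missed because no character follows it.
with_class = re.findall(r"ing\W", text)

# The lookahead checks for a non-word character or the end of the string
# without consuming anything, so it behaves like the anchor.
with_lookahead = re.findall(r"ing(?=\W|$)", text)

print(with_anchor)     # ['ing', 'ing']
print(with_class)      # ['ing,']
print(with_lookahead)  # ['ing', 'ing']
```

Note that the character-class version also loses the match at the end of the string, which is why the lookahead rewrite needs the |$ alternative.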
Just like anchors, zero-width assertions still match a regular language, so from a theoretical point of view, they don't add anything new to the capabilities of regular expressions. They just make it possible to skip certain things from the captured string, so you only check for their presence but don't consume them.

Checking Strings After and Before the Expression

The positive lookahead checks that there is a subexpression after the current position. For example, you need to find all div selectors with the footer ID and remove the div part:

| Search for | Replace to | Explanation |
|---|---|---|
| div(?=#footer) |  | “div” followed by “#footer” |

(?=#footer) checks that there is the #footer string here, but does not consume it. In div#footer, only div will match. A lookahead is zero-width, just like the anchors. In div#header, nothing will match, because the lookahead assertion fails.

Of course, this can be solved without any lookahead:

| Search for | Replace to | Explanation |
|---|---|---|
| div#footer | #footer | A simpler equivalent |

Generally, any lookahead after the expression can be rewritten by copying the lookahead text into a replacement or by using backreferences.

In a similar way, a positive lookbehind checks that there is a subexpression before the current position:

| Search for | Replace to | Explanation |
|---|---|---|
| (?<=<a href=")news/ | blog/ | Replace “news/” preceded by “<a href="” with “blog/” |
| <a href="news/ | <a href="blog/ | The same replacement without lookbehind |

The positive lookahead and lookbehind lead to a shorter regex, but you can do without them in this case. However, these were just basic examples. In some of the following regular expressions, the lookaround will be indispensable.

Testing the Same Characters for Multiple Conditions

Sometimes you need to test a string for several conditions. For example, you want to find a consonant without listing all of them. It may seem simple at first: [^aeiouy]

However, this regular expression also finds spaces and punctuation marks, because it matches anything except a vowel.
And you want to match any letter except a vowel. So you also need to check that the character is a letter.

| Regex | Explanation |
|---|---|
| (?=[a-z])[^aeiouy] | A consonant |
| [bcdfghjklmnpqrstvwxz] | Without lookahead |

There are two conditions applied to the same character here: After (?=[a-z]) is checked, the current position is moved back because a lookahead has a width of zero: it does not consume characters, but only checks them. Then, [^aeiouy] matches (and consumes) one character that is not a vowel. For example, it could be H in HTML.

The order is important: the regex [^aeiouy](?=[a-z]) will match a character that is not a vowel, followed by any letter. Clearly, it's not what is needed.

This technique is not limited to testing one character for two conditions; there can be any number of conditions of different lengths:

| Regex | Explanation |
|---|---|
| border:(?=[^;}]*\<solid\>)(?=[^;}]*\<red\>)(?=[^;}]*\<1px\>)[^;}]* | Find a CSS declaration that contains the words solid, red, and 1px in any order. |

This regex has three lookahead conditions. In each of them, [^;}]* skips any number of any characters except ; and } before the word. After the first lookahead, the current position is moved back and the second word is checked, etc.

The anchors \< and \> check that the whole word matches. Without them, 1px would match in 21px.

The last [^;}]* consumes the CSS declaration (the previous lookaheads only checked the presence of words, but didn't consume anything).

This regular expression matches {border: 1px solid red}, {border: red 1px solid;}, and {border:solid green 1px red} (different order of words; green is inserted), but doesn't match {border:red solid} (1px is missing).
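Both multi-condition patterns can be tried out with Python's re module. This is a sketch under two assumptions: Python does not support the \< and \> word anchors, so \b stands in for both, and the IGNORECASE flag is used so that [a-z] also accepts uppercase letters such as the H in HTML. The sample strings are the ones discussed in the text, plus one extra 21px case.

```python
import re

# Two conditions on one character: it is a letter AND it is not a vowel.
consonant = re.compile(r"(?=[a-z])[^aeiouy]", re.IGNORECASE)
consonants = consonant.findall("HTML is fun!")
print(consonants)  # ['H', 'T', 'M', 'L', 's', 'f', 'n']

# Three lookaheads check for the words in any order; the final [^;}]*
# is the only part that actually consumes the declaration.
border = re.compile(
    r"border:(?=[^;}]*\bsolid\b)(?=[^;}]*\bred\b)(?=[^;}]*\b1px\b)[^;}]*"
)

samples = [
    "{border: 1px solid red}",
    "{border: red 1px solid;}",
    "{border:solid green 1px red}",
    "{border:red solid}",        # 1px is missing
    "{border: 21px solid red}",  # 21px must not count as 1px
]
matches = []
for css in samples:
    m = border.search(css)
    matches.append(m.group(0) if m else None)
    print(css, "->", matches[-1])
```

The first three samples match and the last two do not, which mirrors the behavior described for the \< and \> version.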
Simulating Overlapped Matches

If you need to remove repeating words (e.g., replace the the with just the), you can do it in two ways, with and without lookahead:

| Search for | Replace to | Explanation |
|---|---|---|
| \<(\w+)\s+(?=\1\>) |  | Replace the first of repeating words with an empty string |
| \<(\w+)\s+\1\> | \1 | Replace two repeating words with the first word |

The regex with lookahead works like this: the first parentheses capture the first word; the lookahead checks that the next word is the same as the first one.

The two regular expressions look similar, but there is an important difference. When replacing 3 or more repeating words, only the regex with lookahead works correctly. The regex without lookahead replaces every two words. After replacing the first two words, it moves to the next two words because the matches cannot overlap.

However, you can simulate overlapped matches with lookaround. The lookahead will check that the second word is the same as the first one. Then, the second word will be matched against the third one, etc. Every word that has the same word after it will be replaced with an empty string.

The correct regex without lookahead is \<(\w+)(\s+\1)+\>. It matches any number of repeating words (not just two of them).

Checking Negative Conditions

The negative lookahead checks that the next characters do NOT match the expression in parentheses. Just like a positive lookahead, it does not consume the characters. For example, (?!toves) checks that the next characters are not “toves” without including them in the match.

| Regex | Explanation |
|---|---|
| <\?(?!php) | “<?” without “php” after it |

This pattern will match <? in <?echo 'text'?> or in <?xml.

Another example is an anagram search. To find anagrams for “mate”, check that the first character is one of M, A, T, or E. Then, check that the second character is one of these letters and is not equal to the first character. After that, check the third character, which has to be different from the first and the second one, etc.
| Regex | Explanation |
|---|---|
| \<([mate])(?!\1)([mate])(?!\1)(?!\2)([mate])(?!\1)(?!\2)(?!\3)([mate])\> | Anagram for “mate” |

The sequence (?!\1)(?!\2) checks that the next character is not equal to the first subexpression and is not equal to the second subexpression.

The anagrams for “mate” include meat, team, and tame. Certainly, there are special tools for anagram search, which are faster and easier to use.

A lookbehind can be negative, too, so it's possible to check that the previous characters do NOT match some expression:

| Regex | Explanation |
|---|---|
| \w+(?<!ing)\b | A word that does not end with “ing” (the negative lookbehind) |

In most regex engines, a lookbehind must have a fixed length: you can use character lists and classes ([a-z] or \w), but not repetitions such as * or +. Aba is free from this limitation. You can go back by any number of characters; for example, you can find files not containing a word and insert some text at the end of such files.

| Search for | Replace to | Explanation |
|---|---|---|
| (?<!Table of contents.*)$$ | <a href="/toc">Contents</a> | Insert the link to the end of each file not containing the words “Table of contents” |
| ^^(?!.*Table of contents) | <a href="/toc">Contents</a> | Insert it to the beginning of each file not containing the words |

However, you should be careful with this feature because an unlimited-length lookbehind can be slow.

Controlling Backtracking

A lookahead and a lookbehind do not backtrack; that is, when they have found a match and another part of the regular expression fails, they don't try to find another match. It's usually not important, because lookaround expressions are zero-width. They consume nothing and don't move the current position, so you cannot see which part of the string they match.

However, you can extract the matching text if you use a subexpression inside the lookaround.
For example:

| Search for | Replace to | Explanation |
|---|---|---|
| (?=\<(\w+)) | \1 | Repeat each word |

Since lookarounds don't backtrack, this regular expression never matches:

| Regex | Explanation |
|---|---|
| (?=(\N*))\1\N | A regex that doesn't backtrack and always fails |
| \N*\N | A regex that backtracks and succeeds on non-empty lines |

The subexpression (\N*) matches the whole line. \1 consumes the previously matched subexpression and \N tries to match the next character. It always fails because the next character is a newline.

A similar regex without lookahead succeeds because when the engine finds that the next character is a newline, \N* backtracks. At first, it has consumed the whole line (“greedy” match), but now it tries to match fewer characters. And it succeeds when \N* matches all but the last character of the line and \N matches the last character.

It's possible to prevent excessive backtracking with a lookaround, but it's easier to use atomic groups for that.

In a negative lookaround, subexpressions are meaningless because if a regex succeeds, negative lookarounds in it must fail. So, the subexpressions are always equal to an empty string. It's recommended to use a non-capturing group instead of the usual parentheses in a negative lookaround.

| Regex | Explanation |
|---|---|
| (?!(a))\1 | A regex that always fails: (not A) and A |
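The repeated-word and negative-lookaround examples from the earlier sections can be checked with a short script. This is a sketch in Python's re module, which does not support \< and \>, so \b is used in their place as an assumption; the sample strings are invented for illustration.

```python
import re

text = "the the the cat"

# Without lookahead, matches cannot overlap: one duplicate survives.
no_lookahead = re.sub(r"\b(\w+)\s+\1\b", r"\1", text)

# With lookahead, each word followed by the same word is removed.
with_lookahead = re.sub(r"\b(\w+)\s+(?=\1\b)", "", text)

# The lookahead-free fix: match the whole run of repeats at once.
repeated_group = re.sub(r"\b(\w+)(\s+\1)+\b", r"\1", text)

print(no_lookahead)    # the the cat
print(with_lookahead)  # the cat
print(repeated_group)  # the cat

# Negative lookahead: "<?" not followed by "php".
open_tags = re.findall(r"<\?(?!php)", "<?php a ?> <?xml b ?> <?echo 'c'?>")
print(len(open_tags))  # 2

# Negative lookbehind: words that do not end with "ing".
not_ing = re.findall(r"\w+(?<!ing)\b", "singing a song")
print(not_ing)         # ['a', 'song']
```

The fixed-length lookbehind (?<!ing) works in Python, but the unlimited-length variants shown above would be rejected by the re module.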

By Peter Kankowski
Linting Excellence: How Black, isort, and Ruff Elevate Python Code Quality

Linting and Its Importance

Q: Can linting make my code better?
A: No. If your logic is not good enough, it cannot help you, but it can surely make it look prettier.

Linting is the process of analyzing code to identify potential errors, code quality issues, and deviations from coding standards. It is a crucial part of modern software development for several reasons:

  • Error detection: Linting helps catch bugs and errors early in the development process.
  • Code quality: It enforces coding standards, making code more readable and maintainable.
  • Consistency: Ensures a uniform coding style across the codebase, which is particularly important in collaborative projects
  • Efficiency: Reduces the time spent on code reviews by automatically checking for common issues

Available Tools for Linting and Formatting

Several tools are available for linting and formatting Python code. Among them, the most popular are Black, Ruff, isort, Pylint, and Flake8, to name a few. Each tool has its own strengths and weaknesses and serves a specific purpose. In this article, we will look at Black, Ruff, and isort.

A Glorious Example of How Not to Code

Before diving into the comparison, let's take a look at a sample of poorly written Python code. This will help us illustrate the differences and capabilities of Black, Ruff, and isort.

Python
    import datetime
    from io import BytesIO
    from datetime import datetime
    from __future__ import unicode_literals
    import os, sys, time
    from base64 import b64encode
    from PIL import Image, ImageDraw, Image
    from flask import Flask, request, redirect, url_for, send_file
    from werkzeug.utils import secure_filename

    numbers = [1, 2, 4,5,6, ]
    MyClass.function(arg1, arg2, arg3, flag, option)
    def my_func(some_data: list, *args, path: os.PathLike, name: str, verbosity: bool = True, quiet: bool = False):
        """Processes `data` using `args` and saves to `path`."""
        with open(path, 'a') as file:
            ...
        if first_condititon \
            and second_condition:
            ...
Black

Features

Black performs in-place code style changes with a prime focus on the following:

  • Opinionated (e.g., spaces over tabs)
  • PEP 8 compliance [See Pragmatism]
  • Smallest possible diff
  • Stability: Black has minimal to no configuration parameters, to ensure code style consistency.
  • Post-processing AST checks to ensure no change in logic. Optionally, you can turn this off by using the --fast option.

Installation

Install Black by running this command:

    pip install black

Example Usage

    black [options] <SOURCE_FOLDER-or-FILE>

See black --help for more details.

How Did It Perform?

Python
    import datetime
    from io import BytesIO
    from datetime import datetime
    from __future__ import unicode_literals
    import os, sys, time
    from base64 import b64encode
    from PIL import Image, ImageDraw, Image
    from flask import Flask, request, redirect, url_for, send_file
    from werkzeug.utils import secure_filename

    numbers = [
        1,
        2,
        4,
        5,
        6,
    ]
    MyClass.function(arg1, arg2, arg3, flag, option)


    def my_func(
        some_data: list,
        *args,
        path: os.PathLike,
        name: str,
        verbosity: bool = True,
        quiet: bool = False
    ):
        """Processes `data` using `args` and saves to `path`."""
        with open(path, "a") as file:
            ...
        if first_condititon and second_condition:
            ...

P.S. Notice how it did not sort/format the imports.

isort

Features

isort prioritizes import organization with a primary focus on:

  • Sorting: Sorts the imports alphabetically
  • Sections: Groups the imports into sections and by type
  • Multi-line imports: Arranges the multi-line imports into a balanced grid
  • Add/Remove imports: isort can be run or configured to add/remove imports automatically.

Installation

Install isort by running this command:

    pip install isort

Example Usage

    isort [OPTIONS] <SOURCE_FOLDER-or-FILE>

See isort --help for more details.

How Did It Perform?
Python

from __future__ import unicode_literals

import datetime
import os
import sys
import time
from base64 import b64encode
from datetime import datetime
from io import BytesIO

from flask import Flask, redirect, request, send_file, url_for
from PIL import Image, ImageDraw
from werkzeug.utils import secure_filename

numbers = [1, 2, 4,5,6, ]

MyClass.function(arg1, arg2, arg3, flag, option)

def my_func(some_data: list, *args, path: os.PathLike, name: str, verbosity: bool = True, quiet: bool = False):
    """Processes `data` using `args` and saves to `path`."""
    with open(path, 'a') as file:
        ...

if first_condititon \
    and second_condition:
    ...

P.S. Notice how the code was not formatted.

Ruff

Features

Ruff performs comprehensive linting and autofixes, adding type hints, and ensuring code quality and consistency.

• Linting: Performs a wide range of linting checks
• Autofix: Can automatically fix many issues
• Integration: Easy to integrate with other tools such as isort
• Configuration: Supports configuration via pyproject.toml or command-line flags.

Installation

Install Ruff by running this command:

pip install ruff

Example Usage

For linting:

ruff check [OPTIONS] <SOURCE_FOLDER-or-FILE>

For formatting:

ruff format [OPTIONS] <SOURCE_FOLDER-or-FILE>

See ruff --help for more details.

Note: ruff format does not sort imports on its own. To sort imports and then format, run the linter with the import-sorting rules first, followed by the formatter:

Shell

ruff check --select I --fix
ruff format

How Did It Perform?
Python

from __future__ import unicode_literals

import datetime
import os
import sys
import time
from base64 import b64encode
from datetime import datetime
from io import BytesIO

from flask import Flask, redirect, request, send_file, url_for
from PIL import Image, ImageDraw
from werkzeug.utils import secure_filename

numbers = [
    1,
    2,
    4,
    5,
    6,
]

MyClass.function(arg1, arg2, arg3, flag, option)


def my_func(
    some_data: list,
    *args,
    path: os.PathLike,
    name: str,
    verbosity: bool = True,
    quiet: bool = False,
):
    """Processes `data` using `args` and saves to `path`."""
    with open(path, "a") as file:
        ...


if first_condititon and second_condition:
    ...

Where Do They Stand?

| | black | isort | ruff |
|---|---|---|---|
| Purpose | Code formatter | Import sorter and formatter | Linter and formatter |
| Speed | Fast | Fast | Extremely fast |
| Primary Functionality | Formats Python code to a consistent style | Sorts and formats Python imports | Lints Python code and applies autofixes |
| Configuration | pyproject.toml | pyproject.toml, .isort.cfg, setup.cfg | pyproject.toml or command-line flags |
| Ease of Use | High | High | High |
| Popularity | Very high | High | Increasing |
| Pros | Extensive, opinionated styling | Import grouping and sectioning for improved readability | Faster than most linters; written in Rust |
| Cons | May not have extensive styling rules like pylint | - | Supports all F rules from Flake8 but is missing a majority of E rules |

Conclusion

Black, Ruff, and isort are powerful tools that help maintain high code quality in Python projects. Each tool has its specific strengths, making them suitable for different aspects of code quality:

• Black: Best for automatic code formatting and ensuring a consistent style
• isort: Perfect for organizing and formatting import statements
• Ruff: Ideal for comprehensive linting and fixing code quality issues quickly

By understanding the unique features and benefits of each tool, developers can choose the right combination to fit their workflow and improve the readability, maintainability, and overall quality of their codebase.
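For readers who want to combine the three tools, here is a minimal sketch of a helper script that runs them in sequence. It assumes isort, Black, and Ruff are installed and on the PATH; the function names `build_lint_commands` and `run_all`, and the chosen order, are illustrative assumptions rather than anything prescribed by the tools. The `--profile black` option asks isort to produce import formatting that Black will not rewrite.

```python
import subprocess
from typing import List


def build_lint_commands(target: str) -> List[List[str]]:
    """Return the command lines for one pass over `target` (hypothetical helper)."""
    return [
        # Sort and group imports in a Black-compatible layout.
        ["isort", "--profile", "black", target],
        # Reformat the code in place.
        ["black", target],
        # Lint and apply the autofixes Ruff considers safe.
        ["ruff", "check", "--fix", target],
    ]


def run_all(target: str) -> int:
    """Run each tool in turn; return the first non-zero exit code, else 0."""
    for cmd in build_lint_commands(target):
        print("$", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_all("src")` from a pre-commit script or a CI step would stop at the first tool that reports a problem, which keeps the failure output short.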

By Prince Bose
Contexts in Go: A Comprehensive Guide

Contexts in Go provide a standard way to pass metadata and control signals between goroutines. They are mainly used to manage task execution time, data passing, and operation cancellation. This article covers different types of contexts in Go and examples of how to use them.

Introduction to Contexts

Contexts in Go are represented by the context.Context interface, which includes methods for getting deadlines, cancellation, values, and done channels. The primary package for working with contexts is context.

Go

package context

type Context interface {
	Deadline() (deadline time.Time, ok bool)
	Done() <-chan struct{}
	Err() error
	Value(key interface{}) interface{}
}

Context Types

There are six main functions to create contexts:

• context.Background(): Returns an empty context. It is usually used as the root context for the entire application.
• context.TODO(): Returns a context that can be used when a context is required but not yet defined. It signals that the context needs further work.
• context.WithCancel(parent Context): Returns a derived context that can be canceled by calling the cancel function.
• context.WithDeadline(parent Context, d time.Time): Returns a derived context that automatically cancels at a specified time (deadline).
• context.WithTimeout(parent Context, timeout time.Duration): Similar to WithDeadline, but the deadline is set by a duration.
• context.WithValue(parent Context, key, val interface{}): Returns a derived context that contains a key-value pair.

Examples of Using Contexts

Context With Cancelation

A context with cancelation is useful when you need to stop a goroutine based on an event.
Go

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	go func() {
		select {
		case <-time.After(2 * time.Second):
			fmt.Println("Operation completed")
		case <-ctx.Done():
			fmt.Println("Operation canceled")
		}
	}()

	// try to change this value to 3 and execute again
	time.Sleep(1 * time.Second)
	cancel()
	time.Sleep(2 * time.Second)
}

Context With Timeout

This context automatically cancels after a specified duration.

Go

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	go func() {
		select {
		case <-time.After(3 * time.Second):
			fmt.Println("Operation completed")
		case <-ctx.Done():
			fmt.Println("Operation timed out")
		}
	}()

	// try to change this value to 2 and execute again
	time.Sleep(4 * time.Second)
}

Context With Deadline

A context with a deadline is similar to a context with a timeout, but the time is set as a specific value.

Go

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// try to change this value to 3 and execute again
	deadline := time.Now().Add(2 * time.Second)
	ctx, cancel := context.WithDeadline(context.Background(), deadline)
	defer cancel()

	go func() {
		select {
		case <-time.After(3 * time.Second):
			fmt.Println("Operation completed")
		case <-ctx.Done():
			fmt.Println("Operation reached deadline")
		}
	}()

	time.Sleep(4 * time.Second)
}

Context With Values

Contexts can store arbitrary data as key-value pairs. This is useful for passing parameters and settings to handlers.

Go

package main

import (
	"context"
	"fmt"
	"time"
)

// ctxKey is a dedicated key type. Using it instead of a plain string
// avoids collisions with keys set by other packages.
type ctxKey string

const exampleKey ctxKey = "key"

func main() {
	ctx := context.WithValue(context.Background(), exampleKey, "value")

	go func(ctx context.Context) {
		if v := ctx.Value(exampleKey); v != nil {
			fmt.Println("Value found:", v)
		} else {
			fmt.Println("No value found")
		}
	}(ctx)

	time.Sleep(1 * time.Second)
}

Applying Contexts

Contexts are widely used in various parts of Go applications, including network servers, databases, and client requests.
They help properly manage task execution time, cancel unnecessary operations, and pass data between goroutines.

Using in HTTP Servers

Go

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()

	select {
	case <-time.After(5 * time.Second):
		fmt.Fprintf(w, "Request processed")
	case <-ctx.Done():
		fmt.Fprintf(w, "Request canceled")
	}
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}

This code sets up an HTTP server that handles requests with a context-aware handler. It either completes after 5 seconds or responds if the request is canceled.

Using in Databases

Go

package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func queryDatabase(ctx context.Context, db *sql.DB) {
	query := "SELECT sleep(5)"

	rows, err := db.QueryContext(ctx, query)
	if err != nil {
		fmt.Println("Query error:", err)
		return
	}
	defer rows.Close()

	for rows.Next() {
		var result string
		if err := rows.Scan(&result); err != nil {
			fmt.Println("Scan error:", err)
			return
		}
		fmt.Println("Result:", result)
	}
}

func main() {
	db, err := sql.Open("mysql", "user:password@tcp(localhost:3306)/dbname")
	if err != nil {
		fmt.Println("Database connection error:", err)
		return
	}
	defer db.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	queryDatabase(ctx, db)
}

Here, we connect to a MySQL database and execute a query with a context timeout of 3 seconds. If the query takes longer, it is canceled, and an error message is printed.
Using in Goroutines

Go

package main

import (
	"context"
	"fmt"
	"time"
)

func worker(ctx context.Context, id int) {
	for {
		select {
		case <-ctx.Done():
			fmt.Printf("Worker %d stopped\n", id)
			return
		case <-time.After(1 * time.Second):
			fmt.Printf("Worker %d working\n", id)
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	for i := 1; i <= 3; i++ {
		go worker(ctx, i)
	}

	time.Sleep(3 * time.Second)
	cancel()
	time.Sleep(1 * time.Second)
}

In this example, the code spawns three worker goroutines that print status messages every second. The workers stop when the main function cancels the context after 3 seconds.

Using in an API Request With a Deadline

Go

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func fetchAPI(ctx context.Context, url string) {
	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		fmt.Println("Request creation error:", err)
		return
	}

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Request error:", err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusOK {
		fmt.Println("API request succeeded")
	} else {
		fmt.Println("API request failed with status:", resp.StatusCode)
	}
}

func main() {
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(2*time.Second))
	defer cancel()

	fetchAPI(ctx, "http://example.com/api")
}

This example demonstrates making an API request with a 2-second deadline. If the request is not completed within this timeframe, it is canceled, ensuring that the program does not wait indefinitely.

Conclusion

Contexts in Go are a powerful tool for managing execution time, cancelation, and data passing between goroutines. Using contexts correctly helps avoid resource leaks, ensures timely task completion, and improves code structure and readability. Various types of contexts, such as those with cancelation, timeout, deadline, and values, provide flexible task management in Go applications.

By Suleiman Dibirov, DZone Core
Next-Gen Lie Detector: Stack Selection

The first lie detector that relied on eye movement appeared in 2014. The Converus team, together with Dr. John C. Kircher, Dr. David C. Raskin, and Dr. Anne Cook, launched EyeDetect, a brand-new solution to detect deception quickly and accurately. This event became a turning point in the polygraph industry. In 2021, we finished working on a contactless lie detection technology based on eye-tracking and presented it at the International Scientific and Practical Conference. As I was part of the developers' team, in this article, I would like to share some insights into how we worked on the creation of the new system, particularly how we chose our backend stack.

What Is a Contactless Lie Detector and How Does It Work?

We created a multifunctional hardware and software system for contactless lie detection. This is how it works: the system tracks a person's psychophysiological reactions by monitoring eye movements and pupil dynamics and automatically calculates the final test results.

Its software consists of three applications:

• Administrator application: Allows the creation of tests and the administration of processes
• Operator application: Enables scheduling test dates and times, assigning tests, and monitoring the testing process
• Respondent application: Allows users to take tests using a special code

On the computer screen, along with simultaneous audio (either synthesized or pre-recorded by a specialist), the respondent is given instructions on how to take the test. This is followed by written true/false statements based on developed testing methodologies. The respondent reads each statement and presses the "true" or "false" key according to their assessment of the statement's relevance. After half a second, the computer displays the next statement. Then, the lie detector measures response time and error frequency, extracts characteristics from recordings of eye position and pupil size, and calculates the significance of the statement or the "probability of deception."
To make this more concrete, here is a comparison of the classic polygraph and the contactless lie detector.

| Criteria | Classic Polygraph | Contactless Lie Detector |
|---|---|---|
| Working Principle | Registers changes in galvanic skin response (GSR), cardiovascular, and respiratory activity to measure emotional arousal | Registers involuntary changes in eye movements and pupil diameter to measure cognitive effort |
| Duration | Tests take from 1.5 to 5 hours, depending on the type of examination | Tests take from 15 to 40 minutes |
| Report Time | From 5 minutes to several hours; written reports can take several days | Test results and reports in less than 5 minutes automatically |
| Accuracy | Screening test: 85%; Investigation: 89% | Screening test: 86-90%; Investigation: 89% |
| Sensor Contact | Sensors are placed on the body, some of which cause discomfort, particularly the two pneumatic tubes around the chest and the blood pressure cuff | No sensors are attached to the person |
| Objectivity | Specialists interpret changes in responses. The specialist can influence the result. Manual evaluation of polygraphs requires training and is a potential source of errors. | Automated testing process ensuring maximum reliability and objectivity. AI evaluates responses and generates a report. |
| Training | Specialists undergo 2 to 10 weeks of training, plus regular advanced training courses | Standard operator training takes less than 4 hours; administrator training for creating tests takes 8 hours. Remote training with a qualification exam. |

As you can see, our lie detector made the process more comfortable and convenient compared to traditional lie detectors. First of all, the tests take less time, from 15 to 40 minutes. Besides, the results are available almost immediately: they are generated automatically within minutes. Another advantage is that there are no physically attached sensors, which can be even more uncomfortable in an already stressful environment. Operator training is also less time-consuming. Most importantly, the results' credibility is still very high.
Backend Stack Choice

Our team had experience with Python and asyncio. Previously, we developed projects using Tornado. But at that time FastAPI was gaining popularity, so this time we decided to use Python with FastAPI and SQLAlchemy (with asynchronous support). To complement our choice of a popular backend stack, we decided to host our infrastructure on virtual machines using Docker.

Avoiding Celery

Given the nature of our lie detector, several mathematical operations require time to complete, making real-time execution during HTTP requests impractical. We developed multiple background tasks. Although Celery is a popular framework for such tasks, we opted to implement our own task manager. This decision stemmed from our use of CI/CD, where we restart various services independently. Sometimes, services could lose connection with Redis during these restarts. Our custom task manager, extending the base aioredis library, ensures reconnection if a connection is lost.

Background Tasks Architecture

At the project's outset, we had a few background tasks, and their number increased as functionality expanded. Some tasks were interdependent, requiring sequential execution. Initially, we used a queue manager where each task, upon completion, would trigger the next task via a message queue. However, asynchronous execution could lead to data issues due to varying execution speeds of related tasks. We then replaced this with a task manager that uses gRPC to call related tasks, ensuring execution order and resolving data dependency issues between tasks.

Logging

We couldn't use popular bug-tracking systems like Sentry for a few reasons. First, we didn't want to use any third-party services managed and deployed outside of our infrastructure, so we were limited to using a self-hosted Sentry. At that time, we only had one dedicated server divided into multiple virtual servers, and there weren't enough resources for Sentry.
Additionally, we needed to store not only bugs but also all information about requests and responses, which required the use of Elasticsearch. Thus, we chose to store logs in Elasticsearch. However, memory leak issues led us to switch to Prometheus and Typesense. Maintaining backward compatibility between Elasticsearch and Typesense was a priority for us, as we were still determining if the new setup would meet our needs. This decision worked quite well, and we saw improvements in resource usage.

The main reason for switching from Elasticsearch to Typesense was resource usage. Elasticsearch often requires a huge amount of memory, and it rarely seems to be enough. This is a common problem discussed in various forums. Since Typesense is developed in C++, it requires considerably fewer resources.

Full-Text Search (FTS)

Using PostgreSQL as our main database, we needed an efficient FTS mechanism. Based on previous experience, PostgreSQL's built-in tsquery and tsvector did not perform well enough with Cyrillic text. Thus, we decided to synchronize PostgreSQL with Elasticsearch. While not the fastest solution, it provided enough speed and flexibility for our needs.

PDF Report Generation

As you may know, generating PDFs in Python can be quite complicated. This issue is rather common: a typical approach in Python is to render an HTML file and then convert it to PDF, similar to how it's done in other languages. This conversion process can sometimes produce unpredictable artifacts that are difficult to debug. Meanwhile, generating PDFs with JavaScript is much easier. We used Puppeteer to create an HTML page and then save it as a PDF just as we would in a browser, avoiding these problems altogether.

To Conclude

In conclusion, I would like to stress that this project turned out to be demanding in terms of choosing the right solutions but at the same time, it was more than rewarding.
We received numerous unconventional customer requests that often questioned standard rules and best practices. The most exciting part of the journey was implementing mathematical models developed by another team into the backend architecture and designing a database architecture to handle a vast amount of unique data. It made me realize once again that popular technologies and tools are not always the best option for every case. We always need to explore different methodologies and remain open to unconventional solutions for common tasks.
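The article does not show the custom task manager, so the following is only a minimal sketch of the reconnection idea described under "Avoiding Celery", written with plain asyncio. The names `call_with_reconnect` and `FlakyQueue` are hypothetical, and the in-memory queue stands in for a Redis-backed one; a real implementation would wrap an actual Redis client and its own exception types.

```python
import asyncio


async def call_with_reconnect(operation, reconnect, retries=3, base_delay=0.01):
    """Run `operation()`; on ConnectionError, call `reconnect()` and retry.

    Hypothetical helper: `operation` and `reconnect` are async callables
    supplied by the caller.
    """
    for attempt in range(retries + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff before trying to reconnect.
            await asyncio.sleep(base_delay * 2 ** attempt)
            await reconnect()


class FlakyQueue:
    """Stand-in for a Redis-backed queue whose connection has been lost."""

    def __init__(self):
        self.connected = False
        self.reconnects = 0

    async def connect(self):
        self.connected = True
        self.reconnects += 1

    async def pop(self):
        if not self.connected:
            raise ConnectionError("connection lost")
        return "task-1"


async def demo():
    queue = FlakyQueue()  # starts disconnected, as after a service restart
    task = await call_with_reconnect(queue.pop, queue.connect)
    return task, queue.reconnects
```

Running `asyncio.run(demo())` exercises one failed call, one reconnect, and one successful retry. Keeping the retry policy in a single wrapper means each background task does not need its own reconnection logic.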

By Grigorii Novikov
Apache Hudi: A Deep Dive With Python Code Examples

In today's data-driven world, real-time data processing and analytics have become crucial for businesses to stay competitive. Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an open-source data management framework that provides efficient data ingestion and real-time analytics on large-scale datasets stored in data lakes. In this blog, we'll explore Apache Hudi with a technical deep dive and Python code examples, using a business example for better clarity.

Table of Contents:

1. Introduction to Apache Hudi
   • Key Features of Apache Hudi
2. Business Use Case
3. Setting Up Apache Hudi
4. Ingesting Data With Apache Hudi
5. Querying Data With Apache Hudi
6. Security and Other Aspects
   • Security
   • Performance Optimization
   • Monitoring and Management
7. Conclusion

1. Introduction to Apache Hudi

Apache Hudi is designed to address the challenges associated with managing large-scale data lakes, such as data ingestion, updating, and querying. Hudi enables efficient data ingestion and provides support for both batch and real-time data processing.

Key Features of Apache Hudi

• Upserts (Insert/Update): Efficiently handle data updates and inserts with minimal overhead. Traditional data lakes struggle with updates, but Hudi's upsert capability ensures that the latest data is always available without requiring full rewrites of entire datasets.
• Incremental Pulls: Retrieve only the data that has changed since the last pull, which significantly optimizes data processing pipelines by reducing the amount of data that needs to be processed.
• Data Versioning: Manage different versions of data, allowing for easy rollback and temporal queries. This versioning is critical for ensuring data consistency and supporting use cases such as time travel queries.
• ACID Transactions: Ensure data consistency and reliability by providing atomic, consistent, isolated, and durable transactions on data lakes. This makes Hudi a robust choice for enterprise-grade applications.
• Compaction: Hudi offers a compaction mechanism that optimizes storage and query performance. For Merge-on-Read tables, this process merges the accumulated delta log files into columnar base files, reducing read-time overhead and the number of small files to manage.
• Schema Evolution: Handle changes in the data schema gracefully without disrupting the existing pipelines. This feature is particularly useful in dynamic environments where data models evolve over time.
• Integration With Big Data Ecosystem: Hudi integrates seamlessly with Apache Spark, Apache Hive, Apache Flink, and other big data tools, making it a versatile choice for diverse data engineering needs.

2. Business Use Case

Let's consider a business use case of an e-commerce platform that needs to manage and analyze user order data in real time. The platform receives a high volume of orders every day, and it is essential to keep the data up-to-date and perform real-time analytics to track sales trends, inventory levels, and customer behavior.

3. Setting Up Apache Hudi

Before we dive into the code, let's set up the environment. We'll use PySpark and the Hudi Spark bundle for this purpose.

Shell

# Install PySpark
pip install pyspark==3.1.2

# The Hudi Spark bundle is a JVM artifact, not a pip package. Supply it to Spark
# at launch, for example (pick the bundle that matches your Spark and Hudi versions):
pyspark --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:<hudi-version>

4. Ingesting Data With Apache Hudi

Let's start by ingesting some order data into Apache Hudi. We'll create a DataFrame with sample order data and write it to a Hudi table.
Python

from pyspark.sql import SparkSession

# Initialize Spark session.
# Note: the Hudi Spark bundle JAR must be on Spark's classpath
# (for example via --packages or the spark.jars.packages setting).
spark = SparkSession.builder \
    .appName("HudiExample") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.hive.convertMetastoreParquet", "false") \
    .getOrCreate()

# Sample order data
order_data = [
    (1, "2023-10-01", "user_1", 100.0),
    (2, "2023-10-01", "user_2", 150.0),
    (3, "2023-10-02", "user_1", 200.0)
]

# Create DataFrame
columns = ["order_id", "order_date", "user_id", "amount"]
df = spark.createDataFrame(order_data, columns)

# Define Hudi options
hudi_options = {
    'hoodie.table.name': 'orders',
    # table.type replaces the deprecated storage.type key
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.recordkey.field': 'order_id',
    'hoodie.datasource.write.partitionpath.field': 'order_date',
    'hoodie.datasource.write.precombine.field': 'order_date',
    # Hive sync requires a reachable Hive metastore
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.database': 'default',
    'hoodie.datasource.hive_sync.table': 'orders',
    'hoodie.datasource.hive_sync.partition_fields': 'order_date'
}

# Write DataFrame to Hudi table
df.write.format("hudi").options(**hudi_options).mode("overwrite").save("/path/to/hudi/orders")

print("Data ingested successfully.")

5. Querying Data With Apache Hudi

Now that we have ingested the order data, let's query the data to perform some analytics. We'll use the Hudi DataSource API to read the data.

Python

# Read data from Hudi table
orders_df = spark.read.format("hudi").load("/path/to/hudi/orders/*")

# Show the ingested data
orders_df.show()

# Perform some analytics
# Calculate total sales per day
total_sales = orders_df.groupBy("order_date").sum("amount").withColumnRenamed("sum(amount)", "total_sales")
total_sales.show()

# Calculate sales by user
sales_by_user = orders_df.groupBy("user_id").sum("amount").withColumnRenamed("sum(amount)", "total_sales")
sales_by_user.show()

6. Security and Other Aspects

When working with large-scale data lakes, security and data governance are paramount. Apache Hudi provides several features to ensure your data is secure and compliant with regulatory requirements.

Security

• Data Encryption: Hudi supports data encryption at rest to protect sensitive information from unauthorized access. By leveraging Hadoop's native encryption support, you can ensure that your data is encrypted before it is written to disk.
• Access Control: Integrate Hudi with Apache Ranger or Apache Sentry to manage fine-grained access control policies. This ensures that only authorized users and applications can access or modify the data.
• Audit Logging: Hudi can be integrated with log aggregation tools like Apache Kafka or Elasticsearch to maintain an audit trail of all data operations. This is crucial for compliance and forensic investigations.
• Data Masking: Implement data masking techniques to obfuscate sensitive information in datasets, ensuring that only authorized users can see the actual data.

Performance Optimization

• Compaction: As mentioned earlier, Hudi's compaction feature merges delta log files into base files (for Merge-on-Read tables), optimizing storage and query performance. You can schedule compaction jobs based on your workload patterns.
• Indexing: Hudi supports various indexing techniques to speed up query performance. Bloom filters and columnar indexing are commonly used to reduce the amount of data scanned during queries.
• Caching: Leverage Spark's in-memory caching to speed up repeated queries on Hudi datasets. This can significantly reduce query latency for interactive analytics.

Monitoring and Management

• Metrics: Hudi provides a rich set of metrics that can be integrated with monitoring tools like Prometheus or Grafana. These metrics help you monitor the health and performance of your Hudi tables.
• Data Quality: Implement data quality checks using Apache Griffin or Deequ to ensure that the ingested data meets your quality standards.
This helps in maintaining the reliability of your analytics.

Schema Evolution: Hudi's support for schema evolution allows you to handle changes in the data schema without disrupting existing pipelines. This is particularly useful in dynamic environments where data models evolve over time.

7. Conclusion

In this blog, we have explored Apache Hudi and its capabilities to manage large-scale data lakes efficiently. We set up a Spark environment, ingested sample order data into a Hudi table, and performed some basic analytics. We also discussed the security aspects and performance optimizations that Apache Hudi offers.

Apache Hudi's ability to handle upserts, provide incremental pulls, and ensure data security makes it a powerful tool for real-time data processing and analytics. By leveraging Apache Hudi, businesses can ensure their data lakes are up-to-date, secure, and ready for real-time analytics, enabling them to make data-driven decisions quickly and effectively.

Feel free to dive deeper into Apache Hudi's documentation and explore more advanced features to further enhance your data engineering workflows. If you have any questions or need further clarification, please let me know in the comments below!
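The ingestion example in this article only performs an initial write, so the upsert behavior it describes is never shown. As a rough illustration, here is a toy in-memory model in plain Python; it is not Hudi's implementation. It assumes that duplicates within an incoming batch are reduced to the row with the largest precombine value, and that the surviving row then replaces any stored row with the same record key. Actual behavior depends on the configured payload or merge strategy. The `upsert` function and the `updated_at` field are hypothetical names introduced for this sketch.

```python
def upsert(table, batch, key_field, precombine_field):
    """Return a new table (dict keyed by record key) after applying `batch`.

    Toy model only: real Hudi merges files on storage, and its merge
    behavior depends on the configured payload/merge strategy.
    """
    # Step 1: de-duplicate the incoming batch, keeping the row with the
    # largest precombine value for each record key.
    latest = {}
    for row in batch:
        key = row[key_field]
        if key not in latest or row[precombine_field] >= latest[key][precombine_field]:
            latest[key] = row

    # Step 2: incoming rows replace stored rows with the same key;
    # rows with unseen keys are inserted.
    merged = dict(table)
    merged.update(latest)
    return merged


orders = {
    1: {"order_id": 1, "updated_at": "2023-10-01", "amount": 100.0},
    2: {"order_id": 2, "updated_at": "2023-10-01", "amount": 150.0},
}
batch = [
    {"order_id": 2, "updated_at": "2023-10-02", "amount": 175.0},  # update
    {"order_id": 2, "updated_at": "2023-10-03", "amount": 180.0},  # newer update
    {"order_id": 3, "updated_at": "2023-10-02", "amount": 200.0},  # insert
]
orders = upsert(orders, batch, "order_id", "updated_at")
```

In Hudi itself, the corresponding write would typically set `hoodie.datasource.write.operation` to `upsert` and use append mode instead of overwrite, with the record key and precombine fields configured as in the ingestion example.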

By Harsh Daiya, DZone Core

Top Languages Experts


Kai Wähner

Technology Evangelist,
Confluent

Kai Waehner works as Technology Evangelist at Confluent. Kai’s main area of expertise lies within the fields of Big Data Analytics, Machine Learning / Deep Learning, Messaging, Integration, Microservices, Internet of Things, Stream Processing and Blockchain. He is a regular speaker at international conferences such as JavaOne, O’Reilly Software Architecture or ApacheCon, writes articles for professional journals, and shares his experiences with new technologies on his blog (www.kai-waehner.de/blog). Contact and references: @KaiWaehner / www.kai-waehner.de

Alvin Lee

Founder,
Out of the Box Development, LLC

Full-stack developer and technology consultant specializing in web architectures, microservices, and API integrations.

The Latest Languages Topics

React’s Unstoppable Rise: Why It’s Here to Stay
React transformed front-end development with its Virtual DOM, robust ecosystem, and continuous innovation — poised to stay the top choice for years to come.
February 6, 2025
by Maulik Suchak
· 583 Views · 1 Like
Exploring the Purpose of Pytest Fixtures: A Practical Guide
This blog explains how to use Pytest fixtures for initializing and cleaning up Selenium WebDriver, with a practical example using the Sauce Labs Demo website.
February 3, 2025
by Sidharth Shukla
· 1,415 Views · 1 Like
Java Stream API: 3 Things Every Developer Should Know About
Java Stream API simplifies collection processing with lazy evaluation, parallelism, and functional programming. Use it to write cleaner, efficient, and scalable code.
February 3, 2025
by Danil Temnikov
· 3,866 Views
Building Neural Networks With Automatic Differentiation
In this post, we will write a basic DNN using simple Python. To do that, we need to understand automatic differentiation and then implement it in code.
February 3, 2025
by Mayank Gupta
· 1,248 Views · 1 Like
Building RAG Apps With Apache Cassandra, Python, and Ollama
A brief introduction to Apache Cassandra for retrieval-augmented generation using Python and Ollama for developing applications free of cost locally or on a server.
February 3, 2025
by Varun Setia
· 1,764 Views
Pydantic: Simplifying Data Validation in Python
Pydantic is a powerful Python library that uses type annotations to validate data structures. Learn about the powerful features of Pydantic with code examples.
February 3, 2025
by Vidyasagar (Sarath Chandra) Machupalli FBCS, DZone Core
· 1,226 Views · 3 Likes
Web Scraping With LLMs, ScrapeGraphAI, and LangChain
In this article, learn how to use LLMs for web scraping with ScrapeGraphAI, LangChain, and Pydantic. This guide covers setup, configuration, and data extraction
January 31, 2025
by Juveria dalvi
· 2,777 Views
Magic of Aspects: How AOP Works in Spring
Article explains how Aspect-Oriented Programming (AOP) simplifies modern app development by handling cross-cutting concerns like logging, security, and performance seam
January 31, 2025
by Danil Temnikov
· 2,688 Views · 3 Likes
Page Transactions: A New Approach to Test Automation
Guará is the Python implementation of the design pattern Page Transactions. It focuses on the transactions a user can perform on an application, such as Submit Forms.
January 31, 2025
by Douglas Cardoso
· 1,803 Views
Bridging Graphviz and Cytoscape.js for Interactive Graphs
Making Graphviz static digraphs interactive and compatible with Cytoscape by converting DOT format graphs into Cytoscape JSON using Python.
January 30, 2025
by Puneet Malhotra
· 2,084 Views · 1 Like
SmartXML: An Alternative to XPath for Complex XML Files
We'll discuss SmartXML, an XPath alternative for parsing complex XML files, converting them to SQL, and loading the results into a database seamlessly.
January 30, 2025
by Luca Sanders
· 2,648 Views · 2 Likes
Structured Logging in Grails 6.2.3
Comparison of traditional logging with structured logging, its advantages, and enhancements in the latest version of Grails 6.2x.
January 30, 2025
by Karthik Kamarapu
· 2,122 Views · 1 Like
Passing JSON Variables in Azure Pipelines
Learn how to handle JSON variables in Azure DevOps pipelines, avoid escaping issues, and ensure seamless API integration with proper normalization techniques.
January 29, 2025
by Mohammed Basil
· 3,335 Views · 1 Like
How to Split PDF Files into Separate Documents Using Java
In this article, we discuss how PDF file structure manages individual page objects, and we learn how to split those pages into new PDF documents with APIs.
January 29, 2025
by Brian O'Neill (DZone Core)
· 3,324 Views · 1 Like
Why You Don’t Need That New JavaScript Library
Sticking to vanilla JavaScript and proven libraries over flashy new tools leads to more maintainable, secure, and efficient software development.
January 29, 2025
by Denis Ermakov
· 3,088 Views
Metal and the Simulated Annealing Algorithm
This article describes the simulated annealing algorithm and demonstrates its effectiveness as a tool for finding optimal solutions to complex problems.
January 29, 2025
by Vitaly Kuznetsov (Ippolitov)
· 2,032 Views · 2 Likes
Using Custom React Hooks to Simplify Complex Scenarios
Learn some advanced techniques for building custom React hooks to simplify complex logic, improve code reuse, and enhance state management in your apps.
January 29, 2025
by Raju Dandigam
· 2,582 Views
Scrape Amazon Product Reviews With Python
Learn how to use Python scripts to scrape the Amazon website ethically and extract product review data.
January 29, 2025
by Juveria dalvi
· 1,862 Views · 1 Like
The Energy Efficiency of JVMs and the Role of GraalVM
Exploring the JVM ecosystem reveals a strong correlation between energy efficiency and code performance, with GraalVM as a top-tier runtime for optimizing both.
January 29, 2025
by Graziano Casto
· 3,062 Views · 3 Likes
Using Spring AI to Generate Images With OpenAI's DALL-E 3
Integrate Spring AI with OpenAI's DALL-E 3 to generate images. Set up Spring Boot, configure the API integration, and customize settings easily.
January 28, 2025
by Danil Temnikov
· 4,422 Views · 4 Likes
