News Aggregator

Large Language Models: Changing the Game in Software Development With Code Generation, Debugging, and CI/CD Integration
Aggregated on: 2024-08-15 14:22:57
The domain of software development is experiencing a breakthrough phase with the continuous integration of state-of-the-art Large Language Models like GPT-4 and Claude Opus. These models extend beyond the role of traditional developer tools and directly assist developers in translating natural-language instructions into executable code across a variety of programming languages, which speeds up the coding process.
Code Generation: Enhancing Developer Productivity
LLMs understand context and generate best-practice pieces of code, making them highly effective at improving developer productivity. They act as an on-call assistant for developers, offering insights and alternatives that may elude even experienced programmers. This role becomes especially important in large, complex projects where the integration of different software modules can introduce subtle, sometimes hard-to-detect bugs.
View more...

Data Pipeline vs. ETL Pipeline
Aggregated on: 2024-08-15 13:22:57
In today's world, data is a key success factor for many information systems. To exploit data, it needs to be moved and collected from many different locations, using many different technologies and tools. It is important to understand the difference between a data pipeline and an ETL pipeline. While both are designed to move data from one place to another, they serve different purposes and are optimized for different tasks. The comparison table below highlights the key differences:
View more...

Essential Guidelines for Building Optimized ETL Data Pipelines in the Cloud With Azure Data Factory
Aggregated on: 2024-08-14 22:22:56
When building ETL data pipelines using Azure Data Factory (ADF) to process huge amounts of data from different sources, you may often run into performance and design-related challenges. This article will serve as a guide to building high-performance ETL pipelines that are both efficient and scalable. Below are the major guidelines to consider when building optimized ETL data pipelines in ADF:
View more...

Safeguarding Democracy in the Digital Age: Insights from Day 1 at Black Hat 2024 and Las Vegas Officials
Aggregated on: 2024-08-14 21:22:56
In an era where technology and geopolitics intersect more than ever before, the importance of cybersecurity in maintaining democratic processes cannot be overstated. At Black Hat 2024, global leaders and local officials converged to discuss the challenges and strategies for protecting elections, critical infrastructure, and the very foundations of democracy. This article delves into the insights shared at the conference, offering developers, engineers, and architects a comprehensive view of the cybersecurity landscape and its implications for democratic societies.
The Changing Landscape of Global Threats
Jeff Moss, founder of Black Hat, set the stage by highlighting the rapid evolution of the threat landscape. "Things are different now," Moss observed. "Things have sped up. You have all the routine problems and a giant bucket of other problems, you have all these risks you didn't think about."
View more...

Advance Traffic Management in Canary Using Istio, Argo Rollouts, and HPA
Aggregated on: 2024-08-14 20:22:56
As enterprises mature in their CI/CD journey, they tend to ship code faster, more safely, and more securely.
One essential strategy DevOps teams apply is releasing code progressively to production, also known as canary deployment. Canary deployment is a robust mechanism that safely releases application changes and provides flexibility for business experiments. It can be implemented using open-source software like Argo Rollouts and Flagger. However, advanced DevOps teams want granular control over their traffic and pod scaling while performing canary deployments to reduce overall costs. Many enterprises achieve advanced traffic management of canary deployments at scale using the open-source Istio service mesh. We want to share our knowledge with the DevOps community through this blog. Before we get started, let us discuss the canary architecture implemented by Argo Rollouts and Istio.
View more...

Chat With Your Code: Conversational AI That Understands Your Codebase
Aggregated on: 2024-08-14 19:22:56
Imagine having a tool that understands your code and can answer your questions, provide insights, and even help debug issues — all through natural language queries. In this article, we’ll walk you through the process of creating a conversational AI that allows you to talk to your code using Chainlit, Qdrant, and OpenAI.
Benefits of Conversational AI for Codebases
- Streamlined code review: Quickly review specific code modules and understand their context without spending time digging through the files.
- Efficient debugging: Ask questions about potential issues in the code and get targeted responses, which helps reduce the time spent on troubleshooting.
- Enhanced learning: New team members can learn how different components of the code work without having to pair with existing experts on the code.
- Improved documentation: AI-generated summaries help produce explanations for complex code, making it easier to enhance documentation.
Now let us look at how we made that happen.
View more...

Exploring JSON Schema for Form Validation in Web Components
Aggregated on: 2024-08-14 18:22:56
In the realm of modern web development, ensuring data integrity and user experience is paramount. JSON Schema has emerged as a powerful tool for validating the structure of JSON data, providing developers with a standardized approach to data validation, documentation, and extensibility. When combined with Web Components, JSON Schema becomes an even more potent solution for form validation, offering a modular, reusable, and maintainable approach to UI development. This blog post will walk you through the process of integrating JSON Schema validation into a custom Web Component, using a contact form as our example.
View more...

Unlocking the Potential of Synthetic Data for AI Development
Aggregated on: 2024-08-14 17:22:56
Data is the lifeblood of AI models, and the accuracy and effectiveness of AI systems depend significantly upon the completeness of the data used during training. Although real data undoubtedly makes AI systems more effective, it comes with challenges: real data can be imbalanced, biased, or incomplete. Hence, to cope with shortages in real data, data scientists have to source synthetic data. Synthetic data is considerably less expensive than real data, but it brings its own challenges, such as ensuring demographic diversity, reliability, and sufficient volume, which data scientists must mitigate.
View more...
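To make the synthetic data idea above concrete, here is a minimal, hypothetical Python sketch (not taken from the article) that uses scikit-learn to generate a small synthetic classification dataset, including a deliberate class imbalance of the kind the entry warns about, and trains a quick baseline model on it.

```python
# Toy illustration: generating a small synthetic, imbalanced classification
# dataset with scikit-learn. This is a generic sketch, not code from the
# article above; sizes and parameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Create 5,000 synthetic samples with a 9:1 class imbalance.
X, y = make_classification(
    n_samples=5_000,
    n_features=20,
    n_informative=8,
    weights=[0.9, 0.1],   # deliberate imbalance, mimicking a real-data problem
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print(f"Held-out accuracy on synthetic data: {model.score(X_test, y_test):.3f}")
```

In practice, purpose-built generators go further (preserving correlations, privacy guarantees, demographic balance), but the sketch shows why synthetic data is cheap to produce at volume.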
Creating Effective Exceptions in Java Code [Video]
Aggregated on: 2024-08-14 16:22:56
This article will explore the critical topic of creating effective exceptions in your Java code. Exceptions are crucial in identifying when something goes wrong during code execution. They are instrumental in managing data inconsistency and business validation errors. We will outline three key steps for writing effective exceptions in your Java code:
1. Writing and Defining an Exception Hierarchy
2. Creating a Trackable Exception Message
3. Avoiding Security Problems with Exceptions
1. Writing and Defining an Exception Hierarchy
Define the exception hierarchy as the first step in your design process. By considering the domain, you can begin with more general exceptions and then move towards more specific ones. This approach enables you to trace issues using the hierarchy tree or exception names.
View more...

Top 10 C# Keywords and Features
Aggregated on: 2024-08-14 15:22:56
C# stands out as one of the top five programming languages in a Stack Overflow survey. It is widely used for creating various applications, ranging from desktop to mobile to cloud native. With so many language keywords and features, it can be taxing for developers to keep up to date with new feature releases. This article delves into the top 10 C# keywords every C# developer should know.
1. Async and Await
Keywords: async, await
The introduction of the async and await keywords makes it easy to handle asynchronous programming in C#. They allow you to write code that performs operations without blocking the main thread. This capability is particularly useful for tasks that are I/O-bound or CPU-intensive. By making use of these keywords, programmers can easily handle long-running operations like invoking external APIs to get data or reading from and writing to a network drive. This helps in developing responsive applications that can handle concurrent operations.
View more...

Integrate Spring With Open AI
Aggregated on: 2024-08-14 14:22:56
In this article, I will discuss, in a practical and objective way, the integration of the Spring framework with the resources of the OpenAI API, one of the main artificial intelligence products on the market. The use of artificial intelligence resources is becoming increasingly necessary in several products, and presenting its application in a Java solution through the Spring framework allows the huge number of projects currently in production to benefit from this resource.
View more...

Enhance Your Communication Strategy: Deliver Multimedia Messages With AWS Pinpoint
Aggregated on: 2024-08-14 13:22:56
In today's digital world, email is the go-to channel for effective communication, with attachments containing flyers, images, PDF documents, etc. However, there could be business requirements for building a service that sends an SMS with an attachment as an MMS (Multimedia Messaging Service). This article delves into how to send multimedia messages (MMS), their limitations, and implementation details using the AWS Pinpoint cloud service.
Setting Up AWS Pinpoint Service
Setting Up the Phone Pool
In the AWS console, we navigate to AWS End User Messaging and set up the phone pool. The phone pool comprises the phone numbers from which we will send the message; these are the numbers from which the end user will receive the MMS message.
View more...
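For the MMS entry above, the following is a hypothetical boto3 sketch of what sending a media message through AWS End User Messaging (Pinpoint SMS and Voice v2) can look like. The operation name, parameter names, pool ID, and S3 media URI are assumptions for illustration only; verify them against the boto3 documentation for your SDK version.

```python
# Hypothetical sketch of sending an MMS with boto3 against the
# Pinpoint SMS and Voice v2 API. All identifiers below are invented;
# treat the call signature as an assumption to check in the AWS docs.
import boto3

client = boto3.client("pinpoint-sms-voice-v2", region_name="us-east-1")

response = client.send_media_message(
    DestinationPhoneNumber="+15555550123",       # recipient (illustrative)
    OriginationIdentity="pool-abc123",           # phone pool or number from the setup above
    MessageBody="Here is the flyer you requested.",
    MediaUrls=["s3://my-mms-assets/flyer.pdf"],  # media object stored in S3 (assumed bucket/key)
)
print(response["MessageId"])
```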
Virtual Clusters: The Key to Taming Cloud Costs in the Kubernetes Era
Aggregated on: 2024-08-13 22:22:56
The economic volatility in the tech industry has most enterprises looking at their cloud bills and searching for deterministic ways to drive down costs. One of the interesting layers of that consideration is the cloud architecture itself. The following is an interview with Loft Labs CEO Lukas Gentele - the creator of the vCluster open-source project - to learn more about how virtualizing clusters is giving developers and platform teams productive new ways to right-size cloud resource utilization.
Interview
Question 1: What’s different about the cloud cost outlook today, compared to recent years, from your point of view?
View more...

Navigation From Adobe Reader With Konva.js
Aggregated on: 2024-08-13 21:22:56
In one of the popular PDF viewers, Acrobat Reader, there is a tool called the "Hand Tool." It allows you to navigate through the document by dragging and dropping while holding down the mouse button. You can activate the "Hand Tool" by clicking the corresponding button on the toolbar.
“Hand Tool” button in Acrobat Reader
View more...

Our Shift From Cypress to Playwright in Testing
Aggregated on: 2024-08-13 20:22:56
In our post about extensive-react-boilerplate updates, we mentioned that we migrated e2e testing from Cypress to Playwright. Now, let's delve a little deeper into this change. At the time of writing the automated tests, we had a small amount of functionality to cover and didn't face significant limitations when using Cypress. Yet, we decided to turn our attention to Playwright for several reasons. We wanted to explore the framework created by Microsoft and understand why it is gaining popularity. Additionally, similar to the case when we added MongoDB support, we received requests from the community and colleagues who wanted to start a project based on the boilerplate with Playwright tests.
View more...

Secrets Management Core Practices
Aggregated on: 2024-08-13 19:37:56
Secrets management plays a pivotal role in any modern security environment, and its importance is highlighted time and time again as we witness security breaches across industries, including occurrences directly caused by the improper safeguarding or mishandling of secrets. In this Refcard, readers will learn about the core practices for a centralized secrets management strategy — from initial steps in creating a single source of truth to key measures for secrets injection, automation, compliance, monitoring, and more.
View more...

Why and How We Built a Primary-Replica Architecture of ClickHouse
Aggregated on: 2024-08-13 19:22:56
Our company uses artificial intelligence (AI) and machine learning to streamline the comparison and purchasing process for car insurance and car loans. As our data grew, we had problems with AWS Redshift, which was slow and expensive. Changing to ClickHouse made our query performance faster and greatly cut our costs, but it also brought storage challenges like disk failures and data recovery. To avoid extensive maintenance, we adopted JuiceFS, a high-performance distributed file system. We innovatively use its snapshot feature to implement a primary-replica architecture for ClickHouse. This architecture ensures high availability and stability of the data while significantly enhancing system performance and data recovery capabilities. Over more than a year, it has operated without downtime or replication errors, delivering the expected performance.
View more...

Building a Twilio Softphone With JavaScript, HTML, and Flask
Aggregated on: 2024-08-13 18:22:56
Being able to dial and receive calls right in our web browser has a huge value proposition in today's digital era. Whether you're creating a customer service dashboard or simply trying to add voice communication to your application, building a softphone is an excellent example of leveraging modern web technologies. In this article, I will show you how to build a softphone with some amazing features using Twilio, plain JavaScript, and HTML, while writing less code by relying on the methods already available.
Prerequisites
First things first, make sure you have the following:
View more...

What Does It Take to Manage an On-Premise vs Cloud Data Security Product?
Aggregated on: 2024-08-13 17:22:56
Before we ponder this question, let’s first understand the major differences between an on-premise and a cloud data security product. An on-premise data security product means the management console is on the enterprise customer’s premises, whereas the security vendor hosts a cloud data security product in the cloud. Security vendors aid customers by providing clear guidance on installing and maintaining an on-premise solution. Customers are responsible for the hardware, OS, and product configuration to protect against threats and sensitive data leaks. The security vendors manage cloud solutions, and enterprise customers must configure the product to meet their needs. You might ask, "Why would an enterprise customer take on the burden of managing the installation and maintenance of the product, given that configuring and making it work already poses a significant toll on them?" Great question! Not all enterprise customers are comfortable having their data stored in the cloud along with other customers (typically, this happens in a multi-tenant cloud deployment). You might wonder why not opt for a private cloud. Of course, they can opt for a private cloud; nevertheless, the data is not entirely under their control. Security vendors manage the account and the data.
View more...

Avoiding Prompt-Lock: Why Simply Swapping LLMs Can Lead to Failure
Aggregated on: 2024-08-13 16:22:56
The mantra in the world of generative AI models today is "the latest is the greatest," but that’s far from the case. We are lured (and spoiled) by choice, with new models popping up left and right. Good problem to have? Maybe, but it comes with a big downside: model fatigue. There’s an issue that has the potential to wreak havoc on your ML initiatives: prompt lock-in. Models today are so accessible that at the click of a button, anyone can virtually begin prototyping by pulling models from a repository like HuggingFace. Sounds too good to be true? That’s because it is. There are dependencies baked into models that can break your project. The prompt you perfected for GPT-3.5 will likely not work as expected in another model, even one with comparable benchmarks or from the same “model family.” Each model has its own nuances, and prompts must be tailored to these specificities to get the desired results.
View more...
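To illustrate the prompt lock-in point above, here is a small, hypothetical Python sketch showing how the same instruction has to be wrapped differently for different model families. The template formats and function names are simplified assumptions for illustration, not prescriptions from the article.

```python
# Toy illustration of prompt lock-in (not from the article above): the same
# instruction is packaged differently depending on the target model family.
INSTRUCTION = "Summarize the following incident report in three bullet points."

def for_chat_api(instruction: str, text: str) -> list[dict]:
    # Chat-style models usually take a list of role-tagged messages.
    return [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": f"{instruction}\n\n{text}"},
    ]

def for_instruct_model(instruction: str, text: str) -> str:
    # Many instruction-tuned models expect a single string with special tags.
    return f"<s>[INST] {instruction}\n\n{text} [/INST]"

report = "The 03:00 deploy failed because the migration locked the orders table."
print(for_chat_api(INSTRUCTION, report))
print(for_instruct_model(INSTRUCTION, report))
```

Keeping such templates (and their evaluation results) per model is one simple way to avoid being silently locked into a single provider's prompt format.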
Building Product To Learn AI, Part 2: Shake and Bake
Aggregated on: 2024-08-13 15:22:56
If you haven't already, be sure to review Part 1, where we reviewed data collection and prepared a dataset for our model to train on. In the previous section, we gathered the crucial "ingredients" for our AI creation — the data. This forms the foundation of our model. Remember, the quality of the ingredients (your data) directly impacts the quality of the final dish (your model's performance).
View more...

Neural Networks: From Perceptrons to Deep Learning
Aggregated on: 2024-08-13 14:22:56
From being inspired by the human brain to developing sophisticated models that enable remarkable feats, the journey of neural networks has come a long way. In the following blog, we will discuss in depth the technical journey of neural networks — from the basic perceptron to the advanced deep learning architectures driving AI innovations today.
The Human System
The human brain contains an estimated 86 billion neurons, densely interconnected via synapses. Each neuron receives signals through its dendrites, processes them in the soma, and sends its output down the axon to post-synaptic neurons. This complex network is how the brain is able to process vast amounts of information and perform exceedingly complex tasks.
View more...

A Hands-On Guide to OpenTelemetry: Programmatic Instrumentation for Developers
Aggregated on: 2024-08-13 13:22:56
Are you ready to start your journey on the road to collecting telemetry data from your applications? Great observability begins with great instrumentation! In this series, you'll explore how to adopt OpenTelemetry (OTel) and how to instrument an application to collect tracing telemetry. You'll learn how to leverage out-of-the-box automatic instrumentation tools and understand when it's necessary to explore more advanced manual instrumentation for your applications. By the end of this series, you'll have an understanding of how telemetry travels from your applications to the OpenTelemetry Collector, and be ready to bring OpenTelemetry to your future projects. Everything discussed here is supported by a hands-on, self-paced workshop authored by Paige Cruz.
View more...

DynamoDB: How To Move Out
Aggregated on: 2024-08-12 22:22:55
Moving data from one place to another is conceptually simple. You simply read from one data source and write to another. However, doing that consistently and safely is another story. There are a variety of mistakes you can make if you overlook important details. We recently discussed the top reasons so many organizations are currently seeking DynamoDB alternatives. Beyond costs (the most frequently mentioned factor), aspects such as throttling, hard limits, and vendor lock-in are frequently cited as motivation for a switch.
View more...

How to Document Your AWS Cloud Infrastructure Using Multicloud-Diagrams Framework
Aggregated on: 2024-08-12 21:22:55
The Importance of Infrastructure Diagrams in Architecture
In the world of cloud computing and complex distributed systems, creating infrastructure diagrams is vital for understanding, designing, and communicating the architecture of our applications. These diagrams serve as visual blueprints that help teams grasp the layout, connections, and workflows within their systems. They also play a crucial role in documentation, troubleshooting, and scaling operations. This article explores the importance of infrastructure diagrams, introduces the multicloud-diagrams framework, and explains the concept of Diagrams as Code. We will use AWS cloud nodes and services, but on-prem nodes are also available.
View more...
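To give a feel for the Diagrams-as-Code concept mentioned above, here is a short sketch using the widely used Python "diagrams" package (it also requires Graphviz to be installed). Note that this is a generic illustration of the concept, not necessarily the multicloud-diagrams framework the entry refers to, and the node names are invented.

```python
# "Diagrams as Code" illustration with the Python "diagrams" package
# (pip install diagrams). Generic sketch; node names are made up.
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

# Renders web_service.png describing a small AWS topology.
with Diagram("Web Service", filename="web_service", show=False):
    ELB("load balancer") >> [EC2("web-1"), EC2("web-2")] >> RDS("orders-db")
```

Because the topology lives in version control as code, diagram changes can be reviewed and regenerated alongside infrastructure changes, which is the main appeal of this approach.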
Over-Architected? Maybe, Maybe Not
Aggregated on: 2024-08-12 20:22:55
An oft-heard criticism of way too many software solutions is that they are over-architected, implying that the design, abstractions, implementation, deployment, or whatever is unnecessarily complex, difficult to understand, unmaintainable, unnecessary, or wrong. Criticisms are often thrown into the ether without context or supporting narrative; criticisms that often stick. So what's gained by labeling a solution as over-architected?
View more...

A Deep Dive Into Recommendation Algorithms With Netflix Case Study and NVIDIA Deep Learning Technology
Aggregated on: 2024-08-12 19:22:55
What Are Recommendation Algorithms?
Recommendation engines are the secret behind every Internet transaction, be it Amazon, Netflix, Flipkart, YouTube, TikTok, or even LinkedIn, Facebook, X (Twitter), Snapchat, Medium, Substack, HackerNoon... all of these sites, and nearly every content curation or product marketplace site on the Internet, make their big bucks from recommendation algorithms. Simply put, a recommendation algorithm builds a model of your likes, dislikes, favorites, preferred genres, and preferred items, and when a transaction is made on the site, it can practically read your mind and predict the next product you are most likely to buy. Some of the recommendation algorithms on YouTube and TikTok are so accurate that they can keep users hooked for hours. I would be surprised if even one reader did not report a YouTube binge that came out of just scrolling and clicking/tapping for around ten minutes.
View more...

OpenTelemetry Tracing on Spring Boot: Java Agent vs. Micrometer Tracing
Aggregated on: 2024-08-12 18:22:55
My demo of OpenTelemetry Tracing features two Spring Boot components. One uses the Java agent, and I noticed a different behavior when I recently upgraded it from v1.x to v2.x. In the other one, I'm using Micrometer Tracing because I compile to GraalVM native, and it can't process Java agents. I want to compare these three different ways in this post: Java agent v1, Java agent v2, and Micrometer Tracing.
View more...

PostgreSQL Support for Large Object Replication
Aggregated on: 2024-08-12 17:22:55
Replication of large objects isn't currently supported by the community version of PostgreSQL logical replication. If you try to replicate a large object with logical replication, PostgreSQL will return: "Large objects aren’t supported by logical replication." It's a meaningful error (always nice), but not helpful if you have large objects that you need to replicate. pgEdge has developed an extension named LargeObjectLOgicalReplication (LOLOR) that provides support for replicating large objects. The primary goal of LOLOR is to provide seamless replication of large objects with pgEdge Spock multi-master distributed replication.
View more...

Data Fusion: Enhancing Interoperability, Privacy, and Security
Aggregated on: 2024-08-12 16:22:55
Data is the backbone of AI systems, and though the concept of Big Data quenches the data thirst of most AI systems, much of that data is not readily fit for use. To fully understand the problem at hand, accurate and all-encompassing datasets are still needed. Data fusion has gained a lot of traction in digital applications in recent years because systems that feed on fused data are more efficient and make better decisions. The following narrative explains how this multifaceted approach not only streamlines various data utilization needs but also addresses the increasing challenges in the data management landscape.
View more...
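As a toy sketch of the data fusion idea above (not code from the article), the following pandas snippet combines two partial, overlapping sources into a more complete record set. The column names and values are invented for illustration.

```python
# Toy data-fusion sketch: merge two incomplete sources into one fused view.
# Real fusion pipelines also handle conflicts, provenance, and privacy rules.
import pandas as pd

crm = pd.DataFrame(
    {"customer_id": [1, 2, 3],
     "email": ["a@x.com", None, "c@x.com"],
     "city": ["Austin", "Boston", None]}
).set_index("customer_id")

billing = pd.DataFrame(
    {"customer_id": [2, 3, 4],
     "email": ["b@x.com", None, "d@x.com"],
     "city": [None, "Chicago", "Denver"]}
).set_index("customer_id")

# Prefer CRM values, fall back to billing where CRM is missing.
fused = crm.combine_first(billing)
print(fused)
```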
The Need for Application Security Testing
Aggregated on: 2024-08-12 15:22:55
Security plays a key role whether you are onboarding customer workloads to the cloud, designing and developing a new product, or upgrading an existing service. Security is critical in every leg of the software development life cycle (SDLC). Application security is important, as attackers and cybercriminals will target your software, looking for vulnerabilities with the intent to steal data or disrupt operations. To address these challenges, the software industry came up with defensive approaches to application security testing, which are broadly divided into three categories: SAST (static application security testing), DAST (dynamic application security testing), and IAST (interactive application security testing).
View more...

Java Concurrency: The Happens-Before Guarantee
Aggregated on: 2024-08-12 14:22:55
Usually, when we write code, we assume that it is executed in the same sequence as it was written. This is not the case, since for optimization purposes a re-ordering of statements happens at either compile time or runtime. Regardless, when a thread runs a program, the result should be as if all of the actions occurred in the order they appear in the program. The execution of a single-threaded program should follow as-if-serial semantics. Optimizations and re-orderings can be introduced as long as the result is guaranteed to be the same as if the statements had been executed sequentially.
View more...

Istio Ambient Mesh Performance Test and Benchmarking
Aggregated on: 2024-08-12 13:22:55
Istio is the most popular service mesh, but the DevOps and SRE community constantly complains about its performance. Istio Ambient is a sidecar-less approach by the Istio committee (majorly driven by SOLO.io) to improve performance. Since Ambient mesh is widely promoted as production-ready, many of our prospects and enterprise customers are eager to try it or migrate to it. Architecturally, the Istio Ambient mesh is a great design that aims to improve performance, but whether it actually performs faster is still a question. We have tried Istio Ambient mesh and measured its performance countless times between January 2024 and July 2024, and we have yet to see any significant performance gains.
View more...

Apache Kafka + Flink + Snowflake: Cost-Efficient Analytics and Data Governance
Aggregated on: 2024-08-10 15:07:54
Snowflake is a leading cloud data warehouse that is transitioning into a data cloud enabling various use cases. The major drawback of this evolution is the significantly growing cost of data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a "shift left architecture" where business teams can reduce cost, provide better data quality, and process data more efficiently. The real-time capabilities and the unification of transactional and analytical workloads using Apache Iceberg's open table format enable new use cases and a best-of-breed approach without vendor lock-in, with a choice of analytical query engines like Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.
Snowflake and Apache Kafka
Snowflake is a leading cloud-native data warehouse. Its usability and scalability made it a prevalent data platform in thousands of companies. This blog series explores different data integration and ingestion options, including traditional ETL/iPaaS and data streaming with Apache Kafka. The discussion covers why point-to-point Zero-ETL is only a short-term win, why Reverse ETL is an anti-pattern for real-time use cases, and when a Kappa Architecture and shifting data processing “to the left” into the streaming layer helps to build transactional and analytical real-time and batch use cases in a reliable and cost-efficient way.
View more...
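As a small, generic illustration of the "shift left" idea above (not code from the article), the sketch below publishes an already-enriched event to a Kafka topic with the kafka-python client, so every downstream consumer reads the same cleaned stream. The broker address, topic name, and payload are assumptions.

```python
# Minimal sketch of producing events to Kafka with kafka-python
# (pip install kafka-python). Broker, topic, and payload are illustrative.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# An upstream service enriches the event once, close to the source,
# so analytics engines downstream all consume the same trusted stream.
event = {"order_id": 42, "status": "PAID", "amount": 19.99}
producer.send("orders.enriched", value=event)
producer.flush()
```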
Demystifying the Magic: A Look Inside the Algorithms of Speech Recognition
Aggregated on: 2024-08-09 20:07:54
It seems every commercial device now features some implementation of, or an attempt at, speech recognition. From cross-platform voice assistants to transcription services and accessibility tools, and more recently a differentiator for LLMs — dictation has become an everyday user interface. With the market size of voice user interfaces (VUI) projected to grow at a CAGR of 23.39% from 2023 to 2028, we can expect many more tech-first companies to adopt it. But how well do you understand the technology? Let's start by dissecting and defining the most common technologies that go into making speech recognition possible.
View more...

Why I Use RTK Query for API Calls in React
Aggregated on: 2024-08-09 18:52:54
The RTK Query part of the Redux Essentials tutorial is phenomenal, but since it’s part of a much larger suite of documentation, I feel like the gem that is RTK Query is getting lost.
What Is Redux?
Many people think of Redux as a state management library, which it is. To them, the main value of Redux is that it makes it possible to access (and change) the application state from anywhere in the application. This misses the point of using something like Redux, so let’s zoom out a bit and take another look.
View more...

Use Mistral AI To Build Generative AI Applications With Go
Aggregated on: 2024-08-09 17:07:53
Mistral AI offers models with varying characteristics across performance, cost, and more:
- Mistral 7B: The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration
- Mixtral 8x7B: A sparse mixture-of-experts model
- Mistral Large: Ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents)
Let's walk through how to use these Mistral AI models on Amazon Bedrock with Go, and in the process, also get a better understanding of their prompt tokens.
View more...

Content Detection Technologies in Data Loss Prevention (DLP) Products
Aggregated on: 2024-08-09 15:52:53
Having worked with enterprise customers for a decade, I still see potential gaps in data protection. This article addresses the key content detection technologies in a Data Loss Prevention (DLP) product that developers need to focus on while building a first-class solution. First, let’s look at a brief overview of the functionalities of a DLP product before diving into detection.
Functionalities of a Data Loss Prevention Product
The primary functionalities of a DLP product are policy enforcement, data monitoring, sensitive data loss prevention, and incident remediation. Policy enforcement allows security administrators to create policies and apply them to specific channels or enforcement points. These enforcement points include email, network traffic interceptors, endpoints (including BYOD), cloud applications, and data storage repositories. Sensitive data monitoring focuses on protecting critical data from leaking out of the organization's control, ensuring business continuity. Incident remediation may involve restoring data with proper access permissions, data encryption, blocking suspicious transfers, and more.
View more...
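To make the DLP content detection entry above more concrete, here is a toy Python sketch of pattern-based detection, one of several techniques a DLP product might combine with fingerprinting and ML classifiers. The regexes are deliberately simplified illustrations, not production-grade detectors.

```python
# Toy sketch of pattern-based sensitive-content detection for a DLP-style
# scanner. The patterns below are simplified for illustration only.
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def detect_sensitive_content(text: str) -> dict[str, list[str]]:
    """Return every match per category found in the given text."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items() if rx.search(text)}

sample = "Contact: 123-45-6789, key AKIAABCDEFGHIJKLMNOP"
print(detect_sensitive_content(sample))
```

Real products layer validation (checksums, context keywords, proximity rules) on top of raw patterns to keep false positives manageable.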
Connecting ChatGPT to Code Review Made Easy
Aggregated on: 2024-08-09 14:07:53
The era of artificial intelligence is already in bloom. Everyone working in IT is already familiar with our "new best friend" for development — AI. Working as a DevOps Engineer at Innovecs, I’d like to share one of my latest findings.
Concept
Would you like every pull/merge request to be checked by ChatGPT-4 first and then by you? Do you want instant feedback on code changes before your colleagues see them? How about detecting who committed confidential data or API keys, and where, with the ability to tag the "culprit" for correction immediately? We’re perfectly aware that GPT can generate code quite well... but it turns out it can review it just as smoothly! I will immediately show how this works in practice (parts of the code are blurred to avoid showing too much).
View more...

The Case for Working on Non-Glamorous Migration Projects
Aggregated on: 2024-08-08 23:07:53
In my 13 years of engineering experience, I have seen many people make career decisions based on the opportunity to work on a brand-new service. There is nothing wrong with that decision. However, today we are going to make a contrarian case for working on boring migration projects. What I did not realize early in my career was that most of my foundational software development learning came from migration projects — e.g., migrating an underlying data store to another cloud-based technology, or deprecating a monolithic service in favor of new microservices. This is because migrations are inherently hard: you are forced to meet, if not exceed, an existing bar on availability, scale, latency, and customer experience that was built and honed over the years by multiple engineers. You won’t face those constraints on a brand-new system because you are free to define them. Not only that, no matter how thorough you are with migrations, there will be hidden skeletons in the closet to deal with when you switch over to new parts of the system (check out this interesting article on how Doordash’s migration from Int to BigInt for a database field was fraught with blockers).
View more...

Batch vs. Real-Time Processing: Understanding the Differences
Aggregated on: 2024-08-08 21:37:53
The decision between batch and real-time processing is a critical one, shaping the design, architecture, and success of our data pipelines. While both methods aim to extract valuable insights from data, they differ significantly in their execution, capabilities, and use cases. Understanding the key distinctions between these two processing paradigms is crucial for organizations to make informed decisions and harness the full potential of their data. Key definitions can be summarized as follows:
View more...

Apache Flink 101: A Guide for Developers
Aggregated on: 2024-08-08 20:52:53
In recent years, Apache Flink has established itself as the de facto standard for real-time stream processing. Stream processing is a paradigm for system building that treats event streams (sequences of events in time) as its most essential building block. A stream processor, such as Flink, consumes input streams produced by event sources and produces output streams that are consumed by sinks (the sinks store results and make them available for further processing). Household names like Amazon, Netflix, and Uber rely on Flink to power data pipelines running at tremendous scale at the heart of their businesses, but Flink also plays a key role in many smaller companies with similar requirements for being able to react quickly to critical business events.
View more...
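As a quick taste of the source-transform-sink model described in the Flink entry above, here is a minimal PyFlink sketch (Flink's Python API, installable as the apache-flink package). It is a generic illustration, not code from the guide, and the sample data and job name are invented.

```python
# Minimal PyFlink DataStream sketch: source -> transformation -> sink.
# Requires: pip install apache-flink
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)

# Source: a tiny in-memory collection standing in for a real event stream.
events = env.from_collection([("user-1", 3), ("user-2", 5), ("user-1", 7)])

# Transformation: double each event's value; sink: print to stdout.
events.map(lambda e: (e[0], e[1] * 2)).print()

env.execute("doubling-job")
```

In a real deployment the source would be something like a Kafka topic and the sink a database or another topic, but the pipeline shape stays the same.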
How To Create a CRUD Application in Less Than 15 Minutes
Aggregated on: 2024-08-08 18:07:53
CRUD applications form the backbone of most software projects today. If you're reading this article, chances are your project encountered some challenges, you’re seeking a faster way to accomplish this task, or you are looking for a Java framework to start with. You're not alone. With the tech world constantly evolving, especially with tighter budgets, there's a noticeable shift towards frameworks that bring everything under one roof to reduce the need for oversized teams.
View more...

Running PyTorch on GPUs
Aggregated on: 2024-08-08 16:07:53
Running an AI workload on a GPU machine requires the installation of kernel drivers and user-space libraries from GPU vendors such as AMD and NVIDIA. Once the driver and software are installed, to use AI frameworks such as PyTorch and TensorFlow, one needs to use the proper framework build for the GPU target. AI applications usually run on top of popular AI frameworks and as such hide the tedious installation steps. This article highlights the importance of the hardware, drivers, software, and frameworks for running AI applications or workloads. It deals with the Linux operating system, the ROCm software stack for AMD GPUs, the CUDA software stack for NVIDIA GPUs, and PyTorch as the AI framework. Docker plays a critical part in bringing up the entire stack, allowing various workloads to be launched in parallel.
View more...

JavaScript Frameworks: The Past, the Present, and the Future
Aggregated on: 2024-08-08 15:07:53
When we talk about web development, we cannot help but mention JavaScript. Throughout the past several decades, JavaScript frameworks have been the backbone of web development, defining its direction. The capabilities of JavaScript tools have been steadily growing, enabling the creation of faster, more complex, and more efficient websites. This evolution has made a huge leap from jQuery to React, Angular, and Vue.js. We will look at the major milestones in the evolution of JavaScript frameworks that have defined web development as we know it today.
The Early Days: jQuery and Its Impact
jQuery was created in 2005 by developer John Resig, who set out to realize an idea that was audacious for its time: making JavaScript code writing fun. To achieve this daring goal, he stripped common and repetitive tasks of excessive boilerplate and made them short and understandable. This simple recipe helped him create the most popular JavaScript library in the history of the internet.
View more...

How To Scale RAG and Build More Accurate LLMs
Aggregated on: 2024-08-08 14:07:53
Retrieval augmented generation (RAG) has emerged as a leading pattern to combat the hallucinations and other inaccuracies that affect large language model content generation. However, RAG needs the right data architecture around it to scale effectively and efficiently. A data streaming approach grounds the optimal architecture for supplying LLMs with large volumes of continuously enriched, trustworthy data to generate accurate results. This approach also allows data and application teams to work and scale independently to accelerate innovation. Foundational LLMs like GPT and Llama are trained on vast amounts of data and can often generate reasonable responses about a broad range of topics, but they do generate erroneous content. As Forrester noted recently, public LLMs “regularly produce results that are irrelevant or flat wrong,” because their training data is weighted toward publicly available internet data. In addition, these foundational LLMs are completely blind to the corporate data locked away in customer databases, ERP systems, corporate Wikis, and other internal data sources. This hidden data must be leveraged to improve accuracy and unlock real business value.
View more...
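To ground the RAG entry above, here is a minimal retrieval sketch in Python. It is an illustration of the pattern, not the streaming architecture the article describes: documents are ranked against a question with TF-IDF and the best matches are stuffed into an LLM prompt. A production setup would use embeddings, a vector store, and continuously refreshed corporate data.

```python
# Minimal RAG-style retrieval sketch (illustrative documents and question).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Refunds for enterprise plans are processed within 5 business days.",
    "The orders service exposes a REST endpoint at /v1/orders.",
    "On-call engineers rotate weekly; the schedule lives in the team wiki.",
]
question = "How long do enterprise refunds take?"

vectorizer = TfidfVectorizer().fit(documents + [question])
doc_vectors = vectorizer.transform(documents)
query_vector = vectorizer.transform([question])

# Pick the top-2 most similar documents as context for the model.
scores = cosine_similarity(query_vector, doc_vectors)[0]
top_docs = [documents[i] for i in scores.argsort()[::-1][:2]]

prompt = "Answer using only this context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to the LLM of your choice
```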
Leveraging Snowflake’s AI/ML Capabilities for Anomaly Detection
Aggregated on: 2024-08-08 13:22:53
Anomaly detection is the process of identifying deviations from expected results in time-series data. Such deviations can have a huge impact on forecasting models if they are not identified before model creation. The Snowflake Cortex AI/ML suite helps you train models to spot and correct these outliers in order to improve the quality of your results. Detecting outliers also helps in identifying the source of deviations in processes. Anomaly detection works with both single- and multi-series data. Multi-series data represents multiple independent threads of events. For example, if you have sales data for multiple stores, each store’s sales can be checked separately by a single model based on the store identifier. These outliers can be detected in time-series data using the Snowflake built-in class SNOWFLAKE.ML.ANOMALY_DETECTION.
View more...

Semi-Supervised Learning: How To Overcome the Lack of Labels
Aggregated on: 2024-08-07 21:07:53
All successfully implemented machine learning models are backed by at least two strong components: data and model. In my discussions with ML engineers, I have heard many times that, instead of spending a significant amount of time on data preparation, including labeling for supervised learning, they would rather spend their time on model development. For most problems, labeling huge amounts of data is far more difficult than obtaining it in the first place. Unlabeled data fails to provide the desired accuracy during training, and labeling huge datasets for supervised learning can be time-consuming and expensive. What if the data labeling budget is limited? What data should be labeled first? These are just some of the daunting questions facing ML engineers who would rather be doing productive work instead.
View more...
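Relating to the semi-supervised learning entry above, here is a short, hedged illustration of one such technique (self-training) with scikit-learn; the article may well cover different methods. Unlabeled samples are marked with -1, and the model iteratively assigns labels to them itself.

```python
# Self-training illustration on synthetic data: only ~5% of labels are kept,
# the rest are marked -1 (unlabeled). Sizes and thresholds are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

# Pretend we could only afford to label 5% of the data.
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) > 0.05] = -1

model = SelfTrainingClassifier(LogisticRegression(max_iter=1_000), threshold=0.9)
model.fit(X, y_partial)
print(f"Accuracy against the true labels: {model.score(X, y):.3f}")
```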
10 Kubernetes Cost Optimization Techniques
Aggregated on: 2024-08-07 20:07:53
These are 10 strategies for reducing Kubernetes costs. We’ve split them into pre-deployment, post-deployment, and ongoing cost optimization techniques to help people at the beginning and middle of their cloud journeys, as well as those who have fully adopted the cloud and are just looking for a few extra pointers. So, let’s get started.
View more...

Docker vs. Podman: Exploring Container Technologies for Modern Web Development
Aggregated on: 2024-08-07 19:07:52
Docker and Podman are among the most widely used containerization technologies in software development. Examining their use cases, benefits, and limitations, this article offers a thorough comparison of Docker and Podman. We will also go over practical cases of deploying web apps with both technologies, highlighting important commands and considerations for producing container images.
Introduction
Containerization has become an essential technique for creating, transporting, and executing applications with unmatched uniformity across various computing environments. Docker, a pioneer in this field, has transformed software development practices by introducing developers to the capabilities and adaptability of containers. This technology employs containerization to package an application and all its necessary components into a self-contained unit, providing consistent functionality regardless of variations in development, staging, and production environments.
View more...
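For the Docker vs. Podman entry above, here is a small sketch using the Docker SDK for Python (docker-py). Because Podman exposes a Docker-compatible REST API, the same client code can usually talk to Podman by pointing the DOCKER_HOST environment variable at the Podman socket; the socket path, image tag, and port mapping below are assumptions for illustration.

```python
# Build and run a web app image via the Docker SDK for Python
# (pip install docker). Works against Docker, and typically against Podman's
# Docker-compatible API when DOCKER_HOST points at the Podman socket, e.g.
# unix:///run/user/1000/podman/podman.sock (path varies by setup).
import docker

client = docker.from_env()  # honors DOCKER_HOST, so it can target Docker or Podman

# Build an image from the Dockerfile in the current directory and run it.
image, _build_logs = client.images.build(path=".", tag="my-web-app:dev")
container = client.containers.run(
    image.id,
    detach=True,
    ports={"8000/tcp": 8000},  # map container port 8000 to the host
)
print(container.short_id, container.status)
```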