4 - 5 December 2024 | Geneva, Switzerland
View More Details & Registration

The Sched app allows you to build your schedule but is separate from your event registration. You must be registered for Cephalocon 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Central European Time. To see the schedule in your preferred timezone, select from the drop-down menu located at the bottom of the menu to the right.
Sign up or log in to bookmark your favorites and sync them to your phone or calendar.
Auditorium C
Wednesday, December 4
 

10:55 CET

Beyond Particle Physics: The Impact of Ceph and OpenStack on CERN's Multi-Datacenter Cloud Strategy - Enrico Bocchi & Jose Castro Leon, CERN
Wednesday December 4, 2024 10:55 - 11:30 CET
CERN IT operates a large-scale storage and computing infrastructure at the service of scientific research and its user community: Ceph provides block, object, and file storage at a scale of 100 PBs, while OpenStack provisions bare-metal nodes, VMs, and virtual networking, managing more than 450k CPUs. With the advent of a new computing center a few kilometers away from the main campus, compute and storage resources have been re-imagined to extend the capabilities offered by the infrastructure, with clear upfront design choices favoring availability and ease of operations. In this presentation we report on how the new computing center was designed to host compute and storage resources cohesively, how integration with the existing computing center was achieved, and which new capabilities have been unlocked thanks to the newly built DC. For Ceph in particular, we share insights on achieving data locality with compute resources, deploying a multi-site object storage service, and running a CephFS service that spans both data centers.
Speakers
avatar for Enrico Bocchi

Enrico Bocchi

Ceph Technical Lead, CERN
Enrico is a Computing Engineer at CERN, where he has worked for the past 7 years on distributed storage systems. He is responsible for operating and evolving critical production services at the scale of hundreds of PBs, including Ceph block and object storage. Enrico holds a joint-PhD... Read More →
avatar for Jose Castro Leon

Jose Castro Leon

Cloud Technical Leader, CERN
Jose is the Technical Leader for the CERN Cloud Infrastructure Service. He holds an MSc in Computer Science from Universidad de Oviedo. He joined CERN in 2010 and has since worked first on virtualisation before becoming part of the cloud team that built CERN's OpenStack-based... Read More →
Wednesday December 4, 2024 10:55 - 11:30 CET
Auditorium C
  Session Presentation
  • Audience Level Any

11:40 CET

Keeping Ceph RGW Object Storage Consistent - Jane Zhu, Bloomberg
Wednesday December 4, 2024 11:40 - 12:15 CET
Data powers Bloomberg’s financial products. Ceph clusters are the backbone of Bloomberg’s internal S3 cloud storage systems, which host this data and serve billions of requests a day. During our intensive usage of Ceph RGW object storage with multi-site settings, we have encountered different types of data inconsistencies, such as bucket-index and RADOS object inconsistencies, unfinished transactions, and multi-site replication inconsistencies. These may be caused by software bugs, race conditions, system timeouts, and other reasons. Since we cannot guarantee the system is always bug-free and operating smoothly, it’s crucial that we can identify an inconsistency – should it happen – and fix or report it. While there are existing tools and code in place to help address some of these issues, their usage has limitations. We are therefore proposing a scalable and extensible bucket scrubbing approach to systematically check, identify, and, where possible, fix inconsistencies in the RGW object storage system at the bucket level. This talk will discuss the design of this bucket scrubbing system and the prototype we are implementing at Bloomberg.
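As a rough, hedged illustration of the kind of bucket-level consistency check described above (not Bloomberg's scrubber), the sketch below compares the object listing of one bucket on two RGW multi-site endpoints with boto3; endpoints, credentials, and the bucket name are hypothetical.

```python
# Minimal sketch: diff a bucket's listing between two RGW multi-site endpoints
# and report keys or ETags that disagree. Endpoints/credentials are made up.
import boto3

def list_bucket(endpoint, access_key, secret_key, bucket):
    """Return {key: etag} for every object in the bucket."""
    s3 = boto3.client("s3", endpoint_url=endpoint,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    objects = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            objects[obj["Key"]] = obj["ETag"]
    return objects

primary = list_bucket("http://rgw-site-a:8080", "AK", "SK", "mybucket")
secondary = list_bucket("http://rgw-site-b:8080", "AK", "SK", "mybucket")

for key in sorted(set(primary) | set(secondary)):
    if primary.get(key) != secondary.get(key):
        print(f"inconsistent: {key} "
              f"site-a={primary.get(key)} site-b={secondary.get(key)}")
```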
Speakers
avatar for Jane Zhu

Jane Zhu

Senior Software Engineer, Bloomberg
Dr. Jane Zhu is a Senior Software Engineer in the Storage Engineering team at Bloomberg. Jane and her team designed and built a highly available, scalable, and durable software-defined cloud storage platform inside the Bloomberg ecosystem. Jane worked in the industry for more than... Read More →
Wednesday December 4, 2024 11:40 - 12:15 CET
Auditorium C
  Session Presentation
  • Audience Level Any

13:30 CET

Architecting Cloud Storage for AI Native Applications - Nathan Goulding, Vultr
Wednesday December 4, 2024 13:30 - 13:45 CET
Cloud compute rearchitected how data is stored and managed to support global enterprise applications. AI is transforming cloud compute, putting the GPU at the core of delivering new AI-driven services for employees and customers. Training and running inference on new AI models requires a fundamental change in storage architecture to support the massive data requirements of AI workloads. In this session, learn how Vultr is pioneering a new architecture for cloud storage to support new AI-native applications.
Speakers
avatar for Nathan Goulding

Nathan Goulding

Senior Vice President, Engineering, Vultr
Nathan Goulding is an entrepreneurial-minded, product-focused technical leader with over 20 years of infrastructure, platform, and software-as-a-service experience. As SVP, Engineering at Vultr, Nathan leads the engineering and technical product management teams. Prior to Vultr, Nathan... Read More →
Wednesday December 4, 2024 13:30 - 13:45 CET
Auditorium C

13:50 CET

Remote Replication in MicroCeph: RBD and Beyond - Utkarsh Bhatt, Canonical
Wednesday December 4, 2024 13:50 - 14:05 CET
Remote replication (for block, file, and object workloads) is a highly desirable feature for backup, migration, and disaster recovery. Ceph offers a highly capable but non-homogeneous user experience for remote replication across the different workloads (RBD mirroring, CephFS mirroring, and RGW multisite). The Squid release of MicroCeph introduces a new set of APIs that expose standardized procedures for remote cluster awareness and remote replication for these Ceph workloads. This lightning talk will highlight the implementation details while demoing RBD remote replication in MicroCeph, and will outline the roadmap for CephFS and RGW remote replication.
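For orientation, a hedged sketch of the upstream RBD mirroring primitives that a tool like MicroCeph builds on (this is not the MicroCeph API itself); pool and image names are hypothetical.

```python
# Illustrative sketch of plain upstream RBD mirroring commands, driven from
# Python. Pool/image names are hypothetical.
import subprocess

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Enable per-image mirroring on a pool, then enable snapshot-based mirroring
# for one image on the primary cluster.
run("rbd", "mirror", "pool", "enable", "rbd", "image")
run("rbd", "mirror", "image", "enable", "rbd/vm-disk-1", "snapshot")

# Check mirroring status for the pool.
run("rbd", "mirror", "pool", "status", "rbd", "--verbose")
```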
Speakers
avatar for Utkarsh Bhatt

Utkarsh Bhatt

Software Engineer, Canonical
Utkarsh Bhatt is a Software Engineer in the Ceph Engineering team. His team is responsible for producing the packages, charms, snaps, rocks, and everything in between for Canonical's Ceph storage solutions. He graduated in 2020 and joined Canonical in May 2022 after working for... Read More →
Wednesday December 4, 2024 13:50 - 14:05 CET
Auditorium C

14:15 CET

The Art of Teuthology - Patrick Donnelly, IBM, Inc.
Wednesday December 4, 2024 14:15 - 14:50 CET
The Ceph project has used the Teuthology testing framework for much of its history. This custom framework is used to schedule batch jobs that perform e2e testing of Ceph. Testing is orchestrated using a suite of YAML fragments that vary test modes, configurations, workloads, and other parameters. Teuthology assembles these fragments into a static matrix with potentially dozens of dimensions, ultimately producing a combinatorial explosion of jobs which are, in practice, evaluated as smaller subsets for scheduling. We will explore an alternative directed-graph model for constructing jobs from a suite of YAML fragments using path walks. Code adapted to this model has been constructed to produce subsets in linear time and to provide Lua-scriptable control of YAML fragment generation. The latter new feature empowers us to test Ceph with more rigor and completeness. For example, upgrade suites can be constructed using all possible versions of Ceph that are valid upgrade paths to a target release. We will explore this and other enhancements in depth. The audience can expect to leave with a firm and visual understanding of how QA is performed on Ceph and a vision for future testing.
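A toy illustration (not Teuthology itself) of why combining independent groups of YAML fragments explodes into a large job matrix, and why only a subset is usually scheduled; the fragment names below are made up.

```python
# Combine fragment groups into a full matrix, then take every Nth job,
# mimicking a --subset style reduction.
from itertools import product

fragment_groups = {
    "msgr":        ["async.yaml", "async-v2only.yaml"],
    "objectstore": ["bluestore.yaml", "bluestore-comp.yaml", "filestore.yaml"],
    "workload":    ["radosbench.yaml", "rbd_cls.yaml", "snaps-few-objects.yaml"],
    "thrasher":    ["default.yaml", "mapgap.yaml"],
}

jobs = list(product(*fragment_groups.values()))
fragments = sum(len(v) for v in fragment_groups.values())
print(f"{len(jobs)} jobs from only {fragments} fragments")

subset = jobs[::8]   # schedule every 8th combination
print(f"scheduling {len(subset)} of {len(jobs)} jobs")
```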
Speakers
avatar for Patrick Donnelly

Patrick Donnelly

Software Architect, IBM, Inc.
Patrick Donnelly is a Software Architect at IBM, Inc. working as part of the global development team on the open source Ceph distributed storage system. Patrick has principally worked on the Ceph file system (CephFS) since 2016. He has been working on Open Source projects for the... Read More →
Wednesday December 4, 2024 14:15 - 14:50 CET
Auditorium C

15:00 CET

Exploring RocksDB in RGW: How We Manage Tombstones - Sungjoon Koh & Ilsoo Byun, LINE Plus Corporation
Wednesday December 4, 2024 15:00 - 15:35 CET
LINE, a global mobile messenger, has adopted Ceph as its main object storage. It is used to store different kinds of data, such as log files and application data. Thanks to its scalability, billions of objects are stored in our clusters. However, over time, object deletions lead to the accumulation of tombstones in RocksDB, resulting in delays during iteration. Slow iteration not only impacts LIST operations but also stalls subsequent requests. To address this issue, we first collected the RocksDB metric called "skip count", which indicates the total number of tombstones detected during iterations. We then deployed a new job which compacts OSDs with high skip counts to prevent stalls. Additionally, we analyzed the pattern of tombstones and found that a few prefixes account for over 80% of tombstones across an entire OSD. Based on this observation, we propose range-based compaction. In this presentation, we will first explain the basics of RocksDB and its role in Ceph object storage. Then, we will share our experience of how we handled the RocksDB issue. Lastly, we will discuss our proposal for range-based compaction, which could further optimize overall system performance.
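As a hedged sketch of the "compact OSDs with high skip counts" idea: `ceph tell osd.<id> compact` and `ceph tell osd.<id> perf dump` are standard commands, but the exact perf-counter name for the skip count is an assumption here and will differ per setup.

```python
# Sketch: trigger manual compaction on OSDs whose (assumed) tombstone-skip
# counter exceeds a threshold. Counter path and OSD ids are assumptions.
import json
import subprocess

SKIP_THRESHOLD = 1_000_000          # tune for your cluster
OSD_IDS = range(0, 12)              # hypothetical OSD ids

def perf_dump(osd_id):
    out = subprocess.check_output(["ceph", "tell", f"osd.{osd_id}", "perf", "dump"])
    return json.loads(out)

for osd_id in OSD_IDS:
    counters = perf_dump(osd_id)
    # Assumed counter path; adjust to the metric your OSDs actually expose.
    skips = counters.get("rocksdb", {}).get("rocksdb_skipped_keys", 0)
    if skips > SKIP_THRESHOLD:
        print(f"osd.{osd_id}: {skips} skipped keys, triggering compaction")
        subprocess.run(["ceph", "tell", f"osd.{osd_id}", "compact"], check=True)
```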
Speakers
avatar for Ilsoo Byun

Ilsoo Byun

Senior Manager, LINE Plus Corporation
Storage engineer at LINE
avatar for Sungjoon Koh

Sungjoon Koh

Cloud Storage Engineer, LINE Plus Corporation
Sungjoon Koh is a cloud storage engineer at LINE Plus Corporation, focusing on object storage and NVMe-oF-based block storage services. His current interests include enhancing Ceph's compatibility with the S3 standard and developing object migration features. Before joining LINE Plus... Read More →
Wednesday December 4, 2024 15:00 - 15:35 CET
Auditorium C

15:55 CET

Revisiting Ceph's Performance After 4 Years - Wido den Hollander, Your.Online
Wednesday December 4, 2024 15:55 - 16:30 CET
As new generations of hardware become available and Ceph is improved, how does its performance change? If we look back 4 years, how has Ceph's performance improved (or not)?
Speakers
avatar for Wido den Hollander

Wido den Hollander

CTO, Your.Online
Wido has been a part of the Ceph community for over 10 years: long-time user, developer, and advocate of the future of storage. He has worked as a Ceph consultant and trainer and is now CTO of Your.Online, a European-based hosting group with companies throughout Europe and a large Ceph... Read More →
Wednesday December 4, 2024 15:55 - 16:30 CET
Auditorium C

16:40 CET

Ceph Manager Module Design and Operation, an In-Depth Review - Brad Hubbard, Red Hat & Prashant Dhange, IBM Canada Ltd.
Wednesday December 4, 2024 16:40 - 17:15 CET
This session will cover the overall Ceph Manager design and operational aspects of the Ceph MGR daemon. We will begin by giving an introduction to the MGR architecture, then move on to discussing the functionality of the MGR DaemonServer, MGR client, Python module registry, and base MGR module, and the loading and unloading of MGR modules. We will then discuss module debugging, an example of GIL deadlock debugging, and how to troubleshoot MGR bugs and plugin issues. Finally, we discuss new features, including tracking MGR ops, and further improvements planned for future releases.
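Manager modules are Python classes loaded through the MGR's module registry. A minimal, hedged skeleton is shown below using the classic COMMANDS/handle_command interface (recent releases typically favor CLICommand decorators); the module and command names are hypothetical.

```python
# Minimal sketch of a Ceph Manager module. The module name and command are
# made up; real modules live in src/pybind/mgr/ and are enabled with
# `ceph mgr module enable <name>`.
from mgr_module import MgrModule

class Hello(MgrModule):
    COMMANDS = [
        {
            "cmd": "hello status",
            "desc": "Report the OSDMap epoch and OSD count seen by the module",
            "perm": "r",
        },
    ]

    def handle_command(self, inbuf, command):
        if command["prefix"] == "hello status":
            osd_map = self.get("osd_map")   # cluster maps served to modules
            msg = f"osdmap epoch {osd_map['epoch']}, {len(osd_map['osds'])} OSDs"
            return 0, msg, ""
        return -22, "", "unknown command"   # -EINVAL
```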
Speakers
avatar for Prashant Dhange

Prashant Dhange

Ceph rados core engineer, IBM Canada Ltd.
With 15+ years of experience in storage and cloud computing, Prashant is an experienced professional with a strong background in system programming. Prashant's focus lies in developing and optimizing storage solutions, particularly through his in-depth work with Ceph RADOS, a pivotal... Read More →
avatar for Brad Hubbard

Brad Hubbard

Principal Software Engineer, Red Hat
Involved in supporting and contributing to the Ceph project for well over ten years, most recently as a RADOS core engineer working on features and bugs, both upstream and down, as well as advocating for the customer and expediting their issues internally. I have a passion for complex... Read More →
Wednesday December 4, 2024 16:40 - 17:15 CET
Auditorium C

17:25 CET

Supporting 3 Availability Zones Stretch Cluster - Kamoltat (Junior) Sirivadhna, IBM
Wednesday December 4, 2024 17:25 - 18:00 CET
A Ceph cluster stretched across 3 zones faces a potential scenario where data loss can occur due to unforeseeable circumstances. An example of such a scenario is when we have 6 replicas spread across 3 datacenters with a min_size of 3, and the setup is intended to prevent I/O from happening when only 1 datacenter is available. However, there is an edge case where a placement group (PG) becomes available due to a lack of safeguarding during the process of temporary PG mappings intended to ensure data availability. This poses a risk: the sole surviving data center accepts writes, the 2 unavailable data centers then come back up, and at the same time the surviving data center goes down, leaving us with data loss. To prevent such a scenario from happening, we created a solution that utilizes an existing feature in stretch mode to restrict how we choose the OSDs that go into the acting set of a PG. This talk will take a deep dive into how this feature is implemented in the latest Ceph upstream, as well as other features that improve the user experience with stretch clusters in the latest Ceph upstream release.
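For reference, a hedged sketch of the pool shape described above (6 replicas, min_size 3); the pool name is hypothetical, a CRUSH rule spreading replicas across the 3 datacenters is assumed and not shown, and the new acting-set safeguards discussed in the talk live inside Ceph itself rather than in configuration.

```python
# Sketch: create a 6-replica pool with min_size 3 using standard commands.
import subprocess

POOL = "stretch-pool"   # hypothetical pool name

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

ceph("osd", "pool", "create", POOL, "128")
ceph("osd", "pool", "set", POOL, "size", "6")      # 2 replicas per datacenter
ceph("osd", "pool", "set", POOL, "min_size", "3")  # refuse I/O below 3 replicas

# Verify the settings.
ceph("osd", "pool", "get", POOL, "size")
ceph("osd", "pool", "get", POOL, "min_size")
```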
Speakers
avatar for Kamoltat (Junior) Sirivadhna

Kamoltat (Junior) Sirivadhna

Software Engineer RADOS, IBM
Junior has been a Ceph contributor for 4 years; some of his work includes enhancing stretch mode/stretch cluster features in Ceph and improving the PG autoscaler module. He also contributes to the enhancement of Teuthology, a Ceph integration testing framework that... Read More →
Wednesday December 4, 2024 17:25 - 18:00 CET
Auditorium C
  Session Presentation
  • Audience Level Any
 
Thursday, December 5
 

10:50 CET

SWITCH: Operations, Data Management and Automation - Theofilos Mouratidis, SWITCH
Thursday December 5, 2024 10:50 - 11:25 CET
SWITCH is the national research and education network (NREN) of Switzerland, a non-profit organisation that provides services to the universities and schools of the country. In the storage circle of the cloud team at SWITCH, we maintain and procure Ceph clusters, mainly for S3. We have 3 iterations that differ in terms of automation and features, namely OSv1/2/3. We develop the latest iteration using Ansible in a GitOps way, where the code is the source of truth and changes to the code automatically deploy configuration changes to the various clusters. In this session, we will talk about the OSv3 Ansible collection and configuration management repos, where, starting from an inventory that looks like the `ceph orch host ls` output and a short YAML file, we can immediately bootstrap clusters that connect together and provide multisite S3, without any manual steps. Now that we deploy our new clusters using the new technologies, we are in the migration phase, where we try to maintain the old, dying clusters (OSv1/2) and slowly migrate S3 data to the new ones with minimal or no user intervention.
Speakers
avatar for Theofilos Mouratidis

Theofilos Mouratidis

Cloud Engineer, SWITCH
My name is Theofilos Mouratidis and I am from Greece. I am currently a cloud engineer at SWITCH. I have a strong theoretical background and research interest in distributed systems. In the past I have worked for CERN and Proton in similar positions. I enjoy the sunny weather and go... Read More →
Thursday December 5, 2024 10:50 - 11:25 CET
Auditorium C

11:35 CET

State of CephFS: Three Easy Pieces - Venky Shankar, IBM & Patrick Donnelly, IBM, Inc.
Thursday December 5, 2024 11:35 - 12:10 CET
This talk focuses on the current (and near-future) state of the three pieces that make up a Ceph File System: the Ceph Metadata Server (MDS), clients, and a set of Ceph Manager plugins. Many advancements have been made to the Ceph File System recently, opening up gateways for wider adoption. Some features are already available in recent releases and some are under development. We detail these enhancements, breaking them down neatly into each of the three pieces. Ceph File System specific manager plugins have come a long way to now become the de facto mechanism for subvolume/crash-consistent snapshot management and mirroring; we discuss those. Finally, we peek into what is upcoming in CephFS for the Tentacle ("T") release. Existing and new CephFS users will find this helpful for assessing and planning ahead for adoption.
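A hedged sketch of the manager-plugin workflows mentioned above, using the subvolume and snapshot-mirroring commands those modules provide; the volume, subvolume, snapshot, and path names are hypothetical.

```python
# Sketch: subvolume management (volumes module) and snapshot mirroring
# (mirroring module) via their standard `ceph fs ...` commands.
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

# Subvolume management.
ceph("fs", "subvolume", "create", "cephfs", "db-backups")
ceph("fs", "subvolume", "snapshot", "create", "cephfs", "db-backups", "nightly-2024-12-05")

# Snapshot mirroring: enable for the filesystem, then add a directory path.
ceph("fs", "snapshot", "mirror", "enable", "cephfs")
ceph("fs", "snapshot", "mirror", "add", "cephfs", "/volumes/_nogroup/db-backups")
```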
Speakers
avatar for Patrick Donnelly

Patrick Donnelly

Software Architect, IBM, Inc.
Patrick Donnelly is a Software Architect at IBM, Inc. working as part of the global development team on the open source Ceph distributed storage system. Patrick has principally worked on the Ceph file system (CephFS) since 2016. He has been working on Open Source projects for the... Read More →
avatar for Venky Shankar

Venky Shankar

CephFS PTL, IBM
I have worked in distributed file systems for over a decade. Currently leading the Ceph File Systems team and part of the Ceph Leadership Team.
Thursday December 5, 2024 11:35 - 12:10 CET
Auditorium C
  Session Presentation
  • Audience Level Any

13:40 CET

Archive Zone: Lessons Learned - Ismael Puerto Freire & Xabier Guitián Domínguez, INDITEX
Thursday December 5, 2024 13:40 - 14:15 CET
In this session, we will delve into the history and evolution of our Ceph clusters dedicated to the archive zone in production. We'll cover the entire journey, from the initial hardware selection to the deployment, and share the critical lessons we've learned along the way. Key topics include:
Hardware Selection: How we chose the right hardware for our archive zone, including considerations and trade-offs.
Common Mistakes: The pitfalls and mistakes we encountered during the deployment process, and how we overcame them.
Best Practices: Steps and strategies to ensure a successful deployment, focusing on reliability, scalability, and performance.
Optimization Tips: Techniques to optimize your Ceph cluster for archival purposes, ensuring efficient storage and retrieval of data.
By the end of this talk, you will have a comprehensive understanding of the challenges and solutions involved in deploying a Ceph archive zone, enabling you to avoid common pitfalls and achieve a successful implementation in your environment.
Speakers
avatar for Xabier Guitián Domínguez

Xabier Guitián Domínguez

Technical Lead of Infrastructure, INDITEX
I am the Technical Lead of Infrastructure at Inditex, overseeing the operation and continuous evolution of the company's services. My role focuses on ensuring reliability, scalability, and innovation in infrastructure solutions to support Inditex's global operations
avatar for Ismael Puerto Freire

Ismael Puerto Freire

Solution Architect, INDITEX
I am a Solution Architect at Inditex, responsible for operating and evolving services based on Ceph and Kubernetes. I have been working with Ceph for six years, handling all types of storage: Object, Block, and FileSystem. My top priorities are maintaining resilience, performance... Read More →
Thursday December 5, 2024 13:40 - 14:15 CET
Auditorium C

14:25 CET

Advancing BlueStore with Real-World Insights - Adam Kupczyk, IBM
Thursday December 5, 2024 14:25 - 15:00 CET
In past years we have invested significant effort to improve BlueStore's I/O latency and throughput. Testing, including aging, has always been done using artificial workloads, and naturally we optimized for those scenarios. Now we want to open a new chapter in BlueStore's maturity. Whenever possible we will use real-life workloads provided by Ceph users, and we will test new components and newly proposed settings against those workloads. Aging tests will be augmented with shortcuts that complete the aging process faster. The ultimate goal is to preserve the high performance that new deployments enjoy for as long as possible. We want to share this plan with the community, get developers involved, and convince users to share their workloads.
Speakers
avatar for Adam Kupczyk

Adam Kupczyk

Engineer, IBM
Mathematician by education. Engineer and programmer by job. Tester by necessity. Graduated Adam Mickiewicz University, Poznan. 25 years in software development.
Thursday December 5, 2024 14:25 - 15:00 CET
Auditorium C

15:10 CET

Benchmarking: Repeatable & Comparable - Trent Lloyd, Canonical (Ubuntu)
Thursday December 5, 2024 15:10 - 15:45 CET
Your goal when benchmarking should be to ensure that the results are both continuously repeatable and fairly comparable to previous attempts. This is all too easy to get wrong. Benchmarking of any kind is often tricky business, but storage has always presented particularly difficult challenges, as even the simple hard drive has interesting performance characteristics that vary greatly depending on the workload or even chance. You might hope that was solved by SSDs, and that is true to an extent for real workloads, but they tend to give even more misleading results during synthetic benchmarks. I'll work through many different causes of inconsistent results when benchmarking both individual components and the overall performance of a Ceph cluster, with specific examples and graphs of real attempts. Items covered include:
- Working set size
- Bi-modal SSD performance due to flash block management
- Thin provisioning
- Bandwidth limitations of SSDs, backplanes, PCIe buses, CPUs, memory, and networks
- Filesystems
- Caches of all kinds
- Inconsistencies from benchmarking freshly deployed Ceph clusters
- Benchmarking tools (don't use anything other than fio; aws-cli is slow)
- And more
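In that spirit, a hedged sketch of a repeatable fio run driven from Python, pinning the parameters that most often make results non-comparable (working-set size, direct I/O, queue depth, fixed runtime); the target device is hypothetical, and a freshly thin-provisioned volume should be preconditioned first or reads will be misleadingly fast.

```python
# Sketch: run fio with fixed, explicit parameters and parse its JSON output.
import json
import subprocess

cmd = [
    "fio",
    "--name=randread-4k",
    "--filename=/dev/rbd0",        # hypothetical target device
    "--rw=randread",
    "--bs=4k",
    "--ioengine=libaio",
    "--direct=1",                  # bypass the page cache
    "--iodepth=32",
    "--numjobs=4",
    "--size=100G",                 # fixed working set for comparability
    "--runtime=300",
    "--time_based",
    "--group_reporting",
    "--output-format=json",
]

result = json.loads(subprocess.check_output(cmd))
read = result["jobs"][0]["read"]
print(f"IOPS: {read['iops']:.0f}, mean latency: {read['clat_ns']['mean'] / 1e6:.2f} ms")
```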
Speakers
avatar for Trent Lloyd

Trent Lloyd

Mr, Canonical (Ubuntu)
Trent Lloyd is a long time passionate speaker and member of the Linux & Open Source community, having first presented at the age of 15 to an audience at linux.conf.au 2003. He has spent the last 9 years in the Ubuntu Support Team at Canonical as a Sustaining Software Engineer specialising... Read More →
Thursday December 5, 2024 15:10 - 15:45 CET
Auditorium C

16:05 CET

Ceph Made Easy: One Dashboard for Multiple Ceph Clusters - Nizamudeen A, IBM India Private Ltd
Thursday December 5, 2024 16:05 - 16:40 CET
The presentation is about a solution that we have created in the Ceph Dashboard for managing and monitoring multiple Ceph clusters from a single cluster that we call a hub cluster. This approach simplifies the complexities of managing multiple clusters by providing a more streamlined and efficient user experience. I will describe the architecture of our implementation and how it helps admins manage many clusters, ensuring optimal performance, reliability, and ease of use. I will also demo various features which can leverage the multi-cluster setup, like setting up replication between multiple clusters. With multiple clusters connected to a single hub cluster, the dashboard also provides an overview page where important information about the other clusters can be monitored, including their real-time alerts. I'll also share how we are planning to improve the feature and our testing strategies around it.
Speakers
avatar for Nizamudeen A

Nizamudeen A

Software Engineer, IBM India Private Ltd
Software Engineer and component lead of Ceph Dashboard. I started 5 years ago as an intern at Red Hat contributing to Rook Operator. Eventually moved into Ceph Dashboard and started looking into the usability improvements and implementing workflows in the UI. Later picked up the lead... Read More →
Thursday December 5, 2024 16:05 - 16:40 CET
Auditorium C

16:50 CET

Cost-Effective, Dense, and Performant Prometheus Storage via QLC - Anthony D'Atri, Dreamsnake Productions
Thursday December 5, 2024 16:50 - 17:25 CET
Prometheus is the metrics ecosystem of choice for modern computing, with exporters for Ceph, RAID HBAs, Redfish, time synchronization, and the panoply provided by node_exporter. Exporters are scraped multiple times per minute for effective queries, each ingesting as many as several thousand metrics per system. Data may be kept locally or in external solutions including Ceph RGW. Retention of a year or more is valuable for trending and comparisons, and a moderate-size deployment can easily fill tens or hundreds of terabytes. As retention and cardinality grow, so does processing. Prometheus will GC and flush its WAL every two hours, which can manifest as visible yet spurious artifacts in visualization tools like Grafana and as false alarms from alertmanager rules. Rotational media just don't cut it. While HDDs with capacities as large as 30TB are available, rotational and seek latencies, SATA stenosis, interminable resilvering, and SMR severely limit their viability. SSDs are increasingly viable as HDD replacements. We can improve cost and density by tailoring to the workload: intermittent sequential writes and frequent random reads. This is a classic workload for modern QLC SSDs.
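A back-of-the-envelope sizing sketch for the "tens or hundreds of terabytes" claim above; every figure is an assumption to be replaced with your own fleet's numbers.

```python
# Rough Prometheus TSDB capacity estimate under assumed parameters.
hosts = 1000
series_per_host = 5000          # node_exporter + Ceph + HBA/Redfish exporters
scrape_interval_s = 15
bytes_per_sample = 2            # rough post-compression figure, an assumption
retention_days = 365

samples_per_day = hosts * series_per_host * (86400 / scrape_interval_s)
total_bytes = samples_per_day * bytes_per_sample * retention_days

print(f"{samples_per_day / 1e9:.1f} billion samples/day")
print(f"~{total_bytes / 1e12:.0f} TB for {retention_days} days of retention")
```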
Speakers
avatar for Anthony D'Atri

Anthony D'Atri

Principled Engineer, Dreamsnake Productions
Anthony has run Ceph at scale for over eleven years and one can say that it is literally a part of him - ask him in person and he'll show you why. He is also an observability advocate and contributes daily to Ceph documentation and may be open to new opportunities.
Thursday December 5, 2024 16:50 - 17:25 CET
Auditorium C
  Session Presentation

17:35 CET

The ‘Scrub-Type to Limitations’ Matrix - Ronen Friedman, IBM
Thursday December 5, 2024 17:35 - 18:10 CET
The scrub ‘restrictions overrides’ matrix: scrubs can be triggered by multiple conditions, with each trigger resulting in a specific set of scrub-session behaviors and a specific set of limitations/restrictions that apply or are overridden (operator-initiated scrubs, for example, are allowed to run on any day of the week or hour, regardless of configuration). The matrix of ‘scrub type to restrictions’ was never fully nor consistently documented. Starting with ‘Reef’, through ‘Squid’, and - hopefully - finalized in ‘Tentacle’ - we are working on clarifying, documenting, and implementing the desired behaviors: the desired matrix. I will present, with the goal of receiving feedback from the Ceph community, what was already released with Squid and - more importantly - what changes to this matrix are planned for Tentacle. For the community, this is a great opportunity to influence the fine details of what will be part of the next Ceph release.
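To make the trigger-versus-restrictions distinction concrete, a hedged sketch: periodic scrubs obey the configured time window, while an operator-initiated scrub runs regardless of it. The PG id below is hypothetical; the config options and commands are standard.

```python
# Sketch: restrict background scrubs to a night-time window, then issue an
# operator-initiated deep scrub that ignores the window.
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

# Periodic (background) scrubs: only between 22:00 and 06:00.
ceph("config", "set", "osd", "osd_scrub_begin_hour", "22")
ceph("config", "set", "osd", "osd_scrub_end_hour", "6")

# Operator-initiated deep scrub on a (hypothetical) PG, any time of day.
ceph("pg", "deep-scrub", "2.1f")
```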
Speakers
avatar for Ronen Friedman

Ronen Friedman

Software Architect, IBM
Ronen has been developing software for more than thirty years. He has been a member of the RADOS core team at Red Hat, and now IBM, for the last 5 years. He is currently the maintainer of Ceph OSD scrub.
Thursday December 5, 2024 17:35 - 18:10 CET
Auditorium C
 