Summer School on String Algorithms

The CPM summer school will take place on June 12 and 13, which are Thursday and Friday of the week before the conference.

Speakers

Hideo Bannai, Institute of Science Tokyo.
Jonas Ellert, École Normale Supérieure, Paris.
Giulia Punzi, University of Pisa.

Location

University of Milano-Bicocca, ABACUS Building (U14 Building) . Click on the map for a larger map.

You can also check directions on Google Maps and Apple Maps.

Lunch info

There are few places around the location to have lunch:

Bicocca Village: shopping mall with plenty food alternatives
Toast To Coast: sandiwiches and daily lunch menus
Zero Pensieri: pinsa (pizza-like) and daily lunch menus

Registration

The registration deadline is June 10th. Registration form.

The school is organized by Giulia Bernardini (giulia.bernardini[at]unimi.it), Gabriele Fici (gabriele.fici[at]unipa.it) and Raffaella Rizzi (raffaella.rizzi[at]unimib.it).

Programme

Thursday, June 12th

Lecture1 (Ground floor, Room T023)

14:00-17:00

Approximate Pattern Matching 101 (Jonas Ellert)

Approximate pattern matching is a fundamental problem in string algorithmics. One version of this problem asks for approximate pattern occurrences with a small number of mismatches (i.e., with low Hamming distance). For example, the pattern “SummerIsCool” occurs in “CPM-SummerSchool-2025” with three mismatches. In this lecture, we consider offline algorithms that, given a pattern of length m, a text of length n, and an integer threshold k, find all approximate pattern occurrences with at most k mismatches. We delve into the most common techniques used by such algorithms, e.g.:

using FFT to achieve O(n log m) time for constant alphabet,
using an additional trick to achieve ~O(m*sqrt{m}) time for any alphabet,
using Kangaroo jumps to achieve O(nk) time,
selected techniques used by more advanced algorithms.

The lecture is aimed at students at all levels.

Friday, June 13th

Lecture2 (Ground floor, Room T023)

9:00-12:00

Dictionary Compression and Repetitiveness Measures (Hideo Bannai)

Dictionary compression is a family of compression methods that essentially represent data using “copy and paste” operations. This includes well-known methods such as LZ77, grammar-based compressors, and run-length compressed BWT. Compared to statistical methods, dictionary compressors can be more effective at modeling highly repetitive data, such as collections of genome sequences from the same species.

In this talk, I will give a (biased) survey of various dictionary compression methods and repetitiveness measures, discussing their relationships, recent developments, and open problems.

12:00-14:00

Lunch

Lecture3 (Ground floor, Room T023)

14:00-17:00

Exploring the Relationships between Graph and String Problems (Giulia Punzi)

Graphs are a fundamental tool in computer science, providing a versatile model of relationships between entities. The need to combine string problems with graph-theoretic approaches is becoming increasingly important, as information is often textual yet interconnected in a graph-representable way. Notable examples of this interplay include the World Wide Web graph, pangenome graphs, and automata theory.

In this lecture, we focus on a specific graph data structure used to encode string information: deBruijn graphs. Such graphs have been classically used in bioinformatics for genome sequence assembly applications. The nodes of a dBG represent the k-mers (length-k substrings) of an input text, while (directed) edges indicate overlaps of length k-1. We will introduce useful properties of deBruijn graphs, and show how some string-related problems can be approached through graph algorithms on dBGs.