Модели и алгоритмы вычислительных процессов для решения трудных вариантов проблемы булевой выполнимости

Павленко Артём Леонидович

Модели и алгоритмы вычислительных процессов для решения трудных вариантов проблемы булевой выполнимости тема диссертации и автореферата по ВАК РФ 00.00.00, кандидат наук Павленко Артём Леонидович

Павленко Артём Леонидович
кандидат наук
2024

Специальность ВАК РФ00.00.00

Количество страниц 244

Павленко Артём Леонидович. Модели и алгоритмы вычислительных процессов для решения трудных вариантов проблемы булевой выполнимости: дис. кандидат наук: 00.00.00 - Другие cпециальности. ФГАОУ ВО «Национальный исследовательский университет ИТМО». 2024. 244 с.

Оглавление диссертации кандидат наук Павленко Артём Леонидович

Оглавление

Реферат

Synopsis

Введение

Глава 1. Проблема булевой выполнимости (SAT) и параллельные алгоритмы ее решения на основе стратегии декомпозиции

1.1 Проблема булевой выполнимости и максимальной булевой выпол-

нимости (MaxSAT). Базовые определения и результаты

1.2 Примеры комбинаторных задач, эффективно сводимых к SAT и

MaxSAT

1.2.1 Задача проверки логической эквивалентности булевых схем

1.2.2 Задача обращения дискретной функции и ее приложения в

криптографии

1.2.3 Задача размещения предприятий в форме MaxSAT

1.3 Основные параллельные стратегии решения SAT: «портфельная»

стратегия и стратегия декомпозиции

1.3.1 Стратегия одновременного запуска нескольких решателей

(портфельная)

1.3.2 Стратегия декомпозиции: разбиение пространства поиска на

непересекающиеся подобласти - базовые принципы и известные решения

1.4 Основные алгоритмы решения задач SAT и MaxSAT

1.4.1 Неполные алгоритмы решения SAT и MaxSAT

1.4.2 Полные алгоритмы решения SAT: метод резолюций, алго-

ритм DPLL, алгоритм CDCL

1.4.3 Полные алгоритмы решения MaxSAT

1.5 Лазейки, декомпозиционные множества и декомпозиционная труд-

ность булевых формул

1.5.1 Понятие лазейки для задачи в ограничениях, строгие ла-

зейки для SAT и трудность формул относительно строгих лазеек

1.5.2 Декомпозиционная трудность булевой формулы относитель-

но полного алгоритма решения SAT

1.5.3 Задача вычисления декомпозиционной трудности как про-

блема метаэвристической псевдобулевой оптимизации

1.5.4 Декомпозиционная трудность в применении к оценкам крип-

тографической стойкости; инверсные лазейки

1.6 Алгоритмы метаэвристической оптимизации, используемые для поиска лазеек при решении задач SAT и MaxSAT

Выводы по главе

Глава 2. Применение параллельных эволюционных алгоритмов поиска лазеек при решении задач SAT и MaxSAT

2.1 Вероятностные лазейки и их применение к решению SAT и MaxSAT

2.1.1 Понятие вероятностной лазейки (р-лазейки)

2.1.2 Древовидные представления вероятностных лазеек

2.1.3 Применение р-лазеек к решению SAT

2.2 Природа псевдо-булевых оценочных функций (fitness functions), ис-

пользуемых для поиска лазеек в задачах SAT и MaxSAT

2.2.1 Различные типы лазеек и подходящие оценочные функции

2.2.2 Особенности оценочных функций для поиска лазеек в MaxSAT

2.3 Вариант эволюционного алгоритма для поиска р-лазеек фиксиро-

ванного размера

2.4 Техники сокращения пространства поиска

Выводы по главе

Глава 3. Модели и алгоритмы поиска лазеек в параллельных вычислительных средах и их реализация в виде библиотеки программ EvoguessAI

3.1 Параллельные модели эволюционных вычислений в применении к

поиску лазеек в задачах SAT и MaxSAT

3.1.1 Базовая модель параллельных вычислений

3.1.2 Асинхронная модель процессов поиска лазеек в параллель-

ных средах в применении к задачам SAT и MaxSAT

3.1.3 Асинхронная островная модель процессов поиска р-лазеек

в параллельных средах в применении к задачам SAT и MaxSAT

3.2 Структура библиотеки программ, реализующих параллельные эво-

люционные алгоритмы поиска лазеек в задачах SAT и MaxSAT (EvoguessAI)

3.2.1 Структура библиотеки EvoguessAI

3.2.2 Диаграмма классов библиотеки EvoguessAI

3.3 Реализованные алгоритмы библиотеки программ для задач SAT и

MaxSAT (EvoguessAI)

3.3.1 Алгоритм поиска вероятностных лазеек (р-лазеек)

3.3.2 Алгоритм решения SAT с использованием найденных веро-

ятностных лазеек

3.3.3 Алгоритм поиска полиномиальных инверсных лазеек

3.3.4 Алгоритм поиска вероятностных лазеек (р-лазеек) для зада-

чи MaxSAT

3.3.5 Алгоритм решения MaxSAT с использованием нескольких

р-лазеек

3.3.6 Алгоритм решения MaxSAT с использованием нескольких

р-лазеек. Гибридная стратегия

Выводы по главе

Глава 4. Применение алгоритмов, использующих лазейки, для решения трудных вариантов задачи булевой выполнимости

4.1 Задача проверки логической эквивалентности булевых схем

4.1.1 Классы тестовых задач

4.1.2 Оценка декомпозиционной трудности и решение LEC-задач

с использованием программы EvoguessAI

4.1.3 Решение тестовых классов LEC-задач посредством р-лазеек

с использованием программы EvoguessAI

4.2 Оценивание стойкости криптографических функций

4.2.1 Примеры построения инверсных лазеек для алгоритмов низ-

коресурсной криптографии с использованием EvoguessAI

4.2.2 Построение полиномиальных инверсных лазеек с использо-

ванием EvoguessAI на примере шифра Bivium

4.3 Применение EvoguessAI для поиска линеаризующих множеств в

задачах обращения низкоресурсных криптографических функций

4.3.1 Понятие линеаризующего множества для квадратичных си-

стем над GF(2)

4.3.2 Линеаризующие множества для шифра Trivium

4.3.3 Линеаризующие множества для шифра Bivium

4.4 Задача размещения предприятий и решение её трудных случаев

4.4.1 Сведение задачи размещения предприятий к MaxSAT

4.4.2 Решение задачи размещения предприятий с использованием

EvoguessAI

Выводы по главе

Заключение

Список литературы

Приложение A

Введение диссертации (часть автореферата) на тему «Модели и алгоритмы вычислительных процессов для решения трудных вариантов проблемы булевой выполнимости»

Реферат Общая характеристика работы

Актуальность темы. Интенсивное развитие комбинаторных алгоритмов обусловлено широким спектром их практической применимости. Ценность алгоритмов решения проблемы булевой выполнимости [1, 2], как подкласса комбинаторных алгоритмов, в том, что эти алгоритмы могут с успехом применяться к различным NP-полным и NP-трудным задач. Для решения этих задач не известны алгоритмы с полиномиальным временем работы, однако такие задачи возникают в большом числе практических приложений. Конкретно, речь идет о задачах верификации и тестирования программ и дискретных управляющих систем [3, 4], задачах планирования и построения расписаний [5], компьютерной безопасности и криптографии [6, 7, 8, 9] и многих других областях.

Базовая идея решения таких задач, которая активно эксплуатируется в последние 20 лет, состоит в следующем. Можно эффективно сводить NP-трудные задачи, возникающие в практических приложениях, к некоторой комбинаторной проблеме с хорошо развитой алгоритмикой. Как правило, в качестве такой базовой задачи выбирается комбинаторная задача, для которой естественным образом могут быть сформулированы разнообразные эвристики, позволяющие расширить множество задач, решение которых может быть найдено за разумное время. Известно относительно немного такого сорта «универсальных» проблем: задача булевой выполнимости (Boolean Satisfiability problem, SAT); задача максимальной выполнимости (MaxSAT), представляющая собой оптимизационный вариант SAT; задачи математического программирования (0-1-Integer Linear Programming (0-1-ILP), Mixed Integer Programming (MIP)), некоторые задачи на графах: задачи о покрытии (Covering problem), задача коммивояжера (Traveling Salesman Propblem, TSP), задача о максимальной клике и ряд других. Для всех перечисленных задач разработаны эвристические алгоритмы, показывающие хорошие результаты на входных данных весьма существенных размерностей.

Настоящая работа посвящена алгоритмам решения проблемы булевой выполнимости (SAT) и, конкретно, разработке параллельных алгоритмов решения SAT, а также различным техникам организации работы таких алгоритмов в

параллельных средах. Задача SAT является первой и одной из классических КР-полных/КР-трудных задач [10, 11, 12]. Она привлекала внимание исследователей еще с начала 60-х годов XX века в связи с работами по автоматическому доказательству теорем [13]. Последующий интерес к SAT долгое время был вызван фактом ее NP-трудности и, соответственно, возможностью использовать данную задачу для доказательства высокой вычислительной сложности других комбинаторных задач, к которым удавалось эффективно свести SAT. Однако параллельно развивались и эвристические алгоритмы решения SAT, используемые, главным образом, для построения тестов, детектирующих ошибки в микроэлектронных устройствах. Коренной перелом произошел в 1996 году с изобретением алгоритма CDCL [14, 15]. С этого момента стартует индустрия разработки и применения программных решателей SAT (SAT-решателей). Эти алгоритмы начинают активно использоваться для решения постоянно расширяющегося круга как практических, так и теоретических проблем. Основные спектры практического применения современных SAT-решателей: верификация и тестирование, криптография и криптоанализ, анализ протоколов безопасности, построение расписаний, построение комбинаторных конфигураций с заданными свойствами (например, корректирующих кодов), решение задач дискретной оптимизации (задачи размещения предприятий, задачи о покрытии) и многое другое.

Поскольку SAT является NP-трудной задачей, то для решения трудных ее вариантов оправдано применение парадигм параллельных вычислений. В этом плане выделяют два глобальных подхода к параллельному решение SAT: «портфельный» подход (portfolio approach) и подход на основе разбиений (partitioning approach). Идея портфельного подхода состоит в запуске на рассматриваемой трудной задаче нескольких SAT-решателей с различными стартовыми параметрами. Подход на основе разбиений, по сути, подразумевает параллелизм по данным, когда исходная задача разбивается (декомпозируется) на множество различных задач с непересекающимися областями поиска: если решены все задачи из такой декомпозиции, то найдено и решение исходной задачи, даже если получен ответ «формула невыполнима». Настоящая работа посвящена именно подходу на основе разбиений.

Ввиду сказанного выше о большом практическом значении SAT, актуальность тематики разработки параллельных алгоритмов, повышающих эффек-

тивность решения SAT, а также связанной с ней задачи MaxSAT, не вызывает сомнений. Известно большое число различных способов построения разбиения булевых формул. В настоящей работе представлены как новые методы построения разбиений, так и специальные алгоритмы, позволяющие строить параллельные планы решения рассматриваемых трудных вариантов SAT.

Более конкретно, в работе представлен новый метод оценки и оптимизации информационных процессов, реализующих решение задач SAT и MaxSAT в параллельных вычислительных средах с архитектурой MPP. В основе предложенного метода лежит идея использования специальных структур, называемых лазейками. Любая лазейка дает основу для естественной декомпозиции исходной трудной задачи на семейство более простых задач, которые могут быть решены с использованием параллельных вычислений. Тем самым наличие хорошей (в смысле мер, используемых в работе) лазейки позволяет существенно оптимизировать информационные процессы, реализующие решение задачи в параллельной среде, и сократить общее время такого решения.

Для поиска лазеек используются алгоритмы метаэвристической оптимизации, которые могут быть реализованы в виде параллельных программ. Соответствующие стратегии и их реализация также представлены в настоящей работе. Таким образом, работа содержит теоретические результаты: разработаны стратегии распараллеливания трудных вариантов SAT и MaxSAT на основе лазеек, а также метаэвристические (эволюционные) алгоритмы поиска лазеек. Кроме этого, представлены результаты практического использования разработанных алгоритмов: решены трудные задачи из области символьной верификации (проверка логической эквивалентности булевых схем), построены оценки трудности ряда дискретных функций, относящихся к низкоресурсной криптографии, а также решены некоторые оптимизационные задачи из класса задач о размещении предприятий (конкретно, речь идет о Branch Location Problem, BPL [16]). Все разработанные алгоритмы, применимые для решения трудных случаев SAT и MaxSAT, объединены в единую программную систему, получившую название EvoguessAI1.

Цель и задачи исследования. Цель работы - повысить эффективность решения трудных комбинаторных задач, имеющих промышленные приложения, сведенных к проблеме булевой выполнимости (SAT), за счет использования

1 https://github .com/aimclub/evoguess-ai

параллельных вычислений.

Для достижения поставленной цели были решены следующие задачи:

— изучен феномен декомпозиционной трудности комбинаторной задачи с булевыми ограничениями и раскрыта роль структур, называемых лазейками (backdoors), при построении параллельного плана решения рассматриваемой задачи;

— разработаны алгоритмы, основанные на метаэвристической оптимизации, для поиска декомпозиционных множеств, обеспечивающих наилучшее ускорение решения исходной комбинаторной задачи в форме SAT;

— разработаны алгоритмы поиска и решения задач SAT и MaxSAT в параллельных средах на основе понятия вероятностной лазейки (р-backdoor);

— разработаны алгоритмы эволюционного поиска лазеек, в основе которых лежат понятия асинхронной и островной моделей;

— выполнена реализация всех разработанных алгоритмов в форме библиотеки параллельных программ EvoguessAI, поддерживающих взаимодействие с вычислительным кластером произвольной мощности на основе CPU;

— проведены масштабные вычислительные эксперименты по решению комбинаторных задач с булевыми ограничениями из таких областей как символьная верификация и тестирование дискретных управляющих систем, оценивание стойкости криптографических функций, комбинаторные задачи размещения предприятий.

Методы и инструменты исследования. Теоретическая часть работы использует методологию дискретной математики и математической логики, теорию вычислительной сложности, теорию параллельных вычислений, а также теорию эволюционных алгоритмов. При построении вычислительных задач из области проверки логической эквивалентности булевых схем, а также для оценивания трудности криптографических функций использовалась программная система Transalg [17, 18]. Для решения конкретных примеров SAT применялись современные SAT-решатели, имеющиеся в открытом доступе (MiniSAT, Kissat, Cadical и др.). Для вызова решателей в роли оракулов (например, при поиске лазеек) использовался инструмент PySAT [19]. Все вычислительные эксперименты проводились при помощи библиотеки параллельных программ

EvoguessAI, разработанной в процессе написания настоящей диссертации. В экспериментах задействовался вычислительный кластер с узлами, содержащими по 36 процессорных ядер Intel Xeon E2695 каждый.

Научная новизна. Новыми являются все основные результаты, полученные в диссертации, в том числе:

1. Модели и алгоритмы, используемые для параллельного решения трудных примеров проблем SAT и MaxSAT, основанные на концепции лазеек;

2. Параллельные эволюционные алгоритмы, основанные на асинхронной и островной моделях, применимые для поиска вероятностных лазеек (p-backdoors) в трудных примерах задачи SAT;

3. Параллельные алгоритмы решения трудных комбинаторных задач с булевыми ограничениями, использующие найденные вероятностные лазейки;

4. Параллельный решатель задачи MaxSAT, использующий в своей работе р-лазейки.

5. Библиотека программ EvoguessAI, включающая все разработанные алгоритмы и поддерживающая работу с вычислительным кластером.

Положения, выносимые на защиту. На защиту выносятся следующие основные положения:

1. Метод и алгоритмы оптимизации информационных процессов, обеспечивающих решение в параллельных средах задач булевой выполнимости и максимальной выполнимости за счет их декомпозиции, отличающиеся от известных методов тем, что, с целью повышения эффективности решения трудных вариантов этих задач по времени и точности, для построения декомпозиций используются вероятностные лазейки;

2. Эволюционные алгоритмы, реализующие поиск и оценивание эффективности лазеек различных типов в параллельной вычислительной среде, отличающиеся от известных алгоритмов тем, что, с целью повышения скорости генерации вероятностных лазеек, копии алгоритма на различных вычислительных ядрах взаимодействуют в рамках островной архитектуры;

3. Библиотека программ EvoguessAI, предназначенная для решения задач булевой выполнимости и максимальной выполнимости, отличающаяся

от известных библиотек тем, что, с целью повышения эффективности решения трудных вариантов этих задач по времени и точности, интенсивно эксплуатируется концепция лазеек.

Теоретическая и практическая значимость. Теоретическая значимость результатов работы состоит в обосновании возможности эффективного использования таких концепций как декомпозиционная трудность и вероятностные лазейки для параллельного решения трудных комбинаторных задач, сведенных к SAT и MaxSAT. Практическая значимость состоит в возможности использовать разработанные алгоритмы для решения конкретных трудных вариантов задач из классов SAT и MaxSAT, полученных из промышленных приложений (верификация, криптография, оптимизация производств).

Соответствие специальности. Диссертация содержит описание новых вычислительных алгоритмов, применимых к решению NP-полных и NP-трудных задач, а также теоретические оценки сложности некоторых из разработанных алгоритмов, и, таким образом, относится к таким направлениям как «Информатика» (Computer science) и «Теоретическая информатика» (Theoretical computer science). Среди выносимых на защиту положений основным является метод и сопутствующие этому методу алгоритмы, позволяющие за счет использования лазеек оптимизировать информационные процессы, которые обеспечивают решение задач SAT и MaxSAT в параллельных вычислительных средах. Таким образом, в этом контексте диссертация соответствует пункту 1 паспорта специальности 2.3.8 («Разработка компьютерных методов и моделей описания, оценки и оптимизации информационных процессов и ресурсов»), а также пункту 6 паспорта этой специальности («Обеспечение информационных систем и процессов»).

Достоверность результатов проведенных исследований. Достоверность полученных в работе теоретических результатов обеспечивается строгостью определений и доказательствами базовых свойств разработанных алгоритмов. Достоверность практических результатов подтверждается корректностью сценариев проведенных вычислительных экспериментов, а также тем, что результаты, выдаваемые современными SAT-решателями, являются верифицируемыми.

Апробация результатов работы. Результаты, представленные в диссертации, докладывались и обсуждались на следующих конференци-

ях: "EvoApplications 2019: International Conference on the Applications of Evolutionary Computation" (Германия, г. Лейпциг, 2019 г.), "MIPRO 2019: 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics" (Хорватия, г. Опатия, 2019 г.), "GECCO 2019: the Genetic and Evolutionary Computation Conference" (Чехия, г. Прага, 2019 г.), "CP 2021: 27th International Conference on Principles and Practice of Constraint Programming" (Франция, г. Монпелье, 2021 г.), "CEC 2022: IEEE Congress on Evolutionary Computation" (Италия, г. Падуя, 2022 г.), "PCT 2022: Parallel Computational Technologies" (Россия, г. Дубна, 2022 г.), "AAAI 2022: 36th AAAI Conference on Artificial Intelligence" (Канада, г. Ванкувер, 2022 г.), "CEC 2024: IEEE Congress on Evolutionary Computation" (Япония, г. Йокогама, 2024 г.).

Основные результаты по теме исследования были получены в рамках следующих проектов:

— грант РНФ № 18-71-00150 "Разработка эволюционных стратегий поиска декомпозиций трудных вариантов задачи булевой выполнимости с применением к обращению криптографических функций";

— грант РНФ № 22-21-00583 "Методы оценивания декомпозиционной трудности булевых формул с применением к задачам синтеза и верификации дискретных управляющих систем";

— проект ЛИЦ "Технологии сильного искусственного интеллекта в промышленности", 2022-2024;

— Госзадание № 2019-1339 "Метакогнитивные технологии системного искусственного интеллекта", 2021-2024;

— Интеграционный проект (ИТМО, Хуавей) "Алгоритмы решения SAT для логических схем и анализа программ", 2022-2024.

Публикации и личный вклад автора. Результаты работы опубликованы в 10 печатных и двух электронных работах, в том числе: одна статья в журнале, входящем в квартиль Q1 по системе Scimago, одна статья в журнале, входящем в Q2 по Scimago, шесть статей в трудах рейтинговых конференций по рейтингу CORE, среди которых одна имеет рейтинг A*, две - рейтинг A, три - рейтинг B; труды всех конференций индексируются в Scopus и (или) Web of Science.

Работы, о которых идет речь, это статьи [20, 21, 22, 23, 24, 25, 26, 27, 28, 29] в списке источников. Ниже детально представлен вклад всех авторов данных

публикаций.

В статье [20] соискателем разработаны и реализованы все основные алгоритмы, общая идея углубленного статистического тестирования поведения эволюционных алгоритмов на задачах алгебраического криптоанализа принадлежит М.В. Буздалову, используемые в статье тесты были разработаны В.И. Ульянцевым.

В статье [21] соискателем разработаны и реализованы все основные алгоритмы, базовая концепция решения рассмотренных в статье задач принадлежит А.А. Семёнову, тестовые примеры подготовлены В.И. Ульянцевым.

В статье [22] соискателем разработана и реализована асинхронная параллельная версия эволюционного алгоритма, применимая для оценивания декомпозиционной трудности булевых формул. Д.С. Чивилихину и А.А. Семёнову принадлежит формальная постановка задачи.

В статье [23] соискателем разработана островная модель, позволяющая осуществлять поиск вероятностных лазеек в параллельных средах при помощи эволюционных алгоритмов. А.А. Семёновым выполнено доказательство теоретических свойств новой версии алгоритма (1+1)-EA.

В статье [24] соискателю принадлежит разработка и реализация всех алгоритмов, входящих в описанный в работе фреймворк, А.А. Семёнову, О.С. Заикину и В.И. Ульянцеву принадлежат постановки задач криптоанализа, для решения которых использовался разработанный фреймворк.

В статье [25] соискателем разработаны и реализованы параллельные эволюционные алгоритмы поиска вероятностных лазеек. А.А. Семёновым сформулировано понятие вероятностной лазейки и доказаны теоретические свойства таких лазеек. Д.С. Чивилихиным совместно с соискателем разработан параллельный алгоритм решения SAT, использующий множественные вероятностные лазейки. С.Е. Кочемазовым разработаны сценарии экспериментов и построены тестовые примеры.

В статье [26] соискателем разработаны и реализованы параллельные эволюционные алгоритмы, использованные для оценивания декомпозиционной трудности булевых формул. Д.С. Чивилихиным предложены эвристические алгоритмы сокращения пространства, на котором осуществляется эволюционный поиск, а также разработаны техники, позволяющие сокращать доказательство невыполнимых формул за счет найденных декомпозиционных множеств. А.А.

Семёновым доказаны теоретические результаты, относящиеся к декомпозиционной трудности булевых формул.

В статье [27] соискателем разработаны и реализованы параллельные эволюционные алгоритмы, осуществляющие поиск линеаризующих множеств, которые используются для решения систем квадратичных уравнений над полем GF(2). А.А. Семёновым и К.В. Антоновым разработана теория решения квадратичных систем уравнений над GF(2) за счет использования линеаризующих множеств. С.Е. Кочемазовым построены редукции задач алгебраического криптоанализа к квадратичным системам уравнений над GF(2).

В статье [28] соискателем разработаны и реализованы параллельные эволюционные алгоритмы поиска декомпозиционных множеств. А.А. Семёновым и А.С. Игнатьевым предложена концепция декомпозиционной трудности булевых формул и получены относящиеся к этой концепции теоретические результаты. Д.С. Чивилихиным описан эвристический алгоритм сокращения пространства работы эволюционного алгоритма, выполняющего поиск эффективной декомпозиции. И.В. Отпущенниковым и В.И. Ульянцевым подготовлены наборы тестовых примеров, относящихся к верификации булевых схем.

В статье [29] соискателем разработаны и реализованы параллельные эволюционные алгоритмы поиска инверсных лазеек, примененные к построению атак на криптографические хеш-функции. А.А. Семёновым разработан общий алгоритм оценивания эффективности криптографической атаки через размер дерева специального вида. И.А. Грибановой и В.С. Кондратьевым построены редукции задач обращения криптографических хеш-функций семейства MD к SAT. А.И. Гладушем выполнены эксперименты на вычислительном кластере Санкт-Петербургского политехнического университета Петра Великого.

В положениях, выносимых на защиту, результаты, составляющие пункты 2 и 3, получены автором диссертации единолично. Базовая концепция метода, упоминаемого в пункте 1, разработана соискателем в неделимом соавторстве с А.А. Семёновым. Все алгоритмы, реализующие данную концепцию, разработаны и реализованы автором диссертации единолично.

Среди других результатов диссертации новая версия алгоритма (1+1)-EA, приведенная во второй главе, разработана автором в неделимом соавторстве с А.А. Семёновым и К.И. Чухаревым. Структура диссертации выстроена по принципу увеличения авторского вклада: первая глава является обзорной и со-

держит известные результаты; вторая глава содержит результаты, полученные автором как лично, так и в соавторстве; третья и четвертая главы содержат результаты, полученные автором единолично.

Краткое содержание работы

Глава 1 содержит базовые понятия и результаты, на основании которых были получены все результаты, выносимые на защиту.

В пункте 1.1 вводятся базовые определения: булевы функции, формулы, приводятся формулировки задач SAT и MaxSAT.

Пункт 1.2 содержит примеры комбинаторных задач, которые эффективно сводятся к SAT и MaxSAT, а именно: задача проверки логической эквивалентности булевых схем (Logical Equivalence Checking, LEC), задача обращения дискретной функции и задача размещения предприятий (BLP).

В пункте 1.3 рассматриваются основные параллельные стратегии решения SAT. Главный интерес представляет стратегия декомпозиции, которая предполагает разбиение пространства поиска на непересекающиеся подобласти, а основные её методы описываются подробнее в последующих разделах.

Пункт 1.4 содержит обзор алгоритмов решения SAT и MaxSAT. Рассматриваются неполные алгоритмы решения, в основе которых обычно лежит идеология локального поиска (LS) - такие алгоритмы не имеют гарантий того, что решение задачи будет получено за конечное время. Приводится краткая ретроспектива развития полных алгоритмов решения SAT: от метода резолюций и алгоритма DPLL до актуального на текущий момент алгоритма CDCL.

В пункте 1.5 вводится понятие «лазейки» и рассматриваются различные подходы построения оценок декомпозиционной трудности булевых формул относительно них.

В пункте 1.5.1 рассматривается подкласс лазеек именуемых как «сильные лазейки» (Strong Backdoor Set, SBS), знание которых позволяет разбивать (декомпозировать) исходную задачу на множество подзадач, для каждой из которых решение находится некоторым полиномиальным алгоритмом Ар.

В пункте 1.5.2 дается определение декомпозиционной трудности (d-hardness) формулы С. Декомпозиционная трудность С - это следующая ве-

личина:

ßA{C) = mm ßB,A(C),

в которой ßA,B(С) - это суммарное время работы алгоритма А на множестве формул, полученных в результате разбиения С по всевозможным наборам значений переменных из множества В.

В пункте 1.5.3 задача нахождения лазейки В * с минимальным значением декомпозиционной трудности рассматривается как проблема метаэвристиче-ской псевдобулевой оптимизации целевой функции, вычисляемой при помощи метода Монте-Карло. Таким образом, использование метаэвристик позволяет строить оценки декомпозиционной трудности булевых формул.

В пункте 1.5.4 вводится класс инверсных лазеек, предназначенных для оценивания стойкости криптографических функций относительно атак из класса «угадывай и определяй» (guess-and-determine), использующих SAT решатели.

В пункте 1.6 рассматриваются базовые алгоритмы метаэвристической оптимизации для поиска лазеек в задачах SAT и MaxSAT. Одним из простейших алгоритмов является эволюционный алгоритм, использующий мутацию по схеме (1+1). Также рассматривается алгоритм (1+1)-FEA, в котором применяется специальный оператор мутации, использующий переменную вероятность мутации р = 1/п за счет выбора I £ {1,...,п/2} в соответствии с некоторым законом распределения. Верхняя оценка сложности такого алгоритма существенно лучше в сравнении с классическим (1+1)-EA. Наконец, в этом же пункте описан вариант генетического алгоритма «Элитизм», который реализует вероятностный оператор селекции для дальнейшего применения оператора скрещивания.

Глава 2 содержит новые теоретические результаты, являющиеся базой для моделей и алгоритмов, представленных в главах 3 и 4.

Пункт 2.1 посвящен вероятностным лазейкам (р-лазейкам): приводится определение, описывается древовидное представление р-лазеек и их применение к решению SAT.

В пункте 2.1.1 вводятся вероятностные лазейки (р-лазейки), которые основываются на идеологии сильных лазеек (SBS). Множество В С X называется р-лазейкой, если доля SAT задач для формул вида [ß/B], разрешимых полиномиальным алгоритмом Ар по всем возможным ß полиномиальным ал-

горитмом Ар по всем возможным ¡3 £ {0,1}|в| не меньше некоторого фиксированного р £ [0,1]. Через С[Р/В] обозначена формула, полученная в результате подстановки в С набора ¡3 значений переменных из В. Для поиска р-лазеек предлагается использовать эволюционные алгоритмы.

В пункте 2.1.2 описываются древовидные вероятностные лазейки. Для произвольной лазейки В гиперкуб {0,1}|в| можно представить в виде дерева Тр высоты к = \В|. Каждая из 2к ветвей в таком дереве является некоторым набором значений из {0,1}^. Идея заключается в последовательном обходе дерева Тр, запуская полиномиальный алгоритм Ар в каждой его вершине. Если в некоторой внутренней вершине Тр алгоритм Ар выдает ответ из множества {SAT,UNSAT}, то такая вершина помечается символом и дальше продолжать из неё пути к листьям не имеет смысла. Листьям, в которых алгоритм Ар выдает ответ INDET, присваивается метка х. Тогда древовидная р-лазейка определяется как размеченное таким способом дерево, которое обозначается как Тр. Для р-лазеек с р, близким к 1, дерево Тр часто является существенным образом несбалансированным, и, соответственно, обход дерева Тр может потребовать существенно меньшего чем 2к числа вызовов алгоритма Ар.

В пункте 2.1.3 описан алгоритм решения SAT, который предполагает использование нескольких вероятностных лазеек. Для этого строится декартово произведение Г = Г х ... х Г множеств «сложных» задач (не решенных алгоритмом Ар для отдельных лазеек) Г^, i £ {1,..., s} для лазеек В\,... ,BS. После чего для всех формул вида С[7], 7 £ Г предлагается запустить полный алгоритм решения SAT.

В пункте 2.2 описывается природа псевдо-булевых оценочных функций используемых для поиска лазеек в задачах SAT и MaxSAT. Множество В, представляющее лазейку, задается с помощью битового вектора Хр £ {0,1}|ха оценочная функция в самом общем виде выглядит следующим образом:

Список литературы диссертационного исследования кандидат наук Павленко Артём Леонидович, 2024 год

Список литературы

[1] Biere, Armin. Handbook of satisfiability / Armin Biere, Marijn Heule, Hans van Maaren. — IOS press, 2009. — Vol. 185.

[2] Гэри, М. Вычислительные машины и труднорешаемые задачи / М Гэри, Д Джонсон. — 1982.

[3] Symbolic model checking without BDDs / Armin Biere, Alessandro Cimatti, Edmund Clarke, Yunshan Zhu // Tools and Algorithms for the Construction and Analysis of Systems: 5th International Conference, TACAS'99 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS'99 Amsterdam, The Netherlands, March 22-28, 1999 Proceedings 5 / Springer. — 1999. — Pp. 193-207.

[4] D, Kroening. Software verification / Kroening D // Handbook of satisfiability. — IOS Press, 2021. — Pp. 791-818.

[5] Prestwich, Steven D. CNF Encodings / Steven D Prestwich // Handbook of satisfiability. — 2009. — Vol. 185. — Pp. 75-97.

[6] Bard, Gregory. Algebraic cryptanalysis / Gregory Bard. — Springer Science & Business Media, 2009.

[7] Courtois, Nicolas T. Algebraic cryptanalysis of the data encryption standard / Nicolas T Courtois, Gregory V Bard // Cryptography and Coding: 11th IMA International Conference, Cirencester, UK, December 18-20, 2007. Proceedings 11 / Springer. — 2007. — Pp. 152-169.

[8] Parallel logical cryptanalysis of the generator A5/1 in BNB-Grid system / Alexander Semenov, Oleg Zaikin, Dmitry Bespalov, Mikhail Posypkin // Parallel Computing Technologies: 11th International Conference, PaCT 2011, Kazan, Russia, September 19-23, 2011. Proceedings 11 / Springer. — 2011. — Pp. 473-483.

[9] On cryptographic attacks using backdoors for SAT / Alexander Semenov, Oleg Zaikin, Ilya Otpuschennikov et al. // Proceedings of the AAAI Conference on Artificial Intelligence. — Vol. 32. — 2018.

[10] Cook, S. A. The Complexity of Theorem-Proving Procedures / S. A. Cook // STOC. — ACM, 1971. — Pp. 151-158.

[11] Karp, R. M. Reducibility among Combinatorial Problems / R. M. Karp // Complexity of Computer Computations: Proceedings of a symposium on the Complexity of Computer Computations / Ed. by R. E. Miller, J. W. Thatcher, J. D. Bohlinger. — Boston, MA: Springer US, 1972. — Pp. 85-103.

[12] Levin, L. Universal Sequential Search Problems / L. Levin // Problems of Information Transmission. — 1973. — Vol. 9, no. 3. — Pp. 265-266.

[13] Chang, C.-L. Symbolic logic and mechanical theorem proving / C.-L. Chang, R. C. T. Lee. — Academic Press, 1973.

[14] Marques-Silva. GRASP-a new search algorithm for satisfiability / MarquesSilva, Karem A Sakallah // Proceedings of International Conference on Computer Aided Design / IEEE. — 1996. — Pp. 220-227.

[15] Marques-Silva, Joao P. GRASP: A search algorithm for propositional satisfiability / Joao P Marques-Silva, Karem A Sakallah // IEEE Transactions on Computers. — 1999. — Vol. 48, no. 5. — Pp. 506-521.

[16] Laporte, Gilbert. Introduction to location science / Gilbert Laporte, Stefan Nickel, Francisco Saldanha-da Gama. — Springer, 2019.

[17] Encoding cryptographic functions to SAT using TRANSALG system / Ilya Otpuschennikov, Alexander Semenov, Irina Gribanova et al. // arXiv preprint arXiv:1607.00888. — 2016.

[18] Translation of algorithmic descriptions of discrete functions to SAT with applications to cryptanalysis problems / Alexander Semenov, Ilya Otpuschennikov, Irina Gribanova et al. // Logical Methods in Computer Science. — 2020. — Vol. 16.

[19] Ignatiev, Alexey. PySAT: A Python toolkit for prototyping with SAT oracles / Alexey Ignatiev, Antonio Morgado, Joao Marques-Silva // International Conference on Theory and Applications of Satisfiability Testing / Springer. — 2018. — Pp. 428-437.

[20] Pavlenko, Artem. Fitness comparison by statistical testing in construction of SAT-based guess-and-determine cryptographic attacks / Artem Pavlenko, Maxim Buzdalov, Vladimir Ulyantsev // Proceedings of the Genetic and Evolutionary Computation Conference. — 2019. — Pp. 312-320.

[21] Pavlenko, Artem. Evolutionary computation techniques for constructing SAT-based attacks in algebraic cryptanalysis / Artem Pavlenko, Alexander Semenov, Vladimir Ulyantsev // Applications of Evolutionary Computation: 22nd International Conference, EvoApplications 2019, Held as Part of EvoStar 2019, Leipzig, Germany, April 24-26, 2019, Proceedings 22 / Springer. — 2019. — Pp. 237-253.

[22] Pavlenko, Artem. Asynchronous evolutionary algorithm for finding backdoors in Boolean satisfiability / Artem Pavlenko, Daniil Chivilikhin, Alexander Semenov // 2022 IEEE Congress on Evolutionary Computation (CEC) / IEEE. — 2022. — Pp. 1-8.

[23] Pavlenko, Artem. Using Island Model in Asynchronous Evolutionary Strategy to Search for Backdoors for SAT / Artem Pavlenko, Alexander Semenov // 2024 IEEE Congress on Evolutionary Computation (CEC) / IEEE. — 2024.

[24] Parallel Framework for Evolutionary Black-box optimization with Application to Algebraic Cryptanalysis / Artem Pavlenko, A Semenov, Vladimir Ulyantsev, Oleg Zaikin // 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) / IEEE. — 2019. — Pp. 1144-1149.

[25] On probabilistic generalization of backdoors in Boolean satisfiability / Alexander Semenov, Artem Pavlenko, Daniil Chivilikhin, Stepan Kochemazov // Proceedings of the AAAI Conference on Artificial Intelligence. — Vol. 36. — 2022. — Pp. 10353-10361.

[26] Chivilikhin, Daniil. Decomposing hard SAT instances with metaheuristic optimization / Daniil Chivilikhin, Artem Pavlenko, Alexander Semenov // International Journal of Artificial Intelligence. — 2023. — Vol. 21, no. 2. — Pp. 61-92.

[27] Using Linearizing Sets to Solve Multivariate Quadratic Equations in Algebraic Cryptanalysis / Alexander Semenov, Kirill Antonov, Stepan Kochemazov, Artem Pavlenko // IEEE Access. — 2023.

[28] Evaluating the hardness of SAT instances using evolutionary optimization algorithms / Alexander Semenov, Daniil Chivilikhin, Artem Pavlenko et al. // International Conference on Principles and Practice of Constraint Programming 2021 / Schloss Dagstuhl. — 2021. — P. 47.

[29] Measuring the effectiveness of SAT-based guess-and-determine attacks in algebraic cryptanalysis / Andrey Gladush, Irina Gribanova, Viktor Kondratiev et al. // International Conference on Parallel Computational Technologies / Springer. — 2022. — Pp. 143-157.

[30] Zaikin, Oleg. Branch location problems with maximum satisfiability / Oleg Zaikin, Alexey Ignatiev, Joao Marques-Silva // European Conference on Artificial Intelligence 2020 / IOS Press. — 2020. — Pp. 379-386.

[31] Цейтин, Григорий Самуилович. О сложности вывода в исчислении высказываний / Григорий Самуилович Цейтин // Записки научных семинаров ПОМИ. — 1968. — Vol. 8, no. 0. — Pp. 234-259.

[32] Bacchus, Fahiem. Maximum satisfiabiliy / Fahiem Bacchus, Matti Järvisalo, Ruben Martins // Handbook of satisfiability. — IOS Press, 2021. — Pp. 929991.

[33] Cook, Stephen A. The complexity of theorem-proving procedures / Stephen A Cook // Logic, Automata, and Computational Complexity: The Works of Stephen A. Cook. — 2023. — Pp. 143-152.

[34] Karp, Richard M. Reducibility among combinatorial problems / Richard M Karp. — Springer, 2010.

[35] Menezes, Alfred J. Handbook of applied cryptography / Alfred J Menezes, Paul C Van Oorschot, Scott A Vanstone. — CRC press, 2018.

[36] De Canniere, Christophe. Trivium: A stream cipher construction inspired by block cipher design principles / Christophe De Canniere // International Conference on Information Security / Springer. — 2006. — Pp. 171-186.

[37] McDonald, Cameron. Attacking bivium with minisat / Cameron McDonald, Chris Charnes, Josef Pieprzyk // ECRYPT Stream Cipher Project, Report. — 2007. — Vol. 40. — P. 2007.

[38] Eibach, Tobias. Attacking Bivium using SAT solvers / Tobias Eibach, Enrico Pilz, Gunnar Volkel // Theory and Applications of Satisfiability Testing-SAT 2008: 11th International Conference, SAT 2008, Guangzhou, China, May 12-15, 2008. Proceedings 11 / Springer. — 2008. — Pp. 63-76.

[39] Soos, Mate. Extending SAT solvers to cryptographic problems / Mate Soos, Karsten Nohl, Claude Castelluccia // International Conference on Theory and Applications of Satisfiability Testing / Springer. — 2009. — Pp. 244-257.

[40] Semenov, Alexander. Algorithm for finding partitionings of hard variants of boolean satisfiability problem with application to inversion of some cryptographic functions / Alexander Semenov, Oleg Zaikin // SpringerPlus.

— 2016. — Vol. 5. — Pp. 1-16.

[41] Location science. — Cham: Springer International Publishing, 2015.

[42] Hyvarinen, Antti EJ. Grid based propositional satisfiability solving / Antti EJ Hyvarinen et al. — 2011.

[43] Williams, Ryan. Backdoors to typical case complexity / Ryan Williams, Carla P Gomes, Bart Selman // IJCAI. — Vol. 3. — 2003. — Pp. 11731178.

[44] Kautz, Henry A. Incomplete Algorithms. / Henry A Kautz, Ashish Sabharwal, Bart Selman // Handbook of satisfiability. — 2009. — Vol. 185. — Pp. 185-203.

[45] Russell, S. Artificial Intelligence: A Modern Approach / S. Russell, P. Norvig.

— Prentice Hall, 1995.

[46] MacWilliams, Florence Jessie. The theory of error-correcting codes / Florence Jessie MacWilliams, Neil James Alexander Sloane. — Elsevier, 1977.

— Vol. 16.

[47] Kirkpatrick, S. Optimization by Simmulated Annealing / S. Kirkpatrick, D. Gelatt Jr., M. P. Vecchi // Sci. — 1983. — Vol. 220, no. 4598. — Pp. 671680.

[48] Semenov, Alexander A. Merging variables: one technique of search in pseudo-Boolean optimization / Alexander A Semenov // Mathematical Optimization Theory and Operations Research: 18th International Conference, MOTOR 2019, Ekaterinburg, Russia, July 8-12, 2019, Revised Selected Papers 18 / Springer. — 2019. — Pp. 86-102.

[49] Luke, Sean. Essentials of metaheuristics / Sean Luke. — 2nd edition. — 2009.

[50] Gottlieb, Jens. Evolutionary algorithms for the satisfiability problem / Jens Gottlieb, Elena Marchiori, Claudio Rossi // Evolutionary computation. — 2002. — Vol. 10, no. 1. — Pp. 35-50.

[51] Lardeux, Frédéric. GASAT: a genetic local search algorithm for the satisfiability problem / Frederic Lardeux, Frederic Saubion, Jin-Kao Hao // Evolutionary Computation. — 2006. — Vol. 14, no. 2. — Pp. 223-253.

[52] Schoning, T. A probabilistic algorithm for k-SAT and constraint satisfaction problems / T Schoning // 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039) / IEEE. — 1999. — Pp. 410-414.

[53] Cai, Shaowei. Deep cooperation of CDCL and local search for SAT / Shaowei Cai, Xindi Zhang // Theory and Applications of Satisfiability Testing-SAT 2021: 24th International Conference, Barcelona, Spain, July 5-9, 2021, Proceedings 24 / Springer. — 2021. — Pp. 64-81.

[54] Cai, Shaowei. Local Search and Its Application in CDCL/CDCL (T) solvers for SAT/SMT / Shaowei Cai // 2023 Formal Methods in Computer-Aided Design (FMCAD) / IEEE. — 2023. — Pp. 1-1.

[55] Balint, Adrian. Choosing probability distributions for stochastic local search and the role of make versus break / Adrian Balint, Uwe Schoning // International Conference on Theory and Applications of Satisfiability Testing / Springer. — 2012. — Pp. 16-29.

[56] Improving implementation of SLS solvers for SAT and new heuristics for k-SAT with long clauses / Adrian Balint, Armin Biere, Andreas Frohlich, Uwe Schoning // Theory and Applications of Satisfiability Testing-SAT 2014: 17th International Conference, Held as Part of the Vienna Summer of Logic,

VSL 2014, Vienna, Austria, July 14-17, 2014. Proceedings 17 / Springer. — 2014. — Pp. 302-316.

[57] Mladenovic, Nenad. Variable neighborhood search / Nenad Mladenovic, Pierre Hansen // Computers & operations research. — 1997. — Vol. 24, no. 11. — Pp. 1097-1100.

[58] Otpuschennikov, Ilya V. Using merging variables-based local search to solve special variants of maxsat problem / Ilya V Otpuschennikov, Alexander A Semenov // Mathematical Optimization Theory and Operations Research: 19th International Conference, MOTOR 2020, Novosibirsk, Russia, July 6-10, 2020, Revised Selected Papers 19 / Springer. — 2020. — Pp. 363378.

[59] Semenov, Alexander. On some variants of the merging variables based (1+ 1)-evolutionary algorithm with application to maxsat problem / Alexander Semenov, Ilya Otpuschennikov, Kirill Antonov // Mathematical Optimization Theory and Operations Research: 20th International Conference, MOTOR 2021, Irkutsk, Russia, July 5-10, 2021, Proceedings 20 / Springer.

— 2021. — Pp. 111-124.

[60] Davis, Martin. A computing procedure for quantification theory / Martin Davis, Hilary Putnam // Journal of the ACM (JACM). — 1960.

— Vol. 7, no. 3. — Pp. 201-215.

[61] Davis, Martin. A machine program for theorem-proving / Martin Davis, George Logemann, Donald Loveland // Communications of the ACM. — 1962.

— Vol. 5, no. 7. — Pp. 394-397.

[62] Marques-Silva, Joao. Conflict-driven clause learning SAT solvers / Joao Marques-Silva, Ines Lynce, Sharad Malik // Handbook of satisfiability.

— IOS Press, 2021. — Pp. 133-182.

[63] Robinson, John Alan. A machine-oriented logic based on the resolution principle / John Alan Robinson // Journal of the ACM (JACM). — 1965. — Vol. 12, no. 1. — Pp. 23-41.

[64] Beame, Paul. Understanding the power of clause learning / Paul Beame, Henry Kautz, Ashish Sabharwal // IJCAI / Citeseer. — 2003. — Pp. 11941201.

[65] Beame, Paul. Towards understanding and harnessing the potential of clause learning / Paul Beame, Henry Kautz, Ashish Sabharwal // Journal of artificial intelligence research. — 2004. — Vol. 22. — Pp. 319-351.

[66] Haken, Armin. The intractability of resolution / Armin Haken // Theoretical computer science. — 1985. — Vol. 39. — Pp. 297-308.

[67] Ignatiev, Alexey. RC2: an efficient MaxSAT solver / Alexey Ignatiev, Antonio Morgado, Joäo Marques-Silva // Journal on Satisfiability, Boolean Modeling and Computation. — 2019. — Vol. 11, no. 1. — Pp. 53-64.

[68] Davies, Jessica. Solving MAXSAT by solving a sequence of simpler SAT instances / Jessica Davies, Fahiem Bacchus // International conference on principles and practice of constraint programming / Springer. — 2011. — Pp. 225-239.

[69] Berg, Jeremias. Core-boosted linear search for incomplete MaxSAT / Jeremias Berg, Emir Demirovic, Peter J Stuckey // Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 16th International Conference, CPAIOR 2019, Thessaloniki, Greece, June 4-7, 2019, Proceedings 16 / Springer. — 2019. — Pp. 39-56.

[70] Measuring the Hardness of SAT Instances. / Carlos Ansotegui, Maria Luisa Bonet, Jordi Levy, Felip Manya // AAAI. — Vol. 8. — 2008. — Pp. 222-228.

[71] Stockmeyer, Larry. Classifying the computational complexity of problems / Larry Stockmeyer // The journal of symbolic logic. — 1987. — Vol. 52, no. 1.

— Pp. 1-43.

[72] Metropolis, Nicholas. The monte carlo method / Nicholas Metropolis, Stanislaw Ulam // Journal of the American statistical association. — 1949.

— Vol. 44, no. 247. — Pp. 335-341.

[73] Feller, William. An introduction to probability theory and its applications / William Feller. — 3rd edition. — John Wiley & Sons Inc., 1968. — Vol. 1.

[74] Golic, Jovan Dj. Cryptanalysis of alleged A5 stream cipher / Jovan Dj Golic // International Conference on the Theory and Applications of Cryptographic Techniques / Springer. — 1997. — Pp. 239-255.

[75] Boros, Endre. Pseudo-boolean optimization / Endre Boros, Peter L Hammer // Discrete applied mathematics. — 2002. — Vol. 123, no. 1-3. — Pp. 155-225.

[76] Muhlenbein, Heinz. How genetic algorithms really work: I. mutation and hillclimbing / Heinz Muhlenbein // Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature, 1992 / Elsevier. — 1992.

[77] Droste, Stefan. On the analysis of the (1+ 1) evolutionary algorithm / Stefan Droste, Thomas Jansen, Ingo Wegener // Theoretical Computer Science. — 2002. — Vol. 276, no. 1-2. — Pp. 51-81.

[78] Wegener, Ingo. Theoretical aspects of evolutionary algorithms / Ingo Wegener // International colloquium on automata, languages, and programming / Springer. — 2001. — Pp. 64-78.

[79] Fast genetic algorithms / Benjamin Doerr, Huu Phuoc Le, Regis Makhmara, Ta Duy Nguyen // Proceedings of the genetic and evolutionary computation conference. — 2017. — Pp. 777-784.

[80] Karp, Richard M. Monte-Carlo approximation algorithms for enumeration problems / Richard M Karp, Michael Luby, Neal Madras // Journal of algorithms. — 1989. — Vol. 10, no. 3. — Pp. 429-448.

[81] Probabilistic generalization of backdoor trees with application to SAT / Alexander Semenov, Daniil Chivilikhin, Stepan Kochemazov, Ibragim Dzhiblavi // Proceedings of the AAAI Conference on Artificial Intelligence. — Vol. 37. — 2023. — Pp. 4095-4103.

[82] Samer, Marko. Backdoor Trees. / Marko Samer, Stefan Szeider // AAAI. — Vol. 8. — 2008. — Pp. 13-17.

[83] Nocedal, Jorge. Numerical optimization / Jorge Nocedal, Stephen J Wright.

— Springer, 1999.

[84] Look-Ahead Based SAT Solvers / Armin Biere, Marijn Heule, Hans van Maaren, Toby Walsh // Handbook of Satisfiability. — 2021.

— Vol. 336. — P. 183.

[85] Cube and conquer: Guiding CDCL SAT solvers by lookaheads / Marijn JH Heule, Oliver Kullmann, Siert Wieringa, Armin Biere // Haifa Verification Conference / Springer. — 2011. — Pp. 50-65.

[86] Alba, Enrique. Improving flexibility and efficiency by adding parallelism to genetic algorithms / Enrique Alba, Jose M Troya // Statistics and Computing.

— 2002. — Vol. 12, no. 2. — Pp. 91-114.

[87] Depolli, Matjaz. Asynchronous master-slave parallelization of differential evolution for multi-objective optimization / MatjaZ Depolli, Roman Trobec, Bogdan Filipic // Evolutionary computation. — 2013. — Vol. 21, no. 2. — Pp. 261-291.

[88] Whitley, L. Darrell. Island Model genetic Algorithms and Linearly Separable Problems / L. Darrell Whitley, Soraya B. Rana, Robert B. Heckendorn // Evolutionary Computing, AISB Workshop. — Vol. 1305 of LNCS. — 1997. — Pp. 109-125.

[89] Whitley, L. Darrell. The Island Model Genetic Algorithm: On Separability, Population Size and Convergence / L. Darrell Whitley, Soraya B. Rana, Robert B. Heckendorn // Journal of computing and information technology.

— 1999. — Vol. 7, no. 1. — Pp. 33-47.

[90] Roosta, Seyed H. Parallel processing and parallel algorithms: theory and computation / Seyed H Roosta. — Springer Science & Business Media, 2012.

[91] Kurdi, Mohamed. An effective new island model genetic algorithm for job shop scheduling problem / Mohamed Kurdi // Comput. Oper. Res. — 2016.

— Vol. 67. — Pp. 132-142.

[92] Kuehlmann, Andreas. Circuit-based Boolean reasoning / Andreas Kuehlmann, Malay K Ganai, Viresh Paruthi // Proceedings of the 38th annual Design Automation Conference. — 2001. — Pp. 232-237.

[93] Introduction to algorithms / Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, Clifford Stein. — 1st edition. — MIT press, 2022.

[94] Knuth, Donald E. The Art of Computer Programming: Seminumerical Algorithms, Volume 2 / Donald E Knuth. — Addison-Wesley Professional,

2014.

[95] Dadda, L. Some schemes for parallel multipliers / L Dadda // Alta Frequenza.

— 1965. — Vol. 34, no. 05. — Pp. 349-356.

[96] Kaufmann, Daniela. Verifying large multipliers by combining SAT and computer algebra / Daniela Kaufmann, Armin Biere, Manuel Kauers // 2019 Formal Methods in Computer Aided Design (FMCAD) / IEEE. — 2019. — Pp. 28-36.

[97] Gates, William H. Bounds for sorting by prefix reversal / William H Gates, Christos H Papadimitriou // Discrete mathematics. — 1979. — Vol. 27, no. 1.

— Pp. 47-57.

[98] Spence, Ivor. Weakening cardinality constraints creates harder satisfiability benchmarks / Ivor Spence // Journal of Experimental Algorithmics (JEA). —

2015. — Vol. 20. — Pp. 1-14.

[99] Audemard, Gilles. Predicting learnt clauses quality in modern SAT solvers / Gilles Audemard, Laurent Simon // Twenty-first international joint conference on artificial intelligence / Citeseer. — 2009.

[100] Wetzler, Nathan. DRAT-trim: Efficient checking and trimming using expressive clausal proofs / Nathan Wetzler, Marijn JH Heule, Warren A Hunt Jr // International Conference on Theory and Applications of Satisfiability Testing / Springer. — 2014. — Pp. 422-429.

[101] CaDiCaL, Kissat, Paracooba, Plingeling and Treengeling Entering the SAT Competition 2020 / Armin Biere, Katalin Fazekas, Mathias Fleury,

Maximillian Heisinger // Proc. of SAT Competition 2020 - Solver and Benchmark Descriptions. — Vol. B-2020-1 of Department of Computer Science Report Series B. — University of Helsinki, 2020. — Pp. 51-53.

[102] Een, Niklas. An extensible SAT-solver / Niklas Een, Niklas Sorensson // International conference on theory and applications of satisfiability testing / Springer. — 2003. — Pp. 502-518.

[103] Biere, Armin. Kissat SAT solver, commit 97917dd. — https://github.com/ arminbiere/kissat. — 2022. — Accessed: 2022-08-10.

[104] Gunther, Christoph G. Alternating step generators controlled by de Bruijn sequences / Christoph G Gunther // Advances in Cryptology—EUROCRYPT'87: Workshop on the Theory and Application of Cryptographic Techniques Amsterdam, The Netherlands, April 13-15, 1987 Proceedings 6 / Springer. — 1988. — Pp. 5-14.

[105] Raddum, Havard. Cryptanalytic Results on TRIVIUM. eSTREAM, ECRYPT stream cipher project, Report 2006/039, 2006.

[106] Maximov, Alexander. Two trivial attacks on Trivium / Alexander Maximov, Alex Biryukov // Selected Areas in Cryptography: 14th International Workshop, SAC 2007, Ottawa, Canada, August 16-17, 2007, Revised Selected Papers 14 / Springer. — 2007. — Pp. 36-55.

[107] Model design for a reduced variant of a Trivium Type Stream Cipher / Antonio Castro Lechtaler, Marcelo Cipriano, Edith Garcia et al. // Journal of Computer Science and Technology. — 2014. — Vol. 14, no. 01. — Pp. 55-58.

[108] Algebraic analysis of Trivium-like ciphers / Sui-Guan Teo, Kenneth Koon-Ho Wong, Harry Bartlett et al. // Cryptology ePrint Archive. — 2013.

[109] Khazaei, Shahram. Reduced complexity attacks on the alternating step generator / Shahram Khazaei, Simon Fischer, Willi Meier // International Workshop on Selected Areas in Cryptography / Springer. — 2007. — Pp. 116.

[110] Zeng, Kencheng. On the linear consistency test (LCT) in cryptanalysis with applications / Kencheng Zeng, Chung-Huang Yang, TRN Rao // Advances in Cryptology—CRYPTO'89 Proceedings 9 / Springer. — 1990. — Pp. 164-174.

[111] Zaikin, Oleg. An improved SAT-based guess-and-determine attack on the alternating step generator / Oleg Zaikin, Stepan Kochemazov // Information Security: 20th International Conference, ISC 2017, Ho Chi Minh City, Vietnam, November 22-24, 2017, Proceedings 20 / Springer. — 2017. — Pp. 2138.

[112] Glover, Fred. Tabu search / Fred Glover, Manuel Laguna. — Springer, 1998.

[113] Rokk 1.0. 1 / Takeru Yasumoto, Takumi Okuwaga, A Belov et al. // SAT Competition. — 2014. — Vol. 2014. — P. 70.

[114] Dinur, Itai. Cube attacks on tweakable black box polynomials / Itai Dinur, Adi Shamir // Advances in Cryptology-EUROCRYPT 2009: 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, April 26-30, 2009. Proceedings 28 / Springer. — 2009. — Pp. 278-299.

[115] A key-recovery attack on 855-round Trivium / Ximing Fu, Xiaoyun Wang, Xiaoyang Dong, Willi Meier // Annual International Cryptology Conference / Springer. — 2018. — Pp. 160-184.

[116] Een, Niklas. Translating pseudo-boolean constraints into SAT / Niklas Een, Niklas Sorensson // Journal on Satisfiability, Boolean Modeling and Computation. — 2006. — Vol. 2, no. 1-4. — Pp. 1-26.

[117] Logic minimization algorithms for VLSI synthesis / Robert K Brayton, Gary D Hachtel, Curt McMullen, Alberto Sangiovanni-Vincentelli. — Springer Science & Business Media, 1984. — Vol. 2.

Приложение А

Evolutionary Computation Techniques for Constructing SAT-Based Attacks in Algebraic Cryptanalysis

Artem Pavlenko1(B), Alexander Semenov2, and Vladimir Ulyantsev1

1 ITMO University, St. Petersburg, Russia {alpavlenko,ulyantsev}@corp.ifmo.ru 2 Matrosov Institute for System Dynamics and Control Theory SB RAS,

Irkutsk, Russia biclop.rambler@yandex.ru

Abstract. In this paper we present the results on applying evolutionary computation techniques to construction of several cryptographic attacks. In particular, SAT-based guess-and-determine attacks studied in the context of algebraic cryptanalysis. Each of these attacks is built upon some set of Boolean variables, thus it can be specified by a Boolean vector. We use two general evolutionary strategies to find an optimal vector: (1+1)-EA and GA. Based on these strategies parallel algorithms (based on modern SAT-solvers) for solving the problem of minimization of a special pseudo-Boolean function are implemented. This function is a fitness function used to evaluate the runtime of a guess-and-determine attack. We compare the efficiency of (1+1)-EA and GA with the algorithm from the Tabu search class, that was earlier used to solve related problems. Our GA-based solution showed the best results on a number of test instances, namely, cryptanalysis problems of several stream ciphers (cryptographic keystream generators).

Keywords: Algebraic cryptanalysis • Guess-and-determine attack • SAT • Evolutionary computation

1 Introduction

Algebraic Cryptanalysis (see [1]) is a way of breaking ciphers through solving systems of algebraic equations over finite fields. The corresponding attacks are called algebraic. Systems of algebraic equations constructed for strong ciphers are usually difficult for all known state-of-the-art algorithms. The resulting system of equations can be simplified by guessing the values of some of its variables. Then we can try all possible assignments of such variables, every time obtaining some simplified system. It might happen that the time spent by some algorithm on

The study was funded by a grant from the Russian Science Foundation (project No. 18-71-00150).

P. Kaufmann and P. A. Castillo (Eds.): EvoApplications 2019, LNCS 11454, pp. 237-253, 2019. https://doi.org/10.100r/9r8-3-030-16692-2_16

®

Check for updates

solving all such systems will be significantly smaller than the brute force attack time (e.g., if we test all possible keys of the cipher in question). An algebraic attack that uses some guessed bit set to simplify the system of cryptanalysis equations is called a guess-and-determine attack.

Over the last 20 years a large number of guess-and-determine attacks have been designed. In the vast majority of cases, a guess-and-determine attack is based on the analysis of the cipher features (see Fig. 1 with Trivium-Toy 64 cipher example). Such an analysis usually requires a lot of manual work. The recent papers [2,3] describe an automatic method for constructing guess-and-determine attacks. In the framework of this method, the weakened equations of cryptanalysis are solved using modern Boolean SATisfiability (SAT) solvers. Each guessed bit set is represented as a point in the Boolean hypercube. An arbitrary point is associated with the value of a special function that evaluates the complexity of the corresponding guess-and-determine attack. This function is a black-box pseudo-Boolean function. Thus, the construction of a guess-and-determine attack is reduced to the pseudo-Boolean black-box optimization problem. Various metaheuristic algorithms can be used to solve it and the papers [2,3] employ the simplest local search schemes, such as Simulated Annealing and Tabu Search. The main purpose of this paper is to demonstrate capabilities of evolutionary computation in application to the problem of automatic construction of guess-and-determine attacks in algebraic cryptanalysis. Below is the brief outline of the present work.

In Sect. 2 we introduce basic notations and facts of the presented paper. In particular, we briefly describe construction the known reduction of the problem of cryptographic attacks to the problem of pseudo-Boolean optimization. The corresponding pseudo-Boolean function & is not specified analytically and to minimize it metaheuristic algorithms related to local search methods were previously used [2,3]. In the present paper in order to solve this problem we apply two common strategies: (1+1)-Evolutionary Algorithm and Genetic Algorithm. The corresponding algorithms and techniques are described in Sect.3. Section 4 contains results of computational experiments. In Sect. 5 we summarize the obtained results and outline future research.

2 Preliminaries

In this section, we give some auxiliary information from the Boolean functions theory and cryptanalysis.

2.1 Boolean Functions, Formulas and Boolean Satisfiability Problem (SAT)

Let {0,1}fc, k e N denote a set of all binary words of length k ({0,1}0 corresponds to an empty word). The words from {0,1}fc, k > 1 are sometimes called Boolean vectors of length k, whereas the set {0,1}fc, (k > 1) is referred to as a Boolean hypercube of dimension k. An arbitrary total function of the form

Fig. 1. Visualization of the key stream generator Trivium-Toy 64 with three registers. The guessed bit set, which is considered as an individual of evolutionary algorithms in the present work, is colored grey

f : {0,1}k — {0,1} is called a Boolean function of arity k. An arbitrary function of the form h : {0,1}* — {0,1}*, where

{0,1}* = Q {0,1}k

k=0

is called a discrete function.

Boolean variables are the variables that take values from {0,1}. A Boolean formula with respect to k variables is an expression built by special rules over the alphabet comprising k Boolean variables x1 ,...,xk and special symbols called Boolean connectives. An arbitrary Boolean formula with respect to k variables defines a Boolean function of the kind fk : {0,1}k — {0,1}. The set of Boolean connectives is called complete if they can be used to create any Boolean function of arbitrary arity. Such a set is called a complete system of connectives or a complete basis. The following set is a complete basis: {A, V, -}, where A is conjunction, V is disjunction and - is negation.

Assume that F is an arbitrary Boolean formula, X is a set of variables that can be found in F, B is an arbitrary subset of X (B Q X). By {0,1}'s 1 we denote a set of all assignments of variables from B.

Let x be a Boolean variable. The formula that consists of a single variable or a negation is called a literal. Let x be an arbitrary Boolean variable. A pair of literals (x, -x) is called a complementary pair. An arbitrary disjunction of different literals, which do not have any complementary pairs among them, is called a clause. An arbitrary conjunction of different clauses is called a Conjunctive Normal Form (CNF). If C is a CNF and X = {xi, ...,xn} is a set of all Boolean variables that can be found in C, then we can say that C is a CNF over the set of variables X.

Let C be a CNF over X, \X\ = k. Denote fC : {0,1}k — {0,1} a Boolean function defined by the CNF C. The CNF C is called satisfiable if there exists such a E {0,1}k (i.e. an assignment of variables from X), that fC(a) = 1 holds.

If such a exists, then a is called the satisfying assignment of C. If such a does not exist, then C is called unsatisfiable. The Boolean Satisfiability Problem, shortly denoted as SAT, has the following formulation: for any given CNF C, find if C is satisfiable. SAT is a classic NP-complete problem [4].

Recently, SAT has been targeted by algorithms that demonstrate a high efficiency for a wide class of applied problems [5]. Application of SAT solvers proved to be very successful in the following areas: symbolic verification, bioinformatics, combinatorics and Ramsey's theory, cryptanalysis.

2.2 Guess-and-Determine Attacks in Algebraic Cryptanalysis

As has been mentioned above, the algebraic cryptanalysis implies solution of systems of algebraic equations (usually over the field GF(2)) that describe some cipher.

Any cipher can be considered as a total discrete function of the kind

f : {0,1}" ^{0,1}m (1)

Then the cryptanalysis problem can be considered in the context of the problem of finding a preimage for some known value of the function (1): using the known Y e Range f, Range f С {0,1}m, find such a e {0,1}" that f (a) = 7. We will call it the inversion problem for the function (1).

It is well-known (see, e.g. [1]) that the algorithm for calculating function (1) can be effectively described by a system of algebraic equations over the field GF(2). Denote this system by E(f). Roughly speaking, it describes the process of finding an output of the function f that corresponds to an arbitrary input. Let X be a set of all variables that can be found in E(f). Denote by Xln and Xout sets of variables that were assigned to inputs and outputs of the function f, respectively.

Substitution of the values into the system E(f) is determined in a standard way. It can be shown that if we substitute an arbitrary assignment a e {0,1}" for some variables from Xln into E(f), we can derive assignments for all other variables from E(f) by the following simple rules.

Substitute an arbitrary 7 e Range f into E(f), and let E(f, 7) be the resulting system. If we manage to solve E(f, Y) then we can extract such a e {0,1}" from the solution of E(f, 7) that f (a) = 7. However, this is a difficult problem for strong ciphers.

The simplest and most efficient way of solving E(f, 7) is a sequential substitution of all possible a e {0,1}" into E(f, 7). We denote the resulting system as E(f, a, 7). In line with the above, an arbitrary system E(f, a, 7) can be easily solved. If f (a) = 7, then by applying the simple rules mentioned above, we derive a contradiction from E(f, a, 7). If f (a) = 7, the contradiction does not occur and each variable from X will get a certain value. This scheme of trying all possible inputs corresponds to the method of cryptanalysis called the brute force attack.

For some ciphers and for functions of the form (1) that describe them, we can select a subset B of the set X with the following properties:

1. \B\^n;

2. the problem of finding a solution of E (f, j, ß) or proving its inconsistency can be solved relatively quickly using some algorithm A; here ß is an assignment for variables from B;

3. the total time required for finding a : f (a) = 7 by trying various ß G {0,1}'Б' is considerably smaller than the time of the brute force attack.

If the requirements listed above are fulfilled, then we can talk about a guess-and-determine attack based on the guessed bit set B.

Over the last 15-20 years a substantial number of different guess-and-determine attacks have been proposed. Some of them proved to be fatal for the corresponding ciphers. One of the simplest examples of the guess-and-determine attack is the attack on the A5/1 cipher described by Ross Andersen in 1994 [6]. For a long time, A5/1 served as a standard for encrypting the GSM traffic in cellular telephony. This cipher uses a 64-bit secret key. R. Anderson noted that if we choose in a certain way 53 bits of the internal state of the A5/1 registers and we know a certain number of bits of the key flow, then we can recover 11 unknown bits of the state of the registers by solving a trivial linear system over the field GF(2) (in fact, we just have to solve a triangular system). Therefore, the Anderson attack is based on the guessed bit set B : \ B\ = 53, the role of the A algorithm is played by the algorithm for solving systems of linear equations. The form of these equations enables implementation of this attack on specialized computational architectures, for instance, in [7] the Anderson attack was performed on an FPGA device. The runtime of the corresponding attacks is up to 10 h for one cryptanalysis problem.

A series of works on algebraic cryptanalysis (see e.g., [1,8-10]) implies that algorithm A does not have to be polynomial. The problem of solving systems of the form E(f, 7) can be efficiently reduced to combinatorial problems that are difficult in the worst case (NP-hard), but not difficult for most of their particular cases. One of the most computationally attractive problems here is SAT. Accordingly, any SAT-solving algorithm can play the role of A. In all the articles listed above, as well as in a number of other papers, the problem of finding a solution to an arbitrary system of the form E(f, 7) was reduced to SAT for CNF C(f, 7). To solve the resulting SAT instances, CDCL-based SAT solvers were used [11].

Some of those works describe guess-and-determine attacks where weakened cryptanalysis equations are solved using SAT-solvers. In [8] SAT-solvers were used to build a guess-and-determine attack on the truncated variants of the DES cipher, in [9] similar attacks targeted truncated variants of the GOST 28147-89 cipher. The paper [10] described a SAT-based guess-and-determine attack on the A5/1 cipher that used the guessed bit set B : \B\ = 31.

2.3 Automatic Methods for Constructing Guess-and-Determine Attacks

In all of the above attacks, the guessed bit set was selected while analyzing the characteristics of the cipher in question. In other words, the corresponding attacks were constructed manually. As mentioned above, papers [2,3] propose approaches to the automatic construction of guess-and-determine attacks. To this end, special functions were introduced: they evaluate the efficiency of the attack and are calculated using a probabilistic experiment. Such functions are black-box pseudo-Boolean functions [12]. In [2,3] the problem of constructing a guess-and-determine attack with the lowest time was reduced to the pseudo-Boolean black-box optimization problem. Next, we briefly describe the corresponding techniques presented in [3].

Consider the problem of finding the preimage for the function f : {0,1}n — {0,1}m, given by some efficient algorithm. Following the ideology of symbolic execution [4,13], we can build the CNF C(f) that possesses some important properties using the algorithm that computes f (for more details see [3]). Let

Y E {0,1}m be an output of the function f and let B be an arbitrary set of variables in C(f) that does not include variables from Xout. Consider B as a guessed bit set and denote by 3 an arbitrary assignment for variables from B. Let C(f, y, 3) denote the CNF resulting from the substitution of the assignments

Y and 3 into C(f).

Let A be some algorithm for solving SAT. Fix some positive number t. For each 3 E {0,1}B build a CNF C(f,Y,3) and apply the algorithm A to it, limiting the algorithm runtime to t. If A fails to solve the corresponding SAT instance in time t, then we terminate A and move to the next 3. If for some 3 E {0,1}B the algorithm A finds a satisfying assignment for C(f, y, 3) in time < t, then thereby it will find a E {0,1}n : f (a) = y.

It was shown in [3] that if a was randomly chosen in correspondence with a given on {0,1}n uniform distribution and if y : Y = f (a) is known, then we can determine the probability of the following event: by applying the brute force strategy described above to the set {0,1}B' we will find a : f (a) = y. Denote this probability by pB. It might be very small. Then we can repeat the strategy considering different outputs Yi ,---,Yr of the function f (for many ciphers, it is enough to solve this problem at least for one such output to find the secret key). Then the probability of finding the preimage for at least one output Y1,---,Yr is

P*b = 1 - (1 - PB )r.

It is obvious that for a fixed pB > 0 the probability PB tends to 1 as r increases.

Note that the strategy described above takes time 2s ■ t, (s = \B\) to process one output Yi, i E {1,..., r}, therefore, the upper bound for the total runtime of the corresponding attack is 2s ■ t ■ r. It was shown in [3] that if r « pB,

then P*B > 0.95, which means that the inversion of at least one of the r outputs of the function under consideration is an almost certain event. For this reason, it is desirable to find such B, for which the value

2s • t • —

PB

reaches the minimum on a set of possible alternatives of guessed bit sets. Unfortunately, if we want to calculate precisely the probability pB, we have to search through the entire set {0,1}", which is infeasible. Therefore, the probability pB takes the form of the expected value E[£B] of the random variable £B of a special kind. Thus, it is required to minimize the function

(2)

over all possible sets B. The variable £B is derived from simple probability experiments (see details in [3]), whereas we use the Monte Carlo method [14] to evaluate E[£b ].

The paper [3] considers the problem of constructing an efficient SAT-based guess-and-determine attack in the context of the minimization problem of a black-box pseudo-Boolean function with the vector xB as the function's input. The unit components of xB select the set B in the set of variables found in C(f). Then a probability experiment for calculating the function (2) is set up for this set B. The obtained number is an estimate of the time for the guess-and-determine attack, where B is used as a guessed bit set. The goal is to find B with the smallest value of the estimate for the function (2).

In [2,3], pseudo-Boolean optimization problems were solved by simple meta-heuristics such as Simulated Annealing and Tabu Search. The main results of this paper deal with application of evolutionary algorithms to these problems.

3 Applying Evolutionary Computations to Construction of Guess-and-Determine Attacks

In this section we formulate the problem of constructing an efficient guess-and-determine attack as the pseudo-Boolean optimization problem. We also describe the basic techniques of evolutionary computing that we employ.

3.1 Construction of an Efficient Guess-and-Determine Attack as the Problem of Pseudo-Boolean Function Minimization

We consider the cryptanalysis problem as the problem of inversion of the function (1). Construct the CNF C(f). Let X be a set of variables found in C(f). Let W = X \ Xout and let B be an arbitrary subset of W. We can present an arbitrary B with the help of the Boolean vector xB of length q = \W|. Ones in xB will indicate those variables from W that were included into B.

Following the ideas from [3], define the pseudo-Boolean function

$ : {0,1}9 — R. (3)

The arbitrary vector xB E {0,1}9 is an input of the function (3). We use this vector to build the set B, B Q W.

Recall that Xm, \Xm| = n is the set formed by the variables from X corresponding to the input of the function (1). Define uniform distribution on {0,1}n and choose the corresponding Boolean vectors a1,..., aM (aj E {0,1}n, j E {1,..., M}). We will refer to this set of vectors as a random sample of size M. In view of the above, substitution of an arbitrary aj into C(f) and application of simple rules yields a derivation of assignments for all variables from X. Let Yj be an assignment for variables from Xout obtained as a result of this derivation. By 3j we denote the assignment derived for variables from B, j e{1,..., M}.

Let us construct the CNFs C (f,3i ,Yi),...,C(f, 3 m , Ym ). Let each of these CNFs be an input of the SAT-solver A and set the solving time of each corresponding SAT instance as t. For each j e{1,..., M} consider a random variable j, that evaluates to 1 if SAT instance for the CNF C(f,3j ,Yj) was solved in time < t, and j =0 otherwise. The value of the function (3) at arbitrary point XB E {0,1}s is defined as follows:

$(XB) = 2 'B' ■ t , (4)

z^j=isb

where t and M are the parameters of this function.

In accordance with the Monte Carlo method, the bigger the size of the random sample M, the better approximation of (2) is given by the function (4).

Now consider the problem of finding the minimum of the function (4) over the Boolean hypercube {0,1}9. To solve this problem we will use evolutionary algorithms.

3.2 Evolutionary Computation Techniques Used for Minimization of the Suggested Pseudo-Boolean Function

In this section we present some techniques that complement such strategies as the (1+1)-Evolutionary Algorithm [15] ((1+1)-EA) and one variant of the Genetic Algorithm (GA). We used these techniques to solve the minimization problem for functions of type (4) for several stream ciphers.

Basic Schemes of Evolutionary Computation. Algorithm 1, which is given below, is a common outline of an evolutionary algorithm for solving the minimization problem of an arbitrary function of type (4).

As an input, the algorithm takes the CNF C(f), a random sample size M, the time limit t, the minimization strategy S, and the initial guessed bit set represented by the Boolean vector Xstart.

Algorithm 1. Evolutionary algorithm

Input: CNF formula C(f), initial sample size M, time limit t, strategy S, initial guessed bits Xstart

1: P ^ InitPopulation(S, Xstart)

2: (Xbest^best) ^ (Xstart, $ (Xstart ,M,t) )

3: Nstag ^ 0 > stagnation count

4: while not StopCondition( ) do

5: Nstag ^ Nstag + 1

6: for x in P do

7: v $(x,M,t)

8: if v < vbest then

9: (Xbest,Vbest) ^ (x,v)

10: Nstag ^ 0

11: end if

12: M ^ SELECTM()

13: end for

14: if Nstag < NsmagX then

15: P ^ GetNextPopulation(S, P)

16: else

17: P ^ RESTART(S, Xstart)

18: Nstag ^ 0

19: end if 20: end while 21: return (xbest,vbest)

Each Boolean vector x from {0,1}9 is considered as a population in the general concept of evolutionary computation. The value of the function <P of type (4) at point x is considered as a value of the fitness function for the corresponding population.

The function InitPopulation(S, xstart) forms the initial population according to the input vector xstart within the frames of the chosen evolutionary strategy S. The pair (xbest,^best) corresponds to point in {0,1}9 with the current Best Known Value (BKV) of the function

The value of <P at each specific point xB € {0,1}9 is computed using the scheme described in the previous section: construct the set B defined by x, construct the random sample {a1,..., aM}, find Boolean vectors 71,..., yM and f31,..., f3M induced by this random sample, and finally build the SAT instances C (f, (31,Y1),..., C(f, f3M ,yM ). We use the SAT-solver A to solve all these SAT instances, each SAT-instance should be solved in time t. If the satisfying assignment is found or the time limit t is exceeded, the corresponding solving process is terminated. In the first case we set j = 1, in the second case j = 0 (j = 1,..., M). Then we compute (4).

In Algorithm 1, the transition to the next population is performed via the function GetNextPopulation(S,P). The situation, when the algorithm fails to improve the current BKV during one iteration of the population change is called stagnation. If the number of stagnations exceeds some given limit ^Stag,

then we perform a restart: a new starting population is formed using the function Restart(S, Xstart). The algorithm stops if the given general limit (for example, 12 h of operation of a computing cluster) is exceeded.

Techniques of Improvement. Note that the value of the observed random variable becomes known only after algorithm A has run for some time (not greater than t). On the other hand, the greater M is, the more accurate estimate of the time of the guess-and-determine attack is given by the value of function (4). Accordingly, the computation time required to find the value of function (4) is critically dependent on the random sample size M. Therefore, the greater M is, the more computation is required to find the value of (4). When M > 500 we have to use a computing cluster to minimize (4). As we will see further from computational experiments, reduction of the random sample size allows significantly increasing the algorithm's speed (the speed correlated with the number of hypercube points in which the objective function value was calculated during optimization).

Next, we describe a special technique that helps significantly increase the number of points that the algorithm processes during a fixed time limit (function SelectM). This technique is based on the dynamic change of the random sample size. However, before we proceed, we will focus on some important details.

Note that if we consider functions of type (4), we can significantly reduce the dimension of the search space by taking into account the features of the original cryptanalysis problem. It was shown in [2,3] that we can consider {0,1}n, n = \Xm| as a hypercube over which we perform minimization. In other words, the set B can be searched for as some subset in the set of all variables of the secret key. This can explained by the fact that we have already mentioned above: substitution of arbitrary assignments of all variables from Xm into C(f) derives the assignment for all variables from X\Xm. The SAT-solver A performs this derivation very quickly (essentially, the running time is linearly dependent on \C(f )\, because in this case A does not have to solve a combinatorial problem). This happens because Xm is a Strong Unit Propagation Backdoor Set (SUPBS) [16]. Therefore, Xstart = 1n (the Boolean vector that consists of n ones) can always be chosen as a starting population of our minimization procedure. This point corresponds to the situation Bstart = Xm. Any subsequent point will define some subset of Xin.

Thus, values of the random variable £B in point Xstart = 1n are computed very quickly. This property holds for several subsequent points and can be observed in Fig. 2, where we present the results of two runs of the algorithm described above.

Since the time spent on solving a single SAT instance in the Random Sample during the Fast Descent phase is already small, the spectrum of the function (4) on such random samples contains close values. This means that there is no principal need to use random samples of big size. The general idea is to change M while the algorithm runs, depending on the "homogeneity" of the sample in terms of the values of the observed random variable. After a whole range of experiments,

1020

0 50 100 150 200 250 300 350

Iteration number

Fig. 2. Typical minimization of function (4). The process can be tentatively divided into two phases: Fast Descent and Meticulous Search.

we chose the following scheme: at the initial stage we use M = 10, then, passing through special checkpoints, M changes taking values 50,100, 300, 500, 800. The decision to change the sample size is made heuristically depending on the portion of j G {1,..., M}, for which j = 1 at the current value of M. General considerations here are as follows. Suppose that we consider two random samples R1 and R2 of size M = 1000 and that in 495 cases out of 1000 j = 1 for the sample Ri, and j = 1 for the sample R2 in 510 cases out of 1000. The difference between these two values is «3%. Now let M = 100, and again we consider two random samples R1, R2. Suppose that j = 1 in 47 cases out of 100 for R1 and j = 1 in 51 cases out of 100 for R2. Here we have the difference «8%. Therefore, if in this situation we take M = 100, we will loose «5% of accuracy, but we will spend 10 times less resources on computing the value of our function. Thus, in the initial search stage, when BKV often improves at each iteration, minor loss of accuracy is not an issue. In final iterations, when the algorithm takes too much time to improve some BKV, the accuracy of E] is very important, which is why it is reasonable to considerably increase the sample size in later stages of the algorithm for minimization of (4).

When traversing the hypercube we use hash tables to store the passed points. If the mutation results in a point where the value of the function (4) was calculated earlier, then, after finding this point in the hash table, recalculation is not performed.

Finally, here we describe a special variant of the Genetic Algorithm (GA), which we used for minimization of (4). This algorithm employs the technique known as elitism. Our algorithm works with populations that consist of N

individuals. As in the case of (1+1)-EA, the GA starts with the point Xstart = 1n for which it constructs N replicas. Each such replica is an individual. Let us describe an arbitrary iteration of the algorithm. Let Pcurr = {Ii,..., IN} and Pnext, \Pnext \ = N be the current and the new populations, respectively. Choose from Pcurr L individuals with the best values of the function (4) and move them to Pnext (elitism). Let vi,...,vN be the values of (4) for all individuals from Pcurr and consider the set of numbers U = {ui,...,uN}, where uj = V-. We associate with any individual Ij E Pcurr a number pj defined as pj = ^n3 u . For individuals from Pcurr, consider the set U with the probability distribution Xu = {pi,... ,Pn}. Choose randomly H individuals from Pcurr with respect to the distribution Xu and apply to each of them the standard mutation, flipping each bit with the probability n. Move the resulting individuals to Pnew. Finally, choose from Pcurr individuals with respect to the distribution Xu and perform over them one of the known crossover operations. Assume that G individuals were obtained as the result of crossover and move them to Pnew.

Thus the individuals with smaller value of function (4) are more likely to be selected for mutation or crossover. Finally, we require the following condition to be satisfied: L + H + G = N .In our experiments, we used a variant of the described algorithm with N =10, L = 2, H = G = 4.

4 Computational Experiments

We applied the algorithms and techniques described in the previous section to the cryptanalysis problems of keystream generators.

Stream cipher or keystream generator (see e.g. [17]) is a discrete function of kind (1), such that m ^ n. This function, taking an arbitrary n-bit word as input, generates a word of length m that behaves as a random sequence. The generator's input is called the secret key, the output is called the keystream. The practical implication of the keystream generator is very quick generation of a long keystream for a given random key. Short keys can be exchanged by participants via, for example, nonsymmetrical cryptography that provides substantial guarantees for resistance, but is extremely slow.

Suppose that for a common secret key a the participants A and B simultaneously generate the same keystream S = f (a). If A wants to send B a secret message x E {0,1}m, it creates a ciphertext x © S (a componentwise addition modulo 2), which is sent to the public channel. Upon receiving the ciphertext and knowing S, B easily finds x. It often happens that a malefactor (or adversary) M knows a fragment of the message x - this can be, for example, some known proprietary information which is ciphered together with the secret message. This very system vulnerability for a long time had been present in the traffic encryption protocol of cellular telephony (see [18]). In this situation, M can use a ciphertext to find the fragment S, denoted as Y, and then try to find a as an preimage of y for a mapping f. To do that, M has to solve a system of algebraic equations that describes construction of Y from a, or he has to solve a corresponding SAT instance. For a correct definition of a, the length of the

keystream fragment 7 should not be smaller than the length of a (usually it is slightly larger). Such cryptanalysis of a keystream fragment in order to find a secret key is called the known plaintext scenario [17].

In view of the above, consider the inversion problem for the function f : {0,1}" ^ {0,1}fc, к « n, where for the known 7 G Range f one needs to find such a G {0,1}" that f (a) = 7. We analyzed several functions that can be used as keystream generators:

1. Stream ciphers of the family Trivium (Trivium-Toy 64, Trivium-Toy 96,

Bivium).

2. Alternating Step Generator (ASG).

Below we give a brief description of these ciphers.

The Trivium stream cipher was proposed in [19]. Being one of the eSTREAM project winners, this cipher attracts a lot of attention from cryptanalysts. In several papers (see e.g. [20,21]) it was shown that there are guess-and-determine attacks for Trivium which are more efficient than brute force attacks in the context of the so-called state recovery problem.

The Bivium cipher is a weakened version of Trivium (it uses only two original registers out of three). This cipher, as well as Trivium, was described in [19], where the author states that it is of mainly a research interest. A series of attacks on Bivium (including the algebraic ones that use SAT) can be found in [21-25].

Papers [26,27] propose a general approach to the construction of Trivium-like ciphers with a smaller total size of registers which preserves the algebraic properties of the original Trivium. Below we follow [26] and refer to this family as Trivium-Toy. In particular, by Trivium-Toy L we denote a cipher from this family, in which L is a total size of state that should be recovered. Hereinafter we consider state recovery attacks on Trivium-Toy 96 (this cipher is described in [26]) and Trivium-Toy 64.

The Alternating Step Generator (ASG) was suggested in [28] and actually it is a common design of keystream generators that can be used for construction of stream ciphers with different lengths of the secret key. The most attractive property of ASG is ease of implementation and high speed of keystream generation. ASG was targeted by several attacks, the analysis of which is presented in [29]. Some of these attacks employ a significant amount of keystream. The original paper [28] describes an attack that tried all possible ways to fill the control register. Essentially, this is a guess-and-determine attack which uses the so-called Linear Consistency Test [30] as an algorithm for solving weakened systems of cryptanalysis equations. The paper [31] presents a SAT-based guess-and-determine attack for ASG based on building SAT Partitionings of hard variants of SAT described in [2].

We implemented the strategies (1+1)-EA and GA presented above and Tabu Search [32] in the same manner as it is described in [2,3]. The created program is an MPI application that uses the SAT solver ROKK [33] as a computational core.

Each experiment was conducted on 180 cores of a computing cluster equipped with Intel Xeon E5-2695 processors. The duration of one experiment was 12 h. In the minimization process of functions of the kind (4) we used a random sample of size M = 500. For each point taken as a solution we recalculated the value of the objective function using a sample of size M = 104. The resulting value was taken as final.

As test problems we considered cryptanalysis problems for keystream generators from the Trivium and the ASG families. The results of computational experiments are presented in Table 1. For a specific generator G notation G n/m means that one needs to find an n-bit secret key by analyzing m bits of a keystream. SAT instances encoding the corresponding cryptanalysis problems were constructed using the TRANSALG system [34] which translates algorithms for calculating discrete functions of the kind (1) into SAT.

Let us note that GA-elitism showed the best results among the considered strategies for minimization of functions of the kind (4), overtaking competitors in three test problems out of six.

Table 1. Experimental results for six cryptographic algorithms. The leftmost column contains the name of a keystream generator for which the cryptanalysis problem is considered. The remaining table is divided into three sections corresponding to strategies (1+1)-EA, GA and Tabu Search. The first column of each section contains the power of the guessed bit set which corresponds to the best guess-and-determine attack found. The second column contains a time estimation (in seconds) for the attack

(1+1)-EA GA Tabu Search

Power of G&D attack Power of G&D attack Power of G&D

guessed (seconds) guessed (seconds) guessed attack

bit set bits set bits set (seconds)

Trivium-Toy 64/75 21 3.19e+07 22 5.36e+07 17 4.30e+07

Trivium-Toy 96/100 33 1.28e+13 40 2.09e+12 34 3.14e+12

Bivium 177/200 32 2.60e+12 39 1.49e+12 40 4.29e+12

ASG 72/76 9 5604.8 8 6155.19 8 5601.33

ASG 96/112 13 6.76e+06 16 3.72e+06 14 3.95e+06

ASG 192/200 47 2.27e+18 44 2.84e+17 47 1.14e+16

5 Conclusion and Future Work

In the presented paper we applied evolutionary computation strategies to construct guess-and-determine attacks arising in algebraic cryptanalysis. Each of these attacks implies solving a family of algebraic equations over a finite field (usually, GF(2)). Each equation from such family is the result of substituting the values of some bits into a general equation describing how a considered cipher works. The substituted bits are called the guessed bits. To solve the equations we use SAT solvers, similar to a number of other works. It is clear that

different sets of guessed bits correspond to guess-and-determine attacks with different runtime. We consider the problem of finding a set of guessed bits that yields an attack with the lowest runtime as the problem of minimization of a pseudo-Boolean function. To solve it we used two general strategies employed in evolutionary computation: (1+1)-EA and GA. The proposed algorithms were implemented in the form of a parallel MPI program for a computing cluster. Using these algorithms we constructed guess-and-determine attacks for several keystream ciphers. The obtained attacks are better than the ones constructed using the Tabu Search algorithm, but not dramatically better. However, from our point of view, we have only touched the potential of evolutionary computation in this area. In the nearest future we plan to significantly extend the spectrum of employed evolutionary computation techniques in application to problems of constructing Algebraic cryptanalysis attacks.

Acknowledgements. The authors would like to thank Daniil Chivilikhin, Maxim Buzdalov and anonymous reviewers for useful comments.

References

1. Bard, G.V.: Algebraic Cryptanalysis. Springer, New York (2009). https://doi.org/ 10.1007/978-0-387-88757-9

2. Semenov, A., Zaikin, O.: Algorithm for finding partitionings of hard variants of boolean satisfiability problem with application to inversion of some cryptographic functions. SpringerPlus 5(1), 554 (2016)

3. Semenov, A., Zaikin, O., Otpuschennikov, I., Kochemazov, S., Ignatiev, A.: On cryptographic attacks using backdoors for SAT. In: Proceedings of AAAI 2018, pp. 6641-6648 (2018)

4. Cook, S.A.: The complexity of theorem-proving procedures. In: Proceedings of the Third Annual ACM Symposium on Theory of Computing, pp. 151-158. ACM (1971)

5. Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185. IOS Press (2009)

6. Anderson, R.: A5 (was: hacking digital phones). Newsgroup Communication (1994). http://yarchive.net/phone/gsmcipher.html

7. Gendrullis, T., Novotny, M., Rupp, A.: A real-world attack breaking A5/1 within hours. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 266-282. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3_17

8. Courtois, N.T., Bard, G.V.: Algebraic cryptanalysis of the data encryption standard. In: Galbraith, S.D. (ed.) Cryptography and Coding 2007. LNCS, vol. 4887, pp. 152-169. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77272-9.10

9. Courtois, N.T., Gawinecki, J.A., Song, G.: Contradiction immunity and guess-then-determine attacks on GOST. Tatra Mountains Math. Publ. 53, 65-79 (2012)

10. Semenov, A., Zaikin, O., Bespalov, D., Posypkin, M.: Parallel logical cryptanalysis of the generator A5/1 in BNB-grid system. In: Malyshkin, V. (ed.) PaCT 2011. LNCS, vol. 6873, pp. 473-483. Springer, Heidelberg (2011). https://doi.org/10. 1007/978-3-642-23178-0.43

11. Marques-Silva, J., Lynce, I., Malik, S.: Conflict-driven clause learning SAT solvers. In: Frontiers in Artificial Intelligence and Applications, vol. 85, pp. 131-153 (2009)

12. Boros, E., Hammer, P.L.: Pseudo-Boolean optimization. Discrete Appl. Math. 123(1-3), 155-225 (2002)

13. King, J.C.: Symbolic execution and program testing. Commun. ACM 19(7), 385394 (1976)

14. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44(247), 335-341 (1949)

15. Rudolph, G.: Convergence Properties of Evolutionary Algorithms. Verlag Dr. Kovac, Hamburg (1997)

16. Williams, R., Gomes, C.P., Selman, B.: Backdoors to typical case complexity. In: IJCAI 2003, pp. 1173-1178 (2003)

17. Menezes, A.J., Vanstone, S.A., Oorschot, P.C.V.: Handbook of Applied Cryptography, 1st edn. CRC Press Inc., Boca Raton (1996)

18. Nohl, K.: Attacking Phone Privacy, pp. 1-6. Black Hat, Las Vegas (2010)

19. Canniere, C.: TRIVIUM: a stream cipher construction inspired by block cipher design principles. In: Katsikas, S.K., Lopez, J., Backes, M., Gritzalis, S., Preneel, B. (eds.) ISC 2006. LNCS, vol. 4176, pp. 171-186. Springer, Heidelberg (2006). https://doi.org/10.1007/11836810_13

20. Raddum, H.: Cryptanalytic Results on Trivium. eSTREAM, ECRYPT Stream Cipher Project, Report 2006/039 (2006)

21. Maximov, A., Biryukov, A.: Two trivial attacks on TRIVIUM. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 36-55. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77360-3_3

22. Eibach, T., Pilz, E., Volkel, G.: Attacking Bivium using SAT solvers. In: Kleine Biining, H., Zhao, X. (eds.) SAT 2008. LNCS, vol. 4996, pp. 63-76. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79719-7_7

23. Eibach, T., Volkel, G., Pilz, E.: Optimising Grobner bases on Bivium. Math. Com-put. Sci. 3(2), 159-172 (2010)

24. Soos, M., Nohl, K., Castelluccia, C.: Extending SAT solvers to cryptographic problems. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 5584, pp. 244-257. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02777-2_24

25. Huang, Z., Lin, D.: Attacking Bivium and Trivium with the characteristic set method. In: Nitaj, A., Pointcheval, D. (eds.) AFRICACRYPT 2011. LNCS, vol. 6737, pp. 77-91. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21969-6.5

26. Castro Lechtaler, A., Cipriano, M., Garcia, E., Liporace, J., Maiorano, A., Mal-vacio, E.: Model design for a reduced variant of a Trivium type stream cipher. J. Comput. Sci. Technol. 14(01), 55-58 (2014)

27. Teo, S.G., Wong, K.K.H., Bartlett, H., Simpson, L., Dawson, E.: Algebraic analysis of Trivium-like ciphers. In: Australasian Information Security Conference (ACSW-AISC 2014), vol. 149, pp. 77-81. Australian Computer Society (2014)

28. Gunther, C.G.: Alternating step generators controlled by De Bruijn sequences. In: Chaum, D., Price, W.L. (eds.) EUROCRYPT 1987. LNCS, vol. 304, pp. 5-14. Springer, Heidelberg (1988). https://doi.org/10.1007/3-540-39118-5_2

29. Khazaei, S., Fischer, S., Meier, W.: Reduced complexity attacks on the alternating step generator. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 1-16. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77360-3_1

30. Zeng, K., Yang, C.H., Rao, T.R.N.: On the Linear Consistency Test (LCT) in cryptanalysis with applications. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 164-174. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0.16

31. Zaikin, O., Kochemazov, S.: An improved SAT-based guess-and-determine attack on the alternating step generator. In: Nguyen, P., Zhou, J. (eds.) ISC 2017. LNCS, vol. 10599, pp. 21-38. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69659-1.2

32. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic Publishers, Boston (1997)

33. Yasumoto, T., Okuwaga, T.: ROKK 1.0.1. In: Belov, A., Diepold, D., Heule, M., Järvisalo, M. (eds.) SAT Competition 2014, p. 70 (2014)

34. Otpuschennikov, I., Semenov, A., Gribanova, I., Zaikin, O., Kochemazov, S.: Encoding cryptographic functions to SAT using TRANSALG system. In: ECAI 2016. FAIA, vol. 285, pp. 1594-1595 (2016)

Fitness Comparison by Statistical Testing in Construction of SAT-Based Guess-and-Determine Cryptographic Attacks

Artem Pavlenko Maxim Buzdalov Vladimir Ulyantsev

ITMO University ITMO University ITMO University

Saint Petersburg, Russia Saint Petersburg, Russia Saint Petersburg, Russia

alpavlenko@corp.ifmo.ru mbuzdalov@corp.ifmo.ru ulyantsev@corp.ifmo.ru

ABSTRACT

Algebraic cryptanalysis studies breaking ciphers by solving algebraic equations. Some of the promising approaches use SAT solvers for this purpose. Although the corresponding satisfiability problems are hard, their difficulty can often be lowered by choosing a set of variables to brute force over, and by solving each of the corresponding reduced problems using a SAT solver, which is called the guess-and-determine attack. In many successful cipher breaking attempts this set was chosen analytically, however, the nature of the problem makes evolutionary computation a good choice.

We investigate one particular method for constructing guess-and-determine attacks based on evolutionary algorithms. This method estimates the fitness of a particular guessed bit set by Monte-Carlo simulations. We show that using statistical tests within the comparator of fitness values, which can be used to reduce the necessary number of samples, together with a dynamic strategy for the upper limit on the number of samples, speeds up the attack by a factor of 1.5 to 4.3 even on a distributed cluster.

CCS CONCEPTS

• Security and privacy ^ Block and stream ciphers; Cryptanalysis and other attacks; • Theory of computation ^ Evolutionary algorithms; Constraint and logic programming.

KEYWORDS

Algebraic cryptanalysis, satisfiability problems, approximate fitness evaluation.

ACM Reference Format:

Artem Pavlenko, Maxim Buzdalov, and Vladimir Ulyantsev. 2019. Fitness Comparison by Statistical Testing in Construction of SAT-Based Guess-and-Determine Cryptographic Attacks. In Genetic and Evolutionary Computation Conference (GECCO '19), July 13-17, 2019, Prague, Czech Republic. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3321707.3321847

1 INTRODUCTION

Cryptanalysis is a field of cryptology that studies how hard it is to break cryptographic systems. The main desire of cryptanalysis is to find an easy way to break a given code, however, as most

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. GECCO '19, July 13-17, 2019, Prague, Czech Republic

modern cryptographic techniques are very strong, even a small improvement compared to brute force is sometimes a good result.

One of its subfields, algebraic cryptanalysis, aims at reducing problems of breaking codes to solving algebraic equations. For some classes of such equations, very powerful tools have been developed, which are of course used not only for cryptanalysis; the most prominent example is the Boolean satisfiability problem (SAT) and numerous efficient tools called SAT solvers. However, problems derived from cryptography are still very hard and often require some effort to make the existing tools usable.

One of the well-established ways to help SAT solvers with the cryptography problems is called the guess-and-determine attack, and is effectively a combination of brute force search on the carefully chosen set of Boolean variables (the guessed bit set) and running a SAT solver to determine the values of other variables. A good choice of the guessed bit set can significantly reduce the time of solving the equation, and thus to complete the attack. However, finding such a set is a hard problem, and most known attacks of this sort were designed analytically.

We continue a line of the research that aims at finding a good guessed bit set using evolutionary algorithms. Our main contribution is towards improvement of evaluation and comparison of fitness values. As this particular application domain is quite far away from a typical GECCO contribution, we first introduce it on a more detailed level, which will hopefully make the reader comfortable with the problem, and then formulate our contribution.

1.1 Basics of Cryptanalysis

A typical cryptographic scenario is to take some message, or plaintext, which needs to be secretly passed to the receiving side, and turn it into the ciphertext, that can be easily restored at the receiving side without losses, but is very hard to be understood when eavesdropped. The goal of cryptanalysis is to restore as much information from the plaintext data, given the ciphertext. Apart from guessing the exact plaintext, partial breaks (guessing the algorithm, deducing information about the plaintext) are often useful as well.

The attacks can be classified into the following categories [36]:

• Ciphertext-only: the only available data is the ciphertext (or a collection of ciphertexts).

• Known-plaintext: there are also plaintexts, for each of which the corresponding ciphertext is known.

• Chosen-plaintext (or chosen-ciphertext): one can obtain ciphertexts for as many plaintexts as one wishes (or vice versa), but such ability is granted only once.

• Adaptive chosen-plaintext (or adaptive chosen-ciphertext): same as above, but subsequent plaintexts (or ciphertexts) can be chosen based on information learned from previous ones.

• Related-key attack: one can obtain ciphertexts from the given plaintexts using two keys which are related in a known way.

An attack can require different resources in different quantities, such as time (the amount of calculations needed to complete the attack, or an equivalent measure), memory (how much of a computer memory is needed to make the attack possible), or data (the lengths of plaintexts or ciphertexts).

It is often the case that a particular research did not lead to a successful discovery of a plaintext from the given ciphertext (or to a similar success in terms of other possible types of attacks), but rather estimated the amount of the resources needed to do this, even if this amount is astronomically large. An example of such statement is "SHA-1 collisions now 252" [27], which basically means that a way is found to find two texts with the same value of the SHA-1 hash function in roughly 252 elementary operations. The presented amount of resources need not be practical to treat the attack as a successful one; in fact, numbers much smaller than the ones required to perform a brute force attack are to be interpreted as a signal to move away from the particular compromised cryptosystem long before such an attack becomes physically possible.

1.2 Symmetric and Public-Key Cryptography

Modern cryptographic algorithms, as well as the corresponding fields of study, can be roughly split in two categories: the symmetric-key cryptography and the public-key, or asymmetric cryptography. Cryptographic pseudorandom number generators is another related field of study, which is aimed at constructing cryptographically strong pseudorandom number generators, as well as finding and exploiting their weaknesses.

The symmetric-key cryptographic algorithms encrypt and decrypt the messages using the same secret key. An example of this approach is as follows: assume the plaintext message is a bit string A of length n, and there is a secret key K, also a bit string of length n, that is available both to the sender and the receiver. Then the cipher-text is B = AФ K, where the Ф is the bitwise exclusive-OR operation. The receiver can restore the plaintext by applying the same operation B®K = (AФK)фK = A. The cryptographic strength in all such algorithms is determined by the properties of K, more precisely, of the algorithms that generate such secret keys. An essential practice is to avoid using the same K for more than one message, which leads to considering stream ciphers, also known as keystreams, instead of single secret keys K, and the necessity of keystream generators. Many cryptographic attacks on symmetric-key algorithms try to

predict the keystream bits following the eavesdropped sequence.

On the other hand, the public-key cryptography requires two essentially different keys: the public key, which is known to everyone and is used to generate ciphertexts intended to be read by the receiver, and the private key, which is known only to the receiver and is used to decrypt the incoming messages. The cryptographic strength is based on the difficulty of restoring the private key using the available information, such as the public key. The RSA algorithm [35] is a well-known example of this approach. Note that digital signatures are also based on similar principles where the roles of the keys are essentially reversed: a private key is used to compute the signature, while a public key is used to ensure to verify the sender's identity and that the message is not altered.

In this paper, we restrict ourselves with symmetric-key cryptography and stream ciphers.

1.3 Algebraic Cryptanalysis

Algebraic cryptanalysis performs attacks by reducing the problem of conducting a particular attack to a system of polynomial equations, typically over some finite field, and then solving this system [2]. The attack generally follows three phases.

First, an algorithm is translated to a system of equations, where the set of variables generally consists of the input and the output bits, or, in the case of keystream generators, the output bits and the internal state of the algorithm. The system is intended to be solved for concrete values of the output bits, and when it is solved, values of all other variables become known, thus revealing either the plaintext or the internal state of the algorithm, where the latter enables predicting the rest of the keystream. This translation can be manual, which allows for some additional considerations that can simplify the system, or automated [20, 30].

Second, the constructed system is analyzed, and the efficient way of solving it is chosen. A SAT solver can be used [8, 9, 37, 38], some of which are designed to work well with cryptography problems [39]. Other means are also of use, such as solvers for constraint satisfaction problems [15] or less-known approaches such as the Characteristic Set method [18, 19]. On this stage, the bounds on time and data can be estimated. Finally, the actual attack can be performed when the data becomes available.

1.4 Guess-and-Determine Attacks

Existing solvers of equations over finite fields, such as SAT solvers, typically find cryptography-derived equations to be very hard. However, sometimes it is possible to hybridize such a solver with plain brute force over a subset of variables, letting the solver do its work for the rest of variables once the chosen ones are fixed to certain values. Of course, it would be a surprise if a randomly chosen subset of variables worked, but a good choice can result in significant reduction of the attack time. This technique is called the guess-and-determine attack, and the set ofvariables to brute force over is called the guessed bit set.

In fact, such an attack can use a much simpler tool for solving the remaining equations. One of the simplest examples of the guess-and-determine attack was presented in [1] for the A5/1 cipher, which served as a standard for encrypting the GSM traffic in cellular telephony. The internal state of this cipher is completely described by 64 bits. It was found that a carefully chosen set of 53 bits enables finding the remaining 11 bits using a triangular system of equations.

Most guess-and-determine attacks use more powerful tools to solve reduced systems of equations, however the choice of the guessed bit set is still largely based on analysis of the properties of the cipher [9, 18]. Some of the recent papers tend to automate this process by using e.g. tabu search or genetic algorithms [37, 38].

1.5 Evolutionary Computation Applied to Cryptography

Applications of evolutionary computation to cryptanalysis, and to cryptography in general, are wide and diverse and can be both destructive and constructive.

The survey [22] references works using evolutionary computation methods for finding short addition chains that are used to speed up modular exponentiation of the form B = Ac mod P compared to the classical binary method, breaking Merkle-Hellman knapsack-based cryptosystems by a constant factor faster than previously known methods, constructing cryptographic functions that possess certain properties (having high nonlinearity, being balanced) or their building blocks called the S-boxes, evolving cryptographic pseudorandom number generators. A lot of work towards constructing cryptographically sound functions was also done in [7].

An earlier survey [31] also mentions more exotic works connected with evolutionary computation, such as the ICIGA system that uses the building blocks of evolutionary algorithms (mutation and crossover) as instruments for encryption and decryption [41], or genetic programming to evolve hardware for RSA cryptosystems that is fast, consumes small enough space on a chip, and is resistant to side-channel leakage, all in the same time [28].

There is also some recent interest for analysing quantum cryptography using evolutionary algorithms. For instance, [23] proposes a genetic algorithm to measure the tolerance of a quantum key distribution algorithm to adversaries that are not all-powerful but rather having access only to physically possible devices.

Genetic algorithms working on binary strings are often seen as natural candidates for cracking ciphers. For instance, [13] attempts to crack the RC4 keystream generator using the genetic algorithm, where the individuals are the descriptions of the initial state of the cipher (the state is a permutation of size 256). The fitness function reflects how much of the keystream produced from the guessed state matches the required keystream. The authors derive the attack time to be 21215 generations, however, this estimation is based on the extrapolation of the observed progress, and thus is rather questionable. A particle swarm optimizer and simulated annealing were also tested, but were found to be inferior to the genetic algorithm.

The work [32] also attacks the RC4 cipher, but in a different way: here, the individual is the description of a linear feedback shift register, with which one tries to approximate the cipher. The fitness is as follows: the register is initialized with the first bits of the keystream, and the rest of the real keystream is compared with the keystream produced by the individual, where the needed amount of first bits is not considered. The paper reports a maximum similarity of the fitted linear feedback shift register to be 80%.

1.6 Contribution and Structure of the Paper

This paper continues a line of the research which:

• performs cryptanalysis of keystream generators, thus staying within the realm of symmetric cryptography;

• uses an automated conversion of their definition to SAT formulas using tools such as [30];

• implements guess-and-determine attacks over these SAT formulas using existing SAT solvers to do the rest [37];

• chooses guessed bit sets using black-box optimizers (evolutionary algorithms in this paper, tabu search in [38]).

In this framework, measuring the fitness of a guessed bit set relates on Monte-Carlo sampling and becomes more precise with more samples, which will be detailed in subsequent sections. We aim at reducing the overall computing time, without changing the

landscape of the results, by improving the detection of whether one fitness is smaller than, greater than, or equal to another fitness using statistical testing. This improved comparison allows reducing the number of samples, needed to perform before the decision is made with enough confidence. As a result, the running times needed to find a good guessed bit set reduced by up to 4.3 times, even when running on a computing cluster with all the scheduling overheads.

The rest of the paper is structured as follows. Section 2 explains the pipeline in more detail, including the description of the fitness function. Section 3 elaborates on the proposed usage of statistical testing to improve comparison of fitness values, taking into account that the entire algorithm is typically run in a distributed environment. Section 4 presents results of experiments with and without the proposed technique, and also discusses the consequences of applying certain statistical tests. Section 5 concludes the paper and gives the final remarks.

2 PRELIMINARIES

2.1 Boolean Satisfiability and SAT Solvers

The Boolean satisfiability problem (SAT) is defined as follows. There are n Boolean variables x\,..., xn which take values from {false, true}, or, alternatively written, from {0; 1}. An expression F(x1,..., xn) is given, which evaluates to a Boolean value and has a known structure. The problem is to find a satisfying assignment, which is a mapping from a variable x; to the value v; it needs to take, such that F(v1,..., vn) = 1, or to show that there is no satisfying assignment.

The expressions are usually presented using a small set of small Boolean functions, or a basis. If any Boolean function can be expressed in a certain basis, such basis is called complete. The most popular complete basis consists of negation (a function that takes one argument x and returns 1 - x, denoted as -x), conjunction (takes two arguments and returns 1 iff both arguments are 1, denoted as A) and disjunction (takes two arguments and returns 0 iff both arguments are 0, denoted as V). As both conjunction and disjunction are associative and commutative, an expression of the form x1 A x2 A ... A xn is called a conjunction of variables x1 ... xn, and similarly x1 V x2 V . . . V xn is a disjunction of these variables.

Practical satisfiability problems are usually presented in the conjunctive normal form, or CNF. In this form, the expression F is a conjunction of m subexpressions F1 A ... A Fm, where each F; is a disjunction of either a variable or its negation. Many classes of problems can be expressed in CNF in polynomial time, however, finding a satisfying assignment is a well-known NP-complete problem, apart from a few special cases.

A large demand to solve satisfiability problems appearing in practice has led to development of many highly efficient software tools called SAT solvers [5]. There are yearly competitions dedicated to comparison of SAT solvers on problems from different areas [4].

2.2 SAT-Based Keystream Breaking

Обратите внимание, представленные выше научные тексты размещены для ознакомления и получены посредством распознавания оригинальных текстов диссертаций (OCR). В связи с чем, в них могут содержаться ошибки, связанные с несовершенством алгоритмов распознавания. В PDF файлах диссертаций и авторефератов, которые мы доставляем, подобных ошибок нет.

Оглавление диссертации кандидат наук Павленко Артём Леонидович

Рекомендованный список диссертаций по специальности «Другие cпециальности», 00.00.00 шифр ВАК

Методы и средства преобразования процедурных описаний дискретных функций в булевы уравнения2011 год, кандидат технических наук Отпущенников, Илья Владимирович

Методы синтеза тестов для цифровых синхронных схем на основе реконфигурируемых аппаратных средств2008 год, кандидат технических наук Борисевич, Алексей Валерьевич

Генерация конечных автоматов на основе муравьиных алгоритмов2015 год, кандидат наук Чивилихин Даниил Сергеевич

Информационная технология минимизации функционалов, ассоциированных с задачей выполнимость2009 год, кандидат физико-математических наук Хныкин, Иван Геннадьевич

Введение диссертации (часть автореферата) на тему «Модели и алгоритмы вычислительных процессов для решения трудных вариантов проблемы булевой выполнимости»

Похожие диссертационные работы по специальности «Другие cпециальности», 00.00.00 шифр ВАК

Параллельная технология решения SAT-задач и ее реализация в виде пакета прикладных программ2008 год, кандидат технических наук Заикин, Олег Сергеевич

Методы обращения дискретных функций с применением двоичных решающих диаграмм2010 год, кандидат физико-математических наук Игнатьев, Алексей Сергеевич

Приближенные методы решения таблиц покрытий для синтеза комбинационных схем из ПЛМ1984 год, кандидат технических наук Бабушкин, Владимир Иванович

Разработка и исследование параллельных комбинаторных алгоритмов2007 год, кандидат технических наук Тимошевская, Наталия Евгеньевна

Список литературы диссертационного исследования кандидат наук Павленко Артём Леонидович, 2024 год