---
bibliography:
- bibliography.bib
---

```{=latex}
\renewcommand{\sfdefault}{lmss}
```
```{=latex}
\renewcommand{\familydefault}{\rmdefault}
```
```{=latex}
% The internal commands below use @ in their names, so @ must be a letter here.
\makeatletter
\newcommand{\simpleicml@headingfont}{}
```
```{=latex}
\newcommand{\icmlpdfinfo}[1]{\hypersetup{#1}}
```
```{=latex}
\renewcommand{\headrulewidth}{0.3pt}
```
```{=latex}
\renewcommand{\headrule}{\color{ICMLAccent}\rule{\headwidth}{\headrulewidth}}
```
```{=latex}
\newcommand{\icmlrunningtitle}[1]{\def\@icmlrunningtitle{#1}}
```
```{=latex}
\def\@icmlrunningtitle{}
```
```{=latex}
\newcommand{\theicmltitle}{%
  \ifx\@icmlrunningtitle\@empty
    \@icmltitle
  \else
    \@icmlrunningtitle
  \fi
}
```
```{=latex}
\newenvironment{compacttable}
  {\begingroup\setlength{\tabcolsep}{4pt}}
  {\endgroup}
```
```{=latex}
\def\@icmltitle{}
```
```{=latex}
\def\@icmlauthors{}
```
```{=latex}
\def\@icmlaffiliations{}
```
```{=latex}
\def\@icmlabstract{}
```
```{=latex}
\newcommand{\icmltitle}[1]{\gdef\@icmltitle{#1}}
```
```{=latex}
\newcommand{\icmlauthors}[1]{\gdef\@icmlauthors{#1}}
```
```{=latex}
\newcommand{\icmlaffiliations}[1]{\gdef\@icmlaffiliations{#1}}
```
```{=latex}
\newcommand{\icmlabstract}[1]{\gdef\@icmlabstract{#1}}
```
```{=latex}
\ifdefined\@icmltitlebox\else\newbox\@icmltitlebox\fi
\newcommand{\icmlmaketitle}{%
  % Build the title in a global box FIRST
  \global\setbox\@icmltitlebox=\vbox{%
    \begin{center}%
      {\LARGE\bfseries \@icmltitle \par}%
      \vspace{0.6em}%
      {\large \@icmlauthors \par}%
      \vspace{0.3em}%
      {\normalsize
        \begin{minipage}{0.96\textwidth}\centering
          \@icmlaffiliations\par
        \end{minipage}%
      }\par
      \vspace{0.6em}
      \noindent
      {\setlength{\fboxsep}{10pt}%
          \colorbox{ICMLAccent!10}{%
            \begin{minipage}{0.95\textwidth}%
              \@icmlabstract
            \end{minipage}%
          }%
      }\par
    \end{center}%
  }%
  % Now use the box in twocolumn
  \thispagestyle{empty}%
  \twocolumn[\unvbox\@icmltitlebox]
}
```
```{=latex}
\newcommand{\@runningtitle}{}
```
```{=latex}
% Keep \@runningtitle and \@icmlrunningtitle in sync so that \theicmltitle
% (which tests \@icmlrunningtitle) still sees the running title.
\renewcommand{\icmlrunningtitle}[1]{\gdef\@runningtitle{#1}\gdef\@icmlrunningtitle{#1}}
```
```{=latex}
\newcommand{\sectionheaderline}[1]{\gdef\@sectionline{#1}}
```
```{=latex}
\newcommand{\@sectionline}{}
```
```{=latex}
\newcommand{\seclink}[3]{%
  \ifnum#2=\value{currentsection}
    \textbf{\hyperref[#1]{Sec #2: #3}}%
  \else
    \hyperref[#1]{Sec #2: #3}%
  \fi
}
```
```{=latex}
\newcommand{\negval}[1]{\textcolor{textred}{\scriptsize (#1)}}
```
```{=latex}
\newcommand{\posval}[1]{\textcolor{textgreen}{\scriptsize (#1)}}
```
```{=latex}
\newcommand{\cmark}{\textcolor{darkblue}{\ding{51}}}
```
```{=latex}
\newcommand{\xmark}{\textcolor{darkred}{\ding{55}}}
```
```{=latex}
\providecommand\@ifxundefined[1]{%
 \ifx#1\@undefined\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
}
```
```{=latex}
\providecommand\@ifnum[1]{%
 \ifnum#1\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
}
```
```{=latex}
\providecommand\@ifx[1]{%
 \ifx#1\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
}
```
```{=latex}
\providecommand\appdef[2]{%
 \toks@\expandafter{#1}\@temptokena{#2}%
 \edef#1{\the\toks@\the\@temptokena}%
}
```
```{=latex}
\newcommand\bibstyle@chicago{\bibpunct{(}{)}{;}{a}{,}{,}}
```
```{=latex}
\newcommand\bibstyle@named{\bibpunct{[}{]}{;}{a}{,}{,}}
```
```{=latex}
\newcommand\bibstyle@agu{\bibpunct{[}{]}{;}{a}{,}{,~}}
```
```{=latex}
\newcommand\bibstyle@copernicus{\bibpunct{(}{)}{;}{a}{,}{,}}
```
```{=latex}
\let\bibstyle@egu=\bibstyle@copernicus
```
```{=latex}
\let\bibstyle@egs=\bibstyle@copernicus
```
```{=latex}
\newcommand\bibstyle@agsm{\bibpunct{(}{)}{,}{a}{}{,}\gdef\harvardand{\&}}
```
```{=latex}
\newcommand\bibstyle@kluwer{\bibpunct{(}{)}{,}{a}{}{,}\gdef\harvardand{\&}}
```
```{=latex}
\newcommand\bibstyle@dcu{\bibpunct{(}{)}{;}{a}{;}{,}\gdef\harvardand{and}}
```
```{=latex}
\newcommand\bibstyle@aa{\bibpunct{(}{)}{;}{a}{}{,}}
```
```{=latex}
\newcommand\bibstyle@pass{\bibpunct{(}{)}{;}{a}{,}{,}}
```
```{=latex}
\newcommand\bibstyle@anngeo{\bibpunct{(}{)}{;}{a}{,}{,}}
```
```{=latex}
\newcommand\bibstyle@nlinproc{\bibpunct{(}{)}{;}{a}{,}{,}}
```
```{=latex}
\newcommand\bibstyle@cospar{\bibpunct{/}{/}{,}{n}{}{}%
     \gdef\bibnumfmt##1{##1.}}
```
```{=latex}
\newcommand\bibstyle@esa{\bibpunct{(Ref.~}{)}{,}{n}{}{}%
     \gdef\bibnumfmt##1{##1.\hspace{1em}}}
```
```{=latex}
\newcommand\bibstyle@nature{\bibpunct{}{}{,}{s}{}{\textsuperscript{,}}%
     \gdef\bibnumfmt##1{##1.}}
```
```{=latex}
\newcommand\bibstyle@plain{\bibpunct{[}{]}{,}{n}{}{,}}
```
```{=latex}
\let\bibstyle@alpha=\bibstyle@plain
```
```{=latex}
\let\bibstyle@abbrv=\bibstyle@plain
```
```{=latex}
\let\bibstyle@unsrt=\bibstyle@plain
```
```{=latex}
\newcommand\bibstyle@plainnat{\bibpunct{[}{]}{,}{a}{,}{,}}
```
```{=latex}
\let\bibstyle@abbrvnat=\bibstyle@plainnat
```
```{=latex}
\let\bibstyle@unsrtnat=\bibstyle@plainnat
```
```{=latex}
% Switches, counter, and lengths assumed by the natbib code below; guarded in
% case natbib itself is loaded and has already declared them.
\newif\ifNAT@numbers
\newif\ifNAT@super
\newif\ifNAT@swa
\newif\ifNAT@par
\newif\ifNAT@full
\newif\ifNAT@openbib
\newif\ifNAT@longnames
\newif\ifNAT@stdbst
\newif\ifciteindex
\@ifundefined{c@NAT@ctr}{\newcounter{NAT@ctr}}{}
\@ifundefined{bibhang}{\newlength{\bibhang}\setlength{\bibhang}{1em}}{}
\@ifundefined{bibsep}{\newlength{\bibsep}\setlength{\bibsep}{1ex plus .2ex}}{}
\@ifundefined{bibindent}{\newlength{\bibindent}\setlength{\bibindent}{1.5em}}{}
\let\NAT@merge\z@
```
```{=latex}
\def\NAT@sort{\z@}
```
```{=latex}
\def\NAT@cmprs{\z@}
```
```{=latex}
\def\NAT@nmfmt#1{{\NAT@up#1}}
```
```{=latex}
\renewcommand\bibstyle[1]{\csname bibstyle@#1\endcsname}
```
```{=latex}
\let\@citestyle\bibstyle
```
```{=latex}
\newcommand\citestyle[1]{\@citestyle{#1}\let\bibstyle\@gobble}
```
```{=latex}
\newcommand\bibpunct[7][, ]%
  {\gdef\NAT@open{#2}\gdef\NAT@close{#3}\gdef
   \NAT@sep{#4}\global\NAT@numbersfalse
     \ifx #5n\global\NAT@numberstrue\global\NAT@superfalse
   \else
     \ifx #5s\global\NAT@numberstrue\global\NAT@supertrue
   \fi\fi
   \gdef\NAT@aysep{#6}\gdef\NAT@yrsep{#7}%
   \gdef\NAT@cmt{#1}%
   \NAT@@setcites
  }
```
```{=latex}
\newcommand\setcitestyle[1]{
 \@for\@tempa:=#1\do
 {\def\@tempb{round}\ifx\@tempa\@tempb
    \renewcommand\NAT@open{(}\renewcommand\NAT@close{)}\fi
  \def\@tempb{square}\ifx\@tempa\@tempb
    \renewcommand\NAT@open{[}\renewcommand\NAT@close{]}\fi
  \def\@tempb{angle}\ifx\@tempa\@tempb
    \renewcommand\NAT@open{$<$}\renewcommand\NAT@close{$>$}\fi
  \def\@tempb{curly}\ifx\@tempa\@tempb
    \renewcommand\NAT@open{\{}\renewcommand\NAT@close{\}}\fi
  \def\@tempb{semicolon}\ifx\@tempa\@tempb
    \renewcommand\NAT@sep{;}\fi
  \def\@tempb{colon}\ifx\@tempa\@tempb
    \renewcommand\NAT@sep{;}\fi
  \def\@tempb{comma}\ifx\@tempa\@tempb
    \renewcommand\NAT@sep{,}\fi
  \def\@tempb{authoryear}\ifx\@tempa\@tempb
    \NAT@numbersfalse\fi
  \def\@tempb{numbers}\ifx\@tempa\@tempb
    \NAT@numberstrue\NAT@superfalse\fi
  \def\@tempb{super}\ifx\@tempa\@tempb
    \NAT@numberstrue\NAT@supertrue\fi
  \expandafter\NAT@find@eq\@tempa=\relax\@nil
  \if\@tempc\relax\else
    \expandafter\NAT@rem@eq\@tempc
    \def\@tempb{open}\ifx\@tempa\@tempb
     \xdef\NAT@open{\@tempc}\fi
    \def\@tempb{close}\ifx\@tempa\@tempb
     \xdef\NAT@close{\@tempc}\fi
    \def\@tempb{aysep}\ifx\@tempa\@tempb
     \xdef\NAT@aysep{\@tempc}\fi
    \def\@tempb{yysep}\ifx\@tempa\@tempb
     \xdef\NAT@yrsep{\@tempc}\fi
    \def\@tempb{notesep}\ifx\@tempa\@tempb
     \xdef\NAT@cmt{\@tempc}\fi
    \def\@tempb{citesep}\ifx\@tempa\@tempb
     \xdef\NAT@sep{\@tempc}\fi
  \fi
 }%
 \NAT@@setcites
}
```
```{=latex}
\def\NAT@find@eq#1=#2\@nil{\def\@tempa{#1}\def\@tempc{#2}}
```
```{=latex}
\def\NAT@rem@eq#1={\def\@tempc{#1}}
```
```{=latex}
\def\NAT@@setcites{\global\let\bibstyle\@gobble}
```
```{=latex}
\newcommand\NAT@open{(}
```
```{=latex}
\newcommand\NAT@close{)}
```
```{=latex}
\newcommand\NAT@sep{;}
```
```{=latex}
\newcommand\NAT@aysep{,}
```
```{=latex}
\newcommand\NAT@yrsep{,}
```
```{=latex}
\newcommand\NAT@cmt{, }
```
```{=latex}
\newcommand\NAT@cite%
    [3]{\ifNAT@swa\NAT@@open\if*#2*\else#2\NAT@spacechar\fi
        #1\if*#3*\else\NAT@cmt#3\fi\NAT@@close\else#1\fi\endgroup}
```
```{=latex}
\newcommand\NAT@citenum%
    [3]{\ifNAT@swa\NAT@@open\if*#2*\else#2\NAT@spacechar\fi
        #1\if*#3*\else\NAT@cmt#3\fi\NAT@@close\else#1\fi\endgroup}
```
```{=latex}
\newcommand\NAT@citesuper[3]{\ifNAT@swa
\if*#2*\else#2\NAT@spacechar\fi
\unskip\kern\p@\textsuperscript{\NAT@@open#1\NAT@@close}%
   \if*#3*\else\NAT@spacechar#3\fi\else #1\fi\endgroup}
```
```{=latex}
\providecommand\textsuperscript[1]{\mbox{$^{\mbox{\scriptsize#1}}$}}
```
```{=latex}
\providecommand\@firstofone[1]{#1}
```
```{=latex}
\newcommand\NAT@citexnum{}
```
```{=latex}
\def\NAT@citexnum[#1][#2]#3{%
  \NAT@reset@parser
  \NAT@sort@cites{#3}%
  \NAT@reset@citea
  \@cite{\def\NAT@num{-1}\let\NAT@last@yr\relax\let\NAT@nm\@empty
    \@for\@citeb:=\NAT@cite@list\do
    {\@safe@activestrue
     \edef\@citeb{\expandafter\@firstofone\@citeb\@empty}%
     \@safe@activesfalse
     \@ifundefined{b@\@citeb\@extra@b@citeb}{%
       {\reset@font\bfseries?}
        \NAT@citeundefined\PackageWarning{natbib}%
       {Citation `\@citeb' on page \thepage \space undefined}}%
     {\let\NAT@last@num\NAT@num\let\NAT@last@nm\NAT@nm
      \NAT@parse{\@citeb}%
      \ifNAT@longnames\@ifundefined{bv@\@citeb\@extra@b@citeb}{%
        \let\NAT@name=\NAT@all@names
        \global\@namedef{bv@\@citeb\@extra@b@citeb}{}}{}%
      \fi
      \ifNAT@full\let\NAT@nm\NAT@all@names\else
        \let\NAT@nm\NAT@name\fi
      \ifNAT@swa
       \@ifnum{\NAT@ctype>\@ne}{%
        \@citea
        \NAT@hyper@{\@ifnum{\NAT@ctype=\tw@}{\NAT@test{\NAT@ctype}}{\NAT@alias}}%
       }{%
        \@ifnum{\NAT@cmprs>\z@}{%
         \NAT@ifcat@num\NAT@num
          {\let\NAT@nm=\NAT@num}%
          {\def\NAT@nm{-2}}%
         \NAT@ifcat@num\NAT@last@num
          {\@tempcnta=\NAT@last@num\relax}%
          {\@tempcnta\m@ne}%
         \@ifnum{\NAT@nm=\@tempcnta}{%
          \@ifnum{\NAT@merge>\@ne}{}{\NAT@last@yr@mbox}%
         }{%
           \advance\@tempcnta by\@ne
           \@ifnum{\NAT@nm=\@tempcnta}{%
             \ifx\NAT@last@yr\relax
               \def@NAT@last@yr{\@citea}%
             \else
               \def@NAT@last@yr{--\NAT@penalty}%
             \fi
           }{%
             \NAT@last@yr@mbox
           }%
         }%
        }{%
         \@tempswatrue
         \@ifnum{\NAT@merge>\@ne}{\@ifnum{\NAT@last@num=\NAT@num\relax}{\@tempswafalse}{}}{}%
         \if@tempswa\NAT@citea@mbox\fi
        }%
       }%
       \NAT@def@citea
      \else
        \ifcase\NAT@ctype
          \ifx\NAT@last@nm\NAT@nm \NAT@yrsep\NAT@penalty\NAT@space\else
            \@citea \NAT@test{\@ne}\NAT@spacechar\NAT@mbox{\NAT@super@kern\NAT@@open}%
          \fi
          \if*#1*\else#1\NAT@spacechar\fi
          \NAT@mbox{\NAT@hyper@{{\citenumfont{\NAT@num}}}}%
          \NAT@def@citea@box
        \or
          \NAT@hyper@citea@space{\NAT@test{\NAT@ctype}}%
        \or
          \NAT@hyper@citea@space{\NAT@test{\NAT@ctype}}%
        \or
          \NAT@hyper@citea@space\NAT@alias
        \fi
      \fi
     }%
    }%
      \@ifnum{\NAT@cmprs>\z@}{\NAT@last@yr}{}%
      \ifNAT@swa\else
        \@ifnum{\NAT@ctype=\z@}{%
          \if*#2*\else\NAT@cmt#2\fi
        }{}%
        \NAT@mbox{\NAT@@close}%
      \fi
  }{#1}{#2}%
}
```
```{=latex}
\def\NAT@citea@mbox{%
 \@citea\mbox{\NAT@hyper@{{\citenumfont{\NAT@num}}}}%
}
```
```{=latex}
\def\NAT@hyper@#1{%
 \hyper@natlinkstart{\@citeb\@extra@b@citeb}#1\hyper@natlinkend
}
```
```{=latex}
\def\NAT@hyper@citea#1{%
 \@citea
 \NAT@hyper@{#1}%
 \NAT@def@citea
}
```
```{=latex}
\def\NAT@hyper@citea@space#1{%
 \@citea
 \NAT@hyper@{#1}%
 \NAT@def@citea@space
}
```
```{=latex}
\def\def@NAT@last@yr#1{%
 \protected@edef\NAT@last@yr{%
  #1%
  \noexpand\mbox{%
   \noexpand\hyper@natlinkstart{\@citeb\@extra@b@citeb}%
   {\noexpand\citenumfont{\NAT@num}}%
   \noexpand\hyper@natlinkend
  }%
 }%
}
```
```{=latex}
\def\NAT@last@yr@mbox{%
 \NAT@last@yr\let\NAT@last@yr\relax
 \NAT@citea@mbox
}
```
```{=latex}
\newcommand\NAT@test[1]{%
 \@ifnum{#1=\@ne}{%
  \ifx\NAT@nm\NAT@noname
   \begingroup\reset@font\bfseries(author?)\endgroup
   \PackageWarning{natbib}{%
    Author undefined for citation `\@citeb' \MessageBreak on page \thepage%
   }%
  \else \NAT@nm
  \fi
 }{%
  \if\relax\NAT@date\relax
   \begingroup\reset@font\bfseries(year?)\endgroup
   \PackageWarning{natbib}{%
    Year undefined for citation `\@citeb' \MessageBreak on page \thepage%
   }%
  \else \NAT@date
  \fi
 }%
}
```
```{=latex}
\let\citenumfont=\@empty
```
```{=latex}
\newcommand\NAT@citex{}
```
```{=latex}
\def\NAT@spacechar{\ }
```
```{=latex}
\def\NAT@separator{\NAT@sep\NAT@penalty}
```
```{=latex}
\def\NAT@reset@citea{\c@NAT@ctr\@ne\let\@citea\@empty}
```
```{=latex}
\def\NAT@def@citea{\def\@citea{\NAT@separator\NAT@space}}
```
```{=latex}
\def\NAT@def@citea@space{\def\@citea{\NAT@separator\NAT@spacechar}}
```
```{=latex}
\def\NAT@def@citea@close{\def\@citea{\NAT@@close\NAT@separator\NAT@space}}
```
```{=latex}
\def\NAT@def@citea@box{\def\@citea{\NAT@mbox{\NAT@@close}\NAT@separator\NAT@spacechar}}
```
```{=latex}
\newcommand\NAT@@open{\ifNAT@par\NAT@open\fi}
```
```{=latex}
\newcommand\NAT@@close{\ifNAT@par\NAT@close\fi}
```
```{=latex}
\newcommand\NAT@alias{\@ifundefined{al@\@citeb\@extra@b@citeb}{%
  {\reset@font\bfseries(alias?)}\PackageWarning{natbib}
  {Alias undefined for citation `\@citeb'
  \MessageBreak on page \thepage}}{\@nameuse{al@\@citeb\@extra@b@citeb}}}
```
```{=latex}
\let\NAT@up\relax
```
```{=latex}
\newcommand\NAT@Up[1]{{\let\protect\@unexpandable@protect\let~\relax
  \expandafter\NAT@deftemp#1}\expandafter\NAT@UP\NAT@temp}
```
```{=latex}
\newcommand\NAT@deftemp[1]{\xdef\NAT@temp{#1}}
```
```{=latex}
\newcommand\NAT@UP[1]{\let\@tempa\NAT@UP\ifcat a#1\MakeUppercase{#1}%
  \let\@tempa\relax\else#1\fi\@tempa}
```
```{=latex}
\newcommand\shortcites[1]{%
  \@bsphack\@for\@citeb:=#1\do
  {\@safe@activestrue
   \edef\@citeb{\expandafter\@firstofone\@citeb\@empty}%
   \@safe@activesfalse
   \global\@namedef{bv@\@citeb\@extra@b@citeb}{}}\@esphack}
```
```{=latex}
\newcommand\NAT@biblabel[1]{\hfill}
```
```{=latex}
\newcommand\NAT@biblabelnum[1]{\bibnumfmt{#1}}
```
```{=latex}
\let\bibnumfmt\@empty
```
```{=latex}
\providecommand\@biblabel[1]{[#1]}
```
```{=latex}
\newcommand\NAT@bibsetnum[1]{\settowidth\labelwidth{\@biblabel{#1}}%
   \setlength{\leftmargin}{\labelwidth}\addtolength{\leftmargin}{\labelsep}%
   \setlength{\itemsep}{\bibsep}\setlength{\parsep}{\z@}%
   \ifNAT@openbib
     \addtolength{\leftmargin}{\bibindent}%
     \setlength{\itemindent}{-\bibindent}%
     \setlength{\listparindent}{\itemindent}%
     \setlength{\parsep}{0pt}%
   \fi
}
```
```{=latex}
\newcommand\NAT@bibsetup%
   [1]{\setlength{\leftmargin}{\bibhang}\setlength{\itemindent}{-\leftmargin}%
       \setlength{\itemsep}{\bibsep}\setlength{\parsep}{\z@}}
```
```{=latex}
\newcommand\NAT@set@cites{%
  \ifNAT@numbers
    \ifNAT@super \let\@cite\NAT@citesuper
       \def\NAT@mbox##1{\unskip\nobreak\textsuperscript{##1}}%
       \let\citeyearpar=\citeyear
       \let\NAT@space\relax
       \def\NAT@super@kern{\kern\p@}%
    \else
       \let\NAT@mbox=\mbox
       \let\@cite\NAT@citenum
       \let\NAT@space\NAT@spacechar
       \let\NAT@super@kern\relax
    \fi
    \let\@citex\NAT@citexnum
    \let\@biblabel\NAT@biblabelnum
    \let\@bibsetup\NAT@bibsetnum
    \renewcommand\NAT@idxtxt{\NAT@name\NAT@spacechar\NAT@open\NAT@num\NAT@close}%
    \def\natexlab##1{}%
    \def\NAT@penalty{\penalty\@m}%
  \else
    \let\@cite\NAT@cite
    \let\@citex\NAT@citex
    \let\@biblabel\NAT@biblabel
    \let\@bibsetup\NAT@bibsetup
    \let\NAT@space\NAT@spacechar
    \let\NAT@penalty\@empty
    \renewcommand\NAT@idxtxt{\NAT@name\NAT@spacechar\NAT@open\NAT@date\NAT@close}%
    \def\natexlab##1{##1}%
  \fi}
\AtBeginDocument{\NAT@set@cites}
```
```{=latex}
\DeclareRobustCommand\citet
   {\begingroup\NAT@swafalse\let\NAT@ctype\z@\NAT@partrue
     \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\newcommand\NAT@citetp{\@ifnextchar[{\NAT@@citetp}{\NAT@@citetp[]}}
```
```{=latex}
\newcommand\NAT@@citetp{}
```
```{=latex}
\def\NAT@@citetp[#1]{\@ifnextchar[{\@citex[#1]}{\@citex[][#1]}}
```
```{=latex}
\DeclareRobustCommand\citep
   {\begingroup\NAT@swatrue\let\NAT@ctype\z@\NAT@partrue
         \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\cite
    {\begingroup\let\NAT@ctype\z@\NAT@partrue\NAT@swatrue
      \@ifstar{\NAT@fulltrue\NAT@cites}{\NAT@fullfalse\NAT@cites}}
```
```{=latex}
\newcommand\NAT@cites{\@ifnextchar [{\NAT@@citetp}{%
     \ifNAT@numbers\else
     \NAT@swafalse
     \fi
    \NAT@@citetp[]}}
```
```{=latex}
\DeclareRobustCommand\citealt
   {\begingroup\NAT@swafalse\let\NAT@ctype\z@\NAT@parfalse
         \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\citealp
   {\begingroup\NAT@swatrue\let\NAT@ctype\z@\NAT@parfalse
         \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\citenum
   {\begingroup
     \NAT@swatrue\let\NAT@ctype\z@\NAT@parfalse\let\textsuperscript\NAT@spacechar
     \NAT@citexnum[][]}
```
```{=latex}
\DeclareRobustCommand\citeauthor
   {\begingroup\NAT@swafalse\let\NAT@ctype\@ne\NAT@parfalse
    \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\Citet
   {\begingroup\NAT@swafalse\let\NAT@ctype\z@\NAT@partrue
     \let\NAT@up\NAT@Up
     \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\Citep
   {\begingroup\NAT@swatrue\let\NAT@ctype\z@\NAT@partrue
     \let\NAT@up\NAT@Up
         \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\Citealt
   {\begingroup\NAT@swafalse\let\NAT@ctype\z@\NAT@parfalse
     \let\NAT@up\NAT@Up
         \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\Citealp
   {\begingroup\NAT@swatrue\let\NAT@ctype\z@\NAT@parfalse
     \let\NAT@up\NAT@Up
         \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\Citeauthor
   {\begingroup\NAT@swafalse\let\NAT@ctype\@ne\NAT@parfalse
     \let\NAT@up\NAT@Up
    \@ifstar{\NAT@fulltrue\NAT@citetp}{\NAT@fullfalse\NAT@citetp}}
```
```{=latex}
\DeclareRobustCommand\citeyear
   {\begingroup\NAT@swafalse\let\NAT@ctype\tw@\NAT@parfalse\NAT@citetp}
```
```{=latex}
\DeclareRobustCommand\citeyearpar
   {\begingroup\NAT@swatrue\let\NAT@ctype\tw@\NAT@partrue\NAT@citetp}
```
```{=latex}
\newcommand\citetext[1]{\NAT@open#1\NAT@close}
```
```{=latex}
\DeclareRobustCommand\citefullauthor
   {\citeauthor*}
```
```{=latex}
\newcommand\defcitealias[2]{%
   \@ifundefined{al@#1\@extra@b@citeb}{}
   {\PackageWarning{natbib}{Overwriting existing alias for citation #1}}
   \@namedef{al@#1\@extra@b@citeb}{#2}}
```
```{=latex}
\DeclareRobustCommand\citetalias{\begingroup
   \NAT@swafalse\let\NAT@ctype\thr@@\NAT@parfalse\NAT@citetp}
```
```{=latex}
\DeclareRobustCommand\citepalias{\begingroup
   \NAT@swatrue\let\NAT@ctype\thr@@\NAT@partrue\NAT@citetp}
```
```{=latex}
\renewcommand\nocite[1]{\@bsphack
  \@for\@citeb:=#1\do{%
    \@safe@activestrue
    \edef\@citeb{\expandafter\@firstofone\@citeb\@empty}%
    \@safe@activesfalse
    \if@filesw\immediate\write\@auxout{\string\citation{\@citeb}}\fi
    \if*\@citeb\else
    \@ifundefined{b@\@citeb\@extra@b@citeb}{%
       \NAT@citeundefined \PackageWarning{natbib}%
       {Citation `\@citeb' undefined}}{}\fi}%
  \@esphack}
```
```{=latex}
\newcommand\NAT@parse[1]{%
  \begingroup
   \let\protect=\@unexpandable@protect
   \let~\relax
   \let\active@prefix=\@gobble
   \edef\NAT@temp{\csname b@#1\@extra@b@citeb\endcsname}%
   \aftergroup\NAT@split
   \expandafter
  \endgroup
  \NAT@temp{}{}{}{}{}@@%
  \expandafter\NAT@parse@date\NAT@date??????@@%
  \ifciteindex\NAT@index\fi
}
```
```{=latex}
\def\NAT@split#1#2#3#4#5@@{%
  \gdef\NAT@num{#1}\gdef\NAT@name{#3}\gdef\NAT@date{#2}%
  \gdef\NAT@all@names{#4}%
  \ifx\NAT@num\@empty\gdef\NAT@num{0}\fi
  \ifx\NAT@noname\NAT@all@names \gdef\NAT@all@names{#3}\fi
}
```
```{=latex}
\def\NAT@reset@parser{%
  \global\let\NAT@num\@empty
  \global\let\NAT@name\@empty
  \global\let\NAT@date\@empty
  \global\let\NAT@all@names\@empty
}
```
```{=latex}
\newcommand\NAT@parse@date{}
```
```{=latex}
\def\NAT@parse@date#1#2#3#4#5#6@@{%
  \ifnum\the\catcode`#1=11\def\NAT@year{}\def\NAT@exlab{#1}\else
  \ifnum\the\catcode`#2=11\def\NAT@year{#1}\def\NAT@exlab{#2}\else
  \ifnum\the\catcode`#3=11\def\NAT@year{#1#2}\def\NAT@exlab{#3}\else
  \ifnum\the\catcode`#4=11\def\NAT@year{#1#2#3}\def\NAT@exlab{#4}\else
    \def\NAT@year{#1#2#3#4}\def\NAT@exlab{{#5}}\fi\fi\fi\fi}
```
```{=latex}
\newcommand\NAT@index{}
```
```{=latex}
\let\NAT@makeindex=\makeindex
```
```{=latex}
\renewcommand\makeindex{\NAT@makeindex
  \renewcommand\NAT@index{\@bsphack\begingroup
     \def~{\string~}\@wrindex{\NAT@idxtxt}}}
```
```{=latex}
\newcommand\NAT@idxtxt{\NAT@name\NAT@spacechar\NAT@open\NAT@date\NAT@close}
```
```{=latex}
\newcommand\citeindextype{default}
```
```{=latex}
\newcommand\NAT@index@alt{{\let\protect=\noexpand\let~\relax
  \xdef\NAT@temp{\NAT@idxtxt}}\expandafter\NAT@exp\NAT@temp\@nil}
```
```{=latex}
\newcommand\NAT@exp{}
```
```{=latex}
\def\NAT@exp#1\@nil{\mbox{}\index[\citeindextype]{#1}}
```
```{=latex}
\newcommand\NAT@ifcmd{\futurelet\NAT@temp\NAT@ifxcmd}
```
```{=latex}
\newcommand\NAT@ifxcmd{\ifx\NAT@temp\relax\else\expandafter\NAT@bare\fi}
```
```{=latex}
\def\NAT@bare#1(#2)#3(@)#4\@nil#5{%
  \if @#2%
    \expandafter\NAT@apalk#1, , \@nil{#5}%
  \else
    \NAT@wrout{\the\c@NAT@ctr}{#2}{#1}{#3}{#5}%
  \fi
}
```
```{=latex}
\newcommand\NAT@wrout[5]{%
\if@filesw
      {\let\protect\noexpand\let~\relax
       \immediate
       \write\@auxout{\string\bibcite{#5}{{#1}{#2}{{#3}}{{#4}}}}}\fi
\ignorespaces}
```
```{=latex}
\def\NAT@noname{{}}
```
```{=latex}
\renewcommand\bibitem{\@ifnextchar[{\@lbibitem}{\@lbibitem[]}}
```
```{=latex}
\let\NAT@bibitem@first@sw\@secondoftwo
```
```{=latex}
\def\@lbibitem[#1]#2{%
  \if\relax\@extra@b@citeb\relax\else
    \@ifundefined{br@#2\@extra@b@citeb}{}{%
     \@namedef{br@#2}{\@nameuse{br@#2\@extra@b@citeb}}%
    }%
  \fi
  \@ifundefined{b@#2\@extra@b@citeb}{%
   \def\NAT@num{}%
  }{%
   \NAT@parse{#2}%
  }%
  \def\NAT@tmp{#1}%
  \expandafter\let\expandafter\bibitemOpen\csname NAT@b@open@#2\endcsname
  \expandafter\let\expandafter\bibitemShut\csname NAT@b@shut@#2\endcsname
  \@ifnum{\NAT@merge>\@ne}{%
   \NAT@bibitem@first@sw{%
    \@firstoftwo
   }{%
    \@ifundefined{NAT@b*@#2}{%
     \@firstoftwo
    }{%
     \expandafter\def\expandafter\NAT@num\expandafter{\the\c@NAT@ctr}%
     \@secondoftwo
    }%
   }%
  }{%
   \@firstoftwo
  }%
  {%
   \global\advance\c@NAT@ctr\@ne
   \@ifx{\NAT@tmp\@empty}{\@firstoftwo}{%
    \@secondoftwo
   }%
   {%
    \expandafter\def\expandafter\NAT@num\expandafter{\the\c@NAT@ctr}%
    \global\NAT@stdbsttrue
   }{}%
   \bibitem@fin
   \item[\hfil\NAT@anchor{#2}{\NAT@num}]%
   \global\let\NAT@bibitem@first@sw\@secondoftwo
   \NAT@bibitem@init
  }%
  {%
   \NAT@anchor{#2}{}%
   \NAT@bibitem@cont
   \bibitem@fin
  }%
  \@ifx{\NAT@tmp\@empty}{%
    \NAT@wrout{\the\c@NAT@ctr}{}{}{}{#2}%
  }{%
    \expandafter\NAT@ifcmd\NAT@tmp(@)(@)\@nil{#2}%
  }%
}
```
```{=latex}
\def\bibitem@fin{%
 \@ifxundefined\@bibstop{}{\csname bibitem@\@bibstop\endcsname}%
}
```
```{=latex}
\def\NAT@bibitem@init{%
 \let\@bibstop\@undefined
}
```
```{=latex}
\def\NAT@bibitem@cont{%
 \let\bibitem@Stop\bibitemStop
 \let\bibitem@NoStop\bibitemContinue
}
```
```{=latex}
\def\BibitemOpen{%
 \bibitemOpen
}
```
```{=latex}
\def\BibitemShut#1{%
 \bibitemShut
 \def\@bibstop{#1}%
 \let\bibitem@Stop\bibitemStop
 \let\bibitem@NoStop\bibitemNoStop
}
```
```{=latex}
\def\bibitemStop{}
```
```{=latex}
\ifdefined\@mmm\else\mathchardef\@mmm=3000 \fi
\def\bibitemNoStop{.\spacefactor\@mmm\space}
```
```{=latex}
\def\bibitemContinue{\spacefactor\@mmm\space}
```
```{=latex}
\providecommand{\bibAnnote}[3]{%
  \BibitemShut{#1}%
  \def\@tempa{#3}\@ifx{\@tempa\@empty}{}{%
   \begin{quotation}\noindent
    \textsc{Key:}\ #2\\\textsc{Annotation:}\ \@tempa
   \end{quotation}%
  }%
}
```
```{=latex}
\providecommand{\bibAnnoteFile}[2]{%
  \IfFileExists{#2}{%
    \bibAnnote{#1}{#2}{\input{#2}}%
  }{%
    \bibAnnote{#1}{#2}{}%
  }%
}
```
```{=latex}
\let\bibitemOpen\relax
```
```{=latex}
\let\bibitemShut\relax
```
```{=latex}
\def\bibfield{\@ifnum{\NAT@merge>\tw@}{\@bibfield}{\@secondoftwo}}
```
```{=latex}
\def\@bibfield#1#2{%
 \begingroup
  \let\Doi\@gobble
  \let\bibinfo\relax
  \let\restore@protect\@empty
  \protected@edef\@tempa{#2}%
  \aftergroup\def\aftergroup\@tempa
 \expandafter\endgroup\expandafter{\@tempa}%
 \expandafter\@ifx\expandafter{\csname @bib#1\endcsname\@tempa}{%
  \expandafter\let\expandafter\@tempa\csname @bib@X#1\endcsname
 }{%
  \expandafter\let\csname @bib#1\endcsname\@tempa
  \expandafter\let\expandafter\@tempa\csname @bib@Y#1\endcsname
 }%
 \@ifx{\@tempa\relax}{\let\@tempa\@firstofone}{}%
 \@tempa{#2}%
}
```
```{=latex}
\def\bibinfo#1{%
 \expandafter\let\expandafter\@tempa\csname bibinfo@X@#1\endcsname
 \@ifx{\@tempa\relax}{\@firstofone}{\@tempa}%
}
```
```{=latex}
\def\@bib@Xauthor#1{\let\@bib@Xjournal\@gobble}
```
```{=latex}
\def\@bib@Xjournal#1{\begingroup\let\bibinfo@X@journal\@bibibid@#1\endgroup}
```
```{=latex}
\def\@bibibid@#1{\textit{ibid}.}
```
```{=latex}
% Only hook \@lbibitem for showkeys if that package is actually loaded;
% otherwise \SK@... is undefined and every \bibitem would error.
\AtBeginDocument{%
  \@ifpackageloaded{showkeys}{%
    \let\SK@lbibitem\@lbibitem
    \def\@lbibitem[#1]#2{%
      \SK@lbibitem[#1]{#2}\SK@\SK@@label{#2}\ignorespaces}%
  }{}%
}
```
```{=latex}
\newcommand\NAT@force@numbers{%
  \ifNAT@numbers\else
  \PackageError{natbib}{Bibliography not compatible with author-year
  citations.\MessageBreak
  Press <return> to continue in numerical citation style}
  {Check the bibliography entries for non-compliant syntax,\MessageBreak
   or select author-year BibTeX style, e.g. plainnat}%
  \global\NAT@numberstrue\fi}
```
```{=latex}
\providecommand\bibcite{}
```
```{=latex}
\renewcommand\bibcite[2]{%
 \@ifundefined{b@#1\@extra@binfo}{\relax}{%
   \NAT@citemultiple
   \PackageWarningNoLine{natbib}{Citation `#1' multiply defined}%
 }%
 \global\@namedef{b@#1\@extra@binfo}{#2}%
}
```
```{=latex}
\newcommand\NAT@testdef[2]{%
  \def\NAT@temp{#2}%
  \expandafter \ifx \csname b@#1\@extra@binfo\endcsname\NAT@temp
  \else
    \ifNAT@swa \NAT@swafalse
      \PackageWarningNoLine{natbib}{%
        Citation(s) may have changed.\MessageBreak
        Rerun to get citations correct%
      }%
    \fi
  \fi
}
```
```{=latex}
\newcommand\NAT@apalk{}
\def\NAT@apalk#1, #2, #3\@nil#4{%
  \if\relax#2\relax
    \global\NAT@stdbsttrue
    \NAT@wrout{#1}{}{}{}{#4}%
  \else
    \NAT@wrout{\the\c@NAT@ctr}{#2}{#1}{}{#4}%
  \fi
}
```
```{=latex}
\newcommand\citeauthoryear{}
```
```{=latex}
\def\citeauthoryear#1#2#3(@)(@)\@nil#4{%
  \if\relax#3\relax
    \NAT@wrout{\the\c@NAT@ctr}{#2}{#1}{}{#4}%
  \else
    \NAT@wrout{\the\c@NAT@ctr}{#3}{#1}{#2}{#4}%
  \fi
}
```
```{=latex}
\newcommand\citestarts{\NAT@open}
```
```{=latex}
\newcommand\citeends{\NAT@close}
```
```{=latex}
\newcommand\betweenauthors{and}
```
```{=latex}
\newcommand\astroncite{}
```
```{=latex}
\def\astroncite#1#2(@)(@)\@nil#3{\NAT@wrout{\the\c@NAT@ctr}{#2}{#1}{}{#3}}
```
```{=latex}
\newcommand\citename{}
```
```{=latex}
\def\citename#1#2(@)(@)\@nil#3{\expandafter\NAT@apalk#1#2, \@nil{#3}}
```
```{=latex}
\newcommand\harvarditem[4][]{%
 \if\relax#1\relax
   \bibitem[#2(#3)]{#4}%
 \else
   \bibitem[#1(#3)#2]{#4}%
 \fi
}
```
```{=latex}
\newcommand\harvardleft{\NAT@open}
```
```{=latex}
\newcommand\harvardright{\NAT@close}
```
```{=latex}
\newcommand\harvardyearleft{\NAT@open}
```
```{=latex}
\newcommand\harvardyearright{\NAT@close}
```
```{=latex}
\newcommand\harvardurl[1]{\textbf{URL:} \textit{#1}}
```
```{=latex}
\providecommand\bibsection{}
```
```{=latex}
\renewenvironment{thebibliography}[1]{%
 \bibsection
 \parindent\z@
 \bibpreamble
 \bibfont
 \list{\@biblabel{\the\c@NAT@ctr}}{\@bibsetup{#1}\global\c@NAT@ctr\z@}%
 \ifNAT@openbib
   \renewcommand\newblock{\par}%
 \else
   \renewcommand\newblock{\hskip .11em \@plus.33em \@minus.07em}%
 \fi
 \sloppy\clubpenalty4000\widowpenalty4000
 \sfcode`\.\@m
 \let\NAT@bibitem@first@sw\@firstoftwo
    \let\citeN\cite \let\shortcite\cite
    \let\citeasnoun\cite
}{%
 \bibitem@fin
 \bibpostamble
 \def\@noitemerr{%
  \PackageWarning{natbib}{Empty `thebibliography' environment}%
 }%
 \endlist
 \bibcleanup
}
```
```{=latex}
\let\bibfont\@empty
```
```{=latex}
\let\bibpreamble\@empty
```
```{=latex}
\let\bibpostamble\@empty
```
```{=latex}
\def\bibcleanup{\vskip-\lastskip}
```
```{=latex}
\providecommand\reset@font{\relax}
```
```{=latex}
\providecommand\bibname{Bibliography}
```
```{=latex}
\providecommand\refname{References}
```
```{=latex}
\newcommand\NAT@citeundefined{\gdef \NAT@undefined {%
    \PackageWarningNoLine{natbib}{There were undefined citations}}}
```
```{=latex}
\let \NAT@undefined \relax
```
```{=latex}
\newcommand\NAT@citemultiple{\gdef \NAT@multiple {%
    \PackageWarningNoLine{natbib}{There were multiply defined citations}}}
```
```{=latex}
\let \NAT@multiple \relax
```
```{=latex}
\providecommand\@mkboth[2]{}
```
```{=latex}
\providecommand\MakeUppercase{\uppercase}
```
```{=latex}
\providecommand{\@extra@b@citeb}{}
```
```{=latex}
\def\NAT@anchor#1#2{%
 \hyper@natanchorstart{#1\@extra@b@citeb}%
  \def\@tempa{#2}\@ifx{\@tempa\@empty}{}{\@biblabel{#2}}%
 \hyper@natanchorend
}
```
```{=latex}
\providecommand\hyper@natanchorstart[1]{}
```
```{=latex}
\providecommand\hyper@natanchorend{}
```
```{=latex}
\providecommand\hyper@natlinkstart[1]{}
```
```{=latex}
\providecommand\hyper@natlinkend{}
```
```{=latex}
\providecommand\hyper@natlinkbreak[2]{#1}
```
```{=latex}
\providecommand\@safe@activestrue{}
```
```{=latex}
\providecommand\@safe@activesfalse{}
```
```{=latex}
\newcommand\NAT@sort@cites[1]{%
  \let\NAT@cite@list\@empty
  \@for\@citeb:=#1\do{\expandafter\NAT@star@cite\@citeb\@@}%
  \if@filesw
    \expandafter\immediate\expandafter\write\expandafter\@auxout
      \expandafter{\expandafter\string\expandafter\citation\expandafter{\NAT@cite@list}}%
  \fi
  \@ifnum{\NAT@sort>\z@}{%
    \expandafter\NAT@sort@cites@\expandafter{\NAT@cite@list}%
  }{}%
}
```
```{=latex}
\def\NAT@star@cite{%
  \let\NAT@star@sw\@secondoftwo
  \@ifnum{\NAT@merge>\z@}{%
   \@ifnextchar*{%
    \let\NAT@star@sw\@firstoftwo
    \NAT@star@cite@star
   }{%
    \NAT@star@cite@nostar
   }%
  }{%
   \NAT@star@cite@noextension
  }%
}
```
```{=latex}
\def\NAT@star@cite@star*{%
 \NAT@star@cite@nostar
}
```
```{=latex}
\def\NAT@star@cite@nostar{%
 \let\nat@keyopt@open\@empty
 \let\nat@keyopt@shut\@empty
 \@ifnextchar[{\NAT@star@cite@pre}{\NAT@star@cite@pre[]}%
}
```
```{=latex}
\def\NAT@star@cite@pre[#1]{%
 \def\nat@keyopt@open{#1}%
 \@ifnextchar[{\NAT@star@cite@post}{\NAT@star@cite@post[]}%
}
```
```{=latex}
\def\NAT@star@cite@post[#1]#2\@@{%
  \def\nat@keyopt@shut{#1}%
  \NAT@star@sw{\expandafter\global\expandafter\let
    \csname NAT@b*@#2\endcsname\@empty}{}%
  \NAT@cite@list@append{#2}%
}
```
```{=latex}
\def\NAT@star@cite@noextension#1\@@{%
  \let\nat@keyopt@open\@empty
  \let\nat@keyopt@shut\@empty
  \NAT@cite@list@append{#1}%
}
```
```{=latex}
\def\NAT@cite@list@append#1{%
  \edef\@citeb{\@firstofone#1\@empty}%
  \if@filesw\@ifxundefined\@cprwrite{}{\expandafter\@cprwrite\@citeb=}\fi
  \if\relax\nat@keyopt@open\relax\else
   \global\expandafter\let\csname NAT@b@open@\@citeb\endcsname\nat@keyopt@open
  \fi
  \if\relax\nat@keyopt@shut\relax\else
   \global\expandafter\let\csname NAT@b@shut@\@citeb\endcsname\nat@keyopt@shut
  \fi
  \toks@\expandafter{\NAT@cite@list}%
  \ifx\NAT@cite@list\@empty
    \@temptokena\expandafter{\@citeb}%
  \else
    \@temptokena\expandafter{\expandafter,\@citeb}%
  \fi
  \edef\NAT@cite@list{\the\toks@\the\@temptokena}%
}
```
```{=latex}
\newcommand\NAT@sort@cites@[1]{%
  \count@\z@
  \@tempcntb\m@ne
  \let\@celt\delimiter
  \def\NAT@num@list{}%
  \let\NAT@cite@list\@empty
  \let\NAT@nonsort@list\@empty
  \@for \@citeb:=#1\do{\NAT@make@cite@list}%
  \ifx\NAT@nonsort@list\@empty\else
   \protected@edef\NAT@cite@list{\NAT@cite@list\NAT@nonsort@list}%
  \fi
  \ifx\NAT@cite@list\@empty\else
   \protected@edef\NAT@cite@list{\expandafter\NAT@xcom\NAT@cite@list @@}%
  \fi
}
```
```{=latex}
\def\NAT@make@cite@list{%
  \advance\count@\@ne
  \@safe@activestrue
  \edef\@citeb{\expandafter\@firstofone\@citeb\@empty}%
  \@safe@activesfalse
  \@ifundefined{b@\@citeb\@extra@b@citeb}%
   {\def\NAT@num{A}}%
   {\NAT@parse{\@citeb}}%
  \NAT@ifcat@num\NAT@num
   {\@tempcnta\NAT@num \relax
    \@ifnum{\@tempcnta<\@tempcntb}{%
      \let\NAT@@cite@list=\NAT@cite@list
      \let\NAT@cite@list\@empty
      \begingroup\let\@celt=\NAT@celt\NAT@num@list\endgroup
      \protected@edef\NAT@num@list{%
       \expandafter\NAT@num@celt \NAT@num@list \@gobble @%
      }%
    }{%
      \protected@edef\NAT@num@list{\NAT@num@list \@celt{\NAT@num}}%
      \protected@edef\NAT@cite@list{\NAT@cite@list\@citeb,}%
      \@tempcntb\@tempcnta
    }%
   }%
   {\protected@edef\NAT@nonsort@list{\NAT@nonsort@list\@citeb,}}%
}
```
```{=latex}
\def\NAT@celt#1{%
  \@ifnum{#1>\@tempcnta}{%
    \xdef\NAT@cite@list{\NAT@cite@list\@citeb,\NAT@@cite@list}%
    \let\@celt\@gobble
  }{%
    \expandafter\def@NAT@cite@lists\NAT@@cite@list\@@
  }%
}
```
```{=latex}
\def\NAT@num@celt#1#2{%
 \ifx#1\@celt
  \@ifnum{#2>\@tempcnta}{%
    \@celt{\number\@tempcnta}%
    \@celt{#2}%
  }{%
    \@celt{#2}%
    \expandafter\NAT@num@celt
  }%
 \fi
}
```
```{=latex}
\def\def@NAT@cite@lists#1,#2\@@{%
  \xdef\NAT@cite@list{\NAT@cite@list#1,}%
  \xdef\NAT@@cite@list{#2}%
}
```
```{=latex}
\def\NAT@nextc#1,#2@@{#1,}
```
```{=latex}
\def\NAT@restc#1,#2{#2}
```
```{=latex}
\def\NAT@xcom#1,@@{#1}
```
```{=latex}
\renewcommand\lstlistingname{Algorithm}
```
```{=latex}
\def\lstfloatautorefname{Listing}
```
```{=latex}
\newcommand{\1}{\mathbf{1}}
```
```{=latex}
\newcommand{\norm}[1]{\left\lVert #1\right\rVert}
```
```{=latex}
\newcommand{\grad}{\nabla}
```
```{=latex}
\newcommand{\Hess}{H}
```
```{=latex}
\newcommand{\tr}{\mathrm{tr}}
```
```{=latex}
\newcommand{\Ball}{\mathrm{B}}
```
```{=latex}
\newcommand{\vol}{v_d}
```
```{=latex}
\newcommand{\surf}{s_{d-1}}
```
```{=latex}
\newcommand{\RR}{\mathbb{R}}
```
```{=latex}
\newcommand{\PP}{\mathbb{P}}
```
```{=latex}
\newcommand{\EE}{\mathbb{E}}
```
```{=latex}
\newcommand{\ind}{\mathbbm{1}}
```
```{=latex}
\newcommand{\Law}{\mathcal{L}}
```
```{=latex}
\newcommand{\sphere}{S^{d-1}}
```
```{=latex}
\newcommand{\inner}[2]{\left\langle #1, #2 \right\rangle}
```
```{=latex}
\newcommand{\BL}{\mathrm{BL}}
```
```{=latex}
\newcommand{\dBL}{d_{\mathrm{BL}}}
```
```{=latex}
\newcommand{\RightarrowDist}{\Rightarrow}
```
```{=latex}
\newcommand{\toProb}{\xrightarrow{\,\mathbb{P}\,}}
```
```{=latex}
\def\ceil#1{\lceil #1 \rceil}
```
```{=latex}
\def\floor#1{\lfloor #1 \rfloor}
```
```{=latex}
\def\1{\bm{1}}
```
```{=latex}
\newcommand{\ReLU}{\text{ReLU}}
```
```{=latex}
\newcommand{\flatten}{\text{vec}}
```
```{=latex}
\newcommand{\train}{\mathcal{D}}
```
```{=latex}
\newcommand{\valid}{\mathcal{D}_{\mathrm{valid}}}
```
```{=latex}
\newcommand{\test}{\mathcal{D}_{\mathrm{test}}}
```
```{=latex}
\def\eps{{\epsilon}}
```
```{=latex}
\def\cst{{\rm cst}}
```
```{=latex}
\newcommand{\rmat}[2]{\mathcal{M}_{#1,#2}(\mathbb{R})}
```
```{=latex}
\newcommand{\romat}[2]{\mathcal{O}_{#1}(\mathbb{R})}
```
```{=latex}
\DeclareMathOperator{\spn}{span}
```
```{=latex}
\DeclareMathOperator{\diag}{diag}
```
```{=latex}
\DeclareMathOperator{\sign}{sign}
```
```{=latex}
\DeclareMathOperator{\Tr}{Tr}
```
```{=latex}
\newcommand{\Trp}[1]{\Tr\left(#1\right)}
```
```{=latex}
\DeclareMathOperator{\eigvec}{eigvec}
```
```{=latex}
\newcommand{\eigvecp}[1]{\eigvec\left(#1\right)}
```
```{=latex}
\def\reta{{\textnormal{$\eta$}}}
```
```{=latex}
\def\ra{{\textnormal{a}}}
```
```{=latex}
\def\rb{{\textnormal{b}}}
```
```{=latex}
\def\rc{{\textnormal{c}}}
```
```{=latex}
\def\rd{{\textnormal{d}}}
```
```{=latex}
\def\re{{\textnormal{e}}}
```
```{=latex}
\def\rf{{\textnormal{f}}}
```
```{=latex}
\def\rg{{\textnormal{g}}}
```
```{=latex}
\def\rh{{\textnormal{h}}}
```
```{=latex}
\def\ri{{\textnormal{i}}}
```
```{=latex}
\def\rj{{\textnormal{j}}}
```
```{=latex}
\def\rk{{\textnormal{k}}}
```
```{=latex}
\def\rl{{\textnormal{l}}}
```
```{=latex}
\def\rn{{\textnormal{n}}}
```
```{=latex}
\def\ro{{\textnormal{o}}}
```
```{=latex}
\def\rp{{\textnormal{p}}}
```
```{=latex}
\def\rq{{\textnormal{q}}}
```
```{=latex}
\def\rr{{\textnormal{r}}}
```
```{=latex}
\def\rs{{\textnormal{s}}}
```
```{=latex}
\def\rt{{\textnormal{t}}}
```
```{=latex}
\def\ru{{\textnormal{u}}}
```
```{=latex}
\def\rv{{\textnormal{v}}}
```
```{=latex}
\def\rw{{\textnormal{w}}}
```
```{=latex}
\def\rx{{\textnormal{x}}}
```
```{=latex}
\def\ry{{\textnormal{y}}}
```
```{=latex}
\def\rz{{\textnormal{z}}}
```
```{=latex}
\def\rvepsilon{{\mathbf{\epsilon}}}
```
```{=latex}
\def\rvtheta{{\mathbf{\theta}}}
```
```{=latex}
\def\rva{{\mathbf{a}}}
```
```{=latex}
\def\rvb{{\mathbf{b}}}
```
```{=latex}
\def\rvc{{\mathbf{c}}}
```
```{=latex}
\def\rvd{{\mathbf{d}}}
```
```{=latex}
\def\rve{{\mathbf{e}}}
```
```{=latex}
\def\rvf{{\mathbf{f}}}
```
```{=latex}
\def\rvg{{\mathbf{g}}}
```
```{=latex}
\def\rvh{{\mathbf{h}}}
```
```{=latex}
\def\rvi{{\mathbf{i}}}
```
```{=latex}
\def\rvj{{\mathbf{j}}}
```
```{=latex}
\def\rvk{{\mathbf{k}}}
```
```{=latex}
\def\rvl{{\mathbf{l}}}
```
```{=latex}
\def\rvm{{\mathbf{m}}}
```
```{=latex}
\def\rvn{{\mathbf{n}}}
```
```{=latex}
\def\rvo{{\mathbf{o}}}
```
```{=latex}
\def\rvp{{\mathbf{p}}}
```
```{=latex}
\def\rvq{{\mathbf{q}}}
```
```{=latex}
\def\rvr{{\mathbf{r}}}
```
```{=latex}
\def\rvs{{\mathbf{s}}}
```
```{=latex}
\def\rvt{{\mathbf{t}}}
```
```{=latex}
\def\rvu{{\mathbf{u}}}
```
```{=latex}
\def\rvv{{\mathbf{v}}}
```
```{=latex}
\def\rvw{{\mathbf{w}}}
```
```{=latex}
\def\rvx{{\mathbf{x}}}
```
```{=latex}
\def\rvy{{\mathbf{y}}}
```
```{=latex}
\def\rvz{{\mathbf{z}}}
```
```{=latex}
\def\erva{{\textnormal{a}}}
```
```{=latex}
\def\ervb{{\textnormal{b}}}
```
```{=latex}
\def\ervc{{\textnormal{c}}}
```
```{=latex}
\def\ervd{{\textnormal{d}}}
```
```{=latex}
\def\erve{{\textnormal{e}}}
```
```{=latex}
\def\ervf{{\textnormal{f}}}
```
```{=latex}
\def\ervg{{\textnormal{g}}}
```
```{=latex}
\def\ervh{{\textnormal{h}}}
```
```{=latex}
\def\ervi{{\textnormal{i}}}
```
```{=latex}
\def\ervj{{\textnormal{j}}}
```
```{=latex}
\def\ervk{{\textnormal{k}}}
```
```{=latex}
\def\ervl{{\textnormal{l}}}
```
```{=latex}
\def\ervm{{\textnormal{m}}}
```
```{=latex}
\def\ervn{{\textnormal{n}}}
```
```{=latex}
\def\ervo{{\textnormal{o}}}
```
```{=latex}
\def\ervp{{\textnormal{p}}}
```
```{=latex}
\def\ervq{{\textnormal{q}}}
```
```{=latex}
\def\ervr{{\textnormal{r}}}
```
```{=latex}
\def\ervs{{\textnormal{s}}}
```
```{=latex}
\def\ervt{{\textnormal{t}}}
```
```{=latex}
\def\ervu{{\textnormal{u}}}
```
```{=latex}
\def\ervv{{\textnormal{v}}}
```
```{=latex}
\def\ervw{{\textnormal{w}}}
```
```{=latex}
\def\ervx{{\textnormal{x}}}
```
```{=latex}
\def\ervy{{\textnormal{y}}}
```
```{=latex}
\def\ervz{{\textnormal{z}}}
```
```{=latex}
\def\rmA{{\mathbf{A}}}
```
```{=latex}
\def\rmB{{\mathbf{B}}}
```
```{=latex}
\def\rmC{{\mathbf{C}}}
```
```{=latex}
\def\rmD{{\mathbf{D}}}
```
```{=latex}
\def\rmE{{\mathbf{E}}}
```
```{=latex}
\def\rmF{{\mathbf{F}}}
```
```{=latex}
\def\rmG{{\mathbf{G}}}
```
```{=latex}
\def\rmH{{\mathbf{H}}}
```
```{=latex}
\def\rmI{{\mathbf{I}}}
```
```{=latex}
\def\rmJ{{\mathbf{J}}}
```
```{=latex}
\def\rmK{{\mathbf{K}}}
```
```{=latex}
\def\rmL{{\mathbf{L}}}
```
```{=latex}
\def\rmM{{\mathbf{M}}}
```
```{=latex}
\def\rmN{{\mathbf{N}}}
```
```{=latex}
\def\rmO{{\mathbf{O}}}
```
```{=latex}
\def\rmP{{\mathbf{P}}}
```
```{=latex}
\def\rmQ{{\mathbf{Q}}}
```
```{=latex}
\def\rmR{{\mathbf{R}}}
```
```{=latex}
\def\rmS{{\mathbf{S}}}
```
```{=latex}
\def\rmT{{\mathbf{T}}}
```
```{=latex}
\def\rmU{{\mathbf{U}}}
```
```{=latex}
\def\rmV{{\mathbf{V}}}
```
```{=latex}
\def\rmW{{\mathbf{W}}}
```
```{=latex}
\def\rmX{{\mathbf{X}}}
```
```{=latex}
\def\rmY{{\mathbf{Y}}}
```
```{=latex}
\def\rmZ{{\mathbf{Z}}}
```
```{=latex}
\def\ermA{{\textnormal{A}}}
```
```{=latex}
\def\ermB{{\textnormal{B}}}
```
```{=latex}
\def\ermC{{\textnormal{C}}}
```
```{=latex}
\def\ermD{{\textnormal{D}}}
```
```{=latex}
\def\ermE{{\textnormal{E}}}
```
```{=latex}
\def\ermF{{\textnormal{F}}}
```
```{=latex}
\def\ermG{{\textnormal{G}}}
```
```{=latex}
\def\ermH{{\textnormal{H}}}
```
```{=latex}
\def\ermI{{\textnormal{I}}}
```
```{=latex}
\def\ermJ{{\textnormal{J}}}
```
```{=latex}
\def\ermK{{\textnormal{K}}}
```
```{=latex}
\def\ermL{{\textnormal{L}}}
```
```{=latex}
\def\ermM{{\textnormal{M}}}
```
```{=latex}
\def\ermN{{\textnormal{N}}}
```
```{=latex}
\def\ermO{{\textnormal{O}}}
```
```{=latex}
\def\ermP{{\textnormal{P}}}
```
```{=latex}
\def\ermQ{{\textnormal{Q}}}
```
```{=latex}
\def\ermR{{\textnormal{R}}}
```
```{=latex}
\def\ermS{{\textnormal{S}}}
```
```{=latex}
\def\ermT{{\textnormal{T}}}
```
```{=latex}
\def\ermU{{\textnormal{U}}}
```
```{=latex}
\def\ermV{{\textnormal{V}}}
```
```{=latex}
\def\ermW{{\textnormal{W}}}
```
```{=latex}
\def\ermX{{\textnormal{X}}}
```
```{=latex}
\def\ermY{{\textnormal{Y}}}
```
```{=latex}
\def\ermZ{{\textnormal{Z}}}
```
```{=latex}
\def\vzero{{\bm{0}}}
```
```{=latex}
\def\vone{{\bm{1}}}
```
```{=latex}
\def\vmu{{\bm{\mu}}}
```
```{=latex}
\def\vsigma{{\bm{\sigma}}}
```
```{=latex}
\def\vtheta{{\bm{\theta}}}
```
```{=latex}
\def\vepsilon{{\bm{\epsilon}}}
```
```{=latex}
\def\va{{\bm{a}}}
```
```{=latex}
\def\vb{{\bm{b}}}
```
```{=latex}
\def\vc{{\bm{c}}}
```
```{=latex}
\def\vd{{\bm{d}}}
```
```{=latex}
\def\ve{{\bm{e}}}
```
```{=latex}
\def\vf{{\bm{f}}}
```
```{=latex}
\def\vg{{\bm{g}}}
```
```{=latex}
\def\vh{{\bm{h}}}
```
```{=latex}
\def\vi{{\bm{i}}}
```
```{=latex}
\def\vj{{\bm{j}}}
```
```{=latex}
\def\vk{{\bm{k}}}
```
```{=latex}
\def\vl{{\bm{l}}}
```
```{=latex}
\def\vm{{\bm{m}}}
```
```{=latex}
\def\vn{{\bm{n}}}
```
```{=latex}
\def\vo{{\bm{o}}}
```
```{=latex}
\def\vp{{\bm{p}}}
```
```{=latex}
\def\vq{{\bm{q}}}
```
```{=latex}
\def\vr{{\bm{r}}}
```
```{=latex}
\def\vs{{\bm{s}}}
```
```{=latex}
\def\vt{{\bm{t}}}
```
```{=latex}
\def\vu{{\bm{u}}}
```
```{=latex}
\def\vv{{\bm{v}}}
```
```{=latex}
\def\vw{{\bm{w}}}
```
```{=latex}
\def\vx{{\bm{x}}}
```
```{=latex}
\def\vy{{\bm{y}}}
```
```{=latex}
\def\vz{{\bm{z}}}
```
```{=latex}
\def\evalpha{{\alpha}}
```
```{=latex}
\def\evbeta{{\beta}}
```
```{=latex}
\def\evepsilon{{\epsilon}}
```
```{=latex}
\def\evlambda{{\lambda}}
```
```{=latex}
\def\evomega{{\omega}}
```
```{=latex}
\def\evmu{{\mu}}
```
```{=latex}
\def\evpsi{{\psi}}
```
```{=latex}
\def\evsigma{{\sigma}}
```
```{=latex}
\def\evtheta{{\theta}}
```
```{=latex}
\def\eva{{a}}
```
```{=latex}
\def\evb{{b}}
```
```{=latex}
\def\evc{{c}}
```
```{=latex}
\def\evd{{d}}
```
```{=latex}
\def\eve{{e}}
```
```{=latex}
\def\evf{{f}}
```
```{=latex}
\def\evg{{g}}
```
```{=latex}
\def\evh{{h}}
```
```{=latex}
\def\evi{{i}}
```
```{=latex}
\def\evj{{j}}
```
```{=latex}
\def\evk{{k}}
```
```{=latex}
\def\evl{{l}}
```
```{=latex}
\def\evm{{m}}
```
```{=latex}
\def\evn{{n}}
```
```{=latex}
\def\evo{{o}}
```
```{=latex}
\def\evp{{p}}
```
```{=latex}
\def\evq{{q}}
```
```{=latex}
\def\evr{{r}}
```
```{=latex}
\def\evs{{s}}
```
```{=latex}
\def\evt{{t}}
```
```{=latex}
\def\evu{{u}}
```
```{=latex}
\def\evv{{v}}
```
```{=latex}
\def\evw{{w}}
```
```{=latex}
\def\evx{{x}}
```
```{=latex}
\def\evy{{y}}
```
```{=latex}
\def\evz{{z}}
```
```{=latex}
\def\mA{{\bm{A}}}
```
```{=latex}
\def\mB{{\bm{B}}}
```
```{=latex}
\def\mC{{\bm{C}}}
```
```{=latex}
\def\mD{{\bm{D}}}
```
```{=latex}
\def\mE{{\bm{E}}}
```
```{=latex}
\def\mF{{\bm{F}}}
```
```{=latex}
\def\mG{{\bm{G}}}
```
```{=latex}
\def\mH{{\bm{H}}}
```
```{=latex}
\def\mI{{\bm{I}}}
```
```{=latex}
\def\mJ{{\bm{J}}}
```
```{=latex}
\def\mK{{\bm{K}}}
```
```{=latex}
\def\mL{{\bm{L}}}
```
```{=latex}
\def\mM{{\bm{M}}}
```
```{=latex}
\def\mN{{\bm{N}}}
```
```{=latex}
\def\mO{{\bm{O}}}
```
```{=latex}
\def\mP{{\bm{P}}}
```
```{=latex}
\def\mQ{{\bm{Q}}}
```
```{=latex}
\def\mR{{\bm{R}}}
```
```{=latex}
\def\mS{{\bm{S}}}
```
```{=latex}
\def\mT{{\bm{T}}}
```
```{=latex}
\def\mU{{\bm{U}}}
```
```{=latex}
\def\mV{{\bm{V}}}
```
```{=latex}
\def\mW{{\bm{W}}}
```
```{=latex}
\def\mX{{\bm{X}}}
```
```{=latex}
\def\mY{{\bm{Y}}}
```
```{=latex}
\def\mZ{{\bm{Z}}}
```
```{=latex}
\def\mBeta{{\bm{\beta}}}
```
```{=latex}
\def\mPhi{{\bm{\Phi}}}
```
```{=latex}
\def\mLambda{{\bm{\Lambda}}}
```
```{=latex}
\def\mSigma{{\bm{\Sigma}}}
```
```{=latex}
\newcommand{\tens}[1]{\bm{\mathsfit{#1}}}
```
```{=latex}
\def\tA{{\tens{A}}}
```
```{=latex}
\def\tB{{\tens{B}}}
```
```{=latex}
\def\tC{{\tens{C}}}
```
```{=latex}
\def\tD{{\tens{D}}}
```
```{=latex}
\def\tE{{\tens{E}}}
```
```{=latex}
\def\tF{{\tens{F}}}
```
```{=latex}
\def\tG{{\tens{G}}}
```
```{=latex}
\def\tH{{\tens{H}}}
```
```{=latex}
\def\tI{{\tens{I}}}
```
```{=latex}
\def\tJ{{\tens{J}}}
```
```{=latex}
\def\tK{{\tens{K}}}
```
```{=latex}
\def\tL{{\tens{L}}}
```
```{=latex}
\def\tM{{\tens{M}}}
```
```{=latex}
\def\tN{{\tens{N}}}
```
```{=latex}
\def\tO{{\tens{O}}}
```
```{=latex}
\def\tP{{\tens{P}}}
```
```{=latex}
\def\tQ{{\tens{Q}}}
```
```{=latex}
\def\tR{{\tens{R}}}
```
```{=latex}
\def\tS{{\tens{S}}}
```
```{=latex}
\def\tT{{\tens{T}}}
```
```{=latex}
\def\tU{{\tens{U}}}
```
```{=latex}
\def\tV{{\tens{V}}}
```
```{=latex}
\def\tW{{\tens{W}}}
```
```{=latex}
\def\tX{{\tens{X}}}
```
```{=latex}
\def\tY{{\tens{Y}}}
```
```{=latex}
\def\tZ{{\tens{Z}}}
```
```{=latex}
\def\gA{{\mathcal{A}}}
```
```{=latex}
\def\gB{{\mathcal{B}}}
```
```{=latex}
\def\gC{{\mathcal{C}}}
```
```{=latex}
\def\gD{{\mathcal{D}}}
```
```{=latex}
\def\gE{{\mathcal{E}}}
```
```{=latex}
\def\gF{{\mathcal{F}}}
```
```{=latex}
\def\gG{{\mathcal{G}}}
```
```{=latex}
\def\gH{{\mathcal{H}}}
```
```{=latex}
\def\gI{{\mathcal{I}}}
```
```{=latex}
\def\gJ{{\mathcal{J}}}
```
```{=latex}
\def\gK{{\mathcal{K}}}
```
```{=latex}
\def\gL{{\mathcal{L}}}
```
```{=latex}
\def\gM{{\mathcal{M}}}
```
```{=latex}
\def\gN{{\mathcal{N}}}
```
```{=latex}
\def\gO{{\mathcal{O}}}
```
```{=latex}
\def\gP{{\mathcal{P}}}
```
```{=latex}
\def\gQ{{\mathcal{Q}}}
```
```{=latex}
\def\gR{{\mathcal{R}}}
```
```{=latex}
\def\gS{{\mathcal{S}}}
```
```{=latex}
\def\gT{{\mathcal{T}}}
```
```{=latex}
\def\gU{{\mathcal{U}}}
```
```{=latex}
\def\gV{{\mathcal{V}}}
```
```{=latex}
\def\gW{{\mathcal{W}}}
```
```{=latex}
\def\gX{{\mathcal{X}}}
```
```{=latex}
\def\gY{{\mathcal{Y}}}
```
```{=latex}
\def\gZ{{\mathcal{Z}}}
```
```{=latex}
\def\sA{{\mathbb{A}}}
```
```{=latex}
\def\sB{{\mathbb{B}}}
```
```{=latex}
\def\sC{{\mathbb{C}}}
```
```{=latex}
\def\sD{{\mathbb{D}}}
```
```{=latex}
\def\sF{{\mathbb{F}}}
```
```{=latex}
\def\sG{{\mathbb{G}}}
```
```{=latex}
\def\sH{{\mathbb{H}}}
```
```{=latex}
\def\sI{{\mathbb{I}}}
```
```{=latex}
\def\sJ{{\mathbb{J}}}
```
```{=latex}
\def\sK{{\mathbb{K}}}
```
```{=latex}
\def\sL{{\mathbb{L}}}
```
```{=latex}
\def\sM{{\mathbb{M}}}
```
```{=latex}
\def\sN{{\mathbb{N}}}
```
```{=latex}
\def\sO{{\mathbb{O}}}
```
```{=latex}
\def\sP{{\mathbb{P}}}
```
```{=latex}
\def\sQ{{\mathbb{Q}}}
```
```{=latex}
\def\sR{{\mathbb{R}}}
```
```{=latex}
\def\sS{{\mathbb{S}}}
```
```{=latex}
\def\sT{{\mathbb{T}}}
```
```{=latex}
\def\sU{{\mathbb{U}}}
```
```{=latex}
\def\sV{{\mathbb{V}}}
```
```{=latex}
\def\sW{{\mathbb{W}}}
```
```{=latex}
\def\sX{{\mathbb{X}}}
```
```{=latex}
\def\sY{{\mathbb{Y}}}
```
```{=latex}
\def\sZ{{\mathbb{Z}}}
```
```{=latex}
\def\emLambda{{\Lambda}}
```
```{=latex}
\def\emA{{A}}
```
```{=latex}
\def\emB{{B}}
```
```{=latex}
\def\emC{{C}}
```
```{=latex}
\def\emD{{D}}
```
```{=latex}
\def\emE{{E}}
```
```{=latex}
\def\emF{{F}}
```
```{=latex}
\def\emG{{G}}
```
```{=latex}
\def\emH{{H}}
```
```{=latex}
\def\emI{{I}}
```
```{=latex}
\def\emJ{{J}}
```
```{=latex}
\def\emK{{K}}
```
```{=latex}
\def\emL{{L}}
```
```{=latex}
\def\emM{{M}}
```
```{=latex}
\def\emN{{N}}
```
```{=latex}
\def\emO{{O}}
```
```{=latex}
\def\emP{{P}}
```
```{=latex}
\def\emQ{{Q}}
```
```{=latex}
\def\emR{{R}}
```
```{=latex}
\def\emS{{S}}
```
```{=latex}
\def\emT{{T}}
```
```{=latex}
\def\emU{{U}}
```
```{=latex}
\def\emV{{V}}
```
```{=latex}
\def\emW{{W}}
```
```{=latex}
\def\emX{{X}}
```
```{=latex}
\def\emY{{Y}}
```
```{=latex}
\def\emZ{{Z}}
```
```{=latex}
\def\emSigma{{\Sigma}}
```
```{=latex}
\newcommand{\etens}[1]{\mathsfit{#1}}
```
```{=latex}
\def\etLambda{{\etens{\Lambda}}}
```
```{=latex}
\def\etA{{\etens{A}}}
```
```{=latex}
\def\etB{{\etens{B}}}
```
```{=latex}
\def\etC{{\etens{C}}}
```
```{=latex}
\def\etD{{\etens{D}}}
```
```{=latex}
\def\etE{{\etens{E}}}
```
```{=latex}
\def\etF{{\etens{F}}}
```
```{=latex}
\def\etG{{\etens{G}}}
```
```{=latex}
\def\etH{{\etens{H}}}
```
```{=latex}
\def\etI{{\etens{I}}}
```
```{=latex}
\def\etJ{{\etens{J}}}
```
```{=latex}
\def\etK{{\etens{K}}}
```
```{=latex}
\def\etL{{\etens{L}}}
```
```{=latex}
\def\etM{{\etens{M}}}
```
```{=latex}
\def\etN{{\etens{N}}}
```
```{=latex}
\def\etO{{\etens{O}}}
```
```{=latex}
\def\etP{{\etens{P}}}
```
```{=latex}
\def\etQ{{\etens{Q}}}
```
```{=latex}
\def\etR{{\etens{R}}}
```
```{=latex}
\def\etS{{\etens{S}}}
```
```{=latex}
\def\etT{{\etens{T}}}
```
```{=latex}
\def\etU{{\etens{U}}}
```
```{=latex}
\def\etV{{\etens{V}}}
```
```{=latex}
\def\etW{{\etens{W}}}
```
```{=latex}
\def\etX{{\etens{X}}}
```
```{=latex}
\def\etY{{\etens{Y}}}
```
```{=latex}
\def\etZ{{\etens{Z}}}
```
```{=latex}
\newcommand{\pdata}{p_{\rm{data}}}
```
```{=latex}
\newcommand{\ptrain}{\hat{p}_{\rm{data}}}
```
```{=latex}
\newcommand{\Ptrain}{\hat{P}_{\rm{data}}}
```
```{=latex}
\newcommand{\pmodel}{p_{\rm{model}}}
```
```{=latex}
\newcommand{\Pmodel}{P_{\rm{model}}}
```
```{=latex}
\newcommand{\ptildemodel}{\tilde{p}_{\rm{model}}}
```
```{=latex}
\newcommand{\pencode}{p_{\rm{encoder}}}
```
```{=latex}
\newcommand{\pdecode}{p_{\rm{decoder}}}
```
```{=latex}
\newcommand{\precons}{p_{\rm{reconstruct}}}
```
```{=latex}
\newcommand{\E}{\mathbb{E}}
```
```{=latex}
\newcommand{\Ls}{\mathcal{L}}
```
```{=latex}
\newcommand{\R}{\mathbb{R}}
```
```{=latex}
\newcommand{\emp}{\tilde{p}}
```
```{=latex}
\newcommand{\lr}{\alpha}
```
```{=latex}
\newcommand{\reg}{\lambda}
```
```{=latex}
\newcommand{\rect}{\mathrm{rectifier}}
```
```{=latex}
\newcommand{\softmax}{\mathrm{softmax}}
```
```{=latex}
\newcommand{\sigmoid}{\sigma}
```
```{=latex}
\newcommand{\softplus}{\zeta}
```
```{=latex}
\newcommand{\KL}{D_{\mathrm{KL}}}
```
```{=latex}
\newcommand{\Var}{\mathrm{Var}}
```
```{=latex}
\newcommand{\standarderror}{\mathrm{SE}}
```
```{=latex}
\newcommand{\Cov}{\mathrm{Cov}}
```
```{=latex}
\newcommand{\normlzero}{L^0}
```
```{=latex}
\newcommand{\normlone}{L^1}
```
```{=latex}
\newcommand{\normltwo}{L^2}
```
```{=latex}
\newcommand{\normlp}{L^p}
```
```{=latex}
\newcommand{\normmax}{L^\infty}
```
```{=latex}
\newcommand{\parents}{Pa}
```
```{=latex}
\DeclareMathOperator*{\argmax}{arg\,max}
```
```{=latex}
\DeclareMathOperator*{\argmin}{arg\,min}
```
```{=latex}
\let\ab\allowbreak
```
```{=latex}
\newtcbtheorem{detail}{Experiment Details}{colback=orange!5, colframe=orange!35!black, fonttitle=\bfseries, boxsep=1pt, left=1.5mm, right=1.5mm, top=2mm, bottom=1mm}{detail}
```
```{=latex}
\newtcbtheorem{def}{Definition}{colback=green!5, colframe=green!35!black, fonttitle=\bfseries, boxsep=1pt, left=1.5mm, right=1.5mm, top=2mm, bottom=1mm}{def}
```
```{=latex}
\newtcbtheorem{theorem}{Theorem}{colback=red!5, colframe=red!35!black, fonttitle=\bfseries, boxsep=1pt, left=1.5mm, right=1.5mm, top=2mm, bottom=1mm}{theorem}
```
```{=latex}
\newtcbtheorem{lemma}{Lemma}{colback=blue!3, colframe=blue!35!black, fonttitle=\bfseries, boxsep=1pt, left=1.5mm, right=1.5mm, top=2mm, bottom=1mm}{lemma}
```
```{=latex}
\newtcbtheorem{proposition}{Proposition}{breakable, enhanced, colback=red!5, colframe=red!35!black, fonttitle=\bfseries, boxsep=1pt, left=1.5mm, right=1.5mm, top=2mm, bottom=1mm}{proposition}
```
```{=latex}
\newtcbtheorem{corollary}{Corollary}{breakable, enhanced, colback=red!5, colframe=red!35!black, fonttitle=\bfseries, boxsep=1pt, left=1.5mm, right=1.5mm, top=2mm, bottom=1mm}{corollary}
```

Introduction {#sec:intro}
============

One of the most important aspects of advanced machine intelligence is the ability to understand the physical world that surrounds us. This ability enables AI systems to learn, reason, plan, and act in the real world in order to assist humans [@lecun2022path]. Intelligent systems that need to act in the real world include wearable devices and robots [@fung2025embodied]. Machine learning tasks that underpin this ability include captioning, retrieval, visual question answering, action tracking, reasoning, and planning [@bordes2024introduction; @chen2025planning]. Systems for such real-world applications must respond in real time with low latency and inference cost.

Currently, the common approach to these tasks is to use large token-generative Vision Language Models (VLMs) [@liu2023visual; @dai2023instructblip; @alayrac2022flamingo; @chen2024visual; @cho2025perceptionlm; @chen2022pali], which take a visual input $X_V$ and a textual query $X_Q$ and generate the desired textual response $Y$ autoregressively in token space, *i.e.,* $(X_V, X_Q) \mapsto Y$. This approach is straightforward but inadequate for two main reasons. First, VLMs are expensive to develop because, when trained to generate responses $Y$, they must capture both task-relevant semantics and task-irrelevant surface linguistic features such as word choice, style, or paraphrasing. During training, VLMs must model both aspects, which wastes computation on producing diverse token sequences that ultimately do not affect the correctness of the output. Second, real-time tasks involving live streaming video (*e.g.,* live action tracking) require sparse and selective decoding (*e.g.,* emitting a description only when a new event occurs) [@zhou2024streaming]. However, VLMs rely on autoregressive token-by-token decoding, which must complete before the underlying semantics of $Y$ are revealed. This introduces unnecessary latency and hampers the ability to update semantics dynamically in real time.

![](figures/vl_jepa_small.png){#fig:vl_jepa_small width="0.55\\linewidth"}

This paper introduces the Joint Embedding Predictive Architecture for Vision-Language (VL-JEPA), which turns expensive data-space token generation into more efficient latent-space semantic prediction. As illustrated in Fig. [1](#fig:vl_jepa_small){reference-type="ref" reference="fig:vl_jepa_small"}, the model employs an **x-encoder** to map visual inputs $X_V$ into an embedding $S_V$, a **y-encoder** to map the textual target $Y$ into an embedding $S_Y$, and a **predictor** that learns the mapping $(S_V, X_Q) \mapsto S_Y$, where $X_Q$ is a textual query (*i.e.,* the prompt). The training objective is defined in the embedding space, $\mathcal{L}_\texttt{VL-JEPA} = D(\hat{S}_Y, S_Y)$, instead of the data space, $\mathcal{L}_\texttt{VLM} = D(\hat{Y}, Y)$. During inference, a **y-decoder** decodes the predicted embedding $\hat{S}_Y$ into text $\hat{Y}$ when needed.

Thanks to its **non-generative** nature, VL-JEPA is not forced to reconstruct every surface detail of $Y$ in the token space. Instead, it only needs to predict the abstract representation $S_Y$ in the embedding space. In the raw one-hot token space, different plausible outputs $Y$ for the same input often appear nearly orthogonal when they share no overlapping tokens. In the embedding space, however, these diverse targets can be mapped to nearby points that share similar semantics. This simplifies the target distribution and thus makes learning more efficient. In addition, unlike VLMs, this approach eliminates the need to learn language generation with a heavy decoder during training, resulting in significant efficiency gains.

Thanks to its **non-autoregressive** nature, VL-JEPA can produce continuous streams of target semantic embeddings over sliding windows with minimal latency, as it requires only a single forward pass without autoregressive decoding. This is particularly advantageous for real-time online applications such as live action tracking, scene recognition, or planning, where the embedding stream can be selectively decoded by a lightweight y-decoder, enabling efficient and prompt updates.

In this work, we empirically validate the advantages of VL-JEPA. We conduct a strictly controlled comparison against a classical token-generative VLM [@liu2023visual; @cho2025perceptionlm]: both setups use the same vision encoder, spatial resolution, frame rate, training data, batch size, number of iterations, etc., with the *only* difference being whether the objective is defined in token space or embedding space. Under this matched training condition, VL-JEPA delivers consistently higher performance on zero-shot captioning and classification while using roughly half the trainable parameters, indicating that embedding-space supervision improves learning efficiency.

Beyond the training phase, VL-JEPA also delivers substantial inference-time efficiency improvements through *selective decoding*, where decoding is triggered only by a significant change in the predicted embedding stream. Empirically, this strategy reduces the number of decoding operations by $\sim$2.85$\times$ while preserving overall output quality, as measured by average CIDEr scores.

Our final VL-JEPA models are trained in two stages: 1) a pretraining stage using caption data to establish robust vision-language alignment, and 2) a supervised finetuning (SFT) stage that equips the model with VQA capabilities. The model resulting from the first stage, denoted as **VL-JEPA$_\texttt{BASE}$**, is evaluated on *zero-shot* classification and text-to-video retrieval. VL-JEPA$_\texttt{BASE}$ outperforms CLIP [@radford2021learning], SigLIP2 [@tschannen2025siglip], and Perception Encoder [@bolya2025perception] models in terms of average classification accuracy (across 8 datasets) and retrieval recall\@1 (across 8 datasets). Following the second stage, the resulting **VL-JEPA$_\texttt{SFT}$** demonstrates significantly improved classification performance due to its exposure to in-domain training data. As a unified *generalist* model, VL-JEPA$_\texttt{SFT}$ approaches the performance of *specialist* models optimized for individual benchmarks. Simultaneously, VL-JEPA$_\texttt{SFT}$ exhibits effective VQA capabilities, achieving performance on par with established VLM families, such as InstructBLIP [@dai2023instructblip] and Qwen-VL [@bai2023qwen], across four datasets covering compositional visual reasoning [@hudson2019gqa], complex object counting [@acharya2019tallyqa], and object hallucination [@li2023evaluating; @li2025analyzing].

In summary, the contributions of this paper are as follows:

-   We introduce VL-JEPA, the first non-generative model built on a joint embedding predictive architecture that can perform general-domain vision-language tasks in real time.

-   We demonstrate in controlled experiments that VL-JEPA, trained with latent space embedding prediction, outperforms VLMs that rely on data space token prediction.

-   We show that VL-JEPA delivers significant efficiency gains over VLMs for online video streaming applications, thanks to its non-autoregressive design and native support for selective decoding.

-   We highlight that our VL-JEPA$_\texttt{SFT}$ model, with a unified model architecture, can effectively handle a wide range of classification, retrieval, and VQA tasks at the same time.

![image](figures/vl_jepa.png){width="\\linewidth"}

Methodology {#sec:method}
===========

We propose **VL-JEPA** (Fig. [1](#fig:vl_jepa_small){reference-type="ref" reference="fig:vl_jepa_small"}), a model with the joint embedding predictive architecture (JEPA) for vision-language tasks. VL-JEPA is trained with triplets $\langle X_V, X_Q, Y \rangle$, where $X_V$ denotes the **visual input** (a single image or a sequence of video frames), $X_Q$ is a **textual query** (*i.e.,* a question) and $Y$ is the **textual target** (*i.e.,* the answer) to be predicted. VL-JEPA comprises four components:

1.  **`X-Encoder`** $(X_V \mapsto S_V)$ compresses high-volume visual inputs into compact visual embeddings: a sequence of continuous vectors analogous to \`\`visual tokens" in classical VLMs.

2.  **`Predictor`** $(\langle S_V, X_Q\rangle \mapsto \hat{S}_Y)$ is the core component of VL-JEPA. It maps the visual embeddings to a prediction of the target embedding, conditioned on the textual query.

3.  **`Y-Encoder`** $(Y \mapsto S_Y)$ embeds the textual target into a continuous latent space as the prediction target. The target embedding is expected to abstract away task-irrelevant information.

4.  **`Y-Decoder`** $(\hat{S}_Y \mapsto \hat{Y})$ is not involved during the main training phase of VL-JEPA. At inference time, it translates the predicted embedding into human-readable text when necessary.
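
The four-component data flow above can be sketched end to end. The snippet below is a minimal illustration with random placeholder modules; all function names, dummy token ids, and shapes are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the four components; shapes are illustrative only.
def x_encoder(x_v):                  # X_V -> S_V: (n_vis_tokens, d_vis)
    return rng.standard_normal((16, 1024))

def predictor(s_v, x_q_tokens):      # (S_V, X_Q) -> predicted S_Y: (d_tgt,)
    return rng.standard_normal(1536)

def y_encoder(y_text):               # Y -> target S_Y: (d_tgt,)
    return rng.standard_normal(1536)

def embedding_distance(a, b):        # D(., .): cosine distance on unit vectors
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)

s_v = x_encoder("video_frames")
s_y_hat = predictor(s_v, [101, 2054, 102])   # tokenized query (dummy ids)
s_y = y_encoder("the lamp is turned off")
loss = embedding_distance(s_y_hat, s_y)      # L_VL-JEPA = D(S_Y_hat, S_Y)
```

At inference, only the x-encoder and predictor run per input; a y-decoder (not shown) reads out $\hat S_Y$ on demand.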

Fig. [\[fig:vl\_jepa\]](#fig:vl_jepa){reference-type="ref" reference="fig:vl_jepa"} illustrates how we instantiate the VL-JEPA architecture in this paper. For the `X-Encoder`, we chose V-JEPA 2 [@assran2025v], a Vision Transformer that outputs a sequence of visual tokens, which are then projected and fed into the `Predictor` initialized using Llama 3 Transformer layers. Query conditioning is achieved by tokenizing and embedding the textual query and feeding the resulting textual token embeddings into the `Predictor` along with the visual embeddings. The outputs of the Llama 3 Transformer layers are pooled and projected into the target embedding space produced by the `Y-Encoder`, which is initialized by EmbeddingGemma-300M [@vera2025embeddinggemma]. We provide more technical details in §[3](#sec:implementation-details){reference-type="ref" reference="sec:implementation-details"}.

**Training Objective.** JEPA models typically optimize two objectives jointly: 1) prediction error in the embedding space, and 2) additional regularization that avoids representation collapse [@bardes2021vicreg; @balestriero2025lejepa]. Any loss that implements these two properties can be applied to VL-JEPA. Alternatively, the regularization term can be replaced by other anti-collapse strategies, such as using an exponential moving average (EMA) for the `Y-Encoder` [@assran2025v] or freezing the `Y-Encoder` [@zhou2025dino].

In this work, we adopt the **InfoNCE loss** [@radford2021learning] due to its maturity in the vision-language domain. More advanced non-sample-contrastive regularization, such as VICReg [@bardes2021vicreg] and SIGReg [@balestriero2025lejepa], could also be applied, but we leave that exploration to future work. The InfoNCE loss can be mathematically decomposed [@wang2020understanding] into: 1) a representation *alignment* term that minimizes the distance between normalized prediction and target embeddings, and 2) a *uniformity* regularization term that pushes embeddings in a batch apart from each other, thus avoiding representation collapse. We train the `Predictor` and the `Y-Encoder` jointly with a bi-directional InfoNCE loss, enabling them to mutually learn from each other.
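
For reference, the bi-directional InfoNCE objective described above can be written in a few lines. This is a generic NumPy sketch; the temperature value and function naming are our assumptions, not the paper's exact hyperparameters:

```python
import numpy as np

def info_nce_bidirectional(pred, tgt, temperature=0.07):
    """Symmetric InfoNCE over a batch of predicted and target embeddings.

    pred, tgt: (batch, dim) arrays. Matching rows are positive pairs;
    all other rows in the batch serve as in-batch negatives.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = pred @ tgt.T / temperature           # (batch, batch) similarities
    labels = np.arange(len(pred))                 # positives on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # stabilize softmax
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the prediction->target and target->prediction directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The diagonal terms implement the alignment part, while the off-diagonal negatives inside the softmax implement the uniformity part that prevents collapse.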

Compared to the token-space loss used by generative VLMs, calculating the training loss in the embedding space is beneficial due to the **simplified target distribution**. Specifically, many real-world prediction tasks are inherently ill-posed: for the same input $X$, there may exist multiple plausible targets $Y$ that are all acceptable. For example, given the query *\`\`What will happen here if I flip this light switch down?"*, both *\`\`the lamp is turned off"* and *\`\`room will go dark"* are valid answers. In the raw one-hot token space, however, the two sequences are orthogonal since they share no overlapping tokens. But when VL-JEPA's `Y-Encoder` embeds them into nearby points (ideally yielding a compact unimodal distribution), the learning task becomes much easier: the model no longer needs to fit multiple disjoint high-density regions in sparse token space, but only a single coherent mode in a continuous embedding space.

**Multi-tasking.** VL-JEPA supports diverse tasks using a *single*, *unified* architecture (Fig. [\[fig:vl\_jepa\]](#fig:vl_jepa){reference-type="ref" reference="fig:vl_jepa"}). For vision-text-to-text generation tasks, such as captioning or open-ended VQA, the query $X_Q$ is a captioning prompt or a question, and the predictor learns to predict the embedding of the target output, $\hat S_Y$, which is then decoded into text. VL-JEPA also supports CLIP-style open-vocabulary classification and discriminative VQA, where candidate label texts are encoded into embeddings and compared with the prediction $\hat S_Y$ to select the nearest match. For text-to-video retrieval, candidate videos are mapped to their predicted embeddings $\hat S_Y$ using a captioning prompt, and then ranked by similarity to the encoded textual retrieval query.
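
For instance, the CLIP-style open-vocabulary classification described above reduces to a nearest-neighbor lookup in the shared embedding space. A minimal sketch (the `classify` helper is hypothetical, not part of the paper's codebase):

```python
import numpy as np

def classify(pred_embedding, label_embeddings, labels):
    """Pick the candidate label whose (Y-Encoder) embedding is most
    similar, by cosine similarity, to the predicted S_Y_hat."""
    p = pred_embedding / np.linalg.norm(pred_embedding)
    l = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    return labels[int(np.argmax(l @ p))]
```

Discriminative VQA works identically, with answer candidates in place of class labels; retrieval simply ranks the same similarity scores instead of taking the argmax.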

**Selective Decoding.** Real-world video applications often require online streaming inference, such as tracking user actions in smart glasses for procedural assistance [@chen2024videollm], monitoring world states for online planning, navigation and robotics [@shukor2025smolvla; @black2025real; @song2025accelerating]. A central challenge is balancing two competing needs: the model must continuously update semantics as new frames arrive, but computational efficiency and latency are critical.

Existing VLMs typically rely on explicit memory mechanisms [@zhou2024streaming; @qian2024streaming] to decide when to decode or complex KV-cache optimizations [@di2025streaming] for efficiency, since autoregressive language models are expensive to run continuously. VL-JEPA, in contrast, natively supports selective decoding. Since it predicts a semantic answer embedding non-autoregressively, the model provides a continuous semantic stream of $\hat S_Y$ that can be monitored in real time. This stream can be stabilized with simple smoothing (*e.g.,* average pooling) and decoded only when a significant semantic shift is detected, such as when the local window variance exceeds a threshold. In this way, VL-JEPA maintains always-on semantic monitoring while avoiding unnecessary decoding, achieving both responsiveness and efficiency.
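
As a sketch of such a monitor, assuming a fixed smoothing window and a hand-picked variance threshold (both hypothetical values; real thresholds would be tuned per application):

```python
import numpy as np
from collections import deque

class SelectiveDecoder:
    """Monitor a stream of predicted embeddings: smooth with a sliding
    window (average pooling) and signal a decode only when the local
    window variance -- a proxy for a semantic shift -- exceeds a
    threshold. Window size and threshold are illustrative."""
    def __init__(self, window=8, threshold=0.5):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def step(self, embedding):
        self.buf.append(embedding)
        smoothed = np.mean(self.buf, axis=0)       # average-pooled embedding
        variance = float(np.var(np.stack(self.buf), axis=0).sum())
        should_decode = (len(self.buf) == self.buf.maxlen
                         and variance > self.threshold)
        return smoothed, should_decode
```

The expensive text decoder then runs only on the smoothed embedding at flagged steps, while the always-on embedding stream remains available for monitoring.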

Implementation of VL-JEPA {#sec:implementation-details}
=========================

Model Architecture
------------------

**X-Encoder.** Unless otherwise specified, we use a frozen `V-JEPA 2 ViT-L` [@assran2025v] with 304M parameters, a self-supervised vision model that excels at both image and video tasks. Each video input is uniformly sampled into frames at 256$^2$ resolution. For image inputs, the same image is duplicated to match the input shape.

**Predictor.** The predictor is initialized with the last 8 Transformer layers of `Llama-3.2-1B`, resulting in 490M trainable parameters. The text tokenizer and token embeddings are also taken from `Llama-3.2-1B`. We allow a maximum of 512 query tokens and pad shorter queries with `[PAD]` tokens. We disable the causal attention mask so that vision and query embeddings can be jointly attended to. Linear projections connect the predictor with the vision and text embeddings, and average pooling over non-`[PAD]` tokens is applied to obtain the predicted target embedding.
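
The non-`[PAD]` average pooling can be sketched as a masked mean; the `pad_id` value below is a placeholder, not Llama's actual pad token id:

```python
import numpy as np

def masked_mean_pool(hidden, token_ids, pad_id=0):
    """Average predictor outputs over non-[PAD] positions to obtain the
    predicted target embedding. hidden: (B, T, D); token_ids: (B, T).
    pad_id is a placeholder value for illustration."""
    mask = (token_ids != pad_id).astype(hidden.dtype)    # (B, T)
    summed = (hidden * mask[..., None]).sum(axis=1)      # (B, D)
    counts = np.clip(mask.sum(axis=1, keepdims=True), 1, None)
    return summed / counts
```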

**Y-Encoder.** We use `EmbeddingGemma-300M` [@vera2025embeddinggemma] to initialize the `Y-Encoder`. We set a maximum context length of 512 to handle detailed captions. We found that applying a learning rate multiplier of $\times$0.05 to all text encoder parameters improves performance, since the quality of embedding prediction is suboptimal at the beginning of training. A linear projection head is applied to both the `Predictor` and the `Y-Encoder`, yielding a shared 1,536-dimensional embedding space in which the loss is computed.

Two-stage Training
------------------

**Large-scale Pretraining.** VL-JEPA is trained in two stages. The first query-free pretraining stage aims to establish robust vision-language alignment using massive caption data. We use Datacomp [@gadre2023datacomp] and YFCC-100M [@thomee2016yfcc100m] for image-text data. For video-text data, we use [Action100M]{.smallcaps} [@chen2026action100m], which consists of action descriptions and video captions generated on HowTo100M videos [@chen2025planning].

We first perform image-only training on Datacomp and YFCC-100M with 1 frame per visual input, which allows a large batch size of 24k. After 100k iterations, the model has seen 2B samples and achieves 61.6% ImageNet zero-shot accuracy (without prompt ensembling). We then continue with video pretraining, using 8 frames per input for 60k iterations followed by 32 frames for a final 10k iterations. Pretraining takes 4 weeks on 24 nodes with 8$\times$NVIDIA H200 GPUs each. We adopt a constant learning rate of $5{\times}10^{-5}$ to facilitate extended training. We refer readers to the Action100M paper [@chen2026action100m] for more details. We call the resulting model **VL-JEPA$_\texttt{BASE}$** and measure *zero-shot* classification and retrieval performance with it.

**Supervised Finetuning.** The second query-conditioned supervised finetuning (SFT) stage equips VL-JEPA with VQA capabilities while maintaining the pretrained vision-language alignment for classification and retrieval. The training data is selected from the PLM data mixture [@cho2025perceptionlm], including 25M VQA samples, 2.8M captioning samples, 1.8M classification samples, and downsampled pretraining-stage data to avoid catastrophic forgetting.

We train the model for 83k steps with a batch size of 3,072 ($\sim$2.5 days with 24 nodes), with cosine learning rate annealing applied to improve convergence. Since extensive human-labeled data is included in this SFT data mixture, we no longer emphasize *zero-shot* evaluation for the resulting **VL-JEPA$_\texttt{SFT}$** from this stage. Instead, we evaluate VQA capabilities and compare against state-of-the-art *specialist* models.

Experiments {#sec:experiments}
===========

We begin by evaluating VL-JEPA's classification and retrieval performance in §[4.1](#sec:exp.cls_ret){reference-type="ref" reference="sec:exp.cls_ret"}, and benchmark VL-JEPA on VQA datasets in §[4.2](#sec:exp.vqa){reference-type="ref" reference="sec:exp.vqa"}. We demonstrate the application of VL-JEPA for understanding the relationship between world state changes and action concepts (*i.e.,* inverse dynamics) in §[4.3](#sec:exp.worldprediction){reference-type="ref" reference="sec:exp.worldprediction"}, and evaluate action anticipation in §4.4. In §[4.5](#sec:exp.embbeddings_vs_tokens){reference-type="ref" reference="sec:exp.embbeddings_vs_tokens"}, we demonstrate the advantage of embedding prediction by comparing it with a token-predictive VLM baseline under a strictly controlled setting. In §[4.6](#sec:exp.selective decoding){reference-type="ref" reference="sec:exp.selective decoding"}, we evaluate the effectiveness of VL-JEPA's selective decoding, and show that it reduces decoding cost while maintaining performance. Next, we analyze VL-JEPA's `Y-Encoder` in §[4.7](#sec:exp.y-encoder){reference-type="ref" reference="sec:exp.y-encoder"}. Finally, we present ablation studies in §[4.8](#sec:exp.ablations){reference-type="ref" reference="sec:exp.ablations"}.

Classification and Retrieval {#sec:exp.cls_ret}
----------------------------

**Evaluation Setup.** We evaluate VL-JEPA following the CLIP-style evaluation protocol (see Fig. [\[fig:vl\_jepa\]](#fig:vl_jepa){reference-type="ref" reference="fig:vl_jepa"} and §[2](#sec:method){reference-type="ref" reference="sec:method"} \`\`Multi-tasking"). We assess VL-JEPA on a broad suite of benchmarks, including 8 classification datasets and 8 retrieval datasets. For *zero-shot* evaluation, we compare against the *generalist foundation models* CLIP [@radford2021learning], SigLIP2 [@tschannen2025siglip], and Perception Encoder (PE-Core) [@bolya2025perception]. We additionally report reference numbers from *specialist models* that are individually optimized for each benchmark.

**Results.** Table [\[tab:cls\_ret\]](#tab:cls_ret){reference-type="ref" reference="tab:cls_ret"} summarizes the results. In the strict zero-shot setting, VL-JEPA$_\texttt{BASE}$ achieves higher average accuracy (52.5 vs 44.7) across the 8 classification datasets and higher average recall\@1 (63.7 vs 58.1) across the 8 retrieval datasets than the best baseline PE-Core-G. Per-dataset scores show that VL-JEPA$_\texttt{BASE}$ is particularly strong on *motion-centric* benchmarks (SSv2, EK-100, EgoExo4D, and step recognition on COIN and CrossTask), while relatively weaker on *appearance-centric* benchmarks (Kinetics-400 and task recognition on COIN and CrossTask). This is likely because VL-JEPA$_\texttt{BASE}$ has seen substantially fewer vision-language pairs (only 3.6B, compared with PE-Core-G's 86B). After supervised finetuning, VL-JEPA$_\texttt{SFT}$ improves significantly upon VL-JEPA$_\texttt{BASE}$ since the model has seen in-domain training data. As a single *generalist* model, the performance of VL-JEPA$_\texttt{SFT}$ approaches that of *specialist* models optimized individually for each dataset.

Visual Question Answering {#sec:exp.vqa}
-------------------------

**Evaluation Setup.** We evaluate VL-JEPA$_\texttt{SFT}$ on discriminative VQA tasks. The inference process involves encoding candidate answers using the `Y-Encoder` and selecting the answer that minimizes the distance to the predicted embedding (see Fig. [\[fig:vl\_jepa\]](#fig:vl_jepa){reference-type="ref" reference="fig:vl_jepa"}). We select four benchmarks that prioritize visual perception rather than knowledge and reasoning. We evaluate on GQA [@hudson2019gqa], a dataset for real-world visual reasoning and compositional QA, reporting accuracy on the testdev-balanced split. For TallyQA [@acharya2019tallyqa], which targets complex counting, we follow [@chen2022pali] and report the weighted average accuracy across the \`\`simple" and \`\`complex" splits. Finally, to assess object hallucination, we utilize POPE [@li2023evaluating] and POPEv2 [@li2025analyzing]. For POPE, we report the average accuracy across the \`\`random", \`\`popular", and \`\`adversarial" settings on MS-COCO.
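
The candidate-selection step can be sketched as a cosine nearest-neighbor lookup (function and variable names are illustrative):

```python
import numpy as np

def select_answer(pred_embedding, candidate_embeddings, candidates):
    """Discriminative VQA: pick the candidate answer whose Y-Encoder
    embedding has the highest cosine similarity to the predicted
    embedding (equivalently, the lowest cosine distance)."""
    p = pred_embedding / np.linalg.norm(pred_embedding)
    c = candidate_embeddings / np.linalg.norm(
        candidate_embeddings, axis=1, keepdims=True)
    return candidates[int(np.argmax(c @ p))]
```

The same lookup implements open-vocabulary classification when the candidates are class label texts.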

**Results.** Table [4.2](#sec:exp.vqa){reference-type="ref" reference="sec:exp.vqa"} compares VL-JEPA$_\texttt{SFT}$ against established VLM families, including BLIP-2 [@li2023blip], InstructBLIP [@dai2023instructblip], Qwen-VL [@bai2023qwen], InternVL [@chen2024internvl], Llava-1.5 [@vallaeys2024improved], SmolVLM [@marafioti2025smolvlm], PaLI [@chen2022pali], PaliGemma [@beyer2024paligemma], and Video-LLaVA [@lin2024video]. VL-JEPA$_\texttt{SFT}$ outperforms many of these baselines despite requiring significantly fewer computational resources--classical VLMs rely on extensively pretrained CLIP backbones combined with multi-stage visual instruction tuning. In comparison, VL-JEPA$_\texttt{SFT}$ employs a *unified architecture* and a *single embedding space* to seamlessly handle VQA, classification, and retrieval (Tab. [\[tab:cls\_ret\]](#tab:cls_ret){reference-type="ref" reference="tab:cls_ret"}).

WorldPrediction-WM {#sec:exp.worldprediction}
------------------

**Evaluation Setup.** We evaluate VL-JEPA on the \`\`world modeling'' task in the [WorldPrediction]{.smallcaps} [@chen2025worldprediction] benchmark, where the model is provided with two images representing the initial and final world states and must identify, among four candidate video clips, the action that explains the observed transition. To adapt VL-JEPA, we duplicate and concatenate the initial and final state images to extract a *state embedding*, and encode each action candidate into *action embeddings*. The model then selects the candidate whose embedding is closest to the state embedding.

**Results.** Table [\[tab:worldprediction\]](#tab:worldprediction){reference-type="ref" reference="tab:worldprediction"} shows accuracy comparisons. VL-JEPA$_\texttt{BASE}$ attains **63.9%** and VL-JEPA$_\texttt{SFT}$ attains **65.7%** top-1 accuracy on [WorldPrediction]{.smallcaps}-WM, establishing a new state of the art. Our VL-JEPA model not only substantially surpasses existing VLMs of comparable or larger scale but also exceeds the performance of frontier LLMs such as GPT-4o, Claude-3.5-sonnet, and Gemini-2.0.

Action Anticipation
-------------------

[\[tab:action\_anticipation\]]{#tab:action_anticipation label="tab:action_anticipation"}

[\[tab:coin\_forecasting\]]{#tab:coin_forecasting label="tab:coin_forecasting"}

![image](figures/fair_comparison.png){width="\\linewidth"}

**Evaluation Setup.** We assess the action anticipation (*i.e.,* forecasting future actions) capabilities of fine-tuned VL-JEPA on two benchmarks: action anticipation on EPIC-KITCHENS-100 [@damen2022rescaling], and COIN [@tang2019coin] next step forecasting. In this task, given a window of context video, the model is required to predict the next action that will occur after a specified anticipation time---the temporal gap between the end of the context segment and the onset of the subsequent action. VL-JEPA$_\texttt{SFT}$ is fine-tuned, and we compare our results against state-of-the-art baselines.

**Results.** VL-JEPA demonstrates strong performance on both benchmarks (Table [\[tab:action\_anticipation\]](#tab:action_anticipation){reference-type="ref" reference="tab:action_anticipation"} and Table [\[tab:coin\_forecasting\]](#tab:coin_forecasting){reference-type="ref" reference="tab:coin_forecasting"}). On EPIC-KITCHENS-100, particularly at the standard 1-second anticipation interval, it achieves a Recall\@5 of 34.18 -- surpassing the previous V-JEPA 2 model that uses the same ViT-L-256px encoder by 1.48 points. While the larger V-JEPA 2 ViT-g-384px model attains the highest score at 1 second, VL-JEPA remains competitive and shows notable advantages at longer anticipation intervals. On the COIN next step forecasting task, as shown in Tab. [\[tab:coin\_forecasting\]](#tab:coin_forecasting){reference-type="ref" reference="tab:coin_forecasting"}, VL-JEPA achieves 56.2%, outperforming the strong VLM baselines VideoLLM-online [@chen2024videollm], VideoLLM-MoD [@wu2024videollm], and ProVideLLM [@chatterjee2025memory]. Overall, these findings demonstrate that VL-JEPA can be finetuned to handle semantically uncertain prediction tasks.

Embedding Prediction vs. Token Prediction: A Controlled Comparison {#sec:exp.embbeddings_vs_tokens}
------------------------------------------------------------------

**Evaluation Setup.** In this section, we compare VL-JEPA to a token-generative VLM baseline under strictly aligned training conditions. Both models use the same Perception Encoder [@bolya2025perception] (frozen ViT-L-14 at 336$^2$ resolution, no tiling, 16 frames per video) for vision inputs. We use the same number of training iterations with the same effective batch size of 128 and the same learning rate scheduler on the same pretraining data mixture described above (§[3](#sec:implementation-details){reference-type="ref" reference="sec:implementation-details"}). The only difference is the prediction task: VL-JEPA predicts target embeddings [@duquenne2023sonar] using a 0.5B predictor, whereas the VLM baseline performs next-token prediction with cross-entropy using a 1B LLM. For the VLM, we use the standard training recipe and codebase of PerceptionLM [@cho2025perceptionlm], aligning a frozen vision encoder with the text-only LLM `Llama-3.2-1B`. For VL-JEPA, we initialize the predictor from layers 8--16 of `Llama-3.2-1B`.

We evaluate both models at regular checkpoints throughout training, spanning from 500K to 15M samples seen. At each checkpoint, we measure performance on video captioning and video classification. For video captioning, we report CIDEr scores averaged across YouCook2 [@zhou2018towards], MSR-VTT [@xu2016msr] and PVD-Bench [@bolya2025perception]. VL-JEPA decodes the predicted embeddings while the VLM generates the tokens directly. For video classification, we report top-5 accuracy averaged across CrossTask-Step, CrossTask-Task [@zhukov2019cross] and EgoExo4D [@grauman2024ego]. For VL-JEPA we choose the candidate with the lowest cosine distance to the predicted embedding, while for the VLM we pick the class with the lowest perplexity.

**Results.** As shown in Fig. [\[fig:vljepa\_vs\_vlm\]](#fig:vljepa_vs_vlm){reference-type="ref" reference="fig:vljepa_vs_vlm"}, both models yield comparable performance after 500K samples seen on both tasks, with 1.23 and 1.35 CIDEr in video captioning and 14.9% and 14.0% top-5 accuracy in video classification for VL-JEPA and the VLM, respectively. Beyond this point, VL-JEPA's performance increases much more sharply than the VLM's, reaching 14.7 CIDEr and 35.3% top-5 accuracy after 5M samples seen. The gap persists as training scales to 15M samples, where VL-JEPA reaches 14.8 CIDEr and 41.0% top-5 accuracy while the VLM baseline yields 7.1 CIDEr and 27.2% top-5 accuracy. This controlled comparison highlights the benefit of predicting embeddings rather than tokens, showing both higher sample efficiency and stronger absolute performance.

We compare the inference cost of the above VL-JEPA and the VLM by pre-loading 64 video frames into memory and repeatedly decoding text 100 times with the same prompt, measuring the average time per sample. As shown in Fig. [\[fig:vljepa\_vs\_vlm\]](#fig:vljepa_vs_vlm){reference-type="ref" reference="fig:vljepa_vs_vlm"} (rightmost), both models exhibit comparable latency when generating text. What differentiates our model from classical VLMs is the decoupling of prompt processing (\`\`Query Embedding") and video encoding (\`\`Encoder + Predictor") from the text generation module (\`\`Decoder"). This allows us to use only the first part of the model to perform retrieval and to decode text only when needed (see Section [4.6](#sec:exp.selective decoding){reference-type="ref" reference="sec:exp.selective decoding"} below), making our model more scalable for online video inference.

Effectiveness of Selective Decoding {#sec:exp.selective decoding}
-----------------------------------

![image](figures/selective_decoding.png){width="1\\linewidth"}

**Evaluation Setup.** We evaluate the effectiveness of VL-JEPA's embedding-guided selective decoding on long-form video streams. To this end, we design a benchmark task where the goal is to recover a temporal sequence of annotations while minimizing the number of text decoding operations, which dominate inference cost. As shown in Fig. [\[fig:selective\_decoding\]](#fig:selective_decoding){reference-type="ref" reference="fig:selective_decoding"} (left), decoding is performed only at selected points along the VL-JEPA embedding stream, yielding a sequence of $N$ decoded outputs $[(\hat{t}_1, \hat{y}_1), (\hat{t}_2, \hat{y}_2), \ldots, (\hat{t}_N, \hat{y}_N)]$. Each ground-truth annotation $[(t_1, y_1), (t_2, y_2), \ldots, (t_T, y_T)]$ is then aligned to its nearest decoded output in time (illustrated as $\circ \cdot \cdot \cdot \circ$ in Fig. [\[fig:selective\_decoding\]](#fig:selective_decoding){reference-type="ref" reference="fig:selective_decoding"}), and CIDEr is computed between matched pairs. We use the EgoExo4D [@grauman2024ego] validation set in procedural activity domains, which consists of 218 videos with an average duration of 6 minutes and about $T=143$ atomic action annotations per video.
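
The nearest-in-time matching between ground-truth annotations and decoded outputs can be sketched as:

```python
import numpy as np

def align_to_nearest(gt_times, decoded_times):
    """Match each ground-truth annotation time to the temporally nearest
    decoded output; returns one decoded-output index per annotation.
    CIDEr is then computed between the matched text pairs."""
    gt = np.asarray(gt_times)[:, None]        # (T, 1)
    dec = np.asarray(decoded_times)[None, :]  # (1, N)
    return np.abs(gt - dec).argmin(axis=1)    # (T,) indices into decodings
```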

As a baseline, we consider *uniform sampling*, where decoding points are placed at fixed intervals regardless of the underlying video content. Standard streaming VLMs are limited to this strategy, whereas VL-JEPA supports a more effective alternative: *adaptive selection* of decoding points guided by its predicted embeddings. We apply agglomerative clustering with temporal connectivity constraints [@murtagh2012algorithms] to partition the embedding sequence into $N$ segments of high intra-segment monosemanticity [@chen2024subobject], measured by variance (*i.e.,* Ward distance). The intuition is that within a semantically coherent segment, decoded outputs are highly similar, so decoding once per segment captures the essential information while greatly reducing overall decoding cost. The midpoint of each segment is then chosen as the decoding point, and decoding is performed either from the exact embedding or from the average-pooled embedding within the segment.
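
The temporally constrained clustering can be sketched with a greedy Ward-style merging of adjacent segments; this pure-NumPy version is a sketch of the idea, not the exact implementation used in our experiments:

```python
import numpy as np

def segment_stream(emb, n_segments):
    """Greedy temporally-constrained agglomerative clustering: repeatedly
    merge the ADJACENT pair of segments whose merge least increases the
    total within-segment sum of squared deviations (Ward-style cost),
    until n_segments remain. Returns (midpoint index, mean embedding)
    per segment -- the decoding point and its pooled embedding."""
    # each segment tracked as [start, end, sum, sum_of_squares, count]
    segs = [[t, t + 1, emb[t].copy(), (emb[t] ** 2).copy(), 1]
            for t in range(len(emb))]

    def merge_cost(a, b):
        def sse(s):  # within-segment sum of squared deviations
            return float((s[3] - s[2] ** 2 / s[4]).sum())
        m = [a[0], b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4]]
        return sse(m) - sse(a) - sse(b), m

    while len(segs) > n_segments:
        costs = [merge_cost(segs[i], segs[i + 1])
                 for i in range(len(segs) - 1)]
        i = min(range(len(costs)), key=lambda k: costs[k][0])
        segs[i:i + 2] = [costs[i][1]]
    return [((s[0] + s[1]) // 2, s[2] / s[4]) for s in segs]
```

Restricting merges to adjacent segments keeps every cluster temporally contiguous, so each segment corresponds to one coherent span of the video.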

**Results.** As shown in Fig. [\[fig:selective\_decoding\]](#fig:selective_decoding){reference-type="ref" reference="fig:selective_decoding"} (right), we sweep the average decoding frequency from 2.0 Hz down to 0.01 Hz (*i.e.,* average intervals between consecutive decoding operations from 0.5s to 100s) by adjusting either the stride of uniform sampling or the number of clusters in adaptive selection. Across the entire range, adaptive selection consistently Pareto-dominates uniform sampling. In particular, selective decoding at 0.35 Hz (*i.e.,* $\sim$2.85s interval) matches the performance of uniform decoding at 1 Hz, reducing decoding cost by $\sim$2.85$\times$. We further observe that average pooling provides consistent gains for both strategies, since it denoises and stabilizes the embeddings before they are fed into the decoder.

Evaluation of Y-Encoder {#sec:exp.y-encoder}
-----------------------

**Evaluation Setup.** We evaluate whether the JEPA architecture improves the `Y-Encoder` by following the uni-modal text-only (TOT) evaluation setup. We use the hard-negative benchmarks SugarCrepe++ [@10.5555/3737916.3738487] and VISLA [@dumpala2024vislabenchmarkevaluatingembedding]. These datasets test sensitivity to semantic and lexical changes in image descriptions. Each dataset contains triplets: two semantically similar descriptions of the same image ($p1$ and $p2$), and one negative description ($n$) created by altering attributes, relations, or objects. We compare `Y-Encoders` from different models by computing the cosine similarity for all description pairs. We check that the similarity between positives $sim(p1,p2)$ is higher than both the similarity between each positive and the negative $sim(p1,n)$ and $sim(p2,n)$. We report accuracy (%) across all samples.
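
The triplet accuracy described above can be sketched as follows (names illustrative):

```python
import numpy as np

def hard_negative_accuracy(p1, p2, n):
    """Fraction of triplets where sim(p1, p2) exceeds both sim(p1, n)
    and sim(p2, n). Inputs: (N, D) arrays of text embeddings for the
    two positive descriptions and the negative description."""
    def cos(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a * b).sum(axis=1)
    pos = cos(p1, p2)
    ok = (pos > cos(p1, n)) & (pos > cos(p2, n))
    return float(ok.mean())
```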

**Results.** Table [\[tab:y-decoder\]](#tab:y-decoder){reference-type="ref" reference="tab:y-decoder"} shows the performance of different models on text hard-negative benchmarks. VL-JEPA$_\texttt{BASE}$ achieves a micro-average accuracy of $63.9\%$ on SugarCrepe++ and $42.9\%$ on VISLA. This exceeds the strongest baselines: PE-Core scores $58.6\%$ on SugarCrepe++ and SigLIP2 scores $40.4\%$ on VISLA. The finetuned VL-JEPA$_\texttt{SFT}$ model also achieves competitive results, with $58.4\%$ on SugarCrepe++ and $39.5\%$ on VISLA. These results indicate that VL-JEPA$_\texttt{BASE}$'s `Y-Encoder` is more resilient to text hard-negatives.

Ablation Study {#sec:exp.ablations}
--------------

**Evaluation Setup.** We study different design choices for VL-JEPA. Here we train all ablation models on the SFT stage data for 10K steps with a batch size of 512 (5M samples seen) and constant learning rate. We report average classification top-1 accuracy of 8 datasets (Tab. [\[tab:cls\_ret\]](#tab:cls_ret){reference-type="ref" reference="tab:cls_ret"}), average text-to-video retrieval recall\@1 of 8 datasets (Tab. [\[tab:cls\_ret\]](#tab:cls_ret){reference-type="ref" reference="tab:cls_ret"}), and average VQA accuracy of 4 datasets (CLEVR, GQA, TallyQA simple and complex). We report the results in Tab. [\[tab:ablations\]](#tab:ablations){reference-type="ref" reference="tab:ablations"}.

**Results.** **(a) Pretraining.** Dropping the first query-free pretraining stage on image and video captions significantly hurts performance, especially on classification (-21.7) and retrieval (-17.3). **(b) LR Multiplier.** The sweet spot for the Y-Encoder learning rate multiplier is around 0.05 to 0.10; either faster or slower learning degrades performance. **(c) Loss Function.** InfoNCE generally gives superior performance compared to cosine, L1, and L2 losses, the only exception being that the cosine loss outperforms InfoNCE on VQA. However, only InfoNCE has the anti-collapse regularization and can be applied with an unfrozen Y-Encoder. **(d) Predictor.** In terms of predictor size, more layers yield better performance, especially on VQA. We also see that keeping the original causal attention instead of switching to bidirectional attention hurts VQA performance (-1.9), since query tokens are appended after visual tokens, and visual tokens are then unable to attend to query tokens. Finally, Llama-3 initialization is beneficial to VQA performance, although vision-language alignment (classification and retrieval) is slightly worse compared to randomly initialized Transformer layers. **(e) Y-Encoder.** We tried different text encoders as the Y-Encoder and confirmed that VL-JEPA works well with embedding models other than EmbeddingGemma-300M. Generally, a larger encoder leads to better performance, with visually aligned text encoders (the PE models) having a significant advantage in classification and retrieval.

Related Works {#sec:related-works}
=============

**JEPA Models.** A JEPA model learns by predicting the representation of a target input $Y$ from the representation of a context input $X$. Early instantiations include I-JEPA for image encoding [@assran2023self] and V-JEPA for video encoding [@bardes2023v], which demonstrated the effectiveness of this objective over pixel-reconstruction approaches in their respective modalities. Recent JEPA work falls into two categories. One emphasizes better unimodal representation learning [@assran2023self; @bardes2023v; @fei2023jepa] or cross-modal alignment [@lei2025m3; @jose2025dinotxt]. The other targets world modeling, where pretrained encoders are frozen and action-conditioned predictors are trained for conditional prediction of state representations [@zhou2025dino; @baldassarre2025back; @assran2025v; @terver2025drives]. This has shown good results but remains limited to narrow domains like mazes or robotic pick-and-place. Our proposed VL-JEPA is the first JEPA designed for general-purpose vision--language tasks. It performs conditional latent prediction over vision and text, and preserves efficiency while enabling a flexible, multitask architecture.

**Vision Language Models.** Existing vision-language models largely fall into two families: (1) CLIP-style models with a non-predictive joint-embedding architecture (JEA) [@radford2021learning; @zhai2023sigmoid; @bolya2025perception; @liu2024remoteclip; @chen2023protoclip] encode images and texts independently into a common latent space, $X_V \!\mapsto\! S_V$ and $Y \!\mapsto\! S_Y$. By minimizing $\mathcal{L}_\texttt{CLIP} = D(S_V, S_Y)$ with a contrastive loss (*e.g.,* InfoNCE), CLIP learns aligned *representations* that support zero-shot classification and vision--language retrieval; (2) Generative VLMs [@liu2023visual; @chen2022pali; @dai2023instructblip; @alayrac2022flamingo; @chen2024visual; @cho2025perceptionlm; @beyer2024paligemma] connect a vision encoder [@radford2021learning; @fini2025multimodal] with a language model (*e.g.,* LLM). They are typically trained with $\mathcal{L}_\texttt{VLM} = D(\hat{Y}, Y)$, *i.e.,* next token prediction with cross-entropy loss, and can learn to handle various vision-text-to-text generation tasks such as VQA.

Our proposed VL-JEPA integrates the architectural advantages and task coverage of both CLIPs and VLMs (Table [\[tab:task\_coverage\]](#tab:task_coverage){reference-type="ref" reference="tab:task_coverage"}). Since VL-JEPA learns in embedding space, it can leverage web-scale noisy image--text pairs [@jia2021scaling], yielding strong open-domain features. On the other hand, VL-JEPA supports conditional generation tasks with a readout text decoder. Meanwhile, compared to generative VLMs that optimize directly in data space, VL-JEPA learns more efficiently in the latent space. In addition, it is also more efficient for online inference, as it naturally allows selective decoding.

**Efficient Vision Language Models.** The growing size and training cost of VLMs has motivated efforts to improve efficiency. On the training side, strong performance can be achieved by updating only a subset of parameters, such as the vision--language connector [@tsimpoukelli2021multimodal; @alayrac2022flamingo; @vallaeys2024improved; @shukor2023ep; @koh2023groundingfromage; @merullo2023linearlylimber; @dai2023instructblip]. At inference, efficiency is pursued through pruning parameters or visual tokens [@cao2023pumer; @shukor2024skipping; @vasu2025fastvlm]. For real-time use cases, recent work explores small VLMs [@yao2024minicpm; @marafioti2025smolvlm] and heuristics to reduce query frequency in asynchronous inference [@shukor2025smolvla].

**Latent-space Language Modeling.** Current state-of-the-art LLMs are trained to decode and reason in text space using autoregressive generation and chain-of-thought prompting [@wei2023chainofthoughtpromptingelicitsreasoning]. Text-space LLMs have rapidly improved and now achieve strong results on a wide range of benchmarks. However, the discrete nature of their reasoning trace may limit both speed and performance in the long term. Several works have explored latent-space LLMs that process or reason in latent space, such as Large Concept Models [@lcmteam2024largeconceptmodelslanguage] and COCONUT [@hao2025traininglargelanguagemodels]. These models focus on unimodal latent-space reasoning. With VL-JEPA, our goal is to align vision and text representations in a shared multi-modal latent space. This approach aims to enable better abstractions and improve both the performance and speed of vision-language models (VLMs). We hope VL-JEPA will serve as a foundation for future work on multi-modal latent space reasoning, including visual chain-of-thought methods [@li2025imaginereasoningspacemultimodal].

Conclusion {#sec:conclusion}
==========

We have presented VL-JEPA, a new vision--language model built upon the joint embedding predictive architecture. By shifting supervision from discrete token space to continuous semantic embedding space, VL-JEPA simplifies the learning target, avoids redundant modeling of surface linguistic variability, and enables non-autoregressive prediction. Through controlled experiments, we show that VL-JEPA outperforms generative VLMs trained with cross-entropy loss under a matched training data budget, while achieving superior training efficiency and significantly lower inference latency. Beyond generation tasks, the embedding-based design further allows VL-JEPA to handle open-vocabulary classification and cross-modal retrieval within a single unified architecture. Its ability to emit continuous semantic embeddings also makes it particularly well suited for real-time video applications, where selective decoding can improve both responsiveness and efficiency. In this work, we demonstrated the advantages of VL-JEPA over standard VLMs, particularly in computational efficiency, streaming applications, and video-language tasks. Our goal at this stage is not to propose a universal alternative to VLMs, as this would require broader evaluation on tasks such as reasoning, tool use, and agentic behaviors where current token-generative VLMs excel. Finally, although our results show clear benefits from scaling parameters and dataset size, we did not fully explore this direction, leaving it for future work.

Acknowledgments {#acknowledgments .unnumbered}
===============

We would like to thank Loïc Barrault, Lucas Beyer, Quentin Garrido, João Maria Janeiro, Yifu Qiu, Koustuv Sinha, Basile Terver, and François Yvon for providing valuable feedback and support to this work. We thank Adrien Bardes for detailed V-JEPA 2 baseline performance on EK-100 action anticipation. We acknowledge Alejandro Castillejo Muñoz for his efforts on real-time demonstration and qualitative evaluation of VL-JEPA.
