Octava: regresión logística: diferencia entre fmincg y fminunc

Question

Octava: regresión logística: diferencia entre fmincg y fminunc

A menudo uso fminunc para un problema de regresión logística. He leído en la web que Andrew Ng usa fmincg en lugar de fminunc, con los mismos argumentos. Los resultados son diferentes, y a menudo fmincg es más exacto, pero no demasiado. (Estoy comparando los resultados de la función fmincg fminunc contra los mismos datos)

Entonces, mi pregunta es: ¿cuál es la diferencia entre estas dos funciones? ¿Qué algoritmo ha implementado cada función? (Ahora, solo uso estas funciones sin saber exactamente cómo funcionan).

Gracias:)

39

algorithm machine-learning neural-network octave

Author: Nishant, 2012-08-24

Source

6 answers

En contraste con otras respuestas que sugieren que la principal diferencia entre fmincg y fminunc es la precisión o la velocidad, quizás la diferencia más importante para algunas aplicaciones es la eficiencia de la memoria. En el ejercicio de programación 4 (es decir, Entrenamiento de Redes Neuronales) de la clase de Aprendizaje Automático de Andrew Ng en Coursera, el comentario en ex4.m acerca de fmincg es

%% =================== Parte 8: la Formación NN ===================
% Ahora ha implementado todo el código necesario para entrenar a un neuronal
% red. Para entrenar su red neuronal, ahora usaremos" fmincg", que
% es una función que funciona de manera similar a "fminunc". Recordemos que estos
% optimizadores avanzados son capaces de entrenar nuestras funciones de costo de manera eficiente como
% siempre y cuando les proporcionemos los cálculos de gradiente.

Al igual que el póster original, también tenía curiosidad sobre cómo los resultados de ex4.m puede diferir usando fminunc en lugar de fmincg. Así que traté de reemplazar el fmincg call

options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

Con la siguiente llamada a fminunc

options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost, exit_flag] = fminunc(costFunction, initial_nn_params, options);

Pero recibió el siguiente mensaje de error de una compilación de 32 bits de Octave ejecutándose en Windows:

Error: memoria agotada o tamaño solicitado demasiado grande para el rango del índice de octava escriba {tratando de volver a prompt

Una compilación de 32 bits de MATLAB que se ejecuta en Windows proporciona un mensaje de error más detallado:

Error al usar find
Sin memoria. Escriba la MEMORIA DE AYUDA para su opcion.
Error en spones (línea 14)
[i, j] = find(S);
Error en el color (línea 26)
J = spones (J);
Error en sfminbx (línea 155)
grupo = color (Hstr,p);
Error en fminunc (línea 408)
[x,FVAL,~,EXITFLAG,OUTPUT,GRAD,HESSIAN] = sfminbx(funfcn,x,l, u, ...
Error en ex4 (línea 205)
[nn_params, cost, exit_flag] = fminunc (costFunction, initial_nn_params, options);

El comando de memoria MATLAB en mi computadora portátil informes:

Matriz máxima posible: 2046 MB (2.146 e + 09 bytes) *
Memoria disponible para todas las matrices: 3402 MB (3.568 e + 09 bytes) * *
Memoria utilizada por MATLAB: 373 MB (3.910 e+08 bytes)
Memoria física (RAM): 3561 MB (3.734 e+09 bytes)
* Limitado por espacio de dirección virtual contiguo disponible.
** Limitado por el espacio de dirección virtual disponible.

Anteriormente estaba pensando que el profesor Ng eligió usar fmincg para entrenar el ex4.m red neuronal (que tiene 400 funciones de entrada, 401 incluyendo la entrada de sesgo) para aumentar la velocidad de entrenamiento. Sin embargo, ahora creo que su razón para usar fmincg fue aumentar la eficiencia de la memoria lo suficiente como para permitir que el entrenamiento se realice en compilaciones de 32 bits de Octave/MATLAB. Una breve discusión sobre el trabajo necesario para obtener una compilación de Octave de 64 bits que se ejecuta en el sistema operativo Windows es aquí.

19

Author: gregS,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2015-06-07 17:39:16

Según el propio Andrew Ng, fmincg no se usa para obtener un resultado más preciso (recuerde, su función de costo será la misma en cualquier caso, y su hipótesis no es más simple o más compleja), sino porque es más eficiente para hacer el descenso de gradiente para hipótesis especialmente complejas. Él mismo parece usar fminunc donde la hipótesis tiene pocas características, pero fmincg donde tiene cientos.

10

Author: Edward Dixon,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2016-01-30 17:53:08

¿Por qué funciona fmincg?

Aquí hay una copia del código fuente con comentarios que explican los diversos algoritmos utilizados. Es extraordinario, ya que hace lo mismo que el cerebro de un niño cuando aprende a discriminar entre perro y silla.

Esta es la fuente de octava para fmincg.m.

function [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
% Minimize a continuous differentialble multivariate function. Starting point
% is given by "X" (D by 1), and the function named in the string "f", must
% return a function value and a vector of partial derivatives. The Polack-
% Ribiere flavour of conjugate gradients is used to compute search directions,
% and a line search using quadratic and cubic polynomial approximations and the
% Wolfe-Powell stopping criteria is used together with the slope ratio method
% for guessing initial step sizes. Additionally a bunch of checks are made to
% make sure that exploration is taking place and that extrapolation will not
% be unboundedly large. The "length" gives the length of the run: if it is
% positive, it gives the maximum number of line searches, if negative its
% absolute gives the maximum allowed number of function evaluations. You can
% (optionally) give "length" a second component, which will indicate the
% reduction in function value to be expected in the first line-search (defaults
% to 1.0). The function returns when either its length is up, or if no further
% progress can be made (ie, we are at a minimum, or so close that due to
% numerical problems, we cannot get any closer). If the function terminates
% within a few iterations, it could be an indication that the function value
% and derivatives are not consistent (ie, there may be a bug in the
% implementation of your "f" function). The function returns the found
% solution "X", a vector of function values "fX" indicating the progress made
% and "i" the number of iterations (line searches or function evaluations,
% depending on the sign of "length") used.
%
% Usage: [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
%
% See also: checkgrad
%
% Copyright (C) 2001 and 2002 by Carl Edward Rasmussen. Date 2002-02-13
%
%
% (C) Copyright 1999, 2000 & 2001, Carl Edward Rasmussen
%
% Permission is granted for anyone to copy, use, or modify these
% programs and accompanying documents for purposes of research or
% education, provided this copyright notice is retained, and note is
% made of any changes that have been made.
%
% These programs and documents are distributed without any warranty,
% express or implied.  As the programs were written for research
% purposes only, they have not been tested to the degree that would be
% advisable in any important application.  All use of these programs is
% entirely at the user's own risk.
%
% [ml-class] Changes Made:
% 1) Function name and argument specifications
% 2) Output display
%

% Read options
if exist('options', 'var') && ~isempty(options) && isfield(options, 'MaxIter')
    length = options.MaxIter;
else
    length = 100;
end

RHO = 0.01;                            % a bunch of constants for line searches
SIG = 0.5;       % RHO and SIG are the constants in the Wolfe-Powell conditions
INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;                    % extrapolate maximum 3 times the current bracket
MAX = 20;                         % max 20 function evaluations per line search
RATIO = 100;                                      % maximum allowed slope ratio

argstr = ['feval(f, X'];                      % compose string used to call function
for i = 1:(nargin - 3)
  argstr = [argstr, ',P', int2str(i)];
end
argstr = [argstr, ')'];

if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
S=['Iteration '];

i = 0;                                            % zero the run length counter
ls_failed = 0;                             % no previous line search has failed
fX = [];
[f1 df1] = eval(argstr);                      % get function value and gradient
i = i + (length<0);                                            % count epochs?!
s = -df1;                                        % search direction is steepest
d1 = -s'*s;                                                 % this is the slope
z1 = red/(1-d1);                                  % initial step is red/(|s|+1)

while i < abs(length)                                      % while not finished
  i = i + (length>0);                                      % count iterations?!

  X0 = X; f0 = f1; df0 = df1;                   % make a copy of current values
  X = X + z1*s;                                             % begin line search
  [f2 df2] = eval(argstr);
  i = i + (length<0);                                          % count epochs?!
  d2 = df2'*s;
  f3 = f1; d3 = d1; z3 = -z1;             % initialize point 3 equal to point 1
  if length>0, M = MAX; else M = min(MAX, -length-i); end
  success = 0; limit = -1;                     % initialize quanteties
  while 1
    while ((f2 > f1+z1*RHO*d1) | (d2 > -SIG*d1)) & (M > 0)
      limit = z1;                                         % tighten the bracket
      if f2 > f1
        z2 = z3 - (0.5*d3*z3*z3)/(d3*z3+f2-f3);                 % quadratic fit
      else
        A = 6*(f2-f3)/z3+3*(d2+d3);                                 % cubic fit
        B = 3*(f3-f2)-z3*(d3+2*d2);
        z2 = (sqrt(B*B-A*d2*z3*z3)-B)/A;       % numerical error possible - ok!
      end
      if isnan(z2) | isinf(z2)
        z2 = z3/2;                  % if we had a numerical problem then bisect
      end
      z2 = max(min(z2, INT*z3),(1-INT)*z3);  % don't accept too close to limits
      z1 = z1 + z2;                                           % update the step
      X = X + z2*s;
      [f2 df2] = eval(argstr);
      M = M - 1; i = i + (length<0);                           % count epochs?!
      d2 = df2'*s;
      z3 = z3-z2;                    % z3 is now relative to the location of z2
    end
    if f2 > f1+z1*RHO*d1 | d2 > -SIG*d1
      break;                                                % this is a failure
    elseif d2 > SIG*d1
      success = 1; break;                                             % success
    elseif M == 0
      break;                                                          % failure
    end
    A = 6*(f2-f3)/z3+3*(d2+d3);                      % make cubic extrapolation
    B = 3*(f3-f2)-z3*(d3+2*d2);
    z2 = -d2*z3*z3/(B+sqrt(B*B-A*d2*z3*z3));        % num. error possible - ok!
    if ~isreal(z2) | isnan(z2) | isinf(z2) | z2 < 0   % num prob or wrong sign?
      if limit < -0.5                               % if we have no upper limit
        z2 = z1 * (EXT-1);                 % the extrapolate the maximum amount
      else
        z2 = (limit-z1)/2;                                   % otherwise bisect
      end
    elseif (limit > -0.5) & (z2+z1 > limit)          % extraplation beyond max?
      z2 = (limit-z1)/2;                                               % bisect
    elseif (limit < -0.5) & (z2+z1 > z1*EXT)       % extrapolation beyond limit
      z2 = z1*(EXT-1.0);                           % set to extrapolation limit
    elseif z2 < -z3*INT
      z2 = -z3*INT;
    elseif (limit > -0.5) & (z2 < (limit-z1)*(1.0-INT))   % too close to limit?
      z2 = (limit-z1)*(1.0-INT);
    end
    f3 = f2; d3 = d2; z3 = -z2;                  % set point 3 equal to point 2
    z1 = z1 + z2; X = X + z2*s;                      % update current estimates
    [f2 df2] = eval(argstr);
    M = M - 1; i = i + (length<0);                             % count epochs?!
    d2 = df2'*s;
  end                                                      % end of line search

  if success                                         % if line search succeeded
    f1 = f2; fX = [fX' f1]';
    fprintf('%s %4i | Cost: %4.6e\r', S, i, f1);
    s = (df2'*df2-df1'*df2)/(df1'*df1)*s - df2;      % Polack-Ribiere direction
    tmp = df1; df1 = df2; df2 = tmp;                         % swap derivatives
    d2 = df1'*s;
    if d2 > 0                                      % new slope must be negative
      s = -df1;                              % otherwise use steepest direction
      d2 = -s'*s;
    end
    z1 = z1 * min(RATIO, d1/(d2-realmin));          % slope ratio but max RATIO
    d1 = d2;
    ls_failed = 0;                              % this line search did not fail
  else
    X = X0; f1 = f0; df1 = df0;  % restore point from before failed line search
    if ls_failed | i > abs(length)          % line search failed twice in a row
      break;                             % or we ran out of time, so we give up
    end
    tmp = df1; df1 = df2; df2 = tmp;                         % swap derivatives
    s = -df1;                                                    % try steepest
    d1 = -s'*s;
    z1 = 1/(1-d1);
    ls_failed = 1;                                    % this line search failed
  end
  if exist('OCTAVE_VERSION')
    fflush(stdout);
  end
end
fprintf('\n');

7

Author: Eric Leschinski,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2015-02-25 04:18:52

Fmincg es más preciso que fminunc. Tiempo tomado por ambos es casi same.In Red neuronal o en general que tiene más no de pesos fminunc puede dar fuera de la memoria error.So fmincg es más eficiente en memoria.

Usando fminunc, la precisión es 93.10 y el tiempo requerido es 11.3794 segundos.

Testing lrCostFunction() with regularization
Cost: 2.534819
Expected cost: 2.534819
Gradients:
 0.146561
 -0.548558
 0.724722
 1.398003
Expected gradients:
 0.146561
 -0.548558
 0.724722
 1.398003
Program paused. Press enter to continue.

Training One-vs-All Logistic Regression...
id = 1512324857357
Elapsed time is 11.3794 seconds.
Program paused. Press enter to continue.

Training Set Accuracy: 93.100000

Usando fmincg, la precisión es de 95.12 y el tiempo requerido es de 11.7978 segundos.

Testing lrCostFunction() with regularization
Cost: 2.534819
Expected cost: 2.534819
Gradients:
 0.146561
 -0.548558
 0.724722
 1.398003
Expected gradients:
 0.146561
 -0.548558
 0.724722
 1.398003
Program paused. Press enter to continue.

Training One-vs-All Logistic Regression...
id = 1512325280047
Elapsed time is 11.7978 seconds.

Training Set Accuracy: 95.120000

1

Author: Goyal Vicky,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2017-12-03 18:50:53

Fmincg utiliza el método de gradiente conjugado

Si nos fijamos en la imagen de este enlace, verá que el método CG converge mucho más rápido que fminunc, pero asume una serie de restricciones que creo que no son necesarias en el método fminunc (BFGS) (vectores conjugados vs vectores no conjugados).

En otras palabras, el método fmincg es más rápido pero más grueso que fminunc, por lo que funciona mejor para matrices enormes (muchas características, como miles) que para matrices más pequeñas con cientos de características. Espero que esto ayude.

0

Author: Sv Vr,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2018-03-05 04:16:41

score 41 · Accepted Answer

Tendrás que mirar dentro del código de fmincg porque no es parte de Octave. Después de algunas búsquedas encontré que es un archivo de función proporcionado por la clase de Aprendizaje Automático de Coursera como parte de la tarea. Lea los comentarios y respuestas sobre esta pregunta para una discusión sobre los algoritmos.