%% Introduction to MATLAB for Functional Data Analysis % % Author: Giles Hooker % % Last Modified: 24/01/07 % % This file represents Lecture 2 in the BTRY 694: Functional Data Analysis. % It is intended to provide an introduction to Matlab at the same time as % illustrating some data-exploration techniques for Functional Data % Analysis. % % To begin with, note that Matlab has a text editing window (this one) in % which you can write commands and programs. As will all programming % languages, programs can be commented -- Matlab will ignore anything after % a '%' on the line. You should always comment liberally; for your own sake % as well as those who read your code. % % Matlab will also publish commented files to HTML; use '%%' at a line to % create a heading. Any comments following a heading will also make use of % LaTeX comments, so that I can write % % y = f(x) + \epsilon % % and have it come out nicely in HTML! See Help (press F1) on Cell Mode and % Publishing. % % Two useful keys to know are F9, which will enter highlighted commands % into the MATLAB window, and F5 which will run all commands in the file. %% Loading, Generating and Saving Data % For help on any function, you can look in the official help (press F1) or % you can type 'help ' at the prompt. For example help load % First Load a data set. load 'handwriting.mat' % MATLAB stores data in a binary .mat format. You can force it to load a % tab delimited file by the option load -ASCII 'handwritingx.dat' % Alternatively, you can use a user interface to make sure you are getting % the file correct: uiload % Load enters variables into MATLAB memory, to view your workspace type who % or for more details whos % If you have generated data and want to save it save 'handwriting.mat' % If you want to only save specific variables save 'handwritingx.mat' fdarrayx % To save as a text file save -ASCII 'fdarrayx' % Some tips for generating data: % To create a sequence x = 1:10 % Integers between 0 and 10 % Note that most commands result in MATLAB outputting the result of that % command. This can be supressed by: x = 1:10; % A colon on the end of the line never hurts, we can always look at a % variable by typing x % Sequences can be incremented in non-unit intervals x = 0:2.5:10 x = linspace(1,10,4) % Matlab treats variables as arrays and indexes these as follows x(1) x(3:4) % I can also build data myself x = [1 2; 3 4] % Note here that the ';' indicates a new row in the array. To index % multidimensional arrays, I can take x(1,2) % I can also take individual rows or columns. x(:,1) x(1,:) % This notation can be extended to multi-dimensional arrays. % I can also look at the size of x size(x) numel(x) % And if I want to get rid of some entries x(:,1) = []; % I can generate random data x = rand(10,2) % uniform x = randn(10,3) % normal % The function 'random' will do many more. help(random) % A couple of other useful functions A = ones(3,3) B = zeros(3,3) C = eye(3,3) % Matlab also allows multi-dimensional arrays D = zeros(2,2,2) % A standard convention: E = ones([2 2 2]) % Matlab allows vectors of integers as indices: f = 1:10; f(1:2:9) %% Operations and Arithmetic % Matlab performs addition and subtraction component-wise on arrays. G = A + B + C % It also performs scalar multiplication and division this way G/pi % But '*', '/', '^' call matrix operations G*B % Component operations can be called with '.*' etc G.^2 % Reshaping H = reshape(1:10,5,2) % Transpose H' %% Exploring Functional Data size(fdarray) % Separate the dimensions xdata = fdarray(:,:,1); ydata = fdarray(:,:,2); % Time: fdatime = linspace(0, 2.3, 1401)'; % plot a line plot(fdatime,xdata(:,1)) % take a closer look plot(fdatime(1:100),xdata(1:100,1),'.') % what about all the replications? plot(fdatime(1:100),xdata(1:100,:)) % How to I plot both sets through time? subplot(2,1,1) plot(fdatime,xdata) subplot(2,1,2) plot(fdatime,ydata) % Alternatively: plot(xdata,ydata) % I can add labels xlabel('x position','fontsize',20) ylabel('y position','fontsize',20) title('Handwriting Data','fontsize',20) % Now lets look at some pointwise statistics: xmean = mean(xdata); % Mean size(xmean) % Ooops! xmean = mean(xdata,2); ymean = mean(ydata,2); subplot(2,1,1) plot(fdatime,xmean,'LineWidth',2) hold on plot(fdatime,ymean,'r','LineWidth',2) hold off % OK, now lets look at variance xvar = var(xdata,[],2); yvar = var(ydata,[],2); subplot(2,1,2) plot(fdatime,xvar,'c--','LineWidth',2); hold on plot(fdatime,yvar,'g--','LineWidth',2); hold off % I wonder if they have any relationship plot(xmean,xvar) hold on plot(ymean,yvar,'r') hold off % OK, what about more sophisticated analyses? sub_samp = 1:8:800; % Reduce for the sake of computing time and visualization xdata_s = xdata(sub_samp,:); ydata_s = ydata(sub_samp,:); fdatime_s = fdatime(sub_samp,:); fda_x_cor = corr(xdata_s'); fda_y_cor = corr(ydata_s'); fda_xy_cor = corr(xdata_s',ydata_s'); % Now lets look at the correlation process: surf(fdatime_s,fdatime_s,fda_x_cor) % Alternatively, a contour plot is sometimes more helpful: contourf(fdatime_s,fdatime_s,fda_x_cor) % I'll make sense of this by looking at the x-process as well subplot(2,1,1) contourf(fdatime_s,fdatime_s,fda_x_cor) subplot(2,1,2) plot(fdatime_s,xdata_s) % I need to register the second plot to have the same axis axis([fdatime_s(1) fdatime_s(100) -0.04 0.02]) % Now do the same with y subplot(2,1,1) contourf(fdatime_s,fdatime_s,fda_y_cor) subplot(2,1,2) plot(fdatime_s,ydata_s) axis([fdatime_s(1) fdatime_s(100) -0.04 0.04]) % Now lets look at cross correlation subplot(2,2,1) plot(xdata_s,fdatime_s) axis([-0.04 0.02 fdatime_s(1) fdatime_s(100)]) subplot(2,2,4) plot(fdatime_s,ydata_s) axis([fdatime_s(1) fdatime_s(100) -0.04 0.04]) subplot(2,2,2) contourf(fdatime_s,fdatime_s,fda_xy_cor); subplot(2,2,3) plot(xdata_s,ydata_s) % I like that graph so much that I think I'll save it print -dpng 'handwriting_crosscorr' % A bit of multivariate analysis will formalize what's going on here. % First I want to look at covariance xcov = cov(xdata_s'); ycov = cov(ydata_s'); % Eigen-analysis of correlation matrices % x first [vx,dx] = eig(xcov); dx = diag(dx); plot(dx) dx(91:100)/sum(dx) xmean_s = xmean(sub_samp); plot(fdatime_s,xmean(sub_samp)) hold on plot(fdatime_s,xmean_s+sqrt(dx(100))*vx(:,100),'r') plot(fdatime_s,xmean_s-sqrt(dx(100))*vx(:,100),'r--') plot(fdatime_s,xmean_s+sqrt(dx(99))*vx(:,99),'c') plot(fdatime_s,xmean_s-sqrt(dx(99))*vx(:,99),'c--') hold off % And I can do the same thing with y [vy,dy] = eig(ycov); dy = diag(dy); plot(dy) dy(91:100)/sum(dy) ymean_s = ymean(sub_samp); plot(fdatime_s,ymean(sub_samp)) hold on plot(fdatime_s,ymean_s+sqrt(dy(100))*vy(:,100),'r') plot(fdatime_s,ymean_s-sqrt(dy(100))*vy(:,100),'r--') plot(fdatime_s,ymean_s+sqrt(dy(99))*vy(:,99),'c') plot(fdatime_s,ymean_s-sqrt(dy(99))*vy(:,99),'c--') hold off % And, of course, I can put them together plot(xmean_s,ymean_s) hold on plot(xmean_s+sqrt(dx(100))*vx(:,100),ymean_s+sqrt(dy(100))*vy(:,100),'r') plot(xmean_s+sqrt(dx(100))*vx(:,100),ymean_s-sqrt(dy(100))*vy(:,100),'r--') % Now lets look at some derivatives. % First consider a difference dxdata = diff(xdata,1)/0.0016; dydata = diff(ydata,1)/0.0016; fdatime_d = fdatime(1:1400); % What does this look like? plot(fdatime_d,dydata) % Wow; is it really that noisy? plot(fdatime(1:100),dydata(1:100,1)) % Alright, perhaps we can smooth this out by averaging: dxmean = mean(dxdata,2); dymean = mean(dydata,2); plot(fdatime_d,dxmean) % Now, looking back at that variance: plot(dxmean,xvar(1:1400),'.') hold on plot(dymean,yvar(1:1400),'r.') % What about acceleration? d2xdata = diff(dxdata,1)/0.0016; d2ydata = diff(dydata,1)/0.0016; fdatime_d2 = fdatime(1:1399); plot(fdatime_d2,d2ydata) % Aaaaugh! plot(fdatime(1:100),d2ydata(1:100,1)) % So the lesson is that differencing can be disasterous numerically; and % even worse statistically! d2xmean = mean(d2xdata,2); d2ymean = mean(d2ydata,2); plot(fdatime_d2,d2ymean); %% More Matlab -- loops % % Matlab provides the standard loop options. % I could, for example obtain the pointwise mean in the following way: [N,n] = size(xdata) xmean2 = zeros(N,1); for i = 1:N j = 1; while j <= n xmean2(i) = j/(j+1)*xmean2(i) + 1/(j+1)*xdata(i,j); j = j + 1; end end % see also if, switch statements. %% More Matlab -- functions A = rand(500); B = rand(500); C = mat_prod(A,B); % This takes a while -- reasons to beware of loops; Matlab is very good at % vector operations. D = A*B; %% Other Data Structures % Structs and Cells are like lists in R. % Structs contain named fields, for example: prod.A = randn(2,2); prod.B = randn(2,3); prod.fn = @mat_prod; prod.P = prod.fn(prod.A,prod.B); prod % These are frequently used to reduce the number of intputs to a function xbundle.xdata = xdata; xbundle.xmean = xmean; xbundle.dxdata = dxdata; xbundle.dxmean = dxmean; xbundle.time = fdatime; plot(xbundle.xdata); % They can also be used to define other classes of objects. the FDA package % will make extensive (often implicit) use of these. % Cell arrays allow arbitrary objects to be stored in indexed arrays. The % main piece of notation to be aware of here is that '( )' now refers to % the cell location, whereas '{ }' refers to cell contents. prodcell = cell(2,2); prodcell{1,1} = A; prodcell(1,2) = {B}; prodcell(2,:) = {P}; prodcell prodcell(2,:) = {@mat_prod,P}; prodcell % These can be usefull for storing data with unequal lengths or collections % of other objects that don't fit easily into arrays.