File Transfer

Steven Zeil

Last modified: Jan 19, 2015

Contents:
1. Before You Start: Understanding Transfer Modes
1.1 Text Files versus Binary Files
1.2 Identifying the File Contents
1.3 Transfer Modes
2. Transferring Files Across the ODU Local Network: Samba
3. Transferring Files Across the Internet: ftp
3.1 Anonymous versus Private FTP
3.2 Transferring Files
4. Secure File Transfer: sftp and scp
5. Problems and Inconsistencies

If you prepare files on one machine but want to use them on another, you need some means of transferring them. For example, if you edit files on your home PC or on a PC at one of the Teletechnet sites, you will eventually need to get those files onto the CS Department network. On the other hand, you may want to take files your instructor has provided off of that network for use on your home PC.

1. Before You Start: Understanding Transfer Modes

A complicating factor in transferring files from one computer to another is that you must decide whether the files you want to transfer should be treated as “text” or as “binary”.

1.1 Text Files versus Binary Files

All files are containers of bytes, and each byte holds a number. But in many files, these numbers are intended to represent text.

Such files are referred to as text files. The way in which text is encoded as numbers in a text file is governed by various international standards, which we’ll look at in just a moment. Because so many programs observe these standards, you can safely manipulate text files with a wide range of different text-oriented programs, even ones that the original creator of the text file might not have intended or might not have known about.

In other files, the numbers really are numbers. They encode data in a more complicated fashion, not line by line of text. These are called binary files. The way in which information is encoded into a binary file is entirely determined by the programs (and their programmers) intended for use with the file. Manipulating a binary file with any program not intended for that specific kind of encoding can be disastrous.

Variations among Text Files

ASCII

Since the late 1960s, most text files have been encoded using the ASCII character set. This encoding uses the numbers 0–127 and so fits comfortably into an 8-bit byte (which can actually hold the numbers 0–255). The numbers 32–126 denote the printable ASCII characters, including the blank space, upper- and lower-case letters, digits, and punctuation characters. Numbers 0–31 and 127 are used for various control characters.[1] Originally, these control characters were used to “control” output device behavior. For example, CR (carriage return) caused a printer to move its print head to the leftmost column of a page. LF (line feed) caused a printer to move one line down on the page.

In modern usage, only a few of these control characters will appear in a text file. TAB characters are common, and CR and LF are, as we shall see, used as line terminators. FF (form feed) characters were originally used to tell a printer to feed in a new form (page) and are still occasionally used to indicate the start of a new page.

One of the ways to tell if a file is intended to be an ASCII text file is to look for characters that fall outside the normal range of characters. If it has bytes containing numbers 128–255, it definitely is not ASCII text. If it has bytes containing numbers in the range 0–31, other than 9 (TAB), 10 (LF), 12 (FF), or 13 (CR), it probably is not ASCII text.
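The test just described can be sketched in a few lines of Python. This is only an illustration of the rule of thumb above, not a description of how any real tool works; the function name looks_like_ascii_text is made up for this example.

```python
# Control codes that commonly appear in ASCII text files:
# 9 (TAB), 10 (LF), 12 (FF), 13 (CR).
ALLOWED_CONTROLS = {9, 10, 12, 13}


def looks_like_ascii_text(data: bytes) -> bool:
    """Return True if every byte is plausible in an ASCII text file."""
    for b in data:
        if b >= 128:
            return False  # outside the ASCII range: definitely not ASCII
        if b < 32 and b not in ALLOWED_CONTROLS:
            return False  # unusual control character: probably not text
    return True


print(looks_like_ascii_text(b"Hello,\tworld!\r\n"))  # True
print(looks_like_ascii_text(b"\x89PNG\r\n\x1a\n"))   # False
```

The second example is the first eight bytes of a PNG image, which fails the test twice over: it contains a byte above 127 and an unusual control character (26).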

ASCII Line Termination

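Even if you have ASCII text in a file, there is some variation in how the file is encoded. Windows and other operating systems disagree on how to divide an ASCII text file into lines. Unix-like systems (including Linux) end each line with a single LF character (^J), while Windows ends each line with a CR followed by an LF (^M^J).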

This means that ASCII text files created in Windows tend to look, to a Unix program, as if they have extra ^M characters near the end of each line. ASCII files created in Unix, on the other hand, tend to look to a Windows program as if the entire file contains only one line with odd ^J characters sprinkled inside.
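A small Python sketch can make the difference visible. The function name line_ending_style and the sample strings are invented for this illustration.

```python
def line_ending_style(data: bytes) -> str:
    """Classify the line terminators found in data."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LFs not preceded by a CR
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "CRLF (Windows)"
    if lf_only:
        return "LF (Unix)"
    return "none"


windows_text = b"first line\r\nsecond line\r\n"
unix_text = b"first line\nsecond line\n"

print(line_ending_style(windows_text))  # CRLF (Windows)
print(line_ending_style(unix_text))     # LF (Unix)

# A Unix-style tool that splits only on LF leaves a stray CR (^M)
# at the end of each line of the Windows file:
print(windows_text.split(b"\n"))  # [b'first line\r', b'second line\r', b'']
```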

Unicode

Although ASCII has been queen of the text encoding world for most of the history of computing, it has limitations. Modern applications need many more than the 95 printable characters available in ASCII. Unicode is an international standard encoding that uses multiple bytes per character to extend ASCII (the 128 ASCII characters are preserved in Unicode at their original numeric values), adding characters from international alphabets, mathematical and musical symbols, simple graphics, and a variety of other “utility” characters.

Unicode actually has multiple ways of encoding this extensive character set. One Unicode encoding simply uses two bytes per character, for a total of 65,536 possible characters. But since most text files are still heavily oriented towards ASCII (0–127), this doubles the size of the typical text file with a lot of zero bytes. So another popular encoding (called “UTF-8”) uses 1 byte to represent an ASCII character and a sequence of two to four bytes, each with a value in the range 128–255, to represent any other character.
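The size difference between these encodings is easy to check in Python. The sketch below uses Python's "utf-16-le" codec to stand in for the two-bytes-per-character encoding described above (an assumption made for illustration); the helper name encoded_sizes is made up.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (bytes in two-byte encoding, bytes in UTF-8) for text."""
    return len(text.encode("utf-16-le")), len(text.encode("utf-8"))


print(encoded_sizes("hello"))             # (10, 5)
print("hello".encode("utf-16-le"))        # b'h\x00e\x00l\x00l\x00o\x00'

# A non-ASCII character (here an accented e, U+00E9) becomes a
# multi-byte UTF-8 sequence whose bytes all lie outside the ASCII range:
print(list("\u00e9".encode("utf-8")))     # [195, 169]
```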

Like ASCII text files, lines in Unicode text files can be terminated by LF or by a CR-LF sequence, depending on the operating system. Unicode also introduces its own optional non-ASCII control characters to signal the end of a line and the end of a paragraph. Given this many options, though, it’s generally safe to assume that any program sophisticated enough to handle Unicode will be able to cope with any of the multiple options for line termination a file might employ.

1.2 Identifying the File Contents

How can you tell if a file is text or binary?

Let’s get one thing out of the way right now:

You cannot tell if a file is text or binary by double-clicking on it in an operating system window to open it up. Launching a file in this way simply runs whatever program the operating system believes is most appropriate to that file. That program may very well show you text, but that doesn’t mean the information was encoded as a text file. On the other hand, the program might show you graphics with no text at all, but that does not mean that the graphics were not drawn from a description written in ASCII text.

You can get a hint as to whether a file is text or binary by looking at the file extension (the 2 or 3 letters after the final ‘.’ in a file name).

But that’s only a hint. There are so many programs in the world that some are bound to use the same extensions.

In Linux, the best way to see what kind of data is in a file is to use the file command, e.g.,

file mystery.dat

The file command will print a description of the file contents. This description can be a few lines long. If the file contains ASCII text or Unicode text, this will be stated explicitly as part of the description. If the file is ASCII text but with Windows-style line termination, it will state that as well.

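In Windows, your best way to see if a file is text or binary is to open it in a text editor such as Notepad (not in a word processor such as Word). If the editor shows readable text, the file is a text file; if it shows mostly unreadable symbols, the file is binary.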

1.3 Transfer Modes

Some file transfer programs allow you to choose between transfer modes: text (ASCII) mode or binary mode.

In a text-mode transfer, the program doing the transfer compares the operating system of the local machine you are running on with the operating system of the remote machine that you are transferring to or from. If those operating systems use different line termination conventions, the transferred lines are modified accordingly during the transfer.

In a binary-mode transfer, the files are transferred exactly, with no changes, regardless of whether the operating systems of the two machines match or not.

When in doubt, transfer in binary mode. If you do a transfer in binary mode and then discover that you have a text file with the wrong line terminators, you can correct that on the Unix side. On the other hand, if you transfer in text/ASCII mode and the file transferred is actually binary, you will wind up with a corrupted, unusable file that cannot be repaired.
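To see why a text-mode transfer can damage a binary file beyond repair, consider this simplified Python model. It assumes a text-mode transfer amounts to a plain search-and-replace on line terminators (real FTP clients are more involved), and the function names are invented for the example.

```python
def text_mode_windows_to_unix(data: bytes) -> bytes:
    """Mimic a text-mode transfer toward Unix: CR-LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def text_mode_unix_to_windows(data: bytes) -> bytes:
    """Mimic a text-mode transfer toward Windows: LF becomes CR-LF."""
    return data.replace(b"\n", b"\r\n")


# The first eight bytes of every PNG image. They are binary data that
# happen to contain both a CR-LF pair and a lone LF.
png_signature = b"\x89PNG\r\n\x1a\n"

damaged = text_mode_windows_to_unix(png_signature)
restored = text_mode_unix_to_windows(damaged)

print(damaged)                    # b'\x89PNG\n\x1a\n'
print(restored == png_signature)  # False: the original bytes are lost
```

After the first conversion, there is no way to tell which LF bytes were originally part of a CR-LF pair, so converting back cannot reproduce the original file.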

2. Transferring Files Across the ODU Local Network: Samba

If you are sitting at a Windows PC that is part of either the CS Dept’s own local network (e.g., in the CS Dept. labs or connected to the CS Dept’s wireless network) or part of the ODU ITS local network (most ODU computer labs on the Norfolk, Virginia Beach, and Peninsula Graduate Center campuses), then you can access your Unix account directories directly from within Windows. This is because the CS Dept. Unix file servers run a service called “Samba”, a program that mediates file access between UNIX and other systems.

Again, let me emphasize that Samba only works on a local network. If you are connecting to the campus via the Internet, forget Samba.

To use Samba, you might not need to do anything at all. If you are logged in to a CS Dept PC and you have a Z: drive mapped, that is actually a Samba connection to your Unix home directory.

If you have no such drive, use the Windows “Start->Run” button to run

\\userdata.cs.odu.edu\undergrad\your-login-name

(Graduate students would use “grad” instead of “undergrad” in the path above.) You may be prompted for a password, or a login name and password. Supply your CS Unix login/password and, if all is well, a Windows Explorer window should open displaying the contents of your Unix home directory. You can now manipulate files in this Window just as you would in any Windows directory/folder, but the changes are occurring in your Unix directory.

Now that we know that it works, we can make this whole process more convenient by mapping a Windows drive letter to your Unix account, giving you a “fake” disk drive that actually accesses your Unix files. From inside any Windows Explorer or “My Computer” window, select “Tools” (or right-click on the “My Computer” icon) and select “Map Network Drive…”. Select an unused drive letter, and enter the same address as in the last step for the “Folder”. Make sure the “Reconnect at logon” box is checked. Finally, if your login name for logging into the PC is different from your CS Unix login name, look for a “Connect using a different user name” link, click on that, and supply your Unix login information. Click on OK/Finish and within a few seconds, you should have a new drive available that actually maps onto your Unix account.

Two things to keep in mind when using Samba to access your Unix files from Windows:

3. Transferring Files Across the Internet: ftp

FTP (File Transfer Protocol) is a standard mechanism for transferring files over the Internet. Although most browsers provide some support for FTP, they usually only permit downloads (from the remote machine to your local PC) and usually only permit access to public repositories, not to password-protected accounts.

You can only use FTP to transfer files to or from a machine that has been set up as an FTP server. In the ODU CS Dept., we have one such machine: ftp.cs.odu.edu.

3.1 Anonymous versus Private FTP

When you connect to an FTP server, you will be prompted for a login name and password. Thus, in the “normal” mode of FTP, you must have an account on the server system.

But some servers also have been set up to provide files to the public at large. By convention, servers that allow this do so by recognizing a special login name, “anonymous”, for which almost any password is accepted. Again, by convention, users who log in as “anonymous” are expected to supply their own email address as the password. [2]

You can use web browsers to do anonymous ftp downloads. If you want to find, for example, a file named foo.txt in directory /pub/repository/textfiles/ on an ftp server ftp.server.net, you can direct a web browser to ftp://ftp.server.net/pub/repository/textfiles/foo.txt to view the file immediately, or to ftp://ftp.server.net/pub/repository/textfiles/ to view the directory, after which you can right-click on the desired file and select an option to save it on your local machine.

A web browser directed to an ftp:// URL will do an anonymous ftp login on your behalf.

On the CS Dept server, ftp.cs.odu.edu, you can use ftp:// URLs or you can log in as “anonymous” to gain access to the public area. You cannot, however, access your own files from there.

Alternatively, you can log in with your own Unix account login name and password, in which case you will have access to your own files and directories on the Unix network, but cannot access the public area.

This helps explain why you need to learn to use FTP rather than relying on your web browser for file transfer. Web browsers, when directed to an “ftp://” URL, do an anonymous login. If you need access to your own files and directories, that’s no help.

3.2 Transferring Files

To transfer files via FTP, you need to have an FTP client program on the machine at which you are sitting. In a sense, this is very similar to ssh, where you run an “ssh client” program on your local machine and use it to issue commands to a remote ssh server. For that matter, it’s not unlike fetching a web page from a remote web server, which requires you to run a web client (a.k.a., a browser) on your local machine. The ftp client program also runs on your local machine and allows you to issue commands to a remote one, but these commands are all related to file transfers.

Not surprisingly, the way you actually launch and run the client depends upon just which client you have. Most versions of MS Windows come with an FTP client, called ftp. This is a bare-bones, text-interface version of FTP. It lacks proper support for “passive mode”, a part of the FTP protocol that allows it to work through the firewalls and/or routers found on many local networks.

The CygWin project and the author of PuTTY provide similar FTP clients, which use the same text-based interface. Both will work through common firewall/router setups.

There are a variety of FTP clients, commercial and freeware, that offer GUI (“Graphical User Interface”) displays. You’ll find some recommended ones listed on the course Library page.

GUI interfaces can simplify FTP use, but these interfaces may create problems by hiding the text/binary mode settings, leading to corrupt transfers. Still, you may find the GUI-based packages recommended on the Library pages much easier to work with.

FTP via a Text Interface

To use the MS Windows ftp, click the “Start” button, select “Run”, and for the program name type

ftp ftp.cs.odu.edu 

(Assuming you want to connect to the CS Dept. server. If you have reason to access another FTP server, just replace the machine name accordingly.)

If you are running the CygWin ftp, just type the same command into your shell.

You will then be prompted for your login name and your password. Enter those as usual.

Your next command should be

hash 

This simply increases the amount of feedback you get about the progress made during file transfers.

Before actually transferring files, you must decide whether to use binary or text file transfer. If you want binary transfers, give the command

binary

and if you want text transfers, give the command

ascii 

You can switch back and forth between these modes as necessary if you are transferring multiple files, some text and some binary.

Now you can use the commands cd, pwd, and ls to navigate the Unix directory structure as if you were in the shell. Usually, you will cd to the directory of the remote machine in which you wish to download or upload files, do an ls to see what’s there, and then proceed to transfer the specific files.

You can also change the directory of the local machine in which you will be working by giving the lcd (local cd) command. Note that the directory you give to this command must make sense in terms of the local machine’s directory structure. For example,

lcd c:\courses\cs252 

for the MS Windows ftp client, but

lcd /cygdrive/c/courses/cs252 

under CygWin. (On Linux or OS X, give the path in that system’s own form.)

To get a file from the remote FTP server to your local machine, the command is

get filename

To put a file from your local machine onto the remote FTP server, the command is

put filename

Neither the get nor the put command can include wildcards (* ?) in the filename, but by changing the commands to mget and mput, you are allowed to use wildcards.[3]

To end your ftp session, the command is

quit 
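The same sequence of steps (log in, cd, ls, choose binary mode, get, quit) can also be scripted. The following is a minimal sketch using Python's standard ftplib module; the function names and the idea of fetching every *.dat file are assumptions made for illustration, and nothing here is specific to the CS Dept. server.

```python
from ftplib import FTP
from pathlib import Path


def download_binary(ftp, names, local_dir):
    """Fetch each named file from the current remote directory in binary mode."""
    saved = []
    for name in names:
        target = Path(local_dir) / name
        with open(target, "wb") as out:
            # retrbinary switches the connection to binary mode
            # before issuing the RETR (retrieve) command.
            ftp.retrbinary(f"RETR {name}", out.write)
        saved.append(target)
    return saved


def fetch_dat_files(host, user, password, remote_dir, local_dir):
    """Log in, cd to remote_dir, and download every *.dat file found there."""
    with FTP(host) as ftp:       # connects now; sends QUIT when the block ends
        ftp.login(user, password)
        ftp.cwd(remote_dir)      # like "cd"
        names = [n for n in ftp.nlst() if n.endswith(".dat")]  # like "ls"
        return download_binary(ftp, names, local_dir)
```

A call might look like fetch_dat_files("ftp.cs.odu.edu", "your-login-name", "your-password", "data", "."), where the login name, password, and directory names are placeholders.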

Here, then, is an example of an ftp session in which three files were downloaded from ~zeil/data on the Unix network into c:\misc\temp on the local PC:

Connected to ftp.cs.odu.edu
220 ProFTPD 1.2.0pre10 Server (ODU CS FTP Server)
User (ftp.cs.odu.edu:(none)): zeil
331 Password required for zeil.
Password:
230 User zeil logged in
ftp> hash
Hash mark printing On  ftp: (2048 bytes/hash mark) .
ftp> lcd c:\misc\temp
Local directory now C:\misc\temp
ftp> cd data
250 CWD command successful
ftp> ls
200 Port Command successful
Directory of /home/zeil/data
file1.txt
file2.dat
file3.txt
file4.dat
226 Transfer Complete
ftp> ascii
200 Type set to A.
ftp> get file1.txt
200 PORT command successful
150 Opening ASCII mode data connection for file1.txt (20101 bytes).
#########
226 Transfer complete
ftp: 20627 bytes received in 1.17 seconds
ftp> binary
200 Type set to I.
ftp> mget *.dat
200 Type set to I.
mget: file2.dat? y
200 PORT command successful
150 Opening BINARY mode data connection for file2.dat (1001 bytes).

226 Transfer complete
ftp: 1001 bytes received in .23 seconds
mget: file4.dat? y
200 PORT command successful
150 Opening BINARY mode data connection for file4.dat (2123 bytes).
#
226 Transfer complete
ftp: 2123 bytes received in .35 seconds
ftp> quit

(You may notice that, during the ASCII mode transfer, the number of bytes transferred was larger than the size of the file. That’s because, in an ASCII transfer from a Unix machine to a Windows machine, each new-line character is replaced by a carriage-return character followed by a new-line.)
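The arithmetic behind that observation can be checked directly. This sketch assumes every line of the file ends with exactly one new-line character on the Unix side; the function name is made up for the example.

```python
def size_after_ascii_transfer_to_windows(unix_size: int, line_count: int) -> int:
    """Each new-line gains a carriage return, so the file grows by one byte per line."""
    return unix_size + line_count


# In the session above, file1.txt was 20101 bytes on the server and
# 20627 bytes were received, so it presumably holds 526 lines.
print(20627 - 20101)                                     # 526
print(size_after_ascii_transfer_to_windows(20101, 526))  # 20627
```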

FTP via GUI Interface

GUI-based FTP clients differ considerably in the detail of how they operate. If you use one of these, you will have to rely on its own built-in help or its source website to learn how to use it. Here, we will have to settle for an example of one such client.

WinSCP is shown here. It is fairly typical of GUI-based FTP clients, showing two window “panes”, side by side. One shows directories and files on your local machine. The other shows directories and files on a remote machine to which you have connected. You can transfer files from one machine to the other by dragging and dropping file icons from one pane to the other.

4. Secure File Transfer: sftp and scp

FTP is one of the older services on the Internet. It works, but has some limitations. Some people may encounter issues using it from systems behind very aggressive firewalls. It also is not very secure. Not only is everything you transfer sent in a plain, unencrypted format, but even your login name and password are sent in plain text.

SFTP (SSH File Transfer Protocol, often expanded as “Secure File Transfer Protocol”) is a more modern alternative that encrypts your entire session. It is built upon the secure ssh service, and therefore shares ssh’s ability to tunnel out through most reasonably configured firewalls. And, if you know how to use ftp, then you pretty much know how to use sftp as well. Just give the command

sftp machinename

instead of

ftp machinename

After that, things go pretty much the same as in text-mode ftp. Alternatively, some programs provide GUI-based interfaces to sftp. The WinSCP program shown earlier, for example, can also be used for sftp.

There are just a few differences to be aware of:

As a general rule, I tend to use sftp instead of ftp unless I need to access a pool of anonymous ftp files.

Another secure alternative to ftp is scp, which you can think of as an attempt to extend the normal Unix cp command to work across networks. The basic format of an scp command is

scp loginName1@machine1:file1 loginName2@machine2:file2

to copy a file from one machine to another.

For example, from my home Linux machine, if I wanted to grab a copy of my .emacs file from my home directory on atria.cs.odu.edu, I might say:

scp zeil@atria.cs.odu.edu:/home/zeil/.emacs myAtria.emacs
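If you find yourself issuing the same kind of scp command often, it can be assembled by a script. Here is a minimal Python sketch that builds the command above from its parts; the helper name scp_download_command is hypothetical, and actually running the command is left commented out because it needs network access and may prompt for a password.

```python
import subprocess


def scp_download_command(login, machine, remote_path, local_path):
    """Build the argument list for copying one remote file to the local machine."""
    return ["scp", f"{login}@{machine}:{remote_path}", local_path]


cmd = scp_download_command("zeil", "atria.cs.odu.edu",
                           "/home/zeil/.emacs", "myAtria.emacs")
print(" ".join(cmd))  # scp zeil@atria.cs.odu.edu:/home/zeil/.emacs myAtria.emacs

# To actually perform the copy:
# subprocess.run(cmd, check=True)
```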

Personally, I seldom use command-line scp because the paths on the remote machine tend to get long and, unlike paths on your local machine, you cannot use the Tab key to complete file and directory names after typing the first few characters. I generally use sftp instead. Many command-line sftp clients do tab completion on the remote files. Even if they do not, the built-in ls command to list the current directory on the remote machine makes it easy to copy-and-paste long file names.

Of course, just as with ftp, you can also eschew the command line entirely and get a graphical scp/sftp client that lets you transfer files by dragging them between local and remote windows. See the Library page for suggestions.

5. Problems and Inconsistencies

If you don’t know whether to use binary or text transfer mode, try binary first.

If, however, you have transferred files to a Unix system and discover them to be full of ^M characters (you can see this by viewing the file in emacs) or if you use the file command and it reports that the file is ASCII text with (Windows-style) CRLF line terminators, this is a sign that you should have used text mode. You can still recover, however, by using the command tr:

tr -d '\r' < file1 > file2

to produce a new file file2 from file1 by deleting the CR characters, which converts the line ends to the Unix format.

You can do much the same via the command dos2unix:

dos2unix file1

You can also prepare a text file for transfer to a Windows system with unix2dos:

unix2dos file1

Be sure to check your file with the file command before using dos2unix or unix2dos. If the file is not, in truth, ASCII text, these commands will likely leave you with a badly corrupted file.
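The idea of checking before converting can be sketched in Python. This is not how dos2unix itself is implemented. The check used here (reject any NUL byte or any byte above 127) is a deliberately crude assumption that would also reject Unicode text, and the function name dos2unix_bytes is made up.

```python
def dos2unix_bytes(data: bytes) -> bytes:
    """Convert CR-LF line ends to LF, refusing data that does not look like ASCII text."""
    if b"\x00" in data or any(b >= 128 for b in data):
        raise ValueError("data does not look like ASCII text; not converting")
    return data.replace(b"\r\n", b"\n")


print(dos2unix_bytes(b"line one\r\nline two\r\n"))  # b'line one\nline two\n'
```

Unlike the tr command shown earlier, which deletes every CR in the file, this sketch removes a CR only when it is immediately followed by an LF.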


  1. The Ctrl key label on most keyboards is an abbreviation for “control”. Most of the control characters can be typed by holding down the Ctrl key while typing the ASCII character 96 places higher in the ASCII character set. For example, the TAB or Horizontal Tab character has code 9. You can type it by holding down Ctrl while typing the key corresponding to ASCII code 9+96=105, the ‘i’. Of course, that is probably what you get by using the Tab key on the keyboard as well.

  2. Some FTP servers actually check to see if the password supplied for an anonymous login appears to be a legitimate email address (e.g., that it contains an ‘@’ character somewhere, with at least one ‘.’ after that).

  3. mget and mput will ask you, for each file matching the pattern you give, whether or not you really want to transfer it. If you are really sure you want to transfer every file matching the pattern, you can use the prompt command to turn this behavior off.