Delete from FTP using Azure Data Factory
While developing an ETL process in Azure Data Factory, you may need to delete data from an FTP server. This can be necessary, for example, when you want to make room for new incoming data or you don’t want your pipeline to pull the same old data twice. The solution to this problem lies in a custom activity.
Hey, I noticed that many people have started using this tutorial, so I have created a new, easier-to-follow version.
To create a custom activity, follow the steps from Microsoft’s tutorial, which is here.
To delete files from FTP, or to clear out a given directory, you can use this C# code:
```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

namespace DeleteFromFTP
{
    public class DeleteFromFTP : IDotNetActivity
    {
        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            // Read the settings passed in through the pipeline's extendedProperties
            DotNetActivity dotNetActivity = (DotNetActivity)activity.TypeProperties;
            string Path = dotNetActivity.ExtendedProperties["Path"];
            string IP = dotNetActivity.ExtendedProperties["IP"];
            string Login = dotNetActivity.ExtendedProperties["Login"];
            string Password = dotNetActivity.ExtendedProperties["Password"];

            // Look up the storage linked service that holds the package (not used further below)
            AzureStorageLinkedService linkedService = linkedServices
                .First(ls => ls.Name == dotNetActivity.PackageLinkedService)
                .Properties.TypeProperties as AzureStorageLinkedService;

            // Delete every file in the given folder
            DeleteFTPDirectory(Path + "/", IP, Login, Password);
            return new Dictionary<string, string>();
        }

        // Returns the names listed by the server for the given folder
        public static List<string> DirectoryListing(string Path, string ServerAdress, string Login, string Password)
        {
            FtpWebRequest request = (FtpWebRequest)WebRequest.Create("ftp://" + ServerAdress + Path);
            request.Credentials = new NetworkCredential(Login, Password);
            request.Method = WebRequestMethods.Ftp.ListDirectory;

            FtpWebResponse response = (FtpWebResponse)request.GetResponse();
            Stream responseStream = response.GetResponseStream();
            StreamReader reader = new StreamReader(responseStream);

            List<string> result = new List<string>();
            while (!reader.EndOfStream)
            {
                result.Add(reader.ReadLine());
            }

            reader.Close();
            response.Close();
            return result;
        }

        // Deletes a single file identified by its full path on the server
        public static void DeleteFTPFile(string Path, string ServerAdress, string Login, string Password)
        {
            FtpWebRequest clsRequest = (FtpWebRequest)WebRequest.Create("ftp://" + ServerAdress + Path);
            clsRequest.Credentials = new NetworkCredential(Login, Password);
            clsRequest.Method = WebRequestMethods.Ftp.DeleteFile;

            // Read the server's reply to the end before closing the connection
            FtpWebResponse response = (FtpWebResponse)clsRequest.GetResponse();
            Stream datastream = response.GetResponseStream();
            StreamReader sr = new StreamReader(datastream);
            string result = sr.ReadToEnd();

            sr.Close();
            datastream.Close();
            response.Close();
        }

        // Lists the folder and deletes each entry in it as a file
        public static void DeleteFTPDirectory(string Path, string ServerAdress, string Login, string Password)
        {
            List<string> filesList = DirectoryListing(Path, ServerAdress, Login, Password);
            foreach (string file in filesList)
            {
                DeleteFTPFile(Path + file, ServerAdress, Login, Password);
            }
        }
    }
}
```
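The code above builds each request address by concatenating "ftp://", the server address, the path and the file name, and it passes every listed entry straight to the delete call. The following is a minimal sketch of two helpers that could make this a little more forgiving. `FtpPathHelper`, `BuildUri` and `FilterListing` are hypothetical names that are not part of the original activity, and the sketch assumes that file names may contain characters such as spaces or `#` that need escaping in a URI, and that some servers may return `.` and `..` in a listing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original activity.
public static class FtpPathHelper
{
    // Builds "ftp://server/directory/file" with exactly one slash between the parts.
    // The file name is URI-escaped so that spaces or '#' do not break the address.
    public static string BuildUri(string server, string directory, string fileName = "")
    {
        string dir = directory.Trim('/');
        string path = dir.Length == 0 ? "" : dir + "/";
        string name = Uri.EscapeDataString(fileName.TrimStart('/'));
        return "ftp://" + server.TrimEnd('/') + "/" + path + name;
    }

    // Drops empty lines and the "." / ".." entries, which are not files to delete.
    public static List<string> FilterListing(IEnumerable<string> entries)
    {
        return entries
            .Select(e => e.Trim())
            .Where(e => e.Length > 0 && e != "." && e != "..")
            .ToList();
    }
}
```

With helpers like these, the loop in `DeleteFTPDirectory` could iterate over `FtpPathHelper.FilterListing(filesList)` and the request could be created from `FtpPathHelper.BuildUri(ServerAdress, Path, file)`. Whether that is needed depends on how your FTP server formats its listings.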
After you compile the code and upload the package to Blob storage, the pipeline with the activity will look like this:
```json
{
  "name": "Delete_from_FTP",
  "properties": {
    "description": "Activity to delete files from a folder",
    "activities": [
      {
        "type": "DotNetActivity",
        "typeProperties": {
          "assemblyName": "DeleteFromFTP.dll",
          "entryPoint": "DeleteFromFTP.DeleteFromFTP",
          "packageLinkedService": "<storage linked service name>",
          "packageFile": "<path to the package in Blob storage>",
          "extendedProperties": {
            "Path": "<FTP folder path>",
            "IP": "<FTP server address>",
            "Login": "<FTP login>",
            "Password": "<FTP password>"
          }
        },
        "inputs": [
          {
            "name": "Input Dataset"
          }
        ],
        "outputs": [
          {
            "name": "Output Dataset"
          }
        ],
        "policy": {
          "timeout": "01:00:00",
          "concurrency": 1,
          "longRetry": 10,
          "longRetryInterval": "01:00:00"
        },
        "scheduler": {
          "frequency": "Day",
          "interval": 1
        },
        "name": "Delete_from_FTP",
        "linkedServiceName": "AzureBatchLinkedService"
      }
    ],
    "isPaused": false,
    "hubName": "Your_hub_name",
    "pipelineMode": "Scheduled"
  }
}
```
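The activity reads `Path`, `IP`, `Login` and `Password` from `extendedProperties` by key, so a key that is missing from the pipeline JSON would normally surface as a `KeyNotFoundException` at run time. As a rough sketch, a check like the one below could report all missing settings at once. `ExtendedPropertyCheck` and `MissingKeys` are hypothetical names, and the sketch assumes the extended properties are available as a string dictionary, as their use in the C# code suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original activity.
public static class ExtendedPropertyCheck
{
    // Returns the required keys that are absent or have an empty value.
    public static List<string> MissingKeys(
        IDictionary<string, string> properties,
        params string[] requiredKeys)
    {
        List<string> missing = new List<string>();
        foreach (string key in requiredKeys)
        {
            string value;
            if (!properties.TryGetValue(key, out value) || string.IsNullOrWhiteSpace(value))
            {
                missing.Add(key);
            }
        }
        return missing;
    }
}
```

At the start of `Execute`, this could be called as `ExtendedPropertyCheck.MissingKeys(dotNetActivity.ExtendedProperties, "Path", "IP", "Login", "Password")`, writing the result to the log or throwing an exception with a clear message when the list is not empty.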
If you don’t want to include the login and the password in the activity definition, you will have to put them in the C# code instead.
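One possible way to do this is sketched below: the credentials are taken from the extended properties when they are present and otherwise fall back to values compiled into the assembly. `FtpCredentials`, `Resolve` and the two constants are hypothetical names and placeholder values introduced for illustration, not something the original code contains.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not part of the original activity.
public static class FtpCredentials
{
    // Placeholder values; replace them with your own before compiling.
    private const string DefaultLogin = "<FTP login>";
    private const string DefaultPassword = "<FTP password>";

    // Prefers values from the activity's extended properties and
    // falls back to the compiled-in defaults when they are absent or empty.
    public static NetworkCredential Resolve(IDictionary<string, string> properties)
    {
        string login;
        if (!properties.TryGetValue("Login", out login) || string.IsNullOrEmpty(login))
        {
            login = DefaultLogin;
        }

        string password;
        if (!properties.TryGetValue("Password", out password) || string.IsNullOrEmpty(password))
        {
            password = DefaultPassword;
        }

        return new NetworkCredential(login, password);
    }
}
```

The resulting object can be assigned directly to `request.Credentials`, so the `Login` and `Password` entries can then be left out of the pipeline JSON.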
Simple, tested code 🙂