Guidelines for Function Compute Development - Crawler

tanhe123, published 2019-01-11 16:46:21

The Guidelines for Function Compute Development - Use Fun Local for Local Running and Debugging briefly describes how to use Fun Local for the local running and debugging of functions. However, it does not show how much Fun Local improves the efficiency of Function Compute development.

This document uses the development of a simple crawler function as an example (for the code, see the Function Compute console template). It demonstrates how to develop a serverless crawler application that scales automatically and is billed by the number of calls.

Procedure

We develop the crawler application in multiple steps. Upon completion of each step, we will perform run verification.

1. Create a Fun project

Create a directory named image-crawler as the root directory of the project. In the directory, create a file named template.yml with the following content:

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  localdemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
    image-crawler:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        CodeUri: code/
        Description: 'Hello world with python2.7!'
        Runtime: python2.7

For more information about the serverless application model defined in Fun, see the Fun documentation.

After the preceding settings, the project directory structure is as follows:

.
└── template.yml

2. Write the Hello World function code

In the root directory, create a directory named code. In the code directory, create a file named index.py containing the Hello World function:

def handler(event, context):
    return 'hello world!'

In the root directory, run the following command:

fun local invoke image-crawler

The function runs successfully:

After the preceding settings, the project directory structure is as follows:

.
├── code
│   └── index.py
└── template.yml

3. Run the function through a trigger event

Modify the code in step 2 and print this event into the log.

import logging

logger = logging.getLogger()

def handler(event, context):
    logger.info("event: " + event)
    return 'hello world!'

Run the function through a trigger event. The following result is returned.

As we can see, the function receives the trigger event properly.
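
The string concatenation in the log call works on the python2.7 runtime used here, where event arrives as a str. On a python3 runtime, Function Compute passes the event as bytes, and the same line would raise a TypeError. The following is a minimal sketch of a handler that logs the event on either runtime; the helper name event_to_text is hypothetical and is not part of the original code.

```python
import logging

logger = logging.getLogger()


def event_to_text(event):
    """Return the trigger event as text (hypothetical helper)."""
    # python3 runtimes deliver the event as bytes; decode it before
    # concatenating it with a str.
    if isinstance(event, bytes):
        return event.decode("utf-8")
    return event


def handler(event, context):
    logger.info("event: " + event_to_text(event))
    return 'hello world!'
```

The handler still returns the same value as before, so the output of fun local invoke is unchanged.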

For more help with Fun Local, see its documentation.

4. Obtain the web source code

Next, write the code to obtain the web content.

import logging
import json
import urllib

logger = logging.getLogger()

def handler(event, context):
    logger.info("event: " + event)
    evt = json.loads(event)
    url = evt['url']
  
    html = get_html(url)
  
    logger.info("html content length: " + str(len(html)))
    return 'Done!'

def get_html(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

Because the code logic is simple, we directly use the urllib library to read the web content.
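
The get_html function above relies on urllib.urlopen, which exists only in Python 2. As a hedged sketch, assuming the function might later be moved to a python3 runtime, the variant below imports urlopen from whichever module is available and adds a timeout. The 10-second default is an assumption, not a value from the original code.

```python
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def get_html(url, timeout=10):
    """Return the raw body of url as bytes."""
    # closing() releases the connection even if read() fails, and it
    # works on Python 2, where responses are not context managers.
    with closing(urlopen(url, timeout=timeout)) as page:
        return page.read()
```

On Python 3 the return value is bytes, so any regular expression applied to it must either be a bytes pattern or be run on the decoded text.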

Run the function. The following result is returned:

5. Parse the images on the webpage

Here, we extract the URLs of the JPG images on the webpage by using a regular expression. This step is fiddly because the regular expression usually needs several small adjustments before it matches correctly. To shorten that loop, we use the local debugging method provided by Fun Local. For more information about the local debugging method, see Guidelines for Function Compute Development - Use Fun Local for Local Running and Debugging.

First, set a breakpoint in the following line:

logger.info("html content length: " + str(len(html)))

Then, start the function in debugging mode. When VS Code is connected, the function continues running to the line with the breakpoint we set:

Click Locals. We can see the local variables, including html, which holds the HTML source code we obtained. Copy the value of the html variable, analyze it, and then design a regular expression.

Write a simple regular expression, for example, http:\/\/[^\s,"]*\.jpg.

How can we quickly verify that the regular expression is correct? We can use the Watch feature provided by VS Code for this purpose.

Create a Watch variable and enter the following value:

re.findall(re.compile(r'http:\/\/[^\s,"]*\.jpg'), html)

Press Enter. The following result is returned:

We may modify the regular expression and test it over and over again until we get it right.

Add the correct image parsing logic to the code:

import re

reg = r'http:\/\/[^\s,"]*\.jpg'
imgre = re.compile(reg)

def get_img(html):
    return re.findall(imgre, html)
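
To make the behavior of this pattern concrete, the sketch below runs it against a small hand-written HTML fragment. The fragment and the second pattern, which also accepts https URLs, are illustrative assumptions and do not come from the original article.

```python
import re

# Pattern from the article: matches only http:// URLs that end in .jpg.
IMG_RE = re.compile(r'http:\/\/[^\s,"]*\.jpg')

# Hypothetical variant that also accepts https:// URLs.
IMG_RE_ANY_SCHEME = re.compile(r'https?://[^\s,"]*\.jpg')

SAMPLE_HTML = (
    '<img src="http://example.com/a.jpg"> '
    '<img src="https://example.com/b.jpg"> '
    '"thumb":"http://example.com/c.png"'
)


def get_img(html, pattern=IMG_RE):
    """Return every substring of html that matches pattern."""
    return pattern.findall(html)


http_only = get_img(SAMPLE_HTML)
any_scheme = get_img(SAMPLE_HTML, IMG_RE_ANY_SCHEME)
```

With this fragment, http_only contains only the a.jpg URL, while any_scheme also contains the b.jpg URL. The c.png URL is skipped by both patterns because it does not end in .jpg.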

Call the logic in the handler method:

def handler(event, context):
    logger.info("event: " + event)
    evt = json.loads(event)
    url = evt['url']
  
    html = get_html(url)
  
    img_list = get_img(html)
    logger.info(img_list)
  
    return 'Done!'

After the code is written, run the code locally to verify the result:

echo '{"url": "https://image.baidu.com/search/index?tn=baiduimage&word=%E5%A3%81%E7%BA%B8"}' \
    | fun local invoke image-crawler

As we can see, values of img_list have been generated on the console:

6. Upload images to an OSS instance

We will store the parsed images on the OSS instance.

First, use environment variables to configure OSS Endpoint and OSS Bucket.

Configure the environment variables in the template (OSS Bucket must be created in advance):

EnvironmentVariables:
    OSSEndpoint: oss-cn-hangzhou.aliyuncs.com
    BucketName: fun-local-test

Then, obtain the two environment variables in the function:

import os

endpoint = os.environ['OSSEndpoint']
bucket_name = os.environ['BucketName']

When it runs a function, Fun Local sets an additional variable that indicates the function is running locally. The function can check this variable and behave differently in the two environments, for example by connecting to ApsaraDB for RDS when running online and to a local MySQL instance when running locally.
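
As a rough sketch of how the configuration and the local indicator could be read in one place, the helper below collects them into a dictionary. It assumes that Fun Local exposes the indicator as an environment variable named local; that name and the helper load_settings are assumptions, so check them against your Fun version.

```python
import os


def load_settings(environ=None):
    """Collect the function's configuration (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    return {
        "endpoint": environ["OSSEndpoint"],
        "bucket_name": environ["BucketName"],
        # Assumption: Fun Local sets "local" to a non-empty value; the
        # variable is absent when the function runs online.
        "local": bool(environ.get("local", "")),
    }
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without touching the real environment.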

Here, we use the indicator variable to decide how to create the OSS client. When the function runs online, the credentials in context are a temporary key obtained through AssumeRole, so a security token must be supplied together with the key. This restriction does not apply to local running. OSS provides two ways to create a client, one for each case:

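import os
import oss2

# Assumption: Fun Local marks a local run by setting an environment
# variable named "local"; adjust the name if your Fun version differs.
local = bool(os.getenv('local', ''))

# Credentials that Function Compute passes to the handler through context.
creds = context.credentials

if local:
    # Local run: ordinary AccessKey pair, no security token.
    auth = oss2.Auth(creds.access_key_id,
                     creds.access_key_secret)
else:
    # Online run: temporary AssumeRole credentials, which need the STS token.
    auth = oss2.StsAuth(creds.access_key_id,
                        creds.access_key_secret,
                        creds.security_token)

bucket = oss2.Bucket(auth, endpoint, bucket_name)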

Traverse all images and upload all of them to the OSS instance:

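import datetime

count = 0
for item in img_list:
    count += 1
    logger.info(item)
    # Download the image
    pic = urllib.urlopen(item)
    # Store the image in the OSS bucket. The key is the microsecond
    # component of the current time, so two keys can occasionally collide.
    # The extension is .jpg because the regular expression matches only .jpg URLs.
    bucket.put_object(str(datetime.datetime.now().microsecond) + '.jpg', pic)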
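
The loop above names each object after the microsecond component of the current time, which can repeat and therefore overwrite an earlier upload. One possible alternative, sketched here as an assumption and not as the author's method, is to derive the key from a hash of the image URL. The helper name object_key_for is hypothetical.

```python
import hashlib
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def object_key_for(url):
    """Build a stable object key from an image URL (hypothetical helper)."""
    # Keep the extension from the URL path; fall back to .jpg, which is
    # what the crawler's regular expression matches.
    ext = posixpath.splitext(urlparse(url).path)[1] or ".jpg"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return digest + ext
```

Inside the loop, the key could then be passed as bucket.put_object(object_key_for(item), pic). The same URL always maps to the same key, so crawling a page twice overwrites the earlier copies of its images and does not duplicate them.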

Run the function locally:

echo '{"url": "https://image.baidu.com/search/index?tn=baiduimage&word=%E5%A3%81%E7%BA%B8"}' \
    | fun local invoke image-crawler

From the logs, we can see that the images are parsed one by one and then uploaded to the OSS instance.

After logging on to the OSS console, we can see the images.

Deployment

After local development, we must deploy the service online so that it can be called. Fun simplifies the whole process: it takes care of the steps that would otherwise be done manually, such as logging on to the console, creating services and functions, configuring environment variables, and creating roles.

Local running differs from online running in how we authorize Function Compute to access the OSS instance. To grant this authorization online, add the following configuration to the template.yml file (for more information about Policies, see the Fun documentation):

Policies: AliyunOSSFullAccess

Then, the content of the template.yml file is as follows:

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  localdemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
      Policies: AliyunOSSFullAccess
    image-crawler:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        CodeUri: code/
        Description: 'Hello world with python2.7!'
        Runtime: python2.7
        EnvironmentVariables:
          OSSEndpoint: oss-cn-hangzhou.aliyuncs.com
          BucketName: fun-local-test

Next, run the fun deploy command. Logs indicating successful deployment are displayed.

Verification

Verify the deployment on the Function Compute console

Log on to the Function Compute console. We can see that the services, functions, code, and environment variables are ready.

Write the JSON code used for verification into the trigger event and trigger the event:

The returned result is the same as for local running:

Verify the deployment by running an fcli command

For help with fcli, see the fcli documentation.

On the terminal, run the following command to obtain the function list:

fcli function list --service-name localdemo

As we can see, image-crawler has been created.

{
  "Functions": [
    "image-crawler",
    "java8",
    "nodejs6",
    "nodejs8",
    "php72",
    "python27",
    "python3"
  ],
  "NextToken": null
}

Run the following command to call the function:

fcli function invoke --service-name localdemo \
    --function-name image-crawler \
    --event-str '{"url": "https://image.baidu.com/search/index?tn=baiduimage&word=%E5%A3%81%E7%BA%B8"}'

After the command runs successfully, the returned result is the same as on the console and with Fun Local.
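
The event string used in the commands above contains a percent-encoded search keyword. As a small illustration, written for Python 3, the sketch below builds the same JSON payload from a readable keyword. The helper name build_event is hypothetical.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "https://image.baidu.com/search/index"


def build_event(keyword):
    """Return the JSON event for a Baidu image search (hypothetical helper)."""
    # urlencode percent-encodes the UTF-8 bytes of the keyword.
    query = urlencode({"tn": "baiduimage", "word": keyword})
    return json.dumps({"url": SEARCH_URL + "?" + query})
```

Calling build_event("壁纸") yields the payload passed to fun local invoke and fcli in this document, so a different keyword can be substituted without encoding it by hand.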

Conclusion

We have now completed the development process. The source code in this document is hosted in a GitHub repository.

This document shows how to use the local running and debugging capabilities of Fun Local to develop a function locally and run the function repeatedly to get feedback and facilitate code iteration.

By running the fun deploy command, we can deploy the function developed locally to the cloud and obtain the expected results without any modification to the code.

The method described in this document is only one of the function development methods in Function Compute. This document intends to show developers that the proper development of functions in Function Compute can be a smooth and enjoyable process. We hope you have fun with Fun.

Note

This article was translated from 《开发函数计算的正确姿势 —— 爬虫》.
