GCP13 :: Using Map in Dataflow with Lambda

Before using a lambda

import apache_beam as beam

def strip_newline(text):
  # Remove the trailing newline from each element.
  return text.strip('\n')

def strip_header(text):
  # Remove the leading '#' and the space that follows it.
  return text.strip('# ')

with beam.Pipeline() as pipeline:
  plants = (
      pipeline
      | 'Gardening plants' >> beam.Create([
          '# 🍓Strawberry\n',
          '# 🥕Carrot\n',
          '# 🍆Eggplant\n',
          '# 🍅Tomato\n',
          '# 🥔Potato\n',
      ])
      | 'Strip newline' >> beam.Map(strip_newline)
      # Lambda equivalent of the step above:
      # | 'Strip newline' >> beam.Map(lambda text: text.strip('\n'))
      | 'Strip header' >> beam.Map(strip_header)
      | beam.Map(print))

#result

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato
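
To see why the characters passed to strip matter, the calls can be tried on a single element in plain Python, without Beam. This is only a minimal sketch; `sample`, `step1`, `step2` and `one_pass` are illustrative names that do not appear in the pipeline. `str.strip(chars)` treats its argument as a set of characters to remove from both ends, so stripping only '#' would leave the space that follows it.

```python
# Plain-Python sketch (no Beam): how str.strip behaves on one element.
sample = '# 🍓Strawberry\n'

# Remove newline characters from both ends.
step1 = sample.strip('\n')

# Remove only '#' characters from both ends. The space after '#' is
# not in the character set, so it stays.
step2 = step1.strip('#')

# A single call can remove '#', spaces and newlines together, because
# strip() treats its argument as a set of characters.
one_pass = sample.strip('# \n')

print(repr(step1))
print(repr(step2))
print(repr(one_pass))
```

The last call suggests that the two Map steps could also be merged into one, at the cost of a slightly less explicit pipeline.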

After using a lambda

import apache_beam as beam

# The named functions are no longer needed; each step is an inline lambda.
with beam.Pipeline() as pipeline:
  plants = (
      pipeline
      | 'Gardening plants' >> beam.Create([
          '# 🍓Strawberry\n',
          '# 🥕Carrot\n',
          '# 🍆Eggplant\n',
          '# 🍅Tomato\n',
          '# 🥔Potato\n',
      ])
      | 'Strip newline' >> beam.Map(lambda text: text.strip('\n'))
      | 'Strip header' >> beam.Map(lambda text: text.strip('# '))
      | beam.Map(print))

#result

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato
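
beam.Map takes a callable that receives one element and returns one element, which is why a named function and a lambda are interchangeable in the two versions. The sketch below checks that equivalence using Python's built-in map as a stand-in for beam.Map. The names `strip_newline_fn` and `strip_newline_lambda` are illustrative, and the built-in map is only an analogy, not how Beam runs a pipeline.

```python
# Plain-Python sketch: a named function and a lambda are both callables.
lines = ['# 🍓Strawberry\n', '# 🥕Carrot\n']

def strip_newline_fn(text):
    return text.strip('\n')

strip_newline_lambda = lambda text: text.strip('\n')

# Built-in map() stands in for beam.Map: one output per input element.
from_def = list(map(strip_newline_fn, lines))
from_lambda = list(map(strip_newline_lambda, lines))

print(from_def == from_lambda)
```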

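In addition, FlatMap() maps each input element to any number of output elements (a 1:N relationship), so a single line can be split into several elements: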

import apache_beam as beam

# Named equivalent of the lambda used below; it is not called in this pipeline.
def split_words(text):
  return text.split(',')

with beam.Pipeline() as pipeline:
  plants = (
      pipeline
      | 'Gardening plants' >> beam.Create([
          '🍓Strawberry,🥕Carrot,🍆Eggplant',
          '🍅Tomato,🥔Potato',
      ])
      | 'Split words' >> beam.FlatMap(lambda line: line.split(','))
      | beam.Map(print))

FlatMap() splits every line on ',' and emits each item as its own element, so the three items on the first line become three separate elements.

#result

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato
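
The difference between the 1:1 behaviour of Map() and the 1:N behaviour of FlatMap() can be sketched in plain Python. This is an analogy under the assumption that a list stands in for a PCollection; `mapped` and `flat_mapped` are illustrative names, and `itertools.chain.from_iterable` only imitates the flattening that FlatMap() performs.

```python
from itertools import chain

lines = ['🍓Strawberry,🥕Carrot,🍆Eggplant', '🍅Tomato,🥔Potato']

# Map-like: one output per input, so each output is a whole list.
mapped = [line.split(',') for line in lines]

# FlatMap-like: the per-line lists are flattened into individual elements.
flat_mapped = list(chain.from_iterable(line.split(',') for line in lines))

print(len(mapped))       # number of lists, one per input line
print(len(flat_mapped))  # number of individual items
```

With the Map-like version, printing each element would show a list per line rather than one plant per line.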
